Application of Reference - Based Lossless Genome Compression


Heba Afify

Abstract

Genomic data technology has advanced through algorithms that not only facilitate meaningful analysis of these data but also aid in the efficient compression, storage, retrieval, updating, and transmission of the huge volumes of data generated. This has necessitated the development of novel bioinformatics approaches and generic compression tools. In recent years, many efforts have been made to store genomes relative to a reference genome by encoding only the differences between a sequence and the reference. We used difference compression to update a compressed set of similar sequences. In addition, we found that there is a degree of similarity between different organisms, so we used difference compression to compress a data set drawn from two different species. This was used to determine which species can be compressed relative to another species, and which reference is appropriate for a data set. Results show that the entropy, which is an indicator of compression efficiency and a measure of relatedness, is much lower with a variable reference chosen by minimum entropy than with the single fixed Cambridge reference sequence. It was also noted that the execution time for encoding a huge data set with the Cambridge reference is less than the execution time when entropy is used to select the reference.
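
The idea of difference compression and entropy-based reference selection summarized in the abstract can be illustrated with a minimal Python sketch. This is not the article's implementation: the function names (`diff_encode`, `diff_decode`, `diff_entropy`, `pick_reference`) are hypothetical, the sequences are assumed to be pre-aligned and of equal length, only substitutions are handled, and the entropy is taken over a simple match/mismatch symbol stream, which may differ from the measure used in the paper.

```python
import math
from collections import Counter


def diff_encode(seq, ref):
    """Return (position, base) pairs where seq differs from ref.

    Assumption: seq and ref are pre-aligned and equal in length,
    so only substitutions need to be recorded.
    """
    if len(seq) != len(ref):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    return [(i, s) for i, (s, r) in enumerate(zip(seq, ref)) if s != r]


def diff_decode(diffs, ref):
    """Rebuild the original sequence from the reference and the differences."""
    out = list(ref)
    for i, base in diffs:
        out[i] = base
    return "".join(out)


def diff_entropy(seq, ref):
    """Shannon entropy (bits per symbol) of the difference stream.

    Each position emits "=" on a match, or the differing base on a mismatch.
    A sequence close to the reference yields a stream dominated by "=",
    and therefore a low entropy.
    """
    symbols = [s if s != r else "=" for s, r in zip(seq, ref)]
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def pick_reference(dataset, candidates):
    """Pick the candidate reference with the lowest summed difference entropy."""
    return min(candidates, key=lambda ref: sum(diff_entropy(s, ref) for s in dataset))


# Toy data, invented for illustration only.
fixed_ref = "ACGTACGT"
dataset = ["ACGTTCGT", "ACGTTCGA"]

diffs = diff_encode(dataset[0], fixed_ref)
restored = diff_decode(diffs, fixed_ref)   # lossless round trip
best = pick_reference(dataset, [fixed_ref, dataset[0]])
```

Selecting the reference this way requires scoring every candidate against the whole data set, which is consistent with the abstract's observation that encoding with a single fixed reference runs faster than encoding with an entropy-selected one.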

Article Details

How to Cite
Afify, H. (2015). Application of Reference - Based Lossless Genome Compression. International Journal on Recent and Innovation Trends in Computing and Communication, 3(12), 6503–6506. https://doi.org/10.17762/ijritcc.v3i12.5083