TY - JOUR
T1 - Trends in genome compression
AU - Wandelt, Sebastian
AU - Bux, Marc
AU - Leser, Ulf
PY - 2014
Y1 - 2014
N2 - Technological advancements in high-throughput sequencing have led to a tremendous increase in the amount of genomic data produced. With the cost down to 2,000 USD for a single human genome, sequencing dozens of individuals is feasible even for smaller projects or organizations. However, generating the sequence is only one issue; another is storing, managing, and analyzing it. These tasks become increasingly challenging due to the sheer size of the data sets and are increasingly considered to be the major bottlenecks in larger genome projects. One possible countermeasure is to compress the data; compression reduces costs by requiring less hard disk storage and less bandwidth when data is shipped to large compute clusters for parallel analysis. Accordingly, sequence compression has recently attracted much interest in the scientific community. In this paper, we explain the different basic techniques for sequence compression, point to distinctions between different compression tasks (e.g., genome compression versus read compression), and present a comparison of current approaches and tools. To further stimulate progress in genome compression research, we also identify key challenges for future systems.
AB - Technological advancements in high-throughput sequencing have led to a tremendous increase in the amount of genomic data produced. With the cost down to 2,000 USD for a single human genome, sequencing dozens of individuals is feasible even for smaller projects or organizations. However, generating the sequence is only one issue; another is storing, managing, and analyzing it. These tasks become increasingly challenging due to the sheer size of the data sets and are increasingly considered to be the major bottlenecks in larger genome projects. One possible countermeasure is to compress the data; compression reduces costs by requiring less hard disk storage and less bandwidth when data is shipped to large compute clusters for parallel analysis. Accordingly, sequence compression has recently attracted much interest in the scientific community. In this paper, we explain the different basic techniques for sequence compression, point to distinctions between different compression tasks (e.g., genome compression versus read compression), and present a comparison of current approaches and tools. To further stimulate progress in genome compression research, we also identify key challenges for future systems.
KW - Genome compression
KW - Read compression
KW - Survey
UR - https://www.scopus.com/pages/publications/84904764719
U2 - 10.2174/1574893609666140516010143
DO - 10.2174/1574893609666140516010143
M3 - Article
AN - SCOPUS:84904764719
SN - 1574-8936
VL - 9
SP - 315
EP - 326
JO - Current Bioinformatics
JF - Current Bioinformatics
IS - 3
ER -