Skip to main navigation Skip to search Skip to main content

String searching in referentially compressed genomes

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Background: Improved sequencing techniques have led to large amounts of biological sequence data. One of the challenges in managing sequence data is efficient storage. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. However, so far sequences always have to be decompressed prior to an analysis. There is a need for algorithms working on compressed data directly, avoiding costly decompression. Summary: In our work, we address this problem by proposing an algorithm for exact string search over compressed data. The algorithm works directly on referentially compressed genome sequences, without needing an index for each genome and only using partial decompression. Results: Our string search algorithm for referentially compressed genomes performs exact string matching for large sets of genomes faster than using an index structure, e.g. suffix trees, for each genome, especially for short queries. We think that this is an important step towards space and runtime efficient management of large biological data sets.

Original languageEnglish
Title of host publicationKDIR 2012 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
Pages95-102
Number of pages8
StatePublished - 2012
Externally publishedYes
Event4th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2012 - Barcelona, Spain
Duration: 4 Oct 20127 Oct 2012

Publication series

NameKDIR 2012 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval

Conference

Conference4th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2012
Country/TerritorySpain
CityBarcelona
Period4/10/127/10/12

Keywords

  • Genome compression
  • Referential compression
  • String search

Fingerprint

Dive into the research topics of 'String searching in referentially compressed genomes'. Together they form a unique fingerprint.

Cite this