RRCA: Ultra-fast multiple in-species genome alignments

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Multiple sequence alignment (MSA) is an important method in bioinformatics, for instance for reconstructing phylogenetic trees or identifying functional domains within genes. Finding an optimal MSA is computationally intractable, and therefore many alignment heuristics have been proposed. However, computing MSAs for sequences at chromosome or genome scale in reasonable time with good alignment results remains an open challenge. In this paper we propose RRCA, a very fast method to compute high-quality in-species MSAs at genome scale. RRCA uses referential compression to efficiently find long common subsequences in the to-be-aligned sequences. A colinear subcollection of these subsequences is used for an initial alignment, and the subsequences not yet covered are aligned recursively following the same approach. Our evaluation shows that RRCA achieves MSAs of similar quality to current state-of-the-art methods, while often being orders of magnitude faster across all our datasets. For instance, RRCA aligns eight human Chromosome 22 sequences (around 50 MB each) within one minute on a consumer computer, a task that takes hours to days with competitors.
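
The match-then-chain idea described in the abstract can be illustrated with a minimal sketch. This is not RRCA's implementation, whose details are not given in this record. It assumes a simple k-mer seed index as a stand-in for referential compression and a quadratic dynamic program for choosing a colinear subcollection. The names `find_matches`, `colinear_chain`, and the parameter `k` are hypothetical, and the recursive alignment of uncovered regions is omitted.

```python
from typing import Dict, List, Tuple

# A match is (ref_start, seq_start, length). Hypothetical representation.
Match = Tuple[int, int, int]


def find_matches(ref: str, seq: str, k: int = 4) -> List[Match]:
    """Greedy left-to-right scan of seq against a k-mer index of ref.

    Assumption: a stand-in for referential compression, not RRCA's method.
    """
    index: Dict[str, List[int]] = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)

    matches: List[Match] = []
    j = 0
    while j <= len(seq) - k:
        best_len, best_pos = 0, -1
        for i in index.get(seq[j:j + k], []):
            n = k
            # Extend the seed to the right while characters agree.
            while i + n < len(ref) and j + n < len(seq) and ref[i + n] == seq[j + n]:
                n += 1
            if n > best_len:
                best_len, best_pos = n, i
        if best_len:
            matches.append((best_pos, j, best_len))
            j += best_len  # continue after the matched block
        else:
            j += 1  # no seed here; skip one character
    return matches


def colinear_chain(matches: List[Match]) -> List[Match]:
    """Pick a maximum-total-length set of matches that appear in the same
    order, without overlap, in both sequences (O(n^2) dynamic program)."""
    ms = sorted(matches, key=lambda m: (m[1], m[0]))
    if not ms:
        return []
    score = [m[2] for m in ms]
    prev = [-1] * len(ms)
    for b in range(len(ms)):
        rb, sb, lb = ms[b]
        for a in range(b):
            ra, sa, la = ms[a]
            # a may precede b only if it ends before b starts in both sequences.
            if ra + la <= rb and sa + la <= sb and score[a] + lb > score[b]:
                score[b] = score[a] + lb
                prev[b] = a
    end = max(range(len(ms)), key=score.__getitem__)
    chain: List[Match] = []
    while end != -1:
        chain.append(ms[end])
        end = prev[end]
    return chain[::-1]
```

In this sketch, the matches kept by `colinear_chain` would anchor an initial alignment, and the stretches between consecutive anchors are the "not yet covered" regions that the abstract says are handled recursively.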

Original language: English
Title of host publication: Algorithms for Computational Biology - First International Conference, AlCoB 2014, Proceedings
Publisher: Springer Verlag
Pages: 247-261
Number of pages: 15
ISBN (Print): 9783319079523
State: Published - 2014
Externally published: Yes
Event: 1st International Conference on Algorithms for Computational Biology, AlCoB 2014 - Tarragona, Spain
Duration: 1 Jul 2014 to 3 Jul 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8542 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st International Conference on Algorithms for Computational Biology, AlCoB 2014
Country/Territory: Spain
City: Tarragona
Period: 1/07/14 to 3/07/14

Keywords

  • Multiple sequence alignment
  • referential compression
