
Accelerating De Novo Assembler WTDBG2 on Commodity Servers

  • Beihang University
  • Beijing University of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

De novo genome assembly reconstructs chromosomes from massive numbers of relatively short, fragmented reads and is fundamental to studying new species for which no reference genome exists. Wtdbg2 is a de novo assembler for long reads of up to hundreds of kilobases. It is based on the fuzzy-Bruijn graph (FBG) and is ten times faster than cutting-edge assemblers such as Canu. However, the performance of wtdbg2 still needs improvement: 1) it requires up to terabytes of memory to compute an assembly, which makes it infeasible to run on a commodity server; 2) it requires tens of hours to assemble large datasets such as Homo sapiens genomes. To address these drawbacks, we propose several optimization techniques for accelerating wtdbg2 on a commodity server, including a memory auto-tuning scheme, sequence alignment optimization, and intermediate result elimination in the output procedure. We compare the optimized wtdbg2 with the original implementation and two cutting-edge assemblers on real-world datasets. The experimental results demonstrate that the optimized wtdbg2 achieves maximum and average speedups of 2.31× and 1.54×, respectively. In addition, our proposed optimizations reduce the memory usage of wtdbg2 by 39.5% without affecting correctness.

Original language: English
Title of host publication: Algorithms and Architectures for Parallel Processing - 20th International Conference, ICA3PP 2020, Proceedings
Editor: Meikang Qiu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 232-246
Number of pages: 15
ISBN (Print): 9783030602444
DOI
Publication status: Published - 2020
Event: 20th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2020 - New York, United States
Duration: 2 Oct 2020 to 4 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12452 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2020
Country/Territory: United States
City: New York
Period: 2/10/20 to 4/10/20
