SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

Ruibang Luo(Guangzhou HKUST Fok Ying Tung Research Institute), Binghang Liu(Guangzhou HKUST Fok Ying Tung Research Institute), Yinlong Xie(Guangzhou HKUST Fok Ying Tung Research Institute), Zhenyu Li(Guangzhou HKUST Fok Ying Tung Research Institute), Weihua Huang(Guangzhou HKUST Fok Ying Tung Research Institute), Jianying Yuan(Guangzhou HKUST Fok Ying Tung Research Institute), Guangzhu He(Guangzhou HKUST Fok Ying Tung Research Institute), Yanxiang Chen(Guangzhou HKUST Fok Ying Tung Research Institute), Qi Pan(Guangzhou HKUST Fok Ying Tung Research Institute), Yunjie Liu(Guangzhou HKUST Fok Ying Tung Research Institute), Jingbo Tang(Guangzhou HKUST Fok Ying Tung Research Institute), Gengxiong Wu(Guangzhou HKUST Fok Ying Tung Research Institute), Hao Zhang(Guangzhou HKUST Fok Ying Tung Research Institute), Yujian Shi(Guangzhou HKUST Fok Ying Tung Research Institute), Yong Liu(Guangzhou HKUST Fok Ying Tung Research Institute), Chang Yu(Guangzhou HKUST Fok Ying Tung Research Institute), Bo Wang(Guangzhou HKUST Fok Ying Tung Research Institute), Yao Lu(Guangzhou HKUST Fok Ying Tung Research Institute), Changlei Han(Guangzhou HKUST Fok Ying Tung Research Institute), David W. Cheung(University of Hong Kong), Siu‐Ming Yiu(University of Hong Kong), Shaoliang Peng(National University of Defense Technology), Zhu Xiaoqian(National University of Defense Technology), Guangming Liu(National University of Defense Technology), Xiangke Liao(National University of Defense Technology), Yingrui Li(Guangzhou HKUST Fok Ying Tung Research Institute), Huanming Yang(Guangzhou HKUST Fok Ying Tung Research Institute), Jian Wang(Guangzhou HKUST Fok Ying Tung Research Institute), Tak‐Wah Lam(University of Hong Kong), Jun Wang(Guangzhou HKUST Fok Ying Tung Research Institute)
GigaScience
December 1, 2012
Cited by 5,599
Open Access

Abstract

BACKGROUND: De novo genome assembly from next-generation sequencing (NGS) short reads is growing rapidly; however, several major challenges remain before it can be both efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. FINDINGS: To overcome these challenges, we have developed its successor, SOAPdenovo2, which has a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes. CONCLUSIONS: Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers in both assembly length and accuracy. We also provide an updated assembly of the 2008 Asian (YH) genome using SOAPdenovo2. In this assembly, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. Genome coverage increased from 81.16% to 93.91%, and peak memory consumption was ~2/3 lower.

