Comprehensive assessment of 11 de novo HiFi assemblers on complex eukaryotic genomes and metagenomes

Wenjuan Yu(Agricultural Genomics Institute at Shenzhen), Haohui Luo(Agricultural Genomics Institute at Shenzhen), Jinbao Yang(Huazhong Agricultural University), Shengchen Zhang(Huazhong Agricultural University), Heling Jiang(Agricultural Genomics Institute at Shenzhen), Xianjia Zhao(Zhengzhou University), Xingqi Hui(Zhengzhou University), Da Sun(Agricultural Genomics Institute at Shenzhen), Liang Li(Fujian Academy of Agricultural Sciences), Xiuqing Wei(Fujian Academy of Agricultural Sciences), Stefano Lonardi(University of California, Riverside), Weihua Pan(Agricultural Genomics Institute at Shenzhen)
Genome Research
February 1, 2024

Abstract

Pacific Biosciences (PacBio) HiFi sequencing technology generates long reads (>10 kbp) with very high accuracy (<0.01% sequencing error). Although several de novo assembly tools are available for HiFi reads, no comprehensive study has yet evaluated these assemblers. We evaluated the performance of 11 de novo HiFi assemblers on (1) real data for three eukaryotic genomes; (2) 34 synthetic data sets with different ploidy levels, sequencing coverage levels, heterozygosity rates, and sequencing error rates; (3) one real metagenomic data set; and (4) five synthetic metagenomic data sets with different compositional abundances and heterozygosity rates. The 11 assemblers were evaluated using the Quality Assessment Tool (QUAST) and Benchmarking Universal Single-Copy Orthologs (BUSCO). We also used several additional criteria, namely, completion rate, single-copy completion rate, duplicated completion rate, average proportion of largest category, average distance difference, quality value, run-time, and memory utilization. Our results show that hifiasm and hifiasm-meta should be the first choice for assembling eukaryotic genomes and metagenomes, respectively, from HiFi data. This comprehensive benchmarking of commonly used assemblers on complex eukaryotic genomes and metagenomes will help the research community choose the most appropriate assembler for their data and identify possible improvements in assembly algorithms.
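The abstract names several assembly-quality metrics without defining them. As a rough illustration only, the sketch below computes three standard ones under their common definitions: the Phred-scaled quality value (QV = -10 * log10 of the per-base error rate), BUSCO-style completeness (BUSCO reports "complete" as single-copy plus duplicated), and contig N50, one of the statistics QUAST reports. The helper names (phred_qv, busco_completeness, n50) are hypothetical, and the paper's own procedures may differ, in particular for how the error rate is estimated and how its completion-rate criteria are defined.

```python
import math
from typing import Dict, Iterable


def phred_qv(error_rate: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(per-base error rate)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be in the open interval (0, 1)")
    return -10.0 * math.log10(error_rate)


def busco_completeness(single: int, duplicated: int, total: int) -> Dict[str, float]:
    """BUSCO-style percentages; 'complete' counts single-copy plus duplicated hits.

    Assumption: the paper's 'completion rate' criteria may be defined differently.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        "complete": 100.0 * (single + duplicated) / total,
        "single_copy": 100.0 * single / total,
        "duplicated": 100.0 * duplicated / total,
    }


def n50(contig_lengths: Iterable[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached for non-empty input


if __name__ == "__main__":
    # A 0.01% (1e-4) per-base error rate corresponds to QV 40.
    print(round(phred_qv(0.0001), 2))
    print(busco_completeness(single=950, duplicated=30, total=1000))
    print(n50([100, 80, 60, 40, 20]))
```

Under this definition, the stated <0.01% sequencing error of HiFi reads corresponds to a read-level QV above 40.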
