RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions

Jinbao Yang (Huazhong Agricultural University), Xianjia Zhao (Zhengzhou University), Heling Jiang (Agricultural Genomics Institute at Shenzhen), Yingxue Yang (Agricultural Genomics Institute at Shenzhen), Yuze Hou (Agricultural Genomics Institute at Shenzhen), Weihua Pan (Huazhong Agricultural University)
Horticulture Research
December 29, 2022
Open Access

Abstract

Telomere-to-telomere (T2T) assembly relies on the correctness of sequence alignments. However, existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions, which impedes the generation of T2T-level reference genomes for more important species. In this paper, we present an automatic algorithm called RAfilter for removing the false positives in the outputs of existing aligners. RAfilter takes advantage of rare k-mers, which represent copy-specific features, to differentiate false-positive alignments from correct ones. Considering the huge numbers of rare k-mers in large eukaryotic genomes, a series of high-performance computing techniques such as multi-threading and bit operations are used to improve time and space efficiency. Experimental results on tandem repeats and interspersed repeats show that RAfilter was able to filter 60%–90% of false-positive HiFi alignments with almost no correct ones removed, while the sensitivity and precision on ONT datasets were about 80% and 50%, respectively.
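
The idea of using rare k-mers as copy-specific markers can be illustrated with a minimal Python sketch. This is not RAfilter's implementation: the 2-bit base encoding, the function names (`kmers`, `rare_kmers`, `rare_support`), the `max_count` threshold, and the support score are all assumptions made for illustration, and reverse-complement handling is omitted for brevity.

```python
from collections import Counter

# 2-bit encoding of bases; an illustrative choice, not taken from the paper.
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmers(seq, k):
    """Yield (position, packed k-mer) using a rolling bit-shift window."""
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    code = 0
    valid = 0  # number of consecutive valid bases seen so far
    for i, base in enumerate(seq):
        b = ENC.get(base)
        if b is None:  # e.g. 'N': restart the window
            code, valid = 0, 0
            continue
        code = ((code << 2) | b) & mask
        valid += 1
        if valid >= k:
            yield i - k + 1, code


def rare_kmers(reference, k, max_count=1):
    """Return packed k-mers occurring at most max_count times (hypothetical threshold)."""
    counts = Counter(code for _, code in kmers(reference, k))
    return {code for code, n in counts.items() if n <= max_count}


def rare_support(read, reference, start, end, k, rare):
    """Fraction of rare k-mers in reference[start:end] that also occur in the read.

    Returns None when the region contains no rare k-mers.
    """
    region = {c for _, c in kmers(reference[start:end], k) if c in rare}
    if not region:
        return None
    read_codes = {c for _, c in kmers(read, k)}
    return len(region & read_codes) / len(region)


# Toy example: two repeat copies that differ by a single base.
copy1 = "ACGTTGCAAGCTTAGG"
copy2 = "ACGTTGCATGCTTAGG"
reference = copy1 + copy2
rare = rare_kmers(reference, k=5)
read = copy1  # the read truly originates from the first copy

print(rare_support(read, reference, 0, 16, 5, rare))   # alignment to copy 1
print(rare_support(read, reference, 16, 32, 5, rare))  # alignment to copy 2
```

In this toy case the alignment to the first copy is supported by all of that region's rare k-mers (score 1.0), while the alignment to the second copy is supported by none (score 0.0), so a filter of this kind would flag the latter as a likely false positive.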

