Evaluation of strategies for evidence-driven genome annotation using long-read RNA-seq

Alejandro Paniagua(Universitat de València), Cristina Agustin-García(National Academies of Sciences, Engineering, and Medicine), Francisco J. Pardo-Palacios(National Academies of Sciences, Engineering, and Medicine), Tom Brown(Berlin Center for Genomics in Biodiversity Research), Maite De María(University of Florida), Nancy D. Denslow(University of Florida), Camila J. Mazzoni(Berlin Center for Genomics in Biodiversity Research), Ana Conesa(National Academies of Sciences, Engineering, and Medicine)
Genome Research
December 23, 2024
Open Access

Abstract

While long-read sequencing has made draft genomes more accessible, the annotation of these new genomes has not kept pace. Long-read RNA sequencing offers a promising way to improve gene annotation. In this study, we explore how the sequencing platform (Oxford Nanopore R9.4.1 chemistry or Pacific Biosciences [PacBio] Sequel II CCS) and the data processing method influence evidence-driven genome annotation using long reads. An annotation pipeline incorporating PacBio transcripts significantly outperformed traditional methods, such as ab initio predictions and short-read-based annotations. We applied this strategy to a nonmodel species, the Florida manatee, and compared our results with the existing short-read-based annotation. At the locus level, the two annotations were highly concordant, with 90% agreement. At the transcript level, however, agreement was only 35%. We identified 4906 novel loci, represented by 5707 isoforms, 64% of which matched known sequences in other mammalian species. Overall, our findings underscore the importance of combining high-quality curated transcript models with ab initio methods for effective genome annotation.
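The gap between 90% locus-level and 35% transcript-level agreement largely reflects how strict each comparison is: two loci can count as matching if they merely overlap, while two transcripts match only if their exon-intron structures coincide. The abstract does not state how agreement was measured, so the sketch below is a hypothetical illustration under stated assumptions: loci match by strand-aware overlap, multi-exon transcripts match by identical intron chain (similar in spirit to the exact intron-chain match reported by tools such as gffcompare), and single-exon transcripts match by identical span. All names (`Transcript`, `locus_agreement`, `transcript_agreement`) are illustrative and not taken from the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript model: chromosome, strand, sorted 1-based inclusive exons."""
    chrom: str
    strand: str
    exons: tuple

    def span(self):
        return self.exons[0][0], self.exons[-1][1]

    def structure_key(self):
        # Assumption: multi-exon models are compared by intron chain,
        # single-exon models by their exact genomic span.
        if len(self.exons) > 1:
            introns = tuple(
                (a[1] + 1, b[0] - 1) for a, b in zip(self.exons, self.exons[1:])
            )
            return (self.chrom, self.strand, "introns", introns)
        return (self.chrom, self.strand, "span", self.span())


def overlaps(t, u):
    """True if both transcripts share chromosome and strand and their spans intersect."""
    (s1, e1), (s2, e2) = t.span(), u.span()
    return t.chrom == u.chrom and t.strand == u.strand and s1 <= e2 and s2 <= e1


def locus_agreement(query_loci, ref_loci):
    """Fraction of query loci overlapping at least one reference transcript (lenient)."""
    ref_tx = [t for txs in ref_loci.values() for t in txs]
    hits = sum(
        any(overlaps(q, r) for q in txs for r in ref_tx)
        for txs in query_loci.values()
    )
    return hits / len(query_loci)


def transcript_agreement(query_loci, ref_loci):
    """Fraction of query transcripts whose exact structure exists in the reference (strict)."""
    ref_keys = {t.structure_key() for txs in ref_loci.values() for t in txs}
    query_tx = [t for txs in query_loci.values() for t in txs]
    return sum(t.structure_key() in ref_keys for t in query_tx) / len(query_tx)
```

For example, a query locus with two isoforms that differ only in an alternative acceptor site, compared against a reference locus with one isoform sharing the first intron, agrees fully at the locus level but only halfway at the transcript level:

```python
a = Transcript("chr1", "+", ((100, 200), (300, 400)))
b = Transcript("chr1", "+", ((100, 200), (350, 400)))
c = Transcript("chr1", "+", ((90, 200), (300, 420)))
print(locus_agreement({"g1": [a, b]}, {"r1": [c]}))       # 1.0
print(transcript_agreement({"g1": [a, b]}, {"r1": [c]}))  # 0.5
```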
