Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Francisco J. Pardo-Palacios(Consejo Superior de Investigaciones Científicas), Dingjie Wang(University of Michigan), Fairlie Reese(University of California, Irvine), Mark Diekhans(University of California, Santa Cruz), Sílvia Carbonell Sala(Centre for Genomic Regulation), Brian A. Williams(California Institute of Technology), Jane Loveland(European Bioinformatics Institute), Maite De María(University of Florida), Matthew S. Adams(University of California, Santa Cruz), Gabriela Balderrama-Gutierrez(University of California, Irvine), Amit K. Behera(University of California, Santa Cruz), José M. González(European Bioinformatics Institute), Toby Hunt(European Bioinformatics Institute), Julien Lagarde(Centre for Genomic Regulation), Cindy Liang(University of California, Santa Cruz), Haoran Li(University of Michigan), Marcus J. Meade(University of Virginia), David A. Moraga Amador(University of Florida), Andrey D. Prjibelski(University of Helsinki), İnanç Birol(Canada's Michael Smith Genome Sciences Centre), Hamed Bostan(National Institute of Environmental Health Sciences), Ashley Brooks(University of California, Santa Cruz), Muhammed Hasan Çelik(Agency for Science, Technology and Research), Ying Chen(Agency for Science, Technology and Research), Mei R. M. Du(University of California, Santa Cruz), Colette Felton(Agency for Science, Technology and Research), Jonathan Göke(Agency for Science, Technology and Research), Saber Hafezqorani(Canada's Michael Smith Genome Sciences Centre), Ralf Herwig(Tokyo Metropolitan Institute of Medical Science), Hideya Kawaji(Agency for Science, Technology and Research), Joseph Lee(Agency for Science, Technology and Research), Jian‐Liang Li(National Institute of Environmental Health Sciences), Matthias Lienhard(Max Planck Institute for Molecular Genetics), Alla Mikheenko(University of California, Santa Cruz), Dennis Mulligan(University of California, Santa Cruz), Ka Ming Nip(Johns Hopkins University), Mihaela Pertea(Johns Hopkins University), Matthew E. Ritchie(Agency for Science, Technology and Research), Andre Sim(Agency for Science, Technology and Research), Alison D. Tang(Agency for Science, Technology and Research), Yuk Kei Wan(Agency for Science, Technology and Research), Changqing Wang(Johns Hopkins University), Brandon Wong(Johns Hopkins University), Chen Yang(European Bioinformatics Institute), If Barnes(European Bioinformatics Institute), Andrew Berry(European Bioinformatics Institute), Salvador Capella-Gutiérrez(University of California, Santa Cruz), Namrita Dhillon(University of California, Santa Cruz), Jose M. Fernandez-Gonzalez(Consejo Superior de Investigaciones Científicas), Luis Ferrández-Peral(Consejo Superior de Investigaciones Científicas), Natàlia García‐Reyero(U.S. Army Engineer Research and Development Center), Stefan Goetz(Barcelona Supercomputing Center), Carles Hernandéz-Ferrer(Barcelona Supercomputing Center), Liudmyla Kondratova(University of Florida), Tianyuan Liu(Consejo Superior de Investigaciones Científicas), Alessandra Martinez-Martin(Consejo Superior de Investigaciones Científicas), Carlos Menor(Consejo Superior de Investigaciones Científicas), Jorge Mestre‐Tomás(European Bioinformatics Institute), Jonathan M. Mudge(European Bioinformatics Institute), Nedka G. Panayotova(Consejo Superior de Investigaciones Científicas), Alejandro Paniagua(Consejo Superior de Investigaciones Científicas), Dmitry Repchevsky(University of Louisville), Eric C. Rouchka(University of Louisville), Brandon Saint-John(European Bioinformatics Institute), Enrique Sapena(European Bioinformatics Institute), Leon Sheynkman(University of Louisville), Melissa Smith(European Bioinformatics Institute), Marie‐Marthe Suner(European Bioinformatics Institute), Hazuki Takahashi(RIKEN Center for Integrative Medical Sciences), Ingrid Youngworth(Human Technopole), Piero Carninci(University of Florida), Nancy D. Denslow(Universitat Pompeu Fabra), Roderic Guigó(United States Geological Survey), Margaret E. Hunter(United States Geological Survey), Hagen Tilgner(California Institute of Technology), B Wold(California Institute of Technology), Christopher Vollmers(European Bioinformatics Institute), Adam Frankish(European Bioinformatics Institute), Kin Fai Au(University of Michigan), Gloria Sheynkman(University of California, Irvine), A Mortazavi(Consejo Superior de Investigaciones Científicas), Ana Conesa(Consejo Superior de Investigaciones Científicas), Angela N. Brooks(University of California, Santa Cruz)
bioRxiv (Cold Spring Harbor Laboratory)
July 27, 2023
Open Access

Abstract

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. Developers used these data to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

