Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Francisco J. Pardo-Palacios (Consejo Superior de Investigaciones Científicas), Dingjie Wang (University of Michigan), Fairlie Reese (University of California, Irvine), Mark Diekhans (University of California, Santa Cruz), Sílvia Carbonell Sala (Centre for Genomic Regulation), Brian A. Williams (California Institute of Technology), Jane Loveland (European Bioinformatics Institute), Maite De María (Cherokee Nation), Matthew S. Adams (University of California, Santa Cruz), Gabriela Balderrama-Gutierrez (University of California, Irvine), Amit K. Behera (University of California, Santa Cruz), José M. González (European Bioinformatics Institute), Toby Hunt (European Bioinformatics Institute), Julien Lagarde (SOM Biotech, Spain), Cindy Liang (University of California, Santa Cruz), Haoran Li (University of Michigan), Marcus J. Meade (University of Virginia), David A. Moraga Amador (University of Florida), Andrey D. Prjibelski (University of Helsinki), İnanç Birol (Canada's Michael Smith Genome Sciences Centre), Hamed Bostan (National Institute of Environmental Health Sciences), Ashley M. Brooks (University of California, Santa Cruz), Muhammed Hasan Çelik (University of California, Irvine), Ying Chen (Agency for Science, Technology and Research), Mei R. M. Du (Walter and Eliza Hall Institute of Medical Research), Colette Felton (University of California, Santa Cruz), Jonathan Göke (Agency for Science, Technology and Research), Saber Hafezqorani (Canada's Michael Smith Genome Sciences Centre), Ralf Herwig (Max Planck Institute for Molecular Genetics), Hideya Kawaji (Tokyo Metropolitan Institute of Medical Science), Joseph Lee (Agency for Science, Technology and Research), Jian-Liang Li (National Institute of Environmental Health Sciences), Matthias Lienhard (Max Planck Institute for Molecular Genetics), Alla Mikheenko (National Hospital for Neurology and Neurosurgery), Dennis Mulligan (University of California, Santa Cruz), Ka Ming Nip (Canada's Michael Smith Genome Sciences Centre), Mihaela Pertea (Johns Hopkins University), Matthew E. Ritchie (The University of Melbourne), Andre Sim (Agency for Science, Technology and Research), Alison D. Tang (University of California, Santa Cruz), Yuk Kei Wan (Agency for Science, Technology and Research), Changqing Wang (Walter and Eliza Hall Institute of Medical Research), Brandon Wong (Johns Hopkins University), Chen Yang (Canada's Michael Smith Genome Sciences Centre), If Barnes (European Bioinformatics Institute), Andrew Berry (European Bioinformatics Institute), Salvador Capella-Gutiérrez (Barcelona Supercomputing Center), Alyssa Cousineau (University of Massachusetts Chan Medical School), Namrita Dhillon (University of California, Santa Cruz), José M. Fernández (Barcelona Supercomputing Center), Luis Ferrández-Peral (Consejo Superior de Investigaciones Científicas), Natàlia Garcia-Reyero (Office of the Secretary of Defense), Stefan Götz, Carles Hernández-Ferrer (Barcelona Supercomputing Center), Liudmyla Kondratova (University of Florida), Tianyuan Liu (Cardiff University), Alessandra Martinez-Martin (Consejo Superior de Investigaciones Científicas), Carlos Menor, Jorge Mestre-Tomás (Consejo Superior de Investigaciones Científicas), Jonathan M. Mudge (European Bioinformatics Institute), Nedka G. Panayotova (University of Florida), Alejandro Paniagua (Consejo Superior de Investigaciones Científicas), Dmitry Repchevsky (Barcelona Supercomputing Center), Xingjie Ren (University of California, San Francisco), Eric C. Rouchka (University of Louisville), Brandon Saint-John (University of California, Santa Cruz), Enrique Sapena (European Bioinformatics Institute), Leon Sheynkman (University of Virginia), Melissa Smith (University of Louisville), Marie-Marthe Suner (European Bioinformatics Institute), Hazuki Takahashi (RIKEN Center for Integrative Medical Sciences), Ingrid Youngworth (Stanford University), Piero Carninci (Human Technopole), Nancy D. Denslow (University of Florida), Roderic Guigó (Universitat Pompeu Fabra), Margaret E. Hunter (United States Geological Survey), René Maehr (University of Massachusetts Chan Medical School), Yin Shen (University of California, San Francisco), Hagen Tilgner (Cornell University), B Wold (California Institute of Technology), Christopher Vollmers (University of California, Santa Cruz), Adam Frankish (European Bioinformatics Institute), Kin Fai Au (University of Michigan), Gloria Sheynkman (University of Virginia Health System), A Mortazavi (University of California, Irvine), Ana Conesa (University of Florida), Angela N. Brooks (University of California, Santa Cruz)
Nature Methods
June 7, 2024

Abstract

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Tool developers used these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than libraries with greater read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, reference-based tools performed best. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or when using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

