A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis

Runxuan Zhang (James Hutton Institute), Richard Kuo (Roslin Institute), Max Coulter (James Hutton Institute), Cristiane P. G. Calixto (James Hutton Institute), Juan Carlos Entizne (James Hutton Institute), Wenbin Guo (James Hutton Institute), Yamile Márquez (Centre for Genomic Regulation), Linda Milne (James Hutton Institute), Stefan Riegler (Institute of Science and Technology Austria), Akihiro Matsui (RIKEN Center for Sustainable Resource Science), Maho Tanaka (RIKEN Center for Sustainable Resource Science), Sarah Harvey (University of York), Yubang Gao (Fujian Agriculture and Forestry University), Theresa Wießner-Kroh (University of Tübingen), Alejandro Paniagua (Consejo Superior de Investigaciones Científicas), Martín Crespi (Centre National de la Recherche Scientifique), Katherine Denby (University of York), Asa Ben‐Hur (Colorado State University), Enamul Huq (The University of Texas at Austin), Michael F. Jantsch (Medical University of Vienna), Artur Jarmołowski (Adam Mickiewicz University in Poznań), Tino Koester (Bielefeld University), Sascha Laubinger (Carl von Ossietzky Universität Oldenburg), Qingshun Quinn Li (Xiamen University), Lianfeng Gu (Fujian Agriculture and Forestry University), Motoaki Seki (RIKEN Center for Sustainable Resource Science), Dorothee Staiger (Bielefeld University), Ramanjulu Sunkar (Oklahoma State University), Zofia Szweykowska-Kulińska (Adam Mickiewicz University in Poznań), Shih‐Long Tu (Institute of Plant and Microbial Biology, Academia Sinica), Andreas Wachter (Johannes Gutenberg University Mainz), Robbie Waugh (James Hutton Institute), Liming Xiong (Hong Kong Baptist University), Xiao‐Ning Zhang (St. Bonaventure University), Ana Conesa (Consejo Superior de Investigaciones Científicas), Anireddy S. N. Reddy (Colorado State University), Andrea Barta (Max Perutz Labs), Maria Kalyna (BOKU University), John W. Brown (James Hutton Institute)
Genome Biology
July 7, 2022
Cited by 117. Open Access.

Abstract

BACKGROUND: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and for differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies capture transcript structures more completely, including alternative splicing, transcription start sites, and polyadenylation sites. However, their accuracy is significantly affected by sequencing errors, mRNA degradation, and incomplete cDNA synthesis.

RESULTS: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq, with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature for distinguishing correct splice junctions from false ones, which are removed. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts that result from degradation. AtRTD3 is a major improvement over existing transcriptomes, as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage.

CONCLUSIONS: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
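To make the idea of a mismatch profile around a splice junction concrete, the following is a minimal sketch and not the authors' implementation. It assumes each long-read alignment has already been reduced to a list of splice junctions and a list of genomic positions where the read disagrees with the genome. The `AlignedRead` structure, the 10-nt `window`, and the `max_mismatches` threshold are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# A junction is (last exonic base before the intron, first exonic base after it).
Junction = Tuple[int, int]


@dataclass
class AlignedRead:
    """Toy long-read alignment (hypothetical structure, not a real library type)."""
    name: str
    junctions: List[Junction]
    mismatches: List[int]  # genomic positions where the read differs from the genome


def junction_mismatch_count(read: AlignedRead, junction: Junction, window: int = 10) -> int:
    """Count the read's mismatches in the exonic bases flanking one junction."""
    donor, acceptor = junction
    upstream = range(donor - window + 1, donor + 1)   # last `window` bases of the upstream exon
    downstream = range(acceptor, acceptor + window)   # first `window` bases of the downstream exon
    return sum(1 for pos in read.mismatches if pos in upstream or pos in downstream)


def supported_junctions(reads: List[AlignedRead], window: int = 10,
                        max_mismatches: int = 0) -> Set[Junction]:
    """Keep junctions seen in at least one read with a clean flanking region.

    Assumption: a junction is trusted if any single read supports it with at
    most `max_mismatches` mismatches inside the flanking windows.
    """
    kept: Set[Junction] = set()
    for read in reads:
        for junction in read.junctions:
            if junction_mismatch_count(read, junction, window) <= max_mismatches:
                kept.add(junction)
    return kept


reads = [
    AlignedRead("r1", [(100, 201)], [95]),   # mismatch 5 nt upstream of the junction
    AlignedRead("r2", [(100, 201)], [50]),   # mismatch far from the junction
    AlignedRead("r3", [(300, 401)], [402]),  # mismatch 1 nt downstream of the junction
]
kept = supported_junctions(reads)
```

In this toy input, junction `(100, 201)` is retained because read `r2` supports it with no nearby mismatches, while `(300, 401)` is discarded because its only supporting read has an error right next to the junction. In practice the mismatch positions would be derived from the aligner's output rather than supplied by hand.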

