SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditionsDespite the many approaches to study differential splicing from RNA-seq, many challenges remain unsolved, including computing capacity and sequencing depth requirements. Here we present SUPPA2, a new method that addresses these challenges, and enables streamlined analysis across multiple conditions taking into account biological variability. Using experimental and simulated data, we show that SUPPA2 achieves higher accuracy compared to other methods, especially at low sequencing depth and short read length. We use SUPPA2 to identify novel Transformer2-regulated exons, novel microexons induced during differentiation of bipolar neurons, and novel intron retention events during erythroblast differentiation.
A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicingAlternative splicing generates multiple transcript and protein isoforms from the same gene and thus is important in gene expression regulation. To date, RNA-sequencing (RNA-seq) is the standard method for quantifying changes in alternative splicing on a genome-wide scale. Understanding the current limitations of RNA-seq is crucial for reliable analysis and the lack of high quality, comprehensive transcriptomes for most species, including model organisms such as Arabidopsis, is a major constraint in accurate quantification of transcript isoforms. To address this, we designed a novel pipeline with stringent filters and assembled a comprehensive Reference Transcript Dataset for Arabidopsis (AtRTD2) containing 82,190 non-redundant transcripts from 34 212 genes. Extensive experimental validation showed that AtRTD2 and its modified version, AtRTD2-QUASI, for use in Quantification of Alternatively Spliced Isoforms, outperform other available transcriptomes in RNA-seq analysis. This strategy can be implemented in other species to build a pipeline for transcript-level expression and alternative splicing analyses.
Rapid and Dynamic Alternative Splicing Impacts the Arabidopsis Cold Response TranscriptomeSuch factors likely drive cascades of AS of downstream genes that, alongside transcription, modulate transcriptome reprogramming that together govern the physiological and survival responses of plants to low temperature.
A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysisBACKGROUND: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. RESULTS: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts-twice that of the best current Arabidopsis transcriptome and including over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. CONCLUSIONS: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
<scp>BaRTv2</scp> : a highly resolved barley reference transcriptome for accurate transcript‐specific <scp>RNA</scp> ‐seq quantificationAccurate characterisation of splice junctions (SJs) as well as transcription start and end sites in reference transcriptomes allows precise quantification of transcripts from RNA-seq data, and enables detailed investigations of transcriptional and post-transcriptional regulation. Using novel computational methods and a combination of PacBio Iso-seq and Illumina short-read sequences from 20 diverse tissues and conditions, we generated a comprehensive and highly resolved barley reference transcript dataset from the European 2-row spring barley cultivar Barke (BaRTv2.18). Stringent and thorough filtering was carried out to maintain the quality and accuracy of the SJs and transcript start and end sites. BaRTv2.18 shows increased transcript diversity and completeness compared with an earlier version, BaRTv1.0. The accuracy of transcript level quantification, SJs and transcript start and end sites have been validated extensively using parallel technologies and analysis, including high-resolution reverse transcriptase-polymerase chain reaction and 5'-RACE. BaRTv2.18 contains 39 434 genes and 148 260 transcripts, representing the most comprehensive and resolved reference transcriptome in barley to date. It provides an important and high-quality resource for advanced transcriptomic analyses, including both transcriptional and post-transcriptional regulation, with exceptional resolution and precision.