Biogenesis, identification, and function of exonic circular RNAs
Iju Chen, C Chen, Trees‐Juen Chuang | Wiley Interdisciplinary Reviews - RNA | 2015
Circular RNAs (circRNAs) arise during post-transcriptional processes, in which a single-stranded RNA molecule forms a circle through covalent binding. Previously, circRNA products were often regarded as splicing intermediates, by-products, or products of aberrant splicing. Recently, however, rapid advances in high-throughput RNA sequencing (RNA-seq) for the global investigation of non-co-linear (NCL) RNAs, which comprise sequence segments that are topologically inconsistent with the reference genome, have led to renewed interest in this type of NCL RNA (i.e., circRNA), especially exonic circRNAs (ecircRNAs). Although the biogenesis and function of ecircRNAs are mostly unknown, some ecircRNAs are abundant, highly expressed, or evolutionarily conserved. Some ecircRNAs have been shown to affect microRNA regulation and probably play roles in regulating parental gene transcription, cell proliferation, and RNA-binding proteins, indicating their functional potential and their possible use as diagnostic tools. To date, thousands of ecircRNAs have been identified in multiple tissues/cell types from diverse species through analyses of RNA-seq data. However, the detection of ecircRNA candidates involves several major challenges, including discrimination between ecircRNAs and other types of NCL RNAs (e.g., trans-spliced RNAs and genetic rearrangements); removal of sequencing errors, alignment errors, and in vitro artifacts; and the reconciliation of heterogeneous results arising from the use of different bioinformatics methods or from sequencing data generated under different treatments. Such challenges may severely hamper our understanding of ecircRNAs. Herein, we review the biogenesis, identification, properties, and function of ecircRNAs, and discuss some unanswered questions regarding ecircRNAs.
We also evaluate the accuracy (in terms of sensitivity and precision) of some well-known circRNA-detecting methods.
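Sensitivity and precision are conventionally computed by comparing a tool's calls with a reference set of junctions: sensitivity = TP/(TP + FN) and precision = TP/(TP + FP). The Python sketch below illustrates this under stated assumptions: each backsplice junction is keyed by a hypothetical `Junction` tuple (chromosome, strand, donor and acceptor coordinates), and only an exact coordinate match counts as a true positive. It is not the evaluation procedure used in the review; real benchmarks may allow a tolerance window around splice sites or use other matching rules.

```python
from typing import Iterable, NamedTuple


class Junction(NamedTuple):
    """Hypothetical backsplice-junction key; matched by exact coordinates only."""
    chrom: str
    strand: str    # "+" or "-"
    donor: int     # 5' splice site (end of the downstream exon)
    acceptor: int  # 3' splice site (start of the upstream exon)


def sensitivity_precision(predicted: Iterable[Junction],
                          reference: Iterable[Junction]) -> tuple[float, float]:
    """Return (sensitivity, precision) from an exact-match set comparison."""
    pred = set(predicted)
    ref = set(reference)
    tp = len(pred & ref)  # called and present in the reference
    fp = len(pred - ref)  # called but absent from the reference
    fn = len(ref - pred)  # in the reference but missed
    sensitivity = tp / (tp + fn) if ref else 0.0
    precision = tp / (tp + fp) if pred else 0.0
    return sensitivity, precision


reference = [Junction("chr1", "+", 1500, 1000),
             Junction("chr1", "+", 3200, 2800),
             Junction("chr2", "-", 900, 1400),
             Junction("chrX", "+", 700, 200)]
predicted = [Junction("chr1", "+", 1500, 1000),
             Junction("chr2", "-", 900, 1400),
             Junction("chr3", "+", 5000, 4000)]  # spurious call
sens, prec = sensitivity_precision(predicted, reference)
print(f"sensitivity={sens:.3f} precision={prec:.3f}")  # sensitivity=0.500 precision=0.667
```

With this toy data, two of the four reference junctions are recovered and one of the three calls is spurious, which is why the two metrics can move independently when comparing tools.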
NCLscan: accurate identification of non-co-linear transcripts (fusion, trans-splicing and circular RNA) with a good balance between sensitivity and precision
Analysis of RNA-seq data often detects numerous 'non-co-linear' (NCL) transcripts, which comprise sequence segments that are topologically inconsistent with their corresponding DNA sequences in the reference genome. However, detection of NCL transcripts involves two major challenges: removal of false positives arising from alignment artifacts and discrimination between different types of NCL transcripts (trans-spliced, circular or fusion transcripts). Here, we developed a new NCL-transcript-detecting method ('NCLscan'), which utilized a stepwise alignment strategy to almost completely eliminate false calls (>98% precision) without sacrificing true positives, enabling NCLscan to outperform 18 other publicly available tools (including fusion- and circular-RNA-detecting tools) in terms of sensitivity and precision, regardless of the generation strategy of the simulated dataset, the type of intragenic or intergenic NCL event, read depth of coverage, read length or expression level of the NCL transcript. Given this high accuracy, NCLscan was applied to distinguish between trans-spliced, circular and fusion transcripts on the basis of poly(A)- and non-poly(A)-selected RNA-seq data. We showed that circular RNAs were expressed more ubiquitously, more abundantly and less cell type-specifically than trans-spliced and fusion transcripts. Our study thus describes a robust pipeline for the discovery of NCL transcripts and sheds light on the fundamental biology of these non-canonical RNA events in the human transcriptome.
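At its core, flagging an NCL transcript means recognizing when two segments of one read align to the genome in an order or orientation that a co-linear transcript cannot produce. The sketch below shows only that topological check; it is a simplified illustration, not NCLscan's stepwise alignment strategy. It assumes a hypothetical `Segment` type whose segments are already oriented in transcript order, and it deliberately does not try to tell circular, trans-spliced and fusion transcripts apart, because the same topology can arise from more than one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical aligned read segment (0-based, half-open genomic interval)."""
    chrom: str
    strand: str  # "+" or "-": transcript strand of the alignment
    start: int
    end: int


def classify_pair(first: Segment, second: Segment) -> str:
    """Label two consecutive segments, given in transcript order, by topology.

    Returns "co-linear", "backsplice-like" or "cross-locus". Topology alone
    cannot separate circular, trans-spliced and fusion transcripts.
    """
    if first.chrom != second.chrom or first.strand != second.strand:
        # Segments on different chromosomes or strands.
        return "cross-locus"
    if first.strand == "+":
        # Plus strand: transcript order follows increasing coordinates.
        in_order = first.end <= second.start
    else:
        # Minus strand: transcript order follows decreasing coordinates.
        in_order = second.end <= first.start
    # Out-of-order (or overlapping) segments on the same strand are consistent
    # with a circRNA, an intragenic tsRNA, or a genomic rearrangement.
    return "co-linear" if in_order else "backsplice-like"


upstream = Segment("chr1", "+", 1000, 1200)
downstream = Segment("chr1", "+", 3000, 3150)
print(classify_pair(upstream, downstream))  # co-linear
print(classify_pair(downstream, upstream))  # backsplice-like
print(classify_pair(upstream, Segment("chr5", "+", 10, 90)))  # cross-locus
```

Separating the resulting candidates into circular, trans-spliced and fusion transcripts requires additional evidence, such as the poly(A)- and non-poly(A)-selected data described above.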
Integrative transcriptome sequencing identifies trans-splicing events with important roles in human embryonic stem cell pluripotency
Trans-splicing is a post-transcriptional event that joins exons from separate pre-mRNAs. Detection of trans-splicing is usually severely hampered by experimental artifacts and genetic rearrangements. Here, we develop a new computational pipeline, TSscan, which integrates different types of high-throughput long-/short-read transcriptome sequencing of different human embryonic stem cell (hESC) lines to effectively minimize false positives while detecting trans-splicing. Combining TSscan screening with multiple experimental validation steps revealed that most chimeric RNA products were platform-dependent experimental artifacts of RNA sequencing. We successfully identified and confirmed four trans-spliced RNAs, including the first reported trans-spliced large intergenic noncoding RNA ("tsRMST"). We showed that these trans-spliced RNAs were all highly expressed in human pluripotent stem cells and differentially expressed during hESC differentiation. Our results further indicated that tsRMST can contribute to pluripotency maintenance of hESCs by suppressing lineage-specific gene expression through the recruitment of NANOG and the PRC2 complex factor, SUZ12. Taken together, our findings provide important insights into the role of trans-splicing in pluripotency maintenance of hESCs and help to facilitate future studies into trans-splicing, opening up this important but understudied class of post-transcriptional events for comprehensive characterization.
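One way to picture how integrating several sequencing platforms and cell lines suppresses false positives is a consensus filter: a chimeric junction is kept only if it is supported by enough distinct platforms and cell lines, since a platform-dependent artifact should not recur across platforms. The sketch below is a hypothetical illustration, not TSscan's actual criteria; the junction identifiers, the platform and cell-line labels, and both thresholds are assumptions.

```python
from collections import defaultdict
from typing import Iterable


def consensus_candidates(
    calls: Iterable[tuple[str, str, str]],
    min_platforms: int = 2,
    min_cell_lines: int = 2,
) -> list[str]:
    """Keep junctions supported by enough distinct platforms and cell lines.

    Each call is a hypothetical (junction_id, platform, cell_line) tuple.
    """
    platforms: dict[str, set[str]] = defaultdict(set)
    cell_lines: dict[str, set[str]] = defaultdict(set)
    for junction, platform, cell_line in calls:
        platforms[junction].add(platform)
        cell_lines[junction].add(cell_line)
    # A junction survives only if both support counts reach their thresholds.
    return sorted(
        j for j in platforms
        if len(platforms[j]) >= min_platforms
        and len(cell_lines[j]) >= min_cell_lines
    )


calls = [
    ("geneA:ex2-geneB:ex3", "short-read", "hESC-1"),
    ("geneA:ex2-geneB:ex3", "long-read", "hESC-2"),
    ("geneC:ex5-geneD:ex1", "short-read", "hESC-1"),  # one platform only
    ("geneC:ex5-geneD:ex1", "short-read", "hESC-2"),
]
print(consensus_candidates(calls))  # ['geneA:ex2-geneB:ex3']
```

Computational filtering of this kind only narrows the candidate list; as the abstract notes, the surviving trans-spliced RNAs were still confirmed through experimental validation.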
Large-scale benchmarking of circRNA detection tools reveals large differences in sensitivity but not in precision
Integrative transcriptome sequencing reveals extensive alternative trans-splicing and cis-backsplicing in human cells
Transcriptionally non-co-linear (NCL) transcripts can originate from trans-splicing (trans-spliced RNA; 'tsRNA') or cis-backsplicing (circular RNA; 'circRNA'). While numerous circRNAs have been detected in various species, tsRNAs remain largely uninvestigated. Here, we utilize integrative transcriptome sequencing of poly(A)- and non-poly(A)-selected RNA-seq data from diverse human cell lines to distinguish between tsRNAs and circRNAs. We identified 24,498 NCL events and found that a considerable proportion (20-35%) of them arise from both tsRNAs and circRNAs, representing extensive alternative trans-splicing and cis-backsplicing in human cells. We show that sequence generalities of exon circularization are also observed in tsRNAs. Recapitulation of NCL RNAs further shows that inverted Alu repeats can simultaneously promote the formation of tsRNAs and circRNAs. However, tsRNAs and circRNAs exhibit quite different, or even opposite, expression patterns in terms of correlation with the expression of their co-linear counterparts, expression breadth/abundance, transcript stability, and subcellular localization preference. These results indicate that tsRNAs and circRNAs may play different regulatory roles, and that analysis of NCL events should take into consideration the joint effects of different NCL-splicing types and of multiple NCL events. This study describes the first transcriptome-wide analysis of trans-splicing and cis-backsplicing, expanding our understanding of the complexity of the human transcriptome.
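Poly(A)- and non-poly(A)-selected libraries are informative because circRNAs lack a poly(A) tail and are therefore relatively depleted by poly(A) selection, whereas linear, polyadenylated transcripts are retained. A minimal, hypothetical way to quantify this for a single NCL junction is the log2 ratio of its normalized read support between the two library types. This is not the study's classification procedure: the reads-per-million normalization, the pseudocount and the function name are assumptions, and because one NCL event can arise from both a tsRNA and a circRNA, a single ratio cannot resolve the two contributions.

```python
import math


def polya_depletion_log2(nonpolya_reads: int, nonpolya_total: int,
                         polya_reads: int, polya_total: int,
                         pseudo: float = 0.5) -> float:
    """log2 ratio of junction support (reads per million) in non-poly(A)-
    versus poly(A)-selected libraries; positive values mean the junction is
    relatively enriched in the non-poly(A) library (hypothetical heuristic)."""
    if nonpolya_total <= 0 or polya_total <= 0:
        raise ValueError("library sizes must be positive")
    # The pseudocount avoids division by zero for junctions absent from a library.
    rpm_nonpolya = (nonpolya_reads + pseudo) / nonpolya_total * 1e6
    rpm_polya = (polya_reads + pseudo) / polya_total * 1e6
    return math.log2(rpm_nonpolya / rpm_polya)


# 40 junction reads in a 20M-read non-poly(A) library vs 2 in a 20M-read
# poly(A) library: a positive score (roughly 4), i.e. depleted by poly(A) selection.
print(polya_depletion_log2(40, 20_000_000, 2, 20_000_000))
```

A score near zero means similar support in both library types, which on its own is compatible with a polyadenylated tsRNA or with an event produced by both splicing types.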