A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing
Alternative splicing generates multiple transcript and protein isoforms from the same gene and thus is important in gene expression regulation. To date, RNA-sequencing (RNA-seq) is the standard method for quantifying changes in alternative splicing on a genome-wide scale. Understanding the current limitations of RNA-seq is crucial for reliable analysis, and the lack of high-quality, comprehensive transcriptomes for most species, including model organisms such as Arabidopsis, is a major constraint in accurate quantification of transcript isoforms. To address this, we designed a novel pipeline with stringent filters and assembled a comprehensive Reference Transcript Dataset for Arabidopsis (AtRTD2) containing 82,190 non-redundant transcripts from 34,212 genes. Extensive experimental validation showed that AtRTD2 and its modified version, AtRTD2-QUASI, for use in Quantification of Alternatively Spliced Isoforms, outperform other available transcriptomes in RNA-seq analysis. This strategy can be implemented in other species to build a pipeline for transcript-level expression and alternative splicing analyses.
Rapid and Dynamic Alternative Splicing Impacts the Arabidopsis Cold Response Transcriptome
Such factors likely drive cascades of AS of downstream genes that, alongside transcription, modulate transcriptome reprogramming that together govern the physiological and survival responses of plants to low temperature.
Physiological, biochemical and molecular responses of the potato (Solanum tuberosum L.) plant to moderately elevated temperature
Although significant work has been undertaken regarding the response of model and crop plants to heat shock during the acclimatory phase, few studies have examined the steady-state response to the mild heat stress encountered in temperate agriculture. In the present work, we therefore exposed tuberizing potato plants to mildly elevated temperatures (30/20 °C, day/night) for up to 5 weeks and compared tuber yield, physiological and biochemical responses, and leaf and tuber metabolomes and transcriptomes with plants grown under optimal conditions (22/16 °C). Growth at elevated temperature reduced tuber yield despite an increase in net foliar photosynthesis. This was associated with major shifts in leaf and tuber metabolite profiles, a significant decrease in leaf glutathione redox state and decreased starch synthesis in tubers. Furthermore, growth at elevated temperature had a profound impact on leaf and tuber transcript expression with large numbers of transcripts displaying a rhythmic oscillation at the higher growth temperature. RT-PCR revealed perturbation in the expression of circadian clock transcripts including StSP6A, previously identified as a tuberization signal. Our data indicate that potato plants grown at moderately elevated temperatures do not exhibit classic symptoms of abiotic stress but that tuber development responds via a diversity of biochemical and molecular signals.
Multicategory Classification Using An Extreme Learning Machine for Microarray Gene Expression Cancer Diagnosis
Runxuan Zhang, Guang-Bin Huang, N. Sundararajan et al. | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2007
In this paper, the recently developed Extreme Learning Machine (ELM) is used for direct multicategory classification problems in the cancer diagnosis area. ELM avoids problems like local minima, improper learning rate and overfitting commonly faced by iterative learning methods and completes the training very fast. We have evaluated the multicategory classification performance of ELM on three benchmark microarray datasets for cancer diagnosis, namely, the GCM dataset, the Lung dataset and the Lymphoma dataset. The results indicate that ELM produces comparable or better classification accuracies with reduced training time and implementation complexity compared to artificial neural network methods like conventional back-propagation ANN and Linder's SANN, and Support Vector Machine methods like SVM-OVO and Ramaswamy's SVM-OVA. ELM also achieves better accuracies for classification of individual categories.
Illuminating the dark side of the human transcriptome with long read transcript sequencing
BACKGROUND: The human transcriptome annotation is regarded as one of the most complete of any eukaryotic species. However, limitations in sequencing technologies have biased the annotation toward multi-exonic protein coding genes. Accurate high-throughput long read transcript sequencing can now provide additional evidence for rare transcripts and genes such as mono-exonic and non-coding genes that were previously either undetectable or impossible to differentiate from sequencing noise. RESULTS: We developed the Transcriptome Annotation by Modular Algorithms (TAMA) software to leverage the power of long read transcript sequencing and address the issues with current data processing pipelines. TAMA achieved high sensitivity and precision for gene and transcript model predictions in both reference guided and unguided approaches in our benchmark tests using simulated Pacific Biosciences (PacBio) and Nanopore sequencing data and real PacBio datasets. By analyzing PacBio Sequel II Iso-Seq sequencing data of the Universal Human Reference RNA (UHRR) using TAMA and other commonly used tools, we found that the convention of using alignment identity to measure error correction performance does not reflect actual gain in accuracy of predicted transcript models. In addition, inter-read error correction can cause major changes to read mapping, resulting in potentially over 6,000 erroneous gene model predictions in the Iso-Seq based human genome annotation. Using TAMA's genome assembly based error correction and gene feature evidence, we predicted 2566 putative novel non-coding genes and 1557 putative novel protein coding gene models. CONCLUSIONS: Long read transcript sequencing data has the power to identify novel genes within the highly annotated human genome.
The use of parameter tuning and extensive output information of the TAMA software package allows for in-depth exploration of eukaryotic transcriptomes. We have found long read data based evidence for thousands of unannotated genes within the human genome. More development in sequencing library preparation and data processing is required for differentiating sequencing noise from real genes in long read RNA sequencing data.