iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data

Aziz M. Mezlini; Eric Smith; Marc Fiume; Orion J. Buske; Gleb L. Savich; Sohrab P. Shah; Samuel Aparício; Derek Y. Chiang; Anna Goldenberg; Michael Brudno

doi:10.1101/gr.142232.112

Affiliations: University of Toronto (Aziz M. Mezlini, Eric Smith, Marc Fiume, Orion J. Buske, Anna Goldenberg, Michael Brudno); University of North Carolina at Chapel Hill (Gleb L. Savich, Derek Y. Chiang); BC Cancer Agency (Sohrab P. Shah, Samuel Aparício)

Genome Research

November 29, 2012

Abstract

High-throughput RNA sequencing (RNA-seq) promises to revolutionize our understanding of genes and their role in human disease by characterizing the RNA content of tissues and cells. Realizing this promise, however, depends on effective computational methods for identifying and quantifying transcripts from incomplete and noisy data. In this article, we introduce iReckon, a method for simultaneously determining isoforms and estimating their abundances. Our probabilistic approach incorporates multiple biological and technical phenomena, including novel isoforms, intron retention, unspliced pre-mRNA, PCR amplification biases, and multimapped reads. iReckon uses regularized expectation-maximization to accurately estimate the abundances of known and novel isoforms. Our results on simulated and real data demonstrate a superior ability to discover novel isoforms with significantly fewer false-positive predictions, and abundance estimates that are more accurate than those of other state-of-the-art tools. Furthermore, we applied iReckon to two cancer transcriptome data sets, a triple-negative breast cancer patient sample and the MCF7 breast cancer cell line, and show that iReckon reconstructs complex splicing changes that had not been previously identified. QT-PCR validations of the isoforms detected in the MCF7 cell line confirmed all of iReckon's predictions and also showed strong agreement (r² = 0.94) with the predicted abundances.
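The abstract names regularized expectation-maximization (EM) as the engine for abundance estimation, with multimapped reads shared among candidate isoforms. The sketch below illustrates only the general idea: it is a textbook read-assignment EM plus a hypothetical sparsity step (a clipped M-step in the spirit of a Dirichlet prior with concentration `alpha` < 1). It is not iReckon's actual likelihood or penalty, which also account for intron retention, pre-mRNA, and PCR bias. The function name `regularized_em`, the 0/1 read-to-isoform compatibility matrix, and the effective `lengths` are illustrative assumptions.

```python
def regularized_em(compat, lengths, alpha=1.0, max_iter=1000, tol=1e-10):
    """Sketch of EM for isoform read fractions with an optional sparsity step.

    compat  -- list of rows, one per read; compat[r][k] is 1 if read r is
               consistent with isoform k, else 0 (a multimapped or shared
               read has several 1s).
    lengths -- effective length of each isoform (same units for all).
    alpha   -- hypothetical sparsity knob: alpha = 1 gives plain maximum-
               likelihood EM; alpha < 1 subtracts (1 - alpha) from each
               expected count and clips at zero, pruning weak isoforms.
    Returns theta, the estimated fraction of reads drawn from each isoform.
    """
    n_iso = len(lengths)
    theta = [1.0 / n_iso] * n_iso  # start from a uniform mixture
    for _ in range(max_iter):
        # E-step: split each read among its compatible isoforms in
        # proportion to theta_k / length_k (uniform start positions).
        counts = [0.0] * n_iso
        for row in compat:
            w = [c * t / l for c, t, l in zip(row, theta, lengths)]
            total = sum(w)
            if total == 0.0:
                continue  # every isoform this read supports was pruned
            for k in range(n_iso):
                counts[k] += w[k] / total
        # M-step: clipped, Dirichlet-style update, then renormalize.
        new = [max(n + alpha - 1.0, 0.0) for n in counts]
        norm = sum(new)
        if norm == 0.0:
            raise ValueError("every isoform was pruned; try a larger alpha")
        new = [x / norm for x in new]
        delta = max(abs(a - b) for a, b in zip(new, theta))
        theta = new
        if delta < tol:
            break
    return theta


# Toy example: 6 reads unique to A, 2 unique to B, 4 shared by A and B.
reads = [[1, 0]] * 6 + [[0, 1]] * 2 + [[1, 1]] * 4
print(regularized_em(reads, [1000.0, 1000.0]))  # roughly [0.75, 0.25]
```

With the default `alpha=1.0` the update is ordinary maximum-likelihood EM, and the toy reads settle near read fractions of 0.75 and 0.25. Setting `alpha` below 1 (for example 0.5) sends an isoform supported only by shared reads to exactly zero instead of letting it absorb a share of them, which is one simple way to keep spurious candidates from inflating the isoform set. To report per-transcript fractions rather than read fractions, divide each theta value by its isoform length and renormalize.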