University of York
ORCID: 0000-0003-3539-804X
Publishes on Genomics and Phylogenetic Studies, RNA modifications and cancer, Gut microbiota and health. 30 papers and 20.3k citations.
The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood-based approach. We find improvements in expression estimates, as measured by correlation with independently performed qRT-PCR, and show that bias correction leads to improved replicability of results across libraries and sequencing technologies.
SUMMARY: We describe a new 'reference annotation-based transcript assembly' problem for RNA-Seq data that involves assembling novel transcripts in the context of an existing annotation. This problem arises in the analysis of expression in model organisms, where it is desirable to leverage existing annotations for discovering novel transcripts. We present an algorithm for reference annotation-based transcript assembly and show how it can be used to rapidly investigate novel transcripts revealed by RNA-Seq in comparison with a reference annotation. AVAILABILITY: The methods described in this article are implemented in the Cufflinks suite of software for RNA-Seq, freely available from http://bio.math.berkeley.edu/cufflinks. The software is released under the BOOST license. CONTACT: cole@broadinstitute.org; lpachter@math.berkeley.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.