Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts

Liang Sun; Haitao Luo; Dechao Bu; Guoguang Zhao; Kuntao Yu; Changhai Zhang; Yuanning Liu; Runsheng Chen; Yi Zhao

doi:10.1093/nar/gkt646


Liang Sun (Chinese Academy of Sciences), Haitao Luo (Chinese Academy of Sciences), Dechao Bu (Chinese Academy of Sciences), Guoguang Zhao (Chinese Academy of Sciences), Kuntao Yu (Chinese Academy of Sciences), Changhai Zhang (Chinese Academy of Sciences), Yuanning Liu (Chinese Academy of Sciences), Runsheng Chen (Chinese Academy of Sciences), Yi Zhao (Chinese Academy of Sciences)

Nucleic Acids Research

July 27, 2013


Cited by 2,333 | Open Access


Abstract

Classifying transcripts as protein-coding or non-coding is a challenge, especially for transcripts reconstructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), which profiles adjoining nucleotide triplets to effectively distinguish protein-coding from non-coding sequences independently of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. The implementation of CNCI offered highly accurate cross-species classification of transcripts assembled from whole-transcriptome sequencing data, which demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci.
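
To make the idea of profiling adjoining nucleotide triplets more concrete, the following Python sketch counts pairs of adjacent, non-overlapping triplets in a chosen reading frame of a sequence. It is an illustration only and is not the CNCI implementation: the function name `adjoining_triplet_counts`, the `frame` parameter, and the handling of ambiguous bases are all assumptions made for this example, and the abstract does not describe how CNCI turns such profiles into a classification.

```python
from collections import Counter


def adjoining_triplet_counts(seq: str, frame: int = 0) -> Counter:
    """Count pairs of adjacent, non-overlapping triplets in one reading frame.

    Hypothetical helper for illustration; not part of the CNCI software.
    """
    # Normalise case and treat RNA input (U) like DNA (T).
    seq = seq.upper().replace("U", "T")

    # Split the sequence into complete triplets starting at the given frame.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

    # Assumption: pairs containing anything other than A, C, G, T are skipped.
    valid = set("ACGT")
    pairs = Counter()
    for first, second in zip(triplets, triplets[1:]):
        if set(first + second) <= valid:
            pairs[(first, second)] += 1
    return pairs


if __name__ == "__main__":
    # Profile all three forward reading frames of a toy sequence.
    toy = "ATGGCCATGGCC"
    for frame in range(3):
        print(frame, dict(adjoining_triplet_counts(toy, frame)))
```

Counts like these could be normalised into frequencies and compared between known coding and non-coding sequences; how CNCI actually scores and classifies transcripts is described in the full text, not in the abstract.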
