Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

Jun‐ichi Takeda(Japan Biological Informatics Consortium), Yutaka Suzuki(The University of Tokyo), Mitsuteru Nakao(Kazusa DNA Research Institute), Roberto A. Barrero(National Institute of Genetics), Kanako O. Koyanagi(Hokkaido University), Lihua Jin(National Institute of Genetics), Chie Motono(National Institute of Advanced Industrial Science and Technology), Hiroko Hata(The University of Tokyo), Takao Isogai(Kazusa DNA Research Institute), Keiichi Nagai(Kazusa DNA Research Institute), Tetsuji Otsuki(Kazusa DNA Research Institute), Vladimir Kuryshev(German Cancer Research Center), Masafumi Shionyu(Nagahama Institute of Bio-Science and Technology), Kei Yura(Japan Atomic Energy Agency), Mitiko Gō(Otsuka (Japan)), Jean Thierry‐Mieg(National Institutes of Health), Danielle Thierry‐Mieg(National Institutes of Health), Stefan Wiemann(German Cancer Research Center), Nobuo Nomura(National Institute of Advanced Industrial Science and Technology), Sumio Sugano(The University of Tokyo), Takashi Gojobori(National Institute of Genetics), Tadashi Imanishi(Hokkaido University)
Nucleic Acids Research
August 12, 2006
Cited by 49
Open Access

Abstract

We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of full-length cDNAs. Applying both manual and computational analyses to 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternatively spliced genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, to be nested or to have overlapping protein coding sequences (CDSs) in different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast-expanding universe of alternative splicing variants.

