Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Collection, Mapping, and Annotation of Over 28,000 cDNA Clones from <i>japonica</i> Rice
We collected and completely sequenced 28,469 full-length complementary DNA clones from Oryza sativa L. ssp. japonica cv. Nipponbare. Through homology searches of publicly available sequence data, we assigned tentative protein functions to 21,596 clones (75.86%). Mapping of the cDNA clones to genomic DNA revealed that there are 19,000 to 20,500 transcription units in the rice genome. Protein informatics analysis against the InterPro database revealed the existence of proteins present in rice but not in Arabidopsis. Sixty-four percent of our cDNAs are homologous to Arabidopsis proteins.
Functional annotation of a full-length mouse cDNA collection
Functional Annotation of a Full-Length <i>Arabidopsis</i> cDNA Collection
Full-length complementary DNAs (cDNAs) are essential for the correct annotation of genomic sequences and for the functional analysis of genes and their products. We isolated 155,144 RIKEN Arabidopsis full-length (RAFL) cDNA clones. The 3'-end expressed sequence tags (ESTs) of 155,144 RAFL cDNAs were clustered into 14,668 nonredundant cDNA groups, about 60% of predicted genes. We also obtained 5' ESTs from 14,034 nonredundant cDNA groups and constructed a promoter database. The sequence database of the RAFL cDNAs is useful for promoter analysis and correct annotation of predicted transcription units and gene products. Furthermore, the full-length cDNAs are useful resources for analyses of the expression profiles, functions, and structures of plant proteins.
High-Efficiency Full-Length cDNA Cloning by Biotinylated CAP Trapper