Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

Tadashi Imanishi(National Institute of Advanced Industrial Science and Technology), Takeshi Itoh(Institute of Agrobiological Sciences), Yutaka Suzuki(National Institute of Genetics), Claire O’Donovan(European Bioinformatics Institute), Satoshi Fukuchi(National Institute of Genetics), Kanako O. Koyanagi(Nara Institute of Science and Technology), Roberto A. Barrero(National Institute of Genetics), Takuro Tamura(Japan Biological Informatics Consortium), Yumi Yamaguchi‐Kabata(National Institute of Advanced Industrial Science and Technology), Motohiko Tanino(Japan Biological Informatics Consortium), Kei Yura, Satoru Miyazaki(National Institute of Genetics), Kazuho Ikeo(National Institute of Genetics), Keiichi Homma(National Institute of Genetics), Arek Kasprzyk(European Bioinformatics Institute), Tetsuo Nishikawa(Hitachi (Japan)), Mika Hirakawa(Kyoto University), Jean Thierry‐Mieg(Centre National de la Recherche Scientifique), Danielle Thierry‐Mieg(Centre National de la Recherche Scientifique), Jennifer Ashurst(Wellcome Sanger Institute), Libin Jia(National Institutes of Health), Mitsuteru Nakao(The University of Tokyo), Michael A. Thomas(Idaho State University), Nicola Mulder(European Bioinformatics Institute), Youla Karavidopoulou(European Bioinformatics Institute), Lihua Jin(National Institute of Genetics), Sangsoo Kim(Korea Research Institute of Bioscience and Biotechnology), Tomohiro Yasuda(Hitachi (Japan)), Boris Lenhard(Karolinska Institutet), Éric Eveno(Centre National de la Recherche Scientifique), Yoshiyuki Suzuki(National Institute of Genetics), Chisato Yamasaki(National Institute of Advanced Industrial Science and Technology), Jun‐ichi Takeda(National Institute of Advanced Industrial Science and Technology), Craig A. Gough(Japan Biological Informatics Consortium),
Phillip B. Hilton(Japan Biological Informatics Consortium), Yasuyuki Fujii(Japan Biological Informatics Consortium), Hiroaki Sakai(Kyowa Kirin (Japan)), Susumu Tanaka(Japan Biological Informatics Consortium), Clara Amid(Institute of Bioinformatics and Systems Biology), M. Bellgard(Murdoch University), Maria de Fátima Bonaldo(University of Iowa), Hidemasa Bono(RIKEN Center for Integrative Medical Sciences), Susan K. Bromberg(Medical College of Wisconsin), Anthony J. Brookes(Karolinska Institutet), Elspeth A. Bruford(University College London), Piero Carninci(RIKEN), Claude Chelala(Centre National de la Recherche Scientifique), C Couillault(Centre National de la Recherche Scientifique), Sandro J. de Souza(Instituto Paulo Gontijo), Marie-Anne Debily(Centre National de la Recherche Scientifique), Marie‐Dominique Devignes(Centre National de la Recherche Scientifique), Inna Dubchak(Lawrence Berkeley National Laboratory), Toshinori Endo(Tokyo Medical and Dental University), Anne Estreicher(SIB Swiss Institute of Bioinformatics), Eduardo Eyras(Wellcome Sanger Institute), Kaoru Fukami-Kobayashi(RIKEN BioResource Research Center), Gopal Gopinath(Cold Spring Harbor Laboratory), Esther Graudens(Centre National de la Recherche Scientifique), Yoonsoo Hahn(Korea Research Institute of Bioscience and Biotechnology), Michael Han(Institute of Bioinformatics and Systems Biology), Ze‐Guang Han(Chinese National Human Genome Center at Shanghai), Kousuke Hanada(National Institute of Genetics), Hideki Hanaoka(National Institute of Advanced Industrial Science and Technology), Erimi Harada(Japan Biological Informatics Consortium), Katsuyuki Hashimoto(National Institute of Infectious Diseases), Ursula Hinz(SIB Swiss Institute of Bioinformatics), Momoki Hirai(The University of Tokyo), Teruyoshi Hishiki(National Institute of Advanced Industrial Science and Technology), Ian Hopkinson(The Royal Free Hospital), Sandrine Imbeaud(Centre National de la Recherche Scientifique), Hidetoshi Inoko(Tokai University),
Alexander Kanapin(European Bioinformatics Institute), Yayoi Kaneko(Japan Biological Informatics Consortium), Takeya Kasukawa(RIKEN Center for Integrative Medical Sciences), Janet Kelso(University of the Western Cape), Paul Kersey(European Bioinformatics Institute), Reiko Kikuno(Kazusa DNA Research Institute), Kouichi Kimura(Hitachi (Japan)), Bernhard Korn, Vladimir Kuryshev(German Cancer Research Center), Izabela Makałowska(Pennsylvania State University), Takashi Makino(National Institute of Genetics), Shuhei Mano(Tokai University), Régine Mariage‐Samson(Centre National de la Recherche Scientifique), Jun Mashima(National Institute of Genetics), Hideo Matsuda(The University of Osaka), Hans‐Werner Mewes(Institute of Bioinformatics and Systems Biology), Shinsei Minoshima(Keio University), Keiichi Nagai(Hitachi (Japan)), Hideki Nagasaki(National Institute of Advanced Industrial Science and Technology), Naoki Nagata(National Institute of Advanced Industrial Science and Technology), Rajni Nigam(Medical College of Wisconsin), Osamu Ogasawara(The University of Tokyo), Osamu Ohara(Kazusa DNA Research Institute), Masafumi Ohtsubo(Keio University), Norihiro Okada(Tokyo Institute of Technology), Toshihisa Okido(National Institute of Genetics), Satoshi Oota(RIKEN BioResource Research Center), Motonori Ota(Tokyo Institute of Technology), Toshio Ota(Kyowa Kirin (Japan)), Tetsuji Otsuki(Taisho Pharmaceutical (Japan)), Dominique Piatier‐Tonneau(Centre National de la Recherche Scientifique), Annemarie Poustka(German Cancer Research Center), Shuangxi Ren(Chinese National Human Genome Center at Shanghai), Naruya Saitou(National Institute of Genetics), Katsunaga Sakai(National Institute of Genetics), Shigetaka Sakamoto(National Institute of Genetics), Ryuichi Sakate(The University of Tokyo), Ingo Schupp(German Cancer Research Center), Florence Servant(European Bioinformatics Institute),
Stephen T. Sherry(National Institutes of Health), Rie Shiba(Japan Biological Informatics Consortium), Nobuyoshi Shimizu(Keio University), Mary Shimoyama(Medical College of Wisconsin), Andrew J.G. Simpson(Instituto Paulo Gontijo), Bento Soares(University of Iowa), Charles A. Steward(Wellcome Sanger Institute), Makiko Suwa(National Institute of Advanced Industrial Science and Technology), Mami Suzuki(National Institute of Genetics), Aiko Takahashi(Japan Biological Informatics Consortium), Gen Tamiya(Tokai University), Hiroshi Tanaka(Tokyo Medical and Dental University), Todd D. Taylor(RIKEN Center for Integrative Medical Sciences), Joseph D. Terwilliger(New York Genome Center), Per Unneberg(KTH Royal Institute of Technology), Vamsi Veeramachaneni(Pennsylvania State University), Shinya Watanabe(The University of Tokyo), Laurens Wilming(Wellcome Sanger Institute), Norikazu Yasuda(Japan Biological Informatics Consortium), Hyang‐Sook Yoo(Korea Research Institute of Bioscience and Biotechnology), Marvin Stodolsky(United States Department of Energy), Wojciech Makałowski(Pennsylvania State University), Mitiko Gō(Nagahama Institute of Bio-Science and Technology), Kenta Nakai(The University of Tokyo), Toshihisa Takagi(The University of Tokyo), Minoru Kanehisa(Kyoto University), Yoshiyuki Sakaki(RIKEN Center for Integrative Medical Sciences), John Quackenbush, Yasushi Okazaki(RIKEN Center for Integrative Medical Sciences), Yoshihide Hayashizaki(RIKEN Center for Integrative Medical Sciences), Winston Hide(University of the Western Cape), Ranajit Chakraborty(University of Cincinnati), Ken Nishikawa(National Institute of Genetics), Hideaki Sugawara(National Institute of Genetics), Yoshio Tateno(National Institute of Genetics), Zhu Chen(Chinese National Human Genome Center at Shanghai), Michio Oishi(Kazusa DNA Research Institute),
Peter J. Tonellato, Rolf Apweiler(European Bioinformatics Institute), Kousaku Okubo(National Institute of Genetics), Lukas Wagner(National Institutes of Health), Stefan Wiemann(German Cancer Research Center), Robert L. Strausberg(National Institutes of Health), Takao Isogai(University of Tsukuba), Charles Auffray(Centre National de la Recherche Scientifique), Nobuo Nomura(National Institute of Advanced Industrial Science and Technology), Takashi Gojobori(National Institute of Genetics), Sumio Sugano(National Institute of Advanced Industrial Science and Technology)
PLoS Biology
April 19, 2004
Cited by 334
Open Access

Abstract

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
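The counts quoted in the abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the paper: the helper name `pct` is hypothetical, and only the 6.5% figure is stated in the abstract; the other ratios are derived here from the quoted counts for illustration.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts taken from the abstract.
GENE_CANDIDATES = 21_037      # validated human gene candidates
NEW_CANDIDATES = 5_155        # newly identified gene candidates
NO_GOOD_ORF = 1_377           # loci without a good protein-coding ORF
NCRNA_CANDIDATES = 296        # strong non-protein-coding RNA candidates
MAPPED_VARIANTS = 72_027      # uniquely mapped SNPs and indels within genes
CODING_VARIANTS = 13_215 + 315 + 452  # nonsynonymous + nonsense + coding indels

# Stated in the abstract: 6.5% of gene candidates lack a good ORF.
print(pct(NO_GOOD_ORF, GENE_CANDIDATES))        # 6.5

# Derived here (not stated in the abstract).
print(pct(NEW_CANDIDATES, GENE_CANDIDATES))     # 24.5
print(pct(NCRNA_CANDIDATES, NO_GOOD_ORF))       # 21.5
print(pct(CODING_VARIANTS, MAPPED_VARIANTS))    # 19.4
```

The first ratio reproduces the 6.5% reported in the abstract, which suggests the percentage is computed against the 21,037 validated gene candidates.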

