A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

Lourdes Peña‐Castillo (University of Toronto), Murat Taşan (Harvard University), Chad L. Myers (Princeton University), Hyunju Lee (Gwangju Institute of Science and Technology), Trupti Joshi (University of Missouri), Chao Zhang (University of Missouri), Yuanfang Guan (Princeton University), Michele Leone (Institute for Scientific Interchange), Andrea Pagnani (Institute for Scientific Interchange), Wankyu Kim (The University of Texas at Austin), Chase Krumpelman (The University of Texas at Austin), Weidong Tian (Harvard University), Guillaume Obozinski (University of California, Berkeley), Yanjun Qi (Carnegie Mellon University), Sara Mostafavi (University of Toronto), Guan Ning Lin (University of Missouri), Gabriel F. Berriz (Harvard University), Francis D. Gibbons (Harvard University), Gert Lanckriet (University of California San Diego), Jian Qiu (University of Washington), Charles E. Grant (University of Washington), Zafer Barutçuoğlu (Princeton University), David P. Hill (Jackson Laboratory), David Warde-Farley (University of Toronto), Chris Grouios (University of Toronto), Debajyoti Ray (Oxford Centre for Computational Neuroscience), Judith A. Blake (Jackson Laboratory), Minghua Deng (Peking University), Michael I. Jordan (University of California, Berkeley), William Stafford Noble (University of Washington), Quaid Morris (University of Toronto), Judith Klein‐Seetharaman (University of Pittsburgh), Ziv Bar‐Joseph (Carnegie Mellon University), Ting Chen (University of Southern California), Fengzhu Sun (University of Southern California), Olga G. Troyanskaya (Princeton University), Edward M. Marcotte (The University of Texas at Austin), Dong Xu (University of Missouri), Timothy R. Hughes (University of Toronto), Frederick P. Roth (Harvard University)
Genome Biology
June 27, 2008
Cited by 258
Open Access

Abstract

BACKGROUND: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.

RESULTS: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%.

CONCLUSION: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.
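
The abstract reports precision at a fixed recall of 20%. As a rough illustration of how such a figure can be computed for a single GO term, the following Python sketch ranks genes by prediction score and returns the precision at the first cutoff where recall reaches the target. The function name `precision_at_recall`, the input layout, and the handling of tied scores (ties are broken by sort order, not grouped) are assumptions made for this example; the abstract does not specify the study's actual evaluation procedure.

```python
from typing import Sequence


def precision_at_recall(
    scores: Sequence[float],
    labels: Sequence[int],
    target_recall: float = 0.20,
) -> float:
    """Precision at the shortest score-ranked list whose recall >= target_recall.

    scores: classifier confidence per gene (higher = more likely annotated).
    labels: 1 if the gene truly carries the GO term, else 0.
    Hypothetical helper for illustration only.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank genes from most to least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        recall = true_positives / total_positives
        if recall >= target_recall:
            # Precision = correct predictions among the top k.
            return true_positives / k

    # Reached only if target_recall > 1; fall back to precision over all genes.
    return true_positives / len(ranked)


# Toy data: 10 genes, 5 of which truly carry the term.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
print(precision_at_recall(toy_scores, toy_labels, 0.20))  # 0.5
```

In the toy data, one of five positives (20% recall) is first recovered at rank 2, so precision is 1/2. Averaging this value over many GO terms would give a summary comparable in spirit to the 41% figure quoted above, though the exact averaging used in the study is not stated in the abstract.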

