An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Yuxiang Jiang(Indiana University Bloomington), Tal Oron(Buck Institute for Research on Aging), Wyatt T. Clark(Yale University), Asma Bankapur(Miami University), Daniel D’Andrea(Sapienza University of Rome), Rosalba Lepore(Sapienza University of Rome), Christopher S. Funk(University of Colorado Denver), Indika Kahanda(Colorado State University), Karin Verspoor(The University of Melbourne), Asa Ben-Hur(Colorado State University), Da Chen Emily Koo(New York University), Duncan Penfold-Brown(Data & Society Research Institute), Dennis Shasha(New York University), Noah Youngs(Simons Foundation), Richard Bonneau(Simons Foundation), Alexandra J. Lin(University of California, Berkeley), Sayed Mohammad Ebrahim Sahraeian(University of California, Berkeley), Pier Luigi Martelli(University of Bologna), Giuseppe Profiti(University of Bologna), Rita Casadio(University of Bologna), Renzhi Cao(University of Missouri), Zhaolong Zhong(University of Missouri), Jianlin Cheng(University of Missouri), Adrian Altenhoff(SIB Swiss Institute of Bioinformatics), Nives Škunca(SIB Swiss Institute of Bioinformatics), Christophe Dessimoz(SIB Swiss Institute of Bioinformatics), Tunca Doğan(European Bioinformatics Institute), Kai Hakala(University of Turku), Suwisa Kaewphan(University of Turku), Farrokh Mehryary(University of Turku), Tapio Salakoski(University of Turku), Filip Ginter(University of Turku), Hai Fang(University of Bristol), Ben Smithers(University of Bristol), Matt E. Oates(University of Bristol), Julian Gough(University of Bristol), Petri Törönen(University of Helsinki), Patrik Koskinen(University of Helsinki), Liisa Holm(University of Helsinki), Ching-Tai Chen(Institute of Information Science, Academia Sinica), Wen-Lian Hsu(Institute of Information Science, Academia Sinica), Kevin Bryson(University College London), Domenico Cozzetto(University College London), Federico Minneci(University College London), David T. Jones(University College London), Samuel Chapman(North Carolina Agricultural and Technical State University), Dukka Bkc(North Carolina Agricultural and Technical State University), Ishita Khan(Purdue University West Lafayette), Daisuke Kihara(Purdue University West Lafayette), Dan Ofer(Hebrew University of Jerusalem), Nadav Rappoport(Hebrew University of Jerusalem), Amos Stern(Hebrew University of Jerusalem), Elena Cibrián-Uhalte(European Bioinformatics Institute), Paul Denny(University College London), Rebecca E. Foulger(University College London), Reija Hieta(European Bioinformatics Institute), Duncan Legge(European Bioinformatics Institute), Ruth C. Lovering(University College London), Michele Magrane(European Bioinformatics Institute), Anna N. Melidoni(University College London), Prudence Mutowo(European Bioinformatics Institute), Klemens Pichler(European Bioinformatics Institute), Aleksandra Shypitsyna(European Bioinformatics Institute), Biao Li(Buck Institute for Research on Aging), Pooya Zakeri(iMinds), Sarah ElShal(iMinds), Léon-Charles Tranchevent(Université Claude Bernard Lyon 1), Sayoni Das(Institute of Structural and Molecular Biology), Natalie L. Dawson(Institute of Structural and Molecular Biology), David Lee(Institute of Structural and Molecular Biology), Jonathan Lees(Institute of Structural and Molecular Biology), Ian Sillitoe(Institute of Structural and Molecular Biology), Prajwal Bhat, Tamás Nepusz(Molde University College), Alfonso E. Romero(Royal Holloway University of London), Rajkumar Sasidharan(University of California, Los Angeles), Haixuan Yang(Ollscoil na Gaillimhe – University of Galway), Alberto Paccanaro(Royal Holloway University of London), Jesse Gillis(Cold Spring Harbor Laboratory), Adriana E. Sedeño-Cortés(University of British Columbia), Paul Pavlidis(University of British Columbia), Shou Feng(Indiana University Bloomington), Juan Miguel Cejuela(Technical University of Munich), Tatyana Goldberg(Technical University of Munich), Tobias Hamp(Technical University of Munich), Lothar Richter(Technical University of Munich), Asaf Salamov(Joint Genome Institute), Toni Gabaldón(Institució Catalana de Recerca i Estudis Avançats), Marina Marcet-Houben(Universitat Pompeu Fabra), Fran Supek(Universitat Pompeu Fabra), Qingtian Gong(Fudan University), Wei Ning(Fudan University), Yuanpeng Zhou(Fudan University), Weidong Tian(Fudan University), Marco Falda(University of Padua), Paolo Fontana(Fondazione Edmund Mach), Enrico Lavezzo(University of Padua), Stefano Toppo(University of Padua), Carlo Ferrari(University of Padua), Manuel Giollo(University of Padua), Damiano Piovesan(University of Padua), Silvio C. E. Tosatto(University of Padua), Ángela del Pozo(Hospital Universitario La Paz), José M. Fernández(Spanish National Cancer Research Centre), Paolo Maietta(Spanish National Cancer Research Centre), Alfonso Valencia(Spanish National Cancer Research Centre), Michael L. Tress(Spanish National Cancer Research Centre), Alfredo Benso(Politecnico di Torino), Stefano Di Carlo(Politecnico di Torino), Gianfranco Politano(Politecnico di Torino), Alessandro Savino(Politecnico di Torino), Hafeez Ur Rehman(National University of Computer and Emerging Sciences), Matteo Ré(University of Milan), Marco Mesiti(University of Milan), Giorgio Valentini(University of Milan), Joachim W. Bargsten(Wageningen University & Research), Aalt D. J. van Dijk(Wageningen University & Research), Branislava Gemović(University of Belgrade), Sanja Glišić(University of Belgrade), Vladimir Perović(University of Belgrade), Veljko Veljković(University of Belgrade), Nevena Veljković(University of Belgrade), Danillo C. Almeida-e-Silva(Universidade de Ribeirão Preto), Ricardo Z. N. Vêncio(Universidade de Ribeirão Preto), Malvika Sharan(University of Würzburg), Jörg Vogel(University of Würzburg), Lakesh Kansakar(Temple University), Shanshan Zhang(Temple University), Slobodan Vučetić(Temple University), Zheng Wang(University of Southern Mississippi), Michael J. E. Sternberg(Imperial College London), Mark N. Wass(University of Kent), Rachael P. Huntley(European Bioinformatics Institute), María Martin(European Bioinformatics Institute), Claire O’Donovan(European Bioinformatics Institute), Peter N. Robinson(Charité - Universitätsmedizin Berlin), Yves Moreau(KU Leuven), Anna Tramontano(Sapienza University of Rome), Patricia C. Babbitt(QB3), Steven E. Brenner(University of California, Berkeley), Michal Linial(Hebrew University of Jerusalem), Christine Orengo(Institute of Structural and Molecular Biology), Burkhard Rost(Technical University of Munich), Casey S. Greene(Translational Therapeutics (United States)), Sean D. Mooney(University of Washington Medical Center), Iddo Friedberg(Miami University), Predrag Radivojac(Indiana University Bloomington)
Genome Biology
September 7, 2016
Open Access

Abstract

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

RESULTS: We conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured an expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.

CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific and that different performance metrics can be used to probe the nature of accurate predictions, and it highlighted the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and the usefulness of individual methods remain context-dependent.
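The abstract refers to assessment metrics without defining them. One of the main protein-centric measures in the CAFA assessments is Fmax: the maximum, over score thresholds τ, of the harmonic mean of average precision and average recall. The Python sketch below is illustrative only and is not the assessors' evaluation code. It assumes that predicted and true term sets have already been propagated to their ontology ancestors, averages precision over proteins with at least one prediction at τ, averages recall over all benchmark proteins, and uses a 0.01-step threshold grid; the function name `fmax` and the toy protein and term IDs are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def fmax(
    predictions: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    thresholds: Optional[Iterable[float]] = None,
) -> float:
    """Protein-centric maximum F-measure over score thresholds (sketch).

    predictions: protein ID -> {ontology term: score in [0, 1]}
    truth: protein ID -> set of experimentally annotated terms
    Assumption: both inputs are already propagated to ontology ancestors.
    """
    # Only proteins with at least one true annotation count as benchmark proteins.
    benchmark = {p: terms for p, terms in truth.items() if terms}
    if not benchmark:
        return 0.0
    if thresholds is None:
        # Assumed grid: tau = 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    for tau in thresholds:
        precisions = []   # one value per protein with >= 1 prediction at tau
        recall_sum = 0.0  # accumulated over all benchmark proteins
        for protein, true_terms in benchmark.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # precision is undefined when nothing is predicted at tau
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(benchmark)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    truth = {"P1": {"A", "B"}, "P2": {"C"}}
    predictions = {
        "P1": {"A": 0.9, "D": 0.8, "B": 0.3},
        "P2": {"A": 0.7, "C": 0.5},
    }
    print(round(fmax(predictions, truth), 3))  # 14/19, prints 0.737
```

On the toy input the maximum is reached for any τ ≤ 0.3, where both proteins have full recall and mean precision is 7/12. Because precision is averaged only over proteins that receive predictions, a method that abstains on a hard target loses recall but not precision for that protein.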
