Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software

Alexander Sczyrba (Bielefeld University), Peter Hofmann (Helmholtz Centre for Infection Research), Peter Belmann (Bielefeld University), David Koslicki (Oregon State University), Stefan Janssen (University of California San Diego), Johannes Dröge (Helmholtz Centre for Infection Research), Ivan Gregor (Helmholtz Centre for Infection Research), Stephan Majda (Heinrich Heine University Düsseldorf), Jessika Fiedler (Helmholtz Centre for Infection Research), Eik Dahms (Helmholtz Centre for Infection Research), Andreas Bremges (Bielefeld University), Adrian Fritz (Helmholtz Centre for Infection Research), Rubén Garrido-Oter (Helmholtz Centre for Infection Research), Tue Sparholt Jørgensen (Roskilde University), Nicole Shapiro (Joint Genome Institute), Philip D. Blood (Pittsburgh Supercomputing Center), Alexey Gurevich (St Petersburg University), Yang Bai (Max Planck Institute for Plant Breeding Research), Dmitrij Turaev (University of Vienna), Matthew Z. DeMaere (University of Technology Sydney), Rayan Chikhi (Centre National de la Recherche Scientifique), Niranjan Nagarajan (Genome Institute of Singapore), Christopher Quince (University of Warwick), Fernando Meyer (Helmholtz Centre for Infection Research), Monika Balvočiūtė (University of Tübingen), Lars Hestbjerg Hansen (Aarhus University), Søren J. Sørensen (University of Copenhagen), Burton Kuan Hui Chia (Genome Institute of Singapore), Bertrand Denis (Genome Institute of Singapore), Jeff Froula (Joint Genome Institute), Zhong Wang (Joint Genome Institute), Robert W. Egan (Joint Genome Institute), Dongwan Kang (Joint Genome Institute), Jeffrey Cook (Intel (United States)), Charles Deltel (Institut de Recherche en Informatique et Systèmes Aléatoires), Michael Beckstette (Helmholtz Centre for Infection Research), Claire Lemaitre (Institut de Recherche en Informatique et Systèmes Aléatoires), Pierre Peterlongo (Institut de Recherche en Informatique et Systèmes Aléatoires), Guillaume Rizk (Institut de Recherche en Informatique et Systèmes Aléatoires), Dominique Lavenier (Centre National de la Recherche Scientifique), Yu-Wei Wu (Taipei Medical University), Steven W. Singer (Lawrence Berkeley National Laboratory), Chirag Jain (Georgia Institute of Technology), Marc Strous (University of Calgary), Heiner Klingenberg (University of Göttingen), Peter Meinicke (University of Göttingen), Michael D. Barton (Joint Genome Institute), Thomas Lingner (Genevention (Germany)), Hsin-Hung Lin (National Health Research Institutes), Yu-Chieh Liao (National Health Research Institutes), Genivaldo Gueiros Z. Silva (San Diego State University), Daniel Cuevas (San Diego State University), Robert A. Edwards (San Diego State University), Surya Saha (Cornell University), Vitor C. Piro (Robert Koch Institute), Bernhard Y. Renard (Robert Koch Institute), Mihai Pop (University of Maryland, College Park), Hans-Peter Klenk (Newcastle University), Markus Göker (Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures), Nikos C. Kyrpides (Joint Genome Institute), Tanja Woyke (Joint Genome Institute), Julia A. Vorholt (ETH Zurich), Paul Schulze-Lefert (Cluster of Excellence on Plant Sciences), Edward M. Rubin (Joint Genome Institute), Aaron E. Darling (University of Technology Sydney), Thomas Rattei (University of Vienna), Alice C. McHardy (Helmholtz Centre for Infection Research)
Nature Methods
October 2, 2017
Cited by 941
Open Access

Abstract

The Critical Assessment of Metagenome Interpretation (CAMI) community initiative presents results from its first challenge, a rigorous benchmarking of software for metagenome assembly, binning and taxonomic profiling.

Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The CAMI challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets that were generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and that represent common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.

