Unraveling the functional dark matter through global metagenomics

Georgios A. Pavlopoulos(Lawrence Berkeley National Laboratory), Fotis A. Baltoumas(Alexander Fleming Biomedical Sciences Research Center), Sirui Liu(Harvard University Press), Oğuz Selvitopi(Lawrence Berkeley National Laboratory), Antônio Pedro Camargo(Lawrence Berkeley National Laboratory), Stephen Nayfach(Lawrence Berkeley National Laboratory), Ariful Azad(Indiana University Bloomington), Simon Roux(Lawrence Berkeley National Laboratory), Lee Call(Lawrence Berkeley National Laboratory), Natalia Ivanova(Lawrence Berkeley National Laboratory), I. Min Chen(Lawrence Berkeley National Laboratory), David Páez-Espino(Lawrence Berkeley National Laboratory), Evangelos Karatzas(Alexander Fleming Biomedical Sciences Research Center), Silvia G. Acinas(Institut Català de Ciències del Clima), Nathan A. Ahlgren(Clark University), Graeme T. Attwood(AgResearch), Petr Baldrián(Czech Academy of Sciences, Institute of Microbiology), Timothy D. Berry(University of Wisconsin–Madison), Jennifer Bhatnagar(Boston University), Devaki Bhaya(Carnegie Institution for Science), Kay D. Bidle(Rutgers, The State University of New Jersey), Jeffrey L. Blanchard(University of Massachusetts Amherst), Eric S. Boyd(Montana State University), Jennifer L. Bowen(Northeastern University), Jeff S. Bowman(Scripps Institution of Oceanography), Susan H. Brawley(University of Maine), Eoin Brodie(Lawrence Berkeley National Laboratory), Andreas Brune(Max Planck Institute for Terrestrial Microbiology), Donald A. Bryant(Pennsylvania State University), Alison Buchan(University of Tennessee at Knoxville), Hinsby Cadillo‐Quiroz(Arizona State University), Barbara J. Campbell(Clemson University), Ricardo Cavicchioli(UNSW Sydney), Peter F. Chuckran(Northern Arizona University), Maureen L. Coleman(University of Chicago), Sean A. Crowe(University of British Columbia), Daniel R. Colman(San Diego State University), Cameron R. Currie(University of Wisconsin–Madison), Jeff Dangl(University of North Carolina at Chapel Hill), Nathalie Delherbe(San Diego State University), Vincent J. Denef(University of Michigan), Paul Dijkstra(Northern Arizona University), Daniel D. Distel(Northeastern University), Emiley A. Eloe‐Fadrosh(Lawrence Berkeley National Laboratory), Kirsten M. Fisher(California State University Los Angeles), Christopher Francis(Stanford University), Aaron Garoutte(Michigan State University), Amélie C. M. Gaudin(University of California, Davis), Lena Gerwick(Scripps Institution of Oceanography), Filipa Godoy‐Vitorino(University of Puerto Rico, Medical Sciences Campus), Peter Guerra, Jiarong Guo(Michigan State University), Mussie Y. Habteselassie(University of Georgia), Steven Hallam(University of British Columbia), Roland Hatzenpichler(Montana State University), Ute Hentschel(GEOMAR Helmholtz Centre for Ocean Research Kiel), Matthias Hess(University of California, Davis), Ann M. Hirsch(University of California, Los Angeles), Laura Hug(University of Waterloo), Jenni Hultman(University of Helsinki), Dana E. Hunt(Duke University), Marcel Huntemann(Lawrence Berkeley National Laboratory), William P. Inskeep(Montana State University), Timothy Y. James(University of Michigan), Janet Jansson(Pacific Northwest National Laboratory), Eric R. Johnston(Oak Ridge National Laboratory), Marina Kalyuzhnaya(University of North Carolina at Chapel Hill), Charlene N. Kelly(West Virginia University), Robert M. Kelly(North Carolina State University), Jonathan L. Klassen(University of Connecticut), Klaus Nüsslein(University of Massachusetts Amherst), Joel E. Kostka(Georgia Institute of Technology), Steven E. Lindow(University of California, Berkeley), Erik A. Lilleskov(Northern Research Station), Mackenzie M. Lynes(Montana State University), Rachel Mackelprang(California State University, Northridge), Francis Martin(Interactions Arbres-Microorganismes), Olivia U. Mason(Florida State University), R. Michael L. McKay(University of Windsor), Katherine D. McMahon(University of Wisconsin–Madison), David A. Mead(Varigen Biosciences (United States)), Mónica Medina(Pennsylvania State University), Laura K. Meredith(University of Arizona), Thomas Möck(University of East Anglia), William W. Mohn(University of British Columbia), Mary Ann Moran(University of Georgia), Alison E. Murray(Desert Research Institute), Josh D. Neufeld(University of Waterloo), Rebecca B. Neumann(University of Washington), Jeanette M. Norton(Utah State University), Laila P. Partida‐Martínez(Instituto Politécnico Nacional), Nicole Pietrasiak(New Mexico State University), Dale A. Pelletier(Oak Ridge National Laboratory), T. B. K. Reddy(Lawrence Berkeley National Laboratory), Brandi Kiel Reese(University of South Alabama), Nicholas J. Reichart(Montana State University), Rebecca A. Reiss(New Mexico Institute of Mining and Technology), Mak A. Saito(Woods Hole Oceanographic Institution), Daniel P. Schachtman(University of Nebraska–Lincoln), R. Seshadri(Lawrence Berkeley National Laboratory), Ashley Shade(Michigan State University), David R. Sherman(University of Michigan), Rachel L. Simister(University of British Columbia), Holly M. Simon(Oregon Health & Science University), James Stegen(Pacific Northwest National Laboratory), Ramūnas Stepanauskas(Bigelow Laboratory for Ocean Sciences), Matthew B. Sullivan(The Ohio State University), Dawn Y. Sumner(University of California, Davis), Hanno Teeling(Max Planck Institute for Marine Microbiology), Kimberlee Thamatrakoln(Rutgers, The State University of New Jersey), Kathleen K. Treseder(University of California, Irvine), Susannah G. Tringe(Lawrence Berkeley National Laboratory), Parag Vaishampayan(Ames Research Center), David L. Valentine(University of California, Santa Barbara), Nicholas B. Waldo(University of Washington), Mark P. Waldrop(Geology, Minerals, Energy, and Geophysics Science Center), David A. Walsh(Concordia University), David M. Ward(Montana State University), Michael J. Wilkins(Colorado State University), Thea Whitman(University of Wisconsin–Madison), Jamie Woolet(Colorado State University), Tanja Woyke(Lawrence Berkeley National Laboratory), Ioannis Iliopoulos(University of Crete), Konstantinos T. Konstantinidis(Georgia Institute of Technology), James M. Tiedje(Michigan State University), Jennifer Pett‐Ridge(Lawrence Livermore National Laboratory), David Baker(Howard Hughes Medical Institute), Axel Visel(Lawrence Berkeley National Laboratory), Christos Ouzounis(Lawrence Berkeley National Laboratory), Sergey Ovchinnikov(Harvard University Press), Aydın Buluç(Lawrence Berkeley National Laboratory), Nikos C. Kyrpides(Lawrence Berkeley National Laboratory)
Nature
October 11, 2023
Cited by 193 · Open Access
Abstract

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
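
As a rough illustration of the filter-then-cluster idea described in the abstract, the sketch below drops proteins that are too short or that match a reference, links the remainder through a similarity graph, and keeps only large groups. It is a toy under stated assumptions and not the authors' pipeline: connected components via union-find stand in for the massively parallel graph-based clustering, and the function name `cluster_proteins` and its inputs (`lengths`, `similar_pairs`, `reference_hits`) are hypothetical. Only the two thresholds (longer than 35 amino acids, more than 100 members) come from the abstract.

```python
from collections import defaultdict

MIN_LENGTH = 35    # abstract: sequences "longer than 35 amino acids"
MIN_MEMBERS = 100  # abstract: clusters "with more than 100 members"


def cluster_proteins(lengths, similar_pairs, reference_hits,
                     min_length=MIN_LENGTH, min_members=MIN_MEMBERS):
    """Group reference-free proteins into connected components.

    lengths        -- dict: protein id -> sequence length (hypothetical input)
    similar_pairs  -- iterable of (id_a, id_b) pairs judged similar by some
                      upstream all-vs-all comparison (hypothetical input)
    reference_hits -- set of ids with similarity to a reference genome or Pfam
    """
    # Keep proteins that are long enough and have no reference match.
    keep = {p for p, n in lengths.items()
            if n > min_length and p not in reference_hits}

    # Union-find over the kept proteins (a simplification: connected
    # components, not the clustering algorithm used in the paper).
    parent = {p: p for p in keep}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in similar_pairs:
        if a in keep and b in keep:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = defaultdict(set)
    for p in keep:
        clusters[find(p)].add(p)

    # Retain only clusters with more than `min_members` members.
    return [members for members in clusters.values()
            if len(members) > min_members]
```

On real data the similarity edges would come from a large-scale sequence comparison, and a single-linkage grouping like this one would tend to over-merge families, which is one reason a dedicated graph clustering method is used in practice.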