Evolutionary-scale prediction of atomic-level protein structure with a language model

Zeming Lin (Meta AI), Halil Akin (Meta AI), Roshan Rao (Meta AI), Brian Hie (Palo Alto University), Zhongkai Zhu (Meta AI), Wenting Lu (Meta AI), Nikita Smetanin (Meta AI), Robert Verkuil (Meta AI), Ori Kabeli (Meta AI), Yaniv Shmueli (Meta AI), Allan dos Santos Costa (Massachusetts Institute of Technology), Maryam Fazel-Zarandi (Meta AI), Tom Sercu (Meta AI), Salvatore Candido (Meta AI), Alexander Rives (Meta AI)
Science
March 16, 2023
Cited by 4,747
Open Access

Abstract

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.
