Highly accurate protein structure prediction for the human proteome

Kathryn Tunyasuvunakool (Google DeepMind (United Kingdom)), Jonas Adler (Google DeepMind (United Kingdom)), Zachary Wu (Google DeepMind (United Kingdom)), Tim Green (Google DeepMind (United Kingdom)), Michał Zieliński (Google DeepMind (United Kingdom)), Augustin Žídek (Google DeepMind (United Kingdom)), Alex Bridgland (Google DeepMind (United Kingdom)), Andrew Cowie (Google DeepMind (United Kingdom)), Clemens Meyer (Google DeepMind (United Kingdom)), Agata Laydon (Google DeepMind (United Kingdom)), Sameer Velankar (European Bioinformatics Institute), Gerard J. Kleywegt (European Bioinformatics Institute), Alex Bateman (European Bioinformatics Institute), Richard Evans (Google DeepMind (United Kingdom)), Alexander Pritzel (Google DeepMind (United Kingdom)), Michael Figurnov (Google DeepMind (United Kingdom)), Olaf Ronneberger (Google DeepMind (United Kingdom)), Russ Bates (Google DeepMind (United Kingdom)), Simon Köhl (Google DeepMind (United Kingdom)), Anna Potapenko (Google DeepMind (United Kingdom)), Andrew J. Ballard (Google DeepMind (United Kingdom)), Bernardino Romera‐Paredes (Google DeepMind (United Kingdom)), Stanislav Nikolov (Google DeepMind (United Kingdom)), Rishub Jain (Google DeepMind (United Kingdom)), Ellen Clancy (Google DeepMind (United Kingdom)), David Reiman (Google DeepMind (United Kingdom)), Stig Petersen (Google DeepMind (United Kingdom)), Andrew Senior (Google DeepMind (United Kingdom)), Koray Kavukcuoglu (Google DeepMind (United Kingdom)), Ewan Birney (European Bioinformatics Institute), Pushmeet Kohli (Google DeepMind (United Kingdom)), John Jumper (Google DeepMind (United Kingdom)), Demis Hassabis (Google DeepMind (United Kingdom))
Nature
July 22, 2021

Abstract

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold [2], at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.

