Scalable emulation of protein equilibrium ensembles with generative deep learning

Sarah Lewis, Tim Hempel, José Jiménez-Luna, Michael Gastegger, Yu Xie, Andrew Y. K. Foong, Víctor García Satorras, Osama Abdin, Bastiaan S. Veeling, Iryna Zaporozhets, Yaoyi Chen, Soojung Yang, Arne Schneuing, Jigyasa Nigam, Federico Barbero, Vincent Stimper, Andrew M. Campbell, Jason Yim, Marten Lienen, Yu Shi, Shuxin Zheng, Hannes Schulz, Usman Munir, Ryota Tomioka, Cecilia Clementi, Frank Noé (all listed as Microsoft Research (United Kingdom))
bioRxiv (Cold Spring Harbor Laboratory)
December 5, 2024
Cited by 59 | Open Access

Abstract

Following the sequence and structure revolutions, predicting the dynamical mechanisms of proteins that implement biological function remains an outstanding scientific challenge. Several experimental techniques and molecular dynamics (MD) simulations can, in principle, determine conformational states, binding configurations and their probabilities, but suffer from low throughput. Here we develop a Biomolecular Emulator (BioEmu), a generative deep learning system that can generate thousands of statistically independent samples from the protein structure ensemble per hour on a single graphics processing unit. By leveraging novel training methods and large datasets of protein structures, more than 200 milliseconds of MD simulation, and experimental protein stabilities, BioEmu's protein ensembles represent equilibrium in a range of challenging and practically relevant metrics. Qualitatively, BioEmu samples many functionally relevant conformational changes, ranging from the formation of cryptic pockets, through the unfolding of specific protein regions, to large-scale domain rearrangements. Quantitatively, BioEmu samples protein conformations with relative free energy errors around 1 kcal/mol, as validated against millisecond-timescale MD simulation and experimentally measured protein stabilities. By simultaneously emulating structural ensembles and thermodynamic properties, BioEmu reveals mechanistic insights, such as the causes of fold destabilization in mutants, and can efficiently provide experimentally testable hypotheses.
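The abstract ties sampled conformations to free energies. As a minimal sketch of that link, and not the authors' evaluation procedure, the snippet below applies the standard Boltzmann relation ΔG = -RT ln(p_folded / p_unfolded) to counts of independent equilibrium samples. The function name `folding_free_energy`, the two-state folded/unfolded labelling, and the 300 K default temperature are assumptions made for illustration only.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 0.0019872041


def folding_free_energy(n_folded: int, n_unfolded: int,
                        temperature_k: float = 300.0) -> float:
    """Estimate dG(folded - unfolded) in kcal/mol from sample counts.

    Assumes the samples are statistically independent draws from the
    equilibrium ensemble and that each one has already been labelled
    as folded or unfolded (a hypothetical two-state classification).
    """
    if n_folded <= 0 or n_unfolded <= 0:
        raise ValueError("both states must be observed at least once")
    # Boltzmann relation: the population ratio fixes the free energy gap.
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(n_folded / n_unfolded)


if __name__ == "__main__":
    # Hypothetical counts: 900 folded and 100 unfolded out of 1000 samples.
    print(f"{folding_free_energy(900, 100):.2f} kcal/mol")
```

Under these assumptions, an error of about 1 kcal/mol at 300 K corresponds to the population ratio being off by roughly a factor of five, since exp(1 / (R * 300 K)) is about 5.4.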

