Scalable emulation of protein equilibrium ensembles with generative deep learning

Sarah Lewis, Tim Hempel, José Jiménez-Luna, Michael Gastegger, Yu Xie, Andrew Y. K. Foong, Víctor García Satorras, Osama Abdin, Bastiaan S. Veeling, Iryna Zaporozhets, Yaoyi Chen, Soojung Yang, Adam Foster, Arne Schneuing, Jigyasa Nigam, Federico Barbero, Vincent Stimper, Andrew M. Campbell, Jason Yim, Marten Lienen, Yu Shi, Shuxin Zheng, Hannes Schulz, Usman Munir, Roberto Sordillo, Ryota Tomioka, Cecilia Clementi, Frank Noé (all Microsoft Research (United Kingdom))
Science
July 10, 2025
Cited by 195

Abstract

Following the sequence and structure revolutions, predicting functionally relevant protein structure changes at scale remains an outstanding challenge. We introduce BioEmu, a deep learning system that emulates protein equilibrium ensembles by generating thousands of statistically independent structures per hour on a single graphics processing unit (GPU). BioEmu integrates more than 200 milliseconds of molecular dynamics (MD) simulations, static structures, and experimental protein stabilities using new training algorithms. It captures diverse functional motions (including cryptic pocket formation, local unfolding, and domain rearrangements) and predicts relative free energies with 1 kilocalorie per mole accuracy compared with millisecond-scale MD and experimental data. BioEmu provides mechanistic insights by jointly modeling structural ensembles and thermodynamic properties. This approach amortizes the cost of MD and experimental data generation, demonstrating a scalable path toward understanding and designing protein function.
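
To make the link between sampled ensembles and relative free energies concrete, the following is a minimal sketch and not BioEmu's own code or API. It assumes that each statistically independent structure has already been assigned to one of two states (for example, folded or unfolded) by some user-chosen criterion, and applies the standard Boltzmann relation ΔG = -k_B T ln(p_A / p_B). The function name, the 300 K default temperature, and the counting-based estimator are all illustrative assumptions.

```python
import math

# Molar gas constant (Boltzmann constant per mole) in kcal/(mol*K).
K_B_KCAL_PER_MOL_K = 0.0019872041


def relative_free_energy(n_state_a: int, n_state_b: int,
                         temperature_k: float = 300.0) -> float:
    """Free energy of state A relative to state B, in kcal/mol.

    Hypothetical helper: assumes the counts come from statistically
    independent equilibrium samples, so that count ratios estimate
    population ratios.
    """
    if n_state_a <= 0 or n_state_b <= 0:
        raise ValueError("both states must be sampled at least once")
    # Boltzmann relation: dG = -kT * ln(p_A / p_B)
    return -K_B_KCAL_PER_MOL_K * temperature_k * math.log(n_state_a / n_state_b)


# Illustrative counts (made up for this example, not data from the paper).
dg = relative_free_energy(n_state_a=9000, n_state_b=1000)
print(f"dG(A relative to B) = {dg:.2f} kcal/mol")
```

With these made-up counts, a 9:1 population ratio at 300 K corresponds to roughly -1.3 kcal/mol, which gives a sense of scale for the 1 kilocalorie per mole accuracy quoted in the abstract: an error of that size is comparable to misestimating a population ratio by a factor of about five.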

