Scalable protein design using optimization in a relaxed sequence space

Christopher L. Frank (Technical University of Munich), Ali Khoshouei (Technical University of Munich), Lara Fuß (Technical University of Munich), Dominik Schiwietz (Technical University of Munich), Dominik Putz (Technical University of Munich), Lara Weber (Technical University of Munich), Zhixuan Zhao (Fudan University), Motoyuki Hattori (Fudan University), Shihao Feng, Yosta de Stigter (Technical University of Munich), Sergey Ovchinnikov (Harvard University), Hendrik Dietz (Technical University of Munich)
Science
October 24, 2024
Cited by 68 | Open Access

Abstract

Machine learning (ML)-based design approaches have advanced the field of de novo protein design, with diffusion-based generative methods increasingly dominating protein design pipelines. Here, we report a "hallucination"-based protein design approach that functions in relaxed sequence space, enabling the efficient design of high-quality protein backbones over multiple scales and with a broad scope of application without the need for any form of retraining. We experimentally produced and characterized more than 100 proteins. Three high-resolution crystal structures and two cryo-electron microscopy density maps of designed single-chain proteins comprising up to 1000 amino acids validate the accuracy of the method. Our pipeline can also be used to design synthetic protein-protein interactions, as validated experimentally by a set of protein heterodimers. Relaxed sequence optimization offers attractive performance with respect to designability, scope of applicability for different design problems, and scalability across protein sizes.

