Multistate and functional protein design using RoseTTAFold sequence space diffusion

Sidney Lisanza (University of Washington), Jacob Merle Gershon (University of Washington), S. Tipps (University of Washington), Jeremiah Nelson Sims (University of Washington), Lucas Arnoldt (University of Washington), Samuel J. Hendel (University of Washington), Miriam K. Simma (Georgia Institute of Technology), Ge Liu (University of Washington), Muna Yase (University of Washington), Hongwei Wu (Georgia Institute of Technology), Claire D. Tharp (Georgia Institute of Technology), Xinting Li (University of Washington), Alex Kang (University of Washington), Evans Brackenbrough (University of Washington), Asim K. Bera (University of Washington), Stacey Gerben (University of Washington), Bruce J. Wittmann (Microsoft (United States)), Andrew C. McShan (Georgia Institute of Technology), David Baker (Howard Hughes Medical Institute)
Nature Biotechnology
September 25, 2024
Cited by 104
Open Access

Abstract

Protein denoising diffusion probabilistic models are used for the de novo generation of protein backbones but are limited in their ability to guide generation of proteins with sequence-specific attributes and functional properties. To overcome this limitation, we developed ProteinGenerator (PG), a sequence space diffusion model based on RoseTTAFold that simultaneously generates protein sequences and structures. Beginning from a noised sequence representation, PG generates sequence and structure pairs by iterative denoising, guided by desired sequence and structural protein attributes. We designed thermostable proteins with varying amino acid compositions and internal sequence repeats, as well as proteins that cage bioactive peptides such as melittin. By averaging sequence logits between diffusion trajectories with distinct structural constraints, we designed multistate parent-child protein triples in which the same sequence folds to different supersecondary structures when intact in the parent versus split into two child domains. PG design trajectories can be guided by experimental sequence-activity data, providing a general approach for integrated computational and experimental optimization of protein function.
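
The logit-averaging idea in the abstract can be illustrated with a toy sketch. It is not the ProteinGenerator implementation: `toy_denoiser` is a hypothetical stand-in for a structure-conditioned network, the linear noise schedule is arbitrary, and all function names are assumptions made for illustration. It shows only the general pattern: at each denoising step, every state predicts sequence logits under its own constraint, the predictions are averaged, and the shared sequence representation is updated from that average.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def average_logits(logits_per_state, weights=None):
    """Average per-state sequence logits, each (L, 20), into one (L, 20) array."""
    stacked = np.stack(logits_per_state, axis=0)  # (n_states, L, 20)
    return np.average(stacked, axis=0, weights=weights)

def toy_denoiser(x_t, target):
    """Hypothetical stand-in for a conditioned network: pulls x_t toward a target."""
    return 0.5 * x_t + 0.5 * target

def multistate_trajectory(length, targets, n_steps=10, seed=0):
    """Run one shared trajectory whose update averages logits over all states."""
    rng = np.random.default_rng(seed)
    # Start from a noised sequence representation.
    x_t = rng.normal(size=(length, len(AMINO_ACIDS)))
    for step in range(n_steps):
        # Each state predicts logits under its own (toy) constraint.
        per_state = [toy_denoiser(x_t, t) for t in targets]
        x0_hat = average_logits(per_state)
        # Assumed schedule: noise shrinks linearly to zero at the last step.
        noise_scale = 0.1 * (1.0 - (step + 1) / n_steps)
        x_t = x0_hat + noise_scale * rng.normal(size=x_t.shape)
    return x_t

def decode(logits):
    """Pick the highest-scoring amino acid at each position."""
    return "".join(AMINO_ACIDS[i] for i in logits.argmax(axis=-1))

def one_hot_logits(seq, scale=5.0):
    """Build (L, 20) toy target logits that favor the given sequence."""
    out = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        out[pos, AMINO_ACIDS.index(aa)] = scale
    return out
```

In this sketch, passing two different targets yields a single sequence that compromises between them position by position; the optional `weights` argument lets one state count more than another.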

