Deep learning methods for designing proteins scaffolding functional sites

Jue Wang (University of Washington), Sidney Lisanza (University of Washington), David Juergens (University of Washington), Doug Tischer (University of Washington), Ivan Anishchenko (University of Washington), Minkyung Baek (University of Washington), Joseph L. Watson (University of Washington), Jung Ho Chun (University of Washington), Lukas F. Milles (University of Washington), Justas Dauparas (University of Washington), Marc Expòsit (University of Washington), Wei Yang (University of Washington), Amijai Saragovi (University of Washington), Sergey Ovchinnikov (Harvard University), David Baker (Howard Hughes Medical Institute)
bioRxiv (Cold Spring Harbor Laboratory)
November 12, 2021
Cited by 36 | Open Access

Abstract

Current approaches to de novo design of proteins harboring a desired binding or catalytic motif require pre-specification of an overall fold or secondary structure composition, and hence considerable trial and error can be required to identify protein structures capable of scaffolding an arbitrary functional site. Here we describe two complementary approaches to the general functional site design problem that employ the RoseTTAFold and AlphaFold neural networks, which map input sequences to predicted structures. In the first “constrained hallucination” approach, we carry out gradient descent in sequence space to optimize a loss function that simultaneously rewards recapitulation of the desired functional site and the ideality of the surrounding scaffold, supplemented with problem-specific interaction terms. We use this approach to design candidate immunogens presenting epitopes recognized by neutralizing antibodies, receptor traps for escape-resistant viral inhibition, metalloproteins and enzymes, and target binding proteins with designed interfaces expanding around known binding motifs. In the second “missing information recovery” approach, we start from the desired functional site and jointly fill in the missing sequence and structure information needed to complete the protein in a single forward pass through an updated RoseTTAFold trained to recover sequence from structure in addition to structure from sequence. We show that the two approaches have considerable synergy, and AlphaFold2 structure prediction calculations suggest that the approaches can accurately generate proteins containing a very wide array of functional sites.
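
The idea of gradient descent in sequence space can be illustrated with a deliberately tiny toy model. The sketch below is not the authors' implementation and does not call RoseTTAFold or AlphaFold. It assumes a hypothetical stand-in "predictor" (a fixed linear map from per-residue amino-acid probabilities to 3D coordinates), a motif term that penalizes deviation from target coordinates at the motif positions, and a sequence-entropy term used as a crude placeholder for a scaffold ideality term. All function names, parameters, and weights are illustrative assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def make_toy_predictor(length, rng):
    """Hypothetical stand-in for a structure predictor (NOT RoseTTAFold):
    a fixed linear map from an amino-acid profile (length x 20) to 3D coordinates."""
    W = rng.normal(size=(len(AMINO_ACIDS), 3))  # one 3D vector per amino acid
    # Each residue's coordinate mixes its own profile with its two neighbors.
    K = 0.5 * np.eye(length) + 0.25 * (np.eye(length, k=1) + np.eye(length, k=-1))
    return K, W


def loss_and_grad(logits, K, W, motif_idx, motif_xyz, entropy_weight):
    P = softmax(logits)                    # relaxed (probabilistic) sequence
    X = K @ P @ W                          # toy "predicted structure"
    resid = X[motif_idx] - motif_xyz
    motif_loss = np.mean(resid ** 2)       # rewards motif recapitulation
    log_p = np.log(P + 1e-12)
    entropy = -np.sum(P * log_p) / P.shape[0]  # placeholder for an ideality term
    total = motif_loss + entropy_weight * entropy

    # Analytic gradients: back through the linear map, then through the softmax.
    dX = np.zeros_like(X)
    dX[motif_idx] = 2.0 * resid / resid.size
    dP = K.T @ dX @ W.T
    dP += entropy_weight * (-(log_p + 1.0) / P.shape[0])
    dlogits = P * (dP - np.sum(dP * P, axis=1, keepdims=True))
    return total, motif_loss, dlogits


def hallucinate(length=12, motif_idx=(3, 4, 5, 6), steps=2000, lr=0.5,
                entropy_weight=0.01, seed=0):
    rng = np.random.default_rng(seed)
    K, W = make_toy_predictor(length, rng)
    motif_idx = list(motif_idx)
    # Target motif coordinates come from a hidden random sequence, so a solution exists.
    hidden = np.eye(len(AMINO_ACIDS))[rng.integers(len(AMINO_ACIDS), size=length)]
    motif_xyz = (K @ hidden @ W)[motif_idx]

    logits = 0.01 * rng.normal(size=(length, len(AMINO_ACIDS)))
    history = []
    for _ in range(steps):
        total, motif_loss, grad = loss_and_grad(
            logits, K, W, motif_idx, motif_xyz, entropy_weight)
        history.append((float(total), float(motif_loss)))
        logits -= lr * grad                # plain gradient descent on sequence logits
    sequence = "".join(AMINO_ACIDS[i] for i in logits.argmax(axis=1))
    return sequence, history


if __name__ == "__main__":
    seq, hist = hallucinate()
    print("designed sequence:", seq)
    print("total loss, first vs last step:", hist[0][0], hist[-1][0])
```

In this toy setting the sequence is relaxed to a matrix of logits so that the loss is differentiable. A real structure-prediction network would replace the linear map, and its gradients would normally come from automatic differentiation rather than the hand-derived expressions used here.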

