P

Philip A. Romero

Duke University

ORCID: 0000-0002-2586-7263

Publishes on RNA and protein synthesis mechanisms, Protein Structure and Dynamics, Innovative Microfluidic and Catalytic Techniques Innovation. 76 papers and 4k citations.

76Publications
4kTotal Citations
#7in Protein Engineering

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

Navigating the protein fitness landscape with Gaussian processes
Philip A. Romero, Andreas Krause, Frances H. Arnold|Proceedings of the National Academy of Sciences|2012
Cited by 376Open Access

Knowing how protein sequence maps to function (the "fitness landscape") is critical for understanding protein evolution as well as for engineering proteins with new and useful properties. We demonstrate that the protein fitness landscape can be inferred from experimental data, using Gaussian processes, a Bayesian learning technique. Gaussian process landscapes can model various protein sequence properties, including functional status, thermostability, enzyme activity, and ligand binding affinity. Trained on experimental data, these models achieve unrivaled quantitative accuracy. Furthermore, the explicit representation of model uncertainty allows for efficient searches through the vast space of possible sequences. We develop and test two protein sequence design algorithms motivated by Bayesian decision theory. The first one identifies small sets of sequences that are informative about the landscape; the second one identifies optimized sequences by iteratively improving the Gaussian process model in regions of the landscape that are predicted to be optimized. We demonstrate the ability of Gaussian processes to guide the search through protein sequence space by designing, constructing, and testing chimeric cytochrome P450s. These algorithms allowed us to engineer active P450 enzymes that are more thermostable than any previously made by chimeragenesis, rational design, or directed evolution.

Dissecting enzyme function with microfluidic-based deep mutational scanning
Philip A. Romero, Tuan M. Tran, Adam R. Abate|Proceedings of the National Academy of Sciences|2015
Cited by 247Open Access

Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.

Neural networks to learn protein sequence–function relationships from deep mutational scanning data
Sam Gelman, Sarah A. Fahlberg, Pete Heinzelman et al.|Proceedings of the National Academy of Sciences|2021
Cited by 177Open Access

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.

Similar Researchers

Coming soon — researchers in similar fields and career stages