Patricia C. Babbitt

QB3

ORCID: 0000-0003-0375-9015

Publishes on Enzyme Structure and Function, Protein Structure and Dynamics, and Microbial Metabolic Engineering and Bioproduction. 185 papers and 15.7k citations.

185 Publications
15.7k Total Citations

Top publications by citations

InterPro in 2017—beyond protein family and domain annotations
Robert Finn, Teresa K. Attwood, Patricia C. Babbitt et al.|Nucleic Acids Research|2016
Cited by 1.6k|Open Access

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.

InterPro in 2019: improving coverage, classification and access to protein sequence annotations
Alex Mitchell, Teresa K. Attwood, Patricia C. Babbitt et al.|Nucleic Acids Research|2018
Cited by 1.5k|Open Access

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.

A large-scale evaluation of computational protein function prediction
Predrag Radivojac, Wyatt T. Clark, Tal Oron et al.|Nature Methods|2013
Cited by 1.1k|Open Access

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies
Alexandra M. Schnoes, Shoshana Brown, Igor Dodevski et al.|PLoS Computational Biology|2009
Cited by 711|Open Access

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%-63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.

Divergent Evolution of Enzymatic Function: Mechanistically Diverse Superfamilies and Functionally Distinct Suprafamilies
J.A. Gerlt, Patricia C. Babbitt|Annual Review of Biochemistry|2001
Cited by 544

The protein sequence and structure databases are now sufficiently representative that strategies nature uses to evolve new catalytic functions can be identified. Groups of divergently related enzymes whose members catalyze different reactions but share a common partial reaction, intermediate, or transition state (mechanistically diverse superfamilies) have been discovered, including the enolase, amidohydrolase, thiyl radical, crotonase, vicinal-oxygen-chelate, and Fe-dependent oxidase superfamilies. Other groups of divergently related enzymes whose members catalyze different overall reactions that do not share a common mechanistic strategy (functionally distinct suprafamilies) have also been identified: (a) functionally distinct suprafamilies whose members catalyze successive transformations in the tryptophan and histidine biosynthetic pathways and (b) functionally distinct suprafamilies whose members catalyze different reactions in different metabolic pathways. An understanding of the structural bases for the catalytic diversity observed in super- and suprafamilies may provide the basis for discovering the functions of proteins and enzymes in new genomes as well as provide guidance for in vitro evolution/engineering of new enzymes.