Protein Data Bank: the single global archive for 3D macromolecular structure data
The Protein Data Bank (PDB) is the single global archive of experimentally determined three-dimensional (3D) structure data of biological macromolecules. Since 2003, the PDB has been managed by the Worldwide Protein Data Bank (wwPDB; wwpdb.org), an international consortium that collaboratively oversees deposition, validation, biocuration, and open access dissemination of 3D macromolecular structure data. The PDB Core Archive houses 3D atomic coordinates of more than 144 000 structural models of proteins, DNA/RNA, and their complexes with metals and small molecules and related experimental data and metadata. Structure and experimental data/metadata are also stored in the PDB Core Archive using the readily extensible wwPDB PDBx/mmCIF master data format, which will continue to evolve as data/metadata from new experimental techniques and structure determination methods are incorporated by the wwPDB. Impacts of the recently developed universal wwPDB OneDep deposition/validation/biocuration system and various methods-specific wwPDB Validation Task Forces on improving the quality of structures and data housed in the PDB Core Archive are described together with current challenges and future plans.
PDBe: improved findability of macromolecular structure data in the PDB
The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ∼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.
PDBe: improved accessibility of macromolecular structure data from PDB and EMDB
The Protein Data Bank in Europe (http://pdbe.org) accepts and annotates depositions of macromolecular structure data in the PDB and EMDB archives and enriches, integrates and disseminates structural information in a variety of ways. The PDBe website has been redesigned based on an analysis of user requirements, and now offers intuitive access to improved and value-added macromolecular structure information. Unique value-added information includes lists of reviews and research articles that cite or mention PDB entries as well as access to figures and legends from full-text open-access publications that describe PDB entries. A powerful new query system not only shows all the PDB entries that match a given query, but also shows the 'best structures' for a given macromolecule, ligand complex or sequence family using data-quality information from the wwPDB validation reports. A PDBe RESTful API has been developed to provide unified access to macromolecular structure data available in the PDB and EMDB archives as well as value-added annotations, e.g. regarding structure quality and up-to-date cross-reference information from the SIFTS resource. Taken together, these new developments facilitate unified access to macromolecular structure data in an intuitive way for non-expert users and support expert users in analysing macromolecular structure data.
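As a rough illustration of the kind of unified access a RESTful API like the one described above can offer, the sketch below builds a request URL for a single PDB entry and reads the title from a JSON response. The base URL, the `entry/summary` path, and the response shape (a mapping from a lowercase PDB ID to a list of summary records) are assumptions made for illustration and are not stated in the abstract; the helper names are hypothetical. Check the current PDBe API documentation before relying on any of them.

```python
import json
from urllib.request import urlopen

# Assumed base URL and endpoint path; neither is given in the abstract.
PDBE_API_BASE = "https://www.ebi.ac.uk/pdbe/api"


def summary_url(pdb_id: str) -> str:
    """Build the (assumed) entry-summary URL for a PDB ID."""
    return f"{PDBE_API_BASE}/pdb/entry/summary/{pdb_id.strip().lower()}"


def entry_title(payload: dict, pdb_id: str):
    """Return the title from a summary payload, or None if it is absent.

    Assumes the payload maps a lowercase PDB ID to a list of records,
    each of which may carry a "title" field.
    """
    records = payload.get(pdb_id.strip().lower(), [])
    return records[0].get("title") if records else None


def fetch_title(pdb_id: str, timeout: float = 10.0):
    """Fetch the summary over HTTP and extract the title (needs network)."""
    with urlopen(summary_url(pdb_id), timeout=timeout) as response:
        return entry_title(json.load(response), pdb_id)
```

Keeping URL construction and payload parsing separate from the network call means the parsing logic can be exercised on a saved response without contacting the server.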
Modeling protein‐protein, protein‐peptide, and protein‐oligosaccharide complexes: CAPRI 7th edition
Marc F. Lensink, Nurul Nadzirin, Sameer Velankar et al.|Proteins Structure Function and Bioinformatics|2019
We present the seventh report on the performance of methods for predicting the atomic resolution structures of protein complexes offered as targets to the community-wide initiative on the Critical Assessment of Predicted Interactions. Performance was evaluated on the basis of 36 114 models of protein complexes submitted by 57 groups, including 13 automatic servers, in prediction rounds held during the years 2016 to 2019 for eight protein-protein, three protein-peptide, and five protein-oligosaccharide targets with different length ligands. Six of the protein-protein targets represented challenging hetero-complexes, due to factors such as availability of distantly related templates for the individual subunits, or for the full complex, inter-domain flexibility, conformational adjustments at the binding region, or the multi-component nature of the complex. The main challenge for the protein-peptide and protein-oligosaccharide complexes was to accurately model the ligand conformation and its interactions at the interface. Encouragingly, models of acceptable quality, or better, were obtained for a total of six protein-protein complexes, which included four of the challenging hetero-complexes and a homo-decamer. But fewer of these targets were predicted with medium or higher accuracy. High accuracy models were obtained for two of the three protein-peptide targets, and for one of the protein-oligosaccharide targets. The remaining protein-sugar targets were predicted with medium accuracy.
Our analysis indicates that progress in predicting increasingly challenging and diverse types of targets is due to closer integration of template-based modeling techniques with docking, scoring, and model refinement procedures, and to significant incremental improvements in the underlying methodologies.
Blind prediction of homo‐ and hetero‐protein complexes: The CASP13‐CAPRI experiment
Marc F. Lensink, Guillaume Brysbaert, Nurul Nadzirin et al.|Proteins Structure Function and Bioinformatics|2019
We present the results for CAPRI Round 46, the third joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of 20 targets including 14 homo-oligomers and 6 heterocomplexes. Eight of the homo-oligomer targets and one heterodimer comprised proteins that could be readily modeled using templates from the Protein Data Bank, often available for the full assembly. The remaining 11 targets comprised 5 homodimers, 3 heterodimers, and two higher-order assemblies. These were more difficult to model, as their prediction mainly involved "ab-initio" docking of subunit models derived from distantly related templates. A total of ~30 CAPRI groups, including 9 automatic servers, submitted on average ~2000 models per target. About 17 groups participated in the CAPRI scoring rounds, offered for most targets, submitting ~170 models per target. The prediction performance, measured by the fraction of models of acceptable quality or higher submitted across all predictor groups, was very good to excellent for the nine easy targets. Poorer performance was achieved by predictors for the 11 difficult targets, with medium and high quality models submitted for only 3 of these targets. A similar performance "gap" was displayed by scorer groups, highlighting yet again the unmet challenge of modeling the conformational changes of the protein components that occur upon binding or that must be accounted for in template-based modeling. Our analysis also indicates that residues in binding interfaces were less well predicted in this set of targets than in previous Rounds, providing useful insights for directions of future improvements.
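The performance measure named in the abstract above, the fraction of submitted models of acceptable quality or higher, can be sketched as follows. This is a minimal illustration and not the assessors' actual pipeline: the "incorrect" class name, the function name, and the toy labels are assumptions, and the sketch takes each model's quality class as already assigned rather than deriving it from structural criteria.

```python
from collections import Counter

# Quality classes ordered from worst to best. "acceptable", "medium" and
# "high" appear in the abstract; "incorrect" is an assumed name for the rest.
QUALITY_ORDER = ("incorrect", "acceptable", "medium", "high")


def fraction_acceptable_or_better(labels):
    """Return the fraction of models rated acceptable, medium, or high."""
    labels = list(labels)
    if not labels:
        return 0.0  # no models submitted for this target
    counts = Counter(labels)
    unknown = set(counts) - set(QUALITY_ORDER)
    if unknown:
        raise ValueError(f"unknown quality classes: {sorted(unknown)}")
    good = sum(counts[q] for q in QUALITY_ORDER[1:])
    return good / len(labels)


# Hypothetical labels for one target, pooled across all predictor groups.
toy_labels = ["incorrect", "acceptable", "high", "incorrect"]
print(fraction_acceptable_or_better(toy_labels))  # prints 0.5
```

Computing the fraction per target, rather than over all targets at once, keeps the easy and difficult targets comparable despite different submission counts.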