Fudan University
ORCID: 0000-0003-0485-0259
Publishes on Computational Drug Discovery Methods, Protein Structure and Dynamics, Geomechanics and Mining Engineering. 266 papers and 18.1k citations.
We have screened the entire Protein Data Bank (Release No. 103, January 2003) and identified 5671 protein-ligand complexes out of 19 621 experimental structures. A systematic examination of the primary references of these entries has led to a collection of binding affinity data (Kd, Ki, and IC50) for a total of 1359 complexes. The outcomes of this project have been organized into a Web-accessible database named the PDBbind database.
We have developed the PDBbind database to provide a comprehensive collection of binding affinities for the protein-ligand complexes in the Protein Data Bank (PDB). This paper gives a full description of the latest version, i.e., version 2003, which is an update to our recently reported work. Out of 23 790 entries in PDB Release No. 107 (January 2004), 5897 entries were identified as protein-ligand complexes that meet our definition. Experimentally determined binding affinities (Kd, Ki, and IC50) for 1622 of these were retrieved from the references associated with these complexes. A total of 900 complexes were selected to form a "refined set", which is of particular value as a standard data set for docking and scoring studies. All of the final data, including binding affinity data, reference citations, and processed structural files, have been incorporated into the PDBbind database accessible on-line at http://www.pdbbind.org/.
We have developed a new method, i.e., XLOGP3, for logP computation. XLOGP3 predicts the logP value of a query compound by using the known logP value of a reference compound as a starting point. The difference in the logP values of the query compound and the reference compound is then estimated by an additive model. The additive model implemented in XLOGP3 uses a total of 87 atom/group types and two correction factors as descriptors. It is calibrated on a training set of 8199 organic compounds with reliable logP data through a multivariate linear regression analysis. For a given query compound, the compound showing the highest structural similarity in the training set will be selected as the reference compound. Structural similarity is quantified based on topological torsion descriptors. XLOGP3 has been tested along with its predecessor, i.e., XLOGP2, as well as several popular logP methods on two independent test sets: one contains 406 small-molecule drugs approved by the FDA and the other contains 219 oligopeptides. On both test sets, XLOGP3 produces more accurate predictions than most of the other methods with average unsigned errors of 0.24-0.51 units. Compared to conventional additive methods, XLOGP3 does not rely on an extensive classification of fragments and correction factors in order to improve accuracy. It is also able to utilize the ever-increasing experimentally measured logP data more effectively.
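The reference-compound idea described in this abstract can be illustrated with a minimal Python sketch. It is not the XLOGP3 implementation: the two atom/group types, their contribution values, the descriptor labels (`t1`, `t2`, ...), and the count-based Tanimoto similarity are all hypothetical stand-ins, and the two correction factors are omitted. The sketch only shows the structure of the calculation: pick the most similar training compound, then add the additive-model estimate of the logP difference to its known logP.

```python
from collections import Counter

def tanimoto(a, b):
    """Count-based Tanimoto similarity between two descriptor Counters.
    (Assumption: the abstract does not state which similarity index is used.)"""
    keys = set(a) | set(b)
    inter = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    return inter / union if union else 0.0

def predict_logp(query, training_set, contributions):
    """logP(query) = logP_known(reference) + sum_i a_i * (N_i(query) - N_i(reference))."""
    # The reference is the training compound most similar to the query.
    ref = max(training_set, key=lambda c: tanimoto(query["fp"], c["fp"]))
    types = set(query["counts"]) | set(ref["counts"])
    delta = sum(
        contributions.get(t, 0.0)
        * (query["counts"].get(t, 0) - ref["counts"].get(t, 0))
        for t in types
    )
    return ref["name"], ref["logp"] + delta

# Hypothetical per-type contributions (illustrative values, not fitted ones).
contributions = {"CH3": 0.5, "OH": -1.0}

# Toy training set: descriptor fingerprints, atom/group counts, known logP.
training_set = [
    {"name": "A", "fp": Counter({"t1": 2, "t2": 1}),
     "counts": {"CH3": 2, "OH": 1}, "logp": 1.5},
    {"name": "B", "fp": Counter({"t3": 1}),
     "counts": {"CH3": 1}, "logp": 1.0},
]

query = {"fp": Counter({"t1": 2, "t2": 1, "t4": 1}),
         "counts": {"CH3": 3, "OH": 1}}

ref_name, logp = predict_logp(query, training_set, contributions)
print(ref_name, logp)
```

In this toy case the query differs from its reference by a single extra `CH3` group, so only that group's contribution is added to the reference value; the closer the reference, the fewer terms the additive model has to account for.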
Eleven popular scoring functions have been tested on 100 protein-ligand complexes to evaluate their abilities to reproduce experimentally determined structures and binding affinities. They include four scoring functions implemented in the LigFit module in Cerius2 (LigScore, PLP, PMF, and LUDI), four scoring functions implemented in the CScore module in SYBYL (F-Score, G-Score, D-Score, and ChemScore), the scoring function implemented in the AutoDock program, and two stand-alone scoring functions (DrugScore and X-Score). These scoring functions are not tested in the context of a particular docking program. Instead, conformational sampling and scoring are separated into two consecutive steps. First, an exhaustive conformational sampling is performed by using the AutoDock program to generate an ensemble of docked conformations for each ligand molecule. This conformational ensemble is required to cover the entire conformational space as much as possible rather than to focus on a few energy minima. Then, each scoring function is applied to score this conformational ensemble to see if it can identify the experimentally observed conformation from all of the other decoys. Among all of the scoring functions under test, six of them, i.e., PLP, F-Score, LigScore, DrugScore, LUDI, and X-Score, yield success rates higher than the AutoDock scoring function. The success rates of these six scoring functions range from 66% to 76% if using root-mean-square deviation ≤ 2.0 Å as the criterion. Combining any two or three of these six scoring functions into a consensus scoring scheme further improves the success rate to nearly 80% or even higher. However, when applied to reproduce the experimentally determined binding affinities of the 100 protein-ligand complexes, only X-Score, PLP, DrugScore, and G-Score are able to give correlation coefficients over 0.50. All 11 scoring functions are further inspected for their ability to construct a descriptive, funnel-shaped energy surface for protein-ligand complexation. The results indicate that X-Score and DrugScore perform better than the others in this respect.
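The success-rate criterion used in this evaluation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's protocol: the pose data are invented, the score names `score_a` and `score_b` are hypothetical, a lower score is assumed to mean a better pose, and the consensus scheme shown (summing per-function ranks) is one common choice that the abstract does not specify.

```python
def top_pose(poses, score_key):
    """Return the pose a scoring function ranks best (assumes lower = better)."""
    return min(poses, key=lambda p: p[score_key])

def success_rate(complexes, score_key, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose has RMSD <= cutoff (in Å)."""
    hits = sum(top_pose(poses, score_key)["rmsd"] <= cutoff for poses in complexes)
    return hits / len(complexes)

def rank_sums(poses, score_keys):
    """Sum each pose's rank (0 = best) over several scoring functions."""
    totals = [0] * len(poses)
    for key in score_keys:
        order = sorted(range(len(poses)), key=lambda i: poses[i][key])
        for rank, i in enumerate(order):
            totals[i] += rank
    return totals

def consensus_success_rate(complexes, score_keys, cutoff=2.0):
    """Success rate when the pose with the lowest rank sum is selected."""
    hits = 0
    for poses in complexes:
        totals = rank_sums(poses, score_keys)
        best = min(range(len(poses)), key=lambda i: totals[i])
        hits += poses[best]["rmsd"] <= cutoff
    return hits / len(complexes)

# Invented toy ensembles: RMSD from the experimental pose plus two scores.
complexes = [
    [
        {"rmsd": 0.8, "score_a": -9.0, "score_b": -7.0},
        {"rmsd": 4.5, "score_a": -9.5, "score_b": -5.0},
        {"rmsd": 6.0, "score_a": -6.0, "score_b": -6.5},
    ],
    [
        {"rmsd": 1.2, "score_a": -8.0, "score_b": -6.0},
        {"rmsd": 3.0, "score_a": -7.0, "score_b": -6.5},
        {"rmsd": 5.0, "score_a": -7.5, "score_b": -2.0},
    ],
]

rate_a = success_rate(complexes, "score_a")
rate_b = success_rate(complexes, "score_b")
rate_consensus = consensus_success_rate(complexes, ["score_a", "score_b"])
print(rate_a, rate_b, rate_consensus)
```

In the toy data each scoring function is misled by a different decoy, so combining their ranks recovers the near-native pose in both complexes, which is the intuition behind consensus scoring.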