QSAR Modeling: Where Have You Been? Where Are You Going To?
Artem Cherkasov, Eugene Muratov, Denis Fourches et al.|Journal of Medicinal Chemistry|2013
Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
QSAR without borders
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and, more recently, machine learning and artificial intelligence methods in the chemical sciences. This field of research, broadly known as quantitative structure-activity relationship (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modeling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research
Denis Fourches†, Eugene Muratov†‡, Alexander Tropsha†|Journal of Chemical Information and Modeling|2010
† Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599. ‡ Laboratory of Theoretical Chemistry, Department of Molecular Structure, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080, Ukraine.
Perspective. J. Chem. Inf. Model. 2010, 50, 7, 1189–1204. Received 5 May 2010; published online 24 June 2010; published in issue 26 July 2010. https://doi.org/10.1021/ci100176x
Subjects: Bioinformatics and computational biology, Chemical structure, Molecular structure, Software, Structure activity relationship
Chemical Basis of Interactions Between Engineered Nanoparticles and Biological Systems
Qingxin Mu†, Guibin Jiang§, Lingxin Chen∥, Hongyu Zhou†⊥, Denis Fourches, Alexander Tropsha#, Bing Yan†|Chemical Reviews|2014
† School of Chemistry and Chemical Engineering, Shandong University, Jinan 250100, China. § State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China. ∥ Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China. ⊥ Department of Surgery, Emory University School of Medicine, Atlanta, Georgia 30322, United States. # Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States.
Review. Chem. Rev. 2014, 114, 15, 7740–7781. Received 29 May 2013; published online 13 June 2014; published in issue 13 August 2014. https://doi.org/10.1021/cr400295a
Subjects: Carbon nanotubes, Genetics, Metal oxide nanoparticles, Molecules, Nanoparticles
Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection
Igor V. Tetko, Iurii Sushko, Anil Kumar Pandey et al.|Journal of Chemical Information and Modeling|2008
The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" can be defined as a metric that quantifies the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It can be expressed in many different ways, e.g., using the Tanimoto coefficient, leverage, correlation in the space of models, etc. In this paper we have used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distances to model based on the standard deviation of predicted toxicity calculated from the ensemble of models afforded the best results. This distance also successfully discriminated molecules with small and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors. Moreover, the accuracy of prediction is mainly determined by the training set data distribution in the chemistry and activity spaces but not by the QSAR approaches used to develop the models. We have shown that incorrect validation of a model may result in the wrong estimation of its performance and suggested how this problem could be circumvented. The toxicity of 3182 and 48774 molecules from the EPA High Production Volume (HPV) Challenge Program and EINECS (European chemical Substances Information System), respectively, was predicted, and the accuracy of prediction was estimated. The developed models are available online at the http://www.qspr.org site.
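The two "distance to model" ideas named in this abstract (the spread of predictions across an ensemble, and Tanimoto similarity to the training set) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of the population standard deviation, the definition of the Tanimoto-based distance as one minus the maximum similarity to any training compound, and all numbers below are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def ensemble_std_distance(predictions):
    """Return (mean prediction, standard deviation across ensemble members).

    A larger standard deviation means the ensemble members disagree more,
    which is read here as a larger distance to model.
    """
    return mean(predictions), pstdev(predictions)


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty fingerprints are treated as identical (an assumption).
    return len(a & b) / union if union else 1.0


def tanimoto_distance_to_model(query_bits, training_bits):
    """Assumed definition: 1 - max Tanimoto similarity to any training compound."""
    return 1.0 - max(tanimoto(query_bits, t) for t in training_bits)


# Hypothetical ensemble predictions for two compounds (made-up values).
ensemble_preds = {
    "compound_A": [1.02, 0.98, 1.05, 1.00],   # members agree closely
    "compound_B": [0.20, 1.40, 2.10, -0.30],  # members disagree strongly
}

# Distance to model for each compound, then rank from most to least reliable.
distances = {
    name: ensemble_std_distance(preds)[1]
    for name, preds in ensemble_preds.items()
}
ranked = sorted(distances, key=distances.get)
print(ranked)
```

Compounds at the front of `ranked` are those on which the ensemble agrees most, so under this reading their predictions would be expected to carry smaller errors; any cutoff separating reliable from unreliable predictions would have to be calibrated on data with known errors.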