The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models
This paper emphasizes the importance of rigorous validation as a crucial, integral component of Quantitative Structure Property Relationship (QSPR) model development. We consider some examples of published QSPR models, which in spite of their high fitted accuracy for the training sets and apparent mechanistic appeal, fail rigorous validation tests, and, thus, may lack practical utility as reliable screening tools. We present a set of simple guidelines for developing validated and predictive QSPR models. To this end, we discuss several validation strategies including (1) randomization of the modelled property, also called Y-scrambling, (2) multiple leave-many-out cross-validations, and (3) external validation using rational division of a dataset into training and test sets. We also highlight the need to establish the domain of model applicability in the chemical space to flag molecules for which predictions may be unreliable, and discuss some algorithms that can be used for this purpose. We advocate the broad use of these guidelines in the development of predictive QSPR models.
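As an illustration of the Y-scrambling check named in the abstract above, the following is a minimal sketch, not the authors' procedure. It assumes an ordinary least-squares model fitted with NumPy, synthetic descriptors, and hypothetical helper names (`r_squared`, `y_scrambling`). The idea is that a model fitted to randomly permuted property values should score far worse than the model fitted to the real values; if it does not, the original fit is probably a chance correlation.

```python
import numpy as np


def r_squared(X, y):
    """Return R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def y_scrambling(X, y, n_rounds=100, seed=0):
    """Compare the fit on the real y with fits on randomly permuted y."""
    rng = np.random.default_rng(seed)
    r2_true = r_squared(X, y)
    r2_scrambled = np.array(
        [r_squared(X, rng.permutation(y)) for _ in range(n_rounds)]
    )
    return r2_true, r2_scrambled


# Synthetic data (an assumption for illustration): 60 compounds, 3 descriptors.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=60)

r2_true, r2_scrambled = y_scrambling(X, y)
print(f"R^2 on real y: {r2_true:.3f}")
print(f"mean R^2 on scrambled y: {r2_scrambled.mean():.3f}")
```

In this sketch the fitted R^2 is reused for the scrambled runs; the same loop could equally wrap a cross-validated statistic.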
Principles of QSAR models validation: internal and external
Paola Gramatica|QSAR & Combinatorial Science|2007
The recent REACH Policy of the European Union has led scientists and regulators to focus their attention on establishing general validation principles for QSAR models in the context of chemical regulation (previously known as the Setubal principles, now the OECD principles). This paper gives a brief analysis of some principles: unambiguous algorithm, Applicability Domain (AD), and statistical validation. Some concerns related to QSAR algorithm reproducibility and an example of a fast check of the applicability domain for MLR models are presented. Common myths and misconceptions related to popular techniques for verifying internal predictivity, particularly for MLR models (for instance cross-validation, bootstrap), are commented on and compared with commonly used statistical techniques for external validation. The differences in the two validating approaches are highlighted, and evidence is presented that only models that have been validated externally, after their internal validation, can be considered reliable and applicable for both external prediction and regulatory purposes.
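The abstract above does not say which applicability-domain check the paper uses. One common choice for MLR models is the leverage (hat-value) approach, sketched below under that assumption. The function names (`leverages`, `outside_domain`), the synthetic descriptors, and the warning threshold h* = 3(p + 1)/n (a frequently used convention, with p descriptors and n training compounds) are illustrative choices, not details taken from the paper.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage h = x^T (A^T A)^-1 x of each query row w.r.t. the training set."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])  # intercept + descriptors
    A_query = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(A_train.T @ A_train)
    # Row-wise quadratic form: h_i = a_i^T (A^T A)^-1 a_i
    return np.einsum("ij,jk,ik->i", A_query, xtx_inv, A_query)


def outside_domain(X_train, X_query):
    """Flag query compounds whose leverage exceeds h* = 3(p + 1)/n."""
    n, p = X_train.shape
    h_star = 3.0 * (p + 1) / n
    return leverages(X_train, X_query) > h_star


# Synthetic training descriptors (an assumption for illustration).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))

# One query near the centre of the training data, one far outside it.
X_query = np.array([[0.0, 0.0], [8.0, -8.0]])
flags = outside_domain(X_train, X_query)
print(flags)
```

A prediction for a flagged compound is an extrapolation in descriptor space and should be treated as less reliable; the check says nothing about compounds that are structurally unusual in ways the descriptors do not capture.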
QSAR Modeling: Where Have You Been? Where Are You Going To?
Artem Cherkasov, Eugene Muratov, Denis Fourches et al.|Journal of Medicinal Chemistry|2013
Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs.
Lennart Eriksson, Joanna Jaworska, Andrew Worth et al.|Environmental Health Perspectives|2003
This article provides an overview of methods for reliability assessment of quantitative structure-activity relationship (QSAR) models in the context of regulatory acceptance of human health and environmental QSARs. Useful diagnostic tools and data analytical approaches are highlighted and exemplified. Particular emphasis is given to the question of how to define the applicability borders of a QSAR and how to estimate parameter and prediction uncertainty. The article ends with a discussion regarding QSAR acceptability criteria. This discussion contains a list of recommended acceptability criteria, and we give reference values for important QSAR performance statistics. Finally, we emphasize that rigorous and independent validation of QSARs is an essential step toward their regulatory acceptance and implementation.
Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure-Activity Relationships
Tatiana I. Netzeva, Andrew Worth, Tom Aldenberg et al.|Alternatives to Laboratory Animals|2005
This is the 52nd report of a series of workshops organised by the European Centre for the Validation of Alternative Methods (ECVAM). The main objective of ECVAM, as defined in 1993 by its Scientific Advisory Committee, is to promote the scientific and regulatory acceptance of alternative methods which are of importance to the biosciences, and that reduce, refine or replace the use of laboratory animals. The ECVAM workshop on the quantitative structure-activity relationship applicability domain was held at ECVAM on 29 September–1 October 2004, under the chairmanship of Andrew Worth. The workshop was attended by experts from academia, industry, international organisations and regulatory authorities. The aim of the workshop was to review the state of the art of methods for identifying the domain of applicability of structure-activity relationships (SARs) and quantitative structure-activity relationships (QSARs), collectively referred to as (Q)SARs. The report is intended to provide a source of input to the development of an OECD Guidance Document on (Q)SAR Validation. The report also makes recommendations for further research needed to understand and apply the concept of the (Q)SAR applicability domain (AD).