Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs
Lennart Eriksson, Joanna Jaworska, Andrew Worth et al.|Environmental Health Perspectives|2003
This article provides an overview of methods for reliability assessment of quantitative structure-activity relationship (QSAR) models in the context of regulatory acceptance of human health and environmental QSARs. Useful diagnostic tools and data analytical approaches are highlighted and exemplified. Particular emphasis is given to the question of how to define the applicability borders of a QSAR and how to estimate parameter and prediction uncertainty. The article ends with a discussion regarding QSAR acceptability criteria. This discussion contains a list of recommended acceptability criteria, and we give reference values for important QSAR performance statistics. Finally, we emphasize that rigorous and independent validation of QSARs is an essential step toward their regulatory acceptance and implementation.

Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure-Activity Relationships
Tatiana I. Netzeva, Andrew Worth, Tom Aldenberg et al.|Alternatives to Laboratory Animals|2005
This is the 52nd report of a series of workshops organised by the European Centre for the Validation of Alternative Methods (ECVAM). The main objective of ECVAM, as defined in 1993 by its Scientific Advisory Committee, is to promote the scientific and regulatory acceptance of alternative methods which are of importance to the biosciences, and that reduce, refine or replace the use of laboratory animals. The ECVAM workshop on the quantitative structure-activity relationship applicability domain was held at ECVAM on 29 September–1 October 2004, under the chairmanship of Andrew Worth. The workshop was attended by experts from academia, industry, international organisations and regulatory authorities. The aim of the workshop was to review the state of the art of methods for identifying the domain of applicability of structure-activity relationships (SARs) and quantitative structure-activity relationships (QSARs), collectively referred to as (Q)SARs. The report is intended to provide a source of input to the development of an OECD Guidance Document on (Q)SAR Validation. The report also makes recommendations for further research needed to understand and apply the concept of the (Q)SAR applicability domain (AD).

QSAR Applicability Domain Estimation by Projection of the Training Set in Descriptor Space: A Review
As the use of Quantitative Structure Activity Relationship (QSAR) models for chemical management increases, the reliability of the predictions from such models is a matter of growing concern. The OECD QSAR Validation Principles recommend that a model should be used within its applicability domain (AD). The Setubal Workshop report provided conceptual guidance on defining a (Q)SAR AD, but it is difficult to use directly. The practical application of the AD concept requires an operational definition that permits the design of an automatic (computerised), quantitative procedure to determine a model's AD. An attempt is made to address this need, and methods and criteria for estimating AD through training set interpolation in descriptor space are reviewed. It is proposed that response space should be included in the training set representation. Thus, training set chemicals are points in n-dimensional descriptor space and m-dimensional model response space. Four major approaches for estimating interpolation regions in a multivariate space are reviewed and compared: range, distance, geometrical, and probability density distribution.

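To make the first two of the four approaches named above more concrete, the following is a minimal sketch, not the procedure from the review itself. It assumes a range-based domain defined as the per-descriptor bounding box of the training set, and a distance-based domain defined as Euclidean distance to the training-set centroid with the largest training distance as the cut-off. The function names, the toy descriptor values, and the choice of threshold are all illustrative assumptions.

```python
from math import sqrt


def bounding_box(training):
    """Range approach: per-descriptor (min, max) over the training set."""
    return [(min(col), max(col)) for col in zip(*training)]


def in_range_domain(x, box):
    """True if every descriptor of x lies inside the training range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, box))


def centroid(training):
    """Mean of each descriptor over the training set."""
    n = len(training)
    return [sum(col) / n for col in zip(*training)]


def euclidean(a, b):
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def in_distance_domain(x, training, threshold=None):
    """Distance approach: compare distance-to-centroid with a threshold.

    Assumption: if no threshold is given, use the largest distance
    observed among the training points themselves.
    """
    c = centroid(training)
    if threshold is None:
        threshold = max(euclidean(t, c) for t in training)
    return euclidean(x, c) <= threshold


# Hypothetical two-descriptor training set (values are made up).
training = [(0.0, 2.0), (4.0, 2.0), (2.0, 0.0), (2.0, 4.0), (2.0, 2.0)]
box = bounding_box(training)

query = (3.8, 3.8)  # sits in an empty corner of the bounding box
print(in_range_domain(query, box))          # inside every descriptor range
print(in_distance_domain(query, training))  # but far from the centroid
```

In this toy case the query falls inside the bounding box yet outside the distance-based region, which illustrates why the choice of interpolation method matters: a range-based domain can include sparsely populated corners of descriptor space.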
Uncertainty of the Hazardous Concentration and Fraction Affected for Normal Species Sensitivity Distributions
Tom Aldenberg, Joanna Jaworska|Ecotoxicology and Environmental Safety|2000

Approaches to Measure Chemical Similarity – a Review
Nina Nikolova, Joanna Jaworska|QSAR & Combinatorial Science|2003
Although the concept of similarity is convenient for humans, a formal definition of similarity between chemical compounds is needed to enable automatic decision‐making. The objective of similarity measures in toxicology and drug design is to allow assessment of chemical activities. The ideal similarity measure should be relevant to the activity of interest. The relevance could be established by exploiting the knowledge about fundamental chemical and biological processes responsible for the activity. Unfortunately, this knowledge is rarely available and therefore different approximations have been developed based on similarity between structures or descriptor values. Various methods are reviewed, ranging from two‐dimensional, three‐dimensional and field approaches to recent methods based on “Atoms in Molecules” theory. All these methods attempt to describe chemical compounds by a set of numerical values and define some means for comparison between them. The review provides analysis of potential pitfalls of this methodology: loss of information in the representations of molecular structures, and the relevance of a particular representation and chosen similarity measure to the activity. A brief review of known methods for descriptor selection is also provided. The popular “neighborhood behavior” principle is criticized, since proximity with respect to descriptors does not necessarily mean proximity with respect to activity. Structural similarity should also be used with care, as it does not always imply similar activity, as shown by examples.
We note that similarity measures and classification techniques based on distances rely on certain data distribution assumptions. If these assumptions are not satisfied for a given dataset, the results could be misleading. A discussion on similarity in descriptor space in the context of applicability domain assessment of QSAR models is also provided. Finally, it is shown that descriptor-based similarity analysis is prone to errors if the relationship between the activity and the descriptors has not been previously established. A justification for the use of a particular similarity measure should be provided for every specific activity by expert knowledge or derived by data modeling techniques.