Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set

Iurii Sushko (Helmholtz Zentrum München), Sergii Novotarskyi (Helmholtz Zentrum München), Robert Körner (Helmholtz Zentrum München), Anil Kumar Pandey (Helmholtz Zentrum München), Artem Cherkasov (Helmholtz Zentrum München), Jiazhong Li (Helmholtz Zentrum München), Paola Gramatica (Helmholtz Zentrum München), Katja Hansen (Helmholtz Zentrum München), Timon Schroeter (Helmholtz Zentrum München), Klaus‐Robert Müller (Helmholtz Zentrum München), Lili Xi (Helmholtz Zentrum München), Huanxiang Liu (Helmholtz Zentrum München), Xiaojun Yao (Helmholtz Zentrum München), Tomas Öberg (Helmholtz Zentrum München), Farhad Hormozdiari (Helmholtz Zentrum München), Phuong Dao (Helmholtz Zentrum München), Cenk Sahinalp (Helmholtz Zentrum München), Roberto Todeschini (Helmholtz Zentrum München), Pavel Polishchuk (Helmholtz Zentrum München), Anatoliy Artemenko (Helmholtz Zentrum München), V. E. Kuz’min (Helmholtz Zentrum München), Todd M. Martin (Helmholtz Zentrum München), Douglas M. Young (Helmholtz Zentrum München), Denis Fourches (Helmholtz Zentrum München), Eugene Muratov (Helmholtz Zentrum München), Alexander Tropsha (Helmholtz Zentrum München), Igor I. Baskin (Helmholtz Zentrum München), Dragos Horvath (Helmholtz Zentrum München), Gilles Marcou (Helmholtz Zentrum München), Christophe Müller (Helmholtz Zentrum München), Alexander Varnek (Helmholtz Zentrum München), В. В. Прокопенко (Helmholtz Zentrum München), Igor V. Tetko (Helmholtz Zentrum München)
Journal of Chemical Information and Modeling
October 29, 2010
Cited by 256, Open Access

Abstract

The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of "distance to model" (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics that have been based on the standard deviation within an ensemble of QSAR models. The current study applies such analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported within the 2009 QSAR challenge. We demonstrate that the DMs based on an ensemble (consensus) model provide systematically better performance than other DMs. The presented approach identifies 30-60% of compounds having an accuracy of prediction similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, the in silico predictions can be used to halve the cost of experimental measurements by providing a similar prediction accuracy. The developed model has been made publicly available at http://ochem.eu/models/1.
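
To make the ensemble-based DM idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each model in an ensemble outputs a score in [0, 1] per compound, takes the standard deviation of those scores as the DM, and then reports the largest fraction of compounds (lowest DM first) whose cumulative accuracy reaches a target such as 90%. The function names, the 0.5 decision threshold, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def ensemble_dm(scores_per_model):
    """Return (consensus score, std across models) for each compound.

    scores_per_model: one list per model, each holding one score in [0, 1]
    per compound. The std across models plays the role of the DM here.
    """
    per_compound = zip(*scores_per_model)
    return [(mean(s), pstdev(s)) for s in per_compound]


def coverage_at_accuracy(scores_per_model, labels, target=0.9, threshold=0.5):
    """Largest fraction of compounds, taken in order of increasing DM,
    for which the cumulative accuracy is still >= target."""
    stats = ensemble_dm(scores_per_model)
    # Rank compounds from most to least reliable (smallest DM first).
    order = sorted(range(len(labels)), key=lambda i: stats[i][1])
    correct = 0
    best = 0.0
    for rank, i in enumerate(order, start=1):
        predicted = 1 if stats[i][0] >= threshold else 0
        correct += int(predicted == labels[i])
        if correct / rank >= target:
            best = rank / len(labels)
    return best


# Toy data (hypothetical): 3 models, 4 compounds, binary mutagenicity labels.
scores = [
    [0.90, 0.10, 0.6, 0.4],
    [0.95, 0.05, 0.3, 0.7],
    [0.85, 0.15, 0.8, 0.2],
]
labels = [1, 0, 0, 1]
coverage = coverage_at_accuracy(scores, labels, target=0.9)
```

In this toy case the two compounds on which the models agree are predicted correctly and the two with high disagreement are not, so only half of the set meets the 90% target. The abstract's 30-60% figure is the analogous quantity measured on the Ames data set.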

