Karl Leswing

Schrödinger (United States)

Publishes on Machine Learning in Materials Science, Computational Drug Discovery Methods, Protein Structure and Dynamics. 51 papers and 4.3k citations.

Top publications by citations

MoleculeNet: a benchmark for molecular machine learning
Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg et al.|Chemical Science|2017
Cited by 2.9k|Open Access

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

Efficient Exploration of Chemical Space with Docking and Deep Learning
Ying Yang, Kun Yao, Matthew P. Repasky et al.|Journal of Chemical Theory and Computation|2021
Cited by 395

With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries eclipsing one billion compounds. Today's screening libraries are larger and more diverse, enabling the discovery of more-potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in silico screening methods in an exhaustive manner, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking. We leverage a novel selection protocol that strikes a balance between two objectives: (1) identifying the best scoring compounds and (2) exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach. Together with automated redocking of the top compounds, this method captures almost all the high scoring scaffolds in the library found by exhaustive docking. This protocol is applied to our recent virtual screening campaigns against the D4 and AMPC targets that produced dozens of highly potent, novel inhibitors, and a blind test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds.
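The abstract does not spell out the selection protocol, so the following is only a minimal sketch of the general idea of balancing the two objectives, not the authors' method. It assumes a hypothetical `select_batch` helper that fills most of each docking batch with the best predicted scores (greedy) and the remainder with randomly drawn compounds (exploration). The function name, the `explore_fraction` parameter, and the random exploration rule are all illustrative assumptions.

```python
import random


def select_batch(predicted_scores, batch_size, explore_fraction=0.2, seed=0):
    """Pick compound ids for the next docking round (illustrative only).

    predicted_scores maps a compound id to a model-predicted docking score,
    where a lower score is assumed to be better.
    """
    if not 0.0 <= explore_fraction <= 1.0:
        raise ValueError("explore_fraction must be in [0, 1]")

    batch_size = min(batch_size, len(predicted_scores))
    n_explore = int(round(batch_size * explore_fraction))
    n_greedy = batch_size - n_explore

    # Greedy part: the compounds with the best (lowest) predicted scores.
    ranked = sorted(predicted_scores, key=predicted_scores.get)
    greedy = ranked[:n_greedy]

    # Exploration part: a random draw from everything not picked greedily.
    # This is an assumed stand-in for a diversity-aware selection rule.
    rng = random.Random(seed)
    explore = rng.sample(ranked[n_greedy:], n_explore)

    return greedy + explore


if __name__ == "__main__":
    scores = {f"c{i}": float(i) for i in range(10)}
    print(select_batch(scores, batch_size=5, explore_fraction=0.4))
```

Setting `explore_fraction=0.0` reduces this to the purely greedy approach that the abstract uses as its point of comparison.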

Epik: pKa and Protonation State Prediction through Machine Learning
Ryne C. Johnston, Kun Yao, Zachary Kaplan et al.|Journal of Chemical Theory and Computation|2023
Cited by 300

Epik version 7 is a software program that uses machine learning for predicting the pKa values and protonation state distribution of complex, druglike molecules. Using an ensemble of atomic graph convolutional neural networks (GCNNs) trained on over 42,000 pKa values across broad chemical space from both experimental and computed origins, the model predicts pKa values with 0.42 and 0.72 pKa unit median absolute and root mean square errors, respectively, across seven test sets. Epik version 7 also generates protonation states and recovers 95% of the most populated protonation states compared to previous versions. Requiring on average only 47 ms per ligand, Epik version 7 is rapid and accurate enough to evaluate protonation states for crucial molecules and prepare ultra-large libraries of compounds to explore vast regions of chemical space. The simplicity and time required for the training allow for the generation of highly accurate models customized to a program’s specific chemistry.
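To illustrate how a pKa value relates to a protonation state population, here is a minimal sketch based on the Henderson-Hasselbalch relation for a single titratable site. It is not Epik's algorithm, which handles multiple sites and full state distributions; the function name `protonated_fraction` and the example values are illustrative assumptions.

```python
def protonated_fraction(pka, ph):
    """Fraction of a single titratable site that is protonated at a given pH.

    Follows from Henderson-Hasselbalch: [A-]/[HA] = 10 ** (pH - pKa),
    so the protonated fraction is 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Hypothetical site with pKa 9.4 evaluated at pH 7.4.
    print(protonated_fraction(9.4, 7.4))
    # The same site if the predicted pKa were off by 0.72 units.
    print(protonated_fraction(9.4 - 0.72, 7.4))
```

When the pH equals the pKa, the site is half protonated, so errors in a predicted pKa matter most for sites whose pKa lies close to the pH of interest.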

Reaction-Based Enumeration, Active Learning, and Free Energy Calculations To Rapidly Explore Synthetically Tractable Chemical Space and Optimize Potency of Cyclin-Dependent Kinase 2 Inhibitors
Kyle D. Konze, Pieter H. Bos, Markus K. Dahlgren et al.|Journal of Chemical Information and Modeling|2019
Cited by 140|Open Access

The hit-to-lead and lead optimization processes usually involve the design, synthesis, and profiling of thousands of analogs prior to clinical candidate nomination. A hit finding campaign may begin with a virtual screen that explores millions of compounds, if not more. However, this scale of computational profiling is not frequently performed in the hit-to-lead or lead optimization phases of drug discovery. This is likely due to the lack of appropriate computational tools to generate synthetically tractable lead-like compounds in silico, and a lack of computational methods to accurately profile compounds prospectively on a large scale. Recent advances in computational power and methods provide the ability to profile much larger libraries of ligands than previously possible. Herein, we report a new computational technique, referred to as “PathFinder”, that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. In this work, the integration of PathFinder-driven compound generation, cloud-based FEP simulations, and active learning are used to rapidly optimize R-groups, and generate new cores for inhibitors of cyclin-dependent kinase 2 (CDK2). Using this approach, we explored >300 000 ideas, performed >5000 FEP simulations, and identified >100 ligands with a predicted IC50 < 100 nM, including four unique cores. To our knowledge, this is the largest set of FEP calculations disclosed in the literature to date. The rapid turnaround time, and scale of chemical exploration, suggests that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns.

High-Dimensional Neural Network Potential for Liquid Electrolyte Simulations
Steven Dajnowicz, Garvit Agarwal, James Stevenson et al.|The Journal of Physical Chemistry B|2022
Cited by 88

Liquid electrolytes are one of the most important components of Li-ion batteries, which are a critical technology of the modern world. However, we still lack the computational tools required to accurately calculate key properties of these materials (viscosity and ionic diffusivity) from first principles necessary to support improved designs. In this work, we report a machine learning-based force field for liquid electrolyte simulations, which bridges the gap between the accuracy of range-separated hybrid density functional theory and the efficiency of classical force fields. Predictions of material properties made with this force field are quantitatively accurate compared to experimental data. Our model uses the QRNN deep neural network architecture, which includes both long-range interactions and global charge equilibration. The training data set is composed solely of non-periodic density functional theory (DFT), allowing the practical use of an accurate theory (here, ωB97X-D3BJ/def2-TZVPD), which would be prohibitively expensive for generating large data sets with periodic DFT. In this report, we focus on seven common carbonates and LiPF6, but this methodology has very few assumptions and can be readily applied to any liquid electrolyte system. This provides a promising path forward for large-scale atomistic modeling of many important battery chemistries.