An Integrated Machine Learning Framework for Novel Small Molecule Drug Design
Dr. Jonathan E. Allen
Informatics Thrust Leader, Biosecurity Center at ATOM Consortium and Lawrence Livermore National Laboratory
The drug discovery process is costly, slow, and failure-prone. It takes an average of 5.5 years to get to the clinical testing stage, and in this time millions of molecules are tested, thousands are made, and most fail. The ATOM Consortium (atomscience.org), composed of LLNL, GSK, Frederick National Lab, and UCSF, is working to make drug discovery more efficient by integrating machine learning earlier in the design process and evaluating the multiple properties needed to make a viable drug. A combination of safety, pharmacokinetic, and efficacy properties is considered simultaneously in the early drug design phase, with the aim of ultimately showing that the resulting molecules have better success rates in subsequent pre-clinical and clinical testing.
The purpose of this webinar is to introduce key components of the ATOM computational framework and to highlight ongoing challenges and opportunities for improvement. The presentation will begin with a description of AMPL, the open source framework developed to build machine learning models that generate key safety and pharmacokinetic parameters, which are used for molecule evaluation and as input to anticipated Quantitative Systems Pharmacology and Toxicology models. The end-to-end pipeline handles data curation, feature extraction, model building, prediction generation, and data visualization.
Next, we’ll describe how the best-performing models are integrated into an active learning loop (with code in the process of being open sourced) to guide the search for de novo compounds, with plans to integrate an in-house PBPK model to predict in-vivo behavior. The active learning loop includes a computational search through chemical space for candidate small molecules, with opportunities for proposed molecules to be evaluated experimentally for model validation and re-training.
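
As a rough illustration of the kind of uncertainty-guided active learning loop described above, the following is a minimal sketch in plain Python. It is not the ATOM or AMPL code: the one-dimensional "chemical space", the `oracle` function standing in for an experimental assay, the bootstrap ensemble of nearest-neighbour predictors, and the `kappa` exploration weight are all assumptions made for this example only.

```python
import random
import statistics


def oracle(x):
    """Hypothetical stand-in for an experimental measurement (higher is better)."""
    return -(x - 0.7) ** 2


def knn_predict(train, x, k=2):
    """Predict by averaging the labels of the k nearest labeled points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])


def ensemble_predict(train, x, rng, n_models=20):
    """Return (mean, spread) over models fit to bootstrap resamples.

    The spread across ensemble members serves as a simple uncertainty estimate.
    """
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]
        preds.append(knn_predict(sample, x))
    return statistics.fmean(preds), statistics.pstdev(preds)


def active_learning(pool, n_rounds=5, batch=3, kappa=1.0, seed=0):
    """Run a toy loop: predict, select a batch, 'measure' it, re-train."""
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Start from a small set of candidates that have already been measured.
    labeled = [(x, oracle(x)) for x in pool[:4]]
    unlabeled = pool[4:]
    for _ in range(n_rounds):
        scored = []
        for x in unlabeled:
            mean, std = ensemble_predict(labeled, x, rng)
            # Favor candidates that look good or that the model is unsure about.
            scored.append((mean + kappa * std, x))
        scored.sort(reverse=True)
        chosen = [x for _, x in scored[:batch]]
        # "Experimental" evaluation of the proposals, then re-training data grows.
        labeled.extend((x, oracle(x)) for x in chosen)
        unlabeled = [x for x in unlabeled if x not in chosen]
    return labeled


if __name__ == "__main__":
    candidates = [i / 50 for i in range(51)]
    results = active_learning(candidates)
    best_x, best_y = max(results, key=lambda p: p[1])
    print(f"measured {len(results)} candidates; best x={best_x:.2f}, y={best_y:.4f}")
```

In this sketch the acquisition score adds the ensemble spread to the predicted value, so a larger `kappa` pushes the search toward poorly characterized regions, while `kappa=0` reduces it to picking the current best predictions.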
Discussion of the active learning pipeline will examine the utility of the machine learning model uncertainty estimates needed to guide active learning, as well as the challenges in designing and bounding the chemical search space. We will conclude with an examination of an early test of one round of the active learning loop applied to the design of a selective kinase inhibitor.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.