Progress in Computational Toxicology

Sean Ekins 1,2
1 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay Varina, NC 27526, USA. 2 Collaborative Drug Discovery, Burlingame, CA.

Abstract
Computational methods have been widely applied to toxicology across the pharmaceutical, consumer product and environmental fields over the past decade. Progress in computational toxicology is now reviewed. A literature review was performed on computational models for hepatotoxicity (e.g. drug-induced liver injury (DILI)), cardiotoxicity, renal toxicity and genotoxicity. In addition, various publications that use machine learning methods have been highlighted. Several computational toxicology model datasets from past publications were used to compare Bayesian and Support Vector Machine (SVM) learning methods. The increasing amounts of data for defined toxicology endpoints have enabled machine learning models, which are increasingly used for predictions. Across many different models, Bayesian and SVM methods perform similarly based on cross-validation data. Considerable progress has been made in computational toxicology in a decade, both in model development and in the availability of larger-scale or 'big data' models. Future efforts in toxicology data generation will likely provide hundreds of thousands of compounds that are readily accessible for machine learning models. These models will cover relevant chemistry space for pharmaceutical, consumer product and environmental applications.

Table 1. Comparison of SVM and Bayesian ADME/Tox classification models generated with the same molecular descriptors. NT, not tested. Note that the DILI model used ECFC_6 fingerprint descriptors instead of FCFP_6.
Columns: Model; Reference; DS + R SVM, 5-fold cross-validation ROC; DS Bayesian, 5-fold cross-validation ROC; DS Bayesian, leave-one-out cross-validation ROC; DS RP single tree; DS RP forest; CDD Bayesian, 3-fold cross-validation.

Model                       Reference
DILI (N = 532)              Ekins et al., 2010 *
PXR (N = 312)               Kortagere et al., 2009
5-HT2B (N = 238)            Chekmarev et al., 2008
5-HT2B (N = 754)            Hajjo et al., 2010
hERG (N = 134)              Chekmarev et al., 2008
hERG (N = 806)              Wang et al., 2012
hERG (N = 305,616)          Du et al., 2011
BBB (N = 1968)              Martins et al., 2012
AMES (N = 6512)             Hansen et al., 2009
Nephrotoxicity (N = 104)    Lin & Will, 2012
Clearance iv (N = 512) 1    Gombar & Hall, 2013
VDss iv (N = 556) 2         Gombar & Hall, 2013

*For CDD Models only FCFP_6 descriptors were used. 1 SVM (regression), 5-fold cross-validation q2 = 0.17. 2 SVM (regression), 5-fold cross-validation q2 =

Methods
Datasets: Molecule datasets were extracted from various publications (Table 1).

Machine learning models: Laplacian-corrected Bayesian classifier models were developed using Discovery Studio 3.5 (Biovia, San Diego, CA). The models were generated using the following molecular descriptors, all calculated from input sdf files: molecular function class fingerprints of maximum diameter 6 (FCFP_6), AlogP, molecular weight, number of rotatable bonds, number of rings, number of aromatic rings, number of hydrogen bond acceptors, number of hydrogen bond donors, and molecular fractional polar surface area. The resulting models were validated using leave-one-out and 5-fold cross-validation to generate the receiver operating characteristic area under the curve (ROC AUC). SVM, RP Forest and RP Single Tree models were built with the same molecular descriptors in Discovery Studio. For SVM models, the interpretable descriptors were calculated in Discovery Studio, then Pipeline Pilot was used to generate the FCFP_6 descriptors, followed by integration with R. RP Forest and RP Single Tree models used the standard protocol in Discovery Studio.
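The Laplacian-corrected Bayesian scoring described above can be illustrated with a minimal sketch. It assumes the commonly described formulation in which each fingerprint feature receives a weight log((A + 1) / (N * P + 1)), where N is the number of training molecules containing the feature, A is the number of those that are active, and P is the overall active fraction. Whether Discovery Studio implements exactly this form is an assumption, and the function names and toy feature labels below are hypothetical stand-ins for FCFP_6 feature identifiers.

```python
import math
from collections import Counter


def train_laplacian_bayes(feature_sets, labels):
    """Return per-feature weights log((A + 1) / (N * P + 1)).

    feature_sets: list of sets of hashable feature ids (stand-ins for FCFP_6 bits)
    labels: list of 0/1 activity flags, same length as feature_sets
    """
    n_total = len(labels)
    n_active = sum(labels)
    if n_total == 0 or n_active == 0:
        raise ValueError("need at least one active training molecule")
    base_rate = n_active / n_total  # P: overall fraction of actives

    seen = Counter()         # N: molecules containing each feature
    seen_active = Counter()  # A: active molecules containing each feature
    for features, label in zip(feature_sets, labels):
        for f in features:
            seen[f] += 1
            if label:
                seen_active[f] += 1

    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(weights, features):
    """Sum the weights of known features; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Toy data (hypothetical): "nitro" occurs only in actives, "ester" only in inactives.
train_x = [{"nitro", "ring"}, {"nitro"}, {"ester", "ring"}, {"ester"}]
train_y = [1, 1, 0, 0]
weights = train_laplacian_bayes(train_x, train_y)
print(score(weights, {"nitro", "ring"}) > score(weights, {"ester", "ring"}))
```

Features with positive weights are enriched among actives and those with negative weights among inactives, which is what makes this kind of model readable in terms of substructures.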
For RP Forest models, 10 trees were created with bagging. RP Single Trees had a minimum of 10 samples per node and a maximum tree depth of 20. In all cases, 5-fold cross-validation (leaving out 20% of the dataset) was used to calculate the ROC for the models generated. The recently released CDD Models functionality in CDD Vault was used to build a Bayesian model (with FCFP_6 fingerprints only) for several datasets, evaluated by 3-fold cross-validation ROC AUC.

Figure 1. Ames Bayesian model built with 6512 molecules (Hansen et al., 2009). A. Features important for Ames actives. B. Features important for Ames inactives.

Figure 2. Ames Bayesian model built using CDD Models, showing the ROC for 3-fold cross-validation. Note that only FCFP_6 descriptors were used.

References
Chekmarev, D. S., Kholodovych, V., Balakin, K. V., Ivanenkov, Y., Ekins, S., & Welsh, W. J. (2008). Chem Res Toxicol, 21.
Du, F., Yu, H., Zou, B., Babcock, J., Long, S., & Li, M. (2011). Assay Drug Dev Technol, 9.
Ekins, S. (2014). J Pharmacol Toxicol Meth, 69.
Ekins, S., Williams, A. J., & Xu, J. J. (2010). Drug Metab Dispos, 38.
Gombar, V. K., & Hall, S. D. (2013). J Chem Inf Model, 53.
Hajjo, R., et al. (2010). J Med Chem, (21).
Hansen, K., Mika, S., Schroeter, T., Sutter, A., ter Laak, A., Steger-Hartmann, T., Heinrich, N., & Muller, K. R. (2009). J Chem Inf Model, 49.
Kortagere, S., Chekmarev, D., Welsh, W. J., & Ekins, S. (2009). Pharm Res, 26.
Lin, Z., & Will, Y. (2012). Toxicol Sci, 126.
Martins, I. F., Teixeira, A. L., Pinheiro, L., & Falcao, A. O. (2012). J Chem Inf Model, 52.
Wang, S., Li, Y., Wang, J., Chen, L., Zhang, L., Yu, H., & Hou, T. (2012). Mol Pharm, 9.

Acknowledgments
S.E. gratefully acknowledges Biovia (formerly Accelrys) for providing Discovery Studio and Dr. Katalin Nadassy for her assistance in using Pipeline Pilot. S.E. also acknowledges S.C. Johnson & Son, Inc. for discussions and support.
NCATS, NIH funded the CDD Models development (9R44TR , Ekins PI), and CDD colleagues are acknowledged for assistance in developing this software.

Results and Discussion
Several literature datasets were selected to build computational models for hepatotoxicity, cardiotoxicity, renal toxicity, genotoxicity and other ADME properties of interest using multiple machine learning methods. There is good agreement in the ROC values across the different methods using cross-validation (SVM performs slightly better than Bayesian in 7 of 12 models) (Table 1). Can the models be combined, or a consensus used, for predictions? The Bayesian methods are interpretable because toxicophores can be defined from the fingerprint features (Fig 1).

REACH and other new legislation are likely to require even more computational testing of compounds for which “there is insufficient information on the hazards that they pose to human health and the environment”. A first-tier prioritization is needed to filter molecules of interest to pharmaceutical, consumer product and environmental researchers. Efforts to make these computational toxicology models accessible may aid their utilization.

Within a decade we have progressed from models, like those for hERG, built with a handful of molecules to models built with over 300,000 molecules. The challenge is to ensure that such large-scale models can be used to maximal effect and influence (Ekins 2014). A major hurdle is awareness of what models exist, where they reside and how to utilize them for decision making. Inclusion in a database such as the CDD Vault may therefore be important (Fig 2).

In conclusion, ‘big data’ models will promote computational toxicology and further focus in vitro and in vivo testing on verification of selected predictions. Future efforts to externally test the models in Table 1 are critical, and I welcome interest in assessing them.
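The cross-validated ROC comparison that underlies Table 1 can be sketched as follows. This is a minimal illustration, not the Discovery Studio or CDD protocol: it assumes ROC AUC is computed as the fraction of active/inactive pairs ranked correctly, and that held-out scores from all folds are pooled before computing one AUC (averaging per-fold AUCs is an equally plausible choice). All function names and the toy one-dimensional data are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the chance a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both actives and inactives")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle indices and split them into k disjoint test folds (k=5 holds out ~20%)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(train_fn, score_fn, xs, ys, k=5, seed=0):
    """Score each sample with a model that never saw it, then compute one pooled AUC."""
    held_out = [None] * len(xs)
    for test_idx in kfold_indices(len(xs), k, seed):
        test = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in test]
        train_y = [y for i, y in enumerate(ys) if i not in test]
        model = train_fn(train_x, train_y)
        for i in test_idx:
            held_out[i] = score_fn(model, xs[i])
    return roc_auc(ys, held_out)


def train_centroid(train_x, train_y):
    """Toy model (hypothetical): the mean descriptor value of the training actives."""
    actives = [x for x, y in zip(train_x, train_y) if y == 1]
    return sum(actives) / len(actives)


def score_centroid(model, x):
    """Closer to the active centroid means a higher score."""
    return -abs(x - model)


xs = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
auc = cross_validated_auc(train_centroid, score_centroid, xs, ys, k=5)
print(auc)
```

Passing a different `train_fn` and `score_fn` pair with the same `seed` evaluates another learner on identical folds, which keeps a method-to-method comparison like the one in Table 1 on equal footing.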