Graduate Research Symposium 2014
William G. Lowrie Dept. of Chemical and Biomolecular Engineering

STRUCTURE-BASED IN SILICO MODELING OF CHEMICALLY-INDUCED TOXICITY

Mehta, Darshan¹; Rathman, James F.¹,²; Yang, Chihae²
¹ Department of Chemical and Biomolecular Engineering, The Ohio State University
² Altamira LLC and FDA CFSAN MD

A. INTRODUCTION

Evaluating the potential toxicity of chemical compounds is an important step in the development of new products. Current methods for assessing toxicity rely largely on experimental techniques that are time-consuming and resource-intensive. Predictive computational models (referred to as “in silico” models) need to be developed to prioritize experimental tests.

Goal: To develop a novel in silico tool for classifying compounds of unknown toxicity using annotated linear structural fragments.

B. LINEAR FRAGMENTS AND CHEMICAL ANNOTATIONS

A unique method of generating structural descriptors is proposed. These descriptors are linear subgraphs (fragments) that are extracted dynamically from a database of chemical structures.

[Figure: Generation of linear fragments using different annotation schemes]

C. AMES MUTAGENICITY DATASET

D. CLASSIFICATION STRATEGY

Go through all compounds in the training set (POS and NEG separately) and count the connections between the different possible states.

Compute the corresponding probabilities and calculate the likelihood of the fragments in the test compounds. Connections are treated as undirected, so p(C21-C30) = p(C30-C21). For example, the likelihood of fragment [‘C30’, ‘C21’, ‘C30’, ‘C22’] is calculated as:

$$\text{Likelihood} = p(\text{C30-C21}) \cdot p(\text{C21-C30}) \cdot p(\text{C30-C22}) = p(\text{C30-C21})^{2} \cdot p(\text{C30-C22})$$

$$\text{Log-likelihood} = 2 \log p(\text{C30-C21}) + \log p(\text{C30-C22})$$

Calculate the difference in log-likelihood under the POS and NEG models.
$$\text{Diff} = (\text{log-likelihood})_{\text{POS}} - (\text{log-likelihood})_{\text{NEG}}$$

If Diff > 0, classify the compound as Ames POS.
If Diff < 0, classify the compound as Ames NEG.

Performance parameters:
Sensitivity = $\Pr(Y_{\text{pred}} = 1 \mid Y = 1)$ (true positive rate)
Specificity = $\Pr(Y_{\text{pred}} = 0 \mid Y = 0)$ (true negative rate)

The Ames test assesses the mutagenicity of a test compound by exposing strains of Salmonella typhimurium to it and detecting the resulting mutations, such as frame-shift mutations.
Ames positive → mutagenic; Ames negative → non-mutagenic.
The benchmark dataset with pre-defined cross-validation splits compiled by Hansen et al.¹ is used to evaluate performance.
Total compounds = 6512; Ames POS = 3503; Ames NEG = 3009 (= 6512 − 3503).

Annotations are features/attributes assigned to each atom type.
Annotation options: atom identity (AI), number of heavy-atom connections (nC), number of attached hydrogen atoms (nH).
Annotation scheme: any possible combination of annotation options.

[Figure: Demonstration of extracting linear subgraphs from m-ethyl phenol. Panel titles: Graph of nodes & edges; Linear paths from node.]

E. PRELIMINARY RESULTS

[Figures: Characteristics of Ames Benchmark dataset (Training Set 1, Test Set 1). Panel titles: One-step connection counts; One-step connection probability ratios; Total linear fragments generated; Distribution of fragment lengths. Labels: Training Set, Test Set.]

[Table: Comparison of performance with other non-parametric methods (averaged over 5-fold cross-validation splits). Columns: Method, Sensitivity, Specificity. Rows: Linear fragments, Pipeline Pilot, DEREK, MultiCASE.]
Performance of the linear fragments method is obtained using the {AI, nC, nH} annotation scheme. Fragments of length 2.

1. Katja Hansen, Sebastian Mika, Timon Schroeter, Andreas Sutter, Antonius ter Laak, Thomas Steger-Hartmann, Nikolaus Heinrich, and Klaus-Robert Müller. Benchmark data set for in silico prediction of Ames mutagenicity. Journal of Chemical Information and Modeling, 49(9):2077–81, September
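The classification strategy of section D can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions that the poster does not specify: probabilities are taken as relative connection frequencies with add-one smoothing, connections never seen in training are skipped, the log-likelihood difference is summed over all fragments of a compound, and a Diff of exactly zero is assigned to NEG. All function names and the toy fragments are hypothetical.

```python
import math
from collections import Counter


def connection_counts(fragments):
    """Count one-step connections between adjacent states in each fragment.

    Connections are treated as undirected, so C30-C21 and C21-C30 share a key.
    """
    counts = Counter()
    for fragment in fragments:
        for a, b in zip(fragment, fragment[1:]):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def connection_probabilities(counts, vocabulary, alpha=1.0):
    """Relative frequencies with add-alpha smoothing (assumption, not from the poster)."""
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {pair: (counts[pair] + alpha) / total for pair in vocabulary}


def log_likelihood(fragment, probs):
    """Sum of log connection probabilities along one linear fragment."""
    pairs = (tuple(sorted(p)) for p in zip(fragment, fragment[1:]))
    # Connections absent from the training vocabulary are skipped (assumption).
    return sum(math.log(probs[pair]) for pair in pairs if pair in probs)


def classify(fragments, pos_probs, neg_probs):
    """Return 'POS' if the summed log-likelihood difference is positive, else 'NEG'."""
    diff = sum(
        log_likelihood(f, pos_probs) - log_likelihood(f, neg_probs)
        for f in fragments
    )
    return "POS" if diff > 0 else "NEG"


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for 0/1 labels; assumes both classes are present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / y_true.count(1), tn / y_true.count(0)


# Hypothetical toy training fragments for each class (not data from the poster).
pos_train = [["C30", "C21", "C30", "C22"], ["C30", "C21", "N30"]]
neg_train = [["C30", "C22", "C22"], ["C22", "C22", "O11"]]

pos_counts = connection_counts(pos_train)
neg_counts = connection_counts(neg_train)
vocabulary = set(pos_counts) | set(neg_counts)

pos_probs = connection_probabilities(pos_counts, vocabulary)
neg_probs = connection_probabilities(neg_counts, vocabulary)

# Each test compound is represented by the list of its linear fragments.
label_a = classify([["C30", "C21", "C30", "C22"]], pos_probs, neg_probs)
label_b = classify([["C22", "C22", "O11"]], pos_probs, neg_probs)
print(label_a, label_b)
```

Sorting each pair of states before counting is what makes p(C30-C21) and p(C21-C30) identical, which matches the squared term in the likelihood example. The smoothing step is there only to avoid log(0) for connections that occur in one class but not the other.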