Institute of Biomedical Chemistry of Rus. Acad. Med. Sci.; *A.N. Sysin Institute of Human Ecology and Environmental Health of Rus. Acad. Med. Sci., Moscow,


Institute of Biomedical Chemistry of Rus. Acad. Med. Sci.; *A.N. Sysin Institute of Human Ecology and Environmental Health of Rus. Acad. Med. Sci., Moscow, Russia.

(Q)SAR and (Q)AAR analysis of ToxCast Dataset Using PASS and GUSAR approaches

Vladimir Poroikov, Dmitry Filimonov, Alexey Zakharov, Alexey Lagunin, Sergey Novikov*

Acknowledgements. We gratefully acknowledge Prof. Alex Tropsha for his kind assistance in presenting the results at the ToxCast Poster Session. The work was supported in part by the FP7 project (OpenTox) and ISTC project # [...].

Apologies. I am sorry that I did not obtain a US visa in time and was therefore unable to take part in the ToxCast Workshop on May 14-15, [...]. If you have any questions or suggestions, please do not hesitate to contact me: tel: [...]; fax: [...].

Introduction

The aim of the study:
(1) To estimate the possibility of predicting ToxCast Phase 1 (TC1) in vivo data on the basis of structural formulae, physical-chemical properties and in vitro data from the TC1 dataset.
(2) To estimate the possibility of prioritizing molecules from the TC1 dataset for toxicological testing using the integral parameter.

Materials

In vivo and in vitro assay data for chemical compounds from the ToxCast Phase 1 dataset were used for (quantitative) structure-activity relationships ((Q)SAR) and (quantitative) activity-activity relationships ((Q)AAR) analysis. Data from the CPDB dataset (CPDB, 25 October 2007) were used as the training set for the carcinogenicity prediction of in vivo ToxCast assays. The data were extracted from the EPA Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network [1]. We used 1397 compounds that were tested in the standard two-year rodent carcinogenicity bioassay. Small inorganic compounds (e.g. NO2), oils, paraffins and mixtures of compounds were excluded from the set.

Methods

PASS program.
PASS (Prediction of Activity Spectra for Substances) is a computer program that evaluates the general biological potential of a molecule on the basis of its structural formula [2]. MNA ("Multilevel Neighbourhoods of Atoms") descriptors are used to represent a compound's structure. The list of predictable biological activities contains 3750 types (PASS version [...]), including main and side pharmacological effects (antihypertensive, hepatoprotective, anti-inflammatory, etc.), mechanisms of action (5-hydroxytryptamine agonist, cyclooxygenase inhibitor, adenosine uptake inhibitor, etc.), specific toxicities (mutagenicity, carcinogenicity, teratogenicity, etc.) and metabolic terms (CYP1A substrate, CYP3A4 inhibitor, CYP2C9 inducer, etc.). The mean accuracy calculated by the leave-one-out cross-validation procedure is 95%. PASS predictions for TC1 molecules are presented at the ToxCast web-site and can be used as parameters characterizing these compounds in biological space.

QNA descriptors. A molecular structure is described as a set of QNA (Quantitative Neighborhoods of Atoms) descriptors [3]. QNA descriptors are based on the values of ionization potential (IP) and electron affinity (EA) of each atom in the molecule. They are calculated as follows:

$$P_i = \sum_k B_i^{-1/2} \left(\exp\left(-\tfrac{1}{2} C\right)\right)_{ik} B_k^{-1/2}, \qquad Q_i = \sum_k B_i^{-1/2} \left(\exp\left(-\tfrac{1}{2} C\right)\right)_{ik} B_k^{-1/2} A_k,$$

where $A_k = \tfrac{1}{2}(IP_k + EA_k)$, $B_k = IP_k - EA_k$, and $C$ is the molecular connectivity matrix. Thus, each atom of a molecule is described by two values, P and Q. Since molecules differ in their number of atoms, the number of P and Q values is proportional to the number of atoms in the molecule, but regression analysis requires the molecular structure to be described as a vector of fixed length. Therefore, Chebyshev polynomials are used to represent a molecular structure as a vector: [equation not preserved in the transcript] where Tn is the Chebyshev polynomial of degree n, and P` and Q` are the orthonormalized representations of the P and Q values (zero mean, unit variance, and no correlation between P` and Q`).
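As a hedged illustration of the P and Q formulas given above for the QNA descriptors, the following minimal Python sketch evaluates them for a small molecule. Several points are assumptions not stated in the poster: C is taken to be the plain adjacency matrix (1 for bonded atom pairs, 0 otherwise), the matrix exponential is approximated by a truncated Taylor series, and the IP/EA numbers are approximate atomic values in eV chosen only for illustration. The names `mat_exp` and `qna_descriptors` are hypothetical, and this is not the GUSAR implementation.

```python
import math


def mat_exp(m, terms=30):
    """Matrix exponential by a truncated Taylor series (adequate for small matrices)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds M^k / k!, starting from the identity
    for k in range(1, terms):
        term = [[sum(term[i][l] * m[l][j] for l in range(n)) / k
                 for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


def qna_descriptors(ip, ea, conn):
    """Return per-atom (P, Q) lists following the formulas quoted in the text.

    ip, ea: per-atom ionization potential and electron affinity.
    conn:   connectivity matrix C (ASSUMED here to be the adjacency matrix).
    """
    n = len(ip)
    a = [0.5 * (ip[k] + ea[k]) for k in range(n)]   # A_k = (IP_k + EA_k) / 2
    b = [ip[k] - ea[k] for k in range(n)]           # B_k = IP_k - EA_k
    e = mat_exp([[-0.5 * conn[i][j] for j in range(n)] for i in range(n)])
    p, q = [], []
    for i in range(n):
        # B_i^(-1/2) * (exp(-C/2))_ik * B_k^(-1/2) for every atom k
        w = [e[i][k] / math.sqrt(b[i] * b[k]) for k in range(n)]
        p.append(sum(w))
        q.append(sum(w[k] * a[k] for k in range(n)))
    return p, q


# Water (O, H, H) with approximate atomic IP/EA values in eV (illustrative only).
ip = [13.618, 13.598, 13.598]
ea = [1.461, 0.754, 0.754]
conn = [[0, 1, 1],
        [1, 0, 0],
        [1, 0, 0]]
p, q = qna_descriptors(ip, ea, conn)
print(p, q)
```

Each atom receives one P and one Q value, so the output length grows with molecule size, which is the reason the text goes on to compress these values into a fixed-length vector.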
The Tn(P,Q) values are calculated for each atom of a molecule. The whole molecule is represented by the average of the Chebyshev polynomial values over all atoms; therefore, the length of the vector is defined by the number of Chebyshev polynomials, m. On the one hand, a large number of Chebyshev polynomials can describe complex structure-activity relationships; on the other hand, a long structure vector may lead to overtraining in regression analysis. Therefore, the initial value of m is set to half the number of molecules in the training set.

Self-Consistent Regression (SCR). Self-consistent regression can obtain the best QSAR/QSPR model for a training set with a large number of descriptors. SCR is based on the regularized least-squares method. The main feature of the SCR method is the removal of the variables that describe the target value worst [4].

Integral parameter ToxDose. Using dosage characteristics from all 75 end-points for which experimental data were obtained in vivo, we calculated the integral parameter ToxDose for 283 compounds: [equation not preserved in the transcript] where: D is the LEL (lowest effect level) value; m is the number of end-points for a particular compound; n is the total number of tests.

Analysis of Chemical Space

We compared the distribution of molecules from the ToxCast Phase 1 dataset with compounds from the PASS training set (10_MF), MDDR 2003, and RoadMap. The compounds from the ToxCast Phase 1 dataset contain fewer non-hydrogen atoms than typical drug-like molecules and the RoadMap dataset. The average molecular weight in the PASS training set is 416 Dalton; the average molecular weight of compounds from the ToxCast Phase 1 database is 302 Dalton, which is smaller than that of drug-like compounds.

PASS prediction of the carcinogenicity

To predict carcinogenicity with PASS, we used compounds from CPDB as the training set. After the training procedure, the average accuracy of prediction (LOO CV) was 74%.
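The leave-one-out cross-validation (LOO CV) estimate quoted above can be illustrated with a short Python sketch: each compound is held out once, a model is trained on the remaining compounds, and the held-out compound is predicted. The nearest-neighbour classifier and the two-descriptor toy data below are hypothetical stand-ins; they are not the PASS algorithm or CPDB data, and the accuracy statistic PASS reports may be computed differently.

```python
def loo_accuracy(samples, labels, predict):
    """Leave-one-out accuracy: hold out each sample once, train on the rest."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            hits += 1
    return hits / len(samples)


def nearest_neighbour(train_x, train_y, x):
    """Stand-in classifier (NOT PASS): label of the closest training point."""
    dist = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
    return train_y[dist.index(min(dist))]


# Hypothetical toy data: two descriptors per compound, 1 = active, 0 = inactive.
toy_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 0.9), (0.9, 1.1), (1.2, 1.0)]
toy_y = [0, 0, 0, 1, 1, 1]
print(loo_accuracy(toy_x, toy_y, nearest_neighbour))
```

Passing the classifier as a function keeps the validation loop independent of the model, so any other predictor with the same three-argument signature could be evaluated the same way.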
With the trained version of PASS we predicted carcinogenicity for 306 compounds from ToxRefDB. Four two-component compounds were excluded from the prediction. The accuracy of prediction for ToxRefDB is given below.

Table (numeric values not preserved in the transcript).
Columns: NA, TP, TN, FP, FN, Sensitivity, Specificity, Accuracy.
Rows: CHR_Mouse_LiverTumors, CHR_Mouse_LungTumors, CHR_Mouse_Tumorigen, CHR_Rat_LiverTumors, CHR_Rat_TesticularTumors, CHR_Rat_ThyroidTumors, CHR_Rat_Tumorigen.

NA - data not available; TP - true positive; TN - true negative; FP - false positive; FN - false negative; Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP); Accuracy = (TP+TN)/(TP+TN+FP+FN).

Accuracy of prediction varies from 0.57 (CHR_Mouse_LiverTumors and CHR_Rat_TesticularTumors) to 0.85 (CHR_Rat_ThyroidTumors). Sensitivity varies from 0.12 (CHR_Mouse_LungTumors) to 0.63 (CHR_Rat_TesticularTumors). Specificity varies from 0.57 (CHR_Rat_TesticularTumors) to 0.93 (CHR_Mouse_LungTumors). Rat activities were predicted more accurately than mouse activities.

QSAR Models for Rat Cholinesterase Inhibitors

We collected 45 inhibitors of rat cholinesterase (CHR_Rat_CholinesteraseInhibition) from TC1. The toxicity end-point based on EC50 (mg/kg) values was used to build QSAR models with QNA descriptors and SCR. Eighteen QSAR models were created by the QNA/SCR approach. Only four QSAR models have Q2 > 0.50 and R2 > [...]. These models were additionally validated by leave-10%-out cross-validation: the procedure was repeated 20 times and the average R2 of prediction was calculated. Results of validation are presented below.

Table (numeric values not preserved in the transcript).
Columns: Name, Number, R2, Q2, Fisher, SD, Variables, L10%OCV.
Rows: four models.

Three of the four QSAR models have an average R2 of prediction greater than [...]. Therefore, the obtained models are robust and predictive.

ToxDose

We calculated the integral parameter ToxDose for twenty end-points of carcinogenicity for mouse and rat.
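As a minimal sketch of the sensitivity, specificity, and accuracy definitions listed with the ToxRefDB results above, the following Python function computes all three from the four confusion-matrix counts. The function name and the example counts are hypothetical; the counts are not taken from the poster's table, whose values are not available here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # TP / (TP + FN)
    specificity = tn / (tn + fp)                 # TN / (TN + FP)
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # (TP + TN) / all predictions
    return sensitivity, specificity, accuracy


# Hypothetical counts for one end-point (illustration only).
sens, spec, acc = classification_metrics(tp=12, tn=60, fp=8, fn=20)
print(round(sens, 3), round(spec, 3), round(acc, 3))
```

With few positives and many negatives, as in this made-up example, accuracy can look acceptable while sensitivity stays low, which is why the three measures are reported side by side.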
ToxDose values are correlated with other carcinogenicity end-points and may be used for prioritization of molecules from the TC1 dataset for toxicological testing. The data for cholinesterase inhibition differ significantly from all other end-points; thus they were excluded from further analysis.

Results of PASS Training for Predicting Different Categories of ToxDose

Ninety-five percent of ToxCast compounds are discriminated from drug-like molecules. For different toxicity groupings, PASS accuracy of recognition varies from 59.1% to 75.0%, and the most toxic compounds are predicted better. Thus, PASS prediction could be applied to set priorities for testing the compounds most likely to be toxic.

References

1)
2) Poroikov V., Filimonov D. PASS: Prediction of Biological Activity Spectra for Substances. In: Predictive Toxicology (Christoph Helma, ed.). Taylor & Francis Group, LLC, Boca Raton, [...].
3) Lagunin A., Zakharov A., Filimonov D., Poroikov V. A new approach to QSAR modelling of acute toxicity. SAR and QSAR in Environmental Research 2007, 18, [...].
4) Filimonov D., Akimov D., Poroikov V. The Method of Self-Consistent Regression for the Quantitative Analysis of Relationships Between Structure and Properties of Chemicals. Pharm. Chem. J. 2004, 1, [...].

Analysis of Biological Parameters

Biological activities predicted by PASS can be directly compared to TC1 in vivo data only in a few cases (carcinogenicity and cholinesterase inhibitors). Comparison of TC1 in vivo data for the same species and between the species leads to the following conclusions:
1) Correlation coefficients between the in vivo data for the same species vary from 0.75 to [...].
2) Correlation coefficients for the same tissues between the species are less than [...].
Comparison of in vivo data presented as the integral parameter (ToxDose) with in vitro data demonstrated that the maximal value of the correlation coefficient is 0.26 (ToxDose vs. ToxCast Novascreen data). Thus, no significant correlation between in vivo and in vitro data was found.
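The correlation coefficients discussed above can be illustrated with a short Python sketch. The poster does not say which coefficient was used; Pearson's r is assumed here, and the function name and example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values, e.g. an in vivo parameter against an in vitro readout.
in_vivo = [1.0, 2.0, 3.0, 4.0]
in_vitro = [1.0, 3.0, 2.0, 4.0]
print(pearson_r(in_vivo, in_vitro))
```

A value near 1 or -1 indicates a strong linear relationship, while a value such as the 0.26 reported for ToxDose against the Novascreen data indicates a weak one.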
Table (numeric values not preserved in the transcript).
Columns: No; Activity Type; Number; IEP, % (Data1); IEP, % (Data2).
Rows (Activity Type): ToxCast; CHR_ToxDose < 1.0 mkM/kg; CHR_ToxDose < 3.16 mkM/kg; CHR_ToxDose < 10.0 mkM/kg; CHR_ToxDose < 31.6 mkM/kg; three further CHR_ToxDose < [...] mkM/kg categories; seven CHR_ToxDose [...] mkM/kg categories.

Here: Data1 is 301 chemicals from the ToxCast in vivo dataset with 100 drug-like chemical compounds; Data2 is 301 chemicals from the ToxCast in vivo dataset with chemical compounds from the PASS training set.

Conclusions

1) Compounds from the TC1 dataset are smaller than typical drug-like structures and molecules from the RoadMap set.
2) No significant correlation between the in vivo and in vitro data from the TC1 set was observed.
3) Despite the chemical dissimilarity between the TC1 compounds and drug-like molecules, PASS-based prediction of carcinogenicity could be obtained with reasonable accuracy.
4) It is shown that the integral parameter ToxDose, which characterizes general toxicity, can be predicted by PASS with reasonable accuracy. Thus, this approach could be recommended for prioritizing chemicals for testing.