MUTAGENICITY OF AROMATIC AMINES: MODELLING, PREDICTION AND CLASSIFICATION BY MOLECULAR DESCRIPTORS
M. Pavan and P. Gramatica
QSAR Research Unit, Dept. of Structural and Functional Biology, University of Insubria, Varese, Italy

INTRODUCTION
Aromatic and heteroaromatic amines are widespread chemicals of considerable industrial and environmental relevance, because many of them are carcinogenic to humans. QSAR studies have been used to develop models that estimate and predict mutagenicity by relating it to chemical structure. In mutagenicity QSAR applications, investigators focus either on the molecular determinants that discriminate between active and inactive chemicals, or on the modulators of the relative potency of the active chemicals.

Developing a model to predict mutagenicity requires a test system that provides reproducible and quantitative estimates of mutagenic activity. The most widely used is the bacterial test introduced by Ames, which is based on Salmonella typhimurium strains: TA98 detects frameshift mutations and TA100 detects base-substitution mutations.

The data set consists of 146 aromatic and heteroaromatic amines collected by Benigni et al. [1]. Mutagenicity data are expressed as the mutation rate, log(revertants/nmol).

[1] R. Benigni et al., QSAR models for both mutagenic potency and activity: application to nitroarenes and aromatic amines, Environmental and Molecular Mutagenesis, 24 (1994).

MOLECULAR DESCRIPTORS
The molecular structure was represented by a wide set of 670 molecular descriptors calculated with software developed by R. Todeschini. The descriptor classes are listed below, with the number of descriptors in parentheses:
- sum of atomic properties descriptors (6)
- counters (45)
- empirical descriptors (2)
- information indices (16)
- topological descriptors (58)
- topographic descriptors (7)
- geometric descriptors (170)
- quantum-chemical descriptors (6)
- autocorrelation descriptors (252)
- directional WHIM descriptors (66) [2]
- non-directional WHIM descriptors (33) [2]

[2] R. Todeschini and P. Gramatica, 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors, Quant. Struct.-Act. Relat., 16 (1997).

[Figure: validation scheme. Training set: models, checked by internal validation (Q²_LOO, Q²_LMO, Z_r, ER_cv). Test set: predictions, checked by external validation (Q²_ext, ER_ext).]

TRAINING SET SELECTION
To assess the predictive capability of the models, both internal and external validation were performed. Chemometric methods such as principal component analysis (PCA), cluster analysis and Kohonen maps were used to select the most representative training set of amines. Models were developed on the selected training set, and predictions were made for the molecules excluded from model development (the test set).

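The poster does not state how PCA, cluster analysis and Kohonen maps were combined to pick the training set. The minimal sketch below therefore uses a stand-in: the Kennard-Stone max-min algorithm. This is a different, but widely used, chemometric rule that spreads the training compounds over the descriptor space. The function names and the toy coordinates are hypothetical. The descriptors are assumed to be autoscaled, or replaced by principal component scores, beforehand.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(X: Sequence[Sequence[float]], n_train: int) -> List[int]:
    """Indices of n_train compounds chosen by the Kennard-Stone max-min rule (hypothetical helper).

    X holds one (assumed autoscaled) descriptor vector per compound.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of compounds")

    # Seed the training set with the two most distant compounds.
    best_d, first, second = -1.0, 0, 1
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(X[i], X[j])
            if d > best_d:
                best_d, first, second = d, i, j
    selected = [first, second]

    # Distance from every compound to its nearest already-selected compound.
    nearest = [min(euclidean(X[k], X[s]) for s in selected) for k in range(n)]

    while len(selected) < n_train:
        remaining = [k for k in range(n) if k not in selected]
        # Add the compound farthest from the current training set.
        nxt = max(remaining, key=lambda k: nearest[k])
        selected.append(nxt)
        for k in range(n):
            nearest[k] = min(nearest[k], euclidean(X[k], X[nxt]))
    return selected


if __name__ == "__main__":
    # Hypothetical 2-D scores (e.g. first two principal components) for five amines.
    scores = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [5.0, 5.0]]
    train = kennard_stone(scores, 3)
    test = [k for k in range(len(scores)) if k not in train]
    print("training set:", train, "test set:", test)
```

The compounds that are not selected form the test set used for external validation. Kennard-Stone is deterministic, so the split is reproducible. A cluster-based selection would instead draw representatives from every cluster.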
[Figure: modelling workflow. Data set of 146 amines with 670 molecular descriptors and 2 responses (TA98, TA100), split into training and test sets; variable subset selection by Genetic Algorithm (GA), with GA-RLog and PLS-DA for classification; regression models; classification models (CART, SIMCA, K-NN, RDA, CP-ANN).]

OLS REGRESSION MODELS
Mutagenic potency was modelled by Ordinary Least Squares (OLS) regression on subsets of variables selected from the 670 molecular descriptors. The best subsets were selected by a Genetic Algorithm (GA-VSS). The models were validated by four checks:
- leave-one-out (Q²_LOO) cross-validation
- leave-more-out (Q²_LMO) cross-validation
- y-scrambling (Z_r)
- an external test set (Q²_ext)

The models show satisfactory predictive performance, considering the uncertainty of the biological end-points.

TA98 models (training set = 60 compounds, test set = 42 compounds):
$$\log \mathrm{TA98} = \ldots\,\mathrm{MWC06}$$
$$\log \mathrm{TA98} = \ldots\,\mathrm{nR}\ \ldots\,\mathrm{PJI2} - 0.06\,\mathrm{SPAN} - 2.16\,\mathrm{SPG14m} - 14.65\,\mathrm{E1p}$$

TA100 model (training set = 43 compounds, test set = 31 compounds):
$$\log \mathrm{TA100} = \ldots\,\mathrm{nHA} + 10.78\,\mathrm{ACMB5v} + 0.57\,\mathrm{L2v}$$

Statistics given for each model: number of variables, Q²_LOO, Q²_LMO, Q²_ext, R.

Descriptors:
- MWC06: molecular walk count
- nHA: number of H-bond acceptor atoms
- ACMB5v: autocorrelation descriptor
- L2v: directional WHIM descriptor

CLASSIFICATION MODELS
Several classification methods (CART, K-NN, RDA, SIMCA and CP-ANN) were applied to this data set to distinguish between activity classes. The best subset of variables was selected in two ways:
- by a Genetic Algorithm (GA-VSS) applied to Logistic Regression (RLog), a regression method suited to cases where the dependent variable Y can take only a restricted set of values;
- by PLS-DA, which confirmed the results previously obtained.

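The regression validation statistics named above can be sketched in plain Python. The sketch assumes the usual definitions:
- OLS with an intercept;
- Q²_LOO = 1 - PRESS/TSS, where each compound is predicted by a model refitted without it;
- y-scrambling as repeated refits on randomly permuted responses.

The transcript does not define Z_r, so the sketch stops at the distribution of scrambled R² values. All function names are hypothetical, and a real study would normally rely on a numerical library rather than hand-written normal equations.

```python
import random
from typing import List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_fit(X: List[List[float]], y: List[float]) -> List[float]:
    """OLS coefficients [b0, b1, ..., bp] (intercept first) from the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(b: List[float], x: List[float]) -> float:
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


def total_ss(y: List[float]) -> float:
    mean = sum(y) / len(y)
    return sum((yi - mean) ** 2 for yi in y)


def r2(X: List[List[float]], y: List[float]) -> float:
    """Fitting R2 of the OLS model on the training compounds."""
    b = ols_fit(X, y)
    rss = sum((yi - predict(b, x)) ** 2 for x, yi in zip(X, y))
    return 1.0 - rss / total_ss(y)


def q2_loo(X: List[List[float]], y: List[float]) -> float:
    """Q2_LOO = 1 - PRESS/TSS; each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(y)):
        b = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(b, X[i])) ** 2
    return 1.0 - press / total_ss(y)


def y_scrambling(X: List[List[float]], y: List[float],
                 n_rounds: int = 50, seed: int = 0) -> List[float]:
    """R2 of models refitted after randomly permuting the response values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = y[:]
        rng.shuffle(shuffled)
        scores.append(r2(X, shuffled))
    return scores
```

A model is credible when Q²_LOO stays close to R² and the scrambled R² values are clearly lower than the R² of the real model. Leave-more-out (Q²_LMO) follows the same pattern but deletes blocks of compounds instead of one at a time.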
The models were validated internally (ER%) and externally (ER_ext%). NOMER% is reported alongside them.

TA98, CP-ANN (counterpropagation artificial neural network): 10×10 neurons, 500 learning epochs; training set = 55 compounds, test set = 60 compounds. Statistics given: NOMER%, ER%, ER_ext%.
[Figure: CP-ANN map showing mutagen, non-mutagen and unknown compounds.]

TA100, CART (classification and regression tree): training set = 63 compounds, test set = 52 compounds. Statistics given: NOMER%, ER%, ER_ext%. Splitting conditions in the tree:
- Mor26u < -0.29
- Mor32v < -0.23
- Mor32v < -0.29
- Mor27v < -0.13
- JGI4 < 0.03
- JGI4 < 0.04
- CHI0 < 8.05
- a split on PHI

Descriptor meanings:
- PW2: molecular shape
- Mor17u: 3D structure
- Mor26u, Mor27v and Mor32v: steric properties
- JGI4: topological charges
- PHI: molecular flexibility
- CHI0: molecular connectivity

CONCLUSIONS
Strain TA98 (frameshift mutation): the mutagenicity of aromatic and heteroaromatic amines is related to molecular dimension and molecular branching, consistent with their acting as intercalating agents.
Strain TA100 (base-substitution mutation): mutagenicity is related to molecular dimension, electronic properties, flexibility and connectivity, reflecting the more complex mechanism of base-substitution mutation.

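The classification statistics can be illustrated in the same way. The sketch below makes three assumptions:
- ER% is the percentage of misclassified compounds;
- NOMER% is the no-model error rate, i.e. the error obtained by assigning every compound to the largest class;
- a plain k-nearest-neighbour classifier (Euclidean distance, majority vote) is an acceptable illustration of the K-NN method listed above, since the poster does not give its settings.

Class labels, data and function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def error_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """ER%: percentage of compounds assigned to the wrong class."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return 100.0 * wrong / len(y_true)


def no_model_error_rate(y_true: Sequence[str]) -> float:
    """NOMER% (assumed definition): error made by assigning every compound to the largest class."""
    largest = Counter(y_true).most_common(1)[0][1]
    return 100.0 * (len(y_true) - largest) / len(y_true)


def knn_predict(X_train: List[List[float]], y_train: List[str],
                x: List[float], k: int = 3) -> str:
    """Majority class among the k training compounds nearest to x (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xt, x)), label)
        for xt, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical one-descriptor data for six training and three test amines.
    X_train = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
    y_train = ["non-mutagen"] * 3 + ["mutagen"] * 3
    X_test = [[0.15], [1.05], [0.9]]
    y_test = ["non-mutagen", "mutagen", "non-mutagen"]
    y_pred = [knn_predict(X_train, y_train, x) for x in X_test]
    print(y_pred, error_rate(y_test, y_pred), no_model_error_rate(y_test))
```

A classifier is informative only if its ER% is clearly below NOMER%. ER_ext% is the same error rate computed on the test set.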