MUTAGENICITY OF AROMATIC AMINES: MODELLING, PREDICTION AND CLASSIFICATION BY MOLECULAR DESCRIPTORS M.Pavan and P.Gramatica QSAR Research Unit, Dept. of.


MUTAGENICITY OF AROMATIC AMINES: MODELLING, PREDICTION AND CLASSIFICATION BY MOLECULAR DESCRIPTORS
M. Pavan and P. Gramatica
QSAR Research Unit, Dept. of Structural and Functional Biology, University of Insubria, Varese, Italy
e-mail: manuela.pavan@libero.it
Web: http://fisio.varbio2.unimi.it/dbsf/home.html

INTRODUCTION
Aromatic and heteroaromatic amines are widespread chemicals of considerable industrial and environmental relevance, and many of them are carcinogenic to humans. QSAR studies have been used to develop models that estimate and predict mutagenicity by relating it to chemical structure. In mutagenicity QSAR applications, investigators focus either on the molecular determinants that discriminate between active and inactive chemicals or on the modulators of the relative potency of the active chemicals.
Developing a model to predict mutagenicity requires a test system that provides reproducible and quantitative estimates of toxic activity. The most widely used is the bacterial test introduced by Ames, based on Salmonella typhimurium strains (TA98: frameshift mutation; TA100: base-substitution mutation).
The data set consists of 146 aromatic and heteroaromatic amines collected by Benigni [1]; mutagenicity data are expressed as the mutation rate, log(revertants/nmol).

[1] R. Benigni et al. QSAR models for both mutagenic potency and activity: application to nitroarenes and aromatic amines. Environmental and Molecular Mutagenesis 24, 208-219 (1994).
MOLECULAR DESCRIPTORS
The molecular structure was represented by a wide set of 670 molecular descriptors calculated with software developed by R. Todeschini (http://www.disat.unimib.it/chm):
- sum of atomic properties descriptors (6)
- counters (45)
- empirical descriptors (2)
- information indices (16)
- topological descriptors (58)
- topographic descriptors (7)
- geometric descriptors (170)
- quantum-chemical descriptors (6)
- autocorrelation descriptors (252)
- directional WHIM descriptors (66) [2]
- non-directional WHIM descriptors (33) [2]

[2] R. Todeschini and P. Gramatica. 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors. Quantitative Structure-Activity Relationships 16, 113-119 (1997).

[Figure: validation scheme. Models built on the training set are validated internally (Q²LOO, Q²LMO, Zr, ERcv); predictions for the test set are validated externally (Q²ext, ERext).]

TRAINING SET SELECTION
To assess the predictive capability of the models, both internal and external validation were performed. Chemometric methods such as PCA, cluster analysis and Kohonen maps were used to select the most representative training set of amines: models were developed on the selected training set, and predictions were made for the molecules excluded from the model-generation step (test set).
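The poster names cluster analysis as one way of choosing a representative training set but gives no details. The following is a minimal sketch under stated assumptions, not the authors' procedure: descriptors are autoscaled, grouped by k-means, and the compound nearest each centroid goes into the training set. The deterministic farthest-point seeding and all function names (`autoscale`, `kmeans`, `cluster_split`) are hypothetical choices made here for reproducibility.

```python
import math


def autoscale(X):
    """Scale each descriptor column to zero mean and unit variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
        sds.append(math.sqrt(var) or 1.0)  # constant column: avoid dividing by zero
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_seeds(Z, k):
    """Deterministic seeding (an assumption): start from object 0, then
    repeatedly add the object farthest from every seed chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        nxt = max(range(len(Z)), key=lambda i: min(sq_dist(Z[i], Z[j]) for j in chosen))
        chosen.append(nxt)
    return [list(Z[i]) for i in chosen]


def kmeans(Z, k, n_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_point_seeds(Z, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(z, centroids[c])) for z in Z]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [z for z, lab in zip(Z, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_split(X, k):
    """Training set = compound nearest each cluster centroid; test set = the rest."""
    Z = autoscale(X)
    labels, centroids = kmeans(Z, k)
    train = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            train.append(min(members, key=lambda i: sq_dist(Z[i], centroids[c])))
    test = [i for i in range(len(X)) if i not in train]
    return sorted(train), test


# Toy data: three well-separated groups of "compounds" described by 2 descriptors.
X = [[0, 0], [0.1, 0], [0, 0.1], [10, 0], [10.1, 0], [10, 0.1], [0, 10], [0.1, 10], [0, 10.1]]
train, test = cluster_split(X, 3)
print(train, test)  # [0, 3, 6] [1, 2, 4, 5, 7, 8]
```

With k equal to the desired training-set size, each cluster contributes one representative. PCA scores or a Kohonen map could replace k-means as the grouping step without changing the selection rule.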
[Figure: modelling workflow. Data set: 146 amines, 670 molecular descriptors, 2 experimental responses (TA98, TA100), split into a training set and a test set. Variable subset selection by Genetic Algorithm (GA) leads to regression models; GA-RLog and PLS-DA lead to classification models (CART, SIMCA, K-NN, RDA, CP-ANN).]

OLS REGRESSION MODELS
Mutagenic potency was modelled by Ordinary Least Squares (OLS) regression on a subset of the 670 molecular descriptors; the best subset of variables was selected by a Genetic Algorithm (GA-VSS). The models were validated by leave-one-out (Q²LOO) and leave-more-out (Q²LMO) cross-validation, y-scrambling (Zr) and an external test set (Q²ext), and they show satisfactory predictive performance given the uncertainty of the biological end-points.

TA98 (training set = 60 compounds, test set = 42 compounds)
$\log \mathrm{TA98} = -3.807 + 1.184\,\mathrm{MWC06}$
$\log \mathrm{TA98} = -10.33 + 1.68\,\mathrm{nR06} - 4.93\,\mathrm{PJI2} - 0.06\,\mathrm{SPAN} - 2.16\,\mathrm{SPG14m} - 14.65\,\mathrm{E1p}$

TA100 (training set = 43 compounds, test set = 31 compounds)
$\log \mathrm{TA100} = -3.39 - 0.72\,\mathrm{nHA} + 10.78\,\mathrm{ACMB5v} + 0.57\,\mathrm{L2v}$

Strain   Variables   Q²LOO (%)   Q²LMO (%)   Q²ext (%)   R² (%)
TA98     1           62.8        63.1        67.5        65.1
TA98     5           78.5        78.0        38.7        82.8
TA100    3           86.2        85.8        73.9        88.2

MWC06 = molecular walk count
nHA = number of acceptor atoms for H-bonds
ACMB5v = autocorrelation descriptor
L2v = directional WHIM descriptor

CLASSIFICATION MODELS
Several classification methods (CART, K-NN, RDA, SIMCA and CP-ANN) were applied to this data set to discriminate between activity classes.
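The OLS statistics reported above (R², Q²LOO, Q²ext) can be illustrated with a small pure-Python sketch. The formulas are the usual textbook definitions and are assumed, not taken from the poster: Q²LOO = 1 - PRESS/TSS, with each compound predicted by a model refitted without it, and Q²ext measured against the training-set mean (one of several conventions in use). Leave-more-out and y-scrambling are not implemented. The normal equations are solved by hand for transparency; in practice a routine such as `numpy.linalg.lstsq` would normally be used.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [val] for row, val in zip(M, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ols_fit(X, y):
    """OLS coefficients [b0, b1, ..., bp] from the normal equations (X'X) b = X'y."""
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    XtX = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    Xty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(b, X):
    """Apply an OLS equation b0 + b1*x1 + ... + bp*xp to each row of X."""
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


def r2(X, y):
    """Fitting statistic R^2 = 1 - RSS / TSS."""
    ybar = sum(y) / len(y)
    rss = sum((yi - yp) ** 2 for yi, yp in zip(y, predict(ols_fit(X, y), X)))
    return 1 - rss / sum((yi - ybar) ** 2 for yi in y)


def q2_loo(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / TSS."""
    ybar = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        b = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # refit without compound i
        press += (y[i] - predict(b, [X[i]])[0]) ** 2
    return 1 - press / sum((yi - ybar) ** 2 for yi in y)


def q2_ext(X_train, y_train, X_test, y_test):
    """External Q^2, with deviations taken from the training-set mean (assumed convention)."""
    b = ols_fit(X_train, y_train)
    ybar_train = sum(y_train) / len(y_train)
    press = sum((yt - yp) ** 2 for yt, yp in zip(y_test, predict(b, X_test)))
    return 1 - press / sum((yt - ybar_train) ** 2 for yt in y_test)


# Toy one-descriptor example (hypothetical values, not the amine data).
X = [[1], [2], [3], [4], [5]]
y = [1, 3, 2, 5, 4]
print(r2(X, y), q2_loo(X, y))  # Q2_LOO is always below R2 when residuals are nonzero
```

Because each leave-one-out residual equals the ordinary residual divided by (1 - h_ii), PRESS is never smaller than RSS, so Q²LOO cannot exceed R²; the gap between them is one signal of over-fitting.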
The best subset of variables was selected by a Genetic Algorithm (GA-VSS) coupled with Logistic Regression (RLog), a regression method suited to a dependent variable Y that can take only a restricted set of values, and by PLS-DA, which confirmed the results previously obtained. The models were validated internally (ER, error rate) and externally (ERext).

TA98: CP-ANN (counterpropagation artificial neural network), 10x10 neurons, 500 learning epochs
Training set = 55 compounds, test set = 60 compounds
NOMER% = 14.5   ER% = 3.6   ERext% = 8.3
[Figure: CP-ANN map of mutagenic, non-mutagenic and unknown compounds.]

TA100: CART (classification and regression tree)
Training set = 63 compounds, test set = 52 compounds
NOMER% = 42.9   ER% = 4.7   ERext% = 15.4
[Figure: CART tree with splits Mor26u < -0.29, Mor32v < -0.23, Mor32v < -0.29, Mor27v < -0.13, JGI4 < 0.03, JGI4 < 0.04, CHI0 < 8.05 and PHI < 1.72; terminal nodes assigned to class 1 or class 2.]

NOMER% = no-model error rate
PW2 = molecular shape
Mor17u = 3D structure
Mor32v, Mor27v, Mor26u = steric properties
JGI4 = topological charges
PHI = molecular flexibility
CHI0 = molecular connectivity

CONCLUSIONS
Strain TA98 (frameshift mutation): aromatic and heteroaromatic amines act as intercalating agents; the relevant molecular features are molecular dimension and molecular branching.
Strain TA100 (base-substitution mutation): aromatic and heteroaromatic amines act through a complex base-substitution mechanism; the relevant molecular features are molecular dimension, electronic properties, flexibility and connectivity.
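The classification results above are summarised as error rates. A minimal sketch of how ER%, ERext% and NOMER% can be computed follows. It uses k-NN (one of the listed methods) only because it is short; the reported TA98 and TA100 results come from CP-ANN and CART. Several points are assumptions not stated in the poster: Euclidean distance, k = 3, leave-one-out as the internal scheme, and NOMER read as the error of assigning every compound to the largest class. All function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Class of x by majority vote among its k nearest training objects."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate(y_true, y_pred):
    """ER%: percentage of misclassified objects."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return 100.0 * wrong / len(y_true)


def nomer(y):
    """NOMER%: error rate obtained by assigning every object to the largest class."""
    largest = Counter(y).most_common(1)[0][1]
    return 100.0 * (len(y) - largest) / len(y)


def loo_error_rate(X, y, k=3):
    """ER% with each object classified by a k-NN model that excludes it."""
    preds = [knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
             for i in range(len(y))]
    return error_rate(y, preds)


def external_error_rate(X_train, y_train, X_test, y_test, k=3):
    """ERext%: error rate on compounds not used to build the model."""
    preds = [knn_predict(X_train, y_train, x, k) for x in X_test]
    return error_rate(y_test, preds)


# Toy data (hypothetical): class 1 near the origin, class 2 near (5, 5).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0, 0.5], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
print(nomer(y), loo_error_rate(X, y))  # 40.0 0.0
```

NOMER% is the baseline a useful classifier must beat: with 6 of 10 compounds in one class, a model that ignores the descriptors already achieves 40% error, so an ER% is only informative relative to it.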

