QSAR MODELLING AND PREDICTION OF PHENOL TOXICITY
Paola Gramatica, Elena Bonfanti, Manuela Pavan and Federica Consolaro
QSAR Research Unit, Department of Structural and Functional Biology, University of Insubria, Varese, Italy


INTRODUCTION
Phenols are widespread in the environment and are widely used as precursors of many products. Phenols are known to affect human health at concentrations commonly encountered in the environment. For this reason, the toxicity of these compounds has been studied extensively on different end points, but data are obviously not available for every phenol and every organism, so reliable estimation methods are required. QSAR studies provide a simple and fast way to predict such data.

DATA SET
The compounds used in this work are the 109 phenols described by Schultz [2]. Toxicity data, available for only 103 of these chemicals, are expressed in mmol/L on a logarithmic scale as log(1/IGC50), where IGC50 is the concentration causing 50% inhibition of growth of Tetrahymena pyriformis. Three phenols (2-aminophenol, catechol and 4-nitrophenol), which several models have identified as outliers, were excluded from the data set.

[2] T.W. Schultz et al., Quantitative structure-activity relationships for the Tetrahymena pyriformis population growth end-point: a mechanism of action approach, in: Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology (1990).

CHEMOMETRIC METHODS
Several chemometric analyses were applied to the compounds, each represented by its molecular descriptors, to select an optimal training set for the QSAR models. The analyses performed are:
- Principal Component Analysis (PCA): used to condense a large number of variables into a few components.
  These components reveal how the compounds are distributed according to their structure; only the significant components were used in Cluster Analysis and Kohonen Maps, to avoid redundant information.
- Hierarchical Cluster Analysis: hierarchical clustering was performed using the significant principal components of the molecular descriptors as variables. Different distance metrics (Euclidean and Manhattan) and different linkages (complete, average, etc.) were compared to find the best way to cluster these compounds.
- Kohonen Maps: a further way of mapping similar compounds, using so-called self-organising topological feature maps, which preserve the topology of the original multidimensional representation in a new two-dimensional one. The position of a compound in the cells of the map reflects its structural similarity to the other phenols studied. The centroids of each cell were selected as the most representative compounds, in order to build a training set made up of the most diverse phenols.

CONCLUSION
The present investigation confirms that the toxic response of phenols in the Tetrahymena system can be modelled by a logKow-dependent QSAR. Starting from a wide set of different molecular descriptors, the models identify hydrophobicity as the single most important variable: logKow alone already gives an acceptable prediction model, with Q²(LOO) = 72.14%. Other structural parameters, such as electronic and connectivity descriptors, play a secondary but useful role, at least for this set of compounds. Moreover, this study shows that theoretical molecular descriptors are an effective and useful alternative to logKow. The internal and external validation procedures confirmed the high predictive capability of the models developed.
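The training-set selection described under CHEMOMETRIC METHODS (autoscaled descriptors compressed by PCA, compounds mapped onto a Kohonen map, one representative per occupied cell) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software used in the study: the poster does not give the PC-retention rule, the map size or the learning schedule, so the eigenvalue > 1 (Kaiser) cut-off, the 5 × 5 grid, the linearly decaying learning rate and neighbourhood, and the helper names `autoscale`, `pca_scores`, `train_som` and `select_training_set` are all hypothetical choices. "Centroid of each cell" is read here as the compound closest to the mean of the compounds falling in that cell. The descriptor matrix is random synthetic data.

```python
import numpy as np


def autoscale(X):
    """Centre each descriptor and scale it to unit variance."""
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant descriptors stay at zero instead of dividing by zero
    return (X - X.mean(axis=0)) / std


def pca_scores(X, min_eigenvalue=1.0):
    """PCA by SVD of the autoscaled matrix; keeps components with eigenvalue > min_eigenvalue (assumed rule)."""
    Xs = autoscale(X)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    eigenvalues = s**2 / (Xs.shape[0] - 1)  # eigenvalues of the correlation matrix
    keep = eigenvalues > min_eigenvalue
    keep[0] = True  # always keep at least the first component
    return Xs @ Vt[keep].T


def train_som(data, rows=5, cols=5, epochs=50, lr0=0.5, seed=0):
    """Online Kohonen map with a Gaussian neighbourhood that shrinks during training."""
    rng = np.random.default_rng(seed)
    n = len(data)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # initialise the neuron weights on randomly chosen compounds
    weights = data[rng.choice(n, rows * cols, replace=(n < rows * cols))].astype(float)
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma**2))
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights


def select_training_set(scores, weights):
    """For each occupied cell, pick the compound closest to the centroid of that cell's members."""
    cells = np.array([np.argmin(((weights - x) ** 2).sum(axis=1)) for x in scores])
    training = []
    for cell in np.unique(cells):
        members = np.flatnonzero(cells == cell)
        centroid = scores[members].mean(axis=0)
        dists = ((scores[members] - centroid) ** 2).sum(axis=1)
        training.append(int(members[np.argmin(dists)]))
    return sorted(training), cells


# Synthetic stand-in: 100 "compounds" x 30 correlated "descriptors" driven by 3 latent factors
rng = np.random.default_rng(42)
latent = rng.normal(size=(100, 3))
X = latent @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(100, 30))

scores = pca_scores(X)
weights = train_som(scores)
train_idx, cells = select_training_set(scores, weights)
train_set = set(train_idx)
test_idx = [i for i in range(len(X)) if i not in train_set]
print(scores.shape[1], "components;", len(train_idx), "training /", len(test_idx), "test")
```

Compounds not chosen as cell representatives form the test set. For the hierarchical cluster analysis step, SciPy's `scipy.cluster.hierarchy.linkage(scores, method="average", metric="cityblock")` (or `method="complete"`, `metric="euclidean"`) covers the metric and linkage combinations listed above, applied to the same PCA scores.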
REGRESSION MODELS
The best subset of variables for modelling toxicity was selected by a Genetic Algorithm (GA-VSS) approach, with the response obtained by ordinary least squares (OLS) regression. All models were validated by leave-one-out (LOO) and leave-more-out (LMO) cross-validation and by scrambling of the responses.

MOLECULAR DESCRIPTORS
The molecular structures of the studied compounds were described by several molecular descriptors, calculated with software developed by R. Todeschini:
- Sum of atomic properties descriptors (6)
- Count descriptors (45)
- Empirical descriptors (2)
- Information indices (16)
- Autocorrelation descriptors (252)
- Directional WHIM descriptors (66) [1]
- Non-directional WHIM descriptors (33) [1]
- Topological descriptors (58)
- Topographic descriptors (7)
- Geometric descriptors (170)
- Quantum-chemical descriptors (6)

[1] R. Todeschini and P. Gramatica, 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors, Quant. Struct.-Act. Relat., 16 (1997).

[Figure: Selection of training set, showing the training set and test set compounds. The numbered compounds are outliers.]
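The validation scheme named under REGRESSION MODELS (OLS fitting, LOO and LMO cross-validation, response scrambling) can be illustrated with a short NumPy sketch. Q² is computed with the usual definition, 1 − PRESS/TSS. The LMO variant here deletes blocks of `n_out` compounds from random orderings and averages over `repeats` orderings; this is one common scheme and an assumption, since the poster does not state the block size or the number of repetitions. The function names are hypothetical, and the single-descriptor data set (a logKow-like variable) is synthetic, not the Schultz data.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def r2(X, y):
    """Fitted (training-set) coefficient of determination."""
    resid = y - predict(fit_ols(X, y), X)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def q2_cv(X, y, n_out=1, repeats=1, seed=0):
    """Cross-validated Q2 = 1 - PRESS / TSS.

    n_out=1 is leave-one-out (LOO); n_out > 1 deletes blocks of n_out compounds
    from a random ordering (leave-more-out, LMO), averaged over `repeats` orderings.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    tss = np.sum((y - y.mean()) ** 2)
    q2_values = []
    for _ in range(repeats):
        order = np.arange(n) if n_out == 1 else rng.permutation(n)
        press = 0.0
        for start in range(0, n, n_out):
            out = order[start:start + n_out]
            keep = np.ones(n, dtype=bool)
            keep[out] = False
            coef = fit_ols(X[keep], y[keep])  # refit without the deleted compounds
            press += np.sum((y[out] - predict(coef, X[out])) ** 2)
        q2_values.append(1.0 - press / tss)
    return float(np.mean(q2_values))


def y_scrambling(X, y, n_iter=100, seed=0):
    """R2 of models refitted on randomly permuted responses (chance-correlation check)."""
    rng = np.random.default_rng(seed)
    return np.array([r2(X, rng.permutation(y)) for _ in range(n_iter)])


# Synthetic one-descriptor example (logKow-like variable); not the Schultz data
rng = np.random.default_rng(1)
logkow = rng.uniform(0.0, 5.0, size=100)
y = 0.6 * logkow - 1.0 + rng.normal(scale=0.3, size=100)
X = logkow[:, None]

print("R2      ", round(r2(X, y), 3))
print("Q2 LOO  ", round(q2_cv(X, y), 3))
print("Q2 LMO  ", round(q2_cv(X, y, n_out=20, repeats=10), 3))
print("max scrambled R2", round(y_scrambling(X, y).max(), 3))
```

A model whose real R² and Q² lie well above the scrambled R² values is unlikely to be a chance correlation. The explicit refitting loop is kept for clarity; for OLS the LOO residuals can also be obtained from the hat matrix without refitting.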
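The GA-VSS step can be sketched as a small genetic algorithm over fixed-size descriptor subsets, each scored by the Q²(LOO) of its OLS model. This is a generic illustration, not the authors' implementation: the fitness function (Q²LOO), truncation selection, union-based crossover, swap mutation, population size, number of generations and the names `q2_loo` and `ga_vss` are all assumptions. Q²(LOO) is obtained from the hat-matrix identity, under which the leave-one-out residual of compound i equals e_i / (1 − h_ii), so no refitting is needed. The data set is synthetic.

```python
import numpy as np


def q2_loo(X, y):
    """Leave-one-out Q2 of an OLS model, using the hat-matrix shortcut e_i / (1 - h_ii)."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A.T @ A) @ A.T
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def ga_vss(X, y, k=3, pop_size=30, generations=40, p_mut=0.3, seed=0):
    """Search k-descriptor subsets that maximise Q2_LOO with a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    cache = {}

    def fitness(subset):
        if subset not in cache:  # each subset model is fitted only once
            cache[subset] = q2_loo(X[:, sorted(subset)], y)
        return cache[subset]

    population = [frozenset(rng.choice(p, k, replace=False).tolist()) for _ in range(pop_size)]
    for _ in range(generations):
        # truncation selection: the better half (unique subsets) survives unchanged
        ranked = sorted(set(population), key=fitness, reverse=True)
        parents = ranked[: max(1, pop_size // 2)]
        children = list(parents)
        while len(children) < pop_size:
            i, j = rng.integers(len(parents), size=2)
            # crossover: draw k descriptors from the union of two parents
            pool = sorted(parents[i] | parents[j])
            child = set(rng.choice(pool, k, replace=False).tolist())
            if rng.random() < p_mut:
                # mutation: swap one descriptor for one not yet in the model
                child.remove(int(rng.choice(sorted(child))))
                child.add(int(rng.choice([v for v in range(p) if v not in child])))
            children.append(frozenset(child))
        population = children
    best = max(population, key=fitness)
    return sorted(best), fitness(best)


# Synthetic data: 60 compounds, 15 candidate descriptors, response built from columns 2, 7 and 11
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 15))
y = 1.0 * X[:, 2] - 0.8 * X[:, 7] + 0.5 * X[:, 11] + rng.normal(scale=0.2, size=60)
subset, q2 = ga_vss(X, y, k=3)
print(subset, round(q2, 3))
```

On this synthetic problem the search should recover the three columns used to build the response. Because the parents survive each generation unchanged, the best Q² found never decreases, and caching the fitness of each subset avoids refitting models that recur across generations.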