P. Gramatica and F. Consolaro QSAR Research Unit, Dept. of Structural and Functional Biology, University of Insubria, Varese, Italy.

EEC PRIORITY LIST 1

The so-called “EEC Priority List 1” includes a large number of commercial chemicals dangerous to man and the environment. These chemicals, selected according to Directive 76/464/EEC because of their environmental impact and diffusion, are very heterogeneous: they have unrelated structures, and most of them have unknown mechanisms of action and types of effect. A final list of 202 compounds was obtained (by excluding chemicals that cannot be studied by our approach, i.e. inorganics, salts, etc., and by adding some isomers of listed chemicals) and was studied for structural similarity.

DESCRIPTION OF MOLECULAR STRUCTURE

A wide set of molecular descriptors is used here to describe the chemical structure of these compounds, the aim being to find an objective method to group the compounds on the basis of their structural features. In particular we have used:

Count descriptors: the numbers of different kinds of atoms, functional groups, rings of different sizes, and H-bond acceptor and donor atoms.
Graph-invariant descriptors: topological and information indices, which give information about the 2D molecular structure and connectivity.
Molecular weight (MW), which is always used.
WHIM descriptors (1-2): molecular indices that represent different sources of chemical information about the whole 3D molecular structure in terms of size, shape, symmetry and atom distribution.

(1) R. Todeschini and P. Gramatica, 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors, Quant. Struct.-Act. Relat., 16 (1997).
(2) R. Todeschini, WHIM-3D/QSAR - Software for the calculation of the WHIM descriptors, rel. 4.1 for Windows, Talete srl, Milano (Italy).

PREDICT PROJECT

This work is part of the EEC PREDICT (Prediction and Assessment of the Aquatic Toxicity of Mixtures of Chemicals) project. Its objective is to provide a suitable means for the early identification of environmental risks resulting from the combined effects of chemical mixtures, focusing specifically on aquatic pollutants of concern, such as the List 1 chemicals. One need of the PREDICT project is an objective method to group compounds according to their structural features alone, and then to identify the most representative compounds of each group. For this reason we applied a chemometric approach to the 202 studied compounds, with the aim of identifying 15 to 20 different structural groups.

CHEMOMETRIC METHODS

Several chemometric analyses were applied to the compounds (represented by molecular descriptors) to group the most similar ones, in accordance with a multivariate structural approach. The analyses performed are:

Hierarchical Cluster Analysis: hierarchical clustering was performed to find clusters of the studied compounds in high-dimensional space, using molecular descriptors as variables. Different distance metrics (Euclidean, Manhattan, Pearson) and different linkages (complete, average, single, etc.) were used and compared to find the best way to cluster these compounds.
Principal Component Analysis (PCA): this analysis was used to calculate just a few components from a large number of variables. These components highlight the distribution of the compounds according to structure and show the similarity between compounds assigned to the same cluster.
Kohonen Maps: this is an additional way of mapping similar compounds, using the so-called “self-organized topological feature maps”, which preserve the topology of a multidimensional representation within a toroidal two-dimensional representation. The position of the compounds in this map shows the level of structural similarity among the List 1 compounds.

Dendrogram of hierarchical cluster analysis (Euclidean distance, complete linkage; variables = first 10 structural principal components).

RANKING

The reported structural analyses allowed the identification of groups of the most similar List 1 compounds. These groups reflect only the structural patterns of the molecules, without any evaluation of their activities. Since the aim of this work was to find the most representative compounds, with different structural features, among the 202 List 1 molecules, we have proposed (to Rolf Altenburger, our PREDICT project partner) some possible candidate compounds:

1) dieldrin [71] or endrin [77]
2) 2,4,5-trichlorophenol [122]
3) naphthalene [96]
4) phoxim [103]
5) biphenyl [11] or benzidine [8]
6) one of two hexachlorocyclohexane isomers [85 and 85b, respectively]
7) chloroacetic acid [16] or 1,3-dichloropropan-2-ol [66] or epichlorohydrin [78]
8) 2,4-D [45] or simazine [106] or atrazine [130suppl]
9) 1,1,2,2-tetrachloroethane [110] or 1,2-dichloroethane [59]
10) fluoranthene [99] or benzo(b)fluoranthene [99c]
11) one of the 17 PCBs
12) triphenyltin chloride [126] or triphenyltin acetate [125]
13) tributyltin oxide [115]

DISCUSSION

The cluster analysis considered here is the one performed with Euclidean distance and complete linkage on the first 10 principal components of our molecular descriptors. These components explain about 84% of the total information on structural variability, so noise and minor information are excluded.
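The clustering procedure described in the discussion (PCA on the descriptors, retention of the first 10 components, then Euclidean complete-linkage clustering) can be sketched as follows. This is only an illustration under stated assumptions: the descriptor table is synthetic random data rather than the List 1 descriptors, autoscaling before PCA is assumed because the source does not state the pretreatment, and the function names `pca_scores` and `complete_linkage` are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """Autoscale X (rows = compounds, columns = descriptors) and return the
    scores on the first n_components and their explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant descriptors: avoid division by zero
    Z = Xc / std
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return U[:, :n_components] * s[:n_components], explained[:n_components]

def complete_linkage(points, n_clusters):
    """Naive agglomerative clustering (Euclidean distance, complete linkage).
    Returns one integer cluster label per row of points."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=2))
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: distance between the two farthest members
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Synthetic stand-in for the descriptor table (NOT the List 1 data):
# 3 groups of 20 "compounds" described by 20 "descriptors".
rng = np.random.default_rng(0)
centers = np.zeros((3, 20))
centers[1, :10] = 10.0
centers[2, 10:] = 10.0
X = np.vstack([c + rng.normal(size=(20, 20)) for c in centers])

scores, explained = pca_scores(X, n_components=10)
labels = complete_linkage(scores, n_clusters=3)
print("variance explained by 10 components:", round(float(explained.sum()), 3))
print("cluster sizes:", np.bincount(labels))
```

The loop above is written for readability and scales poorly; for a table of a few hundred compounds a library routine such as `scipy.cluster.hierarchy.linkage` with `method="complete"` would be the more practical choice.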
Analogously, a PCA was performed on all molecular descriptors, and the compounds belonging to different clusters were highlighted with different colours. This analysis highlights the distribution of all compounds and the relationships between the identified structural clusters.

The same approach, with the first 10 principal components as input variables, was used to build a Kohonen map that shows the position of all 202 compounds according to their structural features. Here too, different colours were used to distinguish the structural clusters.

Cluster labels in the figures: Benzene derivatives (2); Chloroaliphatic compounds (7); DDT - PCBs (11); Organo-phosphates (12); Phen.-Triaz. (10); PAH (15); Chlorinated aliphatics (9).

2s/P005
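A toroidal Kohonen map of the kind described above can be sketched in a few lines. The source does not say which software or settings produced the map, so everything specific here is an assumption made for illustration: the grid size, the linearly decaying learning rate and radius, the Gaussian neighbourhood, the synthetic input data, and the function names `torus_offsets`, `best_matching_unit` and `train_toroidal_som`.

```python
import numpy as np

def torus_offsets(index, center, size):
    """Grid distance along one axis when the map wraps around at the edges."""
    d = np.abs(index - center)
    return np.minimum(d, size - d)

def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    d = ((weights - x) ** 2).sum(axis=2)
    return np.unravel_index(np.argmin(d), d.shape)

def train_toroidal_som(data, rows=6, cols=6, epochs=20, seed=0):
    """Train a small self-organising map on a toroidal rows x cols grid."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # initialise every unit with a randomly chosen sample
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    weights = weights.reshape(rows, cols, dim)
    r_idx, c_idx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac) + 0.01                        # assumed schedule
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5   # assumed schedule
            br, bc = best_matching_unit(weights, data[i])
            dr = torus_offsets(r_idx, br, rows)
            dc = torus_offsets(c_idx, bc, cols)
            # Gaussian neighbourhood around the winning unit
            h = np.exp(-(dr**2 + dc**2) / (2.0 * radius**2))
            weights += lr * h[:, :, None] * (data[i] - weights)
            step += 1
    return weights

# Synthetic stand-in for the 10 principal-component scores (NOT the List 1 data)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 1.0, size=(20, 10)),
                  rng.normal(10.0, 1.0, size=(20, 10))])
som = train_toroidal_som(data, rows=6, cols=6, epochs=20)
positions = [best_matching_unit(som, x) for x in data]
print("map position of the first sample:", positions[0])
```

Each compound is placed at the grid position of its best-matching unit; colouring those positions by cluster label gives a picture comparable in spirit to the map described in the text.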