CONCLUSIONS

- Missing values of the principal physico-chemical properties are predicted by validated regression models built on different kinds of molecular descriptors.
- POPs are ranked according to their atmospheric mobility tendency by two multivariate approaches: a) Principal Component Analysis of the predicted physico-chemical data, for long-range transport potential; b) Multicriteria Decision Making, which also takes into account the atmospheric half-life and the sorption on atmospheric particles, for more general environmental behaviour.
- Classification of POPs into 4 mobility classes allows a direct and simple assessment of POP environmental behaviour (for both existing and new chemicals) from the molecular structure alone.

Paola Gramatica, Stefano Pozzi, Federica Consolaro and Roberto Todeschini §
QSAR Research Unit, Dep. of Structural and Functional Biology, University of Insubria, via Dunant 3, Varese (Italy)
§ Milano Chemometric and QSAR Res. Group, Dep. of Environmental Sciences, University of Milano-Bicocca, via Emanueli 15, Milano (Italy)
web-site:

INTRODUCTION

Persistent Organic Pollutants (POPs), particularly PAHs, PCBs, polychlorinated dibenzo-p-dioxins and some pesticides, are organic compounds that bioaccumulate and resist photolytic, chemical and biological degradation. The discovery that these compounds can travel thousands of kilometres from the point of release, by the so-called “grasshopper effect” (Figure 1), has revealed their global distribution. POPs with this long-range transport potential are now the focus of various national and international regulatory initiatives (UNECE, UNEP, etc.) and were the subject of the recent SETAC Pellston Workshop.
POP environmental behaviour is controlled by a variety of physical and chemical processes, which can be studied through properties such as vapour pressure (vp), Henry’s law constant (H), various partition coefficients (Kow, Koc, ...), water solubility (S), atmospheric half-life, etc. Unfortunately, for a large number of POPs the experimental data for several of these properties are unknown, so regression models following the QSAR/QSPR strategies must be developed to predict the missing values. From these predicted data, POPs can be ranked by atmospheric mobility with multivariate approaches (PCA and Multicriteria Decision Making). Classifying POPs into mobility classes from a few molecular descriptors of chemical structure then allows fast screening of existing and new compounds.

MOLECULAR DESCRIPTORS

The QSAR/QSPR approach is applied to predict the values of several physico-chemical properties (lacking for a large number of POPs) by regression models that use different kinds of molecular descriptors to represent the structure of the studied compounds. Molecular descriptors are the way in which the chemical information contained in the molecular structure is transformed and coded. Among the theoretical descriptors, the best known, obtained simply from the formula, are molecular weight and count descriptors (1D-descriptors, i.e. counts of bonds, of atoms of different kinds, presence or counts of functional groups and fragments, etc.). Graph-invariant descriptors (2D-descriptors, including both topological and information indices) are obtained from the molecular topology. WHIM molecular descriptors [1] contain information about the whole 3D molecular structure in terms of size, symmetry and atom distribution.
The WHIM indices are calculated [2] from the (x,y,z)-coordinates of a three-dimensional structure of the molecule, usually a minimum-energy conformation: 37 non-directional (or global) and 66 directional WHIM descriptors are obtained. In total, a set of about two hundred molecular descriptors was obtained.

Since each chemical is represented by a large number of molecular descriptors, an effective variable selection strategy, GA-VSS (Genetic Algorithm - Variable Subset Selection), was applied to the whole set of descriptors to pick out the variables most relevant for modelling POP properties by Ordinary Least Squares (OLS) regression, maximising the leave-one-out predictive power (Q2 LOO) [3]. Models with good predictive performance (Q2 LOO = 78-96%) were obtained for all the physico-chemical properties and for the atmospheric half-life, giving reliable data for 87 compounds.

[1] Todeschini R. and Gramatica P.; Quant. Struct.-Act. Relat. 1997, 16,
[2] Todeschini R. - WHIM-3D / QSAR - Software for the calculation of the WHIM descriptors. Rel. 4.1 for Windows, Talete srl, Milan (Italy). Download:
[3] Todeschini R. - Moby Digs - Software for Variable Subset Selection by Genetic Algorithms. Rel. 1.0 for Windows, Talete srl, Milan (Italy)

PRINCIPAL COMPONENT ANALYSIS

The biplot of the principal component analysis (Figure 2) for 87 POPs (Tab. 1), described by the principal physico-chemical properties (boiling point, melting point, logKow, logKoc, Henry’s law constant, TSA, Vmol, water solubility, vapour pressure) and the atmospheric half-life, shows the compounds distributed along the first component (PC1, EV = 70.4%) according to the global mobility categories assigned by Wania and Mackay [4]. Consequently, the PC1 score values can be used to assign each of the 87 POPs to one of four mobility classes (high, relatively high, relatively low and low mobility), defined by the marked cut-offs.
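The PC1-based class assignment described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data matrix is synthetic, the cut-off values are hypothetical (the poster marks its cut-offs on the biplot and does not list them), PC1 is obtained by simple power iteration rather than by the software used by the authors, and low PC1 scores are taken to mean low mobility, as the desirability criteria later in the poster imply.

```python
import math

def autoscale(rows):
    """Centre each column and scale it to unit standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]

def first_component(z, iterations=500):
    """Return (loadings, scores, explained-variance fraction) of PC1.

    Power iteration on the correlation matrix; the positive start vector
    fixes the otherwise arbitrary sign of the component.
    """
    n, p = len(z), len(z[0])
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                     for a in range(p))
    scores = [sum(row[j] * v[j] for j in range(p)) for row in z]
    # The trace of a correlation matrix equals the number of variables p.
    return v, scores, eigenvalue / p

def mobility_class(score, cutoffs):
    """Class 1 (high mobility) for the highest scores, up to class 4 (low)."""
    return 1 + sum(score < c for c in cutoffs)

# Synthetic property matrix: 6 hypothetical compounds x 3 correlated properties.
data = [
    [1.0, 2.1, 0.9],
    [2.0, 3.9, 2.1],
    [3.0, 6.2, 2.9],
    [4.0, 8.1, 4.2],
    [5.0, 9.8, 5.1],
    [6.0, 12.2, 5.8],
]
HYPOTHETICAL_CUTOFFS = (-1.0, 0.0, 1.0)  # not the poster's values

loadings, scores, explained = first_component(autoscale(data))
pc1_classes = [mobility_class(s, HYPOTHETICAL_CUTOFFS) for s in scores]
```

With real data, each row would hold the predicted properties of one POP, and the cut-offs would be read from the score plot.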
This PC1-based model does not account for the atmospheric half-life, because this property is represented only in the second component; it is therefore a model of the relative long-range transport potential of POPs.

[4] Wania F. and Mackay D., Environ. Sci. Technol., Vol. 30, No. 9, 1996

FIGURE 2 (biplot annotations: categories of Wania and Mackay; some of the variables represented in the first principal component; not represented in the first PC)

“DESIRABILITY” OF POPs ACCORDING TO THEIR ATMOSPHERIC MOBILITY

The main goal of this work is to propose a ranking of POPs according to their atmospheric mobility. In addition to the long-range transport potential, modelled by PC1, the half-life of the compounds and their sorption on atmospheric particles must be considered, because both influence atmospheric mobility. A chemometric strategy known as “Multicriteria Decision Making”, in particular the desirability functions, was used for this purpose. In this work, less mobile POPs are considered more desirable, so the following criteria are applied:

- first principal component (PC1) score values as mobility index: optimum = low values
- logKoc as atmospheric particle sorption index: optimum = high values
- atmospheric half-life values: optimum = low values

The desirability values for each compound were calculated with a linear function. In Figure 3 the desirability values are plotted against the molecule ID; the resulting mobility trend appears very similar to the real-world distribution.

TABLE (class legend: high mobility, relatively high mobility, relatively low mobility, low mobility)
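A minimal sketch of the three desirability criteria listed above may help. The poster states only that a linear function was used; the choice of the observed minimum and maximum as the end points of each function, the geometric mean used to combine the three values, and the compounds A, B and C with their numbers are all assumptions made for illustration.

```python
import math

def linear_desirability(value, worst, best):
    """Linear desirability: 0 at `worst`, 1 at `best`, clipped outside."""
    d = (value - worst) / (best - worst)
    return min(1.0, max(0.0, d))

def overall_desirability(values):
    """Geometric mean of the single desirabilities (assumed aggregation)."""
    return math.prod(values) ** (1.0 / len(values))

# Hypothetical compounds: (PC1 score, logKoc, atmospheric half-life).
compounds = {
    "A": (-2.0, 6.0, 10.0),
    "B": (0.0, 4.0, 50.0),
    "C": (2.0, 2.0, 100.0),
}

pc1 = [v[0] for v in compounds.values()]
koc = [v[1] for v in compounds.values()]
half = [v[2] for v in compounds.values()]

desirability = {}
for name, (score, log_koc, half_life) in compounds.items():
    d_pc1 = linear_desirability(score, worst=max(pc1), best=min(pc1))        # optimum = low
    d_koc = linear_desirability(log_koc, worst=min(koc), best=max(koc))      # optimum = high
    d_half = linear_desirability(half_life, worst=max(half), best=min(half)) # optimum = low
    desirability[name] = overall_desirability([d_pc1, d_koc, d_half])

# Most desirable (least mobile) compound first.
ranking = sorted(desirability, key=desirability.get, reverse=True)
```

One consequence of the geometric mean is that a compound at the worst value of any single criterion gets an overall desirability of zero; an arithmetic mean would be more forgiving.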
FIGURE 1 (grasshopper effect; labelled compounds: DDT, HCB)

Selected molecular descriptors

C = count descriptors
T = topological descriptors
W-DIR = directional WHIM descriptors
W-ND = non-directional WHIM descriptors

NAT = number of atoms (C)
NBO = number of bonds (C)
CHI0 = Randic chi-0 (T)
CHI1A = Randic chi-1 (average) (T)
GSI = Gordon-Scantlebury index (T)
BAL = Balaban index (T)
IAC = index of atomic composition (T)
IDDE = total index on equivalence of degrees (T)
DELS = total electrotopological difference (T)
ROUV = Rouvray index (T)
MAXDP = maximum electrotopological difference (T)
MW = molecular weight
NCl = number of Cl (C)
L1m = dimension along the first component with atomic mass weight (W-DIR)
L2s = size along the second component with electrotopological weight (W-DIR)
E1u, E3u = density along the first and the third dimension, respectively, with unit weight (W-DIR)
P2u = shape along the second component with unit weight (W-DIR)
P1v = shape along the first component with van der Waals volume weight (W-DIR)
Tm, Ts = size (eigenvalue sum) with atomic mass and electrotopological weight, respectively (W-ND)
Av = size (cross-term eigenvalue sum) with van der Waals volume weight (W-ND)
Vu = size (complete eigenvalue expression) with unit weight (W-ND)

LDA MOLECULAR DESCRIPTORS: CHI0, IAC, DELS, ROUV, MAXDP, MW, NCl, L1m, E3u, P1v, L2s, Ts, Tm, Vu, Av

Classification methods in the results panel: CART (162 descriptors), RDA (first parameter = 0.5; γ = 0.0) and K-NN (K = 3), each with a cross-validated misclassification rate (MR cv = %).

CLASSIFICATIONS

POPs were classified according to their environmental behaviour by means of several classification methods (CART, K-NN and RDA). The a priori classes were obtained from the desirability values: high mobility (class 1) ≤ 0.33, relatively high (class 2) = [ ], relatively low (class 3) = [ ], low mobility (class 4) > . All the classification methods give models with satisfactory predictive power (results below).
The simplest model, and consequently the most directly applicable, is the one developed with CART (Figure 4): the selected descriptors are mainly related to molecular size. Tab. 1 reports the a priori classes (cla) and the classes predicted by each model for all POPs: most compounds are assigned to the same class by all the classification methods, and only compounds at the border between two contiguous classes are classified differently. It must be noted, however, that no compound has been assigned to a non-adjacent class and that the cut-off values are not strictly definable.

CLASSIFICATION: NOMMR = %
FIGURE 4
FIGURE 3 (scale labels: high mobility, low mobility)
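The cross-validated misclassification rate (MR cv) quoted for the classifiers can be illustrated with a small K-NN example using K = 3, as on the poster. Several points are assumptions rather than facts from the poster: the cross-validation is taken to be leave-one-out, the distance is Euclidean, NOMMR is read as the no-model misclassification rate (the error made by assigning every compound to the most populated class), and the descriptor values and class labels are synthetic.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority class among the k nearest training points (Euclidean).

    On a tie, the class of the nearest tied neighbour wins, because
    Counter keeps first-seen order among equal counts.
    """
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_misclassification_rate(x, y, k=3):
    """Percentage of objects misclassified when each is left out in turn."""
    errors = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_x, rest_y, x[i], k) != y[i]:
            errors += 1
    return 100.0 * errors / len(x)

def no_model_rate(y):
    """Percentage misclassified if every object goes to the largest class."""
    largest = Counter(y).most_common(1)[0][1]
    return 100.0 * (len(y) - largest) / len(y)

# Synthetic descriptors (two per compound) and a priori classes 1..4.
# The point (5.2, 0.0) sits at the border between classes 1 and 2.
x = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.2, 0.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
    (30.0, 0.0), (30.0, 1.0), (31.0, 0.0),
]
y = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

mr_cv = loo_misclassification_rate(x, y, k=3)
nommr = no_model_rate(y)
border_prediction = knn_predict(x[:3] + x[4:], y[:3] + y[4:], x[3], k=3)
```

In this toy set the only misclassified object is the border compound, and it falls into the adjacent class, which mirrors the behaviour reported for the POP classifications.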