The aquatic toxicity values of 57 esters, with experimental and predicted LC50 in fish, EC50 in Daphnia and seaweed and IGC50 in Entosiphon sulcatum, were studied in the Principal Component space. The first component was found to be the most important, with 61.8% of explained variance, and can be considered a general index of aquatic toxicity. To provide a fast method for ranking the esters according to their aquatic toxicity, PC1 was modelled by theoretical molecular descriptors. The best model, selected by Genetic Algorithm, was verified for stability and predictivity by internal and external validation.

Gramatica, P., Battaini, F., Papa, E.
QSAR and Environmental Chemistry Research Unit, University of Insubria, Varese (Italy). Web:

QSAR PREDICTION OF AQUATIC TOXICITY OF ESTERS

INTRODUCTION

A large number of compounds (more than 100,000) are currently in common use, and about 2,000 new ones appear each year. No data are available for the majority of these compounds, so we have no understanding of their environmental fate, behaviour or effects [1]. This general lack of knowledge has led the European Commission to adopt a “White Paper on a strategy for a future Community Policy for Chemicals” [2]. This Directive requires, at the latest by the end of 2005, physico-chemical data and toxicity data for HPV (High Production Volume) compounds with production volume of 1,000 tonnes/year. Among the HPV compounds, the class of esters is one of the largest and environmentally most “interesting”. Some esters, e.g. phthalates, are known for their weak carcinogenic and estrogenic effects [3]; there is therefore a need to identify these compounds in order to assess their potential health hazard and their impact on the environment. The aim of our research was to develop “local” QSAR (Quantitative Structure-Activity Relationship) models to rapidly predict the toxicity of esters.
As this prediction is based simply on knowledge of the molecular structure, the approach could be usefully applied to new chemicals, even those not yet synthesised, provided they belong to the chemical domain of the training set. In this case it is possible to reduce the cost and the time needed to obtain experimental data.

RESULTS AND DISCUSSION

The most relevant molecular descriptors, calculated by the DRAGON software, were selected by Genetic Algorithm (GA-Variable Subset Selection). For each end-point the best model was validated with several validation techniques:
• Leave-one-out, using the QUIK rule (Q Under Influence of K (18)) to avoid chance correlation.
• Stronger validation using the leave-many-out procedure (15-30%).
• Y scrambling (permutation testing by recalculating models for randomly reordered responses).

The models were not all validated externally owing to the small size of the sets studied (14-30 objects). The reliability of the predictions was always checked by the leverage approach in order to verify the chemical domain of the models. The regression lines of the fish and Daphnia models are reported (outliers and influential chemicals are highlighted). Table 1 shows the performance of the best models for each end-point.

ABSTRACT

Esters are an important class of industrial chemicals, for which the EU-Directive “White Paper on a strategy for a future Community Policy for Chemicals” requires toxicity data by, at the latest, the end of 2005. The object of the study was to develop QSAR models to rapidly predict the aquatic toxicity of esters. Unfortunately, experimental toxicity data are not known for a large number of these compounds or, if known, the data are not all homogeneous, which hinders an accurate and comparable evaluation of the toxicological behaviour of the compounds considered. Different theoretical molecular descriptors (1D-constitutional, 2D-topological, and different 3D-descriptors) are calculated by the DRAGON software.
The Genetic Algorithm (GA-Variable Subset Selection) is used to select the most relevant molecular descriptors in the modelling by Ordinary Least Squares (OLS) regression. The studied end-points are: LC50 in Pimephales promelas, EC50 in Daphnia magna and in seaweed, IGC50 in Entosiphon sulcatum and chronic toxicity in Daphnia magna. The best models were validated for their predictive performance using leave-one-out (Q²LOO = 70-90%), leave-many-out (30% of perturbation, Q²LMO = 70-90%) and the scrambling of the responses. The models were not all externally validated owing to the small size (14-30) of the studied sets. The reliability of the predictions was always checked by the leverage approach in order to verify the chemical domain of the models. A PCA model, based on four acute toxicity end-points, has been proposed to evaluate the trend of aquatic toxicity for the studied esters. The PC1 score is also modelled by theoretical molecular descriptors (Q²LOO = 89%, Q²LMO = 88%): this last model can be used as an evaluative method for screening esters according to their aquatic toxicity, starting only from their molecular structure.

MOLECULAR DESCRIPTORS

The molecular structure of the studied compounds was described using several molecular descriptors calculated by the DRAGON software [8]:
• 0D descriptors: constitutional descriptors (atom and group counts)
• 1D descriptors: functional groups, atom-centred fragments and empirical descriptors
• 2D descriptors: BCUTs, Galvez indices from the adjacency matrix, walk counts, various autocorrelations from the molecular graph and topological descriptors
• 3D descriptors: Randic molecular profiles from the geometry matrix, WHIMs, GETAWAY and geometrical descriptors

CHEMOMETRIC METHODS

Multiple Linear Regression analysis and variable selection were performed by the software MOBY DIGS [9] using the Ordinary Least Squares (OLS) regression method and GA-VSS (Genetic Algorithm-Variable Subset Selection) [10].
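The OLS regression named above, together with the leave-one-out statistic and the leverage check cited in the abstract, can be illustrated with a minimal NumPy sketch. It is not the MOBY DIGS implementation: the function names `fit_ols`, `q2_loo` and `leverages` are hypothetical, the data are synthetic, and the warning leverage h* = 3(p+1)/n is a commonly used convention assumed here, not a value stated in the poster.

```python
import numpy as np

def fit_ols(X, y):
    """OLS coefficients (intercept first) for a descriptor matrix X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def q2_loo(X, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / TSS."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i            # drop object i
        coef = fit_ols(X[keep], y[keep])    # refit without it
        pred = coef[0] + X[i] @ coef[1:]    # predict the left-out object
        press += (y[i] - pred) ** 2
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / tss

def leverages(X):
    """Diagonal of the hat matrix H = A (A'A)^-1 A'."""
    A = np.column_stack([np.ones(len(X)), X])
    H = A @ np.linalg.pinv(A.T @ A) @ A.T
    return np.diag(H)

# Synthetic stand-in for 30 compounds described by 3 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.1, size=30)

q2 = q2_loo(X, y)
h = leverages(X)
h_star = 3 * (X.shape[1] + 1) / len(y)     # assumed warning leverage
outside_domain = np.where(h > h_star)[0]   # objects far from the training domain
print(f"Q2_LOO = {q2:.3f}, objects above h*: {outside_domain.tolist()}")
```

Objects whose leverage exceeds the assumed h* would be flagged as structurally distant from the training set, which is the idea behind checking the chemical domain of a model.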
All the calculations were performed using the leave-one-out (LOO) and leave-many-out (LMO) procedures and the response scrambling for the internal validation of the models. External validation [11-12] was performed on a validation set obtained by splitting the original data set at 75% by an Experimental Design procedure, applying the software DOLPHIN of Todeschini et al. [13]. Regression diagnostic tools such as residual plots and Williams plots were used to check the quality of the best models and to define their applicability with regard to the chemical domain, using the chemometric package SCAN [14]. RMS (residual mean squares) values are also reported for model comparison with ECOSAR [15].

MATERIALS & METHODS

EXPERIMENTAL DATA

The studied end-points are: LC50 in Pimephales promelas, EC50 in Daphnia magna, in Pseudomonas and in seaweed, IGC50 in Entosiphon sulcatum, in Scenedesmus and in Pseudomonas. Also studied was the chronic toxicity of phthalates in Daphnia magna. The experimental data were taken from the literature [4-7], reported in mmol/L and transformed into logarithmic units.

[Figures: regression plots for Log(1/EC50) in Daphnia magna and Log(1/LC50) in fish]

For comparison purposes, the RMS (Residual Mean Squares) values are reported only for LC50 in fish and EC50 in seaweed, as the other end-points are not included in the ECOSAR software. The ECOSAR models for LC50 in fish and our new models show similar performance, but the EPA model for EC50 in seaweed has the largest RMS (Tab. 2). This result appears particularly satisfactory considering that the EPIWIN model was obtained on a training set larger than our data set.

Tab. 1: Model Performances
Tab. 2: Comparison of models

Aquatic Toxicity: n.obj = 43, R² = 91.5%, Q² = 89.9%, Q²LMO30% = 89.9%, Q²EXT = 95.6%

CONCLUSIONS

• New predictive “local” models for ecotoxicity end-points of esters are proposed.
• These models are based only on theoretical molecular descriptors selected by Genetic Algorithm.
• All models have good predictive power, verified by internal validation techniques.
• Principal Component Analysis has been used to propose a ranking of esters by global aquatic toxicity based on 4 acute toxicity end-points (LC50 in fish, EC50 in Daphnia magna and in seaweed, IGC50 in Entosiphon sulcatum).
• The PC1 score highlights the global trend of aquatic toxicity and is modelled by theoretical molecular descriptors. This model can be used for the screening and ranking of esters according to their global toxicity, starting only from their structure.
• The application of these models reduces animal testing and minimises the time and money needed to obtain experimental data.

REFERENCES

[1] Gramatica P., Fine Chemicals and Intermediates technologies (Chemistry Today), 1991, 18-24;
[2]
[3] Thomsen M. et al., Chemosphere, 1999, 38,
[4] Cash G.G. and Clements R.G., SAR and QSAR in Environmental Research, 1996, 5;
[5] European Commission, Joint Research Centre, IUCLID CD-ROM, 2000;
[6] Verschueren K., Handbook of Environmental Data on Organic Chemicals, 1983, 2nd Edition, Van Nostrand Reinhold;
[7] Rhodes J.E. et al., Environmental Toxicology and Chemistry, 1995, 14,
[8] Todeschini R., Consonni V. and Pavan E., DRAGON - Software for the calculation of molecular descriptors, rel for Windows. Free download available at
[9] Todeschini R., Moby Digs - Software for multilinear regression analysis and variable subset selection by Genetic Algorithm, rel. 2.3 for Windows, Talete srl, Milan (Italy);
[10] Leardi R., Boggia R., Terrile M., J. Chemom., 1992, 6;
[11] Wold S., Eriksson L., Chemometric Methods in Molecular Design, 1995, VCH, Germany;
[12] Golbraikh A., Tropsha A., J. Mol. Graph and Mod., 2002, 20,
[13] Todeschini R. and Mauri A., 2000; DOLPHIN - Software for Optimal Distance-based Experimental Design, rel. 1.1 for Windows, Talete srl, Milan (Italy);
[14] SCAN - Software for Chemometric Analysis, rel. 1.1 for Windows, Jerll Inc., Stanford, CA, 1992;
[15] ECOSAR in EPIWIN-EPI Suite 2001, Ver. 3.10, Environmental Protection Agency
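
As a supplementary illustration of the PC1-based ranking described in the conclusions, the following sketch autoscales a matrix of toxicity end-points, extracts the first principal component by singular value decomposition, and ranks the objects by their PC1 score. The data are synthetic (20 hypothetical compounds, 4 end-points sharing one latent trend), the function name `pc1_scores` is an assumption, and the sign convention that a higher score means higher toxicity is a choice made here for readability.

```python
import numpy as np

def pc1_scores(Y):
    """Autoscale the columns of Y; return PC1 scores and the fraction
    of variance explained by PC1."""
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = Z @ Vt[0]
    # The sign of a principal component is arbitrary: orient PC1 so that
    # it increases with the average of the scaled end-points.
    if np.corrcoef(scores, Z.mean(axis=1))[0, 1] < 0:
        scores = -scores
    return scores, explained[0]

# Synthetic log(1/C) values: one common toxicity trend plus noise.
rng = np.random.default_rng(1)
trend = rng.normal(size=20)
Y = np.outer(trend, [1.0, 0.9, 0.8, 0.7]) + 0.3 * rng.normal(size=(20, 4))

scores, ev1 = pc1_scores(Y)
ranking = np.argsort(-scores)   # most toxic first under the assumed sign
print(f"PC1 explains {ev1:.1%} of the variance")
print("Ranking (object indices):", ranking.tolist())
```

In the poster's workflow the PC1 score would then become the response of a regression on molecular descriptors, so that new esters can be ranked from structure alone.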