ABSTRACT

The BEAM EU research project focuses on the risk assessment of mixture toxicity. A data set of 124 heterogeneous chemicals of high concern as environmental pollutants has been studied for toxicity on Scenedesmus vacuolatus. Several chemometric techniques were applied to the experimental toxicity data with the aim of developing a “universal” QSAR able to describe and predict the toxicity of structurally heterogeneous and dissimilarly acting chemicals. The chemical structure of the compounds was described with several types of theoretical molecular descriptors calculated by the software DRAGON [1]. The Genetic Algorithm approach was used as the Variable Subset Selection (VSS) method applied to OLS regression. In order to verify the predictive capability of the developed QSAR models, a training set was selected by Experimental Design. OLS models were developed on 76 chemicals selected as the training set for the two parameters “a” (correlated with EC50 values) and “b” (steepness) of the Weibull model. Counter Propagation Artificial Neural Network (CP-ANN) approaches were also used to verify the utility of non-linear techniques. The methodologies, applied to the overall data set of 124 chemicals, performed unsatisfactorily in validation, demonstrating that a “universal” QSAR model is not possible when chemicals differ significantly in structure and mode of action. This highlights the essential need for data set representativity for the successful application of QSAR. Moreover, QSAR models developed on limited data sets of compounds that are more similar in both structure and mode of action show high predictive performance.

MATERIALS and METHODS

Molecular descriptors

The molecular descriptors were calculated with the software DRAGON [1]. A total of 1500 molecular descriptors of different kinds were used to describe the chemical diversity of the compounds. The descriptor typology is:
0D: Constitutional descriptors.
1D: Empirical, Functional groups, Properties, Atom-centred fragments descriptors.
2D: Autocorrelations, Topological, Molecular walk counts, Galvez topological charge indices, BCUT descriptors.
3D: Geometrical, Randic molecular profiles, WHIM, GETAWAY, RDF, 3D-MoRSE, Charge descriptors.
In addition, five quantum-chemical descriptors (HOMO, LUMO and HOMO-LUMO gap energies, heat of formation and ionization potential Ei,v), calculated by MOPAC (PM3 method), and the experimental log Kow were always added as molecular descriptors.

Experimental data

The QSAR models have been developed on the EC50 values of 124 chemicals (with defined mode of action, tested experimentally for toxicity on Scenedesmus vacuolatus by the research group of Prof. Grimme, Bremen University, EU project: BEAM EVK) and on the two parameters “a” and “b” of the Weibull model. The first parameter, “a”, expresses the location of the sigmoidal toxicity curve and is tightly correlated with the EC50 value, while parameter “b” expresses the steepness of the toxicity curve. The chemicals in this data set are currently in common use as antifouling agents, antioxidants, bactericides, chemotherapeutics, disinfectants, fungicides, herbicides, insecticides, tools in physiological research and industrial chemicals.

Chemometric methods

Multiple Linear Regression analysis and Variable Selection were performed with the software MOBY-DIGS [2], using the Ordinary Least Squares (OLS) regression method and Genetic Algorithm-VSS. In order to verify the predictive capability of the developed QSAR models, a test set was selected by an Experimental Design procedure with the software DOLPHIN [3]. Regression diagnostic tools, such as residual plots and Williams plots, were used to check the quality of the best models and to define their applicability with regard to the chemical domain. Counter Propagation Artificial Neural Network (CP-ANN) approaches were also used to verify the utility of non-linear techniques.
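As an illustration of the regression diagnostics mentioned above, the following is a minimal sketch of how the coordinates of a Williams plot (leverage versus standardised residual) could be computed for an OLS model. It is not the MOBY-DIGS implementation. The descriptor values, the responses, the helper names (`solve`, `fit_ols`, `leverage`, `williams_points`), the use of residuals divided by the residual standard deviation, and the warning leverage h* = 3p'/n (p' = number of model coefficients, including the intercept) are all assumptions made for the example.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares; return (coefficients, X'X)."""
    D = [[1.0] + list(row) for row in X]  # design matrix with an intercept column
    p = len(D[0])
    XtX = [[sum(r[i] * r[j] for r in D) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(D, y)) for i in range(p)]
    return solve(XtX, Xty), XtX

def leverage(XtX, x):
    """Leverage h = d' (X'X)^-1 d of one chemical, where d = [1, x1, x2, ...]."""
    d = [1.0] + list(x)
    return sum(di * zi for di, zi in zip(d, solve(XtX, d)))

def williams_points(X, y):
    """Return [(leverage, standardised residual)], the warning leverage h*, and X'X."""
    coef, XtX = fit_ols(X, y)
    n, p = len(X), len(coef)
    resid = [yi - sum(c * v for c, v in zip(coef, [1.0] + list(row)))
             for row, yi in zip(X, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p))  # residual standard deviation
    h = [leverage(XtX, row) for row in X]
    return [(hi, e / s) for hi, e in zip(h, resid)], 3.0 * p / n, XtX

# Hypothetical training data: two descriptors per chemical and a toxicity response.
X_train = [[0.5, 1.0], [1.0, 0.2], [1.5, 2.0], [2.0, 1.1], [2.5, 3.0],
           [3.0, 0.5], [3.5, 2.4], [4.0, 1.6], [4.5, 3.2], [5.0, 0.9]]
y_train = [1.1, 1.3, 2.6, 2.5, 4.0, 3.0, 4.4, 4.3, 5.9, 4.8]

points, h_star, XtX = williams_points(X_train, y_train)
for i, (h, r) in enumerate(points):
    note = "  <- high leverage" if h > h_star else ""
    print(f"chemical {i}: h = {h:.3f}, std. residual = {r:+.2f}{note}")

# A new chemical is considered inside the structural domain only if h <= h*.
print("new chemical inside domain:", leverage(XtX, [10.0, 10.0]) <= h_star)
```

In a plot of these points, chemicals beyond the vertical line h = h* would be structurally influential or outside the domain, and chemicals with large absolute standardised residuals would be response outliers.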
QSAR MODELLING ON THE OVERALL DATASET

Ordinary Least Squares regression by Genetic Algorithm Variable Selection (OLS-GA)

Unfortunately, the obtained models were found to be unsatisfactory due to their low predictive capability (even after the elimination of some outliers).
Not satisfactory model.

Regression by Counter Propagation Artificial Neural Networks (CP-ANN)

A CP-ANN approach was applied to the experimental EC50 toxicity values of a selected training set of 70 chemicals in order to develop a QSAR regression model with a non-linear technique. The 13 significant principal components of the molecular descriptors were used as predictor variables. The best model was developed with a map of 8x8 neurons and 50 learning epochs. The obtained model turned out to be unsatisfactory due to its low predictive power.
Not satisfactory model.

Did not work!!! A “universal” QSAR model is not possible when the chemicals are significantly different in both structure and mode of action.

QSAR MODELLING ON A MORE REPRESENTATIVE DATASET

For this reason, we decided to model the EC50 data for a reduced data set of 101 chemicals, including only the chemicals with the more represented modes of action: amino acid biosynthesis, DNA synthesis and function, lipid biosynthesis, photosynthetic electron transport, steroid biosynthesis and unspecific action.

OLS model obtained on selected training set

The best models with good predictive power, on the 101 chemicals and on the split training set, are based on the same molecular descriptors: counts of different nitrogen groups (nCONR2, nCONN, nNHRPh), calculated LogKow (KLOGKow), a 3D descriptor of shape (PJI3) and a 3D-GETAWAY autocorrelation descriptor (HATS3u). The regression line of the externally validated model is reported (outliers among the training and test set chemicals are highlighted).

Regression by Counter Propagation Artificial Neural Networks (CP-ANN)

The CP-ANN approach was applied to the experimental EC50 toxicity values of the reduced data set of 101 chemicals, which includes only the chemicals with the more represented modes of action. As predictor variables we used the four descriptors most frequently present in the population of OLS models. The best model was developed with a map of 8x8 neurons and 100 learning epochs.

Now it works!!! Satisfactory predictive power.

SUBSETS OF CHEMICALS WITH THE SAME MODE OF ACTION

OLS model on photosynthetic electron transport inhibitors (49 chemicals)
OLS model on steroid biosynthesis inhibitors (17 chemicals)
OLS model on compounds with unspecific mode of action (18 chemicals)

CONCLUSIONS

The QSAR models obtained on reduced data sets, selected for representativity and for similarity of mode of action, are all of good quality. Their predictive performance and stability have been verified by internal validation (Q² and Q²LMO). For a stronger evaluation of model applicability for prediction on new chemicals, external validation (verified by Q²ext) of all models is also recommended [4] and was performed here. The chemical domain of applicability of the proposed models for new chemicals must always be verified by the leverage approach, taking into account that some of these models have been developed on relatively small data sets. All the proposed models are based on different molecular descriptors, mainly theoretical, encoding different features of the chemical structures related to the modelled end-points. The logKow parameter is selected only in the models for unspecific mode of action (probably because it is related to baseline toxicity) and in the global models, thus demonstrating that other molecular descriptors, more closely related to the chemical structure, are able to describe and predict the toxicity.

REFERENCES

[1] Todeschini R., Consonni V. and Pavan M. DRAGON, version (WINDOWS/PC); Milano, Italy. Program for the calculation of molecular descriptors from HyperChem, Tripos, MDL file, SYBYL molfile formats from ChemOffice and Tripos molecular design software. Free download available at:
[2] Todeschini R. Moby Digs/Evolution, rel. 2.0, Talete, Milano, Italy.
[3] Todeschini R. and Mauri A. DOLPHIN - Software for Optimal Distance-Based Experimental Design, rel. 1.1 for Windows, Talete srl, Milan (Italy).
[4] Tropsha A., Gramatica P. and Gombar V.K. The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models. Quant. Struct.-Act. Relat. 22.

CHEMOMETRIC METHODOLOGIES FOR THE MODELLING OF HETEROGENEOUS CHEMICALS TOXICITY: DATASET REPRESENTATIVITY AS THE ABSOLUTE ESSENTIAL
Paola Gramatica 1, Viviana Consonni 2, Manuela Pavan 2, Pamela Pilutti 1 and Ester Papa 1
1 QSAR and Environmental Chemistry Research Unit - INSUBRIA University (Varese - ITALY)
2 Milano Chemometrics & QSAR Research Group - Milano Bicocca University (Milano - ITALY)
web: http://dipbsf.uninsubria.it/qsar/

Financially supported by the Commission of the European Union (BEAM EVK).
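As a rough illustration of the validation statistics named in the conclusions, the sketch below computes a cross-validated Q² (leave-one-out or leave-many-out, depending on the groups left out) and an external Q²ext for a one-descriptor OLS model. The poster does not give the formulas, so the definitions used here are assumptions: Q² = 1 - PRESS/TSS, with TSS taken around the training-set mean of the response in both cases. All data values and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1

def q2_cv(x, y, groups):
    """Cross-validated Q2: each group of indices is left out once and predicted
    by a model refitted on the remaining chemicals."""
    press = 0.0
    for left_out in groups:
        keep = [i for i in range(len(x)) if i not in left_out]
        b0, b1 = fit_line([x[i] for i in keep], [y[i] for i in keep])
        press += sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in left_out)
    my = sum(y) / len(y)
    return 1.0 - press / sum((yi - my) ** 2 for yi in y)

def q2_ext(x_train, y_train, x_test, y_test):
    """External Q2: the model is fitted on the training set only and evaluated
    on test chemicals that took no part in model development."""
    b0, b1 = fit_line(x_train, y_train)
    my_train = sum(y_train) / len(y_train)
    press = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x_test, y_test))
    return 1.0 - press / sum((yi - my_train) ** 2 for yi in y_test)

# Hypothetical descriptor values and responses.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.1, 0.9, 2.1, 2.9, 4.1]
x_test = [1.5, 2.5]
y_test = [1.6, 2.4]

q2_loo = q2_cv(x_train, y_train, [[i] for i in range(len(x_train))])  # leave-one-out
q2_lmo = q2_cv(x_train, y_train, [[0, 1], [2, 3], [4]])               # leave-many-out
q2_external = q2_ext(x_train, y_train, x_test, y_test)
print(f"Q2(LOO) = {q2_loo:.3f}, Q2(LMO) = {q2_lmo:.3f}, Q2(ext) = {q2_external:.3f}")
```

A large gap between the fitted quality and any of these values would suggest an overfitted or unstable model. Q²ext is the only one of the three that uses chemicals never seen during model development.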