ABSTRACT

Bioconcentration by aquatic biota is an important factor in assessing the environmental behaviour and potential hazard of a chemical, particularly for Persistent, Bioaccumulative and Toxic compounds (PBTs). Since the experimental determination of BCF values is expensive and time-consuming, estimation methods have been widely used to supply missing data. Log P (Kow) is the most widely used physicochemical descriptor for modelling bioconcentration, but for highly hydrophobic chemicals non-linear models must be applied. Analogous results have been obtained by modelling with connectivity indices and polarity correction factors. In this study, the application of the Genetic Algorithm as Variable Subset Selection (GA-VSS) to a wide set (more than 800) of molecular descriptors covering different structural aspects, such as 1D-constitutional, 2D-topological and 3D descriptors (i.e. WHIM and GETAWAY descriptors), produces highly predictive models of BCF in fish for 238 non-ionic organic compounds. The best linear regression model, obtained by Ordinary Least Squares (OLS) regression and in which log Kow was not selected as a molecular descriptor, was validated for its predictivity by leave-one-out, leave-more-out and external validation (the optimal and most representative test set was selected by the Experimental Design technique). The approach shows that a good model (Q²ext = 87.7) can be obtained without using log Kow or introducing polarity correction factors, simply by applying theoretical molecular descriptors calculable from the molecular structure.

INTRODUCTION

Bioconcentration is the process of accumulation of water-borne chemicals by fish and other aquatic animals through non-dietary routes, i.e. by absorption from the water via the respiratory surface and/or the skin (1,2).
The Bioconcentration Factor (BCF) is defined, for a specific compound, as the equilibrium ratio of the chemical concentration in the exposed organism to the concentration of the dissolved chemical in the aquatic environment. BCF can therefore be used as an estimate of a chemical's tendency to accumulate in an aquatic organism, and its estimation is a crucial task in the identification and control of chemicals such as Persistent, Bioaccumulative and Toxic compounds (PBTs). The bioconcentration of chemicals is usually estimated by correlating their BCFs with hydrophobicity, but difficulties arise in modelling extremely hydrophobic and large chemicals. Because of these problems, different approaches using theoretical molecular descriptors of different kinds have been applied, with the principal aim of taking into account the many structural aspects of a molecule that can be relevant in determining bioaccumulation.

The objective of the present study is to propose new QSAR models for BCF prediction, validated by internal and external validation and applicable to a wide range of organic compounds of different chemical structures. Finally, the BCF values predicted by these models are compared with those obtained by the Molecular Connectivity Indices (MCI)-based models of Lu et al. (3) and the Kow-based models of Meylan et al. (4), applied by the U.S. EPA (BCFWIN), in order to verify the reliability and predictive performance of the different estimation models.

MATERIALS and METHODS

EXPERIMENTAL DATA

In this work we used BCF data measured in fish for 238 non-ionic compounds, collected from an extensive literature review by Lu et al. (3). Because our goal is a comparison with that work, no effort was made to verify these data: only acrolein was deleted from the original data set, as it was an outlier.
CHEMOMETRIC METHODS

Multiple Linear Regression analysis and variable selection were performed with the software MOBY DIGS (11), using the Ordinary Least Squares (OLS) regression method and GA-VSS (Genetic Algorithm - Variable Subset Selection) (12). All calculations were performed using the leave-one-out (LOO) and leave-more-out (LMO) procedures and the scrambling of the responses for the validation of the models (13-14). External validations (13-16) were performed on two validation sets obtained by splitting the original data at 50% and 75% by the Experimental Design procedure, applying the software DOLPHIN (17).

MOLECULAR DESCRIPTORS

The molecular structures of the studied compounds were described using several molecular descriptors calculated by the software DRAGON of Todeschini et al. (5). A total of 1166 molecular descriptors of different kinds were calculated to describe the chemical diversity of the compounds. Constant values and pair-correlated descriptors (with a correlation of 1) were excluded, so the variable selection by GA was applied to 965 molecular descriptors. The descriptor typology is:

0D: constitutional descriptors (atom and group counts).
1D: functional groups, atom-centred fragments and empirical descriptors.
2D: BCUTs, Galvez indices from the adjacency matrix, walk counts, various autocorrelations from the molecular graph and topological descriptors.
3D: Randic molecular profiles from the geometry matrix, WHIMs (6-7), GETAWAY (8) and geometrical descriptors.

In addition, 5 quantum-chemical descriptors (HOMO, LUMO, deltaHOMO-LUMO, energies and ionization potential, calculated by the MOPAC PM3 method (9)) and log Kow (taken from the EPIWIN package) (10) were used. The Genetic Algorithm was applied to the set of molecular descriptors reduced by eliminating 237 molecular descriptors that were individually unrelated to the response. The final set of molecular descriptors used as input thus consists of 734 descriptors.
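The first reduction step described in this section (dropping constant descriptors and one member of each pair with a correlation of 1) can be illustrated with a short sketch. This is not the DRAGON or MOBY DIGS procedure itself: the function names, the tolerance and the toy values are assumptions made for illustration, and the "MW_x2" column is an invented duplicate of MW.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_descriptors(table, tol=1e-9):
    """Return the descriptor names kept after two exclusions.

    table maps a descriptor name to its values, one per compound.
    A descriptor is dropped if it is constant, or if its absolute
    correlation with an already kept descriptor equals 1 (within tol).
    """
    kept = []
    for name, values in table.items():
        if max(values) - min(values) <= tol:
            continue  # constant descriptor: carries no information
        if any(abs(abs(pearson(values, table[k])) - 1.0) <= tol for k in kept):
            continue  # perfectly correlated with a descriptor already kept
        kept.append(name)
    return kept


# Invented toy values for four compounds (not data from the study).
toy = {
    "MW": [100.0, 150.0, 200.0, 250.0],
    "MW_x2": [200.0, 300.0, 400.0, 500.0],  # hypothetical exact multiple of MW
    "nConst": [1.0, 1.0, 1.0, 1.0],         # hypothetical constant descriptor
    "nHAcc": [0.0, 2.0, 1.0, 3.0],
}
print(filter_descriptors(toy))  # → ['MW', 'nHAcc']
```

Keeping the first descriptor of each perfectly correlated pair is an arbitrary choice in this sketch; the source does not say which member of a pair was retained.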
RESULTS AND DISCUSSION

SPLITTING of the ORIGINAL DATA SET by applying EXPERIMENTAL DESIGN

On the basis of the structural information represented by all the molecular descriptors used, and also taking into account the BCF responses, the original data set was split by applying the Experimental Design procedure with the software DOLPHIN (17), to obtain a training set of 179 molecules and a validation set of 59 chemicals (or, alternatively, training and test sets of 119 molecules each). This design guarantees that the training and validation sets have well-balanced structural diversity and are also representative of the entire range of the biological response.

The usefulness of QSAR models lies mainly in their predictive applications. For this purpose several validation steps are necessary to avoid overestimating the predictive power of the models and to verify their predictivity:

- Leave-one-out, using the QUIK rule (Q Under Influence of K (18)) to avoid chance correlation.
- Stronger validation using the leave-more-out procedure (25-50%).
- Y scrambling (permutation testing by recalculating models for randomly reordered responses).
- External validation, verified by Q²ext.

REGRESSION LINE of the MODEL obtained on a SELECTED TRAINING SET of 179 CHEMICALS

The molecular descriptors most frequently selected by the Genetic Algorithm as the most informative and predictive of a chemical's tendency to bioconcentrate are related to the dimension of the chemical and to the distribution of polar atoms in the molecule. As expected, the dimensional descriptors (MATS2m (19), IDDM (20)) have positive signs in the proposed models, accounting for the tendency of bigger molecules to bioconcentrate, while the descriptors with negative signs, which account for both polarity factors (H6p (8), GATS2e (21)) and the possibility of forming hydrogen bonds (nHAcc (22)), explain the tendency of more polar chemicals to partition into the aqueous phase.
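Three of the validation steps listed above (leave-one-out, Y scrambling and external validation) can be sketched for an OLS model as follows. The source does not give its formulas, so the definitions here are assumptions: Q²LOO = 1 - PRESS/TSS, and Q²ext = 1 - (sum of squared test-set residuals) / (sum of squared test-set deviations from the training-set mean). The function names and the synthetic data are invented for illustration and are unrelated to the BCF data set; the values are fractions, not percentages.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]


def q2_loo(X, y):
    """Leave-one-out Q2: refit without each compound, then predict it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        coef = ols_fit(X[mask], y[mask])
        press += (y[i] - ols_predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def q2_ext(X_train, y_train, X_test, y_test):
    """External Q2, referenced to the training-set mean (assumed definition)."""
    coef = ols_fit(X_train, y_train)
    resid = y_test - ols_predict(coef, X_test)
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_train.mean()) ** 2)


def y_scrambling(X, y, n_rounds=50, seed=0):
    """Q2_LOO of models refitted on randomly reordered responses."""
    rng = np.random.default_rng(seed)
    return [q2_loo(X, rng.permutation(y)) for _ in range(n_rounds)]


# Synthetic example: 40 "compounds", 3 descriptors, a known linear response.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=40)

print("Q2_LOO:", q2_loo(X, y))
print("Q2_ext:", q2_ext(X[:30], y[:30], X[30:], y[30:]))
print("mean scrambled Q2_LOO:", float(np.mean(y_scrambling(X, y))))
```

A model that reflects a real structure-response relation should give scrambled Q² values far below the Q² of the original model. The leave-more-out step follows the same pattern as `q2_loo`, with a group of compounds held out in each round instead of a single one.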
Our linear models are clearly more predictive than the BCFWIN log Kow-based model (10), whose predictivity has not been verified, and are moreover simpler than the MCI-based model (3). The latter uses 5 connectivity indices and 8 correction factors, giving a 13-dimensional non-linear model that depends strongly on the studied data set through the choice of polar functional groups. Comparing the residuals of the different models shows that the log Kow-based model has the largest RMS, while the MCI-based model and our new models show similar performance.

CONCLUSIONS

- A new predictive model for BCF is proposed.
- This model is based only on theoretical molecular descriptors.
- The Genetic Algorithm is applied for Variable Subset Selection.
- Strong validations demonstrate the stability of the models.
- BCF values can also be predicted for new chemicals (even those not yet synthesised).

REFERENCES

(1) Veith, G.D.; DeFoe, D.L.; Bergstedt, B.V. J. Fish. Res. Board Can. 1979, 36.
(2) Barron, M.G. Environ. Sci. Technol. 1990, 24.
(3) Lu, X.; Tao, S.; Hu, H.; Dawson, R.W. Chemosphere 2000, 41.
(4) Meylan, W.M.; Howard, P.H.; Boethling, R.S.; Aronson, D.; Printup, H.; Gouichie, S. Environ. Toxicol. Chem. 1999, 18.
(5) Todeschini R., Consonni V. and Pavan E. DRAGON - Software for the calculation of molecular descriptors, rel. for Windows. Free download available at
(6) Todeschini, R.; Lasagni, M.; Marengo, E. J. Chemometrics 1994, 8.
(7) Todeschini, R.; Gramatica, P. Quant. Struct.-Act. Relat. 1997, 16.
(8) Consonni, V.; Todeschini, R.; Pavan, M. J. Chem. Inf. Comput. Sci. 2002, in press.
(9) CHEM 3D - Cambridge Soft, 1997, MA, USA.
(10) BCFWIN v. in EPIWIN Package, 2000, U.S. EPA.
(11) Todeschini, R. Moby Digs - Software for multilinear regression analysis and variable subset selection by Genetic Algorithm, rel. 2.3 for Windows, Talete srl, Milan (Italy).
(12) Leardi, R.; Boggia, R.; Terrile, M. J. Chemom. 1992, 6.
(13) Wold, S.; Eriksson, L. Chemometric Methods in Molecular Design, 1995, VCH, Germany.
(14) Shi, L.M.; Fang, H.; Tong, W.; Wu, J.; Perkins, R.; Blair, R.M.; Branham, W.S.; Dial, S.L.; Moland, C.L.; Sheehan, D.M. J. Chem. Inf. Comput. Sci. 2001, 41.
(15) Cramer, R.D.; Patterson, D.E.; Bunce, J.D. J. Am. Chem. Soc. 1988, 110.
(16) Golbraikh, A.; Tropsha, A. J. Mol. Graph. Mod. 2002, 20.
(17) Todeschini, R.; Mauri, A. 2000; DOLPHIN - Software for Optimal Distance-based Experimental Design, rel. 1.1 for Windows, Talete srl, Milan (Italy).
(18) Todeschini, R.; Maiocchi, A.; Consonni, V. Chemom. Intell. Lab. Syst. 1999, 46, 13-29.
(19) Moran, P.A.P. Biometrika 1950, 37, 17-23.
(20) Bonchev, D. Information Theoretic Indices for Characterization of Chemical Structures, 1983, Research Studies Press, Chichester (U.K.), p. 249.
(21) Geary, R.C. Incorp. Statist. 1954, 5.
(22) Todeschini, R. and Consonni, V. Handbook of Molecular Descriptors, Wiley-VCH, Weinheim (Germany), p

LINEAR MODELLING AND PREDICTION OF BIOCONCENTRATION FACTOR (BCF) BY THEORETICAL MOLECULAR DESCRIPTORS
Papa Ester - Gramatica Paola
Dep. Struct. Funct. Biol. - QSAR Research Unit - University of Insubria (Varese - Italy)
Web: