
ABSTRACT

The behavior and fate of chemicals in the environment are strongly influenced by the inherent properties of the compounds themselves, particularly by basic physico-chemical properties such as solubility in water, vapour pressure, melting point, boiling point, flash point and density. Knowledge of physico-chemical properties is of fundamental interest in risk assessment studies and is a specific requirement of the EU-Directive “White Paper on a strategy for a future Community Policy for Chemicals”, particularly for High Production Volume (HPV) compounds.

In this paper a data set of 153 esters has been studied. Applying the Genetic Algorithm as Variable Subset Selection (GA-VSS) to a wide set of theoretical molecular descriptors covering different structural aspects (1D-, 2D- and 3D-descriptors, calculated with the Dragon software) produces highly predictive models of the studied physico-chemical properties. The best linear models, obtained by Ordinary Least Squares (OLS) regression, were validated for predictivity both internally, using leave-one-out (Q² = 78-94%), leave-many-out (30-50%) and Y-scrambling, and externally (Q²EXT = 88-94%, except for the melting point model). The data set was split into a training set and an evaluation set by D-optimal Experimental Design. The reliability of the predictions was checked by the leverage approach in order to verify the chemical domain of the models.

The proposed class-specific QSAR models give fast access to the physico-chemical properties of existing esters. The approach could also be applied usefully to new chemicals, even those not yet synthesised, as it relies only on knowledge of the molecular structure.

INTRODUCTION

Esters are an important class of industrial chemicals, and some of them are High Production Volume (HPV) compounds, i.e. compounds with a production volume of more than 1,000 tonnes/year.
The EU-Directive “White Paper on a strategy for a future Community Policy for Chemicals” is directed towards such compounds, requiring physico-chemical data by the end of 2005 at the latest [1]. Experimental testing is both costly and time-consuming, and the systematic determination of missing data by laboratory experiments would place an enormous economic burden on industry and regulatory authorities [2]. To overcome the lack of data on the physico-chemical properties and environmental fate of organic chemicals in environmental risk assessment, quantitative structure-property/activity relationships between descriptors of chemical compounds and their physical, chemical and biological properties have been studied extensively in recent years.

The object of the study was to develop QSAR models to rapidly predict some physico-chemical properties of esters.

MATERIALS & METHODS

EXPERIMENTAL DATA: The experimental physico-chemical property data for 153 esters were taken from the literature [3, 4]. The data were measured at 20-25°C and 1 atm. The end-points studied were solubility in water, vapour pressure, melting point, boiling point, flash point and density. Solubility and vapour pressure data are expressed on a logarithmic scale.

MOLECULAR DESCRIPTORS: The molecular structure of the studied compounds was described using several molecular descriptors calculated by the software DRAGON of Todeschini et al. [5]. A total of 1198 molecular descriptors of different kinds were calculated to describe the chemical diversity of the compounds. Constant-valued descriptors and pair-correlated descriptors (correlation threshold 0.98) were excluded, leaving 422 molecular descriptors to which GA variable selection was applied. The descriptor typology is:
0D: Constitutional descriptors (atom and group counts).
1D: Functional groups, atom-centred fragments and empirical descriptors.
2D: BCUTs, Galvez indices from the adjacency matrix, walk counts, various autocorrelations from the molecular graph and topological descriptors.
3D: Randic molecular profiles from the geometry matrix, WHIMs [6-7], GETAWAY [8] and geometrical descriptors.

CHEMOMETRIC METHODS: Multiple Linear Regression analysis and variable selection were performed with the software MOBY-DIGS of Todeschini et al. [9], using the Ordinary Least Squares (OLS) regression method and Genetic Algorithm-Variable Subset Selection [10]. The best models were validated in several ways:
- Leave-one-out: each chemical in turn is left out of the training set and predicted.
- Leave-many-out: up to 50% of the chemicals, randomly selected, are left out of the training set.
- Y-scrambling: the responses are randomly permuted.

External validation was performed on a validation set obtained by splitting the original data set at 75% using an Experimental Design procedure (software DOLPHIN of Todeschini et al. [11]). Regression diagnostic tools such as residual plots and Williams plots were used to check the quality of the best models and to define their applicability with regard to the chemical domain, using the chemometric package SCAN [12]. RMS (residual mean squares) values are also reported for model comparison with EPIWIN [13].

CONCLUSIONS

- New “ad-hoc” predictive models are proposed for physico-chemical properties such as solubility in water, vapour pressure, melting point, boiling point, flash point and density.
- These models are based on theoretical molecular descriptors selected by Genetic Algorithm.
- All proposed models have good predictive power, verified by very strong internal validation (50%) and also by external validation.
- Comparison of the residuals shows that our models generally perform better than EPIWIN.
- Physico-chemical property values can be predicted for esters belonging to the chemical domain of the models (leverage approach for applicability), including new chemicals, even those not yet synthesised.

REFERENCES

[1] http://europa.eu.int/comm/environmental/chemicals/whitepaper.htm
[2] Gramatica P. Fine Chemicals and Intermediates technologies (Chemistry Today), 1991, 18-24.
[3] Syracuse Corporation Americana, http://esc.syrres.com
[4] European Commission – Joint Research Centre, IUCLID CD-ROM, 2000.
[5] Todeschini R., Consonni V. and Pavan E. 2002. DRAGON – Software for the calculation of molecular descriptors, rel. 1.12 for Windows. Free download available at http://www.disat.unimib/chm
[6] Todeschini, R.; Lasagni, M.; Marengo, E. J. Chemometrics, 1994, 8, 263-273.
[7] Todeschini, R.; Gramatica, P. Quant. Struct.-Act. Relat., 1997, 16, 113-119.
[8] Consonni, V.; Todeschini, R.; Pavan, M. J. Chem. Inf. Comput. Sci., 2002, 42, 693-705.
[9] Todeschini, R., 2001. Moby Digs – Software for multilinear regression analysis and variable subset selection by Genetic Algorithm, rel. 2.3 for Windows, Talete srl, Milan (Italy).
[10] Leardi, R.; Boggia, R.; Terrile, M. J. Chemom., 1992, 6, 267-281.
[11] Todeschini, R.; Mauri, A., 2000. DOLPHIN – Software for Optimal Distance-based Experimental Design, rel. 1.1 for Windows, Talete srl, Milan (Italy).
[12] SCAN – Software for Chemometric Analysis, rel. 1.1 for Windows, Jerll. Inc., Standard, CA, 1992.
[13] EPI Suite 2001, Ver. 3.10, Environmental Protection Agency (http://www.epa.gov).
[14] Wold, S.; Eriksson, L. Chemometric Methods in Molecular Design, 1995, VCH, Germany, 309-318.
[15] Golbraikh, A.; Tropsha, A. J. Mol. Graph and Mod., 2002, 20, 269-276.

QSAR PREDICTION OF PHYSICO-CHEMICAL PROPERTIES OF ESTERS
Gramatica, P., Battaini, F., Papa, E.
QSAR and Environmental Chemistry Research Unit, University of Insubria, Varese (Italy).
Web: http://dipbsf.uninsubria.it/qsar/
e-mail: paola.gramatica@uninsubria.it

RESULTS AND DISCUSSION

The best set of descriptors relevant to modelling each response was selected by Genetic Algorithm from the 422 calculated descriptors. The models, always evaluated by optimising their predictive capabilities, were verified for stability and predictivity by internal validation (leave-one-out and leave-many-out) and by permutation of the response (Y-scrambling). The leverage of all the studied compounds was also calculated to check their distance from the experimental space of the model.

In order to estimate the true predictive power of the models, the original data sets for solubility in water, vapour pressure, boiling point and density were split into training and test sets to calculate the external Q² [14,15]. The best splitting was obtained here by an Experimental Design procedure using the software DOLPHIN. Table 1 shows the best models for each end-point. The regression lines of the externally validated models are reported (outliers among the training and test set chemicals are highlighted).

Comparison of the residuals of the different models (Tab.2) shows that the EPIWIN models perform similarly to our models in predicting boiling point and vapour pressure, but have larger RMS values for solubility and melting point than our models. This result appears satisfactory considering that the EPIWIN models were obtained on a training set much bigger than our data set. For the other end-points no comparison is possible, as EPIWIN does not include them.

Tab.1 – Model Performances
Tab.2 – Comparison of models
Regression plot panels: BOILING POINT, SOLUBILITY, DENSITY, VAPOUR PRESSURE
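
The validation statistics named in this poster (leave-one-out Q², external Q², Y-scrambling and the leverage check of the chemical domain) can be illustrated with a minimal NumPy sketch. It is not the MOBY-DIGS, DOLPHIN or SCAN implementation: the data are synthetic, the train/test split is a plain index split standing in for the D-optimal design, the external Q² formula is one common definition, and the warning leverage 3(p+1)/n is a widely used convention that the poster itself does not state. All function names are hypothetical.

```python
import numpy as np

def design(X):
    """Prepend an intercept column to the descriptor matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Ordinary least squares coefficients (intercept first)."""
    coef, *_ = np.linalg.lstsq(design(X), np.asarray(y, dtype=float), rcond=None)
    return coef

def predict(coef, X):
    return design(X) @ coef

def leverages(X_train, X_query=None):
    """Hat values h = x (A'A)^-1 x' measured against the training set."""
    A = design(X_train)
    Q = A if X_query is None else design(X_query)
    inv = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, inv, Q)

def q2_loo(X, y):
    """Leave-one-out Q2; for OLS the deleted residual is e_i / (1 - h_ii)."""
    y = np.asarray(y, dtype=float)
    resid = y - predict(fit_ols(X, y), X)
    press = np.sum((resid / (1.0 - leverages(X))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def q2_ext(coef, X_test, y_test, y_train_mean):
    """External Q2 referenced to the training-set mean (assumed definition)."""
    y_test = np.asarray(y_test, dtype=float)
    press = np.sum((y_test - predict(coef, X_test)) ** 2)
    return 1.0 - press / np.sum((y_test - y_train_mean) ** 2)

def leverage_cutoff(n_train, n_descriptors):
    """Warning leverage h* = 3(p+1)/n (assumed convention)."""
    return 3.0 * (n_descriptors + 1) / n_train

def scrambled_r2(X, y, n_perm=50, seed=0):
    """R2 of models refitted on randomly permuted responses (Y-scrambling)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    scores = []
    for _ in range(n_perm):
        yp = rng.permutation(y)
        resid = yp - predict(fit_ols(X, yp), X)
        scores.append(1.0 - np.sum(resid ** 2) / np.sum((yp - yp.mean()) ** 2))
    return np.array(scores)

# Synthetic stand-in for a descriptor matrix and one end-point.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -0.8, 0.3]) + 0.2 * rng.normal(size=40)
train, test = np.arange(30), np.arange(30, 40)  # plain split, not D-optimal

coef = fit_ols(X[train], y[train])
print("Q2 LOO:", q2_loo(X[train], y[train]))
print("Q2 ext:", q2_ext(coef, X[test], y[test], y[train].mean()))
h_test = leverages(X[train], X[test])
print("test compounds above h*:",
      np.flatnonzero(h_test > leverage_cutoff(len(train), X.shape[1])))
print("mean scrambled R2:", scrambled_r2(X[train], y[train]).mean())
```

A model is usually regarded as robust when the leave-one-out Q² stays close to the fitted R² and the scrambled R² values collapse towards zero. Predictions for compounds whose leverage exceeds the cutoff are extrapolations outside the chemical domain and should be treated with caution.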

