
M. Vighi (1a), P. Gramatica (2), F. Consolaro (2) and R. Todeschini (1b)
(1) Department of Environment and Landscape Sciences, University of Milano-Bicocca; (a) Environmental Research Group; (b) Milan Chemometric Research Group, Via Emanueli 15, I Milano, Italy
(2) Department of Structural and Functional Biology, QSAR Research Unit, University of Insubria, Via Dunant 3, I Varese, Italy

INTRODUCTION

An Environmental Quality Objective (EQO), intended as a real “No Effect Concentration” (NEC), is not accessible experimentally. The usual procedure is to extrapolate the EQO or NEC from experimental data by using application factors (CSTE, 1994; EEC, 1994). The minimum requirement for setting an EQO or a NEC is data on organisms representative of three trophic levels (for a Water Quality Objective, WQO: algae, Daphnia and fish). Even with such a reductive approach, the availability of reliable data remains a problem for a large number of existing chemicals. There is therefore a need for predictive approaches capable of classifying toxic substances according to their potential danger for the environment, at least for a preliminary ranking of priority chemicals.

REFERENCES

CSTE, 1994.
EEC, 1994.
[1] Todeschini and Gramatica, 3D-modelling and Prediction by WHIM Descriptors. Part 5. Theory Development and Chemical Meaning of WHIM Descriptors. Quant. Struct.-Act. Relat. 16, (1997)
[2] Todeschini and Gramatica, The WHIM Theory: new 3D molecular descriptors for QSAR in environmental modelling. SAR and QSAR in Environmental Research, 7, (1997)
[3] Todeschini and Gramatica, 3D-modelling and Prediction by WHIM Descriptors. Part 6. Application of WHIM Descriptors in QSAR Studies. Quant. Struct.-Act. Relat. 16, (1997)

MATERIALS AND METHODS

Chemicals and toxicological data
The data base starts from the European list of priority chemicals (Directive 76/464/EEC), modified by excluding chemicals that could not be evaluated with our QSAR approach (metals, salts, undefined mixtures) and by including chemical isomers. The final data set contains 125 chemicals with the corresponding WQO. A complete toxicological data set was not available for all chemicals.

Molecular descriptors
The molecular structures were represented by different sets of descriptors: 38 mono-dimensional (count descriptors), 34 two-dimensional (topological) and 99 three-dimensional (3D-WHIM, 3D-Weighted Holistic Invariant Molecular) descriptors (Todeschini and Gramatica [1,2]), all calculated with the WHIM-3D/QSAR software of R. Todeschini (free download from web-site).

Statistical methods
Variable selection was performed by a Genetic Algorithm approach. QSAR models were obtained by Ordinary Least Squares (OLS) regression and validated with the leave-one-out and leave-more-out procedures (Todeschini and Gramatica [2,3]). The applicability of each model to chemicals outside the training data set was evaluated by the leverage method. Similarity analysis of the chemicals was performed by Principal Component Analysis (PCA) and hierarchical Cluster Analysis. Variables for the classification methods were selected by Stepwise Linear Discriminant Analysis (SLDA). Several classification methods (CART, KNN, RDA, LDA) were used successfully. Among them, Regularised Discriminant Analysis (RDA) gave the most satisfying results.

Toxicological analysis on experimental and predicted data
PCA was applied to the fish and Daphnia toxicological data (experimental plus predicted values for 125 chemicals). The chemicals were divided into three toxicity classes according to their PC1 values, which represent the global toxicity.
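The division into toxicity classes along PC1 can be illustrated with a minimal sketch. It assumes autoscaled data, extracts PC1 by power iteration on the correlation matrix, and splits the chemicals with two cut-off values. The toxicity values, the cut-offs and all function names are hypothetical; the poster does not state how the class boundaries on PC1 were chosen.

```python
import math

def pc1_scores(rows, iterations=200):
    """Autoscale the columns, then return (loadings, scores) of the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    # Correlation matrix of the autoscaled data.
    corr = [[sum(zi[a] * zi[b] for zi in z) / (n - 1) for b in range(p)] for a in range(p)]
    # Power iteration: repeated multiplication converges to the dominant eigenvector,
    # provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(zi[j] * v[j] for j in range(p)) for zi in z]
    return v, scores

def toxicity_class(score, low_cut, high_cut):
    """Assign class 1, 2 or 3 from a PC1 score and two cut-offs (assumed, not from the poster)."""
    if score < low_cut:
        return 1
    if score < high_cut:
        return 2
    return 3

# Hypothetical negative-log toxicity values (fish, Daphnia) for six chemicals;
# larger values mean higher toxicity.
toxicity = [(2.0, 2.2), (2.5, 2.4), (3.9, 4.1), (4.2, 3.8), (5.8, 6.1), (6.0, 5.7)]

loadings, scores = pc1_scores(toxicity)
classes = [toxicity_class(s, low_cut=-0.7, high_cut=0.7) for s in scores]
print(loadings)
print(classes)
```

Because both end points load on PC1 with the same sign in this toy data set, the PC1 score behaves as a combined toxicity index, which is the sense in which PC1 represents global toxicity. The sign of a principal component is arbitrary, so the direction of increasing toxicity has to be read from the loadings.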
The classification corresponds quite well to the original five WQO classes, with a few exceptions. These are mainly due to the underestimation of some herbicides and to some precautionary WQO (organotin, HCH isomers), as highlighted in the figure. A second PCA was performed on the fish, Daphnia and algae toxicity data (97 chemicals). In this case too, the chemicals were divided into three toxicity classes according to their PC1 values, and these classes also correspond well to the WQO classification. Besides the exceptions already described, other relevant differences concern hexachlorobutadiene and, in particular, pentachlorophenol, which appear to be underestimated in the WQO classification.

CONCLUSIONS

The approach has proved to be a powerful method for the preliminary classification of chemicals. It could be a useful tool for setting up priority lists, according to the hazard for the aquatic environment, for chemicals whose experimental data are lacking or inadequate.

Structural analysis
A preliminary evaluation of the relationship between structure and the toxicological class derived from the WQO was made using all molecular descriptors. According to the loadings of the molecular descriptors, compounds characterised by small size and higher symmetry, which are the least toxic, are grouped on the right side of PC1.

Classification models
Several classification methods (CART, Classification And Regression Tree; KNN, K-Nearest Neighbours; RDA, Regularised Discriminant Analysis; LDA, Linear Discriminant Analysis) were applied to the data set of 125 chemicals (Daphnia and fish toxicity) and to the set of 97 chemicals (algae, Daphnia and fish toxicity). The three a priori classes were defined by PCA, as reported in panel C (toxicological analysis). All models classify the chemicals in good agreement with this class definition, using mainly topological descriptors.
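One of the listed methods, KNN, together with the error measures used to judge a classifier, can be sketched as follows. This is a minimal illustration, not the poster's models: the two descriptors, the class labels, k = 3 and the use of leave-one-out for the cross-validated error rate are all assumptions. The no-model error rate (NOMER) is taken here as the error obtained by assigning every chemical to the largest class.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training chemicals (Euclidean distance).
    On a tied vote, the class of the nearest tied neighbour wins."""
    neighbours = sorted(zip(train_x, train_y), key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

def loo_error_rate(xs, ys, k=3):
    """Leave-one-out error rate: each chemical is predicted from all the others."""
    errors = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) != ys[i]:
            errors += 1
    return errors / len(xs)

def no_model_error_rate(ys):
    """Error rate obtained by assigning every chemical to the largest class."""
    return 1 - max(Counter(ys).values()) / len(ys)

# Hypothetical values of two descriptors for eleven chemicals in three toxicity classes.
descriptors = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),   # class 1
    (3.0, 3.0), (3.2, 2.9), (2.9, 3.3), (1.9, 1.9),   # class 2 (last point is borderline)
    (5.0, 5.0), (5.2, 4.8), (4.9, 5.3),               # class 3
]
labels = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

nomer = no_model_error_rate(labels)
er_cv = loo_error_rate(descriptors, labels, k=3)
print(f"NOMER = {nomer:.1%}, ER c.v. = {er_cv:.1%}")
```

In this toy data set the only misclassified chemical is the one placed between classes 1 and 2. A model is informative only if its cross-validated error rate is well below the NOMER.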
As an example, the results obtained with CART for the 125-chemical data set and with RDA for the 97-chemical data set are shown in the schemes (classification tree for CART and misclassification matrices for RDA). Most chemicals are classified in agreement with the three a priori classes. The few misclassified chemicals generally lie on the borderline between two classes. In some cases the new classification seems more reliable than the previous one, and a few discrepancies in the WQO classification have been highlighted.

[Figure: PCA of the toxicity data. PC2 highlights the differences in specific toxicity.]
[Figure: CART classification tree with class assignment; descriptors shown: IDDM, IDMT, CHI1. NOMER = 42.3%, ER = 5.1%, ER c.v. = 9.3%]
[Figure: RDA confusion matrices. NOMER = 42.3%, ER = 7.2%, ER c.v. = 10.3%]

RESULTS

Single species QSAR models
For each test organism, the model with the best predictive capability was selected. For chemicals not included in the training data set, leverage values were calculated to evaluate whether the values estimated by the model can be considered reliable. This made it possible to add a number of predicted data to the experimental data set.
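The validation and applicability check behind the single-species models can be illustrated with a minimal sketch. It uses a one-descriptor OLS model for brevity, whereas the poster's models use several descriptors selected by a Genetic Algorithm. The data, the function names and the warning leverage h* = 3p'/n (p' = number of model parameters) are assumptions; the poster does not state which cut-off was used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope

def q2_leave_one_out(xs, ys):
    """Q2 = 1 - PRESS / SS_tot, refitting the model with each chemical left out in turn."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - press / ss_tot

def leverage(x_new, xs):
    """Leverage of a chemical with descriptor value x_new relative to the training set xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return 1 / n + (x_new - mean_x) ** 2 / sxx

# Hypothetical training data: one descriptor and a log-scale toxicity end point.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toxicity = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]

q2 = q2_leave_one_out(descriptor, toxicity)
h_star = 3 * 2 / len(descriptor)   # assumed cut-off; 2 parameters: intercept and slope

for x_new in (3.5, 12.0):          # hypothetical chemicals outside the training set
    h = leverage(x_new, descriptor)
    status = "prediction usable" if h <= h_star else "extrapolation, prediction unreliable"
    print(f"x = {x_new}: leverage = {h:.3f} ({status})")
print(f"Q2 (leave-one-out) = {q2:.3f}")
```

A chemical whose leverage exceeds the cut-off lies far from the descriptor space covered by the training set, so its predicted toxicity is an extrapolation and would not be added to the experimental data.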