Evaluation of a Targeted-QSPR Based Pure Compound Property Prediction System


Abstract

The use of the DD-TQSPR (Dominant-Descriptor Targeted QSPR) method for the prediction of a wide variety of constant properties is considered. Prediction of a property (the target property) for a particular compound (the target compound) is carried out in two stages. The first stage involves the identification of a training set (typically around 10 compounds for which target property data are available) whose members are structurally related to the target compound. The training set is selected from the target compound's similarity group, which is identified by using a large database of molecular descriptors. The similarity between a potential predictive compound and the target compound is measured by the correlation coefficient between the vectors of their molecular descriptors. In the second stage of the DD-TQSPR method, a Dominant Descriptor (DD) that is collinear with the target property values of the training set members is identified, and a linear relationship (the DD-TQSPR) between the DD and the target property values is derived. Finally, the target compound's DD value is introduced into the linear equation in order to predict its target property. The use of the proposed technique is demonstrated by predicting 34 constant properties (available in the DIPPR database) for a target compound.

Mordechai Shacham and Inga Paster, Dept. of Chem. Engng, Ben Gurion University of the Negev, Beer-Sheva, Israel
Richard L. Rowley, Chem. Eng. Dept., Brigham Young University, Provo, UT
Neima Brauner and Gretah Tovarovski, School of Engineering, Tel-Aviv University, Tel-Aviv, Israel

Property and Descriptor Databases

A property and molecular descriptor database containing 1798 compounds, for which 34 constant properties (source: DIPPR database) and 3224 descriptors (source: Dragon 5.5) are available.

Most of the 3-D molecular structures were optimized in Gaussian 03 using B3LYP/ G (3df, 2p), a density functional method with a large basis set. The rest were optimized using HF/6-31G*, a Hartree-Fock ab initio method with a medium-sized basis set.

Constant Properties Included in the DIPPR Database

Similarity Group of n-hexyl mercaptan

Immediate neighbors of the target in the homologous series. Range of the number of carbon atoms. Oxygen atom instead of sulfur.

Prediction of the NBT of n-hexyl mercaptan

Property value (from DIPPR); p: number of compounds in the training set; ζ: descriptor.

Attainable accuracy measures (independent of the target compound's property value):
1. DIPPR uncertainty values for the properties of the training set members.
2. Average (U_avg) and maximal (U_max) DIPPR uncertainty values.
3. Training Set Average Error (TSAE) = 0.51%.

ESpm01r is a 2D descriptor belonging to the "edge adjacency indices" group. Its definition is: "Spectral moment 01 from edge adjacency matrix weighted by resonance integral".

Prediction error for the target = 0.55%.

Prediction of Properties of n-hexyl mercaptan: Summary of Results for 34 Properties

Comments: 1. n-hexanol outlier; 2. Different odd-even populations; 3. No data for target.

1-hexanol is a leverage point and an outlier.

Improved training set: obtained by using only stable (non-3D) descriptors, with no oxygen-containing compounds.

Conclusions

• The DD-TQSPR method was able to predict all 34 properties of the target compound within the experimental error level.
• The appropriate training set (similarity group) depends on the target property.
• The TSAE (Training Set Average Error) has proven to be a good indicator of the appropriateness of the training set and of the prediction accuracy. This criterion is independent of the target compound's properties.
• The prediction accuracy can be enhanced by refining the training set, not by increasing the number of descriptors in the TQSPR.
• Further research is required to derive training set refinement algorithms for various properties and various groups of compounds.
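
The two-stage procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy data, and the array layout (rows are compounds, columns are descriptors) are hypothetical. The Dominant Descriptor is assumed here to be the descriptor with the largest absolute correlation with the property values, and the TSAE is assumed to be the mean absolute relative error of the fitted line over the training set, since the slides give neither rule explicitly.

```python
import numpy as np

def select_training_set(D, target_idx, has_data, p=10):
    """Stage 1: rank compounds by the correlation coefficient between their
    descriptor vector and the target's, and keep the p most similar ones
    that have property data."""
    sims = np.array([np.corrcoef(D[target_idx], row)[0, 1] for row in D])
    sims = np.where(np.isnan(sims), -np.inf, sims)
    sims[target_idx] = -np.inf      # the target cannot train its own model
    sims[~has_data] = -np.inf       # training members need property data
    order = [i for i in np.argsort(sims)[::-1] if np.isfinite(sims[i])]
    return np.array(order[:p])

def fit_dd_tqspr(D, y, train_idx):
    """Stage 2: pick the dominant descriptor (assumed: max |r| with y) and
    fit y = b0 + b1 * zeta on the training set."""
    Z, yt = D[train_idx], y[train_idx]
    best_j, best_r = None, -1.0
    for j in range(Z.shape[1]):
        if np.std(Z[:, j]) == 0:    # constant descriptors carry no information
            continue
        r = abs(np.corrcoef(Z[:, j], yt)[0, 1])
        if r > best_r:
            best_j, best_r = j, r
    b1, b0 = np.polyfit(Z[:, best_j], yt, 1)
    fitted = b0 + b1 * Z[:, best_j]
    # Assumed TSAE definition: mean absolute relative error, in percent.
    tsae = 100.0 * np.mean(np.abs((fitted - yt) / yt))
    return best_j, b0, b1, tsae

def predict_target(D, y, target_idx, has_data, p=10):
    """Run both stages and return (prediction, dominant descriptor, TSAE)."""
    train_idx = select_training_set(D, target_idx, has_data, p)
    j, b0, b1, tsae = fit_dd_tqspr(D, y, train_idx)
    return b0 + b1 * D[target_idx, j], j, tsae

# Toy data (hypothetical): 12 compounds, 5 descriptors; the property is made
# exactly linear in descriptor 2 so that the result is easy to check.
rng = np.random.default_rng(0)
D = rng.normal(size=(12, 5))
y = 300.0 + 20.0 * D[:, 2]
has_data = np.ones(12, dtype=bool)
has_data[0] = False                 # compound 0 is the target, value unknown

pred, dd, tsae = predict_target(D, y, target_idx=0, has_data=has_data, p=8)
print(f"dominant descriptor: {dd}, prediction: {pred:.2f}, TSAE: {tsae:.2e}%")
```

Because the TSAE is computed from the training set alone, it can be evaluated even when no measured value exists for the target compound, which is why it can serve as an indicator of expected accuracy.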