Predicting a Variety of Constant Pure Compound Properties by the Targeted QSPR Method

Mordechai Shacham, Dept. of Chem. Engng, Ben Gurion University of the Negev, Beer-Sheva, Israel
Neima Brauner, School of Engineering, Tel-Aviv University, Tel-Aviv, Israel

Abstract

The possibility of obtaining a reliable prediction of a wide variety of constant properties is examined. To this aim, a modified version of the Targeted QSPR method (Brauner et al., Ind. Eng. Chem. Res., 45, 8430, 2006) is applied. The prediction of a particular property of a target compound is carried out in two stages. The first stage involves the identification of a similarity group and a small training set whose members are structurally similar to the target compound. This stage is based on a robust subset of the descriptor database (no 2D or 3D descriptors) that reflects the diversity in the chemical structures. In the second stage, the full database of molecular descriptors is used to develop a single-descriptor linear QSPR (TQSPR1) based on the available property data for the training set.

Statistical indicators are introduced which enable a reliable estimation of the prediction uncertainty for the (unknown) property of the target compound based on the training set data. It is shown that while increasing the number of descriptors in the QSPR enables better representation of the training set data, it may significantly deteriorate the prediction of the target compound property value. If necessary, improved prediction is achievable by using the statistical information to refine the training set, rather than by increasing the number of descriptors used. It is demonstrated that by proper adjustment of the training set, the great majority of the constant properties can be predicted within the experimental error level.

Property and Descriptor Databases

The property and molecular descriptor database contains 1798 compounds, for which 34 constant properties (source: DIPPR database) and 3224 descriptors (source: Dragon 5.5) are available.

Training Set Identification and Refinement

Several variations of training sets of compounds were used:
1. A "basic" training set identified using the full set of the available descriptors;
2. A "refined" training set identified using only "constitutional" and "functional group count" descriptors;
3. Use of only odd (or even) carbon number compounds in the training set;
4. Removal of compounds with outlying property values.

Statistical indicators

1. Outlying (high leverage) descriptor values can be detected based on excessive values of the diagonal elements of the hat matrix, h_ii.
2. Outlying property values can be detected by a high value of the studentized deleted residual, t_i, of component i.

Attainable accuracy measures (independent of the target compound property value)

1. DIPPR uncertainty values for the properties of the training set members;
2. Average (U_avg) and maximal (U_max) DIPPR uncertainty values;
3. Training Set Average Error (TSAE).

[Tables and figures not recovered. Panel titles: "Constant Properties Included in the DIPPR Database"; "Prediction of Properties of n-Hexyl Mercaptan: Basic Training Set"; "Summary of Results for 32 Properties: Optimal Training Sets"; "'Basic' and 'Refined' Training Sets of n-Hexyl Mercaptan".]

Legend: property value (from DIPPR); p: number of compounds in the training set; ζ: descriptor.

Annotations recovered from the panels, in transcript order:
- Mv: mean atomic van der Waals volume (scaled on carbon atom).
- TSAE = 3.4 %, prediction error = 10 %.
- 3D-MoRSE signal 29, weighted by atomic masses.
- h_99 = 1, TSAE = 60 %, prediction error = 37 %.
- Oxygen atom instead of sulfur; range of the numbers of the carbon atoms; immediate neighbors of the target in the homologous series.
- TSAE = 0.65 %, prediction error = 0.45 %.

Conclusions

- The TQSPR1 method was able to predict 32 properties of the target compound within the experimental error level.
- The appropriate training set (similarity group) depends on the target property.
- The TSAE (Training Set Average Error) has proven to be a good indicator of the appropriateness of the training set and of the prediction accuracy of TQSPR1. This criterion is independent of the target-compound properties.
- The prediction accuracy can be enhanced by refining the training set, not by increasing the number of descriptors in the TQSPR.
- The descriptor subset used here for identifying a refined training set has proven to be appropriate for some homologous series. Work is currently underway to identify descriptor subsets that are appropriate for other groups of compounds.
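The statistical indicators named in the poster (leverage h_ii, studentized deleted residual t_i, and TSAE) can be illustrated for a single-descriptor linear QSPR. The following is a minimal sketch in Python with NumPy, not the authors' implementation. The function name `fit_single_descriptor`, the synthetic data, and in particular the TSAE formula (taken here as the mean absolute relative fit error of the training set, in percent) are assumptions, since the transcript does not define TSAE explicitly.

```python
import numpy as np


def fit_single_descriptor(x, y):
    """Fit y = b0 + b1 * x by least squares and return regression diagnostics.

    Hypothetical helper for illustration; x is one descriptor, y one property.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])  # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Diagonal of the hat matrix H = X (X^T X)^-1 X^T: leverage of each compound.
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)

    # Studentized deleted residuals for a model with k fitted parameters.
    # Undefined (division by zero) if a compound has leverage exactly 1.
    k = X.shape[1]
    sse = float(resid @ resid)
    t = resid * np.sqrt((n - k - 1) / (sse * (1.0 - h) - resid**2))

    # ASSUMPTION: TSAE = mean absolute relative error of the fit, in percent.
    tsae = 100.0 * np.mean(np.abs(resid / y))
    return beta, h, t, tsae


# Synthetic training set (not DIPPR data): a nearly linear property with one
# deliberately shifted value at index 5 to mimic an outlying property value.
x_train = np.arange(4.0, 14.0)
y_train = 10.0 + 3.0 * x_train + 0.1 * (-1.0) ** np.arange(10)
y_train[5] += 2.0

beta, h, t, tsae = fit_single_descriptor(x_train, y_train)
suspect = int(np.argmax(np.abs(t)))          # compound with the largest |t_i|
x_target = 8.5                               # hypothetical target descriptor value
y_target_pred = beta[0] + beta[1] * x_target
print(f"TSAE = {tsae:.2f} %, most suspect compound index = {suspect}")
print(f"Predicted target property = {y_target_pred:.2f}")
```

In this sketch, refinement of the training set would correspond to dropping the compound flagged by the largest |t_i| (or by an excessive h_ii) and refitting, rather than adding descriptors to the model.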