A molecular descriptor database for homologous series of hydrocarbons (n-alkanes, 1-alkenes and n-alkylbenzenes) and oxygen-containing organic compounds.

Presentation transcript:

Prediction of the Melting Point Temperature Using a Linear QSPR for Homologous Series

Inga Paster and Mordechai Shacham, Dept. of Chem. Engng, Ben Gurion University of the Negev, Beer-Sheva, Israel
Neima Brauner, School of Engineering, Tel-Aviv University, Tel-Aviv, Israel

Abstract

Methods for deriving Quantitative Structure Property Relationships (QSPRs) are known to be challenged by the prediction of the normal melting temperature (Tm). Even within a particular homologous series, the change of Tm with the number of C atoms is highly irregular. Nevertheless, it is shown that by limiting the QSPR applicability domain to a particular homologous series, it is possible to identify a small set of descriptors whose linear combination represents Tm within experimental error level. Confidence in the predicted values, in both interpolation and extrapolation, is considerably enhanced by ensuring a random residual distribution in the training set used. The prediction error so obtained is lower than that reported in the literature for all the homologous series considered in this study.

Deriving Linear Quantitative Structure Property Relationships for Homologous Series

A molecular descriptor database for homologous series of hydrocarbons (n-alkanes, 1-alkenes and n-alkylbenzenes) and oxygen-containing organic compounds (1-alkanols and n-alkanoic acids) has been prepared. Dragon was used to calculate 1280 descriptors for the compounds in the database. Melting point data were taken from the DIPPR database. The SROV stepwise regression program (C&ChE, 2003, 27 (5)) was used to identify the most appropriate QSPRs.

For development of the QSPR, the members of the homologous series are divided into a training set, which includes only compounds for which Tm data are available (typically 10 compounds), and an evaluation set, in which Tm data are available for only part of the compounds.

A linear structure-property (s-p) relationship between y (the vector of property values for the training set members) and ζ1, ζ2, …, ζm (the vectors of predictive molecular descriptors for the same compounds) is assumed, of the form given by Eq. (1). [Equation (1) is not preserved in the transcript.]

The descriptors are selected into the model in a stepwise manner according to the value of the partial correlation coefficient |ρyj| between the vector of property values y and that of a potential predictive descriptor ζj.

Tm values for the members of the evaluation set are estimated by Eq. (2). [Equation (2) is not preserved in the transcript.]

The Role of the Dominant Descriptor in Obtaining Reliable Predictions

The dominant descriptor is the descriptor with the highest correlation with Tm for the training set, and it is the first descriptor to enter the QSPR. If it represents well the general trend of the change of the property as a function of temperature, there is a good chance of obtaining accurate predictions.

[Figure: Normal Melting Temperatures of n-Alkanoic Acids Versus the Number of Carbon Atoms]

For example, plotting Tm of the members of the n-alkanoic acid series versus the dominant descriptor EEig06x (see previous figure) yields a high correlation coefficient R². [The R² value is not preserved in the transcript; the figure carries the annotation "Note high R² value".]

For predicting Tm within experimental error level for the n-alkanoic acid series, a four-descriptor linear QSPR in PJI, IVDE, EEig06x and Mor16v can be used. [The coefficients are not preserved in the transcript.] This QSPR yields predictions with < 1% error for members of this series over the range methanoic acid through eicosanoic acid.

Predicting Tm for the 1-alkene series

For the 1-alkene series, a three-descriptor linear QSPR in TIC, BELp and L2p was found to be appropriate. [The coefficients, and the table of predicted values and prediction errors, are not preserved in the transcript.] Note that the percent error is highest for the low-carbon-number compounds (1-pentene). Similar results were obtained for the n-alkane, n-alkylbenzene and 1-alkanol series.

Conclusions

• It has been demonstrated that, by limiting the range of applicability of the QSPR to a particular homologous series and using a very large bank of descriptors, it is possible to identify a small set of descriptors whose linear combination represents Tm within experimental error level, even if the change of Tm with the number of C atoms is highly irregular.
• The magnitude of the correlation coefficient between Tm and the dominant descriptor for the training set plays a key role in determining the reliability of the prediction. If this value is not close enough to one, a different training set for the targeted compounds has to be sought.
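The stepwise descriptor selection described in the transcript can be illustrated with a minimal sketch. It is not the SROV program, whose selection and stopping criteria are not given here. It is a generic forward selection that assumes the usual linear form, an intercept plus a weighted sum of descriptor columns, and at each step picks the descriptor with the largest absolute partial correlation with the property. The function names `forward_stepwise` and `predict`, and the parameters `max_terms` and `tol`, are hypothetical.

```python
import numpy as np


def forward_stepwise(Z, y, max_terms=4, tol=1e-8):
    """Greedy forward selection of descriptor columns of Z for a linear model of y.

    Z: (n_compounds, n_descriptors) training-set descriptor matrix.
    y: (n_compounds,) training-set property values (e.g. melting temperatures).
    Returns the selected column indices (in order of entry) and the fitted
    coefficients [intercept, b_1, ..., b_k].
    """
    n, p = Z.shape
    selected = []
    X = np.ones((n, 1))  # current model matrix; starts with the intercept only
    for _ in range(max_terms):
        # Residual of y with respect to the descriptors already in the model.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r_y = y - X @ beta
        if np.linalg.norm(r_y) <= tol * np.linalg.norm(y):
            break  # y is already reproduced to within the tolerance
        best_j, best_rho = None, 0.0
        for j in range(p):
            if j in selected:
                continue
            # Residual of candidate descriptor j with respect to the same model.
            g, *_ = np.linalg.lstsq(X, Z[:, j], rcond=None)
            r_z = Z[:, j] - X @ g
            nz = np.linalg.norm(r_z)
            if nz <= tol * np.linalg.norm(Z[:, j]):
                continue  # candidate is collinear with the current model
            # Partial correlation = correlation between the two residuals.
            rho = abs(r_y @ r_z) / (np.linalg.norm(r_y) * nz)
            if rho > best_rho:
                best_j, best_rho = j, rho
        if best_j is None:
            break
        selected.append(best_j)
        X = np.column_stack([X, Z[:, best_j]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return selected, beta


def predict(Z_new, selected, beta):
    """Estimate the property for new compounds from the selected descriptors."""
    return beta[0] + Z_new[:, selected] @ beta[1:]
```

In the first iteration the model contains only the intercept, so the partial correlation reduces to the ordinary correlation with the property, and the first index returned corresponds to what the transcript calls the dominant descriptor. Calling `predict` on the descriptor rows of the evaluation set gives the estimated property values for those compounds.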