Multiple Linear and Polynomial Regression with Statistical Analysis Given a set of data of measured (or observed) values of a dependent variable: y i versus.

Slides:



Advertisements
Similar presentations
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Advertisements

Lesson 10: Linear Regression and Correlation
Copyright © 2006 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1 ~ Curve Fitting ~ Least Squares Regression Chapter.
Chapter 7 Statistical Data Treatment and Evaluation
Stat 112: Lecture 7 Notes Homework 2: Due next Thursday The Multiple Linear Regression model (Chapter 4.1) Inferences from multiple regression analysis.
Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 12 l Multiple Regression: Predicting One Factor from Several Others.
1 SSS II Lecture 1: Correlation and Regression Graduate School 2008/2009 Social Science Statistics II Gwilym Pryce
Regression Analysis Module 3. Regression Regression is the attempt to explain the variation in a dependent variable using the variation in independent.
Probabilistic & Statistical Techniques Eng. Tamer Eshtawi First Semester Eng. Tamer Eshtawi First Semester
Prediction, Correlation, and Lack of Fit in Regression (§11. 4, 11
Chapter 10 Curve Fitting and Regression Analysis
Ch11 Curve Fitting Dr. Deshi Ye
Correlation and regression
Objectives (BPS chapter 24)
Econ 140 Lecture 121 Prediction and Fit Lecture 12.
Chapter 10 Simple Regression.
Curve-Fitting Regression
Probability & Statistics for Engineers & Scientists, by Walpole, Myers, Myers & Ye ~ Chapter 11 Notes Class notes for ISE 201 San Jose State University.
Statistical Treatment of Data Significant Figures : number of digits know with certainty + the first in doubt. Rounding off: use the same number of significant.
BCOR 1020 Business Statistics
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
RLR. Purpose of Regression Fit data to model Known model based on physics P* = exp[A - B/(T+C)] Antoine eq. Assumed correlation y = a + b*x1+c*x2 Use.
Linear Regression/Correlation
Relationships Among Variables
Correlation & Regression
Multiple Linear Regression Response Variable: Y Explanatory Variables: X 1,...,X k Model (Extension of Simple Regression): E(Y) =  +  1 X 1 +  +  k.
Lecture 15 Basics of Regression Analysis
Objectives of Multiple Regression
Regression Analysis Regression analysis is a statistical technique that is very useful for exploring the relationships between two or more variables (one.
Inference for regression - Simple linear regression
AN ITERATIVE METHOD FOR MODEL PARAMETER IDENTIFICATION 4. DIFFERENTIAL EQUATION MODELS E.Dimitrova, Chr. Boyadjiev E.Dimitrova, Chr. Boyadjiev BULGARIAN.
Chapter 12 Multiple Regression and Model Building.
CPE 619 Simple Linear Regression Models Aleksandar Milenković The LaCASA Laboratory Electrical and Computer Engineering Department The University of Alabama.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.
Simple Linear Regression Models
Bivariate Regression (Part 1) Chapter1212 Visual Displays and Correlation Analysis Bivariate Regression Regression Terminology Ordinary Least Squares Formulas.
BPS - 3rd Ed. Chapter 211 Inference for Regression.
Multiple Regression Selecting the Best Equation. Techniques for Selecting the "Best" Regression Equation The best Regression equation is not necessarily.
Regression Analysis. Scatter plots Regression analysis requires interval and ratio-level data. To see if your data fits the models of regression, it is.
Quality of Curve Fitting P M V Subbarao Professor Mechanical Engineering Department Suitability of A Model to a Data Set…..
1 Chapter 12 Simple Linear Regression. 2 Chapter Outline  Simple Linear Regression Model  Least Squares Method  Coefficient of Determination  Model.
MEGN 537 – Probabilistic Biomechanics Ch.5 – Determining Distributions and Parameters from Observed Data Anthony J Petrella, PhD.
Multiple Regression and Model Building Chapter 15 Copyright © 2014 by The McGraw-Hill Companies, Inc. All rights reserved.McGraw-Hill/Irwin.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Lesson Multiple Regression Models. Objectives Obtain the correlation matrix Use technology to find a multiple regression equation Interpret the.
Multiple Regression Petter Mostad Review: Simple linear regression We define a model where are independent (normally distributed) with equal.
10B11PD311 Economics REGRESSION ANALYSIS. 10B11PD311 Economics Regression Techniques and Demand Estimation Some important questions before a firm are.
Prediction, Goodness-of-Fit, and Modeling Issues Prepared by Vera Tabakova, East Carolina University.
Multiple Regression Selecting the Best Equation. Techniques for Selecting the "Best" Regression Equation The best Regression equation is not necessarily.
Correlation & Regression Analysis
Copyright © 2010 Pearson Education, Inc Chapter Seventeen Correlation and Regression.
Example x y We wish to check for a non zero correlation.
ESTIMATION METHODS We know how to calculate confidence intervals for estimates of  and  2 Now, we need procedures to calculate  and  2, themselves.
732G21/732G28/732A35 Lecture 3. Properties of the model errors ε 4. ε are assumed to be normally distributed
Chapter 11 Linear Regression and Correlation. Explanatory and Response Variables are Numeric Relationship between the mean of the response variable and.
Bivariate Regression. Bivariate Regression analyzes the relationship between two variables. Bivariate Regression analyzes the relationship between two.
Stats Methods at IC Lecture 3: Regression.
Chapter 4: Basic Estimation Techniques
Regression Analysis.
Regression Analysis AGEC 784.
Part 5 - Chapter
Basic Estimation Techniques
Basic Estimation Techniques
Statistical Methods For Engineers
Chapter 12 Curve Fitting : Fitting a Straight Line Gab-Byung Chae
Multiple Regression Models
Simple Linear Regression
Linear Regression and Correlation
Linear Regression and Correlation
Regression and Correlation of Data
Presentation transcript:

Multiple Linear and Polynomial Regression with Statistical Analysis Given a set of data of measured (or observed) values of a dependent variable: y i versus n independent variables x 1i, x 2i, … x ni, multiple linear regression attempts to find the “best” values of the parameters a 0, a 1, …a n for the equation is the calculated value of the dependent variable at point i. The “best” parameters have values that minimize the squares of the errors In polynomial regression there is only one independent variable, thus

Multiple Linear and Polynomial Regression with Statistical Analysis Typical examples of multiple linear and polynomial regressions include correlation of temperature dependent physical properties, correlation of heat transfer data using dimensionless groups, correlation of non-ideal phase equilibrium data and correlation of reaction rate data. The software packages enable high precision correlation of the data, however statistical analysis is essential to determine the quality of the fit (how well the regression model fits the data) and the stability of the model (the level of dependence of the model parameters on the particular set of data). The most important indicators for such studies are the residual plot (quality of the fit) and 95% confidence intervals (stability of the model)

Woods et al(1932) investigated the integral heat of hardening of cement as a function of composition. The independent variables represent weight percent of the clinker compounds: x 1 -tricalcium aluminate (3CaO ·Al2O3), x 2 - tricalcium silicate (3CaO ·SiO2), x 3 -tetracalcium alumino-ferrite (4CaO ·Al2O3·Fe2O3), and x 4 - β-dicalcium silicate (3CaO ·SiO2). The dependent variable, y is the total heat evolved (in calories per gram cement) in a 180- day period. Regression and Analysis of “Heat of Hardening” Data

Calculate the coefficients of a linear model representation of y as function of x 1, x 2, x 3, and x 4, calculate the variance and the correlation coefficient R 2 and the 95% confidence intervals. Prepare a residual plot. Consider the cases when the model includes and does not include a free parameter. Regression and Analysis of “Heat of Hardening” Data

“Heat of Hardening” Data – Regression and Analysis by Polymath Non-zero interceptModel type

“Heat of Hardening” Data –Analysis of the Linear Model that Includes a Free Parameter Highly unstable model. All 95% confidence intervals larger in absolute value than the respective parameters. R 2 value close to 1, indicates good fit between model and data. May be misleading occasionally. Rmsd and variance values used for comparison between different models

“Heat of Hardening” Data – Residual Plot of the Linear Model that Includes a Free Parameter Random residual distribution indicates good fit between the model and the data Maximal error ~ 4%

“Heat of Hardening” Data – Demonstration of the Harmful Effect of the Instability Last data point removed Removal of the last data point causes substantial change in all the parameter values. In case of the parameter a 4 even its sign is changed

“Heat of Hardening” Data – A Stable Model is Obtained After Removing the Free Parameter Last data point removed

Correlation of Heat Capacity Data for Ethane A polynomial has to be fitted to heat capacity data provided by Ingham et al*. This data set includes 41 data points in the temperature range of 100 K – 400 K. The degree of the polynomial: where C p is the heat capacity in J/kg-mol·K, T is the temperature in K, and a 0, a 1,... are the regression model parameters, which best represents the data, has to be found. The goodness of fit should be determined based on the variance, the correlation coefficient (R 2 ), the confidence intervals of the parameters, and the residual plot. Ingham, H.; Friend, D.G.; Ely, J.F.; "Thermophysical Properties of Ethane"; J. Phys. Ref. Data 1991, 20, 275

Heat Capacity Data for Ethane, Fitting a 3 rd Degree Polynomial Model type

Heat Capacity Data for Ethane, Fitting a 3rd Degree Polynomial All the parameters indicate satisfactory model. Note, however, the differences of orders of magnitude between the parameter values. This may limit the highest degree of polynomial to be fitted Rmsd and variance values used for comparison between different models

Heat Capacity Data for Ethane, Calculated (3rd Degree Polynomial) and Experimental Values On the scale of the entire range of the C p data the fit seems to be excellent

Heat Capacity Data for Ethane, Residual Plot for the 3rd Degree Polynomial Maximal Error ~ 1% High resolution residual plot shows oscillatory behavior which is not explained by the 3 rd degree polynomial

Heat Capacity Data for Ethane, Defining Standardized Temperature Values for High Order Polynomial Fitting

Heat Capacity Data for Ethane, Fitting a 5rd Degree Polynomial Using standardized values yields model parameters of similar magnitude, enables fitting higher order polynomials and improves considerably all the statistical indicators

Heat Capacity Data for Ethane, Residual Plot of a 5 th Degree Polynomial Maximal Error ~ 0.03% Using standardized independent variable values enables fitting polynomials with precision higher than justified by the experimental error.

Modeling Vapor Pressure Data for Ethane A vapor pressure data set provided by Ingham et al* includes 107 data points in the temperature range of 92 K – 304 K. This temperature range covers almost completely the range between the tripe point temperature (= K) and the critical temperature (T C = K). The temperature dependence of the vapor pressure should be modeled by the Clapeyron, Antoine and Wagner equations The Clapeyron equation is a two parameter equation: *Ingham, H.; Friend, D.G.; Ely, J.F.; "Thermophysical Properties of Ethane"; J. Phys. Ref. Data 1991, 20, 275 where P is the vapor pressure (Pa), T – temperature (K), A and B are parameters

Modeling Vapor Pressure Data for Ethane by the Clapeyron Equation using Linear Regression = ln(P_Pa) = 1/T_K

Modeling Vapor Pressure Data for Ethane by the Clapeyron Equation using Linear Regression All the indicators show good, acceptable fit!

Modeling Vapor Pressure Data for Ethane by the Clapeyron Equation using Linear Regression The residual plot reveals large unexplained curvature in the data Maximal error in ln(P) ~80% in P ~ 52%

Modeling Vapor Pressure Data for Ethane by the Antoine Equation using Non-linear Regression Initial guess from Clapeyron eqn. Model type and solution algorithm

Modeling Vapor Pressure Data for Ethane by the Antoine Equation using Non-linear Regression Experimental and calculated values cannot be distinguished. Variance smaller by 2 orders of magnitude than Clapeyron

Modeling Vapor Pressure Data for Ethane by the Antoine Equation using Non-linear Regression Maximal error in ln(P) ~1% in P ~ 5% Random residual distribution in the low pressure range, unexplained curvature in the high pressure range

Modeling Vapor Pressure Data for Ethane with the Wagner Equation Where T R = T/T C is the reduced temperature P R = P/P C is the reduced pressure and τ = 1 - T R. For ethane T C = K, P C =4.8720E+06 Pa In order to obtain the model parameters using linear regression the following variables are defined: Tr = T_K / lnPr = ln(P_Pa / ) t = (1 - Tr) / Tr t15 = (1 - Tr) ^ 1.5 / Tr t3 = (1 - Tr) ^ 3 / Tr t6 = (1 - Tr) ^ 6 / Tr

Modeling Vapor Pressure Data for Ethane with the Wagner Equation using Multiple Linear Regression Tr = T_K / lnPr = ln(P_Pa / ) t = (1 - Tr) / Tr t15 = (1 - Tr) ^ 1.5 / Tr t3 = (1 - Tr) ^ 3 / Tr t6 = (1 - Tr) ^ 6 / Tr

Modeling Vapor Pressure Data for Ethane with the Wagner Equation using Multiple Linear Regression Maximal Error in ln(Pr) ~ 0.6% Note random residuals distribution in the entire data range