Polynomial regression models
Possible models for when the response function is “curved”


Uses of polynomial models
– When the true response function really is a polynomial function.
– When the true response function is unknown or complex, but a polynomial function approximates the true function well. (Very common!)

Example
What is the impact of exercise on the human immune system? Is the amount of immunoglobulin (IgG) in the blood (y) related to maximal oxygen uptake (x), perhaps in a curved manner?

Scatter plot
(figure omitted: immunoglobulin in blood (mg) plotted against maximal oxygen uptake (ml/kg), showing a curved rather than straight-line trend)

A quadratic polynomial regression function

Yᵢ = β₀ + β₁Xᵢ + β₁₁Xᵢ² + εᵢ

where:
Yᵢ = amount of immunoglobulin in blood (mg)
Xᵢ = maximal oxygen uptake (ml/kg)
and the error terms εᵢ satisfy the typical assumptions: independent, normally distributed, with equal variance (“INE”).
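A minimal sketch of fitting such a quadratic by least squares, assuming hypothetical (x, y) values in a plausible range (the actual IgG data are not reproduced here):

```python
import numpy as np

# Hypothetical stand-ins for maximal oxygen uptake (ml/kg) and IgG (mg).
rng = np.random.default_rng(0)
x = rng.uniform(30, 70, size=30)
y = -1500 + 90 * x - 0.55 * x**2 + rng.normal(0, 50, size=30)  # synthetic, not the real data

# Design matrix with intercept, linear, and quadratic columns.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimates b0, b1, b11
print("b0=%.1f  b1=%.2f  b11=%.3f" % tuple(b))
```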

Estimated quadratic function
(figure omitted: the fitted curve ŷ = b₀ + b₁x + b₁₁x² overlaid on the scatter plot)

Interpretation of the regression coefficients
– If x = 0 is a possible x value, then b₀ is the predicted response there. Otherwise, b₀ has no meaningful interpretation of its own.
– b₁ does not have a very helpful interpretation: it is the slope of the tangent line at x = 0, not an overall slope.
– b₁₁ indicates the up/down direction of the curve:
– b₁₁ < 0 means the curve is concave down.
– b₁₁ > 0 means the curve is concave up.
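To see why b₁ is only the tangent slope at x = 0, differentiate the fitted quadratic:

ŷ = b₀ + b₁x + b₁₁x²  so  dŷ/dx = b₁ + 2b₁₁x

The slope of the curve at x = 0 is therefore b₁, and at any other x it is b₁ + 2b₁₁x.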

The regression equation is
igg = … + … oxygen + … oxygensq

(Minitab output. Apart from the R² figures and the intervals quoted later, the numeric values did not survive transcription; missing entries here and in the later output blocks are shown as “…”.)

Predictor    Coef   SE Coef     T     P    VIF
Constant        …         …     …     …
oxygen          …         …     …     …      …
oxygensq        …         …     …     …      …

S = …    R-Sq = 93.8%    R-Sq(adj) = 93.3%

Analysis of Variance
Source           DF    SS    MS    F    P
Regression        …     …     …    …    …
Residual Error    …     …     …
Total             …     …

Source     DF   Seq SS
oxygen      …        …
oxygensq    …        …

A multicollinearity problem
Pearson correlation of oxygen and oxygensq = 0.995. The predictor and its square are almost perfectly correlated, which inflates the standard errors of their individual coefficients.

“Center” the predictors
Subtract the sample mean of oxygen from each value, oxcent = oxygen − mean(oxygen), and then square the centered values, oxcentsq = oxcent². (The slide’s worked table of oxygen, oxcent, and oxcentsq values, and the mean itself, were not preserved.)

Does it really work?
Pearson correlation of oxcent and oxcentsq = 0.219. Yes: centering cuts the correlation from 0.995 to 0.219.
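A quick numerical check of this effect, assuming hypothetical uptake values on a positive, bounded range:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(30, 70, size=30)   # hypothetical maximal oxygen uptake values
xc = x - x.mean()                  # centered predictor

print(np.corrcoef(x, x**2)[0, 1])    # close to 1: a raw predictor and its square
print(np.corrcoef(xc, xc**2)[0, 1])  # much closer to 0 after centering
```

Centering works because, for a predictor distributed roughly symmetrically about its mean, xc and xc² are nearly uncorrelated.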

A better quadratic polynomial regression function

Yᵢ = β*₀ + β*₁xᵢ + β*₁₁xᵢ² + εᵢ

where xᵢ = Xᵢ − X̄ denotes the centered predictor, and:
β*₀ = mean response at the predictor mean
β*₁ = “linear effect coefficient”
β*₁₁ = “quadratic effect coefficient”

The regression equation is
igg = … + … oxcent + … oxcentsq

Predictor    Coef   SE Coef     T     P    VIF
Constant        …         …     …     …
oxcent          …         …     …     …      …
oxcentsq        …         …     …     …      …

S = …    R-Sq = 93.8%    R-Sq(adj) = 93.3%

Analysis of Variance
Source           DF    SS    MS    F    P
Regression        …     …     …    …    …
Residual Error    …     …     …
Total             …     …

Source     DF   Seq SS
oxcent      …        …
oxcentsq    …        …
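Output of this kind can be reproduced with any regression package; here is a minimal statsmodels sketch (hypothetical data again, so the numbers will not match the slide):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
oxygen = rng.uniform(30, 70, size=30)   # hypothetical uptake values
igg = -1500 + 90 * oxygen - 0.55 * oxygen**2 + rng.normal(0, 50, size=30)

oxcent = oxygen - oxygen.mean()                            # centered predictor
X = sm.add_constant(np.column_stack([oxcent, oxcent**2]))  # intercept, oxcent, oxcentsq
fit = sm.OLS(igg, X).fit()
print(fit.summary())   # coefficient table, S, R-Sq, and the ANOVA pieces
```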

Interpretation of the regression coefficients
– b*₀ is the predicted response at the predictor mean.
– b*₁ is the estimated slope of the tangent line at the predictor mean; because the centered linear and quadratic terms are nearly uncorrelated, it is typically also close to the estimated slope from the simple linear model.
– b*₁₁ indicates the up/down direction of the curve:
– b*₁₁ < 0 means the curve is concave down.
– b*₁₁ > 0 means the curve is concave up.

Estimated regression function
(figure omitted: the fitted centered-model curve overlaid on the scatter plot)

Similar estimates
(comparison omitted: the centered and uncentered fits trace the same estimated curve; the two parameterizations give identical fitted values, R², and S, and differ only in their coefficients)

The relationship between the two forms of the model

Centered model:  ŷ = b*₀ + b*₁(x − x̄) + b*₁₁(x − x̄)²
Original model:  ŷ = b₀ + b₁x + b₁₁x²

where (expanding the square and matching terms):
b₀ = b*₀ − b*₁x̄ + b*₁₁x̄²
b₁ = b*₁ − 2b*₁₁x̄
b₁₁ = b*₁₁
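A small numeric check of these conversion formulas, using hypothetical data (any data set would do):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(30, 70, size=30)
y = -1500 + 90 * x - 0.55 * x**2 + rng.normal(0, 50, size=30)
xbar = x.mean()
xc = x - xbar

# Fit both parameterizations by least squares.
b, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x, x**2]), y, rcond=None)
bs, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), xc, xc**2]), y, rcond=None)

# Convert the centered estimates back to the original scale.
b0 = bs[0] - bs[1] * xbar + bs[2] * xbar**2
b1 = bs[1] - 2 * bs[2] * xbar
b11 = bs[2]
print(np.allclose([b0, b1, b11], b))   # True: same fitted curve, different coordinates
```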

Mean of oxygen = … (value not preserved). Plugging the mean into the formulas above converts the centered estimates back to the original-scale estimates.

What is the predicted IgG if maximal oxygen uptake is 90?
There is an even greater danger in extrapolation when modeling data with a polynomial function, because the fitted curve can change direction beyond the observed range.

Predicted Values for New Observations
New Obs    Fit   SE Fit    95.0% CI             95.0% PI
1            …        …    (1689.8, 2589.5)     (1639.6, 2639.7)  XX

X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations
New Obs   oxcent   oxcentsq
1              …          …
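A sketch of why this is dangerous, assuming a concave-down quadratic fitted to data that end around x = 70 (hypothetical coefficients):

```python
# Hypothetical concave-down fit: igg-hat = -1500 + 90x - 0.55x^2
b0, b1, b11 = -1500.0, 90.0, -0.55

def yhat(x):
    return b0 + b1 * x + b11 * x**2

vertex = -b1 / (2 * b11)          # this curve peaks at x ~ 81.8
print(vertex, yhat(70.0), yhat(90.0))
# Beyond the vertex the model predicts IgG *decreasing* with uptake,
# a direction change the (hypothetical) observed data never showed.
```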

It is possible to “overfit” the data with polynomial models.

It is even theoretically possible to fit the data perfectly: if you have n data points, a polynomial of order n − 1 will fit them perfectly, that is, it will pass through every data point. But good statistical software will keep an unsuspecting user from fitting such a model:

** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted
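Not every tool enforces this; numpy's polyfit, for instance, will happily interpolate, which makes the point easy to demonstrate (illustrative values):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 0.5, 3.3, 1.8, 4.0])

coeffs = np.polyfit(x, y, deg=len(x) - 1)      # degree-4 polynomial through 5 points
print(np.allclose(np.polyval(coeffs, x), y))   # True: zero residuals, a "perfect" fit
```

A perfect in-sample fit is not a virtue: such a model has no residual degrees of freedom and wiggles wildly between and beyond the data points.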

The hierarchical approach to model fitting
The widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate. For example: is a first-order linear model (a “line”) adequate?

The hierarchical approach to model fitting
But then, if a polynomial term of a given order is retained, all related lower-order terms are also retained. That is, if the quadratic term is significant, you would use this regression function:

ŷ = b₀ + b₁x + b₁₁x²

and not this one:

ŷ = b₀ + b₁₁x²

Example
Quality of a product (y): a score between 0 and 100
Temperature (x₁): degrees Fahrenheit
Pressure (x₂): pounds per square inch

A two-predictor, second-order polynomial regression function

Yᵢ = β₀ + β₁Xᵢ₁ + β₂Xᵢ₂ + β₁₁Xᵢ₁² + β₂₂Xᵢ₂² + β₁₂Xᵢ₁Xᵢ₂ + εᵢ

where:
Yᵢ = quality
Xᵢ₁ = temperature
Xᵢ₂ = pressure
β₁₂ = “interaction effect coefficient”
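A sketch of building the full second-order design matrix and fitting it by least squares (hypothetical temperature, pressure, and quality values):

```python
import numpy as np

rng = np.random.default_rng(4)
temp = rng.uniform(150, 250, size=27)     # hypothetical temperatures (deg F)
press = rng.uniform(50, 60, size=27)      # hypothetical pressures (psi)
quality = rng.uniform(60, 100, size=27)   # hypothetical quality scores

# Columns: 1, x1, x2, x1^2, x2^2, x1*x2
X = np.column_stack([np.ones_like(temp), temp, press,
                     temp**2, press**2, temp * press])
b, *_ = np.linalg.lstsq(X, quality, rcond=None)
print(dict(zip(["b0", "b1", "b2", "b11", "b22", "b12"], np.round(b, 4))))
```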

The regression equation is
quality = … + … temp + … pressure + … tempsq + … presssq + … tp

Predictor    Coef   SE Coef     T     P    VIF
Constant        …         …     …     …
temp            …         …     …     …      …
pressure        …         …     …     …      …
tempsq          …         …     …     …      …
presssq         …         …     …     …      …
tp              …         …     …     …      …

S = …    R-Sq = 99.3%    R-Sq(adj) = 99.1%

Again, some correlation
Pearson correlations among the response and the uncentered terms:

            quality     temp  pressure   tempsq  presssq
temp              …
pressure          …        …
tempsq            …        …         …
presssq           …        …         …        …
tp                …        …         …        …        …

Cell Contents: Pearson correlation

A better two-predictor, second-order polynomial regression function

Yᵢ = β*₀ + β*₁xᵢ₁ + β*₂xᵢ₂ + β*₁₁xᵢ₁² + β*₂₂xᵢ₂² + β*₁₂xᵢ₁xᵢ₂ + εᵢ

where:
Yᵢ = quality
xᵢ₁ = centered temperature
xᵢ₂ = centered pressure
β*₁₂ = “interaction effect coefficient”

Reduced correlation
Pearson correlations after centering:

            quality    tcent    pcent   tpcent  tcentsq
tcent             …
pcent             …        …
tpcent            …        …        …
tcentsq           …        …        …        …
pcentsq           …        …        …        …        …

Cell Contents: Pearson correlation

The regression equation is
quality = … + … tcent + … pcent + … tpcent + … tcentsq + … pcentsq

Predictor    Coef   SE Coef     T     P    VIF
Constant        …         …     …     …
tcent           …         …     …     …      …
pcent           …         …     …     …      …
tpcent          …         …     …     …      …
tcentsq         …         …     …     …      …
pcentsq         …         …     …     …      …

S = …    R-Sq = 99.3%    R-Sq(adj) = 99.1%
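The VIF column in output like this can be checked directly; a sketch using statsmodels' variance_inflation_factor, again on hypothetical centered data:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
t = rng.uniform(150, 250, size=27); t -= t.mean()   # centered temperature
p = rng.uniform(50, 60, size=27);  p -= p.mean()    # centered pressure

# Design matrix: intercept, tcent, pcent, tpcent, tcentsq, pcentsq
X = np.column_stack([np.ones(27), t, p, t * p, t**2, p**2])
for i, name in enumerate(["tcent", "pcent", "tpcent", "tcentsq", "pcentsq"], start=1):
    print(name, round(variance_inflation_factor(X, i), 2))
```

With centered predictors these VIFs stay modest, whereas the uncentered versions of the same columns produce very large values.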

Predicted Values for New Observations
New Obs    Fit   SE Fit    95.0% CI            95.0% PI
1            …        …    (93.424, 96.428)    (91.125, 98.726)

Values of Predictors for New Observations
New Obs   tcent   pcent   tpcent   tcentsq   pcentsq
1             …       …        …         …         …