Multicollinearity

Multicollinearity (or intercorrelation) exists when at least some of the predictor variables are correlated among themselves. In observational studies, multicollinearity happens more often than not. So, we need to understand the effects of multicollinearity on regression analyses.

Example #1: n = 20 hypertensive individuals, p − 1 = 6 predictor variables.

Example #1: [Correlation matrix of BP, Age, Weight, BSA, Duration, Pulse, and Stress] Blood pressure (BP) is the response.

What is the effect on regression analyses if the predictors are perfectly uncorrelated? [Small data set with columns x1, x2, y] Pearson correlation of x1 and x2 = 0.000.

Regress y on x1 — [Minitab output: regression equation, coefficient table (Coef, SE Coef, T, P), and analysis of variance table]

Regress y on x2 — [Minitab output: regression equation, coefficient table, and analysis of variance table]

Regress y on x1 and x2 — [Minitab output: regression equation, coefficient table, analysis of variance table, and sequential sums of squares for x1 then x2]

Regress y on x2 and x1 — [Minitab output: regression equation, coefficient table, analysis of variance table, and sequential sums of squares for x2 then x1]

If predictors are perfectly uncorrelated, then… You get the same slope estimates regardless of the first-order regression model used. That is, the effect on the response ascribed to a predictor doesn’t depend on the other predictors in the model.

If predictors are perfectly uncorrelated, then… The sum of squares SSR(X1) is the same as the sequential sum of squares SSR(X1|X2). The sum of squares SSR(X2) is the same as the sequential sum of squares SSR(X2|X1). That is, the marginal contribution of one predictor variable in reducing the error sum of squares doesn’t depend on the other predictors in the model.
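A minimal sketch of this orthogonality property, using statsmodels and small synthetic data (none of the numbers below come from the slides):

    import numpy as np
    import statsmodels.api as sm

    # Two perfectly uncorrelated (orthogonal after centering) predictors.
    x1 = np.array([2.0, 2.0, 2.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0])
    x2 = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
    rng = np.random.default_rng(0)
    y = 5 + 2 * x1 + 3 * x2 + rng.normal(scale=0.5, size=x1.size)

    fit1  = sm.OLS(y, sm.add_constant(x1)).fit()                         # y on x1 only
    fit2  = sm.OLS(y, sm.add_constant(x2)).fit()                         # y on x2 only
    fit12 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()  # y on x1 and x2

    print(np.corrcoef(x1, x2)[0, 1])        # 0.0 -- the predictors are uncorrelated
    print(fit1.params[1], fit12.params[1])  # identical slope for x1 in both models
    print(fit2.params[1], fit12.params[2])  # identical slope for x2 in both models

    # SSR(X1) from the x1-only model equals the sequential SSR(X1 | X2):
    ssr_x1          = fit1.ess               # regression SS when x1 enters alone
    ssr_x1_given_x2 = fit12.ess - fit2.ess   # extra regression SS when x1 is added to x2
    print(ssr_x1, ssr_x1_given_x2)           # equal because x1 and x2 are uncorrelated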

Same effects for “real data” with nearly uncorrelated predictors? [Correlation matrix of BP, Age, Weight, BSA, Duration, Pulse, and Stress]

Regress BP on Stress — [Minitab output: regression equation, coefficient table, and analysis of variance table; R-Sq = 2.7%, R-Sq(adj) = 0.0%]

Regress BP on BSA — [Minitab output: regression equation, coefficient table, and analysis of variance table; R-Sq = 75.0%, R-Sq(adj) = 73.6%]

Regress BP on BSA and Stress — [Minitab output: regression equation, coefficient table, analysis of variance table, and sequential sums of squares for BSA then Stress]

Regress BP on Stress and BSA — [Minitab output: regression equation, coefficient table, analysis of variance table, and sequential sums of squares for Stress then BSA]

If predictors are nearly uncorrelated, then… You get similar slope estimates regardless of the first-order regression model used. The sum of squares SSR(X1) is similar to the sequential sum of squares SSR(X1|X2). The sum of squares SSR(X2) is similar to the sequential sum of squares SSR(X2|X1).

What happens if the predictor variables are highly correlated? [Correlation matrix of BP, Age, Weight, BSA, Duration, Pulse, and Stress]

Regress BP on Weight — [Minitab output: regression equation, coefficient table, and analysis of variance table; R-Sq = 90.3%, R-Sq(adj) = 89.7%]

Regress BP on BSA — [Minitab output: regression equation, coefficient table, and analysis of variance table; R-Sq = 75.0%, R-Sq(adj) = 73.6%]

Regress BP on BSA and Weight — [Minitab output: regression equation, coefficient table, analysis of variance table, and sequential sums of squares for BSA then Weight]

Regress BP on Weight and BSA — [Minitab output: regression equation, coefficient table, analysis of variance table, and sequential sums of squares for Weight then BSA]

Effect #1 of multicollinearity: When predictor variables are correlated, the regression coefficient of any one variable depends on which other predictor variables are included in the model. [Table: estimated coefficients b1 and b2 for the models containing X1 only, X2 only, and both X1 and X2]
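A minimal sketch of Effect #1 with statsmodels, assuming the deck's blood-pressure data are available as a CSV with columns BP, Weight, and BSA (the file name bloodpress.csv is an assumption, not something given in the slides):

    import pandas as pd
    import statsmodels.formula.api as smf

    bp = pd.read_csv("bloodpress.csv")   # hypothetical file with columns BP, Weight, BSA

    fit_bsa  = smf.ols("BP ~ BSA", data=bp).fit()
    fit_both = smf.ols("BP ~ BSA + Weight", data=bp).fit()

    # The coefficient attached to BSA changes substantially once the
    # highly correlated predictor Weight is also in the model.
    print(fit_bsa.params["BSA"])
    print(fit_both.params["BSA"])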

Even correlated predictors not in the model can have an impact! Consider a regression of territory sales on territory population, per capita income, etc. Against expectation, the coefficient of territory population turned out to be negative. The competitor’s market penetration, which was strongly positively correlated with territory population, was not included in the model, but the competitor kept sales down in territories with large populations.

Effect #2 of multicollinearity: When predictor variables are correlated, the marginal contribution of any one predictor variable in reducing the error sum of squares varies, depending on which other variables are already in the model. [Table comparing SSR(X1) with SSR(X1|X2) and SSR(X2) with SSR(X2|X1); here SSR(X2|X1) = 2.81]
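In the same spirit, the shifting marginal contribution can be computed as an extra (sequential) sum of squares; a sketch under the same hypothetical bloodpress.csv:

    import pandas as pd
    import statsmodels.formula.api as smf

    bp = pd.read_csv("bloodpress.csv")   # hypothetical file with columns BP, Weight, BSA

    fit_bsa    = smf.ols("BP ~ BSA", data=bp).fit()
    fit_weight = smf.ols("BP ~ Weight", data=bp).fit()
    fit_both   = smf.ols("BP ~ Weight + BSA", data=bp).fit()

    ssr_bsa          = fit_bsa.ess                    # SSR(BSA): BSA entering alone
    ssr_bsa_given_wt = fit_both.ess - fit_weight.ess  # SSR(BSA | Weight): BSA added last

    # With highly correlated predictors these two quantities differ sharply.
    print(ssr_bsa, ssr_bsa_given_wt)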

Effect #3 of multicollinearity: When predictor variables are correlated, the precision of the estimated regression coefficients decreases as more predictor variables are added to the model. [Table: standard errors se(b1) and se(b2) for the models containing X1 only, X2 only, and both X1 and X2]
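This loss of precision has a standard textbook expression (stated here as background; it is not shown on the slides). For the k-th estimated coefficient, with s_{X_k}^2 the sample variance of X_k and R_k^2 the R-squared from regressing X_k on the other predictors,

    \operatorname{Var}(b_k) \;=\; \frac{\sigma^2}{(n-1)\, s_{X_k}^2} \cdot \frac{1}{1 - R_k^2},
    \qquad
    \mathrm{VIF}_k \;=\; \frac{1}{1 - R_k^2}.

As predictors correlated with X_k enter the model, R_k^2 grows, the factor 1/(1 − R_k^2) inflates, and se(b_k) increases; this factor is exactly the variance inflation factor listed among the diagnostics at the end of the deck.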

What is the effect on estimating the mean response or predicting a new response?

Effect #4 of multicollinearity on estimating the mean or predicting Y: High multicollinearity among predictor variables does not prevent good, precise predictions of the response (within the scope of the model). [Minitab prediction output — Weight-only model: 95.0% CI (111.85, 113.54), 95.0% PI (108.94, 116.44); BSA-only model: 95.0% CI (112.76, 115.38), 95.0% PI (108.06, 120.08); model with BSA and Weight: 95.0% CI (111.93, 113.83), 95.0% PI (109.08, …)]
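A sketch of checking this with confidence and prediction intervals in statsmodels, again under the hypothetical bloodpress.csv (the new observation's values are illustrative):

    import pandas as pd
    import statsmodels.formula.api as smf

    bp = pd.read_csv("bloodpress.csv")   # hypothetical file with columns BP, Weight, BSA

    fit_both = smf.ols("BP ~ Weight + BSA", data=bp).fit()

    # Confidence and prediction intervals for a new observation; despite the
    # strong Weight-BSA correlation, these intervals stay about as tight as
    # those from the single-predictor models (within the scope of the data).
    new_obs = pd.DataFrame({"Weight": [92.0], "BSA": [2.0]})   # illustrative values
    pred = fit_both.get_prediction(new_obs)
    print(pred.summary_frame(alpha=0.05)[["mean", "mean_ci_lower", "mean_ci_upper",
                                          "obs_ci_lower", "obs_ci_upper"]])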

What is the effect on tests of the individual slopes? Regress BP on BSA — [Minitab output: regression equation, coefficient table, and analysis of variance table; R-Sq = 75.0%, R-Sq(adj) = 73.6%]

What is the effect on tests of the individual slopes? Regress BP on Weight — [Minitab output: regression equation, coefficient table, and analysis of variance table; R-Sq = 90.3%, R-Sq(adj) = 89.7%]

Regress BP on Weight and BSA — [Minitab output: regression equation, coefficient table, analysis of variance table, and sequential sums of squares for Weight then BSA]

Effect #5 of multicollinearity on slope tests: When predictor variables are correlated, hypothesis tests for βk = 0 may yield different conclusions depending on which predictor variables are in the model. [Table: b2, se(b2), t, and P-value for the model containing X2 only versus the model containing both X1 and X2]
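A short sketch of how the same slope test can reach different conclusions, under the same hypothetical bloodpress.csv:

    import pandas as pd
    import statsmodels.formula.api as smf

    bp = pd.read_csv("bloodpress.csv")   # hypothetical file with columns BP, Weight, BSA

    fit_bsa  = smf.ols("BP ~ BSA", data=bp).fit()
    fit_both = smf.ols("BP ~ Weight + BSA", data=bp).fit()

    # t statistic and p-value for H0: beta_BSA = 0 in each model; with Weight
    # also in the model, BSA's standard error inflates and the test can lose
    # significance even though BSA alone is a strong predictor.
    print(fit_bsa.tvalues["BSA"],  fit_bsa.pvalues["BSA"])
    print(fit_both.tvalues["BSA"], fit_both.pvalues["BSA"])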

Summary comments: Tests for slopes should generally be used to answer a scientific question, not for model-building purposes. Even then, caution is needed when interpreting results in the presence of multicollinearity. (Think marginal effects.)

Summary comments (cont’d): Multicollinearity has little to no effect on the estimation of the mean response or the prediction of a future response.

Diagnosing multicollinearity: realized effects of multicollinearity (changes in coefficients, changes in sequential sums of squares, etc.); scatter plot matrices; pairwise correlation coefficients among the predictor variables; variance inflation factors (VIF).
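A sketch of the VIF computation with statsmodels, under the hypothetical bloodpress.csv and the predictor set shown in the deck:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    bp = pd.read_csv("bloodpress.csv")   # hypothetical file with the deck's predictors

    predictors = ["Age", "Weight", "BSA", "Duration", "Pulse", "Stress"]
    X = sm.add_constant(bp[predictors])

    # VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing predictor k
    # on all of the other predictors; values much larger than 1 (a common
    # rule of thumb is > 10) flag problematic multicollinearity.
    for i, name in enumerate(predictors, start=1):   # skip the constant at column 0
        print(name, variance_inflation_factor(X.values, i))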