Chapter 9 Multicollinearity

Based on Linear Regression Analysis, 5th Edition, by Montgomery, Peck, and Vining.

9.1 Introduction
Multicollinearity is a problem that plagues many regression models; it impacts the estimates of the individual regression coefficients. Uses of regression:
1. Identifying the relative effects of the regressor variables,
2. Prediction and/or estimation, and
3. Selection of an appropriate set of variables for the model.

9.1 Introduction
If all regressors are orthogonal, multicollinearity is not a problem, but this is a rare situation in regression analysis. More often than not, there are near-linear dependencies among the regressors, that is, constants t_1, t_2, ..., t_p, not all zero, such that

$$\sum_{j=1}^{p} t_j \mathbf{x}_j \approx \mathbf{0}.$$

If this relation holds exactly for a subset of the regressors, then X'X is singular and (X'X)^{-1} does not exist.
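To make the effect concrete, here is a minimal numerical sketch (assuming NumPy; the simulated variables and noise scale are illustrative, not from the text) showing that a near-linear dependence drives the condition number of X'X up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.01, size=n)  # x1 + x2 - x3 ~ 0

X = np.column_stack([np.ones(n), x1, x2, x3])
XtX = X.T @ X
# Nearly singular cross-product matrix: enormous condition number
print(np.linalg.cond(XtX))
```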

9.2 Sources of Multicollinearity
There are four primary sources:
1. The data collection method employed,
2. Constraints on the model or in the population,
3. The model specification, and
4. An overdefined model.

9.2 Sources of Multicollinearity
The data collection method employed. This source arises when only a subregion of the full regressor space has been sampled. (Soft drink delivery example: the number of cases and the delivery distance tend to be correlated; that is, we may have data in which small numbers of cases are paired only with short distances and large numbers of cases only with long distances.) Because there is no physical reason why we cannot sample elsewhere in the regressor space, this type of multicollinearity can often be reduced by changing the sampling plan.

9.2 Sources of Multicollinearity
Constraints on the model or in the population. (Electricity consumption example: x1 = family income and x2 = house size; higher incomes tend to go with larger houses, so the two regressors are linked in the population itself.) When such physical constraints are present, multicollinearity will exist regardless of the data collection method.

9.2 Sources of Multicollinearity
Model specification. Adding polynomial terms to the model can cause ill-conditioning in the X'X matrix. This is especially true if the range of a regressor variable x is small.
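A quick numerical illustration of this point (a sketch assuming NumPy; the narrow interval is arbitrary): when the range of x is small, x and x^2 are nearly perfectly correlated, so a quadratic term introduces severe ill-conditioning:

```python
import numpy as np

x = np.linspace(1.0, 1.2, 30)        # regressor with a small range
print(np.corrcoef(x, x**2)[0, 1])    # ~0.9999: x and x^2 nearly collinear
```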

9.2 Sources of Multicollinearity
An overdefined model has more regressor variables than observations. The best way to counter this is to eliminate regressors. Recommendations: 1) redefine the model using a smaller set of regressors; 2) perform preliminary studies using subsets of the regressors; or 3) use principal-components-type methods to decide which regressors to remove.

9.3 Effects of Multicollinearity
Strong multicollinearity can result in large variances and covariances for the least squares estimates of the coefficients. Recall from Chapter 3 that, with the regressors in unit-length (correlation-form) scaling, C = (X'X)^{-1} and

$$C_{jj} = \frac{1}{1 - R_j^2},$$

where R_j^2 is the coefficient of determination obtained when x_j is regressed on the remaining regressors. Strong multicollinearity between x_j and any other regressor will cause R_j^2 to be large, and thus C_jj to be large. In other words, the variance of the least squares estimate of the coefficient, $\mathrm{Var}(\hat{\beta}_j) = \sigma^2 C_{jj}$, will be very large.
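A small simulation (a sketch assuming NumPy; the data are illustrative) verifying that, in unit-length scaling, the diagonal element C_jj equals 1/(1 - R_j^2):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=n)   # correlated regressors
X = np.column_stack([x1, x2])
X = (X - X.mean(0)) / np.sqrt(((X - X.mean(0)) ** 2).sum(0))  # unit length

C = np.linalg.inv(X.T @ X)
r = np.corrcoef(x1, x2)[0, 1]          # with two regressors, R_1^2 = r^2
print(C[0, 0], 1.0 / (1.0 - r ** 2))   # the two values agree
```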

9.3 Effects of Multicollinearity
Strong multicollinearity can also produce least squares estimates of the coefficients that are too large in absolute value. The squared distance between the least squares estimate and the true parameter vector is

$$L_1^2 = (\hat{\boldsymbol\beta} - \boldsymbol\beta)'(\hat{\boldsymbol\beta} - \boldsymbol\beta), \qquad E(L_1^2) = \sigma^2\, \mathrm{Tr}\,(X'X)^{-1}.$$

9.3 Effects of Multicollinearity
Tr(X'X)^{-1} is the trace of the matrix (X'X)^{-1}, that is, the sum of its main-diagonal elements; the trace of a matrix also equals the sum of its eigenvalues. With multicollinearity present, some of the eigenvalues of X'X will be small. Let λ_j > 0 be the jth eigenvalue of X'X. Then

$$E(L_1^2) = \sigma^2 \sum_{j=1}^{p} \frac{1}{\lambda_j},$$

so small eigenvalues translate directly into a large expected squared distance between the estimate and the true parameter.
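A quick check of the trace identity used above (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
XtX = X.T @ X

inv_trace = np.trace(np.linalg.inv(XtX))
eigs = np.linalg.eigvalsh(XtX)
print(inv_trace, np.sum(1.0 / eigs))   # equal: Tr[(X'X)^{-1}] = sum_j 1/lambda_j
```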

9.4 Multicollinearity Diagnostics
Ideal characteristics of a multicollinearity diagnostic:
1. The procedure should correctly indicate whether multicollinearity is present; and
2. It should provide some insight into which regressors are causing the problem.

9.4.1 Examination of the Correlation Matrix
If we center and scale the regressors to unit length, X'X becomes the correlation matrix: its off-diagonal elements are the pairwise correlations r_ij between regressors x_i and x_j. If |r_ij| is close to unity, there may be an indication of multicollinearity. The converse does not always hold, however: multicollinearity may be present even though no pairwise correlation is large. This can happen when the near-linear dependence involves more than two regressors.
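This caveat is easy to reproduce (a sketch assuming NumPy; the data are simulated): below, x3 is almost an exact linear combination of x1 and x2, so X'X is badly conditioned, yet no pairwise correlation is near unity:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 - x2 + rng.normal(scale=0.05, size=n)   # three-variable dependence
X = np.column_stack([x1, x2, x3])

print(np.round(np.corrcoef(X, rowvar=False), 2))  # |r_ij| only ~0.7
print(np.linalg.cond(X.T @ X))                    # yet severely ill-conditioned
```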

The correlation matrix fails to identify the multicollinearity problem in the Mason, Gunst, and Webster data of Table 9.4 (page 296 of the text).

9.4.2 Variance Inflation Factors
As discussed in Chapter 3, variance inflation factors,

$$\mathrm{VIF}_j = C_{jj} = \frac{1}{1 - R_j^2},$$

are very useful in determining whether multicollinearity is present. VIFs exceeding 5 to 10 are considered significant, and the regressors with high VIFs probably have poorly estimated regression coefficients.
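In practice the VIFs can be computed directly; for instance, with statsmodels (a sketch; the simulated data are illustrative):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)        # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])  # include an intercept column

for j in range(1, X.shape[1]):                 # skip the intercept
    print(f"VIF for column {j}: {variance_inflation_factor(X, j):.1f}")
```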

9.4.2 Variance Inflation Factors: A Second Look and Interpretation
The length of the normal-theory confidence interval on the jth regression coefficient can be written as

$$L_j = 2\,(C_{jj}\,\hat{\sigma}^2)^{1/2}\, t_{\alpha/2,\, n-p}.$$

9.4.2 Variance Inflation Factors: A Second Look and Interpretation
The length of the corresponding normal-theory confidence interval based on a design with orthogonal regressors (with the same sample size and the same root-mean-square values of the regressors) is

$$L^{*} = 2\,\hat{\sigma}\, t_{\alpha/2,\, n-p},$$

since C_jj = 1 when the regressors are orthogonal.

9.4.2 Variance Inflation Factors: A Second Look and Interpretation
Taking the ratio of these two lengths gives

$$\frac{L_j}{L^{*}} = C_{jj}^{1/2} = \sqrt{\mathrm{VIF}_j}.$$

That is, the square root of the jth VIF measures how much longer the confidence interval for the jth regression coefficient is because of multicollinearity. For example, if VIF_3 = 10, then $\sqrt{10} \approx 3.16$, so the confidence interval is about 3.2 times longer than it would be if the regressors were orthogonal (the best-case scenario).

9.4.3 Eigensystem Analysis of X'X
The eigenvalues of X'X, denoted λ_1, λ_2, ..., λ_p, can be used to measure multicollinearity: small eigenvalues are indications of near-linear dependencies. The condition number of X'X,

$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}},$$

measures the spread in the eigenvalues:
κ < 100: no serious problem;
100 < κ < 1000: moderate to strong multicollinearity;
κ > 1000: severe multicollinearity.

9.4.3 Eigensystem Analysis of X'X
A large condition number indicates that multicollinearity exists, but it does not tell us how many regressors are involved. The condition indices of X'X are

$$\kappa_j = \frac{\lambda_{\max}}{\lambda_j}, \qquad j = 1, 2, \ldots, p.$$

The number of condition indices that are large (say, greater than 1000) is a useful measure of the number of near-linear dependencies in X'X. In SAS PROC REG, the COLLIN option on the MODEL statement produces the eigenvalues, condition indices, and related diagnostics.
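A sketch (assuming NumPy; simulated data) computing the condition number and condition indices directly from the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
X = rng.normal(size=(n, 4))
X[:, 3] = X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.01, size=n)  # one dependency

eigs = np.linalg.eigvalsh(X.T @ X)
print("condition number:", eigs.max() / eigs.min())
indices = eigs.max() / eigs                      # one condition index per eigenvalue
print("condition indices:", np.round(indices, 1))
print("large indices:", int((indices > 1000).sum()))  # ~1 near-linear dependency
```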

9.5 Methods for Dealing with Multicollinearity
1. Collect more data,
2. Respecify the model, or
3. Use ridge regression.

9.5 Methods for Dealing with Multicollinearity
Least squares estimation gives unbiased estimates with minimum variance among unbiased estimators, but under multicollinearity this minimum variance may still be very large, resulting in unstable estimates of the coefficients. Alternative: find an estimator that is biased but has a much smaller variance than the unbiased estimator.

9.5 Methods for Dealing with Multicollinearity
Ridge estimator:

$$\hat{\boldsymbol\beta}_R = (X'X + kI)^{-1} X'\mathbf{y},$$

where k ≥ 0 is a "biasing parameter," usually between 0 and 1.
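A minimal implementation of the ridge estimator (a sketch assuming NumPy; in practice the regressors are usually standardized first, and the data here are simulated):

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator: solve (X'X + kI) beta = X'y."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(6)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)       # severe multicollinearity
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)               # true coefficients are (1, 1)

print(ridge(X, y, 0.0))    # k = 0 is ordinary least squares: unstable
print(ridge(X, y, 0.1))    # a small k shrinks and stabilizes the estimates
```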

9.5 Methods for Dealing with Multicollinearity
The effect of k on the MSE. Recall that

$$\mathrm{MSE}(\hat{\boldsymbol\beta}_R) = \mathrm{Var}(\hat{\boldsymbol\beta}_R) + [\mathrm{bias}(\hat{\boldsymbol\beta}_R)]^2.$$

Now

$$E(L_1^2) = \sigma^2 \sum_{j=1}^{p} \frac{\lambda_j}{(\lambda_j + k)^2} + k^2\, \boldsymbol\beta'(X'X + kI)^{-2}\boldsymbol\beta.$$

As k increases, the variance term decreases and the squared bias increases. Choose k so that the reduction in variance exceeds the increase in squared bias.
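The tradeoff can be tabulated directly when the true β is known, as in a simulation (a sketch assuming NumPy; all quantities are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + rng.normal(scale=0.05, size=n)])
beta, sigma2 = np.array([1.0, 1.0]), 1.0       # "true" values for the simulation

lam = np.linalg.eigvalsh(X.T @ X)
for k in (0.0, 0.01, 0.1, 0.5):
    var = sigma2 * np.sum(lam / (lam + k) ** 2)
    M = np.linalg.inv(X.T @ X + k * np.eye(2))
    bias2 = k ** 2 * beta @ M @ M @ beta
    print(f"k={k}: variance term {var:.2f}, squared-bias term {bias2:.5f}")
```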

9.5 Methods for Dealing with Multicollinearity
Ridge trace: plot the coefficient estimates against k. If multicollinearity is severe, the ridge trace will show it: the estimates change rapidly near k = 0. Choose the smallest k at which $\hat{\boldsymbol\beta}_R$ is stable, and hope that the resulting MSE is acceptable. Ridge regression is a good alternative when the model user wants to keep all of the regressors in the model.
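A minimal ridge-trace sketch (assuming NumPy and Matplotlib; the data are simulated): the curves move sharply at small k when multicollinearity is severe, then flatten:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
n = 50
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + rng.normal(scale=0.05, size=n)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

ks = np.logspace(-4, 1, 60)
traces = np.array([np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ y) for k in ks])

plt.semilogx(ks, traces)                 # one curve per coefficient
plt.xlabel("k")
plt.ylabel("ridge estimate")
plt.title("Ridge trace: choose the smallest k where the curves stabilize")
plt.show()
```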

More About Ridge Regression
1. Methods for choosing k
2. Relationship to other estimators
3. Ridge regression and variable selection
4. Generalized ridge regression (a procedure with a separate biasing parameter k_j for each regressor)

9.5.4 Principal-Component Regression
Principal-component regression addresses multicollinearity by regressing y on a subset of the principal components of the standardized regressors, discarding the components associated with small eigenvalues.
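A compact sketch of principal-component regression (assuming NumPy; the number of retained components m and the simulated data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 80
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)

Z = (X - X.mean(0)) / X.std(0)                 # standardize the regressors
lam, T = np.linalg.eigh(Z.T @ Z)               # eigenvalues in ascending order
lam, T = lam[::-1], T[:, ::-1]                 # reorder: largest first
W = Z @ T                                      # principal components

m = 2                                          # drop the small-eigenvalue component
alpha = np.linalg.lstsq(W[:, :m], y - y.mean(), rcond=None)[0]
beta_pc = T[:, :m] @ alpha                     # back to standardized coefficients
print(beta_pc)
```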

The eigenvalues suggest that a model based on 4 or 5 of the principal components would probably be adequate.

Models D and E are pretty similar.