MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED


Econometrics Econ. 405 Chapter 9: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED

I. The Nature of Multicollinearity
Multicollinearity arises when a linear relationship exists between two or more independent variables in a regression model. In practice, you rarely encounter perfect multicollinearity, but high multicollinearity is quite common. There are two types of multicollinearity.

1- Perfect multicollinearity: occurs when two or more independent variables in a regression model exhibit a deterministic (perfectly predictable) linear relationship. Perfect multicollinearity violates the CLRM assumption that no regressor is an exact linear function of the others. In such a case, OLS cannot produce unique estimates of the parameters (the β's).
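A minimal sketch of why OLS breaks down under perfect multicollinearity, assuming NumPy and five made-up observations in which one regressor is exactly five times the other. The variable names are illustrative only. The design matrix has three columns but only two are linearly independent, so X'X cannot be inverted to give unique estimates.

```python
import numpy as np

# Hypothetical data: x3 is an exact multiple of x2 (perfect collinearity).
x2 = np.array([10.0, 15.0, 18.0, 24.0, 30.0])
x3 = 5.0 * x2

# Design matrix with a constant, x2, and x3.
X = np.column_stack([np.ones_like(x2), x2, x3])

# Full column rank would equal the number of columns (3).
# Here one column is redundant, so the rank is lower.
n_columns = X.shape[1]
rank = np.linalg.matrix_rank(X)
print("columns:", n_columns, "rank:", rank)
```

Because the rank is below the number of columns, infinitely many coefficient vectors fit the data equally well, which is the sense in which the parameters cannot be estimated.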

2- High multicollinearity: results from a linear relationship between your independent variables that is strong (a high degree of correlation) but not completely deterministic. It is much more common than its perfect counterpart and can be equally problematic when estimating an econometric model. In practice, when econometricians point to a multicollinearity issue, they are typically referring to high multicollinearity rather than perfect multicollinearity.

Technically, the presence of high multicollinearity does not violate any CLRM assumption. Consequently, OLS estimates can be obtained and are BLUE with high multicollinearity.

As a numerical example, consider the following hypothetical data:

X2    X3    X*3
10    50    52
15    75    75
18    90    97
24    120   129
30    150   152

It is apparent that X3i = 5X2i. Therefore, there is perfect collinearity between X2 and X3, since the coefficient of correlation r23 is unity.

The variable X*3 was created from X3 by simply adding to it the following numbers, which were taken from a table of random numbers: 2, 0, 7, 9, 2. Now there is no longer perfect collinearity between X2 and X*3, because X*3i = 5X2i + vi, where vi is the added random number. However, the two variables are still highly correlated: the coefficient of correlation between them is 0.9959.
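The two correlation coefficients quoted in this example can be reproduced with a short sketch, assuming NumPy. The data are the five hypothetical observations of X2, X3 = 5·X2, and the random numbers 2, 0, 7, 9, 2 added to form X*3; the variable names are illustrative.

```python
import numpy as np

# Hypothetical data from the numerical example.
x2 = np.array([10.0, 15.0, 18.0, 24.0, 30.0])
x3 = 5.0 * x2                                # exact linear function of x2
v = np.array([2.0, 0.0, 7.0, 9.0, 2.0])      # numbers from a random number table
x3_star = x3 + v                             # no longer an exact function of x2

# Pearson correlation coefficients.
r_23 = np.corrcoef(x2, x3)[0, 1]
r_23_star = np.corrcoef(x2, x3_star)[0, 1]

print("r(X2, X3)  =", round(r_23, 4))
print("r(X2, X*3) =", round(r_23_star, 4))
```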

The preceding algebraic approach to multicollinearity can be portrayed in the following figure:

II. Sources of Multicollinearity
There are several sources of multicollinearity:
1. The data collection method employed: for example, sampling over a limited range of the values taken by the regressors in the population.
2. Constraints on the model or in the population being sampled: for example, in the regression of electricity consumption (Y) on income (X2) and house size (X3), households with high X2 tend to also have high X3.
3. Model specification: for example, adding polynomial terms to a regression model, especially when the range of the X variable is small.
4. An overdetermined model: this happens when the model has more explanatory variables than observations.
5. Common time trend component: especially in time series data, the regressors included in the model may share a common trend, that is, they all increase or decrease over time.

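III. Consequences of High Multicollinearity
Larger standard errors and insignificant t-statistics: the variance of an estimated coefficient in a multiple regression is
$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j \, (1 - R_j^2)}$$
where $\sigma^2$ is the variance of the error term, $SST_j = \sum_i (x_{ij} - \bar{x}_j)^2$ is the total sample variation in $x_j$, and $R_j^2$ is the R-squared from a regression of explanatory variable $x_j$ on all other independent variables (including a constant). As $R_j^2$ approaches 1, the variance grows without bound, so standard errors rise and t-statistics fall.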
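The variance expression for a slope coefficient, σ² / (SST_j (1 − R_j²)), can be checked numerically. The sketch below assumes NumPy, simulated regressors, and a known error variance of 1; the function name `var_beta2` and all settings are hypothetical choices made for illustration. It compares the scalar formula with the matching diagonal entry of σ²(X'X)⁻¹ and shows how the variance rises with the correlation between the regressors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
sigma2 = 1.0  # assumed (known) error variance, for illustration only


def var_beta2(rho):
    """Variance of the OLS slope on x2 when corr(x2, x3) is roughly rho."""
    z = rng.standard_normal((n, 2))
    x2 = z[:, 0]
    x3 = rho * z[:, 0] + np.sqrt(1.0 - rho**2) * z[:, 1]
    X = np.column_stack([np.ones(n), x2, x3])

    # Matrix form: diagonal entry of sigma^2 * (X'X)^{-1} belonging to x2.
    v_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

    # Scalar form: sigma^2 / (SST_2 * (1 - R_2^2)), where R_2^2 comes from
    # the auxiliary regression of x2 on a constant and x3.
    Z = np.column_stack([np.ones(n), x3])
    coef = np.linalg.lstsq(Z, x2, rcond=None)[0]
    resid = x2 - Z @ coef
    sst = np.sum((x2 - x2.mean()) ** 2)
    r2 = 1.0 - (resid @ resid) / sst
    v_scalar = sigma2 / (sst * (1.0 - r2))
    return v_matrix, v_scalar


low = var_beta2(0.10)
high = var_beta2(0.99)
print("rho = 0.10:", low)
print("rho = 0.99:", high)
```

In each pair the two numbers should agree up to rounding error, and the variance under the high correlation setting should be many times larger.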

 

Coefficient estimates that are sensitive to changes in specification: if the independent variables (IVs) are highly collinear, the estimates must rely on small differences between the variables in order to assign an independent effect to each of them. Adding or removing variables from the model can change the nature of those small differences and drastically change your coefficient estimates. In other words, your results aren't robust.

Nonsensical coefficient signs and magnitudes: with higher multicollinearity, the variance of the estimated coefficients increases, which in turn increases the chances of obtaining coefficient estimates with extreme values. Consequently, these estimates may have unbelievably large magnitudes and/or signs that run counter to the expected relationship between the IVs and the dependent variable (DV).
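A small Monte Carlo sketch can illustrate how higher collinearity makes wrong signs more likely. It assumes NumPy and a made-up data-generating process in which the true coefficient on x2 is +1; the function name `share_wrong_sign`, the sample size, the error standard deviation, and the number of replications are all hypothetical settings chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)


def share_wrong_sign(rho, reps=1000, n=30, sigma=2.0):
    """Share of samples in which the estimated slope on x2 is negative,
    although the true slope is +1."""
    wrong = 0
    for _ in range(reps):
        z = rng.standard_normal((n, 2))
        x2 = z[:, 0]
        x3 = rho * z[:, 0] + np.sqrt(1.0 - rho**2) * z[:, 1]
        # True model: y = 1 + 1*x2 + 1*x3 + error.
        y = 1.0 + x2 + x3 + sigma * rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x2, x3])
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        wrong += int(b[1] < 0)
    return wrong / reps


share_low = share_wrong_sign(rho=0.0)
share_high = share_wrong_sign(rho=0.99)
print("share with wrong sign, rho = 0.00:", share_low)
print("share with wrong sign, rho = 0.99:", share_high)
```

Under these assumptions the wrong sign should be rare when the regressors are uncorrelated and fairly frequent when they are highly correlated, even though the true model is the same in both cases.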

IV. Identifying Multicollinearity
Because high multicollinearity does not violate a CLRM assumption and is a sample-specific issue, researchers typically choose from a couple of popular alternatives to measure its degree or severity. The two most common measures are pairwise correlation coefficients and variance inflation factors.

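1- Pairwise Correlation Coefficients
The sample correlation coefficient measures the linear association between two independent variables (Xk and Xj):
$$r_{kj} = \frac{\sum_i (X_{ik} - \bar{X}_k)(X_{ij} - \bar{X}_j)}{\sqrt{\sum_i (X_{ik} - \bar{X}_k)^2 \, \sum_i (X_{ij} - \bar{X}_j)^2}}$$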

The correlation matrix contains the correlation coefficient (ranging from 0 to 1 in absolute value) for each pair of independent variables. As a rule of thumb, correlation coefficients around 0.8 or above in absolute value may signal a multicollinearity problem. Of course, before concluding that multicollinearity is a problem, you should check your results for evidence of it (insignificant t-statistics, sensitive coefficient estimates, and nonsensical coefficient signs and values).

Lower correlation coefficients do not necessarily indicate that you are in the clear. Pairwise correlation coefficients only capture the linear relationship of a variable with one other variable, whereas multicollinearity can involve several variables jointly.

Example of a correlation coefficient matrix:
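A minimal sketch of how such a matrix might be built and screened, assuming NumPy and simulated data. The variables x1, x2, x3 and the way they are generated are hypothetical; the 0.8 cutoff is the rule of thumb mentioned in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Simulated regressors: x2 is built to be strongly related to x1,
# while x3 is generated independently of both.
x1 = rng.standard_normal(n)
x2 = x1 + 0.2 * rng.standard_normal(n)
x3 = rng.standard_normal(n)

names = ["x1", "x2", "x3"]
X = np.column_stack([x1, x2, x3])

# Correlation matrix: each column of X is treated as one variable.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 3))

# Flag every pair whose absolute correlation reaches the 0.8 rule of thumb.
flagged = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) >= 0.8
]
print("pairs at or above 0.8:", flagged)
```

Only the upper triangle is scanned because the matrix is symmetric with ones on the diagonal.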

2- Variance Inflation Factors (VIF)
Multicollinearity may also be detected through variance inflation factors (VIF):
$$VIF_j = \frac{1}{1 - R_j^2}$$
where $R_j^2$ is the R-squared value obtained by regressing independent variable Xj on all the other independent variables in the model. The VIF is computed for each independent variable to determine how strongly it is related to the others.

Steps to conduct the variance inflation factor (VIF) test:
1) Determine the econometric model and obtain the OLS estimates: Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + ui
2) Estimate the auxiliary regressions by regressing each IV on all other IVs, and obtain the R² of each auxiliary regression:
Xi1 = δ0 + δ2Xi2 + δ3Xi3 + ui1
Xi2 = λ0 + λ1Xi1 + λ3Xi3 + ui2
Xi3 = π0 + π1Xi1 + π2Xi2 + ui3
3) Obtain the VIF for each IV with the VIF formula.
4) VIFs greater than 10 indicate a highly likely multicollinearity problem; VIFs between 5 and 10 indicate a somewhat likely multicollinearity issue.
5) Apply solutions for the multicollinearity problem (next).
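The auxiliary-regression steps can be sketched as follows, assuming NumPy and simulated regressors. The helper names `vif` and `label` and the data are hypothetical; each VIF is computed as 1 / (1 − R²) from the regression of one IV on a constant and the remaining IVs, and the 5 and 10 cutoffs are the ones given in the steps.

```python
import numpy as np


def vif(X):
    """Return the VIF of each column of X (n x k, without a constant column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Auxiliary regression: column j on a constant and all other columns.
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(Z, y, rcond=None)[0]
        resid = y - Z @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def label(v):
    """Classify a VIF using the cutoffs from the steps (10 and 5)."""
    if v > 10:
        return "highly likely problem"
    if v >= 5:
        return "somewhat likely problem"
    return "no strong signal"


# Simulated IVs: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(2)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)
x3 = rng.standard_normal(n)

vifs = vif(np.column_stack([x1, x2, x3]))
for name, v in zip(["x1", "x2", "x3"], vifs):
    print(name, round(float(v), 2), label(v))
```

With data built this way, x1 and x2 should show very large VIFs while x3 stays close to 1.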

V. Resolving Multicollinearity Issues
Get more data: gathering additional data not only improves the quality of your data but also helps with multicollinearity, since it is a problem related to the sample itself.

Use a new model: in order to address a multicollinearity issue, you should rethink your theoretical model, or the way in which you expect your independent variables to influence your dependent variable.

Expel the problem variable(s): dropping highly collinear independent variables from your model is one way to address high multicollinearity. Dropping a theoretically relevant variable, however, risks omitted-variable bias; if the variables are redundant, dropping one simply improves an overspecified model.