Multicollinearity


Multicollinearity Multicollinearity - violation of the assumption that no independent variable is a perfect linear function of one or more other independent variables. β1 is the impact X1 has on Y holding all other factors constant. If X1 is related to X2 then β1 will also capture the impact of changes in X2. In other words, interpretation of the parameters becomes difficult.

Perfect vs. Imperfect With perfect multicollinearity the parameters technically cannot be estimated: OLS has no unique solution. Dominant variable - a variable that is so highly correlated with the dependent variable that it completely masks the effects of all other independent variables in the equation. Example: Wins = f(PTS, DPTS). Any other explanatory variable will tend to be insignificant. Imperfect multicollinearity - a statistical relationship exists between two or more independent variables that significantly affects the estimation of the model.
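As a quick illustration of why perfect multicollinearity blocks estimation, here is a minimal sketch (hypothetical data, Python with numpy assumed): one regressor is built as an exact linear function of another, so the design matrix loses a rank and X'X cannot be inverted, meaning OLS has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 3.0 * x1 + 2.0                      # x2 is an exact linear function of x1
X = np.column_stack([np.ones(n), x1, x2])

print(np.linalg.matrix_rank(X))          # 2, not 3: the columns are linearly dependent
print(np.linalg.cond(X.T @ X))           # huge condition number: X'X is (numerically) singular
# Inverting X'X is therefore meaningless, so unique OLS estimates do not exist.
```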

Consequences of Multicollinearity Estimates will remain unbiased: estimates will still be centered around the true values. The variances of the estimates will increase. In essence we are asking the model to tell us something the data say very little about (i.e., what is the impact of changing X1 on Y holding everything else constant). Because the sample does not hold all else constant, the error associated with our estimates increases. Hence it is possible to observe coefficients of the opposite sign from what we expect due to multicollinearity. The computed t-stats will fall: the variances and standard errors are increased. WHY?
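A small Monte Carlo sketch of these consequences, under assumed parameter values and using Python with numpy and statsmodels: the estimates of β1 stay centered on the true value (unbiasedness) while their spread, and hence the standard errors, grows as the correlation between X1 and X2 rises.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, reps = 100, 500
beta = np.array([1.0, 2.0, -1.0])          # assumed true intercept, b1, b2

for corr in (0.0, 0.90, 0.99):
    b1_draws = []
    for _ in range(reps):
        X12 = rng.multivariate_normal([0, 0], [[1, corr], [corr, 1]], size=n)
        X = sm.add_constant(X12)
        y = X @ beta + rng.normal(scale=2.0, size=n)
        b1_draws.append(sm.OLS(y, X).fit().params[1])
    b1_draws = np.array(b1_draws)
    # The mean stays near the true value 2.0 (unbiased); the spread widens with the correlation.
    print(f"corr={corr:4.2f}  mean(b1)={b1_draws.mean():5.2f}  sd(b1)={b1_draws.std():5.2f}")
```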

More Consequences Estimates will become very sensitive to changes in specification. The overall fit of the equation will be generally unaffected: if the multicollinearity occurs in the population as well as the sample, then the predictive power of the model is unaffected. Note: it is possible that the multicollinearity is a result of the sample alone, so the above may not always be true. The more severe the multicollinearity, the worse these consequences become.

Detection of Multicollinearity High R2 with all low t-scores: if this is the case, you have multicollinearity; if this is not the case, you may or may not have multicollinearity. If all the t-scores are significant and in the expected direction, then we can conclude that multicollinearity is not likely to be a problem. High simple correlation coefficients: a high r between two variables (.80 is the rule of thumb) indicates the potential for multicollinearity. In a model with more than two independent variables, though, this test will not tell us whether a relationship exists among a collection of independent variables.
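One simple way to run the correlation check is sketched below; the data and variable names are hypothetical, and the 0.80 cutoff is the rule of thumb from the slide.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
income = rng.normal(50, 10, n)
wealth = 5 * income + rng.normal(0, 8, n)     # strongly related to income (assumed data)
age = rng.normal(40, 12, n)
X = np.column_stack([income, wealth, age])
names = ["income", "wealth", "age"]

R = np.corrcoef(X, rowvar=False)              # matrix of simple correlation coefficients
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(R[i, j]) > 0.80:               # the 0.80 rule of thumb
            print(f"Potential multicollinearity: r({names[i]}, {names[j]}) = {R[i, j]:.2f}")
```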

Variance Inflation Factor Run an OLS regression that has Xi as a function of all the other explanatory variables in the equation. Calculate the VIF: VIF(βi) = 1 / (1 - Ri2), where Ri2 is the unadjusted R2 from this auxiliary regression. Analyze the degree of multicollinearity by evaluating the size of the VIF. There is no table of critical VIF values. The rule of thumb is that if VIF > 5 then multicollinearity is an issue; other authors suggest a rule of thumb of VIF > 10. What is the R2 necessary to reach the rules of thumb? Since VIF = 1 / (1 - Ri2), VIF > 5 corresponds to Ri2 > 0.80 and VIF > 10 to Ri2 > 0.90.
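A sketch of the VIF procedure on hypothetical data, using statsmodels for the auxiliary regressions (statsmodels also ships a variance_inflation_factor helper that performs the same computation):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)     # correlated with x1 (assumed data)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

for i, name in enumerate(["x1", "x2", "x3"]):
    others = np.delete(X, i, axis=1)
    aux = sm.OLS(X[:, i], sm.add_constant(others)).fit()   # auxiliary regression of Xi on the rest
    vif = 1.0 / (1.0 - aux.rsquared)                        # VIF = 1 / (1 - Ri^2)
    print(f"VIF({name}) = {vif:.1f}")                       # values above 5 (or 10) flag a problem
```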

Remedies Do nothing. If you are only interested in prediction, multicollinearity is not an issue. The t-stats may be deflated but still significant, in which case multicollinearity is not a serious problem, and the cure is often worse than the disease. Drop one or more of the multicollinear variables. This solution can introduce specification bias. WHY? On the other hand, if the researcher introduced the multicollinearity in an effort to avoid specification bias (by including redundant variables), then it would be appropriate to drop a variable.

More Remedies Transform the multicollinear variables, for example by forming a linear combination of them or by transforming the equation into first differences or logs. Increase the sample size. The issue of micronumerosity: micronumerosity is the problem of a sample size (n) that does not sufficiently exceed the number of parameters (k). Its symptoms are similar to those of multicollinearity (a lack of useful variation in the independent variables). A solution to each problem is to increase the sample size; this will solve the problem of micronumerosity but not necessarily the problem of multicollinearity.

Heteroskedasticity Heteroskedasticity - violation of the classic assumption that the observations of the error term are drawn from distributions that have a constant variance.

Why does this occur? Substantial differences in the dependent variable across units of observation in a cross-sectional data set. People learn over time. Improvements in data collection over time. The presence of outliers (an observation that is significantly different from all other observations in the sample).

Pure Heteroskedasticity Classic assumption: homoskedasticity, or var(ei) = σ2 = a constant. If this assumption is violated then var(ei) = σi2. What is the difference? The variance is not a constant but varies across the sample. Heteroskedasticity can take a variety of forms; our discussion will focus primarily on one. Proportionality factor (Z) - the variance of the error term changes proportionally to some factor Z, so that var(ei) = σ2Zi.
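To make the proportionality factor concrete, this sketch (assumed values for σ2 and Z, Python with numpy) generates errors with var(ei) = σ2Zi; the spread of the errors visibly widens as Z grows, which is the fan shape a residual plot would show.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
Z = rng.uniform(1, 10, n)                    # proportionality factor (assumed)
sigma2 = 2.0
e = rng.normal(scale=np.sqrt(sigma2 * Z))    # var(e_i) = sigma^2 * Z_i, not constant

# The error spread grows with Z: compare the low-Z and high-Z halves of the sample.
order = np.argsort(Z)
print("sd of e, low-Z half :", e[order[: n // 2]].std().round(2))
print("sd of e, high-Z half:", e[order[n // 2 :]].std().round(2))
```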

Consequences Pure heteroskedasticity does not cause bias in the coefficient estimates. The variance of the coefficient estimates is now a function of the proportionality factor, so the variance of the estimates of β increases; the estimates are still unbiased, since over-estimation and under-estimation remain equally likely. Heteroskedasticity causes OLS to underestimate the variances (and standard errors) of the coefficients. This is true as long as increases in the independent variable are related to increases in the variance of the error term; this positive relationship causes the standard error of the coefficient to be biased downward. In most economic applications, this is the nature of the heteroskedasticity problem. Hence the t-stats and F-stats cannot be relied upon for statistical inference.

Testing for Heteroskedasticity: First Questions Are there any specification errors? Is research in this area prone to the problem of heteroskedasticity? Cross-sectional studies tend to have this problem. Does a graph of the residuals show evidence of heteroskedasticity?

Park Test Estimate the model and save the residuals. Log the squared residuals and regress them on the log of the proportionality factor (Z). Use a t-test on the coefficient of log(Z) to test for heteroskedasticity. PROBLEM: Picking Z.
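A sketch of the Park test on simulated data with an assumed choice of Z, using statsmodels; the key lines are the auxiliary regression of the logged squared residuals on log(Z) and the t-test on that coefficient.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
Z = rng.uniform(1, 10, n)                        # candidate proportionality factor (assumed)
x = rng.normal(size=n)
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=np.sqrt(Z))     # heteroskedastic errors by construction

resid = sm.OLS(y, X).fit().resid                 # step 1: estimate the model, save residuals
park = sm.OLS(np.log(resid**2), sm.add_constant(np.log(Z))).fit()  # step 2: log e^2 on log Z
print(park.tvalues[1], park.pvalues[1])          # step 3: t-test on the coefficient of log(Z)
```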

White Test Estimate the model and save the residuals. Square the residuals (don't log them) and regress these on each X, the square of each X, and the product of each X with every other X. Use a chi-square test to test the overall significance of this auxiliary equation. The test stat is NR2, where N is the sample size and R2 is the unadjusted R2 from the auxiliary regression. The degrees of freedom equal the number of slope coefficients in the auxiliary equation. If the test stat is larger than the critical value, reject the null hypothesis and conclude that you probably have heteroskedasticity. I GOT MORE TESTS!!!
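The same steps written out by hand on simulated data (statsmodels' het_white computes an equivalent NR2 statistic); the variable names and data-generating process below are assumptions for illustration only.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = sm.add_constant(np.column_stack([x1, x2]))
y = 1 + 2 * x1 - x2 + rng.normal(scale=np.abs(x1) + 0.5)    # heteroskedastic by construction

e2 = sm.OLS(y, X).fit().resid ** 2                          # squared residuals (not logged)
# Auxiliary regression: e^2 on each X, each X squared, and the cross product.
W = sm.add_constant(np.column_stack([x1, x2, x1**2, x2**2, x1 * x2]))
aux = sm.OLS(e2, W).fit()

lm = n * aux.rsquared                   # test statistic NR^2
df = W.shape[1] - 1                     # number of slope coefficients in the auxiliary equation
p = stats.chi2.sf(lm, df)               # compare to the chi-square critical value
print(f"NR^2 = {lm:.2f}, df = {df}, p-value = {p:.4f}")
```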

Remedies for Heteroskedasticity Use weighted least squares: dividing the dependent and independent variables by the square root of Z removes the heteroskedasticity when var(ei) = σ2Zi. This is a form of generalized least squares. Problem: how do we identify Z, and how do we identify the form it takes? White's heteroskedasticity-consistent variances and standard errors. The math is beyond the scope of the class, but statistical packages (like EViews) allow you to estimate these standard errors so that asymptotically valid statistical inferences can be made about the true parameter values.
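Both remedies are short to apply in statsmodels; in the sketch below the data are simulated and Z is assumed known purely for illustration. WLS with weights 1/Z is equivalent to dividing the equation through by the square root of Z, and cov_type="HC1" requests White-style heteroskedasticity-consistent standard errors.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
Z = rng.uniform(1, 10, n)                         # proportionality factor, assumed known here
x = rng.normal(size=n)
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=np.sqrt(Z))      # var(e_i) proportional to Z_i

wls = sm.WLS(y, X, weights=1.0 / Z).fit()         # weights 1/Z <=> dividing through by sqrt(Z)
robust = sm.OLS(y, X).fit(cov_type="HC1")         # White heteroskedasticity-consistent std. errors
naive = sm.OLS(y, X).fit()

print("conventional OLS se:", naive.bse[1].round(3))   # typically understated here
print("White/HC1 se       :", robust.bse[1].round(3))
print("WLS se             :", wls.bse[1].round(3))
```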

More Remedies Re-estimate the model in double-log form. Such a model may be theoretically appealing since one is estimating elasticities, and logging the variables compresses the scales on which they are measured, reducing the problem of heteroskedasticity. In the end, you may be unable to resolve the problem of heteroskedasticity. In this case, report White's heteroskedasticity-consistent variances and standard errors and note the substantial efforts you made to reach this conclusion.

Serial Correlation Which classic assumption are we talking about? The correlation between any two observations of the error term is zero. Issue: is et related to et-1? Such would be the case in a time series when a random shock has an impact over a number of time periods. Why is this important? Yt = a + bXt + et. With serial correlation, et = f(et-1), hence Yt = f(Xt, et, et-1). Therefore, serial correlation also affects the accuracy of our estimates of the parameters.

Pure Serial Correlation Pure serial correlation - a violation of the classical assumption that observations of the error term are uncorrelated with one another. In other words, we assume cov(ei, ej) = 0 for i ≠ j. If the covariance of two error terms is not equal to zero, then serial correlation exists.

First Order Serial Correlation The most common form of serial correlation is first order serial correlation: et = ρet-1 + ut, where e is the error term of the equation in question, ρ (rho) is the parameter depicting the relationship between successive observations of the error term, and u is a classical (non-serially correlated) error term.
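A sketch with assumed values of ρ and the sample size: it simulates a first-order serially correlated error and then recovers ρ informally by regressing et on et-1.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
T, rho = 500, 0.7                       # assumed sample size and true rho
u = rng.normal(size=T)                  # classical, non-serially-correlated error
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + u[t]        # e_t = rho * e_{t-1} + u_t

ar1 = sm.OLS(e[1:], e[:-1]).fit()       # regress e_t on e_{t-1}
print("estimated rho:", ar1.params[0].round(2))
```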

Strength of Serial Correlation The magnitude of ρ indicates the strength of the serial correlation. If ρ = 0 there is no serial correlation; as ρ approaches 1 in absolute value, the error terms become more strongly serially correlated. If |ρ| > 1 then the error term continually increases over time, or explodes; such a result is unreasonable, so we expect -1 < ρ < 1. Values greater than zero indicate positive serial correlation; values less than zero indicate negative serial correlation. Negative serial correlation means that the error term tends to switch signs from observation to observation. In the data most commonly used in economics, negative serial correlation is not expected.

Consequences Pure serial correlation does not cause bias in the coefficient estimates. Serial correlation increases the variance of the β distributions: pure serial correlation causes the dependent variable to fluctuate in a fashion that the estimation procedure (OLS) attributes to the independent variables, so the variance of the estimates of β increases. These estimates are still unbiased, since over-estimation and under-estimation remain equally likely. Serial correlation causes OLS to underestimate the variances (and standard errors) of the coefficients. Intuitively, serial correlation makes the model appear to fit better than it really does, so the estimated variances and standard errors are too low. This can lead the researcher to conclude that a relationship exists when in fact the variables in question are unrelated. Hence the t-stats and F-stats cannot be relied upon for statistical inference. Spurious regressions. REMEDIES???
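The spurious-regression danger can be seen in a short simulation (assumed setup): regressing one independent random walk on another, so the series are unrelated by construction, still tends to produce a large t-statistic and a respectable R2.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 200
y = np.cumsum(rng.normal(size=T))       # two independent random walks:
x = np.cumsum(rng.normal(size=T))       # y and x have no true relationship

res = sm.OLS(y, sm.add_constant(x)).fit()
print(f"t-stat on x = {res.tvalues[1]:.1f}, R^2 = {res.rsquared:.2f}")
# In simulations like this the conventional t-stat is usually far above 2 and R^2 looks
# respectable, so OLS inference would wrongly suggest a relationship: a spurious regression.
```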