REGRESSION DIAGNOSTIC I: MULTICOLLINEARITY


Lecture 05: REGRESSION DIAGNOSTIC I: MULTICOLLINEARITY
Based on Damodar Gujarati, Econometrics by Example.

RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL
Assumption 1. The regression model is linear in the parameters.
Assumption 2. The values of the regressors, the X's, are fixed in repeated sampling.
Assumption 3. For given X's, the mean value of the disturbance ui is zero.
Assumption 4. For given X's, the variance of ui is constant, or homoscedastic.

RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL (CONT.)
Assumption 5. For given X's, there is no autocorrelation in the disturbances.
Assumption 6. If the X's are stochastic, the disturbance term and the (stochastic) X's are independent or at least uncorrelated.
Assumption 7. The number of observations must be greater than the number of regressors.
Assumption 8. There must be sufficient variability in the values taken by the regressors.

RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL (CONT.)
Assumption 9. The regression model is correctly specified.
Assumption 10. There is no exact linear relationship (i.e., multicollinearity) in the regressors.
Assumption 11. The stochastic (disturbance) term ui is normally distributed.

MULTICOLLINEARITY One of the assumptions of the classical linear regression model (CLRM) is that there is no exact linear relationship among the regressors. If one or more such relationships exist among the regressors, we call it multicollinearity, or collinearity for short. Perfect collinearity: an exact linear relationship among the regressors exists. Imperfect collinearity: the regressors are highly (but not perfectly) collinear.

MULTICOLLINEARITY Perfect collinearity: an exact linear relation such as λ2X2i + λ3X3i + . . . + λkXki = 0 holds among the regressors, with not all λ's equal to zero.

MULTICOLLINEARITY Imperfect collinearity: the linear relation holds only approximately, as in λ2X2i + λ3X3i + . . . + λkXki + vi = 0, where vi is a stochastic error term.


MULTICOLLINEARITY There are several sources of multicollinearity:
1. The data collection method employed.
2. Constraints on the model or in the population being sampled.
3. Model specification, for example, adding polynomial terms to a regression model, especially when the range of the X variable is small.
4. An overdetermined model (more explanatory variables than observations).

CONSEQUENCES If collinearity is not perfect, but high, several consequences ensue:
The OLS estimators are still BLUE, but one or more regression coefficients have large standard errors relative to the values of the coefficients, thereby making the t ratios small. One may then conclude (misleadingly) that the true values of these coefficients are not different from zero.
Even though some regression coefficients are statistically insignificant, the R2 value may be very high.
Also, the regression coefficients may be very sensitive to small changes in the data, especially if the sample is relatively small.

The Gauss-Markov Theorem and the Properties of OLS Estimators OLS is BLUE, where BLUE stands for Best (meaning minimum variance), Linear (the estimators are linear functions of the dependent variable Y), and Unbiased (in repeated applications of the method, on average, the estimators approach their true values). In the class of linear unbiased estimators, the OLS estimators have minimum variance. As a result, the true parameter values can be estimated with the least possible uncertainty; an unbiased estimator with the least variance is called an efficient estimator.

CONSEQUENCES Assume that X3i = λX2i, where λ is a nonzero constant. Under this perfect collinearity the OLS normal equations have no unique solution: substituting X3i = λX2i into the estimator formulas makes both the numerator and the denominator vanish, so the coefficients are indeterminate (0/0).

RECALLING THE OLS ESTIMATOR For the three-variable model Yi = β1 + β2X2i + β3X3i + ui, the OLS estimator of β2 is

β̂2 = [(Σx2iyi)(Σx3i²) − (Σx3iyi)(Σx2ix3i)] / [(Σx2i²)(Σx3i²) − (Σx2ix3i)²]

where lowercase letters denote deviations from sample means (e.g., x2i = X2i − X̄2); β̂3 is obtained equivalently by interchanging the roles of x2 and x3. With X3i = λX2i, both the numerator and the denominator of this expression reduce to zero, confirming that β̂2 is indeterminate under perfect collinearity.
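A minimal numerical sketch (assuming Python with numpy; the data are invented for illustration) of why perfect collinearity makes the estimator indeterminate: when X3 = λX2, the cross-product matrix X'X is singular and the normal equations cannot be solved uniquely.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = 2.0 * x2                       # perfect collinearity: X3 = lambda * X2, lambda = 2
X = np.column_stack([np.ones(n), x2, x3])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # prints 2, not 3: X'X is singular
# np.linalg.inv(XtX) would raise LinAlgError here: the OLS normal
# equations have no unique solution, i.e. beta-hat is indeterminate.
```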

VARIANCE INFLATION FACTOR For the regression model Yi = β1 + β2X2i + β3X3i + ui, it can be shown that

var(β̂2) = σ² / [Σx2i² (1 − r23²)]  and  var(β̂3) = σ² / [Σx3i² (1 − r23²)]

where σ² is the variance of the error term ui, and r23 is the coefficient of correlation between X2 and X3.

VARIANCE INFLATION FACTOR (CONT.) The term

VIF = 1 / (1 − r23²)

is the variance-inflating factor. VIF is a measure of the degree to which the variance of the OLS estimator is inflated because of collinearity.
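A small sketch (Python, with hypothetical values of r23) of how the formula above behaves: as r23 approaches 1, VIF = 1/(1 − r23²), and hence var(β̂2), blows up.

```python
import numpy as np

r23 = np.array([0.0, 0.5, 0.9, 0.99, 0.999])
vif = 1.0 / (1.0 - r23**2)           # variance-inflating factor
for r, v in zip(r23, vif):
    print(f"r23 = {r:6.3f}  ->  VIF = {v:10.2f}")
# r23 = 0.000 -> VIF = 1.00: no inflation with orthogonal regressors
# r23 = 0.999 -> VIF ~ 500: var(beta2-hat) is about 500 times larger
# than it would be if X2 and X3 were uncorrelated.
```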

DETECTION OF MULTICOLLINEARITY
1. High R2 but few significant t ratios.
2. High pair-wise correlations among the explanatory variables or regressors.
3. High partial correlation coefficients.
4. Significant F tests for the auxiliary regressions (regressions of each regressor on the remaining regressors).
5. High Variance Inflation Factor (VIF) and low Tolerance Factor (TOL, the inverse of VIF).
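A hedged sketch of checks 2 and 5 above in Python, assuming pandas and statsmodels are available; the regressors here are synthetic and deliberately collinear, purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic regressors with deliberately high collinearity (illustrative data)
rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + 0.1 * rng.normal(size=n)    # highly correlated with x2
df = pd.DataFrame({"x2": x2, "x3": x3})

# Check 2: high pair-wise correlations among regressors
print(df.corr())

# Check 5: VIF for each regressor (the design matrix must include a constant)
exog = sm.add_constant(df)
for i, name in enumerate(exog.columns):
    print(name, variance_inflation_factor(exog.values, i))
```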

REMEDIAL MEASURES What should we do if we detect multicollinearity?
Nothing, for we often have no control over the data.
Redefining the model by excluding variables may attenuate the problem, provided we do not omit relevant variables.
Principal components analysis: construct artificial variables from the regressors such that they are orthogonal to one another. These principal components become the regressors in the model. Yet the interpretation of the coefficients on the principal components is not as straightforward.
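A minimal sketch of the principal-components remedy, assuming scikit-learn and statsmodels; the data and coefficients are invented. The component scores replace the original, collinear regressors, which removes the collinearity but, as noted above, changes what the coefficients measure.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + 0.1 * rng.normal(size=n)          # collinear pair
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

# Orthogonal principal components of the regressors
X = np.column_stack([x2, x3])
pcs = PCA(n_components=2).fit_transform(X)

# Regress y on the components; their coefficients are free of
# collinearity but no longer measure the effect of x2 or x3 directly.
res = sm.OLS(y, sm.add_constant(pcs)).fit()
print(res.summary())
```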


Lecture 04 (continued): REGRESSION DIAGNOSTIC II: HETEROSCEDASTICITY

HETEROSKEDASTICITY We seek answers to the following questions:
1. What is the nature of heteroscedasticity?
2. What are its consequences?
3. How does one detect it?
4. What are the remedial measures?

THE NATURE OF HETEROSCEDASTICITY One of the important assumptions of the classical linear regression model is that the variance of each disturbance term ui, conditional on the chosen values of the explanatory variables, is some constant number equal to σ². This is the assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal variance. Symbolically,

E(ui²) = σ²,  i = 1, 2, . . . , n (11.1.1)

In contrast (compare Figures 11.1 and 11.2 in the text), if the variances of Yi are not the same, there is heteroscedasticity. Symbolically,

E(ui²) = σi² (11.1.2)

Notice the subscript on σ², which reminds us that the conditional variances of ui (= conditional variances of Yi) are no longer constant.
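A short simulation (Python/numpy, illustrative only) contrasting (11.1.1) and (11.1.2): the error spread is constant across the first sample and grows with X in the second.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = np.linspace(1, 10, n)

u_homo = rng.normal(scale=2.0, size=n)   # E(u_i^2) = sigma^2 for every i
u_hetero = rng.normal(scale=0.5 * x)     # E(u_i^2) = sigma_i^2, growing with x

# Compare the spread of each error series in the low-x and high-x halves
for name, u in [("homoscedastic", u_homo), ("heteroscedastic", u_hetero)]:
    print(name, u[: n // 2].std().round(2), u[n // 2:].std().round(2))
```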


HETEROSCEDASTICITY One of the assumptions of the classical linear regression model (CLRM) is that the variance of ui, the error term, is constant, or homoscedastic. The reasons this may fail are many, including:
Following error-learning models: as people learn, their errors of behavior become smaller over time.
As incomes grow, people have more discretionary income, so the spread of their spending widens.
As data collection techniques improve, the error variance is likely to decrease.
The presence of outliers in the data.

HETEROSCEDASTICITY (CONT.) Reasons are many, including:
Incorrect functional form of the regression model.
Incorrect transformation of data.
Skewness in the distribution of one or more regressors.
Mixing observations with different measures of scale (such as mixing high-income households with low-income households).

CONSEQUENCES If heteroscedasticity exists, several consequences ensue:
The OLS estimators are still unbiased and consistent, yet they are less efficient, making statistical inference less reliable (i.e., the estimated t values may not be reliable).
Thus the estimators are no longer best linear unbiased estimators (BLUE); they are simply linear unbiased estimators (LUE).
In the presence of heteroscedasticity, BLUE estimators are provided by the method of weighted least squares (WLS).

HETEROSKEDASTICITY Unfortunately, the usual OLS method does not follow this strategy, but a method of estimation known as generalized least squares (GLS) takes such information into account explicitly and is therefore capable of producing estimators that are BLUE. To see how this is accomplished, let us continue with the now-familiar two-variable model:

Yi = β1 + β2Xi + ui (11.3.1)

which for ease of algebraic manipulation we write as

Yi = β1X0i + β2Xi + ui (11.3.2)

where X0i = 1 for each i. Now assume that the heteroscedastic variances σi² are known. Divide through by σi to obtain

Yi/σi = β1(X0i/σi) + β2(Xi/σi) + ui/σi

which for ease of exposition we write as

Y*i = β*1X*0i + β*2X*i + u*i
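A sketch of this transformation in Python with statsmodels, assuming the σi are known (here they are, because the data are simulated). `sm.WLS` with weights 1/σi² performs the same division by σi internally.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, size=n)
sigma_i = 0.5 * x                                    # known heteroscedastic sd
y = 1.0 + 2.0 * x + rng.normal(scale=sigma_i)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1.0 / sigma_i**2).fit()   # GLS with known sigma_i

print(ols.params, ols.bse)   # unbiased but inefficient under heteroscedasticity
print(wls.params, wls.bse)   # BLUE when sigma_i is known
```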

DETECTION OF HETEROSCEDASTICITY
Graph the histogram of squared residuals.
Graph the squared residuals against the predicted Y.
Breusch-Pagan (BP) test.
White's test of heteroscedasticity.
Other tests, such as the Park, Glejser, Spearman's rank correlation, and Goldfeld-Quandt tests of heteroscedasticity.

BREUSCH-PAGAN (BP) TEST
Estimate the OLS regression and obtain the squared OLS residuals from this regression.
Regress the squared residuals on the k regressors included in the model. (You can also choose other regressors that might have some bearing on the error variance.)
The null hypothesis is that the error variance is homoscedastic, that is, that all the slope coefficients of this auxiliary regression are simultaneously equal to zero.
Use the F statistic from this regression, with (k − 1) numerator and (n − k) denominator degrees of freedom, to test this hypothesis. If the computed F statistic is statistically significant, we can reject the hypothesis of homoscedasticity; if it is not, we may not reject the null hypothesis.
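The same steps are prepackaged in statsmodels as `het_breuschpagan`; a sketch on simulated data where the variance grows with the regressor, so the null should be rejected.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(1, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)    # error variance grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Internally regresses the squared residuals on the regressors in X
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(f"F = {f_stat:.2f}, p = {f_pval:.4f}")     # small p: reject homoscedasticity
```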

WHITE'S TEST OF HETEROSCEDASTICITY
Regress the squared residuals on the regressors, the squared terms of these regressors, and the pair-wise cross-product terms of the regressors.
Obtain the R2 value from this auxiliary regression and multiply it by the number of observations. Under the null hypothesis of homoscedasticity, this product follows the chi-square distribution with degrees of freedom equal to the number of coefficients estimated.
The White test is more general and more flexible than the BP test.
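A sketch of White's test via statsmodels `het_white`, which builds the squares and cross-products automatically; the n·R² product described above is returned as the LM statistic.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(6)
n = 200
x2 = rng.uniform(1, 10, size=n)
x3 = rng.uniform(1, 10, size=n)
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(scale=0.5 * x2)

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

# Auxiliary regression of squared residuals on levels, squares, cross-products
lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(f"n*R^2 = {lm_stat:.2f}, p = {lm_pval:.4f}")
```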

REMEDIAL MEASURES What should we do if we detect heteroscedasticity?
Use the method of weighted least squares (WLS): divide each observation by the (heteroscedastic) σi and estimate the transformed model by OLS (yet the true variance is rarely known).
If the true error variance is proportional to the square of one of the regressors, we can divide both sides of the equation by that variable and run the transformed regression.
Take the natural log of the dependent variable.
Use White's heteroscedasticity-consistent standard errors, or robust standard errors, which are valid in large samples.
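A sketch of the last remedy: keep the OLS point estimates but replace the classical standard errors with White's heteroscedasticity-consistent ones, available in statsmodels via `cov_type="HC1"`.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(1, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # classical standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # White/robust standard errors

print(ols.bse)     # understates uncertainty under heteroscedasticity
print(robust.bse)  # valid in large samples
```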