Inference issues in OLS


Inference issues in OLS Amine Ouazad, Assistant Professor of Economics

Outline
- Heteroscedasticity
- Clustering
- Generalized Least Squares
  - for heteroscedasticity
  - for autocorrelation

Heteroscedasticity

Issue Heteroscedasticity arises whenever the variance of the residual differs across observations, or depends on the value of the covariates.

Example #1

Example #2 Here Var(y|x) is clearly increasing in x. Notice the underestimation of the size of the confidence intervals.

Visual checks with multiple variables Use the vector of estimates b and predict E(Y|X) with the Stata command predict xb, xb. Then draw the scatter plot of the dependent variable y against the prediction Xb on the horizontal axis.

Causes An unobservable that affects the variance of the residuals, but not their mean conditional on x. Write y = a + bx + ε with ε = ηz, where the shock η satisfies E(η|x) = 0 and z is an unobservable, independent of η, whose variance Var(z|x) depends on x. Then E(ε|x) = 0 (exogeneity holds), but Var(ε|x) = Var(ηz|x) depends on x (as in example #1). In practice, most regressions have heteroskedastic residuals.

Examples The variability of stock returns depends on the industry: Stock Return_it = a + b·Market Return_t + ε_it. The variability of unemployment depends on the state/country: Unemployment_it = a + b·GDP Growth_t + ε_it. Notice that both the inclusion of industry/state dummies and controlling for heteroskedasticity may be necessary.

Heteroscedasticity: the framework Var(ε_i | x_i) = σ_i² = σ²·w_i. We normalize the weights w_i so that they are all positive and sum to n; the trace of the matrix W = diag(w_1, …, w_n) (see matrix appendix) is therefore equal to n.

Consequences The OLS estimator is still unbiased, consistent and asymptotically normal (these properties only depend on A1-A3). But the OLS estimator is now inefficient (the proof of the Gauss-Markov theorem relies on homoscedasticity). And the confidence intervals calculated assuming homoscedasticity typically overstate the precision of the estimates, i.e. understate the width of the correct confidence intervals.

Variance-covariance matrix of the estimator At finite and fixed sample size, Var(b|X) = σ² (X'X)⁻¹ (Σ_i w_i x_i x_i') (X'X)⁻¹, where x_i is the i-th vector of covariates, a vector of size K. Notice that if the w_i are all equal to 1, we are back to the homoscedastic case and we get Var(b|X) = σ²(X'X)⁻¹. We use the finite-sample formula to design an estimator of the variance-covariance matrix.

White heteroscedasticity-consistent estimator of the variance-covariance matrix Est. Asy. Var(b) = (X'X)⁻¹ (Σ_i e_i² x_i x_i') (X'X)⁻¹, where e_i is the estimated residual of observation i, computed using the OLS estimator of the coefficients. This formula is consistent (plim Est. Asy. Var(b) = Var(b)), but may yield excessively large standard errors for small sample sizes. This is the formula used by the Stata robust option. The square root of the k-th diagonal element is the standard error of the k-th coefficient.
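As an illustration of the sandwich formula, here is a minimal numpy sketch (the simulated data and variable names are mine, not from the slides) that computes White/HC0 standard errors by hand and, for comparison, the homoskedastic formula:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # design matrix with constant
e = rng.normal(size=n) * (1 + np.abs(x))      # heteroskedastic errors
y = X @ np.array([1.0, 2.0]) + e

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                         # OLS coefficients
resid = y - X @ b

# White (HC0) sandwich: (X'X)^-1 (sum_i e_i^2 x_i x_i') (X'X)^-1
meat = X.T @ (resid[:, None] ** 2 * X)
V_white = XtX_inv @ meat @ XtX_inv
se_white = np.sqrt(np.diag(V_white))          # square ROOT of the diagonal

# homoskedastic formula for comparison
V_homo = resid @ resid / (n - 2) * XtX_inv
se_homo = np.sqrt(np.diag(V_homo))
```

Stata's robust option applies a small-sample degrees-of-freedom correction on top of this HC0 formula, so its standard errors differ slightly.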

Test for heteroscedasticity Null hypothesis H0: σ_i² = σ² for all i = 1, 2, …, n. Alternative hypothesis Ha: at least one residual has a different variance. Steps: (1) estimate the model by OLS and predict the residuals e_i; (2) regress the squared residuals on a constant, the covariates, their squares and their cross products (P regressors); (3) under the null, all the slope coefficients should be equal to 0, and NR² from this auxiliary regression is distributed as a χ² with P-1 degrees of freedom.
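The steps above can be sketched in numpy (the simulated data and names are illustrative; in practice you would use a canned test such as Stata's hettest):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + x**2)   # variance depends on x

# step 1: OLS residuals
b = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ b) ** 2                                  # squared residuals

# step 2: auxiliary regression of e^2 on a constant, x and x^2
Z = np.column_stack([np.ones(n), x, x**2])
g = np.linalg.lstsq(Z, e2, rcond=None)[0]
fitted = Z @ g
r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# step 3: NR^2, to be compared with a chi-squared critical value
nr2 = n * r2
```

With this strongly heteroskedastic simulated sample, nr2 comfortably exceeds the 95% χ² critical value with 2 degrees of freedom (5.99), so the null of homoscedasticity is rejected.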

Suggests another visual check Examples #1 and #2 with one covariate. Example with two covariates.

Stata takeaways Always use robust standard errors; the robust option is available for most regression commands. This holds regardless of the covariates used: adding a covariate does not free you from the burden of heteroscedasticity. Test for heteroscedasticity: hettest reports the chi-squared statistic with P-1 degrees of freedom, and the p-value; a p-value lower than 0.05 rejects the null at the 95% level. With small sample sizes, the test may be used to justify avoiding robust standard errors.

Clustering

Clustering, example #1 The typical problem is the existence of a common unobservable component, shared by all observations in a country, a state, a year, etc. Take y_it = x_it·b + ε_it, a panel dataset where the residual is ε_it = u_i + η_it. Exercise: calculate the variance-covariance matrix of the residuals.

Clustering, example #2 Clustering also occurs when some covariates are measured at a higher level of aggregation than the individual observation. Example: y_ij = x_ij·b + z_j·g + ε_ij. In practice (though not as a theoretical necessity), this implies that Cov(ε_ij, ε_i'j) is nonzero. Examples: performance_it = c + d·policy_j(i) + ε_it; stock return_it = constant + b·Market_t + ε_it.

Moulton paper

The clustering model Notice that the variance-covariance matrix can be written in blocks, one block per group. In this model, the OLS estimator is unbiased and consistent, but inefficient, and the estimated variance-covariance matrix is biased.

True variance-covariance matrix With all the covariates fixed within group, the variance-covariance matrix of the estimator is Var(b|X) = σ²(X'X)⁻¹·(1 + (m-1)ρ), where m = n/p is the number of observations per group and ρ is the within-group correlation of the residuals. This formula is not exact when there are individual-specific covariates, but the term (1 + (m-1)ρ) can still be used as an approximate correction factor.
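A back-of-the-envelope sketch of the correction factor (the numbers below are hypothetical, chosen only for illustration):

```python
import numpy as np

def moulton_factor(m, rho):
    """Approximate variance inflation factor for equal-sized groups:
    multiply the naive OLS variance by 1 + (m - 1) * rho."""
    return 1.0 + (m - 1.0) * rho

se_ols = 0.05                   # hypothetical naive OLS standard error
m, rho = 50, 0.1                # 50 observations per group, modest correlation
se_corrected = se_ols * np.sqrt(moulton_factor(m, rho))
```

Even a small within-group correlation inflates the variance substantially when groups are large: here the factor is 1 + 49 × 0.1 = 5.9, so the corrected standard error is about 2.4 times the naive one.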

Descriptive Statistics

Stata regress y x, cluster(unit) robust. Clustering and robust standard errors should be used at the same time. This is the OLS estimator with corrected standard errors. If x includes unit-specific variables, we cannot also add a unit (state/firm/industry) dummy.

Multi-way clustering Multi-way clustering: "Robust Inference with Multi-way Clustering", Cameron, Gelbach and Miller, NBER Technical Working Paper No. 327 (2006). This has very recently become the norm. Example: clustering by year and state in y_it = x_it·b + z_i·g + w_t·d + ε_it. What do you expect? Use ivreg2, cluster(id year); install it with ssc install ivreg2.

Generalized least squares

OLS is BLUE only under A4 OLS is not BLUE if the variance-covariance matrix of the residuals is not spherical (σ²I). What should we do? Take the general model Y = Xβ + ε and assume Var(ε) = Ω. Then take the square root Ω^(-1/2): a matrix that satisfies Ω⁻¹ = (Ω^(-1/2))'·Ω^(-1/2). Such a matrix exists for any positive definite Ω.
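One concrete way to build such a square root is the Cholesky factorization; a small numpy sketch (my own construction, one of several valid square roots):

```python
import numpy as np

# If Omega = L L' (Cholesky, L lower triangular), then Omega^{-1/2} = L^{-1}
# satisfies (Omega^{-1/2})' Omega^{-1/2} = (L L')^{-1} = Omega^{-1}.
Omega = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # any positive definite matrix
L = np.linalg.cholesky(Omega)         # Omega = L @ L.T
Om_half_inv = np.linalg.inv(L)        # plays the role of Omega^{-1/2}

check = Om_half_inv.T @ Om_half_inv   # should equal inv(Omega)
```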

Sphericized model The sphericized model is: Ω^(-1/2)Y = Ω^(-1/2)Xβ + Ω^(-1/2)ε. This model satisfies A4, since Var(Ω^(-1/2)ε | X) = σ²I.

Generalized Least Squares The GLS estimator is: b_GLS = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹Y, i.e. OLS applied to the sphericized model. This estimator is BLUE: it is the efficient estimator of the parameter β. It is also consistent and asymptotically normal. Exercise: prove that the estimator is unbiased, and that the estimator is consistent.
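A minimal numpy sketch of the GLS formula with a known diagonal Ω (the simulated data and names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
w = np.exp(x)                          # known variance weights: Var(e_i) ∝ w_i
e = rng.normal(size=n) * np.sqrt(w)
y = X @ np.array([1.0, 3.0]) + e

# b_GLS = (X' Omega^-1 X)^-1 X' Omega^-1 Y, with Omega = diag(w)
Omega_inv = np.diag(1.0 / w)
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # for comparison
```

Both estimators are consistent here; the GLS one is efficient because it downweights the high-variance observations.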

Feasible Generalized Least Squares The matrix Ω is in general unknown. We estimate it by a procedure (see later) that yields Ω̂ with plim Ω̂ = Ω. Then the FGLS estimator b = (X'Ω̂⁻¹X)⁻¹ X'Ω̂⁻¹Y is a consistent estimator of β. The typical problem is the estimation of Ω: there is no one-size-fits-all procedure.

GLS for heteroscedastic models Taking the formula of the GLS estimator with a diagonal variance-covariance matrix, GLS reduces to weighted least squares, where each weight is the inverse of w_i (or the inverse of σ_i²; scaling the weights has no impact). Stata application exercise: calculate the weights and use the weighted OLS estimator regress y x [aweight=w] to compute the heteroscedastic GLS estimator on a dataset of your choice.
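The equivalence between GLS with a diagonal Ω and weighted least squares can be checked numerically (an illustrative numpy sketch with simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
w = 1.0 + x**2                         # Var(e_i) proportional to w_i
y = X @ np.array([0.5, 2.0]) + rng.normal(size=n) * np.sqrt(w)

# WLS: scale each row of (y, X) by 1/sqrt(w_i), then run OLS
s = 1.0 / np.sqrt(w)
b_wls = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)[0]

# explicit GLS formula with Omega = diag(w)
Oi = np.diag(1.0 / w)
b_gls = np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)
```

The two coefficient vectors coincide, which is exactly why Stata's regress y x [aweight=w] delivers the heteroscedastic GLS estimator.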

GLS for autocorrelation Autocorrelation is pervasive in finance. Assume ε_t = ρ·ε_{t-1} + η_t (we say that ε_t is AR(1)), where η_t is the innovation, uncorrelated with ε_{t-1}. The problem is the estimation of ρ; a natural estimator of ρ is the coefficient of the regression of ε_t on ε_{t-1}. Exercise 1 (for adv. students): find the inverse of Ω. Exercise 2 (for adv. students): find Ω for an AR(2) process. Exercise 3 (for adv. students): what about MA(2)? Variation: panel-specific AR(1) structure.
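The natural estimator of ρ and the resulting FGLS step can be sketched as follows (a Cochrane-Orcutt-style quasi-differencing on simulated data; names and the simulation are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
x = rng.normal(size=T)

# simulate AR(1) errors: e_t = rho * e_{t-1} + eta_t
rho_true = 0.6
eta = rng.normal(size=T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho_true * e[t - 1] + eta[t]
y = 1.0 + 2.0 * x + e

# step 1: OLS, then residuals
X = np.column_stack([np.ones(T), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
r = y - X @ b_ols

# step 2: estimate rho by regressing residuals on their lag
rho_hat = (r[:-1] @ r[1:]) / (r[:-1] @ r[:-1])

# step 3: quasi-difference the data and rerun OLS
ys = y[1:] - rho_hat * y[:-1]
Xs = X[1:] - rho_hat * X[:-1]
b_fgls = np.linalg.lstsq(Xs, ys, rcond=None)[0]
```

After quasi-differencing, the transformed errors are (approximately) the serially uncorrelated innovations η_t, so the second-stage OLS is close to the efficient GLS estimator.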

Autocorrelation example

GLS for clustered models Correlation ρ within each group. Exercise: write down the variance-covariance matrix Ω of the residuals, and put forward an estimator of ρ. What is the GLS estimator of b in Y = Xb + ε with clustering? Estimation using xtgls, re.

Applications of GLS The Generalized Least Squares model is seldom used: in practice, the variance of the OLS estimator is simply corrected for heteroscedasticity or clustering. Take-away: use regress, cluster(.) robust. Otherwise: xtgls, panels(hetero); xtgls, panels(correlated); xtgls, panels(hetero) corr(ar1). GLS is mostly used for the estimation of random-effects models: xtreg, re.

Conclusion: no worries

Takeaways for this session
- Use regress, robust always, unless the sample size is small.
- Use regress, robust cluster(unit) if you believe there are common shocks at the unit level, or if you have included unit-level covariates.
- Use ivreg2, cluster(unit1 unit2) for two-way clustering.
- Use xtgls for the efficient FGLS estimator with correlated, AR(1) or heteroscedastic residuals. This might allow you to shrink the confidence intervals further, but beware that this is less standard than the previous methods.