Econometrics I Professor William Greene Stern School of Business Department of Economics
Econometrics I Part 6 – Estimating the Variance of b
Econometric Dire Emergency
Context The true variance of b|X is σ²(X’X)⁻¹. We consider how to use the sample data to estimate this matrix. The ultimate objectives are to form interval estimates for regression slopes and to test hypotheses about them. Both require estimates of the variability of the distribution.
Estimating σ² Using the residuals instead of the disturbances: the natural estimator is e’e/N as a sample surrogate for ε’ε/N. Imperfect observation of εi: ei = εi − (b − β)’xi. This produces a downward bias in e’e/N. We obtain the result E[e’e|X] = (N−K)σ².
Expectation of e’e
Method 1:
Estimating σ² The unbiased estimator of σ² is s² = e’e/(N−K). Division by N−K rather than N is the “degrees of freedom correction.”
Method 2: Some Matrix Algebra
Decomposing M
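The algebra from the original slide is not reproduced above; the following is a sketch of the standard argument (in LaTeX), assuming spherical disturbances E[εε’|X] = σ²I:

e = My = M\varepsilon, \qquad M = I_N - X(X'X)^{-1}X'
E[e'e \mid X] = E[\varepsilon' M \varepsilon \mid X] = \sigma^2 \,\mathrm{tr}(M)
\mathrm{tr}(M) = \mathrm{tr}(I_N) - \mathrm{tr}\!\left[(X'X)^{-1}X'X\right] = N - K

Because M is symmetric and idempotent, its characteristic roots are all 0 or 1, so tr(M) also equals the rank of M; that is what the decomposition of M makes explicit.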
Example: Characteristic Roots of a Correlation Matrix
Gasoline Data
X’X and its Roots
Var[b|X] Estimating the Covariance Matrix for b|X The true covariance matrix is σ²(X’X)⁻¹. The natural estimator is s²(X’X)⁻¹. “Standard errors” of the individual coefficients are the square roots of the diagonal elements.
X’X, (X’X)⁻¹, and s²(X’X)⁻¹
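As a complement to the matrices shown above, a minimal numerical sketch (not from the slides) of how s²(X’X)⁻¹ and the standard errors are computed; y and X are placeholder arrays standing in for any data set:

import numpy as np

def ols_with_se(y, X):
    """OLS coefficients, s^2, and conventional standard errors from s^2(X'X)^-1."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y              # least squares coefficients
    e = y - X @ b                      # residuals
    s2 = (e @ e) / (N - K)             # unbiased estimator of sigma^2
    V = s2 * XtX_inv                   # estimated covariance matrix of b
    se = np.sqrt(np.diag(V))           # standard errors
    return b, s2, V, se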
Standard Regression Results
----------------------------------------------------------------------
Ordinary     least squares regression ........
LHS=G        Mean                 =    226.09444
             Standard deviation   =     50.59182
             Number of observs.   =           36
Model size   Parameters           =            7
             Degrees of freedom   =           29
Residuals    Sum of squares       =    778.70227
             Standard error of e  =      5.18187  <= sqr[778.70227/(36 – 7)]
Fit          R-squared            =       .99131
             Adjusted R-squared   =       .98951
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error   t-ratio  P[|T|>t]  Mean of X
--------+-------------------------------------------------------------
Constant|    -7.73975       49.95915       -.155    .8780
      PG|   -15.3008***      2.42171      -6.318    .0000     2.31661
       Y|      .02365***      .00779       3.037    .0050     9232.86
   TREND|     4.14359**       1.91513      2.164    .0389     17.5000
     PNC|    15.4387         15.21899      1.014    .3188     1.67078
     PUC|    -5.63438         5.02666     -1.121    .2715     2.34364
     PPT|   -12.4378**        5.20697     -2.389    .0236     2.74486
The Variance of OLS - Sandwiches
Robust Covariance Estimation Not a structural estimator of X’ΩX/n. If the condition is present, the estimator estimates the true variance of the OLS estimator. If the condition is not present, the estimator estimates the same matrix that (σ²/n)(X’X/n)⁻¹ estimates. The “condition”: heteroscedasticity, autocorrelation, common effects.
Heteroscedasticity Robust Covariance Matrix Robust estimation: generality. How to estimate Var[b|X] = σ²(X’X)⁻¹X’ΩX(X’X)⁻¹ for the LS b? The distinction between estimating σ²Ω, an n by n matrix, and estimating the K×K matrix σ²X’ΩX = σ² Σi Σj ωij xi xj’. NOTE…… VVVIRs for modern applied econometrics: the White estimator, Newey-West.
The White Estimator
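The slide’s formula is not reproduced above. A minimal sketch of the usual White (HC0) form, Est.Var[b|X] = (X’X)⁻¹ (Σi ei² xi xi’) (X’X)⁻¹, with y and X as placeholder arrays; some software rescales by n/(n−K) (HC1), which is omitted here:

import numpy as np

def white_cov(y, X):
    """White heteroscedasticity-robust covariance: (X'X)^-1 (sum_i e_i^2 x_i x_i') (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    meat = (X * e[:, None] ** 2).T @ X   # sum_i e_i^2 x_i x_i'
    V = XtX_inv @ meat @ XtX_inv
    return V, np.sqrt(np.diag(V))        # robust covariance and standard errors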
Groupwise Heteroscedasticity Regression of log of per capita gasoline use on log of per capita income, gasoline price, and number of cars per capita for 18 OECD countries for 19 years. Countries are ordered by the standard deviation of their 19 residuals. The standard deviation varies by country. The “solution” is “weighted least squares.”
White Estimator
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Constant|   2.39132562      .11693429     20.450    .0000
LINCOMEP|    .88996166      .03580581     24.855    .0000  -6.13942544
LRPMG   |   -.89179791      .03031474    -29.418    .0000   -.52310321
LCARPCAP|   -.76337275      .01860830    -41.023    .0000  -9.04180473
+----------------------------------------------------+
| White heteroscedasticity robust covariance matrix  |
+----------------------------------------------------+
Constant|   2.39132562      .11794828     20.274    .0000
LINCOMEP|    .88996166      .04429158     20.093    .0000  -6.13942544
LRPMG   |   -.89179791      .03890922    -22.920    .0000   -.52310321
LCARPCAP|   -.76337275      .02152888    -35.458    .0000  -9.04180473
Autocorrelated Residuals logG = β1 + β2 logPg + β3 logY + β4 logPnc + β5 logPuc + ε
The Newey-West Estimator Robust to Autocorrelation
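A minimal sketch of the Newey-West calculation with Bartlett weights wℓ = 1 − ℓ/(L+1), assuming time-ordered placeholder arrays y, X and a user-chosen lag length L (the output below uses Periods = 10):

import numpy as np

def newey_west_cov(y, X, L):
    """Newey-West HAC covariance for OLS with Bartlett kernel and L lags."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    Xe = X * e[:, None]                  # rows are e_t * x_t'
    S = Xe.T @ Xe                        # lag-0 term: sum_t e_t^2 x_t x_t'
    for l in range(1, L + 1):
        w = 1.0 - l / (L + 1.0)          # Bartlett weight
        G = Xe[l:].T @ Xe[:-l]           # sum_t e_t e_{t-l} x_t x_{t-l}'
        S += w * (G + G.T)
    V = XtX_inv @ S @ XtX_inv
    return V, np.sqrt(np.diag(V))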
Newey-West Estimate
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error   t-ratio  P[|T|>t]  Mean of X
--------+-------------------------------------------------------------
Constant|   -21.2111***      .75322      -28.160    .0000
      LP|     -.02121        .04377        -.485    .6303    3.72930
      LY|    1.09587***      .07771       14.102    .0000    9.67215
    LPNC|     -.37361**      .15707       -2.379    .0215    4.38037
    LPUC|      .02003        .10330         .194    .8471    4.10545
--------+-------------------------------------------------------------
        | Robust VC     Newey-West, Periods = 10
--------+-------------------------------------------------------------
Constant|   -21.2111***     1.33095      -15.937    .0000
      LP|     -.02121        .06119        -.347    .7305    3.72930
      LY|    1.09587***      .14234        7.699    .0000    9.67215
    LPNC|     -.37361**      .16615       -2.249    .0293    4.38037
    LPUC|      .02003        .14176         .141    .8882    4.10545
Panel Data Presence of omitted effects Potential bias/inconsistency of OLS – depends on the assumptions about unobserved c. Variance of OLS is affected by autocorrelation in most cases.
Estimating the Sampling Variance of b Is s²(X’X)⁻¹ appropriate? No, because of correlation across observations (certainly) and heteroscedasticity (possibly). A ‘robust’ covariance matrix: robust estimation (in general), the White estimator, a robust estimator for OLS.
Cluster Robust Estimator
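A minimal sketch of the usual cluster-robust sandwich, Est.Var[b|X] = (X’X)⁻¹ [Σg (Xg’eg)(Xg’eg)’] (X’X)⁻¹, assuming a placeholder vector g of group identifiers; finite-sample scale factors (e.g., G/(G−1)) used by some packages are omitted:

import numpy as np

def cluster_cov(y, X, g):
    """Cluster-robust covariance: (X'X)^-1 [sum_g (X_g'e_g)(X_g'e_g)'] (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    K = X.shape[1]
    meat = np.zeros((K, K))
    for group in np.unique(g):
        idx = (g == group)
        s = X[idx].T @ e[idx]            # within-cluster score sum X_g'e_g
        meat += np.outer(s, s)
    V = XtX_inv @ meat @ XtX_inv
    return V, np.sqrt(np.diag(V))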
Alternative OLS Variance Estimators Cluster correction increases SEs
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
Constant     5.40159723      .04838934     111.628    .0000
EXP           .04084968      .00218534      18.693    .0000
EXPSQ        -.00068788      .480428D-04   -14.318    .0000
OCC          -.13830480      .01480107      -9.344    .0000
SMSA          .14856267      .01206772      12.311    .0000
MS            .06798358      .02074599       3.277    .0010
FEM          -.40020215      .02526118     -15.843    .0000
UNION         .09409925      .01253203       7.509    .0000
ED            .05812166      .00260039      22.351    .0000
Robust
Constant     5.40159723      .10156038      53.186    .0000
EXP           .04084968      .00432272       9.450    .0000
EXPSQ        -.00068788      .983981D-04    -6.991    .0000
OCC          -.13830480      .02772631      -4.988    .0000
SMSA          .14856267      .02423668       6.130    .0000
MS            .06798358      .04382220       1.551    .1208
FEM          -.40020215      .04961926      -8.065    .0000
UNION         .09409925      .02422669       3.884    .0001
ED            .05812166      .00555697      10.459    .0000
Bootstrapping Some assumptions that underlie it - the sampling mechanism. Method: 1. Estimate using the full sample: --> b. 2. Repeat R times: draw N observations from the N, with replacement; estimate with b(r). 3. Estimate the variance with V = (1/R) Σr [b(r) − b][b(r) − b]’.
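A minimal sketch of the pairs bootstrap just outlined (the command-language program on the next slide carries out the same steps); the number of replications R and the seed are arbitrary illustrative choices:

import numpy as np

def bootstrap_cov(y, X, R=100, seed=12345):
    """Bootstrap variance of OLS b: resample (y_i, x_i) pairs with replacement R times."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    b_full = np.linalg.lstsq(X, y, rcond=None)[0]   # step 1: full-sample estimate
    draws = []
    for _ in range(R):                              # step 2: R bootstrap replications
        idx = rng.integers(0, N, size=N)
        draws.append(np.linalg.lstsq(X[idx], y[idx], rcond=None)[0])
    D = np.array(draws) - b_full
    V = D.T @ D / R                                 # step 3: (1/R) sum_r [b(r)-b][b(r)-b]'
    return b_full, V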
Bootstrap Application
matr;bboot=init(3,21,0.)$        Store results here
name;x=one,y,pg$                 Define X
regr;lhs=g;rhs=x$                Compute b
calc;i=0$                        Counter
Proc                             Define procedure
regr;lhs=g;rhs=x;quietly$        … Regression
matr;{i=i+1};bboot(*,i)=b$...    Store b(r)
Endproc                          Ends procedure
exec;n=20;bootstrap=b$           20 bootstrap reps
matr;list;bboot' $               Display results
Results of Bootstrap Procedure
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error   t-ratio  P[|T|>t]  Mean of X
--------+-------------------------------------------------------------
Constant|   -79.7535***     8.67255       -9.196    .0000
       Y|      .03692***      .00132      28.022    .0000    9232.86
      PG|   -15.1224***     1.88034       -8.042    .0000    2.31661
Completed    20 bootstrap iterations.
----------------------------------------------------------------------
Results of bootstrap estimation of model.
Model has been reestimated    20 times.
Means shown below are the means of the bootstrap estimates.
Coefficients shown below are the original estimates based on
the full sample.
Bootstrap samples have 36 observations.
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+-------------------------------------------------------------
    B001|   -79.7535***     8.35512       -9.545    .0000   -79.5329
    B002|      .03692***      .00133      27.773    .0000      .03682
    B003|   -15.1224***     2.03503       -7.431    .0000   -14.7654
Bootstrap Replications (figure): full sample result and bootstrapped sample results
Results of C&R Bootstrap Estimation
Bootstrap variance for a panel data estimator Panel Bootstrap = Block Bootstrap Data set is N groups of size Ti Bootstrap sample is N groups of size Ti drawn with replacement.
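A minimal sketch of the block (panel) bootstrap just described: whole groups, not individual observations, are resampled with replacement; ids is a placeholder array of group identifiers, and R and the seed are illustrative:

import numpy as np

def block_bootstrap_cov(y, X, ids, R=100, seed=12345):
    """Block bootstrap for panel data: resample entire groups with replacement."""
    rng = np.random.default_rng(seed)
    groups = np.unique(ids)
    b_full = np.linalg.lstsq(X, y, rcond=None)[0]
    draws = []
    for _ in range(R):
        chosen = rng.choice(groups, size=len(groups), replace=True)
        idx = np.concatenate([np.flatnonzero(ids == g) for g in chosen])
        draws.append(np.linalg.lstsq(X[idx], y[idx], rcond=None)[0])
    D = np.array(draws) - b_full
    return D.T @ D / R                   # bootstrap estimate of Var[b]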
Quantile Regression: Application of Bootstrap Estimation
OLS vs. Least Absolute Deviations
----------------------------------------------------------------------
Least absolute deviations estimator...............
Residuals    Sum of squares       =     1537.58603
             Standard error of e  =        6.82594
Fit          R-squared            =         .98284
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+-------------------------------------------------------------
        |Covariance matrix based on  50 replications.
Constant|   -84.0258***    16.08614       -5.223    .0000
       Y|      .03784***      .00271      13.952    .0000    9232.86
      PG|   -17.0990***     4.37160       -3.911    .0001    2.31661
----------------------------------------------------------------------
Ordinary     least squares regression ............
Residuals    Sum of squares       =     1472.79834
             Standard error of e  =        6.68059    Standard errors are based on
Fit          R-squared            =         .98356    50 bootstrap replications
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error   t-ratio  P[|T|>t]  Mean of X
--------+-------------------------------------------------------------
Constant|   -79.7535***     8.67255       -9.196    .0000
       Y|      .03692***      .00132      28.022    .0000    9232.86
      PG|   -15.1224***     1.88034       -8.042    .0000    2.31661
Quantile Regression Q(y|x,τ) = βτ’x, τ = quantile. Estimated by linear programming. Q(y|x,.50) = β.50’x: .50 is median regression. Median regression is estimated by LAD (estimates the same parameters as mean regression if the conditional distribution is symmetric). Why use quantile (median) regression? Semiparametric. Robust to some extensions (heteroscedasticity?). Complete characterization of the conditional distribution.
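For reference (not from the slide), the linear programming problem solved by quantile regression is minimization of the “check” (pinball) function; in LaTeX:

\hat\beta(\tau) = \arg\min_{\beta} \sum_{i=1}^{N} \rho_\tau\!\left(y_i - x_i'\beta\right),
\qquad \rho_\tau(u) = u\,\big(\tau - \mathbf{1}[u < 0]\big)

For τ = .50 the objective is proportional to Σi |yi − xi’β|, which is LAD.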
Estimated Variance for Quantile Regression Asymptotic Theory Bootstrap – an ideal application
Quantile regressions (figure): τ = .25, τ = .50, τ = .75