Returning to Consumption


More on Consumption
We return to the consumption problem to illustrate the issue of heteroscedasticity
It turns out that OLS may NOT give us the best estimate of the MPC
The reason is that one of the assumptions of the Gauss-Markov theorem is probably violated in the consumption model
The errors are probably heteroscedastic: Var(ui|xi) ≠ σ²

Homoscedastic

Heteroscedastic

Characteristics of Heteroscedasticity
A systematic pattern exists in the variance of the residuals: Var(ui|xi) = σi² = σ²·f(xi), for example σi² = f(α1 + α2Z2i + … + αkZki)
The variance takes different values for different observations or groups of observations
Intuition: if the random part comes from a roll of dice, then homoscedasticity means every observation uses the same dice, while heteroscedasticity means different observations use different dice
Evident in cross-section data or time series

Consequences
OLS is still unbiased and consistent
OLS is no longer efficient
The variance formula used previously is incorrect, so significance tests, confidence intervals etc. cannot be used
Aside: a corrected formula can be used (in Stata: regress y x, robust)
We don't bother with this here because we can do better with an alternative estimator
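For example (a sketch, assuming the consumption data with the variables cons and nmwage used later in this lecture), the robust option replaces the usual OLS standard errors with heteroscedasticity-consistent ones while leaving the coefficient estimates unchanged:

* OLS point estimates with heteroscedasticity-robust (White) standard errors
regress cons nmwage, robust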

Testing for Heteroskedasticity
Plot of the residuals: sort the observations by an explanatory variable and plot the residuals against that variable, looking for a pattern; do this for each explanatory variable
Not a formal test, but it can give an idea of what's going on
Can be used to reject the idea of heteroscedasticity
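A minimal sketch in Stata (assuming the consumption model with cons and nmwage; uhat is a hypothetical name for the saved residuals):

* fit the model, save the residuals, and plot them against the regressor
regress cons nmwage
predict uhat, residuals
scatter uhat nmwage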

An example of Het

Consumption Example

Goldfeld-Quandt Test
Used when σi² = σ²·f(xi), i.e. the variance is related to one variable only
State the hypothesis test:
H0: σi² = σ²
H1: σi² ≠ σ²
Note: the null is homoscedasticity
Sort the observations in ascending order of xi
Omit the middle 20% of observations: (n − c) observations remain
Estimate the original model separately for the two samples: the first (n − c)/2 obs (keep RSS1) and the last (n − c)/2 obs (keep RSS2)
Compute: g = RSS2/RSS1
If g > Fc(df, df), reject the null hypothesis of homoscedasticity at significance level α
The test can be carried out for each xi

Intuition of GQ
If heteroscedasticity does exist, then we can split the sample into a low-variance part and a high-variance part
Run the regression separately for the two samples
Calculate the ratio of the variances of the residuals (remember s² = RSS/df)
If this ratio is 1 then the variances are equal and the data is homoscedastic
So reject the null of homoscedasticity if the ratio is bigger than 1
How much bigger? Bigger than the F critical value

Consumption Example
Test for heteroscedasticity related to nmwage
State the hypothesis test:
H0: σi² = σ²
H1: σi² ≠ σ²
Note: the null is homoscedasticity
Sort the observations in ascending order of nmwage (Stata command: sort nmwage)
Omit the middle ~20% of observations: the two samples are observations 1..550 and 781..1330
Estimate the original model separately on each sample
Compute the ratio of the residual sums of squares (larger over smaller): g = 92733079.9 / 91033129.2 = 1.018674
If g > Fc(df, df), reject the null hypothesis of homoscedasticity
At the 5% significance level F(550, 550) ≈ 1.15, so we cannot reject the null at the 5% significance level
The test can be carried out for each xi

. sort nmwage

. regress cons nmwage if _n<=550

      Source |       SS       df       MS              Number of obs =     550
-------------+------------------------------           F(  1,   548) =   49.12
       Model |    8312140.6     1    8312140.6         Prob > F      =  0.0000
    Residual |   92733079.9   548   169220.949         R-squared     =  0.0823
-------------+------------------------------           Adj R-squared =  0.0806
       Total |    101045220   549   184053.225         Root MSE      =  411.36

------------------------------------------------------------------------------
        cons |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nmwage |   .7304001   .1042153     7.01   0.000     .5256897    .9351104
       _cons |   67.60298   49.48125     1.37   0.172    -29.59315    164.7991
------------------------------------------------------------------------------

. regress cons nmwage if _n>780

-------------+------------------------------           F(  1,   548) =  118.87
       Model |   19747100.1     1   19747100.1         Prob > F      =  0.0000
    Residual |   91033129.2   548   166118.849         R-squared     =  0.1783
-------------+------------------------------           Adj R-squared =  0.1768
       Total |    110780229   549   201785.481         Root MSE      =  407.58

      nmwage |   .7270654   .0666855    10.90   0.000      .596075    .8580558
       _cons |   94.26419   75.29303     1.25   0.211    -53.63408    242.1625
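As a quick check (a sketch, assuming Stata; each subsample regression has 548 residual degrees of freedom, which the slide rounds to 550), the test statistic and the 5% F critical value can be computed directly:

* Goldfeld-Quandt statistic: ratio of the two residual sums of squares
display 92733079.9 / 91033129.2
* 5% critical value of F with (548, 548) degrees of freedom (approximately 1.15)
display invFtail(548, 548, 0.05)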

White's Test
A more general test that allows more than one variable to influence the variance of the residuals
Estimate the model: yi = β1 + β2x2i + β3x3i + ui
Run an auxiliary regression of the squared residuals on the X variables, their squares and their cross products: ei² = α1 + α2x2i + α3x3i + α4x2i² + α5x3i² + α6x2i·x3i + vi
The null hypothesis is homoscedastic errors, i.e. α2 = α3 = α4 = α5 = α6 = 0, i.e. ei² = constant
Calculate nR² ~ χ²(df)
If nR² > the χ²(df) critical value, reject the null hypothesis
Comment: why not an F-test?
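For reference, Stata also provides this test directly: after running the original regression, estat imtest, white reports the nR² statistic and its p-value, so the manual auxiliary regression is mainly for illustration. A minimal sketch (assuming the consumption model used in this example):

* built-in White test after the original regression
regress cons nmwage
estat imtest, white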

Consumption Example
Test for heteroscedasticity related to nmwage
State the hypothesis test:
H0: σi² = σ²
H1: σi² ≠ σ²
Note: the null is homoscedasticity
Estimate the model and generate the squared residuals
Regress the squared residuals on all of the variables that may cause heteroscedasticity (here nmwage and nmwage²)
Form the test statistic: nR² = 0.266
Find the critical value: χ² with df = 2 at α = 0.05 is 5.99
Since 0.266 < 5.99, we cannot reject the null at the 5% significance level

. predict u, residual

. gen u2=u^2

. gen nmwage2=nmwage^2

. regress u2 nmwage nmwage2

      Source |       SS       df       MS              Number of obs =    1330
-------------+------------------------------           F(  2,  1327) =    0.11
       Model |   1.1824e+10     2   5.9121e+09         Prob > F      =  0.8921
    Residual |   6.8688e+13  1327   5.1762e+10         R-squared     =  0.0002
-------------+------------------------------           Adj R-squared = -0.0013
       Total |   6.8700e+13  1329   5.1693e+10         Root MSE      = 2.3e+05

------------------------------------------------------------------------------
          u2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nmwage |  -13.03642   58.98887    -0.22   0.825     -128.758    102.6852
     nmwage2 |   .0109503   .0325921     0.34   0.737    -.0529874    .0748881
       _cons |   163843.1   24661.71     6.64   0.000     115462.9    212223.3
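The same numbers can be pulled straight from Stata after the auxiliary regression above (a sketch; e(N) and e(r2) are the stored sample size and R-squared):

* White test statistic nR-squared: 1330 x 0.0002 = 0.266
display e(N)*e(r2)
* 5% critical value of chi-squared with 2 degrees of freedom (5.99)
display invchi2tail(2, 0.05)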

Efficient Estimation
If we find heteroscedasticity we know that OLS will be inefficient
Remember why this might be a problem (see over)
Can we do better? Yes: there is an efficient estimator called Generalised Least Squares (GLS)
Two steps:
Remove the heteroscedasticity from the data
Do OLS on the transformed data

The probability of error is lower for the efficient estimator at any sample size (same sample size, different estimator)

The GLS Procedure
Assume that σi² is known
Basic model: Yi = β1 + β2X2i + ui, with E(ui²) = σi² (not constant)
Create new data, with each observation weighted by its heteroscedastic standard deviation:
Yi* = Yi/σi,   X2i* = X2i/σi,   X1i* = 1/σi (the constant term becomes 1/σi)

The GLS Procedure
Then run the regression on the transformed data: Yi* = β1X1i* + β2X2i* + ui*
The slope estimates are the BLUE of the coefficients of the original model
Note the intercept term is slightly different: it has now become the coefficient on a variable (1/σi)

How it Works
GLS eliminates heteroscedasticity
To see this, note that Var(ui*) = E(ui*²) = E[(ui/σi)²] = (1/σi²)·E(ui²) = (1/σi²)·σi² = 1
The variance of the transformed error term is constant, so it is homoscedastic
NB: this model does not have a constant now; it has two explanatory variables, 1/σi and Xi/σi
We cannot apply GLS if the exact form of the heteroscedasticity is unknown, so we do FGLS (Feasible GLS) and replace σi with an estimate of σi, e.g. from White's test

The Consumption Example
Transform the data to eliminate the heteroscedasticity
Use the estimate of σi² from White's test (the fitted values of the auxiliary regression)
Stata commands:
predict white
generate c=cons/sqrt(white)
generate y=nmwage/sqrt(white)
The GLS estimate of the MPC is given by the regression on the transformed data

. predict white
(option xb assumed; fitted values)

. gen sigma=white^0.5

. gen c=cons/sigma

. gen y=nmwage/sigma

. regress c y

      Source |       SS       df       MS              Number of obs =    1330
-------------+------------------------------           F(  1,  1328) =  583.66
       Model |   584.542068     1   584.542068         Prob > F      =  0.0000
    Residual |   1330.00395  1328     1.001509         R-squared     =  0.3053
-------------+------------------------------           Adj R-squared =  0.3048
       Total |   1914.54602  1329   1.44059144         Root MSE      =  1.0008

------------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |   .7552474   .0312614    24.16   0.000     .6939202    .8165746
       _cons |   .1572505   .0652215     2.41   0.016     .0293021     .285199
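A closely related way to run the feasible GLS regression in Stata (a sketch, assuming the fitted variances stored in white are strictly positive) is to let regress apply the weighting itself via analytic weights, which weight each observation by the inverse of its estimated error variance and handle the intercept automatically:

* FGLS/WLS using analytic weights instead of manually transformed variables
regress cons nmwage [aweight = 1/white]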

Conclusion
The example didn't appear to have heteroscedasticity
When heteroscedasticity does exist, the difference between GLS and OLS can be substantial
Both are unbiased and consistent
GLS is preferable because it is efficient, so there is a lower probability of a substantial error