HETEROSCEDASTICITY: WEIGHTED AND LOGARITHMIC REGRESSIONS

1 This sequence presents two methods for dealing with the problem of heteroscedasticity. We will start with the general case, where the variance of the distribution of the disturbance term in observation \(i\) is \(\sigma_{u_i}^2\), which is not constant for all \(i\).

2 If we knew  ui in each observation, we could derive a homoscedastic model by dividing the equation through by it. HETEROSCEDASTICITY: WEIGHTED AND LOGARITHMIC REGRESSIONS, not constant for all i

3 The population variance of the disturbance term in the revised model is now equal to 1 in all observations, since \(\operatorname{var}(u_i/\sigma_{u_i}) = \sigma_{u_i}^2/\sigma_{u_i}^2 = 1\), and so the disturbance term is homoscedastic.

4 In the revised model, we regress \(Y'\) on \(X'\) and \(H\), where \(Y'_i = Y_i/\sigma_{u_i}\), \(X'_i = X_i/\sigma_{u_i}\) and \(H_i = 1/\sigma_{u_i}\). Note that there is no intercept in the revised model: \(\beta_1\) becomes the slope coefficient of the artificial variable \(H = 1/\sigma_{u_i}\).

5 The revised model is described as a weighted regression model because observation \(i\) is weighted by the factor \(1/\sigma_{u_i}\). Note that the highest weights automatically go to the most reliable observations (those with the lowest values of \(\sigma_{u_i}\)).
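A minimal sketch of the weighted regression in slides 2 to 5, assuming the \(\sigma_{u_i}\) are known. The data are simulated with a hypothetical form \(\sigma_{u_i} = 0.5 + 0.05X_i\) and hypothetical true values \(\beta_1 = 10\), \(\beta_2 = 0.5\). Each observation is divided through by \(\sigma_{u_i}\), and \(Y'\) is regressed on \(H\) and \(X'\) with no intercept by solving the two normal equations directly; the helper names are illustrative only.

```python
import random


def ols_no_intercept(y, x1, x2):
    """Least-squares fit of y = b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det  # Cramer's rule
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


def weighted_fit(y, x, sigma):
    """Divide Y_i = beta1 + beta2*X_i + u_i through by sigma_i and fit by OLS."""
    y_prime = [yi / si for yi, si in zip(y, sigma)]  # Y' = Y / sigma
    h = [1.0 / si for si in sigma]                   # H = 1 / sigma, coefficient beta1
    x_prime = [xi / si for xi, si in zip(x, sigma)]  # X' = X / sigma, coefficient beta2
    return ols_no_intercept(y_prime, h, x_prime)


# Hypothetical simulated data: the true sigma_i is known and grows with X.
random.seed(1)
beta1, beta2 = 10.0, 0.5
x = [random.uniform(1, 100) for _ in range(2000)]
sigma = [0.5 + 0.05 * xi for xi in x]
y = [beta1 + beta2 * xi + random.gauss(0, si) for xi, si in zip(x, sigma)]

b1, b2 = weighted_fit(y, x, sigma)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")  # should be close to 10 and 0.5
```

Because the transformed disturbance \(u_i/\sigma_{u_i}\) has unit variance, ordinary least squares on the transformed variables is all that is needed; no special weighting routine is required.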

6 Of course in practice we do not know the value of  i in each observation. However it may be reasonable to suppose that it is proportional to some measurable variable, Z i. HETEROSCEDASTICITY: WEIGHTED AND LOGARITHMIC REGRESSIONS, not constant for all i Assumption:

7 If this is the case, we can make the model homoscedastic by dividing through by \(Z_i\): \(\frac{Y_i}{Z_i} = \beta_1 \frac{1}{Z_i} + \beta_2 \frac{X_i}{Z_i} + \frac{u_i}{Z_i}\).

8 The disturbance term in the revised model has constant variance \(\lambda^2\). We do not need to know the value of \(\lambda^2\); the crucial point is that, by assumption, it is constant.
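Written out under the slide 6 assumption (the constant of proportionality is written \(\lambda\) here, and \(Z_i\) is treated as fixed), the variance calculation behind slides 7 and 8 takes one line:

```latex
\frac{Y_i}{Z_i} = \beta_1 \frac{1}{Z_i} + \beta_2 \frac{X_i}{Z_i} + \frac{u_i}{Z_i},
\qquad
\operatorname{var}\!\left(\frac{u_i}{Z_i}\right)
  = \frac{\sigma_{u_i}^2}{Z_i^2}
  = \frac{\lambda^2 Z_i^2}{Z_i^2}
  = \lambda^2 .
```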

9 We will illustrate this procedure with the UNIDO data on manufacturing output and GDP. We will try scaling by population. A regression of manufacturing output per capita on GDP per capita is less likely to be subject to heteroscedasticity.

10 Here is the revised scatter diagram. Does it look homoscedastic? Actually, no: this is still a classic pattern of heteroscedasticity.

11 In the Goldfeld–Quandt test, \(RSS_2\) is much larger than \(RSS_1\): \(RSS_1 = 5{,}378{,}000\), \(RSS_2 = 17{,}362{,}000\).

12 However, the subsamples are small, and high ratios can occur purely by chance. The F statistic, \(RSS_2/RSS_1 = 17{,}362{,}000/5{,}378{,}000 \approx 3.23\), is such that the null hypothesis of homoscedasticity is only just rejected at the 5% level.
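A minimal sketch of the Goldfeld–Quandt calculation, assuming the usual setup: observations are sorted by the variable suspected of driving the heteroscedasticity, separate regressions are run on the lowest and highest subsamples, and the ratio \(RSS_2/RSS_1\) is compared with the critical value of F. The subsample share of three-eighths is an assumed default, and the function names are hypothetical. The last lines reproduce the ratio from the RSS values quoted on the slide.

```python
def ols_rss(x, y):
    """RSS from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = my - b2 * mx
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, fraction=3 / 8):
    """Sort by x, fit OLS separately to the first and last `fraction` of the
    observations, and return (RSS1, RSS2, F = RSS2 / RSS1)."""
    pairs = sorted(zip(x, y))
    k = round(len(pairs) * fraction)  # size of each subsample; middle group is dropped
    low, high = pairs[:k], pairs[-k:]
    rss1 = ols_rss([p[0] for p in low], [p[1] for p in low])
    rss2 = ols_rss([p[0] for p in high], [p[1] for p in high])
    return rss1, rss2, rss2 / rss1


# F ratio implied by the RSS values on the slide for the per-capita regression.
f_per_capita = 17_362_000 / 5_378_000
print(f"F = {f_per_capita:.2f}")  # F = 3.23
```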

13 Often the X variable itself is a suitable scaling variable. After all, the Goldfeld–Quandt test assumes that the standard deviation of the disturbance term is proportional to it. Assumption: \(\sigma_{u_i} = \lambda X_i\).

14 Note that when we scale through by it, the \(\beta_2\) term becomes the intercept in the revised model: \(\frac{Y_i}{X_i} = \beta_1 \frac{1}{X_i} + \beta_2 + \frac{u_i}{X_i}\).

15 It follows that, when we interpret the regression results, the slope coefficient is an estimate of \(\beta_1\) in the original model and the intercept is an estimate of \(\beta_2\).
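A short simulation of slides 13 to 15, using hypothetical true values \(\beta_1 = 5\), \(\beta_2 = 0.2\) and \(\lambda = 0.05\) with \(\sigma_{u_i} = \lambda X_i\). After dividing through by X, the regression of \(Y/X\) on \(1/X\) should return an intercept near \(\beta_2\) and a slope near \(\beta_1\), showing the swap of roles.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data satisfying sigma_{u_i} = lam * X_i.
random.seed(42)
beta1, beta2, lam = 5.0, 0.2, 0.05
x = [random.uniform(1, 100) for _ in range(1000)]
y = [beta1 + beta2 * xi + random.gauss(0, lam * xi) for xi in x]

# Scale through by X: Y/X = beta1 * (1/X) + beta2 + u/X.
intercept, slope = simple_ols([1 / xi for xi in x], [yi / xi for xi, yi in zip(x, y)])
print(f"intercept (estimates beta2) = {intercept:.3f}")
print(f"slope     (estimates beta1) = {slope:.3f}")
```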

16 Here is the corresponding scatter diagram. Is there any evidence of heteroscedasticity?

17 No longer. The residual sums of squares for the two subsamples are almost identical; indeed, they are closer than one would usually expect purely by chance under the null hypothesis. \(RSS_1 = 0.065\), \(RSS_2 =\) …

18 As a consequence, the F statistic is not significant. The heteroscedasticity has been eliminated.

19 We will now consider an alternative approach to the problem. It is possible that the heteroscedasticity has been caused by an inappropriate mathematical specification. Suppose, in particular, that the true relationship is in fact logarithmic: \(\log Y_i = \beta_1 + \beta_2 \log X_i + u_i\).

20 Here is the corresponding scatter diagram. There is no sign of heteroscedasticity.

21 We confirm this with the Goldfeld–Quandt test. In this case there is no point in calculating the conventional test statistic: \(RSS_2\) is smaller than \(RSS_1\), so it cannot be significantly greater than \(RSS_1\).

22 In this situation we should test whether there is evidence that the standard deviation of the disturbance term is inversely proportional to the X variable. For this purpose, the F statistic is the inverse of the conventional one, \(RSS_1/RSS_2\). \(RSS_1 = 2.140\), \(RSS_2 =\) …
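A minimal sketch of choosing the direction of the Goldfeld–Quandt statistic, assuming \(RSS_1\) comes from the subsample with the smallest X values and \(RSS_2\) from the one with the largest. The function name is hypothetical, and the RSS numbers below are illustrative only (the slide's value of \(RSS_2\) is not reproduced here). Whichever ratio is used is still compared with the usual critical value of F.

```python
def gq_f_statistic(rss1, rss2):
    """Goldfeld-Quandt F statistic oriented by the data.

    rss1: RSS for the subsample with the smallest X values.
    rss2: RSS for the subsample with the largest X values.
    Returns (F, alternative): the conventional RSS2/RSS1 when RSS2 >= RSS1
    (sd of u increasing with X), otherwise the inverse RSS1/RSS2
    (sd of u inversely related to X).
    """
    if rss2 >= rss1:
        return rss2 / rss1, "sd increasing with X"
    return rss1 / rss2, "sd decreasing with X"


# Hypothetical RSS values, chosen only to exercise both branches.
print(gq_f_statistic(2.0, 5.0))  # (2.5, 'sd increasing with X')
print(gq_f_statistic(4.0, 2.0))  # (2.0, 'sd decreasing with X')
```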

23 The null hypothesis of homoscedasticity is not rejected.

24 Now, an additive disturbance term in the logarithmic model is equivalent to a multiplicative one in the original model.

25 This means that, when the scatter diagram is redrawn with the variables in their original form, the absolute size of the effect of the disturbance term is large for large values of the X variable and small for small ones.
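The algebra behind slides 24 and 25, using natural logarithms and writing \(v_i = e^{u_i}\): the deviation of \(Y_i\) from its systematic component is proportional to that component, so for \(\beta_2 > 0\) it grows with X when the variables are in natural units.

```latex
\log Y_i = \beta_1 + \beta_2 \log X_i + u_i \iff Y_i = e^{\beta_1} X_i^{\beta_2} v_i, \qquad v_i = e^{u_i}
Y_i - e^{\beta_1} X_i^{\beta_2} = e^{\beta_1} X_i^{\beta_2} \, (v_i - 1)
```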

26 For example, Singapore and South Korea have relatively large manufacturing sectors, and Greece and Mexico relatively small ones.

27 The variations for these countries are similar when plotted on the logarithmic scale, but those for South Korea and Mexico are much larger when the variables are plotted in natural units.

28 Here is a summary of the regressions using the four alternative specifications of the model.

29 The first regression suggests that, for every increase of $1 million in GDP, manufacturing output increases by $194,000. Thus, at the margin, manufacturing accounts for 0.19 of GDP. The intercept does not have any plausible meaning.

30 However, this regression was subject to severe heteroscedasticity. Although the estimate of the coefficient of GDP is unbiased, it is likely to be relatively inaccurate (inefficient). Also, and this is a separate effect of heteroscedasticity, the standard errors, t tests and F test are invalid.
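A small Monte Carlo sketch of the first point, using hypothetical parameters rather than the UNIDO data: with \(\sigma_{u_i}\) proportional to X, OLS on the original model still centres on the true \(\beta_2\), but its estimates are more dispersed than those from the regression scaled through by X (where \(\beta_2\) is the intercept). X is held fixed across replications; only the disturbances are redrawn.

```python
import random
import statistics


def simple_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(7)
beta1, beta2, lam = 5.0, 0.2, 0.5                 # hypothetical true values
x = [random.uniform(1, 100) for _ in range(50)]   # X fixed across replications
inv_x = [1 / xi for xi in x]

ols_b2, scaled_b2 = [], []
for _ in range(500):
    y = [beta1 + beta2 * xi + random.gauss(0, lam * xi) for xi in x]
    ols_b2.append(simple_ols(x, y)[1])                 # slope of Y on X
    y_over_x = [yi / xi for xi, yi in zip(x, y)]
    scaled_b2.append(simple_ols(inv_x, y_over_x)[0])   # intercept of Y/X on 1/X

print("mean  OLS:", round(statistics.mean(ols_b2), 3), " scaled:", round(statistics.mean(scaled_b2), 3))
print("stdev OLS:", round(statistics.stdev(ols_b2), 3), " scaled:", round(statistics.stdev(scaled_b2), 3))
```

This only illustrates the loss of efficiency; the invalidity of the conventional OLS standard errors is a separate issue that the sketch does not examine.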

31 In the second regression, the estimate of the slope coefficient was a little lower. However, the null hypothesis of homoscedasticity was rejected for this regression as well, though only at the 5% level.

32 In the third regression the model was scaled through by GDP. As a consequence, the intercept became an estimator of the original slope coefficient, and vice versa.

33 For this model the null hypothesis of homoscedasticity was not rejected. In principle, therefore, it should yield more accurate estimates of the coefficients than the first two, and we are able to perform valid tests.

34 For the logarithmic model also, the null hypothesis of homoscedasticity was not rejected. So we have two models that survive the Goldfeld–Quandt test. Which do you prefer? Think about it.

35 You probably went for the logarithmic model, attracted by the high \(R^2\). However, in this example, there is little to choose between the third and fourth models. Substantively, they have the same interpretation.

36 In the third model, 1/GDP has a low t statistic and appears to be an irrelevant variable. The model is telling us that manufacturing output, as a proportion of GDP, is constant. Because this proportion is constant apart from the disturbance term, there is nothing systematic for the regression to explain, and \(R^2\) is effectively zero.

37 The fourth model is telling us that the elasticity of manufacturing output with respect to GDP is equal to 1. In other words, manufacturing output increases proportionally with GDP and remains a constant proportion of it.

38 Converting the logarithmic equation back into natural units, you obtain the equation shown. Like the third equation, it implies that manufacturing output accounts for a little over 0.18 of GDP, at the margin.
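A sketch of the conversion, writing \(M\) for manufacturing output, \(G\) for GDP, and \(b_1\), \(b_2\) for the estimated intercept and slope of the logarithmic regression (these symbols are assumed here, not taken from the slide). The marginal proportion \(dM/dG\) equals the average proportion \(M/G\) exactly when \(b_2 = 1\):

```latex
\log M = b_1 + b_2 \log G \;\Longrightarrow\; M = e^{b_1} G^{b_2},
\qquad \frac{dM}{dG} = b_2 \, e^{b_1} G^{b_2 - 1} = b_2 \frac{M}{G}
b_2 = 1: \quad M = e^{b_1} G, \qquad \frac{dM}{dG} = \frac{M}{G} = e^{b_1}
```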

Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Section 7.3 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre. Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course EC2020 Elements of Econometrics.