Chapter 5 Heteroskedasticity.


1 Chapter 5 Heteroskedasticity

2 A regression line

3 What is in this Chapter? How do we detect this problem?
What are the consequences of this problem? What are the solutions?

4

5 What is in this Chapter? First, we discuss tests based on OLS residuals, the likelihood ratio test, the Goldfeld-Quandt (G-Q) test, and the Breusch-Pagan (B-P) test. The last one is a Lagrange multiplier (LM) test. Regarding consequences, we show that the OLS estimators are unbiased but inefficient and that the standard errors are also biased, thus invalidating tests of significance.

6 What is in this Chapter? Regarding solutions, we discuss solutions depending on particular assumptions about the error variance and general solutions. We also discuss transformation of variables to logs and the problems associated with deflators, both of which are commonly used as solutions to the heteroskedasticity problem.

7 5.1 Introduction Homoskedasticity: the variance of the error terms is constant. Heteroskedasticity: the variance of the error terms is non-constant. Illustrative Example: Table 5.1 presents consumption expenditures (y) and income (x) for 20 families. Suppose that we estimate the equation by ordinary least squares.

8 5.1 Introduction We get (figures in parentheses are standard errors): y = … + … x, with standard errors (0.703) and (0.0253), $R^2 = 0.986$, RSS = 31.074. Section 5.4

9 5.1 Introduction

10 5.1 Introduction

11 5.1 Introduction

12 5.1 Introduction

13 5.1 Introduction The residuals from this equation are presented in Table 5.3. In this situation there is no perceptible increase in the magnitudes of the residuals as the value of x increases. Thus there does not appear to be a heteroskedasticity problem.

14 5.2 Detection of Heteroskedasticity
In the illustrative example in Section 5.1 we plotted the estimated residual $\hat{u}_i$ against $x_i$ to see whether we notice any systematic pattern in the residuals that suggests heteroskedasticity in the errors. Note, however, that by virtue of the normal equations, $\hat{u}_i$ and $x_i$ are uncorrelated, though $\hat{u}_i^2$ could be correlated with $x_i$.

15 5.2 Detection of Heteroskedasticity
Thus if we are using a regression procedure to test for heteroskedasticity, we should use a regression of $\hat{u}_i$ on $x_i^2, x_i^3, \ldots$ or a regression of $|\hat{u}_i|$ or $\hat{u}_i^2$ on $x_i, x_i^2, x_i^3, \ldots$ In the case of multiple regression, we should use powers of $\hat{y}_i$, the predicted value of $y_i$, or powers of all the explanatory variables.

16 5.2 Detection of Heteroskedasticity
The test suggested by Anscombe and a test called RESET suggested by Ramsey both involve regressing $\hat{u}_i$ on $\hat{y}_i^2, \hat{y}_i^3, \ldots$ and testing whether or not the coefficients are significant. The test suggested by White involves regressing $\hat{u}_i^2$ on all the explanatory variables and their squares and cross products. For instance, with three explanatory variables $x_1, x_2, x_3$, it involves regressing $\hat{u}_i^2$ on $x_1, x_2, x_3, x_1^2, x_2^2, x_3^2, x_1 x_2, x_2 x_3, x_3 x_1$.
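White's auxiliary regression can be sketched in a few lines. The example below is a dependency-free Python sketch for the one-regressor case, where the auxiliary regressors reduce to x and x squared. The helper names (`solve`, `ols`, `white_stat`) and the simulated data are illustrative assumptions, not part of the chapter. It reports n times the R-squared of the auxiliary regression, the usual large-sample form of White's statistic, to be compared with a chi-square critical value (2 d.f. here).

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(columns, y):
    """OLS of y on a constant plus the given regressor columns; returns (coefs, fitted)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in X]
    return coefs, fitted

def white_stat(x, y):
    """n * R^2 from regressing squared OLS residuals on a constant, x and x^2."""
    _, fitted = ols([x], y)
    u2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    _, aux_fitted = ols([x, [xi ** 2 for xi in x]], u2)
    mean_u2 = sum(u2) / len(u2)
    tss = sum((v - mean_u2) ** 2 for v in u2)
    ess = sum((f - mean_u2) ** 2 for f in aux_fitted)
    return len(x) * ess / tss

# Simulated (hypothetical) data whose error standard deviation grows with x
random.seed(1)
x = [1 + 0.5 * i for i in range(40)]
y = [1.0 + 0.9 * xi + random.gauss(0, 0.2 * xi) for xi in x]
print("White n*R^2:", white_stat(x, y))
```

With more than one explanatory variable the same `ols` helper can take the extra squares and cross products as additional columns.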

17 5.2 Detection of Heteroskedasticity
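Glejser suggested estimating regressions of the type $|\hat{u}_i| = \alpha + \beta x_i$, $|\hat{u}_i| = \alpha + \beta / x_i$, $|\hat{u}_i| = \alpha + \beta \sqrt{x_i}$, and so on, and testing the hypothesis $\beta = 0$.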
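A Glejser-type check is easy to sketch: take the absolute OLS residuals, regress them on some function of x, and look at the t-ratio of the slope. The Python sketch below is dependency-free; the function names (`ols_simple`, `glejser_t`), the choice of transformations, and the simulated data are assumptions made for illustration only.

```python
import math
import random

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals, slope_se)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)       # error-variance estimate
    slope_se = math.sqrt(s2 / sxx)
    return intercept, slope, resid, slope_se

def glejser_t(x, y, transform=lambda v: v):
    """t-ratio of beta in |u_hat| = alpha + beta * transform(x)."""
    _, _, resid, _ = ols_simple(x, y)
    abs_resid = [abs(e) for e in resid]
    z = [transform(xi) for xi in x]
    _, beta, _, se = ols_simple(z, abs_resid)
    return beta / se

# Simulated (hypothetical) data: error standard deviation proportional to x
random.seed(0)
x = [1 + 0.5 * i for i in range(40)]
y = [1.0 + 0.9 * xi + random.gauss(0, 0.2 * xi) for xi in x]

print("t, |u| on x      :", glejser_t(x, y))
print("t, |u| on sqrt(x):", glejser_t(x, y, math.sqrt))
print("t, |u| on 1/x    :", glejser_t(x, y, lambda v: 1.0 / v))
```

A large t-ratio in absolute value in any of these regressions is taken as evidence against homoskedasticity.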

18 5.2 Detection of Heteroskedasticity
The implicit assumption behind all these tests is that $\operatorname{var}(u_i) = \sigma_i^2 = \sigma^2 f(z_i)$, where $z_i$ is an unknown variable, and the different tests use different proxies or surrogates for the unknown function f(z).

19 5.2 Detection of Heteroskedasticity

20 5.2 Detection of Heteroskedasticity

21 5.2 Detection of Heteroskedasticity
Thus there is evidence of heteroskedasticity even in the log-linear form, although casually looking at the residuals in Table 5.3, we concluded earlier that the errors were homoskedastic. The Goldfeld-Quandt test, to be discussed later in this section, also did not reject the hypothesis of homoskedasticity. The Glejser tests, however, show significant heteroskedasticity in the log-linear form.

22 Assignment Redo this illustrative example
Plot the absolute value of the residuals against the x variable, for both the linear form and the log-linear form. Carry out the three types of tests for both the linear form and the log-linear form. Report the EViews output table and state whether the null hypothesis of homoskedasticity (constant variance) is rejected.

23 5.2 Detection of Heteroskedasticity
Some Other Tests (General Tests): Likelihood Ratio Test, Goldfeld and Quandt Test, Breusch-Pagan Test.

24 5.2 Detection of Heteroskedasticity
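Likelihood Ratio Test: If the number of observations is large, one can use a likelihood ratio test. Divide the residuals (estimated from the OLS regression) into k groups with $n_i$ observations in the i-th group, $\sum n_i = n$. Estimate the error variances in each group by $\hat{\sigma}_i^2$. Let the estimate of the error variance from the entire sample be $\hat{\sigma}^2$. Then if we define $\lambda$ as $\lambda = \prod_{i=1}^{k} \hat{\sigma}_i^{\,n_i} / \hat{\sigma}^{\,n}$, the statistic $-2 \log_e \lambda$ has a $\chi^2$ distribution with d.f. $(k - 1)$.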
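As a small illustration of the likelihood ratio idea, the sketch below computes $-2 \log \lambda = n \log \hat{\sigma}^2 - \sum_i n_i \log \hat{\sigma}_i^2$ from a list of residuals and a grouping of their indices. The function name, the use of maximum likelihood variance estimates (dividing by the group size), and the toy residuals are assumptions for illustration.

```python
import math

def lr_heteroskedasticity(residuals, groups):
    """Return -2 log(lambda) for H0: equal error variance across groups.

    residuals: OLS residuals; groups: list of index lists partitioning them.
    Variances are ML estimates (sum of squared residuals over group size).
    """
    n = sum(len(g) for g in groups)
    pooled = sum(residuals[i] ** 2 for g in groups for i in g) / n
    stat = n * math.log(pooled)
    for g in groups:
        s2 = sum(residuals[i] ** 2 for i in g) / len(g)
        stat -= len(g) * math.log(s2)
    return stat   # compare with a chi-square critical value, k - 1 d.f.

# Hypothetical residuals split into k = 2 groups (for example low-x and high-x halves)
resid = [1.0, -1.0, 2.0, -2.0]
stat = lr_heteroskedasticity(resid, [[0, 1], [2, 3]])
print(stat)
```

Because the pooled variance is a weighted average of the group variances, the statistic is never negative and equals zero only when all group variances coincide.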

25 5.2 Detection of Heteroskedasticity
Goldfeld and Quandt Test: If we do not have large samples, we can use the Goldfeld and Quandt test. In this test we split the observations into two groups, one corresponding to large values of x and the other corresponding to small values of x.

26 5.2 Detection of Heteroskedasticity
Fit separate regressions for each and then apply an F-test to test the equality of error variances. Goldfeld and Quandt suggest omitting some observations in the middle to increase our ability to discriminate between the two error variances.
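The Goldfeld-Quandt steps (order by x, optionally drop some middle observations, fit two regressions, form the variance ratio) can be sketched as follows. This is a minimal dependency-free Python sketch for a single regressor; the function names, the `drop` parameter, and the toy data are assumptions for illustration.

```python
def rss_simple(x, y):
    """Residual sum of squares from OLS of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt(x, y, drop=0):
    """F-ratio of error-variance estimates (high-x group over low-x group).

    drop: number of middle observations omitted after ordering by x.
    Returns (F, d.f.); both groups have the same size and hence the same d.f.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = (len(x) - drop) // 2
    low, high = order[:size], order[-size:]
    df = size - 2                                   # two parameters per regression
    s2_low = rss_simple([x[i] for i in low], [y[i] for i in low]) / df
    s2_high = rss_simple([x[i] for i in high], [y[i] for i in high]) / df
    return s2_high / s2_low, df

# Hypothetical data: deterministic "errors" whose size grows with x
x = list(range(1, 11))
y = [2 + 3 * xi + (-1) ** xi * 0.1 * xi for xi in x]
f_ratio, df = goldfeld_quandt(x, y, drop=2)
print(f_ratio, df)
```

The F-ratio is compared with the F-distribution critical value for (d.f., d.f.) degrees of freedom; with two groups of 10 observations and no omissions, that is 8 and 8.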

27 5.2 Detection of Heteroskedasticity
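Breusch-Pagan Test: Suppose that $\operatorname{var}(u_i) = \sigma_i^2$. If there are some variables $z_1, z_2, \ldots, z_r$ that influence the error variance and if $\sigma_i^2 = f(\alpha_0 + \alpha_1 z_{1i} + \alpha_2 z_{2i} + \cdots + \alpha_r z_{ri})$, then the Breusch and Pagan test is a test of the hypothesis $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_r = 0$. The function $f(\cdot)$ can be any function.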

28 5.2 Detection of Heteroskedasticity
For instance, f(x) can be $x$, $x^2$, $e^x$, and so on. The Breusch and Pagan test does not depend on the functional form. Let $S_0$ = regression sum of squares from a regression of $\hat{u}_i^2$ on $z_1, z_2, \ldots, z_r$. Then $\lambda = S_0 / (2\hat{\sigma}^4)$, where $\hat{\sigma}^2 = \sum \hat{u}_i^2 / n$, has a $\chi^2$ distribution with d.f. r. This test is an asymptotic test. An intuitive justification for the test will be given after an illustrative example.
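The Breusch-Pagan statistic can be sketched for the simplest case of a single variance-driving variable z (so r = 1), where the regression sum of squares has a closed form. The function names and the toy data below are illustrative assumptions; with several z variables a general least-squares routine would replace the closed form.

```python
def ols_residuals(x, y):
    """Residuals from OLS of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]

def breusch_pagan(x, y, z):
    """S0 / (2 * sigma_hat^4), where S0 is the regression sum of squares
    from regressing squared residuals on a constant and z."""
    n = len(x)
    u2 = [e * e for e in ols_residuals(x, y)]
    sigma2 = sum(u2) / n                            # ML estimate of the error variance
    mz, mu = sum(z) / n, sum(u2) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szu = sum((zi - mz) * (ui - mu) for zi, ui in zip(z, u2))
    s0 = szu ** 2 / szz                             # regression SS = slope^2 * szz
    return s0 / (2 * sigma2 ** 2)

# Hypothetical data: deterministic "errors" whose size grows with x
x = list(range(1, 11))
y = [2 + 3 * xi + (-1) ** xi * 0.1 * xi for xi in x]
bp = breusch_pagan(x, y, x)                         # here z is taken to be x itself
print(bp)
```

The value is compared with a chi-square critical value with r degrees of freedom (r = 1 in this sketch).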

29 5.2 Detection of Heteroskedasticity
Illustrative Example: Consider the data in Table 5.1. To apply the Goldfeld-Quandt test we consider two groups of 10 observations each, ordered by the values of the variable x. The first group consists of observations 6, 11, 9, 4, 14, 15, 19, 20, 1, and 16. The second group consists of the remaining 10.

30 5.2 Detection of Heteroskedasticity
Illustrative Example: The estimated equations were Group 1: y = … + … x, $R^2 = 0.985$, standard errors (0.616) (0.038), $\hat{\sigma}^2 = 0.475$. Group 2: y = … + … x, $R^2 = 0.904$, standard errors (3.443) (0.096), $\hat{\sigma}^2 = 3.154$.

31 5.2 Detection of Heteroskedasticity
The F-ratio for the test is $F = 3.154 / 0.475 = 6.64$. The 1% point for the F-distribution with d.f. 8 and 8 is 6.03. Thus the F-value is significant at the 1% level and we reject the hypothesis of homoskedasticity.

32 5.2 Detection of Heteroskedasticity
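For the log-linear form, the estimated equations were Group 1: log y = … + … x, $R^2 = 0.992$, standard errors (0.079) (0.030), $\hat{\sigma}^2$ = …. Group 2: log y = … + … x, $R^2 = 0.912$, standard errors (0.352) (0.099), $\hat{\sigma}^2$ = …. The F-ratio for the test is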

33 5.2 Detection of Heteroskedasticity
For d.f. 8 and 8, the 5% point from the F-tables is 3.44. Thus if we use the 5% significance level, we reject the hypothesis of homoskedasticity if we consider the linear form but do not reject it in the log-linear form. Note that the White test rejected the hypothesis in both forms.

34 5.3 Consequences of Heteroskedasticity

35 5.4 Solutions to the Heteroskedasticity Problem
There are two types of solutions that have been suggested in the literature for the problem of heteroskedasticity: (1) solutions dependent on particular assumptions about $\sigma_i$; (2) general solutions. We first discuss category 1: weighted least squares (WLS).

36 5.4 Solutions to the Heteroskedasticity Problem
WLS

37 5.4 Solutions to the Heteroskedasticity Problem
Thus the constant term in this equation is the slope coefficient in the original equation.
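The remark about the constant term can be illustrated with a short sketch. It assumes, for illustration, the simple model y = α + βx + u with error variance proportional to x squared, so that dividing through by x gives y/x = α(1/x) + β + v with a homoskedastic error v. The function names and data are hypothetical.

```python
def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def wls_by_deflation(x, y):
    """WLS under the assumption var(u_i) = sigma^2 * x_i^2.

    Regress y/x on a constant and 1/x. The constant of the transformed
    equation estimates beta; the coefficient on 1/x estimates alpha.
    """
    y_star = [yi / xi for xi, yi in zip(x, y)]
    inv_x = [1.0 / xi for xi in x]
    const, coef = ols_simple(inv_x, y_star)
    alpha_hat, beta_hat = coef, const
    return alpha_hat, beta_hat

# Hypothetical noise-free data, so the roles of the two coefficients are easy to see
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]
print(wls_by_deflation(x, y))
```

Residuals and predictions should still be computed from the original equation using the returned estimates.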

38 5.4 Solutions to the Heteroskedasticity Problem
Prais and Houthakker found in their analysis of family budget data that the errors from the equation had variance increasing with household income. They considered a model $\sigma_i^2 = \sigma^2 [E(y_i)]^2$, that is, $\sigma_i^2 = \sigma^2 (\alpha + \beta x_i)^2$. In this case we cannot divide the whole equation by a known constant as before. For this model we can consider a two-step procedure as follows.

39 5.4 Solutions to the Heteroskedasticity Problem
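First estimate $\alpha$ and $\beta$ by OLS. Let these estimators be $\hat{\alpha}$ and $\hat{\beta}$. Now use the WLS procedure as outlined earlier, that is, regress $y_i / (\hat{\alpha} + \hat{\beta} x_i)$ on $1 / (\hat{\alpha} + \hat{\beta} x_i)$ and $x_i / (\hat{\alpha} + \hat{\beta} x_i)$ with no constant term. The limitation of the two-step procedure is that the error involved in the first step will affect the second step.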

40 5.4 Solutions to the Heteroskedasticity Problem
This procedure is called a two-step weighted least squares procedure. The standard errors we get for the estimates of $\alpha$ and $\beta$ from this procedure are valid only asymptotically. They are asymptotic standard errors because the weights have been estimated.

41 5.4 Solutions to the Heteroskedasticity Problem
One can iterate this WLS procedure further, that is, use the new estimates of $\alpha$ and $\beta$ to construct new weights and then use the WLS procedure, and repeat this procedure until convergence. This procedure is called the iterated weighted least squares procedure. However, there is no gain in (asymptotic) efficiency by iteration.
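The two-step procedure, and its iterated variant, can be sketched as below under the assumption that the error standard deviation is proportional to α + βx. The function names, the `iterations` parameter, and the simulated data are illustrative; the sketch also assumes the fitted values α̂ + β̂x are nonzero so that they can be used as deflators.

```python
import random

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def ols_two_no_constant(r1, r2, y):
    """OLS of y on regressors r1 and r2 with no constant term."""
    s11 = sum(a * a for a in r1)
    s22 = sum(b * b for b in r2)
    s12 = sum(a * b for a, b in zip(r1, r2))
    s1y = sum(a * v for a, v in zip(r1, y))
    s2y = sum(b * v for b, v in zip(r2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

def two_step_wls(x, y, iterations=1):
    """Step 1: OLS. Step 2: deflate by alpha_hat + beta_hat * x and rerun.
    iterations > 1 gives the iterated WLS variant."""
    alpha, beta = ols_simple(x, y)
    for _ in range(iterations):
        w = [alpha + beta * xi for xi in x]          # estimated E(y_i)
        alpha, beta = ols_two_no_constant(
            [1.0 / wi for wi in w],                  # coefficient: alpha
            [xi / wi for xi, wi in zip(x, w)],       # coefficient: beta
            [yi / wi for yi, wi in zip(y, w)],
        )
    return alpha, beta

# Simulated (hypothetical) data: error s.d. proportional to E(y)
random.seed(2)
x = [1 + 0.5 * i for i in range(40)]
y = [(1.0 + 0.9 * xi) * (1 + random.gauss(0, 0.1)) for xi in x]
print("two-step:", two_step_wls(x, y))
print("iterated:", two_step_wls(x, y, iterations=5))
```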

42 5.4 Solutions to the Heteroskedasticity Problem
Illustrative Example: As an illustration, again consider the data in Table 5.1. We saw earlier that regressing the absolute values of the residuals on x (in Glejser’s tests) gave the following estimates: Now we regress (with no constant term) where

43 5.4 Solutions to the Heteroskedasticity Problem
The resulting equation is If we assume that , the two-step WLS procedure would be as follows. Section 5.1

44 5.4 Solutions to the Heteroskedasticity Problem
Next we compute and regress The results were The $R^2$'s in these equations are not comparable. But our interest is in estimates of the parameters in the consumption function.

45 Assignment Use the data of Table 5.1 to do the WLS
Consider the log-linear form. Run Glejser's tests to check whether the log-linear regression model still has non-constant variance. Estimate the non-constant variance and run WLS. Write a one-step program in GAUSS.

46 5.5 Heteroskedasticity and the Use of Deflators
There are two remedies often suggested and used for solving the heteroskedasticity problem: (1) transforming the data to logs; (2) deflating the variables by some measure of "size."

47 5.5 Heteroskedasticity and the Use of Deflators

48 5.5 Heteroskedasticity and the Use of Deflators

49 5.5 Heteroskedasticity and the Use of Deflators
One important thing to note is that the purpose in all these procedures of deflation is to get more efficient estimates of the parameters. But once those estimates have been obtained, one should make all inferences (calculation of the residuals, prediction of future values, etc.) from the original equation, not the equation in the deflated variables.

50 5.5 Heteroskedasticity and the Use of Deflators
Another point to note is that since the purpose of deflation is to get more efficient estimates, it is tempting to argue about the merits of the different procedures by looking at the standard errors of the coefficients. However, this is not correct, because in the presence of heteroskedasticity the standard errors themselves are biased, as we showed earlier.

51 5.5 Heteroskedasticity and the Use of Deflators
For instance, in the five equations presented above, the second and third are comparable and so are the fourth and fifth. In both cases if we look at the standard errors of the coefficient of X, the coefficient in the undeflated equation has a smaller standard error than the corresponding coefficient in the deflated equation. However, if the standard errors are biased, we have to be careful not to make too much of these differences.

52 5.5 Heteroskedasticity and the Use of Deflators
In the preceding example we have considered miles M as a deflator and also as an explanatory variable. In this context we should mention some discussion in the literature on "spurious correlation" between ratios.

53 5.5 Heteroskedasticity and the Use of Deflators
The argument simply is that even if we have two variables X and Y that are uncorrelated, if we deflate both the variables by another variable Z, there could be a strong correlation between X/Z and Y/Z because of the common denominator Z. It is wrong to infer from this correlation that there exists a close relationship between X and Y.
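The common-denominator effect is easy to reproduce by simulation. The sketch below draws X, Y and Z independently (the distributions, sample size and seed are arbitrary assumptions chosen so that Z varies much more, proportionally, than X and Y) and compares the correlation of X with Y to that of X/Z with Y/Z.

```python
import math
import random

def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((v - ma) ** 2 for v in a)
    sbb = sum((v - mb) ** 2 for v in b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)

random.seed(42)
n = 5000
X = [random.uniform(9, 11) for _ in range(n)]   # independent of Y and Z
Y = [random.uniform(9, 11) for _ in range(n)]
Z = [random.uniform(1, 10) for _ in range(n)]   # the common deflator

r_raw = pearson(X, Y)
r_ratio = pearson([x / z for x, z in zip(X, Z)],
                  [y / z for y, z in zip(Y, Z)])
print("corr(X, Y)     =", round(r_raw, 3))
print("corr(X/Z, Y/Z) =", round(r_ratio, 3))
```

The ratios are highly correlated only because both are dominated by the variation in 1/Z; X and Y themselves remain unrelated.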

54 5.5 Heteroskedasticity and the Use of Deflators
Of course, if our interest is in fact the relationship between X/Z and Y/Z, there is no reason why this correlation need be called "spurious." As Kuh and Meyer point out, "The question of spurious correlation quite obviously does not arise when the hypothesis to be tested has initially been formulated in terms of ratios, for instance, in problems involving relative prices.

55 5.5 Heteroskedasticity and the Use of Deflators
Similarly, when a series such as money value of output is divided by a price index to obtain a 'constant dollar' estimate of output, no question of spurious correlation need arise. Thus, spurious correlation can only exist when a hypothesis pertains to undeflated variables and the data have been divided through by another series for reasons extraneous to but not in conflict with the hypothesis framed as an exact, i.e., nonstochastic relation."

56 5.5 Heteroskedasticity and the Use of Deflators
In summary, often in econometric work deflated or ratio variables are used to solve the heteroskedasticity problem. Deflation can sometimes be justified on pure economic grounds, as in the case of the use of "real" quantities and relative prices. In this case all the inferences from the estimated equation will be based on the equation in the deflated variables.

57 5.5 Heteroskedasticity and the Use of Deflators
However, if deflation is used to solve the heteroskedasticity problem, any inferences we make have to be based on the original equation, not the equation in the deflated variables. In any case, deflation may increase or decrease the resulting correlations, but this is beside the point. Since the correlations are not comparable anyway, one should not draw any inferences from them.

58 5.5 Heteroskedasticity and the Use of Deflators
Illustrative Example: In Table 5.5 we present data on y = population density and x = distance from the central business district for 39 census tracts in the Baltimore area. It has been suggested (this is called the “density gradient model”) that population density follows the relationship where A is the density of the central business district.

59 5.5 Heteroskedasticity and the Use of Deflators
The basic hypothesis is that as you move away from the central business district population density drops off. For estimation purposes we take logs and write

60 5.5 Heteroskedasticity and the Use of Deflators
where Estimation of this equation by OLS gave the following results (figures in parentheses are t-values, not standard errors):

61 5.5 Heteroskedasticity and the Use of Deflators
The t-values are very high and the intercept and slope coefficients are significantly different from zero (with a significance level of less than 1%). The sign of the slope coefficient is negative, as expected. With cross-sectional data like these we expect heteroskedasticity, and this could result in an underestimation of the standard errors (and thus an overestimation of the t-ratios).

62 5.5 Heteroskedasticity and the Use of Deflators
To check whether there is heteroskedasticity, we have to analyze the estimated residuals. A plot of against showed a positive relationship and hence Glejser’s tests were applied.

63 5.5 Heteroskedasticity and the Use of Deflators
Defining by , the following equations were estimated:

64 5.5 Heteroskedasticity and the Use of Deflators
We choose the specification that gives the highest $R^2$ [or equivalently the highest t-value, since $R^2 = t^2 / (t^2 + \text{d.f.})$ in the case of only one regressor].
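The equivalence between ranking specifications by $R^2$ and by the t-value in a one-regressor model follows from the usual link between the F-statistic and $R^2$. As a short sketch, writing d.f. for the residual degrees of freedom (an assumption about notation):

```latex
F = t^2 = \frac{R^2}{(1 - R^2)/\text{d.f.}}
\quad\Longrightarrow\quad
R^2 = \frac{t^2}{t^2 + \text{d.f.}}
```

Since all the candidate regressions use the same number of observations, d.f. is fixed and $R^2$ is an increasing function of $t^2$, so both criteria pick the same specification.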

65 5.5 Heteroskedasticity and the Use of Deflators
The estimated regressions with t-values in parentheses were

66 5.5 Heteroskedasticity and the Use of Deflators
All the t-statistics are significant, indicating the presence of heteroskedasticity. Based on the highest t-ratio, we chose the second specification (although the fourth specification is equally valid).

67 5.5 Heteroskedasticity and the Use of Deflators
Deflating throughout by gives the regression equations to be estimated as The estimates were (figures in parentheses are t-ratios)

