Lecture 16 Preview: Heteroskedasticity

Regression Model
Standard Ordinary Least Squares (OLS) Premises
Estimation Procedures Embedded within the Ordinary Least Squares (OLS) Estimation Procedure
What Is Heteroskedasticity?
Heteroskedasticity and the Ordinary Least Squares (OLS) Estimation Procedure: The Consequences
  The Mathematics
  Our Suspicions
  Confirming Our Suspicions: A Simulation
Accounting for Heteroskedasticity: An Example
Justifying the Generalized Least Squares (GLS) Estimation Procedure
Robust Standard Errors: An Alternative Approach

Regression Model
$y_t = \beta_{Const} + \beta_x x_t + e_t$, for $t = 1, 2, \dots, T$
  $y_t$ = Dependent variable
  $x_t$ = Explanatory variable
  $e_t$ = Error term
  $\beta_{Const}$ and $\beta_x$ are the parameters
The error term is a random variable representing random influences: $\text{Mean}[e_t] = 0$

Standard Ordinary Least Squares (OLS) Premises
Error Term Equal Variance Premise: The variance of the error term’s probability distribution for each observation is the same.
Error Term/Error Term Independence Premise: The error terms are independent.
Explanatory Variable/Error Term Independence Premise: The explanatory variables, the $x_t$’s, and the error terms, the $e_t$’s, are not correlated.

OLS Estimation Procedure Includes Three Estimation Procedures
Value of the parameters, $\beta_{Const}$ and $\beta_x$:
  $b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$, $b_{Const} = \bar{y} - b_x \bar{x}$
Variance of the error term’s probability distribution, $\text{Var}[e]$:
  $\text{EstVar}[e] = \frac{SSR}{\text{Degrees of Freedom}}$
Variance of the coefficient estimate’s probability distribution, $\text{Var}[b_x]$:
  $\text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$

Good News: When the standard premises are satisfied, each of these procedures is unbiased.
Good News: When the standard premises are satisfied, the OLS estimation procedure for the coefficient value is the best linear unbiased estimation procedure (BLUE).
Crucial Point: When the ordinary least squares (OLS) estimation procedure performs its calculations, it implicitly assumes that the three standard (OLS) premises are satisfied.
Question: What happens when the error term equal variance premise is violated?
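The three embedded estimation procedures can be sketched in a few lines of plain Python. This is a minimal illustration for a single explanatory variable; the function name `ols_simple` and the four data points are hypothetical and do not come from the lecture.

```python
def ols_simple(x, y):
    """One-regressor OLS: returns (b_const, b_x, est_var_e, est_var_bx)."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    # Sum of squared x deviations, shared by b_x and EstVar[b_x].
    sum_sq_x_dev = sum((xt - x_bar) ** 2 for xt in x)

    # Procedure 1: values of the parameters.
    b_x = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / sum_sq_x_dev
    b_const = y_bar - b_x * x_bar

    # Procedure 2: variance of the error term, SSR / degrees of freedom.
    ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(x, y))
    est_var_e = ssr / (T - 2)  # two parameters are estimated

    # Procedure 3: variance of the coefficient estimate.
    # Note that it plugs in the "single" EstVar[e] from procedure 2.
    est_var_bx = est_var_e / sum_sq_x_dev
    return b_const, b_x, est_var_e, est_var_bx


# Hypothetical data, not from the lecture.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
b_const, b_x, est_var_e, est_var_bx = ols_simple(x, y)
print(b_const, b_x, est_var_e, est_var_bx)
```

Procedure 3 reuses the single error variance estimate from procedure 2, which is exactly where the equal variance premise enters the calculation.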

What Is Heteroskedasticity?
Error Term Equal Variance Premise: The variance of the error term’s probability distribution for each observation is the same; all the variances equal $\text{Var}[e]$:
$\text{Var}[e_1] = \text{Var}[e_2] = \dots = \text{Var}[e_T] = \text{Var}[e]$
Let us review precisely what this means. Consider the error terms of Professor Lord’s three students.

Lab 16.1, Heter = 0: No Heteroskedasticity
For each student, the mean equals 0. This indicates that each student’s error term indeed reflects random influences.
The variances are equal. No heteroskedasticity is present; the error term equal variance premise is satisfied.

Lab 16.1, Heter = 1
For each student, the mean equals 0. This indicates that each student’s error term indeed reflects random influences.
The variances are not equal. Heteroskedasticity is present; the error term equal variance premise is violated.
Question: Why might heteroskedasticity exist in this case?

Consequences of Heteroskedasticity
How does the presence of heteroskedasticity affect the estimation procedure for the
  value of the coefficient? $b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$
  variance of the error term’s probability distribution? $\text{EstVar}[e] = \frac{SSR}{\text{Degrees of Freedom}}$
  variance of the coefficient estimate’s probability distribution? $\text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$
More specifically, are the three estimation procedures embedded in the ordinary least squares (OLS) estimation procedure still unbiased in the presence of heteroskedasticity?

Estimation Procedure for the Value of the Coefficient
Question: In the presence of heteroskedasticity, is the OLS estimation procedure for the value of the coefficient unbiased? That is, does $\text{Mean}[b_x]$ still equal $\beta_x$?

Review: Arithmetic of Means
  Mean of a constant plus a variable: $\text{Mean}[c + x] = c + \text{Mean}[x]$
  Mean of a constant times a variable: $\text{Mean}[cx] = c\,\text{Mean}[x]$
  Mean of the sum of two variables: $\text{Mean}[x + y] = \text{Mean}[x] + \text{Mean}[y]$

Mean of the Coefficient Estimate’s Probability Distribution
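With three observations, the coefficient estimate can be written as the actual coefficient plus a weighted sum of the error terms. Let $S = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2$:

$b_x = \beta_x + \frac{(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3}{S}$

Apply the arithmetic of means one step at a time:

$\text{Mean}[b_x] = \beta_x + \text{Mean}\left[\frac{(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3}{S}\right]$   ($\text{Mean}[c + x] = c + \text{Mean}[x]$)
$= \beta_x + \text{Mean}\left[\frac{1}{S}\left((x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3\right)\right]$   (rewrite fraction as a product)
$= \beta_x + \frac{1}{S}\,\text{Mean}\left[(x_1 - \bar{x})e_1 + (x_2 - \bar{x})e_2 + (x_3 - \bar{x})e_3\right]$   ($\text{Mean}[cx] = c\,\text{Mean}[x]$)
$= \beta_x + \frac{1}{S}\left(\text{Mean}[(x_1 - \bar{x})e_1] + \text{Mean}[(x_2 - \bar{x})e_2] + \text{Mean}[(x_3 - \bar{x})e_3]\right)$   ($\text{Mean}[x + y] = \text{Mean}[x] + \text{Mean}[y]$)
$= \beta_x + \frac{1}{S}\left((x_1 - \bar{x})\text{Mean}[e_1] + (x_2 - \bar{x})\text{Mean}[e_2] + (x_3 - \bar{x})\text{Mean}[e_3]\right)$   ($\text{Mean}[cx] = c\,\text{Mean}[x]$)
$= \beta_x$   ($\text{Mean}[e_1] = \text{Mean}[e_2] = \text{Mean}[e_3] = 0$)

Question: Have we relied on the error term equal variance premise to show that the OLS estimation procedure for the coefficient value is unbiased? No
Question: In the presence of heteroskedasticity, should we expect the OLS estimation procedure for the coefficient value still to be unbiased? Yes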

OLS Estimation Procedure: Variance of the Coefficient Estimate’s Probability Distribution
Question: In the presence of heteroskedasticity, is the OLS estimation procedure for the variance of the coefficient estimate’s probability distribution unbiased?

Recall the two-step strategy we used to estimate the variance of the coefficient estimate’s probability distribution:
Step 1: Estimate the variance of the error term’s probability distribution from the available information:
  $\text{EstVar}[e] = \frac{SSR}{\text{Degrees of Freedom}}$
  This equation is estimating a “single” $\text{Var}[e]$. What does $\text{Var}[e]$ equal? $\text{Var}[e_1] = \dots = \text{Var}[e_T] = \text{Var}[e]$
Step 2: Apply the relationship between the variances of the coefficient estimate’s and error term’s probability distributions:
  $\text{Var}[b_x] = \frac{\text{Var}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$, so $\text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$

Strategy: The strategy the ordinary least squares (OLS) estimation procedure uses is based on the premise that there is a “single” $\text{Var}[e]$.
Question: Has the OLS estimation procedure relied on the error term equal variance premise to estimate the variance of the coefficient estimate’s probability distribution? Yes
Question: In the presence of heteroskedasticity, might the OLS estimation procedure for the coefficient estimate’s variance be flawed? Yes

Our Suspicions
OLS estimation procedure for estimating the
  coefficient value should be unbiased.
  variance of the coefficient estimate’s probability distribution may be flawed. The variance calculation is based on a false premise.

Confirming Our Suspicions: A Simulation
Is the estimation procedure for the coefficient value unbiased?
Unbiased estimation procedure: After many, many repetitions of the experiment the average (mean) of the estimates equals the actual value.
  Act Coef: Actual value of the coefficient
  Coef Value Est: Coefficient estimate for this repetition, $b_x$
  Mean: Mean (average) of the value estimates from all repetitions
  Var: Variance of the estimated coefficient values from all repetitions

Is the estimation procedure for the variance of the coefficient estimate’s probability distribution unbiased?
  Coef Var Est: Estimate of the variance for the coefficient estimate’s probability distribution calculated from this repetition, using the “single” $\text{Var}[e]$ premise: $\text{EstVar}[e] = \frac{SSR}{\text{Degrees of Freedom}}$, $\text{EstVar}[b_x] = \frac{\text{EstVar}[e]}{\text{Sum Sqr XDev}}$
  Mean: Average of the variance estimates from all repetitions

Simulation Results (Lab 16.2)
Is the OLS estimation procedure for the coefficient’s value unbiased?
Is the OLS estimation procedure for the variance of the coefficient estimate’s probability distribution unbiased?

All summary columns are computed from all repetitions.

Heter    Actual Value   Mean (Average) of the    Variance of the Estimated   Average of the Estimated
Factor   of Coef        Estimated Values, b_x    Coef Values, b_x            Variances, EstVar[b_x]
0        2.0            2.0                      2.5                         2.5
1        2.0            2.0                      3.6                         2.9

When heteroskedasticity is absent: Nothing but good news.
When heteroskedasticity is present:
  Good news: The OLS estimation procedure for the coefficient value is unbiased.
  Bad news: The OLS procedure for estimating the variance of the coefficient estimate’s probability distribution is flawed because it is based on a false premise. Consequently, all calculations based on the variance of the coefficient estimate’s probability distribution will be flawed:
    standard errors
    t-statistics
    tail probabilities
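A simulation of this kind can be sketched with the Python standard library. Everything about the design below is an assumption made for illustration (the x values, the parameter values, the rule linking the error's standard deviation to x, the number of repetitions, the seed); it is not the lab's actual setup and will not reproduce the numbers in the table.

```python
import random
import statistics


def ols_bx_and_est_var(x, y):
    """Return (b_x, EstVar[b_x]) from a one-regressor OLS fit."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    sum_sq_x_dev = sum((xt - x_bar) ** 2 for xt in x)
    b_x = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / sum_sq_x_dev
    b_const = y_bar - b_x * x_bar
    ssr = sum((yt - b_const - b_x * xt) ** 2 for xt, yt in zip(x, y))
    return b_x, ssr / (T - 2) / sum_sq_x_dev


def simulate(heter, repetitions=5000, seed=0):
    """Return (mean of b_x, variance of b_x, mean of EstVar[b_x]) over all repetitions."""
    rng = random.Random(seed)
    x = [float(t) for t in range(1, 11)]   # assumed x values
    beta_const, beta_x = 10.0, 2.0         # assumed actual parameter values
    # Assumed error design: heter = 0 gives every error the same standard
    # deviation; heter = 1 makes the standard deviation proportional to x_t.
    sd = [0.5 * xt ** heter for xt in x]
    estimates, variance_estimates = [], []
    for _ in range(repetitions):
        y = [beta_const + beta_x * xt + rng.gauss(0.0, sdt) for xt, sdt in zip(x, sd)]
        b_x, est_var = ols_bx_and_est_var(x, y)
        estimates.append(b_x)
        variance_estimates.append(est_var)
    return (statistics.mean(estimates),
            statistics.pvariance(estimates),
            statistics.mean(variance_estimates))


results = {heter: simulate(heter) for heter in (0, 1)}
for heter, (mean_bx, var_bx, mean_est_var) in results.items():
    print(f"Heter={heter}: mean b_x={mean_bx:.3f}, "
          f"variance of b_x={var_bx:.4f}, average EstVar[b_x]={mean_est_var:.4f}")
```

The comparison to make is between the last two numbers on each printed line. With equal variances they should roughly agree; under this assumed heteroskedastic design the average of the estimated variances should fall short of the actual variance of the estimates, while the mean of the estimates stays near 2 in both cases.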

Accounting for Heteroskedasticity
Step 1: Apply the Ordinary Least Squares (OLS) Estimation Procedure.
  Estimate the model’s parameters with the ordinary least squares (OLS) estimation procedure.
Step 2: Consider the Possibility of Heteroskedasticity.
  Ask whether there is reason to suspect that heteroskedasticity may be present.
  Use the ordinary least squares (OLS) regression results to “get a sense” of whether heteroskedasticity is a problem by examining the residuals.
  If the presence of heteroskedasticity is suspected, formulate a model to explain it.
  Use the Breusch-Pagan-Godfrey approach by estimating an artificial regression to test for the presence of heteroskedasticity.
Step 3: Apply the Generalized Least Squares (GLS) Estimation Procedure.
  Apply the model of heteroskedasticity and algebraically manipulate the original model to derive a new, tweaked model in which the error terms do not suffer from heteroskedasticity.
  Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of the tweaked model.

An Example: GDP and Internet Use
Theory: Higher per capita GDP increases Internet use.
Model: $\text{LogUsersInternet}_t = \beta_{Const} + \beta_{GDP}\,\text{GdpPC}_t + e_t$
Theory: $\beta_{GDP} > 0$
1992 Internet Data: Cross section data of Internet use and gross domestic product for 29 countries in 1992.
  $\text{LogUsersInternet}_t$: Log of Internet users per 1,000 people in nation t
  $\text{GdpPC}_t$: Per capita GDP (1,000’s of real “international” dollars) in nation t

Step 1: Apply the Ordinary Least Squares (OLS) Estimation Procedure.
Model: $\text{LogUsersInternet}_t = \beta_{Const} + \beta_{GDP}\,\text{GdpPC}_t + e_t$

Ordinary Least Squares (OLS) [EViews]
Dependent Variable: LogUsersInternet
Explanatory Variable(s)   Estimate   SE       t-Statistic   Prob
GdpPC                     0.101      0.0326                 0.0046
Const                                                       0.4475
Number of Observations 29

Estimated Equation: $\text{EstLogUsersInternet} = b_{Const} + .101\,\text{GdpPC}$
Interpretation: We estimate that a $1,000 increase in real per capita GDP results in a 10.1 percent increase in Internet users.
Critical Result: The GdpPC coefficient estimate equals .101. The positive sign of the coefficient estimate suggests that higher per capita GDP increases Internet use. This evidence supports the theory.

$H_0$: $\beta_{GDP} = 0$   Per capita GDP does not affect Internet use
$H_1$: $\beta_{GDP} > 0$   Higher per capita GDP increases Internet use
Prob[Results IF H0 True]: What is the probability that the GdpPC estimate from one repetition of the experiment will be .101 or more, if H0 is true (that is, if per capita GDP has no effect on Internet use, if $\beta_{GDP}$ actually equals 0)?

$H_0$: $\beta_{GDP} = 0$   $H_1$: $\beta_{GDP} > 0$
[Figure: t-distribution of $b_{GDP}$ if H0 is true, with Mean = 0, SE = .0326, DF = 27; each tail beyond .101 in absolute value has probability .0046/2]
Using the tails probability: Prob[Results IF H0 True] = .0046/2 = .0023
Would we reject H0 at the traditional significance levels? Yes

Question: Could there be a potential problem?
Question: What do we know about $\text{EstVar}[b_{GDP}]$ when heteroskedasticity is present?
Answer: It is based on a false premise; $\text{EstVar}[b_{GDP}]$ could be inaccurate.
Question: What is $\text{SE}[b_{GDP}]$?
Answer: The square root of $\text{EstVar}[b_{GDP}]$.
The tails probability is based on $\text{SE}[b_{GDP}]$. Consequently, the tails probability could be inaccurate also. Our calculation of Prob[Results IF H0 True] could be misleading us.

Step 2: Consider the Possibility of Heteroskedasticity.
Is there reason to suspect that heteroskedasticity may be present? Yes.
When the per capita GDP is low, individuals have little to spend on any goods other than the basic necessities. Individuals have little to spend on Internet use and consequently Internet use will be low.
When the per capita GDP is high, individuals can afford to purchase more goods. Naturally, consumer tastes vary from nation to nation. In some high per capita GDP nations, individuals will opt to spend much on Internet use. In other high per capita GDP nations, individuals will opt to spend little on Internet use.
Model: $\text{LogUsersInternet}_t = \beta_{Const} + \beta_{GDP}\,\text{GdpPC}_t + e_t$
Two nations with virtually the same level of per capita GDP have quite different rates of Internet use. The error term in the model would capture these differences.
As per capita GDP increases, we would expect the variance of the error term’s probability distribution to increase.

Use the ordinary least squares (OLS) regression results to “get a sense” of whether heteroskedasticity is a problem by examining the residuals.
We can think of the residuals as the estimated errors.
  The error terms, the $e_t$’s, are unobservable: $y_t = \beta_{Const} + \beta_x x_t + e_t$, so $e_t = y_t - (\beta_{Const} + \beta_x x_t)$
  The residuals, the $Res_t$’s, are observable: $Esty_t = b_{Const} + b_x x_t$ and $Res_t = y_t - Esty_t$, so $Res_t = y_t - (b_{Const} + b_x x_t)$
Our suspicions appear to be borne out. [EViews]

If the presence of heteroskedasticity is suspected, formulate a model to explain it.
Heteroskedasticity Model: $(e_t - \text{Mean}[e_t])^2 = \alpha_{Const} + \alpha_{GDP}\,\text{GdpPC}_t + v_t$
Theory: $\alpha_{GDP} > 0$
Since $\text{Mean}[e_t] = 0$: $e_t^2 = \alpha_{Const} + \alpha_{GDP}\,\text{GdpPC}_t + v_t$

Use the Breusch-Pagan-Godfrey approach by estimating an artificial regression to test for the presence of heteroskedasticity.
We can think of the residuals as the estimated errors, so the squared residual stands in for the squared error term:
$\text{ResSqr}_t = \alpha_{Const} + \alpha_{GDP}\,\text{GdpPC}_t + v_t$
Aside: Statistical software makes it easy to do this. [EViews]

Ordinary Least Squares (OLS)
Dependent Variable: ResSqr
Explanatory Variable(s)   Estimate   SE   t-Statistic   Prob
GdpPC                                                   0.0118
Const                                                   0.2651
Number of Observations 29

Critical Result: The GdpPC coefficient estimate is positive. The positive sign of the coefficient estimate suggests that higher per capita GDP increases the squared deviation of the error term from its mean. This evidence supports the view that heteroskedasticity is present.
$H_0$: $\alpha_{GDP} = 0$   Per capita GDP does not affect the squared deviation of the residual
$H_1$: $\alpha_{GDP} > 0$   Higher per capita GDP increases the squared deviation of the residual
Prob[Results IF H0 True] = .0118/2 = .0059

Based on these results we assume that the variance of the error term’s probability distribution is proportional to GdpPC:
Heteroskedasticity Model: $\text{Var}[e_t] = V \times \text{GdpPC}_t$, where $V$ equals a constant
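The artificial regression can be sketched as two passes of the same one-regressor OLS routine: fit the original model, square the residuals, then regress the squared residuals on the explanatory variable. The data below are hypothetical, constructed so that the scatter widens as x grows; they are not the 1992 Internet data. Converting the t-statistic into a tails probability needs a t-distribution table or statistical software and is left out of this sketch.

```python
def ols_fit(x, y):
    """One-regressor OLS: returns (b_const, b_x, SE[b_x])."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    sum_sq_x_dev = sum((xt - x_bar) ** 2 for xt in x)
    b_x = sum((yt - y_bar) * (xt - x_bar) for xt, yt in zip(x, y)) / sum_sq_x_dev
    b_const = y_bar - b_x * x_bar
    ssr = sum((yt - b_const - b_x * xt) ** 2 for xt, yt in zip(x, y))
    se_bx = (ssr / (T - 2) / sum_sq_x_dev) ** 0.5
    return b_const, b_x, se_bx


# Hypothetical data: the size of the error grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
errors = [0.3, -0.6, 0.9, -1.2, 1.5, -1.8, 2.1, -2.4]
y = [1.0 + 2.0 * xt + et for xt, et in zip(x, errors)]

# Original regression and its residuals (the estimated errors).
b_const, b_x, _ = ols_fit(x, y)
residuals = [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]

# Artificial regression: squared residuals on the explanatory variable.
res_sqr = [r ** 2 for r in residuals]
a_const, a_x, se_a_x = ols_fit(x, res_sqr)
t_stat = a_x / se_a_x
print(f"slope of ResSqr on x: {a_x:.3f}, t-statistic: {t_stat:.2f}")
```

A positive slope in the second regression is the pattern the lecture looks for: larger values of the explanatory variable go with larger squared residuals.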

Step 3: Apply the Generalized Least Squares (GLS) Estimation Procedure.
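Apply the model of heteroskedasticity and algebraically manipulate the original model to derive a new, tweaked model in which the error terms do not suffer from heteroskedasticity.
Heteroskedasticity Model: $\text{Var}[e_t] = V \times \text{GdpPC}_t$, where $V$ equals a constant
Original Model: $\text{LogUsersInternet}_t = \beta_{Const} + \beta_{GDP}\,\text{GdpPC}_t + e_t$
Divide by $\sqrt{\text{GdpPC}_t}$. (For now, do not worry about why we divide by $\sqrt{\text{GdpPC}_t}$; it will become clear shortly.)
$\frac{\text{LogUsersInternet}_t}{\sqrt{\text{GdpPC}_t}} = \beta_{Const}\frac{1}{\sqrt{\text{GdpPC}_t}} + \beta_{GDP}\sqrt{\text{GdpPC}_t} + \frac{e_t}{\sqrt{\text{GdpPC}_t}}$
Tweaked Model: $\text{AdjLogUsersInternet}_t = \beta_{Const}\,\text{AdjConst}_t + \beta_{GDP}\,\text{AdjGdpPC}_t + \varepsilon_t$
  where $\text{AdjLogUsersInternet}_t = \frac{\text{LogUsersInternet}_t}{\sqrt{\text{GdpPC}_t}}$, $\text{AdjConst}_t = \frac{1}{\sqrt{\text{GdpPC}_t}}$, $\text{AdjGdpPC}_t = \sqrt{\text{GdpPC}_t}$, and $\varepsilon_t = \frac{e_t}{\sqrt{\text{GdpPC}_t}}$
Arithmetic of variances: $\text{Var}[cx] = c^2\,\text{Var}[x]$
$\text{Var}[\varepsilon_t] = \text{Var}\left[\frac{e_t}{\sqrt{\text{GdpPC}_t}}\right] = \frac{1}{\text{GdpPC}_t}\text{Var}[e_t] = \frac{1}{\text{GdpPC}_t} \times V \times \text{GdpPC}_t = V$
Crucial Point: The tweaked model does not suffer from heteroskedasticity; every $\varepsilon_t$ has the same variance, $V$. That is why we divided by $\sqrt{\text{GdpPC}_t}$.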

Use the ordinary least squares (OLS) estimation procedure to estimate the parameters of the tweaked model.
NB: The tweaked model does not include a constant term; AdjConst is an explanatory variable that varies from nation to nation.

Ordinary Least Squares (OLS) [EViews]
Dependent Variable: AdjLogUsersInternet
Explanatory Variable(s)   Estimate   SE   t-Statistic   Prob
AdjGdpPC                                                0.0002
AdjConst                                                0.1183
Number of Observations 29

$H_0$: $\beta_{GDP} = 0$   $H_1$: $\beta_{GDP} > 0$
Prob[Results IF H0 True] = .0002/2 = .0001

[Table: The Ordinary Least Squares (OLS) and Generalized Least Squares (GLS) estimates of $\beta_{GDP}$; columns: Estimate, SE, t-Statistic, Tails Prob]
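The GLS recipe for a variance proportional to the explanatory variable can be sketched as follows: divide every term of the original model by the square root of x, then apply OLS to the tweaked model with two regressors and no additional constant. The function name `gls_proportional_variance` and the data are hypothetical; the data lie exactly on a line so that the recovered parameters are easy to check.

```python
import math


def gls_proportional_variance(x, y):
    """GLS under the assumed model Var[e_t] = V * x_t (requires every x_t > 0).

    Divides the original model by sqrt(x_t) and applies OLS, without an
    additional constant, to the tweaked model. Returns (b_const, b_x).
    """
    adj_y = [yt / math.sqrt(xt) for xt, yt in zip(x, y)]
    adj_const = [1.0 / math.sqrt(xt) for xt in x]   # multiplies beta_const
    adj_x = [math.sqrt(xt) for xt in x]             # multiplies beta_x

    # Normal equations for the two regressors adj_const and adj_x.
    s_cc = sum(c * c for c in adj_const)
    s_cx = sum(c * a for c, a in zip(adj_const, adj_x))
    s_xx = sum(a * a for a in adj_x)
    s_cy = sum(c * v for c, v in zip(adj_const, adj_y))
    s_xy = sum(a * v for a, v in zip(adj_x, adj_y))

    # Solve the 2x2 system by Cramer's rule.
    det = s_cc * s_xx - s_cx ** 2
    b_const = (s_cy * s_xx - s_cx * s_xy) / det
    b_x = (s_cc * s_xy - s_cx * s_cy) / det
    return b_const, b_x


# Hypothetical data lying exactly on y = 1 + 0.5 x.
x = [1.0, 4.0, 9.0, 16.0]
y = [1.5, 3.0, 5.5, 9.0]
b_const, b_x = gls_proportional_variance(x, y)
print(b_const, b_x)
```

The coefficients of the tweaked model are the parameters of the original model, so the output is read exactly as the original constant and slope would be.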

Justifying the Generalized Least Squares (GLS) Estimation Procedure
Is the estimation procedure for the coefficient’s value unbiased?
Is the estimation procedure for the variance of the coefficient estimate’s probability distribution unbiased?

Recall our simulation (Lab 16.4). All summary columns are computed from all repetitions.

Heter    Estim   Actual Value   Mean (Average) of the    Variance of the Estimated   Average of the Estimated
Factor   Proc    of Coef        Estimated Values, b_x    Coef Values, b_x            Variances, EstVar[b_x]
0        OLS     2.0            2.0                      2.5                         2.5
1        OLS     2.0            2.0                      3.6                         2.9
1        GLS     2.0            2.0                      2.3                         2.3

Questions: Is the estimation procedure

                                                          Std Premises   Heteroskedasticity
                                                          OLS            OLS      GLS
an unbiased estimation procedure for the
  coefficient value?                                      Yes            Yes      Yes
  variance of the coefficient estimate’s
  probability distribution?                               Yes            No       Yes
for the coefficient value the best linear unbiased
estimation procedure (BLUE)?                              Yes            No       Yes


Robust Standard Errors: An Alternative Approach
Two issues emerge with the ordinary least squares (OLS) estimation procedure when heteroskedasticity is present:
  The standard error calculations made by the ordinary least squares (OLS) estimation procedure are flawed.
  While the ordinary least squares (OLS) estimation procedure for the coefficient value is unbiased, it is not the best linear unbiased estimation procedure (BLUE).
Robust standard errors address the first issue only: the coefficient estimates are still the OLS estimates.

Huber-White robust standard errors [EViews]:
Ordinary Least Squares (OLS), White heteroskedasticity-consistent SEs
Dependent Variable: LogUsersInternet
Explanatory Variable(s)   Estimate   SE   t-Statistic   Prob
GdpPC                                                   0.0044
Const                                                   0.3627
Number of Observations 29

Standard errors based on the equal error term variance premise:
Ordinary Least Squares (OLS)
Dependent Variable: LogUsersInternet
Explanatory Variable(s)   Estimate   SE   t-Statistic   Prob
GdpPC                                                   0.0046
Const                                                   0.4475
Number of Observations 29
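For one explanatory variable, the basic White estimator (often labeled HC0) replaces the single error variance estimate with each observation's own squared residual. The sketch below computes both standard errors side by side. It assumes the uncorrected HC0 form; statistical packages may apply a finite-sample adjustment, so their reported robust standard errors can differ slightly. The function name and data are hypothetical.

```python
def ols_with_robust_se(x, y):
    """Return (b_x, conventional SE[b_x], White HC0 robust SE[b_x])."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    x_dev = [xt - x_bar for xt in x]
    sum_sq_x_dev = sum(d ** 2 for d in x_dev)
    b_x = sum((yt - y_bar) * d for yt, d in zip(y, x_dev)) / sum_sq_x_dev
    b_const = y_bar - b_x * x_bar
    residuals = [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]

    # Conventional: one EstVar[e] = SSR / DF shared by every observation.
    est_var_e = sum(r ** 2 for r in residuals) / (T - 2)
    conventional_var = est_var_e / sum_sq_x_dev

    # HC0: each observation contributes its own squared residual.
    robust_var = sum(d ** 2 * r ** 2 for d, r in zip(x_dev, residuals)) / sum_sq_x_dev ** 2
    return b_x, conventional_var ** 0.5, robust_var ** 0.5


# Hypothetical data, not from the lecture.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
b_x, conventional_se, robust_se = ols_with_robust_se(x, y)
print(b_x, conventional_se, robust_se)
```

The coefficient estimate is identical under both approaches; only the standard error, and therefore the t-statistic and tails probability, changes.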
