Lecture 7 Preview: Estimating the Variance of an Estimate’s Probability Distribution Review: Ordinary Least Squares (OLS) Estimation Procedure Importance.


1 Lecture 7

Preview: Estimating the Variance of an Estimate’s Probability Distribution

Review: Ordinary Least Squares (OLS) Estimation Procedure
  Importance of the Coefficient Estimate’s Probability Distribution
    Mean (Center) of the Coefficient Estimate’s Probability Distribution
    Variance (Spread) of the Coefficient Estimate’s Probability Distribution
  General Properties of the Ordinary Least Squares (OLS) Estimation Procedure
Estimating the Variance of the Coefficient Estimate’s Probability Distribution
  Step 1: Estimate the Variance of the Error Term’s Probability Distribution
    First Attempt: Variance of the Error Term’s Numerical Values
    Second Attempt: Variance of the Residual’s Numerical Values
    Third Attempt: “Adjusted” Variance of the Residual’s Numerical Values
  Step 2: Use the Estimated Variance of the Error Term’s Probability Distribution to Estimate the Variance of the Coefficient Estimate’s Probability Distribution
Degrees of Freedom
Summary: The Ordinary Least Squares (OLS) Estimation Procedure
  Three Important Parts
    Value of the Coefficient
    Variance of the Error Term’s Probability Distribution
    Variance of the Coefficient Estimate’s Probability Distribution
  Regression Printouts

2 Importance of the Probability Distribution’s Mean (Center) and Variance (Spread)

Mean: When the mean of the coefficient estimate’s probability distribution, Mean[b_x], equals the actual value of the coefficient, β_x, the estimation procedure is unbiased. The estimation procedure does not systematically underestimate or overestimate the actual coefficient value.

Variance: When the estimation procedure for the coefficient value is unbiased, the variance of the estimate’s probability distribution, Var[b_x], determines the reliability of the estimate. As the variance decreases, the probability distribution becomes more tightly cropped around the actual value, making it more likely for the coefficient estimate to be close to the actual coefficient value.

Mean[b_x] = β_x: the estimation procedure is unbiased.
Var[b_x]: determines the reliability of the estimate. As Var[b_x] decreases, the reliability of b_x increases.

[Figure: probability distributions of coefficient estimates, each centered at Mean[b_x] = β_x, one with a large variance and one with a small variance]

General Properties of the Ordinary Least Squares (OLS) Estimation Procedure

When the standard ordinary least squares premises are met, the following equations describe the coefficient estimate’s probability distribution:

Mean[b_x] = β_x
Var[b_x] = Var[e] / Σ (x_t - x̄)^2, where x̄ is the mean of the x’s and the sum runs over all observations

The Problem: But there is a problem here, isn’t there? We need to know the variance of the error term’s probability distribution to calculate the variance of the coefficient estimate’s probability distribution. Unfortunately, the variance of the error term’s probability distribution is unobservable. In reality, we can never know the variance of the error term’s probability distribution. How can Clint proceed?

3 Clint’s Strategy: Estimating the Variance of the Coefficient Estimate’s Probability Distribution

When Clint was faced with a similar problem before, what did he do?

Econometrician’s Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.

What information does Clint have? Information from Professor Lord’s first quiz.

First Quiz
Student   x    y
1         5    66
2         15   87
3         25   90

Strategy: Two Steps

Step 1: Estimate the variance of the error term’s probability distribution from the available information, the information from the first quiz. This gives EstVar[e].

Step 2: Apply the relationship between the variances of the coefficient estimate’s and error term’s probability distributions to estimate the variance of the coefficient estimate’s probability distribution:

Var[b_x] = Var[e] / Σ (x_t - x̄)^2
EstVar[b_x] = EstVar[e] / Σ (x_t - x̄)^2

4 Step 1: Estimating the Variance of the Error Term’s Probability Distribution

Three Attempts to Estimate the Variance of the Error Term’s Probability Distribution
1. Variance of the error term’s numerical values from the first quiz
2. Variance of the residual’s numerical values from the first quiz
3. “Adjusted” variance of the residual’s numerical values from the first quiz

Preview: While the first two attempts fail for different reasons, they provide the motivation for the third attempt, which succeeds.

We shall use simulations to assess these attempts by exploiting the relative frequency interpretation of probability.

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the numerical values from the experiments mirrors the random variable’s probability distribution; the two distributions are identical:

Distribution of the Numerical Values → (after many, many repetitions) → Probability Distribution

Applying this to the variance:

Variance of the Numerical Values → (after many, many repetitions) → Variance of Probability Distribution

5 Estimating Var[e], Var[Clint’s 3 Error Terms] (1st Try)

Strategy: Use the variance of the three error terms from Professor Lord’s first quiz to estimate the variance of the error term’s probability distribution.

y_t = β_Const + β_x x_t + e_t, so e_t = y_t - (β_Const + β_x x_t)

With β_Const = 50 and β_x = 2: β_Const + β_x x_t = 50 + 2x_t and e_t = y_t - (50 + 2x_t)

First Quiz
Student   x_t   y_t   50 + 2x_t             e_t = y_t - (50 + 2x_t)   e_t^2
1         5     66    50 + 2 × 5 = 60       66 - 60 = 6               6^2 = 36
2         15    87    50 + 2 × 15 = 80      87 - 80 = 7               7^2 = 49
3         25    90    50 + 2 × 25 = 100     90 - 100 = -10            (-10)^2 = 100
                                                                      SSE = 185

Calculate the variance of the three error terms that were observed on the first quiz. The error term represents random influences, so Mean[e] = 0. Compute the deviations from the mean (here the error terms themselves), square the deviations, and calculate the average:

Var[e_1, e_2, and e_3 1st Quiz] = (e_1^2 + e_2^2 + e_3^2) / 3 = SSE / 3 = 185 / 3 = 61.67

Question: As a consequence of random influences, can we expect the variance of the numerical values from one repetition, the first quiz, to equal the actual variance of the error term’s probability distribution? No.

What can we hope for then? We can hope that this procedure is unbiased; we can hope that the procedure does not systematically underestimate or overestimate the actual variance.
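The first attempt's arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation on the slide; the variable names are illustrative, and the values 50 and 2 are the actual constant and coefficient quoted on the slide, which Clint himself would not know.

```python
# First quiz data from the slide.
x = [5, 15, 25]
y = [66, 87, 90]

# Actual constant and coefficient quoted on the slide (unknown to Clint).
beta_const, beta_x = 50, 2

# Error terms: e_t = y_t - (beta_Const + beta_x * x_t)
errors = [yt - (beta_const + beta_x * xt) for xt, yt in zip(x, y)]

# Sum of squared errors; deviations are taken from Mean[e] = 0.
sse = sum(e ** 2 for e in errors)

# First attempt: average the squared errors over the sample size.
var_errors = sse / len(errors)

print(errors)                 # [6, 7, -10]
print(sse)                    # 185
print(round(var_errors, 2))   # 61.67
```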

6 Lab 7.1

Does the error term represent a random influence?
Does the simulation represent the variance of the error term’s probability distribution accurately?
Is the estimation procedure for the variance of the error term’s probability distribution unbiased?

7 Is the estimation procedure for the variance of the error term’s probability distribution unbiased?

Question: What is the best we can hope for?
Answer: We can hope that this procedure is unbiased; we can hope that the procedure does not systematically underestimate or overestimate the actual variance.

Question: How can we determine whether or not the estimation procedure for the variance of the error term’s probability distribution is unbiased?
Answer: Exploit the relative frequency interpretation of probability: compare the actual variance of the error term’s probability distribution with the mean (average) of the variance estimates after many, many repetitions.

Lab 7.1

Actual Var[e]   Repetitions   Mean (Average) of the Estimates for the Variance of the Error Term’s Probability Distribution
500             >1,000,000    ≈500
200             >1,000,000    ≈200
50              >1,000,000    ≈50

Observations:
Can we expect the estimate to equal the actual value? No. In fact, we can be all but certain that the estimate will not equal the actual value. Sometimes the estimate is less than the actual value and sometimes it is greater.
We cannot predict the value of the estimate for the variance of the error term’s probability distribution beforehand, even when we know the actual value of the variance. The estimate is a random variable.
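A rough stand-in for the Lab 7.1 simulation can be written with Python's standard library. This sketch assumes normally distributed error terms with mean 0 (the slides do not state the distribution), uses far fewer repetitions than the lab, and the function name `simulate_first_attempt` is hypothetical.

```python
import random

def simulate_first_attempt(actual_var, repetitions=20000, seed=0):
    """Average of the first-attempt variance estimates over many repetitions.

    Assumption: error terms are drawn from a normal distribution with
    mean 0 and the given variance.
    """
    rng = random.Random(seed)
    sd = actual_var ** 0.5
    total = 0.0
    for _ in range(repetitions):
        # One repetition: three error terms, as on the first quiz.
        errors = [rng.gauss(0.0, sd) for _ in range(3)]
        # First attempt: average of the squared error terms.
        total += sum(e * e for e in errors) / 3
    return total / repetitions

for actual_var in (500, 200, 50):
    print(actual_var, round(simulate_first_attempt(actual_var), 1))
```

Any single repetition can land far from the actual variance, but the mean of the estimates should settle close to it, which is what unbiasedness means here.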

8 Estimating Var[e], Var[Clint’s 3 Error Terms] (1st Try)

Strategy: Use the variance of the three error terms from Professor Lord’s first quiz to estimate the variance of the error term’s probability distribution.

y_t = β_Const + β_x x_t + e_t, so e_t = y_t - (β_Const + β_x x_t)

With β_Const = 50 and β_x = 2: β_Const + β_x x_t = 50 + 2x_t

First Quiz
Student   x     y     50 + 2x               e_t 1st Quiz      e_t^2 1st Quiz
1         5     66    50 + 2 × 5 = 60       66 - 60 = 6       6^2 = 36
2         15    87    50 + 2 × 15 = 80      87 - 80 = 7       7^2 = 49
3         25    90    50 + 2 × 25 = 100     90 - 100 = -10    (-10)^2 = 100
                                                              SSE = 185

Calculate the variance of the three error terms that were observed on the first quiz. The error term represents random influences, so Mean[e] = 0. Compute the deviations from the mean, square the deviations, and calculate the average:

Var[e_1, e_2, and e_3 1st Quiz] = SSE / 3 = 185 / 3 = 61.67

Good news: This procedure is unbiased.
Bad news: It does not help Clint. We used the actual constant and coefficient, β_Const and β_x, to calculate the errors, and Clint does not know the values of β_Const or β_x.
Despite the bad news, keep the good news in mind.

9 Sum of Squared Errors (SSE) Versus Sum of Squared Residuals (SSR)

Sum of Squared Errors (SSE)
Based on the value of the error terms: y_t = β_Const + β_x x_t + e_t, so e_t = y_t - (β_Const + β_x x_t)
We need the actual constant and coefficient, β_Const and β_x, to calculate the sum of squared errors.
But β_Const and β_x are unobservable; that is the whole problem. Clint cannot calculate the sum of squared errors.

Sum of Squared Residuals (SSR)
Based on the value of the residuals: Res_t = y_t - Esty_t, where Esty_t = b_Const + b_x x_t, so Res_t = y_t - (b_Const + b_x x_t)
Use the OLS procedure to calculate the estimates of the constant and coefficient, b_Const and b_x.
Clint can calculate the sum of squared residuals.

Strategy: We just showed in our simulations that the procedure based on the sum of squared errors is an unbiased estimation procedure for the variance of the error term’s probability distribution. Clint cannot calculate the sum of squared errors, however. Perhaps Clint can use the sum of squared residuals instead.

Econometrician’s Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.

We can think of an observation’s residual as an estimate of its error term.

10 Estimating Var[e], Var[Clint’s 3 Residuals] (2nd Try)

Clint uses the estimated constant and coefficient to calculate the “estimated” error terms, the residuals:

Res_t = y_t - Esty_t = y_t - (b_Const + b_x x_t)

First Quiz
Student   x_t   y_t   Res_t            Res_t^2
1         5     66    66 - 69 = -3     (-3)^2 = 9
2         15    87    87 - 81 = 6      6^2 = 36
3         25    90    90 - 93 = -3     (-3)^2 = 9
                                       SSR = 54

Mean[Res] = Mean[Res_1, Res_2, and Res_3 1st Quiz] = (Res_1 + Res_2 + Res_3) / 3 = (-3 + 6 - 3) / 3 = 0. In fact, we can prove that the mean of the residuals must equal 0.

Var[Res_1, Res_2, and Res_3 1st Quiz] = (Res_1^2 + Res_2^2 + Res_3^2) / 3 = SSR / 3 = 54 / 3 = 18

Good news: Clint has the information to perform these calculations.

Question: Is the procedure unbiased? No. (Lab 7.2)

Bad news: This procedure is biased. It systematically underestimates the variance.
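The second attempt can be traced in Python as follows. This is a minimal sketch using the standard simple-regression OLS formulas (slope as the ratio of cross-deviations to squared x deviations, constant from the means); the variable names are illustrative.

```python
# First quiz data from the slide.
x = [5, 15, 25]
y = [66, 87, 90]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# OLS estimates for a single explanatory variable.
b_x = (sum((xt - mean_x) * (yt - mean_y) for xt, yt in zip(x, y))
       / sum((xt - mean_x) ** 2 for xt in x))
b_const = mean_y - b_x * mean_x

# Residuals: Res_t = y_t - (b_Const + b_x * x_t)
residuals = [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]
ssr = sum(r ** 2 for r in residuals)

# Second attempt: divide the sum of squared residuals by the sample size.
var_residuals = ssr / n

print(round(b_const, 6), round(b_x, 6))   # estimates of constant and coefficient
print([round(r, 6) for r in residuals])   # residuals, which sum to zero
print(round(ssr, 6), round(var_residuals, 6))
```

The estimates agree with the fitted values 69, 81, and 93 used in the table, and the residuals average to zero as the slide states.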

11 Why Is Our Second Attempt Biased?

Error: e_t = y_t - (β_Const + β_x x_t)
Residual: Res_t = y_t - (b_Const + b_x x_t)

Var[e_1, e_2, and e_3 1st Quiz] = SSE / 3: Unbiased. When the actual constant and coefficient are used, the procedure is unbiased.
Var[Res_1, Res_2, and Res_3 1st Quiz] = SSR / 3: Biased downward. The procedure systematically underestimates the variance.

How do SSE and SSR differ? β_Const and β_x versus b_Const and b_x.

SSR = [y_1 - (b_Const + b_x x_1)]^2 + [y_2 - (b_Const + b_x x_2)]^2 + [y_3 - (b_Const + b_x x_3)]^2
SSE = [y_1 - (β_Const + β_x x_1)]^2 + [y_2 - (β_Const + β_x x_2)]^2 + [y_3 - (β_Const + β_x x_3)]^2

Question: How were b_Const and b_x chosen?
Answer: To minimize the sum of squared residuals.

We can be all but certain that b_Const ≠ β_Const and b_x ≠ β_x. Consequently:

Sum using b’s < Sum using β’s, that is, SSR < SSE

Var[Res_1, Res_2, and Res_3 1st Quiz] < Var[e_1, e_2, and e_3 1st Quiz]

The estimation procedure based on the SSE’s is unbiased. The estimation procedure based on the SSR’s is biased downward.

Lab 7.3
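The inequality SSR < SSE can be seen numerically on the first quiz. The sketch below evaluates the sum of squared deviations for the actual line, for the OLS line, and for a handful of lines near the OLS line; the helper `sum_of_squares` and the particular nudges are illustrative choices, not part of the lecture.

```python
# First quiz data from the slide.
x = [5, 15, 25]
y = [66, 87, 90]

def sum_of_squares(const, slope):
    """Sum of squared deviations of y from the line const + slope * x."""
    return sum((yt - (const + slope * xt)) ** 2 for xt, yt in zip(x, y))

sse = sum_of_squares(50, 2)      # actual constant and coefficient from the slides
ssr = sum_of_squares(63, 1.2)    # OLS estimates from the first quiz

# Nudge the OLS line in several directions; each nudge raises the sum,
# because OLS picks the line with the smallest sum of squares.
nudged = [sum_of_squares(63 + dc, 1.2 + ds)
          for dc in (-1, 0, 1) for ds in (-0.1, 0, 0.1)
          if (dc, ds) != (0, 0)]

print(round(ssr, 6), sse)
print(min(nudged) > ssr)
```

Because the OLS line minimizes the sum of squares, any other line, including the actual one, produces a sum at least as large.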

12 Estimating Var[e], AdjVar[Clint’s 3 Residuals] (3rd Try)

From before:

First Quiz
Student   x     y     Res_t            Res_t^2
1         5     66    66 - 69 = -3     (-3)^2 = 9
2         15    87    87 - 81 = 6      6^2 = 36
3         25    90    90 - 93 = -3     (-3)^2 = 9
                                       SSR = 54

Number of Degrees of Freedom = Sample Size - Number of Estimated Parameters = 3 - 2 = 1

AdjVar[Res_1, Res_2, and Res_3 1st Quiz] = SSR / Degrees of Freedom = 54 / 1 = 54

Good news: Clint can perform this calculation.

Question: Is the procedure unbiased? Yes. (Lab 7.4)

Good news: The procedure is unbiased.

NB: We shall postpone our discussion of degrees of freedom for a few minutes.
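The three attempts can be compared side by side in a small simulation. This sketch is not the lab software: it assumes normally distributed error terms, reuses the x values and the actual constant and coefficient from the slides, and the names `ols_fit` and `compare_attempts` are hypothetical.

```python
import random

X = [5, 15, 25]               # x values from the first quiz
BETA_CONST, BETA_X = 50, 2    # actual values quoted on the slides

def ols_fit(x, y):
    """Return (b_const, b_x) for a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b_x = (sum((xt - mean_x) * (yt - mean_y) for xt, yt in zip(x, y))
           / sum((xt - mean_x) ** 2 for xt in x))
    return mean_y - b_x * mean_x, b_x

def compare_attempts(actual_var, repetitions=20000, seed=0):
    """Mean of each attempt's estimate of Var[e] over many repetitions.

    Assumption: error terms are normal with mean 0 and variance actual_var.
    """
    rng = random.Random(seed)
    sd = actual_var ** 0.5
    totals = [0.0, 0.0, 0.0]
    for _ in range(repetitions):
        e = [rng.gauss(0.0, sd) for _ in X]
        y = [BETA_CONST + BETA_X * xt + et for xt, et in zip(X, e)]
        b_const, b_x = ols_fit(X, y)
        sse = sum(et ** 2 for et in e)
        ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(X, y))
        totals[0] += sse / 3          # attempt 1: SSE / sample size
        totals[1] += ssr / 3          # attempt 2: SSR / sample size
        totals[2] += ssr / (3 - 2)    # attempt 3: SSR / degrees of freedom
    return [t / repetitions for t in totals]

first, second, third = compare_attempts(200)
print(round(first), round(second), round(third))
```

With an actual variance of 200, the first and third averages should land near 200 while the second falls well below it, matching the pattern the labs report.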

13 Clint’s Strategy To Estimate the Variance of the Coefficient Estimate’s Probability Distribution

Step 1: Estimate the variance of the error term’s probability distribution from the available information, the information from the first quiz:

EstVar[e] = AdjVar[Res_1, Res_2, and Res_3] = SSR / Degrees of Freedom = 54 / 1 = 54

What can we say about the estimation procedure for the variance of the error term’s probability distribution? It is unbiased.

Step 2: Apply the relationship between the variances of the coefficient estimate’s and error term’s probability distributions to estimate the variance of the coefficient estimate’s probability distribution:

Var[b_x] = Var[e] / Σ (x_t - x̄)^2
EstVar[b_x] = EstVar[e] / Σ (x_t - x̄)^2

x’s: x_1 = 5, x_2 = 15, x_3 = 25, so x̄ = 15
Σ (x_t - x̄)^2 = (-10)^2 + 0^2 + 10^2 = 100 + 0 + 100 = 200

EstVar[b_x] = 54 / 200 = .27

The square root of the estimated variance is called the standard error: √.27 = .5196

What can we hope to be able to say about the estimation procedure for the variance of the coefficient estimate’s probability distribution? We can hope that this procedure is unbiased also; that is, we can hope that the procedure does not systematically underestimate or overestimate the actual variance of the coefficient estimate’s probability distribution.
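The two steps reduce to a short calculation. The sketch below takes SSR = 54 from the earlier slide as given and uses illustrative variable names.

```python
import math

x = [5, 15, 25]   # x values from the first quiz
ssr = 54          # sum of squared residuals from the earlier slide

# Step 1: estimate the variance of the error term's probability distribution.
degrees_of_freedom = len(x) - 2   # sample size minus estimated parameters
est_var_e = ssr / degrees_of_freedom

# Step 2: estimate the variance of the coefficient estimate's distribution.
mean_x = sum(x) / len(x)
sum_sq_dev_x = sum((xt - mean_x) ** 2 for xt in x)
est_var_b_x = est_var_e / sum_sq_dev_x

# The standard error is the square root of the estimated variance.
se_b_x = math.sqrt(est_var_b_x)

print(est_var_e)             # 54.0
print(sum_sq_dev_x)          # 200.0
print(round(est_var_b_x, 2)) # 0.27
print(round(se_b_x, 4))      # 0.5196
```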

14 Is the estimation procedure for the variance of the coefficient estimate’s probability distribution unbiased?

15 Lab 7.5

Actual Var[e]   Variance of the Coefficient Estimate’s Probability Distribution: Var[b_x]   Mean (Average) of the Estimates for the Variance of the Coefficient Estimate’s Probability Distribution
500             500 / 200 = 2.5                                                              ≈2.5
200             200 / 200 = 1.0                                                              ≈1.0
50              50 / 200 = .25                                                               ≈.25
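A simulation in the spirit of Lab 7.5 can check both columns of this table at once. The sketch assumes normally distributed error terms and the slides' values for the x's, the constant (50), and the coefficient (2); the function name `simulate_coefficient_variance` is hypothetical and the repetition count is much smaller than in the lab.

```python
import random
import statistics

X = [5, 15, 25]
MEAN_X = sum(X) / len(X)
SUM_SQ_DEV_X = sum((xt - MEAN_X) ** 2 for xt in X)   # 200 for these x values

def simulate_coefficient_variance(actual_var, repetitions=20000, seed=0):
    """Return (spread of b_x across repetitions, mean of EstVar[b_x]).

    Assumption: error terms are normal with mean 0 and variance actual_var.
    """
    rng = random.Random(seed)
    sd = actual_var ** 0.5
    b_x_values, est_vars = [], []
    for _ in range(repetitions):
        y = [50 + 2 * xt + rng.gauss(0.0, sd) for xt in X]
        mean_y = sum(y) / len(y)
        b_x = sum((xt - MEAN_X) * (yt - mean_y)
                  for xt, yt in zip(X, y)) / SUM_SQ_DEV_X
        b_const = mean_y - b_x * MEAN_X
        ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(X, y))
        est_var_e = ssr / (len(X) - 2)            # SSR / degrees of freedom
        est_vars.append(est_var_e / SUM_SQ_DEV_X) # EstVar[b_x]
        b_x_values.append(b_x)
    return statistics.pvariance(b_x_values), statistics.mean(est_vars)

for actual_var in (500, 200, 50):
    spread, mean_estimate = simulate_coefficient_variance(actual_var)
    print(actual_var, actual_var / SUM_SQ_DEV_X,
          round(spread, 3), round(mean_estimate, 3))
```

Both the observed spread of the coefficient estimates and the average of the variance estimates should sit close to Var[e] / 200.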

16 Degrees of Freedom

Recall Attempts 2 and 3 to estimate the variance of the error term’s probability distribution.

Error terms: e_t = y_t - (β_Const + β_x x_t)
Residuals: Res_t = y_t - (b_Const + b_x x_t)

Strategy: Think of the residuals as the estimated errors. Use the variance of the residuals (“estimated errors”) to estimate the variance of the error term’s probability distribution.

Attempt 2: We divided by the sample size. Since the residuals are the “estimated errors,” it seems natural to divide the sum of squared residuals by the sample size, 3 in Clint’s case. Since Mean[Res] = 0:

Var[Res_1, Res_2, and Res_3] = SSR / Sample Size = 54 / 3 = 18

But this procedure proved to be biased; it systematically underestimates the actual variance.

Attempt 3: We divided by the degrees of freedom rather than the sample size:

Degrees of Freedom = Sample Size - Number of Estimated Parameters = 3 - 2 = 1
AdjVar[Res_1, Res_2, and Res_3] = SSR / Degrees of Freedom = 54 / 1 = 54

Dividing by the degrees of freedom rather than the sample size solves the bias problem. The modified procedure proved to be unbiased.

Question: Why does dividing by the sample size fail, but dividing by the degrees of freedom succeed? Why does dividing by 1 rather than 3 work?

17 How Do We Calculate an Average?

Monthly Precipitation in Amherst, Massachusetts during the 20th Century

Year   Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
1901   2.09   0.56   5.66   5.80   5.12   0.75   3.77   5.75   3.67   4.17   1.30   8.51
1902   2.13   3.32   5.47   2.92   2.42   4.54   4.66   4.65   5.83   5.59   1.27   4.27
1903   3.28   4.27   6.40   2.30   0.48   7.79   4.64   4.92   1.66   2.72   2.04   3.95
...
2000   3.00   3.40   3.82   4.14   4.26   7.99   6.88   5.40   5.36   2.29   2.83   4.24

Mean (Average) for June = (.75 + 4.54 + 7.79 + … + 7.99) / 100 = 377.76 / 100 = 3.78

Each of the 100 Junes in the twentieth century provides one piece of information in calculating the average. Consequently, to calculate an average we divide the sum by the number of pieces of information.

Key Principle: To calculate an average we divide the sum by the number of pieces of information.

Mean (Average) = Sum / Number of Pieces of Information

Hence, to calculate the average of the squared deviations, the variance, we must divide by the number of pieces of information.

Claim: The degrees of freedom equal the number of pieces of information that are available to estimate the variance of the error term’s probability distribution.

18 Question: Why does subtracting 2 from the sample size make sense?

Suppose that the sample size were 2. With only two observations we only have two points. The best fitting line passes directly through each of the two points. Consequently, the two residuals, the “estimated errors,” must always equal 0 when the sample size is 2, regardless of what the variance of the error term’s probability distribution actually equals:

Res_1 = 0 and Res_2 = 0

Do the first two residuals provide information about the variance of the error term’s probability distribution? No. The first two observations provide no information about the variance.

Which observation provides the first piece of information about the variance of the error term’s probability distribution? The 3rd. The third observation provides the first piece of information about the variance.

Key principle: To calculate the average, divide by the number of pieces of information.

Consequently, when the sample size is 3 we should divide by 1 to calculate the “average” of the squared deviations because we really only have 1 piece of information. In general, we should divide by the degrees of freedom:

Degrees of Freedom = Sample Size - Number of Estimated Parameters
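The two-observation case is easy to verify. The sketch below fits a line to several two-point samples (the first uses the first two quiz observations; the other y values are made up purely for illustration) and shows that the residuals vanish every time; `ols_fit` is an illustrative helper.

```python
def ols_fit(x, y):
    """Return (b_const, b_x) for a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b_x = (sum((xt - mean_x) * (yt - mean_y) for xt, yt in zip(x, y))
           / sum((xt - mean_x) ** 2 for xt in x))
    return mean_y - b_x * mean_x, b_x

x = [5, 15]   # x values of the first two quiz observations

# First pair is from the quiz; the other pairs are made-up illustrations.
samples = [[66, 87], [40, 95], [100, 20]]

all_residuals = []
for y in samples:
    b_const, b_x = ols_fit(x, y)
    residuals = [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]
    all_residuals.append(residuals)
    # With two points the fitted line passes through both of them.
    print(y, [round(r, 9) for r in residuals])
```

However widely the y values are spread, the residuals stay at zero, so a two-observation sample reveals nothing about the variance of the error term.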

19 OLS Estimation Procedure and the Regression Printout

EViews regression printout:

Dependent Variable: y
Explanatory Variable(s)   Estimate    SE          t-Statistic   Prob
x                         1.200000    0.519615    2.309401      0.2601
Const                     63.00000    8.874120    7.099296      0.0891
Number of Observations    3
Sum Squared Residuals     54.00000
SE of Regression          7.348469
Estimated Equation: Esty = 63 + 1.2x

The ordinary least squares (OLS) estimation procedure actually includes three procedures:
1. A procedure to estimate the value of the parameters
2. A procedure to estimate the variance of the error term’s probability distribution
3. A procedure to estimate the variance of the coefficient estimate’s probability distribution

Good News: When the standard ordinary least squares (OLS) premises are satisfied:
Each of the three procedures is unbiased.
The procedure to estimate the value of the parameters is the best linear unbiased estimation procedure.
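The numbers in the printout can be reproduced from the first quiz data with the standard library alone. Two things in this sketch go beyond the slides and should be read as assumptions: the standard error of the constant uses the usual textbook formula for a simple regression, and the Prob column is taken to be a two-tailed probability under a zero-coefficient hypothesis. With 1 degree of freedom the t distribution is the Cauchy distribution, so that probability has a closed form; the helper `two_tailed_prob_1_dof` is hypothetical and valid only for 1 degree of freedom.

```python
import math

x = [5, 15, 25]
y = [66, 87, 90]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sum_sq_dev_x = sum((xt - mean_x) ** 2 for xt in x)

# Procedure 1: estimate the values of the parameters.
b_x = sum((xt - mean_x) * (yt - mean_y) for xt, yt in zip(x, y)) / sum_sq_dev_x
b_const = mean_y - b_x * mean_x

# Procedure 2: estimate the variance of the error term's distribution.
ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(x, y))
est_var_e = ssr / (n - 2)             # divide by the degrees of freedom
se_regression = math.sqrt(est_var_e)

# Procedure 3: estimate the variances of the estimates' distributions.
se_b_x = math.sqrt(est_var_e / sum_sq_dev_x)
# Assumption: standard simple-regression formula for the constant's SE.
se_b_const = math.sqrt(est_var_e * (1 / n + mean_x ** 2 / sum_sq_dev_x))

def two_tailed_prob_1_dof(t):
    """Two-tailed tail probability of a t statistic with 1 degree of freedom.

    Valid only for 1 degree of freedom, where the t distribution is Cauchy.
    """
    return 1 - (2 / math.pi) * math.atan(abs(t))

t_x, t_const = b_x / se_b_x, b_const / se_b_const

print("x    ", round(b_x, 6), round(se_b_x, 6), round(t_x, 6),
      round(two_tailed_prob_1_dof(t_x), 4))
print("Const", round(b_const, 6), round(se_b_const, 6), round(t_const, 6),
      round(two_tailed_prob_1_dof(t_const), 4))
print("SSR", round(ssr, 6), "SE of regression", round(se_regression, 6))
```

For samples with more than 1 degree of freedom the closed form no longer applies, and a statistics library's t distribution would be needed for the Prob column.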

