Lecture 7 Preview: Estimating the Variance of an Estimate’s Probability Distribution


Lecture 7 Preview: Estimating the Variance of an Estimate’s Probability Distribution

- Review: Ordinary Least Squares (OLS) Estimation Procedure
  - Importance of the Coefficient Estimate’s Probability Distribution
  - General Properties of the Ordinary Least Squares (OLS) Estimation Procedure
- Step 1: Estimate the Variance of the Error Term’s Probability Distribution
- Step 2: Use the Estimated Variance of the Error Term’s Probability Distribution to Estimate the Variance of the Coefficient Estimate’s Probability Distribution
- Degrees of Freedom
- Estimating the Variance of the Coefficient Estimate’s Probability Distribution
  - First Attempt: Variance of the Error Term’s Numerical Values
  - Second Attempt: Variance of the Residuals’ Numerical Values
  - Third Attempt: “Adjusted” Variance of the Residuals’ Numerical Values
- Three Important Parts of Regression Printouts
  - Mean (Center) of the Coefficient Estimate’s Probability Distribution
  - Variance (Spread) of the Coefficient Estimate’s Probability Distribution
- Summary: The Ordinary Least Squares (OLS) Estimation Procedure
  - Value of the Coefficient
  - Variance of the Error Term’s Probability Distribution
  - Variance of the Coefficient Estimate’s Probability Distribution

The Problem: But there is a problem here, isn’t there? We need to know the variance of the error term’s probability distribution to calculate the variance of the coefficient estimate’s probability distribution. Unfortunately, the variance of the error term’s probability distribution is unobservable; in reality, we can never know it. How can Clint proceed?

Importance of the Probability Distribution’s Mean (Center) and Variance (Spread):
- Mean: When the mean of the coefficient estimate’s probability distribution, Mean[b_x], equals the actual value of the coefficient, β_x, the estimation procedure is unbiased: it does not systematically underestimate or overestimate the actual coefficient value.
- Variance: When the estimation procedure for the coefficient value is unbiased, the variance of the estimate’s probability distribution, Var[b_x], determines the reliability of the estimate. As Var[b_x] decreases, the probability distribution becomes more tightly cropped around the actual value, making it more likely that the coefficient estimate is close to the actual coefficient value; the reliability of b_x increases.

Mean[b_x] = β_x → Estimation Procedure Is Unbiased
Var[b_x] → Determines the Reliability of the Estimate: As Var[b_x] Decreases, Reliability of b_x Increases

General Properties of the Ordinary Least Squares (OLS) Estimation Procedure: When the standard ordinary least squares premises are met, the following equation describes the center of the coefficient estimate’s probability distribution: Mean[b_x] = β_x. (Figure: probability distributions of coefficient estimates, each centered on β_x, one with a large variance and one with a small variance.)

Clint’s Strategy: Estimating the Variance of the Coefficient Estimate’s Probability Distribution

When Clint was faced with a similar problem before, what did he do? Econometrician’s Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have. What information does Clint have? Information from Professor Lord’s first quiz:

First Quiz
Student    x    y
1          5   66
2         15   87
3         25   90

Strategy: Two Steps
Step 1: Estimate the variance of the error term’s probability distribution, EstVar[e], from the available information – information from the first quiz.
Step 2: Apply the relationship between the variances of the coefficient estimate’s and error term’s probability distributions,
  Var[b_x] = Var[e] / Σ(x_t − x̄)²,
to estimate the variance of the coefficient estimate’s probability distribution:
  EstVar[b_x] = EstVar[e] / Σ(x_t − x̄)²

Step 1: Estimating the Variance of the Error Term’s Probability Distribution

Three attempts to estimate the variance of the error term’s probability distribution:
1. Variance of the error term’s numerical values from the first quiz.
2. Variance of the residuals’ numerical values from the first quiz.
3. “Adjusted” variance of the residuals’ numerical values from the first quiz.

Preview: While the first two attempts fail for different reasons, they provide the motivation for the third attempt, which succeeds.

We shall use simulations to assess these attempts by exploiting the relative frequency interpretation of probability: after many, many repetitions of the experiment, the distribution of the numerical values from the experiments mirrors the random variable’s probability distribution; the two distributions are identical. Applying this to the variance: after many, many repetitions, the variance of the numerical values equals the variance of the probability distribution.

Estimating Var[e], Var[Clint’s 3 Error Terms] – 1st Try

Strategy: Use the variance of the three error terms from Professor Lord’s first quiz to estimate the variance of the error term’s probability distribution. The error term represents random influences, so Mean[e] = 0.

y_t = β_Const + β_x x_t + e_t  →  e_t = y_t − (β_Const + β_x x_t)

With β_Const = 50 and β_x = 2, e_t = y_t − (50 + 2x_t):

First Quiz
Student   x_t   y_t   β_Const + β_x x_t     e_t = y_t − (50 + 2x_t)   e_t²
1          5    66    50 + 2(5)  = 60       66 − 60  = 6              6² = 36
2         15    87    50 + 2(15) = 80       87 − 80  = 7              7² = 49
3         25    90    50 + 2(25) = 100      90 − 100 = −10            (−10)² = 100

SSE = 36 + 49 + 100 = 185

Compute the deviations from the mean (since Mean[e] = 0, the deviations are the error terms themselves), square the deviations, and calculate the average:

Var[e_1, e_2, and e_3 1st Quiz] = SSE/3 = 185/3 ≈ 61.7

Question: As a consequence of random influences, can we expect the variance of the numerical values from one repetition, the first quiz, to equal the actual variance of the error term’s probability distribution? No. What can we hope for then? We can hope that this procedure is unbiased; we can hope that the procedure does not systematically underestimate or overestimate the actual variance.
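The first attempt’s arithmetic can be sketched in Python. The quiz data and the actual parameters (β_Const = 50, β_x = 2) come from the slide; the variable names are my own:

```python
# 1st try: use the ACTUAL parameters to compute the error terms,
# then take the variance of their numerical values.
x = [5, 15, 25]
y = [66, 87, 90]
beta_const, beta_x = 50, 2

# e_t = y_t - (beta_Const + beta_x * x_t)
errors = [yt - (beta_const + beta_x * xt) for xt, yt in zip(x, y)]  # [6, 7, -10]

sse = sum(e ** 2 for e in errors)   # 36 + 49 + 100 = 185
var_errors = sse / len(errors)      # 185 / 3, about 61.7
```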

Lab 7.1 questions:
- Does the error term represent a random influence?
- Does the simulation represent the variance of the error term’s probability distribution accurately?
- Is the estimation procedure for the variance of the error term’s probability distribution unbiased?

Is the estimation procedure for the variance of the error term’s probability distribution unbiased?

Question: What is the best we can hope for? Answer: We can hope that this procedure is unbiased; we can hope that the procedure does not systematically underestimate or overestimate the actual variance.

Question: How can we determine whether or not the estimation procedure for the variance of the error term’s probability distribution is unbiased? Answer: Exploit the relative frequency interpretation of probability: compare the actual variance of the error term’s probability distribution with the mean (average) of the variance estimates after many, many repetitions.

Simulation results (Lab 7.1), after >1,000,000 repetitions:

Actual Var[e]   Mean (Average) of the Estimates for the Variance
                of the Error Term’s Probability Distribution
500             ≈ 500
200             ≈ 200
50              ≈ 50

Observations: Can we expect the estimate to equal the actual value? No. In fact, we can be all but certain that the estimate will not equal the actual value. Sometimes the estimate is less than the actual value and sometimes it is greater. We cannot predict the value of the estimate for the variance of the error term’s probability distribution beforehand, even when we know the actual value of the variance. The estimate is a random variable.
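The Lab 7.1 simulation can be reconstructed as a sketch. The loop below is an assumed reconstruction, not the lab’s actual code: it picks one of the slide’s actual variances (500), draws normally distributed error terms, and checks that the average of the first-try estimates settles near the actual value:

```python
import random

# Assumed reconstruction of the Lab 7.1 simulation: with the actual
# parameters known, the estimate SSE/3 should be unbiased for Var[e].
random.seed(0)
x = [5, 15, 25]
beta_const, beta_x, var_e = 50, 2, 500  # actual values assumed by the simulation
sd_e = var_e ** 0.5

estimates = []
for _ in range(100_000):  # "many, many repetitions"
    e = [random.gauss(0, sd_e) for _ in x]
    y = [beta_const + beta_x * xt + et for xt, et in zip(x, e)]
    # errors computed with the ACTUAL parameters, as in the 1st try
    errors = [yt - (beta_const + beta_x * xt) for xt, yt in zip(x, y)]
    estimates.append(sum(er ** 2 for er in errors) / len(x))

mean_estimate = sum(estimates) / len(estimates)  # close to 500
```

Any single estimate is far from 500 more often than not; only the mean across repetitions homes in on the actual variance, which is exactly the unbiasedness claim.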

Estimating Var[e], Var[Clint’s 3 Error Terms] – 1st Try (continued)

Recall the calculation: using the actual parameters, e_t = y_t − (50 + 2x_t) gives error terms 6, 7, and −10; SSE = 36 + 49 + 100 = 185; and Var[e_1, e_2, and e_3 1st Quiz] = 185/3.

Good news: This procedure is unbiased.
Bad news: It does not help Clint. We used the actual constant and coefficient, β_Const and β_x, to calculate the errors, and Clint does not know the values of β_Const or β_x.

Despite the bad news, keep the good news in mind.

Sum of Squared Errors (SSE) Versus Sum of Squared Residuals (SSR)

Sum of Squared Errors (SSE): based on the values of the error terms.
- y_t = β_Const + β_x x_t + e_t  →  e_t = y_t − (β_Const + β_x x_t)
- Calculating the sum of squared errors requires the actual constant and coefficient, β_Const and β_x. But β_Const and β_x are unobservable; that is the whole problem. Clint cannot calculate the sum of squared errors.

Sum of Squared Residuals (SSR): based on the values of the residuals.
- Res_t = y_t − Esty_t, where Esty_t = b_Const + b_x x_t  →  Res_t = y_t − (b_Const + b_x x_t)
- The OLS procedure provides the estimates of the constant and coefficient, b_Const and b_x, so Clint can calculate the sum of squared residuals.

Strategy: We just showed in our simulations that the sum of squared errors provides an unbiased estimation procedure for the variance of the error term’s probability distribution. Clint cannot calculate the sum of squared errors, however. Perhaps Clint can use the sum of squared residuals instead: we can think of an observation’s residual as an estimate of its error term. Econometrician’s Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.

Estimating Var[e], Var[Clint’s 3 Residuals] – 2nd Try

Clint uses the estimated constant and coefficient to calculate the “estimated” error terms, the residuals: Res_t = y_t − Esty_t = y_t − (b_Const + b_x x_t). With the OLS estimates b_Const = 63 and b_x = 1.2:

First Quiz
Student   x_t   y_t   Res_t = y_t − (63 + 1.2x_t)   Res_t²
1          5    66    66 − 69 = −3                  (−3)² = 9
2         15    87    87 − 81 = 6                   6² = 36
3         25    90    90 − 93 = −3                  (−3)² = 9

SSR = 9 + 36 + 9 = 54

Mean[Res] = Mean[Res_1, Res_2, and Res_3 1st Quiz] = (Res_1 + Res_2 + Res_3)/3 = (−3 + 6 − 3)/3 = 0. In fact, we can prove that the mean of the residuals must equal 0. Hence:

Var[Res_1, Res_2, and Res_3 1st Quiz] = SSR/3 = 54/3 = 18

Good news: Clint has the information to perform these calculations.
Question: Is the procedure unbiased? No (Lab 7.2). Bad news: This procedure is biased; it systematically underestimates the variance.
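The second attempt can be sketched end to end: fit the line by OLS, compute the residuals, and divide the sum of squared residuals by the sample size. The data are from the slide; the OLS slope/intercept formulas are the standard bivariate ones:

```python
x = [5, 15, 25]
y = [66, 87, 90]
n = len(x)
x_bar = sum(x) / n  # 15
y_bar = sum(y) / n  # 81

# OLS estimates: b_x = sum((x_t - x_bar)(y_t - y_bar)) / sum((x_t - x_bar)^2)
b_x = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) \
      / sum((xt - x_bar) ** 2 for xt in x)   # 240 / 200 = 1.2
b_const = y_bar - b_x * x_bar                # 81 - 18 = 63

residuals = [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]  # [-3, 6, -3]
ssr = sum(r ** 2 for r in residuals)       # 9 + 36 + 9 = 54
var_residuals = ssr / n                    # 54 / 3 = 18 (biased downward)
mean_residual = sum(residuals) / n         # always 0 for OLS with a constant
```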

Why Is Our Second Attempt Biased?

How do SSE and SSR differ? β_Const and β_x versus b_Const and b_x:

Error:    e_t = y_t − (β_Const + β_x x_t)
Residual: Res_t = y_t − (b_Const + b_x x_t)

SSE = [y_1 − (β_Const + β_x x_1)]² + [y_2 − (β_Const + β_x x_2)]² + [y_3 − (β_Const + β_x x_3)]²
SSR = [y_1 − (b_Const + b_x x_1)]² + [y_2 − (b_Const + b_x x_2)]² + [y_3 − (b_Const + b_x x_3)]²

Question: How were b_Const and b_x chosen? Answer: To minimize the sum of squared residuals. Since we can be all but certain that b_Const ≠ β_Const and b_x ≠ β_x, the sum using the b’s is less than the sum using the β’s:

SSR < SSE  →  Var[Res_1, Res_2, and Res_3 1st Quiz] = SSR/3 < SSE/3 = Var[e_1, e_2, and e_3 1st Quiz]

When the actual constant and coefficient are used, the procedure is unbiased: the estimation procedure based on the SSE is unbiased. Consequently, the estimation procedure based on the SSR is biased downward; it systematically underestimates the variance (Lab 7.3).
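The inequality SSR < SSE can be checked directly on the quiz data: evaluate the same sum of squared deviations once at the actual parameters and once at the OLS estimates, which were chosen precisely to minimize it:

```python
x = [5, 15, 25]
y = [66, 87, 90]

def sum_sq(const, slope):
    """Sum of squared deviations of y from the line const + slope * x."""
    return sum((yt - (const + slope * xt)) ** 2 for xt, yt in zip(x, y))

sse = sum_sq(50, 2)    # actual parameters beta_Const = 50, beta_x = 2: 185
ssr = sum_sq(63, 1.2)  # OLS estimates, chosen to MINIMIZE this sum: 54

# ssr < sse: no parameter pair can beat the OLS estimates on this criterion
```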

Estimating Var[e], AdjVar[Clint’s 3 Residuals] – 3rd Try

From before (first quiz: x = 5, 15, 25; y = 66, 87, 90):

Residuals: 66 − 69 = −3, 87 − 81 = 6, 90 − 93 = −3
Squared residuals: (−3)² = 9, 6² = 36, (−3)² = 9
SSR = 9 + 36 + 9 = 54

Rather than dividing by the sample size, divide by the degrees of freedom:

Number of Degrees of Freedom = Sample Size − Estimated Parameters = 3 − 2 = 1
AdjVar[Res_1, Res_2, and Res_3 1st Quiz] = SSR / Degrees of Freedom = 54/1 = 54

Good news: Clint can perform this calculation.
Question: Is the procedure unbiased? Yes (Lab 7.4). Good news: The procedure is unbiased.

NB: We shall postpone our discussion of degrees of freedom for a few minutes.
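The third attempt changes only the divisor; a minimal sketch of the adjustment:

```python
ssr = 54                   # sum of squared residuals from the first quiz
sample_size = 3
estimated_parameters = 2   # b_const and b_x

degrees_of_freedom = sample_size - estimated_parameters  # 3 - 2 = 1
adj_var = ssr / degrees_of_freedom                       # 54 / 1 = 54
```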

Clint’s Strategy To Estimate the Variance of the Coefficient Estimate’s Probability Distribution

Step 1: Estimate the variance of the error term’s probability distribution from the available information – information from the first quiz:

EstVar[e] = SSR / Degrees of Freedom = 54/1 = 54

What can we say about the estimation procedure for the variance of the error term’s probability distribution? It is unbiased.

Step 2: Apply the relationship between the variances of the coefficient estimate’s and error term’s probability distributions,

Var[b_x] = Var[e] / Σ(x_t − x̄)²

to estimate the variance of the coefficient estimate’s probability distribution. With x_1 = 5, x_2 = 15, x_3 = 25, and x̄ = 15:

Σ(x_t − x̄)² = (−10)² + 0² + 10² = 100 + 0 + 100 = 200

EstVar[b_x] = EstVar[e] / Σ(x_t − x̄)² = 54/200 = .27

The square root of the estimated variance is called the standard error: SE[b_x] = √.27 = .5196.

What can we hope to be able to say about the estimation procedure for the variance of the coefficient estimate’s probability distribution? We can hope that this procedure is unbiased also; that is, we can hope that the procedure does not systematically underestimate or overestimate the actual variance of the coefficient estimate’s probability distribution.
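The two-step calculation above reduces to a few lines:

```python
x = [5, 15, 25]
x_bar = sum(x) / len(x)                          # 15
sum_sq_dev = sum((xt - x_bar) ** 2 for xt in x)  # 100 + 0 + 100 = 200

est_var_e = 54 / 1                  # Step 1: SSR / degrees of freedom
est_var_bx = est_var_e / sum_sq_dev # Step 2: 54 / 200 = 0.27
standard_error = est_var_bx ** 0.5  # about 0.5196
```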

Is the estimation procedure for the variance of the coefficient estimate’s probability distribution unbiased?

Simulation results (Lab 7.5), with Σ(x_t − x̄)² = 200:

Actual    Actual Variance of the Coefficient      Mean (Average) of the Estimates for the
Var[e]    Estimate’s Probability Distribution:    Variance of the Coefficient Estimate’s
          Var[b_x] = Var[e]/200                   Probability Distribution
500       500/200 = 2.5                           ≈ 2.5
200       200/200 = 1.0                           ≈ 1.0
50        50/200 = .25                            ≈ .25

After many, many repetitions, the mean of the estimates equals the actual variance of the coefficient estimate’s probability distribution: the estimation procedure is unbiased.

Degrees of Freedom

Recall Attempts 2 and 3 to estimate the variance of the error term’s probability distribution. Think of the residuals as the “estimated errors”:

Error terms:  e_t = y_t − (β_Const + β_x x_t)
Residuals:    Res_t = y_t − (b_Const + b_x x_t)

Strategy: Use the variance of the residuals (“estimated errors”) to estimate the variance of the error term’s probability distribution.

Attempt 2: We divided by the sample size. Since the residuals are the “estimated errors,” it seems natural to divide the sum of squared residuals by the sample size, 3 in Clint’s case; since Mean[Res] = 0:

Var[Res_1, Res_2, and Res_3] = SSR/3

But this procedure proved to be biased; it systematically underestimates the actual variance.

Attempt 3: We divided by the degrees of freedom rather than the sample size:

Degrees of Freedom = Sample Size − Number of Estimated Parameters = 3 − 2 = 1
AdjVar[Res_1, Res_2, and Res_3] = SSR/1

Dividing by the degrees of freedom rather than the sample size solves the bias problem. The modified procedure proved to be unbiased.

Question: Why does dividing by the sample size fail, but dividing by the degrees of freedom succeed? Why does dividing by 1 rather than 3 work?
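The contrast between Attempts 2 and 3 can be checked by simulation. The loop below is an assumed reconstruction in the spirit of Labs 7.2 and 7.4, not the labs’ actual code (the actual variance of 50 is chosen for illustration): each repetition fits OLS, then divides the SSR once by the sample size and once by the degrees of freedom:

```python
import random

# Assumed reconstruction: compare SSR/3 (divide by sample size, biased)
# with SSR/1 (divide by degrees of freedom, unbiased).
random.seed(1)
x = [5, 15, 25]
x_bar = sum(x) / 3
sxx = sum((xt - x_bar) ** 2 for xt in x)  # 200
beta_const, beta_x, var_e = 50, 2, 50      # actual values assumed for the run

naive, adjusted = [], []
for _ in range(100_000):
    y = [beta_const + beta_x * xt + random.gauss(0, var_e ** 0.5) for xt in x]
    y_bar = sum(y) / 3
    b_x = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / sxx
    b_const = y_bar - b_x * x_bar
    ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(x, y))
    naive.append(ssr / 3)     # Attempt 2: divide by the sample size
    adjusted.append(ssr / 1)  # Attempt 3: divide by degrees of freedom, 3 - 2 = 1

mean_naive = sum(naive) / len(naive)          # well below 50: biased downward
mean_adjusted = sum(adjusted) / len(adjusted) # close to 50: unbiased
```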

How Do We Calculate an Average?

(Table: monthly precipitation in Amherst, Massachusetts during the 20th century, one row per year, columns January through December.)

Mean (Average) for June = (sum of the 100 June precipitation values) / 100 = 3.78

Each of the 100 Junes in the twentieth century provides one piece of information for calculating the average. Key Principle: To calculate an average, we divide the sum by the number of pieces of information:

Mean (Average) = Sum / Number of Pieces of Information

Hence, to calculate the average of the squared deviations, the variance, we must divide by the number of pieces of information. Claim: The degrees of freedom equal the number of pieces of information that are available to estimate the variance of the error term’s probability distribution.

Question: Why does subtracting 2 from the sample size make sense? Suppose that the sample size were 2. With only two observations we have only two points, and the best fitting line passes directly through each of them. Consequently, the two residuals, the “estimated errors,” must always equal 0 when the sample size is 2, regardless of what the variance of the error term’s probability distribution actually equals:

Res_1 = 0 and Res_2 = 0

Do the first two residuals provide information about the variance of the error term’s probability distribution? No; the first two observations provide no information about the variance. Which observation provides the first piece of information about the variance? The 3rd: the third observation provides the first piece of information about the variance. Consequently, when the sample size is 3 we should divide by 1 to calculate the “average” of the squared deviations, because we really have only 1 piece of information. In general, we should divide by the degrees of freedom:

Degrees of Freedom = Sample Size − Number of Estimated Parameters

Key principle: To calculate the average, divide by the number of pieces of information.
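The claim that both residuals vanish with two observations can be checked numerically; the two points below are hypothetically taken as the first two quiz observations:

```python
# With only two observations, the best fitting line passes directly through
# both points, so both residuals are zero regardless of Var[e].
x = [5, 15]
y = [66, 87]  # hypothetically, the first two quiz observations

b_x = (y[1] - y[0]) / (x[1] - x[0])  # slope of the line through the two points
b_const = y[0] - b_x * x[0]
residuals = [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]  # [0.0, 0.0]
```

Whatever two y values are drawn, the fitted line absorbs them exactly; that is why those observations carry no information about the error term’s variance.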

OLS Estimation Procedure and the Regression Printout

The ordinary least squares (OLS) estimation procedure actually includes three procedures:
- A procedure to estimate the value of the parameters
- A procedure to estimate the variance of the error term’s probability distribution
- A procedure to estimate the variance of the coefficient estimate’s probability distribution

Good news: When the standard ordinary least squares (OLS) premises are satisfied, each of the three procedures is unbiased, and the procedure to estimate the value of the parameters is the best linear unbiased estimation procedure.

Regression printout (EViews):

Dependent Variable: y
Explanatory Variable(s):   Estimate   SE        t-Statistic   Prob
x                          1.2        .5196     2.3094        …
Const                      63         …         …             …
Number of Observations     3
Sum Squared Residuals      54
SE of Regression           7.3485

Estimated Equation: Esty = 63 + 1.2x
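The printout’s entries can be reproduced from the quiz data with the formulas developed in this lecture. This is a sketch using the standard bivariate OLS formulas, not the EViews implementation; the t-statistic is simply the estimate divided by its standard error:

```python
x = [5, 15, 25]
y = [66, 87, 90]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xt - x_bar) ** 2 for xt in x)  # 200

# Parameter estimates
b_x = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / sxx  # 1.2
b_const = y_bar - b_x * x_bar                                          # 63

# Variance estimates
ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(x, y))  # 54
df = n - 2                                  # degrees of freedom: 1
se_regression = (ssr / df) ** 0.5           # sqrt(54), about 7.3485
se_bx = (ssr / df / sxx) ** 0.5             # sqrt(0.27), about 0.5196
t_stat_bx = b_x / se_bx                     # about 2.3094
```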