Published by Tamsin Ryan. Modified over 6 years ago.
1
Lecture 8 Preview: Interval Estimates and Hypothesis Testing
Clint’s Assignment: Taking Stock
General Properties of the Ordinary Least Squares (OLS) Estimation Procedure
Estimate Reliability: Interval Estimate Question
  Normal Distribution versus the Student t-Distribution: One Last Complication
  Assessing the Reliability of a Coefficient Estimate: Applying the Student t-Distribution
Theory Assessment: Hypothesis Testing
Summary: The Ordinary Least Squares (OLS) Estimation Procedure
  Regression Model and the Role of the Error Term
  Standard Ordinary Least Squares (OLS) Premises
  Ordinary Least Squares (OLS) Estimation Procedure: Three Important Parts
    Value of the coefficient itself
    Variance of the error term’s probability distribution
    Variance of the coefficient estimate’s probability distribution
  Properties of the Ordinary Least Squares (OLS) Estimation Procedure When the Standard Ordinary Least Squares (OLS) Premises Are Met:
    Each estimation procedure is unbiased.
    The estimation procedure for the coefficient value is the best linear unbiased estimation procedure (BLUE).
Causation versus Correlation
2
Clint’s Assignment: Taking Stock
Theory: Additional studying increases quiz scores.

The Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
  $y_t$ = Quiz score
  $x_t$ = Minutes studied
  $e_t$ = Error term
  $\beta_{Const}$ = Points given for showing up
  $\beta_x$ = Points earned for each minute studied

Clint wishes to find the values of $\beta_{Const}$ and $\beta_x$. But $\beta_{Const}$ and $\beta_x$ are not observable, so Clint can never determine their actual values. How can he proceed?

[First Quiz data table: Student, x, y]

Ordinary Least Squares (OLS) Estimates: $Esty = 63 + 1.2x$
  $b_{Const}$ = 63 = Estimated points given for showing up
  $b_x$ = 1.2 = Estimated points for each minute studied

Clint’s Assignment
  Coefficient Reliability: How reliable is the coefficient estimate, 1.2, calculated from the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?
  Theory Confidence: How much confidence should Clint have in the theory that additional studying increases quiz scores?
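A minimal sketch of how the OLS estimates on this slide can be computed. The first-quiz data values are not preserved in this transcript, so the three (minutes studied, quiz score) pairs below are assumed values, chosen because they reproduce the reported estimates of 63 and 1.2. The function name `ols_estimates` is illustrative.

```python
def ols_estimates(x, y):
    """Return (b_const, b_x) for the simple regression y = b_const + b_x * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross products and sum of squared deviations of x from its mean
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b_x = sxy / sxx
    b_const = y_bar - b_x * x_bar
    return b_const, b_x


# ASSUMED data: hypothetical values consistent with the reported estimates
minutes_studied = [5, 15, 25]
quiz_scores = [66, 87, 90]

b_const, b_x = ols_estimates(minutes_studied, quiz_scores)
print(f"Esty = {b_const:.1f} + {b_x:.1f}x")
```

With these assumed values the fitted line agrees with the slide's estimated equation; different data would of course give different estimates.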
3
General Properties of the Ordinary Least Squares (OLS) Estimation Procedure
When the standard ordinary least squares (OLS) premises are satisfied, the following equations describe the coefficient estimate’s probability distribution:
  $\text{Mean}[b_x] = \beta_x$
  $\text{Var}[b_x] = \dfrac{\text{Var}[e]}{\sum_{t} (x_t - \bar{x})^2}$

Importance of the Probability Distribution’s Mean (Center) and Variance (Spread)

Mean: When the mean of the coefficient estimate’s probability distribution, Mean[$b_x$], equals the actual value of the coefficient, $\beta_x$, the estimation procedure is unbiased.
  Unbiased does not mean that the estimate will equal the actual value. In fact, we can be all but certain that the estimate will not equal the actual value.
  Unbiased does mean that the estimation procedure does not systematically underestimate or overestimate the actual coefficient value. Formally, the mean of the estimate’s probability distribution equals the actual value. Furthermore, if the estimate’s probability distribution is symmetric, the chance that the estimate is too high equals the chance that it is too low.

Variance: When the estimation procedure for the coefficient value is unbiased, the variance of the estimate’s probability distribution, Var[$b_x$], determines the reliability of the estimate. As the variance decreases, the estimate is more likely to be close to the actual coefficient value.

The Problem: Var[$b_x$] is unobservable, because it depends on the variance of the error term’s probability distribution, Var[$e$], which is itself unobservable.

Solution:
  Estimate the variance of the error term’s probability distribution.
  Use the estimate of the variance of the error term’s probability distribution to estimate the variance of the coefficient estimate’s probability distribution.
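The two equations for the mean and variance of the coefficient estimate can be checked with a small simulation. This is only a sketch under stated assumptions: the "actual" values (a constant of 50, a coefficient of 2, an error variance of 54), the x values, and the normally distributed error term are all hypothetical choices made for illustration; in practice the actual values are unobservable.

```python
import random
import statistics


def simulate_b_x(beta_const, beta_x, error_var, x, repetitions, seed=0):
    """Repeat the 'experiment' many times and collect the OLS estimate b_x.

    Returns the list of estimates and the theoretical Var[b_x] = Var[e] / sum((x - x_bar)^2).
    """
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    error_sd = error_var ** 0.5
    estimates = []
    for _ in range(repetitions):
        # Generate new quiz scores from the assumed model with fresh error terms
        y = [beta_const + beta_x * xi + rng.gauss(0.0, error_sd) for xi in x]
        y_bar = sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        estimates.append(sxy / sxx)
    return estimates, error_var / sxx


# ASSUMED (hypothetical) actual values and x's
estimates, theoretical_var = simulate_b_x(50.0, 2.0, 54.0, [5, 15, 25], 20000)
mean_b_x = statistics.fmean(estimates)
var_b_x = statistics.pvariance(estimates)
print(f"mean of estimates     = {mean_b_x:.3f} (actual coefficient = 2.0)")
print(f"variance of estimates = {var_b_x:.3f} (formula gives {theoretical_var:.3f})")
```

Individual estimates vary widely from repetition to repetition, yet their average sits close to the assumed actual coefficient, which is what "unbiased" means here.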
4
Coefficient Reliability: How reliable is the coefficient estimate, 1.2, calculated from the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be close to the actual value?

Interval Estimate Question: What is the probability that the coefficient estimate, 1.2, lies within _____ of the actual coefficient value? _____

One Last Complication
To use the normal distribution we must know the actual value of the variance (and the standard deviation) of the random variable’s probability distribution. We do not know the actual value of the variance for the coefficient estimate’s probability distribution; we must estimate it. Consequently, we cannot use the normal distribution when dealing with the coefficient estimate. Instead we must use another distribution, the Student t-distribution.
5
The Normal Distribution Versus the Student t-Distribution
Normal distribution: The standard deviation is known. z equals the number of standard deviations the value lies from the mean.
Student t-distribution: The standard deviation is not known and must be estimated. t equals the number of estimated standard deviations the value lies from the mean.

When we estimate the standard deviation, we introduce an additional element of uncertainty. Hence, the Student t-distribution is more “spread out” than the normal distribution.

The Student t-distribution’s “spread” depends on the degrees of freedom: as the degrees of freedom increase, we have more information and the distribution’s “spread” decreases.
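The claim about "spread" can be illustrated numerically. The sketch below compares the probability of lying within 2 standard deviations of the mean (normal distribution) with the probability of lying within 2 estimated standard deviations (Student t-distribution) for a few degrees of freedom. The function names, the cutoff of 2, and the use of Simpson's rule for the integration are choices made for this illustration.

```python
import math


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def prob_within_t(k, df, steps=2000):
    """P(-k <= t <= k), integrating the density with Simpson's rule (steps must be even)."""
    h = 2 * k / steps
    total = t_pdf(-k, df) + t_pdf(k, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-k + i * h, df)
    return total * h / 3


def prob_within_normal(k):
    """P(-k <= z <= k) for the standard normal distribution."""
    return math.erf(k / math.sqrt(2))


for df in (1, 5, 30):
    print(f"t-distribution, DF = {df:2d}: {prob_within_t(2.0, df):.3f}")
print(f"normal distribution:     {prob_within_normal(2.0):.3f}")
```

The t probabilities are smaller than the normal probability (more of the distribution lies in the tails) and they rise toward it as the degrees of freedom increase.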
6
The Normal Distribution’s z and the Student t-Distribution’s t
When the standard deviation is known, we use the normal distribution:
  $z = \dfrac{\text{Value of Random Variable} - \text{Mean of Random Variable}}{\text{Standard Deviation of Random Variable}}$ = Number of Standard Deviations from the Mean
To calculate probabilities using the normal distribution we need only the value of z.

When the standard deviation must be estimated, we must use the Student t-distribution:
  $t = \dfrac{\text{Value of Random Variable} - \text{Mean of Random Variable}}{\text{Estimated Standard Deviation of Random Variable}}$ = Number of Estimated Standard Deviations (Standard Errors) from the Mean
To calculate probabilities using the Student t-distribution we need both the value of t and the degrees of freedom.

Coefficient Reliability: How reliable is the coefficient estimate, 1.2, calculated from the first quiz? That is, how confident should Clint be that the coefficient estimate, 1.2, will be “close to” the actual value?

Interval Estimate Question: What is the probability that the coefficient estimate, 1.20, lies within _____ of the actual coefficient value? _____

First Blank: We begin by filling in the first blank, choosing our “close to” value. Suppose that we choose 1.50 (Close To Criterion = 1.50), so we write 1.50 in the first blank.
7
Interval Estimate Question: What is the probability that the coefficient estimate, 1.20, lies within 1.50 of the actual coefficient value? .78

Second Blank: Calculate the probability that the estimate lies within 1.50 of the actual value.

Convert 1.50 into standard errors:
  $t = \dfrac{1.50}{.5196} = 2.89$ = Number of standard errors from the mean

Question: Why does the actual value equal the distribution mean?
Answer: The ordinary least squares (OLS) estimation procedure is unbiased.

Dependent Variable: y
Explanatory Variable(s)   Estimate   SE
x                         1.2        .5196
Const                     63
Number of Observations: 3

Degrees of Freedom = Sample Size − Number of Estimated Parameters = 3 − 2 = 1

[Figure: Student t-distribution centered at the actual value, $\beta_x$. The interval extends 1.50 (2.89 SE’s) on each side of the actual value. Left tail, below t = −2.89: probability .11 (Lab 8.1a). Right tail, above t = +2.89: probability .11 (Lab 8.1b).]

Probability that the estimate lies within 1.50 of the actual value
  = Probability that the estimate lies within 2.89 SE’s of the actual value
  = Probability between t’s of −2.89 and +2.89
  = 1.00 − (.11 + .11) = 1.00 − .22 = .78
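The interval estimate calculation can be reproduced without the lab software in this particular case, because a Student t-distribution with 1 degree of freedom coincides with the Cauchy distribution, whose cumulative distribution function has the closed form 1/2 + atan(t)/π. The sketch below relies on that fact, so it applies only when DF = 1; the function name `prob_within` is illustrative.

```python
import math


def prob_within(close_to, standard_error):
    """Return (t, one-tail probability, probability within) for DF = 1 only."""
    # Convert the "close to" value into standard errors
    t = close_to / standard_error
    # DF = 1: the t-distribution is the Cauchy distribution, CDF(t) = 1/2 + atan(t)/pi
    tail = 0.5 - math.atan(t) / math.pi
    return t, tail, 1.0 - 2.0 * tail


t, tail, prob = prob_within(1.50, 0.5196)
print(f"t = {t:.2f}, each tail = {tail:.2f}, probability within = {prob:.3f}")
```

Each tail rounds to .11, as on the slide. The unrounded probability is slightly above the slide's .78, which is obtained by rounding each tail to .11 before subtracting.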
8
Theory: Additional studying increases quiz scores.
Clint’s Assignment: Theory Confidence. How much confidence should Clint have in the theory that additional studying increases quiz scores?

Theory: Additional studying increases quiz scores.

Step 0: Construct a model reflecting the theory to be tested
  $y_t = \beta_{Const} + \beta_x x_t + e_t$
  $y_t$ = Actual quiz score
  $x_t$ = Minutes studied
  $e_t$ = Error term
  $\beta_{Const}$ reflects points given for showing up
  $\beta_x$ reflects points earned for each minute studied
The theory suggests that $\beta_x$ should be positive. Theory: $\beta_x > 0$.

[First Quiz data table: Student, x, y]

Step 1: Collect data, run the regression, and interpret the estimates

Dependent Variable: y
Explanatory Variable(s)   Estimate   SE      t-Statistic   Prob
x                         1.2        .5196   2.309         0.2601
Const                     63                               0.0891
Number of Observations: 3

  $b_{Const}$ = Estimate of $\beta_{Const}$ = 63
  $b_x$ = Estimate of $\beta_x$ = 1.2
  The estimated equation: $Esty = 63 + 1.2x$

Interpretation: The regression suggests that students receive
  63 points for showing up
  1.2 additional points for each additional minute studied

Critical Result: The coefficient estimate equals 1.2. The positive sign of the coefficient estimate suggests that additional studying increases quiz scores. This evidence supports our theory.
9
Step 2: Play the cynic and challenge the results; construct the null and alternative hypotheses:
Cynic’s view: Sure, the coefficient estimate was positive, but this result was just “the luck of the draw.” In fact, studying has no impact on quiz scores; the actual coefficient, $\beta_x$, equals 0.

  $H_0: \beta_x = 0$   Cynic is correct: Studying has no impact on a student’s quiz score
  $H_1: \beta_x > 0$   Cynic is incorrect: Additional studying increases quiz scores

Question: Can we dismiss the cynic’s view as being impossible? No. (Lab 8.2)

Step 3: Formulate the question to assess the cynic’s view, to assess the null hypothesis.
  Generic Question: What is the probability that the results would be like those we actually obtained (or even stronger), if the cynic is correct and studying actually has no impact?
  Specific Question: The regression’s coefficient estimate was 1.2. What is the probability that the coefficient estimate, $b_x$, in one regression would be 1.2 or more, if H0 were true (if the actual coefficient, $\beta_x$, equaled 0)?
  Answer: Prob[Results IF Cynic Correct] or, equivalently, Prob[Results IF H0 True]

  Prob[Results IF H0 True] small: Unlikely that H0 is true. Reject H0.
  Prob[Results IF H0 True] large: Likely that H0 is true. Do not reject H0.
10
$H_0: \beta_x = 0$   Cynic is correct: Studying has no impact on quiz score
$H_1: \beta_x > 0$   Cynic is incorrect: As studying increases, the quiz score increases

Step 4: Use the estimation procedure’s general properties to calculate Prob[Results IF H0 True].

Estimate was 1.2: What is the probability that the coefficient estimate in one regression would be 1.2 or more, if H0 were true (if the actual coefficient, $\beta_x$, equaled 0)?

Question: What do we know about the probability distribution of the coefficient estimate, $b_x$?
  OLS estimation procedure unbiased, and if H0 were true: Mean[$b_x$] = $\beta_x$ = 0
  Standard error: SE[$b_x$] = .5196
  Number of observations and number of parameters: DF = 3 − 2 = 1

Use the Econometrics Lab (Lab 8.3).

[Figure: Student t-distribution of $b_x$ with Mean = 0, SE = .5196, DF = 1. The estimate, 1.2, lies at t = 2.309; the right tail beyond 1.2 has probability of about .13.]

Prob[Results IF H0 True] ≈ .13

Dependent Variable: y
Explanatory Variable(s)   Estimate   SE      t-Statistic   Prob
x                         1.2        .5196   2.309         0.2601
Const                     63                               0.0891
Number of Observations: 3
11
Using Statistical Software to Calculate Prob[Results IF H0 True]
  OLS estimator is unbiased, and assume the cynic is correct: Mean[$b_x$] = $\beta_x$ = 0
  Standard error: SE[$b_x$] = .5196
  Number of observations and number of parameters: DF = 3 − 2 = 1

t-Statistic Column: How many standard errors does the coefficient estimate, 1.2, lie from 0?
  $\text{t-Statistic} = \dfrac{b_x - 0}{SE[b_x]} = \dfrac{1.2}{.5196} = 2.309$
  The estimate, 1.2, lies about 2.3 standard errors from 0.

Prob Column (Tails Probability): What is the probability that the coefficient estimate, $b_x$, resulting from one regression would lie at least 1.2 from 0, if the actual coefficient, $\beta_x$, equaled 0?

[Figure: Student t-distribution of $b_x$ with Mean = 0, SE = .5196, DF = 1. The left tail below −1.2 and the right tail above 1.2 (t = 2.309) each have probability .2601/2.]

Tails Probability = .2601 ≈ .26

NB: The Prob column is based on the premise that the actual coefficient, $\beta_x$, equals 0.

Dependent Variable: y
Explanatory Variable(s)   Estimate   SE      t-Statistic   Prob
x                         1.2        .5196   2.309         0.2601
Const                     63                               0.0891
Number of Observations: 3
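The t-Statistic and Prob columns for x can be reproduced by hand in this example. The sketch below assumes DF = 1, where the Student t-distribution coincides with the Cauchy distribution and the tail area has the closed form 1/2 − atan(t)/π; for other degrees of freedom a statistics package would be needed. The function names are illustrative and do not come from any particular software.

```python
import math


def t_statistic(estimate, standard_error, null_value=0.0):
    """Number of standard errors the estimate lies from the hypothesized value."""
    return (estimate - null_value) / standard_error


def tails_probability_df1(t):
    """Two-tailed probability of lying at least |t| standard errors from the mean (DF = 1 only)."""
    one_tail = 0.5 - math.atan(abs(t)) / math.pi
    return 2.0 * one_tail


t_stat = t_statistic(1.2, 0.5196)
tails_prob = tails_probability_df1(t_stat)
print(f"t-Statistic = {t_stat:.3f}")
print(f"Tails Probability = {tails_prob:.4f}")
```

Both values agree with the regression printout for x to the precision shown there.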
12
We can use the Prob column to calculate Prob[Results IF H0 True]
[Figure: Student t-distribution of $b_x$ with Mean = 0, SE = .5196, DF = 1. The left tail below −1.2 and the right tail above 1.2 each have probability .2601/2. Tails Probability = .2601.]

Question to Assess Cynic’s View: What is the probability of obtaining a result like the one calculated from the first quiz data (a coefficient estimate, $b_x$, of 1.2 or more), if studying actually has no impact on quiz scores (if the actual coefficient, $\beta_x$, were 0)?

Only the right tail is relevant to this question, so
  Prob[Results IF H0 True] = .2601/2 ≈ .13

Dependent Variable: y
Explanatory Variable(s)   Estimate   SE      t-Statistic   Prob
x                         1.2        .5196   2.309         0.2601
Const                     63                               0.0891
Number of Observations: 3
13
Prob[Results IF H0 True] ≈ .13
$H_0: \beta_x = 0$   Cynic is correct: Studying has no impact on a student’s quiz score
$H_1: \beta_x > 0$   Cynic is incorrect: As studying increases, quiz score increases

Prob[Results IF H0 True] ≈ .13

Step 5: Decide on the standard of proof, a significance level
The significance level is the dividing line between the probability being small and the probability being large.

  Prob[Results IF H0 True] less than the significance level: Prob[Results IF H0 True] is small. Unlikely that H0 is true. Reject H0.
  Prob[Results IF H0 True] greater than the significance level: Prob[Results IF H0 True] is large. Likely that H0 is true. Do not reject H0.

Would we reject H0 at a 1 percent (.01) significance level? No.
Would we reject H0 at a 5 percent (.05) significance level? No.
Would we reject H0 at a 10 percent (.10) significance level? No.

At the “traditional” significance levels, we could not reject the null hypothesis; we cannot reject the notion that studying has no impact on quiz scores.
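The decision rule in Step 5 amounts to a single comparison. The sketch below starts from the Tails Probability reported in the regression printout (.2601), halves it because the alternative hypothesis is one-sided, and compares the result with each traditional significance level. The function name `decide` is illustrative.

```python
def decide(prob_results_if_h0_true, significance_level):
    """Apply the decision rule: reject H0 only when the probability is below the significance level."""
    if prob_results_if_h0_true < significance_level:
        return "Reject H0"
    return "Do not reject H0"


tails_probability = 0.2601               # Prob column for x in the regression printout
prob_if_h0_true = tails_probability / 2  # one-tailed test: only the right tail matters

decisions = {level: decide(prob_if_h0_true, level) for level in (0.01, 0.05, 0.10)}
for level, decision in decisions.items():
    print(f"Significance level {level:.2f}: {decision}")
```

Because the probability of about .13 exceeds all three significance levels, the rule returns "Do not reject H0" each time, matching the three answers on the slide.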
14
Summary: The Ordinary Least Squares (OLS) Estimation Procedure and Standard Ordinary Least Squares (OLS) Premises

Regression Model: $y_t = \beta_{Const} + \beta_x x_t + e_t$
  $\beta_{Const}$ and $\beta_x$ are the parameters
  $y_t$ = Dependent variable
  $x_t$ = Explanatory variable
  $e_t$ = Error term

Role of the Error Term
The error term is a random variable representing random influences: Mean[$e_t$] = 0

Standard Ordinary Least Squares (OLS) Premises
  Error Term Equal Variance Premise: The variance of the error term’s probability distribution for each observation is the same.
  Error Term/Error Term Independence Premise: The error terms are independent.
  Explanatory Variable/Error Term Independence Premise: The explanatory variables, the $x_t$’s, and the error terms, the $e_t$’s, are not correlated.

OLS Estimation Procedure Includes Three Estimation Procedures
  Value of the parameters, $\beta_{Const}$ and $\beta_x$:
    $b_x = \dfrac{\sum_{t} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t} (x_t - \bar{x})^2}$,   $b_{Const} = \bar{y} - b_x \bar{x}$
  Variance of the error term’s probability distribution, Var[$e$]:
    $\text{EstVar}[e] = \dfrac{SSR}{\text{Degrees of Freedom}}$
  Variance of the coefficient estimate’s probability distribution, Var[$b_x$]:
    $\text{EstVar}[b_x] = \dfrac{\text{EstVar}[e]}{\sum_{t} (x_t - \bar{x})^2}$

Good News: When the standard OLS regression premises are met, each of these procedures is unbiased.
Good News: When the standard OLS regression premises are met, the OLS estimation procedure for the coefficient value is BLUE.
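The three estimation procedures can be chained together in a few lines. This is a sketch rather than the lecture's own software: the first-quiz data values are not preserved in this transcript, so the data below are assumed values chosen to be consistent with the reported estimates, and the function and key names are illustrative.

```python
import math


def ols_three_procedures(x, y):
    """Estimate the coefficients, Var[e], and Var[b_x] for y = b_const + b_x * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # Procedure 1: values of the parameters
    b_x = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    b_const = y_bar - b_x * x_bar

    # Procedure 2: variance of the error term's distribution, SSR / degrees of freedom
    residuals = [yi - (b_const + b_x * xi) for xi, yi in zip(x, y)]
    ssr = sum(r * r for r in residuals)
    degrees_of_freedom = n - 2  # sample size minus the two estimated parameters
    est_var_e = ssr / degrees_of_freedom

    # Procedure 3: variance of the coefficient estimate's distribution
    est_var_b_x = est_var_e / sxx

    return {
        "b_x": b_x,
        "b_const": b_const,
        "est_var_e": est_var_e,
        "est_var_b_x": est_var_b_x,
        "se_b_x": math.sqrt(est_var_b_x),
    }


# ASSUMED data: hypothetical values consistent with the reported estimates
results = ols_three_procedures([5, 15, 25], [66, 87, 90])
for name, value in results.items():
    print(f"{name} = {value:.4f}")
```

With these assumed values the standard error of the coefficient estimate comes out at about .5196, the figure used throughout the hypothesis test.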
15
Causation versus Correlation
Theory: Additional studying increases quiz scores.

Step 0: Construct a model reflecting the theory to be tested
  $y_t = \beta_{Const} + \beta_x x_t + e_t$
  $y_t$ = Actual quiz score
  $x_t$ = Minutes studied
  $e_t$ = Error term
  $\beta_{Const}$ reflects points given for showing up
  $\beta_x$ reflects points earned for each minute studied
The theory suggests that $\beta_x$ should be positive. Theory: $\beta_x > 0$.

Cause and Effect: Our model is a causal model. An increase in studying ($x_t$) causes the quiz score ($y_t$) to increase.

Question: Does causation imply correlation? Yes.
  If our theory is correct, does knowing the number of minutes a student studies help us predict his/her quiz score? Yes.
  If our theory is correct, does knowing a student’s quiz score help us predict the number of minutes he/she has studied? Yes.
16
Question: Does correlation imply causation?
No. Consider the Twin Cities, Minneapolis and St. Paul.

Is rainfall in Minneapolis and St. Paul correlated? Yes.
  Does knowing whether or not it rains in Minneapolis help us predict whether or not it will rain in St. Paul? Yes.
  Does knowing whether or not it rains in St. Paul help us predict whether or not it will rain in Minneapolis? Yes.

Is there a causal relationship between rainfall in Minneapolis and St. Paul? No.
  Does rain in Minneapolis cause rain in St. Paul? No.
  Does rain in St. Paul cause rain in Minneapolis? No.

Summary: Causation versus Correlation
  Causation does imply correlation.
  Correlation need not imply causation.
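The Twin Cities example can be mimicked with a toy simulation. The sketch rests on an assumption that the slide does not state: rain in both cities is driven by a shared regional weather system. Given that system, each city's rain is generated independently, so neither city's rain causes the other's. All probabilities below are made up for illustration.

```python
import random


def simulate_rain(days, seed=0):
    """Simulate daily rain indicators (1 = rain) for two neighboring cities."""
    rng = random.Random(seed)
    minneapolis, st_paul = [], []
    for _ in range(days):
        # ASSUMED common cause: a regional weather system present on 30% of days
        storm_system = rng.random() < 0.3
        rain_chance = 0.9 if storm_system else 0.05
        # Each city's rain depends only on the weather system, never on the other city
        minneapolis.append(1 if rng.random() < rain_chance else 0)
        st_paul.append(1 if rng.random() < rain_chance else 0)
    return minneapolis, st_paul


def correlation(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


minneapolis, st_paul = simulate_rain(10000)
rain_correlation = correlation(minneapolis, st_paul)
print(f"Correlation of rainfall in the two cities: {rain_correlation:.2f}")
```

The simulated correlation is strongly positive, so knowing one city's weather helps predict the other's, even though the code contains no causal link from one city to the other.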