Omitted Variable Bias. Evaluation Research (8521), Prof. Jesse Lecy

OMITTED VARIABLES

Understanding bias in regression

By the end of Stats II you should be able to interpret regression coefficients in a linear model. In that setting you are looking ONLY at direct effects. As a result, you think about the world like this:

[Figure: path diagram of MCAT, GPA, SAT, IQ, and income]

The true causal model is much more complicated:

[Figure: path diagram of MCAT, GPA, SAT, IQ, and income]

This has important implications for how we build a program evaluation and interpret the data. We aim to identify an unbiased estimate of the effect of a single policy variable.

[Figure: path diagrams for Case 1 and Case 2, each linking class size (CS), test scores (TS), teacher quality (TQ), and socio-economic status (SES)]

A note on how I use terms in this section:

“Full Model”: the “truth”. The slopes are correct because all of the relevant variables are included, so we denote them with Greek letters.

“Naïve Model”: we are missing variables, so we do NOT know whether the slopes are correct. They represent our best guess and may contain bias. We denote them with Latin characters.

You are used to thinking in terms of population statistics and sample statistics. In a regression, you can have the entire population in your sample, but if you are missing variables then your slopes will still be wrong. To map the concepts: when I say “full model”, think population statistic (the truth); when I say “naïve model”, think sample statistic (the best guess).

Class size and academic performance

Full model (the “truth”): test scores regressed on class size, socio-economic status, and teacher quality. Class size is the policy variable, meaning it is the input into a policy process and the one we care about getting right. It is statistically significant. Is it practically significant? Recall that multi-billion dollar policy decisions are being based upon this estimate.

How do omitted variables affect regression results?

Assume that Model 5 is the “full model”: it contains all of the relevant information in the world. Now we can see what happens when we omit important variables from the model. How does the slope of class size change? Why does the significance level change?

[Table: regression models comparing the slope of class size (the “policy variable”) with SES omitted, with TQ omitted, with SES and TQ omitted, and in the full model]

How do omitted variables affect regression results?

[Table: regression models with SES omitted, with TQ omitted, with SES and TQ omitted, and the full model]

Bias is the difference between the “truth” (Model 5 in this case) and what we would get if we ran a naïve regression (Model 1 here). Note that the bias can be quite large: we overestimate the impact of our program by 51%!
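To make the definition of bias concrete, here is a minimal simulation sketch. It does not reproduce the regression table from the slide: the data-generating process, every coefficient, and the helper `ols` are hypothetical choices made only for illustration. It fits a full model and a naïve model and reports the bias as the gap between the two class size slopes.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(8521)
n = 20_000

# Hypothetical "truth" (all numbers are made up for illustration):
# higher-SES districts tend to have smaller classes.
ses = rng.normal(size=n)
class_size = 25 - 3 * ses + rng.normal(scale=2, size=n)
test_score = 700 - 2.0 * class_size + 10.0 * ses + rng.normal(scale=20, size=n)

full = ols(test_score, class_size, ses)   # full model: SES included
naive = ols(test_score, class_size)       # naive model: SES omitted

bias = naive[1] - full[1]                 # naive slope minus "true" slope
pct_bias = 100 * bias / full[1]           # bias as a share of the true slope

print(f"full-model slope on class size:  {full[1]:.2f}")
print(f"naive-model slope on class size: {naive[1]:.2f}")
print(f"bias: {bias:.2f} ({pct_bias:.0f}% of the full-model slope)")
```

In this made-up setup the naïve slope is more negative than the full-model slope, so the naïve regression overstates the benefit of smaller classes, because class size is picking up part of the effect of SES.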

Some examples

Class size versus SES
Institutions and geography (Sachs vs. Rodrik)
Is it drugs or environment that hurts developing babies?

WHY DOES THIS HAPPEN?

Omitted Variable Bias

How do omitted variables affect regression results?

The slope does not change significantly when we add a control variable that is uncorrelated with the other regressors. As a result, omitting this variable would NOT bias the results. Adding the variable does, however, increase the precision of the estimates (the standard error decreases by a factor of seven).

[Figure: diagram of test scores (Test), teacher quality (TQ), and class size (CS)]
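The precision point can be checked with a small simulation. This is a sketch under assumed numbers, not the data behind the slide: teacher quality is generated independently of class size, and the helper `ols_with_se` (a hypothetical name) computes classical OLS standard errors by hand.

```python
import numpy as np

def ols_with_se(y, *regressors):
    """Return (coefficients, classical standard errors) for an OLS fit."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

rng = np.random.default_rng(12)
n = 5_000

# Hypothetical data: teacher quality is drawn independently of class size.
class_size = rng.normal(loc=25, scale=3, size=n)
teacher_quality = rng.normal(size=n)
test_score = (700 - 2.0 * class_size + 30.0 * teacher_quality
              + rng.normal(scale=5, size=n))

coef_without, se_without = ols_with_se(test_score, class_size)
coef_with, se_with = ols_with_se(test_score, class_size, teacher_quality)

print(f"slope without TQ: {coef_without[1]:.2f} (SE {se_without[1]:.3f})")
print(f"slope with TQ:    {coef_with[1]:.2f} (SE {se_with[1]:.3f})")
```

The two slopes should land close to each other, while the standard error shrinks once teacher quality is included, because the control soaks up residual variance in test scores without sharing any variance with class size.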

The correlation of the independent variables affects omitted variable bias

Note from the correlation matrix that teacher quality has very low correlation with class size and SES. As a result, there is almost no omitted variable bias when this variable is left out of the model. SES and class size are highly correlated, though, so omitting one of these variables has a large impact on the slope estimate for the other. Why is this?
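One way to see the role of correlation is to hold the true slopes fixed and vary only how strongly the omitted variable is correlated with the policy variable. The sketch below uses made-up values throughout (`beta1`, `beta2`, the grid of correlations, and the function name `naive_slope` are all assumptions).

```python
import numpy as np

rng = np.random.default_rng(13)
n = 20_000
beta1, beta2 = 1.0, 1.0   # hypothetical true slopes on x1 and x2

def naive_slope(rho):
    """Slope of y on x1 alone when corr(x1, x2) is approximately rho."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # Bivariate OLS slope: cov(x1, y) / var(x1)
    return np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

# Bias of the naive slope for increasingly correlated omitted variables
biases = {rho: naive_slope(rho) - beta1 for rho in (0.0, 0.3, 0.6, 0.9)}

for rho, bias in biases.items():
    print(f"corr(x1, x2) = {rho:.1f}  ->  bias in naive slope = {bias:+.3f}")
```

With an uncorrelated omitted variable the bias should hover near zero, and it should grow as the correlation rises, which mirrors the contrast between teacher quality and SES described above.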

Calculation of bias: Case 1

Calculation of bias: Case 2 ???
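As a hedged sketch of the calculation, the standard textbook result for a full model with two regressors is written out below. The Greek and Latin slopes follow the full-model and naïve-model convention used in this deck; the auxiliary-regression symbols α0, α1, and u are assumptions introduced here and may not match the notation on the original slides.

```latex
\begin{align*}
\text{Full model:}\quad & Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon \\
\text{Naive model:}\quad & Y = b_0 + b_1 X_1 + e \\
\text{Auxiliary regression:}\quad & X_2 = \alpha_0 + \alpha_1 X_1 + u \\
\text{Substituting:}\quad & Y = (\beta_0 + \beta_2 \alpha_0) + (\beta_1 + \beta_2 \alpha_1) X_1 + (\varepsilon + \beta_2 u) \\
\text{Therefore:}\quad & E[b_1] = \beta_1 + \beta_2 \alpha_1,
  \qquad \text{bias} = E[b_1] - \beta_1 = \beta_2 \alpha_1,
  \qquad \alpha_1 = \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)}
\end{align*}
```

In Case 1 the omitted variable is correlated with the policy variable, so α1 is nonzero and the naïve slope is biased whenever β2 is nonzero. In Case 2 the two are uncorrelated, so α1 = 0 and the bias term vanishes.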

Case 1: Omitted variable correlated with regressors

In this case, the omitted variable X2 is correlated with the policy variable X1. There is shared covariance, represented by the region B. This is the region that is discarded as part of the regression procedure. The naïve slope, b1, and the full-model slope, β1, will now be different because of the exclusion of the region B. The naïve model will be biased as a result of omitting X2.

[Figure: Venn diagram of Y, X1, and X2 with regions A, B, and C]

Case 2: Omitted variable uncorrelated with regressors

In this case, the omitted variable X2 is uncorrelated with the policy variable X1. There is no overlap in the Venn diagram. Since the naïve slope, b1, and the full-model slope, β1, are the same, there is no bias that results from omitting X2.

[Figure: Venn diagram of Y, X1, and X2 with regions A and C]
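The two cases can be compared numerically. The sketch below relies on a known property of OLS: within any sample, the naïve slope equals the full-model slope on X1 plus the full-model slope on X2 times the slope from regressing X2 on X1. The data, coefficients, and helper names (`ols`, `decompose`) are hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def decompose(y, x1, x2):
    """Return (naive slope, full-model slope on x1, gap attributed to omitting x2)."""
    full = ols(y, x1, x2)
    naive = ols(y, x1)
    delta = ols(x2, x1)[1]          # slope of x2 regressed on x1
    return naive[1], full[1], full[2] * delta

rng = np.random.default_rng(17)
n = 1_000
x1 = rng.normal(size=n)
cases = {
    "Case 1 (X2 correlated with X1)": 0.8 * x1 + rng.normal(scale=0.6, size=n),
    "Case 2 (X2 uncorrelated with X1)": rng.normal(size=n),
}

results = {}
for label, x2 in cases.items():
    # Hypothetical full model: y = 1 + 2*x1 + 3*x2 + noise
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
    results[label] = decompose(y, x1, x2)
    naive_b1, full_b1, gap = results[label]
    print(f"{label}: naive = {naive_b1:.3f}, full = {full_b1:.3f}, gap = {gap:.3f}")
```

In Case 1 the gap should be large, which corresponds to the overlap region B in the Venn diagram. In Case 2 the gap should be close to zero, although not exactly zero, because two independent variables still show a small chance correlation in any finite sample.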

Path Diagram Case 1: Omitted Variable Correlated with Regressors

[Figure: path diagram of Y, X1, and X2, shown with the Venn diagram of Y and X1 with regions A, B, and C]

Path Diagram Case 2: Omitted Variable Uncorrelated with Regressors

[Figure: path diagram of Y, X1, and X2, shown with the Venn diagram of Y and X1 with regions A and C]

EXAMPLE: OVB

Class Example

True Model: What happens when we omit X2?

Calculations Omit MAT

BACK TO THE SIMULATIONS

Omitted Variable Bias
