Omitted Variable Bias, Evaluation Research (8521), Prof. Jesse Lecy

OMITTED VARIABLES

Understanding bias in regression

By the end of Stats II you should be able to interpret regression coefficients in a linear model. You are looking ONLY at direct effects. As a result, you think about the world like this: [simple path diagram with MCAT, GPA, SAT, IQ, and income as separate direct effects]. The true causal model is much more complicated: [path diagram with the same variables, now interconnected]. There are important implications for how we build a program evaluation and interpret the data. We aim to identify an unbiased estimate of the effect of a single policy variable.

[Path diagrams for Case 1 and Case 2, relating class size (CS), teacher quality (TQ), SES, and test scores (TS)]

A note on how I use terms in this section:

"Full Model", i.e. the "truth": the slopes will be correct because all of the relevant variables are included, so we use Greek letters.

"Naive Model": we are missing variables, so we do NOT know whether the slopes are correct. They represent our best guess and may contain bias. We use Latin letters to denote this.

You are used to thinking in terms of population statistics versus sample statistics. In regression, you can have the entire population in your sample, but if variables are missing from your regression then your slopes will still be wrong. To map the concepts: when I say "full model", think population statistic (the truth); when I say "naive model", think sample statistic (the best guess).

Class size and academic performance

Full model (the "truth"): test scores regressed on class size, socio-economic status (SES), and teacher quality. Class size is the policy variable, meaning it is the input into the policy process and the one we care about getting right. It is statistically significant. Is it practically significant? Recall that multi-billion dollar policy decisions are being based on this estimate.

How do omitted variables affect regression results?

Assume that Model 5 is the "full model": it contains all of the relevant information in the world. Now we can see what happens if we omit important variables from the model. How does the slope of class size (the policy variable) change? Why does the significance level change?

[Regression table: Full Model alongside models with SES omitted, TQ omitted, and both SES & TQ omitted]

How do omitted variables affect regression results?

[Regression table: Full Model alongside models with SES omitted, TQ omitted, and both SES & TQ omitted]

Bias is the difference between the "truth" (Model 5 in this case) and what we would get if we ran a naive regression (Model 1 here). Note that the bias can be quite large: here we overestimate the impact of our program by 51%!
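The regression tables themselves are not reproduced in this transcript. As a purely illustrative sketch (simulated data and made-up coefficients, not the course dataset; only the variable names CS, SES, TQ, and TS follow the slides), the following Python snippet generates class size correlated with SES, fits the full and naive models, and computes the resulting bias in the class-size slope:

# Illustrative sketch only: simulated data, not the course dataset.
# Test scores (TS) depend on class size (CS), SES, and teacher quality (TQ).
# CS is correlated with SES, so omitting SES biases the naive CS slope.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

ses = rng.normal(0, 1, n)
tq  = rng.normal(0, 1, n)                    # roughly uncorrelated with CS and SES
cs  = 25 - 2 * ses + rng.normal(0, 4, n)     # smaller classes in higher-SES schools
ts  = 700 - 2.0 * cs + 10.0 * ses + 5.0 * tq + rng.normal(0, 10, n)

def ols(y, *xs):
    # OLS coefficients via least squares: intercept first, then one slope per regressor
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full  = ols(ts, cs, ses, tq)     # "full model": intercept, CS, SES, TQ
b_naive = ols(ts, cs)              # "naive model": intercept, CS only

bias = b_naive[1] - b_full[1]
print("full-model CS slope: ", round(b_full[1], 2))
print("naive-model CS slope:", round(b_naive[1], 2))
print("bias as % of true slope:", round(100 * bias / b_full[1], 1))
print("regressor correlations:\n", np.corrcoef([cs, ses, tq]).round(2))

With these invented numbers the naive slope comes out roughly 50% larger in magnitude than the true slope, the same flavor of overestimate the slide describes. The printed correlation matrix also previews the next point: only omitted variables that are correlated with class size create bias in its slope.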

Some examples:
- Class size versus SES
- Institutions and geography (Sachs vs. Rodrik)
- Is it drugs or the environment that harms developing babies?

WHY DOES THIS HAPPEN?

Omitted Variable Bias

How do omitted variables affect regression results?

The slope does not change significantly as a result of adding an uncorrelated control variable, so omitting this variable would NOT bias the results. Adding the variable does, however, increase the precision of the estimates (the standard error decreases by a factor of seven).

[Regression table: test scores on class size (CS) with and without teacher quality (TQ)]
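A hedged sketch of the precision point (again simulated data with invented coefficients, not the course output): adding a control that is uncorrelated with class size leaves the class-size slope essentially unchanged but shrinks its standard error, because the control soaks up residual variance. The exact factor depends on how much variance the control explains.

# Illustrative sketch: an uncorrelated control (TQ) leaves the CS slope
# essentially unchanged but reduces its standard error.
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

cs = rng.normal(25, 4, n)                       # class size
tq = rng.normal(0, 1, n)                        # teacher quality, uncorrelated with CS
ts = 700 - 2.0 * cs + 30.0 * tq + rng.normal(0, 5, n)

def ols_with_se(y, *xs):
    # OLS coefficients plus conventional standard errors
    X = np.column_stack([np.ones(len(y)), *xs])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return b, se

b0, se0 = ols_with_se(ts, cs)          # without the control
b1, se1 = ols_with_se(ts, cs, tq)      # with the control

print("CS slope / SE without TQ:", round(b0[1], 3), "/", round(se0[1], 4))
print("CS slope / SE with TQ:   ", round(b1[1], 3), "/", round(se1[1], 4))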

The correlation of the independent variables affects omitted variable bias

Note from the correlation matrix [shown on the slide] that teacher quality has a very low correlation with class size and SES. As a result, there is almost no omitted variable bias when teacher quality is left out of the model. SES and class size are highly correlated, though, so omitting one of these variables has a large impact on the slope estimate for the other. Why is this?

Calculation of bias: Case 1

Calculation of bias: Case 1 (continued)
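The algebra on these slides is not captured in the transcript. For reference, the standard omitted-variable-bias decomposition (the textbook result, not necessarily the slide's exact notation) is:

Full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
Naive model: $y = b_0 + b_1 x_1 + e$
Auxiliary regression: $x_2 = \delta_0 + \tilde{\delta}_1 x_1 + v$

Then $E[b_1] = \beta_1 + \beta_2 \tilde{\delta}_1$, so the bias in the naive slope is $\beta_2 \tilde{\delta}_1$. It is nonzero only when the omitted variable both matters for $y$ ($\beta_2 \neq 0$) and is correlated with the included regressor ($\tilde{\delta}_1 \neq 0$).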

Calculation of bias: Case 2

???
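The slide's own derivation is not in the transcript, but applying the same decomposition: when $x_2$ is uncorrelated with $x_1$, the auxiliary slope $\tilde{\delta}_1 = 0$, so $E[b_1] = \beta_1$ and the naive slope is unbiased. Omitting the variable costs precision, not correctness.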

Case 1: Omitted variable correlated with regressors

In this case, the omitted variable X2 is correlated with the policy variable X1. There is shared covariance, represented by region B. This is the region that is discarded as part of the regression procedure. The naive slope b1 and the full-model slope β1 will now differ because of the exclusion of region B. The naive model will be biased as a result of omitting X2.

[Venn diagram: Y, X1, X2, with overlapping regions A, B, C]

Case 2: Omitted variable uncorrelated with regressors

In this case, the omitted variable X2 is uncorrelated with the policy variable X1, so there is no overlap in the Venn diagram. Since the naive slope b1 and the full-model slope β1 are the same, no bias results from omitting X2.

[Venn diagram: Y, X1, X2, with regions A and C and no overlap between X1 and X2]

Path diagram, Case 1: Omitted variable correlated with regressors

[Path diagram and Venn diagram for Case 1: Y, X1, X2, regions A, B, C]

Path diagram, Case 2: Omitted variable uncorrelated with regressors

[Path diagram and Venn diagram for Case 2: Y, X1, X2, regions A and C]

EXAMPLE: OVB

Class Example

True Model: What happens when we omit X2?

Calculations: Omit MAT
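The worked numbers from the class example are not in this transcript. As a stand-in, here is a small simulated check (hypothetical variable names x1 and x2, invented coefficients) that the naive slope equals the true slope plus β2 times the auxiliary slope:

# Numerical check of the bias formula on simulated data (not the class example):
#   naive slope on x1  ≈  beta1 + beta2 * (slope of x2 regressed on x1)
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

x1 = rng.normal(0, 1, n)
x2 = 0.6 * x1 + rng.normal(0, 1, n)          # x2 correlated with x1
y  = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, n)

def slope(y, x):
    # simple-regression slope of y on x (with an intercept)
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

naive_b1  = slope(y, x1)                     # short regression of y on x1 only
delta1    = slope(x2, x1)                    # auxiliary regression of x2 on x1
predicted = 2.0 + 3.0 * delta1               # beta1 + beta2 * delta1

print("naive slope:        ", round(naive_b1, 3))
print("beta1 + beta2*delta:", round(predicted, 3))   # the two should match closely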

BACK TO THE SIMULATIONS

Omitted Variable Bias