QM222 Class 18 Omitted Variable Bias

QM222 Class 18 Omitted Variable Bias QM222 Fall 2017 Section A1

To-dos
Assignment 5, which involves doing multiple regression, is due on Monday.
Friday in lab: Assignment 5 help.
Test: 6pm Oct 31 (location TBD).
Still don’t have your Stata data set? Let me help you.
This afternoon: shorter meetings (<=15 minutes).
Tomorrow: I have lots of time, 3:30-6. (Also 11:30-1:30 for shorter meetings, <=15 minutes.)
If you are still painstakingly collecting data, let me try to speed it up. I do realize that you won’t have Assignment 5 on time; better late and good. (This does not apply to those who already have their data set.)

Today we will…
Introduce the idea of omitted variable bias
Understand why it occurs
Understand how to measure its sign
Explain it with a graph

What is omitted/missing variable bias?
Omitted (or missing) variable bias is the bias in a coefficient that arises when it picks up the impact of a confounding factor not included in the regression.
e.g. If we run the regression
Drownings = b0 + b1 Icecream
b1 will pick up the impact of temperature, because temperature is omitted from the regression (yet correlated with both Drownings and Icecream).
Similarly, if we run the regression
Condo Price = b0 + b1 BeaconSt
b1 will pick up the impact of condo size, because size is omitted from the regression (yet correlated with both condo price and BeaconSt).
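The ice cream example can be illustrated with a small simulation. This is only a sketch on made-up data: the sample size, the coefficients 2.0 and 0.5, and the helper `ols` are assumptions chosen for illustration, not anything from the course data. In the simulated world, ice cream has no effect on drownings at all; temperature drives both.

```python
import numpy as np

def ols(y, *columns):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: temperature is the confounder.
temperature = rng.normal(70, 10, n)
icecream = 2.0 * temperature + rng.normal(0, 5, n)    # driven by temperature
drownings = 0.5 * temperature + rng.normal(0, 5, n)   # driven by temperature only

short = ols(drownings, icecream)               # temperature omitted
full = ols(drownings, icecream, temperature)   # temperature included

print("b1 with temperature omitted: ", round(short[1], 3))
print("b1 with temperature included:", round(full[1], 3))
```

With temperature omitted, the Icecream coefficient comes out clearly positive even though ice cream plays no role in how the data were generated. Once temperature is included, the coefficient falls to roughly zero.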

Omitted variable = Possibly confounding factor
Why learn about this again? What is new?
So far, we have explained the bias due to omitted factors using intuition. If your intuition about this isn’t so good, today’s material will help you:
Measure omitted variable bias.
Predict the sign of the bias due to an omitted variable (which is especially important if you can’t measure the confounding factor).
Learn about the correlation between X’s from the omitted variable bias.
In your projects, this will help you figure out why coefficients change in multiple regressions when you add variables.
Finally, it will be on the test.

Multiple regression helps us to measure the individual impacts of different factors on our dependent variable Y, holding the other factors constant, so isolating each factor’s effect.

Basketball
In a previous semester, a student, Jonathan Wong, wanted to know how injuries affected later basketball performance.
He measured basketball performance by Win Share 48 (WS48), a basketball statistic that measures how much a player contributes to winning on average during a 48-minute game. WS48 “takes into account the various things a basketball player does to win or lose a game.”
He measured injury (the variable INJURED) by whether a basketball player played only part of the previous season (then stopped, presumably because of injury).

First regression: simple, 1 explanatory variable

. regress WS48 INJURED

      Source |       SS           df       MS      Number of obs   =     1,051
-------------+----------------------------------   F(1, 1049)      =     40.15
       Model |  .121434921         1  .121434921   Prob > F        =    0.0000
    Residual |   3.1730123     1,049  .003024797   R-squared       =    0.0369
-------------+----------------------------------   Adj R-squared   =    0.0359
       Total |  3.29444722     1,050  .003137569   Root MSE        =      .055

------------------------------------------------------------------------------
        WS48 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     INJURED |   -.032542   .0051359    -6.34   0.000    -.0426198   -.0224641
       _cons |   .1203435   .0018132    66.37   0.000     .1167855    .1239015
------------------------------------------------------------------------------

Being injured decreases WS48 by .0325. To put this in perspective, average WS48 is .116, so this is a decrease of about 28%.
However, he thought of an obvious confounding factor, Age.
Probably, older basketball players perform worse (lower WS48).
Probably, older basketball players are more likely to get injured.

Regressions without and with Age
Regression 1: WS48 = .1203 - .0325 INJURED   adjRsq=.0359
                     (66.37)  (-6.34)
Regression 2: WS48 = .1991 - .0274 INJURED - .00279 Age   adjRsq=.0826
                     (18.38)  (-5.41)          (-7.37)
(t-stats in parentheses)
Adding Age changed the coefficient on INJURED, making it a smaller negative.

Omitted variable bias
In a simple regression of Y on X1, the coefficient b1 measures the combined effects of:
the direct (often called “causal”) effect of the included variable X1 on Y
PLUS
an “omitted variable bias” due to factors that were left out (omitted) from the regression.
Often we want to measure the direct, causal effect. In this case, the coefficient in the simple regression is biased.

Regressions without and with Age
Regression 1: WS48 = .1203 - .0325 INJURED   adjRsq=.0359
                     (66.37)  (-6.34)
Regression 2: WS48 = .1991 - .0274 INJURED - .00279 Age   adjRsq=.0826
                     (18.38)  (-5.41)          (-7.37)
(t-stats in parentheses)
Putting Age in the regression (Regression 2) added .0051 to the INJURED coefficient, i.e. made it a smaller negative: -.0274 - (-.0325) = +.0051.
The omitted variable bias in Regression 1 was -.0051.
More generally, omitted variable bias occurs when:
1. The omitted variable (Age) has an effect on the dependent variable (WS48), AND
2. The omitted variable (Age) is correlated with the explanatory variable of interest (INJURED).

Another example: How does getting more education affect salaries?
Let’s say you run this regression:
Income = 20,000 + 4000 Education (in years)
But the coefficient 4000 may pick up the fact that more intelligent people have both more education and higher income.
If you could add the variable IQ to the regression, the coefficient on Education would hold IQ constant, taking out the omitted variable bias.
Omitted variable bias occurs because:
1. The omitted variable (IQ) has an effect on the dependent variable (Income), AND
2. The omitted variable (IQ) is correlated with the explanatory variable of interest (Education).

We are going to learn several methods so that you can understand omitted variable bias. Today: using graphs.
Really, both being injured and age affect WS48, as in the multiple regression
Y = b0 + b1X1 + b2X2
This is drawn below. Let’s call this the Full model.
Let’s call b1 and b2 the direct effects.

The mis-specified or Limited model
However, in the simple (1 X variable) regression, we measure only a (combined) effect of INJURED on WS48. Call its coefficient c1:
Y = c0 + c1X1
Let’s call c1 the combined effect, because it combines the direct effect of X1 and the bias.

The reason that there is an omitted variable bias in the simple regression of Y on X1 is that there is a Background Relationship between the X’s
We intuited that there is a relationship between X1 (INJURED) and X2 (Age). We call this the Background Relationship:

. correlate WS48 INJURED Age
(obs=1,051)

             |     WS48  INJURED      Age
-------------+---------------------------
        WS48 |   1.0000
     INJURED |  -0.1920   1.0000
         Age |  -0.2425   0.1388   1.0000

This background relationship, shown in the graph as a1, is positive.
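As a cross-check, the three correlations above, together with the INJURED coefficient from the simple regression, are enough to recover the INJURED coefficient in the regression that includes Age. The sketch below uses the standard two-regressor formula for standardized coefficients; the variable names are illustrative, and the inputs are the rounded values shown in the Stata output, so the result is approximate.

```python
# Values copied from the correlate output and the simple regression.
r_y1 = -0.1920   # corr(WS48, INJURED)
r_y2 = -0.2425   # corr(WS48, Age)
r_12 = 0.1388    # corr(INJURED, Age): the background relationship
c1 = -0.032542   # INJURED coefficient in the simple regression

# Standardized coefficient on INJURED once Age is held constant.
beta1_std = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)

# In a simple regression the standardized slope equals r_y1, so the
# ratio beta1_std / r_y1 rescales c1 into the multiple-regression b1.
b1 = c1 * beta1_std / r_y1

print(round(b1, 4))
```

The result agrees, to rounding, with the -.0274 reported for INJURED in Regression 2. If the correlation between INJURED and Age were zero, the ratio would be exactly 1 and adding Age would leave the INJURED coefficient unchanged.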

So if we want the direct effect only
We should include both X1 and X2 in a multiple regression, so we get the coefficient b1, the direct effect of X1.

But in the limited model, without an X2 in the regression,
The combined effect c1 includes both:
X1’s direct effect b1, and
the indirect effect (blue arrow) working through X2, i.e.
when X1 changes, X2 also tends to change (a1);
this change in X2 has another effect on Y (b2).
The indirect effect (blue arrow) is the omitted variable bias, and its sign is the sign of a1 times the sign of b2.
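The decomposition of c1 into a direct and an indirect effect can be checked numerically. The sketch below uses simulated data (the coefficients 0.6, 2.0 and -1.5 and the helper `ols` are made up for illustration) and assumes that a1 is measured as the slope from regressing X2 on X1. That is one common way to quantify the background relationship; the slides themselves only rely on its sign, which is the same as the sign of the correlation.

```python
import numpy as np

def ols(y, *columns):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 1_000

# Hypothetical data with a positive background relationship between X1 and X2.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

_, b1, b2 = ols(y, x1, x2)   # full model: direct effects
_, c1 = ols(y, x1)           # limited model: combined effect
_, a1 = ols(x2, x1)          # background relationship (slope of X2 on X1)

bias = a1 * b2               # indirect effect = omitted variable bias

print("c1         =", round(c1, 4))
print("b1 + a1*b2 =", round(b1 + bias, 4))
print("sign of bias:", np.sign(bias), " sign(a1)*sign(b2):", np.sign(a1) * np.sign(b2))
```

With OLS the two printed coefficients match up to floating-point error, not just approximately: the combined effect equals the direct effect plus a1 times b2 in any sample. Here a1 is positive and b2 is negative, so the bias is negative.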

In the basketball case
WS48 = .1203 - .0325 INJURED (limited model)
WS48 = .1991 - .0274 INJURED - .00279 Age (full model)
The effect of INJURED on WS48 has two channels.
The first channel is the direct effect b1 (-.0274).
The second channel is the indirect effect working through X2 (Age):
When X1 (INJURED) changes, X2 (Age) also tends to change (a1) (correlation +.1388).
This change in X2 has its own effect on Y (b2) (-.00279).
The indirect effect (blue arrow) is the omitted variable bias, and its sign is the sign of a1 times the sign of b2: pos*neg=neg.
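The size of the bias in the basketball case follows directly from the two INJURED coefficients. The sketch below also backs out the a1 that those coefficients imply, assuming a1 is the slope from a regression of Age on INJURED. That regression is not shown in the slides, so the implied value is a back-of-the-envelope figure based on rounded coefficients.

```python
c1 = -0.0325    # INJURED coefficient, limited model (combined effect)
b1 = -0.0274    # INJURED coefficient, full model (direct effect)
b2 = -0.00279   # Age coefficient, full model

bias = c1 - b1            # omitted variable bias in the limited model
implied_a1 = bias / b2    # slope of Age on INJURED implied by c1 = b1 + a1*b2

print("bias       =", round(bias, 4))
print("implied a1 =", round(implied_a1, 2))
```

The bias is -.0051, and the implied a1 is about 1.8, which would mean injured players are on average roughly 1.8 years older. Its positive sign agrees with the positive correlation (+.1388) between INJURED and Age.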

In-Class exercise
(t-stats in parentheses)
Regression 1: Score = 61.809 - 5.68 Pay_Program   adjR2=.0175
                      (93.5)   (-3.19)
Regression 2: Score = 10.80 + 3.73 Pay_Program + 0.826 OldScore   adjR2=.6687
                      (6.52)  (3.46)              (31.68)
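One possible way to work through the exercise, using the same reasoning as in the basketball case: treat Regression 1 as the limited model and Regression 2 as the full model, with OldScore as the omitted variable. The slides do not give a solution, so this is only a sketch, and the implied a1 again assumes it is the slope of OldScore on Pay_Program.

```python
c1 = -5.68    # Pay_Program coefficient, Regression 1 (OldScore omitted)
b1 = 3.73     # Pay_Program coefficient, Regression 2 (direct effect)
b2 = 0.826    # OldScore coefficient, Regression 2

bias = c1 - b1            # omitted variable bias in Regression 1
implied_a1 = bias / b2    # implied slope of OldScore on Pay_Program

print("bias       =", round(bias, 2))
print("implied a1 =", round(implied_a1, 2))
```

The bias is negative and b2 is positive, so a1 must be negative: in this data set, being in the pay program goes together with lower old scores. That background relationship is strong enough to flip the sign of the Pay_Program coefficient when OldScore is left out.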

Today we…
Introduced the idea of omitted variable bias
Understood why it occurs
Understood how to measure its sign
Explained it with a graph

We will continue discussing omitted variable bias on Monday
The algebra
More examples
Then we will start experiments