QM222 Class 14 Section D1 Different slopes for the same variable (Chapter 14) Review: Omitted variable bias (Chapter 13.) The bias on a regression coefficient.


1 QM222 Class 14 Section D1 Different slopes for the same variable (Chapter 14) Review: Omitted variable bias (Chapter 13.) The bias on a regression coefficient due to leaving out confounding factors from a Regression QM222 Fall 2016 Section D1

2 One variable with different slopes

3 Review of simple derivatives
A derivative is the same as a slope. In a line, the slope is always the same. In a curve, the slope changes. The rules of derivatives tell you how to calculate the slope at any point of a curve. We write the derivative as dY/dX instead of the slope ∆Y/∆X.

4 Three rules of calculus
1. The derivative (slope) of two terms added together = the derivative of each term added together: if Y = A + B, where A and B are terms with X in them, then dY/dX = dA/dX + dB/dX.
2. The derivative (slope) of a constant is zero: if Y = 5, then dY/dX = 0.
3. If Y = a X^b, then dY/dX = b a X^(b-1).

5 Examples
If Y = 25 X^2, then dY/dX = 2 · 25 X^(2-1) = 50 X.
Another example combining the three rules: if Y = 25 X^2 + 200 X + k, where k is a constant, then, recalling that X^0 = 1, dY/dX = 2 · 25 X + 1 · 200 X^0 + 0 = 50 X + 200.
The exponent does not have to be either positive or an integer. Example: if Y = 20 X^-2.5, then dY/dX = -2.5 · 20 X^(-2.5-1) = -50 X^-3.5.
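The power rule and the constant rule above can be checked mechanically. The following is a minimal Python sketch, not part of the course materials; the helper names `power_rule` and `derivative_at` are hypothetical, and the constant 7 in the last call is an arbitrary illustration.

```python
def power_rule(a, b):
    """Return (coefficient, exponent) of the derivative of a * x**b."""
    return (b * a, b - 1)


def derivative_at(terms, x):
    """Slope at x of a sum of terms a * x**b, each given as an (a, b) pair.

    Rule 1: the derivative of a sum is the sum of the derivatives.
    Rule 2: a constant term (b = 0) contributes zero slope.
    Rule 3: each term is differentiated with the power rule.
    """
    total = 0.0
    for a, b in terms:
        coef, exp = power_rule(a, b)
        if coef != 0:
            total += coef * x ** exp
    return total


print(power_rule(25, 2))      # derivative of 25 x^2
print(power_rule(20, -2.5))   # derivative of 20 x^-2.5
print(derivative_at([(25, 2), (7, 0)], 3.0))  # slope of 25 x^2 + 7 at x = 3
```

The first two calls reproduce the slide's examples (50 X and -50 X^-3.5) as coefficient and exponent pairs.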

6 Now we’re ready for different slopes

7 Movie dataset
Here is a regression of movie lifetime revenues on Budget and a dummy for whether it is a SciFi movie:
Revenues = Budget SciFi (5.28) (.102) (11.6) (standard errors in parentheses)
What does an observation represent in this data set?
What do we learn from the standard errors about each coefficient’s significance?
What is the slope dRevenues/dSciFi?
What is the slope dRevenues/dBudget?
Are these results what you expect?

8 Do you think that budget will matter similarly for all types of movies?
In particular, what do we expect about the coefficient on budget (the slope) for SciFi movies compared to other movies?

9 If we think that each budget dollar affects SciFi movies differently…
The simplest way to model this in a regression is:
1. Make an additional variable by multiplying Budget x SciFi
2. Make an additional variable by multiplying Budget x non-SciFi
3. Replace Budget with these two variables (keeping in SciFi)
These are called interaction terms.

10 Steps 1 and 2: Replace budget with two new variables Budget x SciFi and Budget x Non-SciFi
gen budgetscifi = budget*scifi
gen budgetnonscifi = budget*(1-scifi)
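For readers working outside Stata, the same two interaction variables can be built in plain Python. This is only a sketch: the two rows below are hypothetical and are not taken from the course movie dataset.

```python
# Hypothetical rows (NOT the course dataset): budget in $ millions,
# scifi is a 0/1 dummy variable.
movies = [
    {"name": "Movie A", "budget": 50.0, "scifi": 1},
    {"name": "Movie B", "budget": 20.0, "scifi": 0},
]

for m in movies:
    # Budget x SciFi: equals budget for SciFi movies, 0 otherwise.
    m["budgetscifi"] = m["budget"] * m["scifi"]
    # Budget x non-SciFi: equals budget for other movies, 0 for SciFi.
    m["budgetnonscifi"] = m["budget"] * (1 - m["scifi"])

for m in movies:
    print(m["name"], m["budgetscifi"], m["budgetnonscifi"])
```

For every row exactly one of the two new variables equals the budget and the other is zero, which is what lets the regression estimate a separate budget slope for each group.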

11 What data looks like in a spreadsheet
Columns: moviename, revenue, scifi, budget, budgetscifi, budgetnonscifi
The Bridges of Madison County 22
Dead Man Walking 11
Rob Roy 28
Clueless 13.7
Babe 30
Jumanji 100 65
Showgirls 40
Starship Troopers 1
Bad Boys 65.807 23
Event Horizon 60
Jefferson in Paris 14
To Die For 20
Star Trek: Insurrection 70
Sphere 73
Out of Sight 48
Saving Private Ryan 220
Enemy of the State 110 85
The Big Lebowski 15
Lost in Space 80
Mortal Kombat
Copycat

12 3. Replace Budget with these two variables (keeping in SciFi)
regress revenues scifi budgetscifi budgetnonscifi
You get: revenues = – SciFi budgetscifi budgetnonscifi (5.36) (25.5) (0.352) (0.105)
What is the slope drevenues/dbudget?
drevenues/dbudget = 2.04 scifi + 1.04 notscifi
If it is a scifi movie: slope drevenues/dbudget = 2.04 (since the last term is 0)
If it is not a scifi movie: slope drevenues/dbudget = 1.04
Each budget dollar is associated with more revenue if it is a scifi/fantasy movie (2.04 versus 1.04).
Note also: All coefficients are significant.
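The two slopes can be read off with a small helper. This Python sketch uses only the two interaction coefficients that survive on the slide (2.04 and 1.04); the function name `budget_slope` is hypothetical.

```python
B_SCIFI = 2.04      # coefficient on budgetscifi, as stated on the slide
B_NONSCIFI = 1.04   # coefficient on budgetnonscifi, as stated on the slide


def budget_slope(scifi):
    """drevenues/dbudget for a movie with dummy scifi (1 = SciFi, 0 = not).

    Only one of the two terms is non-zero for any given movie.
    """
    return B_SCIFI * scifi + B_NONSCIFI * (1 - scifi)


print(budget_slope(1))  # slope for a SciFi movie
print(budget_slope(0))  # slope for any other movie
```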

13 Graph of this model
[Figure: Revenues plotted against Budget, with separate lines for SciFi movies and other movies.]

14 This also allows the effect of being a scifi movie to depend on the budget
From the previous overhead: revenues = – scifi budgetscifi budgetnotscifi
What is the slope drevenues/dscifi? drevenues/dscifi = budget
So if budget = 100, drevenues/dscifi = *100 = 131.3
Compare to our equation without the “interaction terms”, with : drevenues/dscifi =

15 Review: Omitted Variable Bias
The bias on a regression coefficient due to leaving out confounding factors from a regression

16 Omitted variable bias in the test and in Assignment 5 – Due Friday at 6pm: Hard copy and online
Multiple Regression: Assignment 5
For any one possibly confounding factor, explain exactly why leaving that variable out of the regression is likely to bias the coefficient of your key explanatory variable. In your explanation, predict the sign of the omitted variable bias (when you do not control for that factor) and explain exactly why you expect that sign, using methods and formulas learned from Chapter 13.
Run a multiple regression that answers (or begins to answer) your main research question and includes all possibly confounding factors that you can measure.
Using the actual coefficient in this multiple regression, was the sign of bias that you predicted in Q2 correct? If not, explain why not. (1-3 sentences)
Explain what you learn from this regression that addresses your main research question, making sure to explain the precise meaning of important coefficients.
FOR THE TEST: It might be simpler to run a multiple regression that includes the possibly confounding factor you discussed above. Using the actual coefficient in this multiple regression, was the sign of bias that you predicted in Q2 correct? If not, explain why not. (1-3 sentences)

17 What to do if your data is not yet ready?
I’ll probably give you regressions, but penalize you some.

18 Graphic method with two X variables
Really, both X’s affect Y (price), as in the multiple regression Y = b0 + b1X1 + b2X2. Let’s call this the Full model. Let’s call b1 and b2 the direct effects.

19 The mis-specified or Limited model
However, in the simple (one X variable) regression, we measure only a combined effect of X1 on Y. Call its coefficient c1: Y = c0 + c1X1. Let’s call c1 the combined effect.

20 The reason that there is a bias on X1 is that there is a Background Relationship between the X’s
This occurs if X1 and X2 are correlated. We call this the Background Relationship: X2 = a0 + a1X1

21 Graphic model of omitted variable bias
The effect of X1 on Y has two channels. The first one is the direct effect b1. The second channel is the indirect effect through X2: when X1 changes, X2 also tends to change (a1), and this change in X2 has another effect on Y (b2).

22 If we want the direct effect only
When we include both X1 and X2 in a multiple regression, we get the coefficient b1, the direct effect of X1.

23 Algebraic method for Omitted Variable bias

24 Full model and mis-specified model
FULL MODEL IN A MULTIPLE REGRESSION: Y = b0 + b1 X1 + b2 X2
MIS-SPECIFIED MODEL WHEN MISSING A VARIABLE: Y = c0 + c1 X1
Writing c1 as the direct effect plus a bias: Y = c0 + [b1 + bias] X1

25 Full model and mis-specified model (continued)
FULL MODEL IN A MULTIPLE REGRESSION: Y = b0 + b1 X1 + b2 X2
MIS-SPECIFIED MODEL WHEN MISSING A VARIABLE: Y = c0 + c1 X1 = c0 + [b1 + bias] X1

26 The background model
To understand the bias, note that there is a relationship between the X’s that we call the BACKGROUND MODEL:
X2 = a0 + a1 X1
Remember the FULL MODEL (MULTIPLE REGRESSION) and substitute the background model for X2:
Y = b0 + b1 X1 + b2 X2
Y = b0 + b1 X1 + b2 (a0 + a1 X1)
Y = (b0 + b2 a0) + [b1 + b2 a1] X1
Y = c0 + [b1 + bias] X1, so bias = b2 a1

27 MIS-SPECIFIED MODEL
Y = c0 + c1 X1
Y = c0 + [b1 + bias] X1

28 Brookline condos: b1 = 32936, bias = -79905
FULL MODEL: Price = b0 + b1 BEACON + b2 SIZE, with b1 = 32936
MIS-SPECIFIED MODEL: Price = c0 + c1 BEACON
Y = c0 + [b1 + bias] BEACON, with b1 = 32936 and bias = -79905

29 Y = 520729 + [b1 + b2 a1] BEACON: b2 = 409, so a1 must be very negative
b1 = 32936 and bias = b2 a1 = -79905

30 More on Brookline Condos
Limited model: Price = c0 + c1 BEACON, where c1 is the combined effect (negative).
Full model: Price = b0 + b2 SIZE + b1 BEACON, where b1 = 32935 is the direct effect (positive) and b2 = 409.4.
Background relationship: SIZE = 1254 + a1 BEACON, where a1 is negative.
Check: c1 = b1 + b2 a1 = 32935 + (a1 × 409.4)
The bias is b2 a1 = 409.4 × a1, which is negative. We are UNDERESTIMATING the direct effect.
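The Brookline numbers that survive in this transcript are enough to work backwards to the pieces that are missing. The sketch below takes b1 = 32936 and bias = -79905 from slide 28 and b2 = 409.4 from the check above; the resulting c1 and a1 are implied by the formula c1 = b1 + b2 a1, not quoted from the slides, and slide 30 rounds b1 to 32935.

```python
b1 = 32936      # direct effect of BEACON in the full model (slide 28)
bias = -79905   # omitted variable bias (slide 28)
b2 = 409.4      # coefficient on SIZE in the full model (slide 30)

# Combined effect measured by the limited model: c1 = b1 + bias.
c1 = b1 + bias

# Because bias = b2 * a1, the background slope is implied by the other two.
a1_implied = bias / b2

print(c1)                    # -46969
print(round(a1_implied, 1))  # -195.2
```

The implied background slope is strongly negative, consistent with the statement that a1 must be very negative.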

31 Summary: Calculating the omitted variable bias
Full model: Y = b0 + b1X1 + b2X2
Background relationship: X2 = a0 + a1X1
Re-arranging:
Y = b0 + b1X1 + b2(a0 + a1X1)
Y = (b0 + b2a0) + (b1 + b2a1)X1
This is the limited model, with intercept c0 = b0 + b2a0 and slope c1 = b1 + b2a1 (the combined effect): Y = c0 + c1X1
What we are most interested in is the sign of the bias in the slope. We want to measure the direct effect b1, but instead we measure b1 + b2a1.
omitted variable bias = b2a1
b2 = direct effect of X2
a1 = background relationship between X1 and X2
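The identity c1 = b1 + b2a1 can be verified numerically. This Python sketch is an illustration under stated assumptions: the data are hypothetical, Y is generated exactly from the full model with no error term (so b1 and b2 are known by construction), and `ols_slope_intercept` is a hand-written one-variable least-squares helper, not a library call.

```python
def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for a one-variable regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: X2 tends to fall as X1 rises (negative background slope).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [10.0, 7.0, 9.0, 4.0, 5.0, 1.0]

# Full model with assumed coefficients and no noise.
b0, b1, b2 = 2.0, 3.0, 0.5
y = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]

a1, a0 = ols_slope_intercept(x1, x2)   # background relationship
c1, c0 = ols_slope_intercept(x1, y)    # limited model (X2 omitted)

print("a1 =", a1)
print("c1 =", c1, " b1 + b2*a1 =", b1 + b2 * a1)
print("c0 =", c0, " b0 + b2*a0 =", b0 + b2 * a0)
```

Here b2 is positive and a1 is negative, so the bias b2a1 is negative and the limited-model slope c1 falls below the direct effect b1.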

32 Pay_Program example (t-stats in parentheses)
Regression 1 (adjR2 = .0175): Score = – 5.68 Pay_Program (93.5) (-3.19)
Regression 2 (adjR2 = .6687): Score = Pay_Program OldScore (6.52) (3.46) (31.68)
Regression 3 (adjR2 = .6727): Score = Pay_Program OldScore – Poverty (7.10) (4.59) (28.97) (-3.05)
Which regression gives us the best estimate of the causal effect of Pay_Program? Why?

33 Pay for Performance
You are given the MIS-SPECIFIED MODEL: Score = c0 + c1 Pay_Program, with c1 = -5.68
You are given a multiple regression that we assume (for now) is the FULL MODEL: Score = b0 + b1 Pay_Program + b2 OldScore, with b1 = 3.73 and b2 = .826
Which regression measures the true effect of the Pay_Program? The FULL MODEL.

34 Pay for Performance: the bias in the mis-specified model
You are given the MIS-SPECIFIED MODEL Score = c0 + c1 Pay_Program (c1 = -5.68) and a multiple regression that we assume (for now) is the FULL MODEL, Score = b0 + b1 Pay_Program + b2 OldScore (b1 = 3.73, b2 = .826).
What do we learn about the bias in the mis-specified model?
Y = c0 + [b1 + bias] Pay_Program = c0 + [3.73 + bias] Pay_Program. Since c1 = -5.68, bias << 0.
Y = c0 + [b1 + b2 a1] Pay_Program = c0 + [3.73 + .826 a1] Pay_Program
What is the sign of a1 (OldScore = a0 + a1 Pay_Program)?
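The question can be answered with arithmetic on the three coefficients given above. The worked equation below is a sketch that assumes, as the slide does for now, that the regression with OldScore is the full model; the value of a1 is implied by the formula, not reported on the slides.

```latex
\begin{align*}
\text{bias} &= c_1 - b_1 = -5.68 - 3.73 = -9.41 \\
\text{bias} &= b_2 a_1 \;\Rightarrow\; a_1 = \frac{\text{bias}}{b_2} = \frac{-9.41}{0.826} \approx -11.4
\end{align*}
```

So a1 is negative: under this assumption, schools in the pay program tended to have lower old scores, which is why the one-variable regression makes the program look harmful.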

35 e.g. EDUCATION and IQ
You are given the MIS-SPECIFIED MODEL: Salary = 20,000 + 4000 EDUCATION (in years), so c1 = 4000.
You know that the FULL MODEL includes both Education and IQ: Salary = b0 + b1 EDUCATION + b2 IQ
So you know that the mis-specified model has
Salary = 20,000 + [b1 + bias] EDUCATION
Salary = 20,000 + [b1 + b2 a1] EDUCATION
What is the sign of b2? What is the sign of a1 in IQ = a0 + a1 EDUCATION?
b2 surely is positive and a1 surely is positive, so the bias b2 a1 is surely positive. Since b1 = 4000 - bias, b1 < 4000.
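The sign reasoning in this example generalizes: the bias is b2 a1, so its sign is the product of the two signs. A small Python sketch of that rule follows; the function names are hypothetical, and signs are passed as +1, -1, or 0.

```python
def bias_sign(b2_sign, a1_sign):
    """Sign of the omitted variable bias b2 * a1, given the two signs."""
    return b2_sign * a1_sign


def compare_direct_to_combined(b2_sign, a1_sign):
    """How the direct effect b1 compares with the combined effect c1.

    Uses c1 = b1 + bias, so a positive bias means c1 lies above b1.
    """
    s = bias_sign(b2_sign, a1_sign)
    if s > 0:
        return "b1 < c1"
    if s < 0:
        return "b1 > c1"
    return "b1 = c1"


print(compare_direct_to_combined(+1, +1))  # Education and IQ: both positive
print(compare_direct_to_combined(+1, -1))  # b2 positive, a1 negative
```

With both signs positive, as in the Education and IQ example, the direct effect b1 lies below the combined effect c1 = 4000.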

