Addressing Alternative Explanations: Multiple Regression 17.871.



Did Clinton hurt Gore example
Did Clinton hurt Gore in the 2000 election?
- Treatment is not liking Bill Clinton
How would you test this?

Bivariate regression of Gore thermometer on Clinton thermometer
[Figure; x-axis: Clinton thermometer]

Did Clinton hurt Gore example
What alternative explanations would you need to address?
- Nonrandom selection into the treatment group (disliking Clinton) from many sources
- Let's address one source: party identification
How could we do this?
- Matching: compare Democrats who like or don't like Clinton; do the same for Republicans and independents
- Multivariate regression: control for partisanship statistically
  - Also called multiple regression, Ordinary Least Squares (OLS)
  - Presentation below is intuitive

Democratic picture

Independent picture

Republican picture

Combined data picture

Combined data picture with regression: bias!
[Figure; x-axis: Clinton thermometer]

Tempting yet wrong normalizations
- Subtract the Gore therm. from the avg. Gore therm. score
- Subtract the Clinton therm. from the avg. Clinton therm. score

Combined data picture with "true" regression lines overlaid
[Figure; x-axis: Clinton thermometer]
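The bias in the pooled picture can be reproduced with a small sketch. The numbers below are made up for illustration and are not the survey data: each party group has the same within-group slope of 0.5, but groups with warmer Clinton scores also start from a higher Gore baseline. The helper `slope` and the `groups` table are hypothetical names.

```python
def slope(x, y):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up data: (group intercept, Clinton thermometer scores) per party group.
groups = {
    "Republican": (10.0, [10, 20, 30, 40]),
    "Independent": (25.0, [30, 40, 50, 60]),
    "Democrat": (40.0, [60, 70, 80, 90]),
}
TRUE_SLOPE = 0.5  # assumed within-group Clinton effect

pooled_x, pooled_y, within = [], [], {}
for party, (intercept, xs) in groups.items():
    ys = [intercept + TRUE_SLOPE * x for x in xs]
    within[party] = slope(xs, ys)   # slope inside one party group
    pooled_x += xs
    pooled_y += ys

pooled = slope(pooled_x, pooled_y)  # slope ignoring party
print(within)
print(pooled)
```

In this toy setup every within-group slope is 0.5, while the pooled slope comes out larger because party shifts both thermometers in the same direction.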

The Linear Relationship between Three Variables
[Figure labels: Clinton thermometer, Gore thermometer, Party ID]
STATA:
reg y x1 x2
reg gore clinton party3
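Written as an equation, the `reg gore clinton party3` command fits a model of roughly the following form. The symbols \beta_0, \beta_1, \beta_2, and u_i are notation assumed here for illustration; they do not appear in the transcript.

\[
\text{Gore}_i = \beta_0 + \beta_1\,\text{Clinton}_i + \beta_2\,\text{Party}_i + u_i
\]

Under this notation, \beta_1 is the slope on the Clinton thermometer with party ID held constant.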

[Figure: diagram of Y (Gore thermometer), X1 (Clinton thermometer), and X2 (Party ID)]

Multivariate slope coefficients
- Clinton effect (on Gore) in bivariate (B) regression
- Clinton effect (on Gore) in multivariate (M) regression
- Bivariate estimate
- Multivariate estimate
- Are Gore and Party ID related?
- Are Clinton and Party ID related?
[Figure labels: Clinton, Gore, Party ID]
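The equations for this slide are not preserved in the transcript. As a hedged reconstruction, these are the standard OLS results for two regressors, written with X_1 as the Clinton thermometer, X_2 as party ID, and Y as the Gore thermometer; the superscripts B and M follow the slide's labels, and the rest of the notation is assumed.

\[
\hat{\beta}_1^{B} = \frac{\operatorname{cov}(X_1, Y)}{\operatorname{var}(X_1)}
\]

\[
\hat{\beta}_1^{M} = \hat{\beta}_1^{B} - \hat{\beta}_2^{M}\,\frac{\operatorname{cov}(X_1, X_2)}{\operatorname{var}(X_1)}
\]

The two questions on the slide map onto the two factors in the correction term: \hat{\beta}_2^{M} is nonzero when Gore and party ID are related (holding Clinton constant), and \operatorname{cov}(X_1, X_2) is nonzero when Clinton and party ID are related.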

[Figure: diagram of Y (Gore thermometer), X1 (Clinton thermometer), and X2 (Party ID)]

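[Figure: diagram of Y (Gore thermometer), X1 (Clinton thermometer), and X2]
When does the bivariate Clinton coefficient equal the multivariate one? Obviously, when party ID (X2) is unrelated to the Gore thermometer once the Clinton thermometer is held constant, or when party ID is unrelated to the Clinton thermometer.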

[Figure: diagram labeled Genetic predisposition, Lung cancer, and Smoking, with variables Y, X1, X2]

The Slope Coefficients
X1 is Clinton thermometer, X2 is PID, and Y is Gore thermometer

The Slope Coefficients More Simply
X1 is Clinton thermometer, X2 is PID, and Y is Gore thermometer
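A minimal sketch of how the slope coefficients follow from variances and covariances, using the standard two-regressor OLS formulas. The six observations are invented for illustration, and the names `cov`, `b1_biv`, `b1_multi`, and `b2_multi` are hypothetical; none of this comes from the survey data.

```python
def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

x1 = [20, 35, 50, 65, 80, 90]   # hypothetical Clinton thermometer (X1)
x2 = [1, 1, 2, 2, 3, 3]         # hypothetical 3-point party ID (X2)
y = [15, 30, 45, 50, 75, 85]    # hypothetical Gore thermometer (Y)

s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)

# Bivariate slope of Y on X1 alone.
b1_biv = s1y / s11

# Multivariate slopes from the two-regressor normal equations.
det = s11 * s22 - s12 ** 2
b1_multi = (s22 * s1y - s12 * s2y) / det
b2_multi = (s11 * s2y - s12 * s1y) / det

# The bivariate slope minus the correction term recovers the multivariate slope.
adjusted = b1_biv - b2_multi * s12 / s11
print(b1_biv, b1_multi, adjusted)
```

The last two printed values agree up to floating-point error, which is the "more simply" relationship between the bivariate and multivariate Clinton coefficients.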

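The Matrix form
\[
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},
\qquad
\mathbf{X} = \begin{pmatrix}
1 & x_{1,1} & x_{2,1} & \cdots & x_{k,1} \\
1 & x_{1,2} & x_{2,2} & \cdots & x_{k,2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1,n} & x_{2,n} & \cdots & x_{k,n}
\end{pmatrix}
\]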
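In matrix terms, the OLS coefficients solve the normal equations (X'X) b = X'y. The sketch below solves them with a small Gauss-Jordan routine in plain Python so that it has no dependencies. The data are invented, and `solve` and `ols` are hypothetical helper names, not part of Stata or any library.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Return the coefficients solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

x1 = [20, 35, 50, 65, 80, 90]   # hypothetical Clinton thermometer
x2 = [1, 1, 2, 2, 3, 3]         # hypothetical party ID
y = [15, 30, 45, 50, 75, 85]    # hypothetical Gore thermometer

# Each row of X is (1, x1, x2): the leading 1 carries the intercept.
X = [[1.0, a, b] for a, b in zip(x1, x2)]
beta = ols(X, y)
residuals = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
print(beta)
```

A quick check on any OLS fit is that the residuals are orthogonal to every column of X, including the column of ones.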

The Output
. reg gore clinton party3
[Stata regression output: ANOVA table (Source, SS, df, MS), F(2, 1742), and coefficient rows for clinton, party3, and _cons; the numeric values are not preserved.]
Interpretation of clinton effect: Holding party identification constant, a one-point increase in the Clinton feeling thermometer is associated with a 0.51-point increase in the Gore thermometer.

Separate regressions
Each column shows the coefficients for a separate regression
DV: Gore thermometer
Columns: (1), (2), (3). Rows: Intercept, Clinton, Party. N = 1745.
[Coefficient values are not preserved.]

Is the Clinton effect causal?
That is, should we be convinced that negative feelings about Clinton really hurt Gore? No!
- The regression analysis has only ruled out linear nonrandom selection on party ID.
- Nonrandom selection into the treatment could occur from
  - Variables other than party ID, or
  - Reverse causation, that is, feelings about Gore influencing feelings about Clinton.
- Additionally, the regression analysis may not have entirely ruled out nonrandom selection even on party ID because it may have assumed the wrong functional form.
  - E.g., what if there is nonrandom selection among strong Republicans and strong Democrats, but not among weak partisans?

Other approaches to addressing confounding effects?
- Experiments
- Matching
- Difference-in-differences designs
- Others?

Summary: Why we control
- Address alternative explanations by removing confounding effects
- Improve efficiency

Why did the Clinton coefficient change from 0.62 to 0.51?
. corr gore clinton party, cov
(obs=1745)
[Covariance matrix of gore, clinton, and party3; values are not preserved.]

The Calculations
. corr gore clinton party,cov
(obs=1745)
[Covariance matrix of gore, clinton, and party3; values and calculations are not preserved.]

Drinking and Greek Life Example
Why is there a correlation between living in a fraternity/sorority house and drinking?
- Greek organizations often emphasize social gatherings that have alcohol. The effect is being in the Greek organization itself, not the house.
- There's something about the house environment itself.
Example of indicator or dummy variables

Dependent variable: Times Drinking in Past 30 Days

. infix age residence 16 greek 24 screen 102 timespast howmuchpast gpa studying 281 timeshs 325 howmuchhs 326 socializing 283 stwgt_ weight using da3818.dat,clear
(14138 observations read)
. recode timespast30 timeshs (1=0) (2=1.5) (3=4) (4=7.5) (5=14.5) (6=29.5) (7=45)
(timespast30: 6571 changes made)
(timeshs: changes made)
. replace timespast30=0 if screen<=3
(4631 real changes made)
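The recode step turns category codes into approximate counts of drinking occasions, and the replace step then sets the count to zero for respondents with a low `screen` value. A rough Python sketch of the same logic follows. The mapping is copied from the recode command; the function name `times_drinking` is hypothetical, and what the `screen` codes mean is not stated in the transcript.

```python
# Category code -> approximate number of occasions, as in the recode command.
RECODE = {1: 0, 2: 1.5, 3: 4, 4: 7.5, 5: 14.5, 6: 29.5, 7: 45}

def times_drinking(code, screen):
    """Mimic `recode` followed by `replace timespast30=0 if screen<=3`."""
    if screen <= 3:
        return 0
    # Like Stata's recode, leave codes outside the mapping unchanged.
    return RECODE.get(code, code)

print(times_drinking(5, 4))
print(times_drinking(7, 2))
```

This sketch ignores missing values, which Stata handles with its own rules.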

. tab timespast30
[Frequency table of timespast30 with columns Freq., Percent, and Cum.; the values are truncated in the transcript.]

Key explanatory variables (indicator variables or dummy variables)
- Live in fraternity/sorority house
  - Indicator variable (dummy variable)
  - Coded 1 if live in, 0 otherwise
- Member of fraternity/sorority
  - Indicator variable (dummy variable)
  - Coded 1 if member, 0 otherwise

Three Regressions
Dependent variable: number of times drinking in past 30 days

                                              (1)            (2)               (3)
Live in frat/sor house (indicator variable)   4.44 (0.35)                      2.26 (0.38)
Member of frat/sor (indicator variable)                      [missing] (0.16)  2.44 (0.18)
Intercept                                     4.54 (0.56)    4.27 (0.059)      4.27 (0.059)
R-squared                                     [missing]      [missing]         [missing]
N                                             13,876

Note: Standard errors in parentheses. The correlation between living in a frat/sor house and being a member of a Greek organization is 0.42.

Interpreting indicator variables
Dependent variable: number of times drinking in past 30 days
Col 1
- Living in a frat/sor house increases the number of times drinking in the past 30 days by 4.4
- Compare to constant: living in a frat/sor house increases the number of times drinking from 4.54 to (4.54 + 4.44 =) 8.98
Col 3
- Holding membership constant, living in a frat/sor house increases the number of times drinking by 2.26
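Because both explanatory variables are coded 0 or 1, predicted values are just sums of the intercept and whichever coefficients are switched on. The sketch below uses the column 1 and column 3 coefficients reported on the slides; the dictionary and function names are hypothetical.

```python
# Coefficients as reported on the slides (columns 1 and 3).
COL1 = {"intercept": 4.54, "house": 4.44}
COL3 = {"intercept": 4.27, "house": 2.26, "member": 2.44}

def predict_col1(house):
    """Predicted times drinking from the house-only regression."""
    return COL1["intercept"] + COL1["house"] * house

def predict_col3(house, member):
    """Predicted times drinking when membership is also controlled."""
    return COL3["intercept"] + COL3["house"] * house + COL3["member"] * member

print(round(predict_col1(0), 2), round(predict_col1(1), 2))
for house in (0, 1):
    for member in (0, 1):
        print(house, member, round(predict_col3(house, member), 2))
```

In column 3, switching `house` from 0 to 1 changes the prediction by 2.26 regardless of membership, which is what "holding membership constant" means in a model without an interaction term.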

Accounting for the total effect
Total effect = Direct effect + Indirect effect

Accounting for the effects of frat house living and Greek membership on drinking

Effect                     Total   Direct        Indirect
Member of Greek org.       2.88    2.44 (85%)    0.44 (15%)
Live in frat/sor. house    4.44    2.26 (51%)    2.18 (49%)
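The decomposition can be checked with a few lines of arithmetic, treating the bivariate coefficient as the total effect and the multivariate coefficient as the direct effect. For house living, both numbers (4.44 and 2.26) appear on the slides. For membership, the total is an assumption here: it is taken as the slide's direct effect plus its indirect effect (2.44 + 0.44). The function name `decompose` is hypothetical.

```python
def decompose(total, direct):
    """Split a total effect into direct and indirect parts with shares."""
    indirect = total - direct
    return {
        "direct": direct,
        "indirect": indirect,
        "direct_share": direct / total,
        "indirect_share": indirect / total,
    }

house = decompose(total=4.44, direct=2.26)
member = decompose(total=2.44 + 0.44, direct=2.44)  # total inferred, see above

for name, d in (("house", house), ("member", member)):
    print(name, round(d["indirect"], 2),
          round(100 * d["direct_share"]), round(100 * d["indirect_share"]))
```

The rounded shares match the percentages on the slide: about 51% direct and 49% indirect for house living, and about 85% direct and 15% indirect for membership.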