1 PSC 5940: Regression Review and Questions about “Causality”
Session 2, Fall 2009 (September 1, 2009)


2 Data Discussion
EE09 & NS09 data: research ideas?
Fixing data in Excel (EE09):
– NA replacement
– Text to numeric (e28_gcc)
– Getting rid of extraneous characters: $ in “random_p”
EE and partisanship:
– Loading and attaching the data
– Examining party identification (“e216_par”)
– Examining gender (“e3_gender”)
Dealing with awkward names and NA values
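The slide does this cleanup in Excel. As a rough illustration of the same three fixes in code (NA handling, text to numeric, stripping the `$`), here is a minimal Python sketch. The helper `to_number` and the sample cell values are hypothetical; the actual contents of “random_p” are not shown in the slides.

```python
def to_number(raw):
    """Convert a raw spreadsheet cell to float, or None when it is missing."""
    text = str(raw).strip()
    if text in ("", "NA"):
        return None
    # Drop the currency symbol and thousands separators before converting.
    cleaned = text.replace("$", "").replace(",", "")
    return float(cleaned)


# Hypothetical cell values, invented for illustration only.
raw_random_p = ["$6", "$1,200", "NA", " $50 "]
clean_random_p = [to_number(value) for value in raw_random_p]
print(clean_random_p)
```

Missing values are kept as `None` rather than dropped, so each cleaned value stays aligned with its original row.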

3 Deterministic Linear Models
Theoretical model: $Y_i = \beta_0 + \beta_1 X_i$
– $\beta_0$ and $\beta_1$ are constant terms: $\beta_0$ is the intercept, $\beta_1$ is the slope
– $X_i$ is a predictor of $Y_i$

4 Stochastic Linear Models
$E[Y_i] = \beta_0 + \beta_1 X_i$
– Variation in Y is caused by more than X: error ($\varepsilon_i$)
So: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

5 Assumptions Necessary for Estimating Linear Models
1. Errors have identical distributions: zero mean and the same variance across the range of X
2. Errors are independent of X and of the other $\varepsilon_i$
3. Errors are normally distributed
[Figure: error distributions centered on $\varepsilon_i = 0$ along the X axis]

6 Normal, Independent & Identical $\varepsilon_i$ Distributions (“Normal iid”)
[Figure: axes Y and X]
Problem: we don’t know (a) whether the error assumptions hold true, or (b) the values of $\beta_0$ and $\beta_1$.
Solution: estimate ’em!

7 OLS Derivation of $b_0$
Use partial differentiation in this step:

8 Derivation of $b_0$, step 2

9 Derivation of $b_1$
Step 1: Multiply out $e^2$

10 Derivation of $b_1$
Step 2: Differentiate w.r.t. $b_1$

11 Derivation of $b_1$
Step 3: Substitute for $b_0$

12 Derivation of $b_1$
Step 4: Simplify and isolate $b_1$
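The equations on the derivation slides did not survive in this transcript. The following is the standard textbook derivation, arranged to follow the step titles above; it is a reconstruction under that assumption, not a copy of the original slides.

```latex
% Objective: choose b_0 and b_1 to minimize the sum of squared errors
S(b_0, b_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2

% Derivation of b_0: partial differentiation w.r.t. b_0, set to zero
\frac{\partial S}{\partial b_0} = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0

% Derivation of b_0, step 2: solve for b_0
\sum Y_i - n b_0 - b_1 \sum X_i = 0
\quad\Longrightarrow\quad
b_0 = \bar{Y} - b_1 \bar{X}

% Derivation of b_1, step 1: multiply out e^2
\sum e_i^2 = \sum \left( Y_i^2 - 2 b_0 Y_i - 2 b_1 X_i Y_i
             + b_0^2 + 2 b_0 b_1 X_i + b_1^2 X_i^2 \right)

% Step 2: differentiate w.r.t. b_1, set to zero
\frac{\partial S}{\partial b_1}
  = \sum \left( -2 X_i Y_i + 2 b_0 X_i + 2 b_1 X_i^2 \right) = 0

% Step 3: substitute b_0 = \bar{Y} - b_1 \bar{X}
-\sum X_i Y_i + (\bar{Y} - b_1 \bar{X}) \sum X_i + b_1 \sum X_i^2 = 0

% Step 4: simplify and isolate b_1 (using \sum X_i = n \bar{X})
b_1 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}
    = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
```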

13 Calculating $b_0$ and $b_1$
The formulas for $b_1$ and $b_0$ allow you (or preferably your computer) to calculate the error-minimizing slope and intercept for any data set representing a bivariate, linear relationship. No other line, using the same data, will result in a smaller sum of squared errors ($\sum e_i^2$). OLS gives the best fit.
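As a minimal sketch of that calculation, the Python below applies the usual closed-form expressions (slope = sum of cross-products of deviations over sum of squared X deviations; intercept = mean of Y minus slope times mean of X) and then checks that nudging the line in either direction raises the sum of squared errors. The five data points and the function names are made up for illustration.

```python
def ols_fit(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors for the line defined by b0 and b1."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Invented data, not taken from the course data sets.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = ols_fit(x, y)
best = sse(x, y, b0, b1)
print(b0, b1, best)

# Any other line does worse on the same data.
print(sse(x, y, b0 + 0.1, b1) > best)
print(sse(x, y, b0, b1 - 0.1) > best)
```

For these points the fitted line has a slope of about 0.6 and an intercept of about 2.2.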

14 Interpreting $b_1$ and $b_0$
For each 1-unit increase in X, you get $b_1$ units of change in the predicted Y.
When X is zero, the predicted Y is equal to $b_0$.
Note that a regression model with no independent variables simply estimates the mean of Y.
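The last point can be checked in two lines. With no X in the model, the only parameter is $b_0$, and minimizing the sum of squared errors gives the sample mean. This worked step is an addition for clarity and is not from the slides.

```latex
S(b_0) = \sum_{i=1}^{n} (Y_i - b_0)^2,
\qquad
\frac{dS}{db_0} = -2 \sum_{i=1}^{n} (Y_i - b_0) = 0
\quad\Longrightarrow\quad
b_0 = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}
```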

15 Theoretical Specification of Multivariate Regression

16 Regression in Matrix Form
Assume a model using n observations, with K-1 $X_i$ (independent) variables
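The matrix expressions on this slide were lost in extraction. In the standard notation, which matches the n and K used above (K-1 independent variables plus a constant), the model and its OLS estimator are usually written as follows; treat this as a reconstruction rather than the slide's exact layout.

```latex
% y is n x 1, X is n x K (first column all ones), b is K x 1, e is n x 1
y = X b + e

% Minimizing e'e = (y - Xb)'(y - Xb) yields the normal equations
X'X \, b = X'y

% and, when X'X is invertible, the OLS estimator
b = (X'X)^{-1} X'y
```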

17 Regression in Matrix Form
Note: we can’t uniquely define $(X'X)^{-1}$ if any column in the X matrix is a linear function of any other column(s) in X.
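A small numeric check of that note, as a sketch with invented integer data: when one column of X is an exact linear function of the others, the determinant of X'X is zero, so no inverse exists. The helper names are illustrative only.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def xtx(x1, x2):
    """Build X with a constant column, then return X'X."""
    X = [[1, u, v] for u, v in zip(x1, x2)]
    return matmul(transpose(X), X)


x1 = [1, 2, 3, 4, 5]
x2_dependent = [2 * v + 3 for v in x1]  # exact linear function of the constant and x1
x2_free = [2, 1, 4, 3, 6]               # not a linear function of the other columns

det_dependent = det3(xtx(x1, x2_dependent))
det_free = det3(xtx(x1, x2_free))
print(det_dependent)  # zero: X'X cannot be inverted
print(det_free)       # nonzero: X'X can be inverted
```

Integer data are used so the zero determinant is exact rather than subject to floating-point rounding.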

18 The X’X Matrix
Note that you can obtain the basis for all the necessary means, variances and covariances among the Xs from the (X’X) matrix
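To make this concrete, here is a minimal sketch with invented data. When X includes a constant column, X'X holds n, the column sums, the sums of squares, and the sums of cross-products, and the means, variances, and covariances follow from those entries.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


# Invented data; the first column of X is the constant.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
X = [[1, u, v] for u, v in zip(x1, x2)]
XtX = matmul(transpose(X), X)

n = XtX[0][0]               # number of observations
mean_x1 = XtX[0][1] / n     # sum of x1 divided by n
mean_x2 = XtX[0][2] / n     # sum of x2 divided by n
# Sample variance from the sum of squares; covariance from the cross-product sum.
var_x1 = (XtX[1][1] - n * mean_x1 ** 2) / (n - 1)
cov_x1_x2 = (XtX[1][2] - n * mean_x1 * mean_x2) / (n - 1)

print(XtX)
print(mean_x1, mean_x2, var_x1, cov_x1_x2)
```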

19 An Example of Matrix Regression
Using a sample of 7 observations, where X has elements {$X_0$, $X_1$, $X_2$, $X_3$}
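The data for the slide's example are not preserved in this transcript. As a stand-in, the sketch below invents 7 observations with a constant ($X_0$) and three predictors, builds Y from known coefficients, and recovers them by solving the normal equations $X'Xb = X'y$. All numbers and helper names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def solve(a, rhs):
    """Solve a z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for linearly dependent columns")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Invented sample of 7 observations.
x1 = [1, 2, 3, 4, 5, 6, 7]
x2 = [2, 1, 4, 3, 6, 5, 8]
x3 = [1, 0, 0, 1, 1, 0, 1]
true_b = [1.0, 2.0, -1.0, 0.5]  # assumed coefficients used to generate y

X = [[1, u, v, w] for u, v, w in zip(x1, x2, x3)]  # X0 is the constant column
y = [sum(coef * val for coef, val in zip(true_b, row)) for row in X]

XtX = matmul(transpose(X), X)
Xty = [sum(row[k] * yi for row, yi in zip(X, y)) for k in range(4)]
b = solve(XtX, Xty)
print([round(v, 6) for v in b])
```

Because Y is generated without error, the estimates match the assumed coefficients up to rounding; with real data they would differ by sampling error.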

20 Summary of OLS Assumption Failures and their Implications
Problem             Biased b   Biased SE   Invalid t/F   Hi Var
Non-linear          Yes        Yes         Yes           ---
Omit relev. X       Yes        Yes         Yes           ---
Irrel X             No         No          No            Yes
X meas. error       Yes        Yes         Yes           ---
Heterosced.         No         Yes         Yes           Yes
Autocorr.           No         Yes         Yes           Yes
X corr. error       Yes        Yes         Yes           ---
Non-normal err.     No         No          Yes           Yes
Multicollinearity   No         No          No            Yes

21 BREAK

22 Causality and Experiments
[Diagram: $X_2$ = Number of Fire Trucks; Y = Number of Fire Deaths]
Question: What is the relationship between the number of fire trucks at the scene of a fire and the number of deaths caused by that fire?
Experimental approach: Randomly assign fire incidents to different categories, which receive different numbers of trucks (treatment).

23 Causality and Observational Data
The problem of spurious relations...
[Diagram: $X_1$ = Size of Fire; $X_2$ = Number of Fire Trucks; Y = Number of Fire Deaths]
In an experimental design, we fully control for spurious relationships. With OLS we try to manage them statistically.

24 Statistical Calculation of Partial Effects
In calculating the effect of $X_1$ on Y, we remove the effect of the other X’s on both $X_1$ and Y:
– Y stripped of the effect of $X_2$
– $X_1$ stripped of the effect of $X_2$
The use of residuals “cleans” both Y and $X_1$ of their correlations with $X_2$, permitting estimation of partial regression coefficients (PRCs).
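The residual procedure can be sketched in a few lines of Python using the fire example. The numbers are invented (deaths are constructed as 2 × size − 0.5 × trucks, an assumption made only so the answer is known in advance), and the variable names are illustrative. The sketch residualizes both deaths and trucks on fire size, regresses residual on residual, and compares the result with the two-predictor regression coefficient computed from centered sums.

```python
def mean(v):
    return sum(v) / len(v)


def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def simple_ols(x, y):
    """Return (intercept, slope) of y regressed on x."""
    slope = cross(x, y) / cross(x, x)
    return mean(y) - slope * mean(x), slope


def residuals(x, y):
    """Residuals of y after removing its linear relationship with x."""
    b0, b1 = simple_ols(x, y)
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]


# Invented data: bigger fires draw more trucks and cause more deaths.
size = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
trucks = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
deaths = [2.0 * s - 0.5 * t for s, t in zip(size, trucks)]

# Bivariate slope: ignores fire size, so it picks up the spurious relation.
_, bivariate = simple_ols(trucks, deaths)

# Partial slope: strip the effect of size from both deaths and trucks first.
deaths_clean = residuals(size, deaths)
trucks_clean = residuals(size, trucks)
_, partial = simple_ols(trucks_clean, deaths_clean)

# Same coefficient from the two-predictor regression formula.
multiple = (cross(trucks, deaths) * cross(size, size)
            - cross(size, deaths) * cross(trucks, size)) / (
            cross(trucks, trucks) * cross(size, size) - cross(trucks, size) ** 2)

print(bivariate, partial, multiple)
```

In this constructed example the bivariate slope is positive (more trucks appear to go with more deaths), while the partial slope recovers the negative value built into the data once fire size is held constant.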

25 Intuition of PRCs
All overlapping variance is stripped
Highly correlated IVs are problematic
– But what if the overlap is important? What if $X_1$ and $X_2$ are really part of some larger construct?
– The case of knowledge, efficacy and behavior
– Kelstet et al
How should we interpret the PRCs in this case?

26 Workshop
Load EE data
Run a simple model: willingness to pay for an alternative energy tax
– Use randomly assigned cost as IV
– Plot the relationship (use jitter)
Now add: income, ideology
– Change in cost variable? (Why?)

27 Homework
Generate and analyze the residuals
Add to the model:
– Belief in anthropogenic climate change (will require recodes)
– Understanding of GCC science (recode “What scientists’ believe…” variables)
1-page summary of findings for class next week
Next extension: modeling dummies and interactions

