Lecture Slide #1 Logistic Regression Analysis
Estimation and Interpretation
Hypothesis Tests
Interpretation
Reversing Logits: Probabilities
–Averages
–Types
–Estimating variable influence
Lecture Slide #2 Logit Assumptions and Qualifiers
The model is correctly specified
–True conditional probabilities are a logistic function of the X’s
–No important X’s omitted; no extraneous X’s included
–No significant measurement error
The cases are independent
No X is a linear function of other X’s
–Multicollinearity leads to imprecision
Influential cases can bias estimates
Sample size: n-K should exceed 100
–Independent covariation is critical
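Lecture Slide #3 Logit Hypothesis Tests
Nested Model Tests (like F-Tests in OLS)
–Is a more complex model a better fit?
Test whether the parameters for the variables omitted from the simpler model are statistically indistinguishable from zero:
Chi-square = -2*(log likelihood of simple model - log likelihood of complex model)
Where the Chi-square table uses K degrees of freedom (K = the number of omitted parameters being tested).
If p < 0.05, the complex model fits significantly better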
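The nested-model test can be sketched in a few lines of Python. This is only an illustration: the log-likelihood values and the function names `chi2_sf` and `lr_test` are made up for the example, and the chi-square tail probability is computed with a standard recurrence rather than taken from a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        k = 1
        q = math.erfc(math.sqrt(half))   # exact tail for df = 1
    else:
        k = 2
        q = math.exp(-half)              # exact tail for df = 2
    # Recurrence: Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def lr_test(ll_simple, ll_complex, n_omitted):
    """Likelihood-ratio chi-square and p-value for a nested-model comparison."""
    chi2 = -2.0 * (ll_simple - ll_complex)
    return chi2, chi2_sf(chi2, n_omitted)


# Hypothetical log likelihoods (not from the lecture data):
# the simpler model omits 2 variables that the complex model includes.
chi2, p = lr_test(ll_simple=-120.0, ll_complex=-115.0, n_omitted=2)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

If the printed p-value is below 0.05, the omitted variables jointly improve the fit. The same function covers the overall test by passing the initial (constant-only) log likelihood as `ll_simple`.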
Lecture Slide #4 More Logit Hypothesis Tests To test for the overall hypothesis that all b’s are equal to zero (like an overall F-test): –Compare the final log-likelihood with the initial one, using the same formula: Initial log likelihood = Final log likelihood = Difference = = 46.85, df=K-1; p-value > (see Hamilton p. 354)
Lecture Slide #5 Still More Logit Hypothesis Tests
z-statistic:
–Similar to the t-stat in OLS
–Compares the estimated coefficient to its estimated standard error
–P-value is derived from the standard normal distribution (equivalently, z squared follows a Chi-Square distribution with 1 degree of freedom)
Attached to each estimated coefficient
–The p-value shows the probability of obtaining an estimate at least this far from zero if the null hypothesis were true
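As a rough sketch of how the z-statistic and its two-sided p-value are obtained, the following Python uses only the standard library. The coefficient and standard error are hypothetical values chosen for illustration, and `z_test` is a name invented for this example.

```python
import math


def z_test(coef, se):
    """Return the z-statistic and its two-sided p-value under the standard normal."""
    z = coef / se
    # P(|Z| > |z|) for a standard normal equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical estimate: b = 0.5 with standard error 0.25
z, p = z_test(0.5, 0.25)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```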
Lecture Slide #6 Interpreting Logits
Logits can be used to directly calculate odds: odds = e^L = P/(1 - P)
Logits can be reversed to obtain the predicted probabilities: P = 1/(1 + e^(-L))
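A minimal Python sketch of both conversions follows. The function names are invented for this example, and the input logits are arbitrary illustrative values, not estimates from the course data.

```python
import math


def logit_to_odds(L):
    """Odds of Y = 1 implied by a logit L."""
    return math.exp(L)


def logit_to_prob(L):
    """Predicted probability of Y = 1 implied by a logit L."""
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    # Algebraically identical form that avoids overflow for very negative L
    e = math.exp(L)
    return e / (1.0 + e)


for L in (-2.0, 0.0, 2.0):
    print(f"L = {L:+.1f}  odds = {logit_to_odds(L):.3f}  P = {logit_to_prob(L):.3f}")
```

A logit of zero corresponds to even odds (1.0) and a probability of 0.5; the probability can also be recovered from the odds as odds/(1 + odds).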
Lecture Slide #7 Interpreting Logits, Continued
How would you calculate the effect of a particular independent variable, X_i, on the probability of Y = 1?
Set all other X_j’s at their means, then calculate P = 1/(1 + e^(-L)) with X_i at its minimum and at its maximum.
Then calculate the difference.
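The procedure above can be sketched in Python as follows. Everything numeric here is hypothetical (the intercept, coefficients, means, and variable names `x1` and `x2` are placeholders), and `min_max_effect` is a helper name made up for this illustration.

```python
import math


def inv_logit(L):
    """Reverse a logit into a probability."""
    return 1.0 / (1.0 + math.exp(-L))


def min_max_effect(intercept, coefs, means, var, lo, hi):
    """Change in P(Y=1) as `var` moves from lo to hi, other variables at their means."""
    def prob_at(value):
        L = intercept
        for name, b in coefs.items():
            x = value if name == var else means[name]
            L += b * x
        return inv_logit(L)
    return prob_at(hi) - prob_at(lo)


# Hypothetical model: L = -1.0 + 0.5*x1 - 1.0*x2
coefs = {"x1": 0.5, "x2": -1.0}
means = {"x1": 2.0, "x2": 0.4}
effect = min_max_effect(-1.0, coefs, means, var="x1", lo=0.0, hi=5.0)
print(f"Change in probability across the range of x1: {effect:.3f}")
```

Because the logistic curve is nonlinear, the size of this difference depends on where the other variables are held, which is why the slide fixes them at their means.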
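Lecture Slide #8 Reversing Logits in Stata: Illustrating the Effect of Certainty
First: calculate mean values for the “control” variables (ideology and sex) and store them
summarize ideology
scalar m_ideology = r(mean)
summarize sex
scalar m_sex = r(mean)
Use the mean values to generate L (run this right after the logit command, so that _b[] holds its coefficients)
generate L1 = _b[_cons] + _b[ideology]*scalar(m_ideology) + _b[sex]*scalar(m_sex) + _b[DR_cert]*DR_cert
Next: reverse the logit to get the predicted probability
generate Phat1 = 1/(1 + exp(-L1))
Now graph the relationship
graph twoway mspline Phat1 DR_cert, bands(50)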
Lecture Slide #9 Estimated Probability Effects
Lecture Slide #10 Interpreting Logits, Continued
How would you calculate the effect of a particular combination of independent variables on the probability of Y=1?
Set all X_j’s at the appropriate values, then calculate P = 1/(1 + e^(-L)) (e ≈ 2.718)
The result is the average probability for that “type” of respondent
Lecture Slide #11 Example: Effect of ideology, gender on probability of choosing the Linear model for standard setting
Model: choice (DR_standard) as a function of:
–Ideology, gender and certainty
Types
–A=conservative male; B=liberal female
–Set certainty at the average
–A: conservative, male, average level of certainty
Ideology=7, gender = 1, certainty=6.289
–B: liberal, female, average level of certainty
Ideology=1, gender = 0, certainty=6.289
Coding of the choice: 0=chose threshold, 1=chose linear
Lecture Slide #12 Logit Model Results L= *DR_cert *ideology *sex
Lecture Slide #13 Analyzing Types L = ( *(ideology)) + ( *(sex)) + ( *(certainty)) L Probability Conservative Males: (indep. vars.: 7; 1; ) Liberal Females: (indep. vars.: 1; 0; ) Hint: Use a spreadsheet to calculate L and P. In Excel, the formula for probability would be: P = 1/(1+EXP(-L)) Example from Scientist data
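The spreadsheet calculation can also be sketched in Python. The two profiles use the values given in the slides (ideology 7 or 1, sex 1 or 0, certainty at its average of 6.289), but the coefficients below are hypothetical placeholders, since the estimated values are not reproduced here; substitute the estimates from the logit output. The names `COEFS`, `TYPES`, `type_logit`, and `type_probability` are invented for this example.

```python
import math

# Hypothetical coefficients for illustration only; replace with the model estimates.
COEFS = {"const": -1.0, "ideology": -0.2, "sex": -0.5, "certainty": 0.3}

# Respondent "types" as described in the slides (certainty held at its average).
TYPES = {
    "Conservative male": {"ideology": 7, "sex": 1, "certainty": 6.289},
    "Liberal female": {"ideology": 1, "sex": 0, "certainty": 6.289},
}


def type_logit(profile, coefs):
    """L = constant + sum of coefficient * value for each variable in the profile."""
    return coefs["const"] + sum(coefs[name] * value for name, value in profile.items())


def type_probability(profile, coefs):
    """Same as the Excel formula P = 1/(1+EXP(-L))."""
    return 1.0 / (1.0 + math.exp(-type_logit(profile, coefs)))


for label, profile in TYPES.items():
    L = type_logit(profile, COEFS)
    P = type_probability(profile, COEFS)
    print(f"{label}: L = {L:.4f}, P = {P:.4f}")
```

Keeping the profiles in a dictionary makes it easy to add further types (for example, a conservative female) without changing the calculation.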
Lecture Slide #14 Estimates of Coefficient Strength In Excel, calculate the difference in probability for each X at its min and max, holding all other variables constant:
Lecture Slide #15 Estimated Logit Probabilities
Lecture Slide #16 Logit Diagnostics The most useful diagnostics are to match “influence” (case-wise dfbetas) with predicted probabilities:
Lecture Slide #17 Logit Outliers and Influence
In this instance, the high-influence cases are those who are uncertain liberal males or certain conservative females. These “bundles” of attributes make them harder to predict.
Lecture Slide #18 Coming Up...
Factor Analysis
Readings in the course e-reserves
–See the link on the class web page