Logistic Regression: Single and Multiple Predictors
Overview
- Defined: A model for predicting one variable from one or more other variables.
- Variables: IV(s) are continuous or categorical; the DV is dichotomous.
- Relationship: Prediction of group membership.
- Example: Can we predict bar passage from LSAT score (and/or GPA, etc.)?
- Assumptions: Absence of multicollinearity (linearity and normality are not assumed).
Comparison to Linear Regression
- Because the outcome is dichotomous, linear regression cannot be used: the relationship is not linear.
- With a dichotomous outcome, we are now talking about probabilities (of 0 or 1).
- So logistic regression is about predicting the probability of the outcome occurring.
Comparison to Linear Regression
- Logistic regression is based on the odds ratio: the probability of an event divided by the probability of the non-event.
- For example, if Exp(b) = 2, then a one-unit change in the predictor makes the event twice as likely to occur (e.g., odds of .67/.33).
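A quick arithmetic sketch of the odds computation (plain Python; the .67/.33 numbers come from the slide above):

```python
# Odds = probability of the event / probability of the non-event.
p_event = 0.67
p_nonevent = 1 - p_event
odds = p_event / p_nonevent
print(odds)  # ~2.03: the event is about twice as likely as the non-event
```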
Comparison to Linear Regression
- Single predictor: P(Y) = 1 / (1 + e^-(b0 + b1X1))
- Multiple predictors: P(Y) = 1 / (1 + e^-(b0 + b1X1 + b2X2 + ... + bnXn))
- Notice that the exponent contains the linear regression equation; e is the base of the natural logarithm (about 2.718).
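To make the equations concrete, here is a minimal Python sketch of the logistic function. The coefficient and predictor values are hypothetical, chosen only to illustrate the calculation:

```python
import math

def predicted_probability(b0, bs, xs):
    """P(Y) = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))."""
    z = b0 + sum(b * x for b, x in zip(bs, xs))  # the linear regression part
    return 1 / (1 + math.exp(-z))

# Single predictor (hypothetical values)
print(predicted_probability(-1.5, [0.8], [2.0]))
# Multiple predictors (hypothetical values)
print(predicted_probability(-1.5, [0.8, 0.3], [2.0, 1.0]))
```

Whatever the value of the exponent, the result is always squeezed between 0 and 1, which is why the logistic form works for probabilities where a straight line does not.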
Comparison to Linear Regression
- Linear: the measure of fit is the sum of squares (summing the squared differences between the line and the actual outcomes).
- Logistic: the measure of fit is the log-likelihood (summing the logged probabilities associated with the predicted and actual outcomes).
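A small sketch of how the log-likelihood is computed from actual outcomes and predicted probabilities (the y and p values below are hypothetical):

```python
import math

def log_likelihood(y_actual, p_predicted):
    """Sum over cases of y*ln(p) + (1-y)*ln(1-p)."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(y_actual, p_predicted))

y = [1, 0, 1, 1]           # actual outcomes (hypothetical)
p = [0.8, 0.3, 0.6, 0.9]   # model's predicted probabilities (hypothetical)
ll = log_likelihood(y, p)
print(ll)        # closer to 0 = better fit
print(-2 * ll)   # the -2LL statistic reported in logistic output
```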
Comparison to Linear Regression
- Linear: overall variance explained by R².
- Logistic: overall "variance explained" by...
  - -2LL (the log-likelihood multiplied by -2; higher means worse fit)
  - R²CS (Cox and Snell's statistic, for comparison to a baseline model)
  - R²N (Nagelkerke's statistic, a variation of R²CS)
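Both pseudo-R² statistics can be computed directly from the model and baseline log-likelihoods. A sketch using the standard Cox and Snell and Nagelkerke formulas (the example numbers are hypothetical):

```python
import math

def cox_snell_r2(ll_model, ll_baseline, n):
    """R2_CS = 1 - exp((2/n) * (LL_baseline - LL_model))."""
    return 1 - math.exp(2 * (ll_baseline - ll_model) / n)

def nagelkerke_r2(ll_model, ll_baseline, n):
    """Rescales R2_CS so its maximum possible value is 1."""
    r2_cs = cox_snell_r2(ll_model, ll_baseline, n)
    r2_max = 1 - math.exp(2 * ll_baseline / n)
    return r2_cs / r2_max

# Hypothetical log-likelihoods for a sample of n = 200 cases
print(cox_snell_r2(-110.0, -130.0, 200))
print(nagelkerke_r2(-110.0, -130.0, 200))
```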
NOTE: There is no direct analog of R² in logistic analysis. An R² measure seeks to make a statement about the "percent of variance explained," but the variance of a dichotomous or categorical dependent variable depends on the frequency distribution of that variable. For a dichotomous dependent variable, variance is at a maximum for a 50-50 split, and the more lopsided the split, the lower the variance. This means that R² measures from logistic analyses with differing marginal distributions of their dependent variables cannot be compared directly, and comparing logistic R² measures with R² from OLS regression is also problematic. Nonetheless, a number of logistic "pseudo" R² measures have been proposed, all of which should be reported as approximations to OLS R², NOT as actual percent of variance explained.
Comparison to Linear Regression
- Linear: unique contribution of each variable via...
  - unstandardized b (for the regression equation)
  - standardized beta (for interpretation, similar to r)
  - significance level (t-test)
- Logistic: unique contribution of each variable via...
  - unstandardized b (for the logistic equation)
  - Exp(b) (for interpretation, as an odds ratio)
  - significance level (Wald statistic, tested against a chi-square distribution)
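As a sketch of how these quantities are obtained in practice, here is one way to fit a logistic model with Python's statsmodels. The file name and column names (admissions.csv, admit, gre, gpa) are assumptions standing in for the slides' data, not part of the original example:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("admissions.csv")        # hypothetical data file
X = sm.add_constant(df[["gre", "gpa"]])   # add the intercept term
model = sm.Logit(df["admit"], X).fit()

print(model.params)          # unstandardized b (log-odds scale)
print(np.exp(model.params))  # Exp(b): the odds ratios
print(model.pvalues)         # Wald-based significance tests
```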
Comparison to Linear Regression
Example: interpreting logistic output for predicting graduate school admission from gre, gpa, and topnotch:
(1) Both gre and gpa are significant predictors, while topnotch is not.
(2) For a one-unit increase in gpa, the log odds of being admitted to graduate school (vs. not being admitted) increase by .668.
(3) For a one-unit increase in gpa, the odds of being admitted (vs. not being admitted) increase by a factor of exp(.668), about 1.95.
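Point (3) is simply point (2) exponentiated; the conversion from the slide's .668 coefficient:

```python
import math
print(math.exp(0.668))  # ~1.95: odds multiply by ~1.95 per one-unit gpa increase
```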
Comparison to Linear Regression
- Linear: each variable's relationship (without controlling for the others) is given by the bivariate correlation.
- Logistic: each variable's relationship (without controlling for the others) is shown in the logistic output. [Output table not reproduced here.]
Comparison to Linear Regression
- Linear: different methods of entry... Entry, Hierarchical, Stepwise.
- Logistic: different methods of entry...
  - Entry (same as with linear regression)
  - Hierarchical (same as with linear regression; see the sketch after this list)
  - Stepwise (see Field's textbook, page 226)
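A sketch of hierarchical (blockwise) entry in Python: fit the smaller model, add a block of predictors, and test the improvement in fit with a likelihood-ratio (chi-square) test on the change in -2LL. The file and variable names are the same hypothetical ones used above:

```python
import pandas as pd
import statsmodels.api as sm
from scipy import stats

df = pd.read_csv("admissions.csv")   # hypothetical data file

# Block 1: gre only; Block 2: add gpa
block1 = sm.Logit(df["admit"], sm.add_constant(df[["gre"]])).fit()
block2 = sm.Logit(df["admit"], sm.add_constant(df[["gre", "gpa"]])).fit()

lr = 2 * (block2.llf - block1.llf)          # change in -2LL between blocks
df_diff = block2.df_model - block1.df_model # predictors added in block 2
print(stats.chi2.sf(lr, df_diff))           # p-value for the added block
```

A significant chi-square here means the second block of predictors improves the model beyond what the first block achieved.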