Logistic Regression. Dan Hedlin, Statistics Sweden, September 2004
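Binary Y variable (0 or 1). Examples: contracting cancer or not, being over or under a poverty line, response or nonresponse. In ordinary regression Y is not limited to a bounded range, so a 0/1 outcome cannot be modelled directly. Trick: let p = P(Y = 1) be the probability of cancer, etc., and model the log-odds $\log\frac{p}{1-p}$, which is unbounded, as a linear function of x.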
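Alternative expressions. Common notation: $\operatorname{logit}(p) = \log\frac{p}{1-p} = \alpha + \beta x$. Equivalent: $p = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} = \frac{1}{1 + e^{-(\alpha + \beta x)}}$.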
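As a small numerical check that the log-odds form and the probability form describe the same model, the sketch below converts back and forth between the two. The function names `logit` and `inv_logit` and the coefficient values are illustrative assumptions, not taken from the slides.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical intercept and slope, chosen only for illustration.
alpha, beta = -2.0, 0.5

for x in (0.0, 2.0, 4.0, 8.0):
    eta = alpha + beta * x   # linear predictor on the log-odds scale
    p = inv_logit(eta)       # the same model on the probability scale
    # Round trip: logit(p) recovers the linear predictor.
    assert abs(logit(p) - eta) < 1e-9
    print(f"x={x:4.1f}  log-odds={eta:5.2f}  p={p:.3f}")
```

Whatever value the linear predictor takes, the resulting p stays strictly between 0 and 1, which is the point of the transformation.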
Different scales: log-odds $\log\frac{p}{1-p}$ (additive effects); odds $p/(1-p)$ (multiplicative effects); probability $p$. Another difference from 'ordinary' regression: the estimates have no closed form, so fitting requires iterative computation, and numerical issues can arise.
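To illustrate what iterative computation means here, the following is a minimal sketch of Newton-Raphson maximisation of the log-likelihood for a model with an intercept and a single x. The function name `fit_logistic`, the starting values, the stopping rule, and the toy data are all assumptions made for illustration; statistical packages use more robust variants of the same idea.

```python
import math

def fit_logistic(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson for logit(p) = alpha + beta * x (sketch only)."""
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(alpha + beta * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), weights p(1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta

# Hypothetical toy data (not from the slides); the 0s and 1s overlap in x.
x_toy = [0, 1, 2, 3, 4, 5, 6, 7]
y_toy = [0, 0, 1, 0, 1, 0, 1, 1]
alpha_hat, beta_hat = fit_logistic(x_toy, y_toy)
print(alpha_hat, beta_hat)
```

One well-known numerical issue: if the 0s and 1s are perfectly separated by x, no finite maximum exists and the iterations drift towards infinite estimates, so a sketch like this would stop at `max_iter` without converging.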
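Interpretation of parameters: the 'base probability' is $p = \frac{e^{\alpha}}{1 + e^{\alpha}}$, where $\alpha$ is the intercept, i.e. the probability when all x are zero. The intercept is therefore probably most interpretable when the x are interval-scaled variables and the zero point is meaningful.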
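Interpretation of β. One auxiliary variable: $\log\frac{p}{1-p} = \alpha + \beta x$. So the odds are $\frac{p}{1-p} = e^{\alpha} e^{\beta x}$. Hence an additive one-step increment of x gives a multiplicative effect on the odds, with factor $e^{\beta}$.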
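The multiplicative effect can be verified numerically. In the sketch below the intercept and slope are hypothetical values, and `odds` is a helper name introduced only for this illustration; the ratio of the odds at x + 1 to the odds at x equals exp(beta) regardless of x.

```python
import math

# Hypothetical intercept and slope, for illustration only.
alpha, beta = -1.0, 0.7

def odds(x):
    """Odds p / (1 - p) under logit(p) = alpha + beta * x."""
    p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
    return p / (1.0 - p)

for x in (0, 1, 2, 5):
    ratio = odds(x + 1) / odds(x)
    # The odds ratio for a one-step increment does not depend on x.
    assert abs(ratio - math.exp(beta)) < 1e-9
    print(f"x={x}: odds={odds(x):.4f}, odds(x+1)/odds(x)={ratio:.4f}")
```

The change in the probability p for the same increment is not constant in x, which is why effects are usually reported on the odds or log-odds scale.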
Classical example: Bliss (1935), also in Agresti (1990), 'Categorical Data Analysis', Wiley, section 4.5.3. Beetles, two interval-scaled variables: y = dead/survived, x = log(dose of carbon disulphide). There are other models for a binary y that may be better in some cases; logistic regression is the most common.
Model fitting. Tabulate low/high risk against each variable separately. Are there cells with zero observations? Make a first selection of variables, e.g. by forward selection at the 0.25 significance level. Then test each remaining variable separately. For continuous variables, examine linearity by dividing the continuous variable into groups and computing the log-odds within each group. Test interaction effects. Consider subject-matter knowledge throughout.
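The linearity check for a continuous variable can be sketched as follows: sort the observations by x, split them into groups, and compute the empirical log-odds in each group. The function name `grouped_log_odds`, the equal-size grouping rule, and the data are assumptions for illustration. A group with no events or no non-events (a zero cell) has no finite log-odds, so it is reported as `None` rather than patched with a correction.

```python
import math

def grouped_log_odds(x, y, n_groups):
    """Empirical log-odds of y = 1 within groups of sorted x.

    Assumes len(x) >= n_groups. Returns one tuple per group:
    (mean x, events, non-events, log-odds or None).
    """
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any leftover observations.
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:end]
        events = sum(yi for _, yi in chunk)
        non_events = len(chunk) - events
        mean_x = sum(xi for xi, _ in chunk) / len(chunk)
        if events == 0 or non_events == 0:
            log_odds = None  # zero cell: log-odds undefined
        else:
            log_odds = math.log(events / non_events)
        rows.append((mean_x, events, non_events, log_odds))
    return rows

# Hypothetical data, not from the slides.
x_obs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y_obs = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]

for mean_x, events, non_events, lo in grouped_log_odds(x_obs, y_obs, 3):
    print(mean_x, events, non_events, lo)
```

If the within-group log-odds plotted against the group means of x fall roughly on a straight line, entering x linearly in the model is reasonable; a clear curve suggests transforming x or entering it in categories.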