Logistic regression
  One of the most common types of modeling in the biomedical literature
    Especially case-control studies
  Used when the outcome is binary (yes/no)
  Produces (functions of) odds ratios
  Can be simple (one predictor) or multiple (two or more predictors)
Logistics of logistic regression
  ln(odds) = β0 + β1X1 + β2X2 + …
    (Similar to the equation for linear regression: Y = β0 + β1X1 + β2X2 + … + ε, but with no additive error term: the randomness comes from the binary outcome itself)
  Why ln(odds)?
    Probability can only be between 0 and 1 (severely non-normal).
    Odds can only be between 0 and infinity (not normal).
    “Log odds” can take on any value (−∞ to +∞) and is normal-ish enough.
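The three scales above can be compared directly. The following is a minimal sketch using only the Python standard library; the function names (`odds`, `log_odds`, `inv_logit`) and the probabilities in the loop are chosen here for illustration and are not from the lecture.

```python
import math

def odds(p):
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1 - p)

def log_odds(p):
    """Convert a probability to ln(odds), also called the logit."""
    return math.log(odds(p))

def inv_logit(z):
    """Convert ln(odds) back to a probability."""
    return 1 / (1 + math.exp(-z))

# Illustrative probabilities only: probability is confined to (0, 1),
# odds to (0, infinity), while ln(odds) ranges over the whole real line
# and is symmetric around 0 (p = 0.5).
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"p={p:.2f}  odds={odds(p):8.3f}  ln(odds)={log_odds(p):+.3f}")
```

Because ln(odds) is unbounded in both directions, a linear combination of predictors can never produce an impossible value on that scale, which is one way to see why the model is written in terms of ln(odds).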
Logistic regression example
  From Lesko SM et al. JAMA 1993;269:998-1003
  Where does this odds ratio come from?
Where does OR come from?
  From Lesko SM et al. JAMA 1993;269:998-1003
  ln(odds) = β0 + 0.69*[level 4 baldness (y/n)] + β2*age + β3*race + β4*religion + β5*education + β6*BMI + β7*alcohol + more βs
  The OR for level 4 baldness is the exponentiated coefficient: exp(0.69) ≈ 2.0
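To see why the odds ratio depends only on the baldness coefficient, here is a small sketch. Only the value 0.69 comes from the equation above; the variable `rest`, which stands in for β0 plus all the other terms, takes made-up values purely to show that they cancel.

```python
import math

BETA_BALDNESS = 0.69  # coefficient for level 4 baldness from the equation above

def ln_odds(bald, rest):
    """ln(odds) for one person.

    `bald` is 1 (yes) or 0 (no). `rest` is a hypothetical stand-in for
    beta0 plus every other term in the model (age, race, BMI, ...).
    """
    return rest + BETA_BALDNESS * bald

# Arbitrary, made-up values for the other terms: the ratio is the same each time.
for rest in (-3.0, -1.0, 0.5):
    odds_bald = math.exp(ln_odds(1, rest))
    odds_not_bald = math.exp(ln_odds(0, rest))
    print(f"rest={rest:+.1f}  OR={odds_bald / odds_not_bald:.2f}")

# The same number, obtained directly from the coefficient.
odds_ratio = math.exp(BETA_BALDNESS)
print(f"exp(0.69) = {odds_ratio:.2f}")
```

Whatever the other covariates contribute, they appear in both the numerator and the denominator and drop out, which is what "adjusted for" the other variables means in this model.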
Interpretation of ORs
  Odds ratios represent
    The ratio of the odds of getting the outcome, comparing exposed to unexposed
    The multiplicative increase in the odds of getting the outcome for each one-unit change in the exposure
  ln(odds of MI) = β0 + β1(height [in]), where β1 = 0.15
    This means that for each 1-inch increase in height, the odds of an MI are multiplied by exp(0.15) ≈ 1.16, a 16% increase
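The arithmetic behind the height example can be checked with a short sketch. The coefficient 0.15 is from the example above; the helper name `odds_multiplier` and the 5-inch comparison are additions made here for illustration.

```python
import math

beta_height = 0.15  # coefficient per inch of height, from the example above

def odds_multiplier(beta, delta=1.0):
    """Factor by which the odds change when the exposure changes by `delta` units."""
    return math.exp(beta * delta)

per_inch = odds_multiplier(beta_height)
percent_increase = (per_inch - 1) * 100
print(f"1 inch:   odds x {per_inch:.2f} ({percent_increase:.0f}% increase)")

# Effects multiply rather than add: a 5-inch difference is exp(5 * 0.15),
# which equals the per-inch factor raised to the 5th power, not 5 x 16%.
per_five = odds_multiplier(beta_height, 5)
print(f"5 inches: odds x {per_five:.2f}")
```

The multiplicative behavior is worth keeping in mind when reading ORs for continuous exposures: the reported value always refers to a specific unit of change.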
Things we talked about
  Correlation analysis (Pearson’s r)
  Compared linear regression and correlation
  Linear regression set up and interpretation
  Distinction between linear and logistic regression
  Multivariate regression and confounding
  Logistic regression set up and interpretation