Introduction to logistic regression and Generalized Linear Models July 14, 2011 Introduction to Statistical Measurement and Modeling Karen Bandeen-Roche, PhD Department of Biostatistics Johns Hopkins University
Data motivation Osteoporosis data Scientific question : Can we detect osteoporosis earlier and more safely? Some related statistical questions : How does the risk of osteoporosis vary as a function of measures commonly used to screen for osteoporosis? Does age confound the relationship of screening measures with osteoporosis risk? Do ultrasound and DPA measurements discriminate osteoporosis risk independently of each other?
Outline Why we need to generalize linear models Generalized Linear Model specification Systematic, random model components Maximum likelihood estimation Logistic regression as a special case of GLM Systematic model / interpretation Inference Example
Regression for categorical outcomes Why not just apply linear regression to categorical Y’s? Linear model (A1) will often be unreasonable. Assumption of equal variances (A3) will nearly always be unreasonable. Assumption of normality will never be reasonable
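A minimal sketch of the first objection, using made-up data (the ages and outcomes below are hypothetical, not the osteoporosis data): a least squares line fitted to a 0/1 outcome can return "probabilities" below 0 or above 1.

```python
# Hypothetical data: x = age in years, y = 1 if the event occurred, 0 otherwise.
x = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form simple least squares estimates of slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar


def ols_fitted(age):
    """Fitted value from the straight-line model; not constrained to [0, 1]."""
    return intercept + slope * age


for age in (30, 60, 95):
    print(age, round(ols_fitted(age), 3))
```

With these data the fitted value is negative at age 30 and exceeds 1 at age 95, so it cannot be read as a probability at either age.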
Introduction: Regression for binary outcomes Y i = 1{event occurs for sampling unit i} = 1 if the event occurs = 0 otherwise. p i = probability that the event occurs for sampling unit i := Pr{Y i = 1} Begin by generalizing random model (A5): Probability mass function: Bernoulli Pr{Y i = 1} = p i ; Pr{Y i = 0} = 1-p i all other y i occur with 0 probability
Binary regression By assuming Bernoulli: (A3) is definitely not reasonable Var(Y i ) = p i (1-p i ) Variance is not constant: rather a function of the mean Systematic model Goal remains to describe E[Y i |x i ] Expectation of Bernoulli Y i = p i To achieve a reasonable linear model (A1): describe some function of E[Y i |x i ] as a linear function of covariates g(E[Y i |x i ]) = x i ’ β Some common g: log, log{p/(1-p)}, probit
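The two ideas on this slide can be sketched in a few lines: the Bernoulli variance as a function of the mean, and the three link functions named above. The function names are illustrative choices, not notation from the lecture.

```python
import math
from statistics import NormalDist


def bernoulli_variance(p):
    """Var(Y) = p(1 - p): largest at p = 0.5, shrinking toward 0 and 1."""
    return p * (1 - p)


def log_link(p):
    return math.log(p)


def logit_link(p):
    """Log odds: log{p / (1 - p)}."""
    return math.log(p / (1 - p))


def probit_link(p):
    """Inverse of the standard normal cumulative distribution function."""
    return NormalDist().inv_cdf(p)


for p in (0.1, 0.5, 0.9):
    print(p, bernoulli_variance(p), log_link(p), logit_link(p), probit_link(p))
```

The logit and probit links map (0, 1) onto the whole real line, so any value of the linear predictor corresponds to a valid probability; the log link maps (0, 1) only onto the negative half-line.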
General framework: Generalized Linear Models Random model Y~a density or mass function, f Y, not necessarily normal Technical aside: f Y within the “exponential family” Systematic model g(E[Y i |x i ]) = x i ’ β = η i “g” = “link function”; “x i ’ β ” = “linear predictor” Reference: Nelder JA, Wedderburn RWM, Generalized linear models, JRSSA 1972; 135:
Types of Generalized Linear Models
Model (link function) | Response | Distribution | Regression coefficient interpretation
Linear | Continuous | Gaussian | Change in ave(Y) per unit change in X
Logistic | Binary | Binomial | Log odds ratio
Log-linear | Times to events / counts | Poisson | Log relative rate
Proportional hazards | Times to events | Semi-parametric | Log hazard
Estimation General method: Maximum likelihood (Fisher). Given $\{Y_1, \dots, Y_n\}$ distributed with joint density or mass function $f_Y(y; \theta)$, a likelihood function $L(\theta; y)$ is any function (of $\theta$) that is proportional to $f_Y(y; \theta)$. If sampling is random, $\{Y_1, \dots, Y_n\}$ are statistically independent, and $L(\theta; y) \propto$ product of the individual f. Estimation: $\hat{\beta}$ maximizes $L(\beta, a; y, X) = \prod_{i=1}^{n} f_Y(y_i; x_i, \beta, a)$.
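As an illustration of the independence statement, the following sketch evaluates a Bernoulli likelihood both as a product of individual mass functions and as a sum of logs. The outcomes and the single common probability p are hypothetical simplifications (no covariates).

```python
import math

# Hypothetical independent binary outcomes.
ys = [1, 0, 0, 1, 1, 0, 1, 1]


def likelihood(p, ys):
    """Product of the individual Bernoulli mass functions."""
    return math.prod(p if y == 1 else 1 - p for y in ys)


def log_likelihood(p, ys):
    """Sum of the individual log mass functions (numerically safer)."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in ys)


for p in (0.3, 0.5, 0.625, 0.8):
    print(p, likelihood(p, ys), log_likelihood(p, ys))
```

Working on the log scale turns the product into a sum, which is what software maximizes in practice.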
Maximum likelihood The maximum likelihood estimate (MLE), $\hat{\theta}$, maximizes $L(\theta; y)$: $\hat{\theta} = \arg\max_{\theta} L(\theta; y)$. Under broad assumptions, MLEs are asymptotically unbiased (consistent) and efficient (most precise / lowest variance).
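A short worked case may help fix ideas. Assuming the simplest setting, independent Bernoulli outcomes sharing one probability p and no covariates (a simplification not taken from the slides), the MLE can be derived in closed form:

```latex
\begin{align*}
\ell(p) &= \log L(p; y) = \sum_{i=1}^{n} \bigl\{ y_i \log p + (1 - y_i) \log (1 - p) \bigr\} \\
\frac{d\ell}{dp} &= \frac{\sum_{i=1}^{n} y_i}{p} - \frac{n - \sum_{i=1}^{n} y_i}{1 - p} = 0
\quad \Longrightarrow \quad
\hat{p} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}
\end{align*}
```

In this case the MLE is the sample proportion. With covariates there is generally no closed form and the maximization is carried out numerically.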
Logistic regression $Y_i$ binary with $p_i = \Pr\{Y_i = 1\}$. Example: $Y_i$ = 1{person i diagnosed with heart disease}. Simple logistic regression (1 covariate). Random Model: Bernoulli / Binomial. Systematic Model: $\log\{p_i/(1-p_i)\} = \beta_0 + \beta_1 x_i$ (log odds; logit($p_i$)). Parameter interpretation: $\beta_0$ = log(heart disease odds) in subpopulation with x=0; $\beta_1 = \log\{p_{x+1}/(1-p_{x+1})\} - \log\{p_x/(1-p_x)\}$, where $p_x$ denotes the event probability in the subpopulation with covariate value x.
Logistic regression Interpretation notes: $\beta_1 = \log\{p_{x+1}/(1-p_{x+1})\} - \log\{p_x/(1-p_x)\} = \log\left[\dfrac{p_{x+1}/(1-p_{x+1})}{p_x/(1-p_x)}\right]$. Hence $\exp(\beta_1) = \dfrac{p_{x+1}/(1-p_{x+1})}{p_x/(1-p_x)}$ = odds ratio for association of prevalent heart disease with each (say) one-year increment in age = factor by which the odds of heart disease increase / decrease with each 1-year cohort of age.
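To make the interpretation concrete, here is a small numerical check. The coefficient values are hypothetical, chosen only for illustration: the odds ratio for a one-year age difference is exp(β1) at every age, even though the difference in probabilities is not constant.

```python
import math

# Hypothetical coefficients: log odds = beta0 + beta1 * age.
beta0, beta1 = -5.0, 0.08


def prob(age):
    """Probability implied by the logistic model at a given age."""
    eta = beta0 + beta1 * age
    return 1 / (1 + math.exp(-eta))


def odds(age):
    p = prob(age)
    return p / (1 - p)


odds_ratio_50 = odds(51) / odds(50)
odds_ratio_70 = odds(71) / odds(70)
risk_diff_50 = prob(51) - prob(50)
risk_diff_70 = prob(71) - prob(70)

print(odds_ratio_50, odds_ratio_70, math.exp(beta1))
print(risk_diff_50, risk_diff_70)
```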
Multiple logistic regression Systematic Model: log{p i /(1- p i )}= β 0 + β 1 x i1 + … + β p x ip Parameter interpretation β 0 = log(heart disease odds) in subpopulation with all x =0 β j = difference in log outcome odds comparing subpopulations who differ by 1 on x j, and whose values on all other covariates are the same “Adjusting for,” “Controlling for” the other covariates One can define variables contrasting outcome odds differences between groups, nonlinear relationships, interactions, etc., just as in linear regression
Logistic regression - prediction Translation from $\eta_i$ to $p_i$: $\log\{p_i/(1-p_i)\} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} = \eta_i$. Then $p_i = \exp(\eta_i) / \{1 + \exp(\eta_i)\}$ = logistic function of $\eta_i$. Graph of $p_i$ versus $\eta_i$ has a sigmoid shape.
GLMs - Inference The negative inverse Hessian matrix of the log likelihood function characterizes Var($\hat{\beta}$) (adjunct). SE($\hat{\beta}_j$) is obtained as the square root of the jth diagonal entry, typically substituting $\hat{\beta}$ for $\beta$. “Wald” inference applies the paradigm from Lecture 2: $Z = (\hat{\beta}_j - \beta_{0j}) / SE(\hat{\beta}_j)$ is asymptotically ~ N(0,1) under $H_0: \beta_j = \beta_{0j}$. Z provides a test statistic for $H_0: \beta_j = \beta_{0j}$ versus $H_A: \beta_j \neq \beta_{0j}$. $\hat{\beta}_j \pm z_{1-\alpha/2}\, SE\{\hat{\beta}_j\} = (L, U)$ is a $(1-\alpha) \times 100\%$ CI for $\beta_j$. $\{\exp(L), \exp(U)\}$ is a $(1-\alpha) \times 100\%$ CI for $\exp(\beta_j)$.
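A minimal sketch of Wald inference for a single coefficient, assuming the estimate and its standard error have already been obtained from a fitted model. The numbers 0.08 and 0.03 and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def wald_inference(beta_hat, se, beta_null=0.0, alpha=0.05):
    """Wald Z statistic, two-sided p-value, and CIs on the log odds and odds ratio scales."""
    std_normal = NormalDist()
    z = (beta_hat - beta_null) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    lower = beta_hat - z_crit * se
    upper = beta_hat + z_crit * se
    return {
        "z": z,
        "p_value": p_value,
        "ci_beta": (lower, upper),
        "ci_odds_ratio": (math.exp(lower), math.exp(upper)),
    }


# Hypothetical estimate and standard error for an age coefficient.
result = wald_inference(beta_hat=0.08, se=0.03)
print(result)
```

Exponentiating the endpoints, rather than building an interval directly around exp(estimate), keeps the odds ratio interval inside the positive numbers.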
GLMs: “Global” Inference Analog: F-testing in linear regression. The only difference: log likelihoods replace SS. Hypothesis to be tested is $H_0: \beta_{j1} = \dots = \beta_{jp_j} = 0$. Fit the model excluding $x_{j1}, \dots, x_{jp_j}$: save -2 log likelihood = $L_S$. Fit the “full” (or larger) model adding $x_{j1}, \dots, x_{jp_j}$ to the smaller model: save -2 log likelihood = $L_L$. Test statistic: $S = L_S - L_L$. Distribution under the null hypothesis: $\chi^2_{p_j}$. Define the rejection region based on this distribution, compute S, and reject or not according to whether S is in the rejection region.
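The procedure can be sketched for the simplest nested pair: an intercept-only model versus a model adding one binary covariate. The counts are hypothetical. The example relies on two simplifications: with a single binary covariate the fitted probabilities are just the group proportions, so no iterative fitting is needed, and with 1 degree of freedom the chi-square tail probability can be computed from the normal distribution via `math.erfc`.

```python
import math

# Hypothetical counts: (events, total) in each group of a binary covariate.
groups = {"x=0": (10, 50), "x=1": (22, 50)}


def minus_two_log_lik(events, total, p):
    """-2 log likelihood contribution of `events` successes in `total` Bernoulli trials."""
    return -2 * (events * math.log(p) + (total - events) * math.log(1 - p))


all_events = sum(e for e, _ in groups.values())
all_total = sum(t for _, t in groups.values())

# Smaller model: one common probability, estimated by the overall proportion.
L_small = minus_two_log_lik(all_events, all_total, all_events / all_total)

# Larger model: a separate probability per group, estimated by group proportions.
L_large = sum(minus_two_log_lik(e, t, e / t) for e, t in groups.values())

S = L_small - L_large  # likelihood ratio statistic, 1 degree of freedom here

# For 1 df: Pr(chi-square > S) = Pr(|Z| > sqrt(S)) = erfc(sqrt(S / 2)).
p_value = math.erfc(math.sqrt(S / 2))

print(round(S, 3), round(p_value, 4))
```

S can be compared with 3.84, the 0.95 quantile of the chi-square distribution with 1 degree of freedom, for a test at the 5% level.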
GLMs: “Global” Inference Many programs refer to “deviance” rather than -2 log likelihood. This quantity equals the difference in -2 log likelihoods between one's fitted model and a “saturated model”. Deviance measures “fit”. Differences in deviances can be substituted for differences in -2 log likelihood in the method given on the previous page. Likelihood ratio tests have appealing optimality properties.
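A small sketch of why deviances can be substituted. It assumes ungrouped binary data, for which the saturated model fits each observation exactly and has log likelihood 0. The outcomes and the "larger model" probabilities are hypothetical illustrations, not the output of an actual fit.

```python
import math


def log_likelihood(ys, ps):
    """Bernoulli log likelihood for outcomes ys and fitted probabilities ps."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y, p in zip(ys, ps))


def deviance(ys, ps):
    """-2 * (log likelihood of fitted model - log likelihood of saturated model)."""
    saturated = log_likelihood(ys, [float(y) for y in ys])  # equals 0 for 0/1 data
    return -2 * (log_likelihood(ys, ps) - saturated)


ys = [0, 0, 1, 0, 1, 1, 0, 1]
ps_small = [0.5] * len(ys)                               # intercept-only fit (sample proportion)
ps_large = [0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8]   # hypothetical larger-model fit

deviance_diff = deviance(ys, ps_small) - deviance(ys, ps_large)
minus_two_ll_diff = -2 * log_likelihood(ys, ps_small) + 2 * log_likelihood(ys, ps_large)
print(deviance_diff, minus_two_ll_diff)
```

The saturated-model term is the same for both fitted models, so it cancels in the difference.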
Outline: A few more topics Model checking: Residuals, influence points ML can be written as an iteratively reweighted least squares algorithm Predictive accuracy Framework generalizes easily
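The iteratively reweighted least squares idea can be sketched for simple logistic regression (one covariate) in plain Python. The data are hypothetical, the covariate is centered and scaled for numerical convenience, and the 2x2 weighted least squares problem is solved in closed form. This is a minimal sketch, not production fitting code.

```python
import math

# Hypothetical data: age in decades, centered at 62.5 years; y = 1 if the event occurred.
x = [(age - 62.5) / 10 for age in (40, 45, 50, 55, 60, 65, 70, 75, 80, 85)]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = 0.0, 0.0
for iteration in range(25):
    eta = [b0 + b1 * xi for xi in x]
    p = [1 / (1 + math.exp(-e)) for e in eta]
    w = [pi * (1 - pi) for pi in p]                      # weights: Bernoulli variances
    z = [e + (yi - pi) / wi                              # working response
         for e, yi, pi, wi in zip(eta, y, p, w)]

    # Weighted least squares of z on (1, x): solve the 2x2 normal equations.
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swz = sum(wi * zi for wi, zi in zip(w, z))
    swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    det = sw * swxx - swx ** 2
    new_b0 = (swxx * swz - swx * swxz) / det
    new_b1 = (sw * swxz - swx * swz) / det

    converged = max(abs(new_b0 - b0), abs(new_b1 - b1)) < 1e-10
    b0, b1 = new_b0, new_b1
    if converged:
        break

# The inverse of X'WX from the last weighted fit estimates Var(beta-hat);
# its (slope, slope) entry is sw / det.
se_b1 = math.sqrt(sw / det)

# At the MLE the score equations hold: sum(y - p) = 0 and sum(x * (y - p)) = 0.
p_hat = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
score0 = sum(yi - pi for yi, pi in zip(y, p_hat))
score1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p_hat))
print(b0, b1, se_b1, score0, score1)
```

Each pass is an ordinary weighted regression; only the weights and the working response change between iterations.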
Main Points Generalized linear modeling provides a flexible regression framework for a variety of response types Continuous, categorical measurement scales Probability distributions tailored to the outcome Systematic model to accommodate Measurement range, interpretation Logistic regression Binary responses (yes, no) Bernoulli / binomial distribution Regression coefficients as log odds ratios for association between predictors and outcomes
Main Points Generalized linear modeling accommodates description, inference, adjustment with the same flexibility as linear modeling Inference “Wald” - statistical tests and confidence intervals via parameter estimator standardization “Likelihood ratio” / “global” – via comparison of log likelihoods from nested models