LOGISTIC REGRESSION A statistical procedure to relate the probability of an event to explanatory variables. Used in epidemiology to describe and evaluate the effect of a risk factor on the occurrence of a disease event. Example: Framingham Heart Study (coronary heart disease and blood pressure).
LOGISTIC REGRESSION: AN EXAMPLE Event: coronary heart disease (CHD). Occurrence is the dependent variable, which takes 2 values: Yes or No. Risk factor: blood pressure. Systolic blood pressure (SBP) is the independent variable X, a continuous measurement. The probability of getting coronary heart disease depends on blood pressure.
DATA
SCATTER PLOT
LINEAR REGRESSION FOR Prob.(CHD): NOT A GOOD IDEA!
PROPORTION WITH CHD BY SBP GROUP Systolic BP Range Proportion mmHg 0/ mmHg 2/ mmHg 3/3 1.00
LOGISTIC REGRESSION PROBABILITY MODEL $p(X) = \dfrac{1}{1 + \exp(-\beta_0 - \beta_1 X)}$ The probability of the event varies as an S-shaped function of the risk factor X: the logistic curve.
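As a rough illustration of the S-shaped curve, the probability model can be evaluated directly. The coefficient values below are hypothetical, chosen only to make the shape visible; they are not estimates from the Framingham data.

```python
import math

def logistic_probability(x, beta0, beta1):
    """Return p(x) = 1 / (1 + exp(-beta0 - beta1 * x))."""
    return 1.0 / (1.0 + math.exp(-beta0 - beta1 * x))

# Hypothetical coefficients (assumption, not fitted values).
BETA0, BETA1 = -6.0, 0.04

# The probability rises with SBP but always stays between 0 and 1.
for sbp in (100, 125, 150, 175, 200):
    print(sbp, round(logistic_probability(sbp, BETA0, BETA1), 3))
```

With these made-up values the curve crosses p = 0.5 where beta0 + beta1 * x = 0, i.e. at x = 150.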
LOGISTIC CURVE MODEL: OCCURRENCE OF CHD AS A FUNCTION OF SBP
LOGISTIC MODEL: LOG ODDS $\log \dfrac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$ The log of the odds of the event is a linear function of X. Log(odds of CHD) = $\beta_0 + \beta_1$ (SBP)
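The step from the probability model to the log-odds form is short. A worked derivation, writing $\eta$ (a shorthand introduced here, not on the slides) for the linear predictor $\beta_0 + \beta_1 X$:

```latex
\begin{align*}
p(X) &= \frac{1}{1 + e^{-\eta}}, \qquad \eta = \beta_0 + \beta_1 X \\
1 - p(X) &= \frac{e^{-\eta}}{1 + e^{-\eta}} \\
\frac{p(X)}{1 - p(X)} &= \frac{1}{e^{-\eta}} = e^{\eta} \\
\log \frac{p(X)}{1 - p(X)} &= \eta = \beta_0 + \beta_1 X
\end{align*}
```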
ODDS The odds of an event is the chance that the event occurs divided by the chance of its not occurring: Odds = p/(1 - p) = p/q
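A minimal sketch of the odds definition, with q = 1 - p. The function name and the example probabilities are illustrative choices, not taken from the slides.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

# A probability of 0.5 corresponds to even odds; higher probabilities
# give odds above 1, lower probabilities give odds below 1.
for p in (0.2, 0.5, 0.75):
    print(p, odds(p))
```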
$\beta_1$: KEY PARAMETER OF THE LOGISTIC MODEL $\log \dfrac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$ The parameter $\beta_1$ is like the slope of a linear regression model. $\beta_1 = 0$ indicates that X has no effect on the probability, e.g., a man's chance of CHD does not depend on his SBP.
$\beta_1$: KEY PARAMETER $\log \dfrac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X$ The coefficient $\beta_1$ measures the amount of change in the log of the odds per unit change in X.
$\beta_1$: KEY PARAMETER log odds(X+1) = $\beta_0 + \beta_1 (X+1) = \beta_0 + \beta_1 X + \beta_1$; log odds(X) = $\beta_0 + \beta_1 X$. Difference in log odds = $\beta_1$. E.g., the log of the odds of getting CHD increases by 0.0243 for an increase of 1 mmHg of systolic blood pressure. (Hard to explain to a patient!)
THE COEFFICIENT $\beta_1$ AND THE ODDS RATIO The difference in log odds given by $\beta_1$ translates into the odds ratio (OR). $\exp(\beta_1)$ = OR = ratio of the odds at risk level X+1 to the odds at risk level X. $\beta_1 = 0 \iff$ OR = 1.
THE COEFFICIENT $\beta_1$ AND THE ODDS RATIO For example, the odds of CHD are multiplied by the factor exp(0.0243) = 1.025 for every increase of 1 mmHg in SBP. A difference of 10 mmHg multiplies the odds of CHD by $(1.025)^{10}$, or about 1.28.
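The odds-ratio arithmetic can be checked with a few lines. This sketch uses the coefficient 0.0243 quoted on the slide; the function name `odds_ratio` is an illustrative choice.

```python
import math

BETA1 = 0.0243  # change in log odds per 1 mmHg of SBP, as quoted on the slide

def odds_ratio(beta1, delta_x=1.0):
    """Odds ratio for a change of delta_x units in X: exp(beta1 * delta_x)."""
    return math.exp(beta1 * delta_x)

or_1 = odds_ratio(BETA1)        # per 1 mmHg
or_10 = odds_ratio(BETA1, 10)   # per 10 mmHg, from the unrounded coefficient
or_10_rounded = 1.025 ** 10     # per 10 mmHg, from the rounded OR on the slide

print(round(or_1, 4), round(or_10, 3), round(or_10_rounded, 3))
```

Raising the rounded per-unit OR to the 10th power gives a slightly larger value than exponentiating 10 times the coefficient; the gap is rounding error only.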
ESTIMATION OF THE PARAMETERS Technique: maximum likelihood estimation. For large sample sizes, the normal distribution is used to put a confidence interval around the estimate of the coefficient $\beta_1$.
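One way maximum likelihood estimation might be carried out for a single risk factor is Newton-Raphson iteration on the log-likelihood. The sketch below is an illustration under stated assumptions, not the procedure used for the results on these slides: the data are made up, the function name is hypothetical, and the 95% level (normal quantile 1.96) is an assumed choice.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit log-odds = b0 + b1*x by maximum likelihood (Newton-Raphson).

    Returns (b0, b1, se_b1); se_b1 is the large-sample standard error of b1
    from the inverse of the information matrix.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for b0
            g1 += (y - p) * x      # score for b1
            i00 += w               # information matrix entries
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * step = score.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    # Information from the last pass; essentially unchanged at convergence.
    se_b1 = math.sqrt(i00 / det)
    return b0, b1, se_b1

# Made-up data for illustration only (not the Framingham data).
sbp = [110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
chd = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, se_b1 = fit_logistic(sbp, chd)

# Normal-approximation interval for b1, then exponentiate to get an OR interval.
Z_95 = 1.96  # assumed 95% confidence level
ci_low = math.exp(b1 - Z_95 * se_b1)
ci_high = math.exp(b1 + Z_95 * se_b1)
print("b1 =", round(b1, 4), "OR =", round(math.exp(b1), 3),
      "CI =", (round(ci_low, 3), round(ci_high, 3)))
```

With only ten observations the normal approximation is rough; the slide's large-sample caveat applies.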
HYPOTHESIS TESTING $H_0$: $\beta_1 = 0$. No difference in risk at different levels of the risk factor X. No association between risk factor X and probability of occurrence.
HYPOTHESIS TESTING $H_a$: $\beta_1 \neq 0$, or $\beta_1 > 0$ (risk increases with X), or $\beta_1 < 0$ (risk goes down as X increases).
HYPOTHESIS TESTING $H_0$: OR = 1. $H_a$: OR $\neq$ 1, or OR > 1 (risk increases with X), or OR < 1 (X is protective).
RESULTS OF LOGISTIC REGRESSION The OR with its confidence interval and p value indicates whether there is a significant association between the level of the risk factor and the chance of occurrence. OR = 1.025 (1.015, 1.034), p < 0.001
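The reported interval and p value can be related through a large-sample (Wald) calculation. The slides do not name the test or the confidence level, so this sketch assumes a Wald test and a 95% interval. It uses the coefficient 0.0243 quoted earlier and backs out an approximate standard error from the interval (1.015, 1.034).

```python
import math

def wald_from_ci(beta1, or_low, or_high, z_crit=1.96):
    """Approximate SE, z statistic, and two-sided p value from an OR interval.

    Assumes the interval is exp(beta1 -/+ z_crit * SE), i.e. a normal-
    approximation interval on the log-odds scale (z_crit=1.96 for 95%).
    """
    se = (math.log(or_high) - math.log(or_low)) / (2.0 * z_crit)
    z = beta1 / se
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_two_sided

se, z, p_value = wald_from_ci(0.0243, 1.015, 1.034)
print("SE ~", round(se, 5), "z ~", round(z, 2), "p < 0.001:", p_value < 0.001)
```

Because the interval excludes 1, the test of OR = 1 rejects at the matching level; the two summaries carry the same information.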
RESULTS OF LOGISTIC REGRESSION The fitted model can be used to predict an individual's risk, e.g., the prob. of CHD when SBP = 180: p/q = exp{$\beta_0 + \beta_1$(180)}. Solve for p: prob. of CHD = 0.125
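The "solve for p" step is p = odds / (1 + odds). In this sketch the log odds are passed in directly rather than computed from coefficients, so no fitted values are assumed. The example input is simply the log odds that corresponds to the slide's answer of 0.125.

```python
import math

def probability_from_log_odds(log_odds):
    """Invert log(p / (1 - p)) = log_odds to recover p."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

# A probability of 0.125 corresponds to odds of 0.125 / 0.875 = 1/7.
log_odds_example = math.log(1.0 / 7.0)
print(round(probability_from_log_odds(log_odds_example), 3))
```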
MULTIVARIATE LOGISTIC REGRESSION Model with additional risk factors: $\log \dfrac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ Log(odds of CHD) = $\beta_0 + \beta_1$ (SBP) + $\beta_2$ (CHOL) + $\beta_3$ (smoker)
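A small sketch of how the multivariable model combines risk factors. Every coefficient below is a hypothetical placeholder, not a fitted value, and smoker is assumed to be coded 1 = yes, 0 = no.

```python
import math

# Hypothetical coefficients for illustration only (assumption).
COEF = {"intercept": -8.0, "sbp": 0.02, "chol": 0.005, "smoker": 0.6}

def log_odds_chd(sbp, chol, smoker, coef=COEF):
    """Linear predictor: b0 + b1*SBP + b2*CHOL + b3*smoker."""
    return (coef["intercept"] + coef["sbp"] * sbp
            + coef["chol"] * chol + coef["smoker"] * smoker)

def probability(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two people with the same SBP and cholesterol who differ only in smoking.
non_smoker = log_odds_chd(sbp=140, chol=220, smoker=0)
smoker = log_odds_chd(sbp=140, chol=220, smoker=1)

# Holding the other factors fixed, the odds ratio for smoking is exp(b3).
adjusted_or = math.exp(smoker - non_smoker)
print(round(probability(non_smoker), 3), round(probability(smoker), 3),
      round(adjusted_or, 3))
```

Each coefficient is read as in the single-factor model, except that its odds ratio compares people with the same values of the other risk factors.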