1 PH 240A: Chapter 12. Mark van der Laan, University of California, Berkeley (Slides by Nick Jewell)
2 Regression Models: Motivation
- Stratification methods break down (with respect to precision) when the number of strata is large and sample sizes are moderate.
- Stratification leads to high-degree-of-freedom tests (for interaction, for example) and therefore low power.
- Refined measures of exposure lead to high-degree-of-freedom tests for association and therefore low power.
- We want parsimonious descriptions of patterns of risk.
- Regression models: how to diet with respect to degrees of freedom.
3 Regression Models: Linking Effects
[Figure: P(D | E = x) plotted against X, a numerical measure of exposure E, at exposure levels x1, x2, x3, x4.]
4 Linear Models (think CHD and body weight)
Form of model: P(D | X = x) = a + bx
Interpretation of model parameters a and b:
- a = P(D | X = 0), the risk at X = 0
- b = excess risk (ER) associated with a unit increase in X
- b(x* - x) = ER associated with an increase in X from x to x*
Choose the X = 0 value carefully!
Choose the scale of X carefully!
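As a minimal sketch of this interpretation, assuming the linear form P(D | X = x) = a + bx, the Python below computes the excess risk for a change in exposure. The coefficient values and the 150 lb centering point are hypothetical, chosen only for illustration; they are not estimates from the course data.

```python
def linear_risk(x, a, b):
    """Risk under the assumed linear model P(D | X = x) = a + b*x."""
    return a + b * x


def excess_risk(x, x_star, b):
    """Excess risk for moving exposure from x to x_star: b * (x_star - x)."""
    return b * (x_star - x)


# Hypothetical coefficients (illustration only, not fitted values).
# X is body weight in lbs centered at 150, so a is the risk at 150 lbs
# rather than the risk at a meaningless weight of 0 lbs.
a, b = 0.05, 0.001

er_20lb = excess_risk(0, 20, b)                      # 150 lbs to 170 lbs
diff = linear_risk(20, a, b) - linear_risk(0, a, b)  # same quantity from the two risks
print(er_20lb, diff)
```

Rescaling X (say, to units of 10 lbs) changes the numerical value of b but not the excess risk for the same physical change in exposure, which is why the scale and zero point deserve care.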
5 Linear Models
Pros:
- Good for modeling Excess Risk
Cons:
- Can’t be applied to case-control data
- Can predict probabilities below 0 or above 1
6 Log Linear Models
Form of model: log P(D | X = x) = a + bx
Interpretation of model parameters a and b:
- e^a = P(D | X = 0), the risk at X = 0
Choose the X = 0 value carefully!
7 Log Linear Models
Interpretation of model parameters a and b:
- e^b = relative risk (RR) associated with a unit increase in X
- e^(b(x* - x)) = RR associated with an increase in X from x to x*
Choose the scale of X carefully!
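A short sketch of the relative-risk interpretation, assuming the log linear form log P(D | X = x) = a + bx. The coefficients are hypothetical and serve only to show that the ratio of two predicted risks equals e^(b(x* - x)).

```python
import math


def log_linear_risk(x, a, b):
    """Risk under the assumed log linear model: P(D | X = x) = exp(a + b*x)."""
    return math.exp(a + b * x)


def relative_risk(x, x_star, b):
    """Relative risk for moving exposure from x to x_star: exp(b * (x_star - x))."""
    return math.exp(b * (x_star - x))


# Hypothetical coefficients (illustration only): baseline risk 0.05 at X = 0.
a, b = math.log(0.05), 0.01

rr = relative_risk(0, 20, b)
ratio = log_linear_risk(20, a, b) / log_linear_risk(0, a, b)
print(rr, ratio)  # the two values agree up to floating-point rounding
```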
8 Log Linear Models
Pros:
- Good for modeling Relative Risk
Cons:
- Can’t be applied to case-control data
- Can predict probabilities > 1
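The range problem can be seen numerically. The sketch below assumes the forms P(D | X = x) = a + bx (linear) and log P(D | X = x) = a + bx (log linear), with hypothetical coefficients picked so that the predictions leave the interval [0, 1] at extreme exposures.

```python
import math


def linear_risk(x, a, b):
    """Assumed linear model: P(D | X = x) = a + b*x."""
    return a + b * x


def log_linear_risk(x, a, b):
    """Assumed log linear model: P(D | X = x) = exp(a + b*x)."""
    return math.exp(a + b * x)


def is_valid_probability(p):
    """True when p lies in the closed interval [0, 1]."""
    return 0.0 <= p <= 1.0


# Hypothetical coefficients (illustration only).
a_lin, b_lin = 0.05, 0.01
a_log, b_log = math.log(0.05), 0.05

for x in (-10, 0, 50, 100):
    p_lin = linear_risk(x, a_lin, b_lin)
    p_log = log_linear_risk(x, a_log, b_log)
    print(x, p_lin, is_valid_probability(p_lin), p_log, is_valid_probability(p_log))
```

The exponential keeps the log linear prediction positive, so it can only fail on the upper side, while the linear prediction can fail on either side.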
9 Logistic Regression Models
Form of model: log[P(D | X = x) / (1 - P(D | X = x))] = a + bx
Interpretation of model parameters a and b:
- e^a = odds of D at X = 0
Choose the X = 0 value carefully!
10 Logistic Regression Models
Interpretation of model parameters a and b:
- e^b = odds ratio (OR) associated with a unit increase in X
- e^(b(x* - x)) = OR associated with an increase in X from x to x*
Choose the scale of X carefully!
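A minimal sketch of the odds-ratio interpretation, assuming the logistic form log[P(D | X = x) / (1 - P(D | X = x))] = a + bx. The coefficients are hypothetical; the point is only that the ratio of predicted odds equals e^(b(x* - x)) and that predicted risks stay strictly between 0 and 1.

```python
import math


def logistic_risk(x, a, b):
    """Risk under the assumed logistic model: 1 / (1 + exp(-(a + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def odds_ratio(x, x_star, b):
    """Odds ratio for moving exposure from x to x_star: exp(b * (x_star - x))."""
    return math.exp(b * (x_star - x))


# Hypothetical coefficients (illustration only).
a, b = -3.0, 0.02

or_20 = odds_ratio(0, 20, b)
ratio = odds(logistic_risk(20, a, b)) / odds(logistic_risk(0, a, b))
print(or_20, ratio)  # the two values agree up to floating-point rounding
```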
11 Logistic Regression Models
Pros:
- Good for modeling Odds Ratio
- Can be applied to case-control data
- Predicted probabilities always lie between 0 and 1
Cons:
- Harder to interpret
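One way to see why the odds ratio survives case-control sampling is to compare a full cohort with a sample that keeps every case but only a fraction of the non-cases. The counts below are hypothetical, and the 10% sampling is applied exactly (as expected counts) rather than at random, so this is an idealized sketch rather than a simulation.

```python
def odds_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Cross-product odds ratio for a 2x2 table."""
    return (exp_cases * unexp_noncases) / (exp_noncases * unexp_cases)


def risk_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Ratio of the proportions of cases among exposed and unexposed."""
    risk_exposed = exp_cases / (exp_cases + exp_noncases)
    risk_unexposed = unexp_cases / (unexp_cases + unexp_noncases)
    return risk_exposed / risk_unexposed


# Hypothetical cohort: (exposed cases, exposed non-cases,
#                       unexposed cases, unexposed non-cases).
cohort = (30, 970, 10, 990)

# Idealized case-control sample: all cases, exactly 10% of non-cases
# from each exposure group.
case_control = (30, 97, 10, 99)

print("OR:", odds_ratio(*cohort), odds_ratio(*case_control))
print("RR:", risk_ratio(*cohort), risk_ratio(*case_control))
```

The odds ratio is the same in both tables, while the ratio of case proportions computed from the case-control sample no longer matches the cohort relative risk.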
12 Logistic Regression Models
13 US Infant Mortality

                    Mother’s Marital Status
Infant Mortality    Unmarried (X = 1)    Married (X = 0)        Total
Death                          16,712             18,784       35,496
Live at 1 Year              1,197,142          2,878,421    4,075,563
Total                       1,213,854          2,897,205    4,111,059

p1 = 16,712/1,213,854 ≈ 0.0138
p0 = 18,784/2,897,205 ≈ 0.0065
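The three effect measures discussed in the earlier slides can be computed directly from this table. The sketch below uses only the four cell counts shown; the variable names are illustrative.

```python
# Cell counts from the US infant mortality table.
deaths_unmarried, alive_unmarried = 16_712, 1_197_142
deaths_married, alive_married = 18_784, 2_878_421

# Risks of infant death by mother's marital status.
p1 = deaths_unmarried / (deaths_unmarried + alive_unmarried)  # unmarried, X = 1
p0 = deaths_married / (deaths_married + alive_married)        # married, X = 0

excess_risk = p1 - p0
relative_risk = p1 / p0
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))

print(f"p1 = {p1:.4f}, p0 = {p0:.4f}")
print(f"ER = {excess_risk:.4f}, RR = {relative_risk:.3f}, OR = {odds_ratio:.3f}")
```

Because infant death is rare in both groups, the odds ratio and relative risk are close to each other.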
14 Infant Mortality and Marital Status: Various Models
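The fitted values from this slide are not reproduced here. As a hedged sketch, assuming the three model forms are P(D | X = x) = a + bx (linear), log P(D | X = x) = a + bx (log linear), and log odds = a + bx (logistic), the coefficients for a binary exposure can be written in closed form: with only two exposure levels, each two-parameter model reproduces the two observed risks exactly.

```python
import math

# Observed risks from the infant mortality counts
# (X = 1 unmarried, X = 0 married).
p1 = 16_712 / 1_213_854
p0 = 18_784 / 2_897_205


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))


# Closed-form coefficients under each assumed model form.
linear = {"a": p0, "b": p1 - p0}                             # b is the excess risk
log_linear = {"a": math.log(p0), "b": math.log(p1 / p0)}     # exp(b) is the relative risk
logistic = {"a": logit(p0), "b": logit(p1) - logit(p0)}      # exp(b) is the odds ratio

for name, coef in (("linear", linear), ("log linear", log_linear), ("logistic", logistic)):
    print(f"{name:>10}: a = {coef['a']:.4f}, b = {coef['b']:.4f}")
```

The three values of b are on different scales and are not directly comparable; only after transforming back (b itself, e^b, e^b) do they give the ER, RR, and OR.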
15 Logistic Regression: Body Weight and CHD
16 CHD and Body Weight: Various Models
17 Multiple Logistic Regression Models
Form of model: log[P(D | X1 = x1, ..., Xk = xk) / (1 - P(D | X1 = x1, ..., Xk = xk))] = a + b1 x1 + ... + bk xk
Think, e.g., D = CHD, X1 = Behavior type, X2 = Body weight
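18 Multiple Logistic Regression Models
Interpretation of model parameters a, b1, ..., bk:
- e^a = odds of D when all covariates equal 0
- e^(bj) = OR associated with a unit increase in Xj, holding the other covariates fixed
Choose 0 covariate values carefully!
Choose covariate scales carefully!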
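A small sketch of the "holding the other covariates fixed" reading, assuming the form log odds = a + b1 x1 + ... + bk xk. The coefficients and the covariate coding (X1 a 0/1 behavior-type indicator, X2 body weight in lbs centered at 150) are hypothetical.

```python
import math


def multiple_logistic_risk(xs, a, bs):
    """Risk under the assumed model: log odds = a + sum_j bs[j] * xs[j]."""
    eta = a + sum(b * x for b, x in zip(bs, xs))
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients (illustration only).
a, bs = -3.0, [0.8, 0.01]

# Odds ratio for X1 (1 vs 0) at two different fixed values of X2.
or_x1_at_0 = odds(multiple_logistic_risk([1, 0], a, bs)) / odds(multiple_logistic_risk([0, 0], a, bs))
or_x1_at_30 = odds(multiple_logistic_risk([1, 30], a, bs)) / odds(multiple_logistic_risk([0, 30], a, bs))

print(or_x1_at_0, or_x1_at_30, math.exp(bs[0]))
```

Under this model the odds ratio for X1 does not depend on where X2 is held, which is the no-interaction assumption built into the additive log odds form.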
19 Indicator (Dummy) Variables for Discrete Exposures
Goal: to model exposures with several discrete levels without assuming a dose response
[Tables: coding of Body weight (dose response) and Body weight (no assumed pattern); entries not recoverable.]
20 Indicator (Dummy) Variables for Discrete Exposures
Model: log[P(D | X1, ..., X4) / (1 - P(D | X1, ..., X4))] = a + b1 X1 + b2 X2 + b3 X3 + b4 X4
No assumed pattern!
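As a sketch of the dummy-variable coding, assume five ordered body weight categories, with the lowest as the reference level and indicators X1, ..., X4 for the others, in a logistic model log odds = a + b1 X1 + ... + b4 X4. The counts below are hypothetical and are not the CHD data from these slides. Because there is one parameter per exposure level, the fitted log odds match the observed log odds in each category, so the coefficients can be written down directly.

```python
import math


def dummy_code(level, n_levels=5):
    """Indicators X1..X(n_levels-1) for level in 0..n_levels-1; level 0 is the reference."""
    if not 0 <= level < n_levels:
        raise ValueError("level out of range")
    return [1 if level == j else 0 for j in range(1, n_levels)]


def log_odds(level, a, bs):
    """Log odds under the assumed model a + sum_j bs[j-1] * Xj."""
    xs = dummy_code(level, len(bs) + 1)
    return a + sum(b * x for b, x in zip(bs, xs))


# Hypothetical (cases, non-cases) per weight category, lowest category first.
counts = [(20, 480), (30, 470), (35, 465), (45, 455), (60, 440)]

level_log_odds = [math.log(d / not_d) for d, not_d in counts]
a = level_log_odds[0]                            # log odds in the reference category
bs = [lo - a for lo in level_log_odds[1:]]       # log ORs versus the reference
odds_ratios = [math.exp(b) for b in bs]

print(dummy_code(3))   # indicator vector for the fourth category
print(odds_ratios)     # OR of each category versus the reference
```

Nothing forces the four odds ratios to increase smoothly with weight, which is the sense in which no pattern is assumed.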
21 Interpretation of Slope Coefficients with Dummy Variables
[Table: CHD event (D, not D) by body weight (lbs) category; cell entries not recoverable.]
22 Indicator (Dummy) Variables for Discrete Exposures
[Table: columns Model, Parameter, Estimate, OR; rows for parameters a, b and a, b1, b2, b3, b4; estimates not recoverable.]
23 Logistic Regression: Body Weight and CHD
[Figure: fitted models compared. Dose response (linear): X. No pattern (dummy vars.): X1, ..., X4.]