EPI 5344: Survival Analysis in Epidemiology
Interpretation of Models
March 17, 2015
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa
01/2015
Objectives
Epidemiology often tries to determine the risk associated with ‘exposures’ or similar events
Various types of exposures
–nominal
–ordinal
–count
–continuous
Session looks at how to use these different types of exposures in Cox models
And how to interpret the results
Nominal variables (1)
Categories
–Male/female
–Occupation
–City of residence
Nominal variables (2)
Cannot be rank ordered
–If you ‘impose’ a ranking, then you are defining a new variable.
–No intrinsic numerical value
   Even if they are commonly coded as 1/2, they are NOT NUMERIC
   Use of 1/2/3 coding dates back to the early days of computers
   Can assign letters or words to categories (e.g. Male/Female)
–Usually analyzed using dummy variables
   ‘Indicator variables’ is a better term.
Nominal variables (3)
SAS must be aware that these variables are categories, not numbers
–CLASS statement
Indicator variables
–‘n’ response levels → ‘n-1’ indicator variables
–Why? The MLE equations are indeterminate if you include ‘n’ indicator variables.
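A minimal sketch of the indeterminacy, assuming a Cox partial likelihood with no tied event times. The toy data, the function name `cox_partial_loglik` and the Beta values are all hypothetical. With one indicator per level, adding the same constant to every Beta leaves the partial likelihood unchanged, so no unique maximum exists.

```python
import math

def cox_partial_loglik(times, events, X, beta):
    """Cox partial log-likelihood (assumes no tied event times)."""
    # Linear predictor for each subject
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            # Risk set: everyone still under observation at time t_i
            risk = [math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i]
            loglik += eta[i] - math.log(sum(risk))
    return loglik

# Hypothetical toy data: 6 subjects, 3-level nominal exposure
times = [2, 3, 5, 7, 8, 11]
events = [1, 1, 0, 1, 1, 0]
levels = [0, 1, 2, 0, 1, 2]

# 'n' = 3 indicator variables (one per level): over-parameterised
X_full = [[1 if lev == k else 0 for k in range(3)] for lev in levels]

beta = [0.4, -0.2, 0.1]
shifted = [b + 5.0 for b in beta]   # add the same constant to every beta

ll_a = cox_partial_loglik(times, events, X_full, beta)
ll_b = cox_partial_loglik(times, events, X_full, shifted)
print(ll_a, ll_b)  # the two values agree up to rounding
```

Dropping one indicator (the reference level) removes this redundancy, because the constant shift is then absorbed by the baseline hazard.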
Nominal variables (4)
Consider a two-level nominal variable
–e.g. M/F
Some possible indicator variable (IV) codings:

Level    #1   #2   #3   #4
1 (M)     1    1    2
2 (F)     0   -1    1    8

All (and many more) are possible.
#1 is the most commonly used
Nominal variables (5)
Suppose we have 3 levels (blue/green/red)
Some possible indicator variable codings:

             #1 Reference   #2 Effect   #3 Orthogonal polynomial
Level        X1   X2        X1   X2     X1   X2
1 (blue)      1    0
2 (green)     0    1
3 (red)       0    0
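A sketch of how the three codings might be written down for the blue/green/red example. Only the reference coding values come from the slide. The effect coding (last level coded -1, -1) and the unnormalised linear and quadratic contrasts are common textbook choices assumed here, and software may rescale the polynomial columns. The names `CODINGS`, `encode`, `column_dot` and `column_sums` are illustrative.

```python
# Indicator-variable codings for a 3-level nominal variable (level 3 = reference)
LEVELS = ["blue", "green", "red"]

CODINGS = {
    "reference": {"blue": (1, 0), "green": (0, 1), "red": (0, 0)},
    # Assumed standard effect coding: reference level gets -1 in every column
    "effect": {"blue": (1, 0), "green": (0, 1), "red": (-1, -1)},
    # Assumed unnormalised linear and quadratic contrasts, equally spaced levels
    "orthogonal polynomial": {"blue": (-1, 1), "green": (0, -2), "red": (1, 1)},
}

def encode(values, coding):
    """Map a list of level names to rows of (X1, X2)."""
    table = CODINGS[coding]
    return [table[v] for v in values]

def column_dot(coding):
    """Dot product of the X1 and X2 columns over the three levels."""
    return sum(x1 * x2 for x1, x2 in CODINGS[coding].values())

def column_sums(coding):
    """Sum of each column over the three levels."""
    rows = list(CODINGS[coding].values())
    return sum(r[0] for r in rows), sum(r[1] for r in rows)

for name in CODINGS:
    print(name, encode(LEVELS, name), "X1.X2 =", column_dot(name),
          "column sums =", column_sums(name))
```

Only the polynomial coding has columns that both sum to zero and are orthogonal to each other, which is what lets its two Betas be read as separate linear and quadratic components.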
Ordinal variables
Still categorical (name) type data.
But, there is an implicit ordering of the levels.
–Disease severity
   mild, moderate, severe
–Rating scale
   Agree/don’t care/disagree
Interval data which has been categorized is really ‘ordinal’, not interval.
Commonly treated as nominal variables, ignoring rank order
Can use ordinal variables for trend or dose-response analyses
–Key issue: determining the spacing/numerical values to use
Interval variables
Has a meaningful ‘0’
Can assume any value (perhaps within a range)
–Infinite precision is possible.
Examples
–Blood pressure
–Cholesterol
–Income
Usually treated as a continuous number
Can categorize and then use ordinal methods
–Can reduce measurement errors
–Useful for EDA to look for non-linear dose-response.
Count variables
The number of times an event has happened
–# pregnancies
–# years of education
Very common type of data in medical research
Often regarded as continuous, but it isn’t really
Can affect choice of model
–Poisson distribution
Parameter interpretation (1)
Cox models estimate Hazard Ratios, not the hazard
–Cannot provide direct estimates of S(t) or h(t)
–Needs additional assumptions or methods
–Next week
Cox model: contains no explicit intercept term
–That is the h0(t)
Parameter interpretation (2)
To explore Betas, we will use this form of the Cox model:
   [equation]
Start with a 2-level nominal variable
Consider 3 different coding approaches
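One common way to write the Cox model so that the Betas can be read directly is on the log hazard ratio scale. The display below is a sketch of that standard form, not necessarily the exact expression used in the lecture. The symbols h(t | x), p, and the superscripts a and b (two covariate patterns being compared) are notation assumed here.

```latex
\[
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)
\quad\Longleftrightarrow\quad
\ln\!\left[\frac{h(t \mid x)}{h_0(t)}\right] = \beta_1 x_1 + \dots + \beta_p x_p
\]
\[
\mathrm{HR}(x^{a}\ \text{vs.}\ x^{b})
= \frac{h(t \mid x^{a})}{h(t \mid x^{b})}
= \exp\!\Big(\sum_{j=1}^{p} \beta_j\,\bigl(x^{a}_j - x^{b}_j\bigr)\Big)
\]
```

The baseline hazard cancels in the ratio, which is why each Beta is interpreted through differences in covariate values rather than through the covariate values themselves.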
Parameter interpretation (3)
We will have only one parameter.
xi = 1 if exposed
   = 0 otherwise
H0: β1 = 0
–Wald
–Score
–Likelihood ratio
95% CI
xi = 1 if exposed
   = -1 otherwise
Changing the coding has changed the interpretation of the Beta value
‘Effect’ coding compares each level to the mean effect
–‘sort of’, since it depends on the number of subjects in each level.
xi = 2 if exposed
   = 1 otherwise
Changing the coding doesn’t always change the interpretation of the Beta value
Need to be aware of the coding used in order to interpret the model correctly
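A small numeric sketch of why the three two-level codings (1/0, 1/-1, 2/1) lead to different or identical readings of Beta. It assumes the usual one-covariate Cox form, in which the log hazard ratio between two subjects is Beta times the difference in their x values. The function names and the example HR of 2.0 are illustrative only.

```python
import math

def hr_exposed_vs_unexposed(beta, x_exposed, x_unexposed):
    """HR = exp(beta * (x_exposed - x_unexposed)) under a one-covariate Cox model."""
    return math.exp(beta * (x_exposed - x_unexposed))

def beta_for_coding(true_hr, x_exposed, x_unexposed):
    """Beta that reproduces a given exposed-vs-unexposed HR under a coding."""
    return math.log(true_hr) / (x_exposed - x_unexposed)

true_hr = 2.0  # hypothetical hazard ratio, exposed vs. unexposed
codings = {"1/0": (1, 0), "1/-1": (1, -1), "2/1": (2, 1)}

for name, (xe, xu) in codings.items():
    b = beta_for_coding(true_hr, xe, xu)
    print(f"{name:>5}: beta = {b:.4f}, exp(beta) = {math.exp(b):.4f}, "
          f"HR = {hr_exposed_vs_unexposed(b, xe, xu):.4f}")
```

Under 1/0 and 2/1 the coded values differ by one unit, so exp(Beta) is the HR itself. Under 1/-1 they differ by two units, so exp(Beta) is the square root of the HR and the HR is exp(2 × Beta).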
xi is continuous (interval)
Let’s consider the effect of changing the exposure by ‘1’ unit
So, β1 relates to the HR for a 1 unit change in x1.
What about an ‘s’ unit change?
xi is continuous (interval)
Let’s consider the effect of changing the exposure by ‘s’ units
SAS has an option to do this automatically.
Important to consider the level for a ‘meaningful’ change
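A minimal sketch of rescaling a hazard ratio from a 1-unit to an s-unit change, assuming the covariate enters the Cox model linearly so that the log HR for an s-unit increase is s × Beta. The Beta value and the blood pressure framing are hypothetical.

```python
import math

def hr_for_change(beta, s=1.0):
    """HR for an s-unit increase in a continuous covariate: exp(s * beta)."""
    return math.exp(s * beta)

# Hypothetical estimate: beta per 1 mmHg of systolic blood pressure
beta = 0.02

hr_1 = hr_for_change(beta)          # per 1 mmHg
hr_10 = hr_for_change(beta, s=10)   # per 10 mmHg

print(hr_1, hr_10)
print(hr_1 ** 10)  # same as hr_10, up to rounding
```

A 1 mmHg change is rarely of clinical interest, so reporting the HR per 10 mmHg is one way to pick a ‘meaningful’ change.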
‘Reference’ coding; level 3 = ref
‘Effect’ coding
Effect coding
–βi does NOT let you compare two levels directly
–Use ‘Reference Coding’ to do that
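Why effect coding does not give direct level-to-level comparisons can be seen by writing out the log hazard for each level. The sketch below assumes the usual effect coding for three levels, with level 3 (red) coded (-1, -1), and a Cox model containing only X1 and X2. Both are assumptions made here for illustration.

```latex
\begin{align*}
\ln\frac{h_{\text{blue}}(t)}{h_0(t)}  &= \beta_1 \\
\ln\frac{h_{\text{green}}(t)}{h_0(t)} &= \beta_2 \\
\ln\frac{h_{\text{red}}(t)}{h_0(t)}   &= -\beta_1 - \beta_2 \\[4pt]
\ln \mathrm{HR}(\text{blue vs. green}) &= \beta_1 - \beta_2 \\
\ln \mathrm{HR}(\text{blue vs. red})   &= \beta_1 - (-\beta_1 - \beta_2) = 2\beta_1 + \beta_2
\end{align*}
```

Every pairwise comparison needs a combination of Betas. Under reference coding with red as the reference, the blue vs. red comparison is simply ln HR = β1.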
‘Orthogonal polynomial’ coding
Doesn’t look that useful or interesting
This coding can be used to test for linear and quadratic effects in the dose-response relationship
–Test β1 to look for a linear effect
–Test β2 to look for a quadratic effect
Can be extended to higher orders if there are more levels.
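A sketch of how the two Betas separate a trend into linear and quadratic parts. It assumes three equally spaced levels and the unnormalised contrasts (-1, 0, 1) and (1, -2, 1), so fitted values from software that rescales the columns would differ by constant factors. The log hazards used are hypothetical.

```python
# Unnormalised orthogonal polynomial contrasts for 3 equally spaced levels
LINEAR = (-1, 0, 1)
QUADRATIC = (1, -2, 1)

def trend_components(log_hazards):
    """Split three level-specific log hazards into linear and quadratic betas.

    Solves eta_l = mu + beta1 * LINEAR[l] + beta2 * QUADRATIC[l] exactly,
    using the fact that the contrast columns are mutually orthogonal and
    each sums to zero.
    """
    beta1 = (sum(c * e for c, e in zip(LINEAR, log_hazards))
             / sum(c * c for c in LINEAR))
    beta2 = (sum(c * e for c, e in zip(QUADRATIC, log_hazards))
             / sum(c * c for c in QUADRATIC))
    return beta1, beta2

# Hypothetical log hazards (relative to an arbitrary baseline) for levels 1..3
straight = trend_components([0.0, 0.3, 0.6])   # steady increase across levels
plateau = trend_components([0.0, 0.6, 0.6])    # increase, then levelling off
print(straight)
print(plateau)
```

For the steady increase the quadratic Beta is essentially zero, while the plateau pattern produces a non-zero quadratic Beta alongside the linear one.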
Hypothesis testing (1)
Does adding new variable(s) to a model lead to a better model?
–Test the statistical significance of the additional variables.
Let’s start with 1 variable
The likelihood is a function of the variable’s parameter
–L = f(β1)
To test β1 = 0, compare the likelihood of the model at the MLE to that of the ‘β1 = 0’ model
–f(0) vs. f(β_MLE)
Hypothesis testing (2)
As is common in statistics, this gives the best results if you:
–Take logarithms
–Multiply by -2
Likelihood ratio test of H0: β1 = 0
Hypothesis testing (3)
Can extend to test the null hypothesis that ‘k’ parameters are simultaneously ‘0’.
Likelihood ratio test of H0: β1 = β2 = … = βk = 0
Can be used to compare any two models
–BUT, models must be nested or hierarchical
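A sketch of the likelihood ratio test using only the Python standard library. It assumes the usual large-sample result that -2 times the difference in log likelihoods follows a chi-square distribution with k degrees of freedom under H0. The log likelihood values are hypothetical, and `chi2_sf` is a helper written here for illustration, not a library call.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5   # closed form for df = 1
    else:
        q, a = math.exp(-half), 1.0              # closed form for df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q

def likelihood_ratio_test(loglik_reduced, loglik_full, k):
    """LR statistic -2 * (lnL_reduced - lnL_full) and its p-value on k df."""
    stat = -2.0 * (loglik_reduced - loglik_full)
    return stat, chi2_sf(stat, k)

# Hypothetical log partial likelihoods from two nested Cox models
stat, p = likelihood_ratio_test(loglik_reduced=-412.7, loglik_full=-408.1, k=2)
print(round(stat, 2), round(p, 4))
```

Here k is the number of parameters set to zero in the reduced model, and both models are assumed to be fitted to exactly the same subjects.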
Hypothesis testing (4)
Nested
–All of the variables in one model must be contained in the other model
   #1: x1, x2, x3, x4
   #2: x1, x2
–AND: analysis samples must be identical
   Watch for issues with missing data in extra variables
Hypothesis testing (5)
You can also do many of these tests using the Wald and Score approximations
–Hopefully, you get the same answer
   Usually do (at least, in the same ballpark)
Confidence Intervals (1)
Always based on the Wald approximation
First, determine 95% CIs for the Betas
Second, convert to 95% CIs for the HRs
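The two steps can be sketched as follows, assuming the Wald approximation (the Beta estimate is treated as normally distributed with the reported standard error). The function name `wald_summary` and the estimate and standard error are hypothetical.

```python
import math

def wald_summary(beta, se, z_crit=1.96):
    """Wald z statistic, two-sided p-value, and 95% CIs for beta and the HR."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    # Step 1: CI on the Beta (log HR) scale
    ci_beta = (beta - z_crit * se, beta + z_crit * se)
    # Step 2: exponentiate the limits to get the CI for the HR
    ci_hr = (math.exp(ci_beta[0]), math.exp(ci_beta[1]))
    return {"z": z, "p": p, "hr": math.exp(beta), "ci_beta": ci_beta, "ci_hr": ci_hr}

# Hypothetical estimate and standard error from a fitted Cox model
result = wald_summary(beta=0.693, se=0.25)
print(result)
```

The interval is symmetric around Beta but not around the HR, because the exponentiation is applied to the limits rather than to the margin of error.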