1 01/2015 EPI 5344: Survival Analysis in Epidemiology. Cox regression: Introduction. March 17, 2015. Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa

2 Objectives
Review proportional hazards
Introduce Cox model and methods of estimation
Tied data

3 Exponential model (1R)
Exponential model
–Most common parametric model in epidemiology
–Assumes a constant hazard, h(t) = λ
–How did we create the likelihood function?
Subjects can have two types of ‘ends’
–Death
–Censored
Each contributes to the likelihood function, but in different ways

4 Exponential Model (2R)
Likelihood contribution of a death at time $t_i$: $f(t_i) = h(t_i)\,S(t_i) = \lambda e^{-\lambda t_i}$
Likelihood contribution if censored at time $t_i$: $S(t_i) = e^{-\lambda t_i}$
–Actual time of ‘failure’ is unknown.
–Must survive until at least time $t_i$
Multiply these across all deaths and all censored observations to get the full likelihood

5 Exponential Model (3R)
$L = \lambda^{N} e^{-\lambda \cdot PT}$
Where: N = # events; PT = person-time of follow-up

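6 Exponential Model (4R)
How do we find the MLE for λ? Take logs, differentiate with respect to λ and set the derivative to zero:
$\ln L = N \ln\lambda - \lambda \cdot PT$, $\frac{d \ln L}{d\lambda} = \frac{N}{\lambda} - PT = 0$, so $\hat{\lambda} = N / PT$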
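A minimal numerical sketch of the constant-hazard MLE, assuming the likelihood λ^N · exp(−λ·PT) from the review slides. The function names and the data (12 events over 400 person-years) are hypothetical and chosen only for illustration.

```python
import math

def exp_log_likelihood(lam, n_events, person_time):
    """Log-likelihood of the constant-hazard model: N*ln(lam) - lam*PT."""
    return n_events * math.log(lam) - lam * person_time

def exp_mle(n_events, person_time):
    """Closed-form MLE: number of events divided by person-time."""
    return n_events / person_time

# Hypothetical data: 12 events observed over 400 person-years.
n_events, person_time = 12, 400.0
lam_hat = exp_mle(n_events, person_time)
print(lam_hat)  # 0.03 events per person-year

# The log-likelihood is lower on either side of the MLE.
for lam in (0.02, lam_hat, 0.05):
    print(lam, exp_log_likelihood(lam, n_events, person_time))
```

The MLE here is simply the familiar incidence rate from person-time epidemiology.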

7 Exponential Model (5R)
What if we want to examine predictors of the outcome?
–λ is allowed to vary by sex, age, cholesterol, etc.
Use the same approach but now, instead of ‘λ’, we have the following in the likelihood function: $\lambda_i = e^{\beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki}}$
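A sketch of how predictors might enter the exponential likelihood, assuming the usual log-linear form λ_i = exp(β0 + β1·x_i) with a single binary predictor. That form, the function names and the five records are assumptions made for illustration, not taken from the slides.

```python
import math

def exp_regression_loglik(beta0, beta1, records):
    """records: list of (time, event, x); event is 1 for a death, 0 if censored."""
    total = 0.0
    for time, event, x in records:
        lam = math.exp(beta0 + beta1 * x)      # subject-specific constant hazard
        total += event * math.log(lam) - lam * time
    return total

def group_rate(records, x_value):
    """Events divided by person-time within one level of x."""
    events = sum(e for (_, e, x) in records if x == x_value)
    person_time = sum(t for (t, _, x) in records if x == x_value)
    return events / person_time

# Hypothetical data: (follow-up time, event indicator, binary predictor).
records = [(5.0, 1, 0), (8.0, 0, 0), (2.0, 1, 1), (3.0, 1, 1), (6.0, 0, 1)]

# With one binary predictor the MLEs reduce to the group-specific rates.
rate0, rate1 = group_rate(records, 0), group_rate(records, 1)
beta0_hat = math.log(rate0)
beta1_hat = math.log(rate1 / rate0)            # log rate ratio
print(rate0, rate1, math.exp(beta1_hat))
```

With a binary predictor, exp(β1) is just the rate ratio between the two groups.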

8 End of review

9 Proportional hazard models (1)
Now, use this approach BUT do not pre-specify a form for h(t)
We start with proportional hazards
Hazard (h(t)) = rate of change in survival, conditional on having survived to that point in time.

10 Hazard models (2)
Suppose we want to compare two treatment groups
–Different survival is expected ⇒ they have different hazards
–How can we summarize this? With the hazard ratio, HR(t): the ratio of the two groups' hazards at time t
In general, HR(t) will be different at different follow-up times

11 [Figure: hazard curves $h_1(t)$ and $h_2(t)$]
This can be hard to describe and interpret
Effect of the treatment varies with length of follow-up

12 [Figure: hazard curves $h_1(t)$ and $h_2(t)$]
HR could switch from below to above 1.0

13 Hazard models (3)
SUPPOSE that HR(t) were constant at all follow-up times.
–Effect of the treatment is the same at all times
This is the PROPORTIONAL HAZARDS (PH) model
This does not require that h(t) be constant; it can vary in an unconstrained manner.
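A small sketch of the proportional hazards idea: the hazard itself changes over time, but the ratio between the groups does not. The baseline hazard shape and the HR of 2.0 are hypothetical values picked for illustration.

```python
def baseline_hazard(t):
    """Hypothetical non-constant baseline hazard (rises with follow-up time)."""
    return 0.1 * 1.5 * t ** 0.5

HR = 2.0  # assumed constant hazard ratio

def treated_hazard(t):
    """Hazard in the comparison group under proportional hazards."""
    return baseline_hazard(t) * HR

times = [0.5, 1.0, 2.0, 5.0, 10.0]
for t in times:
    ratio = treated_hazard(t) / baseline_hazard(t)
    print(t, round(baseline_hazard(t), 4), round(treated_hazard(t), 4), ratio)
```

Each printed row shows a different hazard in both groups but the same ratio.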

14 [Figure: hazard curves $h_1(t)$ and $h_2(t)$]

15 [Figure: hazard curves $h_1(t)$ and $h_2(t)$, with HR marked]

17 Cox models (1)
For most of the rest of this course, we will assume a proportional hazards model: $h_1(t) = h_0(t) \times HR$
$h_0(t)$ is the ‘baseline’ or reference hazard.
–Contains all of the time variability of the hazard.
HR is assumed to remain constant over all follow-up time.

18 Cox models (2)
HR can still be affected by predictor variables
–Race
–Exposure (low/mid/high)
–Sex
–Caloric intake
For now, we will assume that these are
–measured at baseline (time ‘0’)
–fixed during follow-up

19 Cox models (3)
In general, HR is a function of the predictors.
Most common model assumes that ln(HR) is a linear function of the predictors: $\ln(HR) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
This is similar to the model for logistic regression and linear regression.
NOTE: there is no intercept!
–This is ‘subsumed’ into the baseline hazard term $h_0(t)$

20 Cox models (4)
HR model can be written: $HR = e^{\beta_1 x_1 + \dots + \beta_k x_k}$
How does this fit into our ‘hazard’ model?
Our base model is: $h(t) = h_0(t) \times HR$

21 Cox models (5)
This implies: $h(t) = h_0(t)\, e^{\beta_1 x_1 + \dots + \beta_k x_k}$
But, so what? How do we estimate the Betas?
–As with the exponential model, it appears we need to know the shape of $h_0(t)$
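One way to see what the Betas mean under this model is to compare two subjects, labelled A and B here purely for illustration, who differ only in a single predictor x. The worked ratio below assumes the model form $h(t) = h_0(t)\,e^{\beta x}$:

```latex
\frac{h_A(t)}{h_B(t)}
  = \frac{h_0(t)\, e^{\beta x_A}}{h_0(t)\, e^{\beta x_B}}
  = e^{\beta (x_A - x_B)}
```

The baseline hazard drops out of the ratio, so $e^{\beta}$ is the hazard ratio for a one-unit difference in x at every follow-up time.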

22 Cox models (6)
COX (1972) SHOWED THAT THIS IS WRONG!
–Can estimate the Betas without needing to model $h_0(t)$
–Semi-parametric model
–Based on risk sets and partial likelihoods
We will skip a lot of math
–Use an intuitive approach
–Method relates to approach used with exponential model

23 Cox models (7)
Start off trying to build a likelihood for the data based on the whole model (with baseline hazard included)
Concentrate on the times when events happened
–Similar to the Kaplan-Meier method: S(t) only changes when an event happens, so we can ignore losses between events
Action happens within the Risk Set at the event times.

24 Cox models (8)
Action happens within the Risk Set at the event times.
The theory assumes that only one event happens at any point in time
–This is not the ‘real world’
–In theory, time is continuous, so no two events happen at the same time
–We’ll deal with ‘ties’ later on

25 Cox models (9)
Consider the risk set at time $t_i$ when an event happens
–Each subject in the risk set has a probability of being the one having the event
–Higher hazard ⇒ higher probability
‘Likelihood’ contribution from person ‘j’ in the risk set is: $L_i = P(\text{subject } j \text{ has the event at } t_i \mid \text{one event occurs in the risk set at } t_i)$

26 Cox models (10)
Using the definition of conditional probability, this is: $L_i = \frac{P(\text{subject } j \text{ has the event at } t_i)}{P(\text{some subject in the risk set has the event at } t_i)}$
How do we get the numerator and denominator?
The hazard is a measure of how likely an event is to occur for a person
–Higher hazards ⇒ an event is more likely

27 Cox models (11)
We can get: $L_i = \frac{h_j(t_i)}{\sum_{k \in R(t_i)} h_k(t_i)}$, where $R(t_i)$ is the risk set at time $t_i$

28 Cox models (12)
Now, because the hazards are proportional, we have: $h_k(t_i) = h_0(t_i) \times HR_k$ for every subject k in the risk set

29 Cox models (13)
The likelihood contribution from this event (risk set) can be written: $L_i = \frac{h_0(t_i)\, HR_j}{\sum_{k \in R(t_i)} h_0(t_i)\, HR_k}$, summing over the risk set $R(t_i)$
Cancel out the $h_0(t_i)$

30 Cox models (14)
The final likelihood contribution from this risk set is: $L_i = \frac{HR_j}{\sum_{k \in R(t_i)} HR_k}$, summing over the risk set $R(t_i)$
Which does not depend on $h_0(t)$
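The cancellation of the baseline hazard can be checked numerically. The sketch below uses a hypothetical helper, `risk_set_contribution`, and made-up hazard ratios; the `baseline` argument stands in for the unknown value of h0 at the event time.

```python
def risk_set_contribution(event_hr, risk_set_hrs, baseline=1.0):
    """Probability that the subject with hazard ratio `event_hr` is the one who
    fails, given exactly one failure in the risk set. `baseline` multiplies both
    numerator and denominator, so it cancels."""
    numerator = baseline * event_hr
    denominator = sum(baseline * hr for hr in risk_set_hrs)
    return numerator / denominator

# Hypothetical hazard ratios; the risk set includes the subject with the event.
hrs = [1.0, 2.0, 4.0]
a = risk_set_contribution(2.0, hrs, baseline=0.01)
b = risk_set_contribution(2.0, hrs, baseline=7.5)
print(a, b)  # both approximately 2/7
```

Whatever value is supplied for the baseline, the contribution stays at 2/7, which is why the Betas can be estimated without modelling the baseline hazard.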

31 Cox models (15)
Now, multiply all of the contributions from each risk set (defined when an event occurs)
Produces a Partial Likelihood
Estimate the Betas using MLE.

32 Cox models (16)
We can ignore the censoring times themselves since we are not estimating the actual hazard (censored subjects still count in every risk set while they are under follow-up)
Betas depend only on the ranking of events, not on the actual event times
–Implies that Cox does not give the same estimates as person-time epidemiology analyses
–Standard Cox models do not estimate survival itself, only relative hazards (HRs)
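The claim that only the ranking matters can be illustrated with a short sketch. The function below builds the log partial likelihood for one predictor directly from (time, event, x) records, assuming no tied event times. The records and the cubic time transformation are hypothetical.

```python
import math

def log_partial_likelihood(beta, records):
    """records: list of (time, event, x). Assumes no tied event times."""
    total = 0.0
    for t_i, event, x_i in records:
        if event:
            # Risk set: everyone still under follow-up at the event time.
            risk = [x for (t, _, x) in records if t >= t_i]
            total += beta * x_i - math.log(sum(math.exp(beta * x) for x in risk))
    return total

# Hypothetical data: (follow-up time, event indicator, predictor).
records = [(2.0, 1, 1.0), (3.0, 0, 0.0), (5.0, 1, 0.0), (7.0, 0, 1.0), (9.0, 1, 0.0)]

# A monotone transformation changes the times but not their order.
stretched = [(t ** 3, e, x) for (t, e, x) in records]

print(log_partial_likelihood(0.7, records))
print(log_partial_likelihood(0.7, stretched))  # same value
```

Because the risk sets are unchanged, the two calls return the same value for any beta.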

33 [Figure: follow-up for five subjects; three end in an event (D), at times $t_1$, $t_2$ and $t_3$, and two are censored (C)]
Let’s consider a simple example.
Three events ⇒ three risk sets to consider

34 For subject ‘m’, the hazard function is: $h_m(t) = h_0(t) \times HR_m$
1st event. Risk set: 1/2/3/4/5. Subject with event: 3
Likelihood contribution: $\frac{h_3(t_1)}{h_1(t_1) + h_2(t_1) + h_3(t_1) + h_4(t_1) + h_5(t_1)}$

35 But, we have: $h_m(t_1) = h_0(t_1) \times HR_m$
So, likelihood contribution from risk set #1 is: $\frac{HR_3}{HR_1 + HR_2 + HR_3 + HR_4 + HR_5}$

36 Extending this to the other risk sets:
2nd event. Risk set: 1/2/4. Subject with event: 1
Likelihood contribution: $\frac{HR_1}{HR_1 + HR_2 + HR_4}$
3rd event. Risk set: 4. Subject with event: 4
Likelihood contribution: $\frac{HR_4}{HR_4} = 1$

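37 Overall Partial Likelihood is: $PL = \frac{HR_3}{HR_1 + HR_2 + HR_3 + HR_4 + HR_5} \times \frac{HR_1}{HR_1 + HR_2 + HR_4} \times \frac{HR_4}{HR_4}$
This can easily be extended to very large data sets.
Writing out the entire partial likelihood function would be ‘crazy’
But, this is what our computer has to do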
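The five-subject example can be worked through in a few lines. The risk sets and event subjects are those listed on the slides; the hazard ratio values assigned to the subjects are hypothetical.

```python
from math import prod

# Hypothetical hazard ratios for subjects 1..5 (illustrative values only).
hr = {1: 1.0, 2: 2.0, 3: 4.0, 4: 1.0, 5: 2.0}

# (subject with the event, members of the risk set), as listed on the slides.
risk_sets = [
    (3, [1, 2, 3, 4, 5]),
    (1, [1, 2, 4]),
    (4, [4]),
]

# One term per event: HR of the subject who failed over the sum of HRs at risk.
contributions = [hr[j] / sum(hr[k] for k in members) for j, members in risk_sets]
partial_likelihood = prod(contributions)

print(contributions)       # 4/10, 1/4 and 1/1
print(partial_likelihood)  # approximately 0.1
```

The last risk set contains only the subject who fails, so its term equals 1 and carries no information about the Betas.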

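38 Suppose that we are using the Cox model. Let’s also limit to one predictor. Then, we have: $HR_m = e^{\beta x_m}$
⇒ Partial Likelihood form is now: $PL = \frac{e^{\beta x_3}}{e^{\beta x_1} + e^{\beta x_2} + e^{\beta x_3} + e^{\beta x_4} + e^{\beta x_5}} \times \frac{e^{\beta x_1}}{e^{\beta x_1} + e^{\beta x_2} + e^{\beta x_4}} \times \frac{e^{\beta x_4}}{e^{\beta x_4}}$
⇒ We will see this layout again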
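With one predictor, estimating Beta means finding the value that maximizes this partial likelihood. The sketch below does so for the five-subject example, using hypothetical predictor values and a simple ternary search (an illustrative choice, not necessarily the algorithm statistical packages use).

```python
import math

# Hypothetical predictor values for subjects 1..5.
x = {1: 0.0, 2: 1.0, 3: 1.0, 4: 0.0, 5: 0.0}
# (subject with the event, members of the risk set), as listed on the slides.
risk_sets = [(3, [1, 2, 3, 4, 5]), (1, [1, 2, 4]), (4, [4])]

def log_partial_likelihood(beta):
    """Sum over events of beta*x_j minus the log of the risk-set sum of exp(beta*x_k)."""
    total = 0.0
    for j, members in risk_sets:
        total += beta * x[j] - math.log(sum(math.exp(beta * x[k]) for k in members))
    return total

def maximize(f, lo=-5.0, hi=5.0, iterations=200):
    """Ternary search; adequate here because the function is concave in beta."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

beta_hat = maximize(log_partial_likelihood)
print(beta_hat, math.exp(beta_hat))
```

For these made-up values the maximum can also be found by hand: it sits at ln(3)/2, roughly 0.549, which corresponds to an estimated HR of about 1.73.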

39 ‘Ties’ (1)
Above assumed that only one event happened at any given time
–True ‘in theory’ because time is a continuous variable.
–Not true in reality because time is measured ‘coarsely’.
For example
–Only get measurement data every year
–Time of event measured to the day, not hour/min/second

40 ‘Ties’ (2)
More than one event at the same time is called a ‘tied’ event.
How do we modify the method to handle tied event times?

41 ‘Ties’ (3)
Two main approaches to ‘ties’
–Discrete models: change the basic theory underlying the model; assume that event times are discrete points; relate to logistic regression; useful when events can only occur at fixed points (e.g., graduation from high school)
–Exact method: often implemented using an approximation.

42 ‘Ties’ (4)
Exact method
–Suppose we have two events ($s_1$ and $s_2$) which occur at the same time due to imprecise measurement of the event time.
–IF we had been able to measure the event time with enough precision, we would know whether $s_1$ occurred first or second (e.g., birth of twins)
–We don’t know, so we assume that the two possibilities are equally likely.

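43 ‘Ties’ (5)
Suppose $s_1$ occurred before $s_2$.
–Likelihood contribution would be: $\frac{HR_{s_1}}{\sum_{k \in R} HR_k} \times \frac{HR_{s_2}}{\sum_{k \in R} HR_k - HR_{s_1}}$
Suppose $s_2$ occurred before $s_1$.
–Likelihood contribution would be: $\frac{HR_{s_2}}{\sum_{k \in R} HR_k} \times \frac{HR_{s_1}}{\sum_{k \in R} HR_k - HR_{s_2}}$
Here R is the risk set just before the tied event time; the subject who fails first leaves the risk set before the second event.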

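44 ‘Ties’ (6)
Don’t know order. Each is equally likely.
Overall likelihood contribution combines the two possible orders: add the contribution computed with $s_1$ first to the one computed with $s_2$ first.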
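A sketch of the two-event case, assuming the overall contribution is the sum of the two order-specific products (a constant factor such as 1/2 would not change the estimated Betas). The helper name and the hazard ratio values are hypothetical.

```python
def ordered_contribution(order, others_total):
    """Product of risk-set terms if the tied events occurred in `order`.
    `order` lists the hazard ratios of the tied subjects, first failure first;
    `others_total` is the summed hazard ratio of the rest of the risk set."""
    remaining = sum(order) + others_total
    result = 1.0
    for hr in order:
        result *= hr / remaining
        remaining -= hr          # the subject who failed leaves the risk set
    return result

# Hypothetical hazard ratios for s1, s2 and the rest of the risk set combined.
hr_s1, hr_s2, others = 2.0, 1.0, 3.0
l_12 = ordered_contribution([hr_s1, hr_s2], others)   # s1 first: 2/6 * 1/4
l_21 = ordered_contribution([hr_s2, hr_s1], others)   # s2 first: 1/6 * 2/5
exact = l_12 + l_21
print(l_12, l_21, exact)
```

The two orders give different products (1/12 and 1/15 here) because the second denominator depends on who left the risk set first.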

45 ‘Ties’ (7)
A bit messy but not too bad.
However, consider the recidivism data.
–5 arrests occurred in week 8
–We don’t know which order they occurred in
–120 potential orders (= 5!)
–Each order contributes a likelihood product with 5 terms
–Need to add up 120 of these products to give ONE contribution.
Can rapidly get even worse!
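The growth in work can be seen by enumerating the orderings directly. This is a brute-force sketch of the exact approach as described above; the hazard ratios for the five tied arrests and for the rest of the risk set are hypothetical, not taken from the recidivism data.

```python
from itertools import permutations
from math import factorial

def exact_tied_contribution(tied_hrs, others_total):
    """Sum the order-specific products over every possible ordering of the ties.
    Returns the summed contribution and the number of orderings visited."""
    total = 0.0
    n_orders = 0
    for order in permutations(tied_hrs):
        remaining = sum(tied_hrs) + others_total
        product = 1.0
        for hr in order:
            product *= hr / remaining
            remaining -= hr      # each failure shrinks the risk set
        total += product
        n_orders += 1
    return total, n_orders

# Hypothetical hazard ratios for 5 tied events; 40.0 is the assumed summed
# hazard ratio of everyone else still at risk.
tied = [1.0, 1.5, 2.0, 0.5, 1.0]
contribution, n_orders = exact_tied_contribution(tied, others_total=40.0)
print(n_orders, factorial(len(tied)))  # 120 120
print(contribution)
```

With 10 tied events the same loop would visit 10! = 3,628,800 orderings, which is why approximations were developed.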

46 ‘Ties’ (8)
Computationally demanding
–Not that big a task for modern computers
Two approximate methods have been developed
–Breslow
–Efron
Both are ‘OK’ as long as the number of ties is not too big
–Efron is better.
With modern computers, using the exact approach is likely fine.
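A sketch comparing the two approximations with the brute-force calculation for one tied risk set, using the standard textbook forms of the Breslow and Efron terms. Both approximate the average over orderings, so the exact sum is divided by d! here to put the three on the same scale (the constant does not affect the estimated Betas). Function names and hazard ratios are hypothetical.

```python
from itertools import permutations
from math import prod, factorial

def exact_average(tied, others_total):
    """Average over all orderings of the tied events (exact sum divided by d!)."""
    total = 0.0
    for order in permutations(tied):
        remaining = sum(tied) + others_total
        term = 1.0
        for hr in order:
            term *= hr / remaining
            remaining -= hr
        total += term
    return total / factorial(len(tied))

def breslow(tied, others_total):
    """Breslow: every tied event uses the full risk-set total as its denominator."""
    risk_total = sum(tied) + others_total
    return prod(tied) / risk_total ** len(tied)

def efron(tied, others_total):
    """Efron: the l-th denominator removes a fraction l/d of the tied subjects' total."""
    d = len(tied)
    risk_total = sum(tied) + others_total
    tied_total = sum(tied)
    denominator = prod(risk_total - (l / d) * tied_total for l in range(d))
    return prod(tied) / denominator

tied, others = [2.0, 1.0], 3.0   # hypothetical hazard ratios
print(exact_average(tied, others))  # approximately 0.0750
print(efron(tied, others))          # approximately 0.0741 (2/27)
print(breslow(tied, others))        # approximately 0.0556 (1/18)
```

In this small example Efron lands much closer to the exact value than Breslow, in line with the slide's recommendation.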

47 ‘Ties’ (9): Summary
Situation | Comment
No ties | All methods give the same results
A few ties (<2%) | All methods give similar results
Many ties | Approximations are all biased towards ‘0’. Prefer Efron to Breslow. Exact methods are best, but be careful about computational demands
SAS default method is Breslow
