01/20151 EPI 5344: Survival Analysis in Epidemiology Actuarial and Kaplan-Meier methods February 24, 2015 Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa
Analyzing Survival Data (1)
Three methods to analyze survival data:
–Parametric methods
–Semi-parametric methods
–Non-parametric methods
Analyzing Survival Data (2)
Parametric methods
–Assume one of the functions discussed earlier.
   Usually assume the probability distribution
   Estimating the hazard function is key
–Estimate the parameters directly
   Use Maximum Likelihood Estimation methods
–Have the greatest statistical power, provided that the model is correct
–Will be discussed in a later session
Analyzing Survival Data (3)
Semi-parametric methods (do not ‘estimate’ S(t))
–Assume that there is a parametric relationship between the hazards in different treatment/exposure groups
   E.g., males have twice the hazard of females
–BUT leave the hazard function itself unspecified (non-parametric)
   Can be any form, including cure models
–Cox modeling is the most common method used
   Proportional Hazard assumption is commonly used but not essential
   More later in the course
Analyzing Survival Data (4)
Non-parametric methods
–Make no assumption about the survival curve, distribution function, etc.
–Common approach used in epidemiology and medicine.
–Actuarial method (life-table/Cutler-Ederer)
   Treats time as ‘intervals’
   Does not need the exact time of the event
   Used for 100+ years by demographers
–Kaplan-Meier (product-limit) method
   Most frequent approach for RCTs
   Requires knowing the actual time of the event
Actuarial Method: Key Concept
Divide the follow-up period into smaller time units
–Often use 1-year intervals
   Can be: days, months, decades, etc.
–Intervals don’t have to be the same size but usually are
Compute survival in each interval
Combine these into an overall estimate of S(t)
Consider a simple example:
Follow 1,000 people for 3 years and count the number that die in each year
Table columns: Year, # at start, # dying
What is the Cumulative Incidence over 3 years?
Standard Epi formula: Cumulative Incidence = (# dying over the 3 years) / (# at start)
Another view: How can you still be alive after 3 years?
Don’t die in year 1 and
Don’t die in year 2 and
Don’t die in year 3
[Figure: probability tree over Year 0, Year 1, Year 2 and Year 3, with branches labeled p1 and 1 - p1, p2 and 1 - p2, p3 and 1 - p3; one branch in each year leads to DEAD.]
Our simple example (table columns: Year, # at start, # dying):
Apply the formula:
Same answer! Why? No losses/censoring
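The equality of the two views can be checked numerically. The sketch below is an illustration only: the yearly death counts are hypothetical (the slide's own table values are not available here), and the function names are invented for this example. The first function is the standard formula (total deaths divided by the number at the start); the second multiplies the yearly conditional survival probabilities and subtracts the product from 1.

```python
def cumulative_incidence_direct(n_start, deaths_per_year):
    """Standard formula: total deaths divided by number at the start."""
    return sum(deaths_per_year) / n_start


def cumulative_incidence_product(n_start, deaths_per_year):
    """1 minus the product of the yearly conditional survival probabilities."""
    alive = n_start
    cum_survival = 1.0
    for deaths in deaths_per_year:
        prob_die = deaths / alive          # conditional on being alive at start of year
        cum_survival *= 1.0 - prob_die     # survive this year as well
        alive -= deaths                    # no losses: only deaths leave the cohort
    return 1.0 - cum_survival


# Hypothetical counts (NOT the slide's data): 1,000 people, 3 years.
deaths = [100, 90, 81]
print(cumulative_incidence_direct(1000, deaths))
print(cumulative_incidence_product(1000, deaths))
```

With no losses or censoring, the two results agree up to floating-point rounding, because each year's denominator is exactly the number who survived the previous years.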
Conditional Probs (columns D, E); Cumulative Probs (columns F, G)
Columns: A = Year; B = # people still alive; C = # people dying in this year; D = Prob die in year; E = Prob survive this year; F = Cum. Prob of surviving to this year; G = Cum. Prob of dying by this year
,0002, ,0001, ,4001, ,1201, ,
SAMPLE DATA: MORTALITY AND LOSSES
Columns: A = Year; B = # people still alive; C = # people ‘lost’ in year; D = # people dying in this year
,0005,0001, ,5001, ,
Actuarial Method (1)
Consider the first interval of time:
–10,000 people ‘at risk’ at start of interval
–1,500 die
–5,000 are ‘lost’ before end of interval
Is the probability of death:
–1,500/10,000? NO
Actuarial Method (2)
‘Lost’ people are only at risk of ‘dying’ until they are lost.
When are they lost?
–We don’t know.
Losses could follow any pattern:
Actuarial Method (3)
The Actuarial ASSUMPTION (two forms)
–All ‘lost’ subjects are ‘at risk’ for one-half of the interval
–Only one-half of lost subjects are ‘at risk’ for the whole interval.
For 1990, this implies:
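As a worked illustration of the assumption, using the first-interval counts given earlier (10,000 at risk, 5,000 lost, 1,500 deaths). The symbols N (number at start), W (number lost), D (deaths) and q (probability of dying in the interval) are labels assumed here, not necessarily the slide's own notation:

```latex
q_{1990} = \frac{D}{N - W/2}
         = \frac{1{,}500}{10{,}000 - 5{,}000/2}
         = \frac{1{,}500}{7{,}500}
         = 0.20
```

The denominator, 7,500, is the effective number at risk: each lost subject is counted as half a person for the interval.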
Actuarial Method (4)
This is identical to the standard formula for estimating Cumulative Incidence learned in Epi 1.
Conditional Probs (columns F, G); Cumulative Probs (columns H, I)
Columns: A = Year; B = # people still alive; C = # lost; D = # people dying in this year; E = Effective # at risk; F = Prob die in year; G = Prob survive this year; H = Cum. Prob of surviving to this year; I = Cum. Prob of dying by this year
,0005,0001,5007, ,5001, , ,
Actuarial Method (5)
Now, consider 1991 (assume that you survive to the start of 1991)
The standard epidemiology formula gives:
Actuarial Method (6)
What is: Prob(died by 1991)?
AND SO ON
Actuarial Method (7) ‘The Algebra’
Compute these for each interval.
Gives columns A through G
Actuarial Method (7) ‘The Algebra’ – part 2
Compute these for each interval.
Gives columns H and I: the cumulative probabilities
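The column-by-column computation can be sketched as a short function. This is a minimal sketch under the half-interval assumption described earlier, not the course's own code; the function name and dictionary keys are invented for illustration. The 1990 row uses the counts from the slides; the 1991 loss and death counts are hypothetical, chosen only so that the example runs.

```python
def actuarial_life_table(rows):
    """Build an actuarial (life-table) estimate.

    rows: list of (label, n_alive_at_start, n_lost, n_died), one per interval.
    Assumes lost subjects are at risk for half of the interval.
    """
    table = []
    cum_survive = 1.0
    for label, n_alive, n_lost, n_died in rows:
        effective = n_alive - n_lost / 2.0      # effective # at risk
        prob_die = n_died / effective           # conditional prob of dying in interval
        prob_survive = 1.0 - prob_die           # conditional prob of surviving interval
        cum_survive *= prob_survive             # cumulative prob of surviving to end
        table.append({
            "interval": label,
            "effective_at_risk": effective,
            "prob_die": prob_die,
            "prob_survive": prob_survive,
            "cum_survive": cum_survive,
            "cum_die": 1.0 - cum_survive,
        })
    return table


rows = [
    (1990, 10000, 5000, 1500),  # counts from the slides
    (1991, 3500, 1000, 500),    # lost/died counts here are HYPOTHETICAL
]
for row in actuarial_life_table(rows):
    print(row)
```

The conditional columns are computed independently for each interval; only the cumulative columns carry information forward from earlier intervals.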
Kaplan-Meier: Key Concept
Similar approach to the actuarial method EXCEPT:
–Use the exact time of each outcome to define ‘intervals’ rather than fixed-length intervals
Compute a new survival value at every time where an outcome event occurs
–Can ignore times with only censored observations
–Exclude censored people from the ‘at risk’ group at later times
Kaplan-Meier: Risk set
At any time ‘t’ during follow-up, there will be a group of people still under active follow-up
–Excludes
   People with previous outcomes
   People who have been censored prior to ‘t’
These are the only people at risk of having an outcome at time ‘t’
Called the RISK SET at time ‘t’
Kaplan-Meier: Formulae
Compute at each time when an event happens (t_i)
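A sketch of the standard product-limit formulae, with notation assumed here rather than taken from the slide: d_i is the number of events at time t_i and n_i is the size of the risk set just before t_i.

```latex
\hat{p}_i = 1 - \frac{d_i}{n_i},
\qquad
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

The first expression is the conditional probability of surviving past t_i given survival up to t_i; the second multiplies these over all event times up to t.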
Data: 5, 12, 25, 26, 27, 28, 32+, 33+, 34+, 37, 39, 40+, 42+ (+ denotes a censored time)
‘i’ | time | # deaths | # in risk set | S(t) for t_i ≤ t ≤ t_i+1 | Cumulative S(t)
1 | 5 | 1 | 13 | 12/13 | 0.92
2 | 12 | 1 | 12 | 11/12 | 0.85
3 | 25 | 1 | 11 | 10/11 | 0.77
4 | 26 | 1 | 10 | 9/10 | 0.69
5 | 27 | 1 | 9 | 8/9 | 0.62
6 | 28 | 1 | 8 | 7/8 | 0.54
7 | 32+ | 0 | 7 | 7/7 (1.0) | 0.54
8 | 33+ | 0 | 6 | 6/6 (1.0) | 0.54
9 | 34+ | 0 | 5 | 5/5 (1.0) | 0.54
10 | 37 | 1 | 4 | 3/4 | 0.40
11 | 39 | 1 | 3 | 2/3 | 0.27
12 | 40+ | 0 | 2 | 2/2 (1.0) | 0.27
13 | 42+ | 0 | 1 | 1/1 (1.0) | 0.27
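The table can be reproduced with a few lines of code. The following is a minimal sketch of the product-limit calculation, not a replacement for a statistics package; the function name and the 1/0 coding of deaths versus censored times are choices made for this example. Subjects censored at a given time are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, died):
    """Product-limit estimate.

    times: follow-up time for each subject.
    died:  1 if the subject had the event at that time, 0 if censored.
    Returns one tuple (time, n_deaths, n_in_risk_set, S(t)) per event time.
    """
    data = sorted(zip(times, died))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t (handles ties).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, deaths, n_at_risk, surv))
        n_at_risk -= leaving  # deaths and censored both leave the risk set
    return curve


# Data from the slide; times marked '+' there are coded as died = 0.
times = [5, 12, 25, 26, 27, 28, 32, 33, 34, 37, 39, 40, 42]
died = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
for t, d, n, s in kaplan_meier(times, died):
    print(t, d, n, round(s, 2))
```

Only the eight event times produce a row; censored times change nothing except the size of the risk set at the next event.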
Confidence Intervals for S_i(t)
Greenwood’s Formula
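Greenwood's formula in its usual form, written with assumed notation (d_i events and n_i subjects in the risk set at event time t_i), followed by the simplest normal-approximation 95% interval:

```latex
\widehat{\mathrm{Var}}\left[ \hat{S}(t) \right]
  = \hat{S}(t)^2 \sum_{i:\, t_i \le t} \frac{d_i}{n_i \,(n_i - d_i)},
\qquad
\hat{S}(t) \pm 1.96 \sqrt{ \widehat{\mathrm{Var}}\left[ \hat{S}(t) \right] }
```

This plain interval can extend below 0 or above 1 when S(t) is near the boundary, which is one reason software often computes the interval on a transformed scale instead.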
Median Survival (1)
Mean survival
–hard to estimate and has limited value
–due to right skewing of survival distribution
Median survival is more useful:
–The time by which 50% of the cohort will have had the outcome
–S(t) = 0.50
With no censoring, easy to get:
–Use the normal formula with the survival times.
Median Survival (2)
With censoring, you need to solve:
–S(t) = 0.5
–You can do this directly in the KM plot
–Can be computed as well
–Is based on the rank order of the survival times, not on the actual times
   Except at the median itself
95% CI can be obtained
–Complex formula/method
–Tend to be very wide.
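Reading the median off a Kaplan-Meier curve can be sketched as follows. This assumes one common convention, taking the median as the first event time at which the estimated S(t) is at or below 0.5; other conventions exist (for example, averaging when S(t) equals 0.5 exactly). The function names are invented for this illustration, and the data are the 13 survival times used in the earlier Kaplan-Meier example.

```python
def km_curve(times, died):
    """Return [(event_time, S(t))] using the product-limit method."""
    data = sorted(zip(times, died))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [d for (u, d) in data if u == t]  # everyone leaving at t
        deaths = sum(same_time)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(same_time)
        i += len(same_time)
    return curve


def median_survival(curve):
    """First event time with S(t) <= 0.5, or None if S(t) never gets that low."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


times = [5, 12, 25, 26, 27, 28, 32, 33, 34, 37, 39, 40, 42]
died = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
print(median_survival(km_curve(times, died)))
```

The result depends only on which event time first pushes the curve to 0.5 or below, which is the sense in which the median is rank-based. When heavy censoring keeps S(t) above 0.5 throughout follow-up, the median cannot be estimated and the function returns None.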
K-M: A Couple of Notes
If the last time corresponds to an ‘event’, then S(t_last) must be 0.
–This does NOT mean that everyone in the group dies.
If the last time corresponds to a ‘censoring’, then S(t_last) will be non-zero.
–The mean survival time will be biased
   Under-estimated
Numerical example
Columns: ID, Time (months), Censored
114XXXXX XXXXX 545XXXXX XXXXX 992XXXXX 10111XXXXX
Actuarial Method
Columns: A = Year; B = # people under follow-up; C = # lost; D = # people dying in this year; E = Effective # at risk; F = Prob die in year; G = Prob survive this year; H = S(t)
Kaplan-Meier method
Columns: ‘i’, time, # deaths, # in risk set, Prob die in interval, Prob survive interval, S(t_1)
Note that the last three subjects were censored. Final S(t) value is non-zero.
[Figure: Actuarial Curve]
[Figure: Kaplan-Meier Curve]
[Figure: Both Curves]
[Figure: Estimating 75th percentile survival]