02/2016
EPI 5344: Survival Analysis in Epidemiology
Hazard
March 8, 2016
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa

Objectives
Examine non-parametric methods of estimating hazard
  – Actuarial formulae
  – Nelson-Aalen method
  – Application of smoothing
  – Piecewise constant hazard models
Review limitations of estimating hazard
Applications of hazard estimates

Hazard (1)
h(t)
  – Instantaneous hazard
  – Rate of the event occurring at time ‘t’, conditional on having survived event-free until time ‘t’
H(t)
  – Cumulative hazard
  – ‘Sum’ of all hazards from time ‘0’ to time ‘t’
  – Area under the h(t) curve from ‘0’ to ‘t’

Hazard (2)
Simplest survival model assumes a constant hazard
  – Yields an exponential survival curve
  – Leads to basic epidemiology formulae for incidence, etc.
Can extend it with the piecewise model
  – Fits a different constant hazard for given follow-up time intervals
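
A minimal sketch of the constant-hazard case, assuming a hazard λ so that S(t) = exp(-λt). The function names and the rate value are illustrative assumptions, not taken from the lecture. It also shows why the cumulative incidence 1 - S(t) is close to the familiar rate × time when λt is small.

```python
import math

def survival_constant_hazard(lam, t):
    """Exponential survival S(t) = exp(-lam * t) under a constant hazard lam."""
    return math.exp(-lam * t)

def cumulative_incidence(lam, t):
    """Cumulative incidence 1 - S(t) under a constant hazard lam."""
    return 1.0 - survival_constant_hazard(lam, t)

# Illustrative values only: a rate of 0.02 per person-year over 5 years.
lam, t = 0.02, 5
print(round(survival_constant_hazard(lam, t), 4))  # exact survival
print(round(cumulative_incidence(lam, t), 4))      # exact cumulative incidence
print(lam * t)                                     # rare-event approximation
```

The approximation always slightly overstates the exact cumulative incidence, and the gap grows as λt increases.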

Hazard estimation (1)
If hazard is not constant, how does it vary over time?

Hazard estimation (2)
How can we estimate the hazard?
  – Parametric methods (not discussed today)
  – Non-parametric methods
We can estimate:
  – h(t)
  – H(t)
Preference is to estimate H(t)
  – Nelson-Aalen method is the main approach

Hazard estimation (3)
Direct hazard estimation has issues
  – h(t) shows much random variation
  – Estimates are unstable because few events fall in each time interval
Works ‘best’ with the actuarial method since intervals are pre-defined
  – Interval length (u_i) is generally the same for each interval

Estimating h(t)

h(t) estimation (1)
Let’s start by looking at direct estimation of h(t)
Works from a piece-wise constant hazard model
Start by dividing follow-up time into intervals
  – Actuarial has pre-defined intervals
  – KM uses time between events as intervals

h(t) direct estimation

h(t) estimation (2)
Actuarial method to estimate h(t)
  – Interval length is u_i
Standard Incidence Density formula from Epi

Actuarial life table
A: Year | B: # people still alive | C: # lost | D: # people dying in this year | E: Effective # at risk | F: Prob die in year | G: Prob survive this year | H: Cum. prob of surviving to this year | I: Cum. prob of dying by this year
[Cell values were lost in extraction; only fragments such as “5,000” and “1,500” remain.]
Last week, we used this data to illustrate the actuarial method
Let’s use it to estimate h(t)

Actuarial life table with h(t)
Year | N_t: # people still alive | W_t: # lost | I_t: # people dying in this year | Effective # at risk | Prob die in year | h(t)
[Cell values not recoverable from the extracted slide.]

h(t) estimation (3)
Look at the denominator of this formula
  – First part is the ‘effective number of people under follow-up during the interval’
  – Second part is the ‘# of years each person is followed (on average) in the interval’
  – Product is the ‘person-time of follow-up’ in the interval
  – An approximation, since we don’t use data on when each person left follow-up
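
The formula image from the slide is not preserved. A minimal sketch, assuming the common life-table form in which people lost and people with events each contribute half an interval of follow-up: h = I / ((N - W/2 - I/2) × u). The function name and the input values are illustrative. The inputs echo fragments legible in the lecture's table but are assumed, not confirmed.

```python
def actuarial_hazard(n_start, n_lost, n_events, width=1.0):
    """Actuarial (incidence-density style) hazard for one interval.

    Assumed form: events / (effective number under follow-up * interval width),
    where withdrawals and events each count for half the interval.
    """
    effective_n = n_start - n_lost / 2 - n_events / 2  # first part of denominator
    person_time = effective_n * width                  # product = person-time
    return n_events / person_time

# Illustrative inputs (assumed): 10,000 at start, 5,000 lost, 1,500 deaths, 1-year interval.
h = actuarial_hazard(10_000, 5_000, 1_500, width=1.0)
print(round(h, 4))
```

The denominator is only an approximation to person-time because it assumes departures are spread evenly over the interval.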

h(t) estimation (4)
What if we had the exact follow-up time for each subject?
Person-time variant
  – Divide follow-up time into fixed intervals
  – Compute actual person-time in each interval (rather than using the approximation)
  – Gives a slightly smoother curve
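
A minimal sketch of the person-time variant, assuming each subject has an exact follow-up time and an event indicator. The data, interval edges, and function name are hypothetical and chosen only to illustrate the arithmetic.

```python
def person_time_hazard(times, events, edges):
    """Hazard per fixed interval = events in interval / exact person-time in interval.

    times  : follow-up time for each subject
    events : 1 if follow-up ended with the event, 0 if censored
    edges  : interval boundaries, e.g. [0, 4, 8]
    """
    hazards = []
    for start, end in zip(edges[:-1], edges[1:]):
        # Each subject still under follow-up after 'start' contributes time
        # up to the earlier of their exit time and the end of the interval.
        person_time = sum(min(t, end) - start for t in times if t > start)
        deaths = sum(1 for t, e in zip(times, events) if e == 1 and start < t <= end)
        hazards.append(deaths / person_time if person_time > 0 else float("nan"))
    return hazards

# Hypothetical data: four subjects, two intervals of 4 time units each.
times = [2, 5, 5, 8]
events = [1, 0, 1, 1]
print(person_time_hazard(times, events, [0, 4, 8]))
```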

h(t) estimation (5)
Kaplan-Meier method to estimate h(t)
  – ‘Interval’ is the time between death events
      Varies irregularly
  – Formula has the same structure as the person-time estimate given above:
      $$\hat{h}(t_i) = \frac{d_i}{n_i \, u_i}$$
      d_i = # with event at ‘t_i’
      u_i = t_i - t_{i-1}
      n_i = size of risk set at ‘t_i’
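
A minimal sketch of the Kaplan-Meier style hazard, assuming the estimate at each event time is d_i / (n_i × u_i) with u_i the time since the previous event (or since 0 for the first event). The data and function name are hypothetical.

```python
def km_hazard(times, events):
    """Return (t_i, d_i, n_i, h_i) at each distinct event time."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    rows, previous = [], 0.0
    for t_i in event_times:
        d_i = sum(1 for t, e in zip(times, events) if t == t_i and e == 1)
        n_i = sum(1 for t in times if t >= t_i)  # risk set just before t_i
        u_i = t_i - previous                     # time since the previous event
        rows.append((t_i, d_i, n_i, d_i / (n_i * u_i)))
        previous = t_i
    return rows

# Hypothetical data: five subjects, three events, two censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
for row in km_hazard(times, events):
    print(row)
```

With only one event in each numerator, the estimates jump around from one event time to the next, which is the instability the lecture describes.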

h(t) estimation (6)
Issues with using the KM method to estimate h(t)
  – Normally, only have 1 or 2 events in the numerator
  – Makes estimates ‘unstable’
      Liable to considerable random variation and noise
  – Do not usually estimate h(t) from KM methods
Any estimation of h(t) can use a kernel smoothing approach to improve estimates

Smoothing & hazard (1)
What is smoothing?
Think of a moving average
  – Could present monthly rates
  – Instead, average the rates for 3 month groups
      Jan/Feb/March
      Feb/March/April
      etc.
  – Present the 3 month averages
  – Smooths out variation and reveals trends
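
The 3-month moving average can be sketched as follows. The monthly rates are made-up numbers and the function name is an assumption for illustration.

```python
def moving_average(values, window=3):
    """Average each run of `window` consecutive values (Jan-Mar, Feb-Apr, ...)."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical monthly rates, January to June.
monthly_rates = [3, 9, 6, 12, 9, 15]
print(moving_average(monthly_rates))
```

Each output value summarises three months, so the series is two points shorter than the input and month-to-month swings are damped.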

Smoothing & hazard (2)
Many other methods exist
Key idea is to define a ‘sliding window’
  – Estimate the data point within the window
  – Slide to the next target region
LOESS smoothing is very commonly used
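
A minimal sketch of the sliding-window idea applied to hazard. This is not LOESS: it swaps in a simple kernel smoother, assuming an Epanechnikov kernel applied to the per-event-time increments d_i / n_i. The bandwidth, data, and function names are illustrative assumptions.

```python
def epanechnikov(x):
    """Kernel weight: largest at the window centre, zero outside |x| <= 1."""
    return 0.75 * (1 - x * x) if abs(x) <= 1 else 0.0

def kernel_hazard(t, event_times, increments, bandwidth):
    """Smoothed hazard at t: kernel-weighted sum of increments d_i / n_i."""
    weighted = sum(
        epanechnikov((t - t_i) / bandwidth) * inc
        for t_i, inc in zip(event_times, increments)
    )
    return weighted / bandwidth

# Hypothetical event times and increments d_i / n_i.
event_times = [2, 3, 5]
increments = [1 / 5, 1 / 4, 1 / 2]
for t in [2, 3, 4, 5]:
    print(t, round(kernel_hazard(t, event_times, increments, bandwidth=2.0), 4))
```

A wider bandwidth gives a smoother but flatter curve. Near the start and end of follow-up the window is only partly filled, so estimates there tend to be biased downward.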

Estimating H(t)

H(t) estimation (1)
Estimating cumulative hazard: H(t)
  – Measures the area under the h(t) curve
Tends to be more stable since it is based on the number of events from ‘0’ to ‘t’ rather than the number in the last interval

H(t) estimation (2)
Four ways you can do this:
  1. Actuarial using the ‘epi’ formula
  2. Actuarial using the person-time method
  3. Kaplan-Meier approach using the Nelson-Aalen estimator
  4. Kaplan-Meier approach using -log(S(t))
Let’s talk about methods 1 & 2

H(t) estimation (2)
Simple approach
  – Estimate h(t) assuming a piece-wise constant model
  – H(t) is the sum of the pieces
  – For each ‘piece’ before time ‘t’, compute:
      the estimated ‘h_i’ for the interval multiplied by the length of the interval it is based on

H(t) estimation (3)
Simple approach (cont)
  – Add these up across all ‘pieces’ before time ‘t’
      Width of the last ‘piece’ is up to ‘t’ only
  – Relates to the density method from epi

H(t) estimation based on piecewise estimation of h(t)
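
H(t) estimation (4)
The formulae are as follows. The sum is taken at the end of each interval.
$$\hat{H}(t_k) = \sum_{i=1}^{k} \hat{h}_i \, u_i$$
where \hat{h}_i is the estimated hazard in interval i (from either actuarial method) and u_i is the interval length.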
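
A minimal sketch of summing the pieces, assuming a piecewise constant hazard with known interval edges. The hazards, edges, and function name are hypothetical. For a time inside an interval, only the part of that last piece up to t is counted.

```python
def cumulative_hazard(t, edges, hazards):
    """H(t) under a piecewise constant hazard.

    edges   : interval boundaries, e.g. [0, 1, 2, 4]
    hazards : one constant hazard per interval (len(edges) - 1 values)
    For t beyond the last edge, H at the last edge is returned.
    """
    total = 0.0
    for start, end, h in zip(edges[:-1], edges[1:], hazards):
        if t <= start:
            break
        # Full interval width, or only up to t for the last piece.
        total += h * (min(t, end) - start)
    return total

# Hypothetical hazards for intervals 0-1, 1-2 and 2-4.
edges = [0, 1, 2, 4]
hazards = [0.1, 0.2, 0.4]
for t in [1, 2, 3, 4]:
    print(t, round(cumulative_hazard(t, edges, hazards), 3))
```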

H(t) estimation (5)
Nelson-Aalen estimator for H(t)
Apply the above approach but define intervals using the time points of events
Most commonly used approach to estimate H(t)
Related to the Kaplan-Meier method
Compute H(t) at each time when an event happens:
$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$$
d_i = # with event at ‘t_i’
n_i = size of risk set at ‘t_i’
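
A minimal sketch of the Nelson-Aalen estimator, assuming subjects censored at an event time are still counted in the risk set at that time. The data and function name are hypothetical.

```python
def nelson_aalen(times, events):
    """Return (t_i, H(t_i)) at each distinct event time."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    cumulative, rows = 0.0, []
    for t_i in event_times:
        d_i = sum(1 for t, e in zip(times, events) if t == t_i and e == 1)
        n_i = sum(1 for t in times if t >= t_i)  # size of the risk set at t_i
        cumulative += d_i / n_i                  # add this event time's increment
        rows.append((t_i, cumulative))
    return rows

# Hypothetical data: five subjects, three events, two censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
for t_i, H in nelson_aalen(times, events):
    print(t_i, round(H, 4))
```

The estimate is a step function: it stays flat between event times and jumps by d_i / n_i at each one.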

H(t) estimation (6)
Approach #4 to estimate H(t)
Use -log(S(t))
From our basic formulae, we have:
$$H(t) = -\log S(t)$$
Estimate S(t) and convert using this formula

H(t) estimation (7)
For those who care, methods 3 and 4 are very similar
From KM, the estimate of S(t) is:
$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$
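
H(t) estimation (8)
Hence, we have:
$$-\log \hat{S}(t) = -\sum_{t_i \le t} \log\left(1 - \frac{d_i}{n_i}\right)$$
But, for small values of x, we have:
$$\log(1 - x) \approx -x$$
So, we get:
$$-\log \hat{S}(t) \approx \sum_{t_i \le t} \frac{d_i}{n_i}$$
which is the Nelson-Aalen estimator of H(t).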
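
A minimal sketch comparing the two Kaplan-Meier based estimates of H(t) on the same hypothetical data: the Nelson-Aalen sum and -log of the Kaplan-Meier survival. The function name and data are assumptions for illustration.

```python
import math

def na_and_km_cumulative_hazard(times, events):
    """Return (t_i, Nelson-Aalen H, -log(KM S)) at each distinct event time."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    h_na, s_km, rows = 0.0, 1.0, []
    for t_i in event_times:
        d_i = sum(1 for t, e in zip(times, events) if t == t_i and e == 1)
        n_i = sum(1 for t in times if t >= t_i)
        h_na += d_i / n_i            # Nelson-Aalen increment
        s_km *= 1.0 - d_i / n_i      # Kaplan-Meier product term
        h_km = -math.log(s_km) if s_km > 0 else float("inf")
        rows.append((t_i, h_na, h_km))
    return rows

# Hypothetical data: five subjects, three events, two censored.
for t_i, h_na, h_km in na_and_km_cumulative_hazard([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t_i, round(h_na, 4), round(h_km, 4))
```

Because -log(1 - x) is at least x, the -log(S(t)) version is never smaller than the Nelson-Aalen value. The two agree closely while d_i / n_i is small and drift apart as the risk set shrinks.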
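
Numerical example
ID | Time (mons) | Censored
Legible rows: ID 1 = 14, ID 5 = 45, ID 9 = 92, ID 10 = 111. The remaining rows and the alignment of the censoring marks (XXXXX) were lost in extraction.
Very coarse: 10 events in 10 years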

Actuarial Method for h(t)
Year | # people under follow-up | # lost | # people dying in this year | h(t) | H(t)
[Cell values not recoverable from the extracted slide.]

Nelson-Aalen estimate of H(t)
Interval | Computation | H(t) from actuarial method
[Only the first interval label, “0-22”, and the stub “H(t) =” of each computation survive; the numeric values are not recoverable.]

A new example

Applications of H(t) (1)
H(t) has many uses, largely based on:
$$S(t) = e^{-H(t)}$$

Applications of H(t) (2)
The Nelson-Aalen estimate of H(t) gives another way to estimate S(t).
Uses formula:
$$\hat{S}(t) = e^{-\hat{H}(t)}$$

Applications of H(t) (3)
Interval | H(t) | S(t) | Cum Incid(t)
[Cell values not recoverable from the extracted slide.]
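
A minimal sketch of how the columns H(t), S(t) and Cum Incid(t) relate, assuming S(t) = exp(-H(t)) and cumulative incidence = 1 - S(t). The H(t) values and function name are hypothetical; the lecture's own table values are not available.

```python
import math

def survival_and_incidence(cumulative_hazards):
    """For each H(t), return (H, S = exp(-H), cumulative incidence = 1 - S)."""
    rows = []
    for H in cumulative_hazards:
        S = math.exp(-H)
        rows.append((H, S, 1.0 - S))
    return rows

# Hypothetical cumulative hazards at the end of successive intervals.
for H, S, ci in survival_and_incidence([0.2, 0.45, 0.95]):
    print(round(H, 2), round(S, 4), round(ci, 4))
```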

Applications of H(t) (4)
log(-log(S(t))) and Proportional Hazards
Can plot this in SAS using the ‘p=ls’ option

Applications of H(t) (5)
Key for testing the proportional hazards assumption (later)
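
A minimal sketch of why log(-log(S(t))) is useful here, assuming two groups whose hazards differ by a constant ratio so that S1(t) = S0(t)^HR. The survival values, the hazard ratio, and the function name are hypothetical. Under that assumption the two log(-log(S)) curves differ by the constant log(HR) at every time point, so they plot as parallel curves.

```python
import math

def cloglog(s):
    """log(-log(S)), defined for 0 < S < 1."""
    return math.log(-math.log(s))

# Hypothetical baseline survival at four time points and an assumed hazard ratio.
baseline = [0.95, 0.80, 0.60, 0.40]
hazard_ratio = 2.0
exposed = [s ** hazard_ratio for s in baseline]  # S1(t) = S0(t) ** HR

# Vertical gap between the two curves at each time point.
gaps = [cloglog(s1) - cloglog(s0) for s0, s1 in zip(baseline, exposed)]
print([round(g, 4) for g in gaps])
print(round(math.log(hazard_ratio), 4))
```

With real data the gap varies through sampling error; a gap that clearly widens or narrows over time suggests the hazards are not proportional.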

Applications of H(t) (6)
Suppose the hazard is a constant (λ), then we have:
$$H(t) = \lambda t, \qquad S(t) = e^{-\lambda t}, \qquad \ln S(t) = -\lambda t$$
Plot ‘ln(S(t))’ against ‘t’. A straight line indicates a constant hazard.
Approach can be used to test other models (e.g. Weibull).
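
A minimal sketch of the straight-line check, assuming survival estimates at a few time points. It fits -ln(S(t)) = λt by least squares through the origin and reports the largest departure from the fitted line. The function names and data are hypothetical, and a plot remains the usual way to judge linearity.

```python
import math

def fit_constant_hazard(times, survival):
    """Least-squares slope through the origin of -ln(S(t)) against t."""
    numerator = sum(t * (-math.log(s)) for t, s in zip(times, survival))
    denominator = sum(t * t for t in times)
    return numerator / denominator

def max_departure(times, survival, lam):
    """Largest absolute gap between -ln(S(t)) and the fitted line lam * t."""
    return max(abs(-math.log(s) - lam * t) for t, s in zip(times, survival))

# Hypothetical survival estimates generated from a constant hazard of 0.5.
times = [1, 2, 3, 4]
survival = [math.exp(-0.5 * t) for t in times]
lam = fit_constant_hazard(times, survival)
print(round(lam, 4), max_departure(times, survival, lam) < 1e-9)
```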

Example (from Allison)
Recidivism data set
  – 432 male inmates released from prison
  – Followed for 52 weeks
  – Dates of re-arrests were recorded
  – Study designed to examine the impact of a financial support programme on reducing re-arrest

Simple hazard estimates using actuarial method
Adjusted hazard estimates using actuarial method: last interval ends at 53 weeks, not 60 weeks