EPI 5344: Survival Analysis in Epidemiology
Quick Review and Intro to Smoothing Methods
March 4, 2014
Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
Objectives (for entire session)
The primary goal is to address two key concepts:
- the role of hazard estimation in survival methods
- methods to compare two survival curves using non-parametric methods
Objectives (for entire session)
- Review
  - Survival concepts
  - Hazard
- Smoothing methods
- Methods for estimation of the hazard
- Proportional hazards
- Non-regression comparison of survival curves
  - Log-rank test
  - Variations of the log-rank test
- Relate hazard/ID to person-time
Review Material, Session #1
Time '0' (1)
Time is usually measured as 'calendar time':
- Patient #1 enters on Feb 15, 2000 & dies on Nov 8, 2000
- Patient #2 enters on July 2, 2000 & is lost (censored) on April 23, 2001
- Patient #3 enters on June 5, 2001 & is still alive (censored) at the end of the follow-up period
- Patient #4 enters on July 13, 2001 and dies on December 12, 2002
Study course for patients in cohort
[figure: follow-up timelines for individual patients, with calendar-time axis marks at 2001, 2003, and 2013]
Histogram of death time
- Skewed to the right
- The pdf, or f(t)
- The CDF, or F(t): the area under the pdf from 0 to t
[figure: histogram of death times, with F(t) shown as the shaded area under f(t) up to time t]
Survival curves (3)
Plot the % of the group still alive (or the % dead).
S(t) = survival curve = % still surviving at time 't' = P(survive to time 't')
Cumulative mortality = 1 − S(t) = F(t) = cumulative incidence
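In symbols (a standard restatement of the definitions above, not spelled out on the slide):

```latex
S(t) = P(T > t) = 1 - F(t), \qquad
F(t) = \int_0^t f(u)\,du, \qquad
f(t) = \frac{dF(t)}{dt}
```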
[figure: plot of S(t) (survival) and 1 − S(t) = CI(t) (deaths) against t]
Conditional Survival Curves
Essentially, you are re-scaling S(t) so that S*(t₀) = 1.0
S*(t) = survival curve conditional on surviving to 't₀'
CI*(t) = failure/death/cumulative incidence at 't', conditional on surviving to 't₀'
The hazard at t₀ is defined as the slope of CI*(t) at t₀.
Other names: hazard (instantaneous), force of mortality, incidence rate, incidence density.
Range: 0 to ∞
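These definitions can be restated in symbols (a standard formulation, added for reference; the slide gives only the verbal version):

```latex
S^*(t) = \frac{S(t)}{S(t_0)} \;\; (t \ge t_0), \qquad
h(t_0) = \lim_{\Delta t \to 0}
  \frac{P(t_0 \le T < t_0 + \Delta t \mid T \ge t_0)}{\Delta t}
  = \frac{f(t_0)}{S(t_0)}
```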
Some relationships
- If the rate of disease is small: CI(t) ≈ H(t)
- If we assume h(t) is constant (= ID): CI(t) ≈ ID × t
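Both approximations follow from the exact relationship between cumulative incidence and the cumulative hazard H(t) (a standard result, shown here for completeness):

```latex
CI(t) = 1 - S(t) = 1 - e^{-H(t)} \approx H(t)
  \quad \text{when } H(t) \text{ is small};
\qquad
H(t) = \int_0^t h(u)\,du = \lambda t \;\; \text{if } h(u) \equiv \lambda \; (= \text{ID})
```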
[figure: year-by-year survival tree from Year 0 to Year 3; in each year k the subject survives with probability p_k or moves to DEAD with probability 1 − p_k]
Actuarial Method
Effective # at risk = (# under follow-up) − (# lost)/2; Prob die in year = (# dying)/(effective # at risk); S(t) is the running product of the yearly survival probabilities.

Year  | # under follow-up | # lost | # dying | Effective # at risk | Prob die in year | Prob survive year | S(t)
0-1   | 10 | 0 | 0 | 10.0 | 0     | 1     | 1.000
1-2   | 10 | 1 | 1 |  9.5 | 0.105 | 0.895 | 0.895
2-3   |  8 | 0 | 1 |  8.0 | 0.125 | 0.875 | 0.783
3-4   |  7 | 2 | 1 |  6.0 | 0.167 | 0.833 | 0.652
4-5   |  4 | 0 | 0 |  4.0 | 0     | 1     | 0.652
5-6   |  4 | 0 | 1 |  4.0 | 0.250 | 0.750 | 0.489
6-7   |  3 | 1 | 0 |  2.5 | 0     | 1     | 0.489
7-8   |  2 | 1 | 0 |  1.5 | 0     | 1     | 0.489
8-9   |  1 | 1 | 0 |  0.5 | 0     | 1     | 0.489
9-10  |  0 | 0 | 0 |  0   | 0     | 1     | 0.489
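A minimal computational sketch of the actuarial calculation, using the cohort of 10 from the table above (function and variable names are illustrative, not from the course):

```python
# Actuarial (life-table) survival estimate for the slide's cohort of 10.
rows = [  # (interval, number starting interval, number lost, number dying)
    ("0-1", 10, 0, 0), ("1-2", 10, 1, 1), ("2-3", 8, 0, 1),
    ("3-4", 7, 2, 1), ("4-5", 4, 0, 0), ("5-6", 4, 0, 1),
    ("6-7", 3, 1, 0), ("7-8", 2, 1, 0), ("8-9", 1, 1, 0),
]

surv = 1.0
for interval, n, lost, died in rows:
    eff = n - lost / 2                    # losses count as half an at-risk person
    q = died / eff if eff > 0 else 0.0    # conditional probability of dying
    surv *= 1 - q                         # S(t) = product of interval survivals
    print(f"{interval}: eff={eff:4.1f}  q={q:.3f}  S(t)={surv:.3f}")
```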
Kaplan-Meier method
At each observed death time, S(t) is multiplied by the probability of surviving that time among those still at risk.

i | time | # deaths | # in risk set | Prob die in interval | Prob survive interval | S(t_i)
0 |  0   | -        | -             | -                    | -                     | 1.000
1 | 22   | 1        | 9             | 0.111                | 0.889                 | 0.889
2 | 29   | 1        | 8             | 0.125                | 0.875                 | 0.778
3 | 46   | 1        | 5             | 0.200                | 0.800                 | 0.622
4 | 61   | 1        | 4             | 0.250                | 0.750                 | 0.467
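The same product-of-conditional-survivals logic, applied at the exact event times (a sketch reproducing the table above; names are illustrative):

```python
# Kaplan-Meier estimate from the slide's table.
events = [  # (event time, number of deaths, number in risk set)
    (22, 1, 9), (29, 1, 8), (46, 1, 5), (61, 1, 4),
]

s = 1.0
for t, d, n in events:
    s *= 1 - d / n                      # step down at each death time
    print(f"t={t}: S(t)={s:.3f}")       # 0.889, 0.778, 0.622, 0.467
```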
END OF REVIEW MATERIAL
Smoothing methods
- Naïve non-parametric regression ('windows')
- Sliding windows
- Local averaging
- Kernel estimation
[figure slides: worked scatterplot example of naïve windowed regression; the data are divided into five windows and the mean (x, y) point of each window is plotted]
Sliding windows (1)
The divisions we used created five 'windows' into the data.
- Within each window, we computed the mean 'X' and 'Y' and plotted that point for the regression line.
But why do the windows need to be fixed? They don't:
- Define the width of a window
- Slide it from left to right
- Compute the 'window-specific data point' and plot as before (see the sketch below)
This is the essence of 'smoothing'.
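A minimal sketch of a sliding-window smoother (a moving average over a fixed-width window; the function name and tuning values are illustrative):

```python
import numpy as np

def sliding_window_smooth(x, y, width):
    """At each x-value, average the (x, y) points falling in a window
    of the given width centred on that x-value."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xs, ys = [], []
    for xi in x:
        in_win = np.abs(x - xi) <= width / 2   # points inside the window
        xs.append(x[in_win].mean())            # window-specific x
        ys.append(y[in_win].mean())            # window-specific y
    return np.array(xs), np.array(ys)

# Example: smooth a noisy sine curve.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
sx, sy = sliding_window_smooth(x, y, width=1.0)
```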
Sliding windows (2)
The size of the window is a 'tuning parameter'. It can be set as:
- a fixed number of neighboring data points, or
- a fixed width (include all points inside)
Large windows tend to 'over-smooth'; small windows do little smoothing and show the random noise.
Window-specific data point (1)
There are many ways to compute the representative data point for the window:
- X-value:
  - mean of the x's in the window
  - median of the x's in the window
  - define the window around a specific data point and use that x-value
- Y-value:
  - mean of the y's in the window
  - median of the y's in the window
  - do a regression (linear, quadratic or cubic) of the data points in the window, and use the predicted 'y' for the selected 'x'
Window-specific data point (2)
Data points can be 'weighted':
- Points closer to the middle of the window should provide more information about the true (x, y) than those further away.
The weight function is called a 'kernel', and the method is called 'kernel smoothing'.
Window-specific data point (3)
Many weight functions (kernels) can be used. A common one is the tricube weight:
- Select an 'x_i'
- Define the window around x_i to get the points inside it
- For each point 'x_j' inside the window, let 'z_ij' measure where the point sits in the window, scaled so that −1 means on the left boundary and +1 means on the right boundary
- The weight for that point is then given by:
  w_ij = (1 − |z_ij|³)³ for |z_ij| ≤ 1 (and 0 outside the window)
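A sketch of kernel smoothing with the tricube weight, here applied as a weighted average of the y's in each window (names are illustrative; LOWESS, on the next slide, replaces the weighted average with a weighted regression):

```python
import numpy as np

def tricube(z):
    """Tricube kernel: (1 - |z|^3)^3 for |z| <= 1, and 0 outside."""
    z = np.clip(np.abs(z), 0.0, 1.0)
    return (1.0 - z**3) ** 3

def kernel_smooth(x, y, width):
    x, y = np.asarray(x, float), np.asarray(y, float)
    out = np.empty(len(y))
    for i, xi in enumerate(x):
        z = (x - xi) / (width / 2)          # -1 at left boundary, +1 at right
        w = tricube(z)                      # weight is 0 outside the window
        out[i] = np.sum(w * y) / np.sum(w)  # kernel-weighted average
    return out
```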
LOWESS
LOWESS = LOcally WEighted Scatterplot Smoothing
- Uses the above procedure, but computes a weighted linear regression of 'y' on 'x' within each window and uses the regression equation to estimate 'y_i' for the given 'x_i' (see the sketch below)
- Implemented in SAS as a PROC (LOESS), available through ODS Graphics and elsewhere
- A higher-order polynomial regression can be used instead of the linear model; the linear model is usually OK
- 'Tuning' is done by varying the percentage of the data set included in the window
  - Empirical judgement and 'feel' are the best guides for choosing the tuning parameter
  - Some statistics are available (e.g. residuals), but that is advanced material
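A minimal LOWESS-style sketch following the slide's description: at each x_i, fit a tricube-weighted straight line to the nearest points and use its prediction at x_i. (No robustness iterations; names are illustrative. In SAS this is PROC LOESS; in Python, the statsmodels package provides a ready-made lowess function.)

```python
import numpy as np

def lowess_sketch(x, y, frac=0.3):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    k = max(2, int(frac * n))               # window size: fraction of the data
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]             # the k nearest neighbours
        h = max(d[idx].max(), 1e-12)        # window half-width
        w = (1 - np.clip(d[idx] / h, 0, 1) ** 3) ** 3   # tricube weights
        # Weighted least squares for y = b0 + b1*x within the window.
        sw = np.sqrt(w)
        X = np.column_stack([np.ones(k), x[idx]])
        b = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)[0]
        fitted[i] = b[0] + b[1] * x[i]      # predicted y at x_i
    return fitted
```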