Published by Doris Harrington. Modified over 8 years ago.
1
03/2013
EPI 5344: Survival Analysis in Epidemiology
Risk Set Analysis Approaches
April 16, 2013
Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
2
Objectives
  – Choice of time scale for observational epidemiology
  – Risk-set based analysis approaches
3
4
5
Example Study (1)
Are uranium miners at risk of dying from lung cancer?
  – Uranium is radioactive and has a complex decay process
  – Miners work in enclosed areas with high levels of radioactive dust
  – Is there evidence that their health is affected?
6
Example Study (2)
Colorado Plateau study
  – Eligibility:
      worked underground in uranium mines in the four-state Colorado Plateau area
      at least one month of work
      2,500 mines in the target area
      examined at least once by Public Health Service MDs between 1950 and 1960
  – Followed up to December 31, 1982, through vital statistics records for:
      death
      lung cancer death
7
Example Study (3)
Entry date: the later of
  – completion of one month of work plus an exam by an MD
  – January 1, 1952
Main outcome: death from lung cancer
8
Example Study (4)
Exposure:
  – 43,000 direct measurements of radon levels in mines between 1951 and 1968
  – converted to annual exposure
  – combined with each worker's 'in mine' work time
  – used to generate Working-Level Months (WLM):
      1 WL = 20.8 µJ (microjoules) of potential alpha energy per cubic meter (m³) of air
      1 WLM = exposure to 1 WL for 170 hours
  – cumulated in five-year age intervals: 0-5, 5-10, 10-15, 15-20, 20-25, …
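The WLM definition above reduces to simple arithmetic. The sketch below is a minimal illustration under stated assumptions. The helper names `working_level_months` and `cumulate_by_age_band` are hypothetical, as is the input format of one (age, WL, hours) tuple per exposure episode. The bands are keyed by their upper limit, which is how names like rexp20 are read here.

```python
from collections import defaultdict

# 1 WLM = exposure to 1 WL for 170 hours (one "working month").
HOURS_PER_WORKING_MONTH = 170.0


def working_level_months(wl: float, hours: float) -> float:
    """Convert a radon level in WL and hours spent in the mine into WLM."""
    return wl * hours / HOURS_PER_WORKING_MONTH


def cumulate_by_age_band(episodes, width=5):
    """Sum WLM into age bands [0, 5), [5, 10), ...

    episodes: iterable of (age, wl, hours) tuples (hypothetical input format).
    Keys of the result are band upper limits, so key 20 holds the WLM
    accrued from age 15 to 20 (the rexp20 convention).
    """
    bands = defaultdict(float)
    for age, wl, hours in episodes:
        upper = (int(age // width) + 1) * width
        bands[upper] += working_level_months(wl, hours)
    return dict(bands)


if __name__ == "__main__":
    print(working_level_months(1.0, 170))
    print(cumulate_by_age_band([(16.0, 1.0, 170), (19.5, 0.5, 340), (21.0, 1.0, 170)]))
```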
9
Variables in the data file:
  agest = age at entry to study
  ageexit = age at exit from study
  ind = died from lung cancer (=1)
  rexp20 = WLMs from age 15-20 (rexp5, rexp10, …, rexp90 follow the same pattern)
10
Example study (5)
Item                                       Number   Percent
Sample size                                 3,347
Dying (any cause)                           1,258     38%
Lung cancer death                             258     7.7%
Lung cancer as proportion of all deaths              20.5%
How can survival analysis methods be applied to these data?
11
Time scale issues (1)
Possible time scales:
  – age
  – calendar year
  – time since entry into study
  – time since initial employment
The choice affects the shape of the baseline hazard function:
  – risk increases with age
  – risk relates to radiation exposure, not to length of time since study entry
12
Time scale issues (2)
Breslow et al (1983)
  – time-on-study as the time scale is fine for RCTs, etc.
  – not optimal for cohort studies:
      most outcome death rates increase rapidly with age
  – want to maximize control of the age effect:
      time-on-study is often strongly correlated with cumulative exposure
  – can produce negative bias if used as the time scale
13
Time scale issues (3)
Breslow et al (1983)
  – Recommendation:
      age as the time scale
      stratify by calendar time (5-year groups)
  – risk sets consist of people at the same age within each calendar group
  – ignores length of time since entry as a factor
  – has 'late entry': the effective time '0' is the birth date
14
Time scale issues (4)
Korn et al (1997)
  – Cox models don't need a form for h(t)
  – the best choice of time scale is the one which has the biggest impact on the shape of the hazard function, NOT the biggest impact on the HR!
  – Which would differ the most?
      the hazard for someone aged 50 vs. someone aged 60, each with 10 years of follow-up
      the hazard for two 55-year-olds, one with 5 years of follow-up and one with 15 years
  – We're not interested in the effect of the time scale variable on the outcome
15
Time scale issues (5)
Korn et al
  – Recommendation:
      age as the time scale
      stratify by year of birth (birth cohort); 5-year groups are commonly used
  – essentially the same model as proposed by Breslow et al
16
Time scale issues (6)
Korn et al also considered this (commonly used) model:
  – time-on-study as the time scale
  – adjust for age at entry in the model
  – results are the same as with age as the time scale if h₀(t) is exponential in age
  – otherwise it can give strong bias, especially with time-dependent covariates
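A short worked derivation shows why the two models coincide when the baseline hazard is exponential in age, and where they part ways otherwise. It assumes a single covariate x, writes a₀ for age at entry and t for time on study (so age a = a₀ + t), and uses a quadratic term δ purely as an illustrative departure from exponential form.

```latex
\begin{aligned}
h(a \mid x) &= \lambda\, e^{\gamma a}\, e^{\beta x}
  && \text{age scale, baseline exponential in age} \\
h(t \mid a_0, x) &= \lambda\, e^{\gamma (a_0 + t)}\, e^{\beta x}
  = \underbrace{\lambda\, e^{\gamma t}}_{h_0^{*}(t)}\; e^{\gamma a_0 + \beta x}
  && \text{time-on-study scale, } a = a_0 + t \\
\gamma (a_0 + t) + \delta (a_0 + t)^2
  &= \gamma a_0 + \delta a_0^2 + (\gamma t + \delta t^2) + 2\delta a_0 t
  && \text{log-baseline with an added quadratic term}
\end{aligned}
```

In the second line, age at entry enters as an ordinary covariate with coefficient γ, and h₀*(t) is absorbed by the unspecified Cox baseline. The time-on-study model with age-at-entry adjustment therefore reproduces the age-scale model exactly. In the third line, the cross term 2δa₀t is an entry-age-by-time interaction, which a proportional-hazards model with only a main effect for a₀ cannot capture.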
17
Time scale issues (7)
Uranium miners study: RR under 4 different time scales
Time scale                 RR    95% CI
Time since entry           4.7   3.5 – 6.3
Time since first mining    3.6   2.7 – 4.9
Calendar year              5.2   3.9 – 6.9
Age                        4.3   3.2 – 5.7
  – not huge differences
  – 'Age' is used as the time scale in the rest of the session
18
Analysis 1 (1)
Regular Cox model
  – age as the time scale: requires late-entry methods
  – time-varying exposure: compare '≥500 WLM' vs. '<500 WLM' cumulative exposure
SAS code uses programming statements within PHREG
Data file uses the layout shown earlier
19
* Model has ageexit as the failure time, ind as the failure indicator
  and agest as the entry time (late entry);
proc phreg data=u.uminers;
   model ageexit*ind(0)=cr500 / entry=agest risklimits;
   * Time-dependent programming steps - see PHREG documentation.
     Inside PHREG, ageexit takes the value of each event time in turn,
     so cradon is re-evaluated for every risk set;
   array rexp {18} rexp5 rexp10 rexp15 rexp20 rexp25 rexp30 rexp35 rexp40 rexp45
                   rexp50 rexp55 rexp60 rexp65 rexp70 rexp75 rexp80 rexp85 rexp90;
   * m = number of 5-year age bands completed by age (ageexit - 2), capped at 18;
   m = min((ageexit-2)/5, 18);
   i = 0;
   cradon = 0;
   do while (i < m);
      if (m > (i+1)) then do;
         * Band fully completed: add all of its WLM;
         cradon = cradon + rexp[i+1];
      end;
      else do;
         * Partially completed band: add the completed fraction of its WLM;
         cradon = cradon + (m-i)*(rexp[i+1]);
      end;
      i = i+1;
   end;
   * Determine whether cumulative radon is >= 500 WLM;
   cr500 = (cradon >= 500);
run;
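The cumulative-exposure loop in the PHREG programming statements can be mirrored outside SAS to check what it computes. The sketch below follows that loop line by line. The function names are hypothetical, and treating the `-2` as a 2-year exposure lag (`lag=2.0`) is an interpretation rather than something stated on the slide. Python indexes from 0, so `rexp[k]` here corresponds to SAS `rexp[k+1]`.

```python
def cumulative_radon(age, rexp, lag=2.0, width=5.0):
    """Cumulative WLM up to (age - lag), mirroring the PHREG programming steps.

    rexp[k] holds the WLM accrued in age band [k*width, (k+1)*width).
    The band containing age - lag is credited in proportion to the
    fraction of it completed.
    """
    m = min((age - lag) / width, len(rexp))  # bands completed, capped (18 in SAS)
    total = 0.0
    i = 0
    while i < m:
        if m > i + 1:
            total += rexp[i]            # band fully completed
        else:
            total += (m - i) * rexp[i]  # partially completed band
        i += 1
    return total


def cr500(age, rexp):
    """Time-dependent indicator: cumulative radon >= 500 WLM at this age."""
    return int(cumulative_radon(age, rexp) >= 500)
```

For example, with 100 WLM in the 15-20 band and 200 WLM in the 20-25 band, the value at age 24.5 counts all of the first band and half of the second.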
20
proc phreg data=u.uminers;
   model ageexit*ind(0)=cr500 / entry=agest risklimits;
   /***** CODE REMOVED FOR CLARITY *****/
   cr500 = (cradon >= 500);
run;
21
Analysis 1 (2)
Could also be done with counting-process style input
  – need to create one record for each subject for each year
  – code gets complex
  – I won't do this
Either way, the analysis needs to:
  – create risk set data
  – compute time-varying covariates
  – run the MLE algorithm
Time-consuming process
  – BUT, not a big issue with modern computers
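The "one record per subject per year" layout can be produced by a small splitting step. This is a minimal sketch with a hypothetical helper `split_by_year` and a hypothetical `covariate_at(age)` callback. As an approximation, the covariate is evaluated at the start of each interval and held constant within it, which is coarser than PHREG's evaluation at every event time. Records shaped as (start, stop, event, x) are what PHREG's counting-process form `model (start, stop)*event(0)` takes as input.

```python
import math


def split_by_year(agest, ageexit, ind, covariate_at):
    """Expand one subject into counting-process records (start, stop, event, x).

    Follow-up (agest, ageexit] is cut at whole years of age, and only the
    last interval carries the event indicator. covariate_at(age) is a
    hypothetical callback returning the time-varying covariate; it is
    evaluated at the start of each interval (an approximation).
    """
    records = []
    start = agest
    while start < ageexit:
        stop = min(math.floor(start) + 1.0, ageexit)
        event = ind if stop == ageexit else 0
        records.append((start, stop, event, covariate_at(start)))
        start = stop
    return records
```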
22
Analysis 2 (1)
A different approach: risk set grouped data
  – a series of records for each risk set
  – one line per subject in each risk set
Code is complex (not shown)
Data file looks like this:
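The idea behind the risk-set grouped file can be sketched in a few lines. This is not the code used in the session. The function name and dictionary keys are hypothetical, and giving tied cases separate risk sets is a simplifying assumption. Each case at age t gets a risk set containing everyone under observation at t (agest < t ≤ ageexit), which handles late entry.

```python
def build_risk_sets(subjects):
    """Create risk-set grouped records from a cohort with late entry.

    subjects: list of dicts with keys "id", "agest", "ageexit", "ind"
    (ind = 1 for a lung cancer death). For each case dying at age t, the
    risk set holds everyone under observation at t: agest < t <= ageexit.
    Returns rows (setno, age, id, is_case), one line per member per set.
    Tied cases get separate risk sets (a simplifying assumption).
    """
    cases = sorted((s for s in subjects if s["ind"] == 1), key=lambda s: s["ageexit"])
    rows = []
    for setno, case in enumerate(cases, start=1):
        t = case["ageexit"]
        for s in subjects:
            if s["agest"] < t <= s["ageexit"]:
                rows.append((setno, t, s["id"], int(s["id"] == case["id"])))
    return rows
```

In the real analysis, the time-varying covariate (such as cr500) would then be computed for each row at that row's risk-set age.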
23
24
25
26
Analysis 2 (2)
How do we use these data?
Consider any risk set (take risk set #1)
Can represent the data as a 2x2 table:
              Lung CA   No Lung CA   Total
>=500 WLM        1          4           5
<500 WLM         0          8           8
Total            1         12          13
27
Analysis 2 (3)
Treat each risk set as a stratum
  – matched on age (the time scale variable)
Combine the tables into an overall estimate
  – Mantel-Haenszel methods could be used
Better approach: conditional logistic regression
Can do this using either:
  – Proc Logistic
  – Proc Phreg
The likelihood functions are identical
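Combining the per-risk-set tables with the Mantel-Haenszel estimator is straightforward arithmetic. The sketch below is a minimal illustration and is not code from the session. The function name is hypothetical, and each table is assumed to be passed as (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across risk sets (strata).

    Each table is (a, b, c, d) = (exposed cases, exposed non-cases,
    unexposed cases, unexposed non-cases) for one risk set.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n   # contribution when the case is exposed
        den += b * c / n   # contribution when the case is unexposed
    if den == 0:
        raise ValueError("estimate undefined: no unexposed case with exposed non-cases")
    return num / den
```

With one case per risk set, a risk set contributes to the numerator only when its case is exposed, and to the denominator only when its case is unexposed.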
28
Analysis 2 (4)
Approach #1 ('bit of time' method)
  – use Phreg
  – need an 'entry' and 'exit' time for each subject in each risk set:
      exit time = age when the risk set occurred
      entry time = exit time − 0.001
  – 0.001 is arbitrary, but the math works (trust me): each record is at risk only in a tiny window ending at its own risk set's age
29
30
proc phreg data=cumexp;
   model _rstime*_cc(0)=cr500 / entry=_rsentry rl;
run;
31
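Analysis 2 (5)
Approach #2 (separate strata method)
  – use Phreg
  – uses the risk set ID number as the time variable!
      seems weird: the risk set ID is not actually a 'time', but the math works (trust me)
      each risk set is its own stratum, and every record in it shares the same 'time', so the whole stratum forms the risk set for its case
  – no need for a late entry variable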
32
33
proc phreg data=cumexp nosummary;
   model _setno*_cc(0)=cr500 / rl;
   strata _setno;
run;
Results identical to Method #1
34
Analysis 2 (6)
Approach #3 (binary data method)
  – use Logistic
  – need to condition on the risk set ID:
      not interested in the OR or RR for each risk set; they are just 'nuisance' parameters
      including them in the model can lead to strong bias
  – stratify by the risk set ID:
      stratification removes them from the model
      similar to the STRATA statement in Phreg
  – the model yields an OR:
      with this sampling approach, OR = RR (the math works, trust me)
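The three approaches all maximize the same conditional likelihood. A small hand-rolled version makes that concrete. The sketch below uses Newton-Raphson and assumes a single covariate and exactly one case per risk set, with no tie handling. The function names are hypothetical. For matched pairs, the conditional MLE is known to be the log of the ratio of the two kinds of discordant pairs, which gives a convenient check.

```python
import math


def _score_info(beta, risk_sets):
    """Score and information of the conditional log-likelihood at beta."""
    score = 0.0
    info = 0.0
    for case_x, xs in risk_sets:
        w = [math.exp(beta * x) for x in xs]
        s0 = sum(w)
        s1 = sum(wi * x for wi, x in zip(w, xs))
        s2 = sum(wi * x * x for wi, x in zip(w, xs))
        mean = s1 / s0
        score += case_x - mean          # observed minus expected covariate
        info += s2 / s0 - mean * mean   # weighted variance within the risk set
    return score, info


def fit_conditional_logit(risk_sets, max_iter=50, tol=1e-10):
    """Maximize the conditional (= Cox partial) likelihood for one covariate.

    risk_sets: list of (case_x, xs), where xs lists the covariate of every
    member of the risk set, the case included. Returns (beta, standard error).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_info(beta, risk_sets)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, risk_sets)
    return beta, 1.0 / math.sqrt(info)
```

Concordant risk sets (all members sharing the same covariate value) contribute nothing to the score or information, just as in a matched analysis.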
35
36
proc logistic data=cumexp descending;
   model _cc=cr500 / clodds=wald;
   strata _setno;
run;
Results identical to Method #1
37
Analysis 2 (7)
All three methods gave the same results
  – results are not quite the same as the initial Phreg analysis:
Method          HR (RR)   95% CI
Regular Phreg   4.263     3.175 – 5.722
Risk sets       4.267     3.179 – 5.728
38
Analysis 2 (8)
Why bother with the risk set method?
Some people claim it is faster
  – I didn't see this effect
  – if true, is this an issue with modern computers? Does 1 sec vs. 2 secs matter?
May handle random-effects models better (I am told)
Extends more easily to nested case-control and case-cohort methods
Run time (sec):
Regular   RS #1   RS #2   RS #3
0.39      1.65    0.47    1.71
39
[Figure: full risk set data, with 1 case per risk set and multiple non-cases]
40
Nested case-control (1)
Most studies will have hundreds or thousands of non-cases in each risk set
Suppose we needed to collect new exposure information on all subjects
  – e.g. genotyping the whole cohort gets very expensive
Do we need all of the non-cases in each risk set?
41
Nested case-control (2)
NO!!!
Select a random sample of non-cases from each risk set
  – usually a small number (1 through 20)
Collect new exposure information only on the selected subjects
Analyze using only these subjects
  – use any of the three risk set methods shown here
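The sampling step is simple once the risk-set grouped data exist. This is a minimal sketch and not the session's code. The function name and the input layout (a dict mapping set number to (id, is_case) rows) are assumptions. Sampling is independent across risk sets, so a subject may be chosen in several risk sets, and a future case may be sampled as a non-case at an earlier age.

```python
import random


def sample_nested_case_control(risk_sets, m, seed=None):
    """Keep each case plus up to m randomly chosen non-cases from its risk set.

    risk_sets: dict mapping setno -> list of (id, is_case) rows, e.g. the
    full risk-set grouped data. Each risk set is sampled independently.
    """
    rng = random.Random(seed)
    sampled = {}
    for setno, rows in risk_sets.items():
        cases = [r for r in rows if r[1] == 1]
        non_cases = [r for r in rows if r[1] == 0]
        chosen = rng.sample(non_cases, min(m, len(non_cases)))
        sampled[setno] = cases + chosen
    return sampled
```

The sampled sets can then be analyzed exactly like the full risk sets, for instance with the conditional likelihood that all three methods maximize.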
42
Nested case-control (3)
Gives an unbiased estimate of the true HR/RR
95% confidence intervals will be wider
Why does it work? Go back to the partial likelihood for Cox models
43
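The final likelihood contribution from each risk set $R(t_j)$, where subject $j$ is the case failing at time $t_j$ and $x$ is the covariate, is:
$$L_j(\beta) = \frac{\exp(\beta x_j)}{\sum_{k \in R(t_j)} \exp(\beta x_k)}$$
For the nested case-control design, with $\tilde{R}(t_j)$ the sampled risk set (the case plus its sampled non-cases), the likelihood contribution is given by:
$$\tilde{L}_j(\beta) = \frac{\exp(\beta x_j)}{\sum_{k \in \tilde{R}(t_j)} \exp(\beta x_k)}$$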
44
Nested case-control (4)
The PLs have the same form
  – the denominator sums over the available risk set
Can vary the method of non-case selection:
  – random sample
  – matched
  – counter-matched
Easily extends to the case-cohort design:
  – select a random sample from the initial cohort
  – retained as the risk set members throughout follow-up
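For the case-cohort design, only the way risk sets are assembled changes. A subcohort is drawn once at baseline and used as the comparison group at every case's failure age. The sketch below covers that data step only. The function name and dictionary keys are hypothetical. The usual model-based Cox variance is not valid for case-cohort data, and weighted or robust variance methods are normally needed, so the sketch stops before any fitting.

```python
import random


def case_cohort_risk_sets(subjects, subcohort_size, seed=None):
    """Assemble case-cohort risk sets (data step only).

    subjects: list of dicts with keys "id", "agest", "ageexit", "ind".
    A random subcohort is drawn once at baseline. At each case's age at
    death t, the comparison group is every subcohort member (other than the
    case) under observation at t: agest < t <= ageexit.
    Returns (subcohort ids, list of (t, case id, comparison ids)).
    """
    rng = random.Random(seed)
    subcohort = set(rng.sample([s["id"] for s in subjects], subcohort_size))
    cases = sorted((s for s in subjects if s["ind"] == 1), key=lambda s: s["ageexit"])
    risk_sets = []
    for case in cases:
        t = case["ageexit"]
        members = [s["id"] for s in subjects
                   if s["id"] in subcohort and s["id"] != case["id"]
                   and s["agest"] < t <= s["ageexit"]]
        risk_sets.append((t, case["id"], members))
    return subcohort, risk_sets
```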
45
Summary
Observational epidemiology analysis is more complex than for an RCT
Survival methods generalize:
  – discrete time methods
  – risk set approaches
Choice of time scale
More information on Langholz's web site
  – Risk set analysis course, Langholz, USC
46