03/20131 EPI 5344: Survival Analysis in Epidemiology Risk Set Analysis Approaches April 16, 2013 Dr. N. Birkett, Department of Epidemiology & Community.


1 EPI 5344: Survival Analysis in Epidemiology
Risk Set Analysis Approaches
April 16, 2013
Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa

2 Objectives
Choice of time scale for observational epidemiology
Risk-set based analysis approaches

3 Example Study (1)
Are Uranium miners at risk for dying from lung cancer?
– Uranium is radioactive and has a complex decay process


5 Example Study (1)
Are Uranium miners at risk for dying from lung cancer?
– Uranium is radioactive and has a complex decay process
– Miners work in enclosed areas with high levels of radioactive dust
– Is there evidence that their health is affected?

6 Example Study (2)
Colorado Plateau study
– Eligibility
    Worked underground in uranium mines in the four-state Colorado Plateau area
    – at least one month of work
    – 2,500 mines in target area
    Examined at least once by Public Health Service MDs between 1950 and 1960
– Followed-up to Dec 31, 1982
    Vital Stats records
    – Death
    – Lung cancer death

7 Example Study (3)
Entry date:
– latest of:
    one month of work and exam by MD
    January 1, 1952
Main outcome
– death from lung cancer

8 Example Study (4)
Exposure:
– 43,000 direct measurements of radon levels in mines between 1951 and 1968
– Converted to annual exposure
– Combined with worker’s ‘in mine’ work time
– Generated Working-Level Months (WLM)
    WL = 20.8 µJ (microjoules) alpha energy per cubic meter (m³) air
    WLM = 1 WL exposure for 170 hours
– Cumulated in five year age intervals
    0-5; 5-10; 10-15; 15-20; 20-25; …
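
The WLM definition above reduces to simple arithmetic. A minimal Python sketch, assuming a constant concentration (in WL) over the hours worked; the function name and the example figures are illustrative and are not study data:

```python
HOURS_PER_WORKING_MONTH = 170  # from the slide: 1 WLM = 1 WL for 170 hours


def working_level_months(concentration_wl: float, hours_in_mine: float) -> float:
    """Exposure in WLM for a constant concentration (WL) over the hours worked."""
    return concentration_wl * hours_in_mine / HOURS_PER_WORKING_MONTH


# Hypothetical miner: 2,000 hours underground in one year at an average of 3 WL.
annual_wlm = working_level_months(3.0, 2000)
print(round(annual_wlm, 1))
```

Annual values of this kind would then be summed within each five-year age interval to give variables such as rexp20.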

9 agest = age at entry to study
ageexit = age at exit from study
ind = died from lung cancer (=1)
rexp20 = WLMs from age 15-20

10 Example study (5)
Item                                       Number   Percent
Sample size                                3,347
Dying (any cause)                          1,258    38%
Lung cancer death                          258      7.7%
Lung cancer as proportion of all deaths             20.5%
How to apply survival analysis methods to this data?

11 Time scale issues (1)
Time scale
– Age
– Calendar year
– Time since entry into study
– Time since initial employment
Choice affects the shape of the baseline hazard function
Risk increases with age
Risk relates to radiation exposure, not to length of time since study entry

12 Time scale issues (2)
Breslow et al (1983)
– time-on-study as time scale
    fine for RCTs, etc.
– Not optimal for cohort studies
    most outcome death rates increase rapidly with age
    – Want to maximize control of the age effect
    time-on-study often strongly correlated with cumulative exposure
    – Can produce negative bias if used as time scale

13 Time scale issues (3)
Breslow et al (1983)
– Recommendation
    Age as time scale
    Stratify by calendar time (5 year groups)
    – Risk sets consist of people at the same age in each calendar group
    – Ignores length of time since entry as factor
    – Has ‘late entry’
        Effective time ‘0’ is ‘birth date’

14 Time scale issues (4)
Korn et al (1997)
– Cox models don’t need a form for h(t)
– Best choice of time scale is the one which has the biggest impact on the hazard function shape
    NOT the biggest impact on the HR!
– Which would differ the most:
    hazard for someone aged 50 vs. someone aged 60, both with 10 years of follow-up?
    hazard for two 55 year olds, one with 5 years of follow-up and one with 15 years?
We’re not interested in the effect of the time scale variable on the outcome

15 Time scale issues (5)
Korn et al
– Recommendation
    age as time scale
    stratify by year of birth (birth cohort)
    – 5 year groups are commonly used
– Essentially the same model as proposed by Breslow et al

16 Time scale issues (6)
Korn et al
– Also considered this model (commonly used):
    time-on-study as time scale
    Adjust for age at entry in model
– Results are the same as having age as time scale if:
    h_0(t) is exponential in age
– can give strong bias, especially with time-dependent covariates.
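
The equivalence condition can be made explicit with a short derivation. The notation is an assumption for illustration: $a_0$ is age at entry, $t$ is time on study (so current age is $a = a_0 + t$), $x$ is the exposure covariate, and the baseline hazard in age is taken to be $h_0(a) = c\,e^{\gamma a}$.

```latex
\begin{align*}
h(a \mid x) &= h_0(a)\, e^{\beta x} = c\, e^{\gamma a}\, e^{\beta x}
  && \text{(age as time scale)} \\
h(t \mid x, a_0) &= c\, e^{\gamma (a_0 + t)}\, e^{\beta x}
  && \text{(substituting } a = a_0 + t\text{)} \\
&= \underbrace{c\, e^{\gamma t}}_{\text{baseline in } t}\,
   e^{\gamma a_0 + \beta x}
  && \text{(time-on-study, adjusted for } a_0\text{)}
\end{align*}
```

Under this assumption both formulations carry the same $\beta$. For other baseline shapes the hazard does not factor this way, and a linear adjustment for age at entry need not recover the same estimate.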

17 Time scale issues (7)
Uranium miners study
4 different time scales
Time scale                 RR     95% CI
Time since entry           4.7    3.5 – 6.3
Time since first mining    3.6    2.7 – 4.9
Calendar year              5.2    3.9 – 6.9
Age                        4.3    3.2 – 5.7
not huge differences
‘Age’ is used as time scale in rest of session

18 Analysis 1 (1)
Regular Cox model
Age as time scale
– requires late entry methods
Time varying exposure
– compare ‘<500 WLM’ to ‘>=500 WLM’
SAS code uses programming statements within Phreg
Data file uses layout shown earlier

19
* model has ageexit as failure time, ind as failure indicator and agest as entry time;
proc phreg data=u.uminers;
  model ageexit*ind(0)=cr500 / entry=agest risklimits;
  * Time-dependent programming steps - see PHREG documentation;
  array rexp {18} rexp5 rexp10 rexp15 rexp20 rexp25 rexp30 rexp35 rexp40 rexp45
                  rexp50 rexp55 rexp60 rexp65 rexp70 rexp75 rexp80 rexp85 rexp90;
  * m = number of five-year exposure intervals to count, capped at 18;
  m = min((ageexit-2)/5,18);
  i = 0;
  cradon = 0;
  do while (i < m);
    if (m > (i+1)) then do;
      cradon = cradon + rexp[i+1];
    end;
    else do;
      * final interval is only partly counted, so pro-rate it;
      cradon = cradon + (m-i)*(rexp[i+1]);
    end;
    i = i+1;
  end;
  * Determine whether cumulative radon is >= 500 WLM;
  cr500 = (cradon >= 500);
run;
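
For readers without SAS, the cumulative-exposure loop can be mirrored in Python. This is a sketch that follows the programming statements above step by step; the function name and the example exposure history are hypothetical, and reading the `- 2` term as a two-year lag is an inference from the code, not something the slides state.

```python
def cumulative_radon(age: float, rexp: list[float]) -> float:
    """Cumulative WLM at `age`, mirroring the PHREG programming statements.

    rexp[k] holds the WLM accrued in the five-year age interval
    5k to 5(k+1), so rexp[3] corresponds to rexp20 (ages 15-20).
    """
    m = min((age - 2) / 5, len(rexp))  # intervals to count, possibly fractional
    cradon = 0.0
    i = 0
    while i < m:
        if m > i + 1:
            cradon += rexp[i]            # interval counted in full
        else:
            cradon += (m - i) * rexp[i]  # final interval, pro-rated
        i += 1
    return cradon


# Hypothetical exposure history: 400 WLM at ages 20-25, 300 WLM at ages 25-30.
rexp = [0.0] * 18
rexp[4] = 400.0
rexp[5] = 300.0

for age in (27, 29.5):
    cradon = cumulative_radon(age, rexp)
    print(age, cradon, int(cradon >= 500))  # last column plays the role of cr500
```

Because the covariate is recomputed at every risk-set age, the same miner can count as unexposed in an early risk set and exposed in a later one.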

20
proc phreg data=u.uminers;
  model ageexit*ind(0)=cr500 / entry=agest risklimits;
  /***** CODE REMOVED FOR CLARITY *****/
  cr500 = (cradon >= 500);
run;

21 Analysis 1 (2)
Could do with counting process style input
– Need to create one record for each subject for each year.
– Code gets complex
– I won’t do this
Either way, analysis needs to:
– create risk set data
– compute time varying covariates
– do the MLE algorithm
Time consuming process
– BUT, not a big issue with modern computers.
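
To make the counting-process layout concrete, here is a minimal Python sketch of the one-record-per-subject-per-year expansion. The names `agest`, `ageexit`, and `ind` come from the data file; the function and the `(start, stop, event)` layout are assumptions for illustration.

```python
import math


def yearly_records(agest: float, ageexit: float, ind: int):
    """Split one subject's follow-up into (start, stop, event) age intervals.

    Intervals break at whole-year ages; only the last interval can carry
    the event indicator.
    """
    records = []
    start = agest
    while start < ageexit:
        stop = min(math.floor(start) + 1, ageexit)
        event = ind if stop == ageexit else 0
        records.append((start, stop, event))
        start = stop
    return records


# Hypothetical miner: entered at age 30.5, died of lung cancer at age 33.2.
rows = yearly_records(30.5, 33.2, 1)
for row in rows:
    print(row)
```

Each record would also carry the covariate value that applies during its interval, which is where the time-varying exposure enters.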

22 Analysis 2 (1)
A Different Approach
Risk set grouped data
– series of records for each risk set
– one line per subject in the risk sets
Code is complex (not shown)
Data file looks like this
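
The code that builds the risk-set file is not shown in the session. A minimal Python sketch of one way to do it, assuming age is the time scale with late entry, a subject is at risk at age t when agest < t <= ageexit, and no two cases fail at the same age. `_setno`, `_cc`, and `_rstime` follow the variable names used in the PHREG calls of this session; the `id` field and the toy cohort are hypothetical.

```python
def risk_set_rows(cohort):
    """Expand a cohort into one row per subject per risk set.

    `cohort` is a list of dicts with keys id, agest, ageexit, ind.
    One risk set is formed at each case's failure age.
    """
    rows = []
    cases = sorted((s for s in cohort if s["ind"] == 1),
                   key=lambda s: s["ageexit"])
    for setno, case in enumerate(cases, start=1):
        t = case["ageexit"]
        for s in cohort:
            if s["agest"] < t <= s["ageexit"]:  # at risk at age t
                rows.append({
                    "_setno": setno,
                    "id": s["id"],
                    "_rstime": t,
                    "_cc": 1 if s["id"] == case["id"] else 0,
                })
    return rows


# Hypothetical four-person cohort.
cohort = [
    {"id": 1, "agest": 30, "ageexit": 52, "ind": 1},
    {"id": 2, "agest": 40, "ageexit": 60, "ind": 0},
    {"id": 3, "agest": 55, "ageexit": 58, "ind": 1},
    {"id": 4, "agest": 25, "ageexit": 50, "ind": 0},
]
rows = risk_set_rows(cohort)
for r in rows:
    print(r)
```

Subject 2 appears in both risk sets as a non-case, which is why the grouped file has many more lines than the cohort has subjects.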


26 Analysis 2 (2)
How do we use this data?
Consider any risk set (take risk set #1)
Can represent data as 2x2 table
              Lung CA   no Lung CA   Total
>=500 WLM     1         4            5
<500 WLM      0         8            8
Total         1         12           13

27 Analysis 2 (3)
Treat each risk set as a stratum
– matched on age (the time scale variable)
Combine tables into an overall estimate
– Mantel-Haenszel methods could be used
Better approach
– Conditional logistic regression.
Can do this using either:
– Proc Logistic
– Proc Phreg
Likelihood functions are identical
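
As a point of comparison, the Mantel-Haenszel summary odds ratio over risk-set tables can be sketched in a few lines of Python. Each table is written (a, b, c, d) = (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases). The first table is risk set #1 from the session; the other two are hypothetical, added only so that the estimate is defined.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio over stratified 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    if den == 0:
        raise ValueError("MH odds ratio is undefined: denominator is zero")
    return num / den


tables = [
    (1, 4, 0, 8),  # risk set #1 (from the session)
    (0, 3, 1, 6),  # hypothetical
    (1, 2, 0, 9),  # hypothetical
]
or_mh = mantel_haenszel_or(tables)
print(round(or_mh, 3))
```

A single risk set with one exposed case gives a zero denominator, which is one practical reason the estimate has to pool information across many risk sets.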

28 Analysis 2 (4)
Approach #1 (‘bit of time’ method)
– Use Phreg
– Need an ‘entry’ and ‘exit’ time for each subject in each risk set
– exit time
    age when the risk set occurred
– entry time
    exit time - 0.001
    0.001 is arbitrary but the math works (trust me)


30
proc phreg data=cumexp;
  model _rstime*_cc(0)=cr500 / entry=_rsentry rl;
run;

31 Analysis 2 (5)
Approach #2 (separate strata method)
– Use Phreg
– Uses the risk set ID number as the time variable!
    Seems weird
    Risk set ID is not actually a ‘time’ but the math works (trust me)
– No need for a late entry variable


33
proc phreg data=cumexp nosummary;
  model _setno*_cc(0)=cr500 / rl;
  strata _setno;
run;
Identical to Method #1

34 Analysis 2 (6)
Approach #3 (binary data method)
– Use Logistic
– Need to condition on the risk set ID
    not interested in OR or RR for each risk set
    – just ‘nuisance’ parameters
    – Including them in model can lead to strong bias
– stratify by the risk set ID
    stratification removes them from model
    similar to STRATA statement in Phreg
– Model yields an OR.
    with this sampling approach, OR = RR
    the math works (trust me)
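
To see what conditioning on the risk set does, the conditional likelihood can be written out directly. A minimal Python sketch, assuming one case per risk set and a single binary exposure. Each set is summarised as (case exposed?, number of exposed members, number of unexposed members), where the counts include the case. The three sets are hypothetical, and the grid search stands in for the iterative optimiser a statistical package would use.

```python
import math


def cond_loglik(beta, sets):
    """Conditional log-likelihood; no per-set intercepts appear in it."""
    total = 0.0
    for case_exposed, n_exp, n_unexp in sets:
        total += beta * case_exposed - math.log(n_exp * math.exp(beta) + n_unexp)
    return total


# Hypothetical risk sets: (case exposed?, exposed members, unexposed members).
sets = [(1, 5, 8), (0, 3, 7), (1, 3, 9)]

# Crude maximisation over a grid of beta values from -3 to 3 in steps of 0.001.
betas = [b / 1000 for b in range(-3000, 3001)]
beta_hat = max(betas, key=lambda b: cond_loglik(b, sets))
print(beta_hat, math.exp(beta_hat))
```

The risk-set effects cancel within each set's ratio, so they never have to be estimated; `exp(beta_hat)` is the odds ratio that the session equates with the RR under this sampling scheme.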


36
proc logistic data=cumexp descending;
  model _cc=cr500 / clodds=wald;
  strata _setno;
run;
Identical to Method #1

37 Analysis 2 (7)
All three methods gave the same results.
– Results are not quite the same as initial Phreg analysis:
Method           HR (RR)   95% CI
Regular Phreg    4.263     3.175 – 5.722
Risk sets        4.267     3.179 – 5.728

38 Analysis 2 (8)
Why bother with risk set method?
Some people claim it is faster
– I didn’t see this effect
– If true, is this an issue with modern computers?
– does 1 sec vs. 2 secs matter?
Can handle random effects code better (I am told)
More easily extends to nested case-control and case-cohort methods.
Regular   RS #1   RS #2   RS #3
0.39      1.65    0.47    1.71

39 Full risk data
1 ‘case’ per risk set
Multiple non-cases

40 Nested case-control (1)
Most studies will have hundreds or thousands of non-cases in each risk set.
Suppose we needed to collect new exposure information on all subjects
– genotyping the whole cohort gets very expensive.
Do we need all of the non-cases in each risk set?

41 Nested case-control (2)
NO!!!
Select a random sample of non-cases from each risk set
– usually a small number
    1 through 20
Collect new exposure information only on the selected subjects
Analyze using only these subjects
Use any of the three risk set methods shown here
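
The sampling step can be sketched in Python. This assumes the risk-set rows are dicts with `_setno`, `_cc`, and an `id` field (the latter hypothetical), and that non-cases are drawn by simple random sampling without replacement within each risk set; `m` and the seed are illustrative choices.

```python
import random
from collections import defaultdict


def sample_nested_cc(rows, m, seed=0):
    """Keep every case plus up to m randomly chosen non-cases per risk set."""
    rng = random.Random(seed)
    by_set = defaultdict(list)
    for r in rows:
        by_set[r["_setno"]].append(r)
    sampled = []
    for setno in sorted(by_set):
        members = by_set[setno]
        cases = [r for r in members if r["_cc"] == 1]
        non_cases = [r for r in members if r["_cc"] == 0]
        controls = rng.sample(non_cases, min(m, len(non_cases)))
        sampled.extend(cases + controls)
    return sampled


# Hypothetical risk-set file: 2 risk sets, each with 1 case and 10 non-cases.
rows = [{"_setno": s, "id": 100 * s + k, "_cc": 1 if k == 0 else 0}
        for s in (1, 2) for k in range(11)]
ncc = sample_nested_cc(rows, m=3)
print(len(rows), "rows reduced to", len(ncc))
```

Sampling is done independently within each risk set, so a subject may be drawn as a non-case for more than one case.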

42 Nested case-control (3)
Will give an unbiased estimate of the true HR/RR
95% confidence intervals will be wider
Why does it work?
Go back to the Partial Likelihood for Cox models

43 The final likelihood contribution from each risk set is:
For the nested case-control, the likelihood contribution is given by:
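
In standard notation, the two contributions can be written as follows. The symbols are assumptions for illustration, not a copy of the slide: $x_i$ is the covariate vector of the case in risk set $R_i$, $\beta$ is the vector of log hazard ratios, and $\tilde{R}_i \subseteq R_i$ is the case together with its sampled non-cases.

```latex
\[
L_i^{\text{full}}(\beta)
  = \frac{\exp(\beta^\top x_i)}{\sum_{j \in R_i} \exp(\beta^\top x_j)},
\qquad
L_i^{\text{NCC}}(\beta)
  = \frac{\exp(\beta^\top x_i)}{\sum_{j \in \tilde{R}_i} \exp(\beta^\top x_j)}
\]
```

The numerators are identical; only the set summed over in the denominator changes.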

44 Nested case-control (4)
PL’s are the same form
– denominator sums over the available risk set
Can vary method of non-case selection
– random sample
– matched
– counter-matched
Easily extends to case-cohort design
– Select a random sample from initial cohort
– retained as the risk set members throughout follow-up

45 Summary
Observational epidemiology analysis is more complex than an RCT
Survival methods generalize
– discrete time methods
– risk set approaches
Choice of time scale
More information on Langholz’s web site
– Risk set analysis course, Langholz, USC
