1 EPI 5344: Survival Analysis in Epidemiology
Age as time scale
March 31, 2015
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa
01/2015

2 Objectives
Choice of time scale for observational epidemiology
Risk-set based analysis approaches

3 Example Study (1)
Are Uranium miners at risk for dying from lung cancer?
  – Uranium is radioactive and has a complex decay process

5 Example Study (1)
Are Uranium miners at risk for dying from lung cancer?
  – Uranium is radioactive and has a complex decay process
  – Miners work in enclosed areas with high levels of radioactive dust
  – Is there evidence that their health is affected?

6 Example Study (2)
Colorado Plateau study
  – Subject eligibility
      Worked underground in uranium mines in the four-state Colorado Plateau area
        – at least one month of work
      2,500 mines in target area
      Examined at least once by Public Health Service MDs between 1950 and 1960
  – Followed-up to Dec 31, 1982
      Vital Stats records
        – Death
        – Lung cancer death

7 Example Study (3)
Entry date:
  – latest of:
      one month of work and exam by MD
      January 1, 1952
Main outcome
  – death from lung cancer

8 Example Study (4)
Exposure:
  – 43,000 direct measurements of radon levels in mines between 1951 and 1968
  – Converted to annual exposure
  – Combined with worker’s ‘in mine’ work time
  – Generated Working-Level Months (WLM)
      WL = 20.8 µJ (microjoules) of alpha energy per cubic meter (m³) of air
      WLM = 1 WL exposure for 170 hours
  – Cumulated in five-year age intervals
      0-5; 5-10; 10-15; 15-20; 20-25; ….
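The WLM definition above is simple arithmetic, and a small sketch may make the cumulation step concrete. This is an illustration only: the function names, the dictionary input, and the banding rule are assumptions, and only the 170-hour working month and the five-year bands come from the slide.

```python
# Hypothetical helpers; only the 170-hour definition and the
# five-year age bands are taken from the slide.
HOURS_PER_WORKING_MONTH = 170.0


def working_level_months(mean_wl, hours_in_mine):
    """WLM accumulated over `hours_in_mine` hours at an average of `mean_wl` WL."""
    return mean_wl * hours_in_mine / HOURS_PER_WORKING_MONTH


def cumulate_by_age_band(annual_wlm_by_age, band_width=5):
    """Sum annual WLM (dict: integer age -> WLM) into bands keyed by the band's start age."""
    bands = {}
    for age, wlm in annual_wlm_by_age.items():
        start = (age // band_width) * band_width
        bands[start] = bands.get(start, 0.0) + wlm
    return bands
```

For instance, 850 hours underground at an average of 2 WL would count as 10 WLM under this definition.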

9 Data file layout
agest = age at entry to study
ageexit = age at exit from study
ind = died from lung cancer (=1)
rexp20 = WLMs from age 15-20

10 Example study (5)
Item                                       Number   Percent
Sample size                                3,347
Dying (any cause)                          1,258    38%
Lung cancer death                          258      7.7%
Lung cancer as proportion of all deaths             20.5%
How to apply survival analysis methods to this data?

11 Example study (6)
Based on course to now:
  – Time is the number of years (months, days, etc.) from initial entry into the study
  – Time ‘0’ is the entry date
  – End of follow-up
      Death (or death from lung cancer)
      Censored if
        – lost
        – died from ‘wrong cause’

12 Example study (7)
Based on course to now:
  – Exposure is time varying
      Cumulative
      Mean
      Peak
  – We will look at exposure to more than 500 WLM
  – Use PHREG to generate HR estimates

13 Choosing a time scale (1)
Time scale choices include:
  – Age
  – Calendar year
  – Time since entry into study
  – Time since initial employment

14 Choosing a time scale (2)
Cox model is:
Choice of time scale affects the shape of the baseline hazard
It also affects which people belong together in a risk set
Betas will have different values
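The formula for the Cox model is not present in this transcript. Assuming the slide showed the standard proportional hazards form, it would read as follows, where h_0(t) is the baseline hazard on the chosen time scale t, x is the covariate vector, and beta holds the log hazard ratios (the "betas" mentioned above):

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)
```

Changing the time scale changes what t means, so both h_0(t) and the composition of each risk set change, which is why the estimated betas can differ.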

15 Choosing a time scale (3)
Time on study
  – Hazard affected by
      cumulative exposure
      Length of time for disease to develop post-exposure
  – Usually a ‘gentle’ increase
  – Risk set groups people with same time post-entry
      Combines people of different ages
      Averages age-specific hazards

16 Choosing a time scale (4)
The actual year (calendar time)
  – Hazard affected by
      Temporal changes in exposure or risk
        – increased air pollution
        – climate change
        – legislation
  – Changes usually slow
  – Hazard is fairly constant, controlling for age, etc.
  – Risk set groups people in same years
  – Most commonly used for trend analyses with Poisson regression models

17 Choosing a time scale (5)
Age
  – Hazard affected by
      Cumulative exposure
      Aging
  – Often shows a very strong effect on hazard
      Prostate cancer hazard increases ‘super-exponentially’
  – Risk set groups people of the same age
      Ignores how long you have been ‘on study’

18 Choosing a time scale (6)
Choices are not independent
  – One year of follow-up increases all three time scale measures by one year
Cox models ‘work’ best if the baseline hazard captures a lot of hazard variation

19 Choosing a time scale (7)
For an RCT, ‘time on study’ is appropriate
  – follow-up time is usually short
  – Intervention has a strong effect, overwhelms age effect
For etiological studies
  – Risk increases with age
  – Risk relates to exposure, not to length of time since study entry
  – Length of time is a proxy for cumulative exposure
For etiological studies, several people have studied the choice of time scale

20 Choosing a time scale (8)
Breslow et al (1983)
  – Time-on-study as time scale fine for RCTs, etc.
  – Not optimal for cohort studies
      Most outcome death rates increase rapidly with age
        – Want to maximize control of the age effect
      Time-on-study often strongly correlated with cumulative exposure
        – Can produce negative bias if used as time scale

21 Choosing a time scale (9)
Breslow et al (1983)
  – Recommendation
      Use age as time scale
      Stratify by calendar time (5 year groups)
  – Risk sets consist of people at the same age in each calendar group
  – Ignores length of time since entry as factor
  – Subjects are left truncated (‘late entry’)
      Time ‘0’ is ‘birth date’

22 Choosing a time scale (10)
Korn et al (1997)
  – Cox models don’t specify a form for h(t)
  – Best choice of time scale is the one which has the biggest impact on the hazard function shape
      NOT the biggest impact on the HR!
  – Which would differ the most:
      hazard for people aged 50 vs. aged 60, both with 10 years of follow-up?
      hazard for two 55 year olds, one with 5 years of follow-up and one with 15 years?
  – Cannot study the effect of the time scale variable

23 Choosing a time scale (11)
Korn et al
  – Recommendation
      Use age as time scale
      Stratify by year of birth (birth cohort)
        – 5 year groups are commonly used
  – Essentially the same model as proposed by Breslow et al

24 Choosing a time scale (12)
Korn et al
  – Considered a second model (commonly used):
      ‘Time-on-study’ as time scale
      Adjust for age at entry in model
  – Results are the same as having age as time scale if:
      h0(t) is exponential in age
  – Can give strong bias, especially with time-dependent covariates.

25 Choosing a time scale (13)
Uranium miners study
4 different time scales
Differences are not big
HR/RR all around 3.6-5.2
‘Age’ is used as time scale in rest of session
Time scale                RR     95% CI
Time since entry          4.7    3.5 – 6.3
Time since first mining   3.6    2.7 – 4.9
Calendar year             5.2    3.9 – 6.9
Age                       4.3    3.2 – 5.7

26 Implications for Analysis
Age is the time variable
  – Left truncated
  – Requires ‘late entry’ methods
Compute exposure as a time varying variable
  – Cumulative
  – Mean
Analysis option #1:
  – Use regular Cox model
Other options
  – risk set modelling methods
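To make "left truncated" concrete: with age as the time scale, a subject belongs to the risk set at a case's exit age only if they had already entered the study and had not yet exited at that age. A minimal Python sketch of that rule, assuming records that carry the `agest`, `ageexit` and `ind` variables from the data file (the function name and the dict-based layout are hypothetical):

```python
def risk_sets_by_age(subjects):
    """Build one risk set per lung cancer death, using age as the time scale.

    `subjects` is a list of dicts with keys agest (entry age),
    ageexit (exit age) and ind (1 = lung cancer death).
    Returns (event_age, case_index, member_indices) tuples sorted by event age.
    """
    sets = []
    for i, case in enumerate(subjects):
        if case["ind"] != 1:
            continue
        t = case["ageexit"]
        # Late entry: only subjects already under observation at age t count.
        members = [
            j for j, s in enumerate(subjects)
            if s["agest"] < t <= s["ageexit"]
        ]
        sets.append((t, i, members))
    sets.sort(key=lambda r: r[0])
    return sets
```

Note that a subject who entered the study at age 55 is not in the risk set for a death at age 50, however long they were followed afterwards.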

27 Regular Cox models (1)
Uses the Phreg approach
Time varying exposure
  – e.g. use ‘500 WLM’ as time varying cut-point
SAS code uses programming statements within Phreg
Data file uses layout shown earlier

28
* model has ageexit as failure time, ind as failure indicator and agest as entry time;
proc phreg data=u.uminers;
  model ageexit*ind(0)=cr500 / entry=agest risklimits;
  * Time-dependent programming steps - see PHREG documentation;
  array rexp {18} rexp5 rexp10 rexp15 rexp20 rexp25 rexp30 rexp35 rexp40 rexp45
                  rexp50 rexp55 rexp60 rexp65 rexp70 rexp75 rexp80 rexp85 rexp90;
  * m = number of five-year exposure intervals to count (may be fractional), capped at 18;
  m = min((ageexit-2)/5,18);
  i = 0;
  cradon = 0;
  do while (i < m);
    * add a whole interval if it lies entirely below m, otherwise only the fraction reached;
    if (m > (i+1)) then do;
      cradon = cradon + rexp[i+1];
    end;
    else do;
      cradon = cradon + (m-i)*(rexp[i+1]);
    end;
    i = i+1;
  end;
  * Determine whether cumulative radon is >= 500 WLM;
  cr500 = (cradon >= 500);
run;
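The cumulation loop in the Phreg programming statements can be checked by hand outside SAS. The following Python sketch mirrors it under stated assumptions: the function and parameter names are hypothetical, the list `rexp` is 0-indexed where the SAS array is 1-indexed, and the offset of 2 and the band width of 5 are copied from `m = min((ageexit-2)/5,18)` without any claim about why the offset is there.

```python
def cumulative_radon(age, rexp, offset=2.0, band_width=5.0):
    """Cumulative WLM counted at `age`, mirroring the SAS do-while loop.

    `rexp` holds WLM per five-year age band (rexp[0] is ages 0-5, and so on).
    """
    m = min((age - offset) / band_width, len(rexp))
    total = 0.0
    i = 0
    while i < m:
        if m > i + 1:
            total += rexp[i]            # whole band counted
        else:
            total += (m - i) * rexp[i]  # only the fraction of the band reached
        i += 1
    return total


def cr500(age, rexp):
    """Time-varying indicator: cumulative radon of at least 500 WLM at `age`."""
    return cumulative_radon(age, rexp) >= 500
```

With 100 WLM in every band, this gives 200 WLM at age 12 and 250 WLM at age 14.5, so `cr500` would switch on only once five bands' worth of exposure has been counted.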

29
proc phreg data=u.uminers;
  model ageexit*ind(0)=cr500 / entry=agest risklimits;
  /***** CODE REMOVED FOR CLARITY *****/
  cr500 = (cradon >= 500);
run;

30 Regular Cox models (2)
Could do with counting process style input
  – Need to create one record for each subject for each year.
  – Code gets complex
  – I won’t show this
Either way, Phreg needs to:
  – create risk set data for each risk set
  – compute time varying covariates
  – do the MLE algorithm
Time consuming process
  – BUT, not a big issue with modern computers.

31 Risk Set Methods (1)
A Different Approach
Use the data step to create new data set with the risk set data
Risk set grouped data
  – series of records for each risk set
  – one line for each subject in the risk set
Code is complex (not shown)

34 Risk Set Methods (2)
How can we use this data?
Consider any risk set (take risk set #1)
Can represent data as 2x2 table
              Lung CA   no Lung CA   Total
>=500 WLM     1         4            5
< 500 WLM     0         8            8
Total         1         12           13

35 Risk Set Methods (3)
Treat each risk set as a stratum
  – matched on age (the time scale variable)
Combine tables into an overall estimate
  – Mantel-Haenszel methods could be used
Better approach
  – Conditional logistic regression.
Can do this using either:
  – Proc Logistic
  – Proc Phreg
Likelihood functions are identical
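As a rough illustration of "combine tables into an overall estimate", here is a minimal Python sketch of the Mantel-Haenszel odds ratio across risk-set strata. The function name and the tuple layout are assumptions made for this example, and any counts passed in are the caller's own; the slides go on to prefer conditional logistic regression.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio over 2x2 tables, one per risk set.

    Each table is (a, b, c, d):
      a = exposed cases,   b = exposed non-cases,
      c = unexposed cases, d = unexposed non-cases.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("estimate undefined: no stratum has an unexposed case "
                         "together with an exposed non-case")
    return num / den
```

A single risk set with an exposed case and no unexposed case gives a zero denominator, which is one reason the estimate only makes sense after pooling many risk sets.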

36 Risk Set Methods (4)
Three approaches can be used to do these analyses:
  – the ‘bit of time’ method (phreg)
  – the ‘separate strata’ method (phreg)
  – the ‘binary data’ method (logistic)

37 Risk Set Methods (5)
Approach #1 (‘bit of time’ method)
  – Use Phreg
  – Treat the risk set file as a counting process structure
  – Need to add an ‘entry’ and ‘exit’ time for each subject in each risk set

38 Risk Set Methods (6)
Approach #1 (‘bit of time’ method)
  – Need to add an ‘entry’ and ‘exit’ time for each subject in each risk set
      exit time
        – age when the risk set occurred
      entry time
        – exit time – 0.001
        – 0.001 is arbitrary but the math works (trust me)
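The entry and exit times described above amount to one extra column per risk-set record. A small Python sketch of that step, assuming each record is a dict that already holds the risk-set age in `_rstime` and that the entry time is stored in `_rsentry`, the two names used in the Phreg call for this method (the function name is hypothetical):

```python
def add_bit_of_time(records, width=0.001):
    """Return copies of risk-set records with an entry time just before the exit time.

    `width` is the arbitrary 'bit of time'; each subject is treated as under
    observation only on the short interval (_rstime - width, _rstime].
    """
    out = []
    for rec in records:
        row = dict(rec)  # copy, so the input records are left untouched
        row["_rsentry"] = row["_rstime"] - width
        out.append(row)
    return out
```

One practical caution with this sketch: `width` should be smaller than the gap between any two distinct risk-set ages, otherwise the intervals of neighbouring risk sets would overlap.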

40 Risk Set Methods (7)
Approach #1 (‘bit of time’ method)
  – Ignores all of the time between risk sets
  – Seems weird but the math works (trust me)

41
proc phreg data=cumexp;
  model _rstime*_cc(0)=cr500 / entry=_rsentry rl;
run;

42 Risk Set Methods (8)
Approach #2 (separate strata method)
  – Use Phreg
  – Number the risk sets from 1 to n
  – Use the risk set ID number as the time variable!
      Seems weird
      Risk set ID is not actually a ‘time’
      But the math works (trust me)
  – No need for a late entry variable

44
proc phreg data=cumexp nosummary;
  model _setno*_cc(0)=cr500 / rl;
  strata _setno;
run;
Identical to Method #1

45 Risk Set Methods (9)
Approach #3 (binary data method)
  – Uses Proc Logistic
  – Treats each risk set as a stratum
      Remember my 2x2 table from an earlier slide
  – Uses conditional logistic regression
      Condition on the risk set ID
      Not interested in OR or RR for each risk set
        – just ‘nuisance’ parameters
      Including strata parameters in the model can lead to strong bias

46 Risk Set Methods (10)
Approach #3 (binary data method)
  – Stratify by the risk set ID
      similar to STRATA statement in Phreg
  – Model yields an OR.
      with this sampling approach, OR = RR
      the math works (trust me)

48
proc logistic data=cumexp descending;
  model _cc=cr500 / clodds=wald;
  strata _setno;
run;
Identical to Method #1
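To show what conditioning on the risk set means numerically, here is a minimal Python sketch that maximizes the conditional likelihood for a single covariate by Newton's method, with one case per risk set. It is an illustration under stated assumptions, not what the SAS procedures do internally: the function name, the input layout, and the stopping rule are all hypothetical.

```python
import math


def fit_conditional_logistic(risk_sets, n_iter=25, tol=1e-10):
    """Estimate beta for one covariate from risk sets with one case each.

    Each risk set is (x_case, xs), where xs lists the covariate value of
    every member of the set, the case included.
    """
    beta = 0.0
    for _ in range(n_iter):
        score = 0.0
        info = 0.0
        for x_case, xs in risk_sets:
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, xs)) / total
            mean_sq = sum(w * x * x for w, x in zip(weights, xs)) / total
            score += x_case - mean          # derivative of the log-likelihood
            info += mean_sq - mean * mean   # minus the second derivative
        if info == 0:
            raise ValueError("no risk set has variation in the covariate")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

For risk sets that each contain one exposed and one unexposed member, the estimate reduces to the log of the ratio of sets with an exposed case to sets with an unexposed case, which gives a quick sanity check on the code.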

49 Risk Set Methods (11)
All three methods gave the same results.
  – Results are not quite the same as initial Phreg analysis (with age as the time scale):
Method          HR (RR)   95% CI
Regular Phreg   4.263     3.175 – 5.722
Risk sets       4.267     3.179 – 5.728

50 Risk Set Methods (12)
Why bother with risk set method?
  – Some people claim it is faster
      I didn’t see this effect
      If true, is this an issue with modern computers?
        – does 1 sec vs. 2 secs matter?
Regular   RS #1   RS #2   RS #3
0.39      1.65    0.47    1.71

51 Risk Set Methods (13)
Why bother with risk set method?
  – Can handle random effects code better (I am told)
  – More easily extends to nested case-control and case-cohort methods.

52 [Figure: full risk data, with 1 ‘case’ per risk set and multiple non-cases]

53 Nested case-control (1)
Most studies will have hundreds or thousands of non-cases in each risk set.
Suppose we needed to collect new exposure information on all subjects
  – genotyping
Gets very expensive to use whole cohort.

54 Nested case-control (2)
Do we need all of the non-cases in each risk set?
NO!!!

55 Nested case-control (3)
Select a random sample of non-cases from each risk set
  – Usually a small number
      4 is common
      up to 20 in pharmaco-epidemiology studies
A person can be used more than once
  – Multiple times as a control
  – As control and case
Collect new exposure information only on selected subjects
Analyze using only these subjects
Use any of the three risk set methods shown here
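The sampling step can be sketched in a few lines. The Python below is a hedged illustration, assuming each risk set is given as a case ID plus the IDs of all its members; the function name, the `seed` argument, and the default of 4 controls are choices made for this example only.

```python
import random


def sample_controls(risk_sets, m=4, seed=None):
    """Draw up to `m` non-cases at random from each risk set.

    `risk_sets` is a list of (case_id, member_ids), where member_ids
    includes the case. Returns a list of (case_id, sampled_control_ids).
    """
    rng = random.Random(seed)
    sampled = []
    for case_id, members in risk_sets:
        non_cases = [s for s in members if s != case_id]
        k = min(m, len(non_cases))  # small risk sets may have fewer than m
        sampled.append((case_id, rng.sample(non_cases, k)))
    return sampled
```

Because each risk set is sampled independently, the same person can be drawn as a control in several risk sets and can still appear later as a case, which matches the point above that a person can be used more than once.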

56 Nested case-control (4)
Will give an unbiased estimate of the true HR/RR
95% confidence intervals will be wider
Why does it work?
Go back to the Partial Likelihood for Cox models

57
The final likelihood contribution from each risk set is:
For the nested case-control, the likelihood contribution is given by:
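Neither formula is present in this transcript. Assuming the slide showed the standard partial likelihood terms, with x_(i) the covariates of the case in risk set i, R_i the full risk set, and R̃_i the sampled risk set (the case plus its randomly sampled non-cases), the two contributions would read:

```latex
% Full cohort: the denominator sums over everyone at risk
L_i(\beta) = \frac{\exp\!\left(\beta^{\top} x_{(i)}\right)}
                  {\sum_{j \in R_i} \exp\!\left(\beta^{\top} x_j\right)}

% Nested case-control: same form, denominator restricted to the sampled set
\tilde{L}_i(\beta) = \frac{\exp\!\left(\beta^{\top} x_{(i)}\right)}
                          {\sum_{j \in \tilde{R}_i} \exp\!\left(\beta^{\top} x_j\right)},
\qquad \tilde{R}_i \subseteq R_i
```

The only difference is the set summed over in the denominator, which is why the same fitting code works for both designs when controls are drawn by simple random sampling.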

58 Nested case-control (5)
Likelihoods are the same form
  – denominator sums over the available risk set
Can vary method of non-case selection
  – random sample
  – matched
  – counter-matched
Easily extends to case-cohort design
  – Select a random sample from initial cohort
  – Entire sample is retained as the risk set members throughout follow-up
      treats case status as a time varying covariate

59 Summary
Observational epidemiology analysis is more complex than for RCTs
Survival methods generalize
  – discrete time methods
  – risk set approaches
Choice of time scale
More information on Langholz’s web site
  – Risk set analysis course, Langholz, USC
