EPI 5344: Survival Analysis in Epidemiology
Age as time scale
March 31, 2015
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa
01/2015

Objectives
Choice of time scale for observational epidemiology
Risk-set based analysis approaches

Example Study (1)
Are uranium miners at risk of dying from lung cancer?
  – Uranium is radioactive and has a complex decay process
  – Miners work in enclosed areas with high levels of radioactive dust
  – Is there evidence that their health is affected?

Example Study (2)
Colorado Plateau study
  – Subject eligibility
    Worked underground in uranium mines in the four-state Colorado Plateau area
      – at least one month of work
    2,500 mines in target area
    Examined at least once by Public Health Service MDs between 1950 and 1960
  – Followed up to Dec 31, 1982
    Vital Stats records
      – Death
      – Lung cancer death

Example Study (3)
Entry date:
  – latest of:
    one month of work and exam by MD
    January 1, 1952
Main outcome
  – death from lung cancer

Example Study (4)
Exposure:
  – 43,000 direct measurements of radon levels in mines between 1951 and 1968
  – Converted to annual exposure
  – Combined with worker’s ‘in mine’ work time
  – Generated Working-Level Months (WLM)
    WL = 20.8 µJ (microjoules) of alpha energy per cubic meter (m³) of air
    WLM = 1 WL exposure for 170 hours
  – Cumulated in five-year age intervals
    0-5; 5-10; 10-15; 15-20; 20-25; …
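The WLM definition on this slide amounts to a one-line calculation. The sketch below is only an illustration: the function name and the sample miner are assumptions made for the example and do not come from the study data.

```python
HOURS_PER_WORKING_MONTH = 170  # from the slide: 1 WLM = 1 WL exposure for 170 hours


def working_level_months(wl: float, hours: float) -> float:
    """Return exposure in WLM for `hours` worked at a radon level of `wl` WL."""
    return wl * hours / HOURS_PER_WORKING_MONTH


# Hypothetical miner: 1,700 hours underground in one year at an average of 2 WL.
annual_wlm = working_level_months(2.0, 1700)
```

Annual values computed this way would then be summed within each five-year age band to give variables such as rexp20.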

agest = age at entry to study
ageexit = age at exit from study
ind = died from lung cancer (=1)
rexp20 = WLMs from age 15-20

Example study (5)
Item                                        Number   Percent
Sample size                                 3,347
Dying (any cause)                           1,258    38%
Lung cancer death                                    %
Lung cancer as proportion of all deaths              20.5%
How to apply survival analysis methods to this data?

Example study (6)
Based on course to now:
  – Time is the number of years (months, days, etc.) from initial entry into the study
  – Time ‘0’ is the entry date
  – End of follow-up
    Death (or death from lung cancer)
    Censored if
      – lost
      – died from ‘wrong cause’

Example study (7)
Based on course to now:
  – Exposure is time varying
    Cumulative
    Mean
    Peak
  – We will look at cumulative exposure of 500 WLM or more
  – Use PHREG to generate HR estimates

Choosing a time scale (1)
Time scale choices include:
  – Age
  – Calendar year
  – Time since entry into study
  – Time since initial employment

Choosing a time scale (2)
Cox model is: $h(t \mid x) = h_0(t)\,\exp(\beta x)$
Choice of time scale affects the shape of the baseline hazard $h_0(t)$
It also affects which people belong together in a risk set
Betas will have different values

Choosing a time scale (3)
Time on study
  – Hazard affected by cumulative exposure
    Length of time for disease to develop post-exposure
  – Usually a ‘gentle’ increase
  – Risk set groups people with same time post-entry
    Combines people of different ages
    Averages age-specific hazards

Choosing a time scale (4)
The actual year (calendar time)
  – Hazard affected by
    Temporal changes in exposure or risk
      – increased air pollution
      – climate change
      – legislation
  – Changes usually slow
  – Hazard is fairly constant, controlling for age, etc.
  – Risk set groups people in same years
  – Most commonly used for trend analyses with Poisson regression models

Choosing a time scale (5)
Age
  – Hazard affected by
    Cumulative exposure
    Aging
  – Often shows a very strong effect on hazard
    Prostate cancer hazard increases ‘super-exponentially’
  – Risk set groups people of the same age
    Ignores how long you have been ‘on study’

Choosing a time scale (6)
Choices are not independent
  – One year of follow-up increases all three time scale measures by one year
Cox models ‘work’ best if the baseline hazard captures a lot of hazard variation

Choosing a time scale (7)
For an RCT, ‘time on study’ is appropriate
  – follow-up time is usually short
  – Intervention has a strong effect, overwhelms age effect
For etiological studies
  – Risk increases with age
  – Risk relates to exposure, not to length of time since study entry
  – Length of time is a proxy for cumulative exposure
For etiological studies, several people have studied the choice of time scale

Choosing a time scale (8)
Breslow et al (1983)
  – Time-on-study as time scale is fine for RCTs, etc.
  – Not optimal for cohort studies
    Most outcome death rates increase rapidly with age
      – Want to maximize control of the age effect
    Time-on-study often strongly correlated with cumulative exposure
      – Can produce negative bias if used as time scale

Choosing a time scale (9)
Breslow et al (1983)
  – Recommendation
    Use age as time scale
    Stratify by calendar time (5 year groups)
  – Risk sets consist of people at the same age in each calendar group
  – Ignores length of time since entry as factor
  – Subjects are left truncated (‘late entry’)
    Time ‘0’ is ‘birth date’

Choosing a time scale (10)
Korn et al (1997)
  – Cox models don’t specify a form for h(t)
  – Best choice of time scale is the one which has the biggest impact on the hazard function shape
    NOT the biggest impact on the HR!
  – Which would differ the most:
    hazard for people aged 50 vs. aged 60, both with 10 years of follow-up?
    hazard for two 55 year olds, one with 5 years of follow-up and one with 15 years?
  – Cannot study the effect of the time scale variable

Choosing a time scale (11)
Korn et al
  – Recommendation
    Use age as time scale
    Stratify by year of birth (birth cohort)
      – 5 year groups are commonly used
  – Essentially the same model as proposed by Breslow et al

Choosing a time scale (12)
Korn et al
  – Considered a second model (commonly used):
    ‘Time-on-study’ as time scale
    Adjust for age at entry in model
  – Results are the same as having age as time scale if:
    $h_0(t)$ is exponential in age
  – Otherwise can give strong bias, especially with time-dependent covariates.

Choosing a time scale (13)
Uranium miners study
4 different time scales
Differences are not big
HR/RR all around
‘Age’ is used as time scale in rest of session
Time scale                 RR    95% CI
Time since entry                 – 6.3
Time since first mining          – 4.9
Calendar year                    – 6.9
Age                              – 5.7

Implications for Analysis
Age is the time variable
  – Left truncated
  – Requires ‘late entry’ methods
Compute exposure as a time varying variable
  – Cumulative
  – Mean
Analysis option #1:
  – Use regular Cox model
Other options
  – risk set modelling methods

Regular Cox models (1)
Uses the Phreg approach
Time varying exposure
  – e.g. use ‘500 WLM’ as time varying cut-point
SAS code uses programming statements within Phreg
Data file uses layout shown earlier

* model has ageexit as failure time, ind as failure indicator and agest as entry time;
proc phreg data=u.uminers;
  model ageexit*ind(0)=cr500 / entry=agest risklimits;
  * Time-dependent programming steps- see PHREG documentation;
  array rexp {18} rexp5 rexp10 rexp15 rexp20 rexp25 rexp30 rexp35 rexp40 rexp45
                  rexp50 rexp55 rexp60 rexp65 rexp70 rexp75 rexp80 rexp85 rexp90;
  m = min((ageexit-2)/5,18);
  i = 0;
  cradon = 0;
  do while (i < m);
    if (m > (i+1)) then do;
      cradon = cradon + rexp[i+1];
    end;
    else do;
      cradon = cradon + (m-i)*(rexp[i+1]);
    end;
    i = i+1;
  end;
  * Determine whether cumulative radon is >= 500 WLM;
  cr500 = (cradon >= 500);
run;
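To make the loop in the programming statements easier to follow, here is a rough Python mirror of the same arithmetic. It is a sketch for checking the logic by hand, not a replacement for the SAS code: the function names are assumptions, the list is 0-indexed where the SAS array is 1-indexed, and the 2-year offset is read directly from `(ageexit-2)/5`.

```python
from typing import Sequence


def cumulative_radon(age: float, rexp: Sequence[float], lag: float = 2.0) -> float:
    """Cumulative WLM accrued up to `age - lag`.

    rexp[k] holds the exposure accrued between ages 5*k and 5*(k+1),
    so rexp[3] plays the role of rexp20 (ages 15-20) in the SAS data set.
    """
    m = min((age - lag) / 5, len(rexp))  # number of 5-year bands elapsed, possibly fractional
    total = 0.0
    i = 0
    while i < m:
        if m > i + 1:
            total += rexp[i]             # band fully elapsed: count all of it
        else:
            total += (m - i) * rexp[i]   # current band: count the elapsed fraction
        i += 1
    return total


def cr500(age: float, rexp: Sequence[float]) -> int:
    """1 if cumulative radon exposure at this age is at least 500 WLM, else 0."""
    return int(cumulative_radon(age, rexp) >= 500)


# Hypothetical miner with 100 WLM in every 5-year band.
flat = [100.0] * 18
```

With this made-up exposure history, the indicator switches from 0 to 1 at age 27, which is when five full bands have elapsed after the 2-year offset.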

proc phreg data=u.uminers;
  model ageexit*ind(0)=cr500 / entry=agest risklimits;
  /***** CODE REMOVED FOR CLARITY *****/
  cr500 = (cradon >= 500);
run;

Regular Cox models (2)
Could do with counting process style input
  – Need to create one record for each subject for each year.
  – Code gets complex
  – I won’t show this
Either way, Phreg needs to:
  – create risk set data for each risk set
  – compute time varying covariates
  – do the MLE algorithm
Time consuming process
  – BUT, not a big issue with modern computers.

Risk Set Methods (1)
A Different Approach
Use the data step to create new data set with the risk set data
Risk set grouped data
  – series of records for each risk set
  – one line for each subject in the risk set
Code is complex (not shown)
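The slides do not show the data step, so the following is only a small Python sketch of the idea, assuming age is the time scale and using the agest, ageexit and ind variables described earlier. The output keys `setno`, `rstime` and `cc` are hypothetical names chosen to resemble the `_setno`, `_rstime` and `_cc` variables in the SAS code later in the session. Cases that fail at exactly the same age are placed in one shared risk set here, which is a simplification.

```python
def build_risk_sets(subjects):
    """Expand a cohort into one record per subject per risk set.

    subjects: list of dicts with keys id, agest, ageexit, ind.
    A risk set is formed at each age at which a lung cancer death occurs.
    """
    records = []
    case_ages = sorted({s["ageexit"] for s in subjects if s["ind"] == 1})
    for setno, t in enumerate(case_ages, start=1):
        for s in subjects:
            # At risk at age t: already entered (late entry) and not yet exited.
            if s["agest"] < t <= s["ageexit"]:
                is_case = int(s["ind"] == 1 and s["ageexit"] == t)
                records.append(
                    {"setno": setno, "rstime": t, "id": s["id"], "cc": is_case}
                )
    return records


# Hypothetical four-person cohort.
cohort = [
    {"id": 1, "agest": 40, "ageexit": 55, "ind": 1},
    {"id": 2, "agest": 45, "ageexit": 70, "ind": 0},
    {"id": 3, "agest": 50, "ageexit": 60, "ind": 1},
    {"id": 4, "agest": 58, "ageexit": 65, "ind": 0},
]
risk_set_records = build_risk_sets(cohort)
```

Subject 4 enters at age 58, so they are missing from the first risk set (age 55) but present in the second (age 60). That is the left truncation the age time scale requires.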

Risk Set Methods (2)
How can we use this data?
Consider any risk set (take risk set #1)
Can represent data as 2x2 table
              Lung CA   no Lung CA
>=500 WLM     1         45
< 500 WLM

Risk Set Methods (3)
Treat each risk set as a stratum
  – matched on age (the time scale variable)
Combine tables into an overall estimate
  – Mantel-Haenszel methods could be used
Better approach
  – Conditional logistic regression.
Can do this using either:
  – Proc Logistic
  – Proc Phreg
Likelihood functions are identical
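As a rough illustration of the Mantel-Haenszel option mentioned above, the sketch below pools one 2x2 table per risk set into a single odds ratio with the standard Mantel-Haenszel estimator. The function name and the three tables are invented for the example and are not results from the uranium miners data.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata.

    Each table is (a, b, c, d):
      a = exposed cases,   b = exposed non-cases,
      c = unexposed cases, d = unexposed non-cases.
    """
    tables = list(tables)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Three hypothetical risk sets, each with exactly one case.
example_tables = [
    (1, 4, 0, 15),  # case exposed (>= 500 WLM)
    (0, 6, 1, 12),  # case unexposed
    (1, 5, 0, 10),  # case exposed
]
pooled_or = mantel_haenszel_or(example_tables)
```

Risk sets in which the case is exposed contribute only to the numerator, and those in which the case is unexposed contribute only to the denominator, so the estimate needs at least one risk set of the second kind.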

Risk Set Methods (4)
Three approaches can be used to do these analyses:
  – the ‘bit of time’ method (phreg)
  – the ‘separate strata’ method (phreg)
  – the ‘binary data’ method (logistic)

Risk Set Methods (5)
Approach #1 (‘bit of time’ method)
  – Use Phreg
  – Treat the risk set file as a counting process structure
  – Need to add an ‘entry’ and ‘exit’ time for each subject in each risk set

Risk Set Methods (6)
Approach #1 (‘bit of time’ method)
  – Need to add an ‘entry’ and ‘exit’ time for each subject in each risk set
    exit time
      – age when the risk set occurred
    entry time
      – exit time - 0.001
      – 0.001 is arbitrary but the math works (trust me)
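A minimal sketch of this step, assuming risk-set records shaped like hypothetical dictionaries with an `rstime` key (the age at which the risk set occurred). The added key `rsentry` is likewise an illustrative name, chosen to resemble `_rsentry` in the SAS code.

```python
def add_bit_of_time(records, epsilon=0.001):
    """Return copies of risk-set records with an entry time just before the exit time.

    Each subject is then 'at risk' only for the short interval
    (rstime - epsilon, rstime] in that risk set.
    """
    return [{**r, "rsentry": r["rstime"] - epsilon} for r in records]


# Hypothetical records: two subjects in risk set 1, one of them again in risk set 2.
sample_records = [
    {"setno": 1, "rstime": 55.0, "id": 1, "cc": 1},
    {"setno": 1, "rstime": 55.0, "id": 2, "cc": 0},
    {"setno": 2, "rstime": 60.0, "id": 2, "cc": 0},
]
with_entry = add_bit_of_time(sample_records)
```

The value of epsilon only needs to be smaller than the gap between consecutive risk set times, so that the short intervals never overlap.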

Risk Set Methods (7)
Approach #1 (‘bit of time’ method)
  – Ignores all of the time between risk sets
  – Seems weird but the math works (trust me)

proc phreg data=cumexp;
  model _rstime*_cc(0)=cr500 / entry=_rsentry rl;
run;

Risk Set Methods (8)
Approach #2 (separate strata method)
  – Use Phreg
  – Number the risk sets from 1 to n
  – Use the risk set ID number as the time variable!
    Seems weird
    Risk set ID is not actually a ‘time’
    But the math works (trust me)
  – No need for a late entry variable

proc phreg data=cumexp nosummary;
  model _setno*_cc(0)=cr500 / rl;
  strata _setno;
run;
Identical to Method #1

Risk Set Methods (9)
Approach #3 (binary data method)
  – Uses Proc Logistic
  – Treats each risk set as a stratum
    Remember my 2x2 table from an earlier slide
  – Uses conditional logistic regression
    Condition on the risk set ID
    Not interested in OR or RR for each risk set
      – just ‘nuisance’ parameters
    Including strata parameter can lead to strong bias

Risk Set Methods (10)
Approach #3 (binary data method)
  – Stratify by the risk set ID
    similar to STRATA statement in Phreg
  – Model yields an OR.
    with this sampling approach, OR = RR
    the math works (trust me)

proc logistic data=cumexp descending;
  model _cc=cr500 / clodds=wald;
  strata _setno;
run;
Identical to Method #1

Risk Set Methods (11)
All three methods gave the same results.
  – Results are not quite the same as initial Phreg analysis (with age as the time scale):
Method           HR (RR)   95% CI
Regular Phreg              –
Risk sets                  – 5.728

Risk Set Methods (12)
Why bother with risk set method?
  – Some people claim it is faster
    I didn’t see this effect
    If true, is this an issue with modern computers?
      – does 1 sec vs. 2 secs matter?
Regular   RS #1   RS #2   RS #

Risk Set Methods (13)
Why bother with risk set method?
  – Can handle random effects code better (I am told)
  – More easily extends to nested case-control and case-cohort methods.

[Figure: full risk data, 1 ‘case’ per risk set, multiple non-cases]

Nested case-control (1)
Most studies will have hundreds or thousands of non-cases in each risk set.
Suppose we needed to collect new exposure information on all subjects
  – genotyping
Gets very expensive to use whole cohort.

Nested case-control (2)
Do we need all of the non-cases in each risk set?
NO!!!

Nested case-control (3)
Select a random sample of non-cases from each risk set
  – Usually a small number
    4 is common
    up to 20 in pharmaco-epidemiology studies
A person can be used more than once
  – Multiple times as control
  – As control and case
Collect new exposure information only on selected subjects
Analyze using only these subjects
Use any of the three risk set methods shown here
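The sampling step can be sketched in a few lines of Python. This assumes each risk set is already available as a case ID plus a list of non-case IDs. The data layout, the function name, and the default of 4 controls are illustrative choices, not the course's implementation.

```python
import random


def sample_nested_case_control(risk_sets, m=4, seed=None):
    """Keep the case and a simple random sample of m non-cases from each risk set.

    risk_sets: dict mapping risk set number -> {"case": id, "noncases": [ids]}.
    Sampling is done independently in every risk set, so the same person can
    be drawn as a control more than once and can also appear later as a case.
    """
    rng = random.Random(seed)
    sampled = {}
    for setno, rs in risk_sets.items():
        k = min(m, len(rs["noncases"]))  # small risk sets keep all their non-cases
        sampled[setno] = {
            "case": rs["case"],
            "controls": rng.sample(rs["noncases"], k),
        }
    return sampled


# Hypothetical cohort of 30 people with two risk sets; subject 7 is a
# non-case in the first risk set and the case in the second.
full_risk_sets = {
    1: {"case": 3, "noncases": [i for i in range(1, 31) if i != 3]},
    2: {"case": 7, "noncases": [i for i in range(1, 31) if i not in (3, 7)]},
}
ncc = sample_nested_case_control(full_risk_sets, m=4, seed=2015)
```

New exposure information (genotyping, for example) would then be collected only for the IDs that appear in `ncc`.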

Nested case-control (4)
Will give an unbiased estimate of the true HR/RR
95% confidence intervals will be wider
Why does it work?
Go back to the Partial Likelihood for Cox models

The final likelihood contribution from each risk set is:
For the nested case-control, the likelihood contribution is given by:
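The two formulas on this slide did not survive in the transcript. The standard forms are given below as a reconstruction rather than as the slide's exact notation. The symbols are assumptions: $x_{(i)}$ is the covariate of the case in risk set $i$, $t_i$ is the age at which that risk set occurs, $R_i$ is the full risk set, and $\tilde{R}_i$ is the case together with its sampled non-cases.

```latex
% Full cohort: the denominator sums over everyone at risk at t_i
L_i(\beta) = \frac{\exp\{\beta\, x_{(i)}(t_i)\}}
                  {\sum_{j \in R_i} \exp\{\beta\, x_j(t_i)\}}

% Nested case-control: the denominator sums over the sampled risk set only
\tilde{L}_i(\beta) = \frac{\exp\{\beta\, x_{(i)}(t_i)\}}
                          {\sum_{j \in \tilde{R}_i} \exp\{\beta\, x_j(t_i)\}}
```

The baseline hazard $h_0(t_i)$ cancels from numerator and denominator in both expressions, which is why the risk set can be reduced by sampling without needing a model for the hazard's shape.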

Nested case-control (5)
Likelihoods are the same form
  – denominator sums over the available risk set
Can vary method of non-case selection
  – random sample
  – matched
  – counter-matched
Easily extends to case-cohort design
  – Select a random sample from initial cohort
  – Entire sample is retained as the risk set members throughout follow-up
    treats case status as a time varying covariate
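To illustrate the 'same form' point, the sketch below evaluates one risk set's likelihood contribution twice with the same function: once over a full risk set and once over a sampled one. It assumes a single binary covariate (such as cr500) and a made-up coefficient. The names and numbers are for illustration only.

```python
import math


def likelihood_contribution(beta, x_case, x_risk_set):
    """Partial-likelihood contribution of one risk set.

    x_risk_set lists the covariate values of everyone in the available
    risk set, including the case itself.
    """
    denom = sum(math.exp(beta * x) for x in x_risk_set)
    return math.exp(beta * x_case) / denom


beta = math.log(3.0)  # hypothetical log hazard ratio

# Full risk set: exposed case, one other exposed subject, four unexposed.
full_contribution = likelihood_contribution(beta, 1, [1, 1, 0, 0, 0, 0])

# Sampled risk set: the same case plus two sampled unexposed non-cases.
sampled_contribution = likelihood_contribution(beta, 1, [1, 0, 0])
```

Only the list passed as the risk set changes between the two calls, which is why the same Phreg or Logistic code can be reused for nested case-control data.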

Summary
Observational epidemiology analysis is more complex than for RCTs
Survival methods generalize
  – discrete time methods
  – risk set approaches
Choice of time scale
More information on Langholz’s web site
  – Risk set analysis course, Langholz, USC