01/2015
EPI 5344: Survival Analysis in Epidemiology
Time varying covariates
March 24, 2015
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa
Objectives
– Introduce time varying covariates
– Methods of including them in Cox models
– SAS (computer) issues
Introduction (1)
Does heart transplantation improve survival?
– Epidemiological study with ID measures
– Observational study (not an RCT)
Introduction (2)
Assume that transplant has no effect on survival
– IDR = 1.0
Study population: candidates for transplant
– 2-year follow-up, no losses
– 50% of people get a transplant, always on their first anniversary of entering the study
– 25% of the group die in the first year
– 25% of first-year survivors die in the second year
Introduction (3)
Ignore transplant status
Introduction (4)
Stratify by transplant status: Transplant done
Introduction (5)
Stratify by transplant status: NO transplant done
Introduction (6)
What is the observed IDR under this method of analysis?
– Transplant: ID = 0.133/yr
– No transplant: ID = 0.526/yr
– IDR = 0.133/0.526 ≈ 0.25
– Correct IDR = 1.0
STRONG BIAS
Doing an RCT does NOT fix this issue as long as the transplant is not done at time '0'.
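The arithmetic behind these incidence densities can be checked with a short Python sketch. It assumes a hypothetical cohort of 1,000 candidates and, as an extra assumption not stated on the slide, that deaths occur on average halfway through the year (each death contributes 0.5 years of person-time in its year of death); under that assumption it reproduces the 0.133/yr and 0.526/yr figures. The function name `naive_analysis` is illustrative.

```python
def naive_analysis(n=1000):
    """Classify subjects by whether they EVER get a transplant (biased)."""
    deaths_y1 = 0.25 * n                  # 25% die in year 1
    survivors_y1 = n - deaths_y1          # alive at the first anniversary
    transplanted = 0.5 * n                # transplanted at t = 1 year
    never = survivors_y1 - transplanted   # year-1 survivors never transplanted

    # "Ever transplanted" group: all of them survived year 1 (they had to).
    tx_deaths = 0.25 * transplanted
    tx_pt = transplanted * 1.0 + (transplanted - tx_deaths) * 1.0 + tx_deaths * 0.5

    # "Never transplanted" group: year-1 deaths plus untransplanted survivors.
    no_deaths_y2 = 0.25 * never
    no_deaths = deaths_y1 + no_deaths_y2
    no_pt = (deaths_y1 * 0.5 + never * 1.0
             + (never - no_deaths_y2) * 1.0 + no_deaths_y2 * 0.5)

    id_tx = tx_deaths / tx_pt
    id_no = no_deaths / no_pt
    return id_tx, id_no, id_tx / id_no

id_tx, id_no, idr = naive_analysis()
print(f"{id_tx:.3f} {id_no:.3f} {idr:.3f}")   # 0.133 0.526 0.253
```

The transplant group's year-1 person-time is "immortal": its members could not die before their transplant, which is what drags the IDR far below 1.0.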
Introduction (7)
How do we fix this?
– No-one is at risk of dying with a transplant until the transplant has taken place
Solution using epi methods:
– People who never have a transplant: all PT (and events) go to the non-transplant group
– People who have a transplant: accumulate PT (and events) to the non-transplant group until the transplant occurs, and to the transplant group only after the transplant occurs
Introduction (8)
CORRECT WAY: No transplant done
Introduction (9)
CORRECT WAY: Transplant done
Introduction (10)
What is the observed IDR under this method of analysis?
– Transplant: ID = 0.286/yr
– No transplant: ID = 0.286/yr
– IDR = 1.0
– Correct IDR = 1.0
Transplant status is a TIME VARYING COVARIATE
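The same hypothetical cohort (1,000 candidates, deaths assumed to occur mid-year) analysed the correct way: all year-1 person-time is non-transplant, and transplant person-time only starts accruing at the first anniversary. A minimal Python sketch; `correct_analysis` is an illustrative name.

```python
def correct_analysis(n=1000):
    """Allocate person-time to the transplant group only after the transplant."""
    deaths_y1 = 0.25 * n
    survivors_y1 = n - deaths_y1
    transplanted = 0.5 * n
    never = survivors_y1 - transplanted

    # Year 1: nobody has had a transplant yet, so ALL person-time is non-transplant.
    no_deaths = deaths_y1
    no_pt = deaths_y1 * 0.5 + survivors_y1 * 1.0

    # Year 2, never transplanted: still non-transplant person-time.
    d = 0.25 * never
    no_deaths += d
    no_pt += (never - d) * 1.0 + d * 0.5

    # Year 2, transplanted: only now does transplant person-time accrue.
    tx_deaths = 0.25 * transplanted
    tx_pt = (transplanted - tx_deaths) * 1.0 + tx_deaths * 0.5

    id_tx = tx_deaths / tx_pt
    id_no = no_deaths / no_pt
    return id_tx, id_no, id_tx / id_no

id_tx, id_no, idr = correct_analysis()
print(f"{id_tx:.3f} {id_no:.3f} {idr:.2f}")   # 0.286 0.286 1.00
```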
Time Varying Covariates (1)
Exposures can change during follow-up
– People stop/start smoking
– BP increases
– Air pollution varies from year to year
Hazard often depends more strongly on recent values than on the original exposure
– Not always true
– Can depend on cumulative exposure
– Lagged exposure
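A small Python sketch of three ways to summarize an exposure history at time t: the current value, a lagged value, and cumulative exposure. The weekly smoking history and the function names are hypothetical.

```python
# Hypothetical weekly smoking history for one subject (1 = smoked that week).
history = [1, 1, 0, 0, 1, 1, 1, 0]

def current(history, t):
    """Exposure during week t (weeks numbered from 1)."""
    return history[t - 1]

def lagged(history, t, lag):
    """Exposure `lag` weeks before week t, or None if before follow-up began."""
    k = t - lag
    return history[k - 1] if k >= 1 else None

def cumulative(history, t):
    """Number of exposed weeks up to and including week t."""
    return sum(history[:t])

for t in (1, 5, 8):
    print(t, current(history, t), lagged(history, t, 2), cumulative(history, t))
# 1 1 None 1
# 5 1 0 3
# 8 0 1 5
```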
Time Varying Covariates (2)
Produces non-proportional hazards
– A change in exposure level causes the hazard to change in one group
Hazards are still proportional conditional on the current value of the time varying exposure.
Time Varying Covariates (3)
Before t*, HR = 1.0
After t*, HR = HR* < 1.0
NOT PH over all time
If we ignore the time of exposure and just treat these as two groups with PH, we get a biased estimate of the hazard ratio
– A type of average of 1.0 and HR* (so > HR*)
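The "average of 1.0 and HR*" can be illustrated with a piecewise-exponential calculation in Python. All inputs are hypothetical assumptions: a constant baseline hazard `lam0` in both groups, the exposed group's hazard multiplied by HR* only after `t_star`, follow-up ending at `t_end`, and no other censoring. Comparing the exposed group's overall incidence density with the unexposed group's ignores t*, which is the biased analysis described above.

```python
import math

def naive_idr(lam0, hr_star, t_star, t_end):
    """IDR from comparing the whole exposed group with the unexposed group.

    Expected deaths and person-time are per subject alive at time 0.
    Requires hr_star > 0.
    """
    lam1 = lam0 * hr_star
    # Exposed group before t*: hazard lam0
    d_before = 1 - math.exp(-lam0 * t_star)
    pt_before = d_before / lam0
    # Exposed group after t*: hazard lam1, starting from survival S(t*)
    s_star = math.exp(-lam0 * t_star)
    d_after = s_star * (1 - math.exp(-lam1 * (t_end - t_star)))
    pt_after = d_after / lam1
    id_exposed = (d_before + d_after) / (pt_before + pt_after)
    # Unexposed group has a constant hazard, so its incidence density is lam0
    return id_exposed / lam0

print(naive_idr(0.2, 0.5, 1.0, 2.0))   # a value between HR* = 0.5 and 1.0
```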
Time Varying Covariates (4)
BUT:
– before t*, hazards are proportional
– after t*, hazards are proportional
The true impact of the exposure is HR*, and it only occurs after t*
We need an analysis approach that reflects this
Time Varying Covariates (5)
Is this hard to do?
– YES and NO
Consider a situation where all subjects start off as 'unexposed', but at some later time some people become exposed
Time Varying Covariates (6)
Standard Cox model: $h(t) = h_0(t)\exp(\beta x)$
Time varying Cox model: $h(t) = h_0(t)\exp(\beta x(t))$
Only change: $x$ becomes $x(t)$
Time Varying Covariates (7)
The theory really is this simple! WHY?
RISK SETS
Time Varying Covariates (8)
The likelihood function for the Cox model is computed at each time point when an event occurs
– It depends only on the subjects 'at risk' at the event time: the RISK SET
$x_{ij}$ is the value of 'x' for subject $i$ AT THE TIME of event $j$
Time Varying Covariates (9)
Fixed covariates:
– $x_{ij}$ is the same at all times
Time varying covariates:
– Use the $x_{ij}$ that corresponds to the event time of this risk set
– Keep doing this over all risk sets
Time Varying Covariates (10)
So why isn't it simple to do this? Practical issues intrude!
To fit a time varying covariate, SAS needs to know the value of the covariate for every risk set.
– Need to compute a value of the covariate at the time of every event.
Interpretation is also tricky (later)
Time Varying Covariates (11)
Example
– 4 subjects
– 2 get a transplant, at t = 15 and t = 25
– Want to include a time-varying covariate for transplant status

ID   Outcome   Time of event   Transplant   Time of transplant
1    dead      10              N            .
2    dead      20              Y            15
3    dead      30              N            .
4    dead      40              Y            25

4 risk sets, at t = 10, 20, 30 and 40
Time Varying Covariates (12)

Risk set (t)   ID   X_trans
10             1    0
10             2    0
10             3    0
10             4    0
20             2    1
20             3    0
20             4    0
30             3    0
30             4    1
40             4    1
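A Python sketch that builds this risk-set table from the subject data and evaluates the Cox partial log-likelihood, $\ell(\beta) = \sum_j \left[\beta x_{(j)}(t_j) - \ln \sum_{i \in R(t_j)} e^{\beta x_i(t_j)}\right]$, where $x_i(t_j)$ is subject $i$'s transplant status at event time $t_j$ and $(j)$ is the subject who dies at $t_j$. It assumes no tied event times (true in this example); the function names are illustrative.

```python
import math

# Four subjects: (id, time of death, time of transplant or None if none).
SUBJECTS = [(1, 10, None), (2, 20, 15), (3, 30, None), (4, 40, 25)]

def x_trans(tx_time, t):
    """Transplant status at risk-set time t: 1 if transplanted on or before t."""
    return 1 if tx_time is not None and tx_time <= t else 0

def risk_sets(subjects):
    """Map each event time to [(id, x_trans)] for every subject still at risk."""
    table = {}
    for _, t_event, _ in subjects:        # every subject dies in this example
        table[t_event] = [(sid, x_trans(tx, t_event))
                          for sid, t_i, tx in subjects if t_i >= t_event]
    return table

def partial_loglik(beta, subjects):
    """Cox partial log-likelihood with a time-varying transplant covariate."""
    sets = risk_sets(subjects)
    ll = 0.0
    for _, t_event, tx in subjects:
        x_case = x_trans(tx, t_event)     # status of the subject who dies at t_event
        denom = sum(math.exp(beta * x) for _, x in sets[t_event])
        ll += beta * x_case - math.log(denom)
    return ll

for t, members in risk_sets(SUBJECTS).items():
    print(t, members)
print(partial_loglik(0.0, SUBJECTS))      # equals -ln(24) at beta = 0
```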
Time Varying Covariates (13)
Two ways to do this in SAS:
– Use programming statements in PROC PHREG
– Restructure the data set and use a different method of describing the model to SAS: counting process input
Other programs have similar options and choices
Time Varying Covariates (14)
We'll look at both ways.
– Some things can only be done with the PHREG programming approach
– Counting process input has some strong benefits
– The counting process approach can be tricky to use with age as the time scale
Proc Phreg programming (1)
SAS lets you include programming statements within PROC PHREG:

  proc phreg data=njb1;
    model surv*vs(0)=age sex x1;
    if (surv > 20) then x1 = 2;
    else x1 = 1;
  run;
Proc Phreg programming (2)
This code is processed once for each risk set
– 'surv' is the time at which the risk set occurs
  – It is NOT the survival time for the subject
– 'x1' is the value of the variable for the subject at the time of the specific risk set under consideration
  – Here, it is '1' if the risk set occurs at or before time 20 and '2' otherwise
Files can get VERY BIG
Hard to debug your code
– But SAS 9.4 allows 'out' statements to be used
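A Python sketch of what "processed once for each risk set" means: the rule is evaluated at each event time, with `surv` standing for the risk-set time rather than any subject's own survival time. The survival times are hypothetical, and every subject is assumed to have an event.

```python
def x1_rule(surv):
    # Same logic as the PHREG statements; `surv` is the RISK-SET time here,
    # not the survival time of the subject being evaluated.
    return 2 if surv > 20 else 1

def x1_by_risk_set(surv_times):
    """x1 for every at-risk subject at every event time."""
    table = {}
    for t in sorted(set(surv_times)):
        table[t] = {i: x1_rule(t) for i, s in enumerate(surv_times) if s >= t}
    return table

# Hypothetical survival times for three subjects (indices 0, 1, 2).
print(x1_by_risk_set([12, 25, 30]))
# {12: {0: 1, 1: 1, 2: 1}, 25: {1: 2, 2: 2}, 30: {2: 2}}
```

Subject 1 survives to 25 (> 20) but still has x1 = 1 in the t = 12 risk set. Because x1 here depends only on time, everyone in a given risk set shares the same value, so it cancels out of that risk set's partial-likelihood term; practical uses usually combine the risk-set time with subject-level data, as in the transplant example.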
Stanford Heart Transplant Study
Standard phreg analysis
Defines the 'transplant' status in the data step, using code like this:

  data njb1;
    set stanford;
    if (dot = .) then trans = 0;
    else trans = 1;
  run;

  proc phreg data=njb1;
    model time*cens(0)=trans;
  run;
Trans = 1 means the subject:
a) had a transplant, and
b) lived long enough to have a transplant
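Hazard curves look something like this.
[Figure: hazard over time for the Transplant and No Transplant groups, with the transplant time marked]
Before the transplant time, no-one in the transplant group can die (they must survive to be transplanted), so in this interval HR = 0
Overall HR is biased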
Stanford Heart Transplant Study: with time varying effect
[Table: ID, Surv1, Dead, Wait]
For each event time, we need to define the transplant variable for every subject still in the risk set
– plant = 0: no transplant by the risk set time
– plant = 1: transplant done on or before the risk set time
[Table: risk set time, IDs in the risk set, wait time, plant]
SAS code to create 'plant' and run the analysis:

  proc phreg data=stan;
    model surv1*dead(0)=plant surg ageaccpt / ties=exact;
    if (wait > surv1 or wait = .) then plant = 0;
    else plant = 1;
  run;
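In SAS a missing value compares as smaller than any number, so `wait > surv1` is false when wait is missing; the `wait = .` test is what keeps never-transplanted patients at plant = 0. The Python sketch below mirrors the rule, using None for a missing wait; the patient data are hypothetical.

```python
def plant(wait, surv1):
    """Transplant status at risk-set time surv1; wait is None if never transplanted."""
    # SAS: if (wait > surv1 or wait = .) then plant = 0; else plant = 1;
    if wait is None or wait > surv1:
        return 0
    return 1

# Hypothetical patient transplanted on day 30, evaluated at three risk-set times.
print([plant(30, t) for t in (10, 30, 50)])   # [0, 1, 1]
print(plant(None, 50))                         # 0
```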
Counting Process Input (1)
Counting processes are a different way to look at survival
– mathematically more powerful
– essentially, each subject follows a 'process' and we 'count up' the events they experience
– can handle recurrent events
– enhances modeling of exposure
You don't need to know all this to use SAS counting process style input.
Counting Process Input (2)
The data set needs to be restructured.
To date:
– one record per subject
– to code covariate changes, we need multiple variables: value at baseline (v1), time of first change (t1), new value (v2), and so on
– need to use PHREG programming to define the value at each risk set
Counting Process Input (3)
New approach
– Similar to the piece-wise exponential model
– Split the data for each subject into multiple records
– Define intervals in which every covariate is constant: (t1, t2], so a subject is at risk at time t if t1 < t ≤ t2
– Each interval has one line (record) of data
– Intervals continue until the subject is censored or has the outcome event
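A minimal Python sketch of the splitting step for one subject and one time varying covariate. It assumes the covariate history is given as (time, value) pairs starting at time 0; the function name and inputs are hypothetical. Only the last record carries the outcome; earlier records end censored.

```python
def split_episodes(changes, end, event):
    """Split one subject's follow-up into (start, stop, status, value) records.

    `changes`: (time, value) pairs, the first at time 0, giving the covariate
    value from that time on. `end`: time of event or censoring. `event`: 1/0.
    """
    pts = [c for c in sorted(changes) if c[0] < end]   # ignore changes at/after end
    records = []
    for k, (start, value) in enumerate(pts):
        last = k + 1 == len(pts)
        stop = end if last else pts[k + 1][0]          # interval ends at next change
        status = event if last else 0                  # only last record has the event
        records.append((start, stop, status, value))
    return records

# Transplant at t = 15, death at t = 20
print(split_episodes([(0, 0), (15, 1)], 20, 1))   # [(0, 15, 0, 0), (15, 20, 1, 1)]
```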
Counting Process Input (4)
Need to restructure the data file
Each interval needs a record in the data set
Need to code:
– Start of this interval
– End of this interval
– Outcome status at end of interval
– Value of time varying covariate(s) during the interval
– Values of fixed covariates, etc.
Counting Process Input (5)
Let's use data from the Stanford Heart Transplant Study, the same data as before.
But we only include transplant status
– Ignore other variables for now
– Only one time varying covariate
[Tables: original data (ID, Surv1, Dead, Wait) and re-structured data (ID, Start, Stop, Status, plant)]
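SAS code to re-structure the data

Basic version:

  DATA stanlong;
    SET allison.stan;
    plant=0;                 /* pre-transplant (or never transplanted) */
    start=0;
    IF (trans=0) THEN DO;    /* no transplant: one record */
      dead2=dead;
      stop=surv1;
      OUTPUT;
    END;
    ELSE DO;                 /* transplant: two records */
      stop=wait;             /* first record ends at transplant, censored */
      dead2=0;
      OUTPUT;
      plant=1;               /* second record starts at transplant */
      start=wait;
      stop=surv1;
      dead2=dead;
      OUTPUT;
    END;
  RUN;

Version that also handles zero times. A survival or waiting time of 0 would give a record with start = stop = 0; such a zero-length interval is excluded from the analysis, so zero times are replaced with 0.1:

  DATA stanlong;
    SET allison.stan;
    plant=0;
    start=0;
    IF (trans=0) THEN DO;
      dead2=dead;
      stop=surv1;
      IF (stop=0) THEN stop=.1;
      OUTPUT;
    END;
    ELSE DO;
      stop=wait;
      IF (stop=0) THEN stop=.1;
      dead2=0;
      OUTPUT;
      plant=1;
      start=wait;
      IF (stop=.1) THEN start=.1;
      stop=surv1;
      dead2=dead;
      OUTPUT;
    END;
  RUN;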
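A Python port of the zero-time-handling version of the stanlong data step, for checking the records one patient generates. `wait` is None for patients who were never transplanted (standing in for trans = 0); the function name and example patients are hypothetical.

```python
def split_stanford(surv1, dead, wait):
    """Return (start, stop, dead2, plant) records for one patient."""
    if wait is None:                          # trans = 0: a single record
        stop = 0.1 if surv1 == 0 else surv1   # IF (stop=0) THEN stop=.1
        return [(0, stop, dead, 0)]
    first_stop = 0.1 if wait == 0 else wait   # IF (stop=0) THEN stop=.1
    return [
        (0, first_stop, 0, 0),                # pre-transplant: censored at transplant
        # post-transplant: start=wait, or .1 if the first record was nudged
        (first_stop, surv1, dead, 1),
    ]

print(split_stanford(100, 1, 20))   # [(0, 20, 0, 0), (20, 100, 1, 1)]
print(split_stanford(5, 0, 0))      # [(0, 0.1, 0, 0), (0.1, 5, 0, 1)]
```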
SAS code for counting-process input analysis:

  PROC PHREG DATA=stanlong;
    MODEL (start,stop)*dead2(0)=plant surg ageaccpt / TIES=EFRON;
  RUN;

Gives results identical to the previous time-varying analysis (PHREG programming approach), provided the same TIES= option is used.
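Why the two approaches agree: with counting-process input, a record belongs to the risk set at event time t when start < t ≤ stop, so each risk set automatically picks up the covariate value in force at t. A Python sketch using the 4-subject transplant example (subject 2 transplanted at t = 15, subject 4 at t = 25) restructured into records; the function names are illustrative.

```python
import math

# (id, start, stop, status, trans) records for the 4-subject example.
RECORDS = [
    (1, 0, 10, 1, 0),
    (2, 0, 15, 0, 0), (2, 15, 20, 1, 1),
    (3, 0, 30, 1, 0),
    (4, 0, 25, 0, 0), (4, 25, 40, 1, 1),
]

def cp_risk_set(records, t):
    """(id, trans) for every record whose interval (start, stop] contains t."""
    return [(rid, x) for rid, start, stop, _, x in records if start < t <= stop]

def cp_partial_loglik(beta, records):
    """Cox partial log-likelihood computed from counting-process records."""
    ll = 0.0
    for _, _, stop, status, x in records:
        if status == 1:                       # an event at time `stop`
            denom = sum(math.exp(beta * xi) for _, xi in cp_risk_set(records, stop))
            ll += beta * x - math.log(denom)
    return ll

print(cp_risk_set(RECORDS, 20))   # [(2, 1), (3, 0), (4, 0)]
```

The risk sets, and therefore the partial likelihood, are the same ones obtained by evaluating transplant status at each event time with programming statements.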
Time Varying Covariates (15)
Types of time varying covariates
Internal (endogenous)
– Change in the covariate is related to the behaviour of the subject
– Measurement requires the subject to be under periodic examination: blood pressure, cholesterol, smoking
– More challenging to analyse: often part of the causal pathway
Time Varying Covariates (16)
External (exogenous)
– Variables that vary independently of the subject's normal biological processes
– Their values do not depend on subject-specific information
– Measurement does not require subject monitoring: e.g., hourly pollen count
Time Varying Covariates (17)
Some pattern types
– Non-reversible dichotomy: transplant
– Reversible dichotomy: smoking, drug use
– Continuous variable: cholesterol
Time Varying Covariates (18)
Some issues
– Need valid measures for all subjects at all follow-up times
  – missing data
  – 'coarse' measurement intervals
  – imputation
  – interpolation
– Computationally intense
– Reverse causation effects
– Intermediate variables in the causal pathway
Time Varying Covariates (19)
Some logical fallacies
Cannot use the future to predict the future!
Example #1
– Recruit a cohort of neonates: age at entry = 0 for all subjects, so it is not useful as a predictor
– Suggestion is made to use the average age during follow-up to predict outcome
– INVALID: average age during follow-up depends on 'future' information; a high average age is due to long survival
Time Varying Covariates (20)
Intermediaries (internal covariates)
RCT of anti-hypertensive treatment
– Outcome: time to stroke
– Main question: does the drug reduce the rate of stroke?
Model 1: $\ln(HR) = \beta_1\,\text{drug}$
BUT we measured BP on all subjects during follow-up.
– Why not include this as a time-varying covariate?
Time Varying Covariates (21)
Intermediaries (cont)
Model 1: $\ln(HR) = \beta_1\,\text{drug}$
Model 2: $\ln(HR) = \beta_1^{*}\,\text{drug} + \beta_2\,BP(t)$
Results:
– Model 1: $\beta_1$, p <
– Model 2: $\beta_1^{*}$, p = 0.6
WHY?
Time Varying Covariates (22)
Drug → drop in BP → drop in stroke risk
The effect of the drug on stroke is already accounted for by the BP term
The model's estimate of the 'drug' effect is the effect of the drug after adjusting for changes in BP
– That is, after adjusting away the very pathway through which the drug works
SAS examples (1)
Study of prisoners released from jail
– One year follow-up
– Monitored every week: if the subject was re-arrested (recidivated), the week of the arrest was recorded
– Key question: does financial security post-release reduce the risk of recidivism?
SAS examples (2)
The study also collected information about employment status for every week of follow-up after release
– Time varying covariate
Hypothesis: being in full-time employment reduces the risk of recidivism
Data layout for employment information: one record per subject, with variables ID, EMP1, EMP2, EMP3, …, EMP52 (one employment indicator per week)
  PROC PHREG DATA=allison.recid;
    MODEL week*arrest(0)=fin age race wexp mar paro prio employed / TIES=EFRON;
    ARRAY emp(*) emp1-emp52;
    employed=emp[week];      /* employment status in the risk-set week */
  RUN;
BUT: if you get arrested in week 10, you can't work full-time in week 10
– REVERSE CAUSATION
– Solution: lagged exposure
  title 'Single week lag';
  PROC PHREG DATA=allison.recid;
    WHERE week>1;            /* week 1 has no previous week to lag from */
    MODEL week*arrest(0)=fin age race wexp mar paro prio employed / TIES=EFRON;
    ARRAY emp(*) emp1-emp52;
    employed=emp[week-1];    /* employment status in the previous week */
  RUN;
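A Python sketch of the two array lookups, making explicit that the SAS arrays here are indexed from 1 while Python lists start at 0. The subject's employment history is hypothetical.

```python
def employed_current(emp, week):
    # SAS: employed = emp[week];   (SAS array starts at 1, Python list at 0)
    return emp[week - 1]

def employed_lag1(emp, week):
    # SAS: employed = emp[week-1];   needs week > 1, hence WHERE week>1
    if week < 2:
        raise ValueError("no previous week to lag from")
    return emp[week - 2]

# Hypothetical subject: employed in weeks 2 and 3, arrested in week 4,
# so they could not be working full time in week 4 itself.
emp = [0, 1, 1, 0]
print(employed_current(emp, 4))  # 0  (reflects the arrest: reverse causation)
print(employed_lag1(emp, 4))     # 1  (status in the week before the risk set)
```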
SAS examples (3)
Allison looks at some other models
– Other lag intervals
– Cumulative work experience
Worth reviewing for code examples and interpretation
SAS examples (4)
Albumin and death
– Question: does a falling serum albumin predict an increased likelihood of death?
SAS examples (5)
Albumin measured on the first day of each month
– Ad-hoc measurement
– Not available on every day of the month
Cannot use the 'average' albumin around the death date
– No post-death value
Use the 'closest' value before the risk set date
  DATA bloodcount;
    INFILE 'c:\blood.dat';
    INPUT deathday status alb1-alb12;
    ARRAY alb(*) alb1-alb12;
    status2=0;
    deathmon=CEIL(deathday/30.4);   /* month in which death/censoring occurs */
    DO j=1 TO deathmon;
      start=(j-1)*30.4;             /* one record per 30.4-day month */
      stop=start+30.4;
      albumin=alb(j);               /* value measured at the start of month j */
      IF (j=deathmon) THEN DO;
        status2=status;
        stop=deathday;              /* last record ends at death/censoring */
      END;
      OUTPUT;
    END;
  RUN;

  PROC PHREG DATA=bloodcount;
    MODEL (start,stop)*status2(0)=albumin;
  RUN;

Uses counting process style input
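A Python port of the monthly splitting step, useful for checking the records a single patient produces. The 30.4-day month follows the SAS code; the patient's death day and albumin values are hypothetical.

```python
import math

MONTH = 30.4  # average month length in days, as in the SAS code

def split_monthly(deathday, status, alb):
    """One (start, stop, status2, albumin) record per month of follow-up."""
    deathmon = math.ceil(deathday / MONTH)
    records = []
    for j in range(1, deathmon + 1):
        start = (j - 1) * MONTH
        stop = start + MONTH
        status2 = 0
        if j == deathmon:             # last (partial) month ends at death/censoring
            status2 = status
            stop = deathday
        records.append((start, stop, status2, alb[j - 1]))
    return records

# Hypothetical patient: died on day 70; monthly albumin values (g/dL).
alb = [3.9, 3.7, 3.4, 3.2, 3.0, 2.9, 2.8, 2.8, 2.7, 2.6, 2.5, 2.4]
for rec in split_monthly(70, 1, alb):
    print(rec)
```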
SAS examples (6)
Alcoholic cirrhosis and survival
– Prothrombin time (a measure of blood clotting) is hypothesized as a predictor of survival
– A cohort of men was followed up
– Lab measures were taken at 'clinically relevant' times: no pattern to the times, which varied for each subject
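  DATA alcocount;
    SET allison.alco;
    time1=0;                      /* start of follow-up */
    time11=.;                     /* no 11th measurement: ends the last interval */
    ARRAY t(*) time1-time11;
    ARRAY p(*) pt1-pt10;
    dead2=0;
    DO j=1 TO 10 WHILE (t(j) NE .);
      start=t(j);
      pt=p(j);                    /* value measured at the start of the interval */
      stop=t(j+1);                /* interval ends at the next measurement */
      IF (t(j+1)=.) THEN DO;      /* last interval ends at death/censoring */
        stop=surv;
        dead2=dead;
      END;
      OUTPUT;
    END;
  RUN;

  PROC PHREG DATA=alcocount;
    MODEL (start,stop)*dead2(0)=pt;
  RUN;

Uses counting process style input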
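A Python port of the irregular-interval splitting: each lab value applies from its measurement time until the next measurement, and the last interval runs to death or censoring. The measurement times and prothrombin values are hypothetical; `times[0]` is 0, matching `time1=0` in the SAS code.

```python
def split_irregular(times, pts, surv, dead):
    """(start, stop, dead2, pt) records from irregular measurement times.

    times[k] is the day of the k-th measurement (times[0] = 0) and pts[k]
    its value; surv is the follow-up time and dead the outcome (1/0).
    """
    records = []
    for j, start in enumerate(times):
        if j + 1 < len(times):        # interval ends at the next measurement
            records.append((start, times[j + 1], 0, pts[j]))
        else:                         # last interval ends at death/censoring
            records.append((start, surv, dead, pts[j]))
    return records

print(split_irregular([0, 40, 95], [11, 13, 16], 130, 1))
# [(0, 40, 0, 11), (40, 95, 0, 13), (95, 130, 1, 16)]
```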