Lecture 15: Time-Varying Covariates
Time-Dependent Covariates
Thus far we’ve only considered “fixed” time covariates
Examples of time-varying covariates
– Cumulative exposure
– Smoking status
– Blood pressure
Now, the data structure is
– [T, d, Z(t); 0 < t < T]
CPHM with Time-Varying Covariates
The model looks like what we’ve been working with.
Now, however, Z is a function of t.
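For reference, the Cox model with time-varying covariates is usually written as below, using b for the coefficient vector as in these notes. The symbol h_0(t) for the baseline hazard is an assumption here, not notation taken from the slide.

```latex
h\bigl(t \mid Z(t)\bigr) = h_0(t)\,\exp\bigl\{ b^{\top} Z(t) \bigr\}
```

The only change from the fixed-covariate model is that Z is evaluated at the current time t.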
Likelihood with Time-Varying Covariates
Again, we can use the partial likelihood approach to estimate b
But Z is now a function of t (as in the model statement)
Otherwise, testing and estimation are the same as for fixed covariates
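A sketch of the standard partial likelihood in this setting, assuming no tied event times. The symbols are assumptions chosen to match the model statement: t_1 < ... < t_D are the ordered event times, Z_{(i)} is the covariate process of the subject who fails at t_i, and R(t_i) is the set of subjects at risk just before t_i.

```latex
L(b) = \prod_{i=1}^{D}
  \frac{\exp\bigl\{ b^{\top} Z_{(i)}(t_i) \bigr\}}
       {\sum_{j \in R(t_i)} \exp\bigl\{ b^{\top} Z_{j}(t_i) \bigr\}}
```

Each subject in the risk set contributes the covariate value they have at the event time t_i, which is why the data must record when each covariate changes.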
Example: Bone Marrow Transplant
Main covariate of interest is disease type:
– ALL
– low risk AML
– high risk AML
Interest is in determining factors associated with disease-free survival (death or relapse)
BMT Fixed Time Covariates
There are several fixed time covariates we’ve found to be important
– Patient Age
– Donor Age
– FAB identification
– Disease type
– Hospital
BMT Time-Varying Covariates
There are also several time-varying covariates
– Acute graft vs. host disease (AGvHD)
– Chronic graft vs. host disease (CGvHD)
– Platelet recovery (PR)
These all occur after BMT or not at all
They can also vary over the course of the study
R: Time-Varying Covariates
Expand the data to describe all scenarios
Need to consider the possible combinations of events
Example: AGVHD and DFS
– Possible scenarios at any point in time during the study for subject 1
  – No AGVHD: DFS?
  – AGVHD: DFS?
– For all patients with TTAGVHD < DFS, need two rows in the dataset to describe the variation
– For all patients with TTAGVHD > DFS, need only one row in the dataset
Timeline Examples: Observed Event
Subject who develops AGVHD at ta (two rows):
– t0 to ta: no AGVHD until ta, no event
– ta to te: AGVHD, event
Subject who never develops AGVHD (one row):
– t0 to te: no AGVHD, event
Timeline Examples: Censored Event
Subject who develops AGVHD at ta (two rows):
– t0 to ta: no AGVHD, no event
– ta to tc: AGVHD, no event (censored)
Subject who never develops AGVHD (one row):
– t0 to tc: no AGVHD, no event
Time-Varying Covariates
First, look at each time-varying covariate
Which (if any) are associated with DFS, adjusting for diagnosis?
Estimation and inference are the same as with fixed time covariates
Difference
– Data structure
Data Set-up
> data[1:15,c(1,25,4:8)]
ID Disease DFS Death Relapse Either TAGvH AGvH
Expansion
Consider row 1
– Now, two rows
– Row 1: start time = 0, stop time = 67, agvhd = 0, …
– Row 2: start time = 67, stop time = 2081, agvhd = 1, …
Consider row 2
– Still 1 row
– Row 1: start time = 0, stop time = 1602, agvhd = 0, …
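The two cases above can be sketched in base R for a single time-varying covariate. This is a minimal illustration, not the lecture's expansion code: the data frame `wide`, the helper `expand_one`, and the event indicators are hypothetical, and only the times 67, 2081 and 1602 come from the slide.

```r
# Toy wide-format data, one row per subject (event values are hypothetical).
wide <- data.frame(
  ID     = c(1, 2),
  DFS    = c(2081, 1602),  # follow-up time
  Either = c(0, 1),        # 1 = death or relapse (made up for illustration)
  TAGvH  = c(67, 1602),    # time of AGvHD; equals DFS when it never occurred
  AGvH   = c(1, 0)         # 1 = AGvHD observed
)

# Expand one subject into (Tstart, Tstop] rows.
expand_one <- function(r) {
  if (r$AGvH == 1 && r$TAGvH < r$DFS) {
    # AGvHD before end of follow-up: split at the AGvHD time.
    data.frame(ID = r$ID,
               Tstart = c(0, r$TAGvH),
               Tstop  = c(r$TAGvH, r$DFS),
               agvhd  = c(0, 1),
               event  = c(0, r$Either))  # event can only sit on the last row
  } else {
    # No AGvHD during follow-up: a single row.
    data.frame(ID = r$ID, Tstart = 0, Tstop = r$DFS,
               agvhd = 0, event = r$Either)
  }
}

long <- do.call(rbind, lapply(seq_len(nrow(wide)),
                              function(i) expand_one(wide[i, ])))
print(long)
```

Subject 1 contributes two rows and subject 2 one row, so the total number of events is unchanged by the expansion.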
What About Dependence?
You might be asking whether we need to worry about correlated data, since a subject can now contribute several rows.
In this case we do not need to worry about it.
There are two exceptions:
– When subjects have multiple events
– When a subject appears in overlapping intervals
The 2nd case is almost always a data error
A subject can be at risk in multiple strata at the same time
– Corresponds to being simultaneously at risk for two distinct outcomes.
R Expansion
n<-nrow(bmt)
adata<-bmt[, c(1:2,14:23)] #fixed time columns
for (i in 1:n) {
  #times and indicators for AGvHD, CGvHD, PR, and the DFS event
  times1<-c(bmt$TAGvH[i], bmt$TCGvH[i], bmt$TRP[i], bmt$DFS[i])
  events<-c(bmt$AGvH[i], bmt$CGvH[i], bmt$RP[i], bmt$Either[i])
  #keep only times at or before the end of follow-up
  times2<-times1[which(times1<=times1[4])]
  utimes<-sort(unique(times2))
  for (j in 1:length(utimes)) {
    if (length(utimes)==1) {vec<-events}
    if (length(utimes)>1 & j==1) {vec<-c(0,0,0,0)}
    if (j>1 & j<length(utimes)) {
      loc<-which(times1==utimes[j-1])
      vec<-replace(vec, loc, events[loc])
    }
    if (j>1 & j==length(utimes)) {
      loc<-which(times1==utimes[j-1])
      vec<-replace(vec, c(loc,4), events[c(loc,4)])
    }
    if (j==1 & i==1) {bmt.long<-unlist(c(0, utimes[j], adata[i,], vec))}
    if (j==1 & i>1) {bmt.long<-rbind(bmt.long, c(0, utimes[j], adata[i,],vec))}
    if (j>1) {bmt.long<-rbind(bmt.long, c(utimes[j-1], utimes[j], adata[i,],vec))}
  }
} #closes the loop over subjects
bmt.long<-as.data.frame(matrix(as.vector(unlist(bmt.long)), nrow=342, ncol=18, byrow=F))
colnames(bmt.long)<-c("Tstart","Tstop",colnames(adata),"AGvH","CGvH","PR","event")
sum(bmt.long$event)
Expanded Data
> bmt[1:2,]
ID Disease TTD TTR Death Relapse Either TAGvH AGvH TCGvH CGvH TRP RP PtAge ….
> bmt.long[1:8,]
Tstart Tstop ID Disease PtAge AGvH CGvH PR event ….
Alternatively Use: expand.breakpoints
– The previous approach creates a dataset per time-dependent covariate
– expand.breakpoints was created by John Maindonald
– Expands the dataset into rows per person using either the observed number of times or a pre-specified number of times
expand.breakpoints Approach
> bps<-sort(unique(c(bmt$DFS, bmt$TAGvH, bmt$TCGvH, bmt$TRP)))
> bps
[1] … [215]
> bmt.long2<-expand.breakpoints(bmt, index="id", status="Either", tevent="DFS", breakpoints=bps)
> bmt.long2
ID Tstart Tstop Either epoch Disease TTD TTR Death Relapse TAGvH AGvH TCGvH CGvH TRP RP …
Still Not Done
That provides us with separate intervals per patient for all intervals of interest
BUT, it treats AGvHD, CGvHD, and PR as “fixed” time covariates
We need to create time-dependent versions
R
#create time-dependent covariates
> bmt.long2$AGvHt<-ifelse(bmt.long2$TAGvH<=bmt.long2$Tstart & bmt.long2$AGvH==1, 1, 0)
> bmt.long2$CGvHt<-ifelse(bmt.long2$TCGvH<=bmt.long2$Tstart & bmt.long2$CGvH==1, 1, 0)
> bmt.long2$PRt<-ifelse(bmt.long2$TRP<=bmt.long2$Tstart & bmt.long2$RP==1, 1, 0)
#Look again at pts 1 and 2 to see time dependent variables
> bmt.long2$AGvH[which(bmt.long2$ID==1)]
[1] … [175]
> bmt.long2$AGvHt[which(bmt.long2$ID==1)]
[1] … [175]
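To see what the ifelse rule does, here is a small sketch on hypothetical rows for one subject. The data frame `rows` and the breakpoints 67 and 100 are made up for illustration; only the rule itself follows the code above.

```r
# Hypothetical rows for one subject after splitting at breakpoints 67 and 100.
rows <- data.frame(
  ID     = 1,
  Tstart = c(0, 67, 100),
  Tstop  = c(67, 100, 2081),
  TAGvH  = 67,  # time of AGvHD, copied onto every row by the expansion
  AGvH   = 1    # fixed "ever had AGvHD" flag, also copied onto every row
)

# Time-dependent version: 1 only on intervals that start at or after AGvHD onset.
rows$AGvHt <- ifelse(rows$TAGvH <= rows$Tstart & rows$AGvH == 1, 1, 0)
print(rows[, c("Tstart", "Tstop", "AGvH", "AGvHt")])
```

The fixed flag AGvH is 1 on all three rows, while AGvHt is 0 on the first interval and 1 afterwards, which is the distinction the models below rely on.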
Syntax in R
To define the time-to-event variable, there are two options:
– Surv(time, y)
– Surv(start.time, stop.time, y)
For time-varying covariates (or left-truncated data), it is usually simpler to use the latter convention
In most other cases, it is simpler to use the former
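A short sketch of the two conventions using the survival package. The times and statuses are made-up values; the point is only that the three-argument form produces a counting-process (start, stop] object.

```r
library(survival)

# Two-argument form: follow-up time and status (right-censored data).
s1 <- Surv(c(5, 8), c(1, 0))

# Three-argument form: interval start, interval stop, status at the stop time.
s2 <- Surv(c(0, 3), c(3, 8), c(0, 1))

attr(s1, "type")
attr(s2, "type")
print(s2)
```

The second object is what coxph expects when each subject contributes several rows.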
Testing Time-Varying Covariates Controlling for Diagnosis
#Acute graft vs. host disease
rega<-coxph(Surv(Tstart, Tstop, Either)~ AGvHt+factor(Disease), data=bmt.long2)
#Chronic graft vs. host disease
regc<-coxph(Surv(Tstart, Tstop, Either)~ CGvHt+factor(Disease), data=bmt.long2)
#Platelet recovery time
regp<-coxph(Surv(Tstart, Tstop, Either)~ PRt+factor(Disease), data=bmt.long2)
AGvHD
> rega
Call: coxph(formula = Surv(Tstart, Tstop, Either) ~ AGvHt + factor(Disease), data = bmt.long2)
coef exp(coef) se(coef) z p
AGvH
factor(Disease)
factor(Disease)
Likelihood ratio test=14.7 on 3 df, p=
n= 19070, number of events= 83
CGvHD
> regc
Call: coxph(formula = Surv(Tstart, Tstop, Either) ~ CGvHt + factor(Disease), data = bmt.long2)
coef exp(coef) se(coef) z p
CGvHt
factor(Disease)
factor(Disease)
Likelihood ratio test=13.9 on 3 df, p=
n= 19070, number of events= 83
Platelet Recovery
> regp
Call: coxph(formula = Surv(Tstart, Tstop, Either) ~ PRt + factor(Disease), data = bmt.long2)
coef exp(coef) se(coef) z p
PRt
factor(Disease)
factor(Disease)
Likelihood ratio test=22.9 on 3 df, p=4.32e-05
n= 19070, number of events= 83
Interpretation?
– Patients with low risk AML have less risk of an event compared to ALL patients
– Patients with high risk AML have greater risk of an event relative to patients with ALL
– Patients who experience platelet recovery at a given time have less risk of an event relative to those who have not experienced platelet recovery
Back to Our Original Models
Only platelet recovery is significantly associated with disease-free survival
Now investigate a model that adjusts for the previously mentioned fixed time covariates
– Disease type
– FAB
– Donor/patient age and interaction
– Hospital
Models with and without PRt
#Model w/ donor/patient age, intx, FAB, dx, hosp, & PR
> st<-Surv(bmt.long2$Tstart, bmt.long2$Tstop, bmt.long2$Either)
> reg.fixed<-coxph(st~factor(Disease)+FAB+PtAge+DonAge+PtAge*DonAge, data=bmt.long2)
> reg.tv<-coxph(st~factor(Disease)+PRt, data=bmt.long2)
> reg.all<-coxph(st~factor(Disease)+FAB+PtAge+DonAge+PtAge*DonAge+PRt, data=bmt.long2)
> LRT<-2*(reg.all$loglik[2]-reg.tv$loglik[2])
> pchisq(LRT, 4, lower.tail=F)
[1]
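The likelihood ratio test above compares nested models through their fitted log partial likelihoods. Since the bmt data are not reproduced here, the sketch below uses the `lung` data that ships with the survival package as a stand-in; the models `fit0` and `fit1` are illustrative assumptions and have nothing to do with the BMT analysis.

```r
library(survival)

# Nested Cox models on a stand-in dataset (not the BMT data).
fit0 <- coxph(Surv(time, status) ~ age, data = lung)
fit1 <- coxph(Surv(time, status) ~ age + sex, data = lung)

# loglik[1] is the null model, loglik[2] the fitted model.
LRT <- 2 * (fit1$loglik[2] - fit0$loglik[2])

# Degrees of freedom = number of extra coefficients in the larger model.
df <- length(coef(fit1)) - length(coef(fit0))
p  <- pchisq(LRT, df, lower.tail = FALSE)
c(LRT = LRT, df = df, p = p)

# anova() on the two fits reports the same comparison.
print(anova(fit0, fit1))
```

In the lecture's comparison the larger model adds FAB, PtAge, DonAge and their interaction to reg.tv, which is where the 4 degrees of freedom come from.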
Recall Fixed Time Covariate Model
> reg.fixed
Call: coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge, data = bmt.long)
coef exp(coef) se(coef) z p
factor(Disease)
factor(Disease)
FAB
PtAge
DonAge
PtAge:DonAge
Likelihood ratio test=32.8 on 6 df, p=1.14e-05
n= 342, number of events= 83
Time Covariate + Disease Type
> reg.tv
Call: coxph(formula = st ~ factor(Disease) + PR, data = bmt.long)
coef exp(coef) se(coef) z p
factor(Disease)
factor(Disease)
PR
Likelihood ratio test=22.9 on 3 df, p=4.32e-05
n= 342, number of events= 83
Full Model
> reg.all
Call: coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + PR, data = bmt.long)
coef exp(coef) se(coef) z p
factor(Disease)
factor(Disease)
FAB
PtAge
DonAge
PR
PtAge:DonAge
Likelihood ratio test=39.9 on 7 df, p=1.3e-06
n= 342, number of events= 83
Interactions Coding by Hand
#Interaction coding
#Diagnosis 2 (low risk AML)*PRT
#Diagnosis 3 (hi risk AML)*PRT
#FAB*PRT
#PRT*donor age, PRT*patient age, PRT*Donor age*Patient age
bmt.long2$ageint<-(bmt.long2$PtAge-28)*(bmt.long2$DonAge-28)
bmt.long2$dx2.pr<-ifelse(bmt.long2$PRt==1 & bmt.long2$Disease==2, 1, 0)
bmt.long2$dx3.pr<-ifelse(bmt.long2$PRt==1 & bmt.long2$Disease==3, 1, 0)
bmt.long2$fab.pr<-bmt.long2$PRt*bmt.long2$FAB
bmt.long2$dnr.pr<-bmt.long2$PRt*(bmt.long2$DonAge-28)
bmt.long2$pt.pr<-bmt.long2$PRt*(bmt.long2$PtAge-28)
bmt.long2$pt.pr.dnr<-bmt.long2$PRt*(bmt.long2$ageint)
Interactions
1. Diag 2 x PRT
2. Diag 3 x PRT
3. PRT x donor age
4. PRT x patient age
5. PRT x donor age x patient age (confusing)
Interpretations
1. “additional hazard of failure after platelet recovery in those with diagnosis of low risk AML vs. those with ALL”
2. “additional hazard of failure after platelet recovery in those with diagnosis of high risk AML vs. those with ALL”
3. “additional hazard of failure after platelet recovery with an increase in donor age”
4. “additional hazard of failure after platelet recovery with an increase in patient age”
5. “additional hazard of failure after platelet recovery with an increase in the interaction between the patient and donor age”
Series of Models
reg1<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt, data=bmt.long2)
reg2<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr, data=bmt.long2)
reg3<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+fab.pr, data=bmt.long2)
reg4<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dnr.pr+pt.pr+pt.pr.dnr, data=bmt.long2)
reg5<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr+fab.pr, data=bmt.long2)
reg6<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr+dnr.pr+pt.pr+pt.pr.dnr, data=bmt.long2)
reg7<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr+fab.pr+dnr.pr+pt.pr+pt.pr.dnr, data=bmt.long2)
Full Model with Interactions
> reg7
coef exp(coef) se(coef) z p
factor(Disease)
factor(Disease)
FAB
DonAge
PtAge
ageint
PRt
dx2.pr
dx3.pr
fab.pr
dnr.pr
pt.pr
pt.pr.dnr
Likelihood ratio test=63.6 on 13 df, p=1.19e-08
n= 19070, number of events= 83
Fitting Interactions Directly
> reg7b
Call: coxph(formula = st ~ factor(Disease) + FAB + DonAge + PtAge + DonAge * PtAge + PR + PR * factor(Disease) + PR * FAB + PR * DonAge + PR * PtAge + DonAge * PtAge * PR, data = bmt.long)
coef exp(coef) se(coef) z p
factor(Disease)
factor(Disease)
FAB
DonAge
PtAge
PR
DonAge:PtAge
factor(Disease)2:PR
factor(Disease)3:PR
FAB:PR
DonAge:PR
PtAge:PR
DonAge:PtAge:PR
Likelihood ratio test=63.6 on 13 df, p=1.19e-08
n= 342, number of events= 83
Low Risk AML vs. ALL
Interaction between diagnosis and platelet recovery
Low-risk AML vs. ALL, prior to platelet recovery
– b =
– HR (95% CI): 3.76 (0.76, 18.76)
Low-risk AML vs. ALL, after platelet recovery
– b = (-3.06) =
– HR (95% CI): 0.18 (0.08, 0.41)
R Code for the HR and 95% CI
> betahr<-reg7$coef[1]+reg7$coef[8]
> betahr
factor(Disease)
> seintx<-sqrt(reg7$var[1,1]+reg7$var[8,8]+2*reg7$var[1,8])
> seintx
[1]
> exp(betahr - qnorm(0.975)*seintx)
factor(Disease)
> exp(betahr + qnorm(0.975)*seintx)
factor(Disease)
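The calculation above is a special case of a hazard ratio for a linear combination of coefficients. A minimal sketch of a reusable helper follows; `lincom_hr` and the numbers fed to it are hypothetical, chosen only to exercise the formula Var(b1 + b2) = Var(b1) + Var(b2) + 2 Cov(b1, b2).

```r
# Hazard ratio and Wald confidence interval for the combination w'beta.
# beta: coefficient vector, V: its variance-covariance matrix,
# w: contrast weights (1 for each coefficient to be added, 0 otherwise).
lincom_hr <- function(beta, V, w, level = 0.95) {
  est <- sum(w * beta)
  se  <- sqrt(as.numeric(t(w) %*% V %*% w))
  z   <- qnorm(1 - (1 - level) / 2)
  c(HR = exp(est), lower = exp(est - z * se), upper = exp(est + z * se))
}

# Made-up coefficients and covariance matrix, for illustration only.
beta <- c(1.0, -0.5)
V    <- matrix(c(0.04, 0.01,
                 0.01, 0.09), nrow = 2)
res  <- lincom_hr(beta, V, w = c(1, 1))
print(res)
```

With a fitted coxph object, one would pass coef(fit) and vcov(fit) and set w to 1 in the positions of the main effect and the interaction term (positions 1 and 8 in reg7 above).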
Other Interactions?
– High risk AML vs. ALL?
– High risk AML vs. Low Risk AML?
– Age?
– …
What About Continuous Covariates?
Continuous variables can change over time as well
Given the times at which measurements are taken, we can expand the data in the same way.
We are assuming the value is unchanged over the interval between one measurement and the next
– A little unrealistic BUT…
– This is no different from treating a single measure (e.g. blood pressure) as a fixed time covariate
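A small sketch of this expansion for one subject, assuming each measurement is carried forward until the next one. The data frame `meas`, the follow-up time `fu`, and the blood pressure values are all hypothetical.

```r
# Hypothetical repeated blood pressure measurements for one subject.
meas <- data.frame(ID = 1, time = c(0, 30, 90), sbp = c(140, 132, 128))
fu     <- 120  # end of follow-up (made up)
status <- 1    # 1 = event at end of follow-up (made up)

# One row per measurement interval; the value is held constant until the
# next measurement, and the event can only occur on the last interval.
long <- data.frame(
  ID     = meas$ID,
  Tstart = meas$time,
  Tstop  = c(meas$time[-1], fu),
  sbp    = meas$sbp,
  event  = c(rep(0, nrow(meas) - 1), status)
)
print(long)
```

The resulting rows can be passed to coxph with Surv(Tstart, Tstop, event) exactly as for the binary time-varying covariates.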
Next Time Regression Diagnostics… checking the proportional hazards assumption.