Published by Charlene Stevenson. Modified over 9 years ago.
1
Lecture 15: Time-Varying Covariates
2
Time-Dependent Covariates
Thus far we’ve only considered “fixed” time covariates
Examples of time-varying covariates
– Cumulative exposure
– Smoking status
– Blood pressure
Now, data structure is
– [T, d, Z(t); 0 < t < T]
3
Cox Proportional Hazards Model (CPHM) with Time-Varying Covariates
The model looks like what we’ve been working with.
Now, however, Z is a function of t:
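The equation on this slide did not survive extraction. A sketch of the standard form, assuming the usual Cox model notation (h_0 for the baseline hazard, β for the coefficient vector that these notes write as "b", and p covariates), would be:

```latex
h\bigl(t \mid Z(t)\bigr) = h_0(t)\,\exp\bigl\{\beta^{\top} Z(t)\bigr\}
                         = h_0(t)\,\exp\Bigl\{\sum_{k=1}^{p} \beta_k Z_k(t)\Bigr\}
```

When Z(t) = Z for all t, this reduces to the fixed-covariate model used so far.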
4
Likelihood: Time-Varying Covariates
Again, we can use the partial likelihood approach for estimating β
But Z is now a function of t (as in the model statement):
Otherwise, testing and estimation are the same as for fixed covariates
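The partial likelihood itself is missing from the extracted slide. A sketch of its usual form, assuming no tied event times, D distinct ordered event times t_1 < ... < t_D, Z_{(i)} for the covariates of the subject who fails at t_i, and R(t_i) for the set of subjects still at risk just before t_i (all notation assumed, not taken from the slide):

```latex
L(\beta) = \prod_{i=1}^{D}
  \frac{\exp\bigl\{\beta^{\top} Z_{(i)}(t_i)\bigr\}}
       {\sum_{j \in R(t_i)} \exp\bigl\{\beta^{\top} Z_j(t_i)\bigr\}}
```

The only change from the fixed-covariate case is that each subject's covariates are evaluated at the event time t_i, which is why the data must record the covariate value in force at every event time.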
5
Example: Bone Marrow Transplant
Main covariate of interest is disease type:
– ALL
– low risk AML
– high risk AML
Interest is in determining factors associated with disease-free survival (death or relapse)
6
BMT Fixed Time Covariates
There are several fixed time covariates we’ve found to be important
– Patient Age
– Donor Age
– FAB identification
– Disease type
– Hospital
7
BMT Time-Varying Covariates
There are also several time-varying covariates
– Acute graft vs. host disease (AGvHD)
– Chronic graft vs. host disease (CGvHD)
– Platelet recovery (PR)
These all occur after BMT or not at all
They can also vary over the course of the study
8
R: Time-Varying Covariates
Expand data to describe all scenarios
Need to consider the possible combinations of events
Example: AGVHD and DFS
– Possible scenarios at any point in time during the study for subject 1
  No AGVHD: DFS?
  AGVHD: DFS?
– For all patients with TTAGVHD < DFS, need two rows in the dataset to describe the variation
– For all patients with TTAGVHD ≥ DFS (no AGVHD before the end of follow-up), need only one row in the dataset
9
Timeline Examples: Observed Event
Patient with AGVHD:
– t0 to ta: no AGVHD until ta, no event
– ta to te: AGVHD, event
Patient without AGVHD:
– t0 to te: no AGVHD, event
[Figure: timelines marking t0, ta, and te]
10
Timeline Examples: Censored Event
Patient with AGVHD:
– t0 to ta: no AGVHD, no event
– ta to tc: AGVHD, no event (censored)
Patient without AGVHD:
– t0 to tc: no AGVHD, no event
[Figure: timelines marking t0, ta, te, and tc]
11
Time-Varying Covariates
First, look at each time-varying covariate
Which (if any) are associated with DFS, adjusting for diagnosis?
Estimation and inference are the same as with fixed time covariates
Difference
– Data structure
12
Data Set-up
> data[1:15, c(1,25,4:8)]
 ID Disease  DFS Death Relapse Either TAGvH AGvH
  1       1 2081     0       0      0    67    1
  2       1 1602     0       0      0  1602    0
  3       1 1496     0       0      0  1496    0
  4       1 1462     0       0      0    70    1
  5       1 1433     0       0      0  1433    0
  6       1 1377     0       0      0  1377    0
  7       1 1330     0       0      0  1330    0
  8       1  996     0       0      0    72    1
  9       1  226     0       0      0   226    0
 10       1 1199     0       0      0  1199    0
 11       1 1111     0       0      0  1111    0
 12       1  530     0       0      0    38    1
 13       1 1182     0       0      0  1182    0
 14       1 1167     0       0      0    39    1
 15       1  418     1       0      1   418    0
13
Expansion
Consider row 1
– Now, two rows
– Row 1: start time = 0, stop time = 67, agvhd = 0, …
– Row 2: start time = 67, stop time = 2081, agvhd = 1, …
Consider row 2
– Still 1 row
– Row 1: start time = 0, stop time = 1602, agvhd = 0, …
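The two cases above can be sketched in base R for a single time-varying covariate. This is a minimal illustration, not the lecture's own code: `split_subject` is a hypothetical helper, and the toy data copy only the first two rows of the Data Set-up slide.

```r
# Toy wide data modelled on rows 1 and 2 of the Data Set-up slide.
wide <- data.frame(
  ID     = c(1, 2),
  DFS    = c(2081, 1602),  # follow-up time
  Either = c(0, 0),        # event indicator at DFS
  TAGvH  = c(67, 1602),    # time of AGvHD (equals DFS when it never occurred)
  AGvH   = c(1, 0)         # 1 if AGvHD occurred
)

# Hypothetical helper: returns one or two (start, stop] rows for one subject.
split_subject <- function(row) {
  switched <- row$AGvH == 1 && row$TAGvH < row$DFS
  if (!switched) {
    # Covariate never changes during follow-up: a single interval.
    return(data.frame(ID = row$ID, Tstart = 0, Tstop = row$DFS,
                      agvhd = 0, event = row$Either))
  }
  # Covariate switches at TAGvH: split follow-up at that time.
  data.frame(
    ID     = row$ID,
    Tstart = c(0, row$TAGvH),
    Tstop  = c(row$TAGvH, row$DFS),
    agvhd  = c(0, 1),
    event  = c(0, row$Either)  # an event can only close the last interval
  )
}

long <- do.call(rbind, lapply(seq_len(nrow(wide)),
                              function(i) split_subject(wide[i, ])))
rownames(long) <- NULL
print(long)
```

Subject 1 contributes two rows (0 to 67 and 67 to 2081) and subject 2 contributes one, matching the slide. The event indicator is attached only to the last interval because earlier intervals end with a covariate change, not an event.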
14
What About Dependence?
You might be asking whether we need to worry about correlated data, now that a subject contributes several rows
In this case we do not need to worry about it.
There are two exceptions:
– When subjects have multiple events
– When a subject appears in overlapping intervals
The 2nd case is almost always a data error
Exception: a subject can be at risk in multiple strata at the same time
– Corresponds to being simultaneously at risk for two distinct outcomes.
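Since overlapping intervals usually signal a data error, it can help to check the expanded data before fitting. A minimal base R sketch, assuming the long data has an ID column and start/stop columns; `overlapping_ids` is a hypothetical helper name, and the toy rows are illustrative only:

```r
# Return the IDs of subjects whose (start, stop] intervals overlap.
overlapping_ids <- function(id, tstart, tstop) {
  ord <- order(id, tstart)
  id <- id[ord]; tstart <- tstart[ord]; tstop <- tstop[ord]
  n <- length(id)
  if (n < 2) return(id[0])
  # After sorting by start time within subject, any overlap shows up
  # between two consecutive rows of the same subject.
  clash <- id[-1] == id[-n] & tstart[-1] < tstop[-n]
  unique(id[-1][clash])
}

# Clean toy data: intervals for each subject abut but do not overlap.
long <- data.frame(ID     = c(1, 1, 1, 2, 2),
                   Tstart = c(0, 13, 67, 0, 18),
                   Tstop  = c(13, 67, 121, 18, 139))
res_ok <- overlapping_ids(long$ID, long$Tstart, long$Tstop)

# Introduce an error: subject 1's third interval now starts at 50 < 67.
bad <- long
bad$Tstart[3] <- 50
res_bad <- overlapping_ids(bad$ID, bad$Tstart, bad$Tstop)
```

Here `res_ok` is empty and `res_bad` flags subject 1. The check treats intervals as open on the left, so a row that starts exactly where the previous one stops is not counted as an overlap.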
15
R Expansion
n <- nrow(bmt)
adata <- bmt[, c(1:2, 14:23)]  # fixed time columns
for (i in 1:n) {
  # times and indicators for AGvHD, CGvHD, platelet recovery, and DFS
  times1 <- c(bmt$TAGvH[i], bmt$TCGvH[i], bmt$TRP[i], bmt$DFS[i])
  events <- c(bmt$AGvH[i], bmt$CGvH[i], bmt$RP[i], bmt$Either[i])
  times2 <- times1[which(times1 <= times1[4])]
  utimes <- sort(unique(times2))  # interval end points for subject i
  for (j in 1:length(utimes)) {
    if (length(utimes) == 1) {vec <- events}
    if (length(utimes) > 1 & j == 1) {vec <- c(0, 0, 0, 0)}
    if (j > 1 & j < length(utimes)) {
      loc <- which(times1 == utimes[j-1])
      vec <- replace(vec, loc, events[loc])
    }
    if (j > 1 & j == length(utimes)) {
      loc <- which(times1 == utimes[j-1])
      vec <- replace(vec, c(loc, 4), events[c(loc, 4)])
    }
    if (j == 1 & i == 1) {bmt.long <- unlist(c(0, utimes[j], adata[i,], vec))}
    if (j == 1 & i > 1) {bmt.long <- rbind(bmt.long, c(0, utimes[j], adata[i,], vec))}
    if (j > 1) {bmt.long <- rbind(bmt.long, c(utimes[j-1], utimes[j], adata[i,], vec))}
  }
}  # end loop over subjects
bmt.long <- as.data.frame(matrix(as.vector(unlist(bmt.long)), nrow=342, ncol=18, byrow=F))
colnames(bmt.long) <- c("Tstart", "Tstop", colnames(adata), "AGvH", "CGvH", "PR", "event")
sum(bmt.long$event)
16
Expanded Data
> bmt[1:2,]
 ID Disease  TTD  TTR Death Relapse Either TAGvH AGvH TCGvH CGvH TRP RP PtAge
  1       1 2081 2081     0       0      0    67    1   121    1  13  1    26
  2       1 1602 1602     0       0      0  1602    0   139    1  18  1    21
….
> bmt.long[1:8,]
 Tstart Tstop ID Disease PtAge AGvH CGvH PR event
      0    13  1       1    26    0    0  0     0
     13    67  1       1    26    0    0  1     0
     67   121  1       1    26    1    0  1     0
    121  2081  1       1    26    1    0  1     0
      0    18  2       1    21    0    0  0     0
     18   139  2       1    21    0    0  1     0
    139  1602  2       1    21    0    1  1     0
      0    12  3       1    26    0    0  0     0
….
17
Alternatively Use: expand.breakpoints
The previous approach creates the dataset per time-dependent covariate
expand.breakpoints was created by John Maindonald
It expands the dataset into rows per person using either the observed times or a pre-specified set of times
18
expand.breakpoints Approach
> bps <- sort(unique(c(bmt$DFS, bmt$TAGvH, bmt$TCGvH, bmt$TRP)))
> bps
[1] 1 2 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 …
[215] 1850 1857 1870 2024 2081 2133 2140 2204 2218 2246 2252 2409 2430 2506 2569 2640
> bmt.long2 <- expand.breakpoints(bmt, index="id", status="Either", tevent="DFS", breakpoints=bps)
> bmt.long2
 ID Tstart Tstop Either epoch Disease  TTD  TTR Death Relapse TAGvH AGvH TCGvH CGvH TRP RP
  1      0     1      0     1       1 2081 2081     0       0    67    1   121    1  13  1
  1      1     2      0     2       1 2081 2081     0       0    67    1   121    1  13  1
  1      2     7      0     3       1 2081 2081     0       0    67    1   121    1  13  1
  1      7     8      0     4       1 2081 2081     0       0    67    1   121    1  13  1
…
  1   1870  2024      0   218       1 2081 2081     0       0    67    1   121    1  13  1
  1   2024  2081      0   219       1 2081 2081     0       0    67    1   121    1  13  1
  2      0     1      0     1       1 1602 1602     0       0  1602    0   139    1  18  1
19
Still Not Done
That provides us with separate intervals per patient for all intervals of interest
BUT, it treats AGvHD, CGvHD, and PR as “fixed” time covariates
We need to create time-dependent versions
20
R
# create time-dependent covariates on the expand.breakpoints data (bmt.long2)
> bmt.long2$AGvHt <- ifelse(bmt.long2$TAGvH <= bmt.long2$Tstart & bmt.long2$AGvH == 1, 1, 0)
> bmt.long2$CGvHt <- ifelse(bmt.long2$TCGvH <= bmt.long2$Tstart & bmt.long2$CGvH == 1, 1, 0)
> bmt.long2$PRt <- ifelse(bmt.long2$TRP <= bmt.long2$Tstart & bmt.long2$RP == 1, 1, 0)
# Look again at pts 1 and 2 to see time-dependent variables
> bmt.long2$AGvH[which(bmt.long2$ID == 1)]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 …
[175] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> bmt.long2$AGvHt[which(bmt.long2$ID == 1)]
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …
[175] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
21
Syntax in R
To define the time-to-event variable, there are two options:
– Surv(time, y)
– Surv(start.time, stop.time, y)
For time-varying covariates (or left-truncated data), it is usually simpler to use the latter convention
In most other cases, it is simpler to use the former
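A small sketch of the two conventions, assuming the `survival` package is installed. The data values are hypothetical; only `Surv()` itself comes from the slide.

```r
library(survival)

# Right-censored form: one row per subject (hypothetical values).
time   <- c(5, 8, 12)
status <- c(1, 0, 1)
right  <- Surv(time, status)

# Counting-process form: the same follow-up, with subject 1 split at t = 2
# (for example because a time-varying covariate changed there).
tstart   <- c(0, 2, 0, 0)
tstop    <- c(2, 5, 8, 12)
event    <- c(0, 1, 0, 1)
counting <- Surv(tstart, tstop, event)

print(right)
print(counting)
```

The first object stores (time, status) pairs; the second stores (start, stop, status) triples, so a subject can contribute several rows while still having at most one event.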
22
Testing Time-Varying Covariates Controlling for Diagnosis
# Acute graft vs. host disease
rega <- coxph(Surv(Tstart, Tstop, Either) ~ AGvHt + factor(Disease), data=bmt.long2)
# Chronic graft vs. host disease
regc <- coxph(Surv(Tstart, Tstop, Either) ~ CGvHt + factor(Disease), data=bmt.long2)
# Platelet recovery time
regp <- coxph(Surv(Tstart, Tstop, Either) ~ PRt + factor(Disease), data=bmt.long2)
23
AGvHD
> rega
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ AGvHt + factor(Disease), data = bmt.long2)
                   coef exp(coef) se(coef)     z     p
AGvHt             0.323     1.381    0.285  1.31 0.264
factor(Disease)2 -0.551     0.576    0.288 -1.91 0.055
factor(Disease)3  0.435     1.546    0.272  1.60 0.110
Likelihood ratio test=14.7 on 3 df, p=0.00214
n= 19070, number of events= 83
24
CGvHD
> regc
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ CGvHt + factor(Disease), data = bmt.long2)
                   coef exp(coef) se(coef)      z     p
CGvHt            -0.186     0.830    0.288 -0.646 0.520
factor(Disease)2 -0.620     0.538    0.296 -2.094 0.036
factor(Disease)3  0.367     1.444    0.268  1.368 0.170
Likelihood ratio test=13.9 on 3 df, p=0.00309
n= 19070, number of events= 83
25
Platelet Recovery
> regp
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ PRt + factor(Disease), data = bmt.long2)
                   coef exp(coef) se(coef)     z       p
PRt              -1.120     0.326    0.329 -3.40 0.00067
factor(Disease)2 -0.497     0.608    0.289 -1.72 0.08600
factor(Disease)3  0.382     1.465    0.268  1.43 0.15000
Likelihood ratio test=22.9 on 3 df, p=4.32e-05
n= 19070, number of events= 83
26
Interpretation?
Patients with low risk AML have less risk of an event compared to ALL patients
Patients with high risk AML have greater risk of an event relative to patients with ALL
Patients who have experienced platelet recovery by a given time have less risk of an event relative to those who have not yet experienced platelet recovery
27
Back to Our Original Models
Only platelet recovery is significantly associated with disease-free survival
Now investigate a model that adjusts for the previously mentioned fixed time covariates
– Disease type
– FAB
– Donor/patient age and interaction
– Hospital
28
Models with and without PRt
# Model w/ donor/patient age, intx, FAB, dx, hosp, & PR
> st <- Surv(bmt.long2$Tstart, bmt.long2$Tstop, bmt.long2$Either)
> reg.fixed <- coxph(st ~ factor(Disease)+FAB+PtAge+DonAge+PtAge*DonAge, data=bmt.long2)
> reg.tv <- coxph(st ~ factor(Disease)+PRt, data=bmt.long2)
> reg.all <- coxph(st ~ factor(Disease)+FAB+PtAge+DonAge+PtAge*DonAge+PRt, data=bmt.long2)
> LRT <- 2*(reg.all$loglik[2] - reg.tv$loglik[2])
> pchisq(LRT, 4, lower.tail=F)
[1] 0.001878685
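The likelihood ratio test above can be wrapped in a small helper. This is a sketch: `lrt_pvalue` is a hypothetical name, and the inputs below are not log-likelihoods from the lecture. They are the reported LR statistics against the shared null model (39.9 for the full model, 22.9 for the platelet recovery model) halved, which gives each model's log-likelihood relative to that null.

```r
# Likelihood ratio test for two nested models.
# loglik_full, loglik_reduced: maximized (partial) log-likelihoods
# df: number of extra parameters in the full model
lrt_pvalue <- function(loglik_full, loglik_reduced, df) {
  stat <- 2 * (loglik_full - loglik_reduced)
  c(statistic = stat, p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Log-likelihoods measured relative to the common null model:
# 2 * (l_model - l_null) is the reported LR statistic, so l_model - l_null
# is half of it. The full model adds 4 terms (FAB, PtAge, DonAge,
# PtAge:DonAge), hence df = 4.
res <- lrt_pvalue(loglik_full = 39.9 / 2, loglik_reduced = 22.9 / 2, df = 4)
print(res)
```

Because the inputs are rounded, the p-value will only approximately match the 0.001878685 printed on the slide. With fitted models, the same helper could be called as `lrt_pvalue(reg.all$loglik[2], reg.tv$loglik[2], 4)`.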
29
Recall Fixed Time Covariate Model
> reg.fixed
Call:
coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge, data = bmt.long)
                     coef exp(coef) se(coef)     z       p
factor(Disease)2 -1.09065     0.336 0.354279 -3.08 0.00210
factor(Disease)3 -0.40391     0.668 0.362777 -1.11 0.27000
FAB               0.83742     2.310 0.278464  3.01 0.00260
PtAge            -0.08164     0.922 0.036107 -2.26 0.02400
DonAge           -0.08459     0.919 0.030097 -2.81 0.00490
PtAge:DonAge      0.00316     1.003 0.000951  3.32 0.00089
Likelihood ratio test=32.8 on 6 df, p=1.14e-05
n= 342, number of events= 83
30
Time Covariate + Disease Type
> reg.tv
Call:
coxph(formula = st ~ factor(Disease) + PR, data = bmt.long)
                   coef exp(coef) se(coef)     z       p
factor(Disease)2 -0.497     0.608    0.289 -1.72 0.08600
factor(Disease)3  0.382     1.465    0.268  1.43 0.15000
PR               -1.120     0.326    0.329 -3.40 0.00067
Likelihood ratio test=22.9 on 3 df, p=4.32e-05
n= 342, number of events= 83
31
Full Model
> reg.all
Call:
coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + PR, data = bmt.long)
                     coef exp(coef) se(coef)     z      p
factor(Disease)2 -1.03245     0.356 0.353200 -2.92 0.0035
factor(Disease)3 -0.41398     0.661 0.365222 -1.13 0.2600
FAB               0.81180     2.252 0.283236  2.87 0.0042
PtAge            -0.07102     0.931 0.035449 -2.00 0.0450
DonAge           -0.07607     0.927 0.030007 -2.54 0.0110
PR               -0.98307     0.374 0.338109 -2.91 0.0036
PtAge:DonAge      0.00287     1.003 0.000935  3.07 0.0021
Likelihood ratio test=39.9 on 7 df, p=1.3e-06
n= 342, number of events= 83
32
Interactions Coding by Hand
# Interaction coding
# Diagnosis 2 (low risk AML)*PRT
# Diagnosis 3 (hi risk AML)*PRT
# FAB*PRT
# PRT*donor age, PRT*patient age, PRT*Donor age*Patient age
bmt.long2$ageint <- (bmt.long2$PtAge-28) * (bmt.long2$DonAge-28)
bmt.long2$dx2.pr <- ifelse(bmt.long2$PRt==1 & bmt.long2$Disease==2, 1, 0)
bmt.long2$dx3.pr <- ifelse(bmt.long2$PRt==1 & bmt.long2$Disease==3, 1, 0)
bmt.long2$fab.pr <- bmt.long2$PRt * bmt.long2$FAB
bmt.long2$dnr.pr <- bmt.long2$PRt * (bmt.long2$DonAge-28)
bmt.long2$pt.pr <- bmt.long2$PRt * (bmt.long2$PtAge-28)
bmt.long2$pt.pr.dnr <- bmt.long2$PRt * (bmt.long2$ageint)
33
Interactions
1. Diag 2 x PRT
2. Diag 3 x PRT
3. PRT x donor age
4. PRT x patient age
5. PRT x donor age x patient age (confusing)
Interpretations:
1. “additional hazard of failure after platelet recovery in those with diagnosis of low risk AML vs. those with ALL”
2. “additional hazard of failure after platelet recovery in those with diagnosis of high risk AML vs. those with ALL”
3. “additional hazard of failure after platelet recovery with an increase in donor age”
4. “additional hazard of failure after platelet recovery with an increase in patient age”
5. “additional hazard of failure after platelet recovery with an increase in the interaction between the patient and donor age”
34
Series of Models
reg1 <- coxph(st ~ factor(Disease)+FAB+DonAge+PtAge+ageint+PRt, data=bmt.long2)
reg2 <- coxph(st ~ factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr, data=bmt.long2)
reg3 <- coxph(st ~ factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+fab.pr, data=bmt.long2)
reg4 <- coxph(st ~ factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dnr.pr+pt.pr+pt.pr.dnr,
              data=bmt.long2)
reg5 <- coxph(st ~ factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr+fab.pr,
              data=bmt.long2)
reg6 <- coxph(st ~ factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr+
              dnr.pr+pt.pr+pt.pr.dnr, data=bmt.long2)
reg7 <- coxph(st ~ factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr+fab.pr+
              dnr.pr+pt.pr+pt.pr.dnr, data=bmt.long2)
35
Full Model with Interactions
> reg7
                   coef exp(coef) se(coef)      z      p
factor(Disease)2  1.325     3.765    0.819  1.618 0.1100
factor(Disease)3  1.134     3.108    1.225  0.926 0.3500
FAB              -1.250     0.286    1.112 -1.124 0.2600
DonAge            0.116     1.123    0.043  2.679 0.0074
PtAge            -0.154     0.857    0.054 -2.820 0.0048
ageint            0.0026    1.003    0.001  1.337 0.1800
PRt              -0.286     0.751    0.695 -0.412 0.6800
dx2.pr           -3.057     0.047    0.926 -3.299 0.0010
dx3.pr           -1.894     0.150    1.291 -1.467 0.1400
fab.pr            2.471    11.831    1.159  2.131 0.0330
dnr.pr           -0.147     0.863    0.048 -3.054 0.0023
pt.pr             0.193     1.213    0.058  3.289 0.0010
pt.pr.dnr         0.000     1.000    0.002  0.060 0.9500
Likelihood ratio test=63.6 on 13 df, p=1.19e-08
n= 19070, number of events= 83
36
Fitting Interactions Directly
> reg7b
Call:
coxph(formula = st ~ factor(Disease) + FAB + DonAge + PtAge + DonAge * PtAge + PR +
    PR * factor(Disease) + PR * FAB + PR * DonAge + PR * PtAge + DonAge * PtAge * PR,
    data = bmt.long)
                         coef exp(coef) se(coef)      z       p
factor(Disease)2       1.3257     3.765  0.81952  1.618 0.11000
factor(Disease)3       1.1341     3.108  1.22487  0.926 0.35000
FAB                   -1.2503     0.286  1.11245 -1.124 0.26000
DonAge                 0.0436     1.045  0.05866  0.744 0.46000
PtAge                 -0.2264     0.797  0.09118 -2.484 0.01300
PR                    -1.4817     0.227  2.11360 -0.701 0.48000
DonAge:PtAge           0.0026     1.003  0.00194  1.337 0.18000
factor(Disease)2:PR   -3.0568     0.047  0.92646 -3.299 0.00097
factor(Disease)3:PR   -1.8941     0.150  1.29132 -1.467 0.14000
FAB:PR                 2.4707    11.831  1.15926  2.131 0.03300
DonAge:PR             -0.1506     0.860  0.06967 -2.162 0.03100
PtAge:PR               0.1894     1.209  0.10127  1.871 0.06100
DonAge:PtAge:PR      0.000138     1.000  0.00230  0.060 0.95000
Likelihood ratio test=63.6 on 13 df, p=1.19e-08
n= 342, number of events= 83
37
Low Risk AML vs. ALL
Interaction between diagnosis and platelet recovery
Low-risk AML vs. ALL, prior to platelet recovery
– β = 1.326
– HR (95% CI): 3.76 (0.76, 18.76)
Low-risk AML vs. ALL, after platelet recovery
– β = 1.326 + (-3.06) = -1.73
– HR (95% CI): 0.18 (0.08, 0.41)
38
R Code for the HR and 95% CI
> betahr <- reg7$coef[1] + reg7$coef[8]
> betahr
factor(Disease)2
       -1.731125
> seintx <- sqrt(reg7$var[1,1] + reg7$var[8,8] + 2*reg7$var[1,8])
> seintx
[1] 0.4263292
> exp(betahr - qnorm(0.975)*seintx)
factor(Disease)2
      0.07678741
> exp(betahr + qnorm(0.975)*seintx)
factor(Disease)2
        0.408389
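The same calculation generalizes to any sum of coefficients, since Var(b1 + b2) = Var(b1) + Var(b2) + 2 Cov(b1, b2). A base R sketch follows; `hr_lincomb` is a hypothetical helper. The coefficients and standard errors come from the reg7b output, but the covariance (-0.674092) is an assumption: it was back-calculated so that the combined standard error matches the 0.4263292 shown above, and it is not output from the lecture.

```r
# Hazard ratio and Wald CI for the sum of selected coefficients.
# beta: coefficient vector; V: its covariance matrix; idx: positions to sum
hr_lincomb <- function(beta, V, idx, level = 0.95) {
  w <- numeric(length(beta))
  w[idx] <- 1
  est <- sum(w * beta)
  se  <- sqrt(drop(t(w) %*% V %*% w))  # sqrt of w' V w
  z   <- qnorm(1 - (1 - level) / 2)
  c(hr = exp(est), lower = exp(est - z * se), upper = exp(est + z * se))
}

# Low-risk AML main effect and its interaction with platelet recovery.
beta <- c(dx2 = 1.3257, dx2.pr = -3.0568)
V <- matrix(c(0.81952^2, -0.674092,   # off-diagonal: assumed covariance
              -0.674092, 0.92646^2),
            nrow = 2, byrow = TRUE)

res <- hr_lincomb(beta, V, idx = 1:2)
print(round(res, 3))
```

With a fitted model, the helper could be called as `hr_lincomb(coef(reg7), vcov(reg7), idx = c(1, 8))`. The strongly negative covariance is what makes the combined standard error smaller than either individual one.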
39
Other Interactions?
High risk AML vs. ALL?
High risk AML vs. Low Risk AML?
Age?
…
40
What About Continuous Covariates?
Continuous variables can change over time as well
Given the times at which measurements are taken, we can expand the data in the same way.
We are assuming the value is unchanged during the interval between one measurement and the next
– A little unrealistic BUT…
– This is no different from treating a single measure (e.g. blood pressure) as a fixed time covariate
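A base R sketch of this expansion for one subject, carrying each measurement forward until the next one. All values and names here (`visits`, `sbp`, the follow-up time) are hypothetical and chosen only for illustration.

```r
# Hypothetical blood pressure measurements for one subject, with follow-up
# ending in an event at day 200.
visits <- data.frame(day = c(0, 30, 90), sbp = c(142, 135, 128))
end_of_followup <- 200
event_at_end    <- 1

# One (start, stop] row per measurement: the value is held constant until
# the next measurement, and the last interval runs to the end of follow-up.
long <- data.frame(
  Tstart = visits$day,
  Tstop  = c(visits$day[-1], end_of_followup),
  sbp    = visits$sbp,
  event  = c(rep(0, nrow(visits) - 1), event_at_end)
)
print(long)
```

The resulting rows can be passed to a model using the start/stop form of `Surv()`, exactly as with the binary time-varying covariates earlier.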
41
Next Time Regression Diagnostics… checking the proportional hazards assumption.