Published by Gary Garrison. Modified over 9 years ago
1
Lecture 16: Regression Diagnostics I
Proportional Hazards Assumption
– Graphical methods
– Regression methods
2
Regression Diagnostics
Mostly interested in testing the proportional hazards assumption
Also looking for the functional form of covariates
Two types of methods
– Graphical approaches
– Regression approaches
3
Graphical Approaches
Recall our graphical checks
– Kernel smoothing
– Smoothing splines
Both show whether or not the estimated hazards cross (crossing hazards rule out proportional hazards)
Both can be implemented in R
– "muhaz" package: kernel smoothing for survival
– "gss" package: smoothing splines for survival
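The kernel-smoothing idea can be sketched in base R (a minimal sketch, not the muhaz implementation): smooth the Nelson–Aalen increments with an Epanechnikov kernel of fixed bandwidth `b`. The helper names `na_increments()` and `kernel_hazard()` and the simulated two-group data are hypothetical. Unlike muhaz, this sketch has no bandwidth selection and no boundary correction, so estimates near time 0 and near the last event are biased downward.

```r
# Hypothetical helpers: kernel-smoothed hazard built from Nelson-Aalen increments
na_increments <- function(time, status) {
  ev <- sort(unique(time[status == 1]))                      # distinct event times
  d  <- sapply(ev, function(s) sum(time == s & status == 1)) # events at each time
  n  <- sapply(ev, function(s) sum(time >= s))               # number at risk
  data.frame(time = ev, dH = d / n)                          # Nelson-Aalen jumps
}

kernel_hazard <- function(time, status, grid, b) {
  inc  <- na_increments(time, status)
  epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)  # Epanechnikov kernel
  # h(t) = (1/b) * sum_i K((t - t_i) / b) * dH(t_i), fixed bandwidth, no boundary correction
  sapply(grid, function(t) sum(epan((t - inc$time) / b) * inc$dH) / b)
}

# Simulated data (hypothetical): constant hazards 0.1 (group 0) and 0.3 (group 1)
set.seed(1)
ev_time <- c(rexp(200, rate = 0.1), rexp(200, rate = 0.3))
cens    <- rexp(400, rate = 0.05)
time    <- pmin(ev_time, cens)
status  <- as.integer(ev_time <= cens)
grp     <- rep(0:1, each = 200)
grid    <- seq(1, 8, by = 0.5)
h0 <- kernel_hazard(time[grp == 0], status[grp == 0], grid, b = 2)
h1 <- kernel_hazard(time[grp == 1], status[grp == 1], grid, b = 2)
```

Plotting `h0` and `h1` against `grid` gives the kind of picture used to judge whether the hazards cross.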
4
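Graphical Approaches
Consider a CPHM with a single binary covariate $Z$: $h(t \mid Z) = h_0(t)\exp(\beta Z)$, so $H(t \mid Z=1) = H_0(t)e^{\beta}$ and $H(t \mid Z=0) = H_0(t)$
This means we can also consider plotting $\log \hat H(t \mid Z=1) - \log \hat H(t \mid Z=0)$ against $t$
If hazards are proportional, this should be approximately equal to $\beta$ (a horizontal line)
No package in R
– Calculate Nelson–Aalen (NA) cumulative hazard estimates for each condition at each unique time point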
5
Examples
Let's explore the graphical and regression checks for proportional hazards
– Kidney infection data: surgical vs. percutaneous catheter placement
– BMT data: FAB classification, methotrexate use
6
Survival for Kidney
7
Graphical Checks
### KIDNEY INFECTION EXAMPLE ###
# log cumulative hazard plots
library(KMsurv); library(survival)
data(kidney)   # time, delta, type (1 = surgical, 2 = percutaneous)
coxph(Surv(time, delta)~factor(type), data=kidney)   # estimated coefficient (0.613) is the reference line below
dat1<-kidney[kidney$type==1, ]
dat2<-kidney[kidney$type==2, ]
# Nelson-Aalen type estimates from a null Cox model in each group
fit1<-survfit(coxph(Surv(time, delta)~1, data=dat1), type="aalen")
fit2<-survfit(coxph(Surv(time, delta)~1, data=dat2), type="aalen")
times<-sort(unique(kidney$time))
ch1<--log(fit1$surv)   # cumulative hazard = -log(S)
ch2<--log(fit2$surv)
# Hard-coded for this dataset: repeat values so each group's estimates line up with the pooled time grid
ch1<-c(0, ch1[1:17],ch1[17],ch1[17],ch1[18:20],ch1[20],ch1[21:23],ch1[23])
ch2<-c(ch2[1:13],ch2[13],ch2[14:19],ch2[19],ch2[20],ch2[20],ch2[21:23],ch2[23],ch2[24])
plot(times, log(ch2)-log(ch1), type="s", xlab="time", ylab="log(H[t|Z=perc])-log(H[t|Z=surg])", lwd=2)
abline(h=0.613, lwd=2, col=2)   # horizontal line at the Cox coefficient
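The hand-padding of `ch1` and `ch2` above only works for this particular dataset. A more general sketch evaluates each group's Nelson–Aalen cumulative hazard as a step function on a common grid of event times, so no manual alignment is needed. The helper names `na_cumhaz()` and `log_ch_diff()` are hypothetical, and the simulated data (true HR = 2) stand in for `kidney`.

```r
# Hypothetical helper: Nelson-Aalen cumulative hazard as a right-continuous step function
na_cumhaz <- function(time, status) {
  ev <- sort(unique(time[status == 1]))
  d  <- sapply(ev, function(s) sum(time == s & status == 1))  # events at s
  n  <- sapply(ev, function(s) sum(time >= s))                # at risk at s
  stepfun(ev, c(0, cumsum(d / n)))                            # H(t) = 0 before first event
}

# log H(t | group 1) - log H(t | group 0) on the pooled event times
log_ch_diff <- function(time, status, group) {
  H0 <- na_cumhaz(time[group == 0], status[group == 0])
  H1 <- na_cumhaz(time[group == 1], status[group == 1])
  grid <- sort(unique(time[status == 1]))
  keep <- H0(grid) > 0 & H1(grid) > 0     # log is undefined while H(t) = 0
  data.frame(time = grid[keep],
             diff = log(H1(grid[keep])) - log(H0(grid[keep])))
}

# Simulated proportional hazards (hypothetical): HR = 2, so diff should hover near log(2)
set.seed(42)
group   <- rep(0:1, each = 300)
ev_time <- rexp(600, rate = 0.1 * ifelse(group == 1, 2, 1))
cens    <- rexp(600, rate = 0.05)
time    <- pmin(ev_time, cens)
status  <- as.integer(ev_time <= cens)
res <- log_ch_diff(time, status, group)
```

With the kidney data the call would be `log_ch_diff(kidney$time, kidney$delta, kidney$type - 1)`, plotted with `plot(res$time, res$diff, type = "s")` and `abline(h = 0.613)`.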
8
Graphical Checks
# Smoothing splines (gss); 'kid' holds the kidney data with columns Time, d (event indicator) and cath
library(gss)
cath<-kid$cath[order(kid$Time)]
event<-kid$d[order(kid$Time)]
times<-sort(kid$Time)
hazfit<-sshzd(Surv(times, event)~cath*times)   # smoothing-spline hazard with a cath-by-time interaction
haz<-hzdrate.sshzd(hazfit, data.frame(times=times, cath=cath))
plot(times[cath==0], haz[cath==0], xlim=c(0, max(times)), ylim=range(haz), xlab="Time", type="l", ylab="hazard", lwd=2, col=1)
lines(times[cath==1], haz[cath==1], lwd=2, col=2)
# assumes cath = 1 is percutaneous, consistent with the change-point interpretation later in the lecture
legend(0, .1, c("surgical","percutaneous"), col=1:2, lwd=2, cex=0.8)
9
Graphical Checks: Kidney
10
BMT Data
Let's conduct graphical checks for
– French/American/British (FAB) disease classification: recall this was significant in our original model
– Methotrexate use: recall this was not
11
Survival Curves for BMT: FAB and MTX
12
BMT Graphical Checks: FAB
13
BMT Graphical Checks: MTX
14
Graphical Approaches
Pretty pictures are nice and can be intuitive, but…
We generally prefer a statistical means of determining whether an assumption holds
This leads us to regression approaches
15
Regression Approaches
Introduce a time-dependent covariate into the model
General idea:
– If the PH assumption holds, the time-dependent covariate will not be significant
– If the time-dependent covariate is significant, then the HR changes over time in some way
16
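Introduce a Time-Dependent Covariate
Create a new variable $Z_2(t) = Z_1 \times g(t)$, where $g(t)$ is a function of time
We don't know the functional form of $g(t)$
Try several possibilities, for example
– $g(t) = \log t$
– $g(t) = t$
– $g(t) = I(t > \tau)$ for a chosen cutpoint $\tau$
– $g(t) = t \cdot I(t > \tau)$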
17
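Binary Case
Consider a binary covariate $Z_1$
General model: $h(t \mid Z_1) = h_0(t)\exp\{\beta_1 Z_1 + \beta_2 Z_1 g(t)\}$
Hazard ratio ($Z_1 = 1$ vs. $Z_1 = 0$): $HR(t) = \exp\{\beta_1 + \beta_2 g(t)\}$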
18
New Time-Dependent Covariate
Fit a proportional hazards model with $Z_1$ and $Z_2(t)$, and estimate $\beta_1$, $\beta_2$
Test the local hypothesis:
– $H_0: \beta_2 = 0$ vs. $H_A: \beta_2 \neq 0$
If you reject $H_0$, you cannot assume proportional hazards
Do this for each covariate in question
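The survival package can also build $Z_1 g(t)$ on the fly through the `tt` argument of `coxph()`, without expanding the data by hand. This is a minimal sketch that assumes $g(t) = \log t$ by default. The function name `tt_ph_test()` and the simulated crossing-hazards data are hypothetical. In the simulation, group 1 is Weibull with shape 0.5 against an exponential group 0, so the true log-HR is linear in $\log t$ with slope $-0.5$.

```r
library(survival)

# Hypothetical helper: Wald test of H0: beta2 = 0 in
# h(t | Z1) = h0(t) exp{beta1 * Z1 + beta2 * Z1 * g(t)}
tt_ph_test <- function(time, status, z, g = log) {
  fit <- coxph(Surv(time, status) ~ z + tt(z),
               tt = function(x, t, ...) x * g(t))   # tt(z) = Z1 * g(t) at each event time
  est <- unname(coef(fit)[2])
  se  <- unname(sqrt(diag(vcov(fit)))[2])
  c(beta2 = est, z = est / se, p = 2 * pnorm(-abs(est / se)))
}

# Simulated data (hypothetical): hazards cross, so beta2 should be clearly negative
set.seed(7)
z       <- rep(0:1, each = 300)
ev_time <- ifelse(z == 1, rweibull(600, shape = 0.5, scale = 5), rexp(600, rate = 0.2))
cens    <- rexp(600, rate = 0.05)
time    <- pmin(ev_time, cens)
status  <- as.integer(ev_time <= cens)
res <- tt_ph_test(time, status, z)
```

For the kidney data the same check would be `tt_ph_test(kidney$time, kidney$delta, kidney$type - 1)`. Passing `g = identity` gives the $g(t) = t$ version.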
19
Examples
Let's explore the regression check for proportional hazards in our two examples…
– Kidney infection data: surgical vs. percutaneous catheter placement
– BMT data: FAB classification, methotrexate use
20
Regression Check: Kidney
### Kidney example (are hazards for percutaneous and surgical proportional?)
times<-sort(unique(kidney$time))
kidney$id<-1:nrow(kidney)
# expand.breakpoints() (not part of the survival package) splits follow-up into (Tstart, Tstop] intervals
kid.long<-expand.breakpoints(kidney, index="id", status="delta", tevent="time", breakpoints=times)
kid.long$ttype1<-log(kid.long$Tstop)*(kid.long$type-1)                            # g(t) = log(t)
kid.long$ttype2<-kid.long$Tstop*(kid.long$type-1)                                 # g(t) = t
kid.long$ttype3<-ifelse(kid.long$Tstop>7.5, (kid.long$type-1), 0)                 # g(t) = I(t > 7.5)
kid.long$ttype4<-ifelse(kid.long$Tstop>7.5, kid.long$Tstop*(kid.long$type-1), 0)  # g(t) = t*I(t > 7.5)
m1<-coxph(Surv(Tstart, Tstop, delta)~type+ttype1, data=kid.long)
m2<-coxph(Surv(Tstart, Tstop, delta)~type+ttype2, data=kid.long)
m3<-coxph(Surv(Tstart, Tstop, delta)~type+ttype3, data=kid.long)
m4<-coxph(Surv(Tstart, Tstop, delta)~type+ttype4, data=kid.long)
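The survival package's `survSplit()` produces the same counting-process layout as `expand.breakpoints()`. This sketch assumes a recent survival version with the formula interface, in which the interval start is named `tstart` and the original `time` column becomes the interval end. The data frame `kidney_sim` is a hypothetical simulated stand-in with the same columns as `kidney`. The sketch also checks that the split-data fit with `ttype1` agrees with the same model written through `coxph()`'s `tt` argument.

```r
library(survival)

# Hypothetical stand-in with the same columns as KMsurv's kidney data (no tied times)
set.seed(11)
n <- 120
kidney_sim <- data.frame(time  = rexp(n, rate = 0.1) + 0.5,
                         delta = rbinom(n, 1, 0.7),
                         type  = sample(1:2, n, replace = TRUE))

# Split each subject's follow-up at every observed time (like expand.breakpoints)
cuts     <- sort(unique(kidney_sim$time))
kid.long <- survSplit(Surv(time, delta) ~ ., data = kidney_sim, cut = cuts, id = "id")

# Time-dependent covariates Z1 * g(t), evaluated at the interval end 'time'
z1 <- kid.long$type - 1
kid.long$ttype1 <- log(kid.long$time) * z1                              # g(t) = log(t)
kid.long$ttype2 <- kid.long$time * z1                                   # g(t) = t
kid.long$ttype3 <- ifelse(kid.long$time > 7.5, z1, 0)                   # g(t) = I(t > 7.5)
kid.long$ttype4 <- ifelse(kid.long$time > 7.5, kid.long$time * z1, 0)   # g(t) = t * I(t > 7.5)

fits <- lapply(paste0("ttype", 1:4), function(v)
  coxph(as.formula(paste("Surv(tstart, time, delta) ~ type +", v)), data = kid.long))

# Same model as fits[[1]], written with coxph's time-transform argument
fit_tt <- coxph(Surv(time, delta) ~ type + tt(type), data = kidney_sim,
                tt = function(x, t, ...) (x - 1) * log(t))
```

With the real data, `kidney_sim` would be replaced by `kidney`. The four fits then play the roles of `m1`–`m4`, with `tstart`/`time` in place of `Tstart`/`Tstop`.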
21
Results: Kidney
> m1
         coef exp(coef) se(coef)      z     p
type     1.44      4.21    1.029   1.40 0.160
ttype1  -1.47      0.23    0.587  -2.51 0.012
> m2
         coef exp(coef) se(coef)      z     p
type    0.961     2.614    0.751   1.28 0.200
ttype2 -0.256     0.774    0.117  -2.18 0.029
> m3
         coef exp(coef) se(coef)      z     p
type     0.35    1.4193    0.549  0.637 0.520
ttype3  -2.89    0.0555    1.184 -2.443 0.015
> m4
         coef exp(coef) se(coef)      z     p
type    0.241     1.272   0.5317  0.453 0.650
ttype4 -0.185     0.831   0.0875 -2.113 0.035
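The reported coefficients can be turned into a fitted hazard ratio as a function of time. This worked check uses only the numbers printed for `m1` ($g(t) = \log t$) and `m2` ($g(t) = t$). Because `type` is coded 1 = surgical and 2 = percutaneous, these are percutaneous-vs-surgical HRs.

```r
# Coefficients copied from the m1 and m2 output (percutaneous vs. surgical)
b1_m1 <- 1.44;  b2_m1 <- -1.47    # m1: g(t) = log(t)
b1_m2 <- 0.961; b2_m2 <- -0.256   # m2: g(t) = t

# Fitted HR(t) = exp{b1 + b2 * g(t)}
hr_m1 <- function(t) exp(b1_m1 + b2_m1 * log(t))
hr_m2 <- function(t) exp(b1_m2 + b2_m2 * t)

# Time at which each fitted HR crosses 1, i.e. b1 + b2 * g(t) = 0
cross_m1 <- exp(-b1_m1 / b2_m1)   # about 2.66 months
cross_m2 <- -b1_m2 / b2_m2        # about 3.75 months
```

Both fitted curves start with HR > 1 and fall below 1, crossing at roughly 2.7 to 3.8 months. This is in line with the hazards crossing at about 3.5 months in the hazard rate plots.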
22
BMT Regression Check: FAB
### BMT example (are hazards for FAB classes proportional?)
bps<-sort(unique(bmt$DFS))
bmt.long<-expand.breakpoints(bmt, index="id", status="Either", tevent="DFS", breakpoints=bps)
# create time-dependent covariates FAB x g(t)
bmt.long$txfab1<-log(bmt.long$Tstop)*(bmt.long$FAB)                            # g(t) = log(t)
bmt.long$txfab2<-bmt.long$Tstop*(bmt.long$FAB)                                 # g(t) = t
bmt.long$txfab3<-ifelse(bmt.long$Tstop>100, (bmt.long$FAB), 0)                 # g(t) = I(t > 100)
bmt.long$txfab4<-ifelse(bmt.long$Tstop>100, bmt.long$Tstop*(bmt.long$FAB), 0)  # g(t) = t*I(t > 100)
m1<-coxph(Surv(Tstart, Tstop, Either)~FAB+txfab1, data=bmt.long)
m2<-coxph(Surv(Tstart, Tstop, Either)~FAB+txfab2, data=bmt.long)
m3<-coxph(Surv(Tstart, Tstop, Either)~FAB+txfab3, data=bmt.long)
m4<-coxph(Surv(Tstart, Tstop, Either)~FAB+txfab4, data=bmt.long)
23
Results: FAB
> m1
           coef exp(coef) se(coef)      z     p
FAB      0.0253      1.03    0.956 0.0264  0.98
txfab1   0.1202      1.13    0.182 0.6605  0.51
> m2
           coef exp(coef) se(coef)      z     p
FAB    0.541241      1.72 0.299949  1.804 0.071
txfab2 0.000341      1.00 0.000706  0.483 0.630
> m3
           coef exp(coef) se(coef)      z     p
FAB       0.782     2.185    0.408  1.914 0.056
txfab3   -0.206     0.814    0.488 -0.421 0.670
> m4
           coef exp(coef) se(coef)      z     p
FAB    0.564772      1.76 0.287651  1.963  0.05
txfab4 0.000274      1.00 0.000681  0.403  0.69
24
BMT Regression Check: MTX
# create time-dependent covariates MTX x g(t)
bmt.long$txmtx1<-log(bmt.long$Tstop)*(bmt.long$MTX)                            # g(t) = log(t)
bmt.long$txmtx2<-bmt.long$Tstop*(bmt.long$MTX)                                 # g(t) = t
bmt.long$txmtx3<-ifelse(bmt.long$Tstop>400, (bmt.long$MTX), 0)                 # g(t) = I(t > 400)
bmt.long$txmtx4<-ifelse(bmt.long$Tstop>400, bmt.long$Tstop*(bmt.long$MTX), 0)  # g(t) = t*I(t > 400)
m1<-coxph(Surv(Tstart, Tstop, Either)~MTX+txmtx1, data=bmt.long)
m2<-coxph(Surv(Tstart, Tstop, Either)~MTX+txmtx2, data=bmt.long)
m3<-coxph(Surv(Tstart, Tstop, Either)~MTX+txmtx3, data=bmt.long)
m4<-coxph(Surv(Tstart, Tstop, Either)~MTX+txmtx4, data=bmt.long)
25
Results: MTX
> m1
           coef exp(coef) se(coef)     z      p
MTX       2.682    14.614    1.124  2.39  0.017
txmtx1   -0.459     0.632    0.222 -2.07  0.038
> m2
           coef exp(coef) se(coef)     z      p
MTX     1.22592     3.407  0.38088  3.22 0.0013
txmtx2 -0.00377     0.996  0.00154 -2.45 0.0140
> m3
           coef exp(coef) se(coef)     z      p
MTX        0.71     2.033    0.263  2.70 0.0069
txmtx3    -1.73     0.178    0.789 -2.19 0.0290
> m4
           coef exp(coef) se(coef)     z      p
MTX     0.70796     2.030  0.26188  2.70 0.0069
txmtx4 -0.00303     0.997  0.00143 -2.12 0.0340
26
Checking Model with >1 Covariate?
### Model where MTX is not time-varying
> m5a<-coxph(Surv(Tstart, Tstop, Either)~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge+MTX, data=bmt.long)
> m5a
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + MTX, data = bmt.long)
                     coef exp(coef) se(coef)      z       p
factor(Disease)2 -1.00606     0.366 0.362370 -2.776 0.00550
factor(Disease)3 -0.35406     0.702 0.370966 -0.954 0.34000
FAB               0.84303     2.323 0.279461  3.017 0.00260
PtAge            -0.08522     0.918 0.035708 -2.387 0.01700
DonAge           -0.08390     0.920 0.030318 -2.767 0.00570
MTX               0.30342     1.354 0.252929  1.200 0.23000
PtAge:DonAge      0.00315     1.003 0.000943  3.337 0.00085
Likelihood ratio test=34.2 on 7 df, p=1.58e-05  n= 8665, number of events= 83
27
Checking Model with >1 Covariate?
### Model where MTX is time-varying (adds txmtx1 = MTX x log(t))
> m5<-coxph(Surv(Tstart, Tstop, Either)~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge+MTX+txmtx1, data=bmt.long)
> m5
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + MTX + txmtx1, data = bmt.long)
                     coef exp(coef) se(coef)      z      p
factor(Disease)2 -1.01485     0.362 0.362198 -2.802 0.0051
factor(Disease)3 -0.32998     0.719 0.368393 -0.896 0.3700
FAB               0.88170     2.415 0.278466  3.166 0.0015
PtAge            -0.08201     0.921 0.035909 -2.284 0.0220
DonAge           -0.08490     0.919 0.030965 -2.742 0.0061
MTX               2.69860    14.859 1.152782  2.341 0.0190
txmtx1           -0.47772     0.620 0.225152 -2.122 0.0340
PtAge:DonAge      0.00305     1.003 0.000955  3.194 0.0014
Likelihood ratio test=39.5 on 8 df, p=3.93e-06  n= 8665, number of events= 83
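`m5a` is nested in `m5` because it drops only `txmtx1`, and both models are fit to the same `bmt.long` data. The two printed likelihood ratio statistics can therefore be differenced to give an LRT for the time-varying MTX term. This worked check uses the rounded printed values, so the p-value is approximate. With the fitted objects, `anova(m5a, m5)` performs the same comparison directly.

```r
# Likelihood ratio statistics (vs. the null model) printed for m5a and m5
lr_m5a <- 34.2; df_m5a <- 7
lr_m5  <- 39.5; df_m5  <- 8

lrt <- lr_m5 - lr_m5a                                     # LRT for txmtx1 alone
p   <- pchisq(lrt, df = df_m5 - df_m5a, lower.tail = FALSE)
```

This gives 5.3 on 1 df (p ≈ 0.021). The Wald test for `txmtx1` in `m5` (p = 0.034) agrees: the MTX effect appears to change over time even after adjusting for the other covariates.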
28
Alternative Form of Time-Varying Covariate
So far we've guessed at $g(t)$
Problem: we don't necessarily know the correct functional form
Consider a binary covariate, $Z_1$
Assume that for covariate $Z_1$ the relative risk changes over time
What if we use the data instead?
– Get a "best estimate" from the data
29
Change Point Model
Let $Z_2(t) = Z_1$ if $t > \tau$ and $Z_2(t) = 0$ if $t \le \tau$
This gives us a proportional hazards model with a change point at $\tau$ (Liang et al. (1990))
Fit a proportional hazards model with $Z_1$ and $Z_2(t)$
We now have a PH model with HR $e^{\beta_1}$ up to $\tau$ and $e^{\beta_1 + \beta_2}$ after $\tau$:
– Model: $h(t \mid Z(t)) = h_0(t)\exp\{\beta_1 Z_1 + \beta_2 Z_2(t)\}$
– $h(t \mid Z(t)) = h_0(t)\exp\{\beta_1 Z_1\}$ if $t \le \tau$
– $h(t \mid Z(t)) = h_0(t)\exp\{(\beta_1 + \beta_2) Z_1\}$ if $t > \tau$
So we are fitting a PH model that includes a change point, which allows the HR to change after a specified time
30
How to Determine $\tau$
A change point for the relative risk was introduced. Where is the best change point?
Recall that the partial likelihood only changes at event times
Calculate the log partial likelihood $LL(\tau_k)$ at each $\tau_k$, where the $\tau_k$ are the observed event times
Choose the $\tau_k$ that yields the largest log partial likelihood
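The search over $\tau$ can be wrapped in a small function. This is a minimal sketch for right-censored (not yet split) data; it uses `coxph()`'s `tt` argument to form $Z_2(t) = Z_1 I(t > \tau)$. The function name and the simulated data are hypothetical. In the simulation the true change point is at 3, with HR 3 before and 1/3 after. The slide searches over every event time, while the example passes a coarser grid to keep it fast.

```r
library(survival)

# Hypothetical helper: profile log partial likelihood over candidate change points
change_point_profile <- function(time, status, z,
                                 candidates = sort(unique(time[status == 1]))) {
  ll <- sapply(candidates, function(tau) {
    fit <- coxph(Surv(time, status) ~ z + tt(z), ties = "breslow",
                 tt = function(x, t, ...) x * (t > tau))   # Z2(t) = Z1 * I(t > tau)
    fit$loglik[2]                                          # LL at the fitted (b1, b2)
  })
  list(tau = candidates, loglik = ll, tau_hat = candidates[which.max(ll)])
}

# Simulated data (hypothetical), generated by inverting a piecewise-linear cumulative hazard
set.seed(5)
n   <- 150
lam <- 0.1
E   <- rexp(2 * n)                       # unit exponentials
z   <- rep(0:1, each = n)
ev_time <- ifelse(z == 0, E / lam,
                  ifelse(E <= 9 * lam, E / (3 * lam), 3 + 3 * (E - 9 * lam) / lam))
cens    <- rexp(2 * n, rate = 0.05)
time    <- pmin(ev_time, cens)
status  <- as.integer(ev_time <= cens)
prof <- change_point_profile(time, status, z, candidates = seq(0.5, 8, by = 0.5))
```

`plot(prof$tau, prof$loglik, type = "b")` shows the profile. A candidate at or beyond the last event time makes $Z_2(t)$ zero at every event, so its coefficient is not estimable.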
31
Kidney Example
# Change point model
> cps<-sort(unique(kidney$time[which(kidney$delta==1)]))
> LL<-c()
> for (i in 1:length(cps))
+ {
+   z2<-ifelse(kid.long$Tstop>cps[i], kid.long$type, 0)
+   mod<-coxph(Surv(Tstart, Tstop, delta)~type+z2, data=kid.long, method="breslow")
+   LL<-append(LL, mod$loglik[2])
+ }
> round(LL, digits=3)
 [1]  -97.878 -100.224  -97.630  -97.501  -99.683 -100.493  -98.856 -100.428
 [9] -101.084 -101.668 -102.168 -100.829 -101.477 -102.059 -102.620 -103.229
32
Change Point Results
Event time   Log partial likelihood
0.5          -97.878
1.5          -100.224
2.5          -97.630
3.5          -97.501
4.5          -99.683
5.5          -100.493
6.5          -98.856
8.5          -100.428
9.5          -101.084
10.5         -101.668
11.5         -102.168
15.5         -100.829
16.5         -101.477
18.5         -102.059
23.5         -102.620
Largest log partial likelihood at τ = 3.5 (-97.501)
33
What About Multiple Comparisons?
Is it "fishing" to try many cutpoints?
No: we are conducting diagnostics, so we don't worry about it as much
We aren't sure of the form of the time dependence, so we are being flexible in order to find out whether we are missing something
34
Hazards Not Proportional
The proportional hazards assumption doesn't hold… what can we do?
Single binary covariate: consider a piecewise regression
– Change point identified by the data (alternate coding)
Many covariates: consider a stratified model on the non-proportional covariate
35
Kidney: 1-Covariate with Change-Point
> kid.long$z2<-ifelse(kid.long$Tstop>3.5, kid.long$cath, 0)    # cath effect after 3.5 months
> kid.long$z3<-ifelse(kid.long$Tstop<=3.5, kid.long$cath, 0)   # cath effect up to 3.5 months
> mod<-coxph(Surv(Tstart, Tstop, d)~z2+z3, data=kid.long, method="breslow")
> mod
Call:
coxph(formula = Surv(Tstart, Tstop, d) ~ z2 + z3, data = kid.long, method = "breslow")
    coef exp(coef) se(coef)     z     p
z2 -2.09     0.124    0.760 -2.75 0.006
z3  1.08     2.950    0.783  1.38 0.170
Likelihood ratio test=13.9 on 2 df, p=0.000956  n= 1132, number of events= 26
36
Interpretation?
Up to 3.5 months, there is not a significant difference in the risk of infection between the two groups (z3: p = 0.17).
However, after 3.5 months the risk of infection in patients with percutaneously placed catheters is 0.12 times the risk in patients with surgically placed catheters (z2: p = 0.006).
Recall our hazard rate plots
– Hazards crossed at about 3.5 months
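The 0.12 figure is $\exp(\hat\beta_{z2})$. This short worked check uses the printed coefficient and standard error and adds a Wald 95% confidence interval. The interval is derived here and is not shown on the slide.

```r
# Coefficient and standard error for z2 (percutaneous vs. surgical after 3.5 months)
b  <- -2.09
se <- 0.760

hr <- exp(b)                                   # point estimate of the HR
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)    # Wald 95% CI, back-transformed to the HR scale
round(c(hr = hr, lower = ci[1], upper = ci[2]), 3)
```

The interval runs from roughly 0.03 to 0.55. It excludes 1, which matches the significant z-test for `z2`.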
37
A Few Points
A single cutpoint may not be enough
– There are "two" models
– Within each piece, we are still assuming proportional hazards
Check the proportional hazards assumption within each of the time intervals
Can generate additional time-varying covariates within each interval
38
Stratified Cox Regression
Recall stratification
– Estimates a 'pooled' association across strata
Stratification in regression
– Estimates a pooled regression coefficient, while each stratum keeps its own baseline hazard $h_{0j}(t)$
– Strong assumption that the associations between covariates and outcome are the same across strata
39
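Estimation: Partial Likelihood Approach
Partition the dataset based on strata $j = 1, \dots, J$
Define the log partial likelihood for each stratum, $LL_j(\beta)$, using risk sets formed within that stratum only
Log-likelihood based on $J$ strata: $LL(\beta) = \sum_{j=1}^{J} LL_j(\beta)$
Maximize $LL(\beta)$ w.r.t. $\beta$
Notice $\beta$ is common across all the stratum-specific partial log-likelihoods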
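The additive structure $LL(\beta) = \sum_j LL_j(\beta)$ can be checked numerically. This minimal sketch assumes right-censored data without tied event times, so the Breslow, Efron and exact partial likelihoods coincide. `strat_pll()` and the simulated two-stratum data are hypothetical. Evaluated at the stratified fit's coefficients, the hand-computed sum should match the log partial likelihood reported by `coxph()`.

```r
library(survival)

# Hypothetical helper: stratified log partial likelihood, summed over strata
strat_pll <- function(beta, time, status, X, strata) {
  X <- as.matrix(X)
  total <- 0
  for (s in unique(strata)) {
    idx <- which(strata == s)
    eta <- drop(X[idx, , drop = FALSE] %*% beta)   # linear predictors in stratum s
    t_s <- time[idx]
    for (i in which(status[idx] == 1)) {
      at_risk <- t_s >= t_s[i]                     # risk set uses stratum s only
      total <- total + eta[i] - log(sum(exp(eta[at_risk])))
    }
  }
  total
}

# Simulated data (hypothetical): common beta, different baseline hazard per stratum
set.seed(3)
n  <- 200
x1 <- rnorm(n); x2 <- rbinom(n, 1, 0.5)
st <- rep(1:2, each = n / 2)
ev_time <- rexp(n, rate = exp(0.5 * x1 - 0.7 * x2) * ifelse(st == 1, 0.1, 0.3))
cens    <- rexp(n, rate = 0.05)
time    <- pmin(ev_time, cens)
status  <- as.integer(ev_time <= cens)

fit     <- coxph(Surv(time, status) ~ x1 + x2 + strata(st))
ll_hand <- strat_pll(coef(fit), time, status, cbind(x1, x2), st)
```

Because risk sets never mix strata, each stratum implicitly keeps its own baseline hazard, while the single `beta` is shared.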
40
BMT: Stratified Model
Steps:
1) Check the proportional hazards assumption
2) Fit a stratified Cox model
3) Check model assumptions (i.e., constant β's across strata; more on this in a moment)
We've already seen that the proportional hazards assumption for methotrexate use does not hold.
41
BMT Data
Associations between covariates and DFS, stratified by methotrexate use (MTX)
> reg2a<-coxph(Surv(Tstart, Tstop, Either)~factor(Disease)+FAB+DonAge+PtAge+DonAge*PtAge+PRt+strata(MTX), data=bmt.long2)
> reg2a
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + DonAge + PtAge + DonAge * PtAge + PRt + strata(MTX), data = bmt.long2)
                     coef exp(coef) se(coef)      z      p
factor(Disease)2 -1.00911     0.365 0.364333 -2.770 0.0056
factor(Disease)3 -0.34552     0.708 0.370254 -0.933 0.3500
FAB               0.89008     2.435 0.280684  3.171 0.0015
DonAge           -0.08415     0.919 0.031123 -2.704 0.0069
PtAge            -0.08175     0.921 0.036255 -2.255 0.0240
PRt               2.08909     8.078 1.095274  1.907 0.0560
DonAge:PtAge      0.00301     1.003 0.000957  3.143 0.0017
Likelihood ratio test=33.4 on 7 df, p=2.23e-05  n= 19070, number of events= 83
42
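Is the Assumption of Constant β Reasonable?
Testing the assumption:
Divide the dataset into $J$ strata
Fit the model with $p$ covariates separately in each stratum, giving $\hat\beta_j$
Define $LL_{\text{piece}} = \sum_{j=1}^{J} LL_j(\hat\beta_j)$
Define $LL_{\text{strat}} = LL(\hat\beta)$ based on the stratified model
Test significance via LRT: $X^2 = 2(LL_{\text{piece}} - LL_{\text{strat}})$, compared with $\chi^2_{p(J-1)}$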
43
Checking Constant β Assumption
# Testing the stratification assumption
reg2a<-coxph(Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + DonAge + PtAge + DonAge*PtAge + PRt + strata(MTX), data=bmt.long2)
# Fit the same covariates separately within each MTX stratum
dat1<-bmt.long2[bmt.long2$MTX==0,]
reg2b<-coxph(Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + DonAge + PtAge + DonAge*PtAge + PRt, data=dat1)
dat2<-bmt.long2[bmt.long2$MTX==1,]
reg2c<-coxph(Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + DonAge + PtAge + DonAge*PtAge + PRt, data=dat2)
LL.strat<-reg2a$loglik[2]                    # common beta across strata
LL.piece<-reg2b$loglik[2]+reg2c$loglik[2]    # separate beta in each stratum
lrt<-2*(LL.piece-LL.strat)
p.lrt<-1-pchisq(lrt, 7)                      # df = p(J-1) = 7 x (2-1)
44
> reg2b
                     coef exp(coef) se(coef)       z      p
factor(Disease)2 -1.19655     0.302   0.4585  -2.610 0.0091
factor(Disease)3 -0.29025     0.748   0.4451  -0.652 0.5100
FAB               1.08896     2.971   0.3384   3.218 0.0013
DonAge           -0.08378     0.920   0.0371  -2.258 0.0240
PtAge            -0.03630     0.964   0.0536  -0.677 0.5000
PRt              -0.86747     0.420   0.4793  -1.810 0.0700
DonAge:PtAge      0.00227     1.002   0.0014   1.618 0.1100
> reg2c
                     coef exp(coef) se(coef)       z      p
factor(Disease)2 -0.56372     0.569  0.63847 -0.8829  0.380
factor(Disease)3 -0.85828     0.424  0.91761 -0.9353  0.350
FAB               0.34408     1.411  0.65122  0.5284  0.600
DonAge           -0.00452     0.995  0.08152 -0.0555  0.960
PtAge            -0.02724     0.973  0.08073 -0.3374  0.740
PRt              -1.00688     0.365  0.55118 -1.8268  0.068
DonAge:PtAge      0.00138     1.001  0.00227  0.6098  0.540
45
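Results
> LL.piece<-reg2b$loglik[2]+reg2c$loglik[2]
> LL.strat<-reg2a$loglik[2]
> lrt<-2*(LL.piece-LL.strat)
> lrt
[1] 6.118211
> p.lrt<-1-pchisq(lrt, 7)
> p.lrt
[1] 0.5260164
No evidence against a common β across the MTX strata (p = 0.53), so the stratified model's shared coefficients are reasonable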
46
Next Time
Regression diagnostics using residuals!
© 2025 SlidePlayer.com. Inc.
All rights reserved.