Lecture 19: Competing Risk Regression
When Competing Risks? Recall the censoring assumption: event times and censoring times are independent. If this is questionable, then a competing risks analysis is likely more appropriate. But… we must be able to distinguish the other events/risks.
Competing Risks Data A subject can fail from any of K event types, but only the earliest failure time can be observed. As in the non-competing risks setting, observations take the form (T, δ). T is the minimum of t1, t2, …, tK. δ is 1, 2, …, K if the subject failed, and can also be 0 if no failure has yet occurred. Z are the covariates we are interested in.
Examples of Types of Observations
Summarizing Competing Risks For a population, there are three approaches: Kaplan-Meier (“net”), cumulative incidence (“crude”), and conditional probability. Cumulative incidence is the most appropriate (and most commonly used) for most competing risks settings. However, each provides its own potentially useful information.
Recall The Issue We’ve already discussed this for estimation of the survival distribution. Recall that Kaplan-Meier analysis assumes that the event of interest is the only risk acting on the population and censors all other events, i.e. it treats all other events the same as loss to follow-up (LTFU) or drop-out.
Competing Risks For estimation in a population, it is usually shown using ‘cumulative incidence’ instead of survival Let’s review how CR approach differs from KM approach…
Recall: KM Survival Estimate of survival for event r at time ti
“Net”: KM Cumulative Incidence (CI) Estimate of cumulative incidence at time ti (The sum of the CI current incidence rate plus the previous incidence rate)
Cumulative Incidence Approach Estimating CI: For t > ti Or Alternatively it can be written as:
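The formulas on the three slides above did not survive extraction. A standard way to write them is sketched below; the notation is an assumption rather than the slides' own: n_j is the number at risk just before t_j, d_{rj} the number of type-r failures at t_j, and \hat S the overall Kaplan-Meier estimate counting failures of any type.

```latex
% KM survival for event r (other event types treated as censored)
\hat S_r(t_i) = \prod_{j:\, t_j \le t_i} \left( 1 - \frac{d_{rj}}{n_j} \right)

% "Net" cumulative incidence based on KM
\widehat{CI}^{\,KM}_r(t_i) = 1 - \hat S_r(t_i)

% "Crude" cumulative incidence in the presence of competing risks
\widehat{CI}_r(t_i) = \sum_{j:\, t_j \le t_i} \hat S(t_{j-1}) \, \frac{d_{rj}}{n_j}

% Equivalent recursive form: previous cumulative incidence plus the current increment
\widehat{CI}_r(t_i) = \widehat{CI}_r(t_{i-1}) + \hat S(t_{i-1}) \, \frac{d_{ri}}{n_i}
```

The only difference between the two estimates is the weight on each increment: the crude version uses survival from all causes, while the KM version implicitly uses survival from cause r alone.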
How does the cumulative incidence (CI) differ from Kaplan Meier (KM)…
Comparison KM CI
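As a rough illustration of the KM versus CI comparison, the sketch below computes both estimates by hand in base R on a small made-up data set. The data, the variable names, and the coding 0 = censored, 1 = event of interest, 2 = competing event are all assumptions for illustration, not part of the BMT example.

```r
# Hypothetical toy data: status 0 = censored, 1 = event of interest, 2 = competing event
time   <- c(1, 2, 3, 4, 5, 6, 7, 8)
status <- c(1, 2, 1, 0, 2, 1, 0, 1)

ut     <- sort(unique(time[status != 0]))                         # distinct failure times
n_risk <- sapply(ut, function(t) sum(time >= t))                  # number at risk
d1     <- sapply(ut, function(t) sum(time == t & status == 1))    # events of interest
d_all  <- sapply(ut, function(t) sum(time == t & status != 0))    # events of any type

# "Net": 1 - KM, treating competing events as censored
km_net <- 1 - cumprod(1 - d1 / n_risk)

# "Crude": cumulative incidence, weighting by overall survival just before each time
s_all  <- cumprod(1 - d_all / n_risk)
s_prev <- c(1, head(s_all, -1))
cif1   <- cumsum(s_prev * d1 / n_risk)

print(data.frame(time = ut, km_net = km_net, cif1 = cif1))
```

In this toy example the KM-based estimate is never below the cumulative incidence, which is the general pattern when competing events are treated as censoring.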
Why Competing Risk Regression? Understand the effect of therapy on different subgroups. Allow us to target interventions to those most likely to benefit. Allow us to summarize the absolute failure via estimation of cause-specific failure probabilities.
Cause-Specific Cumulative Hazard The chief quantity in the competing risks setting is the cause-specific hazard function λk. This can also be used to define the cause-specific cumulative hazard. The overall cumulative hazard function is
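The definitions referred to on this slide were lost in extraction. Written in standard notation (an assumption, since the slide's own symbols are not recoverable), they are usually given as:

```latex
% Cause-specific hazard for event type k
\lambda_k(t) = \lim_{\Delta t \to 0}
  \frac{P(t \le T < t + \Delta t,\ \delta = k \mid T \ge t)}{\Delta t}

% Cause-specific cumulative hazard
\Lambda_k(t) = \int_0^t \lambda_k(u)\, du

% Overall cumulative hazard: sum over the K causes
\Lambda(t) = \sum_{k=1}^{K} \Lambda_k(t)
```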
Cumulative Incidence How does this all relate to our earlier discussion of estimating overall incidence? We are still interested in cumulative incidence.
Cumulative Incidence The sum of the K cumulative incidence functions is the probability of failure from any cause… NOTE! The cumulative probability of event k in the presence of competing risks is often miscalculated as
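The expressions on these two slides are missing. A sketch in standard notation, using the cause-specific hazard λ_k and overall cumulative hazard Λ from the previous slide; the last line is presumably the miscalculation the slide warns about, and that reading is an assumption.

```latex
% Cumulative incidence function for cause k
F_k(t) = P(T \le t,\ \delta = k) = \int_0^t S(u)\, \lambda_k(u)\, du,
\qquad S(u) = \exp\{-\Lambda(u)\}

% The K cumulative incidence functions sum to the probability of failure from any cause
\sum_{k=1}^{K} F_k(t) = 1 - S(t)

% Common miscalculation: ignores the other causes and generally overstates F_k(t)
1 - \exp\{-\Lambda_k(t)\} \ \ge\ F_k(t)
```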
Competing Risk Model Recall that a subject can fail from one of K events. This means that we have partial information on all events: a subject who experiences the kth event at time ti is known to have survived to time ti for all other competing events.
Competing Risk Model There are two basic approaches to competing risk regression Modeling cause-specific hazard Modeling cumulative incidence Both are analogous to the Cox proportional hazards model
Modeling Cause Specific Hazard The model is a function of some unspecified baseline hazard function for the kth cause, and also a function of the covariate vector Z and regression coefficients β. It is denoted as
BMT Example Recall our BMT data examining the association between time to event and disease type (ALL, AML low risk, AML high risk). Other factors: FAB class, donor/patient characteristics, waiting time, platelet recovery. Originally we had considered time to relapse or death as a single event. What if we wanted to examine factors for each?
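The model equation itself is missing from the slide. Given the description (unspecified baseline hazard for cause k, covariates Z, regression coefficients), the usual proportional hazards form would be as follows; the subscripted notation is an assumption.

```latex
% Cause-specific proportional hazards model for cause k
\lambda_k(t \mid Z) = \lambda_{k0}(t)\, \exp\!\left( \beta_k^{\top} Z \right)
```

Here \lambda_{k0}(t) is the baseline hazard for the kth cause and \beta_k the coefficients for that cause, so each cause can have its own covariate effects.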
Cause Specific Hazard Models
library(survival); library(KMsurv)
### Cause-specific hazard model ###
data(bmt)
# Give the 22 columns of the KMsurv bmt data descriptive names
colnames(bmt)<-c("dgroup","TTD","DFS","dead","relapse","Either","tAGVHD", "AGvHD","tCGVHD",
                 "CGvHD","tPR","PR","PtAge","DonAge","PtSex","DonSex", "PtCMV","DonCVM",
                 "TTTrans","FAB","Hosp","MTX")
### Either Death or Relapse
reg.cox<-coxph(Surv(DFS, Either)~factor(dgroup) + FAB + PtAge + DonAge + PtAge*DonAge, data=bmt)
#Relapse Model
rreg.cox<-coxph(Surv(DFS, relapse)~factor(dgroup) + FAB + PtAge + DonAge + PtAge*DonAge, data=bmt)
Death or Relapse Model
> summary(reg.cox)
coxph(formula = Surv(DFS, Either) ~ factor(dgroup)+FAB+PtAge+DonAge+PtAge*DonAge, data = bmt)
n= 137, number of events= 83
                    coef exp(coef) se(coef)      z Pr(>|z|)
factor(dgroup)2 -1.0906    0.3359 0.354279 -3.078 0.002080 **
factor(dgroup)3 -0.4039    0.6677 0.362776 -1.113 0.265549
FAB              0.8374    2.3104 0.278464  3.007 0.002636 **
PtAge           -0.08164   0.9216 0.036107 -2.261 0.023756 *
DonAge          -0.08459   0.9189 0.030097 -2.810 0.004947 **
PtAge:DonAge     0.00316   1.0032 0.000951  3.323 0.000891 ***
Concordance= 0.665 (se = 0.033 )
Rsquare= 0.213 (max possible= 0.996 )
Likelihood ratio test= 32.8 on 6 df, p=1.144e-05
Wald test = 33.02 on 6 df, p=1.039e-05
Score (logrank) test = 35.75 on 6 df, p=3.078e-06
Relapse Specific Hazard Model
> summary(rreg.cox)
coxph(formula = Surv(DFS, relapse) ~ factor(dgroup)+FAB+PtAge+DonAge+PtAge*DonAge, data = bmt)
n= 137, number of events= 42
                    coef exp(coef) se(coef)      z Pr(>|z|)
factor(dgroup)2 -1.8406    0.1587 0.582111 -3.162 0.00157 **
factor(dgroup)3 -0.5794    0.5602 0.540797 -1.071 0.28403
FAB              1.4239    4.1531 0.433179  3.287 0.00101 **
PtAge           -0.0384    0.9624 0.052539 -0.730 0.46511
DonAge          -0.0835    0.9199 0.044086 -1.894 0.05822 .
PtAge:DonAge     0.0024    1.0024 0.001432  1.648 0.09937 .
Concordance= 0.75 (se = 0.046 )
Rsquare= 0.203 (max possible= 0.938 )
Likelihood ratio test= 31.16 on 6 df, p=2.361e-05
Wald test = 27.15 on 6 df, p=0.0001355
Score (logrank) test = 31.55 on 6 df, p=1.993e-05
Interpretation of Cause-Specific Model Interpret the hazard ratio for two individuals the same way as in a standard Cox model. However, there is a dependency between the failure types… Problem: effects seen in the model may reflect the influence of the competing events. As a result, covariate effects don’t necessarily pertain to the cumulative incidence of the kth event.
Modeling Cumulative Incidence Analogous to our general approach for a study population with competing risks, there is a model based on cumulative incidence: the competing risk regression model (Fine and Gray). It is a direct regression of the effect of covariates on cumulative incidence, distinguishes between patients who have had other events and those at risk for the event of interest, and is based on the PHM approach.
Modeling Cumulative Incidence In this case the model has the same proportional hazards form, where the hazard being modeled is referred to as the sub-distribution hazard (the crude hazard from CIcr).
Sub-Distribution Hazard Expression for the sub-distribution hazard Can also think of this as the hazard function for an improper random variable
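The model form and the expression for the sub-distribution hazard are missing from these two slides. A sketch following the usual Fine and Gray formulation; the symbol \gamma_k is an assumption (the slides' own symbol was lost) and F_k denotes the cumulative incidence function for cause k.

```latex
% Fine and Gray model: proportional hazards on the sub-distribution hazard
\gamma_k(t \mid Z) = \gamma_{k0}(t)\, \exp\!\left( \beta^{\top} Z \right)

% Sub-distribution hazard: subjects who already failed from another cause stay "at risk"
\gamma_k(t) = \lim_{\Delta t \to 0} \frac{
  P\big( t \le T < t + \Delta t,\ \delta = k \,\big|\, T \ge t \ \cup\ (T < t \,\cap\, \delta \ne k) \big)
}{\Delta t}
= -\frac{d}{dt} \log\{ 1 - F_k(t) \}

% Improper random variable whose hazard is \gamma_k
T^{*} = \begin{cases} T, & \delta = k \\ \infty, & \delta \ne k \end{cases}
```

T^{*} is improper because it places positive probability at infinity: subjects who fail from another cause never experience event k.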
Risk Set in CRR Model Just as in the case of the Cox model, we need a likelihood expression for estimation of the model Definition of the risk set slightly altered
Partial Likelihood The expression for the partial Likelihood based on our newly defined risk set is Partial log-likelihood
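The risk set definition and likelihood expressions are missing from the slides. For complete (uncensored) data the Fine and Gray construction is usually written as below; the indexing and the use of k for the event of interest are assumed notation.

```latex
% Risk set at the i-th failure time: not yet failed, or already failed from another cause
R_i = \{\, j : T_j \ge T_i \ \text{ or } \ (T_j \le T_i \ \text{and} \ \delta_j \ne k) \,\}

% Partial likelihood over subjects who fail from cause k
L(\beta) = \prod_{i:\, \delta_i = k}
  \frac{\exp(\beta^{\top} Z_i)}{\sum_{j \in R_i} \exp(\beta^{\top} Z_j)}

% Partial log-likelihood
\ell(\beta) = \sum_{i:\, \delta_i = k}
  \left[ \beta^{\top} Z_i - \log \sum_{j \in R_i} \exp(\beta^{\top} Z_j) \right]
```

Compared with the Cox risk set, the only change is that subjects who have already failed from a competing cause are never removed.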
Estimation & Testing We can maximize the partial log-likelihood in the same way we did with the Cox PHM Additionally, Fine and Gray developed a score test to make inference about the regression coefficients in the model
What About Censoring Up until now, we have been assuming we have “complete” data. If we observe censoring we must change the definition of the risk set slightly…
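The modified risk set is not shown on the slide. In the Fine and Gray approach with right censoring, the change is usually expressed through inverse probability of censoring weights; the sketch below uses assumed notation, with \hat G the Kaplan-Meier estimate of the censoring survival function and C_j the censoring time of subject j.

```latex
% Weight of subject j in the risk set at time t
w_j(t) = I\big( C_j \ge \min(T_j, t) \big)\,
  \frac{\hat G(t)}{\hat G\big( \min(T_j, t) \big)}

% Weighted partial likelihood contribution of a cause-k failure at T_i
\frac{\exp(\beta^{\top} Z_i)}{\sum_{j \in R_i} w_j(T_i)\, \exp(\beta^{\top} Z_j)}
```

Subjects still event-free at t get weight 1, while subjects who failed earlier from a competing cause stay in the risk set with a weight that shrinks as t moves past their failure time.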
Competing Risk Regression in R Can implement the cause-specific hazard model using the coxph function in the survival library The “cmprsk” package in R implements the Fine and Gray model we’ve just discussed So back to our BMT example
“cmprsk” Library crr: fits the Fine and Gray model for the subdistribution functions in competing risks. ftime and fstatus: define the survival data (failure time and event type). cov1 and cov2: matrices of covariates. tf: functions of time for the covariates in cov2. failcode: which event are you modeling? Other functionality deals with estimating and comparing cumulative incidence across groups.
Fitting CRR Model
### Fine and Gray Cumulative Incidence Model ###
###First we have to generate a single event type variable
etime<-ifelse(bmt$relapse==1, bmt$DFS, bmt$TTD)
etype<-ifelse(bmt$relapse==1, 1, 0)
etype<-ifelse(bmt$dead==1 & bmt$relapse==0, 2, etype)
dx2<-ifelse(bmt$dgroup==2, 1, 0)
dx3<-ifelse(bmt$dgroup==3, 1, 0)
ptdn.intx<-bmt$PtAge*bmt$DonAge
fab<-bmt$FAB
ptage<-bmt$PtAge
dnage<-bmt$DonAge
cov<-cbind(dx2, dx3, fab, ptage, dnage, ptdn.intx)
CRR Model
> rreg.crr<-crr(etime, etype, cov, failcode=1)
> summary(rreg.crr)
Competing Risks Regression
Call: crr(ftime = etime, fstatus = etype, cov1 = cov, failcode = 1)
              coef exp(coef) se(coef)      z p-value
dx2       -1.55581     0.211  0.55215 -2.818  0.0048
dx3       -0.53913     0.583  0.52981 -1.018  0.3100
fab        1.30349     3.682  0.43894  2.970  0.0030
ptage     -0.01688     0.983  0.06499 -0.260  0.8000
dnage     -0.05874     0.943  0.04879 -1.204  0.2300
ptdn.intx  0.00152     1.002  0.00176  0.864  0.3900
Num. cases = 137
Pseudo Log-likelihood = -187
Pseudo likelihood ratio test = 24.1 on 6 df,
CRR Model Results
          exp(coef) exp(-coef)   2.5%  97.5%
dx2           0.211      4.739 0.0715  0.623
dx3           0.583      1.715 0.2065  1.648
fab           3.682      0.272 1.5577  8.704
ptage         0.983      1.017 0.8657  1.117
dnage         0.943      1.060 0.8570  1.038
ptdn.intx     1.002      0.998 0.9981  1.005
Num. cases = 137
Pseudo Log-likelihood = -187
Pseudo likelihood ratio test = 24.1 on 6 df,
Additional Notes Can also include a matrix of time varying covariates Can NOT use factor here… Must create dummy variables for all factor variables of interest in the data
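One possible shortcut for building the dummy variables is base R's model.matrix(), sketched here on made-up vectors. The toy data and the name cov_mat are hypothetical stand-ins for bmt$dgroup and bmt$FAB; the slides themselves build the indicators with ifelse().

```r
# Hypothetical toy covariates standing in for bmt$dgroup and bmt$FAB
dgroup <- c(1, 2, 3, 2, 1, 3)
fab    <- c(0, 1, 1, 0, 1, 0)

# model.matrix() expands the factor into 0/1 indicator columns;
# dropping column 1 removes the intercept, which is not passed to crr()
cov_mat <- model.matrix(~ factor(dgroup) + fab)[, -1]
print(colnames(cov_mat))
print(cov_mat)
```

The resulting numeric matrix has one column per non-reference factor level plus the numeric covariate, which is the shape the cov1 argument of crr expects.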
Comparison of CHR and CRR
                  Cause-specific                             Competing Risks
                  β        HR (95% CI)           p           β        HR (95% CI)           p
AML low risk     -1.841    0.16 (0.05, 0.50)     0.0016     -1.556    0.21 (0.07, 0.62)     0.0048
AML high risk    -0.579    0.56 (0.19, 1.62)     0.2800     -0.539    0.58 (0.21, 1.65)     0.3100
FAB               1.424    4.15 (1.78, 9.71)     0.0010      1.303    3.68 (1.56, 8.70)     0.0030
Patient Age      -0.038    0.96 (0.87, 1.07)     0.4700     -0.017    0.98 (0.87, 1.12)     0.8000
Donor Age        -0.084    0.92 (0.84, 1.00)     0.0580     -0.059    0.94 (0.86, 1.04)     0.2300
PatAge x DonAge   0.0024   1.00 (0.999, 1.005)   0.0990      0.0015   1.00 (0.998, 1.005)   0.3900
Cause-Specific vs CRR approach If competing risks are truly independent: the cause-specific model provides valid estimates of the risk of each event, and the CRR model tends to be biased towards the null. If competing risks are dependent: Cause specific model
Which One to Use? It depends… The cause-specific approach can give cause-specific hazards and CIFs. However, we cannot directly examine covariate effects on the CIF. The subdistribution approach allows us to test covariate effects on the CIF, but subdistribution hazards are difficult to interpret and should be used with caution.
Which One to Use? Cumulative incidence model more realistically models treatment effect in a population. If we want real world probabilities of death then competing risks methodology should be used as opposed to standard survival analysis methods. Allows us to separate the probability of death into different causes.
Model Selection for CRR A nice feature of the CRR approach is that we can evaluate associations between covariates and our different events This means we can conduct model selection to find a more parsimonious model Use same approaches as before P-values AIC, BIC Forward/backward selection
Example (forward using p-values)
#p-value approach, with forward selection
dx2<-ifelse(bmt$dgroup==2, 1, 0); dx3<-ifelse(bmt$dgroup==3, 1, 0)
ptage<-bmt$PtAge; dnage<-bmt$DonAge; ptdn.intx<-bmt$PtAge*bmt$DonAge
fab<-bmt$FAB
tttrans<-bmt$TTTrans
mtx<-bmt$MTX
ptsex<-bmt$PtSex; dnsex<-bmt$DonSex; sx.intx<-bmt$PtSex*bmt$DonSex
h2<-ifelse(bmt$Hosp==2, 1, 0); h3<-ifelse(bmt$Hosp==3, 1, 0); h4<-ifelse(bmt$Hosp==4, 1, 0)
# Candidate covariate sets: disease group plus one additional block each
cov1a<-cbind(dx2, dx3, fab)
cov1b<-cbind(dx2, dx3, ptage, dnage, ptdn.intx)
cov1c<-cbind(dx2, dx3, ptsex, dnsex, sx.intx)
cov1d<-cbind(dx2, dx3, tttrans)
cov1e<-cbind(dx2, dx3, mtx)
cov1f<-cbind(dx2, dx3, h2,h3,h4)
rreg.crra<-crr(etime, etype, cov1a, failcode=1) #p FAB = 0.0039
rreg.crrb<-crr(etime, etype, cov1b, failcode=1) #p Pt/Dn age = 0.80, 0.23, 0.39
rreg.crrc<-crr(etime, etype, cov1c, failcode=1) #p Pt/Dn sex = 0.36, 0.65, 0.69
rreg.crrd<-crr(etime, etype, cov1d, failcode=1) #p TTTrans = 0.055
rreg.crre<-crr(etime, etype, cov1e, failcode=1) #p MTX = 0.73
rreg.crrf<-crr(etime, etype, cov1f, failcode=1) #p Hosp = 0.94
Example (forward using p-values)
#Final Model (choosing p<0.1)
# cov2a is not defined on the slides; the columns below are inferred from the output
> cov2a<-cbind(dx2, dx3, fab, tttrans)
> rreg.crra<-crr(etime, etype, cov2a, failcode=1)
> summary(rreg.crra)
Competing Risks Regression
Call: crr(ftime = etime, fstatus = etype, cov1 = cov2a, failcode = 1)
            coef exp(coef) se(coef)      z p-value
dx2     -1.58888     0.204 0.481107 -3.303 0.00096
dx3     -0.40978     0.664 0.500140 -0.819 0.41000
fab      1.20707     3.344 0.423032  2.853 0.00430
tttrans -0.00105     0.999 0.000562 -1.875 0.06100
        exp(coef) exp(-coef)   2.5% 97.5%
dx2         0.204      4.898 0.0795 0.524
dx3         0.664      1.506 0.2491 1.769
fab         3.344      0.299 1.4593 7.661
tttrans     0.999      1.001 0.9978 1.000
Num. cases = 137
Pseudo Log-likelihood = -187
Pseudo likelihood ratio test = 25 on 4 df,
Automated Model Selection There is an automated selection algorithm for the Fine and Gray model. Traditional selection criteria include AIC = -2 log Lp + 2p and BIC = -2 log Lp + p log n. An alternative proposed in the literature is BICcr = -2 log Lp + p log(n*).
BICcr selection criteria Should select a more parsimonious model than the AIC, and has a less stringent penalty than the BIC. Provides a good compromise for working with the Fine and Gray model for competing risk data.
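To make the three penalties concrete, the sketch below plugs in values reported elsewhere in these slides for the four-covariate relapse model: pseudo log-likelihood -186.79, 137 cases, and 42 relapses. Taking n* to be the number of events of interest is an assumption about the BICcr definition, and the variable names are illustrative.

```r
loglik <- -186.79   # pseudo log-likelihood reported for the final relapse model
p      <- 4         # covariates: dx2, dx3, fab, tttrans
n      <- 137       # number of cases
n_star <- 42        # assumption: n* = number of events of interest (relapses)

aic   <- -2 * loglik + 2 * p
bic   <- -2 * loglik + p * log(n)
biccr <- -2 * loglik + p * log(n_star)

print(round(c(AIC = aic, BIC = bic, BICcr = biccr), 2))
```

Because 2 < log(42) < log(137), the per-parameter penalty of BICcr falls between those of AIC and BIC. The AIC computed this way, 381.58, agrees up to rounding with the 381.57 shown in the crrstep output for the same model.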
crrstep function in R The R package crrstep implements stepwise model selection for the Fine and Gray competing risks model. Available selection criteria for choosing covariates include AIC, BIC, and BICcr.
BMT Example
###Automated Approach
library(crrstep)
mAIC<-crrstep(etime~factor(dgroup)+FAB+TTTrans+MTX+PtSex+DonSex+PtAge+
              DonAge+PtAge*DonAge, etype=etype, data=bmt, direction="forward",
              failcode=1, criterion = "AIC")
mBIC<-crrstep(etime~factor(dgroup)+FAB+TTTrans+MTX+PtSex+DonSex+PtAge+
              DonAge+PtAge*DonAge, etype=etype, data=bmt, direction="forward",
              failcode=1, criterion = "BIC")
mBICcr<-crrstep(etime~factor(dgroup)+FAB+TTTrans+MTX+PtSex+DonSex+PtAge+
                DonAge+PtAge*DonAge, etype=etype, data=bmt, direction="forward",
                failcode=1, criterion = "BICcr")
BMT Example (AIC)
> mAIC<-crrstep(etime~factor(dgroup)+FAB+TTTrans+MTX+PtSex+DonSex+PtAge+DonAge+PtAge*DonAge,
                etype=etype, data=bmt, direction="forward", failcode=1, criterion = "AIC")
NULL
                   AIC
+FAB            390.38
+factor(dgroup) 390.46
<none>          398.60
+PtSex          399.53
+TTTrans        399.54
+DonSex         399.69
+MTX            400.36
+PtAge:DonAge   400.47
+PtAge          400.54
+DonAge         400.58
[1] "FAB"
BMT Example (AIC), continued
…
[1] "FAB" "factor(dgroup)" "TTTrans"
                   AIC
<none>          381.57
+DonSex         383.08
+PtSex          383.19
+PtAge:DonAge   383.25
+MTX            383.26
+DonAge         383.58
+PtAge          384.73
Comparison AIC to p-value approach
> mAIC
$coefficients
                estimate std.error t-stat
FAB              1.21000  0.423000  2.850
factor(dgroup)2 -1.59000  0.481000  3.300
factor(dgroup)3 -0.41000  0.500000  0.819
TTTrans         -0.00105  0.000562  1.870
$log.likelihood
[1] -186.79
> summary(rreg.crra)
Competing Risks Regression
Call: crr(ftime = etime, fstatus = etype, cov1 = cov2a, failcode = 1)
            coef exp(coef) se(coef)      z p-value
dx2     -1.58888     0.204 0.481107 -3.303 0.00096
dx3     -0.40978     0.664 0.500140 -0.819 0.41000
fab      1.20707     3.344 0.423032  2.853 0.00430
tttrans -0.00105     0.999 0.000562 -1.875 0.06100
Pseudo Log-likelihood = -187
Comparison of Three Criteria
> mAIC
                estimate std.error t-stat
FAB              1.21000  0.423000  2.850
factor(dgroup)2 -1.59000  0.481000  3.300
factor(dgroup)3 -0.41000  0.500000  0.819
TTTrans         -0.00105  0.000562  1.870
> mBIC
FAB              1.220    0.422     2.890
factor(dgroup)2  1.360    0.485     2.800
factor(dgroup)3 -0.303    0.499     0.608
> mBICcr
factor(dgroup)2 -1.360    0.485     2.800
One Final Note Competing risk regression generally assumes that events are mutually exclusive. In the BMT data this isn’t true: we can’t really look at relapse as competing with death. Joint frailty modeling offers an alternative and can be implemented using frailtypack. The idea is to model recurrent events jointly with some terminal event.
Next Time A little about sample size estimation and power