Lecture 14: Cox PH Model III: Model Building
Model Building using PHM
Regression is used to:
- Adjust for potential confounders when a specific hypothesis is to be tested ("adjustment")
- Predict the distribution of the time to some event from a list of explanatory variables ("prediction")
Adjustment Approach
- Perform an "unadjusted" analysis of the factor of interest with the time-to-event outcome
- Repeat for each potential confounder
- Include the factor of interest and the significant confounders, either one at a time (Klein & Moeschberger) or in a "full" model
- Use p-values to determine whether confounders should be included/retained
Other Approaches
- AIC (Akaike information criterion, 1973): AIC = -2 log L + 2p, where p = number of parameters in the model
- BIC (Bayesian information criterion; Schwarz, 1978): BIC = -2 log L + p log(n)
- See also DIC (deviance information criterion; Spiegelhalter, 2002)
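As a sketch of these two criteria (Python rather than R; the log-likelihood below is a toy value inferred from the AIC of 737.14 reported later for the disease-type-only model, not a quantity stated directly in the slides):

```python
import math

def aic(loglik, p):
    # Akaike information criterion: smaller is better
    return -2 * loglik + 2 * p

def bic(loglik, p, n):
    # Bayesian information criterion: penalty grows with log(n),
    # so it exceeds the AIC penalty whenever log(n) > 2
    return -2 * loglik + p * math.log(n)

# Disease-type-only Cox model: 2 coefficients, n = 137 subjects
print(round(aic(-366.57, 2), 2))      # 737.14
print(round(bic(-366.57, 2, 137), 2))
```

Because both are "smaller is better" scores rather than tests, they rank models (including non-nested ones) but give no level of evidence.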
Pros and Cons
- Chi-square statistics behave differently in large-sample and small-sample situations
- BIC and AIC give a "yes/no" answer or a best-fitting model, without a level of evidence
- BIC applies a larger sample-size adjustment
- AIC and BIC can compare non-nested models
Prediction Approach
- Interested in identifying a set of variables to model survival (or to use as a "base" model for future testing)
- Look at individual predictors
- Start model building with the most predictive
- Then add predictors based on significance
Model Building via Subset Selection
- Forward entry: start with the most significant variable and continue adding until a criterion is met (e.g. p > 0.05); entered variables are not re-evaluated
- Backward elimination: start with all variables that meet some criterion (e.g. p < 0.20); remove the least significant variable and repeat until a stopping criterion is met (e.g. p > 0.05); removed variables are not reconsidered
- Stepwise selection: combines forward entry with backward elimination, re-checking previously entered variables for removal at each step
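The forward-entry loop can be sketched as follows (a minimal sketch, not the slides' method verbatim; the p-value function here is a hypothetical stand-in, whereas in practice it would refit a Cox model and run a likelihood-ratio or Wald test for each candidate given the current model):

```python
def forward_select(candidates, pvalue_fn, alpha=0.05):
    # Greedy forward entry: add the most significant remaining variable
    # each round; stop when the best remaining p-value exceeds alpha.
    selected, remaining = [], list(candidates)
    while remaining:
        pvals = {v: pvalue_fn(selected, v) for v in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:
            break
        selected.append(best)       # entered variables are not re-evaluated
        remaining.remove(best)
    return selected

# Toy p-values that ignore the current model (purely illustrative):
toy_p = {"FAB": 0.004, "Age": 0.007, "MTX": 0.155, "CMV": 0.980}
print(forward_select(toy_p, lambda current, v: toy_p[v]))  # ['FAB', 'Age']
```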
Comments on Stepwise Model Building
- Can be computer automated: provide a list of all candidate covariates and run forward, backward, or stepwise selection
- All are only "stepwise" optimal
- Do not consider scientific relevance
- Problematic when interactions are of interest
Problems with Subset Selection
- Can be systematic, but still may not identify a clear best subset
- Done post hoc, and often cannot be replicated
- Subset selection is also discontinuous: the resulting subset is often unstable and highly variable, particularly when p is large, and small changes in the data can yield very different estimates
Problems with Subset Selection
- Models identified by subset selection are reported as if the model had been specified a priori
- This violates statistical principles for estimation and hypothesis testing:
  - standard errors are biased low
  - p-values are falsely small
  - regression coefficients are biased away from zero
Penalized Regression
- In statistical testing we tend to believe that, without strong evidence to the contrary, the null hypothesis is true
- We therefore tend to think most model coefficients are close to zero; in effect, we treat large coefficients as "less likely"
- Penalized regression is an alternative in which coefficients are penalized the further they move from zero
Penalized Regression
- Introduces a penalty term to address some of the issues with subset-selection approaches:
  - M(θ): the objective function
  - p(θ): the penalty function, which penalizes "less realistic" values of the β's
  - λ: the regularization (shrinkage) parameter, which controls the trade-off between bias and variance
Penalty Function & Regularization Parameter
- The penalty function typically takes the form λ Σj |βj|^q (q = 2 gives ridge, q = 1 gives the lasso)
- λ controls the trade-off between the penalty and the fit:
  - if λ is too small, the tendency is to over-fit, resulting in a model with large variance
  - if λ is too large, the model may be under-fit, yielding a model that is too simple
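The shrinkage role of λ can be seen in a minimal linear-model analogue (a sketch under simplifying assumptions: the closed-form ridge estimator below does not exist for the Cox partial likelihood, but the behavior of the coefficients as λ grows carries over):

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form ridge estimate for the linear model:
    # minimizes ||y - X b||^2 + lam * ||b||^2
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.standard_normal(60)

for lam in [0.0, 1.0, 100.0]:
    b = ridge(X, y, lam)
    # the coefficient norm shrinks monotonically as lambda grows
    print(lam, round(float(np.linalg.norm(b)), 3))
```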
Other Considerations
- Predictors can vary in magnitude (and type)
- Consider a predictor that takes values in [0, 1] and another with a much larger range: a one-unit change is clearly not equivalent for the two predictors
- Thus, predictors are usually standardized prior to fitting the model to have mean 0 and standard deviation 1
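A sketch of that standardization step (mirroring what R's scale() does, which uses the n-1 sample standard deviation; the ages below are arbitrary toy values):

```python
import numpy as np

def standardize(x):
    # Center to mean 0 and scale to sd 1 (ddof=1 matches R's scale())
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

ages = np.array([26.0, 21, 26, 17, 32, 48])
z = standardize(ages)
print(round(float(z.mean()), 6), round(float(z.std(ddof=1)), 6))
```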
Common Penalized Regression Models
- L2 penalty, λ Σj βj²: ridge regression
- L1 penalty, λ Σj |βj|: lasso regression (least absolute shrinkage and selection operator)
- L1 + L2: elastic net
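A concrete check of the three penalty forms (a minimal sketch in Python rather than R; the β vector is an arbitrary example):

```python
import numpy as np

def l2_penalty(beta, lam):
    # Ridge penalty: lam * sum(beta_j^2)
    return lam * float(np.sum(np.square(beta)))

def l1_penalty(beta, lam):
    # Lasso penalty: lam * sum(|beta_j|)
    return lam * float(np.sum(np.abs(beta)))

def elastic_net_penalty(beta, lam1, lam2):
    # Elastic net combines the L1 and L2 penalties
    return l1_penalty(beta, lam1) + l2_penalty(beta, lam2)

beta = np.array([0.5, -1.0, 0.0])
print(l1_penalty(beta, 2.0))                # 3.0
print(l2_penalty(beta, 2.0))                # 2.5
print(elastic_net_penalty(beta, 2.0, 2.0))  # 5.5
```

The L1 term is what makes the lasso (and elastic net) set some coefficients exactly to zero, performing variable selection; the L2 term only shrinks.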
Penalized Regression in Cox Models
- Implementation requires efficient algorithms to solve the penalized estimation problem
- This is relatively straightforward in linear models, but much more computationally demanding for Cox models
- In 2010, Goeman presented a full-gradient algorithm for estimating penalized Cox models, implemented in the penalized package in R
Example of Model Building: BMT
- Recall our BMT data examining the association between time to relapse or death and disease type: ALL, AML low risk, AML high risk
- Our primary question is whether or not time to event is associated with disease type, controlling for other factors
ALL/AML example
- Z1: indicator of disease type
- Possible confounders:
  - Z2: waiting time from diagnosis
  - Z3: French/American/British (FAB) classification for AML patients
  - Z4: indicator of whether the patient was given MTX (methotrexate)
  - Z5, Z6, ..., Z13: additional covariates
R: Univariate Model
Is disease type significant in a model by itself?
> library(survival)
> data<-read.csv("H:\\public_html\\BMTRY_722_Summer2019\\Data\\BMT_1_3.csv")
> DFS<-ifelse(data$Death==0 & data$Relapse==0, data$TTR, NA)
> DFS<-ifelse(data$Relapse==1, data$TTR, DFS)
> DFS<-ifelse(data$Death==1 & data$TTR>=data$TTD, data$TTD, DFS)
> event<-ifelse(data$Death==1 | data$Relapse==1, 1, 0)
> bmt<-cbind(data, DFS, event)
> st<-Surv(bmt$DFS, bmt$event)
> coxph(st~factor(bmt$Disease))
Call:
coxph(formula = st ~ factor(bmt$Disease))

                     coef exp(coef) se(coef) z p
factor(bmt$Disease)2
factor(bmt$Disease)3

Likelihood ratio test=13.4 on 2 df, p=
n= 137, number of events= 83
Add Single Covariates
> ### Models with Disease + another covariate
> nl<-coxph(st~factor(Disease), data=bmt)
> m1<-coxph(st~factor(Disease) + FAB, data=bmt)
> summary(m1)$coef
                 coef exp(coef) se(coef) z Pr(>|z|)
factor(Disease)2
factor(Disease)3
FAB
> lrt1<-2*(m1$loglik[2]-nl$loglik[2])
> pval1a<-pchisq(lrt1, df=1, lower=F)
> pval1a
[1]
Add Single Covariates
> ### Models with Disease + another covariate
> nl<-coxph(st~factor(Disease), data=bmt)
> m2<-coxph(st~factor(Disease) + PtAge + DonAge + PtAge*DonAge, data=bmt)
> summary(m2)$coef
                 coef exp(coef) se(coef) z Pr(>|z|)
factor(Disease)2
factor(Disease)3
PtAge
DonAge
PtAge:DonAge
Add Single Covariates
> ### Models with Disease + "age"
> lrt2<-2*(m2$loglik[2]-nl$loglik[2])
> pval2a<-pchisq(lrt2, df=3, lower=F)
> pval2a
[1]
> wald2<-t(m2$coef[3:5])%*%solve(m2$var[3:5,3:5])%*%m2$coef[3:5]
> pval2b<-pchisq(wald2, df=3, lower=F)
> pval2b
     [,1]
[1,]
Local Tests for Each Variable (adjusted for disease type)

Variable       df   Wald χ2   p       LRT χ2   p
Age            3    12.01     0.007   10.13    0.017
Gender              1.91      0.595   1.85     0.605
FAB class      1    8.10      0.004   8.29
CMV Status          0.19      0.980   0.18
Waiting Time        1.18      0.278   1.34     0.248
MTX                 2.02      0.155   1.94     0.164
Hospital            9.37      0.025   9.05     0.029
"Best" Model Based on p-values
> ### Fitting full model using p-values (forward step-wise)
> r1<-coxph(st~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge, data=bmt)
> lrt<-2*(r1$loglik[2]-m3$loglik[2])
> p1a<-pchisq(lrt, df=3, lower=F)
> wald1<-r1$coef[4:6]%*%solve(r1$var[4:6,4:6])%*%r1$coef[4:6]
> p1b<-pchisq(wald1, df=3, lower=F)
> p1a
[1]
> p1b
     [,1]
[1,]
...
> r3<-coxph(st~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge +
+           factor(Hosp) + TTTrans, data=bmt)
> lrt<-2*(r3$loglik[2]-r2$loglik[2])
> p3a<-pchisq(lrt, df=1, lower=F)
> p3a
[1]
Best Model via P-values

Variable                 Coef     exp(Coef)  SE(Coef)  Lower CI  Upper CI  p
AML Low Risk             -0.776   0.460      0.364     0.226     0.939     0.033
AML High Risk            -0.238   0.788      0.358     0.391     1.589     0.506
FAB                       0.834   2.303      0.282     1.325     4.004     0.003
Patient Age              -0.098   0.906      0.038     0.842     0.976     0.009
Donor Age                -0.082   0.921      0.030     0.868     0.977     0.006
Pt Age x Don Age          0.777   2.175      0.339     1.119     4.231     0.022
Hospital (Alfred)        -0.277   0.758      0.337     0.392     1.467     0.411
Hospital (St. Vincent)   -0.888   0.420                0.181     0.938     0.035
Hospital (Hahnemann)      0.004   1.004      0.0001    1.001     1.005     <0.001
"Best" Model Based on AIC
> r1<-coxph(st~factor(Disease), data=bmt)
> aic1<- -2*r1$loglik[2]+2*2
> aic1
[1] 737.14
> r2<-coxph(st~factor(Disease) + FAB, data=bmt)
> aic2<- -2*r2$loglik[2]+2*3
> aic2
[1] 730.85
...
> r5<-coxph(st~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge +
+           factor(Hosp) + TTTrans, data=bmt)
> aic5<- -2*r5$loglik[2]+2*10
> aic5
[1] 720.62
Best Model via AIC

Model                                                                          AIC
Disease Type                                                                   737.14
Disease Type + FAB                                                             730.85
Disease Type + FAB + PtAge + DonAge + PtAge*DonAge                             725.79
Disease Type + FAB + PtAge + DonAge + PtAge*DonAge + Hospital                  719.58
Disease Type + FAB + PtAge + DonAge + PtAge*DonAge + Hospital + Waiting Time   720.62
Models via Penalized Regression
First we have to normalize the continuous predictors:
> ### Fitting model for Disease while controlling for other factors using penalized regression
> library(penalized)
> ### Scaling and centering the continuous covariates
> Disease<-bmt$Disease
> PtAge<-scale(bmt$PtAge)
> DonAge<-scale(bmt$DonAge)
> TTTrans<-scale(bmt$TTTrans)
> FAB<-bmt$FAB
> MTX<-bmt$MTX
> PtSex<-bmt$PtSex
> DonSex<-bmt$DonSex
> Hosp<-bmt$Hosp
> sbmt<-cbind(DFS, event, Disease, FAB, PtAge, DonAge, TTTrans, MTX, PtSex, DonSex, Hosp)
> colnames(sbmt)[5:7]<-c("PtAge", "DonAge", "TTTrans")
> sbmt<-as.data.frame(sbmt)
Full Model (no penalty)
> fullmod<-coxph(st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge + MTX +
+                PtSex + DonSex + PtSex*DonSex + TTTrans + factor(Hosp), data = sbmt)
Warning message:
In coxph(st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + ... :
  X matrix deemed to be singular; variable 11
> fullmod
Call:
coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge +
    MTX + PtSex + DonSex + PtSex * DonSex + TTTrans + factor(Hosp), data = sbmt)

                 coef exp(coef) se(coef)  z p
factor(Disease)2
factor(Disease)3
FAB
PtAge
DonAge
MTX
PtSex
DonSex
TTTrans
factor(Hosp)2
factor(Hosp)3      NA        NA       NA NA
factor(Hosp)4
PtAge:DonAge
PtSex:DonSex

Likelihood ratio test=47.1 on 13 df, p=9.15e-06
n= 137, number of events= 83
Ridge Model
> set.seed(148)
> ridg_pen<-optL2(st, unpenalized = ~factor(Disease), penalized = ~ FAB + PtAge +
+                 DonAge + PtAge*DonAge + MTX + PtSex + DonSex + PtSex*DonSex +
+                 TTTrans + factor(Hosp), fold=5, standardize=FALSE, data=sbmt)
lambda= Inf   cvl=
lambda= 1     cvl=
lambda= 10    cvl=
lambda= 100   cvl=
...
> names(ridg_pen)
[1] "lambda"      "cvl"         "predictions" "fold"        "fullfit"
> ridg_pen$lambda
[1]
> ridg_pen$fullfit
Penalized cox regression object
15 regression coefficients
Loglikelihood =
L2 penalty =  at lambda2 =
> slotNames(ridg_pen$fullfit)
 [1] "penalized"   "unpenalized" "residuals"   "fitted"      "lin.pred"
 [6] "loglik"      "penalty"     "iterations"  "converged"   "model"
[11] "lambda1"     "lambda2"     "nuisance"    "weights"     "formula"
Ridge Model
> round(coefficients(ridg_pen$fullfit, "unpenalized"), 4)
factor(Disease)2 factor(Disease)3

> round(coefficients(ridg_pen$fullfit, "penalized"), 4)
FAB PtAge DonAge MTX PtSex DonSex TTTrans factor(Hosp)1 factor(Hosp)2 factor(Hosp)3 factor(Hosp)4 PtAge:DonAge PtSex:DonSex
Lasso Model
> set.seed(148)
> lass_pen<-optL1(st, unpenalized = ~factor(Disease), penalized = ~ FAB + PtAge +
+                 DonAge + PtAge*DonAge + MTX + PtSex + DonSex + PtSex*DonSex +
+                 TTTrans + factor(Hosp), fold=5, standardize=FALSE, data=sbmt)
lambda=   cvl=
...
> lass_pen$fullfit
Penalized cox regression object
15 regression coefficients of which 4 are non-zero
Loglikelihood =
L1 penalty =  at lambda1 =
> round(coefficients(lass_pen$fullfit, "penalized"), 4)
FAB PtAge DonAge MTX PtSex DonSex TTTrans factor(Hosp)1 factor(Hosp)2 factor(Hosp)3 factor(Hosp)4 PtAge:DonAge PtSex:DonSex
> round(coefficients(lass_pen$fullfit, "unpenalized"), 4)
factor(Disease)2 factor(Disease)3
Lambda vs. cvl
> lass_pen$lambda
[1]
> set.seed(148)
> fl<-profL1(st, unpenalized = ~factor(Disease), penalized = ~ FAB + PtAge + DonAge +
+            PtAge*DonAge + MTX + PtSex + DonSex + PtSex*DonSex + TTTrans +
+            factor(Hosp), minlambda1=0.1, maxlambda1=25, fold=5, data=bmt,
+            log=FALSE, plot=TRUE)
Elastic Net Model 1
> set.seed(148)
> en_pen<-optL2(st, unpenalized = ~factor(Disease), penalized = ~ FAB + PtAge +
+               DonAge + PtAge*DonAge + MTX + PtSex + DonSex + PtSex*DonSex +
+               TTTrans + factor(Hosp), lambda1=lass_pen$lambda, fold=5,
+               standardize=FALSE, data=sbmt)
lambda= Inf   cvl=
lambda= 1     cvl=
lambda= 10    cvl=
lambda= 0.1   cvl=
lambda= 0.01  cvl=
...
> round(coefficients(en_pen$fullfit, "penalized"), 4)
FAB PtAge DonAge MTX PtSex DonSex TTTrans factor(Hosp)1 factor(Hosp)2 factor(Hosp)3 factor(Hosp)4 PtAge:DonAge PtSex:DonSex
> round(coefficients(en_pen$fullfit, "unpenalized"), 4)
factor(Disease)2 factor(Disease)3
Elastic Net Model 2
> set.seed(148)
> en_pen1<-optL1(st, unpenalized = ~factor(Disease), penalized = ~ FAB + PtAge +
+                DonAge + PtAge*DonAge + MTX + PtSex + DonSex + PtSex*DonSex +
+                TTTrans + factor(Hosp), lambda2=ridg_pen$lambda, fold=5,
+                standardize=FALSE, data=sbmt)
lambda=   cvl=
...
lambda= e-08  cvl=
lambda= e-08  cvl=
> round(coefficients(en_pen1$fullfit, "penalized"), 4)
FAB PtAge DonAge MTX PtSex DonSex TTTrans factor(Hosp)1 factor(Hosp)2 factor(Hosp)3 factor(Hosp)4 PtAge:DonAge PtSex:DonSex
> round(coefficients(en_pen1$fullfit, "unpenalized"), 4)
factor(Disease)2 factor(Disease)3
Elastic Net Model 3
> en_pen<-penalized(st, unpenalized = ~factor(Disease), penalized = ~ FAB + PtAge +
+                   DonAge + PtAge*DonAge + MTX + PtSex + DonSex + PtSex*DonSex +
+                   TTTrans + factor(Hosp), lambda1=lass_pen$lambda,
+                   lambda2=ridg_pen$lambda, standardize=FALSE, data=sbmt)
# non-zero coefficients: 4
> round(coefficients(en_pen, "penalized"), 4)
FAB PtAge DonAge MTX PtSex DonSex TTTrans factor(Hosp)1 factor(Hosp)2 factor(Hosp)3 factor(Hosp)4 PtAge:DonAge PtSex:DonSex
> round(coefficients(en_pen, "unpenalized"), 4)
factor(Disease)2 factor(Disease)3
Comparison of Coefficients
(EN (λ1): elastic net with λ1 from the lasso fit; EN (λ2): with λ2 from the ridge fit; EN (both): both)

Variable              Full     Ridge    Lasso    EN (λ1)   EN (λ2)   EN (both)
AML Low Risk          -0.842   -0.755   -0.678   -0.677    -0.637
AML High Risk         -0.307    0.012    0.177    0.178     0.246
FAB                    0.812    0.284    0.118    0.043
Patient Age           -0.011    0.027
Donor Age              0.199    0.053
MTX                   -0.275    0.117
Patient Sex           -0.045   -0.074
Donor Sex              0.142   -0.008
Time To Transplant    -0.112   -0.075
Hospital (OSU)         NA       0.071
Hospital (Alfred)      0.962    0.200
Hospital (St. Vinc)            -0.083
Hospital (Hahn)       -1.006   -0.187
P.Age x D.Age          0.348    0.258    0.227    0.187
P.Sex x D.Sex         -0.291   -0.100
Unpenalized Model
- Penalized regression yields estimates of the regression coefficients that are biased toward zero
- We can refit an unpenalized model using the variables selected by penalization:
> refit<-coxph(st~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge, data=bmt)
> round(summary(refit)$coef, 4)
                 coef exp(coef) se(coef) z Pr(>|z|)
factor(Disease)2
factor(Disease)3
FAB
PtAge
DonAge
PtAge:DonAge
Identifying "Best" Predictive Model
- P-values? Not useful for determining which model predicts best
- Why? Covariates with significant p-values can add more "noise" than signal to predictive performance
- Other approaches:
  - R2-type statistics
  - "Absolute prediction error"
  - C-index / Somers' Dxy (similar to AUC)
R2 Approach
- Due to censoring, the measure is not as straightforward as in linear regression
- Think of it as the fraction of the log-likelihood explained by the model, relative to the log-likelihood of a perfect model
- Penalized for the complexity of the model
R2 Calculations
- An early proposal estimates R2 directly from the model likelihoods
- Later, Maddala and Magee proposed the alternative R2 = 1 - exp(-G2/n), where G2 is the likelihood-ratio statistic; this is the version output by R's coxph
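A quick check of the Maddala/Magee formula (a sketch; the null log-likelihood below is a toy value chosen so that the likelihood-ratio statistic equals the 13.4 seen in the univariate disease-type fit earlier, not a number taken from the actual data):

```python
import math

def lr_r2(loglik_model, loglik_null, n):
    # Maddala/Magee-style R^2 = 1 - exp(-G^2 / n),
    # where G^2 = 2 * (loglik_model - loglik_null) is the LR statistic
    g2 = 2 * (loglik_model - loglik_null)
    return 1 - math.exp(-g2 / n)

# G^2 = 13.4 on n = 137 subjects
print(round(lr_r2(-366.57, -373.27, 137), 4))  # 0.0932
```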
R2 Approach
- The measure is fairly good if the proportion of censored observations is small
- It is sensitive to the proportion of censored observations: E(R2) decreases substantially as the percentage of censored observations grows
- Early censoring has a greater impact than later censoring
- R2 values can decrease by as much as 20% under heavy censoring (> 50% censored), e.g. R2 dropping from 0.5 to 0.4
Absolute Prediction Error
- Expected value of the absolute difference between observed and predicted responses
- More interpretable than alternatives (e.g. those based on squared differences rather than absolute values)
- Highlights the critical difference between association and prediction
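A sketch of the estimator on fully observed responses (with censoring, the average would in practice be restricted to observed events or reweighted; the numbers here are toy values):

```python
import numpy as np

def mean_absolute_error(observed, predicted):
    # Empirical estimate of E|T - T_hat| over fully observed responses
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted)))

print(mean_absolute_error([10, 20, 30], [12, 18, 35]))  # 3.0
```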
C-Index
- Proportion of all pairs of subjects whose survival times can be ordered, such that the subject with the higher predicted survival is also the one whose observed survival is longer
- Can also be interpreted as the probability of concordance between observed and predicted orderings
- Relationship to Somers' D: Dxy = 2C - 1
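A direct O(n²) sketch of Harrell's concordance on toy data (the times, events, and risk scores are made up; a pair is usable only when the earlier observed time is an event, and ties in risk count one half):

```python
def c_index(times, events, risk):
    # Among usable pairs (earlier time is an observed event), count pairs
    # where the earlier failure has the higher risk score.
    conc, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if risk[i] > risk[j]:
                    conc += 1.0
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / usable if usable else float("nan")

times  = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk   = [0.9, 0.7, 0.4, 0.2]   # risk perfectly anti-ordered with time
print(c_index(times, events, risk))  # 1.0
```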
Predictive Model: R2
> ### Building a predictive model using R^2
> p1<-coxph(st~factor(Disease), data=bmt)
> summary(p1)$rsq[1]
rsq
> p2<-coxph(st~factor(Disease) + FAB, data=bmt)
> p3<-coxph(st~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge, data=bmt)
> p4<-coxph(st~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge + factor(Hosp), data=bmt)
> p5<-coxph(st~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge + factor(Hosp) + MTX, data=bmt)
> p6<-coxph(st~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge + factor(Hosp) + MTX +
+           PtSex + DonSex + PtSex*DonSex, data=bmt)
> p7<-coxph(st~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge + factor(Hosp) + MTX +
+           PtSex + DonSex + PtSex*DonSex + TTTrans, data=bmt)
> R2<-c(summary(p1)$rsq[1], summary(p2)$rsq[1], summary(p3)$rsq[1], summary(p4)$rsq[1],
+       summary(p5)$rsq[1], summary(p6)$rsq[1], summary(p7)$rsq[1])
> mod<-1:7
> plot(mod, R2, xlab="Model", ylab="R^2", type="l", ylim=c(0,.3))
> points(mod, R2, pch=16, col=2)
Comparing R2 for Each Model
(figure: R2 plotted against models 1-7 from the previous slide)
Predictive Model Validation
- Ideally we should validate our choice of predictive model; there are several options
- Training/test split: fit the model on a single training set and evaluate performance on the test set
- Cross-validation: divide the data into k equally sized sets; fit the model with the kth set removed; evaluate performance on the left-out set and average over the k sets
- Bootstrap validation: same idea as cross-validation, but using bootstrapped samples
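The k-fold partitioning step can be sketched as follows (Python rather than R; the fold count, sample size, and seed mirror the example but are otherwise arbitrary):

```python
import random

def kfold_indices(n, k, seed=148):
    # Randomly partition indices 0..n-1 into k roughly equal folds
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(137, 5)
print(sorted(len(f) for f in folds))  # five folds of size 27 or 28

# Each round: fit the model on all folds but one, score the held-out fold
for held_out in range(5):
    test = set(folds[held_out])
    train = [i for i in range(137) if i not in test]
    # fit on `train`, evaluate on `test` here
```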
rms Package
> ### Cross-validation / bootstrapping
> library(rms)
> bmt$Disease<-as.factor(bmt$Disease); bmt$Hosp<-as.factor(bmt$Hosp)
> new.r5<-cph(st~Disease + FAB + PtAge + DonAge + PtAge*DonAge + Hosp + TTTrans,
+             data=bmt, x=TRUE, y=TRUE, surv=TRUE)
> v.cv<-validate(new.r5, method="crossvalidation", B=5)
> v.cv
      index.orig training test optimism index.corrected n
R
Slope
D
U
Q
g
> v.bs<-validate(new.r5, method="boot", B=100)
> v.bs
      index.orig training test optimism index.corrected n
R
Slope
D
U
Q
g
Next Time: Time-varying covariates