Lecture 3: Parametric Survival Modeling
- Parametric models
- Examples and nuances in R
Parametric Distributions
- We've discussed a variety of parametric distributions: exponential, Weibull, log-normal, log-logistic, gamma, ...
- But how do we "fit" a model?
  - Model parameterizations
  - Inclusion of coefficients
Modeling a Homogeneous Population
- Relatively "simple"
- Once we've determined the distribution, we need to estimate its parameters
- For example, for the exponential we only need to estimate the rate $\lambda$
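For the exponential case the estimation step has a closed form. The block below is a sketch under stated assumptions: the data are right-censored pairs $(t_i, \delta_i)$, with $\delta_i = 1$ for an observed death and $\delta_i = 0$ for a censored time. This notation is introduced here for illustration and does not come from the slides.

```latex
\begin{align*}
L(\lambda) &= \prod_{i=1}^{n} \left[\lambda e^{-\lambda t_i}\right]^{\delta_i}
              \left[e^{-\lambda t_i}\right]^{1-\delta_i}
            = \lambda^{d} \exp\Big(-\lambda \sum_{i=1}^{n} t_i\Big),
            \qquad d = \sum_{i=1}^{n} \delta_i, \\
\hat\lambda &= \frac{d}{\sum_{i=1}^{n} t_i}
\end{align*}
```

In words, the estimated rate is the number of observed deaths divided by the total follow-up time.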
Covariates
- We frequently want to adjust survival for covariates
- Two main approaches:
  - Accelerated failure time (AFT) model
  - Multiplicative model
Accelerated Failure Time
- Under the AFT model for two populations, the expected survival time and the median survival time of population 1 are each c times those of population 2, where c is a constant
- Equivalently, survival times in population 1 are c times those in population 2
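The two-population statement can be written out as follows. This is a sketch that assumes the convention in which population 1 times are the stretched ones; some texts write $S_1(t) = S_2(ct)$ instead, which swaps the roles of the two populations.

```latex
\begin{align*}
S_1(t) &= S_2(t / c) \quad \text{for all } t \ge 0, \\
E[T_1] &= c \, E[T_2], \\
\operatorname{median}(T_1) &= c \, \operatorname{median}(T_2)
\end{align*}
```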
Accelerated Failure Time
- Data include:
  - Failure time T > 0
  - Vector of covariates Z' = (Z1, Z2, ..., Zp), which may be quantitative or qualitative
- Log transform T for a linear model approach
Accelerated Failure Time
- When Z = 0, $S_0(t)$ is the survival function of $e^{\mu + \sigma W}$
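The log-linear form behind these two slides can be sketched as below. The coefficient symbol $\gamma$ is an assumption made here for illustration; $\mu$, $\sigma$ and $W$ are the intercept, scale and error term that appear in $e^{\mu + \sigma W}$.

```latex
\begin{align*}
Y = \ln T &= \mu + \gamma' Z + \sigma W, \\
T &= e^{\gamma' Z} \, e^{\mu + \sigma W}, \\
S(t \mid Z) &= S_0\!\left(t \, e^{-\gamma' Z}\right)
\end{align*}
```

The covariates therefore rescale the time axis of the baseline survival function $S_0$ rather than acting on the hazard directly.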
Accelerated Failure Time
- First consider two populations that differ by only 1 unit in $z_k$
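A one-unit difference in a single covariate can be worked through as follows. This sketch assumes the log-linear form $\ln T = \mu + \gamma' Z + \sigma W$, with $\gamma_k$ the coefficient of $z_k$ (symbols chosen here for illustration) and the same error term $W$ in both populations.

```latex
\begin{align*}
\ln T_{(z_k + 1)} - \ln T_{(z_k)} &= \gamma_k
  \quad \Longrightarrow \quad
  T_{(z_k + 1)} = e^{\gamma_k} \, T_{(z_k)}, \\
\frac{\operatorname{median}(T \mid z_k + 1)}{\operatorname{median}(T \mid z_k)} &= e^{\gamma_k}
\end{align*}
```

So $e^{\gamma_k}$ plays the role of the constant c from the two-population statement: a value above 1 stretches survival times, a value below 1 shortens them.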
Exponential Models in R
- Recall the exponential distribution with hazard rate $\lambda$
- R uses the same parameterization: rexp(n, rate), with rate = $\lambda$
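A quick way to confirm the parameterization is to compare R's built-in exponential functions with the formula $S(t) = e^{-\lambda t}$. This is a minimal sketch in base R; the rate and time points are arbitrary illustrative values.

```r
# Check that pexp()/rexp() use the rate (hazard) parameterization.
rate <- 0.05                 # illustrative hazard rate (an assumption)
t    <- c(1, 10, 50)         # illustrative time points

surv_r      <- pexp(t, rate = rate, lower.tail = FALSE)  # R's survival function
surv_manual <- exp(-rate * t)                            # S(t) = exp(-lambda * t)
print(all.equal(surv_r, surv_manual))

set.seed(1)
x <- rexp(1e5, rate = rate)  # simulated exponential survival times
print(mean(x))               # should be close to 1 / rate
```

The simulated mean should land near 1/rate, which is the mean of an exponential with hazard rate $\lambda$.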
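Exponential Models in R
- We can fit an exponential survival model in R using survreg(formula, data, dist)
- R gives us the intercept $\mu$ (and any covariate coefficients) on the log-time scale
- But we can find the hazard rate from it. In a model with no covariates, $\hat\lambda = \exp(-\hat\mu)$
Exponential Models in R
- The distribution of any T is exponential with constant hazard rate $\lambda(z) = \exp\{-(\mu + \beta' z)\}$, where $\beta$ are the survreg coefficients
- We can interpret $\exp(-\beta_k)$ as the hazard ratio corresponding to a 1 unit increase in the covariate $z_k$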
Weibull Models in R
- Recall the Weibull distribution: now our scale parameter is no longer 1
- Unlike the exponential, the parameterization for the Weibull is different in R
- Random Weibull generation: rweibull(n, shape, scale)
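The difference in parameterization can be checked numerically. This sketch assumes the lecture's Weibull survival function is $S(t) = \exp(-\lambda t^{\alpha})$, which is the form used in the model-check code later in these slides; the parameter values are arbitrary illustrative choices.

```r
# R's Weibull: S(t) = exp(-(t / scale)^shape)
# Lecture form (assumed): S(t) = exp(-lambda * t^alpha)
alpha  <- 1.5                      # illustrative shape (an assumption)
lambda <- 0.01                     # illustrative rate (an assumption)

# Matching the two forms gives shape = alpha and scale = lambda^(-1 / alpha)
r_scale <- lambda^(-1 / alpha)

t <- c(1, 10, 50)
surv_r      <- pweibull(t, shape = alpha, scale = r_scale, lower.tail = FALSE)
surv_manual <- exp(-lambda * t^alpha)
print(all.equal(surv_r, surv_manual))
```

The same conversion applies to rweibull(n, shape, scale) when simulating survival times for a given $\lambda$ and $\alpha$.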
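Weibull Models in R
- Again we can fit a Weibull model in R, but the parameterization is different here too: survreg(formula, data, dist)
- R gives us the intercept $\mu$ and the scale $\sigma$ on the log-time scale
- But we can find $\hat\alpha = 1/\hat\sigma$ and $\hat\lambda = \exp(-\hat\mu/\hat\sigma)$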
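The conversion from survreg output to $(\lambda, \alpha)$ follows from the log-linear form. This is a sketch assuming no covariates, $\ln T = \mu + \sigma W$ with $W$ following the standard (minimum) extreme value distribution, $P(W > w) = \exp(-e^{w})$, and the lecture's Weibull form $S(t) = \exp(-\lambda t^{\alpha})$.

```latex
\begin{align*}
S(t) &= P\!\left(W > \frac{\ln t - \mu}{\sigma}\right)
      = \exp\!\left\{-\exp\!\left(\frac{\ln t - \mu}{\sigma}\right)\right\}
      = \exp\!\left\{-e^{-\mu/\sigma} \, t^{1/\sigma}\right\}, \\
\alpha &= \frac{1}{\sigma}, \qquad \lambda = e^{-\mu/\sigma}
\end{align*}
```

Setting $\sigma = 1$ recovers the exponential case, $\lambda = e^{-\mu}$.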
AML Example in R
- Survival in patients with Acute Myelogenous Leukemia
- Data:
  - 23 subjects
  - Time to death
  - Censoring indicator
  - Treatment:
    - Standard course of chemotherapy
    - Chemo extended ('maintenance') for additional cycles
AML Dataset
> library(survival)
> aml
   time status             x
1     9      1    Maintained
2    13      1    Maintained
3    13      0    Maintained
4    18      1    Maintained
5    23      1    Maintained
6    28      0    Maintained
7    31      1    Maintained
8    34      1    Maintained
9    45      0    Maintained
10   48      1    Maintained
11  161      0    Maintained
12    5      1 Nonmaintained
13    5      1 Nonmaintained
14    8      1 Nonmaintained
15    8      1 Nonmaintained
16   12      1 Nonmaintained
…
23   45      1 Nonmaintained
AML model in R: exponential (no covariates)
> library(MASS)
> library(survival)
> sdat <- Surv(aml$time, aml$status)
> exp_fit <- survreg(sdat ~ 1, dist = "exponential")
> # Alternative: exp_fit <- survreg(sdat ~ 1, dist = "weibull", scale = 1)
> summary(exp_fit)
Call:
survreg(formula = sdat ~ 1, dist = "exponential")
            Value Std. Error    z        p
(Intercept)  3.63      0.236 15.4 1.75e-53
Scale fixed at 1
Exponential distribution
Loglik(model)= -83.3   Loglik(intercept only)= -83.3
Number of Newton-Raphson Iterations: 4
n= 23
Checking Exponential Model Fit
Model Checks: Exponential
### Model checks for exponential
# Kaplan-Meier (empirical) fit used as the reference in all three checks
emp_fit <- survfit(sdat ~ 1)
par(mfrow = c(1, 3))
lam_hat <- exp(-coef(exp_fit))
logHt <- log(-log(emp_fit$surv))
logt <- log(emp_fit$time)
# First model check: plot log cumulative hazard vs. log time
plot(logt, logHt, lwd = 2, type = "l", xlab = "log(t)", ylab = "log(H(t))")
points(logt, logHt, pch = 16)
abline(log(lam_hat), 1, lwd = 2, col = "red")
# Second model check: plot of H(t) vs. t
Ht <- -log(emp_fit$surv)
t <- emp_fit$time
plot(t, Ht, lwd = 2, type = "l", xlab = "time", ylab = "H(t)")
points(t, Ht, pch = 16)
abline(0, lam_hat, lwd = 2, col = "red")
# Third model check: fitted survival curve over the empirical curve
fit.dat <- exp(-lam_hat * c(0:150))
plot(emp_fit, xlab = "Time", ylab = "Survival Fraction")
lines(c(0:150), fit.dat, lwd = 2, col = 2)
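The red reference lines in the first two checks come straight from the exponential cumulative hazard. A short sketch of the reasoning, using $H(t) = -\ln S(t)$:

```latex
\begin{align*}
S(t) = e^{-\lambda t}
  \quad &\Longrightarrow \quad H(t) = \lambda t
  && \text{(line through the origin with slope } \lambda\text{)}, \\
  &\Longrightarrow \quad \ln H(t) = \ln \lambda + \ln t
  && \text{(intercept } \ln \lambda \text{, slope 1)}
\end{align*}
```

If the empirical points bend away from these lines, that is evidence against a constant hazard.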
Let's Look at Some Specifics for the Exponential Model
- First estimate lambda
- 12 month survival?
- Median survival?
- Mean survival?
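These quantities can be computed from the intercept-only exponential fit. This is a sketch that refits the model on the aml data from the survival package; the object names are chosen here for illustration.

```r
library(survival)

# Intercept-only exponential fit to the AML data
fit <- survreg(Surv(time, status) ~ 1, data = aml, dist = "exponential")

lam_hat   <- unname(exp(-coef(fit)))   # hazard rate: lambda = exp(-mu)
s12       <- exp(-lam_hat * 12)        # S(12) = exp(-lambda * 12)
med_surv  <- log(2) / lam_hat          # median = log(2) / lambda
mean_surv <- 1 / lam_hat               # mean = 1 / lambda

print(round(c(lambda = lam_hat, S12 = s12, median = med_surv, mean = mean_surv), 3))
```

The median and mean differ only by the factor log(2), a consequence of the constant hazard.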
AML model in R: Weibull (no covariates)
> weib_fit <- survreg(sdat ~ 1, dist = "weibull", scale = 0)
> summary(weib_fit)
Call:
survreg(formula = sdat ~ 1, dist = "weibull", scale = 0)
              Value Std. Error      z        p
(Intercept)  3.6425      0.217 16.780 3.43e-63
Log(scale)  -0.0922      0.169 -0.544 5.86e-01
Scale= 0.912
Weibull distribution
Loglik(model)= -83.2   Loglik(intercept only)= -83.2
Number of Newton-Raphson Iterations: 5
n= 23
Model Checks: Weibull
Model Checks: Weibull
### Model checks for Weibull
# Kaplan-Meier (empirical) fit used as the reference
emp_fit <- survfit(sdat ~ 1)
# weib_fit$scale is already sigma (0.912 here), not log(sigma), so it is not exponentiated
alp_hat <- 1 / weib_fit$scale
lam_hat <- exp(-coef(weib_fit)[1] / weib_fit$scale)
logHt <- log(-log(emp_fit$surv))
logt <- log(emp_fit$time)
# Plot log cumulative hazard vs. log time
plot(logt, logHt, lwd = 2, type = "l", xlab = "log(t)", ylab = "log(H(t))")
points(logt, logHt, pch = 16)
abline(log(lam_hat), alp_hat, lwd = 2, col = "red")
# Plot of fitted survival function vs. empirical
fit.dat <- exp(-lam_hat * c(0:150)^alp_hat)
plot(emp_fit, xlab = "Time", ylab = "Survival Fraction")
lines(c(0:150), fit.dat, lwd = 2, col = 2)
Let's Look at Some Specifics for the Weibull Model
- First estimate lambda and alpha
- 12 month survival?
- Median survival?
- Mean survival?
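The same quantities for the Weibull can be sketched as below, assuming the lecture's form $S(t) = \exp(-\lambda t^{\alpha})$ and the conversion $\alpha = 1/\sigma$, $\lambda = \exp(-\mu/\sigma)$ from the survreg output. The model is refit on the aml data from the survival package; the object names are chosen here for illustration.

```r
library(survival)

# Intercept-only Weibull fit to the AML data
fit <- survreg(Surv(time, status) ~ 1, data = aml, dist = "weibull")

mu    <- unname(coef(fit))   # intercept on the log-time scale
sigma <- fit$scale           # survreg scale (sigma itself, not its log)

alp_hat <- 1 / sigma                 # Weibull shape
lam_hat <- exp(-mu / sigma)          # Weibull rate in S(t) = exp(-lambda * t^alpha)

s12       <- exp(-lam_hat * 12^alp_hat)                      # 12 month survival
med_surv  <- (log(2) / lam_hat)^(1 / alp_hat)                # median survival
mean_surv <- lam_hat^(-1 / alp_hat) * gamma(1 + 1 / alp_hat) # mean survival

print(round(c(alpha = alp_hat, lambda = lam_hat, S12 = s12,
              median = med_surv, mean = mean_surv), 3))
```

The mean uses the standard Weibull result $E[T] = \lambda^{-1/\alpha}\,\Gamma(1 + 1/\alpha)$.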
Compare Weibull/Exponential Fits to the Empirical Distribution (no covariates)
Empirical Distribution: What about specific times (no covariates)?
- 12 month survival = 0.74
- Median survival = 27 months
What about relative to the empirical distribution (no covariates)?
- Empirical: 12 month survival = 74%, median survival = 27 months
- Exponential model: 12 month survival = 73%, median survival = 26.1 months, mean survival = 37.7 months
- Weibull model: 12 month survival = 75.5%, median survival = 27.3 months, mean survival = 36.9 months
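The empirical values come from the Kaplan-Meier estimate. A minimal sketch using survfit on the aml data from the survival package, with object names chosen here for illustration:

```r
library(survival)

# Kaplan-Meier estimate for the whole sample (no covariates)
emp_fit <- survfit(Surv(time, status) ~ 1, data = aml)

# Estimated survival probability at t = 12
s12 <- summary(emp_fit, times = 12)$surv

# Median survival: first time the KM curve drops to 0.5 or below
med <- unname(quantile(emp_fit, probs = 0.5, conf.int = FALSE))

print(c(S12 = round(s12, 3), median = med))
```

Because the Kaplan-Meier curve is a step function, the median is always one of the observed event times.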
What about covariates?
AML model in R: exponential (with covariate)
> exp_fit2 <- survreg(sdat ~ x, dist = "exponential", data = aml)
> summary(exp_fit2)
Call:
survreg(formula = sdat ~ x, data = aml, dist = "exponential")
                Value Std. Error     z        p
(Intercept)     4.101      0.378 10.85 1.96e-27
xNonmaintained -0.958      0.483 -1.98 4.75e-02
Scale fixed at 1
Exponential distribution
Loglik(model)= -81.3   Loglik(intercept only)= -83.3
        Chisq= 4.06 on 1 degrees of freedom, p= 0.044
Number of Newton-Raphson Iterations: 4
n= 23
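The treatment coefficient is on the log-time scale, so it can be read either as a time ratio or, after a sign flip, as a hazard ratio. This sketch refits the exponential model on the aml data from the survival package; the object names are chosen here for illustration.

```r
library(survival)

# Exponential AFT model with treatment group as the covariate
fit <- survreg(Surv(time, status) ~ x, data = aml, dist = "exponential")

beta <- unname(coef(fit)["xNonmaintained"])  # log-time scale coefficient

time_ratio   <- exp(beta)    # survival times, Nonmaintained relative to Maintained
hazard_ratio <- exp(-beta)   # hazard, Nonmaintained relative to Maintained

print(round(c(time_ratio = time_ratio, hazard_ratio = hazard_ratio), 3))
```

A negative coefficient means shorter survival times and therefore a hazard ratio above 1 for the non-maintained group.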
Exponential Fit by Group
What about estimates by group?
- Maintained?
- Non-maintained?
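Group-specific estimates follow from adding the treatment coefficient to the intercept before converting to a hazard rate. This sketch refits the exponential model on the aml data from the survival package; the object names are chosen here for illustration.

```r
library(survival)

fit <- survreg(Surv(time, status) ~ x, data = aml, dist = "exponential")
b <- coef(fit)

# Linear predictor (log-time scale) for each treatment group
mu <- c(Maintained    = unname(b[1]),
        Nonmaintained = unname(b[1] + b[2]))

lam <- exp(-mu)   # group-specific hazard rates

est <- rbind(S12    = exp(-lam * 12),   # 12 month survival
             median = log(2) / lam)     # median survival
print(round(est, 3))
```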
AML model in R: Weibull (with covariates)
> weib_fit2 <- survreg(sdat ~ x, dist = "weibull", data = aml, scale = 0)
> summary(weib_fit2)
Call:
survreg(formula = sdat ~ x, data = aml, dist = "weibull", scale = 0)
                Value Std. Error     z        p
(Intercept)     4.109      0.300 13.70 9.89e-43
xNonmaintained -0.929      0.383 -2.43 1.51e-02
Log(scale)     -0.235      0.178 -1.32 1.88e-01
Scale= 0.791
Weibull distribution
Loglik(model)= -80.5   Loglik(intercept only)= -83.2
        Chisq= 5.31 on 1 degrees of freedom, p= 0.021
Number of Newton-Raphson Iterations: 5
n= 23
Weibull fit
What about estimates by group?
- Maintained?
- Non-maintained?
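For the Weibull model the group estimates use the same shape for both groups and a group-specific rate. This sketch assumes the lecture's form $S(t) = \exp(-\lambda t^{\alpha})$ with $\alpha = 1/\sigma$ and $\lambda = \exp(-\mu/\sigma)$, and refits the model on the aml data from the survival package; the object names are chosen here for illustration.

```r
library(survival)

fit <- survreg(Surv(time, status) ~ x, data = aml, dist = "weibull")
b     <- coef(fit)
sigma <- fit$scale        # common scale on the log-time scale
alpha <- 1 / sigma        # common Weibull shape

# Linear predictor (log-time scale) for each treatment group
mu <- c(Maintained    = unname(b[1]),
        Nonmaintained = unname(b[1] + b[2]))

lam <- exp(-mu / sigma)   # group-specific Weibull rates

est <- rbind(S12    = exp(-lam * 12^alpha),          # 12 month survival
             median = (log(2) / lam)^(1 / alpha))    # median survival
print(round(est, 3))
```

Because the shape is shared, the ratio of the two medians equals the time ratio exp(coefficient), just as in the exponential model.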
Exponential and Weibull Fits for Maintained vs. Non-maintained
- Red = Weibull
- Blue = exponential
Empirical Distribution: What about specific survival times (with covariate)?
- Maintained: 12 month survival = 91%, median survival = 31 months
- Non-maintained: 12 month survival = 58%, median survival = 23 months
Comparisons
Maintained
- Empirical: 12 month survival = 91%, median survival = 31 months
- Exponential model: 12 month survival = 82%, median survival = 41.9 months
- Weibull model: 12 month survival = 88%, median survival = 45.9 months
Non-maintained
- Empirical: 12 month survival = 58%, median survival = 23 months
- Exponential model: 12 month survival = 60%, median survival = 16.1 months
- Weibull model: 12 month survival = 66%, median survival = 18 months
Compare Exponential & Empirical Distribution (with covariates)
Compare Weibull & Empirical Distribution (with covariates)
Multiplicative Hazard Rate Models
- The hazard rate of an individual with covariate vector z is $h(t \mid z) = h_0(t)\, c(\beta' z)$
- In these models $h_0(t)$ may be parametric or an arbitrary non-negative function
- The most common link function was proposed by Cox: $c(\beta' z) = \exp(\beta' z)$
Multiplicative Hazard Rate Models Key feature is proportional hazards
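The proportional hazards property can be written out directly. This is a sketch assuming the exponential link, with $z_1$ and $z_2$ denoting the covariate vectors of two individuals (notation introduced here for illustration).

```latex
\begin{align*}
h(t \mid z) &= h_0(t) \, \exp(\beta' z), \\
\frac{h(t \mid z_1)}{h(t \mid z_2)}
  &= \frac{h_0(t) \exp(\beta' z_1)}{h_0(t) \exp(\beta' z_2)}
   = \exp\{\beta' (z_1 - z_2)\}
\end{align*}
```

The baseline hazard cancels, so the ratio does not depend on t: the two hazard curves are proportional at every time point.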
Multiplicative Hazard Rate Model
- These parametric models are very similar to the semi-parametric Cox proportional hazards models we will discuss later
- The AFT models using the exponential/Weibull are also classified as multiplicative models because of their proportional hazards property
- This is not true for any other parametric distribution
- Since Cox models are so commonly used, it is rare to see a parametric implementation of these models
Advantages of Parametric Models
- If we correctly characterize the underlying distribution, our estimates will be more precise than semi- and non-parametric estimates
- This means we may have greater power to identify relationships between our outcome and predictors
- However...
Disadvantages of Parametric Models
- If we use the wrong distribution, problems can arise
- The distribution is often chosen based on the shape of the model without covariates
  - This can/will change as covariates are added
- Alternatively, use intuition/theory about what the time-dependency is expected to be
  - BUT the time-dependency is what is left over after conditioning on covariates, so we are also likely to fail here
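Why the Weibull AFT model is also a proportional hazards model can be sketched as follows. The derivation assumes the survreg-style form $\ln T = \mu + \gamma' z + \sigma W$ with $W$ standard extreme value, and writes $\alpha = 1/\sigma$ and $\lambda = e^{-\mu/\sigma}$; the symbol $\gamma$ for the AFT coefficients is chosen here for illustration.

```latex
\begin{align*}
S(t \mid z) &= \exp\!\left\{-\lambda \, e^{-\gamma' z / \sigma} \, t^{\alpha}\right\}, \\
h(t \mid z) &= \underbrace{\alpha \lambda t^{\alpha - 1}}_{h_0(t)} \,
               \exp\!\left(-\gamma' z / \sigma\right)
             = h_0(t) \exp(\beta' z),
  \qquad \beta = -\gamma / \sigma
\end{align*}
```

For the Weibull AML fit with treatment as covariate, $-(-0.929)/0.791 \approx 1.17$, which corresponds to a hazard ratio of about 3.2 for the non-maintained group. With $\sigma = 1$ this reduces to the exponential rule $\beta = -\gamma$.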
Brief SAS Code
/************************************/
/* Accelerated Failure Time Models */
/* Exponential models: first is intercept only, second includes the covariate */
proc lifereg data=aml;
  model time*status(0) = / dist=exponential;
run;
proc lifereg data=aml;
  class x;
  model time*status(0) = x / dist=exponential;
run;
Brief SAS Code
/************************************/
/* Accelerated Failure Time Models */
/* Weibull models: first is intercept only, second includes the covariate */
proc lifereg data=aml;
  model time*status(0) = / dist=weibull;
run;
proc lifereg data=aml;
  class x;
  model time*status(0) = x / dist=weibull;
run;
Example of SAS Output
Analysis of Maximum Likelihood Parameter Estimates
Parameter      DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept       1    3.6288          0.2357     3.1668      4.0907      237.02      <.0001
Scale                1.0000          0.0000
Weibull Scale       37.6667          8.8781    23.7316     59.7843
Weibull Shape
Lagrange Multiplier Statistics
Parameter  Chi-Square  Pr > ChiSq
Scale          0.3305      0.5654
Next Time Likelihoods!!!