Lecture 3: Parametric Survival Modeling


Outline: Parametric models. Example and nuances in R.

Parametric Distributions: We've discussed a variety of parametric distributions: exponential, Weibull, log-normal, log-logistic, gamma, … But how do we "fit" a model? Two issues: model parameterizations and the inclusion of coefficients.

Modeling a Homogeneous Population: Relatively "simple." Once we've determined the distribution, we need to estimate its parameters. For example, the exponential has a single rate parameter to estimate.

Covariates: We frequently want to adjust survival for covariates. There are two main approaches: the accelerated failure time (AFT) model and the multiplicative model.

Accelerated Failure Time: Under the AFT model for two populations, the expected survival time and the median survival time for Population 1 are c times those of Population 2, where c is a constant. Survival at time t in one population equals survival in the other at a time rescaled by c.
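One way to make the scaling statement concrete, assuming the failure times in the two populations are related by T_1 = c T_2 (the symbols T_1, T_2, S_1 and S_2 are introduced here for illustration and are not on the slide):

```latex
\begin{aligned}
T_1 &= c\,T_2, \qquad c > 0 \\
S_1(t) &= P(T_1 > t) = P(T_2 > t/c) = S_2(t/c) \\
E[T_1] &= c\,E[T_2], \qquad \operatorname{median}(T_1) = c\,\operatorname{median}(T_2)
\end{aligned}
```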

Accelerated Failure Time: The data include a failure time T > 0 and a vector of covariates Z' = (Z1, Z2, …, Zp), which may be quantitative or qualitative. We log-transform T to take a linear model approach.

Accelerated Failure Time: When Z = 0, S0(t) is the survival function of exp(μ + σW).
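The log-linear form behind this slide is usually written as below. This is a sketch of the standard AFT setup: the coefficient vector γ is a label assumed here, while μ, σ, W and S_0 follow the slide.

```latex
\begin{aligned}
Y = \log T &= \mu + \gamma' Z + \sigma W \\
S(t \mid Z) &= P\!\left(e^{\mu + \sigma W} > t\,e^{-\gamma' Z}\right) = S_0\!\left(t\,e^{-\gamma' Z}\right)
\end{aligned}
```

Setting Z = 0 in the second line recovers S_0(t) as the survival function of exp(μ + σW).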

Accelerated Failure Time: First consider two populations that differ only by 1 unit in z_k.
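A sketch of what the one-unit comparison implies under a log-linear AFT model, holding the error term W and all other covariates fixed (γ_k, the coefficient on z_k, is notation assumed here):

```latex
\begin{aligned}
\log T(z_k + 1) - \log T(z_k) &= \gamma_k \\
T(z_k + 1) &= e^{\gamma_k}\, T(z_k) \\
\operatorname{median}(T \mid z_k + 1) &= e^{\gamma_k}\, \operatorname{median}(T \mid z_k)
\end{aligned}
```

So exp(γ_k) plays the role of the constant c: values above 1 stretch survival time, values below 1 shorten it.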

Exponential Models in R: Recall the exponential distribution with rate lambda. The parameterization is the same in R: rexp(n, rate).
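A small check of that claim, using only base R. The rate value is an arbitrary illustration, not a number from the lecture; the sketch compares the formulas S(t) = exp(-lambda t) and median = log(2)/lambda against R's own exponential functions.

```r
# Illustrative rate (an assumed value, not from the lecture)
lam <- 0.05
t <- c(1, 12, 30)

# Survival function from the formula S(t) = exp(-lambda * t)
surv_formula <- exp(-lam * t)

# Same quantity from R's exponential distribution function
surv_r <- pexp(t, rate = lam, lower.tail = FALSE)

# Median from the formula log(2) / lambda and from R
median_formula <- log(2) / lam
median_r <- qexp(0.5, rate = lam)

print(cbind(t, surv_formula, surv_r))
print(c(median_formula, median_r))

# Simulated draws use the same rate; their mean should be near 1 / lambda
set.seed(1)
draws <- rexp(10000, rate = lam)
print(mean(draws))
```

The two survival columns and the two medians should agree, which is what "same parameterization" means in practice.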

Exponential Models in R: We can run an exponential survival model in R using survreg(formula, data, dist). R gives us the intercept on the log-time scale, but from it we can find the rate: in a model with no covariates, lambda = exp(-intercept).

Exponential Models in R: With covariates, the distribution of any T is exponential with a constant hazard rate that depends on the covariate values. We can interpret exp(-coefficient) as the hazard ratio corresponding to a 1-unit increase in the covariate.
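Written out for a single covariate z, and assuming the survreg-style log-linear form with scale fixed at 1 (μ is the intercept, γ the reported coefficient, and W the log of a unit exponential; these labels are assumptions for illustration):

```latex
\begin{aligned}
\log T &= \mu + \gamma z + W \qquad (\sigma = 1) \\
h(t \mid z) &= \lambda(z) = e^{-(\mu + \gamma z)} \\
\frac{h(t \mid z + 1)}{h(t \mid z)} &= e^{-\gamma}
\end{aligned}
```

The sign flip matters: a negative survreg coefficient means shorter survival times and therefore a hazard ratio above 1.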

Weibull Models in R: Recall the Weibull distribution. Now our scale parameter is no longer fixed at 1. Unlike the exponential, the parameterization for the Weibull is different in R. For random Weibull generation: rweibull(n, shape, scale).

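Weibull Models in R: Again we can run a Weibull model in R using survreg(formula, data, dist), but the parameterization is different here too. R gives us the intercept and the scale on the log-time scale, but from these we can find the Weibull shape (alpha) and rate (lambda).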
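The conversions can be sketched as follows, assuming the lecture's Weibull form is S(t) = exp(-lambda * t^alpha). The values of mu and sigma are made up for illustration; the mapping to rweibull/pweibull (shape = alpha, scale = exp(mu)) is checked numerically rather than taken on faith.

```r
# Illustrative survreg-style estimates (assumed values, not from the lecture)
mu <- 3.5      # intercept on the log-time scale
sigma <- 0.8   # survreg scale

# Lecture-style Weibull parameters: S(t) = exp(-lambda * t^alpha)
alpha <- 1 / sigma
lambda <- exp(-mu / sigma)

# Parameters in the form used by rweibull/pweibull:
# S(t) = exp(-(t / scale)^shape)
shape <- alpha
scale <- exp(mu)

t <- c(5, 12, 40)
surv_lecture <- exp(-lambda * t^alpha)
surv_r <- pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)

print(cbind(t, surv_lecture, surv_r))
```

If the two survival columns match, the three parameterizations (survreg, lecture, rweibull) describe the same distribution.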

AML Example in R: Survival in patients with Acute Myelogenous Leukemia. The data cover 23 subjects and include time to death, a censoring indicator, and treatment: either the standard course of chemotherapy, or chemotherapy extended ('maintenance') for additional cycles.

AML Dataset
> library(survival)
> aml
   time status             x
1     9      1    Maintained
2    13      1    Maintained
3    13      0    Maintained
4    18      1    Maintained
5    23      1    Maintained
6    28      0    Maintained
7    31      1    Maintained
8    34      1    Maintained
9    45      0    Maintained
10   48      1    Maintained
11  161      0    Maintained
12    5      1 Nonmaintained
13    5      1 Nonmaintained
14    8      1 Nonmaintained
15    8      1 Nonmaintained
16   12      1 Nonmaintained
…
23   45      1 Nonmaintained

AML model in R: exponential (no covariates)
> library(MASS)
> library(survival)
> sdat<-Surv(aml$time, aml$status)
> exp_fit<-survreg(sdat~1, dist="exponential")
> #exp_fit<-survreg(sdat~1, dist="weibull", scale=1)  alternative
> summary(exp_fit)
Call:
survreg(formula = sdat ~ 1, dist = "exponential")
            Value Std. Error    z        p
(Intercept)  3.63      0.236 15.4 1.75e-53
Scale fixed at 1
Exponential distribution
Loglik(model)= -83.3   Loglik(intercept only)= -83.3
Number of Newton-Raphson Iterations: 4
n= 23

Checking Exponential Model Fit

Model Checks: Exponential
###Model checks for exponential
# Kaplan-Meier fit used as the empirical reference
# (emp_fit is not defined on the slides; this definition is assumed)
emp_fit<-survfit(sdat~1)
par(mfrow=c(1,3))
lam_hat<-exp(-exp_fit$coefficients)
logHt<-log(-log(emp_fit$surv))
logt<-log(emp_fit$time)
# Plot log cumulative hazard vs. log time
plot(logt, logHt, lwd=2, type="l", xlab="log(t)", ylab="log(H(t))")
points(logt, logHt, pch=16)
abline(log(lam_hat), 1, lwd=2, col="red")
# Second model check: Plot of H(t) vs. t
Ht<- -log(emp_fit$surv)
t<-emp_fit$time
plot(t, Ht, lwd=2, type="l", xlab="time", ylab="H(t)")
points(t, Ht, pch=16)
abline(0, lam_hat, lwd=2, col="red")
# Third model check: fitted vs. empirical survival
fit.dat<-exp(-lam_hat*c(0:150))
plot(emp_fit, xlab="Time", ylab="Survival Fraction")
lines(c(0:150), fit.dat, lwd=2, col=2)

Let's Look at Some Specifics for the Exponential Model: First estimate lambda. Then: 12-month survival? Median survival? Mean survival?
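A minimal sketch of these calculations. It uses the closed-form rate estimate for a censored exponential sample (events divided by total follow-up time), which should agree with exp(-intercept) from the survreg fit; it assumes the aml data set shipped with the survival package.

```r
library(survival)  # provides the aml data set

# For a censored exponential sample, the MLE of the rate is
# (number of events) / (total follow-up time)
lam_hat <- sum(aml$status) / sum(aml$time)

surv_12 <- exp(-lam_hat * 12)     # S(12)
median_surv <- log(2) / lam_hat   # time at which S(t) = 0.5
mean_surv <- 1 / lam_hat          # mean of an exponential

print(round(c(lambda = lam_hat, S12 = surv_12,
              median = median_surv, mean = mean_surv), 4))
```

Note that log(mean_surv) should reproduce the reported intercept of about 3.63.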

AML model in R: Weibull (no covariates)
> weib_fit<-survreg(sdat~1, dist="weibull", scale=0)
> summary(weib_fit)
Call:
survreg(formula = sdat ~ 1, dist = "weibull", scale = 0)
              Value Std. Error      z        p
(Intercept)  3.6425      0.217 16.780 3.43e-63
Log(scale)  -0.0922      0.169 -0.544 5.86e-01
Scale= 0.912
Weibull distribution
Loglik(model)= -83.2   Loglik(intercept only)= -83.2
Number of Newton-Raphson Iterations: 5
n= 23

Model Checks: Weibull

Model Checks: Weibull
###Model checks for weibull
# emp_fit is assumed to be the Kaplan-Meier fit, e.g. emp_fit<-survfit(sdat~1)
# survreg stores the scale itself (not its log) in $scale, so no exp() is needed
alp_hat<-1/weib_fit$scale
lam_hat<-exp(-weib_fit$coefficients[1]/weib_fit$scale)
logHt<-log(-log(emp_fit$surv))
logt<-log(emp_fit$time)
# Plot log cumulative hazard vs. log time
plot(logt, logHt, lwd=2, type="l", xlab="log(t)", ylab="log(H(t))")
points(logt, logHt, pch=16)
abline(log(lam_hat), alp_hat, lwd=2, col="red")
# Plot of survival function vs. empirical
fit.dat<-exp(-lam_hat*c(0:150)^alp_hat)
plot(emp_fit, xlab="Time", ylab="Survival Fraction")
lines(c(0:150), fit.dat, lwd=2, col=2)

Let's Look at Some Specifics for the Weibull Model: First estimate lambda and alpha. Then: 12-month survival? Median survival? Mean survival?
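A sketch of how these quantities can be obtained from the reported survreg estimates, assuming the Weibull form S(t) = exp(-lambda * t^alpha). The intercept and scale are copied from the output above; the variable names are illustrative.

```r
# Estimates copied from the survreg output in the lecture
mu <- 3.6425     # (Intercept)
sigma <- 0.912   # Scale

alpha <- 1 / sigma            # Weibull shape
lambda <- exp(-mu / sigma)    # Weibull rate in S(t) = exp(-lambda * t^alpha)

surv_12 <- exp(-lambda * 12^alpha)        # S(12)
median_surv <- exp(mu) * log(2)^sigma     # solves lambda * t^alpha = log(2)
mean_surv <- exp(mu) * gamma(1 + sigma)   # lambda^(-1/alpha) * Gamma(1 + 1/alpha)

print(round(c(alpha = alpha, lambda = lambda, S12 = surv_12,
              median = median_surv, mean = mean_surv), 4))
```

Because the inputs are rounded to the printed digits, the results may differ from a full-precision fit in the last decimal place.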

Compare Weibull/Exponential Fits to the Empirical Distribution (no covariates)

Empirical Distribution: What about specific times (no covariates)? 12-month survival = 0.74; median survival = 27.

What about relative to the empirical distribution (no covariates)?
Empirical: 12-month survival = 74%; median survival = 27 months
Exponential model: 12-month survival = 73%; median survival = 26.1 months; mean survival = 37.7 months
Weibull model: 12-month survival = 75.5%; median survival = 27.3 months; mean survival = 36.9 months

What about covariates….

AML model in R: exponential (with covariate)
> exp_fit2<-survreg(sdat~x, dist="exponential", data=aml)
> summary(exp_fit2)
Call:
survreg(formula = sdat ~ x, data = aml, dist = "exponential")
                 Value Std. Error     z        p
(Intercept)      4.101      0.378 10.85 1.96e-27
xNonmaintained  -0.958      0.483 -1.98 4.75e-02
Scale fixed at 1
Exponential distribution
Loglik(model)= -81.3   Loglik(intercept only)= -83.3
Chisq= 4.06 on 1 degrees of freedom, p= 0.044
Number of Newton-Raphson Iterations: 4
n= 23

Exponential Fit by Group

What about estimates by group? Maintained? Non-maintained?
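One way to answer this from the reported exponential coefficients, as a sketch. The group rate is exp(-(linear predictor)), so the non-maintained group adds its coefficient to the intercept; variable names are illustrative.

```r
# Coefficients copied from the exponential survreg output in the lecture
b0 <- 4.101    # (Intercept): maintained group
b1 <- -0.958   # xNonmaintained

# Group-specific exponential rates
lam <- c(maintained = exp(-b0), nonmaintained = exp(-(b0 + b1)))

surv_12 <- exp(-lam * 12)      # 12-month survival by group
median_surv <- log(2) / lam    # median survival by group
hazard_ratio <- exp(-b1)       # nonmaintained vs. maintained

print(round(rbind(lambda = lam, S12 = surv_12, median = median_surv), 4))
print(round(hazard_ratio, 3))
```

A hazard ratio above 1 here reflects the negative survreg coefficient: shorter times in the non-maintained group.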

AML model in R: Weibull (with covariates)
> weib_fit2<-survreg(sdat~x, dist="weibull", data=aml, scale=0)
> summary(weib_fit2)
Call:
survreg(formula = sdat ~ x, data = aml, dist = "weibull", scale = 0)
                 Value Std. Error     z        p
(Intercept)      4.109      0.300 13.70 9.89e-43
xNonmaintained  -0.929      0.383 -2.43 1.51e-02
Log(scale)      -0.235      0.178 -1.32 1.88e-01
Scale= 0.791
Weibull distribution
Loglik(model)= -80.5   Loglik(intercept only)= -83.2
Chisq= 5.31 on 1 degrees of freedom, p= 0.021
Number of Newton-Raphson Iterations: 5
n= 23

Weibull fit

What about estimates by group? Maintained? Non-maintained?
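A sketch of the group-specific Weibull calculations from the reported coefficients, assuming S(t | group) = exp(-(t / exp(mu_group))^(1/sigma)). Because the inputs are rounded to the printed digits, the medians may differ slightly from values computed with a full-precision fit.

```r
# Coefficients copied from the Weibull survreg output in the lecture
b0 <- 4.109      # (Intercept): maintained group
b1 <- -0.929     # xNonmaintained
sigma <- 0.791   # Scale

# Linear predictor on the log-time scale for each group
mu <- c(maintained = b0, nonmaintained = b0 + b1)
alpha <- 1 / sigma   # common Weibull shape

surv_12 <- exp(-(12 / exp(mu))^alpha)    # 12-month survival by group
median_surv <- exp(mu) * log(2)^sigma    # median survival by group

print(round(rbind(S12 = surv_12, median = median_surv), 3))
```

The ratio of the two medians equals exp(b1), which is the acceleration factor interpretation of the coefficient.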

Exponential and Weibull Fits for Maintained vs. Non-maintained: Red = Weibull, Blue = exponential.

Empirical Distribution: What about specific survival times (with covariate)?
Maintained: 12-month survival = 91%; median survival = 31 months
Non-maintained: 12-month survival = 58%; median survival = 23 months

Comparisons
Maintained
  Empirical: 12-month survival = 91%; median survival = 31 months
  Exponential model: 12-month survival = 82%; median survival = 41.9 months
  Weibull model: 12-month survival = 88%; median survival = 45.9 months
Non-maintained
  Empirical: 12-month survival = 58%; median survival = 23 months
  Exponential model: 12-month survival = 60%; median survival = 16.1 months
  Weibull model: 12-month survival = 66%; median survival = 18 months

Compare Exponential & Empirical Distribution (with covariates)

Compare Weibull & Empirical Distribution (with covariates)

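Multiplicative Hazard Rate Models: The hazard rate of an individual with covariate vector z is the product of a baseline hazard h0(t) and a non-negative function of the covariates. In these models h0(t) may be parametric or an arbitrary non-negative function. The most common link function is the one proposed by Cox.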

Multiplicative Hazard Rate Models: The key feature is proportional hazards.
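In symbols, a sketch of the multiplicative form with the exponential link proposed by Cox (β and the function name c are notation assumed here):

```latex
\begin{aligned}
h(t \mid z) &= h_0(t)\, c(\beta' z), \qquad c(\beta' z) = \exp(\beta' z) \\
\frac{h(t \mid z_1)}{h(t \mid z_2)} &= \frac{h_0(t)\exp(\beta' z_1)}{h_0(t)\exp(\beta' z_2)} = \exp\{\beta'(z_1 - z_2)\}
\end{aligned}
```

The baseline hazard cancels, so the ratio does not depend on t; that is the proportional hazards property.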

Multiplicative Hazard Rate Model: These parametric models are very similar to the semi-parametric Cox proportional hazards models we will discuss later. The AFT models using the exponential/Weibull are also classified as multiplicative models due to their proportional hazards property. This is not true for any other parametric distribution. Since Cox models are so commonly used, it is rare to see a parametric implementation of these models.
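Why the Weibull AFT model is also a proportional hazards model can be sketched as follows, assuming a baseline S_0(t) = exp(-λ t^α) with α = 1/σ and AFT coefficients γ (notation assumed for illustration):

```latex
\begin{aligned}
S(t \mid z) &= S_0\!\left(t\,e^{-\gamma' z}\right) = \exp\!\left(-\lambda\, t^{\alpha}\, e^{-\alpha \gamma' z}\right) \\
h(t \mid z) &= \alpha \lambda\, t^{\alpha - 1}\, e^{-\alpha \gamma' z} = h_0(t)\, \exp(\beta' z), \qquad \beta = -\gamma / \sigma
\end{aligned}
```

With σ = 1 this reduces to the exponential case, where β = -γ.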

Advantages of Parametric Models: If we correctly characterize the underlying distribution, our estimates will be more precise than semi- and non-parametric estimates. This means we may have greater power to identify relationships between our outcome and predictors. However…

Disadvantages of Parametric Models: If we use the wrong distribution, problems can arise. The distribution is often chosen based on the shape of the model without covariates, and this can (and often will) change as covariates are added. Alternatively, we can use intuition or theory about what the time dependency is expected to be, BUT the time dependency is what is left over after conditioning on covariates, so we are also likely to fail here.

Brief SAS Code
/************************************/
/* Accelerated Failure Time Models */
/*Exponential models: 1st is intercept only, second is with the covariate*/
proc lifereg data=aml;
  model time*status(0) = /dist=exponential;
run;
proc lifereg data=aml;
  class x;
  model time*status(0) = x/dist=exponential;
run;

Brief SAS Code
/************************************/
/* Accelerated Failure Time Models */
/*Weibull models: 1st is intercept only, second is with the covariate*/
proc lifereg data=aml;
  model time*status(0) = /dist=weibull;
run;
proc lifereg data=aml;
  class x;
  model time*status(0) = x/dist=weibull;
run;

Example of SAS Output
Analysis of Maximum Likelihood Parameter Estimates
Parameter      DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept       1    3.6288          0.2357      3.1668    4.0907      237.02      <.0001
Scale                1.0000          0.0000
Weibull Scale       37.6667          8.8781     23.7316   59.7843
Weibull Shape
Lagrange Multiplier Statistics
Parameter  Chi-Square  Pr > ChiSq
Scale          0.3305      0.5654

Next Time Likelihoods!!!