Lecture 2: Key Functions and Parametric Distributions

Topics covered:
- Survival function
- Hazard function
- Median survival
- Common parametric distributions

But First
Let’s think a little more about censoring and truncation using an example. An investigator is interested in determining whether treatment with amoxetine leads to recovery of cognitive function in rats with brain lesions that mimic Parkinson’s disease. The outcome of interest is time to “complete” recovery of cognitive function, i.e. the time it takes to return to baseline cognitive function after treatment with amoxetine.

Amoxetine and Cognitive Function
- Collect baseline measure of cognitive function: time to correctly perform the water radial arm maze (WARM) task
- Induce cognitive impairment: treat 4-week-old rats with N-(2-chloroethyl)-N-ethyl-bromo-benzylamine (DSP-4), which causes noradrenergic lesions in the locus coeruleus
- Treat lesioned animals with amoxetine: daily dose for 4 weeks (ages 4 to 8 weeks) at 0, 0.3, 1.0, or 3.0 mg/kg
- Measure cognitive performance weekly for 16 weeks post treatment (ages 8 to 24 weeks)
- Endpoint: time it takes to reach >75% of baseline cognitive function

Describe the type of censoring
- Rat survives to 24 weeks of age but never achieves complete cognitive recovery
- Rat has not achieved complete cognitive recovery at 12 weeks but has by 13 weeks
- Rat dies at 16 weeks without having achieved complete cognitive recovery

Describe the type of censoring
- Rat doesn’t develop brain lesions due to misplaced DSP-4 treatment and shows complete cognitive recovery at 8 weeks
- Rat shows complete cognitive recovery at 8 weeks

Let’s Draw These 5 Animals

Time to Event Outcomes Modeled using “survival analysis”
- Define X = time to event
- X is a random variable
- Realizations of X are denoted x
- X > 0
- Key characterizing functions: survival function, hazard rate (or function), probability density function, mean residual life

PDF, survival function, hazard rate, and mean residual life
[Figure: panels for S(x), f(x), h(x), and mrl(x)]

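Survival Function
S(x) = P(X > x), the probability of an individual surviving beyond time x
Basic properties:
- S(x) is monotone non-increasing
- S(0) = 1
- S(x) → 0 as x → ∞
- For continuous X: $S(x) = 1 - F(x) = \int_x^{\infty} f(t)\,dt$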

Types of time to event data
- Continuous t: observe actual time
- Discrete t: interval censoring, grouping into intervals
For discrete X: $S(x) = P(X > x) = \sum_{x_j > x} p(x_j)$, where $p(x_j)$ is the probability mass function, $P(X = x_j)$

Example of Discrete Time to Event
Discrete Uniform (3 times possible)
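A minimal sketch of the discrete case. The slide only says "3 times possible", so the support 1, 2, 3 used here is an assumption for illustration, with each time given probability 1/3.

```r
# Assumed support: the slide does not give the actual times; 1, 2, 3 are illustrative
xj <- c(1, 2, 3)
p  <- rep(1/3, 3)                 # pmf p(xj) = P(X = xj)

# Discrete survival function: S(x) = P(X > x) = sum of p(xj) over xj > x
S_discrete <- function(x) sapply(x, function(v) sum(p[xj > v]))

# Step function: drops by 1/3 at each possible event time
S_at <- S_discrete(c(0, 1, 1.5, 2, 3))
print(S_at)
```

Because the event can only happen at the three listed times, S(x) is flat between them, which is why S(1) and S(1.5) coincide.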

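Hazard Rate
A little harder to conceptualize
- Instantaneous failure rate or conditional failure rate
- $h(x) = \lim_{\Delta x \to 0} P(x \le X < x + \Delta x \mid X \ge x) / \Delta x$
- Interpretation: $h(x)\,\Delta x$ is approximately the probability that a person experiences the event in the interval (x, x + Δx), given survival to time x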

Hazard Rate
Relationship between h(x), S(x) and pdf (continuous):
$h(x) = \frac{f(x)}{S(x)} = -\frac{d}{dx} \ln S(x)$
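The relationship between the hazard, the density and the survival function can be checked numerically. This sketch borrows the Weibull form used in the lecture's R code (hazard alp*lam*x^(alp-1)); the parameter values and evaluation points are arbitrary choices for illustration.

```r
# Weibull in the lecture's parameterization; values chosen only for illustration
alp <- 1.5
lam <- 0.25
S <- function(x) exp(-lam * x^alp)                            # survival function
f <- function(x) alp * lam * x^(alp - 1) * exp(-lam * x^alp)  # density
h <- function(x) alp * lam * x^(alp - 1)                      # hazard

x <- c(0.5, 1, 2, 4)

# Check 1: hazard as the ratio f(x) / S(x)
ratio <- f(x) / S(x)

# Check 2: hazard as minus the derivative of log S(x) (central difference)
eps <- 1e-6
num_deriv <- -(log(S(x + eps)) - log(S(x - eps))) / (2 * eps)

print(cbind(x, hazard = h(x), ratio, num_deriv))
```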

Hazard Function
Useful for conceptualizing how the chance of an event changes over time, i.e. consider the hazard ‘relative’ over time
Examples:
- Treatment related mortality: early on, high risk of death; later on, risk of death decreases
- Aging: early on, low risk of death; later on, high risk of death

Shapes of Hazard Functions
- Increasing: natural aging and wear
- Decreasing: early failures due to device or transplant failures
- Bathtub: populations followed from birth
- Hump shaped: risk of event rises initially, followed by a decreasing chance of event

Examples

R Code for Hazard Function Shapes
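# Examples of hazard function shapes

# Weibull hazard: alp = shape, lam = scale
weibull.hazard <- function(x, alp, lam) {
  h <- alp * lam * x^(alp - 1)
  return(h)
}

# Log-logistic hazard: alp = shape, lam = scale
loglogistic.hazard <- function(x, alp, lam) {
  h <- alp * lam * x^(alp - 1) / (1 + lam * x^alp)
  return(h)
}

x <- seq(0, 6, 0.05)

# Increasing hazard (Weibull, shape > 1)
h1 <- weibull.hazard(x, 1.5, 0.25)
plot(x, h1, type="l", lwd=2, ylab="Hazard Function", xlab="Time", ylim=c(0,1))

# Decreasing hazard (log-logistic, shape < 1)
h2 <- loglogistic.hazard(x, 0.5, 0.25)
lines(x, h2, lwd=2, col=2)

# Hump-shaped hazard (log-logistic, shape > 1)
h3 <- loglogistic.hazard(x, 2, 1)
lines(x, h3, lwd=2, col=3)

# Bathtub-shaped hazard
h4 <- 0.01 * (x - 3)^4
lines(x, h4, lwd=2, col=4)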

Cumulative Hazard Function
- Often used instead of the hazard function
- $H(x) = \int_0^x h(u)\,du$
- Relationship between H(x) and S(x): $H(x) = -\ln S(x)$, so $S(x) = \exp(-H(x))$
- More on this later for model checking…
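A short numerical sketch of the link between the cumulative hazard and the survival function: integrate a hazard from 0 to x and compare with -ln S(x). The Weibull form follows the lecture's R code; the parameter values and the time point x0 are arbitrary assumptions.

```r
# Weibull hazard and survival in the lecture's parameterization (illustrative values)
alp <- 1.5
lam <- 0.25
h <- function(x) alp * lam * x^(alp - 1)
S <- function(x) exp(-lam * x^alp)

# Cumulative hazard by numerical integration of h over (0, x)
H_numeric <- function(x) integrate(h, lower = 0, upper = x, rel.tol = 1e-10)$value

x0 <- 3
H_num    <- H_numeric(x0)     # integral of the hazard
H_from_S <- -log(S(x0))       # -ln S(x0), here equal to lam * x0^alp

print(c(H_num = H_num, H_from_S = H_from_S))
```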

What if time is discrete?
So far we’ve focused on time X as a continuous r.v.
- Discrete x: interval censoring, grouping into intervals
- Depending on the level of discreteness, use a discrete data approach, where $p(x_j)$ is a pmf, $P(X = x_j)$

Complications How can we use this to define our “discrete” hazard function?
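One common way to answer the question is to define the discrete hazard as the conditional probability of failing at x_j given survival up to x_j, i.e. h(x_j) = p(x_j) / S(x_{j-1}) with S(x_0) = 1. The sketch below assumes a discrete uniform on the illustrative times 1, 2, 3 (the actual times are not given in the slides).

```r
# Assumed example: discrete uniform on three illustrative times
xj <- c(1, 2, 3)
p  <- rep(1/3, 3)

S      <- 1 - cumsum(p)          # S(xj) = P(X > xj)
S_prev <- c(1, head(S, -1))      # S(x_{j-1}), with S(x_0) = 1

# Discrete hazard: P(X = xj | X >= xj) = p(xj) / S(x_{j-1})
h <- p / S_prev

# The survival function can be rebuilt as a product of (1 - hazard) terms
S_from_h <- cumprod(1 - h)

print(data.frame(xj, p, h, S, S_from_h))
```

Note that the hazard rises (1/3, 1/2, 1) even though the pmf is flat, because fewer subjects remain at risk at each later time.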

Mean Residual Life
Biomedical applications:
- Median is very common
- MRL is not common
MRL = the expected residual life, $mrl(x) = E[X - x \mid X > x]$
Theoretically, could be useful to predict survival times given survival to a certain point in time.
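For a continuous survival time the expected residual life can be written as the area under S beyond x divided by S(x). The sketch below evaluates that by numerical integration; the exponential survival function with rate 0.1 is an assumed example, and `mrl` is a helper name introduced here.

```r
# Mean residual life: integral of S(t) from x to infinity, divided by S(x)
mrl <- function(x, S) {
  integrate(S, lower = x, upper = Inf, rel.tol = 1e-10)$value / S(x)
}

# Assumed example: exponential survival with rate 0.1 per unit time
S_exp <- function(t) exp(-0.1 * t)

mrl_0  <- mrl(0, S_exp)    # expected remaining life at time 0 (the mean)
mrl_20 <- mrl(20, S_exp)   # expected remaining life given survival to time 20

print(c(mrl_0 = mrl_0, mrl_20 = mrl_20))
```

For this exponential example the two values agree, so surviving to time 20 does not change the expected remaining time.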

Mean
We do not see the mean quantified very often in biomedical applications. Why?
- Recall our censoring issue
- Empirical means depend on parametric model
- Means can only be ‘model-based’
- Somewhat counterintuitive, especially when alternatives exist
More common: median

Median
- Very/most common way to express the ‘center’ of the distribution
- Rarely see another quantile expressed
- Find time x such that S(x) = 0.5
- Complication: in some applications, the median is not reached empirically
- A reported median based on a model seems like an extrapolation
- Often just state ‘median not reached’ and give alternative point estimates
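When S(x) has a known form, the median can be found by solving S(x) = 0.5 numerically. A minimal sketch, assuming a Weibull-type survival function with arbitrary illustrative parameters; the search interval (0, 100) is also an assumption and must bracket the median.

```r
# Assumed survival function for illustration: S(x) = exp(-lam * x^alp)
S <- function(x, alp = 1.5, lam = 0.25) exp(-lam * x^alp)

# Median: the root of S(x) - 0.5 on an interval that brackets it
med <- uniroot(function(x) S(x) - 0.5, interval = c(0, 100), tol = 1e-10)$root

# Closed form for this particular S, used only as a cross-check
med_closed <- (log(2) / 0.25)^(1 / 1.5)

print(c(numeric = med, closed_form = med_closed))
```

The same root-finding call works for other quantiles by replacing 0.5 with the desired survival probability.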

X-Year Survival Rate
Many applications have ‘landmark’ times that are historically used to quantify survival
Examples:
- Breast cancer: 5 year relapse-free survival
- Pancreatic cancer: 6 month survival
- Acute myeloid leukemia (AML): 12 month relapse-free survival
Solve for S(x) given x

Common Parametric Distributions
- Course will focus on non-parametric and semi-parametric methods
- But… some parametrics can be useful, especially for trial design
- Note that power and precision are improved under parametric approaches versus others, provided the assumed distribution fits the data

Example 1: Exponential
Recall the exponential distribution:
$f(x) = \lambda e^{-\lambda x}$
$F(x) = 1 - e^{-\lambda x}$
What is S(x) based on F(x) and f(x)?
$S(x) = 1 - F(x) = e^{-\lambda x}$

Example 1: Exponential
What about H(x) and h(x)?
$H(x) = -\ln S(x) = \lambda x$
$h(x) = \lambda$
- λ represents the failure rate per unit of time
- Large λ: rapid decay
- Small λ: slow decay

Example 1: Exponential

R Code for the Plot
time <- seq(0, 60, 0.1)
# Exponential survival curves for three failure rates
S1 <- exp(-0.1*time)
S2 <- exp(-0.05*time)
S3 <- exp(-0.01*time)
plot(time, S1, xlab="Time", ylab="Survival Function", col=3, lwd=2, type="l")
lines(time, S2, col=2, lwd=2)
lines(time, S3, col=4, lwd=2)
# Legend labels with the Greek letter lambda
labs <- c(expression(paste(lambda, " = ", 0.1, sep="")),
          expression(paste(lambda, " = ", 0.05, sep="")),
          expression(paste(lambda, " = ", 0.01, sep="")))
legend(x=45, y=.95, labs, col=c(3,2,4), lty=c(1,1,1), lwd=c(2,2,2), cex=0.9)

Example: Kidney Infection after Catheterization
- Kidney infection after catheter insertion in patients using portable dialysis equipment
- Time to event was time to catheter removal, BUT it should be noted that the catheter can be removed for reasons other than infection (right censored)
- Only 76 observations (!)
- Time to infection is the outcome of interest
- Question: can we describe it using a parametric approach?

Kidney Infection Example: Survival curve and 95% confidence intervals

Exponential
Overly used due to simplicity: one parameter
Recall: $S(x) = e^{-\lambda x}$
So let’s revisit the hazard function: $h(x) = f(x)/S(x) = \lambda$, constant over time

Exponential
Mean = $1/\lambda$
Median = $\ln(2)/\lambda$

Exponential
MRL = $1/\lambda$ for every x: “lack of memory”
Realistic?

Exponential
Recall the cumulative hazard function H(x)
For exponential: $H(x) = \lambda x$, so $\ln H(x) = \ln\lambda + \ln x$
Plot of ln(H(x)) vs. ln(x) should be a straight line with:
- Slope = 1
- Intercept = ln(λ)
Use to check the model against a non-parametric estimate of H(x)

Does Exponential Fit the Kidney Data?

R Code
library(survival)
# Assumes `kidney` is the data set provided by the survival package
surv.kid <- Surv(kidney$time, kidney$status)
fit.kid <- survfit(surv.kid ~ 1)                          # Kaplan-Meier estimate
exp.kid <- survreg(surv.kid ~ 1, dist="exponential")      # exponential fit
plot(fit.kid, xlab="Time", ylab="Survival Fraction")
# summarize KM estimator to get median survival
summary(fit.kid)
names(fit.kid)
# define log cumulative hazard and log time
logHt <- log(-log(fit.kid$surv))
logt <- log(fit.kid$time)
# Plot log cumulative hazard vs. log time
plot(logt, logHt, lwd=2, type="l", xlab="log(t)", ylab="log(H(t))")
points(logt, logHt, pch=16)
# Add the line implied by the exponential fit: slope 1, intercept log(lambda).
# survreg reports -log(lambda) as its intercept, hence the sign change.
# If the exponential fits, the empirical curve should follow this line.
abline(-exp.kid$coef, 1, lwd=2, col="red")

Exponential
Another alternative model check
What about plotting -ln(S(x)) versus x? Should be a straight line with:
- Slope = λ
- Intercept = 0
Why would the previous be preferred? It can accommodate Weibull as we will see….

Another Exponential Check

More Model Checking
We will build the likelihood later
For now, accept that the MLE of λ is
$\hat{\lambda} = \sum_i \delta_i / \sum_i t_i$
where $\delta_i$ indicates whether the event is observed or censored for patient i, and $t_i$ is the event or censoring time
Here:
This implies a model such that S(x) =
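The MLE is simply the number of observed events divided by the total follow-up time. The sketch below uses a small made-up data set (not the kidney data, whose fitted value is not reproduced here) to show the calculation.

```r
# Hypothetical toy data, NOT the kidney data: follow-up times and event indicators
time   <- c(5, 12, 20, 33, 47, 60, 81, 100)
status <- c(1, 0, 1, 1, 0, 1, 1, 0)       # 1 = event observed, 0 = censored

# MLE of the exponential rate: observed events / total time at risk
lambda_hat <- sum(status) / sum(time)

# Fitted exponential survival function implied by the estimate
S_hat <- function(x) exp(-lambda_hat * x)

print(lambda_hat)
print(S_hat(c(30, 60)))
```

As a cross-check, `exp(-coef(survreg(Surv(time, status) ~ 1, dist = "exponential")))` from the survival package should give the same rate, since survreg's intercept is -log(λ).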

Compare Fitted and Observed S(t)

What about specific survival time? Median survival? Mean survival?
Empirical:
- 200 day survival = 21.0%
- Median survival = 66 days
- Mean survival = ?
Exponential Model:
- 200 day survival = S(200) = ?
- Median survival = ?
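Once a rate has been estimated, the model-based quantities follow directly from the exponential formulas. The rate below is a hypothetical value chosen for illustration; it is not the fitted value for the kidney data, so the numbers are not the answers to the questions above.

```r
# Hypothetical rate per day (an assumption, not the fitted kidney estimate)
lambda <- 0.01

S_200       <- exp(-lambda * 200)   # model-based 200 day survival, S(200)
median_surv <- log(2) / lambda      # time at which S(x) = 0.5
mean_surv   <- 1 / lambda           # model-based mean survival

print(c(S_200 = S_200, median = median_surv, mean = mean_surv))
```

Unlike the empirical mean, the model-based mean is always available, but it depends entirely on the exponential assumption holding beyond the observed follow-up.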

Weibull
Generalization of the Exponential
- VERY common for survival, but not always perfect
- Shape and scale parameters: α and λ
- Variable hazard: increasing (α > 1), decreasing (α < 1), or constant (α = 1)

Weibull: Generalization of Exponential
- Shape parameter: α
- Scale parameter: λ
- $S(x) = \exp(-\lambda x^{\alpha})$
- $h(x) = \alpha \lambda x^{\alpha - 1}$
- Note: there are different parameterizations for the Weibull
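The note about parameterizations matters in practice. The lecture's R code uses S(x) = exp(-λ x^α), while R's built-in `pweibull` uses S(x) = exp(-(x/b)^a) with shape a and scale b. A minimal sketch of the conversion (a = α, b = λ^(-1/α)), with illustrative parameter values:

```r
# Lecture parameterization: S(x) = exp(-lam * x^alp)
alp <- 0.5
lam <- 0.05
x <- c(1, 10, 30, 60)
S_lecture <- exp(-lam * x^alp)

# R's pweibull: S(x) = exp(-(x / scale)^shape)
# Matching the two forms gives shape = alp and scale = lam^(-1/alp)
S_r <- pweibull(x, shape = alp, scale = lam^(-1 / alp), lower.tail = FALSE)

print(cbind(x, S_lecture, S_r))
```

The same conversion is needed before using `dweibull`, `qweibull` or `rweibull` with parameters stated in the lecture's form.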

Weibull Example

R Code for the Weibull Plot
# Weibull survival curves, S(t) = exp(-lambda * t^alpha)
time <- seq(0, 60, 0.1)
S1 <- exp(-0.05*time^0.5)
S2 <- exp(-0.05*time^1)
S3 <- exp(-0.01*time^0.5)
S4 <- exp(-0.01*time^1)
plot(time, S1, xlab="Time", ylab="Survival Function", col=2, lwd=2, type="l", ylim=c(0,1))
lines(time, S2, col=1, lwd=2)
lines(time, S3, col=3, lwd=2)
lines(time, S4, col=4, lwd=2)
# Legend labels with Greek letters for the scale and shape parameters
labs <- c(expression(paste(lambda, " = ", 0.05, ", ", alpha, " = ", 0.5, sep="")),
          expression(paste(lambda, " = ", 0.05, ", ", alpha, " = ", 1, sep="")),
          expression(paste(lambda, " = ", 0.01, ", ", alpha, " = ", 0.5, sep="")),
          expression(paste(lambda, " = ", 0.01, ", ", alpha, " = ", 1, sep="")))
legend(x=0, y=.25, labs, col=c(2,1,3,4), lty=1, lwd=2, cex=0.9)

Effect of Shape Parameter

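Weibull
- Mean: $\Gamma(1 + 1/\alpha) / \lambda^{1/\alpha}$
- Median: $(\ln 2 / \lambda)^{1/\alpha}$
- Model checking: $\ln H(x) = \ln\lambda + \alpha \ln x$, so a plot of ln(H(x)) vs. ln(x) should be a straight line with slope α and intercept ln(λ)
More later when we discuss likelihoods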

Log-normal
Just like it sounds: if X ~ log-normal, then ln(X) ~ normal
- Two parameters: μ and σ
- Survival function: $S(x) = 1 - \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$
- Median: $e^{\mu}$

Log-normal
- Log-normal can work well in medical applications (e.g. age of disease onset)
- Hazard is hump-shaped
- Critics think that a decreasing hazard at later times is unrealistic
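A small sketch of the log-normal survival function and its hump-shaped hazard, using R's built-in normal and log-normal functions. The values of μ and σ and the evaluation points are arbitrary assumptions for illustration.

```r
# Hypothetical parameters on the log scale
mu    <- 3
sigma <- 0.8

# Survival function via the standard normal upper tail
S_lnorm <- function(x) pnorm((log(x) - mu) / sigma, lower.tail = FALSE)

# Hazard = density / survival
h_lnorm <- function(x) dlnorm(x, meanlog = mu, sdlog = sigma) / S_lnorm(x)

x <- c(5, 20, 80)
S_manual  <- S_lnorm(x)
S_builtin <- plnorm(x, meanlog = mu, sdlog = sigma, lower.tail = FALSE)

S_at_median <- S_lnorm(exp(mu))            # survival at the median exp(mu)
h_vals <- h_lnorm(c(0.5, 20, 5000))        # early, middle, very late

print(cbind(x, S_manual, S_builtin))
print(h_vals)
```

The hazard is small early, larger in the middle, and small again at very late times, which is the decreasing late hazard that critics object to.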

Log-logistic
If X ~ log-logistic, then ln(X) ~ logistic
- Logistic is similar to normal, but the survival function is easier to work with
- Hazard is similar to Weibull, but can take more shapes: monotone decreasing or hump-shaped

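Log-logistic
- Survival function: $S(x) = \frac{1}{1 + \lambda x^{\alpha}}$
- Hazard function: $h(x) = \frac{\alpha \lambda x^{\alpha - 1}}{1 + \lambda x^{\alpha}}$
- Median: $(1/\lambda)^{1/\alpha}$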
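A sketch of the log-logistic functions, using the same hazard form as the lecture's `loglogistic.hazard` code. The function `loglogistic.surv` is a helper introduced here, and the parameter values are taken from the hazard-shape examples for illustration.

```r
# Survival and hazard in the lecture's parameterization (alp = shape, lam = scale)
loglogistic.surv <- function(x, alp, lam) 1 / (1 + lam * x^alp)
loglogistic.hazard <- function(x, alp, lam) {
  alp * lam * x^(alp - 1) / (1 + lam * x^alp)
}

# Median: solve 1 / (1 + lam * x^alp) = 0.5, giving x = (1 / lam)^(1 / alp)
alp <- 2
lam <- 1
med   <- (1 / lam)^(1 / alp)
S_med <- loglogistic.surv(med, alp, lam)

# Hazard shapes: hump-shaped for alp > 1, monotone decreasing for alp <= 1
x <- seq(0.05, 6, 0.05)
h_hump <- loglogistic.hazard(x, 2, 1)
h_dec  <- loglogistic.hazard(x, 0.5, 0.25)
x_peak <- x[which.max(h_hump)]    # location of the hump on this grid

print(c(median = med, S_at_median = S_med, hazard_peak_at = x_peak))
```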

Gamma
Generalization of exponential; not easy to work with
- The gamma distribution represents another generalization of the exponential distribution
- If k = 1 this follows an exponential distribution
- Here, although there is a solution for S(t), it involves the incomplete gamma integral
- Note the hazard function is monotone increasing from 0 for k > 1 and monotone decreasing from infinity for k < 1, and it approaches λ as t → ∞ IN BOTH CASES
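Because S(t) involves the gamma integral, the hazard is easiest to evaluate numerically as density over survival. The sketch below uses R's `dgamma` and `pgamma` with shape k and rate λ; `gamma.hazard` is a helper introduced here and the parameter values are illustrative assumptions.

```r
# Gamma hazard = density / upper-tail probability (shape k, rate lam)
gamma.hazard <- function(x, k, lam) {
  dgamma(x, shape = k, rate = lam) /
    pgamma(x, shape = k, rate = lam, lower.tail = FALSE)
}

lam <- 0.5
x   <- c(0.1, 1, 10, 100)

h_k1  <- gamma.hazard(c(1, 5, 20), k = 1, lam = lam)   # k = 1: exponential case
h_inc <- gamma.hazard(x, k = 2,   lam = lam)           # k > 1: increasing
h_dec <- gamma.hazard(x, k = 0.5, lam = lam)           # k < 1: decreasing

print(h_k1)
print(rbind(x, h_inc, h_dec))
```

In both the k > 1 and k < 1 cases the hazard at the largest time point sits close to λ = 0.5, one from below and one from above.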

Cure Rate Distribution
- Not in K & M
- Assumption: a fraction of individuals never fail
- Violates the assumption that S(∞) = 0
- Useful for clinical trials in which a fraction of the patients are cured and the event may never occur (e.g. cancer relapse)

Cure Rate Example
- 75% of women with early stage breast cancer are cured by treatment
- Remaining 25% of women relapse
- Assume exponential with λ = 0.05

Cure Rate Distribution
Mixture model: $S(x) = p + (1 - p)\,S^*(x)$
- p = the proportion of individuals we expect to never fail (our point mass at 1)
- S*(x) = survival among the proportion of individuals whom we expect will fail
- S(x) is a mixture between a survival distribution and a point mass at 1
- S(x) represents the overall survival
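A minimal sketch of the mixture, using the breast cancer example's values (75% cured, exponential relapse with λ = 0.05). The helper names `cure.surv` and `cure.hazard` are introduced here for illustration.

```r
# Overall survival: cured fraction p plus (1 - p) times exponential survival
cure.surv <- function(t, p, lam) p + (1 - p) * exp(-lam * t)

# Population hazard = density of the mixture / overall survival
cure.hazard <- function(t, p, lam) {
  (1 - p) * lam * exp(-lam * t) / cure.surv(t, p, lam)
}

p   <- 0.75
lam <- 0.05

S_start <- cure.surv(0, p, lam)       # everyone is event-free at time 0
S_late  <- cure.surv(1000, p, lam)    # plateau at the cured fraction
h_start <- cure.hazard(0, p, lam)     # (1 - p) * lam at time 0
h_late  <- cure.hazard(1000, p, lam)  # decays toward 0 as only cured subjects remain

print(c(S_start = S_start, S_late = S_late, h_start = h_start, h_late = h_late))
```

The survival curve levels off at p rather than 0, and the population hazard falls toward 0 even though the hazard among those who will relapse is constant.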

Cure Rate: Breast cancer example

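R Code
par(mfrow=c(1,2))
t <- seq(0, 1000, 0.1)
# 75% cured (plateau) plus 25% relapsing with exponential rate 0.05
St <- 0.25*exp(-0.05*t) + 0.75
# Left panel: first 60 months
plot(t, St, xlim=c(0,60), ylim=c(0,1), type="l", lwd=2, xlab="Time(months)", ylab="Survival Fraction")
# Right panel: long-run view showing the plateau at 0.75
# (the source is cut off mid-call; axis labels are assumed to match the first panel)
plot(t, St, xlim=c(0,1000), ylim=c(0,1), type="l", lwd=2, xlab="Time(months)", ylab="Survival Fraction")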

Competing Risks
Used to be somewhat ignored; not so much anymore
Idea:
- Each subject can fail due to one of K causes (K > 1)
- Occurrence of one event precludes us from observing the other event
- Usually, the quantity of interest is the cause-specific hazard
- Overall hazard equals the sum of the cause-specific hazards

Example
- An investigator is looking at graft rejection in kidney transplant patients
- However… patients can also experience graft failure and death
- Treat graft failure and death events as censored observations
- Why is this a problem?

Assumptions
Dependence structure between the ‘potential’ failure times
Identifiability dilemma:
- Can only observe one time per person, so the dependence structure is not testable
- We cannot distinguish between independent and dependent competing risks

Useful Approaches
Want to account for other causes
- Adjust the denominator
- Compare rates of events
- Use measures of probabilities:
  - Crude: probability of event k allowing for all other risks
  - Net: probability of event k if it is the ONLY risk
  - Partial: probability of event k when only a subset of risks is acting in the population
- See K & M for more details
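A small sketch of the difference between crude and net probabilities, assuming two competing causes with constant cause-specific hazards. The hazard values and the time horizon are hypothetical, and the net probability additionally assumes the competing risks are independent, which cannot be tested from the data.

```r
# Hypothetical constant cause-specific hazards (per month) for two causes
lam1 <- 0.03    # cause of interest
lam2 <- 0.01    # competing cause
t    <- 24      # time horizon in months

# Overall hazard is the sum of the cause-specific hazards
lam_all <- lam1 + lam2
S_all   <- exp(-lam_all * t)               # probability of no event of any type by t

# Crude probabilities: failing from each cause by t with all risks acting
crude1 <- lam1 / lam_all * (1 - S_all)
crude2 <- lam2 / lam_all * (1 - S_all)

# Net probability for cause 1: as if cause 1 were the only risk
# (requires the untestable independence assumption)
net1 <- 1 - exp(-lam1 * t)

print(c(crude1 = crude1, crude2 = crude2, net1 = net1, any_event = 1 - S_all))
```

The crude probabilities add up to the probability of any event, while the net probability is larger than the crude one because subjects removed by the competing cause are treated as if they remained at risk.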
