STT520-420: Biostatistics Analysis
Dr. Cuixian Chen
Chapter 1: Introduction to Survival Analysis
Survival Data
Examples of survival times:
- the time a cancer patient is in remission
- the time until a disease-free person has a heart attack
- the time until death of a healthy mouse
- the time until a computer component fails
- the time until a paroled prisoner is rearrested
- the time until death of a liver transplant patient
- the time until a cell phone customer switches carriers
- the time until recovery after surgery
All are "time until some event occurs". Longer times are better in all but the last example, where a shorter time to recovery is better.
Survival and hazard functions
Define a survival random variable (r.v.) Y as a continuous r.v. taking its values in the interval [0, ∞). Its values are interpreted as a lifetime or survival time, i.e., the time until death (or the time until failure if we are considering an inanimate object). So Y is a nonnegative r.v. with pdf f(y) and cdf F(y) = P(Y ≤ y).
Survival distribution
Define the survival (or reliability) function S(y) as S(y) = 1 - F(y) = P(Y > y). In terms of the pdf f(y), we have
$$S(y) = \int_y^{\infty} f(u)\,du.$$
Note the following important properties of the survival function:
- $S(0) = 1$;
- $S(\infty) = \lim_{y\to\infty} S(y) = 0$;
- $S(a) \le S(b)$ whenever $0 \le b < a$.
So the survival function is a monotone decreasing (non-increasing) function on the interval [0, ∞) (see Fig 1.1, p. 4).
[Figure: Survival function, a monotone decreasing curve on [0, ∞) (see Fig 1.1, p. 4).]
[Figure: Summary of the survival function.]
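To make $S(y) = 1 - F(y) = \int_y^\infty f(u)\,du$ and the properties above concrete, here is a minimal numerical check in R. It assumes, purely for illustration, an exponential lifetime with mean β = 2; the names `S_int` and `S_cdf` are hypothetical helpers, not from the text.

```r
# Sketch: check S(y) = 1 - F(y) = integral of f(u) from y to Inf numerically.
# Assumed example (not from the text): Y ~ Exp with mean beta = 2, f(y) = (1/beta) exp(-y/beta).
beta <- 2
f <- function(u) dexp(u, rate = 1 / beta)                        # pdf f(u)
S_int <- function(y) integrate(f, lower = y, upper = Inf)$value  # integral of f over (y, Inf)
S_cdf <- function(y) 1 - pexp(y, rate = 1 / beta)                # 1 - F(y)

ys <- c(0, 0.5, 1, 2, 5, 10)
# Both columns should agree; S starts at 1 and decreases toward 0
cbind(y = ys, integral = sapply(ys, S_int), one_minus_F = S_cdf(ys))
```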
Three goals of survival analysis
1. Estimate the survival function, together with its standard error.
2. Compare survival functions (e.g., across levels of a categorical variable such as treatment vs. placebo).
3. Understand the relationship of the survival function to explanatory variables (e.g., does survival time differ across values of an explanatory variable?).
Empirical survival function
The survival function S(y) = P(Y > y) can be estimated by the empirical survival function (ESF), which is the relative frequency of observations greater than y. Let Y1, …, Yn be i.i.d. (independent and identically distributed) survival variables. Then the empirical survival function is
$$S_n(y) = \frac{1}{n}\sum_{i=1}^{n} I(Y_i > y),$$
where I is the indicator function: $I(Y_i > y) = 1$ if $Y_i > y$ and 0 otherwise.
Q: Find the ESF for the data 1, 3, 5, 8, 10.
Q: Moreover, what is the variance of Sn(y)?
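The definition translates directly into code: for each y, take the proportion of observations exceeding y. A minimal sketch in R, where `esf` is a hypothetical helper name (not from the text), evaluated on the five data values from the first question:

```r
# Sketch: empirical survival function S_n(y) = (1/n) * sum of I(Y_i > y).
# 'esf' is a hypothetical helper, not a function from the text or base R.
esf <- function(y, data) {
  # mean() of a logical vector is the proportion of TRUEs, i.e. #(Y_i > y) / n
  sapply(y, function(yy) mean(data > yy))
}

dat <- c(1, 3, 5, 8, 10)
esf(c(0, 1, 2, 3, 5, 8, 10), dat)
# 1.0 0.8 0.8 0.6 0.4 0.2 0.0
```

Since $S_n$ only changes at observed values, it is a right-continuous step function that drops by $1/5$ at each of the five data points here.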
Review of Bernoulli & Binomial RVs
Z ~ Bernoulli(p): a single trial with outcome in {success, failure} and P(success) = P(Z = 1) = p. Equivalently, Z ~ Bernoulli(p) = Binomial(1, p), i.e., n = 1.
E(Z) = p; V(Z) = pq = p(1 - p).
Recall: the sum of n i.i.d. (independent and identically distributed) Bernoulli(p) r.v.s is a Binomial r.v. with parameters n and p. We use this fact on the next slide to show that the empirical survivor function Sn(y) is an unbiased estimator of S(y).
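The Bernoulli-to-Binomial fact can be illustrated by simulation in the survival setting: for a fixed y, each indicator $I(Y_i > y)$ is Bernoulli with $p = S(y)$, so $nS_n(y)$ should behave like Binomial(n, p). This sketch assumes, for illustration only, exponential lifetimes with mean 10, n = 20 and y = 5.

```r
# Sketch: for a fixed y, each Z_i = I(Y_i > y) is Bernoulli(p) with p = S(y),
# so n * S_n(y) = sum of the Z_i should behave like Binomial(n, p).
# Assumed setup (illustration only): Y ~ Exp with mean beta = 10, n = 20, y = 5.
set.seed(1)
n <- 20; y <- 5; beta <- 10
p <- exp(-y / beta)        # S(y) = P(Y > y) for this exponential
reps <- 10000

# One simulated value of n * S_n(y) per replicate
nSn <- replicate(reps, sum(rexp(n, rate = 1 / beta) > y))

# Compare simulated moments with the Binomial(n, p) mean np and variance np(1 - p)
c(sim_mean = mean(nSn), binom_mean = n * p)
c(sim_var = var(nSn), binom_var = n * p * (1 - p))

# Compare simulated frequencies of 0, 1, ..., n with dbinom()
freq <- tabulate(nSn + 1, nbins = n + 1) / reps
max(abs(freq - dbinom(0:n, size = n, prob = p)))
```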
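Expectation, variance and confidence interval of the empirical survival function
Note that
$$nS_n(y) = \sum_{i=1}^{n} I(Y_i > y)$$
is a sum of n i.i.d. Bernoulli indicators, and as such $nS_n(y) \sim B(n, p)$ with $p = P(Y > y) = S(y)$. Hence, for a fixed $y^*$,
$$E\big[S_n(y^*)\big] = \frac{1}{n}\,E\big[nS_n(y^*)\big] = \frac{1}{n}\, n\,S(y^*) = S(y^*),$$
so Sn is an unbiased estimator of S.
What is Var(Sn)? What is the confidence interval for S(y) based on Sn? (See 1.6 and p. 6, where the confidence interval is computed.)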
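One standard way to answer both questions follows from the Binomial fact above: divide the Binomial variance by $n^2$, plug in $S_n(y)$ for the unknown $S(y)$, and use the normal approximation. This is a sketch; the textbook's version on p. 6 may present the interval in a slightly different form. The R code for Example 1.3 uses the factor 2 in place of 1.96.

```latex
\operatorname{Var}\big(S_n(y)\big)
  = \frac{1}{n^2}\operatorname{Var}\big(nS_n(y)\big)
  = \frac{1}{n^2}\, n\,S(y)\big(1 - S(y)\big)
  = \frac{S(y)\big(1 - S(y)\big)}{n}

% Plug in S_n(y) for S(y); by the CLT an approximate 100(1-\alpha)\% CI for S(y) is
S_n(y) \pm z_{\alpha/2}\sqrt{\frac{S_n(y)\big(1 - S_n(y)\big)}{n}},
\qquad z_{0.025} \approx 1.96 \approx 2
```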
Empirical survival function: Example 1.3, page 6
Placebo group: steroid-induced remission times (weeks): 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23.
Q: (a) find … ; (b) find the 95% confidence interval for … .
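For a single time point, the estimate and its approximate 95% interval take only a few lines of R. The choice y0 = 10 weeks below is a hypothetical illustration, not necessarily the time asked for in the exercise, and `qnorm(0.975)` is used instead of the factor 2.

```r
# Sketch: point estimate and approximate 95% CI for S(y0) in Example 1.3.
# y0 = 10 is a hypothetical choice for illustration only.
placebo <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
n <- length(placebo)
y0 <- 10
s_hat <- mean(placebo > y0)                # S_n(y0): fraction of remission times > y0
se_hat <- sqrt(s_hat * (1 - s_hat) / n)    # plug-in standard error
z <- qnorm(0.975)                          # about 1.96
ci <- c(lower = s_hat - z * se_hat, upper = s_hat + z * se_hat)
s_hat   # 8/21, about 0.381
ci      # roughly (0.173, 0.589)
```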
Plot an empirical survivor function in R (Section 4.1, pp. 55-56):
placebo <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
placebo <- sort(placebo)
a <- rle(placebo)
values <- a$values; values     ## distinct values among the observations
counts <- a$lengths; counts    ## number of replicates of each distinct value
f <- table(placebo); f         ## the same counts, as a frequency table
## We need relative frequencies to plot the curve, so get the sample size n first
n <- length(placebo); n
## Prepend 1 so that S(0) = 1
surv1 <- 1 - cumsum(f) / n
surv2 <- c(1, surv1); surv2
## Now plot the curve; type = "s" draws a step function
t <- c(0, values)   ## t is the vector of x's and surv2 is the vector of y's
plot(t, surv2, type = "s", xlab = "Remission Times", ylab = "Relative Frequencies", col = "blue", lwd = 3)
points(t, surv2, col = "red", pch = 18)
To create the confidence bands for Example 1.3 (continuing the code above; approximate 95% limits using 2 standard errors):
low <- surv2 - 2 * sqrt(surv2 * (1 - surv2) / n)
upp <- surv2 + 2 * sqrt(surv2 * (1 - surv2) / n)
points(t, low, col = "orange", lty = 2)
points(t, upp, col = "orange", lty = 3)
lines(t, low, col = "orange", lty = 2, lwd = 3)
lines(t, upp, col = "orange", lty = 3, lwd = 3)
## To print out the confidence intervals
cbind(t, surv2, low, upp)

Output:
    t      surv2          low       upp
    0 1.00000000  1.000000000 1.0000000
1   1 0.90476190  0.776649008 1.0328748
2   2 0.80952381  0.638145636 0.9809020
3   3 0.76190476  0.576019034 0.9477905
4   4 0.66666667  0.460928867 0.8724045
5   5 0.57142857  0.355448873 0.7874083
8   8 0.38095238  0.169010042 0.5928947
11 11 0.28571429  0.088552697 0.4828759
12 12 0.19047619  0.019098017 0.3618544
15 15 0.14285714 -0.009863567 0.2955779
17 17 0.09523810 -0.032874802 0.2233510
22 22 0.04761905 -0.045323816 0.1405619
23 23 0.00000000  0.000000000 0.0000000
[Figure: Confidence bands for Example 1.3.]
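The normal-approximation limits printed for Example 1.3 fall outside [0, 1] at t = 1 and for t ≥ 15, which is impossible for a probability. One simple remedy, sketched below as an assumption rather than the textbook's method, is to truncate the limits to [0, 1]; the sketch also swaps the factor 2 for `qnorm(0.975)`.

```r
# Sketch: pointwise ~95% limits for the ESF of Example 1.3, truncated to [0, 1].
# Assumptions: qnorm(0.975) replaces the factor 2, and truncation is a display choice.
placebo <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
n <- length(placebo)
t <- c(0, sort(unique(placebo)))                  # 0 plus each distinct remission time
surv <- sapply(t, function(y) mean(placebo > y))  # S_n(y) at each jump point
se <- sqrt(surv * (1 - surv) / n)                 # plug-in standard error
z <- qnorm(0.975)                                 # about 1.96
low <- pmax(surv - z * se, 0)                     # lower limit, truncated at 0
upp <- pmin(surv + z * se, 1)                     # upper limit, truncated at 1
round(cbind(t, surv, low, upp), 4)
```

Truncation only fixes the display; near 0 and 1 the normal approximation itself is poor, so these limits are rough there.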
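Plot an empirical survivor function in R (Section 4.1, pp. 55-56): the idea
Assume we have sorted data $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$. Starting at $S_n(0) = 1$, the curve steps down by $d_j/n$ at each distinct observed value, where $d_j$ is the number of observations equal to that value; equivalently, $S_n(y) = 1 - \#\{i : Y_i \le y\}/n$.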
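The same step function can also be built from base R's `ecdf()`, since $S_n(y) = 1 - F_n(y)$. This is an alternative sketch to the `rle()`/`table()` approach, not the textbook's code; `Sn` and `Sn_step` are hypothetical names.

```r
# Sketch: build S_n(y) = 1 - F_n(y) from base R's ecdf() instead of rle()/table().
placebo <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
Fn <- ecdf(placebo)              # F_n(y) = #{i : Y_i <= y} / n, a right-continuous step function
Sn <- function(y) 1 - Fn(y)      # empirical survival function
jumps <- sort(unique(placebo))   # distinct observed values, where S_n steps down
t <- c(0, jumps)
round(Sn(t), 4)
# stepfun() needs one more y-value than jump points: S_n(0) = 1 before the first jump
Sn_step <- stepfun(jumps, Sn(t))
plot(Sn_step, do.points = FALSE, verticals = TRUE, col = "blue",
     xlab = "Remission Times", ylab = "Relative Frequencies",
     main = "Empirical survivor function")
```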
How to compare survival functions
Example 1.4 on page 8 shows that it is sometimes difficult to compare survival curves, since they can cross each other (what makes one survival curve "better" than another?): $S_1(y) = e^{-y/2}$ and $S_2(y) = e^{-y^2/4}$.
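Setting $S_1(y) = S_2(y)$ gives $y/2 = y^2/4$, so the curves meet at y = 0 and cross at y = 2: $S_2$ is higher on (0, 2) and $S_1$ is higher after 2. A short numerical check in R, using `uniroot()` on a bracketing interval chosen for illustration:

```r
# Sketch: locate where the two survival curves of Example 1.4 cross.
S1 <- function(y) exp(-y / 2)
S2 <- function(y) exp(-y^2 / 4)
# S1 - S2 changes sign on (1, 3), so uniroot() can bracket the crossing point
cross <- uniroot(function(y) S1(y) - S2(y), interval = c(1, 3), tol = 1e-10)$root
cross                            # about 2
c(S1(1) < S2(1), S1(3) > S2(3))  # TRUE TRUE: S2 is higher before the crossing, S1 after
```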
How to compare survival functions
One way of comparing two survival curves is by comparing their MTTF (mean time to failure) values. Let's use R to draw the two curves given in Example 1.4, S1(y) = exp(-y/2) and S2(y) = exp(-y^2/4):
# Example 1.4 on page 8: plot two survival functions in R
# Evaluate both survival functions on a fine grid of x values
x <- seq(0, 10, by = 0.001)
y1 <- exp(-x / 2)      # S1(y) = exp(-y/2)
y2 <- exp(-x^2 / 4)    # S2(y) = exp(-y^2/4)
plot(x, y1, type = "l", col = "blue", xlab = "y", ylab = "S(y)")
lines(x, y2, col = "red")
legend("topright", legend = c("S1(y) = exp(-y/2)", "S2(y) = exp(-y^2/4)"), col = c("blue", "red"), lty = 1)
title("comparing two survival functions")
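Using Theorem 1.2 (MTTF $= \int_0^\infty S(y)\,dy$), the two MTTFs can be computed numerically with `integrate()`. Exactly, $\int_0^\infty e^{-y/2}\,dy = 2$ and $\int_0^\infty e^{-y^2/4}\,dy = \sqrt{\pi} \approx 1.772$, so by this criterion $S_1$ is "better" even though $S_2$ is higher for y < 2.

```r
# Sketch: MTTF of each curve in Example 1.4 via MTTF = integral of S(y) over (0, Inf).
S1 <- function(y) exp(-y / 2)
S2 <- function(y) exp(-y^2 / 4)
mttf1 <- integrate(S1, 0, Inf)$value  # exact value 2
mttf2 <- integrate(S2, 0, Inf)$value  # exact value sqrt(pi), about 1.772
c(mttf1 = mttf1, mttf2 = mttf2)
```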
Review: Exponential distribution
Def 4.11: Y ~ Exp(β), β > 0, has pdf $f(y) = \frac{1}{\beta} e^{-y/\beta}$ for $y \ge 0$ (and 0 otherwise).
E.g.: What is the cdf of Exp(β)?
E.g.: What is P(Y > a) if Y ~ Exp(β)?
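Both questions follow by integrating the pdf of Def 4.11 (a short worked derivation, using the mean-β parameterization):

```latex
F(y) = P(Y \le y) = \int_0^y \frac{1}{\beta} e^{-u/\beta}\,du = 1 - e^{-y/\beta}, \qquad y \ge 0,

P(Y > a) = S(a) = 1 - F(a) = e^{-a/\beta}, \qquad a \ge 0 .
```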
Mean time to failure (MTTF)
Note that the MTTF of a survival r.v. Y is just its expected value E(Y). We can also show (Theorem 1.2) that
$$\text{MTTF} = E(Y) = \int_0^{\infty} S(y)\,dy.$$
(Math & Stat majors: show this is true using integration by parts and l'Hospital's rule!)
So suppose we have an exponential survival function: $S(y) = e^{-y/\beta}$, $y \ge 0$, $\beta > 0$.
Q: Show that the MTTF for this variable is β. (Can you show that this S satisfies the properties of a survival function?)
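A sketch of both steps, assuming E(Y) is finite: integrate by parts with $u = y$ and $dv = f(y)\,dy$, so that $v = -S(y)$; the boundary term vanishes because $y\,S(y) \le \int_y^\infty u f(u)\,du \to 0$. For the exponential case, l'Hospital's rule handles the limit directly.

```latex
% Theorem 1.2 (sketch)
E(Y) = \int_0^\infty y\, f(y)\,dy
     = \Big[-y\,S(y)\Big]_0^\infty + \int_0^\infty S(y)\,dy
     = \int_0^\infty S(y)\,dy

% Exponential case, S(y) = e^{-y/\beta}
\lim_{y\to\infty} y\,e^{-y/\beta}
  = \lim_{y\to\infty} \frac{y}{e^{y/\beta}}
  = \lim_{y\to\infty} \frac{1}{(1/\beta)\,e^{y/\beta}} = 0,
\qquad
\text{MTTF} = \int_0^\infty e^{-y/\beta}\,dy = \Big[-\beta\, e^{-y/\beta}\Big]_0^\infty = \beta
```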
Mean time to failure (MTTF)
For any two such survival functions, $S_1(y) = e^{-y/\beta_1}$ and $S_2(y) = e^{-y/\beta_2}$, one is "better" than the other if the corresponding β is "better" (larger, since the MTTF equals β).
HW, EX1: (Use R) plot on the same axes at least two such survival functions, $S_1(y) = e^{-y/\beta_1}$ and $S_2(y) = e^{-y/\beta_2}$, with different values of β (e.g., β1 = 10, β2 = 5), and show this result: if β1 > β2, then S1(y) > S2(y) for all y > 0.
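A minimal plotting sketch for this exercise, using the suggested values β1 = 10 and β2 = 5; the grid (0 to 40) and colours are arbitrary choices. The last line is only a numerical check on the grid, not a proof.

```r
# Sketch for HW EX1: exponential survival curves S(y) = exp(-y/beta) with beta1 = 10, beta2 = 5
beta1 <- 10; beta2 <- 5
y <- seq(0, 40, by = 0.01)
S1 <- exp(-y / beta1)
S2 <- exp(-y / beta2)
plot(y, S1, type = "l", col = "blue", lwd = 2, ylim = c(0, 1), xlab = "y", ylab = "S(y)")
lines(y, S2, col = "red", lwd = 2)
legend("topright", legend = c("beta = 10", "beta = 5"), col = c("blue", "red"), lty = 1, lwd = 2)
title("Exponential survival functions")
# Numerical check on the grid: the curve with the larger beta lies above the other for y > 0
all(S1[y > 0] > S2[y > 0])
```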
STT520-420 HW
HW, EX2: For Example 1.1 (page 1): calculate the empirical survival function and the corresponding confidence intervals (use R); plot both the empirical survival function and its confidence intervals in the same figure (use R; see the example code for Example 1.3). Estimate the probability that failure occurs after 10 weeks, S(10).
HW, EX3: See page 13, Exercise 1.3.
Review: Exponential distribution
In R, with β the mean: dexp(x, 1/β); pexp(x, 1/β); qexp(p, 1/β); rexp(N, 1/β).
## Note that R parameterizes the exponential distribution by its rate 1/β,
## not by its mean β as in the STT315 class.
set.seed(100)
y <- rexp(10000, 0.1)   ## rate = 0.1, so beta = 1/0.1 = 10
mean(y)                 ## 10.07184: estimates beta = 10, not 0.1!
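A quick way to confirm the rate-versus-mean convention is to compare R's functions with the β-parameterized formulas from Def 4.11. The evaluation points below are arbitrary; the median check uses $F(m) = 1/2 \Rightarrow m = \beta \log 2$.

```r
# Sketch: R's *exp functions take rate = 1/beta, where beta is the mean (STT315 parameterization).
beta <- 10
x <- c(0.5, 2, 10, 25)
# pdf: (1/beta) exp(-x/beta)
all.equal(dexp(x, rate = 1 / beta), exp(-x / beta) / beta)
# cdf: 1 - exp(-x/beta)
all.equal(pexp(x, rate = 1 / beta), 1 - exp(-x / beta))
# survival function S(x) = exp(-x/beta), via the upper tail
all.equal(pexp(x, rate = 1 / beta, lower.tail = FALSE), exp(-x / beta))
# median: beta * log(2), about 6.93
qexp(0.5, rate = 1 / beta)
```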