STT520-420: Biostatistics Analysis, Dr. Cuixian Chen

Presentation transcript:

STT520-420: Biostatistics Analysis Dr. Cuixian Chen Chapter 1: Introduction to Survival Analysis

Survival Data
Examples of survival times:
time a cancer patient is in remission
time til a disease-free person has a heart attack
time til death of a healthy mouse
time til a computer component fails
time til a paroled prisoner gets rearrested
time til death of a liver transplant patient
time til a cell phone customer switches carrier
time til recovery after surgery
All are "time til some event occurs". Longer times are better in all but the last…

Survival and hazard functions
Now define a survival r.v. Y as a continuous r.v. taking its values in the interval from 0 to ∞; i.e., its values are thought of as the lifetime or survival time = the time til death (or time til failure if we're considering an inanimate object). So Y is a positive-valued r.v. with pdf f(y) and cdf F(y) = P(Y ≤ y).

Survival distribution
Now define the survival (or reliability) function S(y) as S(y) = 1 - F(y) = P(Y > y). In terms of the pdf f(y), we have $S(y) = \int_y^\infty f(t)\,dt$.
Note the following important properties of the survival function:
S(0) = 1
S(∞) = 0
S(b) ≥ S(a) for 0 < b < a
So the survival function is a monotone decreasing (non-increasing) function on the interval from 0 to infinity (see Fig 1.1 p. 4).
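As a quick numerical illustration of these properties, here is a minimal sketch in R. The exponential lifetime with mean 2 is an assumed example chosen only for illustration (it is not taken from the slide), and the helper names S and S_int are hypothetical.

```r
# Assumed example: exponential lifetime with mean beta = 2 (R uses rate = 1/beta)
beta <- 2

# Survival function as 1 - cdf
S <- function(y) 1 - pexp(y, rate = 1 / beta)

# Survival function as the integral of the pdf from y to infinity
S_int <- function(y) integrate(dexp, lower = y, upper = Inf, rate = 1 / beta)$value

y_grid <- c(0, 0.5, 1, 2, 5, 10)
surv   <- S(y_grid)

# The two definitions should agree up to numerical integration error
print(cbind(y = y_grid, S = surv, S_integral = sapply(y_grid, S_int)))

S(0)                    # starts at 1
S(Inf)                  # tends to 0
all(diff(surv) <= 0)    # non-increasing on the grid
```

Both columns of the printed table should match closely, which is the relation S(y) = 1 - F(y) = P(Y > y) written two ways.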

Survival function Note: the survival function is a monotone decreasing function on the interval from 0 to infinity (see Fig 1.1 p. 4)

Summary of the survival function

Three goals of survival analysis
1. Estimate the survival function, with its SD.
2. Compare survival functions (e.g., across levels of a categorical variable: treatment vs. placebo).
3. Understand the relationship of the survival function to explanatory variables (e.g., is survival time different for various values of an explanatory variable?).

Empirical survival function
The survival function S(y) = P(Y > y) can be estimated by the empirical survival function (ESF), which is the relative frequency of the Y's that exceed y. Let Y1, …, Yn be i.i.d. (independent and identically distributed) survival variables. Then the empirical survival function is given by $S_n(y) = \frac{1}{n}\sum_{i=1}^{n} I(Y_i > y)$, where I is the indicator function.
Q: Find the ESF for the data: 1, 3, 5, 8, 10.
Q: Moreover, what is the variance of Sn(y)?
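The definition of the ESF as "the proportion of observations greater than y" can be sketched directly in R. The function name esf is hypothetical, introduced here only for illustration; the data are the five values from the question above.

```r
# Empirical survival function: proportion of observations strictly greater than y.
# 'esf' is a hypothetical helper name, not from the textbook or the slides.
esf <- function(y, data) sapply(y, function(v) mean(data > v))

obs <- c(1, 3, 5, 8, 10)

# Evaluate at 0 and at each observed value (the jump points of the step function)
esf(c(0, 1, 3, 5, 8, 10), obs)   # 1.0 0.8 0.6 0.4 0.2 0.0

# Between jump points the ESF is constant, e.g. at y = 4 three of five values exceed 4
esf(4, obs)                      # 0.6
```

Because the indicator uses a strict inequality, the ESF drops by 1/n at each observed value (more when values are tied).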

Review of Bernoulli & Binomial RVs
Z ~ Bernoulli(p): in a single trial the outcome is either success or failure, with Prob(Success) = p. Equivalently, Z ~ Bernoulli(p) = Binomial(1, p), i.e. n = 1.
E(Z) = p (i.e., P(Z=1) = p).
V(Z) = p*q = p*(1-p).
Recall: the sum of n iid (independent and identically distributed) Bernoullis is a Binomial rv with parameters n and p. We show on the next slide that the empirical survivor function Sn(y) is an unbiased estimator of S(y).

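Expectation, variance and confidence interval of the empirical survival function
Note that $nS_n(y) = \sum_{i=1}^{n} I(Y_i > y)$ is a sum of n iid Bernoulli(p) variables, and as such nSn(y) has a B(n, p) distribution, where p = P(Y > y) = S(y). Note that for a fixed y*, $E[S_n(y^*)] = \frac{1}{n}E[nS_n(y^*)] = \frac{1}{n}\,n\,S(y^*) = S(y^*)$, so Sn is unbiased as an estimator of S.
What is Var(Sn)? What is the confidence interval based on Sn? (see 1.6 and on p.6 where the confidence interval is computed…)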
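For reference, the variance follows from the same binomial argument. The derivation below is a sketch: the interval is the usual normal-approximation form, and the multiplier 2 is chosen to match the R code used later in these slides (1.96 is the more precise normal quantile).

```latex
\begin{align*}
\operatorname{Var}\bigl(S_n(y)\bigr)
  &= \frac{1}{n^2}\operatorname{Var}\bigl(nS_n(y)\bigr)
   = \frac{n\,p\,(1-p)}{n^2}
   = \frac{S(y)\,\bigl(1-S(y)\bigr)}{n}, \qquad p = S(y),\\[4pt]
\text{approx. 95\% CI for } S(y):\quad
  & S_n(y) \pm 2\sqrt{\frac{S_n(y)\,\bigl(1-S_n(y)\bigr)}{n}}.
\end{align*}
```

The unknown S(y) in the variance is replaced by its estimate Sn(y) when the interval is computed from data.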

Empirical survival function: Example 1.3, page 6
Placebo group: steroid-induced remission times (weeks): 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23.
Q: find (a) ; (b) find the 95% confidence interval for .

Plot an empirical survivor function in R (section 4.1, pages 55-56):
placebo <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
placebo <- sort(placebo)
a <- rle(placebo)
values <- a$values; values    ## distinct values from the observations
reps <- a$lengths; reps       ## number of replications of each distinct value
f <- table(placebo); f
# We need the fractions to plot the curve, so get the sample size first in n
n <- length(placebo); n
# we want S(0)=1
surv1 <- 1 - cumsum(f)/n
surv2 <- c(1, surv1); surv2
# now let's plot this curve… use type="s" to get a step function
# t is the vector of x's and surv2 is the vector of y's
t <- c(0, values); t
plot(t, surv2, type="s", xlab="Remission Times", ylab="Relative Frequencies", col="blue", lwd=3)
points(t, surv2, col="red", pch=18)

To create the confidence bands for Example 1.3
# approximate 95% limits: estimate +/- 2 standard errors
low <- surv2 - 2*sqrt(surv2*(1-surv2)/n)
upp <- surv2 + 2*sqrt(surv2*(1-surv2)/n)
points(t, low, col="orange")
points(t, upp, col="orange")
lines(t, low, col="orange", lty=2, lwd=3)
lines(t, upp, col="orange", lty=3, lwd=3)
## To print out the confidence intervals
> (cbind(t, surv2, low, upp))
    t      surv2          low       upp
    0 1.00000000  1.000000000 1.0000000
1   1 0.90476190  0.776649008 1.0328748
2   2 0.80952381  0.638145636 0.9809020
3   3 0.76190476  0.576019034 0.9477905
4   4 0.66666667  0.460928867 0.8724045
5   5 0.57142857  0.355448873 0.7874083
8   8 0.38095238  0.169010042 0.5928947
11 11 0.28571429  0.088552697 0.4828759
12 12 0.19047619  0.019098017 0.3618544
15 15 0.14285714 -0.009863567 0.2955779
17 17 0.09523810 -0.032874802 0.2233510
22 22 0.04761905 -0.045323816 0.1405619
23 23 0.00000000  0.000000000 0.0000000
Note that these normal-approximation limits are not restricted to [0, 1]: the upper limit at t = 1 exceeds 1, and the lower limits at t = 15, 17 and 22 are negative.
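Since a survival probability must lie between 0 and 1, one simple option is to truncate the limits to that range. The sketch below is self-contained and recomputes the estimates for the placebo data; the truncation with pmax/pmin is an assumption added here for illustration and is not part of the textbook example.

```r
# Placebo remission times (weeks) from Example 1.3
placebo <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
n <- length(placebo)

# Evaluate the empirical survival function at 0 and at each distinct observed time
t     <- c(0, sort(unique(placebo)))
surv2 <- sapply(t, function(v) mean(placebo > v))

# Normal-approximation standard error, then limits truncated to [0, 1]
se  <- sqrt(surv2 * (1 - surv2) / n)
low <- pmax(0, surv2 - 2 * se)   # lower limit, not allowed below 0
upp <- pmin(1, surv2 + 2 * se)   # upper limit, not allowed above 1

round(cbind(t, surv2, low, upp), 4)
```

Truncation only changes the rows whose raw limits fall outside [0, 1]; all other rows agree with the untruncated table.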

Confidence bands for Example 1.3

Plot an empirical survivor function in R (section 4.1, pages 55-56): Assume we have sorted data: Starting at Sn(0)=1;

How to compare survival functions
Example 1.4 on page 8 shows that it is sometimes difficult to compare survival curves since they can cross each other… (what makes one survival curve "better" than another?): S1(y)=exp(-y/2) and S2(y)=exp(-y^2/4)…

How to compare survival functions
One way of comparing two survival curves is by comparing their MTTF (mean time til failure) values. Let's use R to draw the two curves given in Ex. 1.4: S1(y)=exp(-y/2) and S2(y)=exp(-y^2/4)…
# For Example 1.4 on page 8, we plot the two survival functions in R.
# First evaluate both survival functions on a grid of x values
x <- seq(0, 10, by=0.001)
y1 <- exp(-x/2)
y2 <- exp(-x^2/4)
plot(x, y1, col="blue")
points(x, y2, col="red")
title("comparing two survival functions")
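The two curves of Ex. 1.4 cross where y/2 = y^2/4, that is at y = 2. A minimal sketch that locates the crossing numerically and draws the curves as lines is given below; the function names S1 and S2 and the search interval passed to uniroot are choices made here for illustration.

```r
# Survival functions from Ex. 1.4
S1 <- function(y) exp(-y / 2)
S2 <- function(y) exp(-y^2 / 4)

# Find the crossing point for y > 0: S1 - S2 changes sign between 0.5 and 10
cross <- uniroot(function(y) S1(y) - S2(y), interval = c(0.5, 10), tol = 1e-8)$root
cross   # approximately 2

# Draw both curves as lines and mark the crossing point
curve(S1, from = 0, to = 10, col = "blue", lwd = 2,
      xlab = "y", ylab = "S(y)", main = "comparing two survival functions")
curve(S2, from = 0, to = 10, col = "red", lwd = 2, add = TRUE)
abline(v = cross, lty = 2)
legend("topright", legend = c("S1(y)=exp(-y/2)", "S2(y)=exp(-y^2/4)"),
       col = c("blue", "red"), lwd = 2)
```

Before the crossing S2 lies above S1, and after it S1 lies above S2, which is why neither curve is uniformly "better".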

Mean time to failure (MTTF)
Note that the MTTF of a survival rv Y is just its expected value E(Y). We can also show (Theorem 1.2) that $E(Y) = \int_0^\infty S(y)\,dy$. (Math & Stat majors: Show this is true using integration by parts and l'Hospital's rule…!)
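Theorem 1.2 gives a direct way to compute the MTTF numerically: integrate the survival function from 0 to infinity. The sketch below applies this to the two curves of Ex. 1.4 using R's integrate; the variable names are chosen here for illustration.

```r
# Survival functions from Ex. 1.4
S1 <- function(y) exp(-y / 2)
S2 <- function(y) exp(-y^2 / 4)

# MTTF = integral of S(y) over (0, Inf), by Theorem 1.2
mttf1 <- integrate(S1, lower = 0, upper = Inf)$value
mttf2 <- integrate(S2, lower = 0, upper = Inf)$value

c(mttf1 = mttf1, mttf2 = mttf2)   # exact values are 2 and sqrt(pi)
```

By this criterion S1 has the larger mean time to failure, even though S2 lies above S1 for y < 2.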

Review: Exponential distribution
Def 4.11: Y ~ Exp(β) has pdf $f(y) = \frac{1}{\beta}e^{-y/\beta}$ for y > 0, so that E(Y) = β.
Eg: What is the cdf of Exp(β)?
Eg: What is Pr(Y > a), if Y ~ Exp(β)?

Mean time to failure (MTTF)
Recall that the MTTF of a survival rv Y is its expected value, and that (Theorem 1.2) $E(Y) = \int_0^\infty S(y)\,dy$. So suppose we have an exponential survival function: $S(y) = e^{-y/\beta}$ for y > 0.
Q: Show that the MTTF for this variable is β. (Can you show this satisfies the properties of a survival function?)

Mean time to failure (MTTF)
For any two such survival functions, S1(y)=exp(-y/β1) and S2(y)=exp(-y/β2), one is "better" than the other if the corresponding beta (its MTTF) is "better", i.e. larger when longer survival is preferred…
HW, EX1: (Use R) Plot on the same axes at least two such survival functions, S1(y)=exp(-y/β1) and S2(y)=exp(-y/β2), with different values of beta (e.g., β1 = 10; β2 = 5), and show this result.

STT420-520 HW
HW, EX2: For Example 1.1, page 1:
Calculate the empirical survival function and the corresponding confidence intervals (use R).
Plot both the empirical survival function and its confidence intervals in the same figure… (use R and see the example code on the back).
Estimate the probability of failure beyond 10 weeks.
HW, EX3: See page 13, Exercise 1.3.

Review: Exponential distribution
In R: dexp(x, 1/β); pexp(x, 1/β); qexp(per, 1/β); rexp(N, 1/β).
## Note that R parameterizes the exponential distribution by its rate, 1/β,
## which is different from the definition (with mean β) that we used in the STT315 class.
set.seed(100)
y <- rexp(10000, 0.1)
mean(y)   ## beta = 10.07184, not 0.1!
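The rate parameterization can be checked against the closed-form expressions for Exp(β). In the sketch below the values β = 10 and a = 5 are assumed purely for illustration.

```r
# Assumed illustration values: mean beta = 10, threshold a = 5
beta <- 10
a    <- 5

# Pr(Y > a) from R's cdf with rate = 1/beta, and from the closed form exp(-a/beta)
p_tail_r      <- 1 - pexp(a, rate = 1 / beta)
p_tail_closed <- exp(-a / beta)
c(p_tail_r, p_tail_closed)          # the two values agree

# The median from qexp matches beta * log(2)
c(qexp(0.5, rate = 1 / beta), beta * log(2))

# The density at 0 equals the rate 1/beta, not beta
dexp(0, rate = 1 / beta)            # 0.1
```

Passing β itself as the second argument would instead describe a distribution with mean 1/β, which is the mix-up the comment in the slide warns about.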