STT520-420: BIOSTATISTICS ANALYSIS Dr. Cuixian Chen


STT520-420: BIOSTATISTICS ANALYSIS Dr. Cuixian Chen Chapter 2: Hazard Model

Review: The hazard function
The hazard function gives the so-called “instantaneous” risk of death (or failure) at time t, assuming survival up to time t. Estimate h(t) by the quotient (number of failures in a short interval starting at t) / (number surviving to t × width of the interval).

Review: The hazard function
The hazard function is also called the instantaneous failure rate, force of mortality, conditional mortality rate, or age-specific failure rate. In a real sense it gives the risk of failure (death) per unit time as aging progresses. Hazard functions can be flat, increasing, decreasing, or more complex…

Examples of Hazard functions
Hazard functions can be flat, increasing, decreasing, or more complex… What do you think the hazard function for human beings looks like?

Constant Hazard model
Consider a simple hazard function, the constant hazard $h(y)=\lambda$ for all $y\ge 0$. Here we assume $\lambda=1/\beta$, where $\beta>0$. We have seen that $S(y)=\exp\left(-\int_0^y h(u)\,du\right)$, so if we evaluate this for $h(y)=\lambda$, we get $S(y)=e^{-\lambda y}=e^{-y/\beta}$. Since $f(y)=-dS(y)/dy$, we have $f(y)=\frac{1}{\beta}e^{-y/\beta}$, the exponential probability density with parameter $\beta$. This means the expected value is $\beta$ and the variance is $\beta^2$.
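
As a quick numerical check of the constant-hazard claim, the sketch below computes the hazard h(y) = f(y)/S(y) of an exponential model with R's `dexp` and `pexp`. The mean `beta = 100` and the grid of y values are arbitrary choices for illustration, not values taken from the slides.

```r
# Hazard of an exponential model, h(y) = f(y) / S(y).
# beta = 100 is an assumed illustrative mean; R parameterizes by rate = 1/beta.
beta <- 100
y <- seq(0, 500, by = 50)
f <- dexp(y, rate = 1 / beta)                      # density f(y)
S <- pexp(y, rate = 1 / beta, lower.tail = FALSE)  # survival S(y) = P(Y > y)
h <- f / S                                         # hazard at each y
print(h)                                           # each entry equals 1/beta
```

The printed hazard is flat at 1/beta regardless of y, which is what distinguishes the exponential from the other models in this chapter.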

Example 1
Identify the distribution which has the following hazard function: h(y)=4. Then Y ~ ?

Constant Hazard model
Definition 2.1 writes Y as $Y \sim \exp(\beta)$ with $h(y)=1/\beta$ and notes that this is one of the most commonly used models for lifetime distributions. One reason is the “memoryless property” of the exponential distribution (given on page 20).
Theorem 2.1: If $Y \sim \exp(\beta)$ then for any y>0 and t>0 we have P(Y>y+t | Y>y)=P(Y>t).
Memoryless property: given survival past y, the conditional probability of surviving an additional t is the same as the unconditional probability of surviving t. Thus there is no “aging” with an increased risk of dying…
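
The memoryless property can be checked numerically. This is a minimal sketch using `pexp`; the values of `beta`, `y` and `t_add` are assumptions chosen only for illustration.

```r
# Memoryless property: P(Y > y + t | Y > y) = P(Y > t) for an exponential Y.
# beta, y and t_add below are assumed illustrative values.
beta <- 10
y <- 3
t_add <- 5
surv <- function(q) pexp(q, rate = 1 / beta, lower.tail = FALSE)  # S(q) = P(Y > q)
conditional <- surv(y + t_add) / surv(y)   # P(Y > y + t | Y > y)
unconditional <- surv(t_add)               # P(Y > t)
print(c(conditional = conditional, unconditional = unconditional))
```

Both numbers agree, and changing `y` leaves the conditional probability unchanged.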

Constant Hazard model
Go over Example 2.1 to see a picture of exponential data, which would then have a constant hazard (of value $1/\beta$ = 1/mean(Y)).

Constant Hazard model
Note from Theorem 2.2 that if $Y \sim \exp(\beta)$, then $Y/\beta \sim \exp(1)$. This tells us that if we multiply an exponential survival variable by a constant, the mean time to failure (MTTF) is multiplied by the same constant.
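
The scaling result can be verified without simulation by comparing distribution functions: P(Y/beta <= q) = P(Y <= q*beta). The sketch below does this on a grid; `beta = 100` and the grid `q` are illustrative assumptions.

```r
# If Y is exponential with mean beta, then Y / beta is exponential with mean 1.
# Compare the two CDFs on a grid; beta = 100 is an assumed illustrative mean.
beta <- 100
q <- seq(0, 5, by = 0.5)
cdf_scaled <- pexp(q * beta, rate = 1 / beta)  # P(Y / beta <= q) = P(Y <= q * beta)
cdf_unit <- pexp(q, rate = 1)                  # CDF of the exponential with mean 1
max_diff <- max(abs(cdf_scaled - cdf_unit))
print(max_diff)                                # zero up to rounding error
```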

Review: Exponential distribution
From STT315: In R: dexp(x, 1/β); pexp(x, 1/β); qexp(per, 1/β); rexp(N, 1/β).
## Note that R parameterizes the exponential distribution by its rate 1/β, not by the mean β used in the STT315 class.
set.seed(100)
y=rexp(10000, 0.1)
mean(y) ## beta= 10.07184, not 0.1!

Exponential Prob Plot for Constant Hazard model
How do we decide whether a set of survival data follows the exponential distribution, that is, whether the hazard is constant? Look over Example 2.1: 200 randomly generated exponential variables with mean=100. Characteristic skewed distribution, sample mean=107.5, sample s.d.=106.1. (Recall that if $Y \sim \exp(\beta)$ then E(Y)=SD(Y)=$\beta$.) The sample stemplot and the sample mean and sd approximate the true shape, center and spread of the exponential. The estimated hazards (rightmost column) are approximately .01 (1/100), i.e. constant; see the formula on p.22 for getting these values…
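
The formula on p.22 is not reproduced in this transcript, so the sketch below uses one common interval-based estimate as an assumption: failures in an interval divided by (number still at risk at the start of the interval × interval width). The interval width of 20 is also an illustrative choice.

```r
# Interval-based hazard estimate (assumed form, see the note above):
# failures in an interval / (number at risk at its start * interval width).
set.seed(100)
x <- rexp(n = 200, rate = 1 / 100)        # simulated lifetimes, true mean 100
width <- 20                               # assumed interval width
breaks <- seq(0, ceiling(max(x) / width) * width, by = width)
deaths <- hist(x, breaks = breaks, plot = FALSE)$counts
at_risk <- length(x) - c(0, head(cumsum(deaths), -1))   # alive at interval start
hazard_hat <- deaths / (at_risk * width)
print(round(data.frame(start = head(breaks, -1), deaths, at_risk, hazard_hat), 4))
```

For exponential data the early rows of `hazard_hat` should hover near 1/100; the later rows rest on very few survivors and are much noisier.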

Constant Hazard model: Example 2.1, page 21

  The decimal point is 1 digit(s) to the right of the |

   0 | 001234555667799011344455999
   2 | 01133455566777888901222344455677889
   4 | 0122334455566667779013688
   6 | 000012346666679900124689
   8 | 334445883467
  10 | 0082445556779
  12 | 2358880125
  14 | 0144467
  16 | 12457868
  18 | 13493
  20 | 17792369
  22 | 1934446
  24 | 2113
  26 | 05839
  28 | 77
  30 | 66
  32 | 4
  34 | 0
  36 | 6
  38 | 3
  40 |
  42 |
  44 | 78

set.seed(100)
y=rexp(10000, 0.1)
mean(y) ## beta= 10.07184, not 0.1!
############################
# random generation for the exponential distribution with rate rate (i.e., mean=1/rate).
# rexp(n, rate)
# From Example 2.1, we first generate 200 exponential variables with mean=100.
x<-rexp(n=200,rate=1/100);
mean(x); # mean
sqrt(var(x)); # standard deviation
# do a stemplot of 200 randomly generated exponential variables with beta=100
stem(x);

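Power Hazard model
Power Hazard: $h(y)=\frac{\alpha}{\beta^{\alpha}}\,y^{\alpha-1}$. Note this is of the form (constant)·y^(constant), and if $\alpha=1$ this reduces to the constant hazard we just considered. Note that $\int_0^y h(u)\,du=(y/\beta)^{\alpha}$ and so $S(y)=\exp\left(-(y/\beta)^{\alpha}\right)$.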

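Weibull Distribution
Def 2.2: The lifetime variable Y follows a Weibull probability model with parameters $\alpha>0$ and $\beta>0$, written $Y \sim \text{Weibull}(\alpha,\beta)$, iff the density function of Y is given by $f(y)=\frac{\alpha}{\beta^{\alpha}}\,y^{\alpha-1}\exp\left(-(y/\beta)^{\alpha}\right)$ for $y>0$.
The mean and variance of Y are given in Theorem 2.5 in terms of the gamma function: $E(Y)=\beta\,\Gamma(1+1/\alpha)$ and $\mathrm{Var}(Y)=\beta^{2}\left[\Gamma(1+2/\alpha)-\Gamma(1+1/\alpha)^{2}\right]$.
The survival function: $S(y)=\exp\left(-(y/\beta)^{\alpha}\right)$.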

Power Hazard model
For $Y \sim \text{Weibull}(\alpha,\beta)$, the survival function is $S(y)=\exp\left(-(y/\beta)^{\alpha}\right)$ and so $h(y)=f(y)/S(y)=\frac{\alpha}{\beta^{\alpha}}\,y^{\alpha-1}$.
Power Hazard: note this is of the form (constant)·y^(constant).
If $\alpha=1$ this reduces to the constant hazard of the Exponential model.
If $\alpha>1$, the Weibull hazard function is increasing.
If $\alpha<1$, the Weibull hazard function is decreasing.
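
The three regimes of the Weibull hazard can be seen numerically. This sketch computes h(y) = f(y)/S(y) from `dweibull` and `pweibull` and compares it with the power form (alpha/beta^alpha)·y^(alpha-1), taking R's `shape` as alpha and `scale` as beta. The helper name `weibull_hazard` and the parameter values are assumptions for illustration.

```r
# Weibull hazard h(y) = f(y) / S(y), compared with the power form
# (alpha / beta^alpha) * y^(alpha - 1). Parameter values are illustrative.
weibull_hazard <- function(y, alpha, beta) {
  dweibull(y, shape = alpha, scale = beta) /
    pweibull(y, shape = alpha, scale = beta, lower.tail = FALSE)
}
y <- seq(0.5, 3, by = 0.5)
beta <- sqrt(3)
for (alpha in c(0.5, 1, 2)) {
  h <- weibull_hazard(y, alpha, beta)
  closed_form <- (alpha / beta^alpha) * y^(alpha - 1)
  cat("alpha =", alpha, " max abs difference:", max(abs(h - closed_form)), "\n")
  print(round(h, 4))   # decreasing, constant, increasing in turn
}
```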

Power Hazard model
Note that if $\alpha=2$, the hazard is linear in y: $h(y)=\frac{2}{\beta^{2}}\,y$. Go over Example 2.2 on pages 24-25, where Y ~ Weibull(2, sqrt(3)). Use R (gamma()) to compute the values of the Gamma function… Also in R, shape=alpha and scale=beta in qweibull.
Rayleigh hazard model: $h(y)=\sum_{i=0}^{k} a_i\,y^{i}$. For k=2, the coefficients can be chosen so that h(y) follows a quadratic curve or “bathtub” shape (with h(y)>0 for y>0).

Weibull Distribution
http://en.wikipedia.org/wiki/Weibull_distribution

Power Hazard model

  The decimal point is 1 digit(s) to the left of the |

   0 | 348889011567889
   2 | 0111222333455577888889900001133344555667778888
   4 | 0113334445556677788899990001112244455556777799
   6 | 00001111122222222333445555667777777889990001112222222222333344444455+2
   8 | 00000111111122333344445556666677777778888889999001111122333333444444+15
  10 | 00001222233333333445555555666666778888899900000001111111122222334444+10
  12 | 00000000011111122222223344444445555555566666777778888889999999999990+28
  14 | 00112222333334444455555666666788889900000011111222222223333333334444+5
  16 | 00000111222222233344444556666777777888888999900011111222223333344444+18
  18 | 00000001111222222233333344445566666666777778888888999900011122234445+2
  20 | 0112222233333344455556677789900000011233444444566666777778889999
  22 | 01122233334444455666689000123334444455556667788999
  24 | 00344555566667777778990012223333467788899
  26 | 000112223455667777880111235667789
  28 | 12223566678011113377899
  30 | 000223344456888
  32 | 55688904468
  34 | 06899335
  36 | 0228
  38 | 5
  40 |
  42 | 4
  44 |
  46 | 2
  48 | 4

Example 2.2 on pages 24-25
Introduction to R #4, Biostatistics Analysis

#To plot any function when we know the closed form (y=f(x)) we have to
#first create a set of x-values on which to evaluate the function.
#To do this we use the seq function in R. E.g., to plot the Weibull(2,sqrt(3))
#density function, create a vector of x's from say 0 to 10 in steps of .05
x=seq(0,10,.05)
#Now evaluate the Weibull density function at these x's. Recall that the
#pdf's in R begin with the letter "d", the cdf's begin with "p", the quantiles
#begin with "q" and the random numbers begin with "r". So we use the "dweibull"
#function evaluated at our vector x and then plot the (x,y) pairs.
y=dweibull(x,shape=2, scale=sqrt(3))
plot(x,y,type="l",main="Weibull pdf")
#The mean and variance of such a random variable are given in the formulas
#on page 24 in Theorem 2.5. Use the gamma function in R to compute them for
#this particular Weibull density
set.seed(100)
z=rweibull(1000,shape=2,scale=sqrt(3));
mean(z); # sample mean
var(z); # sample variance
sqrt(var(z)); # sample standard deviation, or use sd(z);
mean.Weibull = sqrt(3)*gamma(1+1/2) # the population mean = expectation
mean.Weibull;
Var.Weibull = (sqrt(3)^2)*(gamma(1+2/2)-(gamma(1+1/2))^2) # the population variance
Var.Weibull; # the variance - take sqrt to get the standard deviation
sd.Weibull = sqrt(Var.Weibull); # the population standard deviation
sd.Weibull;
##To get the asked for probability in Example 2.2 we use the cdf
#Prob(Y>=4 | Y>=2) = P(Y>=4 and Y>=2)/(P(Y>=2)) = P(Y>=4)/P(Y>=2)
(1-pweibull(4,shape=2,scale=sqrt(3)))/(1-pweibull(2,shape=2,scale=sqrt(3)))
#Now check the simulated data on page 25… let's create our own randomly
#generated set (1000 values here) and do a histogram instead of a stemplot…
#In order to make sure we use the same set of randomly generated data,
#save it first in a vector called w
w=rweibull(1000,shape=2,scale=sqrt(3))
stem(w) ; hist(w)
#To get a count of the frequencies in each interval use plot=FALSE
#and then use these frequencies to estimate the hazard function
hist(w,plot=FALSE)

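Exponential Hazard model
Exponential hazard: $h(y)=a\,e^{by}$ for constants $a,b>0$. Note this is of the form (constant)·e^((constant)·y). An exponential hazard model occurs frequently in actuarial science for modeling human lifetime. The Gompertz survival model is $S(y)=\exp\left(-\frac{a}{b}\left(e^{by}-1\right)\right)$ for $y\ge 0$, or rewrite as $S(y)=e^{a/b}\exp\left(-\frac{a}{b}\,e^{by}\right)$.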
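
To see how a survival function follows from an exponential hazard, the sketch below writes the hazard as h(y) = a·exp(b·y), integrates it numerically with `integrate`, and compares the result with the closed form exp(-(a/b)(exp(b·y) - 1)). The names `a` and `b`, their values, and the ages used are hypothetical choices for illustration and may not match the textbook's parameterization.

```r
# Survival function from an exponential hazard h(y) = a * exp(b * y).
# The names a, b and their values are assumptions for illustration only.
a <- 0.001
b <- 0.08
hazard <- function(y) a * exp(b * y)
# S(y) = exp(-integral of h from 0 to y), evaluated numerically
surv_numeric <- function(y) exp(-integrate(hazard, lower = 0, upper = y)$value)
# Closed form obtained by doing the same integral by hand
surv_closed <- function(y) exp(-(a / b) * (exp(b * y) - 1))
ages <- c(10, 40, 70, 100)
numeric_vals <- sapply(ages, surv_numeric)
closed_vals <- surv_closed(ages)
print(round(cbind(ages, numeric_vals, closed_vals), 6))
```

The `- 1` inside the closed form is what makes S(0) = 1 when lifetimes are restricted to y ≥ 0.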


Gompertz Survival Model
The Gompertz distribution is often applied by demographers[1][2] and actuaries[3][4] to describe the distribution of adult lifespans.

Extreme Survival Model

Compare Gompertz survival model and Extreme survival model
Compare the Gompertz survival model and the Extreme survival model: the extra exponential term in the first (Gompertz) survival function is the normalizing factor that ensures the area under the corresponding density is 1.

Power Hazard model
Recall: for the Weibull model with shape $\alpha$ and scale $\beta$, if $\alpha>1$ the hazard function is increasing; if $\alpha<1$, the hazard is decreasing; if $\alpha=1$ then the hazard is constant (as in the exponential survival case). Note that if $\alpha=2$, the hazard is linear in y: $h(y)=\frac{2}{\beta^{2}}\,y$.

Example 2
Page 31, EX2.1: Identify the distribution which has the following hazard function: e) for y>0. Then Y ~ ? f) for . Then Y ~ ?