HST412: SURVIVAL MODELS Chipepa Fastel.


Outline Survival functions, hazard rates, types of censoring and truncation. Life tables, Kaplan-Meier plots, log-rank tests, Cox regression models, inference for parametric regression models. Survival models and the life table; Describe the future lifetime as a random variable Define probabilities of death and survival, Define the actuarial functions tpx, tqx, n/mqx, Define the complete and curtate expectations of future lifetime, Describe the life table functions lx and dx, Describe the simple laws of mortality, Define simple assurance and annuity contracts and develop formulae for means and variances. Estimating the lifetime distribution Fx(t); Describe how lifetime data might be censored,

Outline Describe the estimation of empirical survival function, Describe the Kaplan-Meier estimate of the survival function in the presence of censoring, Describe the Nelson-Aalen estimate of the cumulative hazard rate in the presence of censoring, compute it from typical data and estimate its variance. The Cox regression model; Describe the Cox model for proportional hazards and derive the partial likelihood estimate. The two state Markov model; Describe the two state model of a single decrement and compare the assumptions with those of the random lifetime, Derive the MLE for the transition intensities in models of transfers between states with piecewise constant transition intensities, Define waiting time in a state. The general Markov model; Describe the statistical models of transfers between multiple states, State the assumptions underlying the Markov model of transfers between a finite number of states in continuous time,

Outline Binomial and Poisson Models of Mortality; Describe the Binomial model of mortality, derive a maximum likelihood estimator for the probability of death and compare the Binomial model with the multiple state models Graduation and statistical tests; Describe how to test crude estimates for consistency with standard table or a set of graduated estimates, and describe the process. Methods of graduation; Describe the process of graduation by the three common methods and state the advantages and disadvantages of each. Exposed to risk; Define initial and central exposed to risk, and the various common rate intervals, Calculate the central exposed to risk in simple cases. State the principle of correspondence.

Overview What is survival analysis? Terminology and data structure. Survival/hazard functions. Parametric versus semi-parametric regression techniques. Introduction to Kaplan-Meier methods (non-parametric). Relevant SAS Procedures (PROCS).

Early example of survival analysis, 1669 Christiaan Huygens' 1669 curve showing how many out of 100 people survive until 86 years. From: Howard Wainer, Statistical Graphics: Mapping the Pathways of Science. Annual Review of Psychology. Vol. 52: 305-335.

Early example of survival analysis Roughly, what shape is this function? What was a person’s chance of surviving past 20? Past 36? This is survival analysis! We are trying to estimate this curve—only the outcome can be any binary event, not just death.

What is survival analysis? Statistical methods for analyzing longitudinal data on the occurrence of events. Events may include death, injury, onset of illness, recovery from illness (binary variables) or transition above or below the clinical threshold of a meaningful continuous variable (e.g. CD4 counts). Accommodates data from randomized clinical trial or cohort study design.

Randomized Clinical Trial (RCT) [Diagram: target population, then disease-free at-risk cohort, then random assignment to intervention or control; each arm is followed over TIME to disease or disease-free.]

Randomized Clinical Trial (RCT) [Diagram: target population, then patient population, then random assignment to treatment or control; each arm is followed over TIME to cured or not cured.]

Randomized Clinical Trial (RCT) [Diagram: target population, then patient population, then random assignment to treatment or control; each arm is followed over TIME to dead or alive.]

Cohort study (prospective/retrospective) [Diagram: target population, then disease-free cohort, split into exposed and unexposed; each group is followed over TIME to disease or disease-free.]

Examples of survival analysis in medicine

RCT: Women’s Health Initiative (JAMA, 2002) [Figure: cumulative incidence over time, on hormones vs. on placebo.] Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.

WHI and low-fat diet… [Figure: control vs. low-fat diet.] Prentice et al. JAMA, February 8, 2006; 295: 629-642.

Retrospective cohort study: From December 2003 BMJ: Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study. Curtis et al. BMJ 2003;327:1322-1323.

Objectives of survival analysis To estimate time-to-event for a group of individuals, such as time until second heart attack for a group of MI patients. To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial. To assess the relationship of covariables to time-to-event, such as: do weight, insulin resistance, or cholesterol influence survival time of MI patients? Note: expected time-to-event = 1/incidence rate (this holds when the incidence rate is constant over time).

Why use survival analysis? 1. Why not compare mean time-to-event between your groups using a t-test or linear regression? Because that ignores censoring. If there is no censoring (everyone is followed to the outcome of interest), then a t-test on mean or median time to event is fine. 2. Why not compare the proportion of events in your groups using risk/odds ratios or logistic regression? Because that ignores time. If time at risk were the same for everyone, you could just use proportions.

Survival Analysis: Terms Time-to-event: the time from entry into a study until a subject has a particular outcome. Censoring: subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study. If dropout is related to both outcome and treatment, dropouts may bias the results. For example, PhD candidates who are most likely to take longest may be most likely to drop out, thereby biasing results.

Data Structure: survival analysis Two-variable outcome : Time variable: ti = time at last disease-free observation or time at event Censoring variable: ci =1 if had the event; ci =0 no event by time ti

CENSORING Different types Right Left Interval Each leads to a different likelihood function Most common is right censored

Right censored data “Type I censoring” Event is observed if it occurs before some prespecified time Mouse study Clock starts: at first day of treatment Clock ends: at death Always be thinking about ‘the clock’

Simple example: Type I censoring Time 0

Introduce “administrative” censoring Time 0 STUDY END

More realistic: clinical trial “Generalized Type I censoring” Time 0 STUDY END

Additional issues Patient drop-out Loss to follow-up

Drop-out or LTFU Time 0 STUDY END

How do we ‘treat’ the data? Shift everything so each patient time represents time on study. Time 0 = time of enrollment.

Another type of censoring: Competing Risks Patient can have either event of interest or another event prior to it Event types ‘compete’ with one another Example of competers: Death from lung cancer Death from heart disease Common issue not commonly addressed, but gaining more recognition

Left Censoring The event has occurred prior to the start of the study, OR the true survival time is less than the person’s observed survival time. We know the event occurred, but are unsure when prior to observation. In this kind of study, the exact time would be known if the event occurred after the study started. Examples: Survey question: when did you first smoke? Alzheimer’s disease: onset generally hard to determine. HPV: infection time.

Interval censoring Due to discrete observation times, actual times not observed Example: progression-free survival Progression of cancer defined by change in tumor size Measure in 3-6 month intervals If increase occurs, it is known to be within interval, but not exactly when. Times are biased to longer values Challenging issue when intervals are long

Key components Event: must have clear definition of what constitutes the ‘event’ Death Disease Recurrence Response Need to know when the clock starts Age at event? Time from study initiation? Time from randomization? time since response? Can event occur more than once?

Introduction to survival distributions $T_i$, the event time for an individual, is a random variable having a probability distribution. Different models for survival data are distinguished by different choices of distribution for $T_i$.

Describing Survival Distributions Parametric survival analysis is based on so-called “Waiting Time” distributions (ex: exponential probability distribution). The idea is this: Assume that times-to-event for individuals in your dataset follow a continuous probability distribution (which we may or may not be able to pin down mathematically). For all possible times Ti after baseline, there is a certain probability that an individual will have an event at exactly time Ti. For example, human beings have a certain probability of dying at ages 3, 25, 80, and 140: P(T=3), P(T=25), P(T=80), P(T=140). These probabilities are obviously vastly different.

Probability density function: f(t) In the case of human longevity, Ti is unlikely to follow a normal distribution, because the probability of death is not highest in the middle ages, but at the beginning and end of life. Hypothetical data: People have a high chance of dying in their 70’s and 80’s; BUT they have a smaller chance of dying in their 90’s and 100’s, because few people make it long enough to die at these ages.

Probability density function: f(t) $f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}$: the probability of the failure time occurring at exactly time t (out of the whole range of possible t’s).

Survival function: 1-F(t) The goal of survival analysis is to estimate and compare survival experiences of different groups. Survival experience is described by the cumulative survival function: $S(t) = 1 - F(t) = P(T > t)$. F(t) is the CDF of f(t), and is “more interesting” than f(t). Example: If t=100 years, S(t=100) = probability of surviving beyond 100 years.

Cumulative survival Same hypothetical data, plotted as cumulative distribution rather than density. [Figures: the pdf, shown for recall, and the cumulative survival curve.]

Cumulative survival P(T>20) P(T>80)

Hazard Function: new concept Hazard rate is an instantaneous incidence rate. [Figure with axis label: AGES]

Hazard Function A little harder to conceptualize. Instantaneous failure rate or conditional failure rate. Interpretation: approximate probability that a person at time t experiences the event in the next instant. Only constraint: $h(t) \ge 0$. For continuous time, $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$.

Hazard Function Useful for conceptualizing how chance of event changes over time. That is, consider hazard ‘relative’ over time. Examples: Treatment-related mortality: early on, high risk of death; later on, risk of death decreases. Aging: early on, low risk of death; later on, higher risk of death.

Shapes of hazard functions Increasing Natural aging and wear Decreasing Early failures due to device or transplant failures Bathtub Populations followed from birth Hump-shaped Initial risk of event, followed by decreasing chance of event

Examples

Median Very/most common way to express the ‘center’ of the distribution. Rarely see another quantile expressed. Find t such that $S(t) = 0.5$. Complication: in some applications, the median is not reached empirically. A reported median based on a model then seems like an extrapolation. Often just state ‘median not reached’ and give an alternative point estimate.

X-year survival rate Many applications have ‘landmark’ times that have historically been used to quantify survival. Examples: Breast cancer: 5 year relapse-free survival. Pancreatic cancer: 6 month survival. Acute myeloid leukemia (AML): 12 month relapse-free survival. Solve for S(t) given t.

Hazard vs. density This is subtle, but the idea is: When you are born, you have a certain probability of dying at any age; that’s the probability density (think: marginal probability) Example: a woman born today has, say, a 1% chance of dying at 80 years. However, as you survive for awhile, your probabilities keep changing (think: conditional probability) Example, a woman who is 79 today has, say, a 5% chance of dying at 80 years.

A possible set of probability density, failure, survival, and hazard functions. f(t)=density function F(t)=cumulative failure h(t)=hazard function S(t)=cumulative survival

A probability density we all know: the normal distribution What do you think the hazard looks like for a normal distribution? Think of a concrete example. Suppose that times to complete the midterm exam follow a normal curve. What’s your probability of finishing at any given time given that you’re still working on it?

f(t), F(t), S(t), and h(t) for different normal distributions:

Examples: common functions to describe survival Exponential (hazard is constant over time, simplest!) Weibull (hazard function is increasing or decreasing over time)

 f(t), F(t), S(t), and h(t) for different exponential distributions:

f(t), F(t), S(t), and h(t) for different Weibull distributions: Parameters of the Weibull distribution

Exponential Constant hazard function: $h(t) = h$. Exponential density function: $f(t) = h e^{-ht}$. Survival function: $S(t) = e^{-ht}$.

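With numbers… Why isn’t the cumulative probability of survival just 90% (rate of .01 for 10 years = 10% loss)? Incidence rate (constant): $h(t) = 0.01$ cases per person-year. Probability of developing disease at year 10: $f(10) = 0.01\,e^{-0.01(10)} \approx 0.009$. Probability of surviving past year 10: $S(10) = e^{-0.01(10)} \approx 0.905$ (cumulative risk through year 10 is 9.5%). The loss is less than 10% because the rate applies only to those still at risk, and that group shrinks every year.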
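The arithmetic on this slide can be checked directly. A minimal sketch, assuming the constant rate of 0.01 per person-year and the 10-year horizon quoted above (the variable names are illustrative only):

```python
import math

rate = 0.01   # constant hazard (incidence rate) per person-year, from the slide
years = 10    # follow-up horizon, from the slide

# Exponential model: S(t) = exp(-h t), F(t) = 1 - S(t), f(t) = h exp(-h t)
survival = math.exp(-rate * years)
cumulative_risk = 1 - survival
density = rate * math.exp(-rate * years)

print(f"S(10) = {survival:.4f}")          # about 0.905
print(f"F(10) = {cumulative_risk:.4f}")   # about 0.095
print(f"f(10) = {density:.5f}")           # about 0.009
```

The cumulative risk falls short of the naive 10% because each year's 1% applies to a slightly smaller surviving group.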

Example… Recall this graphic. Does it look Normal, Weibull, exponential?

Example… One way to describe the survival distribution here is: P(T>76)=.01 P(T>36) = .16 P(T>20)=.20, etc.

Example… Or, more compactly, try to describe this as an exponential probability function, since that is how it is drawn! Recall the exponential probability distribution: if T ~ exp(h), then $P(T=t) = h e^{-ht}$, where h is a constant rate. Here: event time, T ~ exp(Rate).

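Example… To get from the instantaneous probability (density), $P(T=t) = h e^{-ht}$, to a cumulative probability of death, integrate. Area to the left: $F(t) = P(T \le t) = \int_0^t h e^{-hu}\,du = 1 - e^{-ht}$. Area to the right: $S(t) = P(T > t) = \int_t^\infty h e^{-hu}\,du = e^{-ht}$.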

Example… Solve for h:

Example… This is a “parametric” survivor function, since we’ve estimated the parameter h.

Hazard rates could also change over time… Example: Hazard rate increases linearly with time.

Relating these functions (a little calculus just for fun…): If you know one, you can derive all the others. We saw special case of 2 and 3 with exponential waiting times.
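The identities that link the four functions can be written out as below. This is a sketch in conventional notation (T is the event time, u a dummy integration variable); the numbering used on the original slide is not recoverable, so none is implied here.

```latex
\begin{align*}
F(t) &= P(T \le t) = \int_0^t f(u)\,du, & S(t) &= 1 - F(t),\\
f(t) &= -\frac{dS(t)}{dt}, & h(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t),\\
S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right), & f(t) &= h(t)\exp\!\left(-\int_0^t h(u)\,du\right).
\end{align*}
```

With a constant hazard $h(u) = h$, the last two identities reduce to the exponential forms $S(t) = e^{-ht}$ and $f(t) = h e^{-ht}$.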

Getting density from hazard… Example: Hazard rate increases linearly with time.

Getting survival from hazard…
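As a worked illustration of the linearly increasing hazard, suppose $h(t) = kt$ for some constant $k > 0$. The constant $k$ is a hypothetical symbol introduced here; the slides do not name it.

```latex
\begin{align*}
\int_0^t h(u)\,du &= \int_0^t k u\,du = \frac{k t^2}{2},\\
S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right) = e^{-k t^2/2},\\
f(t) &= h(t)\,S(t) = k t\, e^{-k t^2/2}.
\end{align*}
```

Differentiating confirms the result: $-\,dS(t)/dt = k t\, e^{-k t^2/2} = f(t)$.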

Methods to estimate distribution of survival times Nonparametric methods to estimate the distribution of survival times (both Kaplan-Meier and life table methods) Parametric models – Weibull model, Exponential model and Lognormal model Semi-parametric model – Cox proportional hazards model

Objectives To understand how to describe survival times To understand how to choose a survival analysis model

Survival Data (1) Example one: Four Liver Cancer Patients
Patient  Date of Diagnosis  Endpoint  Date of Death or Censoring  Survival Time (Days)  Treatment
Mike     1/2/02             Dead      9/1/02                      242                   A
Kathy    4/7/02             Dead      7/8/02                      92
Tom      3/3/02             Alive     11/4/02                     246+                  B
Susan    2/4/02             Dead      11/3/02                     272
Complete data (noncensored data): survival time = 242, 92, 272. Incomplete data (censored data): survival time = 246+ for Tom. The survival time for Tom will exceed 246 days, but we don’t know the exact survival time for Tom.
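The survival times in this example follow from the two date columns, and each patient reduces to a (time, event) pair. A small sketch, assuming the slide's dates are written month/day/year and using illustrative variable names:

```python
from datetime import date

# (name, diagnosis date, end of follow-up, died?) as read off the slide.
# Reading the dates as month/day/year is an assumption about the format.
patients = [
    ("Mike",  date(2002, 1, 2), date(2002, 9, 1),  True),
    ("Kathy", date(2002, 4, 7), date(2002, 7, 8),  True),
    ("Tom",   date(2002, 3, 3), date(2002, 11, 4), False),  # still alive: censored
    ("Susan", date(2002, 2, 4), date(2002, 11, 3), True),
]

records = []
for name, diagnosed, last_seen, died in patients:
    time_days = (last_seen - diagnosed).days   # time on study
    event = 1 if died else 0                   # 1 = event observed, 0 = censored
    records.append((name, time_days, event))

for name, time_days, event in records:
    print(name, time_days, "event" if event else "censored")
```

This two-variable outcome (time plus event indicator) is the input that the nonparametric and regression methods later in the deck expect.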

Survival Data (2) Right-Censored Data: Subjects observed to be event-free to a certain time beyond which their status is unknown 1. Subjects sometimes withdraw from a study, or die from other causes (diseases). 2. The study is completed before the endpoint is reached.   Methods for survival analysis must account for both censored and noncensored data.

Survival Data (3) Survival analysis assumes censoring is random. Censoring times vary across individuals and are not under the control of the investigator. Random censoring also includes designs in which observation ends at the same time for all individuals, but begins at different times.

Survival Data (4) Example two: Researchers treated 65 multiple myeloma patients with alkylating agents. Of those patients, 48 died during the study and 17 survived. The goal of this study is to identify important prognostic factors.
TIME: survival time in months from diagnosis
STATUS: 1 = dead, 0 = alive (censored)
LOGBUN: log blood urea nitrogen (BUN) at diagnosis
HGB: hemoglobin at diagnosis
PLATELET: platelets at diagnosis: 0 = abnormal, 1 = normal
AGE: age at diagnosis in years
LOGWBC: log WBC at diagnosis
FRACTURE: fractures at diagnosis: 0 = none, 1 = present
LOGPBM: log percentage of plasma cells in bone marrow
PROTEIN: proteinuria at diagnosis
SALCIUM: serum calcium at diagnosis

Survival Data (5) – more examples Survival analysis techniques arose from the life insurance industry as a method of costing insurance premiums. The term “survival” does not limit the usefulness of the technique to issues of life and death. A “survival” analysis could be used to examine: The survival time after a heart transplant The time a kidney graft remains functional The time from marriage to divorce The time from release to first arrest The time to a job change

Nonparametric Methods 1. Kaplan-Meier method (also called product-limit method) 2. Life table method To estimate the distribution of survival times -- estimate the survival rate -- calculate the median survival time -- graphs: survival curve, log(time) against log[-log(survival rate)] -- comparison of two survival curves

How to describe survival times (1) Product-Limit (Kaplan-Meier) Survival Estimates
TIME      Survival  Failure  Survival Standard Error  Number Failed  Number Left
0.0000    1.0000    0        0                        0              65
1.2500    .         .        .                        1              64
1.2500    0.9692    0.0308   0.0214                   2              63
2.0000    .         .        .                        3              62
2.0000    .         .        .                        4              61
2.0000    0.9231    0.0769   0.0331                   5              60
3.0000    0.9077    0.0923   0.0359                   6              59
4.0000*   .         .        .                        6              58
4.0000*   .         .        .                        6              57
5.0000    .         .        .                        7              56
5.0000    0.8758    0.1242   0.0411                   8              55
(rows omitted)
89.0000   0.0414    0.9586   0.0382                   47             1
92.0000   0         1.0000   0                        48             0
NOTE: The marked (*) survival times are censored observations.

How to describe survival times (2) Product-Limit (Kaplan-Meier) Survival Estimates
ni: the number of surviving units just prior to ti
di: the number of units that fail at ti
q = di / ni, p = 1 - q
time  ni  di  q     p      survival rate
1.25  65  2   2/65  63/65  (63/65) = 0.9692
2     63  3   3/63  60/63  (63/65)(60/63) = 0.9231
3     60  1   1/60  59/60  (63/65)(60/63)(59/60) = 0.9077
5     57  2   2/57  55/57  (63/65)(60/63)(59/60)(55/57) = 0.8758
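The product-limit arithmetic can be reproduced in a few lines. A minimal sketch, assuming the at-risk and failure counts from the SAS output for the first four failure times; the function name `kaplan_meier` is illustrative and is not a library call:

```python
def kaplan_meier(steps):
    """Return (time, survival) pairs from (time, n_at_risk, n_failed) triples.

    Censored subjects never appear as failures; they only lower the
    number at risk at later failure times.
    """
    survival = 1.0
    curve = []
    for time, n_at_risk, n_failed in steps:
        survival *= 1 - n_failed / n_at_risk   # multiply by p = 1 - q
        curve.append((time, survival))
    return curve


# Failure times from the myeloma example. The two observations censored at
# month 4 explain why the number at risk drops from 59 to 57 before month 5.
steps = [(1.25, 65, 2), (2.0, 63, 3), (3.0, 60, 1), (5.0, 57, 2)]

for time, survival in kaplan_meier(steps):
    print(f"t = {time:5.2f}  S(t) = {survival:.4f}")
```

The four printed values agree with the Survival column of the product-limit output: 0.9692, 0.9231, 0.9077 and 0.8758.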

How to describe survival times (3) Product-Limit (Kaplan-Meier) Survival Estimates Kaplan-Meier method uses the actual observed event and censoring times. A problem arises with Kaplan-Meier method if there exist censored times that are later than the last event time. The average duration will be underestimated when we use the time until the last event occurs. In the practical application of such cases, an interpretation only considers the length of time until the last event occurs.

How to describe survival times (4) Life Table Survival Estimates
Interval [Lower, Upper)  Number Failed (NF)  Number Censored (NC)  Effective Sample Size (n)  Conditional Probability of Failure (q)  Survival (p)
0    10                  16                  5                     62.5                       0.2560                                  1.0000
10   20                  15                  7                     40.5                       0.3704                                  0.7440
20   30                  3                   1                     21.5                       0.1395                                  0.4684
30   40                  3                   0                     18.0                       0.1667                                  0.4031
40   50                  2                   1                     14.5                       0.1379                                  0.3359
50   60                  4                   2                     11.0                       0.3636                                  0.2896
60   70                  2                   0                     6.0                        0.3333                                  0.1843
70   80                  0                   1                     3.5                        0                                       0.1228
80   90                  2                   0                     3.0                        0.6667                                  0.1228
90   .                   1                   0                     1.0                        1.0000                                  0.0409
n = N - ½(NC), where N is the number entering the interval; 62.5 = 65 - 5/2, 40.5 = 44 - 7/2
q = NF / n; 0.2560 = 16/62.5, 0.3704 = 15/40.5
p = П(1 - q) over all earlier intervals; 0.7440 = 1 - 0.2560, 0.4684 = (1 - 0.2560)(1 - 0.3704)
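The life-table columns can be rebuilt from the failure and censoring counts alone. A sketch under the actuarial convention stated on the slide, where censored subjects count as half at risk; the function name `life_table` is hypothetical:

```python
def life_table(n_start, counts):
    """Actuarial life table.

    counts holds one (n_failed, n_censored) pair per interval.
    Returns (effective_n, q, survival_at_interval_start) for each interval.
    """
    rows = []
    entering = n_start
    survival = 1.0
    for n_failed, n_censored in counts:
        effective_n = entering - n_censored / 2   # n = N - NC/2
        q = n_failed / effective_n                # conditional probability of failure
        rows.append((effective_n, q, survival))
        survival *= 1 - q                         # carried forward to the next interval
        entering -= n_failed + n_censored
    return rows


# (failed, censored) per 10-month interval, read off the slide
counts = [(16, 5), (15, 7), (3, 1), (3, 0), (2, 1),
          (4, 2), (2, 0), (0, 1), (2, 0), (1, 0)]

for lower, (n, q, p) in zip(range(0, 100, 10), life_table(65, counts)):
    print(f"[{lower:2d}, ...)  n = {n:5.1f}  q = {q:.4f}  survival = {p:.4f}")
```

Each survival value is the probability of reaching the start of its interval, which is why the first row is 1.0000.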

How to describe survival times (5) Life Table Survival Estimates The Life Table method groups survival times into intervals. It is very useful for a large sample, but the estimates depend on the chosen interval length: the wider the interval, the poorer the estimates. Apply the Kaplan-Meier method if the sample is not very large.

How to describe survival times (6) Survival Curve

How to describe survival times (7) Summary Statistics for Time Variable
Percent  Point Estimate  95% Confidence Interval [Lower, Upper)
75       52.0000         35.0000, 67.0000
50       19.0000         15.0000, 35.0000
25       9.0000          6.0000, 14.0000
Mean: 32.1460, Standard Error: 4.0301
Total: 65, Failed: 48, Censored: 17, Percent Censored: 26.15

How to describe survival times (8) Median Survival Time The median survival time is the value at which 50% of the individuals have longer survival times and 50% have shorter survival times. The median is reported rather than the mean because distributions of survival time are often skewed, sometimes with a small number of long-term ‘survivors’. Another reason is that the mean survival time cannot be calculated directly when some survival times are censored.

How to describe survival times (9) How to estimate median survival time If there are no censored data, the median survival time is estimated by the middle observation of the ranked survival times. In the presence of censored data the median survival time is estimated by first calculating the Kaplan-Meier survival curve, then finding the value of survival time when survival rate=0.50 (50%)

How to describe survival times (10) Graph of Log Negative Log SDF versus Log Time Exponential Distribution The graph is approximately a straight line, the slope is 1. Weibull Distribution The graph is approximately a straight line, but the slope is greater or less than 1.
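The straight-line behaviour follows from the Weibull survival function. The sketch below assumes the parametrization $S(t) = \exp\{-(\lambda t)^{\alpha}\}$ with scale $\lambda > 0$ and shape $\alpha > 0$; the slides do not state which parametrization they use.

```latex
\begin{align*}
S(t) &= \exp\{-(\lambda t)^{\alpha}\},\\
-\log S(t) &= (\lambda t)^{\alpha},\\
\log\{-\log S(t)\} &= \alpha \log \lambda + \alpha \log t.
\end{align*}
```

Plotted against $\log t$, this is a line with slope $\alpha$. The exponential distribution is the case $\alpha = 1$, which gives slope 1.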

How to describe survival times (11) Graph of Log Negative Log SDF versus Log Time

Comparison of Two Survival Curves (1)

Comparison of Two Survival Curves (2) Median Survival Time
Group 1: PLATELET = 0 (abnormal)
Percent  Point Estimate  95% Confidence Interval [Lower, Upper)
50       13.0000         6.0000, 35.0000
Group 2: PLATELET = 1 (normal)
Percent  Point Estimate  95% Confidence Interval [Lower, Upper)
50       24.0000         16.0000, 41.0000

Comparison of Two Survival Curves (3) Test of Equality of Two Survival Curves
Test       Chi-Square  DF  P Value
Log-Rank   3.2923      1   0.0696
Wilcoxon   2.3724      1   0.1235
-2Log(LR)  2.4065      1   0.1208
Log-Rank test: best suited to Weibull-distributed data or data that meet the proportional hazards assumption. It uses weight = 1, so each failure time has equal weight; compared with the Wilcoxon test it places less emphasis on the earlier failure times.
Wilcoxon test: best suited to lognormal data. It uses weight = the total number at risk at each time, so earlier times receive greater weight than later times.
-2Log(LR): likelihood ratio test, which assumes exponentially distributed survival data.
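The "weight = 1" description of the log-rank test can be made concrete with a hand-rolled version. This is a sketch of the standard two-sample log-rank statistic, not the SAS implementation; the function name and the small data set are hypothetical and are not the myeloma data.

```python
def log_rank(group1, group2):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    Each group is a list of (time, event) pairs, with event = 1 for an
    observed failure and event = 0 for a censored time.
    """
    pooled = [(t, e, 1) for t, e in group1] + [(t, e, 2) for t, e in group2]
    failure_times = sorted({t for t, e, _ in pooled if e == 1})

    observed1 = expected1 = variance = 0.0
    for t in failure_times:
        n = sum(1 for s, _, _ in pooled if s >= t)                # at risk, both groups
        n1 = sum(1 for s, _, g in pooled if s >= t and g == 1)    # at risk, group 1
        d = sum(1 for s, e, _ in pooled if s == t and e == 1)     # failures at t
        d1 = sum(1 for s, e, g in pooled if s == t and e == 1 and g == 1)
        observed1 += d1
        expected1 += d * n1 / n            # every failure time gets weight 1
        if n > 1:                          # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed1 - expected1) ** 2 / variance


# Hypothetical illustration data: (months, event indicator)
group_a = [(2, 1), (3, 1), (5, 0), (7, 1)]
group_b = [(4, 1), (6, 0), (8, 1), (9, 1)]
print(f"log-rank chi-square = {log_rank(group_a, group_b):.4f}")
```

A Wilcoxon-type test would multiply each term of the sums by the number at risk `n`, which is what shifts emphasis toward the earlier failure times.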

Parametric Models (1) Whenever fundamental hypotheses are to be tested, or you have a clear idea about the distribution of the survival data, you should use a parametric model. The three most common parametric models: 1. Exponential regression model 2. Weibull regression model 3. Lognormal regression model

Parametric Models (2) Exponential Regression Model The exponential distribution is a useful form of the survival distribution when the hazard function (the instantaneous failure rate) is constant and does not depend on time; the plot of log[-log(survival rate)] against log(time) is then approximately a straight line with slope 1. In the biomedical field, a constant hazard function is usually unrealistic.

Parametric Models (3) Weibull Regression Model The hazard function changes with time; the graph is approximately a straight line, but the slope is not 1. The hazard function always increases when the parameter α > 1 and always decreases when α < 1. The model reduces to the exponential regression model when α = 1.
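The three regimes of the shape parameter can be seen by evaluating the hazard directly. A sketch assuming the parametrization h(t) = αλ(λt)^(α-1); the slides do not say which parametrization they use, and the function name and the scale value 0.1 are illustrative:

```python
def weibull_hazard(t, alpha, lam):
    """Weibull hazard h(t) = alpha * lam * (lam * t) ** (alpha - 1), for t > 0."""
    return alpha * lam * (lam * t) ** (alpha - 1)


times = [1.0, 5.0, 10.0, 20.0]
for alpha in (0.5, 1.0, 2.0):
    hazards = [weibull_hazard(t, alpha, lam=0.1) for t in times]
    print(f"alpha = {alpha}: " + ", ".join(f"{h:.4f}" for h in hazards))
```

With α = 0.5 the printed hazards fall over time, with α = 1 they stay at the constant λ, and with α = 2 they rise.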

Parametric Models (4) Lognormal Regression Model The survival times follow a log-normal distribution. The hazard function changes with time: it first increases and then decreases (an inverted “U” shape).

Cox Model (1) Disadvantages of parametric models: 1. It is necessary to decide how the hazard function depends on time. 2. It may be difficult to find a parametric model if the hazard function is believed to be nonmonotonic. 3. Parametric models do not readily allow for explanatory variables whose values change over time; it is cumbersome to develop fully parametric models that include time-varying covariates. Time-varying covariates are very important in survival analysis: 1) continuous time-varying variable: income changes over time 2) discrete time-varying variable: single - married - divorced - remarried

Cox Model (2) David Cox, a British statistician, solved these problems in 1972 in a paper entitled “Regression Models and Life-Tables (with Discussion),” Journal of the Royal Statistical Society, Series B, 34:187-220. $h(t \mid x_i) = h_0(t)\exp(\beta_i x_i)$

Why is Cox model a semiparametric model? $h(t \mid x_i) = h_0(t)\exp(\beta_i x_i)$. $h_0(t)$: nonparametric baseline hazard function. This function does not have to be specified, and the hazard may change as a function of time. $\exp(\beta_i x_i)$: parametric form for the effects of the covariates. The hazard function changes as an exponential function of the covariates.

Cox Model (4) Why is Cox model a ‘proportional hazards’ model? For any two individuals (or groups) i and j, the ratio of their hazards is constant at every point in time (a fixed proportion): for any time t, $h_i(t) / h_j(t) = C$. C may depend on explanatory variables but not on time.
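The constant C can be written out explicitly. The sketch below assumes individuals i and j have covariate values $x_i$ and $x_j$ and share a single coefficient $\beta$, a simplification of the slide's notation:

```latex
\[
\frac{h_i(t)}{h_j(t)}
  = \frac{h_0(t)\exp(\beta x_i)}{h_0(t)\exp(\beta x_j)}
  = \exp\{\beta\,(x_i - x_j)\} = C .
\]
```

The baseline hazard $h_0(t)$ cancels, so C depends on the covariates and on $\beta$ but not on t.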

Cox Model (5) What is a partial likelihood? It is easy for a statistician to write down a model: $h(t \mid x_i) = h_0(t)\exp(\beta_i x_i)$. It isn’t easy to devise ways to estimate this model. Cox’s most important contribution was to propose a method called partial likelihood, so named because it does not include the baseline hazard function $h_0(t)$. Partial likelihood depends only on the order in which events occur, not on the exact times of occurrence.
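For reference, the usual form of the partial likelihood is given below. It assumes no tied event times and a single shared coefficient $\beta$; the symbols $D$, $t_{(k)}$, $x_{(k)}$ and $R(\cdot)$ are notation introduced here and do not appear on the slides.

```latex
\[
PL(\beta) \;=\; \prod_{k=1}^{D}
  \frac{\exp\!\big(\beta\, x_{(k)}\big)}
       {\sum_{j \in R(t_{(k)})} \exp\!\big(\beta\, x_j\big)}
\]
```

Here $t_{(1)} < \dots < t_{(D)}$ are the ordered event times, $x_{(k)}$ is the covariate of the individual who fails at $t_{(k)}$, and $R(t_{(k)})$ is the set of individuals still at risk just before $t_{(k)}$. The baseline hazard cancels from every factor, and only the ordering of the events determines the risk sets.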

Cox Model (6) What is a partial likelihood? (cont.) Partial likelihood accounts for censored survival times. Partial likelihood allows time-dependent explanatory variables. It is not fully efficient because some information is lost by ignoring the exact times of event occurrence. But the loss of efficiency is usually so small that it is not worth worrying about.

Cox Model (7) Using Cox model to fit our data (final model)
Variable  Parameter Estimate  Standard Error  Chi-Square  Pr > ChiSq  Hazard Ratio  95% Hazard Ratio Confidence Limits
LOGBUN    1.67440             0.61209         7.4833      0.0062      5.336         1.608, 17.709
HGB       -0.11899            0.05751         4.2811      0.0385      0.888         0.793, 0.994
The hazard ratio (also known as risk ratio) is the ratio of the hazard functions that corresponds to a change of one unit in the given variable, conditional on fixed values of all other variables. An increase of one unit in the log of blood urea nitrogen increases the hazard of dying by 433.6% (5.336 - 1). An increase of one unit in hemoglobin at diagnosis decreases the hazard of dying by 11.2% (1 - 0.888).
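The hazard ratios and their confidence limits follow from the parameter estimates and standard errors. A sketch assuming a Wald interval with z = 1.96, which reproduces the printed limits to three decimals; the function name `hazard_ratio` is illustrative:

```python
import math


def hazard_ratio(beta, std_err, z=1.96):
    """Return (hazard ratio, lower limit, upper limit) for a Cox coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * std_err),
            math.exp(beta + z * std_err))


# Estimates and standard errors from the fitted model on the slide
estimates = {"LOGBUN": (1.67440, 0.61209), "HGB": (-0.11899, 0.05751)}

for name, (beta, std_err) in estimates.items():
    hr, lower, upper = hazard_ratio(beta, std_err)
    change = (hr - 1) * 100   # percent change in hazard per one-unit increase
    print(f"{name}: HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), "
          f"change = {change:+.1f}%")
```

A hazard ratio above 1 (LOGBUN) indicates higher risk per unit increase; one below 1 (HGB) indicates a protective association.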

Cox Model (8): Examine Proportional Hazards Assumption Checking the assumption graphically The two plots appear ‘parallel’ in that there is an approximately constant vertical distance between them at any given time. The hazards for the two groups are proportional, their ratio remains approximately constant with time.

Cox Model (9) Examine Proportional Hazards Assumption (cont.) 2. Statistical test of the assumption: test for an increasing or decreasing trend over time in the hazard ratio by investigating the interaction between time and the covariate. A significant interaction would imply that the hazard ratio changes with time, so the proportional hazards assumption is invalid.

How do you decide which model to use? (1) How does hazard function depend on time? Examples The hazard function for retirement increases with age. The hazard function for being arrested declines with age at least after age 25. The hazard function for death from any cause has “U” shape.

How do you decide which model to use? (2) 1. Use the exponential regression model if the hazard function is constant and does not depend on time. 2. Use the Weibull regression model (a monotonic model) if the hazard function always increases or always decreases with time. 3. Use the lognormal regression model (a nonmonotonic model) if the hazard function first increases and then decreases with time (an inverted “U” shape).

How do you decide which model to use? (3) 4. Use the Cox regression model if the hazard function first decreases and then increases, or changes dynamically (a “U” shape or other shapes). The Cox model can fit any distribution of survival data if the proportional hazards assumption is valid (in practice, most hazard ratios are close to a fixed proportion). This is why the Cox model is used so widely now. Note that a fitted Cox model cannot be used directly for forecasting, because it provides only exp(βixi) and not the baseline hazard function h0(t). We have to estimate h0(t) (for example, with the BASELINE statement in SAS) before we forecast.
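The reason the baseline is needed for forecasting can be seen from the survival function implied by the Cox model. The sketch below writes $S_0(t)$ for the baseline survival function and uses a single shared coefficient $\beta$; both are notation introduced here.

```latex
\begin{align*}
S_0(t) &= \exp\!\left(-\int_0^t h_0(u)\,du\right),\\
S(t \mid x_i) &= \exp\!\left(-\int_0^t h_0(u)\exp(\beta x_i)\,du\right)
               = S_0(t)^{\exp(\beta x_i)}.
\end{align*}
```

The fitted coefficients supply only the exponent $\exp(\beta x_i)$; a predicted survival probability also requires an estimate of $S_0(t)$.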