1 Statistics 262: Intermediate Biostatistics Kaplan-Meier methods and Parametric Regression methods.

Presentation transcript:

1 Statistics 262: Intermediate Biostatistics Kaplan-Meier methods and Parametric Regression methods

2 More on the Kaplan-Meier estimator of S(t) (“product-limit estimator” or “KM estimator”). When there are no censored data, the KM estimator is simple and intuitive: estimated S(t) = proportion of observations with failure times > t. For example, if you are following 10 patients, and 3 of them die by the end of the first year, then your best estimate of S(1 year) is 70%. When there are censored data, KM provides an estimate of S(t) that takes censoring into account (see last week’s lecture). If the censored observation had actually been a failure: S(1 year) = 4/5 * 3/4 * 2/3 = 2/5 = 40%. The KM estimate changes only at times when events occur (it is empirically defined)!

3 KM (product-limit) estimator, formally: $\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)$, where the $t_j$ are the observed event times, $d_j$ is the number of events at $t_j$, and $n_j$ is the number at risk at $t_j$.

4 S(t) represents the estimated survival probability at time t: P(T>t). The t_j are the observed event times. Typically d_j = 1 person, unless data are grouped in time intervals (e.g., everyone who had the event in the 3rd month). The risk set n_j at time t_j consists of the original sample minus all those who have been censored or had the event before t_j. This formula gives the product-limit estimate of survival at each time an event happens. d_j/n_j = proportion that failed at the event time t_j; 1 - d_j/n_j = proportion surviving the event time. Multiply the probability of surviving event time t_j by the probabilities of surviving all the previous event times.
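The product-limit recipe above can be sketched in a few lines of Python. This is a minimal illustration, not the SAS implementation: the function name `kaplan_meier` and both small data sets are hypothetical. The first mirrors the "10 patients, 3 deaths in year 1" example, and the second adds one censored subject. A subject censored at an event time is kept in that time's risk set.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each observed event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (t_j, n_j, d_j, S(t_j)) tuples.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    table = []
    for t_j in sorted(deaths):
        # Risk set: everyone whose follow-up time is >= t_j, so a subject
        # censored exactly at t_j still counts as at risk at t_j.
        n_j = sum(1 for t in times if t >= t_j)
        d_j = deaths[t_j]
        survival *= 1 - d_j / n_j
        table.append((t_j, n_j, d_j, survival))
    return table

# Hypothetical data: 10 patients, no censoring, 3 deaths by year 1.
no_censoring = kaplan_meier([1, 1, 1, 2, 2, 3, 4, 4, 5, 6], [1] * 10)

# Hypothetical data with one subject censored at time 2.
with_censoring = kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1])

for t_j, n_j, d_j, s in no_censoring:
    print(f"t={t_j}: at risk={n_j}, events={d_j}, S(t)={s:.3f}")
```

With no censoring, the first row reduces to the simple proportion surviving past year 1.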

5 Example 1: time-to-conception for subfertile women. “Failure” here is a good thing. 38 women (in 1982) were treated for infertility with laparoscopy and hydrotubation. All women were followed for up to 2 years to describe time-to-conception. The event is conception, and women "survived" until they conceived. Example from: BMJ, Dec 1998; 317:

Raw data: Time (months) to conception or censoring in 38 sub-fertile women after laparoscopy and hydrotubation (1982 study). Columns: Conceived (event); Did not conceive (censored). Data from: Luthra P, Bland JM, Stanton SL. Incidence of pregnancy after laparoscopy and hydrotubation. BMJ 1982; 284:

7 Corresponding Kaplan-Meier Curve S(t) is estimated at 9 event times. (step-wise function)

10 Corresponding Kaplan-Meier Curve: 6 women conceived in the 1st month (1st menstrual cycle). Therefore, 32/38 “survived” pregnancy-free past 1 month.

11 Corresponding Kaplan-Meier Curve: S(t=1) = 32/38 = 84.2%. S(t) represents the estimated survival probability P(T>t); here, P(T>1).

Raw data, important detail of how the data were coded: Censoring at t=2 indicates survival PAST the 2nd cycle (i.e., we know the woman “survived” her 2nd cycle pregnancy-free). Thus, for calculating the KM estimator at 2 months, this person should still be included in the risk set. Think of it as 2+ months, e.g., 2.1 months.

13 Corresponding Kaplan-Meier Curve

14 Corresponding Kaplan-Meier Curve: 5 women conceive in the 2nd month. The risk set at event time 2 included 32 women. Therefore, 27/32 = 84.4% “survived” event time 2 pregnancy-free. S(t=2) = (84.2%)*(84.4%) = 71.1%. Can get an estimate of the hazard rate here: h(t=2) = 5/32 = 15.6%. Given that you didn’t get pregnant in month 1, you have an estimated 5/32 chance of conceiving in the 2nd month. And an estimate of the density (marginal probability of conceiving in month 2): f(t=2) = h(t=2)*S(t=1) = (5/32)*(84.2%) = 13.2%, i.e. 5 of the original 38 women. The conditional chance is multiplied by the probability of still being at risk when month 2 begins, S(t=1), not by S(t=2).

Raw data: the risk set at 3 months includes 26 women (the 27 who survived event time 2, minus 1 censored before month 3).

16 Corresponding Kaplan-Meier Curve

17 Corresponding Kaplan-Meier Curve: 3 women conceive in the 3rd month. The risk set at event time 3 included 26 women. 23/26 = 88.5% “survived” event time 3 pregnancy-free. S(t=3) = (84.2%)*(84.4%)*(88.5%) = 62.9%.

Raw data: the risk set at 4 months includes 22 women.

19 Corresponding Kaplan-Meier Curve

20 Corresponding Kaplan-Meier Curve: 3 women conceive in the 4th month, and 1 was censored between months 3 and 4. The risk set at event time 4 included 22 women. 19/22 = 86.4% “survived” event time 4 pregnancy-free. S(t=4) = (84.2%)*(84.4%)*(88.5%)*(86.4%) = 54.3%. Hazard rates (conditional chances of conceiving, e.g. 100% - 84.4% = 15.6%) look similar over time. And an estimate of the density (marginal probability of conceiving in month 4): f(t=4) = h(t=4)*S(t=3) = (3/22)*(62.9%) = 8.6%.
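The remark that the hazard rates "look similar over time" can be checked directly from the risk sets quoted on the slides so far. This is a small sketch; the variable names are illustrative, and the counts are the ones stated for months 1 to 4.

```python
# (month, number at risk, number conceiving) as quoted on the slides
risk_sets = [(1, 38, 6), (2, 32, 5), (3, 26, 3), (4, 22, 3)]

# Discrete hazard at each event time: events divided by the risk set,
# i.e. the conditional chance of conceiving given no conception so far.
hazards = {month: d / n for month, n, d in risk_sets}

for month, h in hazards.items():
    print(f"month {month}: h = {h:.3f}")
```

The four conditional probabilities all fall between roughly 11% and 16%, which is what makes a constant-hazard model plausible for these data.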

Raw data: the risk set at 6 months includes 18 women.

22 Corresponding Kaplan-Meier Curve

23 Corresponding Kaplan-Meier Curve: 2 women conceive in the 6th month of the study, and one was censored between months 4 and 6. The risk set at event time 5 (month 6) included 18 women. 16/18 = 88.9% “survived” event time 5 pregnancy-free. S(t=6) = (54.3%)*(88.9%) = 48.3%.
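The running product behind the first five event times can be reproduced from the risk sets and event counts stated on the slides. This sketch uses only those quoted counts (the raw data table is not needed); the variable names are illustrative.

```python
# (month, number at risk, number conceiving) for the first five event times
risk_sets = [(1, 38, 6), (2, 32, 5), (3, 26, 3), (4, 22, 3), (6, 18, 2)]

survival = {}
s = 1.0
for month, n_at_risk, n_events in risk_sets:
    # Multiply by the proportion of the risk set that "survives" this event time.
    s *= (n_at_risk - n_events) / n_at_risk
    survival[month] = s

for month, value in survival.items():
    print(f"S(t={month}) = {value:.1%}")
```

Working from the exact fractions rather than from rounded percentages avoids small rounding drift as the product gets longer.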

24 Skipping ahead to the 9th and final event time (months=16)… S(t=13) ≈ 22% (“eyeball” approximation)

Raw data: 3 women are at risk at 16 months (9th event time); 2 remain after the event.

26 Skipping ahead to the 9th and final event time (months=16)… S(t=16) = (22%)*(2/3) = 15%. The tail here just represents that the final 2 women did not conceive (cannot make many inferences from the end of a KM curve)!

27 Kaplan-Meier: SAS output. The LIFETEST Procedure, Product-Limit Survival Estimates. Columns: time, Survival, Failure, Survival Standard Error, Number Failed, Number Left.

28 Kaplan-Meier: SAS output (continued). Columns: time, Survival, Failure, Survival Standard Error, Number Failed, Number Left. NOTE: The marked (*) survival times are censored observations.

29 Monday “Gut Check” Problem… Calculate the product-limit estimate of survival for the following data (n=9). Columns: Time-to-event (months); Survival (1=died/0=censored).

Not so easy to get a plot of the actual hazard function! In SAS, you need a complicated macro, and the result depends on assumptions… Here’s what I get from Paul Allison’s macro for these data…

31 At best, you can get the cumulative hazard function… See lecture 1 if you want more math! Linear cumulative hazard function indicates a constant hazard.
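As a rough illustration of reading a hazard off the cumulative hazard, the sketch below computes H(t) = -log S(t) from the pregnancy-study risk sets quoted on the earlier slides. It then divides by t to get an average monthly slope. The variable names are illustrative, and only the first five event times are used.

```python
import math

# (month, number at risk, number conceiving), as quoted on the earlier slides
risk_sets = [(1, 38, 6), (2, 32, 5), (3, 26, 3), (4, 22, 3), (6, 18, 2)]

s = 1.0
cumulative_hazard = {}
for month, n, d in risk_sets:
    s *= 1 - d / n                        # Kaplan-Meier estimate of S(t)
    cumulative_hazard[month] = -math.log(s)

# Average slope of H(t) from time 0 to each event time. Under a constant
# hazard these slopes would all be about the same.
slopes = {month: H / month for month, H in cumulative_hazard.items()}

for month in cumulative_hazard:
    print(f"t={month}: H(t)={cumulative_hazard[month]:.3f}, H(t)/t={slopes[month]:.3f}")
```

A roughly straight cumulative hazard (similar slopes) is the pattern the slide describes as indicating a constant hazard.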

32 Cumulative Hazard Function, $H(t) = \int h(u)\,du$. If the hazard function is increasing with time, e.g. h(t)=kt, then the cumulative hazard function will be curved up; for example, h(t)=kt gives a quadratic, $H(t) = kt^2/2$. If the hazard function is constant, e.g. h(t)=k, then the cumulative hazard function will be linear, $H(t) = kt$ (and higher hazards will have steeper slopes). If the hazard function is decreasing over time, e.g. h(t)=k/t, then the cumulative hazard function will be curved down; for example, h(t)=k/t gives a logarithm, $H(t) = k \log(t/t_0)$ when accumulated from a start time $t_0 > 0$.
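The three shapes can be checked numerically. The sketch below writes out the closed-form cumulative hazards for h(t)=k, h(t)=kt and h(t)=k/t and uses a second difference to see which way each curve bends. The constant `K`, the start time `t0` for the decreasing case, and the helper names are all assumptions chosen for illustration.

```python
import math

K = 0.5  # hypothetical constant, chosen only for illustration

def cum_hazard_constant(t):
    return K * t                 # integral of h(t) = k

def cum_hazard_increasing(t):
    return K * t ** 2 / 2        # integral of h(t) = k * t

def cum_hazard_decreasing(t, t0=1.0):
    # integral of h(t) = k / t from t0 to t (t0 > 0 is an assumption,
    # because the integral from 0 diverges)
    return K * math.log(t / t0)

def second_difference(H, t, step=1.0):
    """Positive if H curves up at t, negative if it curves down, 0 if linear."""
    return H(t + step) - 2 * H(t) + H(t - step)

curvature = {
    "constant": second_difference(cum_hazard_constant, 3.0),
    "increasing": second_difference(cum_hazard_increasing, 3.0),
    "decreasing": second_difference(cum_hazard_decreasing, 3.0),
}
print(curvature)
```

The sign of each entry matches the verbal description: zero for a constant hazard, positive for an increasing one, negative for a decreasing one.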

33 Kaplan-Meier: example 2. Researchers randomized 44 patients with chronic active hepatitis to receive prednisolone or no treatment (control), then compared survival curves. Example from: BMJ 1998;317: ( 15 August )

Survival times (months) of 44 patients with chronic active hepatitis randomised to receive prednisolone or no treatment. Columns: Prednisolone (n=22); Control (n=22). * = censored. Only part of the table survives in this transcript; the rows that remain legible (prednisolone, control) are: 128*, 40; 131*, 41; 140*, 54; 148*, 140*; 162*, 146*; 173*, 167*; 181*, 182*. Data from: BMJ 1998;317: ( 15 August )

35 Kaplan-Meier: example 2. Are these two curves different? Misleading to the eye: apparent convergence by end of study. But this is due to 6 controls who survived fairly long, and 3 events in the treatment group when the sample size was small. Big drops at the end of the curve indicate few patients left. E.g., only 2/3 (67%) survived this drop.

Control group, SAS output (Product-Limit Survival Estimates; * = censored): 6 controls made it past 100 months.

Treated group, SAS output (Product-Limit Survival Estimates; * = censored): 5/6 of 54% rapidly drops the curve to 45%. 2/3 of 45% rapidly drops the curve to 30%.

38 Point-wise confidence intervals We will not worry about mathematical formula for confidence bands. The important point is that there is a confidence interval for each estimate of S(t). (SAS uses Greenwood’s formula.)
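For readers who do want to see the calculation, here is a hedged sketch of Greenwood's formula, Var[S(t)] ≈ S(t)^2 * sum of d_j / (n_j (n_j - d_j)) over event times up to t. It is applied to the pregnancy-study risk sets quoted on the earlier slides. The plain "estimate ± 1.96 standard errors" interval, clipped to [0, 1], is an assumption made for illustration; SAS offers several interval transformations and its output may differ.

```python
import math

# (month, number at risk, number conceiving), as quoted on the earlier slides
risk_sets = [(1, 38, 6), (2, 32, 5), (3, 26, 3), (4, 22, 3), (6, 18, 2)]

s = 1.0
greenwood_sum = 0.0
intervals = {}
for month, n, d in risk_sets:
    s *= 1 - d / n
    greenwood_sum += d / (n * (n - d))
    se = s * math.sqrt(greenwood_sum)       # Greenwood standard error
    lower = max(0.0, s - 1.96 * se)         # plain (untransformed) 95% limits
    upper = min(1.0, s + 1.96 * se)
    intervals[month] = (s, se, lower, upper)

for month, (est, se, lower, upper) in intervals.items():
    print(f"t={month}: S={est:.3f}, SE={se:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

At the first event time, before any censoring, the Greenwood standard error coincides with the usual binomial standard error of a proportion.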

39 Log-rank test. SAS output, Test of Equality over Strata (columns: Test, Chi-Square, DF, Pr > Chi-Square; rows: Log-Rank, Wilcoxon, -2Log(LR)). Chi-square test (with 1 df) of the (overall) difference between the two groups. Groups appear significantly different.

40 Log-rank test Log-rank test is just a Cochran-Mantel-Haenszel chi-square test! Anyone remember (know) what this is?

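CMH test of conditional independence. The K strata are the unique event times. In stratum k, with N_k subjects at risk in total, the 2x2 table has cells a_k (Group 1, event), b_k (Group 1, no event), c_k (Group 2, event) and d_k (Group 2, no event).

CMH test of conditional independence: $\chi^2_{CMH} = \frac{\left[\sum_{k=1}^{K}\left(a_k - E(a_k)\right)\right]^2}{\sum_{k=1}^{K} \mathrm{Var}(a_k)}$, with $E(a_k) = \frac{(a_k+b_k)(a_k+c_k)}{N_k}$ and $\mathrm{Var}(a_k) = \frac{(a_k+b_k)(c_k+d_k)(a_k+c_k)(b_k+d_k)}{N_k^2\,(N_k-1)}$.

CMH test of conditional independence. How do you know that this is a chi-square with 1 df? Why is this the expected value in each stratum? The variance is the variance of a hypergeometric distribution.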

Event time 1 (2 months), control group (SAS output): At risk = 22. 1st event at month 2.

Event time 1 (2 months), treated group (SAS output): At risk = 22. 1st event at month 2.

Stratum 1 = event time 1 (2 months): 1 died from each group (22 at risk in each group). 2x2 table: treated, 1 event and 21 no event; control, 1 event and 21 no event; N = 44.

Event time 2 (3 months), control group (SAS output): At risk = 21. Next event at month 3.

Event time 2 (3 months), treated group (SAS output): At risk = 21. No events at 3 months.

Stratum 2 = event time 2 (3 months): 1 died in the control group. At that time 21 from each group were at risk. 2x2 table: treated, 0 events and 21 no event; control, 1 event and 20 no event; N = 42.

Event time 3 (4 months), control group (SAS output): At risk = 20. 1 event at month 4.

Event time 3 (4 months), treated group (SAS output): At risk = 21.

Stratum 3 = event time 3 (4 months): 1 died in the control group. At that time 21 from the treated group and 20 from the control group were at risk. 2x2 table: treated, 0 events and 21 no event; control, 1 event and 19 no event; N = 41.
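To make the per-stratum bookkeeping concrete, the sketch below computes the observed count, the expected count and the hypergeometric variance for the treated group in the three strata described above. It then forms the chi-square from those three strata only. That partial value is purely illustrative: the real log-rank statistic sums over every event time in the study. The function and variable names are assumptions.

```python
# (events_treated, at_risk_treated, events_control, at_risk_control)
strata = [
    (1, 22, 1, 22),  # event time 1 (2 months)
    (0, 21, 1, 21),  # event time 2 (3 months)
    (0, 21, 1, 20),  # event time 3 (4 months)
]

def stratum_terms(d1, n1, d2, n2):
    """Observed, expected and hypergeometric variance of group-1 events."""
    total = n1 + n2
    deaths = d1 + d2
    expected = n1 * deaths / total
    variance = n1 * n2 * deaths * (total - deaths) / (total ** 2 * (total - 1))
    return d1, expected, variance

terms = [stratum_terms(*stratum) for stratum in strata]
observed_sum = sum(o for o, _, _ in terms)
expected_sum = sum(e for _, e, _ in terms)
variance_sum = sum(v for _, _, v in terms)

# Chi-square built from these three strata only (illustration, not the final test).
partial_chi_square = (observed_sum - expected_sum) ** 2 / variance_sum

for k, (o, e, v) in enumerate(terms, start=1):
    print(f"stratum {k}: observed={o}, expected={e:.3f}, variance={v:.3f}")
```

The expected count in each stratum is simply the treated group's share of the risk set times the number of events at that time.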

Etc.

54 Log-rank test, et al. SAS output, Test of Equality over Strata (columns: Test, Chi-Square, DF, Pr > Chi-Square; rows: Log-Rank, Wilcoxon, -2Log(LR)). The likelihood ratio test is not ideal here because it assumes an exponential distribution (constant hazard). Wilcoxon is just a version of the log-rank test that weights strata by their size (giving more weight to earlier time points), so it is more sensitive to differences at earlier time points. The log-rank test has the most power to detect differences that fit the proportional hazards model, so it works well as a set-up for subsequent Cox regression.

55 Estimated –log(S(t)) Maybe hazard function decreases a little then increases a little? Hard to say exactly…

56 Approximated h(t)

57 One more graph from SAS… log(-log(S(t))) = log(cumulative hazard). If the group plots are parallel, this supports the proportional hazards assumption, which is a necessary assumption for calculating hazard ratios…

58 Uses of Kaplan-Meier: Commonly used to describe survivorship of study population/s. Commonly used to compare two study populations. Intuitive graphical presentation.

59 Limitations of Kaplan-Meier: Mainly descriptive. Doesn’t control for covariates. Requires categorical predictors (SAS does let you easily discretize continuous variables for KM methods, for exploratory purposes). Can’t accommodate time-dependent variables.

60 Parametric Models for the hazard/survival function The class of regression models estimated by PROC LIFEREG is known as the accelerated failure time models.

61 Parameters of the Weibull distribution. Shape parameter (inverse of the scale parameter): <1, hazard rate is decreasing; >1, hazard rate is increasing.

62 Constant hazard rate (special case of Weibull where shape parameter = 1.0)
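The effect of the shape parameter can be seen by evaluating a Weibull hazard at a few time points. The parameterization below, S(t) = exp(-(rate * t)^shape), is one common convention and is an assumption here, as are the function name and the example values. SAS reports the same model through an intercept and a scale, with shape = 1/scale.

```python
def weibull_hazard(t, shape, rate=1.0):
    """Hazard of a Weibull distribution with S(t) = exp(-(rate * t) ** shape)."""
    return shape * rate * (rate * t) ** (shape - 1)

time_points = [0.5, 1.0, 2.0, 4.0]  # arbitrary illustrative times

hazards_by_shape = {
    shape: [weibull_hazard(t, shape) for t in time_points]
    for shape in (0.5, 1.0, 2.0)
}

for shape, values in hazards_by_shape.items():
    print(f"shape={shape}: " + ", ".join(f"{v:.3f}" for v in values))
```

With shape 1.0 the hazard is flat (the exponential special case); below 1 it falls with time, and above 1 it rises.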

63 Recall: two parametric models. Components: a baseline hazard function (that may change over time), and a linear function of a set of k fixed covariates that, when exponentiated (and a few other things), gives the relative risk. The exponential model assumes a fixed baseline hazard that we can estimate. The Weibull model models the baseline hazard as a function of time; two parameters (baseline hazard and scale) must be estimated to describe the underlying hazard function over time.

64 To get Hazard Ratios (relative risk)… Weibull (and thus exponential) are proportional hazards models, so a hazard ratio can be calculated. For other parametric models, you cannot calculate a single hazard ratio (hazards are not necessarily proportional over time). More tricky to get confidence intervals here!

65 What’s a hazard ratio? Distinction between hazard/rate ratio and odds ratio/risk ratio. Hazard/rate ratio: ratio of incidence rates. Odds/risk ratio: ratio of odds or proportions.

66 Example 1 Using data from pregnancy study… Recall: roughly, hazard rates were similar over time (implies exponential model should be a good fit).

The LIFEREG Procedure, Analysis of Parameter Estimates (columns: Parameter, DF, Estimate, Standard Error, 95% Confidence Limits, Chi-Square, Pr > ChiSq; rows: Intercept (p <.0001), Scale, Weibull Shape). A scale of 1.0 makes a Weibull an exponential, so these data look exponential.

68 Parametric estimates of survival function based on a Weibull model (left) and exponential (right). Compare to KM:

69 Example 2: 2 groups Using data from hepatitis trial, I fit exponential and Weibull models in SAS using LIFEREG (Weibull is default in LIFEREG)…

The LIFEREG Procedure. Model information: Dependent Variable Log(time); Right Censored Values 17; Left Censored Values 0; Interval Censored Values 0; Name of Distribution Exponential; Log Likelihood ≈ -68. Analysis of Parameter Estimates (columns: Parameter, DF, Estimate, Standard Error, 95% Confidence Limits, Chi-Square, Pr > ChiSq; rows: Intercept (p <.0001), group, Scale, Weibull Shape). The p-value for group is very similar to the p-value from the log-rank test. The scale parameter is set to 1, because it’s exponential. -2Log Likelihood = 2*68 = 136. Hazard ratio (treated vs. control): e^(-estimate for group) = .406. Interpretation: the mortality rate is 60% lower in the treated group; or, equivalently under the exponential model, the median time to death is about 2.5 times as long in the treated group (1/.406).

Model Information: Dependent Variable Log(time); Right Censored Values 17; Left Censored Values 0; Interval Censored Values 0; Name of Distribution Weibull; Log Likelihood ≈ -67. Analysis of Parameter Estimates (columns: Parameter, DF, Estimate, Standard Error, 95% Confidence Limits, Chi-Square, Pr > ChiSq; rows: Intercept (p <.0001), group, Scale, Weibull Shape). Hazard ratio (treated vs. control): e^(-1.05/1.267) = .43. The p-value for group is very similar to the p-value from the log-rank test and the exponential model. The scale parameter is greater than 1, indicating decreasing hazard with time. -2Log Likelihood = 2*67 = 134. The shape parameter is just 1/scale parameter! Comparison of models using a likelihood ratio test: -2LogLikelihood(simpler model) minus -2LogLikelihood(more complex model) = chi-square with 1 df (1 extra parameter estimated for the Weibull model): 136 - 134 = 2, NS (below the 3.84 critical value for 1 df at the 5% level). No evidence that the Weibull model is much better than the exponential.
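The arithmetic on this slide can be reproduced with the Python standard library. This sketch uses the rounded values quoted on the slides (136 and 134 for -2 log-likelihood, scale 1.267, group estimate 1.05); the variable names are illustrative. Because the inputs are rounded, the hazard ratio comes out close to, but not exactly, the slide's .43 (about .44), presumably because SAS used unrounded estimates.

```python
import math

neg2ll_exponential = 136.0  # -2 log-likelihood, exponential model (slide value)
neg2ll_weibull = 134.0      # -2 log-likelihood, Weibull model (slide value)

# Likelihood ratio statistic: the Weibull model has 1 extra parameter.
lr_statistic = neg2ll_exponential - neg2ll_weibull

# Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_statistic / 2))

scale = 1.267            # Weibull scale reported by LIFEREG (slide value)
group_estimate = 1.05    # accelerated-failure-time coefficient for group

shape = 1 / scale                                  # Weibull shape parameter
hazard_ratio = math.exp(-group_estimate / scale)   # treated vs. control

print(f"LR chi-square = {lr_statistic:.1f}, p = {p_value:.3f}")
print(f"shape = {shape:.3f}, hazard ratio = {hazard_ratio:.3f}")
```

A p-value well above 0.05 is what the slide's "NS" refers to, and a shape below 1 corresponds to the decreasing hazard noted for a scale above 1.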

Parametric estimates of cumulative survival based on Weibull model (left) and exponential (right), by group. Compare to KM:

73 Compare to Cox regression: SAS output (columns: Variable, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits; row: group).