MUSC Summer Institute May 24-25, 2018 Dr. Bethany Wolf

Survival Analysis

Outline for Today
- Time-to-event outcomes
- Estimating survival distributions
- Hypothesis testing (log-rank tests)
- A brief introduction to R
- Survival analysis in R, Part 1

Time to Event Outcomes

Introduction to Survival Analysis
What makes it different?
Three main variable types
- Continuous
- Categorical
- Time-to-event
Examples

Time to Event Outcomes
Outcomes in which we have a start time and follow subjects until an event occurs
Examples:
- Time to death
- Time to relapse
- Time to first opiate rescue dose after hospitalization
- Time to complete recovery from illness/disease

Bone Marrow Transplant for Leukemia
137 patients undergoing bone marrow transplant (BMT) for acute leukemia
Some of the variables include
- Disease type (ALL, AML low risk, AML high risk)
- Gender
- Patient and donor age
- Treatment with methotrexate
- Time to death or relapse
Examine survival (death or relapse) experience of these patients

Gender- how might we look at this?
What about patient and donor ages at time of transplant…. These are familiar types of variables and there are “standard” things we look at?

Patient Gender:
- Females: n = 57, 41.6%
- Males: n = 80, 58.4%
Patient age at transplant:
- Mean: years
- Standard deviation: years
- Range: years

But what about time to death?
Gender and age are familiar types of variables and there are “standard” statistics we use to summarize them. But what about time to death? What happens if we consider standard statistical summaries?

Time to Death/Relapse?
Number of people who died or relapsed
Mean time to “death/relapse” (all subjects) days
Mean time to death (subjects who died) days
Range 1 to vs. 1 to 2204 days

Does our summary of time to death/relapse make sense?
83 of the 137 patients relapsed but the rest were alive at the end of the study… Does it make sense to estimate a mean (or median) On all patients? Only using patients who died?

Censoring
Different types:
- Right
- Left
- Interval
Each leads to a different likelihood
Most common is right censored

Right Censored Data Event is observed if it occurs before some pre-specified time Consider an animal study Clock starts on the first day of treatment Clock ends at death Always thinking about the clock

Treatment of Transitional Cell Carcinoma of the Bladder in Mice
Consider a study investigating how a new cancer therapeutic affects survival in mice transfected with transitional cell bladder carcinoma. In this case, all 8 animals are treated at the same time and we follow them forward in time to observe how long it takes them to die.

Simple Example of Right Censoring

Treatment of TCC of the Bladder in Mice
What if the researcher is only interested in survival up to 12 weeks post-treatment?
- Animals that live longer are sacrificed at 12 weeks
- What do we do with the information on the animals that are alive at 12 weeks?
Let’s look at our plot again…

Simple Example of Right Censoring

Consider the bone marrow transplant study… An investigator conducts a prospective study to investigate the impact of different factors on time to relapse or death Disease classification Patient and donor age Impact of treating with methotrexate Whether or not they exhibited platelet recovery

These data are right censored since many of the patients did not relapse or die before the study ended However, unlike the mouse study, patients do not all enter the study at the same time. So what might a line plot of patient survival look like?

More Realistic: Clinical Trial

More Realistic: Clinical Trials

Additional Issues Patient drop-out Loss to follow-up

Drop-out or LTFU

How do we treat these data?
Shift everything so each patient time represents time in the study Time of Enrollment
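The shift from calendar time to time in the study can be sketched in R as follows. This is a minimal illustration only: the patient dates, the column names, and the event flags are all invented and are not taken from the BMT study.

```r
# Hypothetical calendar dates for three patients (not real study data)
patients <- data.frame(
  id       = 1:3,
  enrolled = as.Date(c("2015-01-10", "2015-06-01", "2016-03-15")),
  last_obs = as.Date(c("2016-01-10", "2015-09-09", "2016-12-31")),
  event    = c(1, 0, 1)  # 1 = event observed, 0 = censored
)

# Time in study = last observed date minus enrollment date, in days
patients$time <- as.numeric(patients$last_obs - patients$enrolled)

patients[, c("id", "time", "event")]
```

After the shift every patient starts at time 0, so patients who enrolled on different calendar dates can be compared on the same clock.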

Interval Censoring Due to discrete observation times, actual time of event not observed Examples: Progression in cancer determined by tumor size. Measures at 3-6 month intervals and if increase occurs, known only to have happened in the last interval Time to death in nematodes. Nematodes are grown on a plate and checked periodically for death. Due to short life span, exact time of death for each worm is not observed. Times are biased to longer values Challenging issue when intervals are long

Left Censoring The event has occurred prior to the start of the study.
OR the true survival time is less than the person’s observed survival time. We know the event occurred, but are unsure when prior to observation. In this kind of study, exact time would be known if it occurred after the study started.

Example of Left Censoring
191 California high school boys were asked “When did you first smoke marijuana?”
Possible answers include:
- The exact age
- “Never used it”
- “Used it but cannot recall age”
“Never used it” is a right-censored observation
“Used it but cannot recall age” is a left-censored observation (we only know that the event occurred prior to the boy’s current age)

Truncation Only a subset of individuals are able to be observed
“truncation event” Only individuals with the event are included in the study Examples: Left Truncation: Time to relapse (requires a complete response to cancer treatment) Right Truncation: Patients with AIDS from transfusion (only patients who developed AIDS prior to a cutoff time are included)

Truncation Example: Time to AIDS
296 patients infected with HIV via blood transfusion in 1978 were sampled in 1987 after they had converted to AIDS, in order to determine the time to development of AIDS among those acquiring HIV from transfusion. According to the sampling scheme used in 1987, only those who had developed AIDS were included in the study. This represents right-truncation since individuals who had yet to develop AIDS were not included in the sample.

Truncation Example
Patient must experience a response before duration of the response can be assessed
[Figure: patient timelines marking response and relapse, with patients censored for response or censored at relapse]

Censoring Assumption:
Potential censoring time is unrelated to potential event time Reasonable? Estimation approaches are biased when this is violated Violation example: Sick patients tend to miss clinical visits more often High school drop-out: Kids who move may be more likely to drop-out

Structure of Time To Event Data
In order to evaluate survival distributions
- Estimate survival rates
- Test associations
- Etc.
The data must have the appropriate format
- Event times
- Censoring times
- Indicator of event occurrence

Key Components Event: Must have a clear definition of what constitutes an ‘event’ Death Disease Recurrence Response to treatment Need to know when the clock starts: Age at event? Time from study initiation? Time from randomization? Time since response?

Notation
Many possible versions
For our purposes, we will use
- t_i as the observed time to event or censoring time for the ith subject
- δ_i to indicate censoring

Data Structure
For each subject, data must include
- One (or more) event/censoring time
- Indicator of whether or not the subject had the event
Censoring indicator:
- A binary (0/1) variable indicating whether or not a subject had the event at the specified time
- For survival analysis in R
  - 1 = subject had the event at time t
  - 0 = subject did not have the event at time t (i.e. they were censored)

Data Structure Time can be specified in one of two ways:
Each subject has a single entry which shows Their last observed time, t Whether or not they experienced the event at that time (censoring indicator) Or a subject can be listed on multiple rows Each row has a start and stop time Start and stop intervals are non-overlapping And each entry has an indicator of whether or not they experienced the event at the end of that time interval

Simple Example
Data on 10 subjects with a time-to-event outcome
The event/censoring times are 10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+
Here a + indicates that the subject did not experience the event

Simple Example: First Approach
Each subject has a single entry
Data: 10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+

ID   Event Indicator   Time
1    1                 10
2    0                 20
3    1                 35
4    0                 40
5    0                 50
6    1                 55
7    0                 70
8    0                 71
9    1                 80
10   0                 90

Simple Example: Second Approach
Each subject can have more than one entry, where each entry has a start/stop time
Data: 10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+

ID   Event Indicator   Start Time   Stop Time
1    1                 0            10
2    0                 0            20
3    1                 0            35
4    0                 0            40
5    …
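Both layouts can be written down in R roughly as follows. This is a sketch that assumes the ten toy times used in the Kaplan-Meier example; the split of subject 3 at time 15 is an arbitrary, made-up cut point included only to show what a multi-row subject looks like.

```r
library(survival)  # recommended package that ships with R

# First approach: one row per subject
one_row <- data.frame(
  id    = 1:10,
  time  = c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90),
  event = c(1, 0, 1, 0, 0, 1, 0, 0, 1, 0)  # 1 = event, 0 = censored
)
surv_single <- Surv(one_row$time, one_row$event)

# Second approach: non-overlapping (start, stop] intervals.
# Subject 3 is split at a hypothetical time of 15 for illustration.
intervals <- data.frame(
  id    = c(1, 2, 3, 3),
  start = c(0, 0, 0, 15),
  stop  = c(10, 20, 15, 35),
  event = c(1, 0, 0, 1)  # event flag refers to the end of each interval
)
surv_counting <- Surv(intervals$start, intervals$stop, intervals$event)

surv_single    # censored times print with a trailing +
surv_counting
```

The event indicator in the second layout is 1 only on the row whose stop time is the event time; every earlier row for the same subject carries a 0.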

Estimating Survival Probability and Survival Curves

Time to Event Outcomes Modeled using “survival analysis”
Define t = time to event
- Total time they are followed after study entry
  - (time of event) - (study entry)
  - (study end or LTFU) - (study entry)
- Can be in any unit of time desired
Define the censoring indicator
- 1 if experienced the event in the observed time
- 0 if did not experience the event in the observed time or were LTFU

Time to Event Outcomes Key functions for characterizing time to event outcome Survival function Cumulative hazard Hazard rate (or function)

Survival Function
S(t) = the probability of an individual surviving to time t
Basic properties
- Monotonic non-increasing
- S(0) = 1
- S(∞) = 0*
*debatable: cure-rate distributions allow a plateau at some other value

Hazard Rate A little harder to conceptualize
Instantaneous failure rate (aka conditional failure rate)
Interpretation: h(t)Δt is approximately the probability that a person at time t experiences the event in the interval (t, t+Δt) given survival to time t.

Hazard Function Useful for conceptualizing how the chance of an event changes over time i.e. consider hazard ‘relative’ over time Examples: Treatment related mortality Early on, high risk of death Later on, risk of death decreases Aging Early on, low risk of death Later on, high risk of death

Shapes of Hazard Functions
Increasing Natural aging and wear Decreasing Early failures due to device or transplant failures Bathtub Populations followed from birth Hump Shaped Initial risk of event, followed by decreasing chance of event

Cumulative Hazard Function
Often used instead of the hazard function
There is an easy mathematical relationship between H(t) and S(t): H(t) = -ln S(t), or equivalently S(t) = exp(-H(t))

More Terminology
D distinct event or censoring times
- t_1 < t_2 < t_3 < … < t_D
- Ties allowed
At time t_i, there are d_i events
Y_i is the number of individuals at risk at t_i
- Y_i is all the people who have event or censoring times ≥ t_i
d_i/Y_i is an estimate of the conditional probability of an event at t_i, given survival to t_i

Kaplan-Meier Estimation
Also known as the ‘product-limit’ estimator
$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{Y_i}\right)$
Step-function
Size of the steps depends on
- Number of events at time t
- Pattern of censoring before t

Simple Example Event = time to relapse Data:
10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+

Times: 10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+

Time   # at Risk (Y_i)   # Events (d_i)   1 - d_i/Y_i        S(t)
0      10                0                1-(0/10) = 1       1
10     10                1                1-(1/10) = 0.9     1*0.9 = 0.9
20     9                 0                1-(0/9) = 1        0.9*1 = 0.9
35     8                 1                1-(1/8) = 0.875    0.9*0.875 = 0.79
40     7                 0                1-(0/7) = 1        0.79*1 = 0.79
50     6                 0                1-(0/6) = 1        0.79*1 = 0.79
55     5                 1                1-(1/5) = 0.8      0.79*0.8 = 0.63
70     4                 0                1-(0/4) = 1        0.63*1 = 0.63
71     3                 0                1-(0/3) = 1        0.63*1 = 0.63
80     2                 1                1-(1/2) = 0.5      0.63*0.5 = 0.32
90     1                 0                1-(0/1) = 1        0.32*1 = 0.32
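The hand calculation for the toy data can be reproduced in R. The sketch below computes the product-limit estimate directly and, as a cross-check, with `survfit()` from the survival package; the variable names are illustrative, and the shortcut of using the event flag as d_i only works because this toy data set has no tied times.

```r
library(survival)

time  <- c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90)
event <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 0)   # 1 = relapse, 0 = censored (+)

# Y_i: number still under observation just before each time
at_risk <- sapply(time, function(t) sum(time >= t))

# Product of (1 - d_i / Y_i); censored times contribute a factor of 1
km_hand <- cumprod(1 - event / at_risk)

# Same estimate from the survival package
km_fit <- survfit(Surv(time, event) ~ 1)

data.frame(time, at_risk, events = event,
           km_hand    = round(km_hand, 3),
           km_survfit = round(km_fit$surv, 3))
```

`plot(km_fit)` draws the corresponding step function with its default confidence limits.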

Plot it…
[Figure: Kaplan-Meier step plot of S(t) against time for the toy data]

Time   S(t)
0      1
10     0.9
35     0.79
55     0.63
80     0.32

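Recall Cumulative Hazard
We said the relationship between the cumulative hazard and survival functions is: H(t) = -ln S(t)
So to look at the cumulative hazard for the Kaplan-Meier estimate, use H(t) = -ln S(t) with the Kaplan-Meier estimate of S(t) plugged in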

Time   # at Risk   S(t)   H(t) = -ln S(t)
0      10          1      -ln(1) = 0
10     10          0.9    -ln(0.9) = 0.105
20     9
35     8           0.79   -ln(0.79) = 0.235
40     7
50     6
55     5           0.63   -ln(0.63) = 0.462
70     4
71     3
80     2           0.32   -ln(0.32) = 1.14
90     1

Cumulative Hazard Plot
[Figure: step plot of the cumulative hazard H(t) against time for the toy data]

Time   H(t)
0      0
10     0.105
35     0.235
55     0.462
80     1.14

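An Alternative to KM: Nelson-Aalen Estimator
$\tilde{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}$
Better small sample properties than KM
Variance of NA estimator: $\sigma_H^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i^2}$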

Uses of NA
Model identification
- Determine if parametric exponential model appropriate
Estimates of the hazard rate, h(t)
- Slope of H(t)
Survival function
- S(t) = exp(-H(t))
- S(t) using NA for H(t) is called the Fleming-Harrington/Breslow method

Comparison of KM and NA Estimates
Time   # at Risk   KM S(t)   NA H(t)   NA S(t) = exp(-H(t))
10     10          0.9       0.1       0.905
20     9
35     8           0.788     0.225     0.799
40     7
50     6
55     5           0.63      0.425     0.654
70     4
71     3
80     2           0.315     0.925     0.397
90     1
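The Nelson-Aalen column can be checked with a few lines of base R. This is a sketch for the untied toy data only, where the number of events at each time equals the 0/1 event flag; object names are illustrative.

```r
time  <- c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90)
event <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 0)

at_risk <- sapply(time, function(t) sum(time >= t))  # Y_i

na_H <- cumsum(event / at_risk)          # Nelson-Aalen cumulative hazard
fh_S <- exp(-na_H)                       # survival derived from NA
km_S <- cumprod(1 - event / at_risk)     # Kaplan-Meier for comparison

# Show the rows where an event occurred
round(data.frame(time, at_risk, km_S, na_H, fh_S)[event == 1, ], 3)
```

The survival estimate based on the Nelson-Aalen hazard sits slightly above the Kaplan-Meier estimate at every event time, and the gap grows as the risk set shrinks.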

Comparison of KM and NA Estimates

Bone Marrow Transplant
What about a more realistic example? Let’s look at the Kaplan-Meier and Nelson Aalen estimates for survival and cumulative hazard in our bone marrow transplant patients

KM and NA: Time to Death/Relapse in BMT

Nelson Aalen vs. Kaplan Meier
Approximately the same when risk sets (Yi) large relative to number of subjects who have the event Larger difference when ties present Different standard errors when ties present Smaller MSE than KM for S(t) > 0.20 and larger otherwise NA biased upward when survival estimated close to 0

Why consider NA over KM NA is the basis for all log-rank type hypothesis tests. Additionally, NA can provide an estimate of hazard rate Hazard rate can be estimated as slope of the cumulative hazard

Estimates from a Survival Curve

Estimates from a Survival Curve
Now we have a survival curve We would like to be able to say something about survival in our population Median survival time (or other quantiles) X-Time survival rate 6 month rejection free survival rate in organ transplant 5 year survival rate in specific cancer

Median Survival Time
Common way to express the ‘center’ of the distribution
Find the time t where the estimated survival probability is 0.5
We don’t often have S(t) = 0.5, so we choose the first time where S(t) < 0.5

Time to Event in Toy Example
Time   # at Risk   # Events   S(t)
10     10          1          0.90
35     8           1          0.79
55     5           1          0.63
80     2           1          0.315

Median Time to Death in BMT
Again, what about a more realistic example?

Time   # at Risk   # Events   S(t)
…
422    72          1          0.524
456    71          1          0.517
466    70          1          0.509
467    69          1          0.502
481    68          1          0.494
486    67          1          0.487
487    66          1          0.480
526    65          1          0.472

Median Survival in BMT Data
If the median is not observed Estimating a median can be done but it is an extrapolation Often just state ‘median not reached’ Note we can also estimate other quantiles in the same way

Alternatives: X-Time Survival
Many applications have ‘landmark’ times that are historically used to quantify survival
Examples:
- Breast cancer: 5 year relapse-free survival
- Pancreatic cancer: 6 month survival
- Acute myeloid leukemia (AML): 12 month relapse-free survival
Solve the survival function for S(t) given time t

Time to Event in Toy Example
Let’s look at survival to time = 60 in the toy example

Time   # at Risk   # Events   S(t)
10     10          1          0.90
35     8           1          0.79
55     5           1          0.63
80     2           1          0.315

X-Time Survival in BMT Perhaps we are interested in the probability a patient will survive (disease free) to 6 month or 2 years post-transplant Similar to estimating the median but instead we look for the probability for that specific time That is find S(182.5 days = 6 months) S(730 days = 2 years)

6 month Survival in BMT

Time   # at Risk   # Events   S(t)
…
164    99          1          0.715
168    98          1          0.708
172    97          1          0.701
183    96          1          0.693
192    95          1          0.686
194    94          1          0.679
211    93          1          0.672
219    92          1          0.664

2 Year Survival in BMT

Time   # at Risk   # Events   S(t)
…
625    61          1          0.450
641    60          1          0.444
666    59          1          0.435
677    58          1          0.427
704    57          1          0.420
748    56          1          0.412
1063   47          1          0.404
1074   46          1          0.395

X-Time Survival in BMT
For the median, we selected the first time where the estimated survival probability was S(t) < 0.5
For X-time survival, the probability is based on the latest time that is less than or equal to the time of interest.
So for BMT
- The probability of surviving 6 months is 70.1%
- The probability of surviving 2 years is 42.0%
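Both lookups can be sketched in R on the toy data (the BMT data are not reproduced here). The median follows the rule stated on the slide, and the X-time estimate uses `summary()` on a `survfit` object with its `times` argument; the object names are illustrative.

```r
library(survival)

time  <- c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90)
event <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 0)
fit   <- survfit(Surv(time, event) ~ 1)

# Median: first observed time at which the estimate drops below 0.5
median_time <- min(fit$time[fit$surv < 0.5])

# X-time survival: estimate carried forward from the latest time <= 60
s60 <- summary(fit, times = 60)$surv

median_time
s60
```

Printing the fitted object with `print(fit)` also reports a median, and passing a vector such as `times = c(182.5, 730)` to `summary()` returns several landmark estimates at once.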

Point-wise Confidence Intervals
Constructed so that the true value of survival probability, S(t) at a particular t, falls in the interval with 100 x (1 - α)% confidence
Based on the estimated variance
For Kaplan-Meier the estimated variance is:
$\hat{V}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}$
where $\hat{S}(t)$ is the estimated probability of survival at time t

Types of Point-Wise CIs
There are several point-wise confidence intervals for survival probability Linear Log Log-log Note, there are other types but these are the most common The default in R is log confidence intervals

Types of Point-wise CIs
Linear Log Log-log

Types of Confidence Intervals
Linear are the easiest to compute However the transformations, while more complicated, have better properties All easy enough if there is software to calculate them
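In R the three interval types can be requested through the `conf.type` argument of `survfit()`, where `"plain"` is the linear interval. The sketch below uses the toy data and illustrative object names, and also rebuilds the first linear lower limit by hand from the Greenwood variance as a check.

```r
library(survival)

time  <- c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90)
event <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 0)

fit_lin <- survfit(Surv(time, event) ~ 1, conf.type = "plain")    # linear
fit_log <- survfit(Surv(time, event) ~ 1, conf.type = "log")      # R default
fit_ll  <- survfit(Surv(time, event) ~ 1, conf.type = "log-log")

# Point estimates are identical; only the limits differ
ci <- data.frame(
  time      = fit_lin$time,
  surv      = fit_lin$surv,
  lin_lower = fit_lin$lower,
  log_lower = fit_log$lower,
  ll_lower  = fit_ll$lower
)
round(ci[event == 1, ], 3)

# Linear lower limit at t = 10 by hand: S - z * S * sqrt(d / (Y * (Y - d)))
lin_lower_10 <- 0.9 - qnorm(0.975) * 0.9 * sqrt(1 / (10 * 9))
```

With only ten subjects the three sets of limits differ noticeably, which is the small-sample behavior the slides describe.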

Comparison in our Toy Example

Comparison in BMT Data

Which to Use When?
Linear
- Very anti-conservative, particularly for small N
- Too narrow
Log
- Conservative for upper limit
- Anti-conservative on lower limit
Log-log
- For small N, slightly anti-conservative
- A little too narrow

Which to Use When? For N > 25 and < 50% censoring
Log and log-log are good
- Both give approximately nominal coverage for 95% CI
- Exception: extreme right tail where there is little data
Linear approach requires much larger N

Confidence Intervals
What is most commonly produced by software packages
Valid ONLY for point-wise intervals
Problem is they are often misinterpreted:
- Plot a set of point-wise 95% CIs
- Interpret as confidence “band”
These “bands” are too narrow!

Confidence Bands
A band for which we are 100*(1-α)% confident that the survival function falls within the band for all t in some interval
Two common approaches:
- Nair: equal probability bands
- Hall-Wellner bands

Comparison in BMT Data

Confidence Band Performance
The two approaches give very similar results Both generally have accurate coverage probabilities (even for n = 20) Confidence intervals or confidence bands? confidence bands for overall confidence about an estimated survival curve Confidence intervals for specific estimates of interest Median survival time 1-year survival rate

Hypothesis Testing: Log-Rank Type Tests

Comparing two or more samples
ANOVA-type approach
τ is the largest time for which all groups have at least one subject at risk
Data can be right-censored (and left truncated) for the tests we will discuss

More on the Hypothesis Test
Want to compare the distribution of event times between groups The hypothesis (for 2 groups) can be written as

Test for 2 Groups
If the event times in the two groups follow some parametric distribution (e.g. normal), this is easy
Distribution of the event times across and within groups is rarely known
Need a test whose validity doesn’t depend on parametric assumptions

Notation
Let t_1 < t_2 < … < t_T be the distinct death times in all samples being compared
At time t_i, let d_ij be the number of events in group j out of Y_ij individuals at risk (j = 1, 2, …, K)
The numbers of events and of individuals at risk across all groups are
$d_i = \sum_{j=1}^{K} d_{ij}$ and $Y_i = \sum_{j=1}^{K} Y_{ij}$

Developing a Hypothesis Test
Weighted comparison of estimated hazards between the jth groups under the null/alternative hypotheses
Based on Nelson-Aalen estimator
Remember: If the null is true, the estimated hazard across all groups should be similar to the estimated hazard for each of the j groups

Basic Form of the Test
For unique times, find the weighted difference in hazard between group j and the overall hazard
$D_j(\tau) = \sum_{i=1}^{T} W_j(t_i) \left[ \frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i} \right]$
- W_j(t_i): weight at t_i
- d_ij / Y_ij: # events in jth group / # at risk in jth group
- d_i / Y_i: # events in all groups / # at risk in all groups
If all D_j(τ)’s are close to zero, there is little evidence to reject the null

General Test Statistic
The general test statistic is
Can use Z-score or χ²
Corrects for ties

Log-Rank Test for 2 Groups
For log-rank W(ti)=1, the test statistic is
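A two-group log-rank statistic can be computed by hand and compared with `survdiff()` from the survival package. The sketch below assumes the usual observed-minus-expected form with a hypergeometric variance, which may be written differently from the formula on the original slide; the two-group data and the helper name `logrank_z` are invented for illustration.

```r
library(survival)

# Hypothetical two-group data (not from the BMT or kidney studies)
time  <- c(6, 13, 21, 30, 31, 37,  5, 8, 11, 12, 16, 23)
event <- c(1,  1,  1,  0,  1,  0,  1, 1,  1,  1,  0,  1)
group <- rep(c(1, 0), each = 6)

logrank_z <- function(time, event, group) {
  num <- 0  # sum over event times of (observed - expected) in group 1
  v   <- 0  # sum of hypergeometric variances
  for (t in sort(unique(time[event == 1]))) {
    Y  <- sum(time >= t)                               # at risk, all groups
    Y1 <- sum(time >= t & group == 1)                  # at risk, group 1
    d  <- sum(time == t & event == 1)                  # events, all groups
    d1 <- sum(time == t & event == 1 & group == 1)     # events, group 1
    num <- num + (d1 - Y1 * d / Y)
    if (Y > 1) v <- v + (Y1 / Y) * (1 - Y1 / Y) * ((Y - d) / (Y - 1)) * d
  }
  num / sqrt(v)
}

z      <- logrank_z(time, event, group)
chisq  <- survdiff(Surv(time, event) ~ group)$chisq
c(z_squared = z^2, survdiff_chisq = chisq)
```

For two groups the squared Z-score and the chi-square statistic are the same quantity, so either can be referred to its reference distribution for a p-value.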

BMT Example Example categorical variables of interest
Use of Methotrexate (Yes/No) Disease Type ALL AML Low Risk AML High Risk Compare groups via log-rank test

Log-Rank Tests (BMT)

Beyond Log-Rank Log-rank has optimum power to detect differences when the hazard rates of our K groups are proportional So what is meant by proportional hazard rates?

Graphical Depiction
Proportional hazards = the ratio of the hazard rates between groups is the same over time
[Figure: hazard curves for Group 1 and Group 2]

How to Check Proportional Hazards
We can estimate cumulative hazards (H(t)) and survival probabilities (S(t)) from log-rank results But to check proportional hazards we need the hazard rate (h(t)) The estimated slope of H(t) provides crude estimate of h(t) Smooth the estimated h(t) for each group and compare Estimate assuming a time x variable interaction Estimate assuming no time x variable interaction Let’s look at our different variables for BMT

Methotrexate Use

Disease Group

Another Example: Kidney Infection
Data on 119 kidney dialysis patients Comparing time to kidney infection between two groups Catheters placed percutaneously (n = 76) Catheters placed surgically (n = 43)

Kidney Infection: Log-Rank Results

Kidney Infection: Hazard Rates?

Beyond Log-Rank So what if our hazard rates are not proportional…
We’ve mentioned using other weight functions Depending on the choice of weight functions, we can place emphasis on different regions of the survival curve.

More on Weight Functions
Recall W(t) = 1 Log-rank test As we’ve said, this has optimal power for detecting differences when hazards are proportional

Choices for Weight Functions
Fleming-Harrington
General case: $W_{p,q}(t_i) = \hat{S}(t_{i-1})^p \left[1 - \hat{S}(t_{i-1})\right]^q$, with p ≥ 0 and q ≥ 0
Special cases
- Log-rank: p = 0 and q = 0
- Mann-Whitney-Wilcoxon: p = 1, q = 0
- q = 0, p > 0: gives greater weight to early departures
- p = 0, q > 0: gives greater weight to late departures
Allows specific choice of influence (for better or worse!)
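To see how p and q shift the emphasis, the weights themselves can be tabulated. The sketch below assumes the common form in which the weight at each event time is built from the pooled Kaplan-Meier estimate just before that time; `fh_weights` is a hypothetical helper written for this illustration, applied to the toy data.

```r
time  <- c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90)
event <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 0)

fh_weights <- function(time, event, p, q) {
  ev_times <- sort(unique(time[event == 1]))
  Y <- sapply(ev_times, function(t) sum(time >= t))               # at risk
  d <- sapply(ev_times, function(t) sum(time == t & event == 1))  # events
  S <- cumprod(1 - d / Y)            # pooled Kaplan-Meier at event times
  S_prev <- c(1, head(S, -1))        # value just before each event time
  data.frame(time = ev_times, weight = S_prev^p * (1 - S_prev)^q)
}

w_logrank <- fh_weights(time, event, p = 0, q = 0)$weight  # all equal to 1
w_early   <- fh_weights(time, event, p = 1, q = 0)$weight  # decreasing
w_late    <- fh_weights(time, event, p = 0, q = 1)$weight  # increasing

data.frame(time = fh_weights(time, event, 0, 0)$time,
           w_logrank, w_early, w_late)
```

In the survival package, `survdiff()` exposes the q = 0 part of this family through its `rho` argument, so `rho = 1` gives a test that emphasizes early differences.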

Kidney Data with Different Weights?
Log rank (p = q = 0): p-value = 0.112
Fleming-Harrington
- p = 0, q = 1: p-value =
- p = 1, q = 0: p-value =
- p = 1, q = 1: p-value =
- p = 0.5, q = 0.5: p-value =
- p = 0.5, q = 2: p-value =
- p = 4, q = 0: p-value =

Others? Many
- Note FH weights provide a lot of flexibility to mimic other weight functions
- Not all available in all software
- Worth trying a few in each situation to compare inferences

Take-home
Most often, we are interested in the average difference (which log-rank addresses)
If hazards are not proportional, differences may exist that log-rank misses
- Differences exist at early or late times
- Varying the weights allows exploration of this
But, choice of weight function can be critical
- Think carefully about the distribution of weights

Brief Introduction to R

What is R
R is a comprehensive statistical and graphical programming language
- Most functions are written in R
- But can interface with other languages (C, C++, Fortran, …)
R is used for data manipulation, statistics, and graphics

What R Provides R is open-source It provides
Data storage, analysis, and data visualization/graphing Contributed packages for statistical analysis Programming environment Provides access to algorithms and their implementation Gives you the ability to fix bugs and extend software Promotes reproducible research by providing open and accessible tools

R Overview R is highly functional
Data manipulation, statistical analysis, and graphics are done using functions
- Functions use strict arguments
Functions in R come from contributed packages/libraries
- About 25 packages included with R installation
- CRAN repository includes >12,500 contributed packages
- We will use survival

R Overview R is an object oriented language, meaning R stores information in the R interface/ workspace as objects Everything in R is an object Assignment of objects is done using either “=” “<-” Example: If we want a variable x that has a numeric value of 5, the “code” would be x <- 5 or x = 5

R Objects Objects can be used in other calculations
There are some restrictions when giving an object a name
- can’t contain “strange” symbols like !, +, -, #
- dot (.) and underscore (_) are allowed
- can contain numbers but can’t start with a number
- case sensitive: X and x are two different objects

Types of Objects
- Numbers, characters, logical (TRUE/FALSE)
- Vectors and matrices (think of an Excel spreadsheet)
- Data frames: similar to matrices but allow for mixed data types
- Arrays
- Lists: allow a single object to include objects of different types
- Functions
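A short base R session illustrating these object types might look like the following. All values and object names are made up for the example.

```r
x  <- 5                          # a single number
ok <- TRUE                       # a logical value
v  <- c(10, 20, 35)              # numeric vector
m  <- matrix(1:6, nrow = 2)      # 2 x 3 matrix, one data type throughout

# Data frame: columns can hold different types
df <- data.frame(id  = 1:3,
                 sex = c("F", "M", "M"),
                 age = c(21.5, 34, 40))

# List: can hold anything, including a vector and a function together
lst <- list(values = v, summary_fun = mean)
lst$summary_fun(lst$values)      # applies mean() to the stored vector

class(m)
class(df)
class(lst)
```

`str(df)` is a quick way to see the type of each column in a data frame.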

R Workspace Objects created during an R session are held in memory
The collection of objects in a current R session is called the workspace.
- Stored in your working directory
- The location of your working directory is accessed with getwd()
- Change the working directory using setwd("location")
- Example location: "S:\\wolfb\\SummerInstitute2018"

R Workspace The collection of objects in a current R session is called the workspace. workspace is not saved when you close your session unless you tell R to save This means that R objects are lost when you close R if you do not save workspace

R Functions Functions are available through packages
Recall CRAN repository includes >12,500 contributed packages It can be helpful to write your own functions Functions follow the form Function.name(argument1, argument2, …) Function is used by stating the function name and providing the necessary arguments
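As an illustration of the function form, here is a small user-written function. The name `summarize_var`, its arguments, and the ages are all hypothetical.

```r
# Hypothetical helper: basic summary statistics for a numeric variable
summarize_var <- function(x, digits = 1) {
  c(mean = round(mean(x), digits),
    sd   = round(sd(x), digits),
    min  = min(x),
    max  = max(x))
}

ages <- c(21, 35, 42, 50)             # made-up ages
age_summary <- summarize_var(ages)    # uses the default digits = 1
age_summary
summarize_var(ages, digits = 3)       # overriding the second argument
```

Arguments can be matched by position, as with `ages`, or by name, as with `digits = 3`.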

R Help All R functions have help documentation
Accessing help files is easy
- ?FunctionINeedHelpWith
- help("FunctionINeedHelpWith")
- help.search("FunctionINeedHelpWith")
You can also view the function and examples
- getAnywhere("FunctionINeedHelpWith")
- example("FunctionINeedHelpWith")

Let’s Try R Out…

Cox Regression: Overview

Switching Gears The log-rank type tests provide a method for comparing survival distributions between groups However what if we want to control for other variables Stratified log-rank tests can be used Unable to detect interactions Assumes effect of the stratified variable is the same across strata Can’t be used for continuous variables This brings us to regression approaches

Cox Proportional Hazards Model
Names Cox regression Semi-parametric proportional hazards Proportional hazards model Multiplicative hazards model Why? Allows for adjustment of covariates (continuous and categorical) in a survival setting Allows prediction of survival based on a set of covariates Analogous to linear and logistic regression

Cox PHM Notation
Data on n individuals:
- T_j: time on study for individual j
- δ_j: event indicator for individual j
- Z_j: vector of covariates for individual j
More complicated: Z_j(t)
- Covariates are time dependent
- May change with time/age

Basic Cox Regression Model
The outcome (i.e. dependent variable) is the hazard of the event at time t given variables X
This outcome is regressed on the p (independent) variables of interest, X
Model = baseline hazard x the exponential of the sum of the p regression coefficients, β, times the observed values of X:
h(t | X) = h0(t) exp{β1 X1 + … + βp Xp}

Comments on the Basic Model
h0(t): Arbitrary baseline hazard
- Notice that it varies by t
β: Regression coefficient vector
- Interpretation is a log hazard ratio
Semi-parametric form
- Non-parametric baseline hazard
- Parametric form assumed for covariate effects only

Cox Proportional Hazards Model
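Linear form: ln[h(t | X) / h0(t)] = β1 X1 + … + βp Xp
Proportional hazards because the ratio of the hazards for two covariate patterns X and X*, exp{β1 (X1 - X1*) + … + βp (Xp - Xp*)}, does not depend on t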

BMT Example: Methotrexate Use
Evaluate the relationship between time to death/relapse and methotrexate use (MTX) as graft vs. host prophylactic
Form of the Cox model: h(t | I(MTX)) = h0(t) exp{β_MTX I(MTX)}
I(MTX) is an indicator of whether or not a patient received methotrexate
The actual model determined from the data is h(t | I(MTX)) = h0(t) exp{0.398*I(MTX)}

Interpretation: Hazards Ratio
For any fixed time, patients given methotrexate have exp{β_MTX} times the risk of death or relapse compared to those who were not.

Hazard Ratio So how do we interpret this model? Hazard ratio:
Interpretation: The hazard of death or relapse for subjects who were given MTX is 1.5 times the hazard in those who were not. OR… There was a 50% increase in the hazard of death or relapse in patients who were given MTX.
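The step from a fitted coefficient to a hazard ratio can be sketched in R with `coxph()` from the survival package. The ten-patient data set below is invented purely so the code runs; it is not the BMT data and its estimates mean nothing clinically. Only the final line uses a number from the slides.

```r
library(survival)

# Made-up data: NOT the BMT study
toy <- data.frame(
  time  = c(5, 8, 12, 14, 20, 21, 25, 30, 33, 40),
  event = c(1, 1, 1, 0, 1, 1, 0, 1, 0, 1),
  mtx   = c(1, 1, 0, 1, 1, 0, 0, 1, 0, 0)   # 1 = given methotrexate
)

fit <- coxph(Surv(time, event) ~ mtx, data = toy)

beta_hat <- unname(coef(fit))    # estimated log hazard ratio
hr_hat   <- exp(beta_hat)        # estimated hazard ratio
summary(fit)$conf.int            # hazard ratio with 95% confidence limits

# The coefficient reported on the slide, converted the same way
hr_slide <- exp(0.398)           # about 1.49, reported as 1.5
```

The `exp(coef)` column of the summary is the hazard ratio, so it never needs to be exponentiated by hand in practice.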

Hazard Ratios
Hazard function = P(event at time t | survived to time t)
Assumption: “proportional hazards”
- The risk does not depend on time
- That is, “the risk is constant over time”
- But that is still vague…
BMT example found a hazard ratio = 1.5
- Patients given MTX had 50% greater risk of death or relapse relative to those not given MTX, at any given point in time

Hazard Ratios Hazard ratio
The hazard ratio is assumed to be constant over time

Categorical Variables with > 2 Levels?
What if we have a categorical variable that has more than 2 levels? How is the hazard ratio estimated in this case? What about the proportional hazards assumption?

Disease Type and Survival in BMT
Recall we had three classifications of disease type ALL AML Low Risk AML High Risk Our model looks like

A Slightly More Complicated Example
This model provides 3 possible hazard rates

We can “compare” the different hazards between the groups by taking the ratio

Interpretation: Patients with low risk AML have 0.56 times the hazard of death or relapse relative to patients with ALL. Patients with high risk AML have 1.47 times the hazard of death or relapse relative to patients with ALL. Patients with high risk AML have 2.61 times the hazard of death or relapse relative to patients with low risk AML.
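The third comparison follows from the first two, because both are measured against the same reference group (ALL). A quick arithmetic check in R, using only the rounded hazard ratios quoted above:

```r
hr_low_vs_all  <- 0.56   # low risk AML vs. ALL (rounded, from the slide)
hr_high_vs_all <- 1.47   # high risk AML vs. ALL (rounded, from the slide)

# High risk vs. low risk AML: the ALL reference cancels in the ratio
hr_high_vs_low <- hr_high_vs_all / hr_low_vs_all

# Equivalent calculation on the log hazard ratio (coefficient) scale
hr_from_coefs <- exp(log(hr_high_vs_all) - log(hr_low_vs_all))

hr_high_vs_low
```

The rounded inputs give a ratio slightly above the 2.61 quoted on the slide; the small gap is what one would expect if the slide's value was computed from unrounded coefficients, though that is an assumption.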

Continuous Covariates?
So far we’ve only examined categorical covariates (methotrexate use and disease type)
What if we wanted to look at something like patient age at time of transplant?
How is the hazard ratio estimated in this case?
What about the proportional hazards assumption?

Patient Age and Risk of Death
Consider a model that includes Patient age at time of transplant (in years) Our model looks like

Hazard Ratio for Patient Age
In this case we estimate the hazard ratio for some specific amount of increase (or decrease) in patient age. The default increase for the estimated regression coefficient is 1 unit (in this case years) Interpretation: A 1 year increase in patient age at time of transplant is associated with a 1% increase in the hazard of death or relapse.

Hazard Ratio for Patient Age
A 1 year increase isn’t likely clinically meaningful but we can estimate the hazard for larger increases if we want For example, consider the impact of a 10 year increase in age Interpretation: A 10 year increase in patient age at time of transplant is associated with a 12% increase in the hazard of death or relapse.
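Hazard ratios for a continuous covariate scale on the log scale: a c-unit increase multiplies the hazard by exp(c × β). The sketch below back-calculates an approximate per-year coefficient from the 10-year hazard ratio of 1.12 quoted above; the fitted coefficient itself is not shown on the slides, so this value is an approximation introduced for illustration.

```r
hr_10yr <- 1.12                      # from the slide: 12% increase per 10 years

beta_age <- log(hr_10yr) / 10        # implied log hazard ratio per year (approximate)

hr_1yr  <- exp(beta_age)             # hazard ratio for a 1 year increase
hr_25yr <- exp(25 * beta_age)        # hazard ratio for a 25 year increase

round(c(per_1_year = hr_1yr, per_10_years = exp(10 * beta_age),
        per_25_years = hr_25yr), 3)
```

Note that the hazard ratio for 10 years is the 1-year hazard ratio raised to the 10th power, not 10 times the 1-year percentage increase.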

Proportional Hazards? HR for a 25 year increase in Age
HR is still assumed to be constant over time for a specific difference in Age

More than 1 Covariate What if we had 2 binary covariates?
How is the hazard ratio estimated in this case? What about the proportional hazards assumption?

More than 1 Covariate Consider a model that includes
Methotrexate use (yes/no) Sex (male/female) Our model looks like

This model provides 4 possible hazard rates

And if we “compare” the different hazards by taking the ratio we get Interpretation: Patients given methotrexate had 1.44 times the hazard of death relative to those not given methotrexate, controlling for sex.

And if we “compare” the different hazards by taking the ratio we get Interpretation: Male patients had 0.84 times the hazard of death relative to females, controlling for methotrexate use.

But what does this mean in terms of proportional hazards?
[Figure: hazard curves for the four groups (MTX/Female, MTX/Male, No MTX/Female, No MTX/Male), with the gaps between curves labeled β1 and β2]

What About Interactions
We might also be interested in interactions between covariates In the BMT data, we have information on patients and donors and may want to consider interactions between these two Patient Sex by Donor Sex Patient Age by Donor Age

Models with Interactions
For these interactions our model looks like So how do we interpret these models We have to consider both patient and donor characteristics to estimate a hazard ratio

Patient and Donor Sex Interaction
Most interested in differences by patient sex so fix donor sex to interpret the hazard in male vs. female patients given male/female donor HR for male vs. female patients given a male donor HR for male vs. female patients given a female donor

Patient and Donor Age Interaction
Interactions between continuous variables are harder to understand/interpret As in the case of gender, we are most interested in the impact of increasing patient age but how do we “fix” donor age Consider plotting the hazard ratio for a specific increase in patient age (e.g. a 5 year increase) by donor age

HR for a 5 year increase in patient age with increasing donor age
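The curve described here can be tabulated with a few lines of Python. This is only a sketch: it plugs in the rounded patient age and interaction coefficients from the final-model table in this deck (-0.077 and 0.003), and the function name and the grid of donor ages are our own choices.

```python
import math

# Rounded coefficients from the final-model table in this deck
B_PATIENT_AGE = -0.077   # patient age main effect
B_INTERACTION = 0.003    # patient age x donor age

def hr_patient_age_increase(donor_age, increase=5.0,
                            b_patient=B_PATIENT_AGE,
                            b_interaction=B_INTERACTION):
    """HR for `increase` more years of patient age at a fixed donor age."""
    return math.exp(increase * (b_patient + b_interaction * donor_age))

# Illustrative grid of donor ages (our choice, not from the slides)
hr_by_donor_age = {age: hr_patient_age_increase(age) for age in range(10, 51, 10)}
for age, hr in hr_by_donor_age.items():
    print(f"donor age {age}: HR for a 5 year increase in patient age = {hr:.2f}")
```

The donor age main effect cancels in the ratio, so only the patient age coefficient and the interaction coefficient appear. With these rounded coefficients the hazard ratio rises with donor age and crosses 1 where b_patient + b_interaction × donor age = 0.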

Time-Varying Covariates
So far we’ve only considered “fixed” time covariates However, the BMT data include several time varying covariates Acute graft vs. host disease (AGvHD) Chronic graft vs. host disease (CGvHD) Platelet recovery (PR) These all occur after BMT or not at all

CPHM with Time Varying Covariates
The model looks like what we’ve been working with, but now we allow some of our covariates in X to be a function of t:
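In the notation used so far, the standard form of the Cox model with time-varying covariates is the following (a sketch; the index k over the p covariates is our notation):

```latex
\[
h\bigl(t \mid X(t)\bigr) = h_0(t)\,\exp\!\Bigl(\sum_{k=1}^{p} b_k\,X_k(t)\Bigr)
\]
```

Fixed covariates are the special case X_k(t) = X_k for all t. The hazard ratio for a covariate is still e^{b_k}, but it now compares subjects according to their covariate values at time t.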

Data Structure: Time-Varying Covariates
Expand data to describe all scenarios Need to consider the possible combinations of events Think about the long format data we discussed in the beginning

Data Structure: Time-Varying Covariates
Example: Time of Platelet Recovery and Death Possible scenarios at any point in time during the study for subject 1 No PR: Death/Relapse? PR: Death/Relapse? For all patients with Time PR < Time Death, need two rows in dataset to describe variation For all patients with Time PR > Time Death, need only one row in the dataset

Timeline Examples: Observed Event
Subject with platelet recovery at ta (two rows):
t0 to ta: no platelet recovery until ta, no event
ta to te: platelet recovery, event
Subject without platelet recovery (one row):
t0 to te: no platelet recovery, event

Timeline Examples: Censored Event
Subject with platelet recovery at ta (two rows):
t0 to ta: no platelet recovery, no event
ta to tc: platelet recovery, no event (censored)
Subject without platelet recovery (one row):
t0 to tc: no platelet recovery, no event

Time-Varying Covariates
Estimation and inference are the same as with fixed time covariates The difference is the data structure

Data Set-up Consider the BMT data with information on death/relapse, platelet recovery, and acute and chronic graft vs. host disease ID TTE DFS Time PR PR TAGvHD AGvHD 1 2081 13 67 2 1602 18 3 1496 12 4 1462 70 5 1433 6 1377 7 1330 17 8 996 72 …

Expansion
Consider subject 1 (row 1). Now, three rows:
Row 1: start = 0, stop = 13, PR = 0, AGvHD = 0, DFS = 0
Row 2: start = 13, stop = 67, PR = 1, AGvHD = 0, DFS = 0
Row 3: start = 67, stop = 2081, PR = 1, AGvHD = 1, DFS = 0
Consider row 2. Now 2 rows:
Row 1: start = 0, stop = 18, PR = 0, AGvHD = 0, DFS = 0
Row 2: start = 18, stop = 1602, PR = 1, AGvHD = 0, DFS = 0
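The expansion can be automated. The sketch below is one hypothetical way to do it in plain Python. The function name, the dictionary layout, and the use of None for "never occurred" are our assumptions, not part of the course materials.

```python
def expand_subject(subject_id, end_time, event, change_times):
    """Split one subject's follow-up into (start, stop] rows.

    change_times maps a covariate name to the time it switches from 0 to 1,
    or None if it never switches during follow-up.
    """
    # Keep only switches that happen strictly before the end of follow-up
    cuts = sorted({t for t in change_times.values()
                   if t is not None and 0 < t < end_time})
    bounds = [0] + cuts + [end_time]
    rows = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        row = {"id": subject_id, "start": start, "stop": stop}
        for name, t in change_times.items():
            # Covariate is 1 once its switch time has been reached
            row[name] = int(t is not None and t <= start)
        # The event indicator belongs only to the last interval
        row["event"] = event if stop == end_time else 0
        rows.append(row)
    return rows

# Subjects 1 and 2 from the slides (both censored, so event = 0)
subject_1 = expand_subject(1, 2081, 0, {"PR": 13, "AGvHD": 67})
subject_2 = expand_subject(2, 1602, 0, {"PR": 18, "AGvHD": None})
for row in subject_1 + subject_2:
    print(row)
```

A switch time at or after the end of follow-up produces no extra row, which matches the rule that patients with Time PR > Time Death need only one row.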

Expanded Data
ID   Start Time   Stop Time   Event   PR   AGvHD
1    0            13          0       0    0
1    13           67          0       1    0
1    67           2081        0       1    1
2    0            18          0       0    0
2    18           1602        0       1    0
…

What About Dependence? You might be asking whether we need to worry about correlated data? In this case we do not need to worry about it. There are two exceptions: When subjects have multiple events When a subject appears in overlapping intervals The 2nd case is almost always a data error

Cox Model with Time Varying Covariate
Consider a model that includes Time to platelet recovery Note the model looks like

Hazard Ratio for Platelet Recovery
We still estimate a hazard ratio in the same way Interpretation: Patients who experience platelet recovery at a given time have 0.28 times the hazard of death or relapse relative to those who have not experienced platelet recovery at that time.

What About Continuous Covariates
Continuous variables can change over time also Given the times measurements are taken, we can expand the data in the same way. However, this assumes the value is unchanging during the intervals between which it was measured A little unrealistic BUT… No different from treating a single measure (e.g. blood pressure) as a fixed time covariate

Inference in Cox Regression

Global Tests of the Model
Testing that bk = 0 for all k = 1, 2, …, p Three main tests Chi-square/ Wald test Likelihood ratio test Score test All three are chi-square distributed with p degrees of freedom All likelihood based Note the likelihood is a partial likelihood as it does not estimate baseline hazard
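For reference, the three global statistics have standard forms. In the sketch below, ℓ(b) is the log partial likelihood, U(b) its vector of first derivatives (the score), and I(b) the observed information matrix (minus the matrix of second derivatives). These symbols are our notation.

```latex
\begin{align*}
X^2_{\mathrm{LR}} &= 2\,\bigl[\ell(\hat b) - \ell(0)\bigr] \\
X^2_{\mathrm{W}}  &= \hat b^{\top}\, I(\hat b)\,\hat b \\
X^2_{\mathrm{SC}} &= U(0)^{\top}\, I(0)^{-1}\, U(0)
\end{align*}
```

Under H0: b1 = … = bp = 0, each statistic is approximately chi-square with p degrees of freedom. The score test needs only quantities evaluated at b = 0, so it does not require fitting the model.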

BMT Example: Disease Group
The model including disease group is shown below The global p-values for this model are Likelihood ratio test: p = Wald Test: p = Score Test: p =

Local Tests of Covariates
The global test assesses whether at least one covariate is associated with outcome Local tests can be used to Test associations of individual covariates while accounting for other covariates Compare nested models to determine if a group of covariates is significant (chunk test)

Results of local tests of associations of individual covariates are provided with model fit Continuous covariate: significance suggests a 1 unit increase gives a HR ≠ 1, controlling for other covariates Categorical covariate: significance suggests the HR ≠ 1 compared to the reference group, controlling for other covariates

Recall the global test indicated disease group as significant (so there is a difference between at least two of the groups) The model and corresponding local tests for the Cox model including disease group with ALL as the reference are shown here
Disease                    Beta     SE(Beta)   Z score   P-value
AML low risk (vs. ALL)     -0.574   0.287      -2.00     0.046
AML high risk (vs. ALL)    0.383    0.267      1.43      0.152
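The Z scores and p-values in this table follow from the coefficients and standard errors. Here is a minimal sketch using only the Python standard library, assuming the usual normal reference distribution for the Wald statistic; the function name is ours.

```python
import math

def wald_test(beta, se):
    """Return (z, two-sided p-value) for H0: beta = 0."""
    z = beta / se
    # Two-sided standard normal tail area: P(|Z| > |z|)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

z_low, p_low = wald_test(-0.574, 0.287)    # AML low risk vs. ALL
z_high, p_high = wald_test(0.383, 0.267)   # AML high risk vs. ALL
print(f"AML low risk:  z = {z_low:.2f}, p = {p_low:.3f}")
print(f"AML high risk: z = {z_high:.2f}, p = {p_high:.3f}")
```

Because the inputs are rounded to three decimals, the recomputed p-values agree with the table only up to rounding.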

Local tests of associations of a group of covariates compare a model with all covariates to a model without the subset of covariates being evaluated Comparison between models is based on the difference in the log-likelihood for each model Means we are using likelihood ratio test 2(log-likelihood big model – log-likelihood smaller model) Degrees of freedom is the number of covariates being tested

BMT Example Compare the following models
Model including disease group, patient age, donor age, and the interaction between patient and donor age
Model including only disease group
Model                                                 Log-Likelihood
Dx group, Pt age, Dn age, Pt x Dn age interaction     -361.5
Dx group                                              -366.6
Likelihood ratio test statistic (df=3): 10.1, P = 0.017
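The chunk test can be reproduced from the two log-likelihoods. The sketch below uses only the standard library. `chi2_sf` is our own helper, built on the closed-form chi-square tail for integer degrees of freedom; it is not a library function.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms
        terms = [half ** i / math.factorial(i) for i in range(df // 2)]
        return math.exp(-half) * sum(terms)
    # Odd df: normal tail plus a finite correction sum
    tail = math.erfc(math.sqrt(half))
    terms = [half ** (i - 0.5) / math.gamma(i + 0.5)
             for i in range(1, (df - 1) // 2 + 1)]
    return tail + math.exp(-half) * sum(terms)

def likelihood_ratio_test(loglik_big, loglik_small, df):
    """LRT statistic 2(logL_big - logL_small) and its chi-square p-value."""
    stat = 2.0 * (loglik_big - loglik_small)
    return stat, chi2_sf(stat, df)

lrt_stat, lrt_p = likelihood_ratio_test(-361.5, -366.6, df=3)
print(f"LRT statistic = {lrt_stat:.1f}, p = {lrt_p:.3f}")
```

The rounded log-likelihoods give a statistic of about 10.2 rather than the 10.1 shown on the slide, which was presumably computed from unrounded values. The p-value is close to 0.017 either way.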

Linear Contrasts Local tests are estimated for regression coefficients
For categorical variables with >2 levels, local tests (from the model) only compare to the reference group BMT: if ALL is the reference, local tests tell us about AML low risk vs. ALL AML high risk vs. ALL We also want to compare AML low risk vs. high risk Can be done using contrasts

Linear Contrasts In the disease group example, we want to compare AML high and low risk patients To do this, we want to test the hypothesis H0: bAML.low = bAML.hi vs. HA: bAML.low ≠ bAML.hi This can be done using a linear contrast Take the difference: bAML.low - bAML.hi We reject the null if this difference (accounting for variability) is greater than or less than 0.

Our local tests found Hazard for AML low risk < hazard ALL But found no difference between AML high risk and ALL Comparing AML low and high risk groups using linear contrast finds bAML.low - bAML.hi = (aka HR = 0.38) SE(bAML.low - bAML.hi ) = 0.27 p =
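The contrast can be worked through with the coefficients from the local-test table (-0.574 and 0.383) and the reported standard error of 0.27. In this sketch, the variance formula is the standard one for a difference of two estimates, and the z value is our own arithmetic rather than a number from the slides.

```latex
\begin{align*}
b_{\text{AML.low}} - b_{\text{AML.hi}} &= -0.574 - 0.383 = -0.957,
  \qquad \mathrm{HR} = e^{-0.957} \approx 0.38 \\
\mathrm{SE}\bigl(b_{\text{AML.low}} - b_{\text{AML.hi}}\bigr)
  &= \sqrt{\mathrm{Var}(b_{\text{AML.low}}) + \mathrm{Var}(b_{\text{AML.hi}})
     - 2\,\mathrm{Cov}(b_{\text{AML.low}},\, b_{\text{AML.hi}})} \\
z &= \frac{-0.957}{0.27} \approx -3.54
\end{align*}
```

The covariance term is why the standard error of the contrast cannot be obtained from the two individual standard errors alone. It has to come from the estimated covariance matrix of the fitted model.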

Confidence Intervals for Hazard Ratios
We’ve learned how to estimate a hazard ratio from a Cox model Confidence intervals for HRs are also often reported Could also think of this as another form of hypothesis test If the interval includes 1, no significant difference Easy for individual covariates in a model

Examples: 95% CI for HR for Methotrexate 95% CI for 1 unit increase in Age

If we wanted to consider more than a 1 unit increase in a continuous covariate Example: 95% CI for 5 unit increase in Age
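For a single coefficient, the interval is exp(b ± 1.96 × SE(b)). For a k unit increase, both b and SE(b) are multiplied by k before exponentiating. A minimal sketch follows: the AML low risk numbers come from the local-test table in this deck, while the age coefficient and standard error are hypothetical values chosen only to illustrate the 5 unit case.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal

def hr_ci(beta, se, units=1.0, z=Z_975):
    """HR and 95% CI for a `units` increase in a covariate."""
    log_hr = units * beta
    half_width = z * abs(units) * se
    return (math.exp(log_hr),
            math.exp(log_hr - half_width),
            math.exp(log_hr + half_width))

# AML low risk vs. ALL, coefficient and SE from the local-test table
hr, lower, upper = hr_ci(-0.574, 0.287)
print(f"AML low risk vs. ALL: HR = {hr:.2f} ({lower:.2f}, {upper:.2f})")

# Hypothetical age coefficient (illustration only, not from the slides)
hr5, lower5, upper5 = hr_ci(0.02, 0.008, units=5)
print(f"5 unit increase: HR = {hr5:.2f} ({lower5:.2f}, {upper5:.2f})")
```

The AML low risk interval lies entirely below 1, which agrees with the significant local test for that coefficient.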

Complex Hazard Ratios What if we want to estimate hazard ratios that involve more than one covariate AML high risk vs. low risk (includes 2 coefficients) Interactions require us to include >2 coefficients

Complex Hazard Ratios The confidence interval in these cases is calculated the same way But now b = combination of coefficients in our HR expression SE(b ) = standard error for the combination of coefficients If we have these two things, we can calculate the confidence interval as above

Complex Hazard Ratios Patient Age x Donor Age example
Interactions require us to include >2 coefficients Our combination of b is: Our SE(b ) is So our final HR (95% CI) is
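For the patient age by donor age interaction, the combination and its standard error take the following general form. This is a sketch for a 5 year increase in patient age at a fixed donor age d, writing b_Pt for the patient age coefficient and b_Pt×Dn for the interaction coefficient (our subscripts).

```latex
\begin{align*}
\log \mathrm{HR} &= 5\,\bigl(b_{\mathrm{Pt}} + d\; b_{\mathrm{Pt}\times\mathrm{Dn}}\bigr) \\
\mathrm{SE}(\log \mathrm{HR}) &= 5\,\sqrt{\mathrm{Var}(b_{\mathrm{Pt}})
   + d^{2}\,\mathrm{Var}(b_{\mathrm{Pt}\times\mathrm{Dn}})
   + 2d\,\mathrm{Cov}(b_{\mathrm{Pt}},\, b_{\mathrm{Pt}\times\mathrm{Dn}})} \\
95\%\ \mathrm{CI} &= \exp\bigl(\log \mathrm{HR} \pm 1.96\,\mathrm{SE}(\log \mathrm{HR})\bigr)
\end{align*}
```

The donor age main effect does not appear because it cancels when the two hazards are compared at the same donor age. The variances and covariance come from the estimated covariance matrix of the fitted model.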

Building a Cox Regression Model

Model Building using PHM
Regression uses Adjusting for potential confounders when a specific hypothesis is to be tested Predict the distribution of the time to some event from a list of explanatory variables (“prediction”)

Adjustment Approach First, perform “unadjusted” analysis of factor of interest with time to event outcome Include factor of interest and significant confounders One at a time “full” model Use some criterion (e.g. p-values) to determine if confounders should be included/retained or not

Prediction Approach Interested in identifying set of variables to model survival (or use as “base” model for future testing) Look at univariate association with time to event for individual predictors Select a subset of predictors for consideration Note in both approaches, we might want to consider potential interactions

Most Common Approach: Subset Selection
Forward entry Start with most significant Continue until criterion is met (e.g. p > 0.05) Entered variables not reevaluated Backward elimination Put all variables in model that meet some criteria (e.g. p <0.20) Remove least significant variables and repeat until a criterion is met (e.g. p > 0.05) These will not be reconsidered again Stepwise selection

Problems with Subset Selection
Methods can be systematic but still may not provide a clear best subset Post-hoc so generally can’t be replicated easily Subset selection is also discontinuous As a result subset selection is often unstable and highly variable, particularly when p is large small changes in the data can result in very different estimates

Other Approaches Information criteria
AIC = Akaike information criterion (1973), where p = number of parameters in the model BIC = Bayesian information criterion (Schwarz, 1978) Choose the model with the smallest AIC/BIC
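In their usual forms, AIC = -2 logL + 2p and BIC = -2 logL + p log(n). The sketch below applies them to the disease group model, using its log partial likelihood (-366.6) and p = 2 coefficients. The choice of n for BIC is an assumption: some authors use the number of subjects, others the number of events in a Cox model. The figure of 83 events is taken from the summary earlier in the deck.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2 logL + 2p."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n):
    """Bayesian information criterion: -2 logL + p log(n)."""
    return -2.0 * loglik + n_params * math.log(n)

# Disease group model: log partial likelihood -366.6, two coefficients
aic_dx = aic(-366.6, 2)
# n = 83 events is an assumption; the number of subjects (137) is also used
bic_dx = bic(-366.6, 2, n=83)
print(f"AIC = {aic_dx:.1f}, BIC = {bic_dx:.1f}")
```

The AIC computed from the rounded log-likelihood (737.2) is within rounding of the 737.1 reported for the disease group model in the AIC comparison table.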

Identifying “Best” Predictive Model
P-values aren’t useful for determining if model is best predictor Why? Covariates with significant p-values can add more “noise” than signal to performance Other approaches R-squared type statistic “Absolute prediction error” C-index/Somers’ Dxy

R2 Approach Sensitive to the proportion of censored observations
Due to censoring, measure is not as straightforward as in linear regression Compares fraction of the likelihood explained by the model relative to the likelihood for a perfect model Sensitive to the proportion of censored observations Can decrease substantially as % censored increases R2 values can decrease by up to 20% for > 50% censoring

Absolute Prediction Error
Expected value of the absolute difference between observed and predicted responses More interpretable than others (e.g. those that use squared differences vs. absolute value) Makes the critical distinction between association and prediction

C-Index Proportion of all pairs of subjects whose survival times can be ordered such that the subject with the higher predicted survival is also the one whose survival is larger Think of as the probability of concordance/agreement between observed and predicted Relationship to Somers’ D
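A small sketch of the concordance calculation follows, assuming the common convention (Harrell's C) in which a pair is usable only when the shorter observed time is an event. It is written in terms of a predicted risk score, so higher risk should go with shorter survival. The toy data are made up for illustration.

```python
def c_index(times, events, risk):
    """Concordance between predicted risk and observed survival order."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: subject i has the shorter time and an observed event
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / usable

def somers_dxy(c):
    """Somers' Dxy rescales C from [0, 1] to [-1, 1]."""
    return 2.0 * (c - 0.5)

# Made-up example: four subjects, the second one censored
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
c_example = c_index(times, events, risk=[0.9, 0.4, 0.1, 0.6])
print(c_example, somers_dxy(c_example))
```

A value of C = 0.5 corresponds to predictions no better than chance and Dxy = 0, while C = 1 corresponds to perfect ordering and Dxy = 1.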

Example of Model Building: BMT
Suppose we want to find the best predictive model for time to death or relapse….

ALL/AML example Possible Variables:
Disease group Methotrexate use Patient/Donor age and sex Platelet recovery Also considering the interaction between Patient age x Donor Age

Global Tests for Each Variable in “Univariate” Models
Model                                       Global p
Disease group                               0.0012
Platelet recovery                           0.0005
Dx Group, Platelet Recovery (no intx)       <0.0001
Dx Group, Platelet Recovery, Intx           <0.0001, 0.042
Methotrexate                                0.094
Patient Age (no interaction)                0.340
Donor Age (no interaction)                  0.249
Patient Age, Donor Age, Intx                0.0037

Final Model via Backward Selection (based on p-values)
Variable                    b        SE(b)    p
AML low risk (vs. ALL)      -0.622   0.304    0.041
AML high risk (vs. ALL)     0.148    1.610    0.614
Platelet recovery           -1.06    0.346    0.002
Patient Age                 -0.077   0.034    0.024
Donor Age                   -0.072   0.030    0.017
Patient Age x Donor Age     0.003    0.0009
Global P                                      <0.0001

Final Model via Forward Selection (based on p-values)
Variable                    b        SE(b)    p
AML low risk (vs. ALL)      -0.622   0.304    0.041
AML high risk (vs. ALL)     0.148    1.610    0.614
Platelet recovery           -1.06    0.346    0.002
Patient Age                 -0.077   0.034    0.024
Donor Age                   -0.072   0.030    0.017
Patient Age x Donor Age     0.003    0.0009
Global P                                      <0.0001

Model Comparisons Based on AIC
Models                                                                AIC
Disease group                                                         737.1
Dx group + MTX                                                        737.2
Dx group + Platelet Recovery                                          729.2
…
Dx group + PR + Pt Age + Dn Age                                       732.7
Dx group + PR + Pt Age + Dn Age + Dx x PR                             730.9
Dx group + PR + Pt Age + Dn Age + Pt x Dn Age                         726.6
Dx group + PR + Pt Age + Dn Age + Dx x PR + Pt x Dn Age               725.8
Dx group + PR + MTX + Pt Age + Dn Age + Dx x PR + Pt x Dn Age         726.3

R2 and C-Index for Select Models

Approach Depends on the Goal
Stepwise selection most common approach But as we mentioned there are numerous issues For model parsimony AIC (and BIC) are good choices Nice in that they allow for comparisons of non-nested models BIC more conservative (but not readily available in R) If prediction is the goal, consider examining both R-squared and C-index

Model Diagnostics

Regression Diagnostics
Most interested in testing proportional hazards assumption Also looking for functional form of covariates Two types of methods Graphical approaches Regression approaches (sometimes combined with graphical approaches)

Graphical Approaches Pretty pictures are nice and can be intuitive but… We generally prefer a statistical means of determining if an assumption is true This leads us to regression approaches

Residuals A number of residuals are available for Cox models
Schoenfeld residuals are useful to assess: Model fit Model assumptions Grambsch and Therneau developed a statistical test that provides a p-value for these residuals Assesses multivariable Cox models for Global proportional hazards Proportional hazards for individual covariates

Assessment of Proportional Hazards/Functional Form
Variable                    χ²       p
AML low risk (vs. ALL)      0.008    0.927
AML high risk (vs. ALL)     1.112    0.292
Platelet recovery           1.655    0.684
Patient Age                 0.974    0.324
Donor Age                   4.882    0.027
Patient Age x Donor Age     2.943    0.086
Global P                             0.362

Schoenfeld Residuals We can plot the scaled Schoenfeld residuals by time to assess where any issues may be So focus on those variables that were identified by the Grambsch-Therneau test When the proportional hazards assumption holds, the residuals should follow an approximately horizontal straight line (no trend over time)

Scaled Schoenfeld Residual Plots

MTX and Time to Death/Relapse
We dropped MTX from our Cox model But we showed graphically that proportional hazards might be an issue… We might reexamine this in our model…

Proportional Hazards with MTX
Variable                    χ²       p
AML low risk (vs. ALL)      0.435    0.509
AML high risk (vs. ALL)     2.817    0.093
Platelet recovery           0.010    0.919
MTX                         0.654    0.419
Patient Age                 5.467    0.019
Donor Age                   5.707    0.017
Patient Age x Donor Age     3.507    0.061
Global P                             0.091

Schoenfeld Plots Again

How Can We Address the Problem?
Impose a time-dependent covariate into the model General idea: If PHM is valid, time-dependent covariate will not be significant If time-dependent covariate is significant, then there is “something” going on in terms of the HRs that varies over time

Introduce Time Dependent Covariate
Create a new variable Z2(t) = Z1×g(t), where g(t) is a function of time We don’t know the functional form of g(t) Two good possibilities are

Change Point Model Choosing g(t) = 1 if t > τ and 0 otherwise gives us a model with a change point at τ
Fit proportional hazards model with Z1 and Z2(t) Including a change point allows the HR to change after a specified time

Change Point Model We could also code our change point model with the time dependent covariate This coding yields an equivalent model but interpretation is simpler and so this may be preferred
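The two codings can be written side by side. This is a sketch in which τ denotes the change point, I(·) is an indicator function, and the starred names for the second coding are our notation.

```latex
\begin{align*}
\text{Coding 1:}\quad & h(t) = h_0(t)\,\exp\bigl(b_1 Z_1 + b_2 Z_2(t)\bigr),
   \qquad Z_2(t) = Z_1\, I(t > \tau) \\
 & \mathrm{HR} = e^{b_1}\ \ (t \le \tau), \qquad \mathrm{HR} = e^{b_1 + b_2}\ \ (t > \tau) \\
\text{Coding 2:}\quad & h(t) = h_0(t)\,\exp\bigl(b_1^{*} Z_1^{*}(t) + b_2^{*} Z_2^{*}(t)\bigr),
   \qquad Z_1^{*}(t) = Z_1\, I(t > \tau),\quad Z_2^{*}(t) = Z_1\, I(t \le \tau) \\
 & \mathrm{HR} = e^{b_2^{*}}\ \ (t \le \tau), \qquad \mathrm{HR} = e^{b_1^{*}}\ \ (t > \tau)
\end{align*}
```

The models are equivalent, with b2* = b1 and b1* = b1 + b2. Under the second coding each coefficient is directly the log hazard ratio for its own time period, which is why its interpretation is simpler.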

Determining τ How do we select a change point for the relative risk? Where is the best change point? Calculate log likelihood at each event time where τ represents specific event times Choose τ that yields the largest log-likelihood

New Time Dependent Covariate
Fit proportional hazards model with Z1, Z2(t), and estimate b1, b2 Test local hypothesis: H0: b2 = 0 vs. HA: b2 ≠ 0 Rejecting H0 reinforces that proportional hazards does not hold Do this for each covariate in question

Time Varying Covariates for MTX
Start with the simplest version: Interaction of MTX with stop time (g(t) = t) The proportional hazards check looks good
Variable                    χ²       p
AML low risk (vs. ALL)      0.0004   0.984
AML high risk (vs. ALL)     1.287    0.257
Platelet recovery           0.009    0.923
MTX x Time                  0.015    0.901
Patient Age                 0.242    0.622
Donor Age                   3.917    0.048
Patient Age x Donor Age     1.551    0.213
Global P                             0.501

Change Point for MTX Consider a change-point at 466 days
The proportional hazards check looks even better
Variable                    χ²       p
AML low risk (vs. ALL)      0.300    0.584
AML high risk (vs. ALL)     2.538    0.111
Platelet recovery           0.786    0.375
Z1 = MTX if Time > 466      0.175    0.676
Z2 = MTX if Time < 466      0.540    0.462
Patient Age                 0.072    0.788
Donor Age                   2.572    0.109
Patient Age x Donor Age     0.542
Global P                             0.389
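Fitting this model requires each subject's follow-up to be split at the change point so that Z1 and Z2 can differ before and after day 466. The sketch below shows one hypothetical way to do the split on a single start/stop row. The dictionary keys and function name are our own and not part of the course materials.

```python
def split_at_change_point(row, tau):
    """Split one (start, stop] row at tau and add the two MTX covariates."""
    if row["start"] < tau < row["stop"]:
        # The event, if any, belongs to the later piece
        intervals = [(row["start"], tau, 0), (tau, row["stop"], row["event"])]
    else:
        intervals = [(row["start"], row["stop"], row["event"])]
    pieces = []
    for start, stop, event in intervals:
        after = start >= tau
        piece = dict(row, start=start, stop=stop, event=event)
        piece["mtx_after"] = row["mtx"] if after else 0    # Z1 in the table
        piece["mtx_before"] = 0 if after else row["mtx"]   # Z2 in the table
        pieces.append(piece)
    return pieces

# Hypothetical MTX-treated subject with an event at day 1000
example = {"id": 1, "start": 0, "stop": 1000, "event": 1, "mtx": 1}
for piece in split_at_change_point(example, tau=466):
    print(piece)
```

Rows that end on or before day 466, or that start on or after it, are left as a single row and simply receive the appropriate pair of covariate values.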

Model with Change-Point for MTX
Variable                    b        SE(b)    p
AML low risk (vs. ALL)      -0.465   0.319    0.144
AML high risk (vs. ALL)     0.264    0.309    0.394
Platelet recovery           -1.240   0.362    <0.001
Z1 = MTX if Time > 466      -2.544   1.019    0.013
Z2 = MTX if Time < 466      1.219    0.302
Patient Age                 -0.038   0.034    0.277
Donor Age                   -0.049   0.033    0.134
Patient Age x Donor Age     0.0015   0.0009   0.107
Global P

Interpretation of MTX? Up to 466 days, individuals treated with methotrexate have 3.4 times the hazard of death or relapse relative to those not treated, controlling for other factors. However, after 466 days, those treated with methotrexate are at significantly reduced risk of death or relapse, controlling for other factors. If you go back and look at the hazard rate plots Hazards crossed at about 460 days

Other Residuals for Diagnostics
There are several other types of residuals Martingale residuals Can be used to assess functional form (i.e. identify if variable transformations are needed/help) Continuous or ordinal variables only Deviance residuals Used to investigate outliers

Final Thoughts All methods we discussed are for right censored, left truncated data Parametric methods can be used for interval and left censored There are alternative (complex) semi-parametric approaches that have been developed also

Final Thoughts We also assumed censoring was independent of the event
Consider if we had wanted to look at time to relapse instead of relapse or death In this case, it seems unlikely that death and relapse are independent Competing risk analysis is an alternative that can be used in such cases
