1
Survival Analysis: From Square One to Square Two
Yin Bun Cheung, Ph.D., and Paul Yip, Ph.D.
Yin Bun Cheung is a Biostatistician at the National Cancer Centre Singapore. His academic interests include mortality models, analysis of survival time data and count data. Paul Yip is a Senior Lecturer in statistics at the University of Hong Kong. His main interests include capture-recapture methodology and statistics as applied in medicine.
2
Lecture structure: Basic concepts; Kaplan-Meier analysis; Cox regression; Computer practice
Survival analysis is central in medical statistics, not only because survival is an important medical concern, but also because survival analysis can be used to analyse data on non-fatal outcomes that are otherwise not analysable. Examples of such non-fatal outcomes include time to tumor recurrence and age at achievement of developmental milestones. This lecture begins with an introduction to some of the basic concepts and examples of survival analysis. Two popular survival analysis techniques are then introduced: Kaplan-Meier analysis and Cox regression. The former is often seen as a basic technique, whereas the latter is relatively advanced. In cancer clinical trials these two are close to being the standard techniques. Finally we will analyse a real data set. We will provide a hyperlink to a resource of data sets and we encourage readers to practise on real data.
3
What’s in a name? time-to-event data failure-time data censored data
(unobserved outcome) The name “survival analysis” is slightly misleading. It is a technique for analysing “time-to-event” or “failure-time” data, which may or may not be related to survival and death in the usual sense. For brevity we will use the terms survival and death in this lecture. The value of a survival time variable must be larger than zero, i.e. a zero or a negative value is not allowed. A common problem in this type of data is censoring, meaning that the time-to-event is not observed.
4
Types of censoring: loss to follow-up during the study period; study closure
In medical studies censoring is common and usually needs to be handled by survival analysis techniques. There are two major sources of censoring: (1) A patient is lost to follow-up. It is known that the patient was alive at the last contact, but his or her subsequent survival status is not known. Time to last contact is taken as the censoring time. (2) A clinical trial or an epidemiological study is closed after a fixed study period. Some of the patients/subjects are alive at the time of study closure. Time to study closure is taken as the censoring time. Although the exact time of the outcome event is not known, the fact that it does not precede the censoring time is a useful piece of information. Survival analysis techniques utilise this information.
5
Examples of survival analysis
1. Marital status & mortality 2. Medical treatments & tumor recurrence & mortality in cancer patients 3. Size at birth & developmental milestones in infants Example 1 is an epidemiological study of the British population. Marital status was enumerated in a baseline survey in Time from baseline survey to death was analysed. Data censored at May 1997 when the mortality dataset was last updated. (Cheung, 2000) Example 2 is a clinical trial of medical treatment regimes in breast cancer patients in Germany started in The primary endpoints were time from mastectomy to tumor recurrence and to death. Censored at mid 1997 when the data was analysed. (Sauerbrei et al., 2000) Example 3 is a paediatric study of Pakistani infants. The primary concern was age at achievement of developmental milestones in relation to size at birth. Data censored at 24 months of age when the follow-up for each newborn was ceased. (Cheung, Yip, Karlberg, 2001)
6
Why survival analysis? Censoring (time of event not observed); Unequal follow-up time
If the study period is long enough to observe the survival time of all subjects, as in some animal experiments, one may prefer to use more common methods, such as the t-test and ordinary least squares regression, to analyse survival time as a continuous variable. However, in studies of human subjects there is often censoring, and the outcome cannot be analysed by the usual methods for continuous data. Subjects are often censored at different times, leading to unequal follow-up time. Analysing survival during the study period as a dichotomous variable (alive vs dead), e.g. by a Chi-square test, would fail to account for this non-comparability between subjects.
7
What is time? What is the origin of time?
In epidemiology: Age (birth as time 0)? Calendar time since a baseline survey?
There are different ways to measure the timing of an event. In example 1 (marital status and mortality) the outcome variable was calendar time since the baseline survey. An alternative is to analyse age at death (time since birth). Since the subjects were interviewed at different ages, the “entry age” would differ. In theory, using age as the time scale (with different entry ages) is often preferred in such an epidemiological study, but the actual advantage depends on the situation. See Korn et al. Time-to-event analysis of longitudinal follow-up of a survey. J Epidemiol 1997; 145: 72-80; and their commentators Ingram et al. J Epidemiol 1997; 146:
8
What is the origin of time?
In clinical trials: Since randomisation ? Since treatment begins ? Since onset of exposure ? Some clinical trials take the first day of treatment as origin of survival time. Some use the day of randomisation, which precedes the treatment. In example 2 (breast cancer study) the researchers used day of mastectomy, which preceded randomisation and marks the onset of exposure to the risk of tumor recurrence. Which of them is better?
9
The choice of origin of time
Onset of continuous exposure; Randomisation to treatment; Strongest effect on the hazard
Allison (1995) proposed 3 criteria:
1. Onset of exposure: A cancer patient cannot have tumor recurrence until the tumor(s) has been removed. In example 2 mastectomy marked the onset of exposure to the risk of breast cancer recurrence.
2. Randomisation to treatment: In clinical trials the rule of thumb is to take randomisation as the origin. This criterion usually overrides criterion 1. Survival time prior to randomisation is not relevant to a clinical trial. Furthermore, a rule of thumb is “Once randomised, always analyse”. If a patient dies between randomisation and start of treatment, time since randomisation should still be analysed.
3. Strongest effect: Time since removal of tumor is usually more strongly related to tumor recurrence than age is, so the former is preferred. Time since a baseline community survey tends to be less strongly related to mortality than age is, so the latter is often preferred.
Allison PD. Survival Analysis using the SAS® System. Cary, NC: SAS Institute, 1995:
10
Types of survival analysis
1. Non-parametric method: Kaplan-Meier analysis 2. Semi-parametric method: Cox regression 3. Parametric method
There are three major types of survival analysis techniques, differing in the assumptions that need to be made. As a metaphor, a pair of trousers is a parametric model. More exactly it is a 2-parameter model, one parameter for waist circumference and one for leg length. In contrast, a skirt with an elastic waist is a non-parametric model. It is unlikely to fit well but it never fails badly. If you don’t know or are unwilling to guess the body size (the data), you can buy a skirt with an elastic waist (a non-parametric model). If you are confident that you know the body size quite well, you can buy a pair of trousers (a parametric model) and get a better fit. A semi-parametric model is a trade-off between the two extremes. An introductory text covering all three types of techniques, with a medical focus, is: Collett D. Modelling Survival Data in Medical Research. London: Chapman & Hall, 1994.
11
Square 1 to square 2 This lecture focuses on two commonly used methods
Kaplan-Meier method; Cox regression model
In experimental studies the most commonly used survival analysis technique is likely to be the (non-parametric) Kaplan-Meier method (1958). In epidemiology the most popular one is the (semi-parametric) Cox regression model (1972). Very often in cancer clinical trials the primary comparison between treatment and control groups is based on the Kaplan-Meier method. A secondary analysis with adjustment for covariates based on the Cox regression model is then reported (see example 2). Examples 1 and 3 used the Cox model and a parametric model, respectively.
Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Asso 1958; 53:
Cox DR. Regression models and life tables (with discussion). J Royal Stat Soc B 1972; 34:
12
KM survival curve (d = death, c = censored, surv = survival)
Calculating a Kaplan-Meier (KM) survival curve: Hypothetically there are four subjects in the control group of a clinical trial. There are two deaths (day 2 and day 4) and two censored cases (day 3 and day 5). At the beginning all four subjects are at risk of death. On day 2 one of the four at-risk subjects dies, giving a probability of death $P_2(\text{death}) = 0.25$ and reducing the number at risk to three for the next day. One subject is lost to follow-up on day 3; the censoring is assumed to happen at the end of the day. So on day 4 the number at risk is two. One of the two at-risk subjects dies, so $P_4(\text{death}) = 0.5$. The probability of survival is $P_t(\text{surv}) = 1 - P_t(\text{death})$. The cumulative survival probability, $S(t)$, is initially 1. Multiplying $S(t)$ by $P_{t+1}(\text{surv})$ gives $S(t+1)$. For instance, $S(4) = S(3) \times P_4(\text{surv}) = 0.75 \times 0.50 = 0.375$, or 0.38 rounded to two significant digits. The KM method utilises information from censored cases prior to their censoring time. Also see Chapter 12 of Statistics at Square One.
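The hand calculation above can be reproduced with a short Python sketch. The function name `kaplan_meier` and its arguments are illustrative choices, not part of the lecture, and the sketch follows the slide's convention that censoring happens at the end of the day.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, S(time))] at each distinct death time.

    times  -- follow-up time of each subject
    events -- 1 if a death was observed at that time, 0 if censored

    A subject censored at time t is still counted as at risk for deaths
    at time t (censoring is assumed to happen at the end of the day).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    curve = []
    surv = 1.0  # S(t) is initially 1
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1 - deaths[t] / at_risk  # multiply by P_t(surv)
        curve.append((t, surv))
    return curve


# Hypothetical control group from the slide:
# deaths on days 2 and 4, censored cases on days 3 and 5.
curve = kaplan_meier([2, 3, 4, 5], [1, 0, 1, 0])
print(curve)  # [(2, 0.75), (4, 0.375)]
```

The curve only changes at death times, which is why the plotted KM estimate is a step function.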
13
KM survival curve
The Kaplan-Meier (KM) survival curve from the hypothetical example is plotted. It is a step function: horizontal when there is no death, dropping vertically when there is a death. If the sample size is large and the follow-up time is measured in small intervals, the KM survival curve will look roughly smooth. Some researchers add symbols or numbers to the figure to indicate the timing of censored observations and the number of at-risk subjects.
14
No. of expected deaths
Expected deaths in group A at time i, assuming equality in survival: $E_{Ai} = \dfrac{(\text{no. at risk in group A})_i \times (\text{deaths})_i}{(\text{total no. at risk})_i}$
Total expected deaths in group A: $E_A = \sum_i E_{Ai}$
The Kaplan-Meier method produces survival curves. To test the null hypothesis that two survival curves are identical, e.g. in a clinical trial with a control group (group A) and a treatment group (group B), the log rank test is commonly used. In the slide we use the subscript i to indicate “at time i”, when a death is observed. The number of expected deaths in group A at time i ($E_{Ai}$) is equal to the number of at-risk subjects in group A at time i divided by the total (groups A & B) number of at-risk subjects at time i, multiplied by the total number of deaths at time i. $E_{Bi}$ is calculated in the same way. Total expected deaths in groups A and B ($E_A$ & $E_B$) are then calculated by summing the expected numbers over the different times (i.e., $E_{Ai}$ & $E_{Bi}$).
15
Log rank test
A comparison of the number of expected and observed deaths. The larger the discrepancy, the less plausible the null hypothesis of equality.
The log rank test is based on a comparison of the observed and expected numbers of deaths, with the expected number calculated under the assumption of no difference in survival between groups. $E_A$ and $E_B$ are calculated as shown in the previous slide. $O_A$ and $O_B$ are the actual, observed numbers of deaths in group A and group B. The log rank test statistic is compared with the critical values of a Chi-square distribution. The larger the discrepancies between the observed and expected numbers of deaths in the groups, the more likely the log rank test statistic is to exceed the critical value of a Chi-square distribution with 1 degree of freedom (in the case of two groups), and therefore the more likely the null hypothesis of no difference is to be rejected.
16
An approximation
The log rank test statistic is often approximated by $X^2 = (O_A - E_A)^2/E_A + (O_B - E_B)^2/E_B$, where $O_A$ & $E_A$ are the observed & expected numbers of deaths in group A, etc.
This is a simple approximation of the log rank test statistic. The actual formula, as implemented by most statistical packages, is more complicated. See, e.g., Fisher LD, van Belle G. Biostatistics: A Methodology for the Health Sciences. N.Y.: Wiley, 1993: Chapter 16.
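As a minimal sketch, the expected numbers of deaths and the approximate statistic can be computed as follows. The function name `logrank_approx` and the two small data sets are made up for illustration. This is only the simple approximation described here, not the more complicated formula that statistical packages implement.

```python
def logrank_approx(times_a, events_a, times_b, events_b):
    """Approximate log rank statistic for two groups.

    Returns (O_A, E_A, O_B, E_B, X2). events: 1 = death, 0 = censored.
    Subjects censored at time t are counted as at risk at time t.
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    death_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})

    exp_a = exp_b = 0.0
    for t in death_times:
        risk_a = sum(1 for u in times_a if u >= t)
        risk_b = sum(1 for u in times_b if u >= t)
        deaths = sum(1 for u, e in zip(all_times, all_events) if u == t and e == 1)
        # Expected deaths at time t under the null hypothesis of equal survival
        exp_a += deaths * risk_a / (risk_a + risk_b)
        exp_b += deaths * risk_b / (risk_a + risk_b)

    obs_a, obs_b = sum(events_a), sum(events_b)
    x2 = (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return obs_a, exp_a, obs_b, exp_b, x2


# Made-up data: group A is the hypothetical control group used earlier,
# group B is invented for this illustration.
obs_a, exp_a, obs_b, exp_b, x2 = logrank_approx(
    [2, 3, 4, 5], [1, 0, 1, 0],
    [3, 6, 7, 8], [1, 0, 0, 1],
)
print(obs_a, round(exp_a, 3), obs_b, round(exp_b, 3), round(x2, 3))
```

A useful check is that the expected deaths in the two groups add up to the total number of observed deaths.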
17
Proportional hazard assumption
[Figure: two panels of survival curves, S(t) against time. Left panel: log rank test preferred (PH true). Right panel: Breslow test preferred (non-PH).]
The log rank test is usually taken as the default method. It is appropriate when the relative mortality does not change with time. This is known as the “proportional hazard” (PH) assumption. The figure on the left illustrates the survival functions, S(t), when the PH assumption is true. In the figure on the right, suppose the blue line represents the survival of patients who have received a drug treatment, and the red line that of patients who have received a surgical treatment. Immediately after surgery there is a chance of infection and complication, so the relative mortality is high initially and S(t) drops sharply. Gradually surgical patients may have a lower relative mortality, and the survival curve becomes less steep. This violates the PH assumption. The log rank test would tend to conclude that there is no difference. The Breslow test (Wilcoxon test) gives more emphasis to early deaths and is preferred in this situation.
Breslow NE. A generalized Kruskal-Wallis test for comparing k samples subject to unequal patterns of censorship. Biometrika 1970; 57:
18
Risk, conditional risk, hazard
Suppose you followed 100 patients for a fixed period of 3 months. The numbers of deaths in the first, second and third months are 40, 20 and 10, respectively. The blue line indicates the risk of death during the three periods, i.e. 40/100=0.4, 20/100=0.2 and 10/100=0.1, respectively. The red line indicates the risk of death in a time interval conditional on the subject having survived to the beginning of that interval (conditional risk), i.e. 40/100=0.4, 20/60=0.33 and 10/40=0.25. For a large sample size, as the measurement unit for time becomes small, e.g. one quarter of a month instead of one month, the conditional risk divided by the width of the time interval (measured in months in the present case) will approach a smooth curve (green line). This is known as the hazard function. A hazard is the conditional risk of failure in an extremely small time interval divided by the width of that interval. The concepts of hazard, hazard function and proportional hazard are very important in survival analysis.
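The two kinds of risk in this example can be checked with a few lines of Python. The function name `interval_risks` is an illustrative choice, not from the lecture.

```python
def interval_risks(n_start, deaths_per_interval):
    """Return [(risk, conditional_risk)] for successive time intervals.

    risk             -- deaths in the interval / subjects at the start of the study
    conditional_risk -- deaths in the interval / subjects alive at the start
                        of that interval
    """
    alive = n_start
    rows = []
    for deaths in deaths_per_interval:
        rows.append((deaths / n_start, deaths / alive))
        alive -= deaths  # survivors carried into the next interval
    return rows


# 100 patients followed for 3 months, with 40, 20 and 10 deaths.
rows = interval_risks(100, [40, 20, 10])
for month, (risk, cond_risk) in enumerate(rows, start=1):
    print(month, round(risk, 2), round(cond_risk, 2))
```

Dividing each conditional risk by the interval width (one month here) gives a crude, piecewise approximation of the hazard.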
19
Another look at PH
[Figure: two panels of hazard functions against time. Left panel: log rank test preferred (PH true). Right panel: Breslow test preferred (non-PH).]
There is a mathematical relation between a survival function and a hazard function (e.g. Fisher & van Belle, 1993: Chp. 16). The survival functions in slide 17 give hazard functions that look roughly like those shown in this slide. On the left panel, the two hazard functions have roughly the same pattern. A lower hazard function (blue line) indicates a survival function that is decreasing less sharply. The ratio of the hazard given by the blue line to that of the red line is approximately 0.6 at any time; hence proportional hazard is true. On the right panel, initially the red line is above the blue line, but they cross over at about time 12. The hazard ratio moves from below one (blue group better) to above one (blue group worse), violating the proportional hazard assumption.
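For reference, the standard relation between the two functions can be written out as follows, assuming a continuous survival time $T$. The symbols $T$ and $\theta$ (the constant hazard ratio) are notation introduced here, not taken from the slides.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = -\frac{d}{dt} \log S(t),
\qquad
S(t) = \exp\!\left( -\int_0^t h(u)\, du \right).

% Under proportional hazards with a constant ratio \theta:
h_{\text{blue}}(t) = \theta \, h_{\text{red}}(t)
\;\Longrightarrow\;
S_{\text{blue}}(t) = \left[ S_{\text{red}}(t) \right]^{\theta}.
```

With $\theta \approx 0.6$, as in the left panel, the blue survival curve lies above the red one at every time point, which is consistent with the blue curve decreasing less sharply.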
20
Cox regression model
Handles ≥ 1 exposure variables. Covariate effects given as Hazard Ratios. Semi-parametric: only assumes proportional hazard.
The Cox model can perform multiple regression analysis of survival time data, i.e. with more than one independent variable. A treatment effect and covariate effects are produced in terms of the log hazard ratio. In research reports this is usually exponentiated to give the hazard ratio (HR), sometimes referred to loosely as relative risk (RR).
21
Cox model in the case of a single variable
$h_i(t) = h_B(t) \exp(B X_i)$
$h_j(t) = h_B(t) \exp(B X_j)$
$h_i(t)/h_j(t) = \exp[B(X_i - X_j)]$
exp(B) is a Hazard Ratio
Let $h_i(t)$ and $h_j(t)$ be the hazard functions for subjects with values $X_i$ and $X_j$ on variable X. B is a regression coefficient to be estimated. In a clinical trial, for instance, $X_i = 1$ for a treatment group and $X_j = 0$ for a control group. $h_B(t)$ is an unspecified baseline hazard function. When the hazard ratio (HR) is estimated, the $h_B(t)$ in the numerator and denominator cancel out, i.e. $HR = h_i(t)/h_j(t) = \exp(B X_i)/\exp(B X_j) = \exp[B(X_i - X_j)]$. Since $X_i$ is coded as 1 (treatment) and $X_j$ as 0 (control), B indicates the treatment effect in terms of a log HR. The Cox model is semi-parametric in the sense that it is not concerned with the pattern of the baseline hazard, $h_B(t)$, but it assumes the same pattern of $h_B(t)$ in the numerator and denominator. The estimation is based on a method called maximum partial likelihood (Cox, 1972).
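The cancellation of the baseline hazard can be illustrated numerically. In the sketch below, the coefficient `b` and the `baseline` function are arbitrary, hypothetical choices. Nothing is estimated from data. The point is only that the ratio is the same at every time.

```python
import math


def hazard(t, x, b, baseline):
    """Hazard at time t for a subject with covariate value x: h_B(t) * exp(b * x)."""
    return baseline(t) * math.exp(b * x)


def hazard_ratio(b, x_i, x_j):
    """Hazard ratio between covariate values x_i and x_j: exp[b * (x_i - x_j)]."""
    return math.exp(b * (x_i - x_j))


def baseline(t):
    """An arbitrary (hypothetical) baseline hazard that changes with time."""
    return 0.05 + 0.01 * t


b = math.log(0.6)  # hypothetical log hazard ratio for treatment vs control

ratios = [hazard(t, 1, b, baseline) / hazard(t, 0, b, baseline) for t in (1.0, 5.0, 10.0)]
print([round(r, 3) for r in ratios], round(hazard_ratio(b, 1, 0), 3))
```

Whatever shape is chosen for `baseline`, the printed ratios are identical, which is why the Cox model can leave the baseline hazard unspecified.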
22
Test of proportional hazard assumption
Scaled Schoenfeld residuals; Grambsch-Therneau test; Test for treatment × period interaction; Example: mortality of widows
If the impact of an independent variable meets the proportional hazard assumption, the smoothed values of a quantity called the scaled Schoenfeld residuals would be roughly horizontal when plotted against survival time. Grambsch and Therneau (1994) demonstrated that a test of non-zero slope in a weighted regression of the residuals upon time can test for non-proportional hazard. Testing for PH and graphical examination of the scaled Schoenfeld residuals may reveal important information. See example 1 (marital status and mortality) for a case. Another method is to split the survival time into two or three periods. A test of interaction between treatment and period will then test the proportional hazard assumption.
Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994; 81:
23
Computer practice
A clinical trial of stage I bladder tumor: Thiotepa vs Control. Data from StatLib.
It is important to practise on some real datasets. One useful resource is StatLib. Here we analyse the TUMOR data set contributed by Terry Therneau. One reason for choosing this data set is that it is small enough for easy handling (n=86). The purpose is purely computer practice, not to examine the quality or findings of the study. The bladder tumor data file contains 8 variables (names): treatment group (group), follow-up time (futime), pre-treatment number of tumors (number), largest pre-treatment tumor size (size), and times to first, second, third, and fourth recurrences. Only time to first recurrence is analysed in this practice. We used a word processor to edit the raw file so that each row represents one subject and contains only 5 values (removing the last 3 recurrences). If the fifth value was left blank (meaning no recurrence), we replaced the blank with a dot (.), preceded by a space. All comments and descriptions in the file were also removed. The file is saved as c:\data\tumor.dat (a text/ASCII file).
24
Data structure
Two most important variables: Time to recurrence (>0); Indicator of failure/censoring (0=censored; 1=recurrence) (coding depends on software)
We use the statistical package Stata for the illustration. Readers can use other software packages. Five Stata commands are used:
    infile group futime number size rectime using c:\data\tumor.dat
(reads the data)
    gen fail=1 if rectime!=.
(creates the censoring indicator, value=1 if a recurrence was observed)
    replace fail=0 if rectime==.
(sets the censoring indicator to 0 if no recurrence was observed)
    replace rectime=futime if rectime==.
(replaces the missing recurrence time by the follow-up time, i.e. the censoring time)
    stset rectime, failure(fail)
(declares the data as survival data)
One subject with zero follow-up time was automatically excluded by Stata’s stset command. Users of other software may need to remove this subject explicitly. A censored case with zero follow-up time is not informative. A failure at time zero is not logical: one fails at least a very short while after getting started.
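For readers using other software, the same data preparation steps might look like this in Python. This is a sketch under assumptions: the rows in `RAW` are made up to show the file layout and are not the real TUMOR data, and the function name `prepare` is hypothetical.

```python
# Made-up rows in the edited file layout: group futime number size rectime
# ("." means no recurrence was observed). These are NOT the real TUMOR data.
RAW = """\
1 1 1 3 .
1 4 2 1 .
1 7 1 1 .
1 10 5 1 .
1 10 4 1 6
2 0 1 1 .
"""


def prepare(raw_text):
    """Build one record per subject with a time and a failure indicator."""
    subjects = []
    for line in raw_text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        group, futime, number, size, rectime = fields
        if rectime == ".":
            fail, time = 0, float(futime)   # censored at end of follow-up
        else:
            fail, time = 1, float(rectime)  # recurrence observed
        if time <= 0:
            continue  # zero follow-up carries no information; drop explicitly
        subjects.append({"group": int(group), "number": int(number),
                         "size": int(size), "time": time, "fail": fail})
    return subjects


subjects = prepare(RAW)
print(len(subjects), subjects[-1])
```

The explicit `time <= 0` check plays the role of the automatic exclusion that `stset` performs in Stata.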
25
KM estimates
[Figure: Kaplan-Meier estimates for the Thiotepa and Control groups.]
The Stata command
    sts graph, by(group)
produces the above Kaplan-Meier estimates. (The figure has been edited for appearance.) The Thiotepa group had longer time to recurrence. The median survival time can be read from the figure, where S(t) = 0.5. The medians for the control and thiotepa groups are 16 and 26 months, respectively. The survival curves are roughly parallel, suggesting that an assumption of proportional hazard is acceptable.
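Reading the median off a KM curve can also be done programmatically. The sketch below assumes the curve is given as a list of (time, S(time)) pairs at the death times and uses one common convention, the first time at which S(t) falls to 0.5 or below. Both the function name and the example curves are illustrative.

```python
def median_survival(curve):
    """Return the first time at which S(t) <= 0.5, or None if never reached.

    curve -- list of (time, S(time)) pairs in increasing order of time
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # survival never drops to 50% within follow-up


# Hypothetical step curves (not the bladder tumor data).
print(median_survival([(2, 0.75), (4, 0.375)]))
print(median_survival([(3, 0.9), (8, 0.7)]))
```

When the curve never reaches 0.5 during follow-up, the median cannot be estimated from the data and the function returns `None`.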
26
Log rank test
chi2(1) = 1.52; Pr>chi2 = 0.22
To test for equality of survival functions, the log rank test is used. The Stata command
    sts test group
produces the table (values rounded; table edited for appearance). Compared with what is expected under the null hypothesis of no difference, there are more failures than expected in the control group and fewer than expected in the Thiotepa group. The test statistic is 1.52, which is not significant at the conventional 5% level (P=0.22).
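The P-value quoted above can be checked without statistical software, because a Chi-square variable with 1 degree of freedom is the square of a standard normal variable. The helper name `chi2_1df_pvalue` is an illustrative choice.

```python
import math


def chi2_1df_pvalue(x2):
    """Upper-tail probability of a Chi-square distribution with 1 df.

    P(chi2_1 > x2) = P(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x2 / 2))


p = chi2_1df_pvalue(1.52)
print(round(p, 2))  # 0.22
```

The same function gives about 0.05 for the familiar critical value 3.84.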
27
Cox regression models
Using a Cox regression model with treatment group as a single variable, the Stata command
    stcox group
shows that Thiotepa is associated with a statistically non-significant reduction of hazard (HR=0.7; P=0.23). There is some baseline imbalance in pre-treatment number of tumors: the means are 1.9 and 2.3 in the control and Thiotepa groups, respectively. To adjust for this imbalance in a risk factor, the following command is used:
    stcox group number
Having adjusted for pre-treatment number of tumors, the treatment effect becomes stronger (HR=0.60; P=0.11).
28
Grambsch-Therneau test
Test of PH assumption
Grambsch-Therneau test for PH in model II: Thiotepa P=0.55; Number of tumors P=0.60
The following Stata commands test the hypothesis of proportional hazard in model II:
    stcox group number, scaledsch(sca*) schoenfeld(sch*)
    stphtest, detail
The effects of Thiotepa (P=0.55) and pre-treatment number of tumors (P=0.60) did not violate the PH assumption. The first command generates some variables with prefixes sca and sch. They contain the scaled Schoenfeld residuals and the Schoenfeld residuals. The second command is based on those variables. It is good practice to delete them after the testing:
    drop sca* sch*
29
Major References (Examples)
Example 1. Cheung YB. Marital status and mortality in British women. Int J Epidemiol 2000;29:93-99.
Example 2. Sauerbrei W, et al. Randomized 2 × 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. J Clin Oncol 2000;18:
Example 3. Cheung YB, et al. Fetal growth, early postnatal growth and motor development in Pakistani infants. Int J Epidemiol 2001;30:66-74.
30
Major References (General)
Allison PD. Survival Analysis using the SAS® System. SAS Institute, 1995.
Collett D. Modelling Survival Data in Medical Research. London: Chapman & Hall, 1994.
Fisher LD, van Belle G. Biostatistics: A Methodology for the Health Sciences. N.Y.: Wiley, 1993: Chapter 16.