Faculty of Economics and Administrative Sciences
Department of Applied Statistics
Survival Analysis of Breast Cancer Patients in Gaza Strip
1- Introduction
Survival analysis has become a popular tool in observational and experimental studies involving follow-up of study participants over time. These studies often experience late arrival and early departure of subjects into and out of the observation period.
Survival analysis techniques allow for a study to start without all experimental units enrolled and to end before all experimental units have experienced an event.
2- Terminology and Notations
Survival analysis is a collection of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs. Survival time can be defined broadly as the time to the occurrence of a given event. By time, we mean years, months, weeks, or days from the beginning of follow-up of an individual until an event occurs.
By event, we mean death, disease incidence, relapse from remission, recovery (e.g., return to work), or any designated experience of interest that may happen to an individual. Although more than one event may be considered in the same analysis, we will assume that only one event is of designated interest.
Censored Data. Most survival analyses must consider a key analytical problem called censoring. In essence, censoring occurs when we have some information about an individual's survival time, but we do not know the survival time exactly. There are generally three reasons why censoring may occur: (1) A person does not experience the event before the study ends. (2) A person is lost to follow-up during the study period. (3) A person withdraws from the study because of death (if death is not the event of interest) or some other reason.
The survivor function S(t) is fundamental to a survival analysis and gives the probability that a person survives longer than some specified time t: that is, S(t) gives the probability that the random variable T exceeds the specified time t. The hazard function h(t) gives the instantaneous potential per unit time for the event to occur.
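The two definitions above can be written compactly as follows. This is the standard textbook notation, given here as a sketch; the symbol \Delta t for a small time increment is an assumption, since the original formulas are not reproduced in this summary.

```latex
S(t) = P(T > t),
\qquad
h(t) = \lim_{\Delta t \to 0}
       \frac{P\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}
```

Under this notation S(t) is a probability that starts at 1 and never increases with t, while h(t) is a rate per unit time and can take any non-negative value.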
4- Kaplan-Meier Survival Analysis (KMSA)
Several methods have been developed for constructing survival curve estimates, the most common being the life table and Kaplan-Meier methods.
Kaplan and Meier (1958) were the first to give a simple solution to the problem of estimating the survival curve while accounting for right censoring.
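To make the idea concrete, the following is a minimal Python sketch of the product-limit (Kaplan-Meier) estimate with right censoring. The study itself used R; the function name `kaplan_meier` and the toy data are hypothetical and are not taken from the Gaza Strip data set. At each distinct event time the current estimate is multiplied by the fraction of subjects at risk who survive that time; censored subjects simply leave the risk set.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, n_at_risk, deaths, S(t))] at each distinct event time.

    times  : follow-up time of each subject
    events : 1 if the event was observed, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # subjects leaving the risk set at each time
    n_at_risk = len(times)
    surv = 1.0
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving time t.
            surv *= 1.0 - d / n_at_risk
            rows.append((t, n_at_risk, d, surv))
        # Both deaths and censored subjects are removed after time t.
        n_at_risk -= leaving[t]
    return rows

# Hypothetical toy data in days (NOT the study data).
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
km_rows = kaplan_meier(toy_times, toy_events)
for t, n, d, s in km_rows:
    print(f"t={t:>3}  at risk={n}  deaths={d}  S(t)={s:.3f}")
```

A subject censored at the same time as a death is counted as at risk for that death, which is the usual convention.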
5- The Log–Rank Test for Comparison of Two Survival Distributions
The log–rank test is a nonparametric method for comparing survival distributions and the most popular method for comparing the survival of groups.
The problem of comparing survival distributions arises often in biomedical research. For example, a clinical oncologist may be interested in comparing the ability of two or more treatments to prolong life or maintain health. Differences between groups can be illustrated by drawing graphs of the estimated survivorship functions, but that gives only a rough idea of the difference between the distributions. It does not reveal whether the differences are significant or merely chance variations. A statistical test is necessary.
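As an illustration of how such a test works, here is a minimal Python sketch of the two-sample log–rank statistic in its standard textbook form (observed minus expected events in one group, with the hypergeometric variance). It is an assumption that this matches the formula used in the thesis, which is not reproduced here; the function name `logrank_test` and the toy data are hypothetical.

```python
import math

def logrank_test(times1, events1, times2, events2):
    """Two-sample log-rank test; returns (chi_square, p_value) on 1 df."""
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)              # at risk, both groups
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group 1
        d = sum(1 for u, e, _ in data if u == t and e == 1)   # events, both groups
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    chi_square = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical toy data in days (NOT the study data).
chi, p = logrank_test([6, 7, 10, 15], [1, 1, 1, 0],
                      [12, 18, 22, 25], [1, 0, 1, 1])
print(f"chi-square = {chi:.3f}, p-value = {p:.3f}")
```

A small p-value is evidence against the null hypothesis that the two groups share the same survival distribution.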
6- Cox Proportional Hazards Model (CPHM)
We have discussed the most commonly used model in survival data analysis, the Cox (1972) proportional hazards model, and its related statistical inference. This model does not require knowledge of the underlying distribution.
The Cox proportional hazards model (CPHM) is a popular mathematical model for analyzing survival data. It can be described as a “robust” model, in the sense that the results from using the Cox model will closely approximate the results for the correct parametric model. For example, if the correct parametric model is lognormal, then the use of the Cox model typically will give results comparable to those obtained using a lognormal model. Alternatively, if the correct model is exponential, then the Cox model results will closely approximate the results from fitting an exponential model.
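For reference, the general form of the Cox model can be sketched as follows. This is the standard formulation; the symbols h_0(t) for the unspecified baseline hazard, X_1, ..., X_p for the explanatory variables, and \beta_i for their coefficients are notation assumed here, since the formulas of the thesis are not reproduced in this summary.

```latex
h(t, X) = h_0(t) \exp\left( \sum_{i=1}^{p} \beta_i X_i \right),
\qquad
\mathrm{HR}_i = e^{\beta_i}
```

Because h_0(t) is left unspecified, no distribution needs to be assumed for the survival times, and each e^{\beta_i} is read as the hazard ratio for a one-unit increase in X_i with the other variables held fixed.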
8- Case study
8-1- Introduction
Cancer is considered one of the main medical problems in developed and developing countries because of its rate of spread, the high cost of medical treatment, and its high mortality rates. In addition, it requires medical and educational programs, such as preventive programs and early detection programs, as well as social, medical, and psychological rehabilitation programs for patients.
In this thesis we have studied breast cancer incidence in the Gaza Strip and analyzed the data using different models of survival analysis. We started with the Kaplan-Meier estimation of the survivorship function (KME), then used the Log–Rank test for comparison of two survival distributions, and then applied the Cox Proportional Hazards Model (CPHM). The data were analyzed using the R program, which produced all the results below.
8-2- Cancer morbidity and reported cases
In 2005, breast cancer was the most common type of cancer among the Palestinian population (17.3%), with an incidence rate of 7.5 per 100,000 population. Lung cancer was the most common cancer among males, constituting 13.8% of all male cancers, with an incidence rate of 5.2 per 100,000 males. Breast cancer was the most common cancer among females (31.4%), with an incidence rate of 15.1 per 100,000 population.
The data for all breast cancer cases in the Gaza Strip were collected from El-shifa hospital. Missing data were obtained from the patients' records to complete the data set required for survival analysis.
8-5- Variables of the study
1- Number of patients.
2- Birth date of patients.
3- Gender.
4- Marital status.
5- Address.
6- Smoking.
7- Date of the first diagnosis (incidence).
8- Date of the end of follow-up.
9- Status (death or censoring).
10- First place for the emergence of tumor (all Histology of Primary).
11- Laterality: the breast that contains the primary tumor (1 = Right, 2 = Left).
12- Treatment 1, Surgery (1 = given, 2 = not given).
13- Treatment 2, Radiotherapy (1 = given, 2 = not given).
14- Treatment 3, Chemotherapy (1 = given, 2 = not given).
15- Treatment 4, Hormonal therapy (1 = given, 2 = not given).
16- Topography code (all C50).
8-6- Survival Analysis of the data
Nonparametric or distribution-free methods are quite easy to understand and apply. They are less efficient than parametric methods when survival times follow a theoretical distribution, and more efficient when no suitable theoretical distribution is known for the data. In addition, the survival time of the patients does not follow the normal distribution or any distribution from the exponential family.
8-6-1- Kaplan-Meier Estimation of survivorship function (KME)
A set of 103 breast cancer patients was provided by AL-Shefa hospital (cancer registry) for 2000 to 2005. These patients joined a clinical study at the beginning of the year 2000. By the end of the study, only 38 patients had died and 56 patients were censored. Their survival time is computed in days from the time of diagnosis. Table (1) below lists the survival times t in days for those cases who died by the end of the study.
The Kaplan-Meier estimate of the survivorship function (KME) is computed following formula (3.11). The computations were carried out using the R statistical program and the results are displayed in table (1). We note that the KME decreases as time t increases.
As with other estimators, the standard error (S.E.) of the Kaplan-Meier estimator, computed by formula (3.16), gives an indication of the potential error of the estimate. The confidence interval deserves more attention than the point estimate alone. A 95% confidence interval for the survivorship function has also been calculated using the R program, and the results are given in table (1).
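The standard error most commonly attached to the Kaplan-Meier estimate is Greenwood's formula, sketched below together with the plain (untransformed) 95% interval. It is an assumption that this is what the thesis formula refers to, since that formula is not reproduced here; d_j and n_j denote the number of deaths and the number at risk at the j-th event time t_j.

```latex
\widehat{\mathrm{Var}}\left[\hat{S}(t)\right]
  = \hat{S}(t)^2 \sum_{t_j \le t} \frac{d_j}{n_j \,(n_j - d_j)},
\qquad
\hat{S}(t) \pm 1.96 \, \mathrm{S.E.}\left[\hat{S}(t)\right]
```

Statistical software may instead build the interval on a transformed scale, so reported bounds need not be symmetric around the estimate.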
Table (1), Kaplan-Meier Estimation of survivorship function (KME)
Columns: NO, DAYS, STATUS, CUMULATIVE PROPORTION SURVIVING AT THE TIME (Estimate, Std. Error), LOWER 95% CI, UPPER 95% CI, HAZARD. All listed rows have STATUS = event.
For the breast cancer data in the Gaza Strip, the mean survival time is estimated as 1751 days using formula (3.19). The standard error of the mean survival time, given by formula (3.20), is reported in table (2).
Table (2), Means for Survival Time
Columns: Mean (Estimate, Std. Error), 95% Confidence Interval (Lower Bound, Upper Bound). Estimate: 1751 days.
The estimated median survival time is the 50th percentile, that is, the value of t at which the estimated survivorship function equals 0.50. The median survival time for breast cancer cases in the Gaza Strip is approximately 2140 days, as indicated in table (3) below.
Table (3), Median for Survival Time
Columns: RECORDS, N.MAX, N.START, EVENTS, MEDIAN.
Theoretically, the estimator of the survival function plotted in graph (1) is expected to appear as a step function, since it remains constant between two observed exact survival times. The most commonly used summary statistic in survival analysis is the median survival time. The median survival time (2140 days) is estimated from the survival curve. The estimated mean survival time (1751 days) equals the area under the estimated survivorship function, as described by formula (3.18).
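The two summaries can be read directly off a step curve, as the following minimal Python sketch shows. The function names `km_median` and `km_mean` and the step values are hypothetical and do not come from the study data; the mean computed this way is the area under the curve up to a chosen time limit, which is an assumption about how the area is truncated.

```python
def km_median(steps):
    """Smallest event time t with S(t) <= 0.5, or None if never reached.

    steps : list of (t, S(t)) pairs at the event times, in increasing t
    """
    for t, s in steps:
        if s <= 0.5:
            return t
    return None

def km_mean(steps, t_max):
    """Area under the step curve from 0 to t_max."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t > t_max:
            break
        area += prev_s * (t - prev_t)   # rectangle under the current step
        prev_t, prev_s = t, s
    area += prev_s * (t_max - prev_t)   # last rectangle up to t_max
    return area

# Hypothetical step curve (NOT the study data): (time in days, S(t)).
toy_steps = [(5, 0.8), (8, 0.6), (12, 0.4), (20, 0.0)]
toy_median = km_median(toy_steps)
toy_mean = km_mean(toy_steps, t_max=20)
print("median:", toy_median, "mean:", round(toy_mean, 2))
```

If the curve never drops to 0.5 within the follow-up period, the median is not estimable and the function returns None.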
Graph (1 ), Kaplan-Meier estimate of the survivorship function for the data in Table (1) and its 95% confidence intervals.
Graph (2) and table (1) of the estimated hazard function show that the death rate (hazard function) is low in the first 52 days after diagnosis. From day 52 to day 1000, the death rate increases continuously, starting from 0.02. From day 1000 to day 1890, it continues to increase, between 0.25 and 0.40. From day 1890 to day 2140, it increases rapidly, starting from 0.43. Generally speaking, the hazard rate is high after day 52 and increases rapidly until day 2140.
Graph (2) Hazard function for breast cancer patients in the Gaza Strip.
8-6-2- The Log–Rank test for Comparison of two Survival Distributions
The problem here is to compare the survival times of two groups of breast cancer patients exposed to four different treatments (Surgery, Radiotherapy, Chemotherapy, Hormonal Therapy) by comparing the survivorship functions and hazard functions of the two groups. The survival data for 103 females with breast cancer contain two groups: the first group contains the patients younger than 50 years and the second group contains the patients aged 50 years or older.
Survival times are computed for both groups in days from the time of diagnosis. Table (4) lists the survival times t in days. The KME is computed in table (4). As with other estimators, the standard error (S.E.) of the Kaplan-Meier estimator and a 95% confidence interval for the survivorship function are also given in table (4).
Table (4), Kaplan-Meier Estimation of survivorship function (KME) for two groups of breast cancer cases in the Gaza Strip.
Columns: NO, DAYS, STATUS, CUMULATIVE PROPORTION SURVIVING AT THE TIME (Estimate, Std. Error), LOWER 95% CI, UPPER 95% CI, DIFFERENCE (age group: less than 50, or greater or equal to 50), HAZARD. All listed rows have STATUS = event.
The estimated mean survival time is 1583 days for the first group and 1832 days for the second group. The estimated mean survival time for all patients is 1751 days. These results, together with the standard errors of the mean survival times, are reported in table (5) below.
Table (5), Means for Survival Time for two groups of breast cancer in the Gaza Strip
Columns: DIFFERENT, Mean (Estimate, Std. Error), 95% Confidence Interval (Lower Bound, Upper Bound). Rows: less (estimate 1583 days), more (estimate 1832 days), Overall (estimate 1751 days).
For the breast cancer data, the log–rank statistic is computed using formula (3.24) and reported in table (6). The corresponding P-value is 0.014, which indicates that the null hypothesis should be rejected. The null hypothesis being tested is that there is no overall difference between the two survival curves. We can therefore conclude that the first group and the second group have significantly different KME survival curves.
Table (6), Test of equality of survival distributions for the different levels of different by Log Rank test
OVERALL COMPARISONS. Columns: Chi-Square, df, Sig. Row: Log Rank (Mantel-Cox).
Plots of the KME curves for the first and second groups are shown in graph (3) below. Notice that the KME curve for the second group is consistently higher than the KME curve for the first group. These results indicate that the second group has better survival and a better response to treatment than the first group. Moreover, as the number of days increases, the two curves appear to get further apart, which indicates that the effect of treatment on survival is greater in the second group than in the first group.
Graph (3), Kaplan-Meier estimate of the survivorship function for the data in table (5.24) for two groups
Graph (4) of the estimated hazard function shows that the death rate for both groups is low in the first 750 days after diagnosis. From day 750 to the end of follow-up, the death rate increases continuously for both groups. The hazard rate is generally high and increasing for both groups, but it is higher for the first group (patients younger than 50 years) than for the second group (patients aged 50 years or older).
Graph (4) Hazard function for first and second group
8-6-3- Cox Proportional Hazards Model (CPHM)
The Formula for the Cox Proportional Hazards Model
We consider a problem involving four explanatory variables as predictors of survival time T, where T denotes the number of days until death. The first explanatory variable is the age group (two groups of breast cancer patients), with 34 patients in the first group, which contains the patients younger than 50 years, and 69 patients in the second group, which contains the patients aged 50 years or older.
The data set also contains three further variables:
Lateral: the breast that contains the primary tumor (1 = Right, 2 = Left).
Surgery: breast sparing surgery (1 = given, 2 = not given).
Hormonal therapy (1 = given, 2 = not given).
The outcome variable for the model is the time in days until a patient dies. We now describe the final model and its results concerning breast cancer cases in the Gaza Strip.
The method of estimation used to obtain the coefficients for the final model is maximum likelihood (ML) estimation. The p-value obtained for the coefficient of Groups Ages indicates that this variable has a significant effect. Likewise, the p-value obtained for the coefficient of Surgery indicates that this variable has a significant effect. The Z statistic reported for each coefficient, the ratio of the coefficient to its standard error, is known as a Wald statistic. All the above results can be found in table (7).
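The quantities reported for each coefficient of a Cox model are linked by simple arithmetic, as the following minimal Python sketch shows. The function name `wald_summary` and the coefficient and standard error passed to it are hypothetical values chosen for illustration; they are not the estimates obtained in this study.

```python
import math

def wald_summary(coef, se, z_crit=1.96):
    """Wald z, two-sided p-value, hazard ratio and its 95% interval."""
    z = coef / se                                  # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal p-value
    return {
        "z": z,
        "p": p,
        "hr": math.exp(coef),                      # hazard ratio
        "lower95": math.exp(coef - z_crit * se),   # interval built on the log scale
        "upper95": math.exp(coef + z_crit * se),
    }

# Hypothetical coefficient and standard error (NOT the study's estimates).
example = wald_summary(coef=-0.80, se=0.35)
for name, value in example.items():
    print(f"{name:>8}: {value:.4f}")
```

An interval for the hazard ratio that excludes 1 corresponds to a two-sided Wald p-value below 0.05.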
Table (7), Variables in the equation
Columns: COEF, HR, SE(COEF), Z, LOWER .95, UPPER .95. Rows: Groups Ages, Surgery. The table also reports a likelihood value for the fitted model.
Finally, we consider the final model for the breast cancer data. The fitted model is written in terms of the hazard function using formula (4.1).
Adjusted Survival Curves Using the (CPHM)
Typically, when computing adjusted survival curves, the value chosen for a covariate being adjusted is an average value such as an arithmetic mean or median. In fact, most computer programs for the Cox model automatically use the mean value over all subjects for each covariate being adjusted. A general formula for the adjusted survival curve for all covariates in the model is given by formula (4.19).
To obtain the adjusted survival curve, we substitute the mean values into the fitted model. The results of applying the adjusted survival curve are given in the third column of table (8).
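The usual way to write an adjusted survival curve at the covariate means is sketched below. It is an assumption that this corresponds to the formula referred to in the thesis, which is not reproduced here; \hat{S}_0(t) denotes the estimated baseline survivorship function, \hat{\beta}_i the estimated coefficients, and \bar{X}_i the mean of the i-th covariate over all subjects.

```latex
\hat{S}(t, \bar{X})
  = \left[ \hat{S}_0(t) \right]^{\exp\left( \sum_{i=1}^{p} \hat{\beta}_i \bar{X}_i \right)}
```

Since \hat{S}_0(t) lies between 0 and 1, a larger exponent lowers the adjusted curve and a smaller exponent raises it.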
Table (8), Adjusted Survival function Using the Cox PH Model
Columns: NO, TIME, SURVIVAL, S.E, LOWER 95% CI, UPPER 95% CI, Baseline Cum Hazard, Baseline survivorship function, Cumulative hazard function.
Graph (5) adjusted survival curves obtained from fitting a Cox model
9- The recommendations
1- We recommend reactivating and rehabilitating the Cancer Registry Center in Palestine in order to know the exact number of oncology cases and their different diagnostic sources, and thereby define the problem and the reasons for its spread.
2- Develop the use of the Tenth Revision of the International Classification of Diseases (ICD-10), which relates to diseases, deaths, and disabilities. Using this classification requires training for doctors, health professionals, and data entry staff.
3- We recommend establishing an advanced system for recording causes of death on death certificates.
4- Develop cooperation and full coordination between the System and Information Department and the Cancer Registry Center for monitoring and recording tumor cases in the Ministry of Health through a sophisticated electronic system.
5- It is important to develop the cancer patient data in the Cancer Registry Center. Furthermore, cooperation between the Ministry of Health and the Palestinian Central Bureau of Statistics (PCBS) should be improved to minimize the gaps between indicators that depend on Ministry of Health data and the estimated indicators of the PCBS that result from health surveys.
6- We recommend applying the Kaplan-Meier estimation of the survivorship function (KME) and estimating the mean survival time, with its confidence interval, for all cancer patients.
7- We recommend determining the relationship between the KME and time, and the relationship between the hazard ratio and time, for all cancer patients, using the statistics program R.
8- A clinical oncologist may be interested in comparing the ability of two or more treatments to prolong life or maintain health for two age groups of patients. Almost invariably, survival times of the different groups vary. Therefore, we recommend using the Log–Rank test for comparison of two survival distributions for all cancer patients.
9- We recommend using the Cox proportional hazards model (CPHM) for analyzing survival data, including the most important variables as predictors of survival time T, where T denotes days until death, for all cancer patients in Palestine.
The Messenger of Allah (peace and blessings be upon him) said: "Allah loves that when one of you does a task, he does it with excellence."