Faculty of Economics and Administrative Sciences Department of Applied Statistics Survival Analysis of Breast Cancer Patients in Gaza Strip.

1- Introduction Survival analysis has become a popular tool in observational and experimental studies involving follow-up of study participants over time. These studies often experience late arrival and early departure of subjects into and out of the observation period.

Survival analysis techniques allow for a study to start without all experimental units enrolled and to end before all experimental units have experienced an event.

2- Terminology and Notations: Survival analysis is a collection of statistical procedures for analyzing data in which the outcome variable of interest is the time until an event occurs. Survival time can be defined broadly as the time to the occurrence of a given event. By time, we mean years, months, weeks, or days from the beginning of follow-up of an individual until an event occurs.

By event, we mean death, disease incidence, relapse from remission, recovery (e.g., return to work), or any designated experience of interest that may happen to an individual. Although more than one event may be considered in the same analysis, we will assume that only one event is of designated interest.

Censored Data. Most survival analyses consider a key analytical problem called censoring. In essence, censoring occurs when we have some information about an individual's survival time, but we don't know the survival time exactly. There are generally three reasons why censoring may occur: (1) A person does not experience the event before the study ends. (2) A person is lost to follow-up during the study period. (3) A person withdraws from the study because of death (if death is not the event of interest) or some other reason.

The survivor function S(t) is fundamental to a survival analysis and gives the probability that a person survives longer than some specified time t: that is, S(t) gives the probability that the random variable T exceeds the specified time t. The hazard function h(t) gives the instantaneous potential per unit time for the event to occur.
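In symbols, the two functions described above are usually written as follows. This is the standard textbook notation, given here as a sketch; the slides' own formula numbers are not reproduced in this transcript.

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
S(t) = \exp\!\left( -\int_0^t h(u)\, du \right)
```

The last identity links the two: a larger hazard over time means a faster decline of the survivor function.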

4-Kaplan-Meier Survival Analysis (KMSA) Several methods have been developed for constructing survival curve estimates, the most common being the life table and Kaplan-Meier methods.

Kaplan and Meier (1958) were the first to solve the problem of estimating the survival curve in a simple way while accounting for right censoring.
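The product-limit idea can be sketched in a few lines of Python. This is a minimal illustration on made-up toy data, not the R code used in the study; the function name `kaplan_meier` and the toy values are assumptions.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up durations
    events: 1 = event observed, 0 = right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)          # everyone who leaves the risk set at time t
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # subjects censored at t still count as at risk at t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve

# Hypothetical toy data (not from the Gaza Strip data set)
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

The estimate only drops at observed event times; censored subjects reduce the risk set without producing a step.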

5- The Log–Rank Test for Comparison of Two Survival Distributions The log–rank test is a nonparametric method for comparing survival distributions and the most popular method for comparing the survival of groups.

The problem of comparing survival distributions arises often in biomedical research. For example, a clinical oncologist may be interested in comparing the ability of two or more treatments to prolong life or maintain health. The differences can be illustrated by drawing graphs of the estimated survivorship functions, but that gives only a rough idea of the difference between the distributions. It does not reveal whether the differences are significant or merely chance variations, so a statistical test is necessary.
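As a sketch of how such a test works, the two-group log–rank statistic compares the observed number of events in one group with the number expected under the null hypothesis at each event time. The code below is a minimal illustration on hypothetical toy data; the function name `logrank_statistic` is an assumption and this is not the routine used in the study.

```python
def logrank_statistic(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    Assumes at least one event time at which both groups are still at risk;
    otherwise the variance is zero and the statistic is undefined.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0   # observed minus expected events in group A
    variance = 0.0        # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance

# Hypothetical toy data: group A fails early, group B fails late
stat = logrank_statistic([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(stat, 3))
```

Under the null hypothesis the statistic is compared with a chi-square distribution with one degree of freedom.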

6- Cox Proportional Hazards Model (CPHM) We now discuss the most commonly used model in survival data analysis, the Cox (1972) proportional hazards model, and its related statistical inference. This model does not require knowledge of the underlying distribution.

The Cox proportional hazards model (CPHM) is a popular mathematical model for analyzing survival data. It can be described as a "robust" model, in the sense that the results from using the Cox model will closely approximate the results for the correct parametric model. For example, if the correct parametric model is lognormal, then the use of the Cox model typically will give results comparable to those obtained using a lognormal model. Alternatively, if the correct model is exponential, then the Cox model results will closely approximate the results from fitting an exponential model.
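For reference, the Cox model is usually written in the following standard form. The symbols (baseline hazard h_0, coefficients beta_i, covariates X_i, number of covariates p) are the conventional textbook notation and are assumed here, since the slides' own formula is not reproduced in this transcript.

```latex
h(t, X) = h_0(t)\, \exp\!\left( \sum_{i=1}^{p} \beta_i X_i \right),
\qquad
\mathrm{HR} = \frac{h(t, X^{*})}{h(t, X)} = \exp\!\left( \sum_{i=1}^{p} \beta_i \,(X_i^{*} - X_i) \right)
```

The baseline hazard h_0(t) is left unspecified, which is why no distributional assumption is needed, and the hazard ratio does not depend on t.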

8-Case study 8-1-Introduction Cancer is considered one of the main medical problems in both developed and developing countries because of its rate of spread, the high cost of medical treatment, and high mortality rates. In addition, it requires medical and educational programs such as prevention programs and early detection programs, as well as social, medical and psychological rehabilitation programs for patients.

In this thesis we study breast cancer incidence in the Gaza Strip and analyze the data using different models of survival analysis. We start with the Kaplan-Meier estimation of the survivorship function (KME), then use the Log–Rank test for comparison of two survival distributions, and then apply the Cox Proportional Hazards Model (CPHM). The data were analyzed using the R program, which produced all the results below.

8-2-Cancer morbidity and reported cases. In 2005, breast cancer was the most common type of cancer among the Palestinian population (17.3%), with an incidence rate of 7.5 per 100,000 population. Lung cancer was the most common cancer among males, constituting 13.8% of all male cancers, with an incidence rate of 5.2 per 100,000 males. Breast cancer was the most common cancer among females (31.4%), with an incidence rate of 15.1 per 100,000 population.

The data for all breast cancer cases in the Gaza Strip were collected from El-shifa hospital. Missing values were obtained from the patients' records to complete the data set required for survival analysis.

8-5-Variables of the study
1- Number of patients.
2- Birth date of patients.
3- Gender.
4- Marital status.
5- Address.
6- Smoking.
7- Date of the first diagnosis (incidence).
8- Date of the end of follow-up.
9- Status (death or censoring).
10- First place for the emergence of the tumor (all histology of primary).
11- Laterality: the breast that contains the primary tumor histology (1=Right, 2=Left).
12- Treatment 1, Surgery (1=given, 2=not given).
13- Treatment 2, Radiotherapy (1=given, 2=not given).
14- Treatment 3, Chemotherapy (1=given, 2=not given).
15- Treatment 4, Hormonal therapy (1=given, 2=not given).
16- Topography code (all C50).

8-6-Survival Analysis of the data Nonparametric or distribution-free methods are quite easy to understand and apply. However, they are less efficient than parametric methods when survival times follow a theoretical distribution, and more efficient when no suitable theoretical distribution is known for the data. In addition, the survival time of the patients does not follow the normal distribution or any distribution from the exponential family.

8-6-1-Kaplan-Meier Estimation of survivorship function (KME) A set of 103 breast cancer patients was provided by AL-Shefa hospital (cancer registry) for 2000 to 2005. These breast cancer patients joined a clinical study at the beginning of the year 2000. By the end of the study, 38 of them had died and 56 were censored. Their survival time is computed in days from the time of diagnosis. Table (1) below lists the survival times t in days for those cases who died by the end of the study.

The Kaplan-Meier estimate of the survivorship function (KME) is computed following formula (3.11). The computations were carried out using the R statistical program, and the results are displayed in table (1). We note that the Kaplan-Meier estimate of the survivorship function (KME) decreases as time (t) increases.

As with other estimators, the standard error (S.E.) of the Kaplan-Meier estimator of S(t), computed by formula (3.16), gives an indication of the potential error of the estimate. The confidence interval deserves more attention than the point estimate alone. A 95% confidence interval for S(t) has also been calculated using the R program, and the results are shown in table (1).
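A common choice for the standard error of the Kaplan-Meier estimator is Greenwood's formula. The sketch below assumes that this is what formula (3.16) refers to, and also assumes that no patient was censored before the first three deaths, so that the risk sets are 103, 102 and 101. Both are assumptions for illustration, not statements taken from the slides.

```python
import math

def greenwood_se(steps):
    """Kaplan-Meier estimate and Greenwood standard error.

    steps: list of (n_at_risk, n_deaths) at successive event times,
           with n_deaths < n_at_risk at every step.
    Returns a list of (S, se) pairs, one per event time.
    """
    surv, acc, out = 1.0, 0.0, []
    for n, d in steps:
        surv *= 1.0 - d / n
        acc += d / (n * (n - d))        # Greenwood's running sum
        out.append((surv, surv * math.sqrt(acc)))
    return out

# Assumed risk sets for the first three event times (52, 72 and 221 days)
rows = greenwood_se([(103, 1), (102, 1), (101, 1)])
for s, se in rows:
    print(round(s, 3), round(se, 3))
```

Under these assumptions the rounded values agree with the first three rows of table (1), which suggests, but does not prove, that the reported standard errors were obtained this way.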

Table (1), Kaplan-Meier Estimation of survivorship function (KME)
(ESTIMATE and STD. ERROR refer to the cumulative proportion surviving at the time)

NO  DAYS  STATUS  ESTIMATE  STD. ERROR  LOWER 95% CI  UPPER 95% CI  HAZARD
1   52    event   0.990     0.010       0.972         1.000         0.0098
2   72    event   0.981     0.014       0.954         1.000         0.0196
3   221   event   0.971     0.017       0.939         1.000         0.0296
4   281   event   0.961     0.019       0.925         0.999         0.0396
5   355   event   0.951     0.021       0.911         0.994         0.04976
6   447   event   0.942     0.023       0.898         0.988         0.060
7   528   event   0.932     0.025       0.885         0.982         0.0703
8   596   event   0.922     0.026       0.872         0.975         0.0809
9   609   event   0.913     0.028       0.860         0.969         0.0914
10  626   event   0.903     0.029       0.848         0.962         0.10213
11  680   event   0.893     0.030       0.836         0.955         0.1129
12  748   event   0.883     0.032       0.824         0.948         0.12387
13  754   event   0.874     0.033       0.812         0.940         0.13492
14  767   event   0.864     0.034       0.800         0.933         0.1461
15  802   event   0.854     0.035       0.789         0.925         0.1574
16  806   event   0.845     0.036       0.778         0.918         0.1688
17  810   event   0.835     0.037       0.766         0.910         0.1804
18  883   event   0.825     0.037       0.755         0.902         0.1921
19  892   event   0.816     0.038       0.744         0.894         0.2039
20  893   event   0.806     0.039       0.733         0.886         0.21589
21  929   event   0.796     0.040       0.722         0.878         0.2280
22  968   event   0.786     0.040       0.711         0.870         0.2403
23  1002  event   0.777     0.041       0.700         0.861         0.2527
24  1247  event   0.767     0.042       0.690         0.853         0.2653
25  1413  event   0.757     0.042       0.679         0.845         0.2780
26  1492  event   0.748     0.043       0.668         0.836         0.2909
27  1503  event   0.738     0.043       0.658         0.828         0.30399
28  1608  event   0.728     0.044       0.647         0.819         0.3172
29  1660  event   0.718     0.044       0.637         0.811         0.3307
30  1733  event   0.709     0.045       0.626         0.802         0.3443
31  1757  event   0.699     0.045       0.616         0.793         0.3581
32  1792  event   0.689     0.046       0.606         0.785         0.372
33  1839  event   0.679     0.046       0.594         0.775         0.3878
34  1893  event   0.667     0.047       0.581         0.765         0.4055
35  1946  event   0.653     0.048       0.566         0.754         0.4261
36  2048  event   0.622     0.055       0.523         0.739         0.4749
37  2106  event   0.574     0.068       0.455         0.725         0.55495
38  2140  event   0.492     0.096       0.336         0.721         0.7091

For the breast cancer data in the Gaza Strip, the mean survival time is estimated at 1751 days using formula (3.19), and the standard error of the mean survival time, given by formula (3.20), is estimated at 62.512, as indicated in table (2).

Table (2), Means for Survival Time
MEAN ESTIMATE  STD. ERROR  95% CI LOWER BOUND  95% CI UPPER BOUND
1751           62.512      1628.514            1873.560

The estimated median survival time is the 50th percentile, that is, the value of t at which the estimated survivorship function reaches 0.5. The median survival time for breast cancer cases in the Gaza Strip is approximately 2140 days, as indicated in table (3) below.

Table (3), Median for Survival Time
RECORDS  N.MAX  N.START  EVENTS  MEDIAN
103                      38      2140

Theoretically, the estimator of the survival function plotted in graph (1) is expected to appear as a step function, since it remains constant between two observed exact survival times. The most commonly used summary statistic in survival analysis is the median survival time. The median survival time (2140 days) is estimated from the survival curve. The estimated mean survival time (1751 days) can be seen to equal the area under the estimated survivorship function, as described by formula (3.18).
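The two summary statistics can be read off a step-function survival curve as sketched below: the mean as the area under the curve up to a chosen time limit, and the median as the first time at which the curve is at or below 0.5. The function names, the time limit `tau`, and the toy curve are assumptions for illustration; the last call uses the final three rows of table (1).

```python
def restricted_mean(curve, tau):
    """Area under a step survival curve from 0 to tau.

    curve: list of (time, S(time)) sorted by time; S is 1.0 before the first time.
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t > tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)     # last rectangle up to tau
    return area

def median_time(curve):
    """First time at which the estimated survival is at or below 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None                          # the curve never reaches 0.5

# Hypothetical toy curve
toy = [(2, 0.8), (4, 0.6), (6, 0.4)]
print(restricted_mean(toy, tau=8), median_time(toy))

# Last three rows of table (1): the curve first drops below 0.5 at day 2140
print(median_time([(2048, 0.622), (2106, 0.574), (2140, 0.492)]))
```

If the curve never reaches 0.5 within the follow-up period, the median is not estimable, which is why `median_time` can return `None`.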

Graph (1 ), Kaplan-Meier estimate of the survivorship function for the data in Table (1) and its 95% confidence intervals.

Graph (2) and table (1) of the estimated hazard function show that the death rate (hazard function) is low in the first 52 days after diagnosis. From day 52 to day 1000, the death rate increases continuously from 0.02 to 0.24. From day 1000 to day 1890, it continues to increase, from 0.25 to 0.40. After day 1890 and until day 2140, the death rate increases rapidly from 0.43 to 0.71. Overall, the hazard rate rises after day 52 and keeps increasing until day 2140, from 0.0196 to 0.7091.

Graph (2) Hazard function for breast cancer patients in the Gaza Strip.

8-6-2-The Log–Rank test for Comparison of two Survival Distributions The problem here is to compare the survival times of two groups of breast cancer patients exposed to four different treatments (Surgery, Radiotherapy, Chemotherapy, Hormonal Therapy) by comparing the survivorship function and hazard function of the two groups. The survival data for 103 females with breast cancer are divided into two groups: the first group contains patients younger than 50 years, and the second group contains patients aged 50 years or older.

Survival times are computed in days from the time of diagnosis for both groups. Table (4) lists the survival times t in days, together with the Kaplan-Meier estimate of the survivorship function (KME). As before, the standard error (S.E.) of the Kaplan-Meier estimator of S(t) and a 95% confidence interval for S(t) are also given in table (4).

Table (4), Kaplan-Meier Estimation of survivorship function (KME) for two groups of breast cancer cases in the Gaza Strip.
(ESTIMATE and STD. ERROR refer to the cumulative proportion surviving at the time)

NO  DAYS      STATUS  ESTIMATE  STD. ERROR  LOWER 95% CI  UPPER 95% CI  DIFFERENCE           HAZARD
1   447       event   0.971     0.029       0.91416       1.02784       less than 50         0.0299
2   528       event   0.941     0.04        0.8626        1.0194        less than 50         0.061
3   609       event   0.912     0.049       0.81596       1.00804       less than 50         0.0921
4   748       event   0.882     0.055       0.7742        0.9898        less than 50         0.1252
5   754       event   0.853     0.061       0.73344       0.97256       less than 50         0.1592
6   767       event   0.824     0.065       0.6966        0.9514        less than 50         0.1942
7   806       event   0.794     0.069       0.65876       0.92924       less than 50         0.231
8   883       event   0.765     0.073       0.62192       0.90808       less than 50         0.2683
9   892       event   0.735     0.076       0.58604       0.88396       less than 50         0.310
10  893       event   0.706     0.078       0.55312       0.85888       less than 50         0.348
11  968       event   0.676     0.08        0.5192        0.8328        less than 50         0.391
12  1.00E+03  event   0.647     0.082       0.48628       0.80772       less than 50         0.435
13  1.25E+03  event   0.618     0.083       0.45532       0.78068       less than 50         0.482
14  1.61E+03  event   0.588     0.084       0.42336       0.75264       less than 50         0.531
15  1.66E+03  event   0.559     0.085       0.3924        0.7256        less than 50         0.582
16  1.76E+03  event   0.529     0.086       0.36044       0.69756       less than 50         0.636
17  1.79E+03  event   0.5       0.086       0.33144       0.66856       less than 50         0.6932
18  2.11E+03  event   0.417     0.104       0.21316       0.62084       less than 50         0.8755
19  2.14E+03  event   0.312     0.119       0.07876       0.54524       less than 50         1.1632
20  52        event   0.986     0.014       0.95856       1.01344       greater or equal 50  0.0146
21  72        event   0.971     0.02        0.9318        1.0102        greater or equal 50  0.02946
22  221       event   0.957     0.025       0.908         1.006         greater or equal 50  0.04445
23  281       event   0.942     0.028       0.88712       0.99688       greater or equal 50  0.05972
24  355       event   0.928     0.031       0.86724       0.98876       greater or equal 50  0.075
25  596       event   0.913     0.034       0.84636       0.97964       greater or equal 50  0.09097
26  626       event   0.899     0.036       0.82844       0.96956       greater or equal 50  0.10697
27  680       event   0.884     0.039       0.80756       0.96044       greater or equal 50  0.1232
28  802       event   0.87      0.041       0.78964       0.95036       greater or equal 50  0.13976
29  810       event   0.855     0.042       0.77268       0.93732       greater or equal 50  0.15657
30  929       event   0.841     0.044       0.75476       0.92724       greater or equal 50  0.1737
31  1.41E+03  event   0.826     0.046       0.73584       0.91616       greater or equal 50  0.19106
32  1.49E+03  event   0.812     0.047       0.71988       0.90412       greater or equal 50  0.2088
33  1.50E+03  event   0.797     0.048       0.70292       0.89108       greater or equal 50  0.2268
34  1.73E+03  event   0.783     0.05        0.685         0.881         greater or equal 50  0.2451
35  1.84E+03  event   0.766     0.051       0.66604       0.86596       greater or equal 50  0.2662
36  1.89E+03  event   0.748     0.053       0.64412       0.85188       greater or equal 50  0.2897
37  1.95E+03  event   0.728     0.055       0.6202        0.8358        greater or equal 50  0.31788
38  2.05E+03  event   0.667     0.077       0.51608       0.81792       greater or equal 50  0.4049

The estimated mean survival time for the first group is 1583 days, with a standard error of 109.04. The estimated mean survival time for the second group is 1832 days, with a standard error of 74.23. For all patients, the estimated mean survival time is 1751 days, with a standard error of 62.512. These results are shown in table (5) below.

Table (5), Means for Survival Time for two groups of breast cancer in the Gaza Strip

DIFFERENT  MEAN ESTIMATE  STD. ERROR  95% CI LOWER BOUND  95% CI UPPER BOUND
less50     1583           109.042     1369.375            1796.818
more50     1832           74.233      1686.644            1977.638
Overall    1751           62.512      1628.514            1873.560

For the breast cancer data, the log–rank statistic computed using formula (3.24) is 6.004, as indicated in table (6), and the corresponding P-value is 0.014. The null hypothesis being tested is that there is no overall difference between the two survival curves, and this P-value indicates that the null hypothesis should be rejected. We can therefore conclude that the first group and the second group have significantly different (KME) survival curves.

Table (6), Test of equality of survival distributions for the different levels of DIFFERENT by Log Rank test
OVERALL COMPARISONS
                       CHI-SQUARE  DF  SIG.
Log Rank (Mantel-Cox)  6.004       1   0.014
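The reported significance level can be checked directly from the chi-square statistic. For one degree of freedom the upper tail probability has a closed form in terms of the complementary error function, so only the standard library is needed. The function name `chi2_df1_pvalue` is an assumption for this sketch.

```python
import math

def chi2_df1_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(statistic / 2.0))

p_value = chi2_df1_pvalue(6.004)   # log-rank statistic from table (6)
print(round(p_value, 3))
```

The result rounds to the value reported in the Sig. column of table (6).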

Plots of the (KME) curves for the first group together with the second group are shown in graph (3) below. Notice that the (KME) curve for the second group is consistently higher than the (KME) curve for the first group. These results indicate that the second group has better survival and a better response to treatment than the first group. Moreover, as the number of days increases, the two curves appear to get further apart, which indicates that the effect of treatment on the second group is greater than the effect of treatment on the first group.

Graph (3), Kaplan-Meier estimate of the survivorship function for the data in table (5.24) for two groups

Graph (4) of the estimated hazard function shows that the death rate for both groups is low in the first 750 days after diagnosis. From day 750 to the end of follow-up, the death rate increases continuously for both groups. For the first group (patients younger than 50 years) the death rate ranges over (0.13 - 0.70), while for the second group (patients aged 50 years or older) it ranges over (0.14 - 0.40). The hazard rate is generally high and increases continuously in both groups, but it is higher in the first group than in the second group.

Graph (4) Hazard function for first and second group

8-6-3-Cox Proportional Hazards Model (CPHM) 8-6-3-1-The Formula for the Cox Proportional Hazards Model We are thus considering a problem involving four explanatory variables as predictors of survival time T, where T denotes days until death. The first explanatory variable is the age group (two groups of breast cancer patients), with 34 patients in the first group (ages less than 50 years) and 69 patients in the second group (ages greater than or equal to 50 years).

The data set also contains three other variables:
Lateral: the breast that contains the primary tumor histology (1=Right, 2=Left).
Surgery: breast sparing surgery (1=given, 2=not given).
Hormonal therapy (1=given, 2=not given).

The outcome variable for the model is the time in days until a patient dies. We now describe the final model and its results concerning breast cancer cases in the Gaza Strip.

The method of estimation used to obtain the coefficients for the final model is maximum likelihood (ML) estimation. The p-value of 0.0187 obtained for the coefficient of Groups Ages indicates that this variable has a significant effect. Likewise, the p-value of 0.0325 obtained for the coefficient of Surgery indicates that this variable has a significant effect. Each p-value is based on the Z statistic (the coefficient divided by its standard error), which is known as a Wald statistic. All the above results can be found in table (7) below.

Table (7), Variables in the equation

VARIABLE     COEF     HR      SE(COEF)  Z       P       LOWER .95  UPPER .95
Groups Ages  -0.7695  0.4632  0.3273    -2.351  0.0181  0.2439     0.8798
Surgery      -1.037   0.3545  0.4849    -2.139  0.0325  0.1371     0.917

Log likelihood = -157.9798
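The columns of table (7) are linked by simple relationships: the hazard ratio is the exponential of the coefficient, the Wald Z is the coefficient divided by its standard error, and the 95% limits are the exponentials of the coefficient plus or minus 1.96 standard errors. The sketch below recomputes them from the reported coefficients; the function name `wald_summary` is an assumption, and small differences in the last digit are expected because the inputs are already rounded.

```python
import math

def wald_summary(coef, se, z_crit=1.96):
    """Hazard ratio, Wald Z, two-sided p-value and 95% limits for one coefficient."""
    z = coef / se
    return {
        "hr": math.exp(coef),
        "z": z,
        "p": math.erfc(abs(z) / math.sqrt(2.0)),   # two-sided normal p-value
        "lower": math.exp(coef - z_crit * se),
        "upper": math.exp(coef + z_crit * se),
    }

# Coefficients and standard errors as reported in table (7)
groups_ages = wald_summary(-0.7695, 0.3273)
surgery = wald_summary(-1.037, 0.4849)
for name, row in (("Groups Ages", groups_ages), ("Surgery", surgery)):
    print(name, {k: round(v, 4) for k, v in row.items()})
```

Both hazard ratios are below 1 and both intervals exclude 1, which matches the conclusion that the two variables have a significant effect.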

Finally, we consider the final model for the breast cancer data. The fitted model, written in terms of the hazard function using formula (4.1), with the coefficients reported in table (7), is given by:

$$\hat{h}(t, X) = \hat{h}_0(t)\, \exp\left( -0.7695\,(\text{Groups Ages}) - 1.037\,(\text{Surgery}) \right)$$

8-6-3-2-Adjusted Survival Curves Using the (CPHM) Typically, when computing adjusted survival curves, the value chosen for a covariate being adjusted is an average value such as the arithmetic mean or the median. In fact, most computer programs for the Cox model automatically use the mean value over all subjects for each covariate being adjusted. A general formula for the adjusted survival curve for all covariates in the model, following formula (4.19), is given by:

$$\hat{S}(t, \bar{X}) = \left[ \hat{S}_0(t) \right]^{\exp\left( \sum_i \hat{\beta}_i \bar{X}_i \right)}$$

To obtain the adjusted survival curve, we substitute the mean values of the covariates into the fitted model. The results of applying the adjusted survival curve are given in the third column of table (8).
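The substitution step can be sketched as follows: the adjusted survival at time t is the baseline survival raised to the power exp(linear predictor), or equivalently exp(-H0(t) * exp(linear predictor)), where H0 is the baseline cumulative hazard. The function name `adjusted_survival` is an assumption, and because the covariate means used in the study are not given in this transcript, the example only evaluates the baseline (all covariates set to zero) rather than guessing them.

```python
import math

def adjusted_survival(baseline_cum_hazard, coefs, covariate_values):
    """Cox-model survival at one time point for the given covariate values.

    S(t | x) = exp(-H0(t) * exp(sum(beta_i * x_i)))
    """
    linear_predictor = sum(b * x for b, x in zip(coefs, covariate_values))
    return math.exp(-baseline_cum_hazard * math.exp(linear_predictor))

coefs = [-0.7695, -1.037]            # Groups Ages, Surgery (table 7)

# Baseline survival at day 52, using the baseline cumulative hazard 0.228 (table 8)
baseline_s = adjusted_survival(0.228, coefs, [0, 0])
print(round(baseline_s, 3))

# Survival is exp(-cumulative hazard): last row of table (8), cumulative hazard 0.655
print(round(math.exp(-0.655), 3))
```

Because both coefficients are negative, any positive covariate value shrinks the cumulative hazard and raises the curve above the baseline, which is why the adjusted survival column in table (8) is far higher than the baseline survivorship column.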

Table (8), Adjusted Survival function Using the Cox PH Model

NO  TIME  SURVIVAL  S.E    LOWER 95% CI  UPPER 95% CI  BASELINE CUM HAZARD  BASELINE SURVIVORSHIP FUNCTION  CUMULATIVE HAZARD FUNCTION
1   52    0.991     0.009  0.975         1             0.228                0.7963                          0.009
2   72    0.983     0.012  0.96          1             0.457                0.6332                          0.017
3   221   0.974     0.015  0.947         1             0.688                0.50278                         0.026
4   281   0.966     0.017  0.934         1             0.921                0.39813                         0.035
5   355   0.957     0.019  0.922         0.995         1.159                0.31392                         0.044
6   447   0.949     0.021  0.91          0.991         1.4                  0.24658                         0.053
7   528   0.94      0.022  0.898         0.986         1.648                0.19239                         0.062
8   596   0.931     0.024  0.886         0.98          1.902                0.14921                         0.072
9   609   0.922     0.026  0.874         0.974         2.159                0.1154                          0.081
10  626   0.913     0.027  0.863         0.968         2.419                0.089                           0.091
11  680   0.904     0.028  0.851         0.962         2.681                0.06851                         0.101
12  748   0.895     0.03   0.84          0.956         2.945                0.05259                         0.111
13  754   0.886     0.031  0.829         0.949         3.214                0.0402                          0.121
14  767   0.877     0.032  0.818         0.943         3.487                0.03059                         0.131
15  802   0.868     0.033  0.807         0.936         3.763                0.02321                         0.142
16  806   0.859     0.034  0.796         0.929         4.043                0.01755                         0.152
17  810   0.85      0.035  0.785         0.922         4.325                0.01323                         0.163
18  883   0.84      0.036  0.774         0.915         4.612                0.00993                         0.174
19  892   0.831     0.037  0.763         0.908         4.907                0.00739                         0.185
20  893   0.822     0.038  0.752         0.901         5.213                0.00544                         0.196
21  929   0.812     0.039  0.742         0.893         5.523                0.00399                         0.208
22  968   0.803     0.039  0.731         0.886         5.837                0.00292                         0.22
23  1002  0.793     0.04   0.72          0.878         6.157                0.00212                         0.232
24  1247  0.783     0.041  0.709         0.87          6.483                0.00153                         0.244
25  1413  0.774     0.042  0.698         0.862         6.813                0.0011                          0.257
26  1492  0.764     0.042  0.687         0.854         7.146                0.00079                         0.269
27  1503  0.754     0.043  0.677         0.846         7.486                0.00056                         0.282
28  1608  0.744     0.044  0.666         0.838         7.833                0.0004                          0.295
29  1660  0.735     0.044  0.655         0.83          8.187                0.00028                         0.309
30  1733  0.725     0.045  0.644         0.822         8.547                0.00019                         0.322
31  1757  0.715     0.046  0.634         0.813         8.912                0.00013                         0.336
32  1792  0.705     0.046  0.623         0.805         9.285                9.3E-05                         0.35
33  1839  0.694     0.047  0.611         0.795         9.699                6.1E-05                         0.365
34  1893  0.681     0.048  0.597         0.785         10.2                 3.7E-05                         0.384
35  1946  0.667     0.049  0.581         0.774         10.76                2.1E-05                         0.406
36  2048  0.639     0.055  0.546         0.76          11.9                 6.8E-06                         0.448
37  2106  0.593     0.066  0.487         0.748         13.86                9.5E-07                         0.522
38  2140  0.519     0.087  0.393         0.741         17.39                2.8E-08                         0.655

Graph (5) adjusted survival curves obtained from fitting a Cox model

9-The recommendations
1- We recommend reactivating and rehabilitating the Cancer Registry Center in Palestine in order to know the exact number of oncology cases and their different diagnostic sources, and so to define the problem and the reasons for its spread.
2- Develop the use of the International Classification of Diseases, Tenth Revision (ICD-10), which covers diseases, deaths and disabilities. Using this classification requires training for doctors, health professionals and data entry staff.
3- We recommend establishing an advanced system for recording causes of death on death certificates.
4- Develop cooperation and full coordination between the System and Information Department and the Cancer Registry Center for monitoring and recording tumor cases in the Ministry of Health through a sophisticated electronic system.
5- It is important to develop the cancer patient data in the Cancer Registry Center. Furthermore, cooperation between the Ministry of Health and the Palestinian Central Bureau of Statistics (PCBS) should be improved to minimize the gaps between indicators that depend on Ministry of Health data and the estimated indicators of the PCBS that come from health surveys.
6- We recommend applying the Kaplan-Meier estimation of the survivorship function (KME) and estimating the mean survival time, with confidence intervals, for all cancer patients.
7- We recommend determining the relationship between (KME) and time, and the relationship between hazard ratio and time, for all cancer patients, using the statistics program R.
8- A clinical oncologist may be interested in comparing the ability of two or more treatments to prolong life or maintain health for two age groups of patients. Almost invariably, the survival times of the different groups vary. Therefore, we recommend using the Log–Rank test for comparison of two survival distributions for all cancer patients.
9- We recommend using the Cox proportional hazards model (CPHM) for analyzing survival data, including the most important variables as predictors of survival time T, where T denotes days until death, for all cancer patients in Palestine.

The Messenger of Allah, peace and blessings be upon him, said: "Allah loves that when one of you does a job, he does it to perfection."
