Faculty of Economics and Administrative Sciences Department of Applied Statistics Survival Analysis of Breast Cancer Patients in Gaza Strip.


1- Introduction Survival analysis has become a popular tool in observational and experimental studies that follow study participants over time. In these studies, subjects often enter the observation period late or leave it early.

Survival analysis techniques allow for a study to start without all experimental units enrolled and to end before all experimental units have experienced an event.

2- Terminology and Notations: Survival analysis is a collection of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs. Survival time can be defined broadly as the time to the occurrence of a given event. By time, we mean years, months, weeks, or days from the beginning of follow-up of an individual until an event occurs.

By event, we mean death, disease incidence, relapse from remission, recovery (e.g., return to work) or any designated experience of interest that may happen to an individual. Although more than one event may be considered in the same analysis, we will assume that only one event is of designated interest.

Censored Data. Most survival analyses must deal with a key analytical problem called censoring. In essence, censoring occurs when we have some information about an individual's survival time, but we don't know the survival time exactly. There are generally three reasons why censoring may occur: (1) A person does not experience the event before the study ends. (2) A person is lost to follow-up during the study period. (3) A person withdraws from the study because of death (if death is not the event of interest) or some other reason.

The survivor function S(t) is fundamental to a survival analysis and gives the probability that a person survives longer than some specified time t: that is, S(t) gives the probability that the random variable T exceeds the specified time t. The hazard function h(t) gives the instantaneous potential per unit time for the event to occur.
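In symbols, the two definitions above are usually written as follows. This is the standard textbook notation, given here as an aid; it is not copied from the slides, and the last line (the link between S(t) and h(t)) is a well-known identity that the transcript does not state.

```latex
S(t) = P(T > t)

h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

S(t) = \exp\left( - \int_0^t h(u)\, du \right)
```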

4-Kaplan-Meier Survival Analysis (KMSA) Several methods have been developed for constructing survival curve estimates, the most common being the life-table and Kaplan-Meier methods.

Kaplan and Meier (1958) were the first to solve the problem of estimating the survival curve in a simple way while accounting for right censoring.
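The product-limit idea behind the Kaplan-Meier estimate can be sketched in a few lines. The study itself used R; the Python below is only an illustrative sketch, the function name `kaplan_meier` is hypothetical, and the seven observations are made up, not taken from the Gaza Strip data. Subjects censored at an event time are assumed to be still at risk at that time, which is the usual convention.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return (time, n_at_risk, deaths, survival) for each distinct event time.

    times  : follow-up time of each subject
    events : 1 if the event was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # every subject leaves the risk set at its own time
    n_at_risk = len(times)
    survival = 1.0
    rows = []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / n_at_risk
            rows.append((t, n_at_risk, d, survival))
        n_at_risk -= leaving[t]
    return rows

# Made-up example: 7 subjects, 4 events, 3 censored observations.
example_rows = kaplan_meier([2, 3, 3, 5, 8, 8, 9], [1, 1, 0, 1, 0, 1, 0])
for row in example_rows:
    print(row)
```

The estimate only changes at event times; censored subjects affect it indirectly by shrinking the risk set.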

5- The Log-Rank Test for Comparison of Two Survival Distributions The log-rank test is a nonparametric method for comparing survival distributions and the most popular test for comparing the survival of groups.

The problem of comparing survival distributions arises often in biomedical research. For example, a clinical oncologist may be interested in comparing the ability of two or more treatments to prolong life or maintain health. The differences can be illustrated by drawing graphs of the estimated survivorship functions, but that gives only a rough idea of the difference between the distributions. It does not reveal whether the differences are significant or merely chance variations, so a statistical test is necessary.
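A two-group log-rank test compares, at each event time, the deaths observed in one group with the number expected if both groups shared the same survival distribution. The sketch below is a minimal Python illustration under that standard formulation; the function name `log_rank` and the example data are hypothetical, and the transcript's own formula (3.24) is not reproduced here. The p-value uses the chi-square distribution with one degree of freedom.

```python
import math

def log_rank(times, events, groups):
    """Two-group log-rank test. groups holds 0 or 1. Returns (chi2, p_value)."""
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                    # at risk, both groups
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)        # at risk, group 1
        d = sum(1 for u, e, _ in data if u == t and e)              # deaths at t
        d1 = sum(1 for u, e, g in data if u == t and e and g == 1)  # deaths at t, group 1
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 death count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Made-up example: group 0 fails early, group 1 fails late, no censoring.
chi2, p = log_rank([1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 1, 1])
print(round(chi2, 4), round(p, 4))
```

With so few subjects the test has little power; the example only shows the mechanics. The sketch does not guard against a zero variance, which occurs when only one group is ever at risk at the event times.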

6- Cox Proportional Hazards Model (CPHM) We now discuss the most commonly used model in survival data analysis, the Cox (1972) proportional hazards model, and its related statistical inference. This model does not require knowledge of the underlying distribution.

The Cox proportional hazards model (CPHM) is a popular mathematical model for analyzing survival data. It is a "robust" model, in the sense that the results from using the Cox model will closely approximate the results for the correct parametric model. For example, if the correct parametric model is lognormal, then the Cox model typically will give results comparable to those obtained using a lognormal model. Alternatively, if the correct model is exponential, then the Cox model results will closely approximate the results from fitting an exponential model.
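For reference, the Cox model is usually written as below. This is the standard textbook form with generic symbols (p covariates X_1, ..., X_p and coefficients beta_1, ..., beta_p); it is given as an aid and is not taken from the slides. The baseline hazard h_0(t) is left unspecified, which is why no underlying distribution has to be assumed.

```latex
h(t, X) = h_0(t) \exp\left( \sum_{i=1}^{p} \beta_i X_i \right)

\mathrm{HR}_i = \exp(\beta_i) \quad \text{(hazard ratio for a one-unit increase in } X_i \text{, other covariates held fixed)}
```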

8-Case study 8-1-Introduction Cancer is considered one of the main medical problems in developed and developing countries because of its rate of spread, the high cost of medical treatment and its high mortality rates. In addition, it requires medical and educational programs such as prevention programs, early detection programs, and social, medical and psychological rehabilitation programs for patients.

In this thesis we studied breast cancer incidence in the Gaza Strip and analyzed the data using different models of survival analysis. We started with the Kaplan-Meier estimation of the survivorship function (KME), then used the Log-Rank test for comparison of two survival distributions, and then applied the Cox Proportional Hazards Model (CPHM). The data were analyzed using the R program, which produced all the results below.

8-2-Cancer morbidity and reported cases. In 2005, breast cancer was the most common cancer among the Palestinian population (17.3%), with an incidence rate of 7.5 per 100,000 population. Lung cancer was the most common cancer among males, accounting for 13.8% of all male cancers, with an incidence rate of 5.2 per 100,000 males. Breast cancer was the most common cancer among females (31.4%), with an incidence rate of 15.1 per 100,000 population.

The data for all breast cancer cases in the Gaza Strip were collected from Al-Shifa Hospital. Missing data were obtained from the patients' records to complete the data set required for survival analysis.

8-5-Variables of the study
1- Number of patients.
2- Birth date of patients.
3- Gender.
4- Marital status.
5- Address.
6- Smoking.
7- Date of the first diagnosis (incidence).
8- Date of the end of follow-up.
9- Status (death or censoring).
10- First place for the emergence of the tumor (all histology of primary).
11- Laterality: the breast that contains the primary tumor (1 = Right, 2 = Left).
12- Treatment 1, surgery (1 = given, 2 = not given).
13- Treatment 2, radiotherapy (1 = given, 2 = not given).
14- Treatment 3, chemotherapy (1 = given, 2 = not given).
15- Treatment 4, hormonal therapy (1 = given, 2 = not given).
16- Topography code (all C50).
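Variables 7 to 9 are the ones a survival analysis works from: the two dates give the follow-up time and the status gives the event indicator. A minimal Python sketch of that conversion is shown below; the function name `to_survival_record` and the two patients are hypothetical and are not taken from the study data.

```python
from datetime import date

def to_survival_record(diagnosis, end_of_follow_up, died):
    """Return (days, event): follow-up length in days, and 1 for death or 0 for censoring."""
    days = (end_of_follow_up - diagnosis).days
    if days < 0:
        raise ValueError("end of follow-up precedes diagnosis")
    return days, 1 if died else 0

# Made-up patients: one death after a year, one still alive at the end of follow-up.
records = [
    to_survival_record(date(2000, 3, 1), date(2001, 3, 1), died=True),
    to_survival_record(date(2000, 1, 1), date(2005, 1, 1), died=False),
]
print(records)
```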

8-6-Survival Analysis of the data Nonparametric or distribution-free methods are quite easy to understand and apply. However, they are less efficient than parametric methods when survival times follow a theoretical distribution, and more efficient when no suitable theoretical distribution is known for the data. In addition, the survival times of the patients do not follow the normal distribution or any distribution from the exponential family.

8-6-1-Kaplan-Meier Estimation of survivorship function (KME) A set of 103 breast cancer patients was provided by Al-Shifa Hospital (cancer registry) for 2000 to 2005. These patients joined a clinical study at the beginning of 2000. By the end of the study, 38 patients had died and 56 patients were censored. Survival time is computed in days from the time of diagnosis. Table (1) below lists the survival times t in days for those cases who died by the end of the study.

The Kaplan-Meier estimate of the survivorship function (KME) is computed following formula (3.11). The computations were carried out using the R statistical program and the results are displayed in table (1). We note that the KME has an inverse relationship with time (t).

As with other estimators, the standard error (S.E.) of the Kaplan-Meier estimator of S(t), given by formula (3.16), indicates the potential error of the estimate. The confidence interval deserves more attention than the point estimate alone. A 95% confidence interval for S(t) was also calculated using the R program, and the results are shown in table (1).
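A common way to obtain the standard error of a Kaplan-Meier estimate is Greenwood's formula, and a simple 95% interval is the estimate plus or minus 1.96 standard errors. The sketch below assumes that approach; the transcript does not preserve formula (3.16), so this may differ from what the author used (R's own default interval, for instance, is computed on the log scale). The function name `greenwood_ci` and the input rows are hypothetical.

```python
import math

def greenwood_ci(rows, z=1.96):
    """rows: (time, n_at_risk, deaths, survival) per event time, in time order.

    Returns (time, survival, std_error, lower, upper) per event time.
    """
    out = []
    acc = 0.0  # running Greenwood sum of d / (n * (n - d))
    for t, n, d, s in rows:
        if n == d:
            # Survival drops to 0 and the Greenwood term is undefined.
            out.append((t, s, float("nan"), float("nan"), float("nan")))
            continue
        acc += d / (n * (n - d))
        se = s * math.sqrt(acc)
        # Plain (linear) interval, clipped to the valid range [0, 1].
        out.append((t, s, se, max(0.0, s - z * se), min(1.0, s + z * se)))
    return out

# Made-up Kaplan-Meier rows for 7 subjects (not the study data).
km_rows = [(2, 7, 1, 6 / 7), (3, 6, 1, 5 / 7), (5, 4, 1, 15 / 28), (8, 3, 1, 5 / 14)]
intervals = greenwood_ci(km_rows)
for t, s, se, lo, hi in intervals:
    print(t, round(s, 3), round(se, 3), round(lo, 3), round(hi, 3))
```

With small risk sets the intervals are wide, which is one reason the interval is more informative than the point estimate.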

Table (1), Kaplan-Meier Estimation of survivorship function (KME). Columns: NO, DAYS, STATUS, CUMULATIVE PROPORTION SURVIVING AT THE TIME (Estimate, Std. Error), LOWER 95% CI, UPPER 95% CI, HAZARD. Every listed row has STATUS = event.

For the breast cancer data in the Gaza Strip, the mean survival time is estimated at 1751 days using formula (3.19). The standard error of the mean survival time, given by formula (3.20), is reported in table (2). Table (2), Means for Survival Time. Columns: Mean (Estimate, Std. Error), 95% Confidence Interval (Lower Bound, Upper Bound).

The estimated median survival time is the 50th percentile, that is, the value of t at which the estimated survivorship function equals 0.5. The median survival time for breast cancer cases in the Gaza Strip is approximately 2140 days, as indicated in table (3) below. Table (3), Median for Survival Time. Columns: RECORDS, N. MAX, N. START, EVENTS, MEDIAN.

Theoretically, the estimator of the survival function plotted in graph (1) is expected to appear as a step function, since it remains constant between two observed exact survival times. The most commonly used summary statistic in survival analysis is the median survival time. The median survival time (2140 days) is estimated from the survival curve. The estimated mean survival time (1751 days) equals the area under the estimated survivorship function, as described by formula (3.18).
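Both summaries can be read off the step function directly: the mean is the area under the steps and the median is the first time at which the estimate falls to 0.5 or below. The Python sketch below illustrates this on made-up steps; the function name `km_mean_and_median` is hypothetical, and restricting the mean to the largest observed time `t_max` is an assumption (it is a common convention when the last observation is censored).

```python
def km_mean_and_median(steps, t_max):
    """steps: (time, survival) at each event time, in time order.
    t_max: largest observed time; the mean is restricted to [0, t_max].
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in steps:
        area += prev_s * (t - prev_t)   # rectangle under the current step
        prev_t, prev_s = t, s
    area += prev_s * (t_max - prev_t)   # last step, up to t_max
    # Median: first event time where survival is at or below 0.5 (None if never reached).
    median = next((t for t, s in steps if s <= 0.5), None)
    return area, median

# Made-up Kaplan-Meier steps (not the study data).
steps = [(2, 6 / 7), (3, 5 / 7), (5, 15 / 28), (8, 5 / 14)]
mean_time, median_time = km_mean_and_median(steps, t_max=9)
print(mean_time, median_time)
```

When the curve never drops to 0.5, the median is undefined and the sketch returns None for it.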

Graph (1 ), Kaplan-Meier estimate of the survivorship function for the data in Table (1) and its 95% confidence intervals.

Graph (2) and table (1) of the estimated hazard function show that the death rate (hazard function) is low in the first 52 days after diagnosis. From day 52 to day 1000, the death rate increases continuously, starting from 0.02. From day 1000 to day 1890, it continues to increase, between 0.25 and 0.40. From day 1890 to day 2140, it increases rapidly, starting from 0.43. Generally speaking, the hazard rate is high after day 52 and increases rapidly until day 2140.

Graph (2) Hazard function for breast cancer patients in the Gaza Strip.

8-6-2-The Log-Rank test for Comparison of two Survival Distributions The problem here is to compare the survival times of two groups of breast cancer patients exposed to four different treatments (surgery, radiotherapy, chemotherapy, hormonal therapy) by comparing the survivorship function and hazard function of the two groups. The survival data for 103 females with breast cancer are divided into two groups: the first group contains the patients younger than 50 years and the second group contains the patients aged 50 years or older.

Survival times are computed in days from the time of diagnosis for both groups. Table (4) lists the survival times t in days together with the Kaplan-Meier estimate of the survivorship function (KME). The standard error (S.E.) of the Kaplan-Meier estimator of S(t) and a 95% confidence interval for S(t) are also given in table (4).

Table (4), Kaplan-Meier Estimation of survivorship function (KME) for two groups of breast cancer cases in the Gaza Strip. Columns: NO, DAYS, STATUS, CUMULATIVE PROPORTION SURVIVING AT THE TIME, LOWER 95% CI, UPPER 95% CI, DIFFERENCE (age group: less than, or greater or equal), HAZARD, Std. Error. Every listed row has STATUS = event; the rows for the "less than" group come first, followed by the rows for the "greater or equal" group.

The estimated mean survival time is 1583 days for the first group and 1832 days for the second group. The estimated mean survival time for all patients is 1751 days. These results, together with their standard errors, are shown in table (5) below.

Table (5), Means for Survival Time for two groups of breast cancer in the Gaza Strip. Columns: DIFERENT (less, more, Overall), Mean (Estimate, Std. Error), 95% Confidence Interval (Lower Bound, Upper Bound).

For the remission data, the log-rank statistic computed using formula (3.24) is reported in table (6), and the corresponding P-value is 0.014, which indicates that the null hypothesis should be rejected. The null hypothesis being tested is that there is no overall difference between the two survival curves. We can therefore conclude that the first group and the second group have significantly different (KME) survival curves. Table (6), Test of equality of survival distributions for the different levels of different by Log Rank test, OVERALL COMPARISONS. Columns: Chi-Square, df, Sig. Row: Log Rank (Mantel-Cox).

Plots of the (KME) curves for the first group and the second group are shown in graph (3) below. Notice that the (KME) curve for the second group is consistently higher than the (KME) curve for the first group. These results indicate that the second group has better survival and a better response to treatment than the first group. Moreover, as the number of days increases, the two curves appear to get further apart, which indicates that the effect of treatment on staying in remission is greater for the second group than for the first group.

Graph (3), Kaplan-Meier estimate of the survivorship function for the data in table (5.24) for two groups

Graph (4) of the estimated hazard function shows that the death rate for both groups is low in the first 750 days after diagnosis. From day 750 to the end, the death rate increases continuously for both groups. The hazard rate is generally high and increases continuously for both groups, but it is higher for the first group (patients younger than 50 years) than for the second group (patients aged 50 years or older).

Graph (4) Hazard function for first and second group

8-6-3-Cox Proportional Hazards Model (CPHM) The Formula for the Cox Proportional Hazards Model We are thus considering a problem involving four explanatory variables as predictors of survival time T, where T denotes days until going out of remission ("death"). The first explanatory variable is the age group, with 34 patients in the first group (patients younger than 50 years) and 69 patients in the second group (patients aged 50 years or older).

The data set also contains three other variables. Lateral: the breast that contains the primary tumor (1 = Right, 2 = Left). Surgery: breast-sparing surgery (1 = given, 2 = not given). Hormonal therapy (1 = given, 2 = not given).

The outcome variable for the model is the time in days until a patient goes out of remission (dies). We now describe the final model and its results concerning breast cancer cases in the Gaza Strip.

The method of estimation used to obtain the coefficients for the final model is maximum likelihood (ML) estimation. The p-value obtained for the coefficient of Groups Ages indicates a significant effect for that variable. Likewise, the p-value obtained for the coefficient of Surgery indicates a significant effect for that variable. The Z statistic used for these tests is known as a Wald statistic. All the above results can be found in table (7).
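The quantities reported for each variable in a Cox model table follow from the coefficient and its standard error: the Wald statistic is their ratio, the hazard ratio is the exponentiated coefficient, and the 95% limits are the exponentiated coefficient plus or minus 1.96 standard errors. The Python sketch below shows the arithmetic; the function name `wald_summary` is hypothetical and the coefficient and standard error are made-up values, since the numbers of table (7) are not preserved in the transcript.

```python
import math

def wald_summary(coef, se):
    """Summarise one Cox coefficient: Wald z, two-sided p-value, hazard ratio, 95% CI."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return {
        "z": z,
        "p": p_value,
        "hr": math.exp(coef),
        "lower_95": math.exp(coef - 1.96 * se),
        "upper_95": math.exp(coef + 1.96 * se),
    }

# Made-up coefficient and standard error, for illustration only.
summary = wald_summary(coef=-0.9, se=0.35)
for name, value in summary.items():
    print(name, round(value, 4))
```

A hazard ratio below 1 with an interval that excludes 1 corresponds to a significantly lower hazard for the higher-coded level of the variable.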

Table (7), Variables in the equation. Columns: COEF, HR, SE(COEF), Z, LOWER .95, UPPER .95. Rows: Groups Ages, Surgery. The table also reports the likelihood of the fitted model.

Finally, we consider the final model for the remission data. The fitted model, written in terms of the hazard function using formula (4.1), is given by.

Adjusted Survival Curves Using the (CPHM) Typically, when computing adjusted survival curves, the value chosen for a covariate being adjusted is an average value such as the arithmetic mean or the median. In fact, most computer programs for the Cox model automatically use the mean value over all subjects for each covariate being adjusted. A general formula for the adjusted survival curve for all covariates in the model following formula (4.19) is given by:

To obtain the adjusted survival curve, we then substitute the mean values into the formula for the fitted model. The formula and the resulting expression for the adjusted survival curve are shown below, and the results of applying the adjusted survival curve are given in the third column of table (8).
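Under the Cox model, the survival curve for a given covariate pattern is usually obtained by raising the baseline survival function to the power exp(sum of coefficient times covariate value). The Python sketch below applies that standard relationship with the covariate means substituted in; the transcript does not preserve formula (4.19), so this is an assumed form, and the function name, baseline values, coefficients and means are all hypothetical. The baseline here is taken to be the curve for covariate values of zero.

```python
import math

def adjusted_survival(baseline_survival, coefs, covariate_values):
    """Adjusted curve: S(t | x) = S0(t) ** exp(sum(beta_i * x_i)).

    baseline_survival : S0(t) at each time point (covariates equal to zero)
    coefs             : fitted Cox coefficients
    covariate_values  : values to adjust to, e.g. the covariate means
    """
    linear_predictor = sum(b * x for b, x in zip(coefs, covariate_values))
    risk = math.exp(linear_predictor)
    return [s0 ** risk for s0 in baseline_survival]

# Made-up baseline curve, coefficients and covariate means.
baseline = [1.0, 0.95, 0.85, 0.70]
coefs = [-0.9, 0.4]
means = [1.67, 1.30]
adjusted = adjusted_survival(baseline, coefs, means)
print([round(s, 4) for s in adjusted])
```

A negative linear predictor raises the curve above the baseline and a positive one lowers it.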

Table (8), Adjusted Survival function Using the Cox PH Model. Columns: NO, TIME, SURVIVAL, S.E, LOWER 95% CI, UPPER 95% CI, Baseline Cum Hazard, Baseline survivorship function, Cumulative hazard function.

Graph (5) adjusted survival curves obtained from fitting a Cox model

9-The recommendations
1- We recommend reactivating and rehabilitating the Cancer Registry Center in Palestine in order to know the exact number of oncology cases and their different diagnostic sources, and so to define the problem and the reasons for its spread.
2- Develop the use of the International Classification of Diseases, Tenth Revision (ICD-10), which covers diseases, deaths and disabilities. Using this classification requires training for doctors, health professionals and data entry staff.
3- We recommend establishing an advanced system for registering causes of death on death certificates.
4- Develop cooperation and full coordination between the Systems and Information Department and the Cancer Registry Center for monitoring and recording tumor cases in the Ministry of Health through a sophisticated electronic system.
5- It is important to develop the cancer patient data in the Cancer Registry Center. Furthermore, cooperation between the Ministry of Health and the Palestinian Central Bureau of Statistics (PCBS) should be improved to minimize the gaps between indicators that depend on Ministry of Health data and the indicators estimated by the PCBS from health surveys.
6- We recommend applying the Kaplan-Meier estimation of the survivorship function (KME) and estimating the mean survival time, with a confidence interval, for all cancer patients.
7- We recommend determining the relationship between the (KME) and time, and the relationship between the hazard ratio and time, for all cancer patients, using the statistics program R.
8- A clinical oncologist may be interested in comparing the ability of two or more treatments to prolong life or maintain health between two age groups of patients. Almost invariably, survival times of the different groups vary. Therefore, we recommend using the Log-Rank test for comparison of two survival distributions for all cancer patients.
9- We recommend using the Cox proportional hazards model (CPHM) for analyzing survival data, including the most important variables as predictors of survival time T, where T denotes days until going out of remission ("death or survive"), for all cancer patients in Palestine.

The Messenger of Allah, peace and blessings be upon him, said: "Allah loves that when one of you does a job, he does it with excellence."