Cervical Cancer Case Study Presented by: University of Guelph Baktiar Hasan Mark Kane Melanie Laframboise Michael Maschio Andy Quigley.



Objectives
– To determine an appropriate model for the prediction of recurrence of cervical cancer
– To classify future patients on their risk of recurrence of cervical cancer

Cervical Cancer Data Set
The original data set included 905 cases.
Patients were removed from the data set if they had ANY of the following:
– Were NOT free of the disease after surgery
– NO follow-up date
– ZERO survival time
845 cases remain.

Modeling Methods
Mixture Model with Accelerated Failure Time
– Peng, Dear, and Denham (1998)
Cox Proportional Hazards Model
Latent Variable Model
Bayesian Survival Analysis
– Seltman, Greenhouse, and Wasserman (2001)
– Chen, Ibrahim, and Sinha (1999)

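Mixture Model
The model we chose for modeling time to recurrence is a mixture model of the form:
$S(t) = p\,S_u(t) + (1 - p)$, or equivalently $F(t) = p\,F_u(t)$,
where $p$ is the probability that a patient is not cured, $1 - p$ is the cure rate, and $S_u(t)$ and $F_u(t)$ are the survival and distribution functions of the recurrence time for uncured patients.
Benefits:
– Allows for a cure rate
– Covariates can be incorporated into survival time [$S_u(t)$] AND/OR cure rate [$1 - p$]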
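As a minimal sketch of the mixture form above, the following Python functions evaluate the population survival and recurrence probabilities when the uncured component is Weibull. The function names and the (shape, scale) parameterization are assumptions made for illustration; they are not taken from the GFCURE library.

```python
import math

def weibull_survival(t, shape, scale):
    """S_u(t) for uncured patients under an assumed Weibull(shape, scale) law."""
    return math.exp(-((t / scale) ** shape))

def mixture_survival(t, p_uncured, shape, scale):
    """Population survival: S(t) = p * S_u(t) + (1 - p)."""
    return p_uncured * weibull_survival(t, shape, scale) + (1.0 - p_uncured)

def mixture_cdf(t, p_uncured, shape, scale):
    """Probability of recurrence by time t: F(t) = p * F_u(t)."""
    return p_uncured * (1.0 - weibull_survival(t, shape, scale))
```

For large t the survival curve levels off at the cure rate 1 - p rather than falling to zero, which is the feature that distinguishes a cure model from a standard survival model.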

Mixture Model (Con’t)
The model can be fit using an S-plus library (GFCURE) written by Peng. Further details about the library and the model can be found in Peng et al. (1998) and Maller and Zhou (1996).
We found an error in this library: the function pred.gfcure has a small error that can cause the program to crash or produce incorrect predicted values in some situations.

“Immunes” and Sufficient Follow-up
Maller and Zhou (1996) suggest tests to examine the hypotheses of:
– Presence of “immunes” in the data set
– Sufficient follow-up time
In the data set, it was found that immunes were present, and there was no strong evidence to suggest that follow-up time was insufficient.

Missing Covariates
A large proportion of the cases (≈40%) had at least one covariate with a missing value.
Various methods to handle this situation include:
– Ignoring cases with missing covariate data
– Maximum likelihood methods: Chen and Ibrahim (2001)

Missing Covariates (Con’t)
We chose to perform variable selection on only the cases that contain no missing covariates (n=534).
Was BIAS introduced?
CHECK: compare the distributions of covariates in the “full” and “reduced” data sets.
Result: NO significant bias was introduced.

Distribution
A variety of distributions were considered for modeling recurrence time, including Weibull, gamma, lognormal, log-logistic, extended generalized gamma, and generalized F.
Comparing these models by AIC showed little improvement from fitting a distribution with 3 or 4 parameters rather than a 2-parameter distribution.
Of the 2-parameter distributions considered, the Weibull emerged as the best in terms of likelihood and prediction of the cure rate.
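The AIC comparison described here can be sketched as follows. The log-likelihoods and parameter counts are hypothetical placeholders, not results from the study; they only illustrate how a small gain in likelihood from extra parameters can fail to lower the AIC.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

# Hypothetical (log-likelihood, number of parameters) pairs, for illustration only.
candidates = {
    "Weibull": (-1000.0, 2),
    "lognormal": (-1003.0, 2),
    "generalized F": (-999.5, 4),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # distribution with the lowest AIC
```

With these made-up numbers the 4-parameter model has the highest likelihood but loses to the 2-parameter Weibull once the penalty of 2 per parameter is applied.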

Variable Selection
Stepwise variable selection was performed using the 534 patients previously mentioned; AIC was used as the entering criterion. Variables were allowed to enter both the cure rate portion of the model and the survival time portion of the model.
The final model chosen uses:
– pelvis lymph node involvement (PELLYMPH) and size of tumor (SIZE) to model the survival time of uncured patients
– Capillary Lymphatic Spaces (CLS) and depth of tumor (MAXDEPTH) to predict cure rate

Variable Selection (Con’t)
CLS was modeled as a continuous variable rather than a discrete one because twice the difference in log-likelihoods between modeling CLS as continuous and as discrete is [value missing in the transcript].
Interactions of the significant covariates in the chosen model were also considered but were found to be non-significant.

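Chosen Model
(The numeric entries of this table did not survive in the transcript; only a "<" remains in the p-value column for SIZE.)
Variable | Coefficient | S.E. | p-value
Terms in accelerated failure time model:
PELLYMPH | | |
SIZE | | | <
Terms in the logistic model:
CLS | | |
MAXDEPTH | | |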

Interpretation of the Model
– The negative coefficient of PELLYMPH indicates that uncured patients who are positive for pelvis lymph node involvement will have a shorter time to recurrence than those who are negative.
– The coefficient of SIZE is also negative, which means that, for uncured patients, a larger tumor size corresponds to quicker recurrence of cancer.
– The positive coefficient of CLS in the cure rate portion of the model indicates that patients with a positive CLS finding have a higher probability of recurrence.
– The coefficient of MAXDEPTH is also positive, indicating that patients with a large tumor depth have a higher probability of recurrence.
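The sign interpretations above can be illustrated with a small sketch. It assumes the logistic part models the probability p of NOT being cured and the accelerated failure time part sets the Weibull scale through a log link. All function names, argument names, and coefficient values used with them are hypothetical, since the fitted values are not available here.

```python
import math

def uncured_probability(cls, maxdepth, b0, b_cls, b_depth):
    """Logistic part: probability p that a patient is not cured (will recur)."""
    eta = b0 + b_cls * cls + b_depth * maxdepth
    return 1.0 / (1.0 + math.exp(-eta))

def aft_scale(pellymph, size, g0, g_pellymph, g_size):
    """AFT part: Weibull scale for uncured patients, exp(linear predictor).

    A negative coefficient shrinks the scale, i.e. shortens time to recurrence.
    """
    return math.exp(g0 + g_pellymph * pellymph + g_size * size)
```

For example, with any positive b_cls, raising cls increases the returned p, and with any negative g_pellymph, setting pellymph to 1 gives a smaller scale than setting it to 0.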

Model Validation
To determine how well the chosen model will predict future patients, the data were randomly split into two subsets.
Since it is not known whether a patient who did not relapse was cured or censored, it is not possible to compare the predicted probability of recurrence with the actual probability of recurrence.
A graphical method was used instead to assess how well the predicted probabilities performed.

Model Validation (Con’t)
The graphical method involved predicting the probability of recurrence before time $t_i$, $F(t_i)$, for a number of chosen times. This prediction is smoothed against a recurrence indicator, which is 1 if recurrence occurred before time $t_i$ and 0 if it did not.
A criticism of this graphical method is that a patient censored before $t_i$ without recurrence could still have had a recurrence between their censoring time and $t_i$, in which case they should have been coded as 1 rather than 0 for the graph.
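A rough sketch of this kind of check is given below. It substitutes simple binned averages for the smoother used in the presentation, so it is an approximation of the idea rather than the authors' procedure; the function name and the choice of equal-size bins are assumptions. The 0/1 indicator follows the coding described above and therefore shares the same caveat about patients censored before $t_i$.

```python
def binned_calibration(predicted, observed, n_bins=5):
    """Compare mean predicted F(t_i) with the observed recurrence fraction.

    predicted: predicted probabilities of recurrence before t_i
    observed:  1 if recurrence occurred before t_i, else 0
    Returns a list of (mean predicted, observed fraction), one per bin,
    with patients sorted by prediction and split into equal-size bins.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # skip empty bins when there are fewer patients than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_frac = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_frac))
    return rows
```

Plotting the observed fraction against the mean prediction should give points near the diagonal if the predicted probabilities are well calibrated.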

Classification
The second objective is to classify patients into 3 groups: Low relapse, Moderate relapse, and High relapse.
We classified patients based on their estimated cure rate from the final model previously mentioned:
– Low relapse: estimated cure rate ≥ 94%
– Moderate relapse: 84% < estimated cure rate < 94%
– High relapse: estimated cure rate ≤ 84%
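The classification rule can be written as a short function. The thresholds come from the slide; the function name and the use of a proportion between 0 and 1 for the cure rate are assumptions made for illustration.

```python
def classify_relapse_risk(cure_rate):
    """Assign a relapse group from an estimated cure rate given as a proportion."""
    if cure_rate >= 0.94:
        return "Low relapse"
    if cure_rate > 0.84:
        return "Moderate relapse"
    return "High relapse"
```

A patient with an estimated cure rate of 0.90, for instance, falls in the Moderate relapse group, while the boundary values 0.94 and 0.84 fall in the Low and High groups respectively.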

Conclusions
We found that Capillary Lymphatic Spaces and depth of tumor are important for predicting the probability of relapse, while pelvis lymph node involvement and size of tumor are important for predicting the survival time of uncured patients.
We used these attributes in a Weibull mixture model to classify patients according to their risk of recurrence.

References
Chen, M., and Ibrahim, J. (2001), “Maximum likelihood methods for cure rate models with missing covariates,” Biometrics, 57.
Chen, M., Ibrahim, J., and Sinha, D. (1999), “A new Bayesian model for survival data with a surviving fraction,” JASA, 94.
Maller, R., and Zhou, X. (1996), Survival Analysis with Long-Term Survivors. Toronto: John Wiley & Sons.
Peng, Y., Dear, K., and Denham, J. (1998), “A generalized F mixture model for cure rate estimation,” Statistics in Medicine, 17.
Seltman, H., Greenhouse, J., and Wasserman, L. (2001), “Bayesian model selection: analysis of a survival model with a surviving fraction,” Statistics in Medicine, 20.