Comparing Cox Model with a Surviving Fraction with regular Cox model


Comparing Cox Model with a Surviving Fraction with regular Cox model Liping Huang & Yumin Zhao

Introduction: In some clinical follow-up studies, a positive proportion of patients respond favorably to treatment, subsequently appear to be free of any signs or symptoms of the disease, and may be considered “cured”. The remaining subjects, who are susceptible to the event, are referred to as “uncured”.

A typical case: A clinical study of breast cancer patients, analyzed by Farewell (1986). In this study, the time to relapse or death under three treatments was observed for 139 patients. Four covariates (clinical stage, pathological stage, histological stage, and the number of lymph nodes with disease involvement) were also recorded for each patient. The Kaplan-Meier survival curves of patients from the three treatment groups are given in Figure 1.

All three curves stay above 40%. Of particular interest is the curve for treatment B, which levels off at 73%. The tails of these curves contain a number of long-term censored observations, which correspond to patients in each of the three groups who may be cured.

Objective: Compare the most popular cure model, the mixture model, with the regular Cox model, which assumes that all observations are susceptible to the event of interest; in other words, that all observations are “uncured”. Statistical tools: SAS and S-Plus.

Notations: Let T be a nonnegative random variable denoting the time to relapse or death due to the disease.
$f(t \mid x, z)$ --- probability density function of T
$S(t \mid x, z)$ --- survival function of T
where x and z are two observed covariate vectors on which the distribution of T may depend;
$f_u(t \mid x)$ --- probability density function of T of uncured patients
$S_u(t \mid x)$ --- survival function of T of uncured patients
where x is a covariate vector on which T of uncured patients may depend;
$f_c(t)$ --- probability density function of T of cured patients
$S_c(t)$ --- survival function of T of cured patients.
Because cured patients will never experience a relapse or death due to the disease, their survival time is infinite. So, for all finite values of t, $f_c(t) = 0$ and $S_c(t) = 1$.

Let U be an indicator of uncured patients, i.e. U=1 if the patient is not cured and U=0 otherwise. Let $\pi(z) = P(U = 1 \mid z)$ be the probability of being uncured given a covariate vector z, so that $1 - \pi(z)$ is the cure rate, defined as the proportion of cured observations in the population. Farewell (1982, 1986) modeled this probability with a logistic model:
$$\pi(z) = \frac{\exp(b_0 + b z)}{1 + \exp(b_0 + b z)}$$
where b0 is a scalar parameter, b is a row vector of parameters and z a column vector of covariates. Then the mixture model is given as follows:
$$f(t \mid x, z) = \pi(z)\, f_u(t \mid x) \tag{1}$$
$$S(t \mid x, z) = \pi(z)\, S_u(t \mid x) + 1 - \pi(z) \tag{2}$$
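A minimal sketch of the mixture survival function, assuming the standard form S(t | x, z) = π(z) S_u(t | x) + 1 − π(z) with a logistic π(z) and a scalar covariate. The exponential S_u and the parameter values are hypothetical choices for illustration only; the point is that the population curve levels off at the cure rate 1 − π(z) rather than dropping to zero.

```python
import math

def uncured_probability(z, b0, b):
    """Logistic model: pi(z) = exp(b0 + b*z) / (1 + exp(b0 + b*z)), scalar z."""
    return 1.0 / (1.0 + math.exp(-(b0 + b * z)))

def mixture_survival(t, z, b0, b, uncured_survival):
    """Population survival: pi(z) * S_u(t) + 1 - pi(z)."""
    pi = uncured_probability(z, b0, b)
    return pi * uncured_survival(t) + 1.0 - pi

# Hypothetical survival function for uncured patients: exponential with rate 1.
def s_u(t):
    return math.exp(-t)

# With b0 = 0 and b = 0, pi(z) = 0.5, so the curve starts at 1
# and flattens out at the cure rate 0.5 as t grows.
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, mixture_survival(t, z=0.0, b0=0.0, b=0.0, uncured_survival=s_u))
```

A plateau like this is what the long flat tail of a Kaplan-Meier curve suggests when a cured fraction is present.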

The proportional hazards assumption used in the regular Cox model is also used in the cure model to describe the effect of x on the distribution of the failure time of uncured patients. That is, the hazard function of an uncured patient with covariate x at time t, denoted by $h_u(t \mid x)$, is given as
$$h_u(t \mid x) = h_{u0}(t) \exp(\beta' x)$$
where $\beta$ is a vector of regression coefficients and $h_{u0}(t)$ is an arbitrary unspecified baseline hazard function.

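Suppose we have data in the form $(t_i, \delta_i, x_i, z_i)$, i=1,2,…,n, where $t_i$ denotes the observed survival time for the ith patient, $\delta_i$ is the censoring indicator, equal to 0 if $t_i$ is censored and 1 otherwise, and $x_i$ and $z_i$ are observed values of the two covariate vectors. The likelihood function for the mixture model is
$$L = \prod_{i=1}^{n} f(t_i \mid x_i, z_i)^{\delta_i}\, S(t_i \mid x_i, z_i)^{1 - \delta_i}$$
Plugging in (1) and (2), we have
$$L = \prod_{i=1}^{n} \left[\pi(z_i)\, f_u(t_i \mid x_i)\right]^{\delta_i} \left[1 - \pi(z_i) + \pi(z_i)\, S_u(t_i \mid x_i)\right]^{1 - \delta_i} \tag{3}$$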
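The mixture likelihood can be evaluated directly once π, f_u and S_u are specified. The sketch below computes its logarithm, assuming the usual form in which an observed event contributes π(z) f_u(t | x) and a censored observation contributes 1 − π(z) + π(z) S_u(t | x). The exponential failure-time model, the parameter values and the toy records are hypothetical and are not taken from the slides.

```python
import math

def observed_loglik(data, pi, f_u, s_u):
    """Log-likelihood of the mixture cure model for records (t, delta, x, z)."""
    total = 0.0
    for t, delta, x, z in data:
        p = pi(z)
        if delta == 1:
            # Observed event: the patient is uncured and fails at time t.
            total += math.log(p * f_u(t, x))
        else:
            # Censored: either cured, or uncured and still event-free at t.
            total += math.log(1.0 - p + p * s_u(t, x))
    return total

# Hypothetical ingredients, chosen only for illustration.
def pi(z):
    return 1.0 / (1.0 + math.exp(-(2.0 - 1.0 * z)))

def s_u(t, x):
    return math.exp(-t * math.exp(-0.693 * x))   # exponential survival

def f_u(t, x):
    return math.exp(-0.693 * x) * s_u(t, x)      # hazard times survival

toy = [(0.5, 1, 0, 0), (3.0, 0, 1, 1), (1.2, 1, 1, 1)]
print(observed_loglik(toy, pi, f_u, s_u))
```

Setting π(z) = 1 for every patient turns this into the ordinary censored-data log-likelihood, which is the sense in which the regular model is a special case.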

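Define a vector $u = (u_1, \ldots, u_n)$, where $u_i$ is the value of U for the ith patient. Recall that $u_i = 1$ if the patient is uncured and 0 otherwise, i=1, 2,…, n. The vector u is only partially observed: if $\delta_i = 1$ then $u_i = 1$, but if $\delta_i = 0$ then $u_i$ is not observable and can be 1 or 0. So $u_i$ is a latent variable in our case. Given $u_i$, i.e. when the complete data are available, the complete likelihood function is
$$L_c = \prod_{i=1}^{n} \pi(z_i)^{u_i} \left[1 - \pi(z_i)\right]^{1 - u_i} f_u(t_i \mid x_i)^{\delta_i}\, S_u(t_i \mid x_i)^{(1 - \delta_i) u_i} \tag{4}$$
Comparing (3) and (4): once $u_i$ is known, the uncured probability is degenerate. If $u_i = 1$, then $\pi(z_i) = 1$ and $1 - \pi(z_i) = 0$, so the ith factor of both (3) and (4) reduces to $f_u(t_i \mid x_i)^{\delta_i} S_u(t_i \mid x_i)^{1 - \delta_i}$. If $u_i = 0$ (which requires $\delta_i = 0$), then $\pi(z_i) = 0$ and $1 - \pi(z_i) = 1$, so the ith factor of both (3) and (4) reduces to 1. Therefore (4), which takes account of $u_i$, is consistent with (3), which does not. More precisely, summing (4) over the possible values of the unobserved $u_i$ gives back (3).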

An expectation-maximization (EM) algorithm is used to find the maximum likelihood estimates of parameters in probabilistic models that depend on unobserved latent variables. EM alternates between an expectation (E) step, which computes the expectation of the log-likelihood with the latent variables treated as if they were observed, and a maximization (M) step, which computes the maximum likelihood estimates of the parameters by maximizing the expected log-likelihood found in the E step. The parameters found in the M step are then used to begin another E step, and the process is repeated. The E-step here calculates the expectation of the logarithm of (4) given the current estimates of $f_u(t_i \mid x_i)$, $S_u(t_i \mid x_i)$ and $\pi(z_i)$, which is the sum of the following two functions:
$$\sum_{i=1}^{n} \left\{ g_i \log \pi(z_i) + (1 - g_i) \log\left[1 - \pi(z_i)\right] \right\} \tag{5}$$
$$\sum_{i=1}^{n} \left\{ \delta_i \log f_u(t_i \mid x_i) + (1 - \delta_i)\, g_i \log S_u(t_i \mid x_i) \right\} \tag{6}$$
where $g_i$ is the expectation of $u_i$ conditional on the current estimates of $S_u(t_i \mid x_i)$ and $\pi(z_i)$, given by
$$g_i = \delta_i + (1 - \delta_i)\, \frac{\pi(z_i)\, S_u(t_i \mid x_i)}{1 - \pi(z_i) + \pi(z_i)\, S_u(t_i \mid x_i)} \tag{7}$$

which is the probability of the ith patient being uncured. Therefore, the E-step of the EM algorithm for this problem consists of assigning the probability $g_i$ to each patient. The M-step consists of maximizing (5) and (6) with respect to $f_u(\cdot)$, $S_u(\cdot)$ and b0, b for fixed $g_i$. The advantage of using the EM algorithm here is that the maximum likelihood estimates of the failure time distribution of uncured patients and of b0, b can be obtained separately, because (5) depends only on b0 and b while (6) depends only on the failure time distribution of uncured patients. Following Kalbfleisch and Prentice (1973), (6) can be approximated by the logarithm of a partial likelihood which, in the absence of tied failure times, is
$$L(\beta) = \prod_{i:\, \delta_i = 1} \frac{\exp(\beta' x_i)}{\sum_{l \in R(t_i)} g_l \exp(\beta' x_l)} \tag{8}$$
where $R(t_i)$ is the set of patients still at risk at time $t_i$. If all patients are uncured, then $g_i = 1$ for every i. In this case, (8) reduces to the usual partial likelihood used in Cox’s PH model, which is
$$L(\beta) = \prod_{i:\, \delta_i = 1} \frac{\exp(\beta' x_i)}{\sum_{l \in R(t_i)} \exp(\beta' x_l)}$$
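The two building blocks of one EM iteration can be sketched as follows. This assumes the standard E-step weight g_i = δ_i + (1 − δ_i) π_i S_ui / (1 − π_i + π_i S_ui) and a weighted partial likelihood with no tied event times and a single scalar covariate. The function names and the toy inputs are hypothetical; this is not the semicure implementation.

```python
import math

def e_step_weights(deltas, pis, s_us):
    """Posterior probability of being uncured for each patient."""
    weights = []
    for d, p, s in zip(deltas, pis, s_us):
        if d == 1:
            weights.append(1.0)  # an observed event implies the patient is uncured
        else:
            weights.append(p * s / (1.0 - p + p * s))
    return weights

def weighted_partial_loglik(beta, times, deltas, x, g):
    """Sum over events of beta*x_i - log(sum over the risk set of g_l*exp(beta*x_l)).

    The risk set at t_i is {l : t_l >= t_i}. Assumes no tied event times.
    """
    total = 0.0
    for i, t_i in enumerate(times):
        if deltas[i] != 1:
            continue
        denom = sum(g[l] * math.exp(beta * x[l])
                    for l in range(len(times)) if times[l] >= t_i)
        total += beta * x[i] - math.log(denom)
    return total

# Toy inputs: current estimates of pi(z_i) and S_u(t_i | x_i) are made up.
times = [1.0, 2.0, 3.0, 4.0]
deltas = [1, 0, 1, 0]
x = [1, 0, 1, 0]
g = e_step_weights(deltas, [0.8] * 4, [0.9, 0.6, 0.4, 0.05])
print(g)
print(weighted_partial_loglik(0.0, times, deltas, x, g))
```

Passing g = [1, 1, 1, 1] gives the ordinary Cox partial log-likelihood, and a patient censored late (small S_u) gets a small weight, which is how long-term censored patients drop out of the risk sets.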

Simulation: We let x = z, with x = 0 for the control group and x = 1 for the treatment (Trt) group. Logistic parameters: b0 = 2 and b = -1. Control group uncured rate: 0.8808. Trt group uncured rate: 0.7311.

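Simulation: We let the failure time of uncured patients in the control group (x = 0) follow exp(1), i.e. the baseline hazard is $h_{u0}(t) = 1$, and set $\beta = \log 0.5 = -0.693$, so $\exp(\beta) = 0.5$. Then, since x = 1 in the Trt group, the hazard there is $h_u(t \mid x = 1) = h_{u0}(t) \exp(\beta) = 0.5$, so the failure time of uncured Trt patients follows exp(0.5).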
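A sketch of how data could be drawn under this design, with logistic parameters b0 = 2 and b = -1 and log hazard ratio log 0.5. The slides do not describe the group sizes or the censoring mechanism, so the 50/50 group assignment and the fixed censoring time below are assumptions, and simulate_cure_data is a hypothetical helper rather than the authors' present.txt code.

```python
import math
import random

def uncured_rate(x, b0, b):
    """Logistic probability of being uncured for covariate value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b * x)))

def simulate_cure_data(n, b0, b, beta, censor_time, rng):
    """Draw (time, delta, x) triples from a mixture cure model with x = z.

    Assumed here, not stated in the slides: equal-probability group
    assignment and administrative censoring at a fixed time.
    """
    data = []
    for _ in range(n):
        x = rng.randint(0, 1)                  # 0 = control, 1 = treatment
        if rng.random() < uncured_rate(x, b0, b):
            rate = math.exp(beta * x)          # baseline hazard 1 times exp(beta * x)
            t = rng.expovariate(rate)
        else:
            t = math.inf                       # cured patients never fail
        delta = 1 if t <= censor_time else 0
        data.append((min(t, censor_time), delta, x))
    return data

print(round(uncured_rate(0, 2.0, -1.0), 4), round(uncured_rate(1, 2.0, -1.0), 4))

rng = random.Random(0)
sample = simulate_cure_data(2000, b0=2.0, b=-1.0, beta=math.log(0.5),
                            censor_time=20.0, rng=rng)
```

With a censoring time this long, almost every censored record is a cured patient, so the censored fraction in each group should sit close to one minus the uncured rate.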

Simulation: We use the function semicure written by Dr. Yingwei Peng (http://www.math.mun.ca/~ypeng/research/). A call has the form: semicure(Surv(time, cens) ~ transplant, ~ transplant, data = goldman.data)

Simulation function: present.txt
Simulation results (recall that the true beta is -0.693, b0 = 2 and b = -1):
Methods    Mean(beta)  Std(beta)  Bias(beta)  b0      b1
Coxph      -0.809      0.362      0.116       --      --
Semicure   -0.709      0.451      0.016       5.469   -3.123

Simulation Histogram of beta:

Data Analysis: The breast data set from the textbook contains 45 breast cancer patients. SURV is the survival time. The variable x equals 1 if the tumor had a positive marker for possible metastasis and 0 otherwise. The data are sorted by survival time in increasing order.

Data Analysis --- original breast cancer data: Survival curve from SAS output.
1. The curve stays above 0.35.
2. There are a number of long-term censored observations at the right tail.

Data Analysis --- original breast cancer data:
Methods    Coef    Exp(coef)  Se(coef)  z      p
Coxph      0.909   2.48       0.501     1.82   0.069
semicure   1.29    3.63       0.623     2.07   0.038
1. For Coxph, p > 0.05, so metastasis is not significant in modeling survival time, which contradicts medical expectation.
2. For semicure, p < 0.05, so metastasis is significant in modeling survival time, which is consistent with medical expectation.
3. Semicure gives a higher hazard ratio than coxph.
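As a rough check on the table, the sketch below recomputes the hazard ratio, z and p from each coefficient and standard error. It assumes the reported tests are two-sided Wald tests, which the slides do not state. Because the inputs are already rounded, the recomputed values can differ from the table in the last digit.

```python
import math

def wald_summary(coef, se):
    """Return (hazard ratio, z statistic, two-sided p-value) for one coefficient."""
    z = coef / se
    # Standard normal CDF written with the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p = 2.0 * (1.0 - cdf)
    return math.exp(coef), z, p

# Coefficients and standard errors from the original breast cancer data.
for name, coef, se in [("coxph", 0.909, 0.501), ("semicure", 1.29, 0.623)]:
    hr, z, p = wald_summary(coef, se)
    print(name, round(hr, 2), round(z, 2), round(p, 3))
```

The same function applies to the tables for the two modified data sets.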

Data Analysis --- modified breast cancer data with higher survival probability: Observations 32, 33, 34 and 37 are changed to censored, which raises the survival probability and adds long-term censored observations at the tail of the curve.

Data Analysis --- modified breast cancer data with higher survival probability:
Methods    Coef    Exp(coef)  Se(coef)  z      p
Coxph      1.22    3.38       0.622     1.96   0.05
semicure   0.242   1.27       0.629     0.384  0.70
1. For Coxph, p = 0.05, so metastasis is still not significant in modeling survival time; the result does not change much compared with the original data set analysis.
2. For semicure, p = 0.70 > 0.05, so metastasis is not significant in modeling survival time; this reverses the conclusion from the original data set analysis.
3. Semicure gives a lower hazard ratio than coxph.

Data Analysis --- modified breast cancer data with lower survival probability: Observations 42, 43 and 45 are changed to uncensored, which lowers the survival probability and leaves fewer long-term censored observations at the tail of the curve.

Data Analysis --- modified breast cancer data with lower survival probability:
Methods    Coef    Exp(coef)  Se(coef)  z      p
Coxph      1.00    2.72       0.496     2.01   0.044
semicure   0.976   2.65       n/a       1.97   0.049
The two methods give almost the same results.

Data Analysis: The estimated betas are positive in all three data sets, which means that patients with a positive metastasis marker have a higher hazard of death than patients without.

Conclusion:
1. Theoretically, when gi = 1 for all patients, semicure reduces to the regular Cox model.
2. If the survival curve shows (a) a high survival probability and (b) a number of censored observations at the end (tail) of the curve, then the Cox model with a surviving fraction (semicure) gives a more accurate result.
3. If the survival curve does not satisfy (a) and (b), then accounting for a surviving fraction or not will not change the conclusion.