A Spatial Scan Statistic for Survival Data Lan Huang, Dept. of Statistics, Univ. of Connecticut; Martin Kulldorff, Harvard Medical School; David Gregorio, Dept. of Community Medicine, Univ. of Connecticut
Motivation and Background What is the geographical distribution of prostate cancer survival in Connecticut? Are there geographical clusters with exceptionally short or long survival?
Survival Data For each person: time of diagnosis; whether dead or censored; time until death/censoring; residential geographical coordinates; age, etc.
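The per-person record listed above could be held in a small structure like the following. This is a minimal sketch only: the class name, field names, units, and the three example patients are illustrative assumptions, not taken from the registry data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurvivalRecord:
    """One patient. All field names and units are hypothetical."""
    x_km: float   # residential coordinate (projected, km assumed)
    y_km: float   # residential coordinate (projected, km assumed)
    time: float   # time from diagnosis until death or censoring (years assumed)
    died: bool    # True if death was observed, False if censored
    age: int      # age at diagnosis


def totals(records):
    """Return (number of observed deaths, total follow-up time)."""
    deaths = sum(r.died for r in records)
    follow_up = sum(r.time for r in records)
    return deaths, follow_up


# Made-up example records, for illustration only.
patients = [
    SurvivalRecord(0.0, 0.0, 2.5, True, 71),
    SurvivalRecord(1.0, 2.0, 6.0, False, 64),
    SurvivalRecord(3.0, 1.0, 1.5, True, 80),
]
print(totals(patients))  # (2, 10.0)
```

The two totals (deaths and follow-up time) are the only summaries an exponential model needs from any group of patients.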
Motivation and Background Spatial scan-statistics with Bernoulli and Poisson models are designed for count data. Length of survival is continuous data. Survival data is often censored.
Solution Spatial Scan Statistic using an Exponential Probability Model
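As a sketch of what an exponential probability model implies for censored data, assume θ denotes mean survival time, t_i is the observed time for person i, and δ_i = 1 if death was observed (δ_i = 0 if censored). The symbols δ_i, r, and T are notation introduced here for illustration. A death contributes the density and a censored observation contributes the survival function, so:

```latex
\begin{align*}
L(\theta) &= \prod_{i=1}^{N} \left(\frac{1}{\theta}\right)^{\delta_i} e^{-t_i/\theta}
           = \theta^{-r}\, e^{-T/\theta},
\qquad r = \sum_{i=1}^{N} \delta_i, \quad T = \sum_{i=1}^{N} t_i, \\
\frac{d}{d\theta} \log L(\theta) &= -\frac{r}{\theta} + \frac{T}{\theta^{2}} = 0
\;\Longrightarrow\; \hat{\theta} = \frac{T}{r}, \\
L(\hat{\theta}) &= \left(\frac{r}{T}\right)^{r} e^{-r}.
\end{align*}
```

Under this model only the number of deaths r and the total follow-up time T matter, which is why censored observations fit naturally.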
Methodology Exponential-model-based spatial statistic. Hypotheses: H_0: θ_in = θ_out versus H_a: θ_in ≠ θ_out. Steps: exponential likelihood; spatial scan statistic; its distribution obtained by a permutation test; statistical inference (hypothesis test); detect a significant cluster.
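The methodology above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: circular zones centred on each patient, a fixed list of candidate radii, and permutation of (time, censoring) pairs across locations are all choices made for this sketch, and every function name and number below is hypothetical.

```python
import math
import random


def exp_llr(r_in, t_in, r_all, t_all):
    """Log likelihood ratio, zone vs. rest, under an exponential model.

    r = number of deaths, t = total follow-up time (censored time included).
    """
    r_out, t_out = r_all - r_in, t_all - t_in
    if t_in <= 0 or t_out <= 0:
        return 0.0

    def term(r, t):
        # r * log(r / t), with the limit 0 when there are no deaths
        return r * math.log(r / t) if r > 0 else 0.0

    return term(r_in, t_in) + term(r_out, t_out) - term(r_all, t_all)


def scan(points, times, died, radii):
    """Return (max LLR, (cx, cy, radius)) over circles centred on each point."""
    r_all, t_all = sum(died), sum(times)
    best_llr, best_zone = 0.0, None
    for cx, cy in points:
        dist = [math.hypot(x - cx, y - cy) for x, y in points]
        for rad in radii:
            inside = [i for i, d in enumerate(dist) if d <= rad]
            if len(inside) == len(points):
                continue  # a zone covering everyone is not a cluster
            r_in = sum(died[i] for i in inside)
            t_in = sum(times[i] for i in inside)
            llr = exp_llr(r_in, t_in, r_all, t_all)
            if llr > best_llr:
                best_llr, best_zone = llr, (cx, cy, rad)
    return best_llr, best_zone


def permutation_p_value(points, times, died, radii, n_perm=99, seed=0):
    """Monte Carlo p-value: shuffle (time, died) pairs over fixed locations."""
    rng = random.Random(seed)
    observed, zone = scan(points, times, died, radii)
    pairs = list(zip(times, died))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pairs)
        perm_times = [t for t, d in pairs]
        perm_died = [d for t, d in pairs]
        if scan(points, perm_times, perm_died, radii)[0] >= observed:
            exceed += 1
    return observed, zone, (exceed + 1) / (n_perm + 1)


# Toy data: a 6 x 6 grid with shorter mean survival near the origin.
rng = random.Random(1)
points = [(float(i % 6), float(i // 6)) for i in range(36)]
times = [rng.expovariate(1.0 if x + y <= 2 else 0.25) for x, y in points]
died = [True] * len(points)
llr, zone, p = permutation_p_value(points, times, died, radii=[1.5, 2.5])
print(round(llr, 2), zone, p)
```

Because the p-value comes from ranking the observed maximum among permuted maxima, it does not rely on the survival times actually being exponential.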
Methods Evaluation Locations of 610 Connecticut prostate cancer patients are used. A group of patients in southwest Connecticut constitutes a cluster with shorter survival (cluster radius: 8.65 km). Each of the 610 patients is assigned a random survival or censoring time, using different distributions inside and outside the cluster.
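Model Evaluation Simulation design for the 610 individuals: survival times drawn from exponential, gamma, or log-normal distributions with parameters θ_in and θ_out (difference θ_diff); datasets are either non-censored or censored, with random or fixed censoring times.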
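A simulation along these lines might look like the sketch below. Everything numeric here is an assumption made for illustration: the gamma shape, the log-normal sigma, the uniform random-censoring distribution, the censoring limit, the θ values, and the cluster size of 50 are hypothetical and are not taken from the study.

```python
import math
import random


def draw_time(rng, dist, mean):
    """Draw one survival time with the given mean from the named distribution."""
    if dist == "exponential":
        return rng.expovariate(1.0 / mean)
    if dist == "gamma":
        shape = 2.0  # hypothetical shape parameter
        return rng.gammavariate(shape, mean / shape)
    if dist == "lognormal":
        sigma = 0.5  # hypothetical log-scale standard deviation
        return rng.lognormvariate(math.log(mean) - sigma ** 2 / 2, sigma)
    raise ValueError(f"unknown distribution: {dist}")


def simulate(rng, in_cluster, dist, theta_in, theta_out, censoring="none", c=None):
    """Return (times, died) for one simulated dataset.

    censoring: "none", "fixed" (everyone censored at time c), or
    "random" (censoring time uniform on [0, c], an assumed choice).
    """
    times, died = [], []
    for inside in in_cluster:
        t = draw_time(rng, dist, theta_in if inside else theta_out)
        if censoring == "none":
            limit = math.inf
        elif censoring == "fixed":
            limit = c
        elif censoring == "random":
            limit = rng.uniform(0.0, c)
        else:
            raise ValueError(f"unknown censoring: {censoring}")
        times.append(min(t, limit))
        died.append(t <= limit)
    return times, died


rng = random.Random(42)
in_cluster = [i < 50 for i in range(610)]  # cluster size of 50 is hypothetical
times, died = simulate(rng, in_cluster, "gamma", theta_in=2.0, theta_out=4.0,
                       censoring="fixed", c=3.0)
print(len(times), sum(died))
```

Each simulated dataset would then be passed through the scan statistic to see whether the true cluster is recovered.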
Number of individuals inside the true cluster that were successfully detected, for simulated datasets without censoring (table by θ_diff; column: P-value < 0.05)
Number of individuals inside the true cluster that were successfully detected, for censored datasets with fixed censoring time (table by θ_diff; column: P-value < 0.05)
Number of individuals inside the true cluster that were successfully detected, for censored datasets with random censoring time (table by θ_diff; column: P-value < 0.05)
Model Evaluation The exponential model is robust: the exponential-based scan statistic rejects the null hypothesis with a low p-value whenever the difference between the inside and outside distributions is moderate or large, regardless of the survival distribution and the censoring mechanism.
Application to Prostate Cancer Data Between 1984 and 1995, the Connecticut Tumor Registry recorded invasive prostate cancer incidence cases among the population at risk (roughly 1.2 million males aged 20+ in 1990). Records were available after data cleaning, with follow-up through December; of these patients, 8753 were censored and the rest had died.
Significant clusters using exponential model
Application to Prostate Cancer Data Table of significant clusters, grouped into short-survival and long-survival clusters (columns: cluster; in-cluster #deaths and #individuals; RR; LLR; P)
Covariate Adjustment Younger patients may live longer; there may be geographical variation in histology or stage.
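One plausible way to adjust for a categorical covariate such as age group under a constant-hazard (exponential) assumption is to rescale each survival time by the ratio of the overall estimated mean to the stratum-specific estimated mean. The sketch below is an assumption about how such an adjustment could be carried out, not a confirmed description of the authors' procedure; the function name, the stratum labels, and the data are hypothetical.

```python
from collections import defaultdict


def adjust_times(times, died, strata):
    """Rescale times so every stratum has the same estimated mean survival.

    Stratum mean is estimated as (total follow-up) / (number of deaths),
    the exponential maximum likelihood estimate with censoring.
    """
    deaths = defaultdict(int)
    follow_up = defaultdict(float)
    for t, d, s in zip(times, died, strata):
        deaths[s] += int(d)
        follow_up[s] += t
    theta_all = sum(follow_up.values()) / sum(deaths.values())

    adjusted = []
    for t, s in zip(times, strata):
        if deaths[s] == 0:
            # No deaths: the stratum mean is not estimable, so this sketch
            # leaves those times unchanged.
            adjusted.append(t)
        else:
            theta_s = follow_up[s] / deaths[s]
            adjusted.append(t * theta_all / theta_s)
    return adjusted


# Made-up data in which the younger stratum lives longer.
times = [2.0, 4.0, 6.0, 8.0]
died = [True, True, True, True]
strata = ["70+", "70+", "<70", "<70"]
adjusted = adjust_times(times, died, strata)
print([round(t, 3) for t in adjusted])
```

The adjusted times, together with the unchanged censoring indicators, would then be scanned in place of the raw times, so that a cluster is not flagged merely because its residents are older or younger.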
Significant clusters after age-adjustment
Discussion The exponential model works well for censored and non-censored survival data from different distributions, but it probably does not do well for all continuous variables, such as data that are approximately normally distributed. The statistical inference is valid even when the survival times are not exponentially distributed, because of the permutation-based test procedure.
Discussion The covariate adjustment method here is based on the exponential model, assuming a constant hazard. It could be extended to a non-constant hazard with several levels, or to a hazard that is a function of survival time under other kinds of models. It could be extended to a space-time scan statistic when time series data are available. It could also be extended to create a scan statistic with elliptical or other cluster shapes. Unfortunately, no statistical software is available.