Early Detection of Disease Outbreaks with Applications in New York City
Martin Kulldorff, University of Connecticut
Farzad Mostashari and James Miller, New York City Department of Health
Contents
- Prospective Disease Surveillance
- The Spatial Scan Statistic
- Thyroid Cancer Incidence in New Mexico
- Dead Birds and West Nile Virus Surveillance in New York City
- Other Current Applications
Prospective Surveillance For a pre-specified geographical area, there are existing statistical methods for the detection of a sudden disease outbreak, e.g. CUSUM methods.
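As a rough illustration of the kind of single-area method meant here, the following is a minimal sketch of a one-sided CUSUM on daily case counts. The function name, the `reference` value and the `threshold` are assumptions chosen for the example; they do not come from the talk.

```python
def cusum_alarm_times(counts, reference, threshold):
    """Return the time indices at which a one-sided upper CUSUM signals.

    reference: count per period regarded as normal (assumed value).
    threshold: alarm level for the cumulative sum (assumed value).
    """
    s = 0.0
    alarms = []
    for t, x in enumerate(counts):
        # Accumulate the excess over the reference, never dropping below zero.
        s = max(0.0, s + (x - reference))
        if s > threshold:
            alarms.append(t)
            s = 0.0  # restart the sum after an alarm
    return alarms


counts = [2, 3, 1, 2, 6, 7, 8, 2]  # made-up counts for one area
alarms = cusum_alarm_times(counts, reference=3.0, threshold=5.0)
print(alarms)
```

The sum only grows while counts stay above the reference, so a sustained rise triggers an alarm while isolated high days are absorbed.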
Three Important Issues An outbreak may start locally. Single-area methods such as CUSUM can be run simultaneously for multiple geographical areas, but that leads to multiple testing. Disease outbreaks may not conform to the pre-specified geographical areas.
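To see why multiple testing matters, here is a small worked calculation. It assumes the tests in the different areas are independent, which is a simplification made only for this example, and uses 32 areas as an illustrative number.

```python
def familywise_false_alarm(alpha, n_areas):
    """Probability of at least one false alarm when n_areas independent
    tests are each run at significance level alpha (independence is an
    assumption of this sketch)."""
    return 1.0 - (1.0 - alpha) ** n_areas


p_any = familywise_false_alarm(0.05, 32)
print(round(p_any, 3))
```

With 32 areas each tested at the 5 percent level, a false alarm somewhere becomes more likely than not, which is the problem the scan statistic is designed to handle.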
Level of Aggregation If too little geographical aggregation, the rates are statistically unstable. With aggregation, there is arbitrariness in the boundaries chosen.
Step One: Purely Spatial Scan Statistic Pre-determined time period. Geographical areas with observed number of cases and population at risk. Evaluate overlapping circles of different sizes at different locations.
One-Dimensional Scan Statistic
The Spatial Scan Statistic Move a circular window across the map. Use a variable circle radius, from zero up to a maximum where 50 percent of the population is included.
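One way to picture the set of circles is as nested groups of areas around each centroid. The sketch below assumes each area is represented by a centroid and a population count, and that a circle contains an area when it contains the centroid; the function name and the toy data are hypothetical.

```python
import math


def candidate_zones(coords, populations, max_fraction=0.5):
    """Return the distinct sets of areas captured by circles centred on
    each centroid, growing the radius until more than max_fraction of the
    total population would be included."""
    total = sum(populations)
    zones = set()
    for cx, cy in coords:
        # Areas ordered by distance from the current circle centre.
        order = sorted(
            range(len(coords)),
            key=lambda i: math.hypot(coords[i][0] - cx, coords[i][1] - cy),
        )
        members, pop = [], 0
        for i in order:
            pop += populations[i]
            if pop > max_fraction * total:
                break
            members.append(i)
            zones.add(frozenset(members))
    return zones


coords = [(0, 0), (1, 0), (5, 0)]   # toy centroids
populations = [10, 10, 30]          # toy populations
zones = candidate_zones(coords, populations)
print(sorted(sorted(z) for z in zones))
```

Storing zones as sets removes duplicates that arise when circles with different centres capture the same areas.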
A small sample of the circles used
For each circle:
– Obtain the actual and expected number of cases inside and outside the circle.
– Calculate the likelihood function.
Compare circles:
– Pick the circle with the maximum likelihood. This is the most likely cluster, i.e., the cluster least likely to have occurred by chance.
Inference:
– Generate random replicas of the data set under the null hypothesis of no clusters (Monte Carlo sampling).
– Compare the most likely clusters in the real and random data sets (likelihood ratio test).
Spatial Scan Statistic: Properties
– Adjusts for inhomogeneous population density.
– Simultaneously tests for clusters of any size and any location, by using circular windows with continuously variable radius.
– Accounts for multiple testing.
– Possibility to include confounding variables, such as age, sex or socio-economic variables.
– Aggregated or non-aggregated data (states, counties, census tracts, block groups, households, individuals).
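The three steps can be sketched as follows. The slides do not state which likelihood is used, so this example assumes a Poisson model with the total number of cases held fixed and looks only for high-rate clusters. All function names and the toy data are hypothetical.

```python
import math
import random


def poisson_llr(c, e, total):
    """Log likelihood ratio for a zone with c observed and e expected
    cases, given `total` cases overall (assumed Poisson model)."""
    if c <= e:
        return 0.0  # only zones with an excess of cases are of interest
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside


def most_likely_cluster(cases, expected, zones):
    """Return (llr, zone) for the zone with the largest likelihood ratio."""
    total = sum(cases)
    best = (0.0, None)
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = sum(expected[i] for i in zone)
        llr = poisson_llr(c, e, total)
        if llr > best[0]:
            best = (llr, zone)
    return best


def monte_carlo_p(cases, expected, zones, n_sim=999, seed=0):
    """Rank the observed maximum among maxima from random replicas."""
    rng = random.Random(seed)
    total = sum(cases)
    observed, zone = most_likely_cluster(cases, expected, zones)
    exceed = 0
    for _ in range(n_sim):
        # Null hypothesis: each case falls in an area with probability
        # proportional to that area's expected count.
        sim = [0] * len(cases)
        for i in rng.choices(range(len(cases)), weights=expected, k=total):
            sim[i] += 1
        if most_likely_cluster(sim, expected, zones)[0] >= observed:
            exceed += 1
    return zone, observed, (exceed + 1) / (n_sim + 1)


cases = [12, 3, 4, 5]               # toy observed counts
expected = [6.0, 6.0, 6.0, 6.0]     # toy expected counts, same total
zones = [(0,), (1,), (2,), (3,), (0, 1), (2, 3)]
zone, llr, p_value = monte_carlo_p(cases, expected, zones)
print(zone, round(llr, 3), p_value)
```

Because each replica is scanned with the same set of zones and only its maximum is kept, the resulting p-value already accounts for the many locations and sizes examined.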
Example: Thyroid Cancer Incidence in New Mexico Data Source: New Mexico Tumor Registry Gender: Male Aggregation Level: 32 Counties Adjustments for: Age and Temporal Trends
Thyroid Cancer
Median age at diagnosis: 44 years
United States (SEER) incidence: 4.5 / 100,000
United States mortality: 0.3 / 100,000
Five-year survival: 95%
Known risk factors:
- Radiation treatment for head and neck conditions.
- Radioactive fallout (Hiroshima/Nagasaki, Chernobyl, Marshall Islands).
- Work as a radiologic technician (USA) or x-ray operator (Sweden).
Purely Spatial Scan Statistic: 1978 analysis [Table; columns: Data Years, Cases, Expected, RR, p-value, Most Likely Cluster. Most likely cluster: Bernalillo, Valencia.]
Time-Periodic Surveillance New cases are added on a yearly basis. Reanalysis when new data arrives. Data Available: Surveillance Starts: 1978 Total Cases: 333
Purely Spatial Scan Statistic [Table; columns: Data Years, Cases, Expected, RR, p-value, Most Likely Cluster.]
Most likely cluster, in the order listed:
- Bernalillo, Valencia
- Bernalillo, Valencia
- Bernalillo, Valencia
- Bernalillo, Valencia
- Bernalillo, Valencia
- Bernalillo, Valencia
- North Central Counties
- North Central - San Miguel
- North Central + Colfax, Harding
- North Central + Colfax, Harding
- North Central - San Miguel
- North Central + Colfax, Harding
- North Central + Torrance
- North Central + Colfax, Harding
- North Central + Torrance
North Central Counties = Bernalillo, Los Alamos, Mora, Rio Arriba, Sandoval, San Miguel, Santa Fe and Taos.
North Central Counties
Two Problems While we are adjusting for the multiple testing stemming from many possible cluster locations and cluster sizes, we are not adjusting for the multiple testing due to repeated analyses every year. Low power to quickly detect emerging clusters.
Solution: Space-Time Scan Statistic
Detecting Emerging Clusters Instead of a circular window in two dimensions, we use a cylindrical window in three dimensions. The base of the cylinder represents space, while the height represents time. The cylinder is flexible in its circular base and starting date, but we only consider those cylinders that reach all the way to the end of the study period. Hence, we are only considering ‘alive’ clusters.
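The restriction to 'alive' cylinders can be sketched as below. The example assumes the data are held as a table of counts indexed by time period and area, and that spatial zones have already been built; both function names are hypothetical.

```python
def alive_cylinders(zones, n_periods):
    """Yield (zone, start, end) for every cylinder that reaches the end of
    the study period, so only the start date and the base vary."""
    end = n_periods - 1
    for zone in zones:
        for start in range(n_periods):
            yield zone, start, end


def cases_in_cylinder(cases, zone, start, end):
    """Sum the counts inside a cylinder; cases[t][i] is the count in
    period t and area i."""
    return sum(cases[t][i] for t in range(start, end + 1) for i in zone)


zones = [(0,), (0, 1)]                  # toy spatial zones
cases = [[1, 0], [2, 1], [4, 0]]        # toy counts: 3 periods, 2 areas
cylinders = list(alive_cylinders(zones, n_periods=len(cases)))
count = cases_in_cylinder(cases, (0, 1), start=1, end=2)
print(len(cylinders), count)
```

Each cylinder is then scored in the same way as a circle in the purely spatial analysis, using the cases and expected counts inside it.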
Hypothesis Test
– Find the likelihood for each choice of cylinder.
– Through maximum likelihood estimation, find the most likely cluster.
– Apply the likelihood ratio test.
– Evaluate significance through Monte Carlo simulation.
Space-Time Scan Statistic: Alive Clusters [Table; columns: Years, Most Likely Cluster, Cases, Cluster Period, Expected, RR, p-value (two columns). Most likely clusters as listed: Los Alamos, Rio Arriba; Bernalillo + 7 counties West; Los Alamos, Rio Arriba; North Central - San Miguel; North Central - San Miguel; Bernalillo, Valencia; North Central Lincoln; North Central + Colfax, Harding; North Central + Colfax, Harding; North Central - San Miguel; North Central + Colfax, Harding; Los Alamos, Rio Arriba, Santa Fe, Taos; Los Alamos.] North Central Counties = Bernalillo, Los Alamos, Mora, Rio Arriba, Sandoval, San Miguel, Santa Fe and Taos.
Los Alamos
Space-Time Scan Statistic: Alive Clusters [Table; columns: Years, Most Likely Cluster, Cases, Cluster Period, Expected, RR, p-value (two columns). Most likely clusters as listed: Los Alamos, Rio Arriba; Bernalillo + 7 counties West; Los Alamos, Rio Arriba; North Central - San Miguel; North Central - San Miguel; Bernalillo, Valencia; North Central Lincoln; North Central + Colfax, Harding; North Central + Colfax, Harding; North Central - San Miguel; North Central + Colfax, Harding; Los Alamos, Rio Arriba, Santa Fe, Taos; Los Alamos; Los Alamos.] North Central Counties = Bernalillo, Los Alamos, Mora, Rio Arriba, Sandoval, San Miguel, Santa Fe and Taos.
Problem We have still not adjusted for the repeated time-period analyses conducted every year.
Adjusting for Yearly Surveillance While interest is only in 'alive' clusters, the p-value is calculated from the probability of obtaining a likelihood higher than the observed one for any cylinder used during the present or past analyses. This is done using a space-time scan statistic that evaluates all cylinders, irrespective of start and end year.
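The following is a minimal sketch of that adjustment, not the SaTScan implementation. It assumes a Poisson likelihood ratio, a table of counts indexed by period and area, and random replicas that spread the total cases over period-area cells in proportion to the expected counts. All names and the toy data are hypothetical.

```python
import math
import random


def llr(c, e, total):
    """Poisson log likelihood ratio for an excess of cases (assumed model)."""
    if c <= e:
        return 0.0
    out = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return c * math.log(c / e) + out


def max_llr(cases, expected, zones, alive_only):
    """Largest LLR over cylinders; alive_only fixes the end at the last period."""
    n_t = len(cases)
    total = sum(map(sum, cases))
    best = 0.0
    for zone in zones:
        for start in range(n_t):
            ends = [n_t - 1] if alive_only else range(start, n_t)
            for end in ends:
                cells = [(t, i) for t in range(start, end + 1) for i in zone]
                c = sum(cases[t][i] for t, i in cells)
                e = sum(expected[t][i] for t, i in cells)
                best = max(best, llr(c, e, total))
    return best


def surveillance_p_values(cases, expected, zones, n_sim=199, seed=0):
    """Return (unadjusted, adjusted) p-values for the most likely alive cluster."""
    rng = random.Random(seed)
    n_t, n_a = len(cases), len(cases[0])
    total = sum(map(sum, cases))
    cells = [(t, i) for t in range(n_t) for i in range(n_a)]
    weights = [expected[t][i] for t, i in cells]
    observed = max_llr(cases, expected, zones, alive_only=True)
    exceed_alive = exceed_all = 0
    for _ in range(n_sim):
        sim = [[0] * n_a for _ in range(n_t)]
        for t, i in rng.choices(cells, weights=weights, k=total):
            sim[t][i] += 1
        # Unadjusted: compare with alive cylinders only.
        exceed_alive += max_llr(sim, expected, zones, True) >= observed
        # Adjusted: compare with cylinders of any start and end period.
        exceed_all += max_llr(sim, expected, zones, False) >= observed
    return (exceed_alive + 1) / (n_sim + 1), (exceed_all + 1) / (n_sim + 1)


cases = [[2, 1], [1, 2], [6, 1]]            # toy counts: 3 periods, 2 areas
expected = [[13 / 6] * 2 for _ in range(3)]  # toy uniform expected counts
zones = [(0,), (1,)]
p_unadjusted, p_adjusted = surveillance_p_values(cases, expected, zones)
print(p_unadjusted, p_adjusted)
```

Since the alive cylinders are a subset of all cylinders, the adjusted p-value in this sketch can never be smaller than the unadjusted one.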
Adjusting for Yearly Surveillance The Los Alamos Cluster 1991 Analysis: p=0.13 (unadjusted p=0.02) 1992 Analysis: p=0.016 (unadjusted p=0.002)
Los Alamos cases
Thyroid Cancer in Los Alamos The New Mexico Department of Health has investigated the individual nature of all 17 male thyroid cancer cases reported in Los Alamos. All were confirmed cases.
Thyroid Cancer in Los Alamos 3/17 had a history of therapeutic ionizing radiation treatment to the head and neck. 8/17 had been regularly monitored for exposure to ionizing radiation due to their particular work at the Los Alamos National Laboratory. 2/17 had had significant workplace-related exposure to ionizing radiation from atmospheric weapons testing fieldwork. A known risk factor, ionizing radiation, is hence a likely explanation for the observed cluster.
West Nile Virus Surveillance in New York City 2000 Data: Simulation/Testing of Prospective Surveillance System 2001 Data: Real Time Implementation of Daily Prospective Surveillance
2000 Data - Dead birds reported by the public - Simulation of a daily prospective surveillance system - Start date: June 1, 2000.
Major epicenter on Staten Island
- Dead bird surveillance system: June 14
- Positive bird report: July 16 (collected July 5)
- Positive mosquito trap: July 24 (collected July 7)
- Human case report: July 28 (onset July 20)
2001 Data Real time prospective surveillance Daily analyses starting June 22
Syndromic Surveillance
- Symptoms of disease such as diarrhea, respiratory problems, headache, etc.
- Earlier reporting than diagnosed disease
- Less specific, more noise
Hospital Emergency Admissions in New York City
- Hospital emergency admissions data from a majority of New York City hospitals.
- At midnight, hospitals report the last 24 hours of data to the New York City Department of Health.
- A spatial scan statistic analysis is performed every morning.
- If there is an alarm, a local investigation is conducted.
Other Syndromic Surveillance Data Sources
- 911 Ambulance Dispatches
- Pharmacy Sales
- Employee Sick Leave
- Physician Visits
- Veterinary Clinic Visits
Conclusions The space-time scan statistic can serve as an important tool in prospective systematic time-periodic geographical surveillance for the early detection of disease outbreaks. It is possible to detect emerging clusters, and we can adjust for the multiple tests performed over the years. The method can be used for different diseases.
References
Early Detection: Kulldorff M. Prospective time-periodic geographical disease surveillance using a scan statistic. J Royal Statistical Society, A164:61-72,
West Nile: Mostashari F, Kulldorff M, Miller J. Dead bird clustering: An early warning system for West Nile virus activity. Under review.
Software: Kulldorff M, Rand K, Gherman G, Williams G, DeFrancesco D. SaTScan v.2.1: Software for the spatial and space-time scan statistics. Bethesda, MD: National Cancer Institute,