Towards Improved Sensitivity, Specificity, and Timeliness of Syndromic Surveillance Systems Anna L. Buczak, PhD, Linda J. Moniz, PhD, Joseph Lombardo,

1 Towards Improved Sensitivity, Specificity, and Timeliness of Syndromic Surveillance Systems
Anna L. Buczak, PhD, Linda J. Moniz, PhD, Joseph Lombardo, MS
PHIN 2008, Session F7, August 27, 2008
This work was supported by the JHU/APL Internal Research and Development (IR&D) Program.

2 Outline
- Motivation
- Classical disease outbreak detection methods
- Novel machine learning methodology for disease outbreak detection
- Results
- Conclusions
- Future directions

3 Motivation
- Develop a methodology for reliable detection of disease outbreaks
- Most existing methods use univariate statistics, i.e., they look at each syndrome/subsyndrome/age/gender/etc. combination separately
  - Result: a proliferation of false alarms
- Our goal: develop multivariate models that detect abnormal relationships between time series
  - Result: fewer false alarms

4 Approaches to Outbreak Detection
- Two broad types of approaches:
  - Anomaly detection
    - Detectors flag any anomalous behavior
    - Statistical methods: CUSUM, C1, C2, C3, EWMA
    - Machine learning methods: clustering techniques, SVMs
  - Specific disease outbreak detection
    - Detectors are geared towards a specific disease
    - In-depth knowledge of how the given disease manifests is needed
    - A separate model is needed for each disease
    - Methods: Bayesian networks, Markov Decision Processes
- This talk concentrates on anomaly detection methods

5 Statistical Disease Outbreak Detection Methods
- These methods often determine whether the counts in a given syndrome/subsyndrome time series are unusually high and thus worth investigating
- Statistical detection algorithms: C2, C3, EWMA
- C2 & C3: 7-day baseline, 2-day guard band
  - Individual day statistic for day $j$ with lag $n$: $S_{j,n} = \max\{0, (\mathrm{Count}_j - [\mu_n + \sigma_n]) / \sigma_n\}$, where
    - $\mu_n$ is the 7-day average with an $n$-day lag (so $\mu_3$ is the mean of the counts on days $j-9$ through $j-3$)
    - $\sigma_n$ is the standard deviation of the same 7-day window
  - C2: the C2 statistic for day $k$ is $S_{k,3}$ (3-day lag, i.e., a 2-day guard band)
    - Alerts if the individual day statistic exceeds the threshold
  - C3: the C3 statistic for day $k$ is $S_{k,3} + S_{k-1,3} + S_{k-2,3}$
    - Alerts when the statistic for day $k$ exceeds the threshold
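
The C2/C3 definitions above can be sketched in a few lines of Python. This is a minimal sketch that follows the slide's formulas only; the use of the sample standard deviation and the small floor on sigma (so a flat baseline does not divide by zero) are assumptions, and because the slide gives no numeric alert threshold, none is hard-coded.

```python
from statistics import mean, stdev


def day_statistic(counts, j, lag=3, baseline=7, min_sigma=1e-6):
    """Individual day statistic S_{j,n} = max(0, (Count_j - (mu_n + sigma_n)) / sigma_n).

    The baseline is the `baseline` days ending `lag` days before day j.
    With lag=3 and baseline=7 that is days j-9 .. j-3, i.e. a 2-day guard band.
    """
    start = j - lag - baseline + 1
    if start < 0:
        raise ValueError("not enough history for the baseline window")
    window = counts[start : j - lag + 1]
    mu = mean(window)
    # Assumption: sample standard deviation, floored so a flat baseline
    # does not cause a division by zero.
    sigma = max(stdev(window), min_sigma)
    return max(0.0, (counts[j] - (mu + sigma)) / sigma)


def c2(counts, k):
    """C2 statistic for day k: S_{k,3}."""
    return day_statistic(counts, k, lag=3)


def c3(counts, k):
    """C3 statistic for day k: S_{k,3} + S_{k-1,3} + S_{k-2,3}."""
    return sum(day_statistic(counts, k - i, lag=3) for i in range(3))


# Toy daily counts (made up): an alternating baseline, then a spike on day 11.
counts = [8, 12, 8, 12, 8, 12, 10, 8, 12, 10, 10, 20]
print(c2(counts, 11), c3(counts, 11))  # 4.0 4.0
```

Here days 9 and 10 stay below $\mu + \sigma$, so they contribute 0 to C3, and C3 equals C2 on the spike day.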

6 Sample Univariate Algorithm Output: C2
(Figure: Accessible Alerting Algorithms, courtesy of Dr. Howard Burkom, JHU/APL)

7 Sample Univariate Algorithm Output: C3
(Figure: Accessible Alerting Algorithms, courtesy of Dr. Howard Burkom, JHU/APL)

8 EWMA
- Exponentially Weighted Moving Average (EWMA): an average that puts the most weight on the most recent count $X_t$
- 28-day baseline, 2-day guard band
- Test statistic: alert when the test statistic >= threshold (often threshold = 3)
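
The slide's own test-statistic formula did not survive the transcript, so the sketch below assumes the textbook EWMA control-chart form: $S_t = \omega X_t + (1-\omega) S_{t-1}$ and $T_t = (S_t - \mu) / (\sigma \sqrt{\omega / (2 - \omega)})$, with $\mu$ and $\sigma$ taken from a 28-day baseline that ends before a 2-day guard band. Treating omega as the weight on the newest count (matching "most weight on recent count X_t" and omega = 0.8 on the results slide), the initialization $S_0 = X_0$, and the sigma floor are all assumptions, not details confirmed by the talk.

```python
from math import sqrt
from statistics import mean, stdev


def ewma_series(counts, omega):
    """Smoothed values S_t = omega * X_t + (1 - omega) * S_{t-1}, with S_0 = X_0 (assumed)."""
    smoothed = [float(counts[0])]
    for x in counts[1:]:
        smoothed.append(omega * x + (1 - omega) * smoothed[-1])
    return smoothed


def ewma_statistic(counts, t, omega=0.8, baseline=28, guard=2, min_sigma=1e-6):
    """Assumed test statistic (S_t - mu) / (sigma * sqrt(omega / (2 - omega))).

    mu and sigma come from the `baseline` days that end just before the
    guard band, i.e. days t-guard-baseline .. t-guard-1.
    """
    end = t - guard                 # exclusive end: skips days t-1 .. t-guard
    start = end - baseline
    if start < 0:
        raise ValueError("not enough history for the baseline window")
    window = counts[start:end]
    mu = mean(window)
    sigma = max(stdev(window), min_sigma)  # sample std, floored (assumption)
    s_t = ewma_series(counts[: t + 1], omega)[-1]
    return (s_t - mu) / (sigma * sqrt(omega / (2 - omega)))


# Made-up series: 30 alternating days, two quiet days, then a spike on day 32.
counts = [8, 12] * 15 + [10, 10, 25]
threshold = 3
for t in (31, 32):
    stat = ewma_statistic(counts, t)
    print(t, round(stat, 2), stat >= threshold)
```

The $\sqrt{\omega/(2-\omega)}$ factor rescales the baseline spread to the asymptotic spread of the smoothed series, which is why a fixed threshold such as 3 can be used.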

9 Sample Univariate Algorithm Output: EWMA
(Figure: Accessible Alerting Algorithms, courtesy of Dr. Howard Burkom, JHU/APL)

10 Machine Learning for Disease Outbreak Detection
- Approach:
  - Learn a model of regular activity using a one-class Support Vector Machine (SVM)
  - Detect an anomaly based on its dissimilarity from regular activity
- Advantages:
  - Only normal-behavior data are needed
  - Detectors flag any anomalous behavior
  - Capable of detecting anomalies caused by new pathogens
  - No need for a separate model for each disease
- Figure: Normal vs. Anomalous. In SVMs, a hyperplane in n-dimensional space divides the data into two classes with the largest margin.

11 Support Vector Machines (SVM)
- SVM learning algorithm developed by Vapnik, based on statistical learning theory
- Learning problem:
  - Find the best separating hyperplane dividing two classes, i.e., the one with the largest margin
  - To construct a classifier for a given data set, the SVM solves a quadratic programming problem
  - To solve non-linear problems, the SVM employs inner-product kernels such as polynomial, RBF, sigmoid, etc.
  - SVMs build the decision surface using only the training examples near the boundary region
  - Data points at the margin are called support vectors
- One-class SVM used
  - Proposed by Schölkopf et al. (1999)
  - Only positive training examples (normal data) are needed
- SVM classifiers are based on hyperplanes corresponding to decision functions $f(x) = \operatorname{sgn}(w \cdot x + b)$, where $w$ is the weight vector, $b$ is the threshold of the decision rule, and $x$ is the classified pattern.
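
To make the decision functions concrete, the sketch below evaluates the linear rule $\operatorname{sgn}(w \cdot x + b)$ and the kernelized one-class form $\operatorname{sgn}(\sum_i \alpha_i k(x_i, x) - \rho)$ with an RBF kernel. It only evaluates an already-trained model: the support vectors, coefficients $\alpha_i$, offset $\rho$, and kernel width $\gamma$ are made-up illustrative values, not parameters from this work. Training (solving the quadratic program) would normally be left to an SVM library such as LIBSVM or scikit-learn's `OneClassSVM`.

```python
from math import exp


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def linear_decision(w, b, x):
    """f(x) = sgn(w . x + b); a score of exactly 0 is mapped to +1 here."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def one_class_decision(support_vectors, alphas, rho, x, gamma=1.0):
    """Kernelized one-class decision f(x) = sgn(sum_i alpha_i k(x_i, x) - rho).

    +1 means "inside the region learned from normal data", -1 means "anomalous".
    """
    score = sum(a * rbf_kernel(sv, x, gamma) for sv, a in zip(support_vectors, alphas)) - rho
    return 1 if score >= 0 else -1


# Hypothetical trained model: these numbers are illustrative only.
support_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
alphas = [1 / 3, 1 / 3, 1 / 3]
rho = 0.5

print(one_class_decision(support_vectors, alphas, rho, (0.3, 0.3)))  # 1 (normal)
print(one_class_decision(support_vectors, alphas, rho, (5.0, 5.0)))  # -1 (anomalous)
```

Only the support vectors enter the sum, which is the practical meaning of "SVMs build the decision surface using only those training examples near the boundary".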

12 Details of the Approach
Figure (block diagram): Data Streams (3021) -> EWMA Smoothing -> SVM Module (GI SVM, Fever SVM, Resp SVM, …, Combined Syn SVM) -> Normal / Anomalous
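
The block diagram can be read as a small pipeline: smooth every stream, group the smoothed values by syndrome, run one detector per syndrome, and fuse the votes. The sketch below is a hypothetical rendering of that flow, not the authors' code: the function names, the grouping of streams into per-syndrome feature vectors, and the OR fusion rule (any detector voting -1 makes the day anomalous) are all assumptions, and the toy detector is a simple threshold standing in for a trained SVM.

```python
from typing import Callable, Dict, List, Sequence

# A trained syndrome-level detector: +1 = normal, -1 = anomalous (one-class SVM convention).
Detector = Callable[[Sequence[float]], int]


def ewma_smooth(series: Sequence[float], omega: float) -> List[float]:
    """S_t = omega * X_t + (1 - omega) * S_{t-1}, with S_0 = X_0 (assumed form)."""
    out = [float(series[0])]
    for x in series[1:]:
        out.append(omega * x + (1 - omega) * out[-1])
    return out


def classify_day(streams: Dict[str, Sequence[float]],
                 syndrome_of: Dict[str, str],
                 detectors: Dict[str, Detector],
                 day: int,
                 omega: float = 0.8) -> str:
    # 1. EWMA-smooth every stream up to `day` and keep that day's smoothed value.
    smoothed = {name: ewma_smooth(s[: day + 1], omega)[-1] for name, s in streams.items()}
    # 2. Group the smoothed values into one feature vector per syndrome
    #    (stream names sorted so the feature order is stable).
    features: Dict[str, List[float]] = {}
    for name in sorted(smoothed):
        features.setdefault(syndrome_of[name], []).append(smoothed[name])
    # 3. Run each syndrome's detector, then 4. fuse with a simple OR rule (assumption):
    #    the day is anomalous if any detector votes -1.
    votes = [det(features[syn]) for syn, det in detectors.items() if syn in features]
    return "Anomalous" if any(v == -1 for v in votes) else "Normal"


def toy_detector(v: Sequence[float]) -> int:
    """Stand-in detector (NOT an SVM): flags any smoothed value above 20."""
    return -1 if max(v) > 20 else 1


streams = {"GI - Total": [5, 5, 5], "GI - Male": [2, 2, 2], "Fever - Total": [3, 3, 30]}
syndrome_of = {"GI - Total": "GI", "GI - Male": "GI", "Fever - Total": "Fever"}
detectors = {"GI": toy_detector, "Fever": toy_detector}
print(classify_day(streams, syndrome_of, detectors, day=1))  # Normal
print(classify_day(streams, syndrome_of, detectors, day=2))  # Anomalous
```

Keeping the detectors as plain callables lets a trained one-class SVM per syndrome be dropped in later without changing the smoothing or fusion steps.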

13 Data Sets
- Training
  - ESSENCE data, no outbreaks (180 days)
  - Flu season weeks removed from the training data
- Testing (recall)
  - ESSENCE data, no outbreaks (132 days not used in training)
  - Simulated outbreaks added to real background data:
    - Tularemia
    - Hep A: Sets 1, 2 & 3
  - Real problem
- Figure: Real background data with simulated outbreaks

14 3021 Data Streams
- Time series for each syndrome and subsyndrome: total / gender / age group / county
  - Stratifications (19): Total, Male, Female, Age 0-4, Age 5-17, …, Alexandria, …, Washington
  - Syndromes (11): Bot_Like, Fever, GI, Hem_Ill, Loc_Les, Lymph, Neuro, …
  - Subsyndromes (148): AbdominalCramps, AbdominalPain, AbdominalPainGroup, AbdominalTenderness, Abscess, AcuteBloodAbnormalities, …, Unresponsive, UrinaryTract, ViralSyndrome, Vomiting, Wheezing
- Number of data streams: (11+148)*19 = 3021
- Example streams: Fever - All; GI - Male; Viral Syndrome - 0-4; Vomiting - Female
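
The stream count is a cross product: every syndrome or subsyndrome is paired with every stratification. The sketch below builds the stream names that way; only the labels visible on the slide are real, and every `Syndrome_<n>`, `Subsyndrome_<n>`, and `Stratum_<n>` name is a hypothetical placeholder for a label the slide elides.

```python
from itertools import product


def build_streams(groups, strata):
    """One time series name per (syndrome or subsyndrome, stratum) pair."""
    return [f"{group} - {stratum}" for group, stratum in product(groups, strata)]


# Labels visible on the slide; every "_<n>" name below is a hypothetical placeholder
# standing in for a label elided on the slide.
named_syndromes = ["Bot_Like", "Fever", "GI", "Hem_Ill", "Loc_Les", "Lymph", "Neuro"]
named_subsyndromes = [
    "AbdominalCramps", "AbdominalPain", "AbdominalPainGroup", "AbdominalTenderness",
    "Abscess", "AcuteBloodAbnormalities", "Unresponsive", "UrinaryTract",
    "ViralSyndrome", "Vomiting", "Wheezing",
]
named_strata = ["Total", "Male", "Female", "Age 0-4", "Age 5-17", "Alexandria", "Washington"]

syndromes = named_syndromes + [f"Syndrome_{i}" for i in range(len(named_syndromes), 11)]
subsyndromes = named_subsyndromes + [f"Subsyndrome_{i}" for i in range(len(named_subsyndromes), 148)]
strata = named_strata + [f"Stratum_{i}" for i in range(len(named_strata), 19)]

streams = build_streams(syndromes + subsyndromes, strata)
print(len(streams))  # 3021 == (11 + 148) * 19
```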

15 Results: Univariate EWMA
- omega = 0.8, baseline = 28 days, guard band = 2 days
- Alert when EWMA test statistic >= 3
- Average number of alarms per day on normal data: 28.1
  - Proliferation of false alarms

16 Results
Initial SVM results
- Only the Fever SVM and GI SVM were trained
- Results per day:
  - Specificity: 94.7%
  - Sensitivity: 54.8%
- Results per outbreak:
  - Specificity: 94.7%
  - Sensitivity: 100%

Polling Univariate EWMA (PU EWMA) results
- omega = 0.8, baseline = 28 days, guard band = 2 days
- Results per day (th = polling threshold):
  - th = 53: Specificity 94.0%, Sensitivity 29.0%
  - th = 44: Specificity 90.9%, Sensitivity 41.9%
  - th = 35: Specificity 75.8%, Sensitivity 48.4%
  - th = 32: Specificity 68.9%, Sensitivity 54.8%
- Results per outbreak:
  - Specificity: 94.0%
  - Sensitivity: 80%

- At similar specificity (94.7% vs. 94.0%), the SVM's sensitivity is higher by 25.8 percentage points (54.8% vs. 29.0%)
- At the same sensitivity (54.8%), the SVM's specificity is higher by 25.8 percentage points (94.7% vs. 68.9%)
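
The per-day and per-outbreak figures above can be computed as follows. This is a sketch under assumptions: the polling rule (raise a system-level alarm on a day when at least `th` of the univariate EWMA detectors alert) is our reading of "polling", and the alert counts and outbreak spans below are made up, not data from the talk. Per-day sensitivity is the fraction of outbreak days with an alarm, per-day specificity the fraction of non-outbreak days without one, and per-outbreak sensitivity the fraction of outbreaks with at least one alarm day.

```python
def polling_alarms(alert_counts, th):
    """Assumed polling rule: a system-level alarm on a day when at least `th`
    univariate detectors alert on that day."""
    return [count >= th for count in alert_counts]


def per_day_rates(alarms, outbreak_days):
    """Return (sensitivity, specificity) over days.

    alarms[d] and outbreak_days[d] are booleans; both classes must be present.
    """
    pairs = list(zip(alarms, outbreak_days))
    tp = sum(1 for a, o in pairs if a and o)
    fn = sum(1 for a, o in pairs if not a and o)
    tn = sum(1 for a, o in pairs if not a and not o)
    fp = sum(1 for a, o in pairs if a and not o)
    return tp / (tp + fn), tn / (tn + fp)


def per_outbreak_sensitivity(alarms, outbreaks):
    """Fraction of outbreaks (inclusive (first_day, last_day) spans) with at least one alarm day."""
    detected = sum(1 for start, end in outbreaks if any(alarms[start : end + 1]))
    return detected / len(outbreaks)


# Made-up example: 8 days, two outbreaks (days 1-2 and 4-5).
alert_counts = [10, 60, 40, 5, 50, 55, 48, 0]
outbreaks = [(1, 2), (4, 5)]
outbreak_days = [any(s <= d <= e for s, e in outbreaks) for d in range(len(alert_counts))]

alarms = polling_alarms(alert_counts, th=45)
print(per_day_rates(alarms, outbreak_days))         # (0.75, 0.75)
print(per_outbreak_sensitivity(alarms, outbreaks))  # 1.0
```

Sweeping `th` trades specificity for sensitivity, which is the pattern in the PU EWMA rows above.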

17 17 Results

18 Timeliness of Detection
- SVM
  - Specificity: 94.7%
  - Average number of days to detect an outbreak: 1.67
  - All outbreaks detected
- PU EWMA
  - Specificity: 94.0%
    - Average number of days to detect an outbreak: 2.25
    - 2 outbreaks not detected at all
  - Specificity: 68.9%
    - Average number of days to detect an outbreak: 1.8
    - 1 outbreak not detected at all
- The SVM obtains better timeliness of detection, higher specificity for the same sensitivity, and higher sensitivity for the same specificity than PU EWMA
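
Timeliness as reported above (average days to detect, with undetected outbreaks counted separately) can be sketched like this. The exact counting convention is not stated on the slide; this sketch assumes that an alarm on an outbreak's first day counts as 1 day to detect and that the average is taken over detected outbreaks only. The alarm sequence and outbreak spans are made up.

```python
def timeliness(alarms, outbreaks):
    """Average days-to-detect over detected outbreaks, plus the number missed.

    Assumption: an alarm on the first outbreak day counts as 1 day to detect.
    """
    delays, missed = [], 0
    for start, end in outbreaks:
        first = next((d for d in range(start, end + 1) if alarms[d]), None)
        if first is None:
            missed += 1
        else:
            delays.append(first - start + 1)
    average = sum(delays) / len(delays) if delays else None
    return average, missed


# Made-up alarm sequence and outbreak spans (inclusive day indices).
alarms = [False, False, True, False, False, False, False, True, False, False]
outbreaks = [(1, 3), (5, 8), (9, 9)]
print(timeliness(alarms, outbreaks))  # (2.5, 1)
```

Reporting misses separately matters: a detector that misses its slowest outbreaks can look faster on the average than one that catches them all.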

19 Conclusions
- A new approach for disease outbreak detection was designed
- Very promising initial results were obtained on normal data, 4 simulated outbreaks, and one real problem
- The initial SVM results compare favorably with the PU EWMA results

20 Future Directions
- Proof of concept:
  - Train the remaining SVMs
  - A more sophisticated decision fusion module
  - Testing on many different types of simulated outbreaks (varying amplitude and day of injection)
  - Testing on real influenza outbreaks
- Explanation capability for the SVM system:
  - Capability for users to drill down to find the reason for an alert

21 Contact info:
Dr. Anna L. Buczak
National Security Technology Department
Johns Hopkins University Applied Physics Laboratory
tel. 443-778-9350
e-mail: anna.buczak@jhuapl.edu

