A Wavelet-based Anomaly Detector for Disease Outbreaks Thomas Lotze Galit Shmueli University of Maryland College Park Sean Murphy Howard Burkom Johns Hopkins.

A Wavelet-based Anomaly Detector for Disease Outbreaks Thomas Lotze Galit Shmueli University of Maryland College Park Sean Murphy Howard Burkom Johns Hopkins University Applied Physics Lab

Outline ● Motivation ● Wavelet method ● Difficulties ● Preconditioning ● Results

Related Work ● Bakshi – Wavelets in chemical SPC ● Zhang – Baseline wavelets – Normalize syndromic baseline ● Goldenberg et al. – Wavelets in syndromic surveillance

Motivation ● Detecting disease outbreaks – Bioterrorist attacks – Virulent diseases – Early detection saves lives! ● Syndromic data will show outbreaks ● Anomaly detection to find outbreaks faster

Wavelets ● Models a series as a sum of “wavelets” ● Wavelets are at different scales ● Wavelets are local (change over time)

Goldenberg et al., 2002 ● Decompose the series with the desired wavelet ● Use an AR model at each of the detail levels and at the coarsest approximation level to forecast the next point ● Reconstruct the series and obtain the next-day forecast ● Compare the forecast with the actual value, using a control chart to monitor the discrepancy [Diagram labels: X_t, WT, AR(a_L X), AR(d_L X), AR(d_m X), AR(d_1 X), W, SPC]
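The four steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a causal Haar-style split built from trailing means stands in for the wavelet transform, each component gets a simple AR(1) fit, and the control chart is a plain k-sigma rule on past forecast errors. The function names and the `warmup`, `levels`, and `k` parameters are hypothetical.

```python
from statistics import mean, pstdev

def trailing_components(x, levels):
    """Split x[t] into d_1[t] + ... + d_L[t] + a_L[t] using trailing means.

    Assumption: a causal Haar-style split, not the slide's exact transform.
    """
    start = 2 ** levels - 1                      # first index with a full window

    def tmean(t, width):
        return sum(x[t - width + 1:t + 1]) / width

    comps = []
    for j in range(1, levels + 1):               # detail at scale j
        comps.append([tmean(t, 2 ** (j - 1)) - tmean(t, 2 ** j)
                      for t in range(start, len(x))])
    comps.append([tmean(t, 2 ** levels) for t in range(start, len(x))])
    return comps                                 # details first, approximation last

def ar1_forecast(series):
    """One-step AR(1) forecast around the series mean (least-squares slope)."""
    m = mean(series)
    z = [v - m for v in series]
    den = sum(v * v for v in z[:-1])
    phi = sum(a * b for a, b in zip(z[:-1], z[1:])) / den if den else 0.0
    return m + phi * z[-1]

def wavelet_forecast(history, levels=3):
    """Forecast each component separately, then add the forecasts back up."""
    return sum(ar1_forecast(c) for c in trailing_components(history, levels))

def monitor(series, warmup=40, levels=3, k=3.0):
    """Return the days whose forecast error exceeds k sigma of past errors."""
    errors, alarms = [], []
    for t in range(warmup, len(series)):
        err = series[t] - wavelet_forecast(series[:t], levels)
        if len(errors) >= 10 and abs(err) > k * pstdev(errors):
            alarms.append(t)
        errors.append(err)
    return alarms
```

Calling `monitor(daily_counts)` returns the indices of flagged days. Because the components telescope, their sum reproduces the original series exactly, so the summed forecasts are a forecast of the series itself.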

Difficulties ● Holidays ● Non-stationary – Day of week – Seasonal ● Noisy ● Outbreaks are not labeled ● Outbreak pattern not known in advance

Preconditioning ● Differs from Goldenberg et al. ● Replace holidays – Use the value from one week earlier ● Day-of-week – Ratio to moving average
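As a minimal sketch of the holiday step, assuming the holidays are supplied as a set of day indices (the function name and arguments are illustrative, not from the talk):

```python
def replace_holidays(counts, holiday_idx):
    """Replace each holiday count with the (already cleaned) value 7 days earlier."""
    out = list(counts)
    for i in sorted(holiday_idx):
        if i >= 7:                 # the first week has no earlier value to copy
            out[i] = out[i - 7]
    return out
```

Processing holidays in ascending order means that a holiday falling exactly one week after another holiday inherits the replaced value rather than the raw holiday count.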

Evaluation: Simulated Outbreaks ● Real data from 5 cities, respiratory (Resp) and gastrointestinal (GI) syndromes ● Simulated outbreak patterns inserted ● Each pattern adds a specific number of syndrome counts over several days ● Size is normalized by the standard deviation of recent days ● Inserted at different starting points within the sample data ● Average detection rates vs. false alarm rates are then computed to create ROC curves
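The evaluation loop might look roughly like the following. The 28-day window for "recent days", the `size` multiplier, and both function names are assumptions made for illustration; the slide does not give these details.

```python
from statistics import pstdev

def inject_outbreak(counts, start, shape, size=1.0, window=28):
    """Add an outbreak `shape` at `start`, scaled by the std of recent days."""
    if start < 2:
        raise ValueError("need at least two earlier days to estimate the scale")
    recent = counts[max(0, start - window):start]
    scale = size * pstdev(recent)
    out = list(counts)
    for k, weight in enumerate(shape):
        if start + k < len(out):
            out[start + k] += scale * weight
    return out

def roc_points(clean_scores, outbreak_scores, thresholds):
    """(false alarm rate, detection rate) for each alarm threshold."""
    points = []
    for th in thresholds:
        far = sum(s > th for s in clean_scores) / len(clean_scores)
        det = sum(s > th for s in outbreak_scores) / len(outbreak_scores)
        points.append((far, det))
    return points
```

Repeating `inject_outbreak` over many starting points, scoring each modified series with the detector, and passing the scores to `roc_points` gives the averaged curve described on the slide.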

Results ● Comparable to Holt-Winters ● Not amazing

Results ● Preconditioning is important ● Detection is much better when preconditioned

Results ● Easier to detect on some days than others ● Days with low counts ● Daily preconditioning not sufficient

Summary ● Wavelets are a fairly good detection method ● Preconditioning is very important ● Day-of-week not fully accounted for

Questions? ● More details on wavelets method? ● Difficulties? ● Other outbreak signals? ● Future work? ● Will Microsoft survive Bill Gates' stepping down?

Bonus: More on Wavelets ● Level 1: – Run the data through a low-pass filter. This gives the approximation coefficients – Run the data through a high-pass filter. This gives the detail coefficients – Down-sample – Reconstruct approximation and detail by up-sampling and running “reconstruction” filters. ● Level 2 and on: – Repeat the steps by applying them to the previous level approximation coefficients.
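For the Haar wavelet, one level of this filter-and-down-sample scheme reduces to scaled pairwise sums and differences. The sketch below assumes an even-length input and one particular sign convention for the detail coefficients; the function names are illustrative.

```python
import math

S = 1 / math.sqrt(2)

def haar_analysis(x):
    """One level: low-pass and high-pass filter, keeping every second output."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]  # low-pass
    detail = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]  # high-pass
    return approx, detail

def haar_synthesis(approx, detail):
    """Up-sample and apply the reconstruction filters to recover the input."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([S * (a + d), S * (a - d)])
    return x
```

Level 2 is obtained by calling `haar_analysis` again on the level-1 approximation coefficients, and so on for deeper levels.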

Bonus: Wavelets on Cough Medication Sales ● Haar wavelet decomposition filters, followed by downsampling: h = [1/sqrt(2), 1/sqrt(2)], g = [1/sqrt(2), -1/sqrt(2)] ● Reconstruction filters, applied after upsampling: h* = [1/sqrt(2), 1/sqrt(2)], g* = [-1/sqrt(2), 1/sqrt(2)] ● In general: s = a5 + d1 + d2 + … + d5
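The additive form s = a5 + d1 + … + d5 can be checked numerically. For the Haar wavelet, the reconstructed approximation at level j is the mean over aligned blocks of 2^j samples, and each detail is the difference between successive approximations. The sketch below uses that shortcut instead of explicit filtering, and assumes the series length is a multiple of 2^levels; the function names are illustrative.

```python
def block_means(x, width):
    """Piecewise-constant series holding the mean of each aligned block."""
    out = []
    for start in range(0, len(x), width):
        block = x[start:start + width]
        out.extend([sum(block) / width] * width)
    return out

def haar_components(x, levels):
    """Return (a_L, [d_1, ..., d_L]), all full length, with x = a_L + sum(d_j)."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be a multiple of 2**levels")
    details, prev = [], list(x)
    for j in range(1, levels + 1):
        cur = block_means(x, 2 ** j)
        details.append([p - c for p, c in zip(prev, cur)])  # d_j = a_(j-1) - a_j
        prev = cur
    return prev, details
```

With `levels=5` this yields the a5 and d1 to d5 components named on the slide, for any series whose length is a multiple of 32.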

Bonus: Wavelet Prediction ● Additional details: ● 5-level decomposition – Can be performed with more or fewer levels ● SWT: Fill in “holes” – Perform a decomposition for every possible position – Series are no longer independent ● Edge issue – Prediction is not possible at all time steps – Solution: construct wavelets “backwards” from most recent observations

Bonus: Ratio-to-Moving-Average ● Way of normalizing day-of-week effects ● 1: Determine moving averages – a(i) = (x(i-3) + x(i-2) + … + x(i+3)) / 7 ● 2: Determine ratio (“raw seasonal”) for each day – r(i) = x(i) / a(i) ● 3: Determine avg. ratio for each day of the week – r(Mon) = sum(r(i): i is Mon) / count(i is Mon) ● 4: Normalize ratios to sum to 1 – r'(Mon) = r(Mon) / (r(Mon) + … + r(Sun)) ● 5: Divide each day by its normalized ratio – x'(i) = x(i) / r'(Mon) for i a Monday
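The five steps translate fairly directly into code. In this sketch the `dow` argument (a day-of-week index from 0 to 6 for each observation) and the `target_sum` parameter are assumptions added for illustration. With the slide's normalization (`target_sum=1.0`) the adjusted series is rescaled by roughly a factor of 7; setting `target_sum=7.0` makes the ratios average 1 and keeps the adjusted counts on the original scale.

```python
def ratio_to_moving_average(x, dow, target_sum=1.0):
    """Day-of-week adjustment; returns (normalized ratios, adjusted series)."""
    sums, counts = [0.0] * 7, [0] * 7
    for i in range(3, len(x) - 3):
        a = sum(x[i - 3:i + 4]) / 7              # step 1: centred 7-day average
        if a > 0:
            sums[dow[i]] += x[i] / a             # step 2: raw seasonal ratio
            counts[dow[i]] += 1
    if 0 in counts:
        raise ValueError("every day of the week must appear at least once")
    r = [s / c for s, c in zip(sums, counts)]    # step 3: average ratio per weekday
    total = sum(r)
    r_norm = [target_sum * v / total for v in r]           # step 4: normalize
    adjusted = [v / r_norm[d] for v, d in zip(x, dow)]     # step 5: divide
    return r_norm, adjusted
```

A weekday whose counts are all zero would produce a zero ratio and a division error in step 5, so such series would need separate handling.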

Bonus: Possible Extensions ● Multivariate wavelets ● Each day-of-week as a separate series ● Different wavelet shapes ● Different wavelet scale basis ● Different preconditioning ● Different sizes, lengths of outbreaks ● Don't normalize outbreak by standard deviation of recent days – Show when outbreaks are harder to detect ● Estimate confidence based on experience ● Boosting

Bonus: Wavelet Prediction ● Decompose into timescales ● Use AR or EWMA to predict for each timescale ● Reconstruct prediction from predicted timescales ● Monitor deviations from prediction

Bonus: Alternative Preconditioning ● Regression using day-of-week predictors ● 7-day differencing ● Holt-Winters as preconditioner ● Seasonal preconditioning
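Two of these alternatives are short enough to sketch. The regression shown uses day-of-week indicator variables as the only predictors, in which case the least-squares fit is simply the mean for each weekday; a fuller model with trend or holiday terms would need a proper regression routine. The function names and the `dow` argument are illustrative assumptions.

```python
def seven_day_difference(x):
    """7-day differencing: compare each day with the same weekday last week."""
    return [x[i] - x[i - 7] for i in range(7, len(x))]

def day_of_week_residuals(x, dow):
    """Residuals from a regression on day-of-week indicators only.

    With indicators as the sole predictors, the fitted value for each
    observation is the mean of all observations on that weekday.
    """
    groups = {}
    for value, day in zip(x, dow):
        groups.setdefault(day, []).append(value)
    means = {day: sum(vals) / len(vals) for day, vals in groups.items()}
    return [value - means[day] for value, day in zip(x, dow)]
```

Either output can then be passed to the detector in place of the ratio-to-moving-average series.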

Bonus: Other Outbreak Signals ● Normalized by total size ● Lognormal, exponential, step ● Spike is much easier than the others
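One way to generate these signal shapes, each normalized so the daily weights sum to 1, is sketched below. The parameter values (`rate`, `mu`, `sigma`), the choice of exponential growth rather than decay, and the use of the lognormal density sampled at whole days are all assumptions made for illustration; the slide does not specify them.

```python
import math

def normalize(shape):
    """Scale a shape so that its daily weights sum to 1 (total size)."""
    total = sum(shape)
    return [v / total for v in shape]

def spike(days):
    return normalize([1.0] + [0.0] * (days - 1))

def step(days):
    return normalize([1.0] * days)

def exponential(days, rate=0.5):
    # Assumed to be exponential growth over the outbreak period.
    return normalize([math.exp(rate * k) for k in range(days)])

def lognormal(days, mu=1.0, sigma=0.5):
    # Lognormal density evaluated at days 1, 2, ..., then renormalized.
    pdf = [math.exp(-(math.log(t) - mu) ** 2 / (2 * sigma ** 2))
           / (t * sigma * math.sqrt(2 * math.pi))
           for t in range(1, days + 1)]
    return normalize(pdf)
```

Multiplying any of these shapes by a chosen total size gives the extra counts to add on each outbreak day.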