A Wavelet-based Anomaly Detector for Disease Outbreaks Thomas Lotze Galit Shmueli University of Maryland College Park Sean Murphy Howard Burkom Johns Hopkins.


1 A Wavelet-based Anomaly Detector for Disease Outbreaks Thomas Lotze Galit Shmueli University of Maryland College Park Sean Murphy Howard Burkom Johns Hopkins University Applied Physics Lab

2 Outline ● Motivation ● Wavelet method ● Difficulties ● Preconditioning ● Results

3 Related Work
● Bakshi
  – Wavelets in Chemical SPC
● Zhang
  – Baseline wavelets
  – Normalize syndromic baseline
● Goldenberg et al.
  – Wavelets in syndromic surveillance

4 Motivation
● Detecting disease outbreaks
  – Bioterrorist attacks
  – Virulent diseases
  – Early detection saves lives!
● Syndromic data will show outbreaks
● Anomaly detection to find outbreaks faster

5 Wavelets ● Models a series as a sum of “wavelets” ● Wavelets are at different scales ● Wavelets are local (change over time)

6 Goldenberg et al., 2002
[Figure: flow diagram with labels X_t, W, AR(a_L X), AR(d_L X), AR(d_m X), AR(d_1 X), W^T, SPC( )]
● Decompose the series with the desired wavelet
● Use an AR model at each of the detail levels and at the coarsest approximation level to forecast the next point
● Reconstruct the series and obtain the next-day forecast
● Compare the forecast with the actual value
● Use a control chart to monitor the discrepancy
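The per-level forecasting step above can be illustrated with a small sketch. The slide does not give the AR order or the fitting method, so this assumes an AR(1) model with intercept, fitted by ordinary least squares to a single level's series; `fit_ar1` and `forecast_next` are hypothetical names used only for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi).

    AR(1) is an assumption made for this sketch; the slide only says "AR".
    """
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0.0:
        # Constant history: fall back to predicting the mean.
        return mean_curr, 0.0
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi


def forecast_next(series):
    """One-step-ahead forecast for one level (detail or approximation)."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]
```

In the scheme on the slide, a forecast like this would be made separately for each detail level and for the coarsest approximation, and the per-level forecasts would then be combined by the inverse transform.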

7 Difficulties
● Holidays
● Non-stationary
  – Day of week
  – Seasonal
● Noisy
● Outbreaks are not labeled
● Outbreak pattern not known in advance

8 Preconditioning
● Differs from Goldenberg et al.
● Replace holidays
  – One week previous
● Day-of-week
  – Ratio to moving average
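A minimal sketch of the holiday replacement, assuming the holidays are supplied as a list of day indices and that "one week previous" means the count from exactly seven days earlier. The function name and the handling of holidays in the first week (left unchanged) are assumptions, not part of the slides.

```python
def replace_holidays(counts, holiday_indices):
    """Return a copy of counts with each holiday replaced by the value 7 days earlier.

    Holidays within the first week have no earlier value and are left as-is
    (an assumption for this sketch).
    """
    cleaned = list(counts)
    for i in sorted(holiday_indices):
        if i >= 7:
            # If the day a week earlier was itself a holiday, its already
            # replaced value is used, because indices are processed in order.
            cleaned[i] = cleaned[i - 7]
    return cleaned
```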

9 Evaluation: Simulated Outbreaks ● Real data from 5 cities, respiratory (Resp) and gastrointestinal (GI) ● Simulated outbreak patterns inserted ● Specific pattern of additional syndromes over several days ● Size is normalized by standard deviation of recent days ● Inserted at different starting points within the sample data ● Average detection rates vs. false alarm rates can be determined to create ROC curves
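The evaluation loop can be sketched as follows. This is one reading of the slide, not the authors' code: the outbreak size is expressed in multiples of the standard deviation of the preceding days (the 28-day window is an assumed value), and each ROC point pairs the fraction of clean days that alarm with the fraction of injected outbreaks that alarm. All names are hypothetical.

```python
import statistics


def inject_outbreak(counts, start, shape, magnitude, window=28):
    """Add an outbreak starting at index `start`.

    shape     : relative extra counts per outbreak day (any positive weights)
    magnitude : total outbreak size, in units of the recent standard deviation
    window    : number of preceding days used for that standard deviation (assumed)
    """
    recent = counts[max(0, start - window):start]
    sd = statistics.pstdev(recent) if len(recent) > 1 else 1.0
    total = sum(shape)
    out = [float(c) for c in counts]
    for k, weight in enumerate(shape):
        if start + k < len(out):
            out[start + k] += magnitude * sd * weight / total
    return out


def roc_points(clean_scores, outbreak_scores, thresholds):
    """Return (false alarm rate, detection rate) for each threshold."""
    points = []
    for t in thresholds:
        false_alarm = sum(s > t for s in clean_scores) / len(clean_scores)
        detection = sum(s > t for s in outbreak_scores) / len(outbreak_scores)
        points.append((false_alarm, detection))
    return points
```

Here `clean_scores` would hold the detector's output on days without an inserted outbreak, and `outbreak_scores` one score per inserted outbreak (for example, the largest score during the outbreak days).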

10 Results ● Comparable to Holt-Winters ● Not amazing

11 Results ● Preconditioning is important ● Detection is much better when preconditioned

12 Results ● Easier to detect on some days than others ● Days with low counts ● Daily preconditioning not sufficient

13 Summary ● Wavelets are a fairly good detection method ● Preconditioning is very important ● Day-of-week not fully accounted for

14 Questions? ● More details on wavelets method? ● Difficulties? ● Other outbreak signals? ● Future work? ● Will Microsoft survive Bill Gates' stepping down?

15 Bonus: More on Wavelets
● Level 1:
  – Run the data through a low-pass filter. This gives the approximation coefficients
  – Run the data through a high-pass filter. This gives the detail coefficients
  – Down-sample
  – Reconstruct approximation and detail by up-sampling and running “reconstruction” filters.
● Level 2 and on:
  – Repeat the steps by applying them to the previous level approximation coefficients.
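As a concrete instance of one decomposition level, here is a minimal sketch using the Haar filters (taps 1/sqrt(2)). Filtering and down-sampling are folded into one step by working on non-overlapping pairs. The sign convention of the detail coefficients and the function names are assumptions of this sketch.

```python
import math

ROOT2 = math.sqrt(2.0)


def haar_step(x):
    """One analysis level: low-pass and high-pass filtering plus down-sampling."""
    if len(x) % 2 != 0:
        raise ValueError("length must be even")
    pairs = [(x[2 * k], x[2 * k + 1]) for k in range(len(x) // 2)]
    approx = [(a + b) / ROOT2 for a, b in pairs]   # low-pass output
    detail = [(a - b) / ROOT2 for a, b in pairs]   # high-pass output
    return approx, detail


def haar_inverse(approx, detail):
    """Up-sample and apply the reconstruction filters; undoes haar_step."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / ROOT2)
        x.append((a - d) / ROOT2)
    return x
```

Level 2 would call `haar_step` again on the `approx` list returned by level 1, and so on for deeper levels.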

16 Bonus: Wavelets on Cough Medication Sales
Haar wavelet:
● Decomposition filters (followed by downsampling): h = [1/sqrt(2), 1/sqrt(2)], g = [1/sqrt(2), -1/sqrt(2)]
● Reconstruction filters (preceded by upsampling): h* = [1/sqrt(2), 1/sqrt(2)], g* = [-1/sqrt(2), 1/sqrt(2)]
In general: s = a5 + d1 + d2 + … + d5
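The identity s = a5 + d1 + … + d5 can be checked numerically. For the Haar wavelet, the reconstructed approximation at level j is the mean of the series over consecutive blocks of 2^j points, and each reconstructed detail is the difference between successive approximations. The sketch below relies on that Haar-specific shortcut; the function names are hypothetical, and the series length is assumed to be a multiple of 2^levels.

```python
def block_means(x, width):
    """Replace each consecutive block of `width` points by its mean."""
    out = []
    for start in range(0, len(x), width):
        block = x[start:start + width]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def haar_components(s, levels=5):
    """Return (a_levels, [d1, ..., d_levels]), each as long as s."""
    if len(s) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels")
    previous = [float(v) for v in s]       # level-0 "approximation" is s itself
    details = []
    for j in range(1, levels + 1):
        smooth = block_means(s, 2 ** j)    # reconstructed approximation a_j
        details.append([p - q for p, q in zip(previous, smooth)])  # d_j
        previous = smooth
    return previous, details
```

Adding the returned approximation and all detail series point by point gives back the original series, because the differences telescope.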

17 Bonus: Wavelet Prediction
● Additional details:
● 5 level decomposition
  – Can be performed with more or fewer
● SWT (stationary wavelet transform): Fill in “holes”
  – Perform a decomposition for every possible position
  – Series are no longer independent
● Edge issue
  – Prediction is not possible at all time steps
  – Solution: construct wavelets “backwards” from most recent observations

18 Bonus: Ratio-to-Moving-Average
● Way of normalizing day-of-week effects
● 1: Determine moving averages
  – a(i) = (x(i-3) + x(i-2) + ... + x(i+3)) / 7
● 2: Determine ratio (“raw seasonal”) for each day
  – r(i) = x(i) / a(i)
● 3: Determine avg. ratio for each day of the week
  – r(Mon) = sum(r(i): i is Mon) / count(i is Mon)
● 4: Normalize ratios to sum to 1
  – r'(Mon) = r(Mon) / (r(Mon) + ... + r(Sun))
● 5: Divide each day by the normalized ratio for its day of the week
  – x'(i) = x(i) / r'(Mon) when i is a Monday, and likewise for the other days
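The five steps translate fairly directly into code. The sketch below assumes strictly positive daily counts, at least 13 days of data (so every weekday gets at least one ratio), and that day 0 of the series falls on weekday `first_dow`; these assumptions and the function name are not from the slides.

```python
def ratio_to_moving_average(x, first_dow=0):
    """Return (adjusted series, normalized day-of-week ratios)."""
    n = len(x)
    if n < 13:
        raise ValueError("need at least 13 days so each weekday has a ratio")
    # Steps 1 and 2: centered 7-day moving average and raw ratio per day.
    raw = {d: [] for d in range(7)}
    for i in range(3, n - 3):
        a = sum(x[i - 3:i + 4]) / 7.0
        raw[(first_dow + i) % 7].append(x[i] / a)
    # Step 3: average ratio for each day of the week.
    r = [sum(raw[d]) / len(raw[d]) for d in range(7)]
    # Step 4: normalize the seven ratios to sum to 1.
    total = sum(r)
    r = [v / total for v in r]
    # Step 5: divide each day by the ratio for its day of the week.
    adjusted = [x[i] / r[(first_dow + i) % 7] for i in range(n)]
    return adjusted, r
```

Because the ratios are normalized to sum to 1 as on the slide, the adjusted series comes out roughly seven times larger than the original. That is a constant rescaling and does not change the shape of the series; multiplying the ratios by 7 would keep the original scale.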

19 Bonus: Possible Extensions
● Multivariate wavelets
● Each day-of-week as a separate series
● Different wavelet shapes
● Different wavelet scale basis
● Different preconditioning
● Different sizes, lengths of outbreaks
● Don't normalize outbreak by standard deviation of recent days
  – Show when outbreaks are harder to detect
● Estimate confidence based on experience
● Boosting

20 Bonus: Wavelet Prediction ● Decompose into timescales ● Use AR or EWMA to predict for each timescale ● Reconstruct prediction from predicted timescales ● Monitor deviations from prediction
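A rough end-to-end sketch of these four steps, under several assumptions: the decomposition is a causal Haar-style one built from trailing means over the last 2, 4, 8, … points (one possible reading of building wavelets “backwards” from the most recent observations), each timescale is forecast with an EWMA, and monitoring is a simple threshold on the forecast error. The function names, smoothing weights and threshold are hypothetical.

```python
def trailing_components(x, t, levels):
    """Split x[t] into [d1, ..., d_levels, a_levels] using trailing means."""
    comps = []
    previous = x[t]
    for j in range(1, levels + 1):
        width = 2 ** j
        smooth = sum(x[t - width + 1:t + 1]) / width  # mean of last 2**j points
        comps.append(previous - smooth)               # detail at scale j
        previous = smooth
    comps.append(previous)                            # coarsest approximation
    return comps                                      # components sum to x[t]


def ewma(values, alpha):
    """Exponentially weighted moving average; the final level is the forecast."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1.0 - alpha) * level
    return level


def wavelet_forecast(x, levels, alphas):
    """One-step-ahead forecast: EWMA per timescale, then sum the forecasts."""
    width = 2 ** levels
    if len(x) < width or len(alphas) != levels + 1:
        raise ValueError("need 2**levels points and one alpha per component")
    rows = [trailing_components(x, t, levels) for t in range(width - 1, len(x))]
    return sum(ewma([row[c] for row in rows], alphas[c])
               for c in range(levels + 1))


def is_alarm(actual, forecast, resid_sd, k=3.0):
    """Flag a day whose count exceeds the forecast by more than k residual SDs."""
    return (actual - forecast) > k * resid_sd
```

One design point worth noting: an EWMA is linear, so if every timescale used the same `alpha`, the summed forecast would equal a plain EWMA of the raw series. The decomposition only adds something when the timescales are smoothed differently, or forecast with per-level models such as AR.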

21 Bonus: Alternative Preconditioning ● Regression using day-of-week predictors ● 7-day differencing ● Holt-Winters as preconditioner ● Seasonal preconditioning
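Two of these alternatives are simple enough to sketch. The first is 7-day differencing. The second is regression on day-of-week predictors in its most basic form: with only the seven weekday indicators as predictors, the least-squares fit is the mean of each weekday, so the residual is the count minus its weekday mean. Any further regressors the authors may have had in mind are not covered, and the function names are hypothetical.

```python
def seven_day_difference(x):
    """y(i) = x(i) - x(i-7); the first week is dropped."""
    return [x[i] - x[i - 7] for i in range(7, len(x))]


def day_of_week_residuals(x, first_dow=0):
    """Residuals from a regression on weekday indicators only (weekday means)."""
    sums = [0.0] * 7
    counts = [0] * 7
    for i, v in enumerate(x):
        d = (first_dow + i) % 7
        sums[d] += v
        counts[d] += 1
    means = [sums[d] / counts[d] if counts[d] else 0.0 for d in range(7)]
    return [v - means[(first_dow + i) % 7] for i, v in enumerate(x)]
```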

22 Bonus: Other Outbreak Signals ● Normalized by total size ● Lognormal, exponential, step ● Spike is much easier than the others
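For illustration, the four signal shapes could be generated as below, each normalized so that its daily values sum to 1 (“normalized by total size”). The slide gives no parameters, so the durations, the growth rate, the lognormal parameters and the choice of a growing rather than decaying exponential are all assumptions of this sketch.

```python
import math


def normalize(shape):
    """Scale a shape so that its daily values sum to 1."""
    total = sum(shape)
    return [v / total for v in shape]


def spike():
    """All extra cases on a single day."""
    return [1.0]


def step(days):
    """The same number of extra cases on each outbreak day."""
    return normalize([1.0] * days)


def exponential(days, rate=0.5):
    """Exponentially growing extra cases (growth is an assumption)."""
    return normalize([math.exp(rate * k) for k in range(days)])


def lognormal(days, mu=1.0, sigma=0.5):
    """Lognormal density evaluated at days 1..days, then normalized."""
    def pdf(t):
        z = (math.log(t) - mu) / sigma
        return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    return normalize([pdf(k + 1) for k in range(days)])
```

A shape produced this way can be scaled to the desired total size and added to the baseline counts when simulating an outbreak.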

