1
Time Series Algorithm Tutorial
Adapted from Andrew Moore’s slides. RODS: Auton Lab: Copyright © 2002, 2003, 2004 Andrew Moore
3
The Basic Task: Analyze a time series data stream to find outbreaks without sounding too many false alarms.
[Plot: signal against time.]
4
Many Methods!
[Table columns: Method; Has Pitt/CMU tried it? (tried but little used / tried and used / under development); Multivariate signal tracking?; Spatial?]
Methods listed: Time-weighted averaging (Yes); Serfling; ARIMA; SARIMA + External Factors; Univariate HMM; Kalman Filter; Recursive Least Squares; Support Vector Machine; Neural Nets; Randomization; Spatial Scan Statistics (w/ Howard Burkom); Bayesian Networks; Contingency Tables; Scalar Outlier (SQC); Multivariate Anomalies; Change-point statistics; FDR Tests; WSARE (Recent patterns); PANDA (Causal Model); FLUMOD (space/Time HMM).
Details of these methods and a bibliography are available from “Summary of Biosurveillance-relevant statistical and data mining technologies” by Moore, Cooper, Tsui and Wagner. Downloadable (PDF format) from
5
What you’ll learn about
Noticing events in bio-event time series
Tracking many series at once
6
What you’ll learn about
These are all powerful statistical methods, which means they all have to have one thing in common… Boring Names.
Noticing events in bio-event time series: Univariate Anomaly Detection
Tracking many series at once: Multivariate Anomaly Detection
10
Univariate Time Series
[Plot: signal against time.]
Example signals: number of ED visits today; number of ED visits this hour; number of respiratory cases today; school absenteeism today; NyQuil sales today. (NyQuil is an OTC medicine for cold and flu.)
11
(When) is there an anomaly?
12
(When) is there an anomaly?
This is a time series of counts of primary-physician visits in data from Norfolk in December. I added a fake outbreak, starting at a certain date. Can you guess the start date?
13
(When) is there an anomaly?
Here (much too high for a Friday). The plot marks the injected outbreak.
14
An easy case
Dealt with by Statistical Quality Control (SQC). Record the mean \mu and standard deviation \sigma up to the current time. Signal an alarm if the signal goes outside 3 sigmas, i.e. outside \mu ± 3\sigma. Practitioners often use 1.96\sigma instead of 3\sigma, which corresponds to a two-sided 95% range under a normal assumption.
15
An easy case: Control Charts
[Plot: signal against time, with the mean and the upper safe range marked.]
Dealt with by Statistical Quality Control. Record the mean and standard deviation up to the current time. Signal an alarm if we go outside 3 sigmas.
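A minimal sketch of the control-chart rule described above, in Python. The function name `control_chart_alarms`, the `min_history` warm-up and the toy counts are assumptions made for illustration; the slides only specify "mean and standard deviation up to the current time" and the 3-sigma rule.

```python
import statistics

def control_chart_alarms(signal, k=3.0, min_history=5):
    """Return the indices whose value lies outside mean +/- k*sigma,
    where mean and sigma are computed from all earlier values."""
    alarms = []
    for t in range(min_history, len(signal)):
        history = signal[:t]                 # everything before "today"
        mu = statistics.mean(history)
        sigma = statistics.stdev(history)    # sample standard deviation
        if sigma > 0 and abs(signal[t] - mu) > k * sigma:
            alarms.append(t)
    return alarms

# Hypothetical daily counts with one obvious spike on the last day.
counts = [20, 22, 19, 21, 20, 23, 21, 20, 22, 40]
print(control_chart_alarms(counts))  # -> [9]
```

Passing `k=1.96` gives the tighter band mentioned on the slide, at the cost of more alarms.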
16
Control Charts on the Norfolk Data
[Plot: alarm level over time, with the injected outbreak marked.] The predicted value is the mean up to the current time.
17
Control Charts on the Norfolk Data
How well the anomaly detection algorithm performs depends largely on how well the alarm level (or severity) reflects the actual (injected) outbreak.
18
Control Charts on the Norfolk Data
Previously we used two weeks of data. Now let’s look at almost 3 months, with more than one data point per day. Challenge: the alarm level of the anomaly might not be very different from those of the normal data points if the mean and variation are both increasing.
19
Looking at changes from yesterday
Control chart’s problem: too insensitive to recent changes. Now, what if we go to the other extreme and use yesterday’s data to predict today’s value?
20
Looking at changes from yesterday
It was not successful in detecting Friday’s outbreak because Thursday’s value is too high. False negative.
21
Looking at changes from yesterday
Let’s look at the 3-month data. There will be many false alarms.
22
We need a happy medium.
Control chart: too insensitive to recent changes. Either we miss the outbreak/ramp-up or there are too many false alarms in the last few weeks.
Change from yesterday: too sensitive to recent changes. Too many false spikes detected.
23
Moving Average
24
Moving Average
Adapts to recent changes, but not too fast. In this data, the alarm is still detected on Monday, not Friday.
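As a rough illustration of the moving-average idea, the sketch below predicts today's value from the mean of the previous `window` days and raises an alarm when today exceeds that prediction by more than `k` standard deviations of the same window. The window length, the one-sided test and the toy data are assumptions for this example, not details taken from the slides.

```python
import statistics

def moving_average_alarms(signal, window=7, k=3.0):
    """Return indices where the value exceeds the mean of the previous
    `window` values by more than k standard deviations of that window."""
    alarms = []
    for t in range(window, len(signal)):
        recent = signal[t - window:t]        # only the last `window` days
        predicted = statistics.mean(recent)
        sigma = statistics.stdev(recent)
        if sigma > 0 and signal[t] - predicted > k * sigma:
            alarms.append(t)
    return alarms

# Hypothetical slowly rising counts with a spike at index 12.
rising = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 45, 33]
print(moving_average_alarms(rising))  # -> [12]
```

Because only recent days enter the prediction, the slow upward drift is absorbed by the baseline while the spike still stands out.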
25
Moving Average
However, for the 3-month data, the spike stands out.
26
Looks better. But how can we be quantitative about this?
Moving Average
In other words, how much better is the moving average than the control chart and using yesterday’s value?
27
Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack for each algorithm, allowing one false alarm per TWO weeks and one false alarm per SIX weeks.]
But how do we calculate these metrics? It turns out not to be easy.
28
The evaluation data we used so far is called semi-synthetic data
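One plausible way to compute a "fraction of spikes detected" figure on semi-synthetic data is sketched below. It is an assumption-laden illustration, not the procedure used for the slides: the alarm threshold is chosen on the real (clean) baseline so that at most one day per `days_per_false_alarm` exceeds it, and then a synthetic spike is injected on each day in turn to see whether it crosses that threshold. The scoring function, the spike size and the baseline data are all hypothetical.

```python
import statistics

def moving_average_score(series, t, window=7):
    """Alarm level at day t: today's value minus the mean of the previous `window` days."""
    return series[t] - statistics.mean(series[t - window:t])

def evaluate_spike_detection(baseline, spike_size, days_per_false_alarm=14, window=7):
    """Return (fraction of injected spikes detected, threshold used)."""
    days = range(window, len(baseline))
    # Scores on the clean data: anything above the threshold here is a false alarm.
    clean_scores = sorted((moving_average_score(baseline, t, window) for t in days),
                          reverse=True)
    allowed = len(clean_scores) // days_per_false_alarm   # tolerated false alarms
    threshold = clean_scores[allowed]   # at most `allowed` clean scores exceed this
    detected = 0
    for d in days:
        injected = list(baseline)
        injected[d] += spike_size        # add a fake one-day outbreak
        if moving_average_score(injected, d, window) > threshold:
            detected += 1
    return detected / len(days), threshold

# Hypothetical baseline: six identical weeks with low weekend counts.
baseline = [30, 28, 27, 29, 31, 15, 12] * 6
fraction, threshold = evaluate_spike_detection(baseline, spike_size=10)
print(round(fraction, 3))
```

In this toy data the weekend spikes are missed, because a weekend count plus the spike still sits below an ordinary weekday.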
42
Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack for each algorithm, allowing one false alarm per TWO weeks and one false alarm per SIX weeks.]
Again, these are the results we got using semi-synthetic data.
43
Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack for each algorithm, allowing one false alarm per TWO weeks and one false alarm per SIX weeks.]
44
Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack for each algorithm, allowing one false alarm per TWO weeks and one false alarm per SIX weeks.]
Why does the seven-day moving average perform better than the 3-day and 56-day ones? Because of the weekly seasonal effects.
45
Seasonal Effects
[Plot: signal against time.]
Fit a periodic function (e.g. a sine wave) to previous data. Predict today’s signal and 3-sigma confidence intervals. Signal an alarm if we’re off.
Reduces false alarms from natural outbreaks. Different times of year deserve different thresholds. The season can be hours of the day, days of the week, weeks of the month, months of the year, etc.
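The fit-predict-alarm recipe above can be sketched as follows. This is a simplified illustration under stated assumptions: the period is known in advance, it is an integer of at least 3, and the history covers a whole number of periods, so the least-squares coefficients reduce to simple projections. The names `fit_sine` and `is_alarm` and the toy data are hypothetical.

```python
import math
import statistics

def fit_sine(signal, period):
    """Least-squares fit of a + b*sin(w*t) + c*cos(w*t) with w = 2*pi/period.
    Assumes an integer period >= 3 and len(signal) a multiple of the period."""
    n = len(signal)
    if period < 3 or n % period != 0:
        raise ValueError("need period >= 3 and a whole number of periods")
    w = 2 * math.pi / period
    a = sum(signal) / n
    b = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(signal))
    c = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(signal))

    def predict(t):
        return a + b * math.sin(w * t) + c * math.cos(w * t)

    # Spread of the residuals defines the width of the alarm band.
    sigma = statistics.pstdev(y - predict(t) for t, y in enumerate(signal))
    return predict, sigma

def is_alarm(observed, t, predict, sigma, k=3.0):
    """True if the observation is more than k sigma from the seasonal prediction."""
    return abs(observed - predict(t)) > k * sigma

# Hypothetical history: four weeks of a weekly sine pattern plus small wiggles.
wiggle = [1, -1, 0.5, -0.5] * 7
history = [50 + 10 * math.sin(2 * math.pi * t / 7) + wiggle[t] for t in range(28)]
predict, sigma = fit_sine(history, period=7)
print(is_alarm(70, 28, predict, sigma))
```

A real series would usually need several periodic terms (weekly and yearly, say) and a general least-squares solver rather than the projection shortcut used here.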
46
Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack for each algorithm, allowing one false alarm per TWO weeks and one false alarm per SIX weeks.]
If we take hours of daylight into account, we get better performance. But we know that different days of the week behave differently in our dataset.
47
Day-of-week effects
Fit a day-of-week component: E[Signal] = a + \delta_{day}
E.g. \delta_{mon} = +5.42, \delta_{tue} = +2.20, \delta_{wed} = +3.33, \delta_{thu} = +3.10, \delta_{fri} = +4.02, \delta_{sat} = -12.2, \delta_{sun} =
A simple form of ANOVA.
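A small sketch of fitting the day-of-week component, assuming `a` is taken as the overall mean and each delta as the difference between that weekday's mean and the overall mean. The slides do not say how the deltas were estimated, so this choice, the function name and the toy counts are assumptions.

```python
import statistics
from collections import defaultdict

def fit_day_of_week(counts, first_weekday=0):
    """Return (a, deltas) such that E[count] = a + deltas[weekday].
    Weekday 0 is Monday; first_weekday is the weekday of counts[0]."""
    by_day = defaultdict(list)
    for t, y in enumerate(counts):
        by_day[(first_weekday + t) % 7].append(y)
    a = statistics.mean(counts)
    deltas = {day: statistics.mean(vals) - a for day, vals in by_day.items()}
    return a, deltas

# Hypothetical four weeks of counts, starting on a Monday.
weekly_counts = [30, 28, 27, 29, 31, 15, 12] * 4
a, deltas = fit_day_of_week(weekly_counts)
print(round(a, 2), {day: round(d, 2) for day, d in sorted(deltas.items())})
```

Today's expected count is then `a + deltas[weekday]`, and the residual from that expectation can be fed to any of the alarm rules above.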
48
Regression using Hours-in-day & IsMonday
Predict = c + \beta * IsMonday. Outbreak detection is not satisfying. Hours-of-day is not helpful since there is no hourly data in this data set.
49
Regression using Hours-in-day & IsMonday
Adding IsMonday helps here, or at least it helps with the Monday data.
50
Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack for each algorithm, allowing one false alarm per TWO weeks and one false alarm per SIX weeks.]
51
Regression using Mon-Tue
Suppose we fit: Predict = c + \alpha * IsMonday + \beta * IsTuesday
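To make the regression concrete, here is a sketch that fits an intercept plus Monday and Tuesday indicator variables by ordinary least squares, solving the normal equations with a small hand-written Gaussian elimination. The helper names, the Monday-first weekday convention and the toy counts are assumptions for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_mon_tue(counts, first_weekday=0):
    """Least-squares fit of count = c + alpha*IsMonday + beta*IsTuesday."""
    rows = []
    for t in range(len(counts)):
        weekday = (first_weekday + t) % 7               # 0 = Monday, 1 = Tuesday
        rows.append([1.0, float(weekday == 0), float(weekday == 1)])
    # Normal equations: (X^T X) w = X^T y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, counts)) for i in range(3)]
    return tuple(solve_linear(XtX, Xty))

# Hypothetical four weeks of counts, starting on a Monday.
weekly_counts = [30, 28, 27, 29, 31, 15, 12] * 4
c, alpha, beta = fit_mon_tue(weekly_counts)
print(round(c, 2), round(alpha, 2), round(beta, 2))
```

With indicator variables only, `c` comes out as the mean of the non-Monday, non-Tuesday days and `alpha`, `beta` as the Monday and Tuesday offsets from it; adding indicators for further days extends the same pattern.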
52
Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack for each algorithm, allowing one false alarm per TWO weeks and one false alarm per SIX weeks.]
Note: we get a higher detection rate when we take more days of the week into account.
53
CUSUM: CUmulative SUM Statistics
Keep a running sum of “surprises”: a sum of excesses each day over the prediction. When this sum exceeds a threshold, signal an alarm and reset the sum.
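A minimal one-sided CUSUM sketch follows. The slide specifies only the running sum, the threshold and the reset; the `slack` allowance subtracted each day and the floor at zero are standard CUSUM conventions added here as assumptions, and the data are made up.

```python
def cusum_alarms(signal, predictions, threshold, slack=0.0):
    """Return the indices at which the running sum of excesses over the
    prediction (minus a slack allowance) exceeds the threshold."""
    running = 0.0
    alarms = []
    for t, (value, predicted) in enumerate(zip(signal, predictions)):
        # Accumulate today's surprise; never let the sum go below zero.
        running = max(0.0, running + (value - predicted - slack))
        if running > threshold:
            alarms.append(t)
            running = 0.0          # reset after signalling
    return alarms

# Hypothetical ramp-up: counts creep above a flat prediction of 20.
observed = [20, 21, 19, 20, 22, 23, 24, 25, 20]
expected = [20] * len(observed)
print(cusum_alarms(observed, expected, threshold=5, slack=1))  # -> [6]
```

No single day in the ramp is far above the prediction, yet the accumulated excess crosses the threshold on the third elevated day. A lone spike smaller than the threshold, by contrast, never triggers an alarm.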
54
CUSUM
In general it works well for outbreak/ramp-up detection. Here the outbreak is detected on Saturday.
55
CUSUM
But it might not work well for spike detection.
56
Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack for each algorithm, allowing one false alarm per TWO weeks and one false alarm per SIX weeks.]
57
The Sickness/Availability Model
Counts = sickness * availability
Sickness = counts / availability (plot this instead of the raw counts).
E.g. there are fewer counts during the weekend, but this does not mean less sickness. Sick people may seek care more often on certain days because of the availability of medical services or time in their schedules, so adjust for that phenomenon.
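The relation Sickness = counts / availability can be illustrated as below. How availability is estimated is not stated on the slide; this sketch assumes it is the ratio of each weekday's mean count to the overall mean, which is only one possible choice. Function names and data are hypothetical.

```python
import statistics

def estimate_availability(counts, first_weekday=0):
    """Availability per weekday (0 = Monday): weekday mean / overall mean.
    Assumes every weekday appears at least once in `counts`."""
    overall = statistics.mean(counts)
    availability = []
    for weekday in range(7):
        day_values = [y for t, y in enumerate(counts)
                      if (first_weekday + t) % 7 == weekday]
        availability.append(statistics.mean(day_values) / overall)
    return availability

def sickness(counts, availability, first_weekday=0):
    """Convert raw counts into availability-adjusted sickness values."""
    return [y / availability[(first_weekday + t) % 7] for t, y in enumerate(counts)]

# Hypothetical four weeks of counts starting on a Monday, with quiet weekends.
weekly_counts = [30, 28, 27, 29, 31, 15, 12] * 4
availability = estimate_availability(weekly_counts)
adjusted = sickness(weekly_counts, availability)

# A Saturday count of 25 is unremarkable as a raw count but large as sickness.
saturday_sickness = 25 / availability[5]
print(round(saturday_sickness, 1))
```

After the adjustment the regular weekly dip disappears from the series, so a moving average or control chart applied to `adjusted` no longer has to absorb the weekend effect.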
58
The Sickness/Availability Model
Columbus Day is a Monday. Veterans Day is November 11th.
59
The Sickness/Availability Model
60
The Sickness/Availability Model
Sickness is different from count. The alarm level is based on sickness.
61
The Sickness/Availability Model
Sickness is different from count.
62
The Sickness/Availability Model
Sickness is different from count. Here the alarm level is based on sickness.
63
The Sickness/Availability Model
The outbreak is detected successfully: the underlying reason for the weekly pattern is availability. The model can also be used to deal with holidays (which might not follow a fixed cycle). First replace count with sickness, then handle the seasonal effect, etc.
64
The Sickness/Availability Model
The spike really stands out: see the green spikes.
65
Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack for each algorithm, allowing one false alarm per TWO weeks and one false alarm per SIX weeks.]
Apply the sickness/availability model first, and then the moving average. Again, 7 days works best, and it works much better than the simple 7-day moving average.
66
Other state-of-the-art methods
Wavelets, change-point detection, Kalman filters, Hidden Markov Models, and many others.
69
A generalized anomaly detector model based on time series algorithms
First, measure the severity of each data point, for example its deviation from a historical average. Then set a severity threshold (sThld) to decide which points are anomalies. This is a general detector model: different detectors basically work in these two steps, except that they use different techniques or algorithms to measure the severities.
2018/11/24
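The two-step detector model can be sketched as a function that takes any severity measure plus a threshold. The names `detect`, `historical_average_severity`, `s_thld` and `warmup` are hypothetical, and the historical-average severity is just the example mentioned on the slide.

```python
from typing import Callable, List, Sequence

def detect(signal: Sequence[float],
           severity: Callable[[Sequence[float], int], float],
           s_thld: float,
           warmup: int = 1) -> List[int]:
    """Step 1: score each point with `severity`.
    Step 2: flag the points whose severity exceeds the threshold s_thld."""
    return [t for t in range(warmup, len(signal)) if severity(signal, t) > s_thld]

def historical_average_severity(signal: Sequence[float], t: int) -> float:
    """Severity = absolute deviation from the mean of all earlier values."""
    history = signal[:t]
    return abs(signal[t] - sum(history) / len(history))

# Hypothetical series with one anomalous value at index 4.
series = [10, 11, 9, 10, 30, 10]
print(detect(series, historical_average_severity, s_thld=5))  # -> [4]
```

Swapping in a different `severity` function (moving average, seasonal residual, CUSUM statistic) changes the detector without touching the thresholding step.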
70
Open-source Libraries for Time Series Algorithms
2017/02
Facebook Prophet (R/Python)
Yahoo! egads (Java)
Twitter anomaly detection (R)
2015
Netflix Surus (Pig, based on PCA)
Etsy skyline (Python)
2013
Numenta NuPIC (Python, based on HTM)
1997
RRDtool HWPREDICT (C, based on Holt-Winters)
71
What you’ll learn about
Noticing events in bio-event time series: Univariate Anomaly Detection
Tracking many series at once: Multivariate Anomaly Detection (read the remaining slides by yourselves)
72
Multiple Signals
73
Multivariate Signals (relevant to inhalational diseases)
74
Multi Source Signals
[Plot series: Lab, Flu, WebMD, School, Cough & Cold, Throat, Resp, Viral, Death.]
Influenza: contiguous cold weeks
75
What if you’ve got multiple signals?
[Plot: signal against time. Red: Cough Sales. Blue: ED Respiratory Visits.]
Idea One: Simply treat it as two separate alarm-from-signal problems.
Question: why might that not be the best we can do?
76
Another View
[Scatter plot with axes Cough Sales and ED Respiratory Visits.]
Question: why might that not be the best we can do?
77
Another View
[Scatter plot with axes Cough Sales and ED Respiratory Visits; one point is marked “This should be an anomaly”.]
Question: why might that not be the best we can do?
78
N-dimensional Gaussian
Good Practical Idea: Model the joint with a Gaussian.
[Scatter plot with axes Cough Sales and ED Respiratory Visits, showing one-sigma and two-sigma contours.]
Normal distribution: \mu (mean) and \sigma (standard deviation). Gaussian model: multiple dimensions. SQC: statistical quality control. Sensible: wise.
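To show why a joint Gaussian catches what two separate detectors miss, the sketch below fits a two-dimensional Gaussian and scores points by Mahalanobis distance (the number of "sigmas" from the mean, measured along the fitted contours). Using Mahalanobis distance as the alarm score is an assumption about how the Gaussian model would be applied, and the data are invented.

```python
import statistics

def fit_gaussian_2d(xs, ys):
    """Return the mean (mx, my) and covariance entries (sxx, sxy, syy)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(xs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(point, mean, cov):
    """Distance of `point` from `mean` in units of the fitted Gaussian's sigmas."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5

# Hypothetical history in which the two signals rise and fall together.
cough_sales = [10, 12, 14, 16, 18, 20, 22, 24]
ed_visits = [11, 12, 15, 15, 19, 19, 23, 23]
mean, cov = fit_gaussian_2d(cough_sales, ed_visits)

typical = mahalanobis((22, 22), mean, cov)   # high sales with high visits
odd = mahalanobis((22, 12), mean, cov)       # high sales with low visits
print(round(typical, 2), round(odd, 2))
```

Both coordinates of the second point are within about one standard deviation of their own means, so neither univariate detector would fire, yet the pair is far outside the joint contours.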