Online Conditional Outlier Detection in Nonstationary Time Series

Slides:



Advertisements
Similar presentations
Part II – TIME SERIES ANALYSIS C5 ARIMA (Box-Jenkins) Models
Advertisements

STAT 497 APPLIED TIME SERIES ANALYSIS
Anomaly Detection in the WIPER System using A Markov Modulated Poisson Distribution Ping Yan Tim Schoenharl Alec Pawling Greg Madey.
Bayesian Nonparametric Matrix Factorization for Recorded Music Reading Group Presenter: Shujie Hou Cognitive Radio Institute Friday, October 15, 2010 Authors:
13 Introduction toTime-Series Analysis. What is in this Chapter? This chapter discusses –the basic time-series models: autoregressive (AR) and moving.
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining Alex Freitas University of Kent Ken McGarry University of Sunderland.
Data Sources The most sophisticated forecasting model will fail if it is applied to unreliable data Data should be reliable and accurate Data should be.
UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering On-line Alert Systems for Production Plants A Conflict Based Approach.
Software Quality Control Methods. Introduction Quality control methods have received a world wide surge of interest within the past couple of decades.
Multi-Scale Analysis for Network Traffic Prediction and Anomaly Detection Ling Huang Joint work with Anthony Joseph and Nina Taft January, 2005.
Results 2 (cont’d) c) Long term observational data on the duration of effective response Observational data on n=50 has EVSI = £867 d) Collect data on.
X-12 ARIMA Eurostat, Luxembourg Seasonal Adjustment.
1 Bayesian Network Anomaly Pattern Detection for Disease Outbreaks Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University)
WAC/ISSCI Automated Anomaly Detection Using Time-Variant Normal Profiling Jung-Yeop Kim, Utica College Rex E. Gantenbein, University of Wyoming.
Traffic modeling and Prediction ----Linear Models
Overview G. Jogesh Babu. Probability theory Probability is all about flip of a coin Conditional probability & Bayes theorem (Bayesian analysis) Expectation,
TIME SERIES by H.V.S. DE SILVA DEPARTMENT OF MATHEMATICS
AP Statistics Chapter 9 Notes.
1 A Bayesian Method for Guessing the Extreme Values in a Data Set Mingxi Wu, Chris Jermaine University of Florida September 2007.
Monte Carlo Simulation and Personal Finance Jacob Foley.
Time Series Data Analysis - I Yaji Sripada. Dept. of Computing Science, University of Aberdeen2 In this lecture you learn What are Time Series? How to.
Time Series Analysis and Forecasting
It’s About Time Mark Otto U. S. Fish and Wildlife Service.
Week 11 Introduction A time series is an ordered sequence of observations. The ordering of the observations is usually through time, but may also be taken.
Acknowledgements Contact Information Anthony Wong, MTech 1, Senthil K. Nachimuthu, MD 1, Peter J. Haug, MD 1,2 Patterns and Rules  Vital signs medoids.
K. Kolomvatsos 1, C. Anagnostopoulos 2, and S. Hadjiefthymiades 1 An Efficient Environmental Monitoring System adopting Data Fusion, Prediction & Fuzzy.
Boundary Detection in Tokenizing Network Application Payload for Anomaly Detection Rachna Vargiya and Philip Chan Department of Computer Sciences Florida.
1 OUTPUT ANALYSIS FOR SIMULATIONS. 2 Introduction Analysis of One System Terminating vs. Steady-State Simulations Analysis of Terminating Simulations.
Time Series Analysis and Forecasting. Introduction to Time Series Analysis A time-series is a set of observations on a quantitative variable collected.
Lesson Poisson Probability Distribution. Objectives Understand when a probability experiment follows a Poisson process Compute probabilities of.
Chapter 8 Lesson 8.2 Sampling Variability and Sampling Distributions 8.2: The Sampling Distribution of a Sample Mean.
Sampling and Sampling Distribution
Chapter 8 Lesson : The Sampling Distribution of a Sample Mean
Double and Multiple Sampling Plan
Time Series And Business Forecasting
Supply Chain Management for Non Supply Chain Management Professionals
What Is Statistics? Chapter 1.
Understanding Sampling Distributions: Statistics as Random Variables
Hypothesis Testing: One Sample Cases
Belinda Boateng, Kara Johnson, Hassan Riaz
Continuous Random Variables
Statistics: The Z score and the normal distribution
Shohreh Mirzaei Yeganeh United Nations Industrial Development
Hypothesis Testing and Confidence Intervals (Part 1): Using the Standard Normal Lecture 8 Justin Kern October 10 and 12, 2017.
Statistical Data Analysis
Chapter 4: Seasonal Series: Forecasting and Decomposition
Statistical Process Control (SPC)
System Control based Renewable Energy Resources in Smart Grid Consumer
Oliver Sawi1,2, Hunter Johnson1, Kenneth Paap1;
When Security Games Go Green
Data Mining Association Rules: Advanced Concepts and Algorithms
Inferential Statistics and Probability a Holistic Approach
LESSON 20: HYPOTHESIS TESTING
CSCI 5822 Probabilistic Models of Human and Machine Learning
Using Baseline Data in Quality Problem Solving
INTRODUCTION TO HYPOTHESIS TESTING
Use of Time-Series Data in Strategic Managerial Decisions
Statistical Data Analysis
Chapter 3: Random Variables and Probability Distributions
K. Kolomvatsos1, C. Anagnostopoulos2, and S. Hadjiefthymiades1
Intelligent Contextual Data Stream Monitoring
Parametric Methods Berlin Chen, 2005 References:
Model Enhanced Classification of Serious Adverse Events
Chapter 9 Hypothesis Testing: Single Population
Quality Assessment The goal of laboratory analysis is to provide the accurate, reliable and timeliness result Quality assurance The overall program that.
Type I and Type II Errors
Propagation of Error Berlin Chen
Chapter 13: Item nonresponse
Discrete Probability Distributions
Continuous Random Variables: Basics
Presentation transcript:

Online Conditional Outlier Detection in Nonstationary Time Series Siqi Liu1, Adam Wright2, and Milos Hauskrecht1 1Department of Computer Science, University of Pittsburgh 2Brigham and Women's Hospital and Harvard Medical School

Outline Introduction Method Experiments and Results Conclusion

Time Series A time series is a sequence of data points indexed by (discrete) time. For example, a univariate time series 𝑦 𝑡 ∈ℝ:𝑡=1, 2,… . Generally, the points are not independent of each other.

Time Series are Everywhere Daily prices of stocks Monthly usage of electricity Daily temperature, humidity, ... Patient’s heart rate, blood pressure, ... Number of items sold every month Number of cars passed through a highway every hour …

Monitoring Systems By monitoring some attribute of a target (e.g., the heart rate of a patient), we naturally get a time series. Analyzing the time series gives us insights about the target. In this work, we are interested in finding outliers in the time series in real time.

Outliers Outliers are the points that do not follow the “pattern” of the majority of the data. More strictly, they are points that do not follow the probability distribution generating the majority of the data. Outliers provide useful insights, because they indicate anomaly or novelty, i.e., events requiring attention. extremely low volume on a highway → traffic accident unusually frequent access to a server → server being attacked increasing use of a rare word on a social network → new trending topic

Challenge 1: Nonstationarity Detecting outliers in time series is challenging because of the nonstationarity (i.e., the distribution of the data changes over time) Specifically, the changes could be long-term changes periodic changes (a.k.a. seasonality) These hinder outlier detection, because they result in false positives and false negatives

Challenge 1.1: Long-Term Changes An extreme value in the past could be normal now A normal value in the past could be extreme now normal outlier value outlier time

Challenge 1.2: Seasonality outlier value normal normal time

Challenge 2: Context By considering the context, some “outliers” become normal; some “normal” points become outliers.

Existing Work in Outlier Detection in Time Series Existing work in outlier detection in time series usually assumes a model like autoregressive-moving-average (ARMA). (e.g., Tsay 1988; Yamanishi and Takeuchi 2002) These models cannot deal with nonstationary (seasonal) time series directly. A solution is to difference the time series, resulting in: autoregressive- integrated-moving-average (ARIMA). We use it as a baseline in our experiments.

Outline Introduction Method Experiments and Results Conclusion

Method: Two Layers A time 𝑡=1, 2,… we sequentially receive the observations of the target time series 𝑦={ 𝑦 𝑡 ∈ℝ:𝑡=1,2,…}, and the associated context variables 𝑥= 𝑥 𝑡 ∈ ℝ 𝑝 :𝑡=1,2,… . Our model consists of two layers: First layer uses a sliding window to compute a local score; Second layer combines the local score with the context variables to compute a global score (which is the final outlier score).

First Layer First, we decompose the time series (within a sliding window) into 3 components using a nonparametric decomposition algorithm called STL (Cleveland et al. 1990).

First Layer Then, we compute a local deviation score 𝑧 𝑡 = 𝑦 𝑡 (𝑅) − 𝜇 𝑡 (𝑅) 𝜎 𝑡 (𝑅) on the remainder.

Second Layer At each time 𝑡, given ( 𝑧 𝑡 , 𝑥 𝑡 ), where 𝑧 𝑡 is the local score (first-layer output) and 𝑥 𝑡 is the context variables, keep updating a Bayesian linear model 𝑧 𝑡 |𝑤,𝛽, 𝑥 𝑡 ∼𝑁 𝑥 𝑡 𝑇 𝑤, 𝛽 −1 , with the conjugate prior 𝑤,𝛽∼𝑁 𝑤 𝑚 0 , 𝛽 −1 𝑆 0 𝐺𝑎𝑚 𝛽 𝑎 0 , 𝑏 0 . The model is built globally (aggregating all the information from the beginning), because contextual variables may correspond to rare events (e.g., holidays), but we need enough examples to have a good model; local scores are normalized locally, so no need to worry about nonstationarity.

Second Layer The final outlier score is calculated based on the marginal distribution of 𝑧 𝑡 given 𝑥 𝑡 and the history 𝑧 𝑡 | 𝐷 𝑡−1 , 𝑥 𝑡 ∼𝑆𝑡( 𝑧 𝑡 | 𝜇 𝑡 , 𝜎 𝑡 2 , 𝜈 𝑡 ), where 𝐷 𝑡−1 = z u , x u u=1, 2, …, 𝑡−1}.

Outline Introduction Method Experiments and Results Conclusion

Data Sets Bike data consists of the time series (of length 733) that records the daily bike trip counts taken in San Francisco Bay Area through the bike share system from August 2013 to August 2015 . CDS data consists of daily rule firing counts of a clinical decision support (CDS) system in a large teaching hospital. (111 time series of length 1187) Traffic data consists of time series of vehicular traffic volume measurements collected by sensors placed on major highways. (2 time series of length 365)

Injecting Outliers Outliers are injected into the time series by randomly sampling a small proportion 𝑝 of points and changing their value by a specified size 𝛿 as 𝑦 𝑖 = 𝑦 𝑖 ⋅𝛿 for each 𝑦 𝑖 in the sample. We vary 𝑝 and 𝛿 to see the effects.

Methods Compared RND - detects outliers randomly. SARI - ARIMA(1,1,0)×(1,1,0 ) 7 , ARIMA with a weekly (7 day) period, (seasonal) differencing, and (seasonal) order-1 autoregressive term. SIMA - ARIMA(0,1,1)×(0,1,1 ) 7 , ARIMA with a weekly period, (seasonal) differencing, and (seasonal) order-1 moving-average term. SARIMA - ARIMA(1,1,1)×(1,1,1 ) 7 , ARIMA combining the above two. ND - our first-layer STL-based model, using absolute value of the output as outlier scores. TL1 - our two-layer model using holiday information as a contextual variable. TL2 - our two-layer model using holiday and additional information (if available) as context variables.

Evaluation: Precision-Alert-Rate Curves (Hauskrecht et al. 2016) Alert rate: the proportion of alerts raised out of all points. Precision: the proportion of true outliers out of alerts raised. We calculate AUC to compare the overall performance. Notice we focus on low-alert- rate region for practicality.

Varying Outlier Rate (0.1) - Bike Data

Varying Outlier Rate (0.05) - Bike Data

Varying Outlier Rate (0.01) - Bike Data

Varying Outlier Size (2.0) - Bike Data

Varying Outlier Size (1.5) - Bike Data

Varying Outlier Size (1.2) - Bike Data

Overall Performance By comparing the AUC, we have the following observations: When the size of the outliers are small, all methods perform similarly to random. In the other cases, our two-layer method is almost always the best method. Even using only the first-layer can achieve similar or better results as the ARIMA-based methods. Using additional information (e.g., using weather besides holiday info) improves the performance of the two-layer method.

Outline Introduction Method Experiments and Results Conclusion

Conclusion We have proposed a two-layer method to detect outliers in time series in real time. Our method takes account of the nonstationarity and the context of the data to detect outliers more accurately. Experiments on data sets from different domains have shown the advantages of our method.

Thank you! Q & A