Efficient Elastic Burst Detection in Data Streams Yunyue Zhu and Dennis Shasha Department of Computer Science Courant Institute of Mathematical Sciences.


Efficient Elastic Burst Detection in Data Streams Yunyue Zhu and Dennis Shasha Department of Computer Science Courant Institute of Mathematical Sciences New York University SIGKDD 2003

Abstract
Burst detection
- Find abnormal aggregates in data streams
- Sliding window
In some applications, we want to monitor many sliding window sizes simultaneously.
- Brute force: O(n^2)
- Shifted Wavelet Tree: near-linear time

Problem Statement
For a time series x_1, x_2, …, x_n, we are given a set of window sizes w_1, w_2, …, w_m, an aggregate function F, and a threshold f(w_j) associated with each window size, j = 1, 2, …, m.
Monitoring elastic window aggregates of the time series means finding all subsequences of all the window sizes such that the aggregate applied to the subsequence crosses that window size's threshold, i.e., every i and j with F(x[i .. i + w_j - 1]) ≥ f(w_j).
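As a point of reference for the problem statement, the brute-force approach can be sketched as below. This is a minimal illustration, not code from the talk: the function name `brute_force_bursts`, the dictionary of thresholds, and the 0-based start indices are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def brute_force_bursts(
    x: Sequence[float],
    thresholds: Dict[int, float],
    agg: Callable[[Sequence[float]], float] = sum,
) -> List[Tuple[int, int]]:
    """Return (start, w) for every window x[start:start + w] whose aggregate
    reaches the threshold f(w). `thresholds` maps window size w -> f(w)."""
    alarms = []
    for w, f_w in sorted(thresholds.items()):
        # Slide a window of size w over the whole series and test each position.
        for start in range(len(x) - w + 1):
            if agg(x[start:start + w]) >= f_w:
                alarms.append((start, w))
    return alarms


series = [1, 0, 2, 9, 8, 1, 0, 1]
print(brute_force_bursts(series, {2: 15, 3: 18}))
# [(3, 2), (2, 3), (3, 3)]
```

Every window size is checked at every position, which is what makes this approach expensive when many window sizes are monitored at once.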

Wavelet Tree
Haar Wavelet Tree
- Level 0: original time series
- Level 1: pairwise averages and differences of the adjacent data items at level 0
- Level i: pairwise averages and differences of the averages at level i - 1
The wavelet coefficients can represent the trend of the time series.
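The level-by-level construction on this slide can be sketched as follows. The slide does not fix a normalization, so this example assumes the unnormalized convention (a + b) / 2 for averages and (a - b) / 2 for differences, and a series whose length is a power of two; the name `haar_levels` is hypothetical.

```python
def haar_levels(x):
    """Return [(averages, differences)] for levels 1, 2, ... of a series
    whose length is assumed to be a power of two."""
    levels = []
    current = list(x)  # level 0: the original time series
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))  # adjacent, non-overlapping pairs
        avgs = [(a + b) / 2 for a, b in pairs]
        diffs = [(a - b) / 2 for a, b in pairs]
        levels.append((avgs, diffs))
        current = avgs  # the next level works on the averages only
    return levels


print(haar_levels([4, 2, 6, 8]))
# [([3.0, 7.0], [1.0, -1.0]), ([5.0], [-2.0])]
```

Under this convention an average at level i times the window size 2^i gives the window sum, which is how a wavelet coefficient can stand in for a sum aggregate.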

Wavelet Tree (cont.)
- Wavelet coefficient → aggregate
- Average and difference → sum
Problem: the windows at the same level are non-overlapping.

Shifted Wavelet Tree
Add an additional “line” of windows. They can be maintained explicitly or implicitly.

Shifted Wavelet Tree (cont.)
Any subsequence of length w, w ≤ 2^i, is included in one of the windows at level i + 1 of the SWT.
We say that windows with size w, 2^(i-1) < w ≤ 2^i, are monitored by level i + 1 of the SWT.
[Figure: Level 3 and Level 4 windows]

SWT Construction
For each level i (i ≥ 1)
- Compute the pairwise aggregate (sum) of each two consecutive data items at level i - 1
- Downsample: taking every second item in the series of aggregates gives the input for the next higher level of the SWT
O(n), n: time series length
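The two construction steps can be sketched as below for the sum aggregate. This is an illustrative reading of the slide, not the authors' code: `build_swt` and `top_level` are hypothetical names, and partial windows at the end of the series are simply omitted.

```python
def build_swt(x, top_level):
    """Return {level: sums}. At each level, sums[k] is the sum of the window of
    size 2**level that starts at offset k * 2**(level - 1), so consecutive
    windows overlap by half (the shifted "line" of windows)."""
    swt = {}
    base = list(x)  # input for level 1: the raw series
    for level in range(1, top_level + 1):
        # Step 1: pairwise sums of each two consecutive items of the input.
        sums = [base[k] + base[k + 1] for k in range(len(base) - 1)]
        swt[level] = sums
        # Step 2: downsample, keeping every second sum as input for the next level.
        base = sums[0::2]
    return swt


swt = build_swt([1, 0, 2, 9, 8, 1, 0, 1], top_level=3)
print(swt)
# {1: [1, 2, 11, 17, 9, 1, 1], 2: [12, 20, 10], 3: [22]}
```

Each level does work proportional to the length of its input, and the input halves at every level, which is consistent with the O(n) bound stated on the slide.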

Search for a Burst
Given window size w ≤ 2^i and threshold f(w), search in two stages
- The potential burst is detected at level i + 1 of the SWT
- Detailed search in those subsequences of size 2^i with sum ≥ f(w)
O(k), k: #alarms (output size)
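A hedged sketch of the two-stage search for a single window size follows. It assumes non-negative data and the sum aggregate, so that a covering window's sum bounds every sum inside it. For brevity the level i + 1 window sums are derived from prefix sums instead of a materialized tree, and each candidate start is assigned to exactly one covering window to avoid duplicate alarms; these choices, and the name `swt_search`, are assumptions of this example.

```python
from itertools import accumulate


def swt_search(x, w, f_w):
    """Return the 0-based starts s with sum(x[s:s + w]) >= f_w (x non-negative)."""
    n = len(x)
    prefix = [0] + list(accumulate(x))  # prefix[j] == sum(x[:j])
    h = 1
    while h < w:  # h = 2^i, the smallest power of two with w <= h
        h *= 2
    alarms = []
    for k in range(0, n, h):  # level i + 1 windows start every h items
        # Stage 1: the window [k, k + 2h), clipped at n, contains every
        # size-w subsequence that starts in [k, k + h).
        if prefix[min(k + 2 * h, n)] - prefix[k] < f_w:
            continue  # no subsequence inside this window can reach f_w
        # Stage 2: detailed search over the starts covered by this window.
        for s in range(k, min(k + h, n - w + 1)):
            if prefix[s + w] - prefix[s] >= f_w:
                alarms.append(s)
    return alarms


series = [1, 0, 2, 9, 8, 1, 0, 1]
print(swt_search(series, w=3, f_w=18))  # [2, 3]
print(swt_search(series, w=2, f_w=15))  # [3]
```

When few covering windows exceed the threshold, most of the series is skipped in stage 1 and only the flagged regions are scanned in detail.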

Streaming Algorithm
Assume that new data becomes available at every time unit.
The window sizes satisfy 2^L < w_1 < w_2 < … < w_m < 2^U.
Maintain the levels from L+2 to U+1 of the SWT that monitor those windows.
Two methods
- Online algorithm
- Batch algorithm

Streaming Algorithm: Online Algorithm
Whenever a new data item becomes available
- Update those 2(U - L) aggregates of the windows in the SWT.
- If the aggregate at level i exceeds δ_i, perform a detailed search on those windows monitored by level i.
For level i, threshold δ_i = min f(w_j) over 2^(i-2) < w_j ≤ 2^(i-1)
Response time = one time unit
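The per-level thresholds δ_i can be computed once, before streaming starts. The sketch below assumes the thresholds are given as a dictionary from window size to f(w); the function name `level_thresholds` and the sample values are hypothetical.

```python
def level_thresholds(f):
    """Map each SWT level i to delta_i = min f(w_j) over the window sizes
    with 2**(i - 2) < w_j <= 2**(i - 1). `f` maps window size -> threshold."""
    delta = {}
    for w, f_w in f.items():
        # (w - 1).bit_length() equals ceil(log2(w)) for w >= 1, so the
        # monitoring level is ceil(log2(w)) + 1.
        level = (w - 1).bit_length() + 1
        delta[level] = min(f_w, delta.get(level, f_w))
    return delta


print(level_thresholds({5: 20.0, 10: 35.0, 15: 50.0, 20: 60.0}))
# {4: 20.0, 5: 35.0, 6: 60.0}
```

Taking the minimum per level means a single comparison against δ_i rules out every window size monitored by that level; a detailed search is only needed when the level aggregate reaches δ_i.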

Streaming Algorithm: Batch Algorithm
Maintain the aggregates at level L+1.
The aggregate in the most recently completed window of level L+1 is updated every time unit.
An aggregate of a window at the upper levels will not be computed until all the data in that window are available.
Once an aggregate at a certain upper level is updated, we also check alarms for time intervals monitored by that level.
Higher throughput, longer response time.

Other Aggregates
The monitoring of many other aggregates based on elastic windows could benefit from our data structure, as long as the following conditions hold.
1. The aggregate F is monotonically increasing or decreasing with respect to the window, e.g.
- Max, Count → monotonically increasing
- Min → monotonically decreasing
2. The alarm domain is one-sided, that is,
- monotonically increasing → [threshold, ∞)
- monotonically decreasing → (-∞, threshold]

Extension to Two Dimensions
The problem is to report the positions of spatial sliding windows (rectangular regions) of different sizes within which the density exceeds some predefined threshold.
The same techniques as in the 1D SWT are used.
[Figures: Wavelet Tree 2D; Shifted Wavelet Tree 2D]

Effectiveness Study
Bursts in the number of times that countries were mentioned in the presidential State of the Union addresses.

Effectiveness Study (cont.)
A predefined sliding window size is insufficient. Bursts at large time scales are not necessarily reflected at smaller time scales.
- They may be composed of many consecutive “bumps”

Effectiveness Study (cont.)
Bursts in population distribution data (1990)
Window sizes 1°x1°, 2°x2° and 5°x5° in latitude/longitude

Performance Study
Experiments on a 1.5GHz Pentium 4 PC with 512 MB of main memory running Windows
Datasets
- The Gamma Ray data set
  12 hours of data from a small region of the sky, where Gamma Ray bursts were actually reported
  The data are a time series of the number of photons observed (events) every 0.1 second.
  19,015 events in total in this time series
- The NYSE TAQ Stock data set
  Tick-by-tick trading activities of the IBM stock between July 1st, 1998 and July 1st, ,331,145 trading records (ticks)
  Each record contains trading time, trading price and trading volume.

Performance Study (cont.)
Training threshold
- Use the first few hours of Gamma Ray data and the first year of Stock data as training data.
- For a window of size w, compute the aggregates on the training data with a sliding window of size w; call the resulting series y
- f(w) = avg(y) + ξ·std(y)
Window sizes: 5, 10, …, 5 * N_w time units
- N_w: #windows, varies from 5 to 50
- Time units: 0.1 sec for the Gamma Ray data, and 1 min for the stock data.
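The threshold rule on this slide, the average of the sliding-window aggregates plus ξ standard deviations, can be sketched as below. The slide does not say whether the population or sample standard deviation is meant, so this example assumes the population form (`statistics.pstdev`); the function name `train_threshold` and the toy training data are hypothetical.

```python
from statistics import mean, pstdev


def train_threshold(training, w, xi):
    """Return f(w) = avg(y) + xi * std(y), where y is the series of
    sliding-window sums of size w over the training data."""
    y = [sum(training[s:s + w]) for s in range(len(training) - w + 1)]
    # Assumption: population standard deviation; the slides do not specify.
    return mean(y) + xi * pstdev(y)


training = [1, 2, 3, 4]
# Window sizes 5, 10, ..., 5 * N_w would be used in the experiments; the toy
# series here is only long enough for small windows.
thresholds = {w: train_threshold(training, w, xi=2.0) for w in (2, 3)}
print(thresholds)
```

A larger ξ raises every threshold and therefore reduces the number of alarms, which matters because the processing time of the detection algorithm depends on the output size.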

Performance Study (cont.)
The processing time of our algorithm is output-dependent.

Performance Study (cont.)
Experiments on stock data

Performance Study (cont.)
Use spread as the aggregate function

Conclusion and Future Work
This paper introduces the elastic window model and demonstrates the desirability of the new model.
It presents a novel data structure for efficient detection of elastic bursts and other aggregates.
Experiments show that our algorithm is faster than a brute force algorithm by several orders of magnitude.
Future work
- A robust way of setting the thresholds
- Non-monotonic aggregates