1
Elastic Burst Detection: Applications
Discovering intervals with an unusually large number of events.
–In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise.
–In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).
Challenge: to discover not only the time of the burst, but also its duration, which may vary widely.
–In astrophysics, a burst of high-energy particles associated with a special event might last for a few milliseconds, a few hours, or even a few days.
2
Burst Detection: Problem Statement
Problem: Given a time series of positive numbers $x_1, x_2, \ldots, x_n$ and a threshold function $f(w)$, $w = 1, 2, \ldots, n$, find the subsequences of any size whose sums are above the thresholds:
–all window sizes $w$ and start positions $m$, with $1 \le w \le n$ and $1 \le m \le n-w+1$, such that $x_m + x_{m+1} + \cdots + x_{m+w-1} > f(w)$
Brute force search: O(n^2) time
Our shifted wavelet tree (SWT): O(n+k) time.
–k is the size of the output, i.e. the number of windows with bursts
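The brute-force baseline can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `brute_force_bursts`, the 0-based start index, and the sample threshold `6 * w` are assumptions made for the example. Prefix sums make each window sum a constant-time lookup, so the double loop over sizes and start positions gives the O(n^2) cost quoted above.

```python
from itertools import accumulate


def brute_force_bursts(x, f):
    """Return every (start, w) whose window sum exceeds f(w).

    start is 0-based; w is the window size. Checks all O(n^2) windows.
    """
    n = len(x)
    # prefix[i] holds x[0] + ... + x[i-1], so a window sum is one subtraction.
    prefix = [0] + list(accumulate(x))
    bursts = []
    for w in range(1, n + 1):
        threshold = f(w)
        for start in range(0, n - w + 1):
            if prefix[start + w] - prefix[start] > threshold:
                bursts.append((start, w))
    return bursts


# Hypothetical data and threshold, chosen only for illustration.
series = [1, 1, 9, 8, 1, 1]
print(brute_force_bursts(series, lambda w: 6 * w))
# → [(2, 1), (3, 1), (2, 2)]
```

The output lists two single-point bursts and one burst of size 2 covering both, which shows why the burst duration has to be searched for and cannot be fixed in advance.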
3
Burst Detection: Data Structure and Algorithm
–Lemma 1: any subsequence s is contained in some window w in the SWT.
–Lemma 2: if Sum(s) > threshold, then Sum(w) > threshold, because all values are positive and w contains s (no false negatives).
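Both lemmas can be checked numerically on a small series. The slide's figure is not reproduced here, so the window layout below is an assumption: level i holds windows of size 2^i that start at every multiple of 2^(i-1), so that neighbouring windows overlap by half. The helper names `enclosing_window` and `check_lemmas` are hypothetical, and this sketch only verifies the two properties; it does not implement the O(n+k) search itself.

```python
from itertools import accumulate


def enclosing_window(start, w):
    """Return (window_start, window_size) of an assumed SWT window covering
    the subsequence that begins at `start` (0-based) and has length w.

    Assumed layout: level i has windows of size 2**i starting at every
    multiple of 2**(i-1). Level i covers any subsequence of length
    at most 2**(i-1) + 1.
    """
    level = 1
    while 2 ** (level - 1) + 1 < w:
        level += 1
    shift = 2 ** (level - 1)
    return (start // shift) * shift, 2 ** level


def check_lemmas(x):
    """Verify both lemmas for every subsequence of x; return how many were checked."""
    n = len(x)
    prefix = [0] + list(accumulate(x))
    checked = 0
    for w in range(1, n + 1):
        for start in range(0, n - w + 1):
            w_start, w_size = enclosing_window(start, w)
            # Lemma 1: the subsequence lies inside the window.
            assert w_start <= start and start + w <= w_start + w_size
            # Windows may run past the end of the data; missing points count as 0.
            w_end = min(w_start + w_size, n)
            window_sum = prefix[w_end] - prefix[w_start]
            sub_sum = prefix[start + w] - prefix[start]
            # Lemma 2: with positive values the window sum cannot be smaller.
            assert window_sum >= sub_sum
            checked += 1
    return checked


print(check_lemmas([1, 1, 9, 8, 1, 1]))  # → 21
```

Because the window sum is never smaller than the sum of any subsequence it contains, a burst in s can never be hidden by its enclosing window; windows that exceed the threshold still need a detailed search to find which subsequences inside them are actual bursts.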
4
StatStream: Motivation
Stock price streams
–The New York Stock Exchange (NYSE)
–50,000 securities (streams); 100,000 ticks (trade and quote)
Pairs Trading, a.k.a. Correlation Trading
Query: "Which pairs of stocks were correlated with a value of over 0.9 for the last three hours?"
XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC …
5
StatStream: Goal
Given tens of thousands of high-speed time series data streams, detect high-value correlations, both synchronized and time-lagged, over sliding windows in real time.
Real time
–high update frequency of the data stream
–fixed response time, online
6
StatStream: Algorithm
Naive algorithm
–N: number of streams
–w: size of sliding window
–space O(N) and time O(N^2 w) vs. space O(N^2) and time O(N^2)
Suppose that the streams are updated every second.
–With a Pentium 4 PC, the exact computing method can only monitor 700 streams with a delay of 2 minutes.
Our Approach
–Using Discrete Fourier Transform to approximate correlation
–Using grid structure to filter out unlikely pairs
–Our approach can monitor 10,000 streams with a delay of 2 minutes.
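One way the DFT can stand in for an exact correlation is sketched below. The slide does not give the formula, so this relies on a standard identity and should be read as an assumption about the approach: for two windows normalized to zero mean and unit length, correlation equals 1 - d^2/2, where d is their Euclidean distance, and an orthonormal DFT preserves d. Keeping only the first m coefficients can only shrink the distance, so the estimate never falls below the true correlation. The function names and the sample data are hypothetical.

```python
import cmath
import math


def normalize(x):
    """Shift to zero mean and scale to unit length (x must not be constant)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def exact_corr(x, y):
    """Pearson correlation: O(w) work per pair of windows."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def dft_coefs(x, m):
    """Coefficients 1..m of the orthonormal DFT (coefficient 0 is 0 after normalizing)."""
    n = len(x)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x)) / math.sqrt(n)
        for k in range(1, m + 1)
    ]


def approx_corr(x, y, m):
    """Upper bound on the correlation from m DFT coefficients (requires m < len(x) / 2)."""
    X = dft_coefs(normalize(x), m)
    Y = dft_coefs(normalize(y), m)
    # For real input, coefficient n-k mirrors coefficient k, hence the factor 2.
    d2 = 2.0 * sum(abs(a - b) ** 2 for a, b in zip(X, Y))
    return 1.0 - d2 / 2.0


# Hypothetical windows, for illustration only.
a = [1, 2, 3, 4, 5, 4, 3, 2]
b = [2, 3, 5, 6, 8, 7, 4, 3]
print(exact_corr(a, b), approx_corr(a, b, 2))
```

Since the estimate is an upper bound, a pair whose estimate is below the query threshold can be discarded without computing its exact correlation; pairs that pass still need an exact check.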
7
StatStream: Stream synoptic data structure
Three-level time interval hierarchy
–Time point, Basic window, Sliding window
Basic window (the key to our technique)
–The computation for basic window i must finish by the end of the basic window i+1
–The basic window time is the system response time.
Digests
–Sliding window digests: sum, DFT coefs
–Basic window digests: sum, DFT coefs
[Figure: a sliding window divided into basic windows, each made up of time points; every basic window and the sliding window carries its own digests.]
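The hierarchy can be illustrated with the simplest digest, the sum. The sketch below is an assumption-laden toy, not the StatStream implementation: the class name `SlidingSum` and its parameters are hypothetical, and the DFT coefficient digests are left out. Each completed basic window is reduced to one number, and the sliding window digest is updated by adding the newest basic window digest and dropping the oldest, so results appear once per basic window.

```python
from collections import deque


class SlidingSum:
    """Sliding-window sum maintained from per-basic-window digests."""

    def __init__(self, basic_size, num_basic):
        self.basic_size = basic_size             # time points per basic window
        self.digests = deque(maxlen=num_basic)   # one sum per completed basic window
        self.current = []                        # time points of the open basic window
        self.total = 0                           # sliding window digest

    def add(self, value):
        """Add one time point; return the sliding sum when a basic window closes."""
        self.current.append(value)
        if len(self.current) < self.basic_size:
            return None                          # basic window still open
        digest = sum(self.current)
        self.current = []
        if len(self.digests) == self.digests.maxlen:
            self.total -= self.digests[0]        # the oldest basic window expires
        self.digests.append(digest)              # deque drops the oldest entry itself
        self.total += digest
        return self.total


# Basic windows of 2 points, sliding window of 2 basic windows (4 points).
stream = SlidingSum(basic_size=2, num_basic=2)
print([stream.add(v) for v in range(1, 9)])
# → [None, 3, None, 10, None, 18, None, 26]
```

The `None` entries mark time points where no result is reported, which reflects the statement above that the basic window time is the system response time.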