Online Pattern Discovery Applications in Data Streams Sensor-less: Pairs-trading in stock trading (find highly correlated pairs in n log n time) Sensor-full:


1 Online Pattern Discovery Applications in Data Streams
Sensor-less: Pairs-trading in stock trading (find highly correlated pairs in n log n time)
Sensor-full: Gamma Ray Detection in astrophysics (burst detection over a large number of window sizes in almost linear time)
Dennis Shasha (joint work with Yunyue Zhu)
yunyue,shasha@cs.nyu.edu

2 Application 1: Pairs Trading
Stock price streams
–The New York Stock Exchange (NYSE)
–50,000 securities (streams); 100,000 ticks (trade and quote)
Pairs Trading, a.k.a. Correlation Trading
Query: "Which pairs of stocks were correlated with a value of over 0.9 for the last three hours?"
Example reasoning: XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC …

3 Online Detection of High Correlation
Given tens of thousands of high-speed time series data streams, detect high-value correlations, both synchronized and time-lagged, over sliding windows in real time.
Real time
–high update frequency of the data stream
–fixed response time, online
[Figure label: Correlated!]

4 StatStream: Algorithm
Naive algorithm
–N: number of streams
–w: size of sliding window
–space O(N) and time O(N^2 w) vs. space O(N^2) and time O(N^2)
Suppose that the streams are updated every second.
–With a Pentium 4 PC, the exact method can monitor only 700 streams with a delay of 2 minutes.
Our approach
–Discrete Fourier Transform to approximate correlation
–grid structure to filter out unlikely pairs
–Our approach can monitor 10,000 streams with a delay of 2 minutes.
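The DFT approximation on this slide can be illustrated with a minimal sketch. It assumes each window is first normalized to zero mean and unit length, so that correlation equals 1 minus half the squared Euclidean distance; keeping only the first m DFT coefficients then underestimates the distance and therefore gives an upper bound on the correlation that can be used to discard unlikely pairs. The function names and the choice of m are illustrative assumptions, not taken from the slides, and the grid structure is not shown.

```python
import cmath
import math


def normalize(x):
    """Shift to zero mean and scale to unit length (assumes x is not constant)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def dft_coefs(x, m):
    """First m non-constant coefficients (k = 1..m) of the unitary DFT of x."""
    w = len(x)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * t * k / w) for t, v in enumerate(x))
        / math.sqrt(w)
        for k in range(1, m + 1)
    ]


def exact_correlation(x, y):
    """Pearson correlation as the inner product of the normalized windows."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def correlation_upper_bound(x, y, m):
    """Upper bound on the correlation from m DFT coefficients (needs m < len(x) / 2).

    For real data, coefficients k and w - k are complex conjugates, so each
    retained coefficient is counted twice in the squared distance.
    """
    cx = dft_coefs(normalize(x), m)
    cy = dft_coefs(normalize(y), m)
    partial_dist_sq = 2 * sum(abs(a - b) ** 2 for a, b in zip(cx, cy))
    return 1 - partial_dist_sq / 2


def may_be_correlated(x, y, threshold, m):
    """Cheap filter: a pair whose upper bound is below the threshold can be dropped."""
    return correlation_upper_bound(x, y, m) >= threshold
```

Because the bound never falls below the true correlation, a filter of this kind discards only pairs that cannot reach the threshold; pairs that pass still need an exact check.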

5 StatStream: Stream synoptic data structure
Three-level time interval hierarchy
–Time point, Basic window, Sliding window
Basic window (the key to our technique)
–The computation for basic window i must finish by the end of basic window i+1
–The basic window time is the system response time.
Digests
–Sliding window digests: sum, DFT coefs
–Basic window digests: sum, DFT coefs
[Figure: time point, basic window, and sliding window hierarchy, with a digest (sum, DFT coefs) per basic window]
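The basic-window idea can be sketched as follows. This is a simplified illustration, not the StatStream implementation: it keeps only running sums per basic window (no DFT coefficients) for a single pair of streams, and the class name `PairDigest` and its parameters are hypothetical. The sliding-window statistic is assembled from the stored basic-window digests, so the oldest basic window is dropped in one step instead of rescanning raw points.

```python
from collections import deque
import math


class PairDigest:
    """Sliding-window correlation for one pair of streams, built from basic-window sums."""

    def __init__(self, basic_window, num_basic_windows):
        self.basic_window = basic_window
        # Each stored digest is (sum x, sum y, sum x^2, sum y^2, sum x*y).
        # maxlen makes the deque discard the oldest basic window automatically.
        self.digests = deque(maxlen=num_basic_windows)
        self.current = [0.0] * 5
        self.count = 0

    def update(self, x, y):
        """Add one time point; close the basic window once it is full."""
        c = self.current
        c[0] += x
        c[1] += y
        c[2] += x * x
        c[3] += y * y
        c[4] += x * y
        self.count += 1
        if self.count == self.basic_window:
            self.digests.append(tuple(c))
            self.current = [0.0] * 5
            self.count = 0

    def correlation(self):
        """Correlation over the completed basic windows, or None if undefined."""
        n = self.basic_window * len(self.digests)
        if n == 0:
            return None
        sx, sy, sxx, syy, sxy = (sum(col) for col in zip(*self.digests))
        cov = sxy - sx * sy / n
        var_x = sxx - sx * sx / n
        var_y = syy - sy * sy / n
        if var_x <= 0 or var_y <= 0:
            return None
        return cov / math.sqrt(var_x * var_y)
```

The result is refreshed only when a basic window closes, which matches the slide's point that the basic window time is the system response time.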

6 Application 2: elastic burst detection
Discover time intervals with an unusually large number of events.
–In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise.
–In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).
Challenge: to discover the time and duration of a burst, both of which may vary
–In astrophysics, a burst of high-energy particles associated with a special event might last for a few milliseconds, a few hours, or even a few days.
NB: A similar idea may apply to spatial burst detection.

7 Application 2: burst detection example

8 Burst Detection: Problem Statement
Problem: Given a time series of positive numbers x_1, x_2, ..., x_n and a threshold function f(w), w = 1, 2, ..., n, find the subsequences of any size such that their sums are above the thresholds:
–all window sizes w and start positions m, with 1 <= w <= n and 1 <= m <= n-w+1, such that x_m + x_(m+1) + ... + x_(m+w-1) > f(w)
Brute force search: O(n^2) time
Our shift wavelet tree (SWT): O(n+k) time.
–k is the size of the output, i.e. the number of windows with bursts
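The brute force baseline can be written down directly. The sketch below assumes 0-based start positions and uses prefix sums so that each of the O(n^2) windows is checked in constant time; the function name is illustrative.

```python
from itertools import accumulate


def brute_force_bursts(x, f):
    """Return every (start, size) whose window sum exceeds f(size); start is 0-based."""
    n = len(x)
    prefix = [0] + list(accumulate(x))  # prefix[i] = x[0] + ... + x[i-1]
    return [
        (m, w)
        for w in range(1, n + 1)
        for m in range(n - w + 1)
        if prefix[m + w] - prefix[m] > f(w)
    ]
```

For example, `brute_force_bursts([1, 1, 5, 6, 1, 1], lambda w: 4.5 * w)` reports the two single points 5 and 6 and the size-2 window covering both.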

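9 Burst Detection: Data Structure and Algorithm
–Lemma 1: any subsequence s is contained in some window w of the SWT.
–Lemma 2: if Sum(s) > threshold, then Sum(w) > threshold, because the values are positive and w contains s (no false negatives; a flagged window still needs a detailed search to confirm the burst).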
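The two lemmas suggest a filter-then-verify search, sketched below under stated assumptions rather than as the authors' implementation. The layout assumed here is that level i holds windows of size 2^i that start every 2^(i-1) points (so neighbours overlap by half), which means any subsequence of size at most 2^(i-1)+1 lies inside one level-i window. Window sums are taken from prefix sums for brevity instead of being aggregated level by level, the values are assumed non-negative, and the function name is hypothetical.

```python
from itertools import accumulate


def swt_bursts(x, f):
    """Return sorted (start, size) pairs whose sum exceeds f(size); start is 0-based."""
    n = len(x)
    prefix = [0] + list(accumulate(x))
    bursts = []
    level, lo = 1, 1  # lo = smallest subsequence size handled at this level
    while lo <= n:
        shift = 1 << (level - 1)   # distance between consecutive window starts
        length = 2 * shift         # window size at this level
        hi = min(shift + 1, n)     # largest size guaranteed to fit in one window
        # Smallest threshold among the sizes this level is responsible for.
        level_threshold = min(f(w) for w in range(lo, hi + 1))
        for s in range(0, n, shift):
            e = min(s + length, n)  # windows near the end are truncated
            if prefix[e] - prefix[s] <= level_threshold:
                continue  # Lemma 2: no subsequence inside can be a burst
            # Detailed search, restricted to subsequences starting in the
            # first half of the window so each one is examined exactly once.
            for w in range(lo, hi + 1):
                for m in range(s, min(s + shift, n - w + 1)):
                    if prefix[m + w] - prefix[m] > f(w):
                        bursts.append((m, w))
        lo = shift + 2
        level += 1
    return sorted(bursts)
```

When bursts are rare, most windows fail the cheap sum test and the detailed search is skipped, which is where the saving over the brute force scan comes from.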

