High Performance Discovery from Time Series Streams

Presentation transcript:

High Performance Discovery from Time Series Streams Dennis Shasha Joint work with Yunyue Zhu yunyue@cs.nyu.edu shasha@cs.nyu.edu Courant Institute, New York University

Overall Outline: Data mining (both classical and activist). Algorithmic tools for time series. Surprise.

Goal of this work Time series are important in so many applications – biology, medicine, finance, music, physics, … A few fundamental operations occur all the time: burst detection, correlation, pattern matching. Do them fast to make data exploration faster, real time, and more fun. Extend functionality for music and science.

StatStream (VLDB 2002): Example. Stock price streams. The New York Stock Exchange (NYSE): 50,000 securities (streams); 100,000 ticks (trade and quote). Pairs trading, a.k.a. correlation trading. Query: "Which pairs of stocks were correlated with a value of over 0.9 for the last three hours?" A trader's reasoning: "XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC …"

Online Detection of High Correlation. Goal: given tens of thousands of high-speed time series data streams, detect high-value correlations, both synchronized and time-lagged, over sliding windows in real time. Real time: high update frequency of the data streams; fixed response time, online. (Figure label: Correlated!)

StatStream: Algorithm. Naive algorithm (N: number of streams; w: size of the sliding window): space O(N) and time O(N^2 w), versus space O(N^2) and time O(N^2). Suppose that the streams are updated every second. With a Pentium 4 PC, the exact computing method can only monitor 700 streams with a delay of 2 minutes. Our approach: use the Discrete Fourier Transform to approximate correlation, and use a grid structure to filter out unlikely pairs. Our approach can monitor 10,000 streams with a delay of 2 minutes.

StatStream: Stream synoptic data structure. Three-level time interval hierarchy: time point, basic window, sliding window. The basic window is the key to our technique: the computation for basic window i must finish by the end of basic window i+1, and the basic window time is the system response time. Digests: basic window digests (sum, DFT coefs) and sliding window digests (sum, DFT coefs).

Synchronized Correlation Uses Basic Windows. The inner product of stream x and stream y is computed over aligned basic windows. The inner product within a sliding window is the sum of the inner products in all the basic windows in the sliding window.
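
The bookkeeping described on this slide can be sketched as follows. This is only an illustration, not StatStream's code: the class name `SlidingInnerProduct` and its parameters are hypothetical, and the sliding window is assumed to cover a whole number of complete basic windows.

```python
from collections import deque


class SlidingInnerProduct:
    """Inner product of two streams over the last `num_basic` complete basic windows."""

    def __init__(self, basic_size, num_basic):
        self.basic_size = basic_size
        self.partials = deque(maxlen=num_basic)  # one inner product per basic window
        self.total = 0.0    # sum of the stored per-window inner products
        self.current = 0.0  # inner product of the basic window being filled
        self.count = 0      # points seen so far in the current basic window

    def update(self, x, y):
        """Consume one aligned pair of values; return the sliding-window inner product."""
        self.current += x * y
        self.count += 1
        if self.count == self.basic_size:
            if len(self.partials) == self.partials.maxlen:
                self.total -= self.partials[0]  # the oldest basic window expires
            self.partials.append(self.current)
            self.total += self.current
            self.current, self.count = 0.0, 0
        return self.total


tracker = SlidingInnerProduct(basic_size=3, num_basic=2)
for value in range(1, 13):
    latest = tracker.update(value, 2)
```

The result only changes when a basic window completes, which matches the slide's point that the basic window time is the system response time.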

Approximate Synchronized Correlation. Approximate with an orthogonal function family (e.g. DFT). [Figure: a window of values x1, ..., x8 and three basis functions f1, f2, f3, each evaluated at the points 1, ..., 8.]

Approximate Synchronized Correlation. Approximate with an orthogonal function family (e.g. DFT): the inner product of the time series is approximated by the inner product of their digests. The time and space complexity is reduced from O(b) to O(n), where b is the size of the basic window and n is the size of the digests (n << b); e.g. 120 time points reduce to 4 digests.
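
A minimal sketch of the digest idea, under assumptions that are not on the slide: each window is first normalized to zero mean and unit norm (so correlation equals 1 - d^2/2, with d the Euclidean distance), the digest is the first n coefficients of an orthonormal DFT, and the DFT is computed naively for clarity. All function names are hypothetical.

```python
import cmath
import math


def normalize(series):
    """Shift to zero mean and scale to unit Euclidean norm (series must not be constant)."""
    mean = sum(series) / len(series)
    centered = [v - mean for v in series]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def dft_digest(series, n):
    """Coefficients 1..n of the orthonormal DFT of the normalized series."""
    z = normalize(series)
    size = len(z)
    return [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / size) for t in range(size))
        / math.sqrt(size)
        for k in range(1, n + 1)
    ]


def approx_correlation(dx, dy):
    """Correlation estimate from two digests of equal length n < N/2."""
    # For real input, coefficient N-k mirrors coefficient k, hence the factor 2.
    d2 = 2.0 * sum(abs(a - b) ** 2 for a, b in zip(dx, dy))
    return 1.0 - d2 / 2.0


def exact_correlation(x, y):
    """Pearson correlation computed from all the points."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


x = [math.sin(2 * math.pi * t / 32) for t in range(32)]
y = [math.sin(2 * math.pi * t / 32 + 0.3) + 0.05 * ((t * 7) % 5) for t in range(32)]
estimate = approx_correlation(dft_digest(x, 4), dft_digest(y, 4))
truth = exact_correlation(x, y)
```

Because dropping coefficients can only shrink the distance, the estimate never falls below the exact correlation, so a filter built on it does not miss correlated pairs.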

Approximate Lagged Correlation. Inner product with unaligned windows within the sliding window. The time complexity is reduced from O(b) to O(n^2), as opposed to O(n) for synchronized correlation. Reason: terms for different frequencies are non-zero in the lagged case.

Grid Structure (to avoid checking all pairs). The DFT coefficients yield a vector. High correlation implies closeness in the vector space. We can use a grid structure and look in the neighborhood; this returns a superset of the highly correlated pairs.
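
One way the neighborhood lookup could work is sketched below. It assumes each stream has already been reduced to a short real-valued feature vector and that the cell side equals the distance threshold; the function name `candidate_pairs` and the sample vectors are made up for illustration.

```python
from collections import defaultdict
from itertools import product


def candidate_pairs(vectors, cell):
    """Return the pairs of names whose grid cells are identical or adjacent.

    vectors: dict mapping a stream name to a tuple of floats.
    cell: side length of a grid cell (assumed equal to the distance threshold).
    """
    grid = defaultdict(list)
    for name, vec in vectors.items():
        grid[tuple(int(v // cell) for v in vec)].append(name)

    pairs = set()
    for key, names in grid.items():
        # Visit the cell itself and every neighboring cell.
        for offset in product((-1, 0, 1), repeat=len(key)):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            for a in names:
                for b in grid.get(neighbor, ()):
                    if a < b:
                        pairs.add((a, b))
    return pairs


streams = {"a": (0.1, 0.1), "b": (0.3, 0.2), "c": (2.0, 2.0), "d": (0.6, 0.1)}
candidates = candidate_pairs(streams, cell=0.5)
```

Two vectors closer than one cell side differ by at most one cell per coordinate, so every truly close pair is among the candidates. The candidates are a superset and still need an exact check.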

Empirical Study : Speed Our algorithm is parallelizable.

Empirical Study: Precision. Approximation errors: a larger digest size, a larger sliding window and a smaller basic window give a better approximation. The approximation errors are small for the stock data.

Sketches: Random Projection. Motivation: correlation between the time series of stock returns. Since most stock price time series are close to random walks, their return time series are close to white noise. DFT/DWT can't capture approximately white-noise series because there is no clear trend (too many frequency components). Solution: sketches (a form of random landmark). Sketch pool: a matrix of random variables drawn from a stable distribution. Sketches: the random projection of all time series to lower dimensions by multiplication with the same matrix. The Euclidean distance (and hence the correlation) between time series is approximated by the distance between their sketches, with a probabilistic guarantee.
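
A small sketch of the random projection, assuming Gaussian entries (the Gaussian is the stable distribution that matches Euclidean distance) and a 1/sqrt(k) scaling so that sketch distances estimate the original distances. The function names, the seed and the sizes are illustrative choices, not values from the talk.

```python
import math
import random


def make_sketch_pool(k, length, seed=0):
    """k random rows of `length` Gaussian entries, shared by all time series."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(k)]


def sketch(series, pool):
    """Project a series onto the pool rows, scaled so that distances are preserved in expectation."""
    scale = 1.0 / math.sqrt(len(pool))
    return [scale * sum(r * v for r, v in zip(row, series)) for row in pool]


def distance(a, b):
    """Plain Euclidean distance."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(1)
x = [rng.uniform(-1, 1) for _ in range(256)]
y = [rng.uniform(-1, 1) for _ in range(256)]
pool = make_sketch_pool(k=200, length=256)
ratio = distance(sketch(x, pool), sketch(y, pool)) / distance(x, y)
```

With a few hundred rows the ratio is usually close to 1, but the guarantee is probabilistic, so any single estimate can be off.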

Burst Detection

Burst Detection: Applications. Discovering intervals with unusually large numbers of events. In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise. It might last milliseconds or days… In telecommunications, if the number of packets lost within a certain time period exceeds some threshold, it might indicate some network anomaly. The exact duration is unknown. In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).

Bursts across different window sizes in Gamma Rays. Challenge: to discover not only the time of the burst, but also the duration of the burst.

Elastic Burst Detection: Problem Statement. Problem: given a time series of positive numbers x_1, x_2, ..., x_n and a threshold function f(w), w = 1, 2, ..., n, find the subsequences of any size whose sums are above the thresholds: all 0 < w < n, 0 < m < n-w, such that x_m + x_{m+1} + … + x_{m+w-1} ≥ f(w). Brute force search: O(n^2) time. Our shifted wavelet tree (SWT): O(n+k) time, where k is the size of the output, i.e. the number of windows with bursts.

Burst Detection: Data Structure and Algorithm. Define the threshold for a node of size 2^k to be the threshold for a window of size 1 + 2^(k-1).
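
The filtering idea can be sketched as below. This is a simplified illustration and not the authors' implementation: it uses prefix sums instead of an explicit tree, 0-based window starts, and for each level the smallest threshold among the window sizes that level is responsible for (a conservative choice that may differ from the exact rule on the slide). Level k holds half-overlapping windows of size 2^k, and every window of size at most 1 + 2^(k-1) fits inside one of them.

```python
def prefix_sums(x):
    """prefix[i] holds the sum of the first i values."""
    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)
    return prefix


def brute_force_bursts(x, f):
    """All (start, size) windows whose sum reaches f(size); quadratic reference."""
    n, p = len(x), prefix_sums(x)
    return {(m, w) for w in range(1, n + 1) for m in range(n - w + 1)
            if p[m + w] - p[m] >= f(w)}


def swt_bursts(x, f):
    """Same output, but windows are only searched below a level node that raises an alarm."""
    n, p = len(x), prefix_sums(x)
    found = {(m, 1) for m in range(n) if x[m] >= f(1)}  # level 0: single points
    level, lo = 1, 2  # lo: smallest window size not handled yet
    while lo <= n:
        size = 2 ** level
        shift = size // 2              # level windows overlap by half
        hi = min(shift + 1, n)         # largest size guaranteed to fit in a level window
        node_threshold = min(f(w) for w in range(lo, hi + 1))
        for start in range(0, n, shift):
            end = min(start + size, n)
            if p[end] - p[start] >= node_threshold:  # alarm: detailed search in this node
                for w in range(lo, hi + 1):
                    for m in range(start, end - w + 1):
                        if p[m + w] - p[m] >= f(w):
                            found.add((m, w))
        lo, level = hi + 1, level + 1
    return found


series = [1, 1, 1, 9, 8, 1, 1, 1, 1, 1, 1, 1]
threshold = lambda w: 5 + 3 * w
bursts = swt_bursts(series, threshold)
```

The filter relies on the values being non-negative: a level window's sum is then at least the sum of any window it contains, so a quiet node rules out every window inside it.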

Burst Detection: Example

Burst Detection: Example False Alarm True Alarm

False Alarms (requires work, but no errors)

Empirical Study : Gamma Ray Burst

Extension to other aggregates. SWT can be used for any aggregate that is monotonic. SUM, COUNT and MAX are monotonically increasing. MIN is monotonically decreasing, so its alarm condition is aggregate < threshold. Spread = MAX - MIN. Applications in finance: stocks with a burst of trading or quote (bid/ask) volume (Hammer!); stock prices with high spread.

Empirical Study : Stock Price Spread Burst

Extension to high dimensions

Elastic Burst in two dimensions Population Distribution in the US

How to find the threshold for Elastic Burst? Suppose that the moving sum of a time series is a random variable from a normal distribution. Let the number of bursts in the time series within sliding window size w be S_o(w) and its expectation be S_e(w). S_e(w) can be computed from the historical data. Given a threshold probability p, we set the burst threshold f(w) for window size w such that Pr[S_o(w) ≥ f(w)] ≤ p.

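Find threshold for Elastic Bursts. Φ(x) is the standard normal cdf; the normal density is symmetric around 0, so Φ(-x) = 1 - Φ(x). [The resulting threshold formula is not preserved in this transcript. Figure: plot of Φ(x) against x, marking the probability p and the point Φ^-1(p).]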
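
Under the normality assumption, a threshold with tail probability p can be computed as mean + std * Φ^-1(1 - p). The sketch below assumes the mean and standard deviation of the moving sums are estimated from historical data; the function name `burst_threshold` and the sample history are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def burst_threshold(history, w, p):
    """Threshold f(w) such that a normal moving sum of size w exceeds it with probability p."""
    sums = [sum(history[i:i + w]) for i in range(len(history) - w + 1)]
    mu, sigma = mean(sums), pstdev(sums)
    # Phi^-1(1 - p) equals -Phi^-1(p) by the symmetry of the normal distribution.
    return mu + sigma * NormalDist().inv_cdf(1.0 - p)


history = [1, 2, 3, 4]
median_level = burst_threshold(history, w=2, p=0.5)
strict_level = burst_threshold(history, w=2, p=0.01)
```

A smaller p pushes the threshold further above the mean, so fewer windows are reported as bursts.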

Summary Able to detect bursts of many different durations in essentially linear time. Can be used both for time series and for spatial searching. Can specify thresholds either with absolute numbers or with probability of hit. Algorithm is simple to implement and has low constants (code is available). Ok, it’s embarrassingly simple.

With a Little Help From My Warped Correlation. Karen's humming and its match; Dennis's humming and its match: "What would you do if I sang out of tune?"; Yunyue's humming and its match.

Related Work in Query by Humming. Traditional method: string matching [Ghias et al. 95, McNab et al. 97, Uitdenbogerd and Zobel 99]. Music is represented by a string of pitch directions: U, D, S (degenerated interval). The hum query is segmented into discrete notes, then converted to a string of pitch directions. Edit distance is computed between the hum query and the music score. Problem: it is very hard to segment the hum query. Partial solution: users are asked to hum articulately. New method: matching directly from audio [Mazzoni and Dannenberg 00], slowed down by DTW.
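
The traditional string-matching approach can be illustrated as follows. It assumes the notes are already segmented (the hard part, as the slide notes) and are given as numeric pitches; the helper names `contour` and `edit_distance` are illustrative.

```python
def contour(pitches):
    """Encode consecutive notes as U (up), D (down) or S (same)."""
    symbols = []
    for prev, cur in zip(pitches, pitches[1:]):
        symbols.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(symbols)


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


hummed = contour([60, 62, 62, 59])
score = contour([60, 62, 64, 59])
mismatch = edit_distance(hummed, score)
```

The contour string discards interval sizes and timing, which is why it tolerates off-key humming but depends on correct note segmentation.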

Time Series Representation of Query. An example hum query: segment this! Note segmentation is hard!

How to deal with poor hum queries? No absolute pitch. Solution: the average pitch is subtracted. Incorrect tempo. Solution: Uniform Time Warping. Inaccurate pitch intervals. Solution: return the k-nearest neighbors. Local timing variations. Solution: Dynamic Time Warping.
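
The first two fixes are simple enough to sketch. The code assumes a hum query given as a list of pitch values and implements uniform time warping as nearest-index resampling to a common length, which is one possible reading of the slide; both function names are hypothetical.

```python
def subtract_mean(pitches):
    """Remove the average pitch so that only the relative melody remains."""
    average = sum(pitches) / len(pitches)
    return [p - average for p in pitches]


def uniform_time_warp(series, length):
    """Stretch or squeeze a series uniformly to `length` points."""
    n = len(series)
    return [series[(i * n) // length] for i in range(length)]


relative = subtract_mean([60, 62, 64])
stretched = uniform_time_warp([1, 2, 3], 6)
squeezed = uniform_time_warp([1, 1, 2, 2, 3, 3], 3)
```

After both steps, two hums of the same tune in different keys and at different overall tempos become directly comparable point by point.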

Dynamic Time Warping. Euclidean distance: sum of point-by-point distances. DTW distance: allows stretching or squeezing the time axis locally.
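
A textbook dynamic-programming sketch of DTW follows. It uses squared point differences with a final square root and an optional band that limits how far the time axis may be warped; these are common conventions and are assumptions here, not details taken from the slide.

```python
import math


def dtw_distance(a, b, band=None):
    """DTW distance between sequences a and b, with |i - j| limited to `band` if given."""
    n, m = len(a), len(b)
    if band is None:
        band = max(n, m)  # no constraint on the warping
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: a point of b repeated, a point of a repeated, a diagonal step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


same_tune = dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3])
shifted = dtw_distance([0, 0, 1], [0, 1, 1], band=1)
```

Both examples give a distance of zero because one sequence is a locally stretched copy of the other, whereas a point-by-point comparison of `[0, 0, 1]` and `[0, 1, 1]` would not.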

Envelope Transform using Piecewise Aggregate Approximation (PAA) [Keogh VLDB 02]

Envelope Transform using Piecewise Aggregate Approximation (PAA). Advantage of tighter envelopes: still no false negatives, and fewer false positives.
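
The envelope idea can be sketched as follows, under several assumptions: both series have the same length, warping is limited to a band of k points, the PAA frames divide the length evenly, and the reduced envelope uses frame averages of the upper and lower envelope. The function names are hypothetical and this is not the code behind the slides.

```python
import math


def envelope(y, k):
    """Lower and upper envelope of y: min and max over a window of k points on each side."""
    n = len(y)
    upper = [max(y[max(0, i - k): i + k + 1]) for i in range(n)]
    lower = [min(y[max(0, i - k): i + k + 1]) for i in range(n)]
    return lower, upper


def paa(series, frames):
    """Piecewise aggregate approximation: the mean of each of `frames` equal-size frames."""
    size = len(series) // frames  # assumes len(series) is divisible by frames
    return [sum(series[j * size:(j + 1) * size]) / size for j in range(frames)]


def envelope_lower_bound(x, lower, upper, weight=1.0):
    """Distance from x to the envelope; points inside the envelope contribute nothing."""
    total = 0.0
    for v, lo, hi in zip(x, lower, upper):
        if v > hi:
            total += (v - hi) ** 2
        elif v < lo:
            total += (lo - v) ** 2
    return math.sqrt(weight * total)


x = [2, 0.5, -1, 0.5]
low, up = envelope([0, 1, 0, 1], k=1)
full_bound = envelope_lower_bound(x, low, up)
reduced_bound = envelope_lower_bound(paa(x, 2), paa(low, 2), paa(up, 2), weight=len(x) / 2)
```

The reduced bound works on 2 values instead of 4 and is never larger than the full bound, so using it as a filter can add false positives but should not introduce false negatives.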

Container Invariant Envelope Transform. Container-invariant: a transformation T for the envelope such that [formula not preserved in this transcript]. Theorem: if a transformation is container-invariant and lower-bounding, then the distance between the transformed time series x and the transformed envelope of y lower-bounds their DTW distance. (Figure label: Feature Space.)

The Vision. The ability to match time series quickly may open up entirely new application areas, e.g. fast reaction to external events, music by humming, and so on. Main problems: accuracy, excessive specification. Reference (advert): High Performance Discovery in Time Series (Springer 2004).