Matrix Profile II: Exploiting a Novel Algorithm and GPUs to break the one Hundred Million Barrier for Time Series Motifs and Joins Yan Zhu, Zachary Zimmerman,

Presentation transcript:

Matrix Profile II: Exploiting a Novel Algorithm and GPUs to break the one Hundred Million Barrier for Time Series Motifs and Joins
Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk, Eamonn Keogh
Or, how to do four hundred ninety-nine quadrillion, nine hundred ninety-nine trillion, nine hundred ninety-nine billion, five hundred million pairwise comparisons very fast.
http://www.cs.ucr.edu/~eamonn/MatrixProfile.html

Definition Review: Distance Profile
Example: a seismology time series with two repeated earthquake patterns. Let Query be the 1st subsequence in the time series, and obtain the z-normalized Euclidean distance between Query and each window (subsequence) in the time series. We would obtain a vector like this:
$$D_1 = (d_{1,1},\ d_{2,1},\ \ldots,\ d_{n-m+1,1})$$
Here $d_{i,j}$ is the distance between the ith subsequence and the jth subsequence. We can obtain $D_2, D_3, \ldots, D_{n-m+1}$ similarly.
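The definition above can be sketched in a few lines of plain Python. This is a naive illustration, not the authors' code: the names `znorm` and `distance_profile`, the 0-based indexing, the toy series, and the use of the population standard deviation are all assumptions made for the example.

```python
import math

def znorm(window):
    """Z-normalize one window using the population standard deviation."""
    m = len(window)
    mu = sum(window) / m
    sigma = math.sqrt(sum((x - mu) ** 2 for x in window) / m)
    return [(x - mu) / sigma for x in window]  # assumes the window is not constant

def distance_profile(ts, m, q=0):
    """Distances between the window starting at q and every window of length m."""
    query = znorm(ts[q:q + m])
    profile = []
    for j in range(len(ts) - m + 1):
        w = znorm(ts[j:j + m])
        profile.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(query, w))))
    return profile

# Toy series (hypothetical data): the pattern 0, 1, 3, 1 occurs at positions 0 and 8.
ts = [0, 1, 3, 1, 0, 2, 2, 5, 0, 1, 3, 1, 0, 7]
D1 = distance_profile(ts, m=4)
```

The profile has n-m+1 entries, and the entry for the repeated pattern drops to zero, which is the kind of dip the seismology example shows.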

Definition Review: From Distance Profile to Matrix Profile
$$\begin{pmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,n-m+1} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,n-m+1} \\ \vdots & \vdots & & \vdots \\ d_{i,1} & d_{i,2} & \cdots\ d_{i,j}\ \cdots & d_{i,n-m+1} \\ \vdots & \vdots & & \vdots \\ d_{n-m+1,1} & d_{n-m+1,2} & \cdots & d_{n-m+1,n-m+1} \end{pmatrix}$$
$d_{i,j}$ is the distance between the ith window and the jth window of the time series. Note: this distance matrix is symmetric!
$$P_i = \min(D_i), \quad i = 1, 2, \ldots, n-m+1$$
Matrix Profile: a vector of the distances between each subsequence and its nearest neighbor.
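As a reference point, the matrix profile can be computed by brute force straight from this definition. The sketch below is an illustration under assumptions, not the method from the slides: it skips windows within `excl` positions of each other, because otherwise every window would match itself at distance 0; that exclusion zone and its default of m // 2 are assumptions, as are the function names and the toy data.

```python
import math

def znorm(window):
    """Z-normalize one window using the population standard deviation."""
    m = len(window)
    mu = sum(window) / m
    sigma = math.sqrt(sum((x - mu) ** 2 for x in window) / m)
    return [(x - mu) / sigma for x in window]  # assumes the window is not constant

def brute_force_matrix_profile(ts, m, excl=None):
    """Return (P, I): nearest-neighbor distance and position for every window."""
    n_sub = len(ts) - m + 1
    if excl is None:
        excl = m // 2  # assumed exclusion zone around the diagonal
    windows = [znorm(ts[i:i + m]) for i in range(n_sub)]
    P = [math.inf] * n_sub
    I = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue  # skip trivial matches
            d = math.dist(windows[i], windows[j])
            if d < P[i]:
                P[i], I[i] = d, j
    return P, I

ts = [0, 1, 3, 1, 0, 2, 2, 5, 0, 1, 3, 1, 0, 7]  # hypothetical toy data
P, I = brute_force_matrix_profile(ts, m=4)
```

This version costs on the order of n² distance computations of m operations each, which is the cost the later slides set out to reduce.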

From Matrix Profile to Motif
[Figure: a time series and, below it, its matrix profile, with a pair of minimum points marked.]
The Matrix Profile has two minimum points. This pair of minimum points corresponds to the 1st motif in the time series (the closest pair of subsequences in the time series).
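Reading the 1st motif off a matrix profile can be sketched as follows. The sketch assumes a second vector of nearest-neighbor positions is stored alongside the matrix profile; the name `top_motif` and the numbers in the example are hypothetical.

```python
def top_motif(profile, index):
    """Return the pair of window positions at the profile minimum, and their distance."""
    i = min(range(len(profile)), key=profile.__getitem__)
    pair = tuple(sorted((i, index[i])))
    return pair, profile[i]

# Hypothetical matrix profile and nearest-neighbor positions for six windows.
P = [2.1, 0.3, 1.7, 2.4, 0.3, 1.9]
I = [3, 4, 5, 0, 1, 2]
pair, dist = top_motif(P, I)
```

Because the distance matrix is symmetric, the two members of the closest pair share the same minimum value, which is why the profile shows two minimum points.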

Question: How to compute the Matrix Profile very fast?
Answer: We have an O(n^2) time, O(n) space algorithm called STOMP to evaluate it. To see how it works, let us first introduce an important formula:
$$d_{i,j} = \sqrt{2m\left(1 - \frac{QT_{i,j} - m\,\mu_i\,\mu_j}{m\,\sigma_i\,\sigma_j}\right)}$$
$QT_{i,j}$ is the dot product of the ith window and the jth window. Once we know $QT_{i,j}$, it takes O(1) time to compute $d_{i,j}$. We precompute and store the means and stds in O(n) space.
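A quick numerical check of the formula, assuming σ denotes the population standard deviation of each window. The helper names and the two example windows are hypothetical; the clamp to zero only guards against tiny negative values from floating-point rounding.

```python
import math

def dist_from_dot(qt, m, mu_i, sigma_i, mu_j, sigma_j):
    """z-normalized Euclidean distance from the dot product and window statistics."""
    corr = (qt - m * mu_i * mu_j) / (m * sigma_i * sigma_j)
    return math.sqrt(max(0.0, 2 * m * (1 - corr)))

def mean_std(w):
    """Mean and population standard deviation of one window."""
    mu = sum(w) / len(w)
    return mu, math.sqrt(sum((x - mu) ** 2 for x in w) / len(w))

a = [0, 1, 3, 1]  # hypothetical ith window
b = [2, 2, 5, 0]  # hypothetical jth window
m = len(a)
qt = sum(x * y for x, y in zip(a, b))
(mu_a, sd_a), (mu_b, sd_b) = mean_std(a), mean_std(b)

d_formula = dist_from_dot(qt, m, mu_a, sd_a, mu_b, sd_b)
d_direct = math.dist([(x - mu_a) / sd_a for x in a],
                     [(y - mu_b) / sd_b for y in b])
```

Both routes give the same distance up to rounding, but the first needs only the dot product and four precomputed statistics.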

The relationship between $QT_{i,j}$ and $QT_{i+1,j+1}$
$$QT_{i,j} = t_i t_j + t_{i+1} t_{j+1} + t_{i+2} t_{j+2} + \cdots + t_{i+m-1} t_{j+m-1}$$
$$QT_{i+1,j+1} = t_{i+1} t_{j+1} + t_{i+2} t_{j+2} + \cdots + t_{i+m-1} t_{j+m-1} + t_{i+m} t_{j+m}$$
The two sums share all but one term each, so:
$$QT_{i+1,j+1} = QT_{i,j} - t_i t_j + t_{i+m} t_{j+m}$$
O(1) time complexity!
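The recurrence can be checked against direct dot products on a small integer series, where the arithmetic is exact. The function names, the 0-based indexing, and the data are assumptions made for this sketch.

```python
def dot(ts, i, j, m):
    """Direct dot product of the windows starting at i and j (O(m) work)."""
    return sum(ts[i + k] * ts[j + k] for k in range(m))

def next_qt(ts, m, i, j, qt_ij):
    """QT[i+1][j+1] from QT[i][j]: drop the oldest product, add the newest (O(1) work)."""
    return qt_ij - ts[i] * ts[j] + ts[i + m] * ts[j + m]

ts = [0, 1, 3, 1, 0, 2, 2, 5, 0, 1, 3, 1, 0, 7]  # hypothetical toy data
m = 4

# Walk down one diagonal, from (0, 5) to (5, 10), checking each step.
qt = dot(ts, 0, 5, m)
checks = []
for k in range(5):
    qt = next_qt(ts, m, k, 5 + k, qt)
    checks.append(qt == dot(ts, k + 1, 6 + k, m))
```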

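STOMP Algorithm: Computing the ith line
We pre-calculate $QT_{x,1}$ and $QT_{1,x}$ (x = 1, 2, 3, …, n-m+1). Then we iterate through i = 2, 3, …, n-m+1. For each i:
1. From the previous line $QT_{i-1,1}, QT_{i-1,2}, \ldots, QT_{i-1,n-m}$, obtain $QT_{i,2}, QT_{i,3}, \ldots, QT_{i,n-m+1}$ in O(1) time each; $QT_{i,1}$ is already pre-calculated.
2. Convert the line $QT_{i,1}, QT_{i,2}, \ldots, QT_{i,n-m+1}$ into the Distance Profile $d_{i,1}, d_{i,2}, d_{i,3}, \ldots, d_{i,n-m+1}$.
3. Compare the Distance Profile element by element with the Matrix Profile $P_1, P_2, P_3, \ldots, P_{n-m+1}$ and update each entry if smaller (min).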
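Putting the pieces together, the row-by-row procedure above might look like the following pure-Python sketch. It is not the authors' implementation: the exclusion zone `excl` (skipping windows that overlap heavily with window i) is an assumption, the window statistics are computed naively for clarity, and indexing is 0-based.

```python
import math

def stomp(ts, m, excl=None):
    """Return (P, I): matrix profile and nearest-neighbor positions."""
    n_sub = len(ts) - m + 1
    if excl is None:
        excl = m // 2  # assumed exclusion zone around the diagonal
    # Window means and population stds, stored in O(n) space (computed naively here).
    mu = [sum(ts[i:i + m]) / m for i in range(n_sub)]
    sigma = [math.sqrt(sum((x - mu[i]) ** 2 for x in ts[i:i + m]) / m)
             for i in range(n_sub)]
    # First line of dot products; by symmetry it also serves as the first column.
    first = [sum(ts[k] * ts[j + k] for k in range(m)) for j in range(n_sub)]
    qt = first[:]
    P = [math.inf] * n_sub
    I = [-1] * n_sub
    for i in range(n_sub):
        if i > 0:
            # Go right to left so qt[j - 1] still holds the value from line i - 1.
            for j in range(n_sub - 1, 0, -1):
                qt[j] = qt[j - 1] - ts[i - 1] * ts[j - 1] + ts[i + m - 1] * ts[j + m - 1]
            qt[0] = first[i]
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            corr = (qt[j] - m * mu[i] * mu[j]) / (m * sigma[i] * sigma[j])
            d = math.sqrt(max(0.0, 2 * m * (1 - corr)))
            if d < P[j]:  # update if smaller
                P[j], I[j] = d, i
    return P, I

ts = [0, 1, 3, 1, 0, 2, 2, 5, 0, 1, 3, 1, 0, 7]  # hypothetical toy data
P, I = stomp(ts, m=4)
```

Only one line of dot products is kept at a time, which is where the O(n) space bound comes from; the O(n²) time comes from doing O(1) work per cell of the distance matrix.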

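Porting the algorithm to GPU
First Kernel Launch: Update $P_i$ to $P_{n-m+1}$. From $QT_{i-1,i-1}, QT_{i-1,i}, \ldots, QT_{i-1,n-m}$ obtain $QT_{i,i}, QT_{i,i+1}, \ldots, QT_{i,n-m+1}$, convert them to $d_{i,i}, d_{i,i+1}, \ldots, d_{i,n-m+1}$, and update $P_i, P_{i+1}, \ldots, P_{n-m+1}$ if smaller.
Second Kernel Launch: Evaluate Final Value of $P_i$. Take $d_{min} = \min(d_{i,i+1}, d_{i,i+2}, \ldots, d_{i,n-m+1})$ and update $P_i$ if smaller.
[Diagram: the full-line update ($QT_{i-1,1}, \ldots, QT_{i-1,n-m+1}$ to $QT_{i,1}, \ldots, QT_{i,n-m+1}$ to $d_{i,1}, \ldots, d_{i,n-m+1}$, then update $P_1, \ldots, P_{n-m+1}$ if smaller) is optimized into the version above, which covers only columns i to n-m+1.]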

Comparison of STAMP, STOMP and GPU-STOMP
For a fixed subsequence length m = 256, time by time series length n:

Algorithm     n = 2^17    n = 2^18      n = 2^19      n = 2^20
STAMP         15.1 min    1.17 hours    5.4 hours     24.4 hours
STOMP         4.21 min    0.3 hours     1.26 hours    5.22 hours
GPU-STOMP     10 sec      18 sec        46 sec        2.5 min

For large data, and for the very first time in the literature, n = 100,000,000:

Algorithm            m = 2000, n = 17,279,800    m = 400, n = 100,000,000
STAMP (estimated)    36.5 weeks                  25.5 years
STOMP (estimated)    8.4 weeks                   5.4 years
GPU-STOMP            9.27 hours                  12.13 days
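The estimated rows can be sanity-checked with simple arithmetic, assuming the running time grows purely as n² and does not depend on m. Both are assumptions of this sketch, and the slides do not say how the estimates were actually derived.

```python
def extrapolate_hours(t_hours, n_from, n_to):
    """Scale a measured running time by (n_to / n_from) squared (assumed O(n^2) behavior)."""
    return t_hours * (n_to / n_from) ** 2

# STOMP measured at n = 2^20: 5.22 hours. Extrapolate to n = 100,000,000.
stomp_hours = extrapolate_hours(5.22, 2 ** 20, 100_000_000)
stomp_years = stomp_hours / 24 / 365

# Doubling n should roughly quadruple the time: compare with the 2^19 to 2^20 step.
predicted_2_20 = extrapolate_hours(1.26, 2 ** 19, 2 ** 20)  # measured value: 5.22 hours
```

Under these assumptions the extrapolation comes out near the 5.4 years quoted for STOMP, and the doubling step predicts about 5 hours against the measured 5.22 hours.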

Comparing the speed of STOMP with existing algorithms
For a time series of length 2^18, CPU time (memory usage) for subsequence lengths m = 512, 1,024, 2,048, 4,096:

STOMP: 501s (14MB), 506s (14MB), 490s (14MB)
Quick-Motif: 27s (65MB), 151s (90MB), 630s (295MB), 695s (101MB)
MK: 2040s (1.1GB), N/A (>2GB)

Note: the time and space cost of STOMP is independent of what the data looks like.

Case Study I: Parameter Setting
There is only one parameter to set: the subsequence length m. However, the result is not sensitive to it.
[Figure: raw seismograph data (0 min to 30 min) and its matrix profiles for m = 4000, m = 2000 and m = 1000.]

Case Study II: The Benefit of Using Matrix Profile for Motif Discovery
The 1st motif is a pair of sensor defects (1996 and 2009).
The 5th motif is a pair of matching seismology patterns (1996, ID:30104990 and 2009, ID:371327705).

Case Study III: Earthquake Swarms
[Figure: the matrix profile of a seven-minute snippet from a seismograph recording at Mount St Helens.]
“... so regularly that we dubbed them ‘drumbeats’. The period between successive drumbeats shifted slowly with time, but was 30–300 seconds” *
We are not only providing a linear-space algorithm that is much faster than all existing motif-discovery algorithms; with STOMP we are actually providing much more information than just the top k motifs in the time series.

Case Study IV: Penguin Telemetry
7.5 hours of recording at 40 Hz took GPU-STOMP only 2.5 minutes to run.
[Figure: Y-axis magnetometry, shown for samples 514,000 to 524,000.]

Summary
We introduced STOMP and GPU-STOMP, the first algorithms capable of discovering motifs in a time series of length 100,000,000, the longest in the literature.
The algorithm needs only linear space, and its speed is independent of what the data looks like.
The Matrix Profile provides the nearest-neighbor information for all subsequences in the time series, so STOMP can discover much more than just motifs.
Paper, code and datasets available at: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html

Questions?