At Last! Time Series Joins, Motifs, Discords and Shapelets at Interactive Speeds  Eamonn Keogh With Yan Zhu, Chin-Chia Michael Yeh, Abdullah Mueen with.

Slides:



Advertisements
Similar presentations
1 Normal Probability Distributions. 2 Review relative frequency histogram 1/10 2/10 4/10 2/10 1/10 Values of a variable, say test scores In.
Advertisements

Mining Time Series.
Algebra Problems… Solutions Algebra Problems… Solutions © 2007 Herbert I. Gross Set 4 By Herb I. Gross and Richard A. Medeiros next.
Section 2.3 Gauss-Jordan Method for General Systems of Equations
Themis Palpanas1 VLDB - Aug 2004 Fair Use Agreement This agreement covers the use of all slides on this CD-Rom, please read carefully. You may freely use.
Jessica Lin, Eamonn Keogh, Stefano Loardi
Time Series Bitmap Experiments This file contains full color, large scale versions of the experiments shown in the paper, and additional experiments which.
Data Mining.
CHAPTER 6 Statistical Analysis of Experimental Data
Applying the Standard Normal Distribution to Problems
05/06/2005CSIS © M. Gibbons On Evaluating Open Biometric Identification Systems Spring 2005 Michael Gibbons School of Computer Science & Information Systems.
Copyright © Cengage Learning. All rights reserved.
An Unbiased Distance-based Outlier Detection Approach for High Dimensional Data DASFAA 2011 By Hoang Vu Nguyen, Vivekanand Gopalkrishnan and Ira Assent.
Time Series Anomaly Detection Experiments This file contains full color, large scale versions of the experiments shown in the paper, and additional experiments.
Copyright © Cengage Learning. All rights reserved. 1 Functions and Limits.
Rounding Off Whole Numbers © Math As A Second Language All Rights Reserved next #5 Taking the Fear out of Math.
Time Series Data Analysis - I Yaji Sripada. Dept. of Computing Science, University of Aberdeen2 In this lecture you learn What are Time Series? How to.
Sequencing a genome and Basic Sequence Alignment
Mining Time Series.
80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008.
If I were to count to a million, how long would it take? By Vincent Sapone.
1 CS 260 Winter 2014 Eamonn Keogh’s Presentation of Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu,
Abdullah Mueen Eamonn Keogh University of California, Riverside.
Semi-Supervised Time Series Classification & DTW-D REPORTED BY WANG YAWEN.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy Nurjahan BegumLiudmila Ulanova Jun Wang 1 Eamonn Keogh University.
NSF Career Award IIS University of California Riverside Eamonn Keogh Efficient Discovery of Previously Unknown Patterns and Relationships.
Finding Motifs Vasileios Hatzivassiloglou University of Texas at Dallas.
Chapter 14 Section 3 - Slide 1 Copyright © 2009 Pearson Education, Inc. AND.
Anomaly Detection Carolina Ruiz Department of Computer Science WPI Slides based on Chapter 10 of “Introduction to Data Mining” textbook by Tan, Steinbach,
INTEGRALS 5. INTEGRALS In Chapter 3, we used the tangent and velocity problems to introduce the derivative—the central idea in differential calculus.
BASIC GRAPHS Physics with Technology. Scatter Plot  When doing a lab, we often graph a series of points. This is called a scatter plot. Scatter plot.
Feature learning for multivariate time series classification Mustafa Gokce Baydogan * George Runger * Eugene Tuv † * Arizona State University † Intel Corporation.
5.3 Trigonometric Graphs.
Calculus, Section 1.4.
Matrix Profile Examples
Math For People Who Don’t Like Math
Confidence Interval Estimation
Debugging Intermittent Issues
Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets Chin-Chia Michael Yeh, Yan.
Matrix Profile II: Exploiting a Novel Algorithm and GPUs to break the one Hundred Million Barrier for Time Series Motifs and Joins Yan Zhu, Zachary Zimmerman,
The Sine Rule The Cosine Rule
How to use… [matrixProfile, profileIndex, motifIndex, discordIndex] = interactiveMatrixProfile(data, subLen); Input data: input time series subLen: subsequence.
Supervised Time Series Pattern Discovery through Local Importance
Debugging Intermittent Issues
E-Commerce Theories & Practices
A Time Series Representation Framework Based on Learned Patterns
Copyright 2012, 2008, 2004, 2000 Pearson Education, Inc.
Linear graphs.
Time Series Chains: A New Primitive for Time Series Data Mining
Legislative Influence Detector
Baselining PMU Data to Find Patterns and Anomalies
How to Start This PowerPoint® Tutorial
CIS 4930/6930, Spring 2018 Experiment 1: Encounter Tracing using Bluetooth Due Date: Feb 19, beginning of class Ph.D. student lead: Mimonah Al-Qathrady.
We understand classification algorithms in terms of the expressiveness or representational power of their decision boundaries. However, just because your.
Copyright © Cengage Learning. All rights reserved.
Conjoint Analysis.
Introduction Previous lessons have demonstrated that the normal distribution provides a useful model for many situations in business and industry, as.
MIS2502: Data Analytics Clustering and Segmentation
How to Start This PowerPoint® Tutorial
TECHNIQUES OF INTEGRATION
Active Listening Day #1 Intro to Leadership CS 302 Lesson
Image Segmentation.
The Normal Curve Section 7.1 & 7.2.
Applying principles of computer science in a biological context
Solving Problems in Groups
The Pigeonhole Principle
Simple Case Studies Using MASS
1-10: Learning Goals Download for free at openupresources.org.
Intro to Inference & The Central Limit Theorem
Presentation transcript:

At Last! Time Series Joins, Motifs, Discords and Shapelets at Interactive Speeds  Eamonn Keogh With Yan Zhu, Chin-Chia Michael Yeh, Abdullah Mueen with contributions from Zachary Zimmerman, Nader Shakibay Senobari,, Gareth Funning, Philip Brisk, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau and Diego Silva

Outline In this talk I will introduce the Matrix Profile. I believe that the Matrix Profile will become the most cited and the most used time series data mining primitive introduced in the last decade. The Matrix Profile has implications for all shape-based time series data mining tasks, including: Classification, Clustering, Motif Discovery, Anomaly Detection, Joins, Density Estimation, Visualization, Semantic Segmentation and Rule Discovery. Among other things, the Matrix Profile allows time series batch operations to become truly interactive for the first time (Hench this talk) First, some boilerplate slides on time series…

The Ubiquity of Time Series Sensors on machines Stock prices Hand writing Humans measure stuff, and stuff keeps changing, thus we have time series everywhere. 50 100 150 200 250 300 350 400 450 0.5 1 In addition to the domain of biometric, there are a lot other domains that have time series. For example, sensors on the machines, the stock price, web clicks, etc. In the next slide, I want 200 400 600 800 1000 1200 Shapes Astronomy: star light curves Web clicks Political Forecasts Sound

What do we want to do with all this Time Series? The answer is… Everything! Classification, Clustering, Motif Discovery, Anomaly Detection, Joins, Density Estimation, Visualization, Semantic Segmentation and Rule Discovery. What is the umpire signaling? In the last decade the community has come to the conclusion that if you can just measure similarity meaningfully for your domain, you can solve all these problems (possibly too slowly to be practical) Therefore, computing similarity is typically the bottleneck for time series data mining. 100 200 300 400 500 600 700 Normal sequence Actor misses holster Briefly swings gun at target, but does not aim Laughing and flailing hand In addition to the domain of biometric, there are a lot other domains that have time series. For example, sensors on the machines, the stock price, web clicks, etc. In the next slide, I want How should we group these signals? PPG How is this man doing? (not well!)

Introduction to the Matrix Profile With the context explained, let us take a first look at the Matrix Profile We will begin by defining it (without discussing how we compute it) We will then show how it solves most time series problems In addition to the domain of biometric, there are a lot other domains that have time series. For example, sensors on the machines, the stock price, web clicks, etc. In the next slide, I want Finally, we will address the elephant in the room… ...the matrix profile seems to be much too expensive to compute to practical.

Intuition behind the Matrix Profile: Assume we have a time series T, lets start with a synthetic one... 500 1000 1500 2000 2500 3000 |T | = n = 3,000

Note that for most time series data mining tasks, we are not interested in any global properties of the time series, we are only interested in small local subsequences, of this length, m These subsequences might be about the length of individual heartbeats (for ECGs), individual days (for social media behavior), individual words (for speech analysis) etc m = 100 500 1000 1500 2000 2500 3000

I have created a companion “time series”, called a matrix profile (or just profile). The matrix profile at the ith location records the distance of the subsequence in T, at the ith location, to its nearest neighbor. For example, in the below, the subsequence starting at 921 happens to have a distance of 177.0 to its nearest neighbor (wherever it is). 200 177 500 1000 1500 2000 2500 3000 921

Another example. In the below, the subsequence starting at 378 happens to have a distance of 34.2 to its nearest neighbor (wherever it is). 200 34.1 500 1000 1500 2000 2500 3000 378

I have created another companion sequence, called a matrix profile index. In the following slides I won’t bother to show the matrix profile index, but be aware it exists, and it allows us to find the nearest neighbor to any subsequence in constant time. 200 34.1 500 1000 1500 2000 2500 3000 1373 1375 1389 … .. 368 378 234 matrix profile index (zoom in )

You may have realized that computing the matrix profile is very expensive! If a single Euclidian distance calculation takes 0.0001 seconds, then computing the matrix profile for tiny dataset below takes 7.5 minutes! We will come back to this issue later. ((3000 * 2999) / 2) * 0.0001 seconds = 7.49 minutes 200 34.1 500 1000 1500 2000 2500 3000

Overarching Claim  Given the Matrix Profile, then virtually every time series data mining task is either trivial or easy. In next few slides I will show examples for… Motif Discovery Anomaly Detection (Discord Discovery) Joins (Both self joins, and AB-Joins) ..but the same is true for Classification, Clustering, Semantic Segmentation, Visualization, Density Estimation and Rule Discovery.

The matrix profile has some interesting properties... First, the pair of lowest values (it must be a tying pair) are the time series motif. Other definitions of motif can be found quickly using the matrix profile (discussion omitted) 200 34.1 500 1000 1500 2000 2500 3000 I will show some other, more exciting examples of motifs later…

The matrix profile has some interesting properties... Second, the highest values corresponds to the time series discord (an anomaly) To see this, let us consider another dataset. Below is a slightly noisy sine wave. I have added an anomaly by taking the absolute value in the region between 1,000 and 1,200. What would the matrix profile look like for this time series? (next slide). 500 1000 1500 2000 2500 3000 Vipin Kumar performed an extensive empirical evaluation and noted that “..on 19 different publicly available data sets, comparing 9 different techniques (time series discords) is the best overall technique.”. V. Chandola, D. Cheboli, V. Kumar. Detecting Anomalies in a Time Series Database. UMN TR09-004

The matrix profile has some interesting properties... Second, the highest values corresponds to the time series discord (an anomaly). The matrix profile strongly encodes (“peaks at”) the anomaly. 500 1000 1500 2000 2500 3000 Vipin Kumar performed an extensive empirical evaluation and noted that “..on 19 different publicly available data sets, comparing 9 different techniques (time series discords) is the best overall technique.”. V. Chandola, D. Cheboli, V. Kumar. Detecting Anomalies in a Time Series Database. UMN TR09-004

Before Moving On I want to show you that the nice intuitive properties of the matrix profile are not limited to clean synthetic data. Let quickly us see examples in real data…. one example of discords (ECG data) one example motifs (Industrial data)

Matrix Profiles as Anomaly Detectors: 1 of 2 An anomaly, a premature ventricular contraction ECG qtdb/sel102 (excerpt) 500 1000 1500 2000 2500 3000 Let us use a matrix profile to see if we can spot this anomaly (next slide)

Matrix Profiles as Anomaly Detectors: 2 of 2 The alignment of the peak of the matrix profile and the ground truth is sharp and perfect! ECG qtdb/sel102 (excerpt) 18 16 14 12 10 8 6 matrix profile 4 2 500 1000 1500 2000 2500 3000

Motif Discovery: Industrial Data: 1 0.8 0.6 0.4 0.2 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 x 10 4 This is real industrial data I have worked on. However, I have changed some details to comply with an NDA. The data is about six months long, and is annotated (not shown) by the quality of the yield produced.

This is the original time series We ran the data through a tool that computes the matrix profile, then extracts the top three motifs sets, and the top three discords. This is the original time series Here is the matrix profile This is the top motif This is the second motif This is the third motif There are the three most unusual patterns

Note that there appear to be three regimes discovered An 8-degree ascending slope A 4-degree ascending slope A 0-degree constant slope (everything above this line is true, below this line is speculation or obfuscated for privacy) We can now ask are the regimes associated with yield quality, by looking up the yield numbers on the days in question. We find.. A = {bad, bad, fair, bad, fair, bad, bad} B = {bad, good, fair, bad, fair, good, fair} C = {good, good, good, good, good, good, good} So yes! This patterns appear to be precursors to the quality of yield (we have not fully teased out causality here). So now we can monitor for patterns “B” and “A” and sound an alarm if we see them, take action, and improve quality. 8 degrees 4 degrees 0 degrees

In passing, how long does this take? If done in a brute-force manner, doing this would take 144 days. Say each Euclidean distance comparison takes 0.0001 seconds. (500000 * ((500000 - 1) / 2) * 0.0001) * seconds =144.67 days 8 degrees 4 degrees 0 degrees

Generalizing to Joins A Matrix Profile can be seen as a self-join It is trivial to generalize it to an AB-join For every subsequence in A, find its closest subsequence in B Note that this is not symmetric in general Surprisingly, there is almost no work on time series joins. Let us see some trivial examples, then discuss useful applications

Can you see any common structure between the two time series below? Hint, it is probably about this length 10,000 20,000

The data is the 2nd MFCC of two songs, Under Pressure and Ice Ice Baby Queen-Bowie Vanilla Ice 10,000 20,000 A zoom-in of the best conserved region between the two time series (similarity join) 250 500 -3 -2 -1 1 2

What is different between the two time series? In the previous example I asked you to find “common structure between the two time series” Now I am going to ask you the opposite question. What is different between the two time series? Hint, it is probably about this length 100 200 300 400 500 600 700 800 UK US

Here the difference is due to a unique phrase that only appears in the USA version of the Harry Potter books. Closest Match ED = 2.8 since his first year at Hogwarts and owned a Fire.. since his first year at Hogwarts and owned on.. Furthest Match ED = 10.7 …indor house Quidditch team ever since his first ye… Harry had been on the Gryffindor House Quidditch te.. (1.6 seconds) 100 UK version : Harry was passionate about Quidditch. He had played as Seeker on the Gryffindor house Quidditch team ever since his first year at Hogwarts and owned a Firebolt, one of the best racing brooms in the world... USA version : Harry had been on the Gryffindor House Quidditch team ever since his first year at Hogwarts and owned one of the best racing brooms in the world, a Firebolt.

It is possible to convert DNA to time series. L.pneumophila Paris L. pneumophila Lens It is possible to convert DNA to time series. Here we converted two of the 180 known strains of Legionella, L. pneumophila Paris and L. pneumophila Lens, which consist of 3,503,504 and 3,345,567 bp respectively. On a hunch, lets flip one of them left to right, then join them… (next slide) 1,000,000 2,000,000 3,000,000

Here the dimensionality was 100,000!! Real-valued similarity joins normally scale very poorly in dimensionality. A dimensionality of 40 is much harder than a dimensionality of 20. Here the dimensionality was 100,000!! Moreover, they scale poorly on dataset size, here the data sizes are of 3,503,504 and 3,345,567. How was this possible? Lens: 1591412 to 1691411 bp Paris :1769196 to 1869195 bp (plotted in reverse) 100,000 200,000 1,000,000 2,000,000 3,000,000

The Utility of Joins I believe that time series joins are the killer app Given two insects.. or two patients, or two processing runs, or two space shuttle launches, or two golf swings, or two ad campaigns, or two medical interventions… What is conserved, what is different? 100 200 300 400 500 10,000 20,000 30,000 Approximately 14.4 minutes of insect telemetry

Computing the Matrix Profile Computing the Matrix Profile with a brute force algorithm takes O(n2m) We have an algorithm, STOMP, that takes O(n2). Because (recall the DNA example) m can be 100,000, this is a significant speed-up. But wait! There’s more! We can cast our algorithm in an anytime framework, making it even faster by a factor of about 100. Once the Matrix profile is computed, we can maintain it at 20Hz-plus forever (as an implication, this means we have invented the first exact online motif discovery algorithm, the first exact online discord discovery algorithm) It can trivially exploit hardware, such as GPUs, cloud computing etc. Lets put all this into perspective (next few slides)

Remember this example? We said it would take 144 days, if done in a brute-force manner. We did this in 4 seconds (cheap desktop machine). We can do 99% of the datasets people care about interactively. As the time series has about 500,000 datapoints, to produce the matrix profile we have to compute one hundred twenty-four billion, nine hundred ninety-nine million, seven hundred fifty thousand pairwise calculations.

Remember this example? We said it would take 144 days, if done in a brute-force manner, we did this in 4 seconds (cheap desktop). We can do 99% of the datasets people care about interactively. As the time series has about 500,000 datapoints, to produce the matrix profile we have to compute one hundred twenty-four billion, nine hundred ninety-nine million, seven hundred fifty thousand pairwise calculations. That sounds like a lot, but we have recently done four hundred ninety-nine quadrillion, nine hundred ninety-nine trillion, nine hundred ninety-nine billion, five hundred million pairwise comparisons. This is surely the largest exact join ever attempted.

Conclusions (to be followed by one more example) We have introduced a new data structure for time series, the Matrix Profile. We have heard the overarching claim: Once you have the Matrix Profile, all time series data mining tasks become trivial. We have seen examples in Motif Discovery, Anomaly Detection and Joins, and you will take my word for Classification, Clustering,, Density Estimation, Visualization, Semantic Segmentation and Rule Discovery (or ask the see the cool examples) We heard (in passing) about STAMP, STOMP and STAMPi a family of algorithms for computing the Matrix Profile very quickly. Let us conclude with one final example, that highlights typical interactions with data….

Penguin Telemetry Case Study This is a very common situation. We are given a data “dump”, with almost no context. How can we make sense of this data? We can interactively explore it with our Matrix Profile tool…. 1 2 3 4 5

Penguin Telemetry With just five minutes of “playing” with the data, I found a stunning regularity. A few seconds before each dive, the penguin performs this “shark-fin” like behavior. Thus, I have found a precursor rule… 1000 2000 Now I am done

Questions?