Published by Jeremy Ellis. Modified over 6 years ago.
1
Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets
Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh
2
Motivation Given a time series, T and a desired subsequence length, m
3
Motivation Given a time series, T and a desired subsequence length, m
We can use a sliding window of length m to extract all |T|-m+1 subsequences of length m.
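The sliding-window extraction can be sketched in a few lines of Python. This is a minimal illustration: the function name `subsequences` and the toy series are assumptions made for this example and do not come from the slides.

```python
def subsequences(T, m):
    """Return all len(T) - m + 1 contiguous subsequences of length m."""
    if not 1 <= m <= len(T):
        raise ValueError("m must satisfy 1 <= m <= len(T)")
    # Slide a window of length m one step at a time over T.
    return [T[i:i + m] for i in range(len(T) - m + 1)]


# Toy data (hypothetical): |T| = 6 and m = 3 give 6 - 3 + 1 = 4 subsequences.
T = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
subs = subsequences(T, m=3)
```

Note that indices in this sketch are 0-based, whereas the slides count subsequences from 1.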
4
Motivation Given a time series, T and a desired subsequence length, m
We can then compute the pairwise distances among these subsequences and store them in a (|T|-m+1) by (|T|-m+1) matrix.
[Figure: distance matrix with example entries 7.6952, 7.7399, …, 7.7106]
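A brute-force version of this step might look like the sketch below. The slides do not say which distance is used; z-normalized Euclidean distance is assumed here because it is a common choice for subsequence comparison, and the helper names `znorm` and `distance_matrix` are hypothetical.

```python
import math


def znorm(x):
    """Z-normalize a subsequence (assumed distance preprocessing)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0.0:
        # Constant subsequence: map to zeros (a convention chosen for this sketch).
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def distance_matrix(T, m):
    """Full pairwise distance matrix: O(k^2) space for k = len(T) - m + 1."""
    subs = [znorm(T[i:i + m]) for i in range(len(T) - m + 1)]
    k = len(subs)
    D = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            # The matrix is symmetric, so each pair is computed once.
            D[i][j] = D[j][i] = math.dist(subs[i], subs[j])
    return D


# Toy series (hypothetical): subsequences 0 and 2 are both [0, 1, 0].
T = [0, 1, 0, 1, 0, 2, 7, 1]
D = distance_matrix(T, 3)
```

The quadratic memory use of `D` is exactly the problem the following slides address.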
5
Motivation Given a time series, T and a desired subsequence length, m
We can visualize the matrix by plotting it as an image in which blue marks more similar subsequences and red marks more dissimilar subsequences.
6
Motivation Given a time series, T and a desired subsequence length, m
Different information about the time series can be extracted from the matrix. For example, the pair with the smallest distance is the motif, and a checkerboard kernel can be used to segment the time series [a].
[a] J. Foote, Automatic Audio Segmentation Using a Measure of Audio Novelty
7
Motivation Given a time series, T and a desired subsequence length, m
However, the space complexity is O((|T|-m+1)^2), which is quadratic in |T|. The corresponding matrix for a time series of 1 million doubles requires 4,000 GB to store.
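The storage figure can be checked with a quick calculation. The sketch below assumes 8 bytes per matrix entry and a subsequence length of 100 (the slide does not give m; at this scale it barely changes the result). Under those assumptions the full matrix needs roughly 8,000 GB, and storing only one triangle of the symmetric matrix needs roughly 4,000 GB, which is consistent with the figure quoted on the slide. That the slide counts only one triangle is an inference, not something the slide states.

```python
n = 1_000_000            # length of the time series, |T|
m = 100                  # hypothetical subsequence length (not given on the slide)
k = n - m + 1            # number of subsequences
bytes_per_entry = 8      # one double-precision distance per matrix cell (assumption)

# Full k x k matrix.
full_gb = k * k * bytes_per_entry / 1e9
# Strict upper triangle only, exploiting symmetry of the distance matrix.
half_gb = (k * (k - 1) // 2) * bytes_per_entry / 1e9

print(f"full matrix:    {full_gb:,.0f} GB")
print(f"upper triangle: {half_gb:,.0f} GB")
```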
8
Problem statement: Given a time series, T and a desired subsequence length, m, most of the pairwise distance matrix contains information that is repeated or irrelevant for tasks in time series data mining. What kind of information should we extract from the matrix? How do we store it compactly?
9
What kind of information should be extracted from the matrix?
For each of the |T|-m+1 subsequences in the set of all subsequences, we want the corresponding nearest neighbor: which subsequence is each subsequence's nearest neighbor?
10
What kind of information should be extracted from the matrix?
For each of the |T|-m+1 subsequences in the set of all subsequences, we want the corresponding nearest neighbor: which subsequence is each subsequence's nearest neighbor, and how far away is it?
[Figure: example nearest-neighbor distances 2.88, 0.90, 1.61, 6.04, 5.69, 1.23, 1.40, …]
11
How to store the information? (1 of 2)
The distance from each subsequence to its nearest neighbor can be stored in a vector called the matrix profile, P, which is aligned with the time series, T.
12
How to store the information? (1 of 2)
The distance from each subsequence to its nearest neighbor can be stored in a vector called the matrix profile, P, which is aligned with the time series, T. The matrix profile value at location i is the distance between Ti (the subsequence of length m starting at i) and its nearest neighbor.
13
How to store the information? (1 of 2)
The distance from each subsequence to its nearest neighbor can be stored in a vector called the matrix profile, P, which is aligned with the time series, T. Local minima of the matrix profile correspond to motifs.
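As a small illustration of reading a motif off the matrix profile, the sketch below picks the global minimum and looks up its partner. The values of `P` and `I` are hypothetical toy data, `I` stands for the nearest-neighbor index vector (0-based here), and `top_motif` is a name invented for this example.

```python
def top_motif(P, I):
    """Return (i, j, distance) for the closest pair recorded in the profile."""
    i = min(range(len(P)), key=P.__getitem__)  # position of the global minimum
    return i, I[i], P[i]


# Hypothetical toy profile: subsequences 1 and 4 are mutual nearest neighbors.
P = [2.9, 0.9, 1.6, 6.0, 0.9, 1.6]
I = [2, 4, 5, 0, 1, 2]
motif = top_motif(P, I)
```

Further motifs could be found by looking at the remaining local minima, which this sketch does not attempt.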
14
How to store the information? (2 of 2)
The index of each subsequence's nearest neighbor is also stored, in a vector called the matrix profile index, I. The matrix profile index value at location i is the index of Ti's nearest neighbor.
[Figure: time series T, subsequence Ti of length m, and matrix profile index I with entries … 192 193 194 195 196]
15
How to store the information? (2 of 2)
The index of each subsequence's nearest neighbor is also stored, in a vector called the matrix profile index, I. In this example, Ti's nearest neighbor turns out to be T194.
[Figure: time series T with Ti and T194 highlighted, and matrix profile index I with entries … 192 193 194 195 196]
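To make the relationship between the full distance matrix and the two vectors concrete, here is a minimal sketch that reduces a matrix to a matrix profile and a matrix profile index by taking a row-wise minimum. The 4 by 4 matrix is hypothetical, indices are 0-based, and the `excl` parameter (how many neighboring positions to ignore around the diagonal) is an assumption of this sketch; with `excl=0` only the self-match is skipped.

```python
import math


def profile_from_matrix(D, excl=0):
    """Reduce a pairwise distance matrix to (matrix profile, matrix profile index)."""
    k = len(D)
    P = [math.inf] * k   # distance to the nearest neighbor
    I = [-1] * k         # position of the nearest neighbor
    for i in range(k):
        for j in range(k):
            if abs(i - j) <= excl:
                continue  # skip the self-match (and nearby positions if excl > 0)
            if D[i][j] < P[i]:
                P[i], I[i] = D[i][j], j
    return P, I


# Hypothetical symmetric distance matrix for four subsequences.
D = [
    [0.0, 2.0, 0.5, 4.0],
    [2.0, 0.0, 3.0, 1.0],
    [0.5, 3.0, 0.0, 6.0],
    [4.0, 1.0, 6.0, 0.0],
]
P, I = profile_from_matrix(D)
```

Only the two length-k vectors need to be kept afterwards, which is where the O(|T|) space claim comes from.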
16
How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m, the matrix profile is initialized as a vector of inf values. This is just a toy example, so the values and the vector length do not match the time series shown above.
17
How to compute matrix profile? (1 of 2)
At the first iteration, a subsequence Ti of length m is randomly selected from T. The matrix profile is still all inf.
18
How to compute matrix profile? (1 of 2)
We compute the distances between Ti and every subsequence of T (time complexity O(|T|log(|T|))). We then put the distances in a vector ordered by the position of the subsequences. For example, the distance between Ti and T1 (the first subsequence) is 3.
[Figure: matrix profile (all inf) and distance vector; values shown include 3, 2, 5, 4, 1, 9, 8, 6]
19
How to compute matrix profile? (1 of 2)
We compute the distances between Ti and every subsequence of T (time complexity O(|T|log(|T|))). We then put the distances in a vector ordered by the position of the subsequences. Let us say Ti happens to be the third subsequence; the third value in the distance vector is therefore 0.
[Figure: matrix profile (all inf) and distance vector; values shown include 3, 2, 5, 4, 1, 9, 8, 6]
20
How to compute matrix profile? (1 of 2)
The matrix profile is updated by applying an elementwise minimum to these two vectors: the current matrix profile (all inf) and the distance vector.
[Figure: distance vector; values shown include 3, 2, 5, 4, 1, 9, 8, 6]
21
How to compute matrix profile? (1 of 2)
The matrix profile is updated by applying an elementwise minimum to these two vectors. The first element of the matrix profile becomes 3; the rest are still inf.
[Figure: distance vector; values shown include 3, 2, 5, 4, 1, 9, 8, 6]
22
How to compute matrix profile? (1 of 2)
The matrix profile is updated by applying an elementwise minimum to these two vectors. The first two elements are now 3 and 2. Because Ti is the third subsequence and the distance between a subsequence and itself is unimportant, we skip the third element, which stays inf.
[Figure: distance vector; values shown include 3, 2, 5, 4, 1, 9, 8, 6]
23
How to compute matrix profile? (1 of 2)
After we finish updating the matrix profile for the first iteration, it holds the values of the distance vector everywhere except the third position, which remains inf.
[Figure: matrix profile 3 2 inf 5 4 1 9 8 6; distance vector 3 2 5 4 1 9 8 6]
24
How to compute matrix profile? (1 of 2)
In the second iteration, we randomly select another subsequence Tj, which happens to be the 12th subsequence.
[Figure: matrix profile 3 2 inf 5 4 1 9 8 6]
25
How to compute matrix profile? (1 of 2)
Once again, we compute the distances between Tj and every subsequence of T.
[Figure: matrix profile 3 2 inf 5 4 1 9 8 6; new distance vector 2 3 1 4 6 5 8 9]
26
How to compute matrix profile? (1 of 2)
We apply the same elementwise minimum to the matrix profile and the new distance vector.
[Figure: matrix profile 3 2 inf 5 4 1 9 8 6; new distance vector 2 3 1 4 6 5 8 9]
27
How to compute matrix profile? (1 of 2)
The same elementwise minimum is applied position by position; entries of the matrix profile are replaced wherever the new distance is smaller.
[Figure: matrix profile 2 inf 5 3 4 1 9 8 6; new distance vector 2 3 1 4 6 5 8 9]
29
How to compute matrix profile? (1 of 2)
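The same elementwise minimum continues until every position has been processed. The position that was inf after the first iteration now holds a finite value.
[Figure: matrix profile 2 1 5 3 4 9 8 6; new distance vector 2 3 1 4 6 5 8 9]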
30
How to compute matrix profile? (1 of 2)
We repeat the two steps (distance computation and update) until we have used every subsequence.
[Figure: matrix profile 2 1 5 3 4 9 8 6; distance vector 2 3 1 4 6 5 8 9]
31
How to compute matrix profile? (1 of 2)
There are |T|-m+1 subsequences (at most |T|) and each distance computation is O(|T|log(|T|)), so the overall time complexity is O(|T|^2 log(|T|)).
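The iteration walked through on the preceding slides (pick a random subsequence, compute its distances to all subsequences, merge with an elementwise minimum) can be sketched as follows. Several things here are assumptions rather than details from the slides: the distance is z-normalized Euclidean, the distance vector is computed naively in O(|T|m) rather than with the O(|T|log(|T|)) method the slides refer to, and an exclusion zone of m // 2 positions around the chosen subsequence is skipped instead of only the self-match. All function and parameter names are hypothetical.

```python
import math
import random


def znorm(x):
    """Z-normalize a subsequence (assumed distance preprocessing)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if sd == 0.0 else [(v - mu) / sd for v in x]


def distance_vector(T, m, i):
    """Distances from subsequence i to every subsequence (naive, not FFT-based)."""
    q = znorm(T[i:i + m])
    return [math.dist(q, znorm(T[j:j + m])) for j in range(len(T) - m + 1)]


def matrix_profile(T, m, excl=None, seed=0, max_iters=None):
    """Random-order computation; stopping early gives an approximate profile."""
    k = len(T) - m + 1
    excl = m // 2 if excl is None else excl
    P = [math.inf] * k          # matrix profile, initialized to inf
    I = [-1] * k                # matrix profile index
    order = list(range(k))
    random.Random(seed).shuffle(order)
    for i in order[:max_iters]:  # max_iters=None processes every subsequence
        D = distance_vector(T, m, i)
        for j, d in enumerate(D):
            if abs(i - j) <= excl:
                continue         # skip the self-match and its close neighbors
            if d < P[j]:         # elementwise minimum, remembering who won
                P[j], I[j] = d, i
    return P, I


# Hypothetical toy series: the pattern [0, 1, 3, 1, 0] occurs at positions 0 and 9.
T = [0, 1, 3, 1, 0, 7, 2, 9, 4, 0, 1, 3, 1, 0, 5, 8, 6]
P, I = matrix_profile(T, m=5)
P_partial, _ = matrix_profile(T, m=5, max_iters=3)
```

Running with a small `max_iters` illustrates the anytime behavior: every entry of the partial profile is an upper bound on the final value, and the bound tightens as more subsequences are processed.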
32
How to compute matrix profile? (2 of 2)
It is exact
It is simple and parameter-free
It is fast: O(n^2 log n), or O(n^2) if the anytime property is not needed
It is space efficient: O(n)
It allows anytime algorithms (the quality of the matrix profile improves with each iteration)
It can leverage hardware (embarrassingly parallelizable)
It is incrementally maintainable
33
Motif mining with matrix profile: repeated earthquakes
Seismologists are interested in finding repeated earthquakes in long sequences of seismometer readings, since the repeated earthquakes could be years apart.
[Figure: excerpt of a seismic time series from Mammoth Lakes, California]
34
Motif mining with matrix profile: repeated earthquakes
Seismologists are interested in finding repeated earthquakes in long sequences of seismometer readings, since the repeated earthquakes could be years apart.
35
Discord mining with matrix profile: abnormal heartbeat detection
Identifying abnormal heartbeats in ECG readings is important for diagnosing patients. The maximum value in the matrix profile indicates the discord.
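Locating the discord is then a matter of finding the largest finite value in the matrix profile. In the sketch below, the numbers are the example nearest-neighbor distances from the earlier slide, treated here as if they were a complete toy matrix profile (an assumption made only for illustration); `top_discord` is a hypothetical name and positions are 0-based.

```python
import math


def top_discord(P):
    """Return (position, distance) of the largest finite matrix profile value."""
    # Ignore entries that were never updated and are still inf.
    candidates = [i for i in range(len(P)) if math.isfinite(P[i])]
    i = max(candidates, key=P.__getitem__)
    return i, P[i]


# Example distances from the earlier slide, used as a toy profile.
P = [2.88, 0.90, 1.61, 6.04, 5.69, 1.23, 1.40]
idx, dist = top_discord(P)
```

Here the subsequence at position 3 is the one farthest from its nearest neighbor, so it would be reported as the discord.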
36
Discord mining with matrix profile: abnormal heartbeat detection
Identifying abnormal heartbeats in ECG readings is important for diagnosing patients.
[Figure: ECG with the discord labeled as a premature ventricular contraction]
37
Common subsequence mining with matrix profile: music sampling detection (1 of 2)
Sometimes a musician “borrows” a section of another piece of music to create new music. The borrowed section is a subsequence common to both pieces and can be discovered using the matrix profile. Given two time series Ta and Tb, the matrix profile can be computed by finding the nearest neighbor of each of Ta's subsequences in Tb, or vice versa.
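A two-series version of the computation can be sketched as below: for every subsequence of Ta, find its nearest neighbor among the subsequences of Tb. As before, z-normalized Euclidean distance and the brute-force search are assumptions of this sketch, and the names `ab_join`, `Ta`, and `Tb` as Python identifiers are hypothetical. No exclusion zone is needed because the two series are different.

```python
import math


def znorm(x):
    """Z-normalize a subsequence (assumed distance preprocessing)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if sd == 0.0 else [(v - mu) / sd for v in x]


def ab_join(Ta, Tb, m):
    """For each subsequence of Ta, the distance to and index of its nearest neighbor in Tb."""
    subs_b = [znorm(Tb[j:j + m]) for j in range(len(Tb) - m + 1)]
    P, I = [], []
    for i in range(len(Ta) - m + 1):
        q = znorm(Ta[i:i + m])
        dists = [math.dist(q, s) for s in subs_b]
        j = min(range(len(dists)), key=dists.__getitem__)
        P.append(dists[j])
        I.append(j)
    return P, I


# Hypothetical toy data: the section [1, 2, 6, 2] appears in both series.
Ta = [5, 1, 4, 1, 2, 6, 2, 9]
Tb = [3, 3, 8, 0, 1, 2, 6, 2, 7]
P, I = ab_join(Ta, Tb, m=4)
best = min(range(len(P)), key=P.__getitem__)  # start of the shared section in Ta
```

The minimum of this profile marks the most similar pair of subsequences across the two series, which is how a borrowed section would show up.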
38
Common subsequence mining with matrix profile: music sampling detection (2 of 2)
39
Common subsequence mining with matrix profile: music sampling detection (2 of 2)
Most similar 5-second subsequences
40
Common subsequence mining with matrix profile: music sampling detection (2 of 2)
Music 1: Under Pressure by Queen and David Bowie
Music 2: Ice Ice Baby by Vanilla Ice
[Figure: most similar 5-second subsequences from Under Pressure and Ice Ice Baby]
41
Segmentation with matrix profile: human activity (1 of 2)
The matrix profile and matrix profile index can also be used for segmentation. Given a time series, the neighboring information can be extracted by examining its matrix profile index; similar patterns will be neighbors. The neighboring information can be visualized with arcs.
[Figure: a time series of repeated walk patterns followed by repeated run patterns, with arcs connecting nearest neighbors]
42
Segmentation with matrix profile: human activity (1 of 2)
The matrix profile and matrix profile index can also be used for segmentation. Given a time series, the neighboring information can be extracted by examining its matrix profile index; similar patterns will be neighbors. The neighboring information can be visualized with arcs.
[Figure: a time series of repeated walk patterns followed by repeated run patterns; example locations crossed by 1 arc, 3 arcs, and 2 arcs]
43
Segmentation with matrix profile: human activity (1 of 2)
The matrix profile and matrix profile index can also be used for segmentation. Given a time series, the neighboring information can be extracted by examining its matrix profile index; similar patterns will be neighbors. The neighboring information can be visualized with arcs, and counting the arcs that cross each location gives the arc counting curve.
[Figure: a time series of repeated walk patterns followed by repeated run patterns, with its arc counting curve]
44
Segmentation with matrix profile: human activity (1 of 2)
The matrix profile and matrix profile index can also be used for segmentation. Given a time series, the neighboring information can be extracted by examining its matrix profile index; similar patterns will be neighbors. The neighboring information can be visualized with arcs. A local minimum of the arc counting curve is a good split point.
[Figure: a time series of repeated walk patterns followed by repeated run patterns, with its arc counting curve]
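The arc counting idea can be sketched directly from a matrix profile index. In the version below, each position i contributes one arc to its nearest neighbor I[i], and the curve counts how many arcs cross each boundary between adjacent positions. The exact counting convention (boundaries rather than points) and the toy index vector are assumptions of this sketch; the slides only show the curve.

```python
def arc_curve(I):
    """Number of nearest-neighbor arcs crossing the boundary between p and p + 1."""
    n = len(I)
    diff = [0] * (n + 1)
    for i, j in enumerate(I):
        lo, hi = min(i, j), max(i, j)
        diff[lo] += 1   # the arc starts covering boundaries at lo
        diff[hi] -= 1   # and stops covering them at hi
    curve, running = [], 0
    for p in range(n - 1):
        running += diff[p]
        curve.append(running)
    return curve


# Hypothetical index: positions 0-3 ("walk") point to each other, as do 4-7 ("run").
I = [2, 3, 0, 1, 6, 7, 4, 5]
curve = arc_curve(I)
split = min(range(len(curve)), key=curve.__getitem__)
```

In this toy case no arc crosses the boundary between positions 3 and 4, so the curve reaches its minimum there and that boundary is the suggested split point.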
45
Segmentation with matrix profile: human activity (2 of 2)
CMU Motion Capture database. Local minima are found with MATLAB's peak finding function.
[Figure: matrix profile, arc counting curve, and ground truth for one dimension of a multi-dimensional time series (subject 86, recording 4, dimension 30); x-axis ticks at 5,000 and 10,000; ground-truth segment labels W S P N W C T D P W]
46
Conclusion http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Given a time series, T and a desired subsequence length, m, the important (neighboring) information is extracted from the O(|T|^2) pairwise distance matrix and stored in the O(|T|) matrix profile and matrix profile index. Tasks such as motif mining, discord mining, or segmentation can be easily performed with the matrix profile and matrix profile index.
47
Questions?