1
Exact indexing of Dynamic Time Warping
Eamonn Keogh
Computer Science & Engineering Department, University of California - Riverside, Riverside, CA
Presented by: Ankit Hirdesh, Piyush Goswami
2
INTRODUCTION
Time Series
- A collection of observations made sequentially in time
- Occur in medical, business, and scientific domains
- Finding similarities between two time series is required in many time series data mining applications
3
CHALLENGES
How do we define similarity?
- Need a method that allows elastic shifting of the time axis to accommodate sequences that are similar but out of phase
Large amount of data
- How do we search quickly?
4
SOLUTIONS
Euclidean distance
- Sequences are aligned one to one
- Cannot find similarity between out-of-phase signals
Dynamic Time Warping
- Sequences can be aligned non-linearly
5
WHAT IS TIME WARPING
[Figure: sequences Q and C and the warping path between them]
6
DYNAMIC TIME WARPING
γ(i,j) = d(q_i, c_j) + min{ γ(i-1,j-1), γ(i-1,j), γ(i,j-1) }
Three Basic Constraints of Time Warping
- Boundary: the path must include the beginning and the end of both sequences
- Continuity: the path must not have any jumps
- Monotonicity: the path cannot go back in time
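The recurrence on this slide can be sketched as a small dynamic program. This is a minimal illustration, assuming a squared pointwise distance d(q_i, c_j) = (q_i - c_j)^2 and returning the cumulative cost without a final square root; the function name `dtw` and the table name `gamma` are illustrative, not taken from the slides.

```python
def dtw(q, c):
    """Cumulative DTW cost between sequences q and c (squared pointwise distance)."""
    n, m = len(q), len(c)
    inf = float("inf")
    # gamma[i][j] is the cheapest cumulative cost of aligning q[:i] with c[:j].
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0  # boundary constraint: every path starts at the first pair
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            # Continuity and monotonicity: only the three neighbouring cells
            # (diagonal, up, left) may precede cell (i, j).
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return gamma[n][m]  # boundary constraint: every path ends at the last pair
```

For example, `dtw([0, 0, 1, 2], [0, 1, 2, 2])` is 0 because the two sequences have the same shape shifted by one step, whereas their point-by-point squared Euclidean distance is 2.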
7
Global Constraints for Speedy Calculations
- Limit the warping path w_k = (i,j)_k to stay close to the diagonal, i.e. j-r ≤ i ≤ j+r, where r is the "reach"
- Speeds up the calculation: from O(n^2) to O(nr), which is O(n) for a fixed reach r
- Prevents pathological warpings
[Figure: warping window]
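A hedged sketch of the warping window: the same dynamic program, restricted to cells with |i - j| <= r. It assumes equal-length sequences and a squared pointwise distance; `dtw_band` is an illustrative name. For readability the full table is still allocated, but only the cells inside the band are visited.

```python
def dtw_band(q, c, r):
    """Cumulative DTW cost with the path restricted to |i - j| <= r."""
    n = len(q)
    if len(c) != n:
        raise ValueError("this sketch assumes sequences of equal length")
    inf = float("inf")
    gamma = [[inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        # Only columns within the reach r of the diagonal are filled in;
        # cells outside the band keep the value inf and cannot be used.
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return gamma[n][n]
```

With r = 0 the path is forced onto the diagonal and the result equals the squared Euclidean distance; with r = n - 1 the band covers the whole table and the result equals unconstrained DTW.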
8
Lower Bounding
- Both Euclidean distance and DTW are highly demanding in terms of CPU and I/O time
- A lower bounding function can speed up similarity search by pruning sequences that could not possibly be the best match
- The function must be fast to compute
- The function must be a tight bound, i.e. close to the true DTW distance
9
Existing Lower Bounding Techniques
Lower bounding measure by Kim et al.
- The maximum squared difference between the two sequences' first (A), last (D), minimum (B) and maximum (C) points is returned as the lower bound
Lower bounding measure by Yi et al.
- The sum of the squared lengths of the gray lines is returned as the lower bound
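The two existing bounds can be sketched as follows. The first function follows the slide's wording directly. The second assumes that the gray lines in the figure run from each point of C lying above max(Q) or below min(Q) to that extreme value, which is the usual reading of the bound by Yi et al.; the names `lb_kim` and `lb_yi` are illustrative.

```python
def lb_kim(q, c):
    """Largest squared difference among four features: first, last, min, max."""
    features = [
        (q[0], c[0]),      # first points (A)
        (q[-1], c[-1]),    # last points (D)
        (min(q), min(c)),  # minimum points (B)
        (max(q), max(c)),  # maximum points (C)
    ]
    return max((a - b) ** 2 for a, b in features)


def lb_yi(q, c):
    """Sum of squared overshoots of c beyond the value range of q (assumed reading)."""
    hi, lo = max(q), min(q)
    above = sum((x - hi) ** 2 for x in c if x > hi)
    below = sum((x - lo) ** 2 for x in c if x < lo)
    return above + below
```

Both functions run in a single pass over the data, which is what makes them usable as a cheap filter before the full DTW computation.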
10
Proposed Lower Bounding Method
Let us define two sequences:
U_i = max(q_{i-r} : q_{i+r})
L_i = min(q_{i-r} : q_{i+r})
where r is the reach, and U and L stand for Upper and Lower respectively.
Also: for all i, U_i ≥ q_i ≥ L_i
A: Bounding envelope for the Sakoe-Chiba band
B: Bounding envelope for the Itakura parallelogram
11
Proposed Lower Bounding Method – LB_KEOGH
- The query sequence Q is enclosed in the bounding envelope formed by U and L
- The sum of the squared distances from every part of the candidate sequence C that falls outside the bounding envelope to the nearest orthogonal edge of the envelope is returned as the lower bound
- A and B have the same meaning as on the previous slide
LB_Keogh(Q,C) ≤ DTW(Q,C)
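A minimal sketch of the envelope and the resulting bound, assuming a Sakoe-Chiba band with a constant reach r and the squared form of the bound (no final square root), so that it is comparable to a cumulative DTW cost built from squared pointwise distances. The names `envelope` and `lb_keogh` are illustrative.

```python
def envelope(q, r):
    """Upper and lower envelopes of q for a constant reach r."""
    n = len(q)
    # Slices are clipped at the ends of the sequence.
    upper = [max(q[max(0, i - r): i + r + 1]) for i in range(n)]
    lower = [min(q[max(0, i - r): i + r + 1]) for i in range(n)]
    return upper, lower


def lb_keogh(q, c, r):
    """Sum of squared distances from the parts of c outside q's envelope to the envelope."""
    upper, lower = envelope(q, r)
    total = 0.0
    for ci, u, l in zip(c, upper, lower):
        if ci > u:
            total += (ci - u) ** 2   # c is above the envelope
        elif ci < l:
            total += (ci - l) ** 2   # c is below the envelope
        # points inside the envelope contribute nothing
    return total
```

For q = [0, 1, 2, 1, 0] and r = 1 the envelope is U = [1, 2, 2, 2, 1] and L = [0, 0, 1, 0, 0], and the candidate [3, 1, 0, 1, -2] yields a bound of 4 + 1 + 4 = 9.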
12
The tightness of the lower bound for each technique is proportional to the length of gray lines
[Figure panels: LB_Kim, LB_Yi, LB_Keogh (Sakoe-Chiba), LB_Keogh (Itakura)]
13
How to index Dynamic Time Warping
Piecewise Aggregate Approximation (PAA)
- Represents a time series as a sequence of box basis functions
- Reduces dimensionality from n to N, since a time series may include a large number of items, which degrades indexing performance
- Data is divided into N equal-sized frames
- Extremely fast to calculate
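PAA can be sketched in a few lines. This version assumes that each frame is summarized by its mean value and that n is an exact multiple of N; the function name `paa` is illustrative.

```python
def paa(series, n_frames):
    """Reduce a series of length n to n_frames values, one mean per equal-sized frame."""
    n = len(series)
    if n_frames <= 0 or n % n_frames != 0:
        raise ValueError("this sketch assumes n is an exact multiple of n_frames")
    size = n // n_frames
    return [sum(series[k * size:(k + 1) * size]) / size for k in range(n_frames)]
```

For example, `paa([1, 3, 5, 7, 2, 2, 0, 4], 4)` gives `[2.0, 6.0, 2.0, 2.0]`: eight values are reduced to four in one linear pass.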
14
PAA continued
PAA of U and L, denoted by Û and L̂.
15
Indexing Dynamic Time Warping
- There are two time series (Q and C) of length n; both are reduced to N dimensions
- C is a candidate sequence; Q is a query sequence
- The candidate sequences are approximated in each dimension by a minimum bounding rectangle (MBR) R = (L, H), with L = {l1, l2, …, lN} and H = {h1, h2, …, hN}
- MINDIST(Q,R) is the distance between the query Q and the MBR R
[Figure: MBR R with bounds l1, l2, …, li and h1, h2, …, hi, and the distance MINDIST(Q,R)]
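The MINDIST formula did not survive in this transcript, so the following is only a sketch under stated assumptions: the reduced envelope takes the frame-wise maximum of U and minimum of L (rather than the mean, so that it still encloses the original envelope), and MINDIST sums, per dimension, the squared gap between the MBR and the reduced envelope, weighted by the frame length n/N and left unrooted. The names `reduce_envelope` and `mindist` are illustrative.

```python
def reduce_envelope(upper, lower, n_frames):
    """Reduce envelopes U and L to n_frames dimensions (assumed: frame-wise max/min)."""
    size = len(upper) // n_frames
    u_hat = [max(upper[k * size:(k + 1) * size]) for k in range(n_frames)]
    l_hat = [min(lower[k * size:(k + 1) * size]) for k in range(n_frames)]
    return u_hat, l_hat


def mindist(u_hat, l_hat, low, high, n):
    """Assumed squared MINDIST between a reduced query envelope and an MBR (low, high)."""
    frame = n / len(u_hat)  # number of original points per reduced dimension
    total = 0.0
    for u, l, lo, hi in zip(u_hat, l_hat, low, high):
        if lo > u:
            total += frame * (lo - u) ** 2   # MBR lies entirely above the envelope
        elif hi < l:
            total += frame * (hi - l) ** 2   # MBR lies entirely below the envelope
        # overlapping ranges contribute nothing
    return total
```

A dimension contributes only when the MBR and the envelope do not overlap there, so under these assumptions the value cannot exceed the bound computed for any single candidate stored inside the MBR.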
16
K-Nearest neighbor search algorithm
- Given: a query sequence Q and the desired number K of nearest time series neighbors from a set C
- A priority queue stores index entries in increasing order of distance from Q
- Push the root node of the index into the queue
- At each step, pop the item at the top of the queue
  - If the popped item is a PAA point C, compute the exact DTW(Q,C) and insert C into a temporary list 'temp'
  - If it is an index node, compute the distance of each of its children from Q and push them into the queue
- Move C from temp to the result only when we are sure that it is one of the K nearest neighbors of Q
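The search loop can be illustrated with a deliberately simplified variant. It is not the tree-based algorithm from the slide: there is no index, every candidate is pushed into the priority queue keyed by its lower bound, and the search stops once the next lower bound cannot beat the K-th best exact distance. All function names are illustrative, and distances are squared costs without a final square root.

```python
import heapq


def dtw_band(q, c, r):
    """Cumulative DTW cost with the path restricted to |i - j| <= r."""
    n = len(q)
    inf = float("inf")
    gamma = [[inf] * (n + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return gamma[n][n]


def lb_keogh(q, c, r):
    """Squared envelope-based lower bound on dtw_band(q, c, r)."""
    total = 0.0
    for i, ci in enumerate(c):
        window = q[max(0, i - r): i + r + 1]
        u, l = max(window), min(window)
        if ci > u:
            total += (ci - u) ** 2
        elif ci < l:
            total += (ci - l) ** 2
    return total


def knn_search(query, candidates, k, r):
    """Return the k nearest candidates as (distance, index) and the number of exact DTW calls."""
    # Queue entries are (lower bound, candidate index); smallest bound pops first.
    queue = [(lb_keogh(query, c, r), idx) for idx, c in enumerate(candidates)]
    heapq.heapify(queue)
    best = []      # max-heap of the k best so far, stored as (-distance, index)
    n_exact = 0
    while queue:
        bound, idx = heapq.heappop(queue)
        # No remaining candidate can be closer than the current k-th best.
        if len(best) == k and bound >= -best[0][0]:
            break
        dist = dtw_band(query, candidates[idx], r)
        n_exact += 1
        if len(best) < k:
            heapq.heappush(best, (-dist, idx))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, idx))
    return sorted((-d, idx) for d, idx in best), n_exact
```

In the small test case below, two of the four candidates are pruned by their lower bound alone, so only two exact DTW computations are needed to find the two nearest neighbors.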
17
Experimental Evaluation
- Claimed to be the most comprehensive and detailed set of time series indexing experiments conducted up to that time
- A Sakoe-Chiba band with 10% width was used
- 32 datasets from various sources were taken
- 50 sequences of length 256 were randomly extracted
- Tightness of the lower bounding functions was compared by taking one sequence at a time and comparing it with the 49 others
18
Experimental Evaluation Contd..
- Pruning power of the lower bounding functions was compared in the same way
- LB_Keogh was also evaluated against linear scan on the basis of normalized CPU cost
19
Conclusion
This paper provides a way to speed up DTW by indexing
- DTW allows similarity matching between sequences that are out of phase; Euclidean distance does not
- A new lower bounding function, LB_Keogh, was proposed; it is tighter than the previously seen bounds
- A method to index time series using the proposed lower bounding function was shown
20
References
- Eamonn J. Keogh: Exact Indexing of Dynamic Time Warping. VLDB 2002:
- Slides for the above paper by the same author (all colored pictures in the presentation are from the author's slides)
- Slides from following class web page:
21
QUESTIONS?