FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space. Stan Salvador and Philip Chan, Department of Computer Sciences, Florida Institute of Technology
Outline Dynamic Time Warping (DTW) Problem Statement Related Work for Speeding up DTW FastDTW Algorithm Evaluation of FastDTW Contributions Limitations and Future Work
Dynamic Time Warping (DTW) Aligns two time series by warping the time dimension Warping - expanding/contracting the time dimension
The Dynamic Time Warping Algorithm A dynamic programming approach Solutions to slightly smaller problems used to find larger solutions
The DTW Cost Matrix
Distance of Min-Cost Warp Path
Finding Min-Cost Warp Path
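The three slides above (cost matrix, distance, and warp path) can be illustrated with a minimal sketch of standard DTW. It assumes one-dimensional series and absolute difference as the point-to-point distance; the function name `dtw` and the tie-breaking order during backtracking are illustrative choices, not taken from the talk.

```python
def dtw(x, y):
    """Return (distance, path) of a min-cost warp path between x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimum cumulative cost of warping x[:i+1] onto y[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(x[i] - y[j])  # assumed point-to-point distance
            if i == 0 and j == 0:
                cost[i][j] = d
            else:
                # Smaller sub-problems: the three neighbouring cells.
                cost[i][j] = d + min(
                    cost[i - 1][j] if i > 0 else inf,
                    cost[i][j - 1] if j > 0 else inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
    # Backtrack from the end cell, always stepping to the cheapest neighbour.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` returns a distance of 0 because the repeated 2 in the second series is absorbed by warping.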
Advantages of DTW DTW is optimal An intuitive distance measurement Local variation in the time axis is common Handwriting Speech “Events” that start after varying delays
Disadvantages of DTW O(N²) time and space complexity Only practical for small data sets (<3,000) Time series are often very long Data mining requires a scalable DTW algorithm
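To make the quadratic cost concrete, here is a back-of-the-envelope sketch of the full cost matrix size. The 8 bytes per cell is an assumption (one double-precision value per cell), not a figure from the talk; the two lengths are the 3,000-point limit above and the longest series used in the timing experiments.

```python
BYTES_PER_CELL = 8  # assumption: one 64-bit float per cost-matrix cell


def cost_matrix_bytes(n):
    """Memory needed for a full n-by-n DTW cost matrix."""
    return n * n * BYTES_PER_CELL


small = cost_matrix_bytes(3_000)    # 72,000,000 bytes, about 72 MB
large = cost_matrix_bytes(180_000)  # 259,200,000,000 bytes, about 259 GB
```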
Problem Statement We desire an efficient Dynamic Time Warping algorithm Linear time complexity Linear space complexity Warp path is needed in addition to warp distance Warp path must be nearly optimal
Does DTW Need to be Faster? “Myth 3: There is a need (and room) for improvements in the speed of DTW for data mining applications.” (Keogh today-9:45am) Keogh: many time series FastDTW: Long time series
Existing Methods to Speed Up DTW Constraints – only fill in part of the cost matrix Abstraction – sample the data before time warping
Constraints Sakoe-Chiba Band (Sakoe & Chiba 1978) Itakura Parallelogram (Itakura 1975) Still O(N²) if the window width is a function of input size (linear if the width is constant) Assumes a near-optimal warp path stays near the i=j axis Accuracy depends on the domain
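A band constraint can be sketched as DTW that only fills cells within a fixed distance of the diagonal. This is a simplified illustration: the parameter `width` (half-width of the band in cells) and the function name are hypothetical, absolute difference is assumed as the point distance, and the sketch still allocates the full matrix for readability even though only the band is evaluated.

```python
def banded_dtw_distance(x, y, width):
    """DTW distance using only cells with |i - j| <= width."""
    n, m = len(x), len(y)
    if width < abs(n - m):
        raise ValueError("band too narrow to reach the end cell")
    inf = float("inf")
    # Row 0 and column 0 are sentinels; cost[i][j] covers x[:i] and y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only the cells inside the band are evaluated.
        for j in range(max(1, i - width), min(m, i + width) + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

With `x = [0, 0, 1, 2, 1, 0, 0]` and `y = [0, 1, 2, 1, 0, 0, 0]`, a band of width 1 is wide enough to absorb the one-step shift (distance 0), while width 0 forces the diagonal and gives distance 4. This is the sense in which accuracy depends on how far the true warp path strays from the diagonal.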
Abstraction (Keogh & Pazzani 2000), (Chu et al. 2002) O(N) if N pts are sampled down to ≤ √N pts Assumptions: sampling preserves time series structure; small deviations from the optimal path cause little increase in warp-path distance
Our FastDTW Algorithm A multi-resolution approach inspired by a multi-level graph bisection algorithm (Karypis 1997) 3 key operations: Coarsening – reduce the resolution of a time series; Projection – use a low-res warp path as an initial solution at a higher resolution; Refinement – refine a projected warp path by locally adjusting it
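Coarsening can be sketched as repeatedly halving a series until it is small enough for regular DTW. Averaging adjacent pairs is an assumed choice of reduction, and `reduce_by_half`, `resolutions`, and `min_size` are illustrative names; a trailing point of an odd-length series is simply dropped in this sketch.

```python
def reduce_by_half(series):
    """Halve the resolution by averaging adjacent pairs of points."""
    return [(series[i] + series[i + 1]) / 2 for i in range(0, len(series) - 1, 2)]


def resolutions(series, min_size):
    """Return the series at every resolution, finest first."""
    levels = [list(series)]
    while len(levels[-1]) > min_size:
        levels.append(reduce_by_half(levels[-1]))
    return levels


levels = resolutions([1, 3, 5, 7, 9, 11, 13, 15], min_size=2)
# levels holds 8, 4, and 2 points: the last one is the coarsest resolution.
```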
Sample Run of FastDTW
FastDTW Algorithm
  1. Set the resolution to the coarsest level
  2. Find the initial path using regular DTW
  3. Repeat until the original resolution is reached:
     a. Double the resolution
     b. Project the path onto the finer resolution
     c. Find a path through the projected area (plus a small radius around the projected area)
Complexity O(N) time O(N) space Details in the paper
Evaluation Criteria Accuracy: the error of an approximate time warping algorithm, % error = (approxDist − optimalDist) / optimalDist × 100, where approxDist is the warp path distance of the approximate algorithm and optimalDist is the warp path distance of the DTW algorithm. Efficiency: runtime (measured in seconds)
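The accuracy criterion can be sketched as a small helper, assuming the error is measured as the relative increase of the approximate warp path distance over the optimal DTW distance. The function name and the guard against a zero optimal distance are illustrative additions.

```python
def percent_error(approx_dist, optimal_dist):
    """Relative excess of an approximate warp distance, in percent."""
    if optimal_dist == 0:
        raise ValueError("percent error is undefined when optimalDist is 0")
    return (approx_dist - optimal_dist) / optimal_dist * 100.0


# An approximate distance 1.5 times the optimal one is a 50% error;
# errors above 100% mean the approximation more than doubled the distance.
example = percent_error(150.0, 100.0)
```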
Evaluation Procedure (Accuracy) Data Sets – UCR Time Series Data Mining Archive (Keogh & Folias 2002), 3 groups used: Random – 45 unrelated time series (earthquakes, random walk, EEG, speech, etc.) Trace – 200 time series simulating nuclear power plant failure (4 classes) Gun – 200 time series of a gun being drawn and pointed (2 classes) Procedure: run FastDTW, Constraints (Sakoe-Chiba Band), and Data Abstraction on all pairs within a data set group while varying the radius; record the average error of all three methods for each data group and radius
Average % Error (Accuracy)
  Radius        0         1         10       20       30
  FastDTW       19.2%     8.6%      1.5%     0.8%     0.6%
  Abstraction   983.3%    547.9%    6.5%     2.8%     1.8%
  Band          2749.2%   2385.7%   794.1%   136.8%   9.3%
Error in Different Data Sets
Evaluation Procedure (Execution-time) Data Sets Synthetic sine waves with Gaussian noise 10 to 180,000 data points Procedure Run FastDTW and DTW on each data set, vary the radius for FastDTW Compare the Execution times
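A synthetic input of the kind described above can be sketched with the standard library. The period, noise level, and seed are hypothetical parameters chosen for illustration; the talk does not state the values used.

```python
import math
import random


def noisy_sine(length, period=100.0, noise_std=0.1, seed=0):
    """Sine wave with additive Gaussian noise (all parameters are assumed)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [
        math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise_std)
        for t in range(length)
    ]


series = noisy_sine(1_000)
```

Timing runs would then generate pairs of such series at each length between 10 and 180,000 points and measure each algorithm on the same pair.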
Execution Time
Summary of Contributions FastDTW – an approximation of DTW O(N) time and space complexity Scales well to long time series Accurate, 8.6% error if radius=1, 0.8% error if radius=20
Limitations and Future Work FastDTW does not always find an optimal solution Future Work Examine using different step sizes between resolutions Investigate search algorithms to help improve refinement Examine # of cells evaluated vs. accuracy between the FastDTW, Abstraction, and Band algorithms.
Questions? Thanks to those who helped with this research: Matt Mahoney (Florida Institute of Technology), Brian Buckley, Walter Schiefele (Interface & Control Systems) This research is partially supported by NASA
FastDTW Pseudocode
Input: X, Y, radius
Output: 1) A minimum distance warp path between X and Y
        2) The warp path distance between X and Y

    // The min size of the coarsest resolution.
    Integer minTSsize = radius+2

    IF (|X| ≤ minTSsize OR |Y| ≤ minTSsize)
    {
        // Base Case: for a very small time series run the full DTW algorithm
        RETURN DTW(X, Y)
    }
    ELSE
    {
        // Recursive Case: Project the warp path from a coarser resolution onto the current resolution.
        // Run DTW only along the projected path (and also radius cells from the projected path).
        TimeSeries shrunkX = X.reduceByHalf()   // Coarsening
        TimeSeries shrunkY = Y.reduceByHalf()   // Coarsening

        WarpPath lowResPath = FastDTW(shrunkX, shrunkY, radius)

        SearchWindow window = ExpandedResWindow(lowResPath, X, Y, radius)   // Projection

        RETURN DTW(X, Y, window)   // Refinement
    }
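The pseudocode above can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: pair averaging for coarsening, absolute difference as the point distance, the way `expand_window` applies the radius at the coarse resolution, and the handling of odd lengths are all assumptions. Sorting the window also means the sketch does not strictly match the linear-time bound.

```python
def reduce_by_half(series):
    """Coarsening: average adjacent pairs (assumed reduction)."""
    return [(series[i] + series[i + 1]) / 2 for i in range(0, len(series) - 1, 2)]


def dtw(x, y, window=None):
    """DTW restricted to `window`, a set of (i, j) cells; full matrix if None."""
    n, m = len(x), len(y)
    if window is None:
        window = {(i, j) for i in range(n) for j in range(m)}
    inf = float("inf")
    # best[cell] = (cumulative cost, predecessor); (-1, -1) is a start sentinel.
    best = {(-1, -1): (0.0, None)}
    for i, j in sorted(window):  # row-major order: predecessors come first
        prev_cost, prev = min(
            (best.get(p, (inf, None))[0], p)
            for p in ((i - 1, j - 1), (i - 1, j), (i, j - 1))
        )
        if prev_cost < inf:  # skip cells that cannot be reached inside the window
            best[(i, j)] = (prev_cost + abs(x[i] - y[j]), prev)
    # Follow predecessors back from the end cell to recover the warp path.
    path, cell = [], (n - 1, m - 1)
    while cell != (-1, -1):
        path.append(cell)
        cell = best[cell][1]
    path.reverse()
    return best[(n - 1, m - 1)][0], path


def expand_window(low_path, n, m, radius):
    """Projection: map a low-res path to a search window at twice the resolution."""
    # Widen the low-res path by `radius` cells in every direction.
    cells = set()
    for i, j in low_path:
        for a in range(i - radius, i + radius + 1):
            for b in range(j - radius, j + radius + 1):
                cells.add((a, b))
    # Each low-res cell covers a 2x2 block at the finer resolution.
    window = set()
    for i, j in cells:
        for a in (2 * i, 2 * i + 1):
            for b in (2 * j, 2 * j + 1):
                window.add((a, b))
    # Odd lengths leave a trailing row/column uncovered; extend the last covered one.
    if n % 2:
        window |= {(n - 1, j) for i, j in window if i == n - 2}
    if m % 2:
        window |= {(i, m - 1) for i, j in window if j == m - 2}
    return {(i, j) for i, j in window if 0 <= i < n and 0 <= j < m}


def fastdtw(x, y, radius=1):
    """Return (distance, path) of an approximate min-cost warp path."""
    min_size = radius + 2
    if len(x) <= min_size or len(y) <= min_size:
        return dtw(x, y)  # base case: full DTW on a very small series
    _, low_path = fastdtw(reduce_by_half(x), reduce_by_half(y), radius)
    window = expand_window(low_path, len(x), len(y), radius)
    return dtw(x, y, window)  # refinement inside the projected window
```

Because the refinement only searches inside the window, the returned distance is never smaller than the full DTW distance; a larger `radius` widens the window and trades runtime for accuracy.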