Quick-Motif: An Efficient and Scalable Framework for Exact Motif Discovery
Yuhong Li (yb27407@umac.mo)
Department of Computer and Information Science, University of Macau, Macau
Quick-Motif: What Is a Motif?
A motif is the most similar subsequence pair in a time series.
Applications: a core subroutine for activity discovery, e.g., in elder care, surveillance, and sports training. Clustering enumerated motifs is more meaningful than clustering all the subsequences of a long time series.
Quick-Motif: Formal Definition
(Figure: a timeline of the time series $s$, spanning positions $0$ to $m-1$, and of its subsequence $s_i$, spanning positions $i$ to $i+\ell-1$.)
Exact motif discovery
Input: a time series $s$ of length $m$ and a target motif length $\ell$.
Output: the most similar subsequence pair in terms of normalized Euclidean distance.
Trivial matches are avoided: the two subsequences must not overlap, because adjacent subsequence pairs are naturally expected to be similar to each other.
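A minimal Python sketch of the distance in this definition, assuming "normalized" means z-normalization (zero mean, unit population standard deviation) of each subsequence before the Euclidean distance is taken. The helper names `znorm`, `znorm_dist`, and `is_trivial_match` are hypothetical, and mapping a constant subsequence to all zeros is an assumption of this sketch.

```python
import math


def znorm(x):
    """Z-normalize a sequence: subtract its mean, divide by its population std."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sigma == 0:  # assumption: a constant subsequence maps to all zeros
        return [0.0] * n
    return [(v - mu) / sigma for v in x]


def znorm_dist(s, i, j, ell):
    """Normalized Euclidean distance between the length-ell subsequences of s at i and j."""
    a = znorm(s[i:i + ell])
    b = znorm(s[j:j + ell])
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def is_trivial_match(i, j, ell):
    """Subsequences starting at i and j overlap (a trivial match) iff |i - j| < ell."""
    return abs(i - j) < ell
```

Because of the normalization, `znorm_dist([1, 2, 3, 4, 10, 20, 30, 40], 0, 4, 4)` is (up to rounding) zero: the second subsequence is a scaled and shifted copy of the first.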
Quick-Motif: Naïve Solution
Slide a window of size $\ell$ over the time series with step size 1 to obtain all subsequences of length $\ell$, normalize each of them, and test all subsequence pairs; the most similar subsequence pair is the motif. The time complexity is $O(m^2 \ell)$.
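The naïve solution can be sketched as follows, again assuming z-normalization; `naive_motif` is a hypothetical name, and non-overlap is enforced as $j \ge i + \ell$. The doubly nested loop over roughly $m$ subsequences, with an $O(\ell)$ distance per pair, is where the $O(m^2 \ell)$ cost comes from.

```python
import math


def znorm(x):
    """Z-normalize a sequence (population std); assumes x is not constant."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]


def naive_motif(s, ell):
    """Brute-force exact motif: returns (distance, i, j) in O(m^2 * ell) time."""
    n = len(s) - ell + 1  # sliding window of size ell, step size 1
    subs = [znorm(s[i:i + ell]) for i in range(n)]
    best = (math.inf, -1, -1)
    for i in range(n):
        # j >= i + ell rules out overlapping (trivial) matches
        for j in range(i + ell, n):
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(subs[i], subs[j])))
            if d < best[0]:
                best = (d, i, j)
    return best
```

For example, `naive_motif([0, 1, 2, 9, 3, 7, 0, 1, 2], 3)` finds the repeated pattern `[0, 1, 2]` at positions 0 and 6.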
Quick-Motif: Existing Solutions
Reference-based Index (MK) [Mueen & Keogh, SDM 2009]
- Good: prunes unpromising pairs in batches.
- Bad: each distance computation takes $O(\ell)$ time.
Smart Brute Force (SBF) [Mueen, ICDM 2013]
- Good: each distance computation takes $O(1)$ time.
- Bad: examines all subsequence pairs.
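A minimal sketch of the reference-point idea behind batch pruning, assuming a single reference vector $r$ and plain Euclidean distance on already-normalized vectors; it is not the full MK algorithm, it ignores trivial matches, and `reference_pruned_pairs` is a hypothetical name. By the triangle inequality, $|d(a, r) - d(b, r)| \le d(a, b)$, so once the vectors are sorted by their distance to $r$, all remaining partners of $a$ can be skipped as soon as the gap reaches $bsf$. Every surviving pair still needs an exact $O(\ell)$ distance computation, which is the weakness listed above.

```python
import math


def euclid(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def reference_pruned_pairs(vectors, ref, bsf):
    """Index pairs that survive triangle-inequality pruning with respect to ref.

    |d(a, ref) - d(b, ref)| lower-bounds d(a, b), so a pair whose reference
    distances differ by at least bsf cannot beat the best-so-far distance.
    """
    dist = [euclid(v, ref) for v in vectors]
    order = sorted(range(len(vectors)), key=lambda k: dist[k])
    survivors = []
    for x, a in enumerate(order):
        for b in order[x + 1:]:
            if dist[b] - dist[a] >= bsf:
                break  # sorted order: every later b is pruned in one batch
            survivors.append((min(a, b), max(a, b)))
    return survivors
```

With vectors `[[0, 0], [0, 1], [5, 5], [5, 6]]`, reference `[0, 0]`, and `bsf = 2.0`, only the pairs `(0, 1)` and `(2, 3)` survive.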
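Quick-Motif: Fast Distance Computation
Incremental distance computation. (Figure: all 25 pairs between the subsequences $s_0, \ldots, s_4$ and $s_{20}, \ldots, s_{24}$.) Only the 9 pairs that involve $s_0$ or $s_{20}$ require an $O(\ell)$ distance computation; each of the other 16 pairs $(s_{i+1}, s_{j+1})$ is obtained from $(s_i, s_j)$ in $O(1)$ time.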
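A Python sketch of the $O(1)$ update along one diagonal of the pair grid. It assumes the standard identity for z-normalized Euclidean distance, $d(s_i, s_j) = \sqrt{2\ell\left(1 - \frac{QT_{i,j} - \ell\mu_i\mu_j}{\ell\sigma_i\sigma_j}\right)}$, where $s[k]$ is the $k$-th point of $s$, $QT_{i,j} = \sum_{k=0}^{\ell-1} s[i+k]\,s[j+k]$, and $\mu$, $\sigma$ are the mean and population standard deviation of a subsequence. The update is $QT_{i+1,j+1} = QT_{i,j} - s[i]\,s[j] + s[i+\ell]\,s[j+\ell]$. In the 5×5 grid of the figure, each of the 9 diagonals starts at a pair involving $s_0$ or $s_{20}$. The names `window_stats` and `diagonal_distances` are hypothetical, and no subsequence is assumed to be constant ($\sigma > 0$).

```python
import math


def window_stats(s, ell):
    """Mean and population std of every length-ell subsequence via prefix sums (O(m))."""
    ps, ps2 = [0.0], [0.0]
    for v in s:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)
    mu, sd = [], []
    for i in range(len(s) - ell + 1):
        m = (ps[i + ell] - ps[i]) / ell
        mu.append(m)
        sd.append(math.sqrt(max((ps2[i + ell] - ps2[i]) / ell - m * m, 0.0)))
    return mu, sd


def diagonal_distances(s, ell, i, j, count):
    """Z-normalized distances of the pairs (i+k, j+k) for k = 0..count-1.

    Only the first pair pays O(ell) for its dot product; every later pair is
    updated in O(1). Assumes no constant subsequence (sd > 0).
    """
    mu, sd = window_stats(s, ell)
    qt = sum(s[i + k] * s[j + k] for k in range(ell))  # O(ell), once per diagonal
    out = []
    for k in range(count):
        a, b = i + k, j + k
        if k > 0:
            # O(1): drop the product leaving the window, add the one entering it
            qt += s[a + ell - 1] * s[b + ell - 1] - s[a - 1] * s[b - 1]
        corr = (qt - ell * mu[a] * mu[b]) / (ell * sd[a] * sd[b])
        out.append(math.sqrt(max(2 * ell * (1 - corr), 0.0)))
    return out
```

Clamping at zero before the square root guards against tiny negative values caused by floating-point rounding when two subsequences are identical.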
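Quick-Motif: Pruning of Subsequence Pairs
Group every $w$ consecutive subsequences into a PAA MBR (minimum bounding rectangle in PAA feature space). (Figure: $w = 5$; MBRs $M_1^5$, $M_2^5$, $M_3^5$ in the $(f_1, f_2)$ plane, with the minDist between two of them.)
The minimum distance (minDist) between two PAA MBRs gives a lower bound (LB) on the distance of every subsequence pair drawn from them. If this distance LB is smaller than $bsf$, the best-so-far motif distance, the MBR pair needs further refinement; otherwise all of its subsequence pairs are pruned at once.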
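A sketch of the PAA-MBR lower bound, assuming $\ell$ is divisible by the PAA dimensionality $\phi$ and using the standard PAA bound $\sqrt{\ell/\phi}\,\lVert \mathrm{PAA}(a) - \mathrm{PAA}(b) \rVert \le \lVert a - b \rVert$. Since minDist between two MBRs is at most the distance between any two points they contain, $\sqrt{\ell/\phi}\cdot\mathrm{minDist}$ lower-bounds every subsequence pair across them. `surviving_pairs` shows the naïve filter that checks all MBR pairs, in $O((m/w)^2 \phi)$ time. All function names are hypothetical.

```python
import math


def znorm(x):
    """Z-normalize a sequence (population std); assumes x is not constant."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]


def paa(x, phi):
    """PAA: mean of each of phi equal-length segments (assumes len(x) % phi == 0)."""
    seg = len(x) // phi
    return [sum(x[k * seg:(k + 1) * seg]) / seg for k in range(phi)]


def build_mbrs(s, ell, phi, w):
    """Group every w consecutive subsequences into one MBR (lo, hi) in PAA space."""
    n = len(s) - ell + 1
    feats = [paa(znorm(s[i:i + ell]), phi) for i in range(n)]
    mbrs = []
    for start in range(0, n, w):
        group = feats[start:start + w]
        lo = [min(f[d] for f in group) for d in range(phi)]
        hi = [max(f[d] for f in group) for d in range(phi)]
        mbrs.append((lo, hi))
    return mbrs


def mbr_lower_bound(m1, m2, ell, phi):
    """Distance LB for any subsequence pair drawn from MBRs m1 and m2."""
    (lo1, hi1), (lo2, hi2) = m1, m2
    gap2 = sum(max(0.0, lo2[d] - hi1[d], lo1[d] - hi2[d]) ** 2 for d in range(phi))
    return math.sqrt(ell / phi) * math.sqrt(gap2)


def surviving_pairs(mbrs, ell, phi, bsf):
    """Naive filter: check every MBR pair and keep those whose LB < bsf."""
    return [(p, q) for p in range(len(mbrs)) for q in range(p, len(mbrs))
            if mbr_lower_bound(mbrs[p], mbrs[q], ell, phi) < bsf]
```

The surviving MBR pairs are then refined by computing exact distances for the subsequence pairs they contain.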
Quick-Motif: Filter-and-Refinement
Naïve solution: check the distance LBs of all $w$-MBR pairs. The time complexity is $O((m/w)^2 \phi)$, where $\phi$ is the PAA dimensionality.
How can the surviving $w$-MBR pairs be found efficiently?
- Enable batch pruning.
- Discover the true motif as soon as possible to improve the pruning ability.
Quick-Motif: Filter-and-Refinement
Enable batch pruning with a hierarchical structure, which:
- offers reasonable grouping quality, and thus good pruning ability;
- can be constructed very efficiently.
(Figure: nine $w$-MBRs $M_0^w, \ldots, M_8^w$ in PAA feature space $(f_1, f_2)$, sorted into a list along a Hilbert curve and grouped into level-1 nodes $M_a$, $M_b$, $M_c$ under the level-2 root $M_{root}$, with the minDist between two nodes marked.)
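A hypothetical sketch of building such a hierarchy, assuming 2-D PAA features ($\phi = 2$, as in the figure's $f_1$, $f_2$ axes) quantized onto an $n \times n$ integer grid with $n$ a power of two. Leaf MBRs are sorted by the Hilbert index of their centers and every `fanout` consecutive nodes are packed into a parent, similar in spirit to Hilbert-packed R-trees; the fanout, the use of MBR centers, and the names `hilbert_index`, `merge`, and `build_hierarchy` are assumptions, not necessarily the paper's construction.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) on the Hilbert curve over an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the sub-curve has the standard orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def merge(nodes):
    """Parent node: the union MBR of its children plus the leaf ids it covers."""
    lo = tuple(min(c[0][k] for c in nodes) for k in range(2))
    hi = tuple(max(c[1][k] for c in nodes) for k in range(2))
    return (lo, hi, [i for c in nodes for i in c[2]])


def build_hierarchy(leaves, n, fanout):
    """leaves: 2-D MBRs (lo, hi) with integer grid coordinates in [0, n).

    Sort leaves by the Hilbert index of their centers, then pack every `fanout`
    consecutive nodes into a parent until a single root remains.
    """
    def key(i):
        (x0, y0), (x1, y1) = leaves[i]
        return hilbert_index(n, (x0 + x1) // 2, (y0 + y1) // 2)

    order = sorted(range(len(leaves)), key=key)
    level = [(leaves[i][0], leaves[i][1], [i]) for i in order]
    levels = [level]
    while len(level) > 1:
        level = [merge(level[k:k + fanout]) for k in range(0, len(level), fanout)]
        levels.append(level)
    return levels
```

Sorting is the dominant cost, so the structure can be built in $O(N \log N)$ time for $N$ leaf MBRs, and nodes that are adjacent in the Hilbert order tend to be close in feature space, which keeps the parent MBRs tight.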
Quick-Motif: Filter-and-Refinement
Discover the true motif as soon as possible: a locality-based search strategy.
(Figure: the hierarchy with the root $M_{root}$ at level 2, nodes $M_a$, $M_b$, $M_c$ at level 1, and the leaf nodes in Hilbert-curve sort order; the upper levels have bad locality, the leaf nodes good locality.)
Locality-based search vs. best-first search:

| | Locality-based | Best-first |
|---|---|---|
| Surviving pairs | 0.1256M | 0.1249M |
| Heap size | N/A | 2.78M |
| # pushes | 11.73M (queue) | 6.75M (heap) |
| Response time | 1.56 s | 6.32 s |
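In this comparison, the locality-based strategy performs more pushes (to a queue) than best-first search (to a heap) yet responds faster. As a hypothetical illustration of queue-based batch pruning over a hierarchy of MBRs, the sketch below traverses pairs of nodes with a FIFO `deque` and discards a whole node pair, and every point pair beneath it, whenever its minDist is at least $bsf$. It does not reproduce the paper's locality-based visiting order, it omits trivial-match exclusion, and `Node`, `min_dist`, and `closest_pair` are hypothetical names.

```python
import math
from collections import deque


class Node:
    """Hierarchy node: a leaf stores (id, vector) points; an inner node stores children."""

    def __init__(self, children=None, points=None):
        self.children = children or []
        self.points = points or []
        vecs = [v for _, v in self.all_points()]
        self.lo = [min(v[k] for v in vecs) for k in range(len(vecs[0]))]
        self.hi = [max(v[k] for v in vecs) for k in range(len(vecs[0]))]

    def all_points(self):
        if not self.children:
            return self.points
        return [p for c in self.children for p in c.all_points()]


def min_dist(a, b):
    """Minimum Euclidean distance between the MBRs of nodes a and b."""
    return math.sqrt(sum(max(0.0, b.lo[k] - a.hi[k], a.lo[k] - b.hi[k]) ** 2
                         for k in range(len(a.lo))))


def closest_pair(root):
    """Breadth-first (FIFO queue) traversal of node pairs with batch pruning."""
    bsf, best = math.inf, None
    queue = deque([(root, root)])
    while queue:
        a, b = queue.popleft()
        if min_dist(a, b) >= bsf:
            continue  # prunes every point pair under (a, b) in one step
        if not a.children and not b.children:
            for i, u in a.points:  # refinement: exact distances
                for j, v in b.points:
                    if a is b and i >= j:
                        continue  # visit each unordered pair once
                    d = math.dist(u, v)
                    if d < bsf:
                        bsf, best = d, (min(i, j), max(i, j))
        elif a is b:
            kids = a.children
            for x in range(len(kids)):
                for y in range(x, len(kids)):
                    queue.append((kids[x], kids[y]))
        else:
            for x in a.children or [a]:
                for y in b.children or [b]:
                    queue.append((x, y))
    return best, bsf
```

A FIFO queue makes each push and pop $O(1)$, whereas a best-first search keeps a priority heap ordered by minDist, paying $O(\log H)$ per operation for a heap of size $H$.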
Quick-Motif: Experimental Evaluation
Programming language: C++
Machine: Ubuntu 12.04, 4 GB RAM
Datasets:
- RW: randomly generated.
- EEG: reflects the activity of neurons; length 180204.
- ECG: the Koski ECG; length 144002.
- EPG: a sequence that traces insect behaviour; length 106950.
- TAO: sea surface temperatures; length 374071.
Quick-Motif: Performance Evaluation
(Figure: effect of $\ell$ on (a) ECG, (b) EEG, (c) EPG, and (d) TAO.)
Thanks! Q&A