University of Macau, Macau

Presentation transcript:

Quick-Motif: An Efficient and Scalable Framework for Exact Motif Discovery. Yuhong Li (yb27407@umac.mo), Department of Computer and Information Science, University of Macau, Macau

Quick-Motif: What Is a Motif? A motif is the most similar subsequence pair in a time series. Applications: motif discovery is a core subroutine for activity discovery, e.g., elder care, surveillance, and sports training. Clustering enumerated motifs is more meaningful than clustering all the subsequences of a long time series.

Quick-Motif: Formal Definition. [Figure: a time series s drawn on a timeline, with the subsequence s_i highlighted and the positions i, i+ℓ−1, and m−1 marked.] Exact Motif Discovery. Input: a time series s and a target motif length ℓ. Output: the most similar subsequence pair in terms of normalized Euclidean distance. Avoid trivial matches: the two subsequences must be non-overlapping, because adjacent subsequences are naturally expected to be similar to each other.
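The definition above can be made concrete with a small sketch. It assumes that "normalized Euclidean distance" means the Euclidean distance between z-normalized subsequences (zero mean, unit standard deviation), which is the usual convention in motif discovery but is not spelled out on the slide. The function names are illustrative only.

```python
import math

def z_normalize(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0.0:
        # Assumption: a constant subsequence is mapped to all zeros.
        return [0.0] * n
    return [(v - mean) / std for v in x]

def normalized_distance(s, i, j, ell):
    """Euclidean distance between the z-normalized subsequences s_i and s_j."""
    a = z_normalize(s[i:i + ell])
    b = z_normalize(s[j:j + ell])
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def is_trivial_match(i, j, ell):
    """Two length-ell subsequences overlap when their starts are less than ell apart."""
    return abs(i - j) < ell
```

With this convention, two subsequences that differ only by offset and scale have distance zero, which is why normalization matters when comparing shapes.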

Quick-Motif: Naïve Solution. Extract all subsequences of length ℓ with a sliding window (window size = ℓ, step size = 1), normalize each subsequence, and test all subsequence pairs. The motif is the most similar subsequence pair. Time complexity is O(m²ℓ).
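The naïve solution can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes z-normalization for the "normalize" step and enforces the non-overlap constraint by requiring start positions at least ℓ apart. The names `znorm` and `naive_motif` are hypothetical.

```python
import math

def znorm(x):
    """Z-normalize x; a constant window is mapped to zeros (an assumption)."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [0.0] * n if sd == 0.0 else [(v - mu) / sd for v in x]

def naive_motif(s, ell):
    """Return (distance, i, j) for the closest non-overlapping subsequence pair."""
    # Sliding window of size ell with step 1, each window normalized once.
    subs = [znorm(s[i:i + ell]) for i in range(len(s) - ell + 1)]
    best = (math.inf, None, None)
    for i in range(len(subs)):
        # Starting j at i + ell skips overlapping (trivial) matches.
        for j in range(i + ell, len(subs)):
            d = math.sqrt(sum((u - v) ** 2 for u, v in zip(subs[i], subs[j])))
            if d < best[0]:
                best = (d, i, j)
    return best
```

The two nested loops visit O(m²) pairs and each distance costs O(ℓ), which gives the O(m²ℓ) bound stated on the slide.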

Quick-Motif: Existing Solutions. Reference-based Index (MK) [Mueen & Keogh, SDM 2009]. Good: prunes unpromising pairs in batches. Bad: each distance computation takes O(ℓ) time. Smart Brute Force (SBF) [Mueen, ICDM 2013]. Good: each distance computation takes O(1) time. Bad: examines all subsequence pairs.

Quick-Motif: Fast Distance Computation. Incremental distance computation. [Figure: the pairs formed between subsequences s_0, s_1, s_2, s_3, s_4 and s_20, s_21, s_22, s_23, s_24.] 9 subsequence pairs take O(ℓ) time each; the other 16 subsequence pairs take O(1) time each.
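One common way to obtain O(1) distance updates, sketched below, is to maintain the dot product of a pair and slide it along a diagonal: the dot product for (s_{i+1}, s_{j+1}) follows from that of (s_i, s_j) by dropping one term and adding one. The slide does not show the formula, so this is an assumption about the mechanism. It also assumes z-normalized distance and that no window is constant (non-zero standard deviation).

```python
import math

def sliding_stats(s, ell):
    """Mean and population std of every length-ell window, via prefix sums."""
    ps, ps2 = [0.0], [0.0]
    for v in s:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)
    mu, sd = [], []
    for i in range(len(s) - ell + 1):
        mean = (ps[i + ell] - ps[i]) / ell
        var = (ps2[i + ell] - ps2[i]) / ell - mean * mean
        mu.append(mean)
        sd.append(math.sqrt(max(var, 0.0)))
    return mu, sd

def dist_from_dot(q, ell, mu_i, sd_i, mu_j, sd_j):
    """Z-normalized distance from the raw dot product q of two windows."""
    corr = (q - ell * mu_i * mu_j) / (ell * sd_i * sd_j)  # assumes sd > 0
    return math.sqrt(max(2.0 * ell * (1.0 - corr), 0.0))

def diagonal_distances(s, ell, i, j, count):
    """Distances of (s_i, s_j), (s_{i+1}, s_{j+1}), ... for `count` pairs."""
    mu, sd = sliding_stats(s, ell)
    q = sum(s[i + k] * s[j + k] for k in range(ell))  # O(ell), first pair only
    out = [dist_from_dot(q, ell, mu[i], sd[i], mu[j], sd[j])]
    for t in range(1, count):
        a, b = i + t, j + t
        # O(1) update: add the newest product, drop the oldest one.
        q += s[a + ell - 1] * s[b + ell - 1] - s[a - 1] * s[b - 1]
        out.append(dist_from_dot(q, ell, mu[a], sd[a], mu[b], sd[b]))
    return out
```

In a 5-by-5 block of pairs such as the one in the figure, the 9 pairs in the first row and first column have no predecessor on their diagonal and need the full O(ℓ) computation, while the remaining 16 can be derived incrementally.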

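Quick-Motif: Pruning of Subsequence Pairs. Group every w consecutive subsequences into a PAA MBR (minimum bounding rectangle). [Figure: the PAA feature space with axes f_1 and f_2, showing the MBRs M_1^5, M_2^5, and M_3^5 for w = 5 and the minDist between two of them.] The minimum distance between two PAA MBRs gives a distance lower bound (LB). If the distance LB is smaller than bsf (the best-so-far distance), the pair needs further refinement.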
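A minimal sketch of the lower bound, under stated assumptions: PAA reduces each normalized subsequence to φ segment means, an MBR is the axis-aligned box around a group of PAA points, and the box-to-box gap is scaled by sqrt(ℓ/φ), the standard PAA lower-bounding factor. The helper names are hypothetical, and the subsequence length is assumed to be divisible by φ.

```python
import math

def paa(x, phi):
    """Piecewise Aggregate Approximation: mean of each of phi equal segments."""
    seg = len(x) // phi  # assumes len(x) is divisible by phi
    return [sum(x[k * seg:(k + 1) * seg]) / seg for k in range(phi)]

def mbr(points):
    """Axis-aligned bounding box (lows, highs) of a group of PAA points."""
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    return lows, highs

def min_dist(box_a, box_b, ell):
    """Lower bound on the distance between any member of box_a and any of box_b."""
    (la, ha), (lb, hb) = box_a, box_b
    total = 0.0
    for k in range(len(la)):
        # Per-dimension gap between the boxes; zero when they overlap.
        gap = max(lb[k] - ha[k], la[k] - hb[k], 0.0)
        total += gap * gap
    return math.sqrt(ell / len(la)) * math.sqrt(total)
```

If `min_dist` for two MBRs is already at least bsf, none of the subsequence pairs they contain can beat the current best, so the whole batch can be skipped without computing any exact distance.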

Quick-Motif: Filter-and-Refinement. Naïve solution: check the distance LBs for all w-MBR pairs. The time complexity is O((m/w)² φ), where φ is the PAA dimensionality. How to efficiently find the surviving w-MBR pairs? Enable batch pruning, and discover the true motif as soon as possible to improve the pruning ability.

Quick-Motif: Filter-and-Refinement. Enable batch pruning with a hierarchical structure. It offers reasonable grouping quality, thus good pruning ability, and it can be constructed very efficiently. [Figure: the w-MBRs M_0^w, ..., M_8^w in the PAA feature space (axes f_1 and f_2) with a minDist between two of them. The Hilbert curve sort list is M_4^w, M_6^w, M_0^w, M_2^w, M_7^w, M_5^w, M_3^w, M_1^w, M_8^w; above it are the Level 1 nodes M_a, M_b, M_c and the Level 2 node M_root.]
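The slide names a Hilbert curve sort list but does not say how the upper levels are built. A hedged sketch of one plausible construction: sort the leaf MBRs by the Hilbert index of their centers, then pack every few consecutive entries into a parent box, level by level. The 2-D restriction, the grid resolution, the fanout, and all function names are assumptions made for illustration.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def build_levels(boxes, fanout=3, grid=16):
    """boxes: 2-D MBRs ((lo1, lo2), (hi1, hi2)). Returns levels, leaves first.

    Assumes fanout >= 2; upper levels keep the Hilbert order of the leaves.
    """
    lo = [min(b[0][k] for b in boxes) for k in range(2)]
    hi = [max(b[1][k] for b in boxes) for k in range(2)]

    def cell(b, k):
        # Quantize the box center onto the grid along dimension k.
        center = (b[0][k] + b[1][k]) / 2.0
        span = (hi[k] - lo[k]) or 1.0
        return min(grid - 1, int((center - lo[k]) / span * grid))

    level = sorted(boxes, key=lambda b: hilbert_index(grid, cell(b, 0), cell(b, 1)))
    levels = [level]
    while len(level) > 1:
        parents = []
        for g in range(0, len(level), fanout):
            group = level[g:g + fanout]
            parents.append((tuple(min(b[0][k] for b in group) for k in range(2)),
                            tuple(max(b[1][k] for b in group) for k in range(2))))
        levels.append(parents)
        level = parents
    return levels
```

With nine leaf MBRs and a fanout of 3, this yields three intermediate nodes and one root, the same shape as the hierarchy in the figure. Sorting costs O(n log n) and packing is linear, which is consistent with the claim that the structure can be constructed very efficiently.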

Quick-Motif: Filter-and-Refinement. Discover the true motif as soon as possible with a locality-based search strategy. [Figure: the hierarchy with M_root at Level 2, M_a, M_b, M_c at Level 1, and the leaf nodes in Hilbert curve sort order (M_4^w, M_6^w, M_0^w, M_2^w, M_7^w, M_5^w, M_3^w, M_1^w, M_8^w), with the annotations "Bad locality" and "Good locality".]
Locality-based search vs. best-first search:
                   Locality-based     Best-first
Surviving pairs    0.1256M            0.1249M
Heap size          N/A                2.78M
# pushes           11.73M (queue)     6.75M (heap)
Resp. time         1.56 s             6.32 s

Quick-Motif: Experimental Evaluation. Programming language: C++. Machine: Ubuntu 12.04, 4 GB RAM. Datasets: RW (randomly generated); EEG (reflects the activity of neurons, length 180204); ECG (the Koski ECG, length 144002); EPG (a sequence that traces insect behaviour, length 106950); TAO (sea surface temperatures, length 374071).

Quick-Motif: Performance Evaluation. [Figure panels: (a) effect of ℓ on ECG; (b) effect of ℓ on EEG; (c) effect of ℓ on EPG; (d) effect of ℓ on TAO.]

Thanks. Q & A