Enumeration of Time Series Motifs of All Lengths

Enumeration of Time Series Motifs of All Lengths
Abdullah Mueen
Department of Computer Science, University of New Mexico

Example: Repeating Pattern (Motif)
[Figure: time series containing a repeating pattern. Source: Chiu et al., KDD 2003]

Motivation: Enumerating Motifs
Find the most similar pairs of time series at every length.
Brown A E X et al. PNAS 2013;110:791-796

Goals: Enumerating Motifs  
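The task stated on the Motivation slide can be sketched as a naive baseline. This is not the method of the talk, only a brute-force reference under stated assumptions: "pairs of time series" is read as pairs of subsequences of one long series, similarity is z-normalized Euclidean distance, and overlapping pairs are excluded. The function names (`znorm`, `motif_at_length`, `enumerate_motifs`) are hypothetical.

```python
import math

def znorm(x):
    """Z-normalize a sequence (population standard deviation)."""
    m = len(x)
    mean = sum(x) / m
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / m)
    if std == 0:
        return [0.0] * m          # constant window: no shape information
    return [(v - mean) / std for v in x]

def znorm_dist(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))

def motif_at_length(series, m):
    """Closest non-overlapping pair of length-m subsequences: (dist, i, j)."""
    best = (math.inf, None, None)
    n = len(series)
    for i in range(n - m + 1):
        for j in range(i + m, n - m + 1):   # assumption: skip overlapping pairs
            d = znorm_dist(series[i:i + m], series[j:j + m])
            if d < best[0]:
                best = (d, i, j)
    return best

def enumerate_motifs(series, min_len, max_len):
    """Run the brute-force search independently for every length."""
    return {m: motif_at_length(series, m) for m in range(min_len, max_len + 1)}

# Toy series (made up): the pattern 0, 3, 1, 4, 0 appears at offsets 2 and 12.
series = [5, 1, 0, 3, 1, 4, 0, 9, 2, 7, 6, 8, 0, 3, 1, 4, 0, 2, 11, 5]
for m, (d, i, j) in enumerate_motifs(series, 4, 6).items():
    print(m, i, j, round(d, 3))
```

Each length costs a quadratic number of distance computations here, and every length is searched from scratch, which is the cost the rest of the talk aims to avoid.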

Outline
1. Bounding correlation
2. Enumerating motifs of all lengths
   Intuitive Example
   Experimental Results
   Case Study: Activity Recognition
3. Conclusion

Pearson’s Correlation Coefficient  

Correlation
Advantages:
1. Scale and shift invariant
2. Computed with a linear scan
Disadvantages:
1. Does not consider warping
2. Is not a metric
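The two advantages listed above can be checked with a small sketch. It computes Pearson's correlation from running sums collected in one pass over the data, then confirms that positive rescaling and shifting of either input leaves the value unchanged. The function name `pearson` and the sample values are illustrative assumptions, not taken from the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation from five running sums gathered in a single scan."""
    m = len(x)
    sx = sy = sxx = syy = sxy = 0.0
    for a, b in zip(x, y):
        sx += a
        sy += b
        sxx += a * a
        syy += b * b
        sxy += a * b
    cov = m * sxy - sx * sy
    var_x = m * sxx - sx * sx
    var_y = m * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 4, 3, 6]          # made-up values
y = [2, 1, 5, 3, 7]

r = pearson(x, y)
# Scale (by a positive factor) and shift each input independently.
r_transformed = pearson([2 * a + 5 for a in x], [0.5 * b - 1 for b in y])
print(abs(r - r_transformed) < 1e-9)
```

A negative scale factor would flip the sign of the correlation, so the invariance shown here is for positive scaling only.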

Relationship with Euclidean Distance    
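The formula on this slide did not survive extraction. A standard identity that fits the title is d^2 = 2m(1 - corr), where d is the Euclidean distance between two z-normalized sequences of length m and corr is their Pearson correlation. The sketch below checks that identity numerically, assuming z-normalization with the population standard deviation. Whether the slide used exactly this form is an assumption, and the helper names are hypothetical.

```python
import math

def znorm(x):
    """Z-normalize with the population standard deviation (divide by m)."""
    m = len(x)
    mean = sum(x) / m
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / m)
    return [(v - mean) / std for v in x]

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def pearson(x, y):
    """Correlation as the mean product of z-normalized values."""
    zx, zy = znorm(x), znorm(y)
    return sum(p * q for p, q in zip(zx, zy)) / len(x)

def dist_from_corr(corr, m):
    """Distance implied by d^2 = 2m(1 - corr); clamp tiny negative round-off."""
    return math.sqrt(max(0.0, 2 * m * (1 - corr)))

x = [1, 2, 4, 3, 6]          # made-up values
y = [2, 1, 5, 3, 7]

direct = euclidean(znorm(x), znorm(y))
via_corr = dist_from_corr(pearson(x, y), len(x))
print(abs(direct - via_corr) < 1e-9)
```

Under this identity, a higher correlation always means a smaller normalized distance at a fixed length, so ranking pairs by one is equivalent to ranking them by the other.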

Bounding Euclidean Distance
[Figure: values changed, shown in two panels: Without Normalization and With Normalization]

Intuition
[Figure: a normalized sequence, and the same sequence after appending 10 and re-normalizing. Panel labels: Length 5, Length 4, Length 5]
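The effect named on this slide, appending a value and re-normalizing, can be reproduced with a few lines. The starting values 1, 2, 3, 4 are an assumption chosen for illustration (the slide's own data is not recoverable); only the appended value 10 comes from the slide. The point of the sketch is that every earlier value moves after re-normalization, not only the new one.

```python
import math

def znorm(x):
    """Z-normalize with the population standard deviation."""
    m = len(x)
    mean = sum(x) / m
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / m)
    return [(v - mean) / std for v in x]

def changes(before, after):
    """Absolute change of each old value after the sequence was extended."""
    return [abs(a - b) for a, b in zip(before, after)]

short = [1, 2, 3, 4]                 # assumed length-4 sequence
extended = short + [10]              # append 10, as on the slide

z_short = znorm(short)
z_extended = znorm(extended)

print([round(v, 3) for v in z_short])
print([round(v, 3) for v in z_extended])
print([round(v, 3) for v in changes(z_short, z_extended)])
```

Because all normalized values shift when the length grows by one, distances at the longer length cannot simply be read off from distances at the shorter one, which is why a bound on the change is useful.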

Bounding Euclidean Distance    

Bounding Euclidean Distance
[Figure: x-axis: pairs in ascending order of distances; y-axis: normalized distance]

Intuition
[Figure: time series plot]
Candidate pairs, sorted by the third value:
145, 5410, 1.26
8345, 4211, 2.63
1655, 9461, 2.96
6531, 2501, 3.17
851, 1440, 3.73
2512, 3110, 3.98
1685, 9260, 4.57
Same pairs with updated third values:
145, 5410, 1.79
8345, 4211, 1.63
1655, 9461, 3.61
6531, 2501, 2.71
851, 1440, 3.83
2512, 3110, 4.18
1685, 9260, 4.27
Re-sorted:
8345, 4211, 1.63
145, 5410, 1.79
6531, 2501, 2.71
1655, 9461, 3.61
851, 1440, 3.83
2512, 3110, 4.18
1685, 9260, 4.27

Intuition
[Figure: time series plot]
Candidate pairs, sorted by the third value:
8345, 4211, 1.63
145, 5410, 1.79
6531, 2501, 2.71
1655, 9461, 3.61
851, 1440, 3.83
Same pairs with updated third values:
8345, 4211, 1.23
145, 5410, 1.98
6531, 2501, 1.71
1655, 9461, 3.68
851, 1440, 3.61
Re-sorted:
8345, 4211, 1.23
6531, 2501, 1.71
145, 5410, 1.98
851, 1440, 3.61
1655, 9461, 3.68
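The numbers on the two Intuition slides are consistent with a simple list-maintenance step: re-score each candidate pair at the next length, sort again, and carry a shorter list forward. The sketch below replays that step with the slide's own values. Reading each entry as (position, position, distance), and reading the shrinking list as "keep the best five", are assumptions. How the method decides that the retained list is still sufficient (presumably via the bound from the first part of the talk) is not shown in the transcript and is not modeled here. The name `refresh` is hypothetical.

```python
def refresh(candidates, new_distance, keep):
    """Re-score every candidate pair, sort ascending, keep the best `keep`."""
    rescored = [(a, b, new_distance(a, b)) for a, b, _ in candidates]
    rescored.sort(key=lambda entry: entry[2])
    return rescored[:keep]

# First list on the first Intuition slide.
current = [
    (145, 5410, 1.26), (8345, 4211, 2.63), (1655, 9461, 2.96),
    (6531, 2501, 3.17), (851, 1440, 3.73), (2512, 3110, 3.98),
    (1685, 9260, 4.57),
]

# Updated values from the second list on the same slide, used as a lookup
# table in place of a real distance computation.
updated = {
    (145, 5410): 1.79, (8345, 4211): 1.63, (1655, 9461): 3.61,
    (6531, 2501): 2.71, (851, 1440): 3.83, (2512, 3110): 4.18,
    (1685, 9260): 4.27,
}

next_list = refresh(current, lambda a, b: updated[(a, b)], keep=5)
for entry in next_list:
    print(entry)
```

With these inputs the result matches the first list on the second Intuition slide, which suggests the slides animate this re-score and re-sort step from one length to the next.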

Sanity Check
[Figure: white noise time series with regions marked (1) to (4); panels labeled Length: 87, Length: 105, Length: 299]
http://www.cs.unm.edu/~mueen/Projects/MOEN/index.html

Experimental Results: Scalability
[Figure: execution time in seconds versus data length (n), and versus range of lengths (maxLen-minLen+1). Labels: Smart Brute Force, Iterative MK, EEG, EOG, Random Walk]

Activity Recognition
[Figure: sensor recordings with segments marked A, B, C, E, F, D. Sensor labels: Hip, Hand, Arm, Leg; axes: x, y, z. Segment annotations: 0/2, 2/4, 1/4, 1/2, 0/3, 0/4]
Step  Action
A     Side steps with no arm movement
B     Rock steps sideways without arm movement
C     Rock steps sideways with arm movement
D     Side steps with arm movement
E     Side steps with arms up in the air
F     Standing still with head bopping
H. Pohl et al. SMC 2010

Thank You

Backup Slides

Experimental Results
[Figure: execution time in seconds versus K and versus c, for n=10k, n=20k, n=40k, n=80k, n=160k]

Sample Output
[Figure: motif pairs at Length: 186, Length: 187, Length: 255, Length: 373]
http://www.cs.unm.edu/~mueen/Projects/MOEN/index.html

Time Series Join
[Figure: correlation versus lengths. Curves: Best Match Correlation, Length-adjusted Correlation. Annotation as extracted: Lengths x1.5x10-3]

Motif Covering
[Figure: covering motifs; locations of the first occurrences versus length]