Analysis of Constrained Time-Series Similarity Measures


Analysis of Constrained Time-Series Similarity Measures. Vladimir Kurbalija, Miloš Radovanović, Zoltan Geler, Mirjana Ivanović. Department of Mathematics and Informatics, Faculty of Science, University of Novi Sad, Serbia

Agenda: Introduction; Related Work; Experimental Evaluation; Computational Times; The Change of 1NN Graph; Conclusions and Future Work

Time Series: A time series (TS) consists of a sequence of values or events obtained over repeated measurements in time. Time-series analysis (TSA) comprises methods that attempt to understand such time series, either to understand the underlying context of the data points or to make forecasts.

Applications and Task Types: Applications include stock market analysis, economic and sales forecasting, observation of natural phenomena, scientific and engineering experiments, medical treatments, etc. Task types include indexing, classification, clustering, prediction, segmentation, anomaly detection, etc.

Important Concepts: pre-processing transformation; time-series representation; similarity/distance measure.

Pre-processing Transformation: “Raw” time series usually contain some distortions. The presence of distortions can seriously complicate indexing. Some of the most common pre-processing tasks are offset translation, amplitude scaling, removing linear trend, removing noise, etc.
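Three of the pre-processing tasks named above can be sketched in a few lines. This is a minimal illustration, not the FAP implementation: the function names are hypothetical, offset translation is taken to mean subtracting the mean, amplitude scaling is taken to mean z-normalization, and the linear trend is estimated by ordinary least squares against the time index.

```python
from statistics import mean, pstdev

def offset_translation(series):
    """Subtract the mean so the series is centred on zero."""
    m = mean(series)
    return [x - m for x in series]

def amplitude_scaling(series):
    """Centre the series and divide by its standard deviation (z-normalization)."""
    m = mean(series)
    s = pstdev(series)
    if s == 0:
        # Constant series: there is no amplitude to scale.
        return [0.0 for _ in series]
    return [(x - m) / s for x in series]

def remove_linear_trend(series):
    """Subtract the least-squares line fitted against the time index 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        slope = 0.0
    else:
        numer = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
        slope = numer / denom
    intercept = y_mean - slope * t_mean
    return [y - (slope * t + intercept) for t, y in enumerate(series)]
```

Noise removal is left out of the sketch because the slides do not say which smoothing method is meant.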

Time-series Representation: Time series are generally high-dimensional data. Many techniques have been proposed to reduce their dimensionality: Discrete Fourier Transformation (DFT), Singular Value Decomposition (SVD), Discrete Wavelet Transformation (DWT), Piecewise Aggregate Approximation (PAA), Adaptive Piecewise Constant Approximation (APCA), Symbolic Aggregate approXimation (SAX), Indexable Piecewise Linear Approximation (IPLA), Spline Representation, etc.
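As one concrete instance of such a representation, Piecewise Aggregate Approximation replaces each equal-width frame of the series with its mean. The sketch below is illustrative only; the function name `paa` and the restriction to lengths divisible by the number of segments are simplifying assumptions.

```python
def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]

# A series of length 6 reduced to 3 values.
reduced = paa([1, 3, 5, 7, 2, 2], 3)
print(reduced)  # [2.0, 6.0, 2.0]
```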

Similarity/distance Measure: Similarity-based retrieval is used in all of the aforementioned task types. The distance between time series needs to be carefully defined in order to reflect the underlying (dis)similarity (based on shapes and patterns). There are a number of distance measures: Lp distance (Lp), which is the Euclidean distance for p=2; Dynamic Time Warping (DTW); distance based on Longest Common Subsequence (LCS); Edit Distance with Real Penalty (ERP); Edit Distance on Real sequence (EDR); Sequence Weighted Alignment model (Swale) [31]; etc.

Similarity Measures: Many of these similarity measures are based on dynamic programming (DTW, LCS, ERP, EDR, ...). The computational complexity of dynamic programming algorithms is quadratic in the length of the time series. The usage of global constraints such as the Sakoe-Chiba band and the Itakura parallelogram can significantly speed up the calculation of similarities. The usage of global constraints can also improve the accuracy of classification.

Our Research: We study Dynamic Time Warping (DTW) and the Longest Common Subsequence measure (LCS), examining the speed-up gained from global constraints and the change of the 1-nearest neighbor graph with respect to the change of the constraint size. Tools and data: FAP (Framework for Analysis and Prediction), http://perun.pmf.uns.ac.rs/fap/; UCR Time Series Repository, http://www.cs.ucr.edu/~eamonn/time_series_data/

Agenda: Introduction; Related Work; Experimental Evaluation; Computational Times; The Change of 1NN Graph; Conclusions and Future Work

Euclidean Metric: The most intuitive metric for time series, and as a consequence very commonly used. Very fast: its computational complexity is linear. Very brittle and sensitive to small translations along the time axis.
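The brittleness is easy to see on a toy example. The sketch below computes the Lp distance (Euclidean for p=2) in a single linear pass; the function name and the two example series are made up for illustration.

```python
def lp_distance(a, b, p=2):
    """Lp distance between two equal-length series (Euclidean when p == 2)."""
    if len(a) != len(b):
        raise ValueError("Lp distance requires series of equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

# The same spike, shifted by one time step.
spike = [0, 0, 1, 0, 0]
shifted = [0, 0, 0, 1, 0]

print(lp_distance(spike, spike))    # 0.0
print(lp_distance(spike, shifted))  # sqrt(2), although the shapes are identical
```

Because points are compared index by index, a one-step shift is penalized as if the two series had nothing in common at the shifted positions.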

Dynamic Time Warping (DTW): A generalization of the Euclidean measure. Allows elastic shifting of the time axis, where at some points time “warps”. Computes the distance by finding an optimal path in the matrix of distances between the points of two time series.
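A minimal dynamic-programming sketch of unconstrained DTW follows. The slides do not fix the local cost or the step pattern, so two assumptions are made here: the local cost is the squared difference with a square root taken at the end (so the pure diagonal path reproduces the Euclidean distance), and the path may step right, down, or diagonally.

```python
import math

def dtw(a, b):
    """Unconstrained DTW distance via dynamic programming, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # D[i][j] = cost of the best warping path aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

# The shifted spike that the Euclidean distance penalizes is matched exactly.
print(dtw([0, 0, 1, 0, 0], [0, 0, 0, 1, 0]))  # 0.0
```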

Longest Common Subsequence (LCS): Uses a different methodology. Similarity between two time series is expressed as the length of the longest common subsequence of both time series.
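For real-valued series, "common" needs a definition. The sketch below assumes two points match when they differ by at most a tolerance `epsilon`; that parameter, its default value, and the normalization by the shorter length are illustrative assumptions, not details given in the slides.

```python
def lcs_length(a, b, epsilon=0.5):
    """Length of the longest common subsequence under an epsilon-match rule."""
    n, m = len(a), len(b)
    # L[i][j] = LCS length of a[:i] and b[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= epsilon:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]

def lcs_similarity(a, b, epsilon=0.5):
    """LCS length scaled to [0, 1] by the length of the shorter series."""
    return lcs_length(a, b, epsilon) / min(len(a), len(b))

print(lcs_length([0, 0, 1, 0, 0], [0, 0, 0, 1, 0], epsilon=0.1))  # 4
```

Unlike DTW, points that find no match are simply skipped rather than forced into the alignment.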

Global Constraints: DTW and LCS are based on dynamic programming, so the algorithms search for the optimal path in the search matrix. Global constraints narrow the search path in the matrix, which results in a significant decrease in the number of performed calculations.
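The effect on the number of calculations can be made concrete with a Sakoe-Chiba band applied to DTW. In this sketch the band radius `r` is given in samples and only cells with |i - j| <= r are evaluated; how a percentage of the series length is turned into `r` is not specified in the slides, so any such conversion would be an assumption. The local cost and step pattern are the same assumed ones as in a plain squared-difference DTW.

```python
import math

def dtw_band(a, b, r):
    """DTW restricted to a Sakoe-Chiba band of radius r.

    Returns (distance, number of matrix cells evaluated).
    """
    n, m = len(a), len(b)
    # The band must be wide enough for a path to reach the final corner.
    r = max(r, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            cells += 1
    return math.sqrt(D[n][m]), cells

spike = [0, 0, 1, 0, 0]
shifted = [0, 0, 0, 1, 0]
for radius in (0, 1, 5):
    print(radius, dtw_band(spike, shifted, radius))
```

With radius 0 only the diagonal is visited and the result equals the Euclidean distance; with a radius at least as large as the series the full matrix is visited and the result equals unconstrained DTW.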

Agenda: Introduction; Related Work; Experimental Evaluation; Computational Times; The Change of 1NN Graph; Conclusions and Future Work

Quality of Similarity Measures: The quality of similarity measures is usually evaluated indirectly, by assessing the accuracy of different classifiers. The simple 1-nearest neighbor classifier (1NN) gives among the best results for time-series data. The accuracy of 1NN directly reflects the quality of a similarity measure. We report the calculation times for unconstrained and constrained DTW and LCS. We focus on the 1NN graph and its change with regard to the change of constraints.
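The indirect evaluation can be sketched as a 1NN classifier that takes the distance function as a parameter, so that swapping the measure is the only change between runs. The leave-one-out protocol, the function names, and the toy data are assumptions made for illustration; the slides do not state which evaluation protocol was used.

```python
def euclidean(a, b):
    """Euclidean distance, used here only as a stand-in measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def one_nn_predict(query, train_series, train_labels, distance):
    """Return the label of the training series closest to the query."""
    best = min(range(len(train_series)),
               key=lambda k: distance(query, train_series[k]))
    return train_labels[best]

def leave_one_out_accuracy(series, labels, distance):
    """Classify each series using all the others and report the hit rate."""
    correct = 0
    for i in range(len(series)):
        rest = series[:i] + series[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if one_nn_predict(series[i], rest, rest_labels, distance) == labels[i]:
            correct += 1
    return correct / len(series)

data = [[0, 0, 0], [0, 0, 1], [5, 5, 5], [5, 5, 6]]
labels = ["a", "a", "b", "b"]
print(leave_one_out_accuracy(data, labels, euclidean))  # 1.0
```

Since 1NN has no parameters of its own, any difference in accuracy between two runs can be attributed to the distance measure.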

Experimental Evaluation: We compare the unconstrained measure and measures with the following constraints: 75%, 50%, 25%, 20%, 15%, 10%, 5%, 1% and 0% of the size of the time series. Smaller constraints have more interesting behavior. The set of experiments was conducted on 38 datasets from the UCR Time Series Repository. The length of time series varies from 24 to 1882 depending on the data set. The number of time series per data set varies from 60 to 9236.

Computational Times: We measure the efficiency of calculating the distance matrix. The distance matrix for one data set is the matrix where element (i,j) contains the distance between the i-th and j-th time series. The calculation of the distance matrix is a time-consuming operation. All experiments are performed on an AMD Phenom II X4 945 with 3GB RAM.
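A distance matrix and its computation time can be obtained as sketched below. The sketch assumes the measure is symmetric with a zero self-distance, so only the upper triangle is computed; that holds for a distance such as the Euclidean one used as a stand-in here, but would need adjusting for a similarity score. The names and toy data are hypothetical.

```python
import time

def euclidean(a, b):
    """Stand-in measure; any function of two series can be passed instead."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def distance_matrix(dataset, distance):
    """M[i][j] = distance between the i-th and j-th time series."""
    n = len(dataset)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(dataset[i], dataset[j])
            M[i][j] = d
            M[j][i] = d  # assumption: the measure is symmetric
    return M

dataset = [[0, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1]]
start = time.perf_counter()
M = distance_matrix(dataset, euclidean)
elapsed = time.perf_counter() - start
print(M)
print(f"computed in {elapsed:.6f} s")
```

For n series this makes n(n-1)/2 calls to the measure, which is why a faster measure translates directly into a faster matrix computation.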

Computational Times - DTW

Computational Times - LCS

Computational Times: The introduction of global constraints in both measures significantly speeds up the process of distance matrix computation. This is a direct consequence of a faster similarity measure. It is known for DTW that smaller values of constraints can give more accurate classification. The average constraint size that gives the best accuracy over all datasets is 4% of the time-series length. The LCS measure is still not well investigated in this respect.

The Change of 1NN Graph: The nearest neighbor graph is a directed graph where each time series is connected with its nearest neighbor. The nearest neighbor graph is computed for the unconstrained measures (DTW and LCS) and for measures with the following constraints: 75%, 50%, 25%, 20%, 15%, 10%, 5%, 1% and 0% of the length of time series. The change of nearest neighbor graphs is tracked as the percentage of time series (nodes in the graph) that changed their nearest neighbor compared to the nearest neighbor under the unconstrained measure.
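Given two distance matrices over the same data set, one from the unconstrained measure and one from a constrained variant, the tracked percentage can be computed as sketched here. The function names and the two small matrices are made up for illustration, and ties are broken by taking the lowest index, which is an assumption.

```python
def nearest_neighbor_graph(M):
    """nn[i] = index of the series nearest to i, excluding i itself."""
    n = len(M)
    return [min((j for j in range(n) if j != i), key=lambda j: M[i][j])
            for i in range(n)]

def graph_change_percent(nn_reference, nn_other):
    """Percentage of nodes whose nearest neighbor differs between two graphs."""
    changed = sum(1 for a, b in zip(nn_reference, nn_other) if a != b)
    return 100.0 * changed / len(nn_reference)

# Hypothetical distance matrices for three series.
unconstrained = [[0, 1, 5], [1, 0, 2], [5, 2, 0]]
constrained = [[0, 4, 3], [4, 0, 2], [3, 2, 0]]

nn_u = nearest_neighbor_graph(unconstrained)  # [1, 0, 1]
nn_c = nearest_neighbor_graph(constrained)    # [2, 2, 1]
print(graph_change_percent(nn_u, nn_c))       # two of three nodes changed
```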

The Change of 1NN Graph - DTW

The Change of 1NN Graph - LCS

The Change of 1NN Graph: Both measures behave in a similar manner when the constraint is narrowed. The 1NN graph remains the same until the size of the constraint is narrowed to approximately 20%, and after that the graph starts to change significantly. All datasets (for both measures) reach high percentages of difference (over 50%) for small constraint sizes (5-10%). Constrained measures are therefore qualitatively different from the unconstrained ones.

Agenda: Introduction; Related Work; Experimental Evaluation; Computational Times; The Change of 1NN Graph; Conclusions and Future Work

Conclusions: We examined the influence of global constraints on the two most representative elastic measures for time series, DTW and LCS. Through an extensive set of experiments we showed that the usage of global constraints can significantly reduce the computation time. We demonstrated that the constrained measures are qualitatively different from their unconstrained counterparts. For DTW it is known that the constrained measures are more accurate, while for LCS this issue is still open.

Future Work: To investigate the accuracy of the constrained LCS measure for different values of constraints. To explore the influence of global constraints on the computation time and 1NN graphs of other elastic measures like ERP, EDR, Swale, etc. The constrained variants of these elastic measures should also be tested with respect to classification accuracy.

Thank you for your attention. FAP site: http://perun.pmf.uns.ac.rs/fap/