Pattern Matching with Acceleration Data Pramod Vemulapalli.

Outline
- 50% Tutorial and 50% Research Results
- Basics
- Literature Survey
- Acceleration Data
- Preliminary Results
- Conclusions

What is a Time-Series Subsequence?
[Figure labels: Time Series, Time Series Subsequence]

What is Time-Series Subsequence Matching?
Given a query signal, find the most “appropriate” match in a database.

Applications for Time-Series Subsequence Matching (TSSM)
- Data Analytics
- Scientific Data
- Financial Data
- Audio Data (Shazam on iPhone)
- SETI Data
- A lot of time series data in this universe and in similar parallel universes …
- Every time you ask questions such as these:
  - When is the last time I saw data like this?
  - Is there any other data like this?
  - Is this pattern a rarity or something that occurs frequently?

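Brute Force
- Sliding Window Method
  - Extract a signal (one window of the database series)
  - Compare it with the template (the query)
  - Store the distance metric (Euclidean)
  - Repeat for every window position
- All windows whose distance falls within a certain threshold are reported as results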
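The sliding-window method above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names, the toy series, and the threshold value are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sliding_window_matches(series, query, threshold):
    """Compare the query with every window of the series.

    Returns a list of (start_index, distance) pairs for the windows
    whose distance to the query is within `threshold`.
    """
    m = len(query)
    matches = []
    for start in range(len(series) - m + 1):
        window = series[start:start + m]      # extract a signal
        d = euclidean(window, query)          # compare with the template
        if d <= threshold:                    # keep it if it is close enough
            matches.append((start, d))
    return matches

# Toy data (hypothetical): the query shape occurs twice in the series.
series = [0.0, 0.1, 1.0, 2.0, 1.0, 0.0, 0.2, 1.1, 2.1, 0.9]
query = [1.0, 2.0, 1.0]
matches = sliding_window_matches(series, query, threshold=0.3)
print([start for start, _ in matches])
```

For a series of length n and a query of length m this performs (n - m + 1) distance computations of m terms each, which is the cost that the indexing methods on the following slides try to avoid.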

History
- Faloutsos 1994
- Indexing
- Preprocessing
[Diagram labels: Extract a Signal, Fourier Transform, Database]

History
- Faloutsos 1994
- Matching
- Post Processing
  - Take the matches found by the process above and check the Euclidean distance criterion on the entire signal
- From Parseval’s theorem, if the Euclidean distance between these Fourier coefficients exceeds the given threshold, then the Euclidean distance between the original signals is also greater than the threshold
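The Parseval argument above can be checked numerically. The sketch below assumes an orthonormal DFT (scaled by 1/sqrt(n)) and keeps only the first k coefficients; the helper names, the sample signals, and the choice k = 3 are hypothetical and chosen only for illustration. The distance computed on the kept coefficients never exceeds the true Euclidean distance, so a window can be discarded safely when that smaller distance is already above the threshold.

```python
import cmath
import math

def dft(signal):
    """Orthonormal discrete Fourier transform (scaled by 1/sqrt(n))."""
    n = len(signal)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        for k in range(n)
    ]

def distance(a, b):
    """Euclidean distance; works for real and complex sequences."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

def can_prune(query, window, k, threshold):
    """True if the first k DFT coefficients already prove a non-match."""
    lower_bound = distance(dft(query)[:k], dft(window)[:k])
    return lower_bound > threshold

x = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]
y = [0.2, 1.1, 1.7, 1.2, -0.1, -0.8, -2.2, -1.1]
true_dist = distance(x, y)
lower_bound = distance(dft(x)[:3], dft(y)[:3])
print(lower_bound <= true_dist)
```

Windows that survive this filter are only candidates; the post-processing step still has to compute the full Euclidean distance on the original signals.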

Subsequent Work
- A number of subsequent papers followed this model
  - Discrete Fourier Transform 1994 (1)
  - Singular Value Decomposition 1994 (1)
  - Discrete Cosine Transform 1997 (2)
  - Discrete Wavelet Transform 1999 (3)
  - Piecewise Aggregate Approximation 2001 (4)
  - Locally Adaptive Piecewise Approximation 2001 (5)
1) C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast Subsequence Matching in Time-Series Databases. In SIGMOD Conference.
2) F. Korn, H. V. Jagadish, and C. Faloutsos. Efficiently supporting ad hoc queries in large datasets of time sequences. In SIGMOD.
3) K.-P. Chan and A. W.-C. Fu. Efficient Time Series Matching by Wavelets. In ICDE.
4) E. J. Keogh, K. Chakrabarti, S. Mehrotra, and M. J. Pazzani. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. In SIGMOD Conference.
5) E. J. Keogh, K. Chakrabarti, M. J. Pazzani, and S. Mehrotra. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Knowl. Inf. Syst., 3(3), 2001.

Drawbacks: Euclidean Distance Metric
- Not robust to temporal distortion
- Not robust to outliers
- Example: [figure]
- Needed: something that can account for temporal distortion

DTW based Matching
- Previous Work
  - Dynamic Time Warping 1994 (1)
  - ....
  - Longest Common Subsequence 2002 (2)
  - Edit Distance Based Penalty 2004 (3)
  - Edit Distance on Real Sequence 2005 (4)
  - Exact Indexing of Dynamic Time Warping 2004 (5)
1) D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In KDD Workshop.
2) M. Vlachos, D. Gunopulos, and G. Kollios. Discovering similar multidimensional trajectories. In ICDE.
3) L. Chen and R. T. Ng. On the marriage of lp-norms and edit distance. In VLDB.
4) L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In SIGMOD Conference.
5) Eamonn Keogh and Chotirat Ann Ratanamahatana. Exact Indexing of Dynamic Time Warping. Knowledge and Information Systems: An International Journal (KAIS). DOI /s May 2004.

Drawbacks: Dynamic Time Warping
- Performs amplitude matching: not robust to amplitude distortion
- Computationally expensive (especially for longer query signals)
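A minimal sketch of the classic dynamic-programming form of DTW makes both drawbacks concrete. It assumes a squared pointwise cost and no warping-window constraint; the function name and the toy sequences are hypothetical. The table has len(a) × len(b) cells, which is where the cost for long queries comes from, and every cell compares raw amplitudes.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance with squared pointwise cost.

    cost[i][j] holds the cheapest cumulative cost of aligning
    a[:i] with b[:j]; filling the table takes O(len(a) * len(b)) time.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])

base = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
shifted = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]   # same shape, delayed by one sample
scaled = [0.0, 2.0, 4.0, 2.0, 0.0, 0.0]    # same timing, doubled amplitude

print(dtw_distance(base, shifted))  # temporal shift is absorbed by warping
print(dtw_distance(base, scaled))   # amplitude change is not
```

The shifted copy aligns perfectly, while the amplitude-scaled copy still produces a non-zero distance.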

Recent Trends (Hard to predict)
- Local Patterns for Matching (Robust to Amplitude and Temporal Distortion)
  - Landmarks 2000 (Smooth a signal and break it at its extrema) (1)
  - Perceptually Important Points 2007 (Sliding Window of Different Sizes) (2)
  - Spade 2007 (Break a time signal into smaller pieces) (3)
  - Shapelets 2010 (Sliding Window of Different Sizes) (4)
1. Landmarks: A New Model for Similarity-Based Pattern Querying in Time Series Databases, Proceedings of the 16th International Conference on Data Engineering, p. 33, February 28-March 03.
2. T.C. Fu, F.L. Chung, R. Luk and C.M. Ng, Stock time series pattern matching: template-based vs. rule-based approaches, Engineering Applications of Artificial Intelligence 20 (3) (2007), pp. 347–364.
3. Y. Chen, M. A. Nascimento, B. C. Ooi, and A. K. H. Tung. SpADe: On Shape-based Pattern Detection in Streaming Time Series. In ICDE.
4. Ye, Lexiang, and Keogh, Eamonn. Time series shapelets: a novel technique that allows accurate, interpretable and fast classification, Data Mining and Knowledge Discovery, 2010.

Drawbacks of Current Methods
- (Brute Force)^2
  - Extract local patterns and perform the usual matching
  - Has only been used for small datasets for specific data mining problems
  - Needed: something that captures the robustness of local patterns and does not use the traditional sliding window methods for matching
- Redundant Matching
  - Larger sized patterns also contain smaller sized patterns
  - Needed: something that tries to isolate the information content in different bands and matches the information content in each band

Acceleration Data

- A large amount of vehicle data has been collected.
  - Acceleration Data
  - Vehicle Service Records
  - No GPS data!
- Some of these vehicles were in convoys and some were independent
- Problem: Group the vehicles based on acceleration data to perform other data mining tasks
- Vehicles that travelled in convoys or on the same roads must have similar acceleration

Same Road = Same Acceleration?
- Acceleration data is shaped by:
  - Route
  - Driver Behavior
  - Traffic Conditions
- Which of these has a consistent effect?

Same Road = Same Acceleration?
- Acceleration data is shaped by:
  - Route (constant)
  - Driver Behavior (variable)
  - Traffic Conditions (variable)

Which time series subsequence matching technique to use?
- Local pattern matching: Robust to Amplitude and Temporal Distortion
  - Very memory intensive, especially for large query sets
- Avoid Sliding Window
  - Very computationally intensive
- Isolate Information Content

Isolate Information Content?
- Take a wavelet transform
- Obtain dyadic frequency bands
  - Better frequency resolution at lower frequencies
  - Better time resolution at higher frequencies
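The dyadic band split can be illustrated with the simplest wavelet, the Haar wavelet. This is only a sketch under that assumption: the transcript does not say which wavelet the presenter used, and the function names and sample signal are hypothetical. Each level halves the number of samples, so high-frequency bands keep fine time resolution while low-frequency bands cover narrower frequency ranges.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail), each half as long as the input.
    """
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_bands(signal, levels):
    """Split a signal into dyadic bands.

    Returns [detail_1, ..., detail_levels, approx_levels], where
    detail_1 is the highest-frequency band.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
bands = haar_bands(signal, levels=3)
print([len(band) for band in bands])
```

Because the transform is orthonormal, the total energy of the bands equals the energy of the original signal, so each band can be matched separately without losing information.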

Avoid Sliding Window?
- Take a wavelet transform
- Take Wavelet Maxima
  - Maxima can be used to completely reconstruct the signal
  - Maxima are a stable and unique representation of a signal
- Avoid the sliding window by just trying to match the wavelet maxima from signals
1) S. Mallat. A Wavelet Tour of Signal Processing. New York: Academic.
2) S. Mallat and S. Zhong. "Characterization of signals from multiscale edges." IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992.
3) C. J. Kicey and C. J. Lennard. "Unique reconstruction of band-limited signals by a Mallat-Zhong Wavelet Transform." Journal of Fourier Analysis and Applications, Birkhäuser Boston, 1997.
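As a rough illustration of what "taking the maxima" means at a single scale, the sketch below uses a Haar-like difference of local means as a stand-in for the wavelet response and then keeps the local maxima of its absolute value. The Mallat-Zhong transform cited above uses a different (spline-based) wavelet, so this is an assumption-laden simplification; the function names and the test signal are hypothetical.

```python
def haar_response(signal, scale):
    """Undecimated Haar-like response at one scale.

    For each position t, returns (t, mean of the next `scale` samples
    minus mean of the previous `scale` samples). Large values mark
    sharp changes at that scale.
    """
    out = []
    for t in range(scale, len(signal) - scale + 1):
        left = sum(signal[t - scale:t]) / scale
        right = sum(signal[t:t + scale]) / scale
        out.append((t, right - left))
    return out

def modulus_maxima(response):
    """Keep the points where |response| is a non-zero local maximum."""
    maxima = []
    for i in range(1, len(response) - 1):
        t, v = response[i]
        prev_mag = abs(response[i - 1][1])
        next_mag = abs(response[i + 1][1])
        if abs(v) > 0 and abs(v) > prev_mag and abs(v) >= next_mag:
            maxima.append((t, v))
    return maxima

# A single step edge at position 8 (hypothetical test signal).
step = [0.0] * 8 + [1.0] * 8
maxima = modulus_maxima(haar_response(step, scale=2))
print(maxima)
```

The whole signal is reduced to a handful of (position, value) pairs per scale, and matching can then operate on those pairs instead of on every window position.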

Compare Wavelet Maxima?
- Create a feature vector that encodes the relative distances of the maxima
  - Common vision technique
- Encode the distance by incorporating the necessary invariance
- More invariance =>
  - More robust to noise
  - Less unique for matching
- Increase uniqueness by encoding many points
  - Less robustness to outliers
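The transcript does not preserve the actual encoding, so the following is only one hypothetical way to encode relative distances with a built-in invariance. Each feature covers a run of consecutive maxima and stores the gaps between them divided by the run's total span, which makes the feature unchanged when the time axis is stretched uniformly. The function name and the `points_per_feature` parameter are assumptions for this sketch.

```python
def gap_ratio_features(positions, points_per_feature=4):
    """Encode runs of consecutive extrema as normalized gaps.

    `positions` must be strictly increasing extrema locations.
    Each feature is the list of gaps within one run, divided by the
    run's total span, so uniform time stretching leaves it unchanged.
    """
    features = []
    for i in range(len(positions) - points_per_feature + 1):
        run = positions[i:i + points_per_feature]
        span = run[-1] - run[0]
        gaps = [(b - a) / span for a, b in zip(run, run[1:])]
        features.append(gaps)
    return features

original = [10, 20, 40, 50, 90]
stretched = [2 * p for p in original]   # same pattern at half the speed

print(gap_ratio_features(original))
print(gap_ratio_features(original) == gap_ratio_features(stretched))
```

Raising `points_per_feature` makes each feature more distinctive, but a single spurious or missing maximum then corrupts more features, which is the uniqueness versus outlier-robustness trade-off listed above.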

Multi Scale Extrema Features  Matching Process

Preliminary Test: Find the most appropriate feature for acceleration data
- Collect data in convoy formation
  - Use data from one of the vehicles to create the database
  - Data from the other vehicles is used as query data
- Non-Convoy Case
  - Use this data as query data
- GPS data is used as position reference in both cases

Results:

Results

Conclusions & Future Work
- Multiscale Extrema Features work better with Non-Convoy Data
- Euclidean distance measure works well with convoy data for short query lengths
- Analyze the performance of DTW methods
- Use different feature encoding methods
  - Go beyond neighboring points
- Advantages with respect to short time series clustering