A Time Series Representation Framework Based on Learned Patterns

Presentation transcript:

A Time Series Representation Framework Based on Learned Patterns
Mustafa Gokce Baydogan (Boğaziçi University), George Runger (Arizona State University), Didem Yamak (DeVry University)
CIDSE, Arizona State University, 10/11/2013

Outline
- Time series data mining: motivation
- Representing time series
- Measuring similarity
- Learning a pattern-based representation: pattern (relationship) discovery
- Learned pattern similarity (LPS)
- Computational experiments and results
- Conclusions and future work

Time series data mining: What is a time series?
- A numeric (or nominal) time series is a sequence of observations of a numeric (or nominal) property over time.
- Example: the output of an electrocardiography (ECG) recorder, with time represented on the x-axis and voltage on the y-axis.

Time series data mining: Motivation
- People measure things, and things (with rare exceptions) change over time.
- Time series are everywhere: ECG, heartbeat, stock prices.
Images from E. Keogh. A decade of progress in indexing and mining large time series databases. In VLDB, page 1268, 2006.

Time series data mining: Motivation
- Other types of data can be converted to time series; everything is about the representation.
- Example: recognizing words. The word "Alexandria" from the dataset of word profiles for George Washington's manuscripts can be represented by two time series, created by moving over and under the word.
Images from E. Keogh. A quick tour of the datasets for VLDB 2008. In VLDB, 2008.

Time series data mining: Tasks
- Clustering
- Classification
- Rule discovery
- Motif discovery
- Query by content
- Anomaly detection
All tasks require a representation; most also require a similarity measure.

Time series classification: a supervised learning problem aimed at labeling temporally structured univariate (or multivariate) sequences of fixed (or variable) length.

Challenges
- Local patterns are important: the four observed peaks are related to a certain event in the manufacturing process, and a problem that occurred over a shorter time interval is an indication of a problem.
- Translations and dilations (warping): the time of the peaks may change (e.g., the two peaks are observed earlier for the blue series).

Challenges
- Time series are usually noisy.
- Multivariate time series (MTS): the relation of patterns within a series and the interactions between series may be important.
- High dimensionality.

Bag-of-words
- Originated from document classification approaches.
- Bag-of-words is also referred to as bag-of-features in computer vision, bag-of-instances in multiple instance learning (MIL), and bag-of-frames in audio and speech recognition.
- Used for many computer vision problems.
- Example: "This is a book not a pencil" (see the sketch below).
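A toy illustration of the bag-of-words idea on the slide's example sentence: word order is discarded and only occurrence counts are kept.

```python
from collections import Counter

# Bag-of-words: ignore word order, keep only how often each word occurs.
sentence = "This is a book not a pencil"
bag = Counter(sentence.lower().split())
print(bag)  # {'a': 2, 'this': 1, 'is': 1, 'book': 1, 'not': 1, 'pencil': 1}
```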

Earlier work on time series classification: A Bag-of-Features Framework to Classify Time Series*
- Works on univariate time series.
- Segments subsequences of random length from random locations.
- Extracts simple features (mean, slope, variance) over intervals.
- Trains a supervised learner on the subsequence representation to generate class probability estimates (CPE) for each subsequence.
- Aggregates the CPE of the subsequences of each series to generate a time series representation in a BoF framework.
- Fast, and provides the best results with few parameters.
*Mustafa Gokce Baydogan, George Runger, Eugene Tuv, "A Bag-of-Features Framework to Classify Time Series," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2796-2802, Nov. 2013.

Earlier work on time series classification: Multivariate Time Series Classification with Learned Discretization*
- Works on both univariate and multivariate series.
- Trains a supervised learner on the observed values to learn a symbolic representation for the time series; does not generate features explicitly. Has similarities to the work described today.
- Aggregates the symbols using a bag-of-words (BoW) representation, without generation of motifs (2-mers, k-mers, etc.).
- Considers relationships across multiple variables in a straightforward manner (for multivariate series).
- Simple, fast, and performs better than existing approaches without requiring many parameters to be set.
*Mustafa Gokce Baydogan, George Runger, Eugene Tuv, "Multivariate Time Series Classification with Learned Discretization," submitted to Data Mining and Knowledge Discovery (received major revision on August 7th, 2013).

Approaches for time series analysis
- Time series representation: reduces high dimensionality and noise; captures trends, shapes, and patterns; provides more information than the exact values of each time series data point.
- Time series similarity: captures and reflects the underlying similarity; important for a variety of DM tasks such as similarity search, classification, clustering, etc.

Time series representation: the method of trees used here is different from that used previously for time series. (* Allows lower bounding for similarity computations.)

Time series similarity
- Euclidean distance: popular (no parameters), intuitive, fast to compute, but performs POORLY.
- Dynamic time warping (DTW): very popular (no parameters), handles warping (accurate), hard to beat, but may perform POORLY (long series with noise).
- Elastic measures such as SpADe: handle warping (accurate), but have too many parameters to tune and are computationally inefficient.

A popular similarity measure: dynamic time warping (DTW)
- A strong known solution for time series problems in a variety of domains.
- The sequences are "warped" non-linearly in the time dimension to measure similarity independently of certain non-linear variations in the time dimension.
- Alignment of the time series by DTW recognizes the similarity of the series better. A minimal implementation is sketched below.
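A textbook, unconstrained DTW in O(nm) time as a minimal sketch (not the authors' implementation; the constrained variant adds a warping window that limits |i - j|).

```python
import numpy as np

def dtw_distance(x, y):
    """Unconstrained dynamic time warping via dynamic programming."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

# A shifted copy of the same shape aligns far better under DTW than
# under the point-wise Euclidean distance.
t = np.linspace(0, 2 * np.pi, 60)
a, b = np.sin(t), np.sin(t + 0.4)
print(dtw_distance(a, b), float(np.linalg.norm(a - b)))
```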

Representations based on trees: overview of regression trees
A regression tree is a kind of additive model of the form

$m(x) = \sum_{i} k_i \,\mathbf{1}(x \in D_i)$,

where each $k_i$ is a constant and the $D_i$ are the disjoint partitions defined by the tree. Models of this type are sometimes called piecewise-constant regression models: they partition the predictor space into a set of regions and fit a constant value within each region. The tree is grown by finding the $D_i$ that minimize the SSE of $m(x)$ in a recursive manner.
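A minimal sketch of this piecewise-constant behaviour using scikit-learn (an illustration of regression trees in general, not the paper's code): a depth-3 tree predicts at most 2^3 = 8 distinct constants k_i, one per region D_i.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200).reshape(-1, 1)            # single predictor
y = np.sin(4 * np.pi * x.ravel()) + 0.2 * rng.standard_normal(200)

# The tree partitions [0, 1] into disjoint intervals D_i and predicts
# the mean response k_i inside each one (greedy recursive SSE splits).
tree = DecisionTreeRegressor(max_depth=3).fit(x, y)
print(np.unique(tree.predict(x)).size)               # at most 2**3 = 8 values
```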

Previous work on tree-based time series representation: a regression tree-based approach has been used to learn a representation (Geurts, 2001), a simple piecewise-constant model fit to the data matrix of the series.

A new representation approach: predicting (forecasting) a segment. Take a time series segment of length L from the data matrix and forecast Δ = 50 (gap) time units forward.

A new representation approach: learned patterns. Example: the time series is 128 units long; the predictor segment covers observations 1-60 and the response segment covers observations 51-111.

A new representation approach: multiple segments. Extract all possible segments of length L < T, e.g., L = 16 (segment length) where T = 27 (time series length), and concatenate the segments over all time series (Series 1, ..., Series n, ..., Series N) into one matrix; a sketch follows.
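A minimal sketch of the segment-extraction step, assuming the N series are stacked as rows of an N × T array; the shapes match the slide's L = 16, T = 27 example.

```python
import numpy as np

def segment_matrix(X, L):
    """Stack every length-L sliding window of every series row-wise.

    X: array of shape (N, T); returns an array of shape (N*(T-L+1), L).
    """
    N, T = X.shape
    rows = [X[i, s:s + L] for i in range(N) for s in range(T - L + 1)]
    return np.asarray(rows)

X = np.random.default_rng(0).standard_normal((3, 27))  # N=3 series, T=27
S = segment_matrix(X, L=16)
print(S.shape)  # (36, 16): 12 segments per series, concatenated over series
```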

A new representation approach based on regression trees: build J trees with depth D.
- Selection of a random predictor column introduces multiple random Δ values.
- Randomized trees are known to work well for regression (P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3-42, 2006).
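A minimal training sketch under assumptions of my own: each tree regresses one randomly chosen column of the segment matrix S on the remaining columns, which is one way multiple random Δ values between predictor and response positions can arise; the paper builds on extremely randomized trees, while plain CART trees are substituted here for brevity.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_lps_trees(S, J=150, D=6, seed=0):
    """Train J depth-D regression trees on the segment matrix S.

    Assumption (not the authors' exact scheme): each tree predicts one
    randomly chosen column of S from the remaining columns, so the gap
    (delta) between predictor and response positions varies across trees.
    """
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(J):
        target = int(rng.integers(S.shape[1]))           # random response column
        predictors = np.delete(np.arange(S.shape[1]), target)
        tree = DecisionTreeRegressor(max_depth=D,
                                     random_state=int(rng.integers(2**31)))
        tree.fit(S[:, predictors], S[:, target])
        trees.append((tree, predictors))
    return trees
```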

A pattern-based representation
- Trees #1 through #J are built, and the information is aggregated over all trees for prediction (i.e., denoising).
- Each terminal node defines a basis; the pattern-based representation is a vector of size R × J (assuming each tree has R terminal nodes).
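A sketch of turning a single series into the pattern-based vector, under the same assumptions as the training sketch above: drop each of the series' segments down every tree and count how many land in each terminal node. Leaf indices come from scikit-learn's `apply`; the resulting vector has one slot per tree node, with non-leaf slots always zero, a slight over-allocation relative to the R × J description.

```python
import numpy as np

def lps_representation(x, L, trees):
    """Pattern-based vector for one series x: per-tree counts of how many
    of its length-L segments fall into each terminal node."""
    segments = np.asarray([x[s:s + L] for s in range(len(x) - L + 1)])
    parts = []
    for tree, predictors in trees:
        leaves = tree.apply(segments[:, predictors])   # leaf id of each segment
        counts = np.bincount(leaves, minlength=tree.tree_.node_count)
        parts.append(counts)
    return np.concatenate(parts)  # roughly R * J entries ("R leaves per tree")
```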

Similarity measure: Learned Pattern Similarity (LPS)
- A time series is represented by its pattern vector of size R × J (assuming each tree has R terminal nodes).
- The measure penalizes the number of mismatches: series whose observations fall into mismatching patterns are different.
- Robust to noise: it implicitly works on the discretized values.
- Robust to warping: the representation learning handles the problem of warping.
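One plausible reading of the slide's distance (an assumption on my part, not necessarily the paper's exact formula): compare the two leaf-count vectors elementwise, so every segment that falls into a different pattern for the two series contributes a mismatch.

```python
import numpy as np

def lps_distance(rx, ry):
    """Distance between two pattern-based vectors: the absolute difference
    of leaf counts penalizes the number of mismatching segments.
    (Illustrative; the paper's definition may differ in detail.)"""
    return float(np.abs(rx - ry).sum())
```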

Similarity measure: Learned Pattern Similarity (LPS)
- The computations are similar to Euclidean distance: fast, and they allow for bounding schemes such as early abandon.
- Early abandon in similarity search (find the reference time series most similar to a query series): keep a record of the best distance found so far, and stop computing the distance to a reference series once the running distance exceeds the best-so-far. This is known to improve testing (query) time significantly; see the sketch below.
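A sketch of 1-NN search with early abandon over the representation vectors (the function and argument names are hypothetical): the distance is accumulated term by term and a candidate is abandoned as soon as it can no longer beat the best distance found so far.

```python
def nn_search_with_early_abandon(query_rep, reference_reps):
    """Return the index and distance of the reference representation
    closest to query_rep, abandoning hopeless candidates early."""
    best_idx, best_dist = -1, float("inf")
    for idx, ref in enumerate(reference_reps):
        d = 0.0
        for q, r in zip(query_rep, ref):
            d += abs(q - r)
            if d >= best_dist:        # cannot beat best-so-far: abandon
                break
        else:                         # loop finished: d is the full distance
            best_idx, best_dist = idx, d
    return best_idx, best_dist
```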

Learned Pattern Similarity (LPS): experiments
- 45 univariate time series datasets from the UCR database*.
- Compared to popular NN classifiers with different distance measures: Euclidean, DTW (constrained and unconstrained versions), SpADe, and Sparse Spatial Sample Kernels (SSSK).
- Addition of difference series: takes trend information into consideration.
- A multivariate time series extension.
- Parameters are set by cross-validation for each dataset: segment length L in {0.25, 0.5, 0.75} times the time series length; depth of trees in {4, 6, 8}; number of trees = 150 (not important if set large enough).
*http://www.cs.ucr.edu/~eamonn/time_series_data/

Univariate datasets: drawn from domains including health, energy, robotics, astronomy, chemistry, and gesture recognition.

LPS sensitivity analysis: illustration over 6 datasets (L = 0.5 × T) with multiple depth (D) and number-of-trees (J) levels.

Results: average error rates over 10 replications; scatter plots of error rates compare LPS against DTW with no window, DTW with the best (constrained) window, SpADe, and LPS without difference series.
- LPS gives better or comparable results than the DTW-based approaches.
- LPS performs better than Euclidean distance and SSSK (not shown).

LPS computational complexity
- Training complexity is O(JNTD): linear in the time series length and the number of training series.
- Memory efficient: S is not generated explicitly; only two columns are used at each split decision.
- Testing complexity: tree traversal to generate the representation is O(TJD); similarity computation (worst case) is O(NJ2^D).
- Example: the StarLightCurves dataset (N = 1000, T = 1024).

LPS on multivariate time series: while training, randomly select one univariate component series and a target segment. The complexity does not change, though more trees with larger depth may be required.

LPS on multivariate time series: uWaveGestureLibrary, a gesture recognition task. The acceleration of the hand on the x, y, and z axes is used to classify gestures (8 different types). The same parameter settings result in an error rate of 0.022.

LPS conclusions and future work: a new approach for time series representation.
- Captures relations between and within the series.
- Features are learned within the algorithm (not pre-specified).
- Handles nominal and missing values.
- Handles warping through representation learning.
- Scalable (and allows for a parallel implementation): training complexity is linear in the time series length and the number of training series. Training took at most 6 minutes over the 45 datasets (single thread, J = 150, D = 8, N = 1800, T = 750), and there is still space for improving the implementation. By comparison, SpADe did not return a result after a week of run time, while our similarity search takes less than a millisecond.
- Fast and accurate results with few parameters.

LPS conclusions and future work
- The proposed representation has some relations to deep learning.
- The approach can be extended to many data mining tasks (for both univariate and multivariate time series, and images), such as denoising (in progress), forecasting (in progress), anomaly detection (in progress), clustering (in progress), and indexing.
- The LPS package is provided at http://www.mustafabaydogan.com/learned-pattern-similarity-lps.html

Thanks! Questions and comments?
The LPS package is provided at http://www.mustafabaydogan.com/learned-pattern-similarity-lps.html