
1 Feature learning for multivariate time series classification
Mustafa Gokce Baydogan*, George Runger*, Eugene Tuv†
*Arizona State University, †Intel Corporation
10/13/2012, 7th INFORMS Workshop on Data Mining and Health Informatics (DM-HI 2012)

2 Outline
Mustafa Gokce Baydogan, George Runger and Eugene Tuv. DM-HI 2012, Phoenix.
- Time series classification: problem definition and motivation
- Feature learning for multivariate time series classification
- Computational experiments and results
- Conclusions and future work

3 Time Series Classification
- Time series classification is a supervised learning problem:
  - the input consists of a set of training examples and associated class labels;
  - each example is formed by one or more time series (numerical or nominal variables);
  - the goal is to predict the class of a new (test) series.

4 Motivations
- People measure things, and things (with rare exceptions) change over time: time series are everywhere (e.g. ECG, heartbeat, stock prices).
- Consider a patient's medical record: test values, observations, actions and related responses.

5 Motivations
- Other types of data can be converted to time series; everything is about the representation.
- Example: recognizing words. A word can be represented by two time series created by tracing over and under the word.
- Figure: an example word, "Alexandria", from the dataset of word profiles for George Washington's manuscripts. Images from E. Keogh, "A quick tour of the datasets for VLDB 2008", VLDB, 2008.

6 Challenges
- Local patterns are important: translations and dilations (warping).
- Figure: the four observed peaks relate to a certain event in the manufacturing process, and an extra peak is an indication of a problem. The problem can occur over a shorter time interval (dilation), and the timing of the peaks may change (translation: two peaks are observed earlier for the blue series).

7 Challenges
- Multivariate time series (MTS): the relations of patterns within each series and the interactions between series are important.
- High dimensionality.

8 Approaches
- Instance-based methods
  - Predict based on the similarity to the training time series.
  - Require a similarity (distance) measure: Euclidean distance, ..., Dynamic Time Warping (DTW). The DTW distance is known to be a strong solution [1]: it handles translations and dilations by matching observations.
- Feature-based methods
  - Predict a test instance based on a model trained on extracted feature vectors.
  - Require feature extraction methods and a supervised learner (e.g. a decision tree or a support vector machine) trained on the extracted features.
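The DTW distance above can be sketched with the classic dynamic program over all pairs of observations (a minimal illustration in Python, not the implementation used in the experiments):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(n*m) dynamic-programming DTW between two 1-D series.

    D[i, j] holds the cheapest cumulative cost of aligning the first i
    observations of x with the first j observations of y.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # match, insertion, or deletion: the warping step
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])
```

An instance-based classifier then predicts the label of the nearest training series under this distance; note that a repeated value such as [0, 0, 1] matches [0, 1] at zero cost, which is exactly the dilation-handling behavior the slides describe.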

9 Instance-based methods
- Disadvantages:
  - May not be suitable for real-time applications [3]. DTW has a time complexity of O(n^2), where n is the length of the time series (it is a shortest-path problem). The complexity reduces to O(n) using a lower bound (LB_Keogh [8]), but this is still not tractable for real-time applications.
  - Not scalable to large numbers of training samples and variables: the training time series must be stored, so the approach is not suitable for resource-limited environments (e.g. sensors).
  - Does not consider the interaction between the variables.
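The O(n) lower bound cited above can be sketched as follows: an envelope of half-width r is built around one series, and the bound accumulates only how far the other series falls outside it, so the expensive DTW needs to be evaluated only when the bound fails to prune the candidate (an illustrative sketch of LB_Keogh, not the reference implementation of [8]):

```python
import numpy as np

def lb_keogh(q, c, r):
    """O(n) LB_Keogh lower bound on the r-constrained DTW distance.

    Builds a running band envelope [L, U] around candidate c with
    half-width r and sums the squared excursions of query q outside it.
    """
    q, c = np.asarray(q, float), np.asarray(c, float)
    total = 0.0
    for i, qi in enumerate(q):
        lo, hi = max(0, i - r), min(len(c), i + r + 1)
        U, L = c[lo:hi].max(), c[lo:hi].min()
        if qi > U:            # query above the envelope
            total += (qi - U) ** 2
        elif qi < L:          # query below the envelope
            total += (L - qi) ** 2
    return np.sqrt(total)
```

If the bound already exceeds the best distance found so far, the candidate can be discarded without computing DTW at all.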

10 Feature-based methods
- Time series are represented by the generated features:
  - shape-based features: mean, variance, slope, ...
  - wavelet features: coefficients, ...
  - linear predictive coding (LPC) features, ...
- Global features (such as the global mean and variance) provide a compact representation of the series, but local features are important: features from time series segments (intervals).
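The interval-feature idea can be sketched as follows: the series is cut into equal-width segments and the shape-based statistics named above (mean, variance, slope) are computed per segment (a hypothetical helper for illustration; the actual feature sets used in the literature vary):

```python
import numpy as np

def interval_features(series, n_intervals):
    """Mean, variance and least-squares slope of each equal-width segment."""
    feats = []
    for seg in np.array_split(np.asarray(series, float), n_intervals):
        t = np.arange(len(seg))
        # degree-1 fit; the leading coefficient is the segment's slope
        slope = np.polyfit(t, seg, 1)[0] if len(seg) > 1 else 0.0
        feats.extend([seg.mean(), seg.var(), slope])
    return np.array(feats)
```

The resulting fixed-length vector can then be fed to any supervised learner, which is what distinguishes feature-based methods from instance-based ones.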

11 Feature learning for time series classification
- A framework to learn a feature representation for multivariate time series (MTS) classification, called symbolic MTS (S-MTS).
- MTS are difficult to represent; the problem has been studied in different fields such as statistics, signal processing and control theory ([14] provides an extensive review).
- One aim is to obtain a rectangular representation by transforming the MTS to a fixed number of columns using rectangularization methods (PCA, SVD, clustering); an alternative is to modify similarity-based approaches.
- Both approaches (rectangularization and similarity-based) have problems handling:
  - warping;
  - large-dimensional feature spaces;
  - the interactions within and between the time series;
  - nominal and missing values.

12 S-MTS: learned discretization (illustration: series are discretized into symbol strings such as ABADF, BBCF, DDDCAA)

13 MTS representation and classification
- A tree-based ensemble is used to learn features (a symbolic representation).
- For each tree, the number of symbols is determined by the number of terminal nodes (fixed to R).
- Every tree generates a symbolic representation, so the number of representations equals the number of trees.
- The frequencies of the symbols over each tree are concatenated to obtain the final representation (bag-of-words). For example, the string ABADF gives the counts 2 A, 1 B, 1 D, 1 F.
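The concatenation step can be sketched as follows: each tree assigns every observation of a series to one of its R terminal nodes (a symbol), and the per-tree relative symbol frequencies are stacked into one fixed-length vector (an illustrative sketch of the bag-of-words step only; the tree-learning step itself is omitted):

```python
import numpy as np

def bag_of_words(symbol_seqs, n_symbols):
    """Concatenate per-tree symbol histograms into one fixed-length vector.

    symbol_seqs: one entry per tree; each entry is the sequence of symbols
    (terminal-node ids 0..n_symbols-1) that the tree assigns to the
    observations of a single time series.
    """
    parts = []
    for seq in symbol_seqs:
        hist = np.bincount(np.asarray(seq), minlength=n_symbols).astype(float)
        parts.append(hist / hist.sum())  # relative symbol frequencies
    return np.concatenate(parts)
```

Because the histogram ignores where in the series a symbol occurs, shifted or stretched patterns map to similar vectors, which is how the representation absorbs warping.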

14 S-MTS Experiments
- 45 univariate time series datasets from the UCR database; on 20 of them S-MTS is compared to the bag-of-patterns (BOP) approach of [16], which applies document classification techniques based on Symbolic Aggregate approXimation (SAX).
- 15 multivariate time series datasets from the UCI machine learning repository, the CMU motion capture database and the UCR database.
- Cross-validation to compare with existing studies: a nested cross-validation scheme (as proposed by [15]) is used.
- Performance on test data to compare with nearest neighbor classifiers.
- Parameters, set by cross-validation: number of trees, number of terminal nodes (symbols).
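The nested cross-validation scheme mentioned above can be sketched as follows: an inner loop selects the parameter on each outer-training split, and the outer loop scores the tuned procedure on data it never saw during selection (a generic sketch with hypothetical `fit`/`score` callables, not the exact protocol of [15]):

```python
import numpy as np

def nested_cv(X, y, param_grid, fit, score, outer_k=5, inner_k=3, seed=0):
    """Nested CV: inner folds choose a parameter, outer folds estimate error."""
    rng = np.random.default_rng(seed)
    outer = np.array_split(rng.permutation(len(y)), outer_k)
    outer_scores = []
    for i, test in enumerate(outer):
        train = np.concatenate([f for j, f in enumerate(outer) if j != i])
        inner = np.array_split(train, inner_k)
        best_p, best_s = None, -np.inf
        for p in param_grid:                      # inner model selection
            s = 0.0
            for k, val in enumerate(inner):
                tr = np.concatenate([f for j, f in enumerate(inner) if j != k])
                s += score(fit(X[tr], y[tr], p), X[val], y[val])
            if s > best_s:
                best_p, best_s = p, s
        # refit on the full outer-training split with the selected parameter
        outer_scores.append(score(fit(X[train], y[train], best_p), X[test], y[test]))
    return float(np.mean(outer_scores))
```

Keeping parameter selection strictly inside the outer-training split is what makes the outer estimate an honest measure of generalization error.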

15 S-MTS: performance on 45 univariate time series datasets (results table not captured in the transcript)


17 S-MTS: performance on 15 MTS datasets (domains: sign language, speech, gesture, motion, handwriting; results table not captured in the transcript)

18 S-MTS: cross-validation results (table not captured in the transcript)

19 S-MTS: comparison based on test error rates (table not captured in the transcript)

20 S-MTS: Conclusions and future work
- A two-step approach giving a new symbolic representation of time series that handles:
  - relations between and within the series;
  - features learned within the algorithm (not pre-specified);
  - nominal and missing values.
- Warping is handled by using the bag-of-words representation.
- Scalable (allows for a parallel implementation): training complexity is that of the random forest classifier, and classification takes less than a millisecond.
- Best current results; potentially better results with document classification approaches (use of N-grams etc.) and the addition of lag variables to the initial feature set.
- The code of S-MTS and the datasets will be provided at http://www.mustafabaydogan.com/multivariate-time-series-discretization-for-classification.html

21 Thanks! Questions and comments?

22 References (entries not captured in the transcript)

