A Time Series Representation Framework Based on Learned Patterns Mustafa Gokce Baydogan● George Runger* Didem Yamak† ● Boğaziçi University * Arizona State University † DeVry University 10/5/2013 8th INFORMS Workshop on Data Mining and Health Informatics (DM-HI 2013)
Outline Time series data mining Motivation Representing time series Measuring similarity Learning a pattern-based representation Pattern (relationship) discovery Learned pattern similarity (LPS) Computational experiments and results Conclusions and future work
Time Series Data Mining Motivations People measure things, and things (with rare exceptions) change over time Time series are everywhere Consider a patient’s medical record test values observations actions and related responses ECG Heartbeat Stock
Time Series Data Mining Motivations Other types of data can be converted to time series. Everything is about the representation. Example: Recognizing words An example word “Alexandria” from the dataset of word profiles for George Washington's manuscripts. A word can be represented by two time series created by moving over and under the word Images from E. Keogh. A quick tour of the datasets for VLDB 2008. In VLDB, 2008.
Challenges Local patterns are important Translations and dilations (warping) The four observed peaks are related to a certain event in the manufacturing process Indication of a problem The time of the peaks may change (two peaks are observed earlier for the blue series) Problem occurred over a shorter time interval
Challenges Time series are usually noisy Multivariate time series (MTS) Relation of patterns within the series and interactions between series may be important High-dimensionality
Motivations Time series representation: to reduce high dimensionality and noise; to capture trends, shapes and patterns, as they provide more information than the exact values of each time series data point Time series similarity: accurate, handles warping, fast
Time series representation * Allows lower bounding for similarity computations
Time series similarity Popular (no parameters), intuitive, fast computation, performs poorly. Very popular (no parameters), handles warping (accurate), hard to beat, may perform poorly (long series with noise). Handles warping (accurate), too many parameters to tune, computationally inefficient.
Learning a pattern-based representation A regression tree-based approach is used to learn a representation Earlier (Geurts, 2001), Your data matrix
A new learning approach Predicting (forecasting) a segment Your data matrix Forecast ∆ (gap) time units forward
Representation Learned patterns Time series is 128 units long Predictor segment 1-60 Response segment 51-111
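The predictor/response idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper name `segment_pairs`, the toy series, and the choice of equal-length segments (length 60, gap 50) are assumptions made for the example.

```python
def segment_pairs(series, seg_len, gap):
    """Pair each predictor segment with the response segment `gap` steps later."""
    pairs = []
    last_start = len(series) - gap - seg_len
    for start in range(last_start + 1):
        predictor = series[start:start + seg_len]
        response = series[start + gap:start + gap + seg_len]
        pairs.append((predictor, response))
    return pairs


# Toy series of length 128 whose values equal the 1-based time index,
# so printed values can be read directly as time positions.
series = list(range(1, 129))
pairs = segment_pairs(series, seg_len=60, gap=50)

predictor, response = pairs[0]
print(predictor[0], predictor[-1])  # first predictor covers times 1..60
print(response[0], response[-1])    # its response starts 50 units later, at time 51
```

With equal-length segments the first response ends at time 110; other segment lengths or gaps only change the two arguments.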
Multiple segments Concatenate for all time series to create 1. Randomly select a response segment (column) of length L 2. Build a regression tree: at each split decision, select a random predictor column (one segment at a time)* Multiple random ∆ levels Build J trees with depth D *Known to work well for regression: P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3-42, 2006.
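The two steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the LPS package: the layout of the segment matrix (one column per segment start, one row per offset within the segment), the variance-based threshold search, and all function names are choices made for this example.

```python
import random


def segment_matrix(series_list, seg_len):
    """Stack all series: one column per segment start, one row per offset (assumed layout)."""
    rows = []
    for x in series_list:
        n_cols = len(x) - seg_len + 1
        for t in range(seg_len):
            rows.append([x[s + t] for s in range(n_cols)])
    return rows


def best_threshold(rows, idxs, col, response):
    """Best split value on `col` for predicting `response` (minimum squared error)."""
    pts = sorted((rows[i][col], rows[i][response]) for i in idxs)
    n, total = len(pts), sum(y for _, y in pts)
    best, best_score, left_sum = None, float("-inf"), 0.0
    for k in range(1, n):
        left_sum += pts[k - 1][1]
        if pts[k][0] == pts[k - 1][0]:
            continue  # cannot split between equal predictor values
        # Maximising this quantity is equivalent to minimising the squared error.
        score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
        if score > best_score:
            best_score, best = score, (pts[k][0] + pts[k - 1][0]) / 2
    return best


def build_tree(rows, idxs, response, depth, rng):
    """Regression tree that tries one random predictor column per split."""
    candidates = [c for c in range(len(rows[0])) if c != response]
    if depth == 0 or len(idxs) < 2 or not candidates:
        return {"leaf": True}
    col = rng.choice(candidates)
    thr = best_threshold(rows, idxs, col, response)
    if thr is None:
        return {"leaf": True}
    left = [i for i in idxs if rows[i][col] <= thr]
    right = [i for i in idxs if rows[i][col] > thr]
    return {"leaf": False, "col": col, "thr": thr,
            "left": build_tree(rows, left, response, depth - 1, rng),
            "right": build_tree(rows, right, response, depth - 1, rng)}


def count_leaves(node):
    if node["leaf"]:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


# Toy data: three short series of equal length (hypothetical values).
series_list = [
    [0, 1, 2, 3, 4, 3, 2, 1],
    [0, 0, 1, 2, 3, 4, 3, 2],
    [4, 3, 2, 1, 0, 1, 2, 3],
]
rng = random.Random(0)
rows = segment_matrix(series_list, seg_len=4)
n_cols = len(rows[0])

forest = []
for _ in range(5):  # J = 5 trees, each with its own random response column
    response = rng.randrange(n_cols)
    forest.append(build_tree(rows, list(range(len(rows))), response, depth=3, rng=rng))
print([count_leaves(tree) for tree in forest])
```

Because the predictor column is redrawn at every split, the gap between predictor and response segments varies within a tree, which is one way to read "multiple random ∆ levels".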
Multiple segments (cont.) Tree #1, Tree #2, Tree #3, ..., Tree #J Aggregate the information over all trees for prediction (i.e. denoising) Each terminal node defines a basis 2. Pattern-based representation (a vector of size RxJ)
Similarity measure Learned Pattern Similarity (LPS) Each time series is represented by a vector of size RxJ*; the measure compares two series' vectors entry by entry (kth entry against kth entry) Penalizes the number of mismatches Series with mismatching observations in the patterns are different Robust to noise Implicitly works on the discrete values Robust to warping Representation learning handles the problem of warping *Assuming each tree has R terminal nodes
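The exact LPS formula is not reproduced in the text here, so the following is only a sketch of one plausible reading: each series is described by how many of its observations fall into each terminal node of each tree, and two series are compared by the average absolute difference of those counts. The absolute-difference form, the normalisation by RxJ, the function names, and the leaf assignments are all assumptions for illustration.

```python
from collections import Counter


def leaf_histogram(leaf_ids_per_tree, n_leaves):
    """Concatenate per-tree terminal-node counts into one vector of length R*J."""
    vec = []
    for leaf_ids in leaf_ids_per_tree:
        counts = Counter(leaf_ids)
        vec.extend(counts.get(r, 0) for r in range(n_leaves))
    return vec


def mismatch_dissimilarity(h_a, h_b):
    """Average absolute count difference (assumed form of the mismatch penalty)."""
    return sum(abs(a - b) for a, b in zip(h_a, h_b)) / len(h_a)


# Hypothetical leaf assignments: J = 2 trees, R = 4 terminal nodes per tree,
# four observations per series.
series_a = [[0, 0, 1, 2], [1, 1, 1, 3]]
series_b = [[0, 1, 1, 2], [1, 1, 3, 3]]

h_a = leaf_histogram(series_a, n_leaves=4)
h_b = leaf_histogram(series_b, n_leaves=4)
print(h_a)  # [2, 1, 1, 0, 0, 3, 0, 1]
print(h_b)  # [1, 2, 1, 0, 0, 2, 0, 2]
print(mismatch_dissimilarity(h_a, h_b))  # 0.5
```

Working on counts rather than raw values is what makes such a measure operate on discretised observations, and the position of an observation in time does not enter the count.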
Similarity measure (cont.) The computations are similar to Euclidean distance Fast Allows for bounding schemes Early abandon Similarity search: find the reference time series that is most similar to the query series Keep a record of the best distance found so far Stop computing the distance for a reference series once the current distance is larger than the best-so-far Known to improve the testing time (query time) significantly
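Early abandoning as described above can be sketched as follows. The sketch assumes the distance is a running sum of non-negative terms (absolute differences here, chosen for illustration); the function name and the toy vectors are hypothetical.

```python
def nearest_with_early_abandon(query, references):
    """Return (index, distance) of the closest reference, abandoning hopeless candidates."""
    best_dist = float("inf")
    best_idx = -1
    for idx, ref in enumerate(references):
        dist = 0
        abandoned = False
        for q, r in zip(query, ref):
            dist += abs(q - r)
            if dist >= best_dist:
                # The partial sum can only grow, so this reference cannot win.
                abandoned = True
                break
        if not abandoned:
            best_dist, best_idx = dist, idx
    return best_idx, best_dist


query = [2, 1, 1, 0]
references = [
    [0, 0, 0, 4],  # full distance 8
    [2, 1, 0, 1],  # full distance 2
    [2, 2, 1, 0],  # full distance 1
]
print(nearest_with_early_abandon(query, references))  # (2, 1)
```

The saving comes from references whose partial sum exceeds the best-so-far early; ordering the references so that good candidates come first makes abandonment more frequent.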
S-MTS Experiments 45 univariate time series datasets from UCR database* Compared to popular NN classifiers with different distance measures: Euclidean, DTW (constrained and unconstrained versions), SpADe, Sparse Spatial Sample Kernels (SSSK) Addition of difference series Taking trend information into consideration A multivariate time series extension If time permits Parameters Cross-validation to set parameters for each dataset Segment length (L): a factor (0.25, 0.5, 0.75) of the time series length Depth of trees (4, 6, 8) Number of trees = 150 (not important if set large enough) *http://www.cs.ucr.edu/~eamonn/time_series_data/
Univariate datasets Health Energy Robotics Astronomy Astronomy Chemistry Gesture recognition
Parameters Illustration over 6 datasets (L=0.5xT)
Average error rates over 10 replications
Multivariate time series While training, randomly select one univariate time series and a target segment Find splits over randomly selected predictor segments of randomly selected univariate time series Complexity does not change More trees with larger depth may be required uWaveGestureLibrary: the accelerometer readings in three dimensions (i.e. x, y and z) The same parameters result in an error rate of 0.022
LPS Conclusions and future work A new approach for time series representation Captures relations between and within the series Features are learned within the algorithm (not pre-specified) Handles nominal and missing values Handles warping by representation learning Scalable (also allows for parallel implementation) Training complexity: O(JNTD), linear in the time series length and the number of training series Training took at most 6 minutes for 45 datasets (single thread, J=150, D=8, N=1800, T=750) SpADe did not return a result after a week of running Similarity search takes less than a millisecond Fast and accurate results with few parameters
LPS Conclusions and future work This approach can be extended to many data mining tasks (for both univariate and multivariate time series and images) such as Denoising (in progress) Forecasting (in progress) Anomaly detection (in progress) Clustering (in progress) Indexing … LPS package is provided on http://www.mustafabaydogan.com/learned-pattern-similarity-lps.html
Questions and Comments? Thanks! LPS package is provided on http://www.mustafabaydogan.com/learned-pattern-similarity-lps.html