Discovering Deformable Motifs in Time Series Data
Jin Chen, CSE 891-001, 2012 Fall

Background
Time series data are collected in many domains, including surveillance, pose tracking, ICU patient monitoring and finance. These series often contain motifs: "segments that repeat within and across different series". Discovering these repeated segments provides primitives that are useful for domain understanding, and higher-level, meaningful features that can be used to segment time series or to discriminate among time series from different groups. We therefore need a tool that explains the entire dataset in terms of repeated, warped motifs interspersed with non-repeating segments.

Motif Detection Approaches
A motif is defined via a pair of windows of the same length that are closely matched in terms of Euclidean distance. Such pairs are identified via a sliding-window approach followed by random projections, which flag highly similar pairs that have not been identified before. However, this method is not geared toward finding motifs that can exhibit significant deformation. (Mueen et al., SDM 2009)
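As a minimal sketch of this pair-based motif definition (my illustration, not the cited algorithm; the quadratic scan below is exactly the cost that random projections are designed to avoid):

```python
import numpy as np

def closest_motif_pair(x, w):
    """Return the two non-overlapping length-w windows of series x with the
    smallest Euclidean distance, i.e. the pair-based motif definition."""
    subs = np.lib.stride_tricks.sliding_window_view(np.asarray(x, float), w)
    best, pair = np.inf, (0, 0)
    for i in range(len(subs)):
        for j in range(i + w, len(subs)):   # skip trivially overlapping windows
            d = np.linalg.norm(subs[i] - subs[j])
            if d < best:
                best, pair = d, (i, j)
    return pair, best
```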

Motif Detection Approaches
Another family of methods discovers regions of high density in the space of all subsequences via clustering. These works define a motif as a vector of means and variances over the length of the window, a representation that also is not geared toward capturing deformable motifs. To address deformation, dynamic time warping (DTW) is used to measure warped distance. However, motifs often exhibit structured transformations, where the warp changes gradually over time. (Minnen et al., AAAI 2007)
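For reference, here is the textbook DTW distance (a standard formulation, not code from the cited paper). Note that DTW allows arbitrary local warps at each step, whereas CSTM will instead model warps whose rate evolves gradually:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]
```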

Motif Detection Approaches
A third line of work focuses on developing probabilistic models for aligning sequences that exhibit variability. However, these methods rely on having a segmentation of the time series into corresponding motifs. Because this assumption places relatively few constraints on the model, the methods are highly under-constrained in certain settings. (Listgarten et al., NIPS 2004)

Continuous Shape Template Model (CSTM) for Deformable Motif Discovery
CSTM is a hidden, segmental Markov model in which each state either generates a motif or samples from a non-repeating random walk. A motif is represented by a smooth continuous function that is subject to non-linear warp and scale transformations. CSTM learns both the motifs and their allowed warps in an unsupervised way from unsegmented time series data. A demonstration on three distinct real-world domains shows that CSTM achieves considerably better performance than previous methods.

CSTM – Generative Model
CSTM assumes that the observed time series is generated by switching between a state that generates non-repeating segments and states that generate repeating (structurally similar) segments. Motifs are generated as samples from a shape template that can undergo non-linear transformations such as shrinkage, amplification or local shifts. The transformations applied at each observed time t are tracked via latent states, the distribution over which is inferred. Simultaneously, the canonical shape template and the likelihood of the possible transformations for each template are learned from the data. The random-walk state generates trajectory data without long-term memory; these segments therefore lack repetitive structural patterns.

Canonical Shape Template (CST)
Each shape template, indexed by k, is represented as a continuous function s_k(l), where l ∈ (0, L_k] and L_k is the length of the k-th template. In many domains motifs appear as smooth functions, so a promising representation is a piecewise Bezier spline: shape templates of varying complexity are naturally represented by using fewer or more pieces. A third-order (cubic) Bezier curve is parameterized by four control points p_i, i ∈ {0, 1, 2, 3}, over two dimensions, where the first dimension is the time t and the second is the signal value.
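A concrete illustration of one such spline piece, using the standard Bernstein form of a cubic Bezier curve (the control points here are made up for the example):

```python
import numpy as np

def cubic_bezier(p, u):
    """Evaluate a cubic Bezier curve at parameters u in [0, 1].
    p: (4, 2) control points; column 0 is time, column 1 is signal value."""
    u = np.asarray(u, float)[:, None]
    return ((1 - u) ** 3 * p[0] + 3 * (1 - u) ** 2 * u * p[1]
            + 3 * (1 - u) * u ** 2 * p[2] + u ** 3 * p[3])

# One piece of a shape template: starts at (0, 0), ends at (10, 1).
piece = np.array([[0.0, 0.0], [3.0, 2.0], [7.0, -1.0], [10.0, 1.0]])
samples = cubic_bezier(piece, np.linspace(0.0, 1.0, 50))  # (50, 2) curve samples
```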

Shape Transformation Model - Warping
Motifs are generated by non-uniform sampling and scaling of s_k. Temporal warp is introduced by moving slowly or quickly through s_k. The allowable temporal warps are specified as an ordered set {w_1, …, w_n} of time increments that determine the rate at which we advance through s_k. A template-specific warp transition matrix M_k^w specifies the probability of transitions between warp states. To generate an observation series y_1, …, y_T, let w_t ∈ {w_1, …, w_n} be the random variable tracking the warp and P_t be the position within the template s_k at time t. Then y_{t+1} is generated from the value s_k(P_{t+1}), where P_{t+1} = P_t + w_{t+1} and w_{t+1} ~ M_k^w(w_t).
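A sketch of this warp mechanism (the increment set and transition matrix below are invented for illustration; in CSTM, M_k^w is learned per template):

```python
import numpy as np

rng = np.random.default_rng(0)
warps = np.array([0.5, 1.0, 2.0])   # hypothetical warp increments w_1..w_n
M_w = np.array([[0.8, 0.2, 0.0],    # hypothetical warp transition matrix M_k^w
                [0.1, 0.8, 0.1],    # (rows: current warp state, cols: next)
                [0.0, 0.2, 0.8]])

def advance(P, w_idx):
    """Sample w_{t+1} ~ M_k^w(w_t), then advance the position:
    P_{t+1} = P_t + w_{t+1}."""
    w_idx = rng.choice(len(warps), p=M_w[w_idx])
    return P + warps[w_idx], w_idx

P, w_idx = 0.0, 1                   # start of the template, unit warp
for _ in range(5):
    P, w_idx = advance(P, w_idx)    # non-uniform march through s_k
```

Because the next warp depends on the current one, the rate of progress through the template changes gradually rather than arbitrarily at each step.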

Shape Transformation Model - Rescale
The set of allowable scaling coefficients is maintained as {c_1, …, c_n}. Let φ_{t+1} ∈ {c_1, …, c_n} be the scale value at time t+1, sampled from the template-specific scale transition matrix M_k^φ, i.e. φ_{t+1} ~ M_k^φ(φ_t). The observation y_{t+1} is then generated around the value φ_{t+1} · s_k(P_{t+1}), a scaled version of the motif's value at P_{t+1}. An additive noise value v_{t+1} ~ N(0, σ) models small shifts. In summary, putting together all three possible deformations: y_{t+1} = v_{t+1} + φ_{t+1} · s_k(P_{t+1}).
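Putting the emission equation into code (again a sketch with invented scale values and transition matrix, and a toy sine template standing in for s_k):

```python
import numpy as np

rng = np.random.default_rng(1)
scales = np.array([0.8, 1.0, 1.25])   # hypothetical scaling coefficients c_1..c_n
M_phi = np.array([[0.90, 0.10, 0.00],  # hypothetical scale transition matrix M_k^phi
                  [0.05, 0.90, 0.05],
                  [0.00, 0.10, 0.90]])

def emit(s_k, P_next, phi_idx, sigma=0.05):
    """One emission step: phi_{t+1} ~ M_k^phi(phi_t), v_{t+1} ~ N(0, sigma),
    y_{t+1} = v_{t+1} + phi_{t+1} * s_k(P_{t+1})."""
    phi_idx = rng.choice(len(scales), p=M_phi[phi_idx])
    v = rng.normal(0.0, sigma)         # additive noise models small shifts
    return v + scales[phi_idx] * s_k(P_next), phi_idx

y, phi_idx = emit(np.sin, P_next=0.3, phi_idx=1)  # toy template s_k = sin
```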

Non-repeating Random Walk (NRW)
The NRW model captures data not generated from the templates. If this data has different noise characteristics, the task becomes simpler, since those characteristics help disambiguate motif-generated segments from NRW segments. The generation of smooth series can be modeled using an autoregressive process.
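For intuition, a first-order autoregressive process produces exactly this kind of smooth but non-repeating background (coefficients chosen arbitrarily for the sketch):

```python
import numpy as np

rng = np.random.default_rng(2)

def nrw_segment(T, a=0.99, sigma=0.1):
    """AR(1) process y_t = a * y_{t-1} + eps_t, eps_t ~ N(0, sigma^2):
    locally smooth, but with no long-term memory or repeating structure."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = a * y[t - 1] + rng.normal(0.0, sigma)
    return y

background = nrw_segment(200)
```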

Template Transitions
Transitions between generating NRW data and motifs from the CSTs are modeled via a transition matrix T of size (K+1) × (K+1), where K is the number of CSTs. The random variable K_t tracks the template for an observed series. Transitions into and out of templates are only allowed at the start and end of the template, respectively. Thus, when the position within a template reaches its end, K_t ~ T(K_{t−1}); otherwise K_t = K_{t−1}. In T, the self-transition parameter of the NRW state is fixed to λ, a pre-specified input. Different settings of λ allow control over the proportion of data assigned to motifs versus NRW.
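A sketch of this switching structure (only λ is fixed as in the slide; the remaining rows of T are uniform placeholders here, whereas CSTM learns them):

```python
import numpy as np

rng = np.random.default_rng(3)
K, lam = 3, 0.95                 # K templates; lam = fixed NRW self-transition

T = np.zeros((K + 1, K + 1))     # state 0 = NRW, states 1..K = templates
T[0, 0] = lam                    # NRW self-transition fixed to lambda
T[0, 1:] = (1 - lam) / K         # remaining mass over templates (assumption)
T[1:, :] = 1.0 / (K + 1)         # template exit rows: uniform placeholder

def next_template(k, at_boundary):
    """K_t ~ T(K_{t-1}) at segment boundaries; otherwise K_t = K_{t-1}."""
    return int(rng.choice(K + 1, p=T[k])) if at_boundary else k
```

Raising λ keeps the chain in the NRW state longer, so less of the series is explained by motifs, which matches the slide's point about controlling the motif/NRW proportion.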

Learning the Model
The canonical shape templates, their template-specific transformation models, the NRW model, the template transition matrix and the latent states for the observed series are all inferred from the data using hard EM. In the E-step, given the model parameters, the Viterbi algorithm infers the most likely latent trace; in the M-step, coordinate ascent updates the model parameters.
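CSTM's E-step runs a segmental Viterbi over compound latent states (template, position, warp, scale). As a minimal illustration of that decoding machinery only, here is the standard Viterbi algorithm for a plain discrete HMM (not the paper's segmental variant):

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most likely hidden-state path for a discrete HMM.
    log_pi: (S,) initial log-probs; log_A: (S, S) transition log-probs;
    log_B: (S, V) emission log-probs; obs: sequence of observation indices."""
    S, T = log_A.shape[0], len(obs)
    delta = np.empty((T, S))               # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)     # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A     # scores[prev, cur]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):          # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```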

Results (figures omitted)

Conclusions
Warp-invariant signatures could be used for a forward lookup within beam pruning to significantly speed up inference when K, the number of templates, is large. A Bayesian nonparametric prior is another approach that could systematically control the number of classes based on model complexity. A different extension could build a hierarchy of motifs, where larger motifs are composed of multiple occurrences of smaller motifs, possibly providing an understanding of the data at different time scales. This work can serve as a basis for building nonparametric priors over deformable multivariate curves.