Hilbert Space Embeddings of Hidden Markov Models
Le Song, Byron Boots, Sajid Siddiqi, Geoff Gordon and Alex Smola
Big Picture Question
– Graphical models: dependent variables, hidden variables
– Kernel methods: high-dimensional, nonlinear, multimodal data
Can we combine the best of graphical models and kernel methods?
Hidden Markov Models (HMMs)
Examples: video sequences, music
Challenges:
– High-dimensional features
– Hidden variables
– Unsupervised learning
Notation
– Hidden states $h_1, h_2, \dots$ and observations $x_1, x_2, \dots$
– Observation model $O$: $P(x_t \mid h_t)$
– Transition model $T$: $P(h_{t+1} \mid h_t)$
– Prior $\pi$: $P(h_1)$
Previous Work on HMMs
Expectation maximization [Dempster et al. 77]:
– Maximum likelihood solution
– Local maxima
– Curse of dimensionality
Singular value decomposition (SVD) for surrogate hidden states:
– No local optima
– Consistent
– Spectral HMMs [Hsu et al. 09, Siddiqi et al. 10], subspace identification [Van Overschee and De Moor 96]
Predictive Distributions of HMMs
Input: an observation sequence $x_1, \dots, x_t$; output: its probability $P(x_1, \dots, x_t)$.
Variable elimination: sum out the hidden states one at a time,
$P(x_1, \dots, x_t) = \sum_{h_{t+1}} \sum_{h_t} P(x_t \mid h_t) P(h_{t+1} \mid h_t) \cdots \sum_{h_1} P(x_1 \mid h_1) P(h_1)$.
Each bracketed factor $\sum_{h_\tau} P(x_\tau \mid h_\tau) P(h_{\tau+1} \mid h_\tau)\,(\cdot)$ is an observable operator [Jaeger 00].
Predictive Distributions of HMMs
Variable elimination (matrix representation):
$P(x_1, \dots, x_t) = \mathbf{1}^\top A_{x_t} \cdots A_{x_1} \pi$, where $A_x = T \,\mathrm{diag}(O_{x,\cdot})$ is the observable operator for symbol $x$ [Jaeger 00].
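A minimal numpy sketch of this operator form, using a hypothetical 2-state, 3-symbol HMM (all numbers invented for illustration); it checks the operator product against brute-force summation over hidden paths:

```python
import numpy as np
from itertools import product

# Toy HMM (hypothetical numbers): 2 hidden states, 3 observation symbols.
pi = np.array([0.6, 0.4])                      # prior P(h_1)
T = np.array([[0.7, 0.2],                      # T[i, j] = P(h_{t+1}=i | h_t=j)
              [0.3, 0.8]])
O = np.array([[0.5, 0.1],                      # O[x, h] = P(x_t=x | h_t=h)
              [0.4, 0.3],
              [0.1, 0.6]])

# Observable operator for symbol x: A_x = T diag(O[x, :])   [Jaeger 00]
A = [T @ np.diag(O[x]) for x in range(O.shape[0])]

def prob_seq(seq):
    """P(x_1, ..., x_t) = 1^T A_{x_t} ... A_{x_1} pi."""
    v = pi.copy()
    for x in seq:                              # apply operators forward in time
        v = A[x] @ v
    return v.sum()

def prob_seq_bruteforce(seq):
    """Sanity check: sum the joint over all hidden state paths."""
    total = 0.0
    for hs in product(range(2), repeat=len(seq)):
        p = pi[hs[0]] * O[seq[0], hs[0]]
        for t in range(1, len(seq)):
            p *= T[hs[t], hs[t - 1]] * O[seq[t], hs[t]]
        total += p
    return total

seq = [0, 2, 1, 1]
print(prob_seq(seq), prob_seq_bruteforce(seq))   # the two numbers should agree
```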
Observable representation of HMM
Key observation: we need not recover $T$, $O$ and $\pi$ themselves. For any invertible $S$,
$P(x_1, \dots, x_t) = (\mathbf{1}^\top S^{-1})(S A_{x_t} S^{-1}) \cdots (S A_{x_1} S^{-1})(S \pi)$,
so it suffices to estimate $O$, $A_x$ and $\pi$ up to an invertible transformation $S$. Choosing $S = U^\top O$, where $U$ contains the singular vectors of the joint probability matrix of observation pairs, makes the transformed quantities estimable directly from data [Hsu et al. 09].
Observable representation for HMMs
Estimate from the observation sequence:
– singletons $C_1$, with $[C_1]_i = P(x_1 = i)$
– pairs $C_{2,1}$, with $[C_{2,1}]_{ij} = P(x_2 = i, x_1 = j)$
– triplets $C_{3,x,1}$, with $[C_{3,x,1}]_{ij} = P(x_3 = i, x_2 = x, x_1 = j)$
Thin SVD of $C_{2,1}$: get the principal left singular vectors $U$.
Observable representation for HMMs
– singletons: $b_1 = U^\top C_1$
– pairs: $b_\infty = (C_{2,1}^\top U)^{+} C_1$
– triplets: $B_x = (U^\top C_{3,x,1})(U^\top C_{2,1})^{+}$
Then $P(x_1, \dots, x_t) = b_\infty^\top B_{x_t} \cdots B_{x_1} b_1$.
Works only for the discrete case.
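Continuing the toy example above, a sketch of the observable representation built from the exact moment matrices $C_1$, $C_{2,1}$, $C_{3,x,1}$ (in practice these would be estimated from sequence counts; here the model is only used to construct the moments and verify the identity):

```python
import numpy as np

# Same toy HMM as in the previous sketch (hypothetical numbers).
pi = np.array([0.6, 0.4])
T = np.array([[0.7, 0.2], [0.3, 0.8]])
O = np.array([[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]])
m = 2                                            # number of hidden states

# Population moments (estimated from raw sequence counts in practice):
C1 = O @ pi                                      # C1[i]       = P(x_1 = i)
C21 = O @ T @ np.diag(pi) @ O.T                  # C21[i, j]   = P(x_2 = i, x_1 = j)
C3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T   # P(x_3 = i, x_2 = x, x_1 = j)
        for x in range(O.shape[0])]

# Thin SVD of C21: principal left singular vectors U.
U = np.linalg.svd(C21)[0][:, :m]

# Observable representation [Hsu et al. 09]:
b1 = U.T @ C1
binf = np.linalg.pinv(C21.T @ U) @ C1
B = [U.T @ C3x1[x] @ np.linalg.pinv(U.T @ C21) for x in range(O.shape[0])]

def prob_seq_spectral(seq):
    """P(x_1, ..., x_t) = binf^T B_{x_t} ... B_{x_1} b1 -- no T, O, pi needed."""
    v = b1.copy()
    for x in seq:
        v = B[x] @ v
    return float(binf @ v)

print(prob_seq_spectral([0, 2, 1, 1]))
```

The printed value should agree with `prob_seq([0, 2, 1, 1])` from the previous sketch, even though only observable moments enter the computation.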
Key Objects in Graphical Models
– Marginal distributions, joint distributions, conditional distributions
– Sum rule, product rule
Idea: use a kernel representation for distributions and do probabilistic inference in feature space.
Embedding distributions
Summary statistics of a distribution are expected features $\mathbb{E}_Y[\phi(Y)]$:
– Mean: $\phi(y) = y$
– Covariance: $\phi(y) = y y^\top$
– Probability $P(y_0)$: $\phi(y) = \delta_{y_0}(y)$
Pick a kernel (i.e. a feature map $\phi$) and you generate a different summary statistic: the mean embedding $\mu_Y = \mathbb{E}_Y[\phi(Y)]$.
Embedding distributions
For certain kernels (e.g. the RBF kernel) the mapping from $P(Y)$ to its embedding $\mu_Y = \mathbb{E}_Y[\phi(Y)]$ is one-to-one.
The sample average $\hat{\mu}_Y = \frac{1}{m} \sum_{i=1}^m \phi(y_i)$ converges to the true mean embedding at rate $O_p(m^{-1/2})$.
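A small sketch of the empirical mean embedding: samples from a hypothetical bimodal distribution are summarized as $\hat{\mu}_Y(\cdot) = \frac{1}{m}\sum_i k(y_i, \cdot)$, evaluated on a grid of query points; the rows stabilize as $m$ grows, consistent with the $O_p(m^{-1/2})$ rate:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, sigma=1.0):
    """RBF kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = (a[:, None, :] - b[None, :, :]) ** 2
    return np.exp(-d2.sum(-1) / (2 * sigma ** 2))

def sample(m):
    """Toy bimodal distribution (exactly the kind of shape a mean/covariance summary misses)."""
    comp = rng.integers(0, 2, m)
    return rng.standard_normal((m, 1)) * 0.3 + np.where(comp, -2.0, 2.0)[:, None]

# Empirical mean embedding evaluated at query points y: mu_hat(y) = (1/m) sum_i k(y_i, y)
ys = np.linspace(-4, 4, 9)[:, None]
for m in (100, 1000, 10000):
    y = sample(m)
    mu_hat = rbf(y, ys).mean(axis=0)
    print(m, np.round(mu_hat, 3))
# Successive rows change less and less: the embedding converges, and for a
# characteristic kernel such as the RBF it characterizes the whole distribution.
```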
Embedding joint distributions
Embed joint distributions using the outer-product feature map: $\mathcal{C}_{XY} = \mathbb{E}_{XY}[\phi(X) \otimes \phi(Y)]$, which is also the (uncentered) covariance operator.
With the delta kernel this recovers the discrete joint probability table.
The empirical estimate converges at rate $O_p(m^{-1/2})$.
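With the delta (one-hot) kernel the covariance operator is literally the joint probability table; a short sketch with an invented discrete joint:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint distribution over X in {0,1,2} and Y in {0,1}.
P_XY = np.array([[0.10, 0.25],
                 [0.30, 0.05],
                 [0.15, 0.15]])                  # P_XY[x, y]

# Draw m samples from the joint.
m = 50000
flat = rng.choice(P_XY.size, size=m, p=P_XY.ravel())
xs, ys = np.unravel_index(flat, P_XY.shape)

# Delta kernel = one-hot feature map; the empirical covariance operator
#   C_XY_hat = (1/m) sum_i phi(x_i) (x) phi(y_i)
# is then just the empirical joint probability table.
phi_x = np.eye(3)[xs]                            # (m, 3) one-hot features of X
phi_y = np.eye(2)[ys]                            # (m, 2) one-hot features of Y
C_XY_hat = phi_x.T @ phi_y / m

print(np.round(C_XY_hat, 3))                     # approaches P_XY at rate O(1/sqrt(m))
```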
Embedding Conditionals
For each value $X = x$ conditioned on, return the summary statistic for $P(Y \mid X = x)$, i.e. the conditional mean embedding $\mu_{Y|x} = \mathbb{E}_{Y|x}[\phi(Y)]$.
Problem: partitioning the data by the value of $X$ breaks down when some $X = x$ are never observed.
Embedding conditionals
Conditional embedding operator: a single operator $\mathcal{C}_{Y|X}$ with $\mu_{Y|x} = \mathcal{C}_{Y|X} \, \phi(x)$ for every $x$, which avoids partitioning the data.
Conditional Embedding Operator
Estimation via covariance operators [Song et al. 09]: $\mathcal{C}_{Y|X} = \mathcal{C}_{YX} \, \mathcal{C}_{XX}^{-1}$.
– Gaussian case: covariance matrices instead of operators, $\Sigma_{YX} \Sigma_{XX}^{-1}$.
– Discrete case (delta kernel): joint over marginal, $P(Y, X) / P(X)$.
The regularized empirical estimate converges at rate $O_p((\lambda m)^{-1/2} + \lambda^{1/2})$.
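A sketch of the empirical conditional embedding operator written in Gram-matrix form, $\beta(x) = (K + \lambda m I)^{-1} k(\cdot, x)$ so that $\hat{\mu}_{Y|x} = \sum_i \beta_i(x)\,\phi(y_i)$; applying it to the identity feature on $Y$ gives a conditional mean estimate. The data, bandwidth and $\lambda$ below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(a, b, sigma=0.5):
    d2 = (a[:, None, :] - b[None, :, :]) ** 2
    return np.exp(-d2.sum(-1) / (2 * sigma ** 2))

# Training pairs (x_i, y_i) from a toy nonlinear conditional P(Y | X).
m = 300
x = rng.uniform(-3, 3, (m, 1))
y = np.sin(x) + 0.1 * rng.standard_normal((m, 1))

# Empirical conditional embedding operator through Gram matrices:
#   mu_{Y|x} = sum_i beta_i(x) phi(y_i),   beta(x) = (K + lam m I)^{-1} k(:, x)
lam = 1e-3
K = rbf(x, x)
inv = np.linalg.solve(K + lam * m * np.eye(m), np.eye(m))   # (K + lam m I)^{-1}

def conditional_mean(x_query):
    """E[Y | X = x] read off the conditional embedding with the identity feature on Y."""
    beta = inv @ rbf(x, x_query)               # (m, q) weights beta_i(x) per query point
    return beta.T @ y                          # weighted combination of the y_i

x_query = np.array([[-2.0], [0.0], [1.5]])
print(np.column_stack([conditional_mean(x_query), np.sin(x_query)]))  # estimate vs. truth
```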
Sum and Product Rules

Rule          | Probabilistic relation             | Hilbert space relation
Sum rule      | $P(X) = \sum_Y P(X \mid Y) P(Y)$   | $\mu_X = \mathcal{C}_{X|Y} \, \mu_Y$   (total expectation)
Product rule  | $P(X, Y) = P(X \mid Y) P(Y)$       | $\mathcal{C}_{XY} = \mathcal{C}_{X|Y} \, \mathcal{C}_{YY}$   (conditional embedding, linearity)
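In the discrete/delta-kernel case the Hilbert-space sum rule is just marginalization written as a matrix-vector product; a short check with an invented joint table:

```python
import numpy as np

# Delta (one-hot) kernel: embeddings are probability vectors and C_{X|Y} is the
# conditional table P(X | Y).
P_XY = np.array([[0.10, 0.25],
                 [0.30, 0.05],
                 [0.15, 0.15]])                  # hypothetical joint, P_XY[x, y]

mu_Y = P_XY.sum(axis=0)                          # embedding of P(Y): the marginal vector
C_X_given_Y = P_XY / mu_Y                        # P(X | Y); each column sums to 1

# Sum rule: mu_X = C_{X|Y} mu_Y   <=>   P(X) = sum_Y P(X|Y) P(Y)
print(C_X_given_Y @ mu_Y)                        # equals the marginal P(X) ...
print(P_XY.sum(axis=1))                          # ... computed directly from the joint
```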
Hilbert Space HMMs
(Figure: the HMM graphical model with each observation mapped into feature space via $\phi$.)
Hilbert space HMMs
In direct analogy with the discrete case, replace the probability tables by embeddings:
– singletons: $\mu_1 = \mathbb{E}[\phi(x_1)]$ and $b_1 = U^\top \mu_1$
– pairs: $\mathcal{C}_{2,1} = \mathbb{E}[\phi(x_2) \otimes \phi(x_1)]$ and $b_\infty = (\mathcal{C}_{2,1}^\top U)^{+} \mu_1$
– triplets: $\mathcal{C}_{3,x,1}$ and $B_x = (U^\top \mathcal{C}_{3,x,1})(U^\top \mathcal{C}_{2,1})^{+}$
where $U$ contains the principal left singular vectors of $\mathcal{C}_{2,1}$.
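A simplified sketch of the whole pipeline. The paper works with Gram matrices in the RKHS; here, purely as an assumption to keep everything finite-dimensional, the RBF feature map is approximated by explicit random Fourier features, so the embeddings become ordinary matrices and the recursion mirrors the discrete spectral HMM. The data, dimensions and bandwidth are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy observation sequence (hypothetical data): a noisy 1-D oscillation.
T_len = 1000
obs = np.sin(0.1 * np.arange(T_len))[:, None] + 0.1 * rng.standard_normal((T_len, 1))

# Random Fourier features approximating an RBF kernel with bandwidth sigma = 0.5.
D, k = 30, 5                                     # feature dimension, latent dimension
W = rng.standard_normal((D, 1)) / 0.5            # frequencies ~ N(0, 1/sigma^2)
u = rng.uniform(0, 2 * np.pi, D)

def phi(x):
    return np.sqrt(2.0 / D) * np.cos(x @ W.T + u)

F = phi(obs)                                     # (T_len, D) features of all observations

# Empirical embeddings of singletons, pairs, triplets of consecutive observations.
mu1 = F.mean(axis=0)
C21 = F[1:].T @ F[:-1] / (T_len - 1)
C321 = np.einsum('ti,tj,tk->ijk', F[2:], F[1:-1], F[:-2]) / (T_len - 2)

# Thin SVD of C21 and observable parameters, mirroring the discrete case.
U = np.linalg.svd(C21)[0][:, :k]
P = np.linalg.pinv(U.T @ C21)
b1 = U.T @ mu1
binf = np.linalg.pinv(C21.T @ U) @ mu1

def B(x):
    """Observable operator for observation x: contract the triplet embedding with phi(x)."""
    return U.T @ np.einsum('ijk,j->ik', C321, phi(x).ravel()) @ P

# Filtering: run the belief forward over the last part of the sequence ...
bt = b1
for t in range(T_len - 50, T_len):
    bt = B(obs[t:t + 1]) @ bt
    bt /= binf @ bt                              # keep binf^T bt = 1

# ... then score candidate next observations; the argmax serves as a point prediction.
candidates = np.linspace(-1.5, 1.5, 121)[:, None]
scores = np.array([binf @ B(c[None, :]) @ bt for c in candidates])
print("predicted next observation ~", candidates[np.argmax(scores), 0])
```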
Experiments
– Video sequence prediction
– Slot car sensor measurement prediction
– Speech classification
Compared with discrete HMMs learned by EM [Dempster et al. 77], spectral HMMs [Siddiqi et al. 10], and a linear dynamical system (LDS) approach [Siddiqi et al. 08].
Predicting Video Sequences
– Inputs: sequences of grey-scale images
– Latent space dimension: 50
Predicting Sensor Time-series
– Inertial unit: 3D acceleration and orientation
– Latent space dimension: 20
Audio Event Classification
– Mel-frequency cepstral coefficient (MFCC) features
– Varying latent space dimension
Summary
– Represent distributions in feature spaces; reason with the Hilbert space sum and product rules
– Extends HMMs nonparametrically to domains with kernels
– Open question: kernelize belief propagation, CRFs and general graphical models with hidden variables?