Other Models for Time Series. The Hidden Markov Model (HMM)


Other Models for Time Series

The Hidden Markov Model (HMM)

A Hidden Markov Model consists of: 1. A sequence of states $\{X_t \mid t \in T\} = \{X_1, X_2, \ldots, X_T\}$, and 2. A sequence of observations $\{Y_t \mid t \in T\} = \{Y_1, Y_2, \ldots, Y_T\}$.

The sequence of states $\{X_1, X_2, \ldots, X_T\}$ forms a Markov chain moving amongst the $M$ states $\{1, 2, \ldots, M\}$. The observation $Y_t$ comes from a distribution that is determined by the current state of the process $X_t$ (or possibly by past observations and past states). The states $\{X_1, X_2, \ldots, X_T\}$ are unobserved (hence hidden).

A Markov Chain. The probability that the Markov chain goes into state $j$ at time $t+1$, given the sequence of states up to time $t$, depends only on the state at time $t$ and not on how it arrived there: $P[X_{t+1} = j \mid X_1, \ldots, X_t] = P[X_{t+1} = j \mid X_t]$.

The behavior of a Markov chain is described by the transition probability matrix $\boldsymbol{A} = (\alpha_{ij})$, where $\alpha_{ij} = P[X_{t+1} = j \mid X_t = i]$, and the initial state probability vector $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_M)$, where $\pi_i = P[X_1 = i]$.
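
A minimal sketch of these two ingredients in code (the two-state numbers and the use of numpy are assumptions for illustration, not taken from the slides):

```python
import numpy as np

# Transition probability matrix A: A[i, j] = P[X_{t+1} = j | X_t = i]
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])
# Initial state probability vector pi: pi[i] = P[X_1 = i]
pi = np.array([0.5, 0.5])

rng = np.random.default_rng(0)

def simulate_chain(A, pi, T):
    """Simulate T steps of the Markov chain: each move depends only on the current state."""
    states = [rng.choice(len(pi), p=pi)]
    for _ in range(T - 1):
        states.append(rng.choice(A.shape[1], p=A[states[-1]]))
    return np.array(states)

print(simulate_chain(A, pi, 10))
```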

Some basic problems: from the observations $\{Y_1, Y_2, \ldots, Y_T\}$, 1. Determine the sequence of states $\{X_1, X_2, \ldots, X_T\}$. 2. Determine (or estimate) the parameters of the stochastic process that is generating the states and the observations.

Examples

Example 1. A person is rolling two sets of dice (one balanced, the other unbalanced). He switches between the two sets of dice according to a Markov transition matrix. The states are the dice; the observations are the numbers rolled each time.

Balanced Dice

Unbalanced Dice
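
A hedged sketch of Example 1, simplified to a single six-sided die per state; the loaded-die probabilities and the switching matrix below are invented for illustration, since the probability tables on the two slides above are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hidden states: 0 = balanced die, 1 = unbalanced die (switching follows a Markov chain)
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])                       # hypothetical transition matrix
pi = np.array([1.0, 0.0])                          # assume we start with the balanced die
emit = np.array([[1/6] * 6,                        # balanced: faces 1..6 equally likely
                 [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]])  # unbalanced: face 1 favoured (invented)

def simulate_hmm(A, pi, emit, T):
    """Return (hidden states, observed rolls) for T throws of the dice HMM."""
    x = rng.choice(2, p=pi)
    states, rolls = [], []
    for _ in range(T):
        states.append(x)
        rolls.append(rng.choice(6, p=emit[x]) + 1)  # faces coded 1..6
        x = rng.choice(2, p=A[x])
    return np.array(states), np.array(rolls)

states, rolls = simulate_hmm(A, pi, emit, 20)
print(rolls)    # what we observe
print(states)   # what is hidden and what we would like to recover
```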

Example 2. The Markov chain has two states. The observations, given the states, are independent Normal; both the mean and the variance depend on the state.

Example 3 – Dow Jones

Daily Changes in the Dow Jones

Hidden Markov Model??

Bear and Bull Market?

Speech Recognition. When a word is spoken, the vocalization process goes through a sequence of states. The sound produced is relatively constant while the process remains in the same state. Recognizing the sequence of states and the duration of each state allows one to recognize the word being spoken.

The interval of time when the word is spoken is broken into small (possibly overlapping) subintervals. In each subinterval one measures the amplitudes of various frequencies in the sound (using Fourier analysis). The vector of amplitudes $Y_t$ is assumed to have a multivariate normal distribution in each state, with the mean vector and covariance matrix being state dependent.

Hidden Markov Models for Biological Sequences. Consider the motif: [AT][CG][AC][ACGT]*A[TG][GC]. Some realizations:
ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC

Hidden Markov model of the same motif: [AT][CG][AC][ACGT]*A[TG][GC]. Emission probabilities of the states:
state 1: A .8, T .2
state 2: C .8, G .2
state 3: A .8, C .2
state 4: A 1.0
state 5: G .2, T .8
state 6: C .8, G .2
insert state ([ACGT]*): A .2, C .4, G .2, T .2

Profile HMMs [diagram: model states connected in sequence from a Begin state to an End state]

Computing Likelihood. Let $\alpha_{ij} = P[X_{t+1} = j \mid X_t = i]$ and $\boldsymbol{A} = (\alpha_{ij})$ = the $M \times M$ transition matrix. Let $\pi_i = P[X_1 = i]$ and $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_M)$ = the initial distribution over the states.

Now assume that $P[Y_t = y_t \mid X_1 = i_1, X_2 = i_2, \ldots, X_t = i_t] = P[Y_t = y_t \mid X_t = i_t] = p(y_t \mid \theta_{i_t})$, where $\theta_i$ denotes the parameters of the observation distribution in state $i$. Then $P[X_1 = i_1, X_2 = i_2, \ldots, X_T = i_T, Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T] = P[\boldsymbol{X} = \boldsymbol{i}, \boldsymbol{Y} = \boldsymbol{y}] = \pi_{i_1}\, p(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} \alpha_{i_{t-1} i_t}\, p(y_t \mid \theta_{i_t})$.

Therefore $P[Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T] = P[\boldsymbol{Y} = \boldsymbol{y}] = \sum_{i_1=1}^{M} \cdots \sum_{i_T=1}^{M} \pi_{i_1}\, p(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} \alpha_{i_{t-1} i_t}\, p(y_t \mid \theta_{i_t})$.
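
To make the sum over all $M^T$ state sequences concrete, a brute-force sketch (feasible only for very short sequences; the model numbers are the hypothetical dice example introduced above):

```python
import itertools
import numpy as np

A = np.array([[0.95, 0.05], [0.10, 0.90]])         # hypothetical transition matrix
pi = np.array([0.5, 0.5])
emit = np.array([[1/6] * 6,
                 [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]])  # hypothetical emission probabilities

def brute_force_likelihood(y, A, pi, emit):
    """P[Y = y] by summing the joint probability over every state sequence (M**T terms)."""
    M, T = A.shape[0], len(y)
    total = 0.0
    for path in itertools.product(range(M), repeat=T):
        p = pi[path[0]] * emit[path[0], y[0]]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * emit[path[t], y[t]]
        total += p
    return total

y = [0, 5, 5, 0, 1]   # observed faces, coded 0..5
print(brute_force_likelihood(y, A, pi, emit))
```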

In the case when $Y_1, Y_2, \ldots, Y_T$ are continuous random variables or continuous random vectors, let $f(y \mid \theta_i)$ denote the conditional density of $Y_t$ given $X_t = i$. Then the joint density of $Y_1, Y_2, \ldots, Y_T$ is given by $f(y_1, y_2, \ldots, y_T) = f(\boldsymbol{y}) = \sum_{i_1=1}^{M} \cdots \sum_{i_T=1}^{M} \pi_{i_1}\, f(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} \alpha_{i_{t-1} i_t}\, f(y_t \mid \theta_{i_t})$.

Efficient Methods for Computing Likelihood: The Forward Method. Consider the forward probabilities $F_t(i) = P[Y_1 = y_1, \ldots, Y_t = y_t, X_t = i]$. They satisfy $F_1(i) = \pi_i\, p(y_1 \mid \theta_i)$ and $F_{t+1}(j) = \Big[\sum_{i=1}^{M} F_t(i)\, \alpha_{ij}\Big]\, p(y_{t+1} \mid \theta_j)$, so that $P[\boldsymbol{Y} = \boldsymbol{y}] = \sum_{i=1}^{M} F_T(i)$, at a cost of order $M^2 T$ rather than $M^T$.
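
A sketch of this recursion for discrete emissions (scaling or log-space arithmetic, needed in practice for long sequences, is omitted for clarity; the model is the hypothetical dice example):

```python
import numpy as np

def forward(y, A, pi, emit):
    """Forward probabilities F[t, i] (0-based t) and the likelihood P[Y = y]."""
    T, M = len(y), A.shape[0]
    F = np.zeros((T, M))
    F[0] = pi * emit[:, y[0]]                   # F_1(i) = pi_i * p(y_1 | theta_i)
    for t in range(1, T):
        F[t] = (F[t - 1] @ A) * emit[:, y[t]]   # F_{t+1}(j) = [sum_i F_t(i) alpha_ij] p(y_{t+1} | theta_j)
    return F, F[-1].sum()                       # P[Y = y] = sum_i F_T(i)

A = np.array([[0.95, 0.05], [0.10, 0.90]])
pi = np.array([0.5, 0.5])
emit = np.array([[1/6] * 6, [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]])
F, likelihood = forward([0, 5, 5, 0, 1], A, pi, emit)
print(likelihood)   # matches the brute-force sum above
```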

The Backward Procedure. Consider the backward probabilities $B_t(i) = P[Y_{t+1} = y_{t+1}, \ldots, Y_T = y_T \mid X_t = i]$, with $B_T(i) = 1$ and $B_t(i) = \sum_{j=1}^{M} \alpha_{ij}\, p(y_{t+1} \mid \theta_j)\, B_{t+1}(j)$ for $t = T-1, \ldots, 1$. Then $P[\boldsymbol{Y} = \boldsymbol{y}] = \sum_{i=1}^{M} \pi_i\, p(y_1 \mid \theta_i)\, B_1(i)$.
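
A companion sketch of the backward recursion, again for the hypothetical dice model; it recovers the same likelihood as the forward method:

```python
import numpy as np

def backward(y, A, pi, emit):
    """Backward probabilities B[t, i] (0-based t)."""
    T, M = len(y), A.shape[0]
    B = np.zeros((T, M))
    B[-1] = 1.0                                    # B_T(i) = 1
    for t in range(T - 2, -1, -1):
        B[t] = A @ (emit[:, y[t + 1]] * B[t + 1])  # B_t(i) = sum_j alpha_ij p(y_{t+1}|theta_j) B_{t+1}(j)
    return B

A = np.array([[0.95, 0.05], [0.10, 0.90]])
pi = np.array([0.5, 0.5])
emit = np.array([[1/6] * 6, [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]])
y = [0, 5, 5, 0, 1]
B = backward(y, A, pi, emit)
print((pi * emit[:, y[0]] * B[0]).sum())           # P[Y = y], same value as the forward method
```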

Prediction of states from the observations and the model: $\gamma_t(i) = P[X_t = i \mid \boldsymbol{Y} = \boldsymbol{y}] = \dfrac{F_t(i)\, B_t(i)}{\sum_{j=1}^{M} F_t(j)\, B_t(j)}$.

The Viterbi Algorithm (Viterbi Paths). Suppose that we know the parameters of the Hidden Markov Model, and suppose in addition that we have observed the sequence of observations $Y_1, Y_2, \ldots, Y_T$. Now consider determining the sequence of states $X_1, X_2, \ldots, X_T$.

Recall that $P[X_1 = i_1, \ldots, X_T = i_T, Y_1 = y_1, \ldots, Y_T = y_T] = P[\boldsymbol{X} = \boldsymbol{i}, \boldsymbol{Y} = \boldsymbol{y}] = \pi_{i_1}\, p(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} \alpha_{i_{t-1} i_t}\, p(y_t \mid \theta_{i_t})$. Consider the problem of determining the sequence of states $i_1, i_2, \ldots, i_T$ that maximizes the above probability. This is equivalent to maximizing $P[\boldsymbol{X} = \boldsymbol{i} \mid \boldsymbol{Y} = \boldsymbol{y}] = P[\boldsymbol{X} = \boldsymbol{i}, \boldsymbol{Y} = \boldsymbol{y}] / P[\boldsymbol{Y} = \boldsymbol{y}]$, since the denominator does not depend on the state sequence.

The Viterbi Algorithm. We want to maximize $P[\boldsymbol{X} = \boldsymbol{i}, \boldsymbol{Y} = \boldsymbol{y}] = \pi_{i_1}\, p(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} \alpha_{i_{t-1} i_t}\, p(y_t \mid \theta_{i_t})$. Equivalently, we want to minimize $U(i_1, i_2, \ldots, i_T)$, where $\ln P[\boldsymbol{X} = \boldsymbol{i}, \boldsymbol{Y} = \boldsymbol{y}] = \ln \pi_{i_1} + \ln p(y_1 \mid \theta_{i_1}) + \sum_{t=2}^{T} \big[\ln \alpha_{i_{t-1} i_t} + \ln p(y_t \mid \theta_{i_t})\big] = -U(i_1, i_2, \ldots, i_T)$.

Minimization of $U(i_1, i_2, \ldots, i_T)$ can be achieved by dynamic programming. This can be thought of as finding the shortest path through the following grid of points, starting at the unique point in stage 0 and moving from a point in stage $t$ to a point in stage $t+1$ in an optimal way. The distance from the point in stage 0 to state $i_1$ in stage 1 is $-\big[\ln \pi_{i_1} + \ln p(y_1 \mid \theta_{i_1})\big]$, and the distance between state $i_t$ in stage $t$ and state $i_{t+1}$ in stage $t+1$ is equal to $-\big[\ln \alpha_{i_t i_{t+1}} + \ln p(y_{t+1} \mid \theta_{i_{t+1}})\big]$.

Dynamic Programming [diagram: grid of points arranged in columns labelled Stage 0, Stage 1, Stage 2, …, Stage T-1, Stage T]


Let $V_1(i_1) = -\big[\ln \pi_{i_1} + \ln p(y_1 \mid \theta_{i_1})\big]$, $i_1 = 1, 2, \ldots, M$. Then $V_{t+1}(i_{t+1}) = \min_{i_t} \big\{ V_t(i_t) - \ln \alpha_{i_t i_{t+1}} - \ln p(y_{t+1} \mid \theta_{i_{t+1}}) \big\}$, $i_{t+1} = 1, 2, \ldots, M$; $t = 1, \ldots, T-2$.

Finally, $\min_{i_1, \ldots, i_T} U(i_1, \ldots, i_T) = \min_{i_T} \min_{i_{T-1}} \big\{ V_{T-1}(i_{T-1}) - \ln \alpha_{i_{T-1} i_T} - \ln p(y_T \mid \theta_{i_T}) \big\}$, and the minimizing (Viterbi) path $(\hat{i}_1, \ldots, \hat{i}_T)$ is recovered by tracing back the minimizing choices at each stage.

Summary of calculations of the Viterbi path: 1. $V_1(i_1) = -\big[\ln \pi_{i_1} + \ln p(y_1 \mid \theta_{i_1})\big]$, $i_1 = 1, 2, \ldots, M$. 2. $V_{t+1}(i_{t+1}) = \min_{i_t} \big\{ V_t(i_t) - \ln \alpha_{i_t i_{t+1}} - \ln p(y_{t+1} \mid \theta_{i_{t+1}}) \big\}$, $i_{t+1} = 1, 2, \ldots, M$; $t = 1, \ldots, T-2$. 3. Minimize over the final transition and state, and trace back the minimizing choices to obtain the Viterbi path $(\hat{i}_1, \ldots, \hat{i}_T)$.
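
A sketch of the same dynamic program in code, working with minus-log probabilities and backtracking to recover the path (hypothetical dice model again):

```python
import numpy as np

def viterbi(y, A, pi, emit):
    """Most probable state path: minimize U = -log P[X = i, Y = y] by dynamic programming."""
    T, M = len(y), A.shape[0]
    U = np.zeros((T, M))                  # U[t, i]: minimal cost of a path ending in state i at time t
    back = np.zeros((T, M), dtype=int)
    U[0] = -np.log(pi) - np.log(emit[:, y[0]])
    for t in range(1, T):
        cost = U[t - 1][:, None] - np.log(A)           # cost[i, j]: come from state i, move to state j
        back[t] = cost.argmin(axis=0)
        U[t] = cost.min(axis=0) - np.log(emit[:, y[t]])
    path = [int(U[-1].argmin())]
    for t in range(T - 1, 0, -1):                      # trace the minimizing choices backwards
        path.append(int(back[t, path[-1]]))
    return path[::-1]

A = np.array([[0.95, 0.05], [0.10, 0.90]])
pi = np.array([0.5, 0.5])
emit = np.array([[1/6] * 6, [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]])
print(viterbi([0, 0, 0, 0, 0, 3, 2], A, pi, emit))
```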

Summary of prediction of states from the observations and the model: the individually most likely state at time $t$ is $\hat{X}_t = \arg\max_i \gamma_t(i) = \arg\max_i F_t(i)\, B_t(i)$, while the most likely sequence of states as a whole is the Viterbi path $(\hat{i}_1, \ldots, \hat{i}_T)$.

Estimation of Parameters of a Hidden Markov Model. If both the sequence of observations $Y_1, Y_2, \ldots, Y_T$ and the sequence of states $X_1, X_2, \ldots, X_T$ are observed, $Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T$, $X_1 = i_1, X_2 = i_2, \ldots, X_T = i_T$, then the Likelihood is given by $L = \pi_{i_1}\, p(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} \alpha_{i_{t-1} i_t}\, p(y_t \mid \theta_{i_t})$.

The log-Likelihood is given by $l = \ln \pi_{i_1} + \ln p(y_1 \mid \theta_{i_1}) + \sum_{t=2}^{T} \big[\ln \alpha_{i_{t-1} i_t} + \ln p(y_t \mid \theta_{i_t})\big]$.

In this case the Maximum Likelihood estimates are: $\hat{\pi}_i$ = the proportion of observed sequences that start in state $i$, $\hat{\alpha}_{ij} = n_{ij} \big/ \sum_k n_{ik}$, where $n_{ij}$ is the number of observed transitions from state $i$ to state $j$, and $\hat{\theta}_i$ = the MLE of $\theta_i$ computed from the observations $y_t$ where $X_t = i$.
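
When the states are observed, these estimates reduce to simple counting; a sketch assuming discrete emissions and a single short made-up sequence:

```python
import numpy as np

def mle_observed(states, y, M, K):
    """ML estimates of the transition matrix and the discrete emission probabilities
    when both the state sequence and the observations are seen."""
    A_hat = np.zeros((M, M))
    emit_hat = np.zeros((M, K))
    for t in range(len(states) - 1):
        A_hat[states[t], states[t + 1]] += 1        # n_ij: transitions i -> j
    for x, obs in zip(states, y):
        emit_hat[x, obs] += 1                       # count of symbol obs while in state x
    A_hat /= A_hat.sum(axis=1, keepdims=True)
    emit_hat /= emit_hat.sum(axis=1, keepdims=True)
    return A_hat, emit_hat

states = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]             # made-up labelled data
y      = [2, 3, 0, 0, 1, 4, 5, 0, 0, 3]
A_hat, emit_hat = mle_observed(states, y, M=2, K=6)
print(A_hat)
print(emit_hat)
```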

MLE (states unknown). If only the sequence of observations $Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T$ is observed, then the Likelihood is given by $L = \sum_{i_1=1}^{M} \cdots \sum_{i_T=1}^{M} \pi_{i_1}\, p(y_1 \mid \theta_{i_1}) \prod_{t=2}^{T} \alpha_{i_{t-1} i_t}\, p(y_t \mid \theta_{i_t})$.

It is difficult to find the Maximum Likelihood Estimates directly from this Likelihood function. The techniques that are used are: 1. the Segmental K-means Algorithm, and 2. the Baum-Welch (E-M) Algorithm.

The Segmental K-means Algorithm. In this method the parameters are adjusted to maximize $P[\boldsymbol{X} = \hat{\boldsymbol{i}}, \boldsymbol{Y} = \boldsymbol{y}]$, where $\hat{\boldsymbol{i}} = (\hat{i}_1, \hat{i}_2, \ldots, \hat{i}_T)$ is the Viterbi path.

Consider this with the special case where the observations $\{Y_1, Y_2, \ldots, Y_T\}$ are continuous multivariate Normal with mean vector $\mu_i$ and covariance matrix $\Sigma_i$ when $X_t = i$, i.e. $f(y \mid \theta_i) = \dfrac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\big\{ -\tfrac{1}{2} (y - \mu_i)' \Sigma_i^{-1} (y - \mu_i) \big\}$.

1. Pick arbitrarily $M$ centroids $a_1, a_2, \ldots, a_M$. Assign each of the $T$ observations $y_t$ ($kT$ of them if $k$ multiple realizations are observed) to a state $i_t$ by determining the nearest centroid: $i_t = \arg\min_i \lVert y_t - a_i \rVert^2$. 2. Then estimate $\hat{\pi}_i$ = the proportion of sequences assigned to start in state $i$, and $\hat{\alpha}_{ij} = n_{ij} \big/ \sum_k n_{ik}$, where $n_{ij}$ is the number of assigned transitions from state $i$ to state $j$.

3. And estimate the emission parameters from the observations assigned to each state: $\hat{\mu}_i$ = the mean of the $y_t$ with $i_t = i$, and $\hat{\Sigma}_i$ = their sample covariance matrix. 4. Calculate the Viterbi path $(i_1, i_2, \ldots, i_T)$ based on the parameters of steps 2 and 3. 5. If there is a change in the sequence $(i_1, i_2, \ldots, i_T)$, repeat steps 2 to 4.
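
A compressed sketch of steps 1–3 for one-dimensional Gaussian emissions; the data, the centroid choice, and the number of states are invented, and steps 4–5 would recompute the Viterbi path under these parameters and iterate until the path stops changing:

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(4.0, 1.5, 50)])  # toy observations
M = 2

# Step 1: pick M centroids arbitrarily and assign each y_t to the nearest one
centroids = np.array([y.min(), y.max()])
states = np.argmin(np.abs(y[:, None] - centroids[None, :]), axis=1)

# Step 2: estimate the initial distribution and the transition matrix from the assignment
pi_hat = np.bincount(states[:1], minlength=M).astype(float)   # one realization: indicator of its first state
A_hat = np.zeros((M, M))
for t in range(len(states) - 1):
    A_hat[states[t], states[t + 1]] += 1
A_hat /= A_hat.sum(axis=1, keepdims=True)

# Step 3: estimate the state means and variances from the assigned observations
mu_hat = np.array([y[states == i].mean() for i in range(M)])
var_hat = np.array([y[states == i].var() for i in range(M)])

print(pi_hat, mu_hat, var_hat)
print(A_hat)
```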

The Baum-Welch (E-M) Algorithm. The E-M algorithm was originally designed to handle “missing observations”. In this case the missing observations are the states $\{X_1, X_2, \ldots, X_T\}$. Assuming a model, the states are estimated by finding their expected values under this model (the E part of the E-M algorithm).

With these values the model is estimated by Maximum Likelihood Estimation (the M part of the E-M algorithm). The process is repeated until the estimated model converges.

The E-M Algorithm. Let $f(\boldsymbol{y}, \boldsymbol{x} \mid \boldsymbol{\lambda})$ denote the joint distribution of $\boldsymbol{Y}, \boldsymbol{X}$, where $\boldsymbol{\lambda}$ collects all the parameters. Consider the function $Q(\boldsymbol{\lambda}, \boldsymbol{\lambda}') = E\big[\ln f(\boldsymbol{y}, \boldsymbol{X} \mid \boldsymbol{\lambda}) \mid \boldsymbol{Y} = \boldsymbol{y}, \boldsymbol{\lambda}'\big]$. Starting with an initial estimate $\boldsymbol{\lambda}^{(0)}$, a sequence of estimates $\boldsymbol{\lambda}^{(m)}$ is formed by finding $\boldsymbol{\lambda}^{(m+1)}$ to maximize $Q(\boldsymbol{\lambda}, \boldsymbol{\lambda}^{(m)})$ with respect to $\boldsymbol{\lambda}$.

The sequence of estimates converges to a local maximum of the likelihood.

In the case of an HMM the complete-data log-Likelihood is given by $\ln f(\boldsymbol{y}, \boldsymbol{x} \mid \boldsymbol{\lambda}) = \ln \pi_{i_1} + \sum_{t=2}^{T} \ln \alpha_{i_{t-1} i_t} + \sum_{t=1}^{T} \ln p(y_t \mid \theta_{i_t})$.

Recall $\gamma_t(i) = P[X_t = i \mid \boldsymbol{Y} = \boldsymbol{y}] = \dfrac{F_t(i)\, B_t(i)}{\sum_{j=1}^{M} F_t(j)\, B_t(j)}$, and $\sum_{t=1}^{T-1} \gamma_t(i)$ = the expected no. of transitions from state $i$.

Let $\xi_t(i, j) = P[X_t = i, X_{t+1} = j \mid \boldsymbol{Y} = \boldsymbol{y}] = \dfrac{F_t(i)\, \alpha_{ij}\, p(y_{t+1} \mid \theta_j)\, B_{t+1}(j)}{\sum_{k} \sum_{l} F_t(k)\, \alpha_{kl}\, p(y_{t+1} \mid \theta_l)\, B_{t+1}(l)}$. Then $\sum_{t=1}^{T-1} \xi_t(i, j)$ = the expected no. of transitions from state $i$ to state $j$.

The E-M Re-estimation Formulae. Case 1: The observations $\{Y_1, Y_2, \ldots, Y_T\}$ are discrete with $K$ possible values and $p(y \mid \theta_i) = P[Y_t = y \mid X_t = i] = p_i(y)$, $y = 1, \ldots, K$. The re-estimates are $\hat{\pi}_i = \gamma_1(i)$, $\hat{\alpha}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$, and $\hat{p}_i(k) = \dfrac{\sum_{t:\, y_t = k} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}$.
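
A self-contained sketch of one re-estimation pass for Case 1 (one short made-up observation sequence and the hypothetical dice model; in practice the pass is repeated until the likelihood stops increasing, and scaling is used for long sequences):

```python
import numpy as np

def forward_backward(y, A, pi, emit):
    """Unscaled forward (F) and backward (B) probabilities."""
    T, M = len(y), A.shape[0]
    F, B = np.zeros((T, M)), np.zeros((T, M))
    F[0] = pi * emit[:, y[0]]
    for t in range(1, T):
        F[t] = (F[t - 1] @ A) * emit[:, y[t]]
    B[-1] = 1.0
    for t in range(T - 2, -1, -1):
        B[t] = A @ (emit[:, y[t + 1]] * B[t + 1])
    return F, B

def baum_welch_step(y, A, pi, emit):
    """One E-M re-estimation pass for discrete emissions."""
    y = np.asarray(y)
    T, M = len(y), A.shape[0]
    K = emit.shape[1]
    F, B = forward_backward(y, A, pi, emit)
    like = F[-1].sum()
    gamma = F * B / like                                  # gamma_t(i) = P[X_t = i | Y]
    xi = np.zeros((T - 1, M, M))                          # xi_t(i, j) = P[X_t = i, X_{t+1} = j | Y]
    for t in range(T - 1):
        xi[t] = F[t][:, None] * A * (emit[:, y[t + 1]] * B[t + 1])[None, :] / like
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    emit_new = np.array([gamma[y == k].sum(axis=0) for k in range(K)]).T
    emit_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, emit_new

A = np.array([[0.95, 0.05], [0.10, 0.90]])
pi = np.array([0.5, 0.5])
emit = np.array([[1/6] * 6, [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]])
y = [0, 0, 5, 3, 0, 0, 0, 2, 1, 0]
print(baum_welch_step(y, A, pi, emit))
```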

Case 2: The observations $\{Y_1, Y_2, \ldots, Y_T\}$ are continuous multivariate Normal with mean vector $\mu_i$ and covariance matrix $\Sigma_i$ when $X_t = i$, i.e. $f(y \mid \theta_i) = \dfrac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\big\{ -\tfrac{1}{2} (y - \mu_i)' \Sigma_i^{-1} (y - \mu_i) \big\}$. The re-estimates of $\hat{\pi}_i$ and $\hat{\alpha}_{ij}$ are as in Case 1, with $\hat{\mu}_i = \dfrac{\sum_{t=1}^{T} \gamma_t(i)\, y_t}{\sum_{t=1}^{T} \gamma_t(i)}$ and $\hat{\Sigma}_i = \dfrac{\sum_{t=1}^{T} \gamma_t(i)\, (y_t - \hat{\mu}_i)(y_t - \hat{\mu}_i)'}{\sum_{t=1}^{T} \gamma_t(i)}$.

Measuring distance between two HMMs. Let $\boldsymbol{\lambda}_1$ and $\boldsymbol{\lambda}_2$ denote the parameters of two different HMM models. We now consider defining a distance between these two models.

The Kullback-Leibler distance. Consider the two discrete distributions $p(x)$ and $q(x)$ ($f(x)$ and $g(x)$ in the continuous case); then define $I(p, q) = \sum_x p(x) \ln \dfrac{p(x)}{q(x)}$,

and in the continuous case: $I(f, g) = \int f(x) \ln \dfrac{f(x)}{g(x)}\, dx$.

These measures of distance between the two distributions are not symmetric, but they can be made symmetric by the following: $J(p, q) = \tfrac{1}{2}\big[I(p, q) + I(q, p)\big]$.
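
A small sketch of these quantities for discrete distributions (the example distributions are invented):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler distance I(p, q) = sum_x p(x) log(p(x) / q(x))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def symmetric_kl(p, q):
    """Symmetrized version J(p, q) = (I(p, q) + I(q, p)) / 2."""
    return 0.5 * (kl(p, q) + kl(q, p))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl(p, q), kl(q, p), symmetric_kl(p, q))   # note that I(p, q) != I(q, p)
```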

In the case of a Hidden Markov Model, $I(\boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2) = \sum_{\boldsymbol{y}} P[\boldsymbol{Y} = \boldsymbol{y} \mid \boldsymbol{\lambda}_1] \ln \dfrac{P[\boldsymbol{Y} = \boldsymbol{y} \mid \boldsymbol{\lambda}_1]}{P[\boldsymbol{Y} = \boldsymbol{y} \mid \boldsymbol{\lambda}_2]}$, where the sum is over all possible observation sequences. The computation of $I(\boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2)$ in this case is formidable.

Juang and Rabiner distance. Let $\boldsymbol{y}^{(2)} = (y_1, y_2, \ldots, y_T)$ denote a sequence of observations generated from the HMM with parameters $\boldsymbol{\lambda}_2$. Let $\hat{\boldsymbol{i}}^{(m)}$ denote the optimal (Viterbi) sequence of states for $\boldsymbol{y}^{(2)}$ assuming HMM model $\boldsymbol{\lambda}_m$, $m = 1, 2$.

Then define $D(\boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2) = \dfrac{1}{T} \Big[ \ln P\big[\boldsymbol{Y} = \boldsymbol{y}^{(2)}, \boldsymbol{X} = \hat{\boldsymbol{i}}^{(1)} \mid \boldsymbol{\lambda}_1\big] - \ln P\big[\boldsymbol{Y} = \boldsymbol{y}^{(2)}, \boldsymbol{X} = \hat{\boldsymbol{i}}^{(2)} \mid \boldsymbol{\lambda}_2\big] \Big]$ and the symmetrized distance $D_s(\boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2) = \tfrac{1}{2}\big[D(\boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2) + D(\boldsymbol{\lambda}_2, \boldsymbol{\lambda}_1)\big]$.