1
Pattern Recognition and Machine Learning-Chapter 13: Sequential Data
Affiliation: Kyoto University
Name: Kevin Chien, Dr. Oba Shigeyuki, Dr. Ishii Shin
Date: Dec. 9, 2011
2
Idea: Origin of Markov Models
3
Why Markov Models
IID data is not always realistic: future data (prediction) often depends on some recent data. We model this with DAGs, in which inference is done by the sum-product algorithm.
State Space (Markov) Model: latent variables
  Discrete latent variables: Hidden Markov Model
  Gaussian latent variables: Linear Dynamical Systems
Order of a Markov chain: how far back the data dependence reaches
  1st order: the current observation depends only on the previous observation
4
State Space Model
The latent variables z_n form a Markov chain, and each z_n generates its own observation x_n. As the order of a Markov chain over the observations grows, the number of parameters grows; the state space model keeps this organized by placing the Markov structure on the latent variables. In this model z_{n-1} and z_{n+1} are independent given z_n (d-separated).
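For reference, the joint distribution implied by this graph factorizes as (Bishop, Ch. 13):
p(x_1, ..., x_N, z_1, ..., z_N) = p(z_1) [ ∏_{n=2}^{N} p(z_n | z_{n-1}) ] ∏_{n=1}^{N} p(x_n | z_n)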
5
Terminologies: For understanding Markov Models
6
Terminologies
Markovian property: in a stochastic process, the probability of a transition depends only on the present state, not on the manner in which the current state was reached. A transition diagram shows the same variable moving between different states.
7
Terminologies (cont.)
Big-Θ notation: f is bounded both above and below by g asymptotically [Big_O_notation, Wikipedia, Dec. 2011].
(Review) z_{n+1} and z_{n-1} are d-separated given z_n: once we condition on z_n, the path between z_{n+1} and z_{n-1} through z_n is blocked, so the two are independent given z_n.
8
Markov Models: Formula and motivation
9
Hidden Markov Models (HMM)
z_n is a discrete multinomial variable.
Transition probability matrix A: each row sums to 1, and the probability of staying in the present state (the diagonal) is non-zero. Because each row sums to 1, only the off-diagonal entries are free, giving K(K-1) parameters.
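To make the parameter counting concrete, here is a minimal NumPy sketch; the matrix values and the name A are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3
raw = rng.random((K, K))
A = raw / raw.sum(axis=1, keepdims=True)   # each row of A sums to 1
assert np.allclose(A.sum(axis=1), 1.0)
assert np.all(np.diag(A) > 0)              # staying in the present state has non-zero probability

free_parameters = K * (K - 1)              # K^2 entries minus K row-sum constraints
```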
10
Hidden Markov Models (cont.)
Emission probability p(x_n | z_n, φ), with parameters φ governing the distribution of the observations. Homogeneous model: all latent-variable transitions share the same parameter matrix A. Sampling data from the model is straightforward: follow the latent-state transitions and, at each step, emit an observation from the emission probability, noting the sampled values as you go.
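A minimal sketch of this sampling procedure, assuming a discrete-observation HMM with hypothetical parameters pi (initial distribution), A (transition matrix), and B (emission matrix); none of these numbers come from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hmm(pi, A, B, n):
    """Ancestral sampling: follow latent transitions, emit one observation per step."""
    K, M = B.shape
    z = np.empty(n, dtype=int)
    x = np.empty(n, dtype=int)
    z[0] = rng.choice(K, p=pi)               # draw z_1 from p(z_1)
    x[0] = rng.choice(M, p=B[z[0]])          # emit x_1 from p(x_1 | z_1)
    for t in range(1, n):
        z[t] = rng.choice(K, p=A[z[t - 1]])  # transition p(z_t | z_{t-1})
        x[t] = rng.choice(M, p=B[z[t]])      # emission p(x_t | z_t)
    return z, x

pi = np.array([0.6, 0.4])                    # illustrative parameter values
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])
z, x = sample_hmm(pi, A, B, 10)
```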
11
HMM: Expectation Maximization (EM) for maximum likelihood
Likelihood function: obtained by marginalizing the joint distribution over the latent variables. EM: start with initial model parameters θ^old, evaluate the posterior of the latent variables (E step), then maximize the resulting expected complete-data log likelihood with respect to θ (M step). Defining the likelihood this way leads to the quantities computed by the forward-backward algorithm on the next slides.
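Written out in Bishop's Ch. 13 notation (these equations are left implicit on the slide):
p(X | θ) = Σ_Z p(X, Z | θ)    (likelihood, marginalizing over the latent variables)
Q(θ, θ^old) = Σ_Z p(Z | X, θ^old) ln p(X, Z | θ)    (expected complete-data log likelihood)
γ(z_n) = p(z_n | X, θ^old),   ξ(z_{n-1}, z_n) = p(z_{n-1}, z_n | X, θ^old)
The E step evaluates γ and ξ (via forward-backward), and the M step maximizes Q(θ, θ^old) with respect to θ.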
12
HMM: forward-backward algorithm
Two-stage message passing in a tree, applied to the HMM, to find the marginals efficiently; here the marginals are p(z_k | x).
Assume p(x_k | z_k), p(z_k | z_{k-1}), and p(z_1) are known.
Notation: x = (x_1, ..., x_n), x_{i:j} = (x_i, x_{i+1}, ..., x_j).
Goal: compute p(z_k | x).
Forward part: compute p(z_k, x_{1:k}) for every k = 1, ..., n.
Backward part: compute p(x_{k+1:n} | z_k) for every k = 1, ..., n.
13
HMM: forward-backward algorithm (cont.)
p(z_k | x) ∝ p(z_k, x) = p(x_{k+1:n} | z_k, x_{1:k}) p(z_k, x_{1:k})
Since x_{k+1:n} and x_{1:k} are d-separated given z_k, this simplifies to
p(z_k | x) ∝ p(z_k, x) = p(x_{k+1:n} | z_k) p(z_k, x_{1:k})
With these quantities we can run the EM (Baum-Welch) algorithm to estimate parameter values, sample from the posterior of z given x, and find the most likely z with the Viterbi algorithm.
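Combining the two parts (a step left implicit on the slide), in the α/β notation defined next:
p(z_k | x_{1:n}) = α_k(z_k) β_k(z_k) / p(x_{1:n}),   where p(x_{1:n}) = Σ_{z_n} α_n(z_n).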
14
HMM forward-backward algorithm: Forward part
Compute p(z_k, x_{1:k}):
p(z_k, x_{1:k}) = Σ_{z_{k-1}} p(z_k, z_{k-1}, x_{1:k})
               = Σ_{z_{k-1}} p(x_k | z_k, z_{k-1}, x_{1:k-1}) p(z_k | z_{k-1}, x_{1:k-1}) p(z_{k-1}, x_{1:k-1})
This looks like a recursion: label p(z_k, x_{1:k}) as α_k(z_k). Since {z_{k-1}, x_{1:k-1}} and x_k are d-separated given z_k, and z_k and x_{1:k-1} are d-separated given z_{k-1},
α_k(z_k) = Σ_{z_{k-1}} p(x_k | z_k) p(z_k | z_{k-1}) α_{k-1}(z_{k-1}),  for k = 2, ..., n
(emission probability × transition probability × recursive part).
15
HMM forward-backward algorithm: Forward part (cont.)
α_1(z_1) = p(z_1, x_1) = p(z_1) p(x_1 | z_1)
If each z has m states, the computational complexity is Θ(m) for each value of z_k at a single k, hence Θ(m^2) per k and Θ(n m^2) in total.
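A sketch of the forward recursion under the same assumed discrete-HMM parameterization (pi, A, B) as the earlier sampling sketch; this is one possible implementation, not the slides' own code:

```python
import numpy as np

def forward(x, pi, A, B):
    """alpha[k, i] = p(z_k = i, x_{1:k}) for a discrete HMM."""
    n, K = len(x), len(pi)
    alpha = np.zeros((n, K))
    alpha[0] = pi * B[:, x[0]]                   # alpha_1(z_1) = p(z_1) p(x_1 | z_1)
    for k in range(1, n):
        # alpha_k(z_k) = p(x_k | z_k) * sum_{z_{k-1}} p(z_k | z_{k-1}) alpha_{k-1}(z_{k-1})
        alpha[k] = B[:, x[k]] * (alpha[k - 1] @ A)
    return alpha                                 # Theta(n m^2) total, as on the slide
```

For long sequences the α values underflow, so practical implementations rescale each α_k or work in log space; Bishop discusses scaling factors for exactly this reason.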
16
HMM forward-backward algorithm: Backward part
Compute p(x_{k+1:n} | z_k) for all z_k and all k = 1, ..., n-1:
p(x_{k+1:n} | z_k) = Σ_{z_{k+1}} p(x_{k+1:n}, z_{k+1} | z_k)
                  = Σ_{z_{k+1}} p(x_{k+2:n} | z_{k+1}, z_k, x_{k+1}) p(x_{k+1} | z_{k+1}, z_k) p(z_{k+1} | z_k)
This again looks like a recursion: label p(x_{k+1:n} | z_k) as β_k(z_k). Since {z_k, x_{k+1}} and x_{k+2:n} are d-separated given z_{k+1}, and z_k and x_{k+1} are d-separated given z_{k+1},
β_k(z_k) = Σ_{z_{k+1}} β_{k+1}(z_{k+1}) p(x_{k+1} | z_{k+1}) p(z_{k+1} | z_k),  for k = 1, ..., n-1
(recursive part × emission probability × transition probability).
17
HMM forward-backward algorithm: Backward part (cont.)
β_n(z_n) = 1 for all z_n
If each z has m states, the computational complexity is the same as for the forward part: Θ(n m^2) in total.
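A matching sketch of the backward recursion, under the same assumed parameterization:

```python
import numpy as np

def backward(x, A, B):
    """beta[k, i] = p(x_{k+1:n} | z_k = i) for a discrete HMM."""
    n, K = len(x), A.shape[0]
    beta = np.ones((n, K))                       # beta_n(z_n) = 1 for all z_n
    for k in range(n - 2, -1, -1):
        # beta_k(z_k) = sum_{z_{k+1}} beta_{k+1}(z_{k+1}) p(x_{k+1} | z_{k+1}) p(z_{k+1} | z_k)
        beta[k] = A @ (B[:, x[k + 1]] * beta[k + 1])
    return beta                                  # same Theta(n m^2) cost as the forward part
```

Given both passes, the smoothed posterior from the earlier slide is p(z_k | x_{1:n}) ∝ alpha[k] * beta[k], normalized by p(x_{1:n}) = alpha[-1].sum().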
18
HMM: Viterbi algorithm
Max-sum algorithm for the HMM: find the most probable sequence of hidden states for a given observation sequence x_{1:n}. Example: transforming handwriting images into text.
Assume p(x_k | z_k), p(z_k | z_{k-1}), and p(z_1) are known.
Goal: compute z* = argmax_z p(z | x), where x = x_{1:n} and z = z_{1:n}.
Lemma: if f(a) ≥ 0 ∀a and g(a,b) ≥ 0 ∀a,b, then max_{a,b} f(a) g(a,b) = max_a [ f(a) max_b g(a,b) ].
max_z p(z | x) ∝ max_z p(z, x)
19
HMM: Viterbi algorithm (cont.)
μ_k(z_k) = max_{z_{1:k-1}} p(z_{1:k}, x_{1:k})
         = max_{z_{1:k-1}} [ p(x_k | z_k) p(z_k | z_{k-1}) ]  ..... the f(a) part
                         × [ p(z_{1:k-1}, x_{1:k-1}) ]        ..... the g(a,b) part
This looks like a recursion if we can move a max in front of p(z_{1:k-1}, x_{1:k-1}). Apply the lemma with a = z_{k-1}, b = z_{1:k-2}:
         = max_{z_{k-1}} [ p(x_k | z_k) p(z_k | z_{k-1}) max_{z_{1:k-2}} p(z_{1:k-1}, x_{1:k-1}) ]
         = max_{z_{k-1}} [ p(x_k | z_k) p(z_k | z_{k-1}) μ_{k-1}(z_{k-1}) ],  for k = 2, ..., n
20
HMM: Viterbi algorithm (finish up)
μ_k(z_k) = max_{z_{k-1}} p(x_k | z_k) p(z_k | z_{k-1}) μ_{k-1}(z_{k-1}),  for k = 2, ..., n
μ_1(z_1) = p(x_1, z_1) = p(z_1) p(x_1 | z_1)
By the same method, max_{z_n} μ_n(z_n) = max_z p(x, z).
This yields the maximum value; to recover the most probable sequence itself, compute the recursion bottom-up while remembering, at each μ_k(z_k), which value of z_{k-1} achieved the maximum, then backtrack (see the sketch below).
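A sketch of the Viterbi recursion with back-pointers, again under the assumed (pi, A, B) parameterization; a production version would work in log space to avoid underflow:

```python
import numpy as np

def viterbi(x, pi, A, B):
    """Return (z_star, max_z p(x, z)) for a discrete HMM."""
    n, K = len(x), len(pi)
    mu = np.zeros((n, K))
    back = np.zeros((n, K), dtype=int)       # back[k, j] = best predecessor z_{k-1} for z_k = j
    mu[0] = pi * B[:, x[0]]                  # mu_1(z_1) = p(z_1) p(x_1 | z_1)
    for k in range(1, n):
        # mu_k(z_k) = max_{z_{k-1}} p(x_k | z_k) p(z_k | z_{k-1}) mu_{k-1}(z_{k-1})
        scores = mu[k - 1][:, None] * A      # scores[i, j] = mu_{k-1}(i) p(z_k = j | z_{k-1} = i)
        back[k] = scores.argmax(axis=0)      # remember which z_{k-1} achieved the maximum
        mu[k] = B[:, x[k]] * scores.max(axis=0)
    # Backtrack through the remembered argmax values to recover z*.
    z_star = np.zeros(n, dtype=int)
    z_star[-1] = mu[-1].argmax()
    for k in range(n - 1, 0, -1):
        z_star[k - 1] = back[k, z_star[k]]
    return z_star, mu[-1].max()              # max_z p(x, z)
```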
21
Additional Information
Equations and diagrams excerpted from: Bishop, C. M., Pattern Recognition and Machine Learning, Springer.
Equations excerpted from: Mathematicalmonk (YouTube), ML 14.6 and 14.7, various titles, July 2011.