1
Hidden Markov Models
2
Hidden Markov Model In some Markov processes, we may not be able to observe the states directly.
3
Hidden Markov Model An HMM is a quintuple (S, E, π, A, B): S = {s_1 … s_N} are the values for the hidden states; E = {e_1 … e_T} are the values for the observations; π is the probability distribution of the initial state; A is the transition probability matrix; B is the emission probability matrix. [Figure: graphical model with hidden chain X_1 → … → X_{t-1} → X_t → X_{t+1} → … → X_T, where each X_t emits an observation e_t.]
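As a concrete illustration, here is a minimal sketch of such a quintuple in Python/NumPy. The two hidden states and the umbrella/none observation symbols are an invented toy example, not taken from the slides.

```python
import numpy as np

# Toy HMM (illustrative values only): two hidden states, two observation symbols.
states = ["Rainy", "Sunny"]          # S = {s_1, s_2}
observations = ["umbrella", "none"]  # possible observation symbols

# pi: probability distribution of the initial state, pi[i] = P(X_1 = s_i)
pi = np.array([0.5, 0.5])

# A: transition probability matrix, A[i, j] = P(X_{t+1} = s_j | X_t = s_i)
A = np.array([[0.7, 0.3],
              [0.3, 0.7]])

# B: emission probability matrix, B[i, k] = P(e_t = k-th symbol | X_t = s_i)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Each row of A and B is a probability distribution and sums to 1.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```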
4
Inferences with HMM Filtering: compute P(x_t | e_{1:t}), the probability of the last state given an observation sequence. Decoding: compute argmax_{x_{1:t}} P(x_{1:t} | e_{1:t}), the most likely hidden state sequence given an observation sequence. Learning: compute argmax_λ P_λ(e_{1:t}), where λ = (π, A, B) are the parameters of the HMM; given an observation sequence, find the transition probability and emission probability tables that assign the observations the highest probability (unsupervised learning).
5
Filtering P(X_{t+1} | e_{1:t+1}) = P(X_{t+1} | e_{1:t}, e_{t+1})
= P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t}) / P(e_{t+1} | e_{1:t})
= P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t}) / P(e_{t+1} | e_{1:t}),
where P(X_{t+1} | e_{1:t}) = Σ_{x_t} P(X_{t+1} | x_t, e_{1:t}) P(x_t | e_{1:t}) = Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t}).
This has the same form as P(X_t | e_{1:t}), so we can use recursion.
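To make the recursion concrete, here is a minimal filtering sketch in Python/NumPy. The matrices pi, A, B and the observation sequence are the same invented toy values as above, not taken from the slides.

```python
import numpy as np

# Toy parameters (illustrative only): 2 hidden states, 2 observation symbols.
pi = np.array([0.5, 0.5])                 # pi[i] = P(X_1 = s_i)
A = np.array([[0.7, 0.3], [0.3, 0.7]])    # A[i, j] = P(X_{t+1} = s_j | X_t = s_i)
B = np.array([[0.9, 0.1], [0.2, 0.8]])    # B[i, k] = P(e_t = k | X_t = s_i)

def filter_step(belief, obs, A, B):
    """One step of the filtering recursion:
    P(X_{t+1} | e_{1:t+1}) is proportional to P(e_{t+1} | X_{t+1}) * sum_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})."""
    predicted = A.T @ belief          # sum over x_t of P(X_{t+1} | x_t) P(x_t | e_{1:t})
    updated = B[:, obs] * predicted   # multiply by the emission likelihood P(e_{t+1} | X_{t+1})
    return updated / updated.sum()    # normalize (divides out P(e_{t+1} | e_{1:t}))

# Filter a short observation sequence, e.g. symbol indices [0, 0, 1].
belief = pi * B[:, 0]                 # P(X_1 | e_1), up to normalization
belief /= belief.sum()
for obs in [0, 1]:                    # remaining observations e_2, e_3
    belief = filter_step(belief, obs, A, B)
print(belief)                         # P(X_3 | e_{1:3})
```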
6
Filtering Example
7
Viterbi Algorithm Compute argmax_{x_{1:t}} P(x_{1:t} | e_{1:t}).
Since P(x_{1:t} | e_{1:t}) = P(x_{1:t}, e_{1:t}) / P(e_{1:t}), and P(e_{1:t}) remains constant when we consider different x_{1:t},
argmax_{x_{1:t}} P(x_{1:t} | e_{1:t}) = argmax_{x_{1:t}} P(x_{1:t}, e_{1:t}).
Since the Markov chain is a Bayes net, P(x_{1:t}, e_{1:t}) = P(x_0) ∏_{i=1}^{t} P(x_i | x_{i-1}) P(e_i | x_i).
Equivalently, minimize -log P(x_{1:t}, e_{1:t}) = -log P(x_0) + Σ_{i=1}^{t} (-log P(x_i | x_{i-1}) - log P(e_i | x_i)).
8
Viterbi Algorithm Given an HMM (S, E, π, A, B) and observations e_{1:t}, construct a graph that consists of 1 + tN nodes: one initial node, and N nodes at each time step i, where the jth node at time i represents X_i = s_j. The link between the nodes X_{i-1} = s_j and X_i = s_k is associated with the length -log [P(X_i = s_k | X_{i-1} = s_j) P(e_i | X_i = s_k)].
9
The problem of finding argmax_{x_{1:t}} P(x_{1:t} | e_{1:t}) then becomes that of finding the shortest path from the initial node x_0 to one of the N nodes at time t.
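Below is a minimal dynamic-programming sketch of this shortest-path view in Python/NumPy, again with invented toy parameters; minimizing the summed -log probabilities along a path is equivalent to maximizing P(x_{1:t}, e_{1:t}).

```python
import numpy as np

# Toy parameters (illustrative only).
pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 1]   # an example observation sequence (indices into B's columns)

def viterbi(pi, A, B, obs):
    """Most likely state sequence, computed as the shortest path in -log space."""
    T, N = len(obs), len(pi)
    cost = np.zeros((T, N))            # cost[t, j] = shortest-path length ending in X_t = s_j
    back = np.zeros((T, N), dtype=int) # back[t, j] = best predecessor state at time t-1
    cost[0] = -np.log(pi) - np.log(B[:, obs[0]])
    for t in range(1, T):
        for j in range(N):
            # Edge length from X_{t-1} = s_i to X_t = s_j is -log[P(s_j | s_i) P(e_t | s_j)].
            lengths = cost[t - 1] - np.log(A[:, j]) - np.log(B[j, obs[t]])
            back[t, j] = np.argmin(lengths)
            cost[t, j] = lengths[back[t, j]]
    # Trace the shortest path back from the best final node.
    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi(pi, A, B, obs))   # prints the index sequence of the most likely hidden states
```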
10
Example
11
Baum-Welch Algorithm The previous two kinds of computation need the parameters λ = (π, A, B). Where do the probabilities come from? Relative frequency? But the states are not observable! Solution: the Baum-Welch algorithm, which performs unsupervised learning from observations and finds argmax_λ P_λ(e_{1:t}).
12
Baum-Welch Algorithm Start with an initial (possibly arbitrary) set of parameters λ_0. Compute pseudo counts: how many times did the transition from X_{i-1} = s_j to X_i = s_k occur? Use the pseudo counts to obtain another (better) set of parameters λ_1. Iterate until the new parameters no longer increase the likelihood P_λ(e_{1:t}). This is a special case of EM (Expectation-Maximization).
13
Pseudo Counts Given the observation sequence e_{1:T}, the pseudo count of the link from X_t = s_i to X_{t+1} = s_j is the probability P(X_t = s_i, X_{t+1} = s_j | e_{1:T}).
14
Update HMM Parameters For each time step t: add P(X_t = s_i, X_{t+1} = s_j | e_{1:T}) to count(i, j); add P(X_t = s_i | e_{1:T}) to count(i); add P(X_t = s_i | e_{1:T}) to count(i, e_t), where e_t is the symbol observed at time t. Then the updated a_{ij} = count(i, j) / count(i), and the updated b_{j e_t} = count(j, e_t) / count(j).
15
P(X t =s i,X t+1 =s j |e 1:T ) =P(X t =s i,X t+1 =s j, e 1:t, e t+1, e t+2:T )/ P(e 1:T ) =P(X t =s i, e 1:t )P(X t+1 =s j |X t =s i )P(e t+1 |X t+1 =s j ) P(e t+2:T |X t+1 =s j )/P(e 1:T ) =P(X t =s i, e 1:t ) a ij b je t+1 P(e t+2:T |X t+1 =s j )/P(e 1:T ) = i (t) a ij b je t β j (t+1)/P(e 1:T )
16
Forward Probability α_i(t) = P(X_t = s_i, e_{1:t}), the probability of the observations up to time t and of being in state s_i at time t. It can be computed recursively: α_j(1) = π_j b_{j e_1} and α_j(t+1) = b_{j e_{t+1}} Σ_i α_i(t) a_{ij}.
17
Backward Probability β_i(t) = P(e_{t+1:T} | X_t = s_i), the probability of the remaining observations given state s_i at time t. It can be computed recursively: β_i(T) = 1 and β_i(t) = Σ_j a_{ij} b_{j e_{t+1}} β_j(t+1).
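A minimal Python/NumPy sketch of these two recursions, using the same hypothetical pi, A, B layout as the earlier snippets (rows of A indexed by the source state, rows of B by the state):

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t-1, i] = alpha_i(t) = P(X_t = s_i, e_{1:t})  (rows are 0-indexed)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                        # alpha_j(1) = pi_j * b_{j, e_1}
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (A.T @ alpha[t - 1])  # b_{j, e_{t+1}} * sum_i alpha_i(t) a_ij
    return alpha

def backward(A, B, obs):
    """beta[t-1, i] = beta_i(t) = P(e_{t+1:T} | X_t = s_i)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                              # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # sum_j a_ij b_{j, e_{t+1}} beta_j(t+1)
    return beta

# Example with the toy parameters from the earlier sketches:
pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 1]
alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
print(alpha[-1].sum())        # P(e_{1:T}), the sequence likelihood
```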
18
[Figure: trellis segment around times t-1, t, t+1, t+2, showing the link from X_t = s_i to X_{t+1} = s_j weighted by a_{ij} b_{j e_{t+1}}, with α_i(t) summarizing the part to the left and β_j(t+1) the part to the right.]
19
P(X_t = s_i | e_{1:T})
= P(X_t = s_i, e_{1:t}, e_{t+1:T}) / P(e_{1:T})
= P(e_{t+1:T} | X_t = s_i, e_{1:t}) P(X_t = s_i, e_{1:t}) / P(e_{1:T})
= P(e_{t+1:T} | X_t = s_i) P(X_t = s_i, e_{1:t}) / P(e_{1:T})
= α_i(t) β_i(t) / P(e_{1:T})
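Putting the pieces together, here is a sketch of one Baum-Welch re-estimation step, assuming alpha, beta, and xi arrays shaped like those from the forward/backward and pseudo_counts sketches above (all names are illustrative). It turns γ_i(t) = P(X_t = s_i | e_{1:T}) and the pseudo counts into updated parameters in the spirit of the "Update HMM Parameters" slide.

```python
import numpy as np

def reestimate(alpha, beta, xi, obs, n_symbols):
    """One Baum-Welch update from the pseudo counts.
    gamma[t, i] = P(X_t = s_i | e_{1:T}) = alpha_i(t) * beta_i(t) / P(e_{1:T});
    xi[t, i, j] = P(X_t = s_i, X_{t+1} = s_j | e_{1:T})."""
    likelihood = alpha[-1].sum()                  # P(e_{1:T})
    gamma = alpha * beta / likelihood             # per-time-step state pseudo counts

    # count(i, j): expected transitions s_i -> s_j; count(i): expected visits to s_i
    # (summed over t = 1..T-1, the time steps that have an outgoing transition).
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]

    # count(j, e): expected number of times symbol e is emitted from state s_j.
    new_B = np.zeros((gamma.shape[1], n_symbols))
    for t, symbol in enumerate(obs):
        new_B[:, symbol] += gamma[t]
    new_B /= gamma.sum(axis=0)[:, None]

    new_pi = gamma[0]                             # re-estimated initial distribution
    return new_pi, new_A, new_B
```

As the Baum-Welch slide describes, one would repeat the forward/backward pass, the pseudo-count computation, and this re-estimation until the likelihood alpha[-1].sum() stops increasing.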