
1 Ch10 HMM Model
10.1 Discrete-Time Markov Process
10.2 Hidden Markov Models
10.3 The Three Basic Problems for HMMs and Their Solutions
10.4 Types of HMMs
10.5 Continuous Observation Densities in HMMs

2 10.1 Discrete-Time Markov Process (1) A system at any time may be in one of N distinct states, indexed by {1, 2, …, N}. At each time step the system undergoes a change of state (possibly back to the same state) according to a set of probabilities associated with the current state. Time is denoted t = 1, 2, …, and the state at time t by q_t.

3 Discrete-Time Markov Model (2) The discrete-time, first-order Markov chain is defined as follows:
P[q_t = j | q_{t-1} = i] = a_ij, 1 <= i, j <= N
where a_ij >= 0 and Σ_{j=1..N} a_ij = 1.
Example: a model of the weather. This is called an observable Markov model: every state corresponds to an observable event.
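A chain like this is straightforward to simulate. Below is a minimal Python sketch using a hypothetical 3-state weather chain (sun, cloudy, rain); the transition-matrix values are illustrative assumptions, not taken from the text.

```python
import random

# Hypothetical 3-state weather chain: 0 = sun, 1 = cloudy, 2 = rain.
# These transition probabilities are illustrative assumptions.
A = [[0.8, 0.1, 0.1],   # from sun
     [0.3, 0.4, 0.3],   # from cloudy
     [0.2, 0.3, 0.5]]   # from rain

# Each row must satisfy the stochastic constraints: a_ij >= 0, sum_j a_ij = 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in A)

def next_state(i, rng):
    """Draw q_{t+1} given q_t = i according to row i of A."""
    r, cum = rng.random(), 0.0
    for j, p in enumerate(A[i]):
        cum += p
        if r < cum:
            return j
    return len(A[i]) - 1   # guard against floating-point round-off

rng = random.Random(0)
states = [0]               # start in state 'sun'
for _ in range(9):
    states.append(next_state(states[-1], rng))
```

Each call to next_state draws the successor state from the row of A indexed by the current state only, which is exactly the first-order Markov property.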

4 Discrete-Time Markov Model (3) Given the state-transition matrix, the model can answer many questions, e.g.: (1) What is the probability of a particular weather sequence? Calculate P(O|Model) for O = (sun, sun, sun, rain, rain, sun, cloudy, sun). (2) What is the probability of the sequence O = (i, i, …, i, j) with j != i, i.e. that the system stays in state i for the first d time instants and moves to a different state at instant d+1?
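For question (1), once the first state is taken as given, P(O|Model) is simply the product of successive transition probabilities. A minimal Python sketch, assuming illustrative transition values (not from the text):

```python
# Probability of the slide's weather sequence, conditioning on the first
# state.  The transition-matrix values are illustrative assumptions.
SUN, CLOUDY, RAIN = 0, 1, 2
A = [[0.8, 0.1, 0.1],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]

O = [SUN, SUN, SUN, RAIN, RAIN, SUN, CLOUDY, SUN]

# P(O|Model) = product over t of a_{O[t], O[t+1]}
p = 1.0
for t in range(len(O) - 1):
    p *= A[O[t]][O[t + 1]]
```

With these assumed numbers, p = 0.8 · 0.8 · 0.1 · 0.5 · 0.2 · 0.1 · 0.3 = 0.000192.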

5 Discrete-Time Markov Model (4) What is the probability that the system stays in state i for exactly the first d time instants?
p_i(d) = (a_ii)^(d-1) (1 - a_ii)
This is a geometric distribution, and the expected duration in state i is
Σ_{d=1..∞} d p_i(d) = 1/(1 - a_ii).
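The mean of this geometric duration distribution can be checked numerically; the self-loop probability a_ii = 0.8 below is an illustrative assumption.

```python
# p_i(d) = a_ii**(d-1) * (1 - a_ii) is geometric; its mean is 1/(1 - a_ii).
a_ii = 0.8   # illustrative self-loop probability

# Truncate the infinite sum at a large d; the geometric tail is negligible.
mean_duration = sum(d * a_ii ** (d - 1) * (1 - a_ii) for d in range(1, 2000))
expected = 1.0 / (1.0 - a_ii)   # = 5.0
```

So with a_ii = 0.8 the system stays in state i for 5 time instants on average.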

6 Hidden Markov Model (1) Extension: the observation is a probabilistic function of the state. There is thus a doubly embedded stochastic process: the underlying stochastic process (the state sequence) is not directly observable (it is hidden), and can be observed only through another set of stochastic processes that produce the sequence of observations. Hence the name "hidden" Markov model.

7 Hidden Markov Model (2) Coin-toss models. If P(H) = P(T) = 0.5, what is the probability that the next 10 tosses produce the sequence (HHTHTTHTH)? Or (HHHHHHHHHH)? What is the probability that 5 of the next 10 tosses are tails? Given an observation sequence, many different models could have produced it, each with a different probability.
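Both coin-toss questions have closed-form answers for a fair coin; a short check in Python:

```python
from math import comb

p = 0.5   # fair coin: P(H) = P(T) = 0.5

# Any one specific sequence of 10 tosses -- a mixed one or
# (H H H H H H H H H H) alike -- has the same probability 0.5**10.
p_specific = p ** 10                   # = 1/1024

# Exactly 5 tails among the next 10 tosses is a binomial count.
p_five_tails = comb(10, 5) * p ** 10   # = 252/1024
```

Every specific length-10 sequence is equally likely (about 0.00098), while "5 of 10 are tails" aggregates 252 such sequences (about 0.246).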

8 Hidden Markov Model (3) The Urn-and-Ball Model. There are several urns, each containing many balls of different colors; balls are drawn, and their colors observed, from urns selected by some random procedure. Given an observation sequence, there are many possible interpretations of it. Here the urns are the states, and the ball colors are the observable events.

9 Hidden Markov Model (4) Elements of an HMM: (1) The state set q = {q_1, q_2, …, q_N}, or {1, 2, …, N} for short; N is the number of states. (2) The observation symbol set V = {v_1, v_2, …, v_M}; M is the number of distinct observation symbols. (3) The state-transition probability distribution A = {a_ij},

10 Hidden Markov Model (5) where a_ij = P[q_{t+1} = j | q_t = i], 1 <= i, j <= N. (4) The observation symbol probability distribution B = {b_j(k)}, where b_j(k) = P[o_t = v_k | q_t = j], 1 <= k <= M. (5) The initial state distribution π = {π_i}, where π_i = P[q_1 = i], 1 <= i <= N. The model is often written compactly as λ = (A, B, π).

11 Hidden Markov Model (6) Given an HMM, it can be used as a generator to produce an observation sequence O = (o_1, o_2, …, o_T), where T is the number of observations and each o_t is one of the symbols from V (in the discrete case).
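This generator view can be sketched directly: draw q_1 from π, then alternately emit a symbol from B and move via A. The two-state, two-symbol model below is an illustrative assumption.

```python
import random

# Illustrative two-state, two-symbol model lambda = (A, B, pi).
A  = [[0.7, 0.3], [0.4, 0.6]]   # a_ij = P(q_{t+1} = j | q_t = i)
B  = [[0.9, 0.1], [0.2, 0.8]]   # b_j(k) = P(o_t = v_k | q_t = j)
pi = [0.6, 0.4]                 # pi_i = P(q_1 = i)

def draw(dist, rng):
    """Sample an index according to the discrete distribution dist."""
    r, cum = rng.random(), 0.0
    for idx, prob in enumerate(dist):
        cum += prob
        if r < cum:
            return idx
    return len(dist) - 1

def generate(T, seed=0):
    """Use the HMM as a generator of O = (o_1, ..., o_T)."""
    rng = random.Random(seed)
    q = draw(pi, rng)                 # initial state from pi
    obs = []
    for _ in range(T):
        obs.append(draw(B[q], rng))   # emit a symbol from state q
        q = draw(A[q], rng)           # transition to the next state
    return obs

O = generate(20)
```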

12 10.3 Three Basic Problems of HMMs (1) Problem 1 (Evaluation): Given the observation sequence O and the model λ, how do we efficiently calculate P(O|λ)? Problem 2 (Optimization, or Decoding): Given the observation sequence O and the model λ, how do we choose an optimal state sequence q = (q_1, q_2, …, q_T)?

13 Three Basic Problems of HMMs (2) Problem 3 (Training): How do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)? Solution to Problem 1. In fact, every possible state sequence contributes to P(O|λ). For a state sequence q = (q_1, q_2, …, q_T):

14 Three Basic Problems of HMMs (3)
P(O|q,λ) = b_{q1}(o_1) b_{q2}(o_2) … b_{qT}(o_T)
P(q|λ) = π_{q1} a_{q1 q2} a_{q2 q3} … a_{q(T-1) qT}
P(O,q|λ) = P(O|q,λ) P(q|λ)
P(O|λ) = Σ_{all q} P(O|q,λ) P(q|λ) = Σ_{q1,…,qT} π_{q1} b_{q1}(o_1) a_{q1 q2} b_{q2}(o_2) a_{q2 q3} … a_{q(T-1) qT} b_{qT}(o_T)
This direct computation requires on the order of 2T·N^T calculations, which is infeasible: for N = 5 and T = 100 there are about 10^72 operations. A more efficient procedure is required to solve Problem 1.

15 Three Basic Problems of HMMs (4) The Forward Procedure. Define α_t(i) = P(o_1, o_2, …, o_t, q_t = i | λ), the probability of the partial observation sequence o_1, o_2, …, o_t (up to time t) and state i at time t, given the model λ. The iterative procedure is as follows:
(1) Initialization: α_1(i) = π_i b_i(o_1), 1 <= i <= N
(2) Iteration: α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_ij] b_j(o_{t+1}), t = 1 ~ T-1, 1 <= j <= N
(3) Termination: P(O|λ) = Σ_{i=1..N} α_T(i)
This procedure requires on the order of N^2·T calculations rather than 2T·N^T.
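The forward procedure transcribes directly into Python; the two-state model numbers below are illustrative assumptions, but the recursion is the one on the slide.

```python
# Forward procedure for a small discrete HMM (illustrative numbers).
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
N  = 2

def forward_prob(obs):
    # (1) Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    # (2) Iteration: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # (3) Termination: P(O|lambda) = sum_i alpha_T(i)
    return sum(alpha)

p = forward_prob([0, 0, 1])
```

Only the current column of alpha is kept, so the cost is O(N^2 T) time and O(N) memory.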

16 Three Basic Problems of HMMs (5) The Backward Procedure. Define β_t(i) = P(o_{t+1}, o_{t+2}, …, o_T | q_t = i, λ), the probability of the partial observation sequence from o_{t+1} to the end, given state i at time t and the model λ. The iterative procedure is as follows:
(1) Initialization: β_T(i) = 1, 1 <= i <= N
(2) Iteration: β_t(i) = Σ_{j=1..N} a_ij b_j(o_{t+1}) β_{t+1}(j), t = T-1 ~ 1, 1 <= i <= N
(3) Termination: P(O|λ) = Σ_{i=1..N} π_i b_i(o_1) β_1(i)
It also requires about N^2·T calculations.
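A Python sketch of the backward procedure on an illustrative two-state model; as a sanity check, its termination step must reproduce the same P(O|λ) as the forward procedure.

```python
# Backward procedure for a small discrete HMM (illustrative numbers).
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
N  = 2

def backward_prob(obs):
    beta = [1.0] * N                  # (1) Initialization: beta_T(i) = 1
    for o in reversed(obs[1:]):       # (2) Iteration, t = T-1, ..., 1
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(N))
                for i in range(N)]
    # (3) Termination: P(O|lambda) = sum_i pi_i b_i(o_1) beta_1(i)
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(N))

def forward_prob(obs):
    """Forward recursion, used here only to cross-check the result."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)

obs = [0, 0, 1]
pb, pf = backward_prob(obs), forward_prob(obs)
```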

17 Three Basic Problems of HMMs (6) Solution to Problem 2. The first issue is how to define 'optimality'. The most widely used criterion is to find the single best state sequence (path) that maximizes P(q|O,λ), which is equivalent to maximizing P(q,O|λ). The formal technique, based on dynamic programming, is called the Viterbi algorithm. The Viterbi Algorithm. Define δ_t(i) = max over q_1, q_2, …, q_{t-1} of P(q_1 q_2 … q_{t-1}, q_t = i, o_1 o_2 … o_t | λ), the best score along a single path at time t that accounts for the first t observations and ends in state i.

18 Three Basic Problems of HMMs (7) By induction, δ_{t+1}(j) = max_i [δ_t(i) a_ij] b_j(o_{t+1}). The iterative procedure:
(1) Initialization: δ_1(i) = π_i b_i(o_1), ψ_1(i) = 0, i = 1 ~ N
(2) Iteration: δ_t(j) = max_{i=1..N} [δ_{t-1}(i) a_ij] b_j(o_t), ψ_t(j) = argmax_{i=1..N} [δ_{t-1}(i) a_ij], for j = 1 ~ N, t = 2 ~ T
(3) Termination: P* = max_{i=1..N} [δ_T(i)], q_T* = argmax_{i=1..N} [δ_T(i)]
(4) Path backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1 ~ 1
An alternative Viterbi implementation uses logarithms to avoid numerical underflow.
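The four steps above can be sketched compactly in Python; the two-state model numbers are illustrative assumptions.

```python
# Viterbi algorithm on a toy discrete HMM (illustrative numbers).
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
N  = 2

def viterbi(obs):
    # (1) Initialization
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    psi = []
    # (2) Iteration: psi records the best predecessor of each state
    for o in obs[1:]:
        back = [max(range(N), key=lambda i: delta[i] * A[i][j])
                for j in range(N)]
        delta = [delta[back[j]] * A[back[j]][j] * B[j][o] for j in range(N)]
        psi.append(back)
    # (3) Termination
    p_star = max(delta)
    q = [max(range(N), key=lambda i: delta[i])]
    # (4) Path backtracking: q_t* = psi_{t+1}(q_{t+1}*)
    for back in reversed(psi):
        q.append(back[q[-1]])
    return p_star, list(reversed(q))

p_star, path = viterbi([0, 0, 1, 1])
```

In practice one works with log δ and sums of log-probabilities, per the underflow remark on the slide.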

19 Three Basic Problems of HMMs (8) Solution to Problem 3. There is no analytic solution; only iterative procedures are available, such as the Baum-Welch method (also known as Expectation-Maximization). The Baum re-estimation procedure. Define ξ_t(i,j) = P(q_t = i, q_{t+1} = j | O, λ). Then
ξ_t(i,j) = P(q_t = i, q_{t+1} = j, O | λ) / P(O|λ)
= α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O|λ)
= α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1..N} Σ_{j=1..N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)

20 Three Basic Problems of HMMs (9) Define γ_t(i) = P(q_t = i | O, λ), the probability of being in state i at time t, given O and λ.
γ_t(i) = P(q_t = i, O | λ) / P(O|λ) = P(q_t = i, O | λ) / Σ_{i=1..N} P(q_t = i, O | λ) = α_t(i) β_t(i) / Σ_{i=1..N} α_t(i) β_t(i)
So γ_t(i) = Σ_{j=1..N} ξ_t(i,j). If we sum γ_t(i) over the time index t (for t = 1 ~ T-1), we get the expected number of times state i is visited, or equivalently the expected number of transitions made from state i. Similarly, the sum of ξ_t(i,j) over t is the expected number of transitions from state i to state j.

21 Three Basic Problems of HMMs (10) The re-estimation formulas are:
π_i' = γ_1(i)
a_ij' = Σ_{t=1..T-1} ξ_t(i,j) / Σ_{t=1..T-1} γ_t(i)
b_j'(k) = Σ_{t: o_t = v_k} γ_t(j) / Σ_{t=1..T} γ_t(j)
where the numerator sums γ_t(j) only over those t at which the observation o_t is v_k. These are the iterative update formulas for the model parameters. The initial parameters λ_0 can be, for example, uniform distributions. Then α_t(i) and β_t(i) (1 <= i <= N, 1 <= t <= T) are calculated for all training samples, ξ_t(i,j) and γ_t(i) follow, and λ is updated as above.
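One full re-estimation pass can be sketched end-to-end: compute the α and β tables, form ξ and γ, then apply the update formulas. The model numbers and the observation sequence below are illustrative assumptions.

```python
# One Baum-Welch re-estimation pass on a toy model (illustrative numbers).
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
obs = [0, 0, 1, 0, 1, 1]
N, T, M = 2, len(obs), 2

# Forward table alpha[t][i] and backward table beta[t][i].
alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                  for j in range(N)])
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
               for i in range(N)]
p_obs = sum(alpha[T - 1])

# xi_t(i,j) and gamma_t(i) from their definitions.
xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
        for j in range(N)] for i in range(N)] for t in range(T - 1)]
gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]

# Re-estimation: pi' = gamma_1; a_ij' and b_j'(k) as ratios of expected counts.
new_pi = gamma[0]
new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
          sum(gamma[t][i] for t in range(T - 1))
          for j in range(N)] for i in range(N)]
new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
          sum(gamma[t][j] for t in range(T))
          for k in range(M)] for j in range(N)]
```

The updated parameters remain valid distributions by construction: new_pi, and each row of new_A and new_B, sums to 1.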

22 10.4 Types of HMMs (1) Full connection: A is an N x N square matrix in which no element is zero, so every state can reach every other. But there are other types. For example, the left-right HMM model, in which the state index never decreases along the state sequence. For this model, a_ij = 0 for j < i; π_i = 1 only for i = 1 (and 0 otherwise); and a_NN = 1, a_Ni = 0 for i < N. There are further variants, e.g. left-right models that allow transitions with skips.

23 10.5 Continuous Observation Densities in HMMS (1) In the previous discussion we assumed the observations were discrete symbols. We must also consider the continuous case, in which b_j(k) becomes a probability density b_j(o). The most general representation of the pdf is a finite mixture of the form
b_j(o) = Σ_{k=1..M} c_jk N(o; μ_jk, U_jk)
where M is the number of components in the mixture, and the mixture weights satisfy c_jk >= 0 and Σ_{k=1..M} c_jk = 1, j = 1 ~ N.
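In the scalar case the mixture density is easy to write down; the weights, means, and variances below are illustrative assumptions.

```python
import math

def gauss(o, mu, var):
    """Scalar Gaussian density N(o; mu, var)."""
    return math.exp(-(o - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Illustrative 2-component mixture for one state j.
c  = [0.4, 0.6]   # mixture weights c_jk, summing to 1
mu = [0.0, 3.0]   # component means mu_jk
U  = [1.0, 2.0]   # component variances U_jk (scalar case)

def b_j(o):
    """b_j(o) = sum_k c_jk * N(o; mu_jk, U_jk)."""
    return sum(ck * gauss(o, mk, vk) for ck, mk, vk in zip(c, mu, U))

# A valid density is non-negative and integrates to 1; checked here with a
# coarse Riemann sum over [-20, 20].
total = sum(b_j(-20.0 + 0.01 * s) * 0.01 for s in range(4000))
```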

24 Continuous Observation Densities in HMMS (2) The re-estimation formulas are:
c_jk' = Σ_{t=1..T} γ_t(j,k) / Σ_{t=1..T} Σ_{k=1..M} γ_t(j,k)
μ_jk' = Σ_{t=1..T} γ_t(j,k) · o_t / Σ_{t=1..T} γ_t(j,k)
U_jk' = Σ_{t=1..T} γ_t(j,k) · (o_t - μ_jk)(o_t - μ_jk)' / Σ_{t=1..T} γ_t(j,k)
where γ_t(j,k) = [α_t(j) β_t(j) / Σ_{j=1..N} α_t(j) β_t(j)] · [c_jk N(o_t; μ_jk, U_jk) / Σ_{k=1..M} c_jk N(o_t; μ_jk, U_jk)],
i.e. the probability of being in state j at time t with the k-th mixture component accounting for o_t.

