1
Hidden Markov Models
2
Outline
– Hidden Markov Models – Formalism
– The Three Basic Problems of HMMs
– Solutions
– Applications of HMMs for Automatic Speech Recognition (ASR)
3
Example: The Dishonest Casino
A casino has two dice:
– Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
– Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
The casino player switches back and forth between the fair and the loaded die once in a while.
Game:
1. You bet $1
2. You roll (always with a fair die)
3. The casino player rolls (maybe with the fair die, maybe with the loaded die)
4. Highest number wins $2
4
Question #1 – Evaluation
GIVEN: a sequence of rolls by the casino player
12455264621461461361366616646616366163661636165
QUESTION: How likely is this sequence, given our model of how the casino works?
This is the EVALUATION problem in HMMs.
5
Question #2 – Decoding
GIVEN: a sequence of rolls by the casino player
12455264621461461361366616646616366163661636165
QUESTION: What portion of the sequence was generated with the fair die, and what portion with the loaded die?
This is the DECODING question in HMMs.
6
Question #3 – Learning
GIVEN: a sequence of rolls by the casino player
12455264621461461361366616646616366163661636165
QUESTION: How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back?
This is the LEARNING question in HMMs.
7
The dishonest casino model
Two hidden states: FAIR and LOADED.
Transitions: stay in the current state with probability 0.95, switch to the other state with probability 0.05.
Emissions:
– Fair: P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
– Loaded: P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
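As a concrete reference for the sketches on later slides, here is a minimal NumPy encoding of this model; the names `states`, `pi`, `A`, and `B` are my own choices, not from the slides:

```python
import numpy as np

# Hidden states of the dishonest casino
states = ["FAIR", "LOADED"]            # index 0 = fair, 1 = loaded

# Initial state distribution (the slides assume 1/2 each)
pi = np.array([0.5, 0.5])

# Transition matrix: A[i, j] = P(next state = j | current state = i)
A = np.array([[0.95, 0.05],
              [0.05, 0.95]])

# Emission matrix: B[i, k] = P(roll = k + 1 | state = i)
B = np.array([[1/6] * 6,               # fair die
              [1/10] * 5 + [1/2]])     # loaded die
```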
8
Example: the dishonest casino
Let the sequence of rolls be: O = 1, 2, 1, 5, 6, 2, 1, 6, 2, 4
Then, what is the likelihood of X = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair? (Say the initial probabilities are P(t=0, Fair) = 1/2 and P(t=0, Loaded) = 1/2.)
1/2 × P(1|Fair) × P(Fair|Fair) × P(2|Fair) × P(Fair|Fair) × … × P(4|Fair)
= 1/2 × (1/6)^10 × (0.95)^9 ≈ 0.00000000521 ≈ 5.21 × 10^-9
9
Example: the dishonest casino
So, the likelihood that the die is fair throughout this run is only about 5.21 × 10^-9.
OK, but what is the likelihood of X = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded?
1/2 × P(1|Loaded) × P(Loaded|Loaded) × … × P(4|Loaded)
= 1/2 × (1/10)^8 × (1/2)^2 × (0.95)^9 ≈ 0.00000000079 ≈ 7.88 × 10^-10
Therefore, it is about 6.6 times more likely that the die is fair all the way than that it is loaded all the way.
10
Example: the dishonest casino
Now let the sequence of rolls be: O = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6
What is the likelihood of X = F, F, …, F?
1/2 × (1/6)^10 × (0.95)^9 ≈ 5.21 × 10^-9, the same as before.
What is the likelihood of X = L, L, …, L?
1/2 × (1/10)^4 × (1/2)^6 × (0.95)^9 ≈ 0.00000049 ≈ 4.92 × 10^-7
So it is now roughly 100 times more likely that the die is loaded.
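A quick numeric check of these path likelihoods, reusing the hypothetical `pi`, `A`, `B` arrays sketched earlier:

```python
def path_likelihood(obs, path, pi, A, B):
    """Joint probability P(observations, state path) for a fixed state sequence.

    obs  : list of die faces (1..6)
    path : list of state indices (0 = fair, 1 = loaded), same length as obs
    """
    p = pi[path[0]] * B[path[0], obs[0] - 1]
    for t in range(1, len(obs)):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t] - 1]
    return p

rolls = [1, 2, 1, 5, 6, 2, 1, 6, 2, 4]
all_fair   = path_likelihood(rolls, [0] * 10, pi, A, B)   # ~5.2e-9
all_loaded = path_likelihood(rolls, [1] * 10, pi, A, B)   # ~7.9e-10
print(all_fair / all_loaded)                              # ~6.6
```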
11
HMM Timeline
[Figure: a time-slice diagram with hidden states x_1, …, x_{t-1}, x_t, x_{t+1}, …, x_T on top and observations o_1, …, o_{t-1}, o_t, o_{t+1}, …, o_T below; arrows indicate probabilistic dependencies.]
The x's are hidden states, each dependent only on the previous state; the Markov assumption holds for the state sequence.
The o's are observations, each dependent only on its corresponding hidden state.
12
HMM Formalism
An HMM can be specified by three parameter sets (π, A, B):
– π = {π_i} are the initial state probabilities
– A = {a_ij} are the state transition probabilities, a_ij = Pr(x_j | x_i)
– B = {b_ik} are the observation probabilities, b_ik = Pr(o_k | x_i)
13
Generating a sequence by the model
Given an HMM, we can generate a sequence of length T as follows:
1. Start at state x_1 = i according to probability π_i
2. Emit letter o_1 according to probability b_i(o_1)
3. Go to state x_2 = j according to probability a_ij
4. … and so on, until emitting o_T
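A minimal sampling sketch of this procedure, assuming the `pi`, `A`, `B` arrays from the earlier sketch (observation symbols are die faces 1 to 6):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(T, pi, A, B, rng=rng):
    """Sample a hidden state path and an observation sequence of length T."""
    states, obs = [], []
    x = rng.choice(len(pi), p=pi)              # step 1: initial state ~ pi
    for _ in range(T):
        o = rng.choice(B.shape[1], p=B[x])     # step 2: emit o_t ~ b_x(.)
        states.append(int(x))
        obs.append(int(o) + 1)                 # die faces are 1..6
        x = rng.choice(len(pi), p=A[x])        # step 3: next state ~ a_x
    return states, obs

path, rolls = generate(10, pi, A, B)
```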
14
The three main questions on HMMs
1. Evaluation: GIVEN an HMM λ and a sequence O, FIND Prob[O | λ]
2. Decoding: GIVEN an HMM λ and a sequence O, FIND the sequence X of states that maximizes P[X | O, λ]
3. Learning: GIVEN a sequence O, FIND a model λ with parameters π, A and B that maximize P[O | λ]
15
Problem 1: Evaluation
Find the likelihood that a sequence is generated by the model.
16
Probability of an Observation
Given an observation sequence and a model, compute the probability of the observation sequence.
17
Let X = x_1 … x_T be the state sequence.
18
Probability of an Observation
For a fixed state sequence X:
P(O | X, λ) = b_{x_1}(o_1) · b_{x_2}(o_2) · … · b_{x_T}(o_T)
P(X | λ) = π_{x_1} · a_{x_1 x_2} · a_{x_2 x_3} · … · a_{x_{T-1} x_T}
Summing over all possible state sequences:
P(O | λ) = Σ_X P(O | X, λ) · P(X | λ)
19
HMM – Evaluation (cont.)
Why isn't it efficient?
– For a given state sequence of length T we have about 2T calculations.
– Let N be the number of states in the graph; there are N^T possible state sequences.
– Complexity: O(2T · N^T)
– It can be done more efficiently by the forward-backward (F-B) procedure.
20
The Forward Procedure (Prefix Probabilities)
Define α_t(i) = P(o_1 o_2 … o_t, x_t = i | λ): the probability of being in state i after generating the first t observations.
29
The Forward Procedure
Initialization: α_1(i) = π_i · b_i(o_1)
Iteration: α_{t+1}(j) = b_j(o_{t+1}) · Σ_i α_t(i) · a_ij
Termination: P(O | λ) = Σ_i α_T(i)
Computational complexity: O(N^2 T)
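The recursion above can be written directly in NumPy. The following is a minimal, unscaled sketch (it will underflow for long sequences), reusing the `pi`, `A`, `B` arrays from the casino example:

```python
import numpy as np

def forward(obs, pi, A, B):
    """Forward (prefix) probabilities alpha and the likelihood P(O | model).

    obs is a list of observation symbols, e.g. die faces 1..6.
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0] - 1]                       # initialization
    for t in range(1, T):
        alpha[t] = B[:, obs[t] - 1] * (alpha[t - 1] @ A)   # iteration
    return alpha, alpha[-1].sum()                          # termination
```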
30
Another Version: The Backward Procedure (Suffix Probabilities)
Define β_t(i) = P(o_{t+1} … o_T | x_t = i, λ): the probability of generating the rest of the observation sequence, given that the state at time t is i.
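A matching sketch for the backward pass, under the same assumptions as the `forward` sketch above:

```python
import numpy as np

def backward(obs, pi, A, B):
    """Backward (suffix) probabilities beta_t(i) = P(o_{t+1..T} | x_t = i)."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1] - 1] * beta[t + 1])  # suffix recursion
    return beta
```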
31
Problem 2: Decoding Find the best state sequence
32
Decoding
Given an HMM and a new sequence of observations, find the most probable sequence of hidden states that generated these observations.
In general, there is an exponential number of possible state sequences. Dynamic programming reduces the computation to O(N^2 T).
33
Viterbi Algorithm
Define δ_t(j): the probability of the state sequence that maximizes the probability of seeing the observations up to time t-1, landing in state j, and seeing the observation at time t.
34
Viterbi Algorithm
Initialization: δ_1(j) = π_j · b_j(o_1)
35
Viterbi Algorithm
Recursion:
– Probability of the most likely path ending in state j: δ_t(j) = max_i δ_{t-1}(i) · a_ij · b_j(o_t)
– Name of the most likely predecessor state (backpointer): ψ_t(j) = argmax_i δ_{t-1}(i) · a_ij
36
Viterbi Algorithm
Termination: P* = max_j δ_T(j), x̂_T = argmax_j δ_T(j)
Then “read out” the most likely state sequence, working backwards: x̂_t = ψ_{t+1}(x̂_{t+1}).
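Putting the initialization, recursion, and termination together, here is a compact NumPy sketch of Viterbi decoding (unscaled; a practical version would work in log space), again using the casino `pi`, `A`, `B`:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state path for an observation sequence."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0] - 1]                      # initialization
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A                # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)                    # best predecessor of each state j
        delta[t] = scores.max(axis=0) * B[:, obs[t] - 1]  # recursion
    path = [int(delta[-1].argmax())]                      # termination
    for t in range(T - 1, 0, -1):                         # backtrace
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```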
37
Viterbi Training
Initialization: same as Baum-Welch.
Iteration:
– Perform Viterbi to find the optimal state sequence.
– Calculate ξ(i, j) and γ_i(t) according to the optimal state sequence.
– Calculate the new parameters A, B and π.
Until convergence.
Note: in general, Viterbi training gives worse performance than Baum-Welch.
38
Problem 3: Learning Re-estimate the parameters of the model based on training data
39
Learning by Parameter Estimation
Goal: given an observation sequence, find the model that is most likely to produce that sequence.
Problem: we don't know the relative frequencies of the hidden states that were visited, and no analytical solution is known for HMMs.
We will approach the solution by successive approximations.
40
The Baum-Welch Algorithm
– Find the expected frequencies of the possible values of the hidden variables.
– Compute the maximum likelihood distributions of the hidden variables (by normalizing, as usual for MLE).
– Repeat until “convergence.”
This is the Expectation-Maximization (EM) algorithm for parameter estimation, applicable in theory to any stochastic process. The special case for HMMs is called the Baum-Welch algorithm.
41
Arc and State Probabilities
Probability of traversing the arc from state i (at time t) to state j (at time t+1), given the observations:
ξ_t(i, j) = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / P(O | λ)
Probability of being in state i at time t, given the observations:
γ_t(i) = α_t(i) · β_t(i) / P(O | λ)
42
Aggregation and Normalization
Now we can compute the new maximum-likelihood estimates of the model parameters:
new π_i = γ_1(i)
new a_ij = Σ_t ξ_t(i, j) / Σ_t γ_t(i)
new b_i(k) = Σ_{t: o_t = k} γ_t(i) / Σ_t γ_t(i)
43
The Baum-Welch Algorithm
1. Initialize A, B and π (pick the best guess for the model parameters, or arbitrary values).
2. Repeat:
3. Calculate α and β (forward and backward passes).
4. Calculate ξ and γ.
5. Re-estimate π, A and B.
6. Until the changes are small enough.
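For illustration, one Baum-Welch re-estimation pass could be sketched as follows, built on the hypothetical `forward` and `backward` helpers above (no scaling, so only suitable for short sequences):

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One EM (Baum-Welch) update; returns re-estimated (pi, A, B)."""
    T, N = len(obs), len(pi)
    alpha, likelihood = forward(obs, pi, A, B)
    beta = backward(obs, pi, A, B)

    # E-step: gamma[t, i] = P(x_t = i | O), xi[t, i, j] = P(x_t = i, x_{t+1} = j | O)
    gamma = alpha * beta / likelihood
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A *
                 B[:, obs[t + 1] - 1] * beta[t + 1]) / likelihood

    # M-step: normalize the expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        seen = np.array(obs) == k + 1
        new_B[:, k] = gamma[seen].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```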
44
The Baum-Welch Algorithm – Comments
– Time complexity: (# iterations) × O(N^2 T).
– Each iteration is guaranteed to increase the (log) likelihood of the model: P(λ | O) = P(O, λ) / P(O) = P(O | λ) P(λ) / P(O).
– Not guaranteed to find the globally best parameters; it converges to a local optimum, depending on the initial conditions.
– Too many parameters / too large a model leads to overtraining.
45
Application: Automatic Speech Recognition
46
Example (1)
47
Example (2)
48
Example (3)
49
Example (4)
51
Phones
52
Speech Signal
[Figure: waveform and spectrogram]
53
Speech Signal (cont.)
[Figure: articulation]
54
Feature Extraction
[Figure: the speech signal is divided into frames; frame 1 and frame 2 are mapped to feature vectors X_1 and X_2.]
55
Simple Feature Extraction – LPC
Linear predictive coding models each speech sample as a weighted sum of the previous p samples. The speech difference equation for a p-th order filter:
s(n) ≈ Σ_{k=1}^{p} a_k · s(n − k)
We want to choose the a_k's that minimize the mean squared prediction error
E = Σ_n ( s(n) − Σ_{k=1}^{p} a_k · s(n − k) )^2
The a_k's form the feature vector.
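The slides do not specify how the a_k's are estimated; a common choice is the autocorrelation (Yule-Walker) method, sketched below as an assumption, using SciPy to solve the normal equations:

```python
import numpy as np
from scipy.linalg import toeplitz, solve

def lpc_coefficients(frame, p=10):
    """Estimate p LPC coefficients a_1..a_p for one windowed speech frame."""
    frame = np.asarray(frame, dtype=float)
    # Autocorrelation values r[0..p]
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(p + 1)])
    # Solve the Yule-Walker normal equations  R a = r[1..p]
    R = toeplitz(r[:p])
    a = solve(R, r[1:])
    return a          # the feature vector for this frame
```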