1 CS 552/652 Speech Recognition with Hidden Markov Models Winter 2011 Oregon Health & Science University Center for Spoken Language Understanding John-Paul Hosom Lecture 8 January 26 The Viterbi Search Algorithm
2 Framework for HMMs
What is the likelihood of an observation sequence and state sequence, given the model?
$P(O, q \mid \lambda) = P(O \mid q, \lambda)\, P(q \mid \lambda)$
What is the “best” valid state sequence from time 1 to time T, given the model?
At every time t, there are N possible states.
There are up to $N^T$ possible state sequences (for one second of speech with 3 states and one frame every 10 msec, T = 100 and $N^T = 3^{100} \approx 5 \times 10^{47}$ sequences): infeasible!
[Figure: tree enumerating every three-step sequence over the states a, b, c, from aaa, aab, aac through ccc.]
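To make the size of the search space concrete, here is a small calculation. It assumes T = 100 frames for one second of speech (one frame every 10 msec, as in the project description); the variable names are illustrative only.

```python
# Count the state sequences a brute-force search would have to score,
# and compare with the N*N*T order of work of a dynamic-programming search.
N = 3    # number of states
T = 100  # frames in one second, assuming one frame every 10 msec

num_sequences = N ** T   # every possible state sequence of length T
viterbi_ops = N * N * T  # order-of-magnitude operation count, N^2 * T

print(f"state sequences: {num_sequences:.3e}")
print(f"N^2 * T:         {viterbi_ops}")
```

The first number has 48 digits, which is why enumerating paths is not an option even for a three-state model.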
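3 Viterbi Search: Formula
Question 1: What is the best score along a single path, up to time t, ending in state j?
Use an inductive procedure. The best sequence is defined as:
$\delta_t(j) = \max_{q_1, \ldots, q_{t-1}} P(q_1 q_2 \ldots q_{t-1},\ q_t = j,\ o_1 o_2 \ldots o_t \mid \lambda)$
First iteration (t=1):
$\delta_1(j) = P(q_1 = j,\ o_1 \mid \lambda) = P(q_1 = j \mid \lambda)\, P(o_1 \mid q_1 = j, \lambda) = \pi_j\, b_j(o_1)$
4 Viterbi Search: Formula
In general, for any value of t:
$\delta_t(j) = \max_{q_1, \ldots, q_{t-1}} \left[ P(q_1 \ldots q_{t-1},\ o_1 \ldots o_{t-1} \mid \lambda) \cdot P(q_t = j,\ o_t \mid q_1 \ldots q_{t-1},\ o_1 \ldots o_{t-1},\ \lambda) \right]$
Change notation to call state $q_{t-1}$ by the variable name “i”:
$\delta_t(j) = \max_{i}\ \max_{q_1, \ldots, q_{t-2}} \left[ P(q_1 \ldots q_{t-2},\ q_{t-1} = i,\ o_1 \ldots o_{t-1} \mid \lambda) \cdot P(q_t = j,\ o_t \mid q_1 \ldots q_{t-2},\ q_{t-1} = i,\ o_1 \ldots o_{t-1},\ \lambda) \right]$
The first term, maximized over $q_1, \ldots, q_{t-2}$, now equals $\delta_{t-1}(i)$.
5 Viterbi Search: Formula
In general, for any value of t (continued): now make the first-order Markov assumption, and the assumption that $p(o_t)$ depends only on the current state j and the model $\lambda$:
$P(q_t = j,\ o_t \mid q_1 \ldots q_{t-2},\ q_{t-1} = i,\ o_1 \ldots o_{t-1},\ \lambda) = P(q_t = j \mid q_{t-1} = i, \lambda)\, P(o_t \mid q_t = j, \lambda) = a_{ij}\, b_j(o_t)$
$q_1$ through $q_{t-2}$ have been removed from the equation (they are implicit in $\delta_{t-1}(i)$):
$\delta_t(j) = \max_{i} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t)$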
6 Viterbi Search: Formula
Keep in memory only $\delta_{t-1}(j)$ for all j.
For each time t and state j, need (N multiply and compare) + (1 multiply).
For each time t, need N × ((N multiply and compare) + (1 multiply)).
To find the best path, need $O(N^2 T)$ operations.
This is much better than evaluating $N^T$ possible paths, especially for large T!
Viterbi, A. J. “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm.” IEEE Transactions on Information Theory 13 (2), Apr. 1967, pp. 260–269.
Forney, G. D. “The Viterbi algorithm.” Proc. IEEE 61 (3), Mar. 1973, pp. 268–278.
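The recursion and its memory saving can be sketched as follows. This is a minimal illustration, not the project template: the two-state model and the function names are hypothetical, and the brute-force routine is included only to check that both approaches agree on the best score.

```python
from itertools import product

def best_score_viterbi(pi, a, b, obs):
    """Best single-path score, keeping only the previous column of deltas."""
    n = len(pi)
    prev = [pi[j] * b[j][obs[0]] for j in range(n)]  # delta_1(j)
    for o in obs[1:]:
        # delta_t(j) = max_i[delta_{t-1}(i) * a_ij] * b_j(o_t)
        prev = [max(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                for j in range(n)]
    return max(prev)

def best_score_brute_force(pi, a, b, obs):
    """Score every one of the N**T state sequences and keep the best."""
    n, best = len(pi), 0.0
    for q in product(range(n), repeat=len(obs)):
        p = pi[q[0]] * b[q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[q[t - 1]][q[t]] * b[q[t]][obs[t]]
        best = max(best, p)
    return best

# Hypothetical toy model: 2 states, 2 observation symbols (0 and 1).
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]   # a[i][j] = P(next state j | current state i)
b = [[0.9, 0.1], [0.2, 0.8]]   # b[j][o] = P(symbol o | state j)
obs = [0, 1, 1, 0]

fast = best_score_viterbi(pi, a, b, obs)
slow = best_score_brute_force(pi, a, b, obs)
print(fast, slow)
```

The inner `max` over i is the "N multiply and compare" step, and the multiplication by `b[j][o]` is the "1 multiply" step for each state j.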
7 Viterbi Search Project
Second project: Given an existing HMM, implement a Viterbi search to find the likelihood of an utterance and the best state sequence.
“Template” code is available to read in features, read in HMM values, and provide some context and a starting point.
The features given to you are “real,” in that they are 7 PLP coefficients plus 7 delta values from utterances of “yes” and “no”, sampled every 10 msec.
Also given to you is the logAdd() function, but you must implement the multi-dimensional GMM code (see formula from Lecture 5, slides 29-30). Assume a diagonal covariance matrix.
All necessary files (template, HMM, speech data files) are located on the class web site.
8 Viterbi Search Project “Search” files with HMMs for “yes” and “no”, and print out final likelihood scores and most likely state sequences: input1.txt hmm_yes.10 input1.txt hmm_no.10 input2.txt hmm_yes.10 input2.txt hmm_no.10 input3.txt hmm_yes.10 input3.txt hmm_no.10 Then, use results to perform ASR… (1) is input1.txt more likely to be “yes” or “no”? (2) is input2.txt more likely to be “yes” or “no”? (3) is input3.txt more likely to be “yes” or “no”? Due on February 14th; send your source code and results (including final scores and most likely state sequences) to hosom at cslu ogi edu; late responses generally not accepted.
9 Viterbi Search Project
Assume that any state can follow any other state. Also assume that this is a whole-word recognizer, and that each word is recognized with a separate execution of the program. Both assumptions will greatly simplify the implementation.
Print out both the score for the utterance and the most likely state sequence from t=1 to T.
When you read in the HMM, it will say that there are 7 states. The first state and the last state are “NULL” states, in that they don't emit any observations. A NULL state is entered into and exited from at the same time frame. NULL states are used to simplify the implementation of connecting HMMs to form words and sentences.
10 Viterbi Search Project
For this project, the NULL state at the beginning of the HMM is used to define the $\pi$ values. In other words, at time zero, instead of using $\pi_j$ as the probability of starting out in a given state, use the probability of transitioning from the first NULL state to the given state.
For the end of the utterance, you may choose one of three options:
(1) constrain the Viterbi search so that it must end at time T in the last non-NULL state;
(2) constrain the code so that there must be a transition from a non-NULL state into the final NULL state after time T;
(3) ignore the final NULL state and select the best state at time T.
Whichever option you choose, it needs to be implemented in the code, and cannot be specified using $\pi$ or transition probabilities.
Other than that, feel free to ignore the NULL states, and in the main loop of the Viterbi search, only consider the middle 5 states.
11 Viterbi Search Project
The transition probabilities are in the log domain.
The mixture weights are NOT in the log domain.
The means are NOT in the log domain.
The covariance values are NOT in the log domain. These covariance values are the diagonal of a 14 × 14 matrix.
The natural log (base e) is used when computing log values.
When I run my code on inputs 1, 2, and 3, I get values that are greater than and less than
The correct answer does not always yield a positive value. The incorrect answer is always negative.
Implement the GMM to be able to use multiple components, even though the HMMs you’ll be using have only one component.
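As a rough sketch of what a diagonal-covariance, multi-component GMM score looks like in the log domain (not the template's code): `log_add` below is a stand-in written for this example and is not the course's logAdd() function, and the argument layout of `gmm_log_likelihood` is an assumption. The covariance values are treated as per-dimension variances.

```python
import math

NEG_INF = float("-inf")

def log_add(log_x, log_y):
    """Return log(exp(log_x) + exp(log_y)) without leaving the log domain."""
    if log_x < log_y:
        log_x, log_y = log_y, log_x
    if log_y == NEG_INF:
        return log_x
    return log_x + math.log1p(math.exp(log_y - log_x))

def gmm_log_likelihood(x, weights, means, variances):
    """log of sum_m w_m * N(x; mu_m, diag(var_m)), natural log.

    weights are plain probabilities; means[m] and variances[m] are
    per-dimension lists with the same length as x.
    """
    total = NEG_INF
    for w, mu, var in zip(weights, means, variances):
        if w <= 0.0:
            continue  # a zero-weight component contributes nothing
        log_gauss = 0.0
        for xd, md, vd in zip(x, mu, var):
            # log of a 1-D Gaussian; dimensions are independent (diagonal covariance)
            log_gauss += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        total = log_add(total, math.log(w) + log_gauss)
    return total

# Hypothetical 2-dimensional feature vector and single-component model.
x = [0.5, -1.0]
log_lik = gmm_log_likelihood(x, [1.0], [[0.0, 0.0]], [[1.0, 4.0]])
print(log_lik)
```

Because these are densities rather than probabilities, a log value can be positive when the variances are small, so a positive score is not by itself a sign of a bug.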
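12 Viterbi Search: Formula Example
Given the following model of the weather (states indicate barometer pressure):
[Figure: three-state HMM with states H, M, L. The transition values that appear in the worked steps are $a_{MM} = 0.4$, $a_{HM} = 0.2$, $a_{LM} = 0.3$, $a_{HH} = 0.7$, $a_{ML} = 0.3$, $a_{LL} = 0.6$; since each row sums to 1, the remaining values are $a_{MH} = 0.3$, $a_{HL} = 0.1$, $a_{LH} = 0.1$.]
            state M   state H   state L
P(sun)      0.50      0.75      0.25
P(rain)     0.50      0.25      0.75
$\pi_M = 0.50$, $\pi_H = 0.20$, $\pi_L = 0.30$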
13 Viterbi Search: Formula Example
If the observation sequence is s s r s r (s=sun, r=rain), what is the best score (maximum probability) at time t=2 for state M?
Step 1: what are $\delta_1(j)$ for all j?
$\delta_1(M) = 0.50 \cdot 0.50 = 0.25$
$\delta_1(H) = 0.20 \cdot 0.75 = 0.15$
$\delta_1(L) = 0.30 \cdot 0.25 = 0.075$
Step 2: what is $\max_i [\delta_1(i)\, a_{ij}]$ ending in state j=M?
$M_1, M_2 = 0.25 \cdot 0.4 = 0.10$
$H_1, M_2 = 0.15 \cdot 0.2 = 0.03$
$L_1, M_2 = 0.075 \cdot 0.3 = 0.0225$
$\max_M = 0.10$
14 Viterbi Search: Formula Example
Step 3: what is $\delta_2(M)$?
$\delta_2(M) = 0.10 \cdot 0.50 = 0.05$
Answer: the best score at time 2, ending in state M, is 0.05.
Next question: what’s the best score at time t=2 for all states?
Answer: compute the best scores for all states at time t, then take the max score.
15 Viterbi Search: Formula Example
If the observation sequence is s s r s r (s=sun, r=rain), what is the best score at time t=2 over all states?
Step 1: what are $\delta_1(j)$ for all j?
$\delta_1(M) = 0.50 \cdot 0.50 = 0.25$
$\delta_1(H) = 0.20 \cdot 0.75 = 0.15$
$\delta_1(L) = 0.30 \cdot 0.25 = 0.075$
Step 2: what are $\max_i [\delta_1(i)\, a_{ij}]$ for each state j?
$M_1, M_2 = 0.10$      $M_1, H_2 = 0.075$     $M_1, L_2 = 0.075$
$H_1, M_2 = 0.03$      $H_1, H_2 = 0.105$     $H_1, L_2 = 0.015$
$L_1, M_2 = 0.0225$    $L_1, H_2 = 0.0075$    $L_1, L_2 = 0.045$
$\max_M = 0.10$, $\max_H = 0.105$, $\max_L = 0.075$
16 Viterbi Search: Formula Example
Step 3: what is $\delta_2(j)$ for all j?
$\delta_2(M) = 0.10 \cdot 0.50 = 0.05$
$\delta_2(H) = 0.105 \cdot 0.75 = 0.07875$
$\delta_2(L) = 0.075 \cdot 0.25 = 0.01875$
Step 4: take the maximum score at time 2: $\max_i(\delta_2(i)) = 0.07875$
Answer: the best score at time 2 is 0.07875 (about 0.079), in state H.
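The four steps can be checked with a few lines of code. This is a sketch under one assumption: the transition table below is pieced together from the factors that appear in the worked steps (0.4, 0.2, 0.3, 0.7, 0.6), with the remaining entries chosen so that each row sums to 1.

```python
STATES = ("M", "H", "L")
PI = {"M": 0.50, "H": 0.20, "L": 0.30}
P_SUN = {"M": 0.50, "H": 0.75, "L": 0.25}
# A[i][j] = P(state j at t | state i at t-1); partly inferred, see lead-in.
A = {
    "M": {"M": 0.4, "H": 0.3, "L": 0.3},
    "H": {"M": 0.2, "H": 0.7, "L": 0.1},
    "L": {"M": 0.3, "H": 0.1, "L": 0.6},
}

# Step 1: delta_1(j) = pi_j * b_j(sun)
delta1 = {j: PI[j] * P_SUN[j] for j in STATES}
# Step 2: best incoming score max_i[delta_1(i) * a_ij] for each state j
best_in = {j: max(delta1[i] * A[i][j] for i in STATES) for j in STATES}
# Step 3: delta_2(j) = best incoming score * b_j(sun)
delta2 = {j: best_in[j] * P_SUN[j] for j in STATES}
# Step 4: best score over all states at t=2
best_state = max(delta2, key=delta2.get)

print(delta1)
print(best_in)
print(delta2, best_state)
```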
17 Viterbi Search: Algorithm
Question 2: What is the best state sequence along a single path, up to time t?
Need to keep track of the best path up to time t, for each time and state:
$\psi_t(j)$ = best state prior to state j at time t. ($\psi$ = psi)
Can use $\psi_t(j)$ to trace back the best path, from time T to 1.
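18 Viterbi Search: Algorithm
(1) Initialization:
$\delta_1(j) = \pi_j\, b_j(o_1)$, $\psi_1(j) = 0$, for $1 \le j \le N$
(2) Recursion:
$\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t)$
$\psi_t(j) = \arg\max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right]$, for $2 \le t \le T$, $1 \le j \le N$
19 Viterbi Search: Algorithm
(3) Termination:
$P^* = \max_{1 \le j \le N} \left[ \delta_T(j) \right]$
$q^*_T = \arg\max_{1 \le j \le N} \left[ \delta_T(j) \right]$
(4) Backtracking:
$q^*_t = \psi_{t+1}(q^*_{t+1})$, for $t = T-1, T-2, \ldots, 1$
Note 1: Usually this algorithm is done in the log domain, to avoid underflow errors.
Note 2: This assumes that any state is a valid end-of-utterance state. If only some states are valid end-of-utterance states, then the maximization occurs over only those states.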
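A compact log-domain version of the four steps might look like the sketch below. The function name, the `end_states` argument (covering Note 2), and the two-state toy model are all assumptions made for illustration; emission scores are passed in already evaluated per frame.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps probability 0 to -infinity instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF

def viterbi_log(log_pi, log_a, log_b, end_states=None):
    """log_b[t][j] = log b_j(o_t). Returns (log P*, best path as state indices)."""
    n = len(log_pi)
    # (1) Initialization
    delta = [log_pi[j] + log_b[0][j] for j in range(n)]
    psi = []  # psi[k][j] = best predecessor of state j at frame k + 1 (0-based)
    # (2) Recursion: products become sums in the log domain
    for frame in log_b[1:]:
        back = [max(range(n), key=lambda i: delta[i] + log_a[i][j]) for j in range(n)]
        delta = [delta[back[j]] + log_a[back[j]][j] + frame[j] for j in range(n)]
        psi.append(back)
    # (3) Termination, optionally restricted to valid end-of-utterance states
    candidates = range(n) if end_states is None else end_states
    last = max(candidates, key=lambda j: delta[j])
    # (4) Backtracking
    path = [last]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return delta[last], path[::-1]

# Hypothetical toy model: 2 states, observation symbols 0 and 1.
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

log_pi = [safe_log(p) for p in pi]
log_a = [[safe_log(p) for p in row] for row in a]
log_b = [[safe_log(b[j][o]) for j in range(len(pi))] for o in obs]

log_p_star, best_path = viterbi_log(log_pi, log_a, log_b)
print(log_p_star, best_path)
```

Passing `end_states=[1]`, for example, restricts the termination step to state 1 only, which is one way to express the constraint described in Note 2.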
20 Viterbi Search: Algorithm Example
If the observation sequence is s s r s r (s=sun, r=rain), what is the best state sequence at time t=2 over all states?
Step 1: what are $\delta_1(j)$, $\psi_1(j)$ for all j?
$\delta_1(M) = 0.50 \cdot 0.50 = 0.25$, $\psi_1(M) = 0$
$\delta_1(H) = 0.20 \cdot 0.75 = 0.15$, $\psi_1(H) = 0$
$\delta_1(L) = 0.30 \cdot 0.25 = 0.075$, $\psi_1(L) = 0$
Step 2: what are $\delta_2(j)$, $\psi_2(j)$ for all j? Path scores $\delta_1(i)\, a_{ij}\, b_j(o_2)$:
$M_1, M_2 = 0.05$       $M_1, H_2 = 0.05625$     $M_1, L_2 = 0.01875$
$H_1, M_2 = 0.015$      $H_1, H_2 = 0.07875$     $H_1, L_2 = 0.00375$
$L_1, M_2 = 0.01125$    $L_1, H_2 = 0.005625$    $L_1, L_2 = 0.01125$
$\delta_2(M) = 0.05$, $\psi_2(M) = M$
$\delta_2(H) = 0.07875$, $\psi_2(H) = H$
$\delta_2(L) = 0.01875$, $\psi_2(L) = M$
(can take the maximum before multiplying by $b_j(o_t)$ to save time)
21 Viterbi Search: Algorithm Example
Step 3: what are $P^*$ and $q^*_T$?
$P^* = 0.07875$, $q^*_2 = H$
Step 4: backtracking:
$q^*_1 = \psi_2(q^*_2) = \psi_2(H) = H$
Answer: the best state sequence is H H.
22 Viterbi Search: Algorithm Example
NOTE:
The best state sequence up to t=2 is H H.
The best state at t=1, given observations up to t=2, is H.
The best state at t=1, given observations up to t=1, is M.
Same time, different “best” sequences! The “best” state sequence can change as t increases (that’s why we need to backtrack).
$\delta_t(j)$ is the same for all times greater than t, but $a_{ij}$ may change the result of the “max” operation at t+1.
If $a_{ij}$ is the same for all states, finding the best state sequence does not require backtracking!
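The effect is easy to see by running the search on growing prefixes of the observation sequence. The sketch below assumes the weather model as used in the worked steps, with the transition entries that do not appear explicitly filled in so that each row sums to 1; `best_path` is a hypothetical helper name.

```python
STATES = ("M", "H", "L")
PI = {"M": 0.50, "H": 0.20, "L": 0.30}
# B[symbol][state]: s = sun, r = rain
B = {"s": {"M": 0.50, "H": 0.75, "L": 0.25},
     "r": {"M": 0.50, "H": 0.25, "L": 0.75}}
# A[i][j] = P(state j at t | state i at t-1); partly inferred from row sums.
A = {"M": {"M": 0.4, "H": 0.3, "L": 0.3},
     "H": {"M": 0.2, "H": 0.7, "L": 0.1},
     "L": {"M": 0.3, "H": 0.1, "L": 0.6}}

def best_path(obs):
    """Viterbi search with backtracking; returns the most likely state sequence."""
    delta = {j: PI[j] * B[obs[0]][j] for j in STATES}
    psi = []  # one dict of best predecessors per time step after the first
    for o in obs[1:]:
        back = {j: max(STATES, key=lambda i: delta[i] * A[i][j]) for j in STATES}
        delta = {j: delta[back[j]] * A[back[j]][j] * B[o][j] for j in STATES}
        psi.append(back)
    path = [max(STATES, key=lambda j: delta[j])]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return path[::-1]

# The first state of the best path changes once a second observation arrives.
for T in range(1, 6):
    print(T, " ".join(best_path("ssrsr"[:T])))
```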
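23 Viterbi Search: Algorithm Example
Given the following model of the weather (states indicate barometer pressure):
[Figure: three-state HMM with states H, M, L, in which all transition probabilities are equal.]
            state M   state H   state L
P(sun)      0.50      0.75      0.25
P(rain)     0.50      0.25      0.75
$\pi_M = 0.33$, $\pi_H = 0.33$, $\pi_L = 0.33$
24 Viterbi Search: Algorithm Example
If the observation sequence is s s r s r (s=sun, r=rain), what is the best state sequence? Omit the initial and transition probabilities, since they are all the same.
t   obs   $\delta_t(M)$    $\delta_t(H)$     $\delta_t(L)$
1   s     0.5              0.75              0.25
2   s     0.75·0.5         0.75·0.75         0.75·0.25
3   r     0.5625·0.5       0.5625·0.25       0.5625·0.75
4   s     0.4219·0.5       0.4219·0.75       0.4219·0.25
5   r     0.3164·0.5       0.3164·0.25       0.3164·0.75
In each row the leading factor is the largest value from the previous row, so the largest entry at each time is simply the state with the highest emission probability: H H L H L.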
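When the initial and transition probabilities are identical for every state, the search reduces to picking the most likely state independently at each frame. A minimal sketch of that shortcut (variable names are illustrative):

```python
STATES = ("M", "H", "L")
# EMISSION[symbol][state]: s = sun, r = rain
EMISSION = {"s": {"M": 0.50, "H": 0.75, "L": 0.25},
            "r": {"M": 0.50, "H": 0.25, "L": 0.75}}

obs = "ssrsr"
# With uniform initial and transition probabilities, no backtracking is needed:
# the best state at each time is the one with the largest b_j(o_t).
path = [max(STATES, key=lambda j: EMISSION[o][j]) for o in obs]
print(" ".join(path))
```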
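25 Viterbi Search: Algorithm Example
Given the original model of the weather again (states indicate barometer pressure), with non-uniform initial and transition probabilities:
            state M   state H   state L
P(sun)      0.50      0.75      0.25
P(rain)     0.50      0.25      0.75
$\pi_M = 0.50$, $\pi_H = 0.20$, $\pi_L = 0.30$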
26 Viterbi Search: Algorithm Example
If the observation sequence is s s r s r (s=sun, r=rain), what is the best state sequence? Can’t omit the initial and transition probabilities this time.
Each entry is (best previous $\delta$) · (transition probability) · (emission probability):
t   obs   $\delta_t(M)$      $\delta_t(H)$       $\delta_t(L)$
1   s     0.5·0.5            0.2·0.75            0.3·0.25
2   s     0.25·0.4·0.5       0.15·0.7·0.75       0.25·0.3·0.25
3   r     0.05·0.4·0.5       0.08·0.7·0.25       0.05·0.3·0.75
4   s     0.010·0.4·0.5      0.014·0.7·0.75      0.011·0.6·0.25
5   r     0.007·0.2·0.5      0.007·0.7·0.25      0.0017·0.6·0.75
The largest value at t=5 is $\delta_5(H)$, and every H entry was reached from H, so the best state sequence is H H H H H.
27 Viterbi Search: Algorithm Example
The backtrace for $\delta_5(L)$ is also interesting. Reading the predecessors off the same products: $\delta_5(L)$ and $\delta_4(L)$ were both reached from L (factor 0.6), $\delta_3(L) = 0.05 \cdot 0.3 \cdot 0.75$ was reached from M at t=2, and $\delta_2(M) = 0.25 \cdot 0.4 \cdot 0.5$ was reached from M at t=1. The best path ending in L at t=5 is therefore M M L L L, which shares no state with the overall best path.
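The full five-step trellis and the backtrace from each final state can be reproduced with the sketch below. As before, the transition table is an assumption assembled from the factors visible in the worked products, with the remaining entries set so that each row sums to 1; the function names are hypothetical.

```python
STATES = ("M", "H", "L")
PI = {"M": 0.50, "H": 0.20, "L": 0.30}
B = {"s": {"M": 0.50, "H": 0.75, "L": 0.25},   # s = sun
     "r": {"M": 0.50, "H": 0.25, "L": 0.75}}   # r = rain
A = {"M": {"M": 0.4, "H": 0.3, "L": 0.3},      # partly inferred, see lead-in
     "H": {"M": 0.2, "H": 0.7, "L": 0.1},
     "L": {"M": 0.3, "H": 0.1, "L": 0.6}}

def trellis(obs):
    """Return the full lists of delta and psi columns, one per time step."""
    delta = [{j: PI[j] * B[obs[0]][j] for j in STATES}]
    psi = [{j: None for j in STATES}]  # no predecessor at t=1
    for o in obs[1:]:
        prev = delta[-1]
        back = {j: max(STATES, key=lambda i: prev[i] * A[i][j]) for j in STATES}
        psi.append(back)
        delta.append({j: prev[back[j]] * A[back[j]][j] * B[o][j] for j in STATES})
    return delta, psi

def backtrace(psi, end_state):
    """Follow the psi pointers from end_state at the last time step back to t=1."""
    path = [end_state]
    for back in reversed(psi[1:]):
        path.append(back[path[-1]])
    return path[::-1]

delta, psi = trellis("ssrsr")
paths = {j: backtrace(psi, j) for j in STATES}
best_end = max(STATES, key=lambda j: delta[-1][j])

for j in STATES:
    print(j, f"{delta[-1][j]:.6f}", " ".join(paths[j]))
print("best:", " ".join(paths[best_end]))
```

Keeping every column of `psi` is what makes it possible to trace back from any final state, not only the best one.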
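28 Viterbi Search: Speech Example
Example: “hi”, modeled with the context-dependent phones sil-h+ay and h-ay+sil
observed features: $O = \{o_1, o_2, o_3\}$
What is the best state sequence for O, given $\lambda$?
[Figure: two-state HMM with states labeled A and B.]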
29 Viterbi Search: Speech Example
1. initialization:
$\delta_1(A) = 1.0 \cdot 0.76 = 0.76$, $\delta_1(B) = 0.0 \cdot 0.28 = 0.0$
$\psi_1(A) = 0$, $\psi_1(B) = 0$
2. recursion at time = 2:
$\delta_2(A) = 0.76 \cdot 0.3 \cdot 0.45 \approx 0.10$, $\delta_2(B) = 0.76 \cdot 0.7 \cdot 0.38 \approx 0.20$
$\psi_2(A) = A$, $\psi_2(B) = A$
3. recursion at time = 3:
$\delta_3(A) = 0.10 \cdot 0.3 \cdot 0.51 \approx 0.015$, $\delta_3(B) = 0.20 \cdot 1.0 \cdot 0.51 \approx 0.10$
$\psi_3(A) = A$, $\psi_3(B) = B$
30 Viterbi Search: Speech Example
4. termination:
$P^* = \max_j [\delta_3(j)] \approx 0.10$
$q^*_T = \arg\max_j [\delta_3(j)] = B$
5. path backtracking:
$q^*_2 = \psi_3(q^*_3) = \psi_3(B) = B$
$q^*_1 = \psi_2(q^*_2) = \psi_2(B) = A$
Answer: the best state sequence is A B B.
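The same five steps can be run in code using the numbers from this example. The sketch assumes $a_{BA} = 0.0$ (implied by $a_{BB} = 1.0$) and takes the emission values per frame directly from the products shown; the function name is hypothetical. Because the code does not round intermediate values, its final score differs slightly from a hand calculation with rounded deltas.

```python
def viterbi(pi, a, b_obs):
    """b_obs[t][j] = b_j(o_t). Returns (P*, best path as state indices)."""
    n = len(pi)
    delta = [pi[j] * b_obs[0][j] for j in range(n)]           # initialization
    psi = []
    for frame in b_obs[1:]:                                   # recursion
        back = [max(range(n), key=lambda i: delta[i] * a[i][j]) for j in range(n)]
        delta = [delta[back[j]] * a[back[j]][j] * frame[j] for j in range(n)]
        psi.append(back)
    last = max(range(n), key=lambda j: delta[j])              # termination
    path = [last]
    for back in reversed(psi):                                # backtracking
        path.append(back[path[-1]])
    return delta[last], path[::-1]

names = "AB"
pi = [1.0, 0.0]
a = [[0.3, 0.7],
     [0.0, 1.0]]              # a_BA = 0.0 is assumed from a_BB = 1.0
b_obs = [[0.76, 0.28],        # b_A(o_1), b_B(o_1)
         [0.45, 0.38],        # b_A(o_2), b_B(o_2)
         [0.51, 0.51]]        # b_A(o_3), b_B(o_3)

p_star, path = viterbi(pi, a, b_obs)
labels = [names[j] for j in path]
print(" ".join(labels), p_star)
```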
31 Viterbi Search: Yet Another Example (part 1)
[Figure: two-state HMM and its trellis from t=0 to t=4. The two states emit with P(A)=0.7, P(B)=0.3 and P(A)=0.1, P(B)=0.9 respectively; $\pi_1 = 0.5$, $\pi_2 = 0.5$. Transition factors shown in the trellis: 0.4, 0.6, 0.2, 0.8.]
obs. = A A B A
best state sequence =
32 Viterbi Search: Yet Another Example (part 2)
[Figure: the same two-state HMM and trellis from t=0 to t=4, with the same emission and initial probabilities. Transition factors shown in the trellis: 0.9, 0.1, 0.2, 0.8.]
obs. = A A B A
best state sequence =