1 CS 552/652 Speech Recognition with Hidden Markov Models Winter 2011 Oregon Health & Science University Center for Spoken Language Understanding John-Paul.

Presentation transcript:

1 CS 552/652 Speech Recognition with Hidden Markov Models Winter 2011 Oregon Health & Science University Center for Spoken Language Understanding John-Paul Hosom Lecture 8 January 26 The Viterbi Search Algorithm

2 Framework for HMMs
What is the likelihood of an observation sequence and state sequence, given the model?
$P(O, q \mid \lambda) = P(O \mid q, \lambda)\, P(q \mid \lambda)$
What is the "best" valid state sequence from time 1 to time T, given the model?
At every time t, there are N possible states, so there are up to $N^T$ possible state sequences. For one second of speech (100 frames at 10 msec per frame) with 3 states, $N^T = 3^{100} \approx 5 \times 10^{47}$ sequences: infeasible!
[Figure: tree of all state sequences over the states a, b, c for three time steps, from aaa through ccc]
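To make the joint likelihood and the cost of exhaustive search concrete, here is a minimal Python sketch. The two-state model and all of its numbers are invented for illustration and do not come from the lecture; `joint_prob` and `brute_force_best` are hypothetical helper names.

```python
from itertools import product

# Hypothetical 2-state, 2-symbol model (values invented for illustration).
pi = [0.6, 0.4]                   # pi[j]   = P(q_1 = j)
A = [[0.7, 0.3], [0.4, 0.6]]      # A[i][j] = P(q_t = j | q_{t-1} = i)
B = [[0.9, 0.1], [0.2, 0.8]]      # B[j][k] = P(o_t = k | q_t = j)


def joint_prob(obs, path):
    """P(O, q | lambda) = P(O | q, lambda) * P(q | lambda)."""
    p_path = pi[path[0]]
    for prev, cur in zip(path, path[1:]):
        p_path *= A[prev][cur]
    p_obs = 1.0
    for state, symbol in zip(path, obs):
        p_obs *= B[state][symbol]
    return p_obs * p_path


def brute_force_best(obs):
    """Score every one of the N**T state sequences and keep the best."""
    n_states = len(pi)
    all_paths = product(range(n_states), repeat=len(obs))
    return max(all_paths, key=lambda q: joint_prob(obs, q))


obs = [0, 0, 1]
best_path = brute_force_best(obs)           # (0, 0, 1) for this toy model
best_score = joint_prob(obs, best_path)
n_paths = len(pi) ** len(obs)               # 2**3 = 8 sequences to score
```

Exhaustive search is fine for 8 sequences, but the number of sequences grows exponentially with T, which is what motivates the Viterbi search.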

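3 Viterbi Search: Formula
Question 1: What is the best score along a single path, up to time t, ending in state j?
Use an inductive procedure. Best sequence defined as:
$\delta_t(j) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 q_2 \ldots q_{t-1},\, q_t = j,\, o_1 o_2 \ldots o_t \mid \lambda)$
First iteration (t=1):
$\delta_1(j) = P(q_1 = j,\, o_1 \mid \lambda) = \pi_j\, b_j(o_1)$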

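4 Viterbi Search: Formula
In general, for any value of t:
$\delta_t(j) = \max_{q_1, \ldots, q_{t-1}} P(q_1 \ldots q_{t-1},\, q_t = j,\, o_1 \ldots o_t \mid \lambda)$
Change notation so that state $q_{t-1}$ is called by the variable name "i":
$\delta_t(j) = \max_i \max_{q_1, \ldots, q_{t-2}} \left[ P(q_1 \ldots q_{t-2},\, q_{t-1} = i,\, o_1 \ldots o_{t-1} \mid \lambda) \cdot P(q_t = j,\, o_t \mid q_1 \ldots q_{t-2},\, q_{t-1} = i,\, o_1 \ldots o_{t-1},\, \lambda) \right]$
Maximized over $q_1 \ldots q_{t-2}$, the first term equals $\delta_{t-1}(i)$.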

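5 Viterbi Search: Formula
In general, for any value of t (continued):
Now make the first-order Markov assumption, and the assumption that $p(o_t)$ depends only on the current state j and the model $\lambda$:
$P(q_t = j,\, o_t \mid q_1 \ldots q_{t-2},\, q_{t-1} = i,\, o_1 \ldots o_{t-1},\, \lambda) = P(q_t = j \mid q_{t-1} = i,\, \lambda)\, P(o_t \mid q_t = j,\, \lambda) = a_{ij}\, b_j(o_t)$
$q_1$ through $q_{t-2}$ have been removed from the equation (they are implicit in $\delta_{t-1}(i)$):
$\delta_t(j) = \max_i \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t)$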

6 Viterbi Search: Formula
Keep in memory only $\delta_{t-1}(j)$ for all j.
For each time t and state j, need (N multiply and compare) + (1 multiply).
For each time t, need N × ((N multiply and compare) + (1 multiply)).
To find the best path, need $O(N^2 T)$ operations. This is much better than scoring $N^T$ possible paths, especially for large T!
Viterbi, A. J. "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm." IEEE Transactions on Information Theory 13 (2), Apr. 1967, pp. 260–269.
Forney, G. D. "The Viterbi algorithm." Proc. IEEE 61 (3), Mar. 1973, pp. 268–278.
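The memory and operation counts above can be sketched in Python as follows. The two-state model is invented for illustration (it is not from the lecture), and `viterbi_score` is a hypothetical name; the function keeps only the previous column of scores, as the slide suggests.

```python
# Hypothetical 2-state, 2-symbol model (values invented for illustration).
pi = [0.6, 0.4]                   # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]      # A[i][j] = transition probability i -> j
B = [[0.9, 0.1], [0.2, 0.8]]      # B[j][k] = probability of symbol k in state j


def viterbi_score(pi, A, B, obs):
    """Best single-path score, storing only delta_{t-1}(j) for all j."""
    n = len(pi)
    prev = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        # For each state j: N multiply-and-compare steps, then 1 multiply.
        prev = [max(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                for j in range(n)]
    return max(prev)


score = viterbi_score(pi, A, B, [0, 0, 1])

# Rough operation counts for N = 3 states and T = 100 frames.
N, T = 3, 100
exhaustive_paths = N ** T         # about 5 x 10**47 state sequences
viterbi_ops = N * N * T           # 900 multiply-and-compare steps
```

This version returns only the score; recovering the state sequence needs the back-pointers introduced later in the lecture.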

7 Viterbi Search Project
Second project: Given an existing HMM, implement a Viterbi search to find the likelihood of an utterance and the best state sequence.
"Template" code is available to read in features, read in HMM values, and provide some context and a starting point.
The features given to you are "real," in that they are 7 PLP coefficients plus 7 delta values from utterances of "yes" and "no", sampled every 10 msec.
Also given to you is the logAdd() function, but you must implement the multi-dimensional GMM code (see the formula from Lecture 5, slides 29-30). Assume a diagonal covariance matrix.
All necessary files (template, HMM, speech data files) are located on the class web site.
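As a rough illustration of the GMM computation the project asks for, here is a Python sketch of a diagonal-covariance GMM log-likelihood. It is not the course template: `log_add` is a stand-in written here (the signature of the provided logAdd() is not shown in the slides), and the argument layout of `gmm_log_likelihood` is an assumption.

```python
import math


def log_add(log_a, log_b):
    """Return log(exp(log_a) + exp(log_b)) while staying in the log domain."""
    if log_a < log_b:
        log_a, log_b = log_b, log_a
    if log_b == -math.inf:
        return log_a
    return log_a + math.log1p(math.exp(log_b - log_a))


def gmm_log_likelihood(x, weights, means, variances):
    """log b_j(x) for a GMM with diagonal covariance.

    weights:   linear (not log) mixture weights, each > 0, one per component
    means:     one list of per-dimension means per component
    variances: one list of per-dimension variances (the diagonal) per component
    """
    total = -math.inf
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x_d, mu_d, var_d in zip(x, mu, var):
            # log of a 1-D Gaussian density, summed over dimensions
            log_p += -0.5 * (math.log(2.0 * math.pi * var_d)
                             + (x_d - mu_d) ** 2 / var_d)
        total = log_add(total, log_p)
    return total


# One standard-normal component in two dimensions, evaluated at the mean.
ll = gmm_log_likelihood([0.0, 0.0], [1.0], [[0.0, 0.0]], [[1.0, 1.0]])
```

With a diagonal covariance the multi-dimensional density factors into a product of one-dimensional Gaussians, so the log-likelihood of a component is just a sum over dimensions.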

8 Viterbi Search Project
"Search" files with HMMs for "yes" and "no", and print out final likelihood scores and most likely state sequences:
input1.txt hmm_yes.10
input1.txt hmm_no.10
input2.txt hmm_yes.10
input2.txt hmm_no.10
input3.txt hmm_yes.10
input3.txt hmm_no.10
Then, use the results to perform ASR:
(1) is input1.txt more likely to be "yes" or "no"?
(2) is input2.txt more likely to be "yes" or "no"?
(3) is input3.txt more likely to be "yes" or "no"?
Due on February 14th; send your source code and results (including final scores and most likely state sequences) to hosom at cslu  ogi  edu; late responses generally not accepted.

9 Viterbi Search Project
Assume that any state can follow any other state; this will greatly simplify the implementation.
Also assume that this is a whole-word recognizer, and that each word is recognized with a separate execution of the program. This, too, will greatly simplify the implementation.
Print out both the score for the utterance and the most likely state sequence from t=1 to T.
When you read in the HMM, it will say that there are 7 states. The first state and the last state are "NULL" states, in that they don't emit any observations. A NULL state is entered into and exited from at the same time frame. NULL states are used to simplify the implementation of connecting HMMs to form words and sentences.

10 Viterbi Search Project
For this project, the NULL state at the beginning of the HMM is used to define the π values. In other words, at time zero, instead of using π as the probability of starting out in a given state, use the probability of transitioning from the first NULL state to the given state.
If you want, you may constrain the Viterbi search so that it must end at time T in the last non-NULL state. Or, you may constrain the code so that there must be a transition from a non-NULL state into the final NULL state after time T. Or, you can ignore the final NULL state and select the best state at time T. Whichever option you choose, this needs to be implemented in the code, and cannot be specified using π or transition probabilities.
Other than that, feel free to ignore the NULL states, and in the main loop of the Viterbi search, only consider the middle 5 states.
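One way to picture the NULL-state handling described above is the following Python sketch. It assumes, purely for illustration, that the HMM's log transition probabilities are held in a square list of lists whose first and last rows and columns belong to the NULL states; the actual layout produced by the template code is not shown in the slides, and `split_null_states` is a hypothetical helper.

```python
import math

NEG_INF = -math.inf


def split_null_states(log_trans):
    """Separate NULL-state transitions from the emitting-state transitions.

    Assumed layout: state 0 is the initial NULL state and state n-1 is the
    final NULL state. Returns (log_pi, log_a, log_exit), all in the log domain.
    """
    n = len(log_trans)
    emitting = range(1, n - 1)
    # Transitions out of the first NULL state play the role of pi.
    log_pi = [log_trans[0][j] for j in emitting]
    # Transitions among the emitting states, used in the main Viterbi loop.
    log_a = [[log_trans[i][j] for j in emitting] for i in emitting]
    # Transitions into the final NULL state, if the search should use them.
    log_exit = [log_trans[i][n - 1] for i in emitting]
    return log_pi, log_a, log_exit


# Hypothetical 4-state example (2 emitting states); values are invented.
toy = [
    [NEG_INF, math.log(0.8), math.log(0.2), NEG_INF],
    [NEG_INF, math.log(0.5), math.log(0.4), math.log(0.1)],
    [NEG_INF, NEG_INF, math.log(0.7), math.log(0.3)],
    [NEG_INF, NEG_INF, NEG_INF, NEG_INF],
]
log_pi, log_a, log_exit = split_null_states(toy)
```

For the project's 7-state HMMs the same slicing would leave 5 emitting states; whether `log_exit` is used depends on which of the three end-of-utterance options is chosen.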

11 Viterbi Search Project
The transition probabilities are in the log domain.
The mixture weights are NOT in the log domain.
The means are NOT in the log domain.
The covariance values are NOT in the log domain. These covariance values are the diagonal of a 14 × 14 matrix.
The natural log (base e) is used when computing log values.
When I run my code on inputs 1, 2, and 3, I get values that are greater than […] and less than […]. The correct answer does not always yield a positive value. The incorrect answer is always negative.
Implement the GMM to be able to use multiple components, even though the HMMs you'll be using have only one component.

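12 Viterbi Search: Formula Example
Given the following model of the weather (states indicate barometer pressure: H = high, M = medium, L = low):
[Figure: three-state HMM with states H, M, and L, showing the transition probabilities and, for each state, P(sun) and P(rain)]
state M: P(sun) = 0.50, P(rain) = 0.50
state H: P(sun) = 0.75, P(rain) = 0.25
state L: P(sun) = 0.25, P(rain) = 0.75
π_M = 0.50, π_H = 0.20, π_L = 0.30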

13 Viterbi Search: Formula Example
If the observation sequence is: s s r s r (s=sun, r=rain), what is the best score (maximum probability) at time t=2 for state M?
Step 1: what are δ_1(j) for all j?
δ_1(M) = 0.50 · 0.50 = 0.25
δ_1(H) = 0.20 · 0.75 = 0.15
δ_1(L) = 0.30 · 0.25 = 0.075
Step 2: what is max(δ_1(i) · a_ij) ending in state j=M?
M1,M2 = 0.25 · 0.4 = 0.10
H1,M2 = 0.15 · 0.2 = 0.03
L1,M2 = 0.075 · 0.3 = 0.0225
max_M = 0.10

14 Viterbi Search: Formula Example
Step 3: what is δ_2(M)?
δ_2(M) = 0.10 · 0.50 = 0.05
Answer: the best score at time 2, ending in state M, is 0.05.
Next question: what's the best score at time t=2 over all states?
Answer: compute the best scores for all states at time t, then take the max score.

15 Viterbi Search: Formula Example
If the observation sequence is: s s r s r (s=sun, r=rain), what is the best score at time t=2 over all states?
Step 1: what are δ_1(j) for all j?
δ_1(M) = 0.50 · 0.50 = 0.25
δ_1(H) = 0.20 · 0.75 = 0.15
δ_1(L) = 0.30 · 0.25 = 0.075
Step 2: what are max(δ_1(i) · a_ij) for each state j?
M1,M2 = 0.10    M1,H2 = 0.075   M1,L2 = 0.075
H1,M2 = 0.03    H1,H2 = 0.105   H1,L2 = 0.015
L1,M2 = 0.0225  L1,H2 = 0.0075  L1,L2 = 0.045
max_M = 0.10, max_H = 0.105, max_L = 0.075

16 Viterbi Search: Formula Example
Step 3: what is δ_2(j) for all j?
δ_2(M) = 0.10 · 0.50 = 0.05
δ_2(H) = 0.105 · 0.75 = 0.07875
δ_2(L) = 0.075 · 0.25 = 0.01875
Step 4: take the maximum score at time 2: max(δ_2(i)) = 0.07875
Answer: the best score at time 2 is 0.07875, in state H.
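The four steps of this example can be reproduced with a short Python sketch. The initial and emission probabilities, and most of the transition probabilities, are read off the products shown in the slides; the three transitions M→H = 0.3, H→L = 0.1 and L→H = 0.1 are not shown explicitly and are inferred here from the assumption that each state's outgoing probabilities sum to 1.

```python
states = ["M", "H", "L"]
pi = {"M": 0.50, "H": 0.20, "L": 0.30}
b_sun = {"M": 0.50, "H": 0.75, "L": 0.25}
# a[i][j] = P(next state j | current state i).
# M->H, H->L and L->H are inferred from rows summing to 1 (assumption).
a = {
    "M": {"M": 0.4, "H": 0.3, "L": 0.3},
    "H": {"M": 0.2, "H": 0.7, "L": 0.1},
    "L": {"M": 0.3, "H": 0.1, "L": 0.6},
}

# Step 1: delta_1(j) = pi_j * b_j(o_1), with o_1 = sun.
delta1 = {j: pi[j] * b_sun[j] for j in states}

# Step 2: best incoming score for each state j at time 2.
best_prev = {j: max(delta1[i] * a[i][j] for i in states) for j in states}

# Step 3: delta_2(j) = best incoming score * b_j(o_2), with o_2 = sun.
delta2 = {j: best_prev[j] * b_sun[j] for j in states}

# Step 4: best score at time 2 over all states.
best_state = max(delta2, key=delta2.get)
best_score = delta2[best_state]
```

Under these assumptions `best_state` is H with a score of 0.07875, matching the product 0.15 · 0.7 · 0.75 that appears later in the slides.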

17 Viterbi Search: Algorithm
Question 2: What is the best state sequence along a single path, up to time t?
Need to keep track of the best path up to time t, for each time and state:
ψ_t(j) = best state prior to state j at time t (ψ = psi)
Can use ψ_t(j) to trace back the best path, from time T to time 1.

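18 Viterbi Search: Algorithm
(1) Initialization:
$\delta_1(j) = \pi_j\, b_j(o_1), \qquad \psi_1(j) = 0, \qquad 1 \le j \le N$
(2) Recursion:
$\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t), \qquad \psi_t(j) = \arg\max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right], \qquad 2 \le t \le T,\ 1 \le j \le N$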

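19 Viterbi Search: Algorithm
(3) Termination:
$P^* = \max_{1 \le j \le N} \left[ \delta_T(j) \right], \qquad q^*_T = \arg\max_{1 \le j \le N} \left[ \delta_T(j) \right]$
(4) Backtracking:
$q^*_t = \psi_{t+1}(q^*_{t+1}), \qquad t = T-1, T-2, \ldots, 1$
Note 1: Usually this algorithm is done in the log domain, to avoid underflow errors.
Note 2: This assumes that any state is a valid end-of-utterance state. If only some states are valid end-of-utterance states, then the maximization occurs over only those states.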
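The four steps of the algorithm, together with both notes, can be sketched in Python as below. This is an illustrative log-domain version, not the project solution: `viterbi_log`, `log0`, the optional `final_states` argument, and the two-state model used to exercise it are all introduced here for illustration.

```python
import math


def log0(p):
    """Natural log that maps probability 0 to -infinity."""
    return math.log(p) if p > 0.0 else -math.inf


def viterbi_log(log_pi, log_a, log_b, obs, final_states=None):
    """Log-domain Viterbi search; returns (log P*, best state sequence)."""
    n, T = len(log_pi), len(obs)
    delta = [[-math.inf] * n for _ in range(T)]
    psi = [[0] * n for _ in range(T)]
    # (1) Initialization
    for j in range(n):
        delta[0][j] = log_pi[j] + log_b[j][obs[0]]
    # (2) Recursion: products become sums in the log domain (Note 1)
    for t in range(1, T):
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[t - 1][i] + log_a[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] + log_a[best_i][j] + log_b[j][obs[t]]
    # (3) Termination, optionally restricted to valid end states (Note 2)
    candidates = range(n) if final_states is None else final_states
    last = max(candidates, key=lambda j: delta[T - 1][j])
    # (4) Backtracking
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return delta[T - 1][last], path


# Hypothetical 2-state, 2-symbol model (values invented for illustration).
log_pi = [log0(p) for p in (0.6, 0.4)]
log_a = [[log0(p) for p in row] for row in ((0.7, 0.3), (0.4, 0.6))]
log_b = [[log0(p) for p in row] for row in ((0.9, 0.1), (0.2, 0.8))]

score, path = viterbi_log(log_pi, log_a, log_b, [0, 0, 1])
score0, path0 = viterbi_log(log_pi, log_a, log_b, [0, 0, 1], final_states=[0])
```

Passing `final_states=[0]` forces the path to end in state 0, which gives a lower score and a different path than the unconstrained search.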

20 Viterbi Search: Algorithm Example
If the observation sequence is: s s r s r (s=sun, r=rain), what is the best state sequence at time t=2 over all states?
Step 1: what are δ_1(j), ψ_1(j) for all j?
δ_1(M) = 0.50 · 0.50 = 0.25, ψ_1(M) = 0
δ_1(H) = 0.20 · 0.75 = 0.15, ψ_1(H) = 0
δ_1(L) = 0.30 · 0.25 = 0.075, ψ_1(L) = 0
Step 2: what are δ_2(j), ψ_2(j) for all j? The candidate scores δ_1(i) · a_ij · b_j(o_2) are:
M1,M2 = 0.05     M1,H2 = 0.05625   M1,L2 = 0.01875
H1,M2 = 0.015    H1,H2 = 0.07875   H1,L2 = 0.00375
L1,M2 = 0.01125  L1,H2 = 0.005625  L1,L2 = 0.01125
δ_2(M) = 0.05, δ_2(H) = 0.07875, δ_2(L) = 0.01875
ψ_2(M) = M, ψ_2(H) = H, ψ_2(L) = M
(can take the maximum before multiplying by b_j(o_t) to save time)

21 Viterbi Search: Algorithm Example
Step 3: what are P* and q*_T?
P* = 0.07875, q*_2 = H
Step 4: backtracking:
q*_1 = ψ_2(q*_2) = ψ_2(H) = H
Answer: the best state sequence is H H.

22 Viterbi Search: Algorithm Example
NOTE:
best state sequence up to t=2 is H H
best state at t=1, given observations up to t=2, is H
best state at t=1, given observations up to t=1, is M
Same time, different "best" sequences! The "best" state sequence can change as t increases (that's why we need to backtrack).
δ_t(j) is the same for all times greater than t, but a_ij may change the result of the "max" operation at t+1.
If a_ij is the same for all states, finding the best state sequence does not require backtracking!

23 Viterbi Search: Algorithm Example
Given the following model of the weather (states indicate barometer pressure):
[Figure: three-state HMM with states H, M, and L, showing for each state P(sun) and P(rain); all transition probabilities are equal]
π_M = 0.33, π_H = 0.33, π_L = 0.33

24 Viterbi Search: Algorithm Example
If the observation sequence is: s s r s r (s=sun, r=rain), what is the best state sequence?
Omit the initial and transition probabilities, since they're all the same.
t=1 (s): δ_1(M) = 0.5, δ_1(H) = 0.75, δ_1(L) = 0.25
t=2 (s): δ_2(M) = 0.75·0.5, δ_2(H) = 0.75·0.75, δ_2(L) = 0.75·0.25
t=3 (r): δ_3(M) = 0.5625·0.5, δ_3(H) = 0.5625·0.25, δ_3(L) = 0.5625·0.75
t=4 (s): δ_4(M) = 0.4219·0.5, δ_4(H) = 0.4219·0.75, δ_4(L) = 0.4219·0.25
t=5 (r): δ_5(M) = 0.3164·0.5, δ_5(H) = 0.3164·0.25, δ_5(L) = 0.3164·0.75
[Figure: trellis with rows State M, State H, State L and columns t = 1 to 5]
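When the initial and transition probabilities are identical for every state, the search reduces to picking the most likely state for each observation independently. A minimal Python sketch using the emission probabilities from this example (the dictionary layout and variable names are illustrative choices):

```python
# Emission probabilities for s = sun and r = rain, as used in this example.
b = {
    "M": {"s": 0.50, "r": 0.50},
    "H": {"s": 0.75, "r": 0.25},
    "L": {"s": 0.25, "r": 0.75},
}
obs = "ssrsr"

# With equal initial and transition probabilities, no backtracking is needed:
# the best state at each time is simply the argmax of b_j(o_t).
path = [max(b, key=lambda j: b[j][o]) for o in obs]

# Score with the (constant) initial and transition terms omitted, as on the slide.
score = 1.0
for o, j in zip(obs, path):
    score *= b[j][o]
```

Here `path` comes out as H H L H L, and `score` is 0.75 multiplied by itself five times, about 0.2373.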

25 Viterbi Search: Algorithm Example
Given the following model of the weather (states indicate barometer pressure):
[Figure: three-state HMM with states H, M, and L, showing the transition probabilities and, for each state, P(sun) and P(rain)]
π_M = 0.50, π_H = 0.20, π_L = 0.30

26 Viterbi Search: Algorithm Example
If the observation sequence is: s s r s r (s=sun, r=rain), what is the best state sequence?
Can't omit the initial and transition probabilities...
t=1 (s): δ_1(M) = 0.5·0.5, δ_1(H) = 0.2·0.75, δ_1(L) = 0.3·0.25
t=2 (s): δ_2(M) = 0.25·0.4·0.5, δ_2(H) = 0.15·0.7·0.75, δ_2(L) = 0.25·0.3·0.25
t=3 (r): δ_3(M) = 0.05·0.4·0.5, δ_3(H) = 0.08·0.7·0.25, δ_3(L) = 0.05·0.3·0.75
t=4 (s): δ_4(M) = 0.010·0.4·0.5, δ_4(H) = 0.014·0.7·0.75, δ_4(L) = 0.011·0.6·0.25
t=5 (r): δ_5(M) = […]·0.4·0.5, δ_5(H) = 0.007·0.7·0.25, δ_5(L) = […]·0.6·0.75
[Figure: trellis with rows State M, State H, State L and columns t = 1 to 5]

27 Viterbi Search: Algorithm Example
The backtrace for δ_5(L) is also interesting...
t=1 (s): δ_1(M) = 0.5·0.5, δ_1(H) = 0.2·0.75, δ_1(L) = 0.3·0.25
t=2 (s): δ_2(M) = 0.25·0.4·0.5, δ_2(H) = 0.15·0.7·0.75, δ_2(L) = 0.25·0.3·0.25
t=3 (r): δ_3(M) = 0.05·0.4·0.5, δ_3(H) = 0.08·0.7·0.25, δ_3(L) = 0.05·0.3·0.75
t=4 (s): δ_4(M) = 0.010·0.4·0.5, δ_4(H) = 0.014·0.7·0.75, δ_4(L) = 0.011·0.6·0.25
t=5 (r): δ_5(M) = […]·0.4·0.5, δ_5(H) = 0.007·0.7·0.25, δ_5(L) = […]·0.6·0.75
[Figure: trellis with rows State M, State H, State L and columns t = 1 to 5]
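To see how the backtrace depends on the state chosen at t=5, the trellis can be computed with a short Python sketch. The model values are taken from the products shown in these slides, except for the transitions M→H = 0.3, H→L = 0.1 and L→H = 0.1, which are inferred here from the assumption that each row of transition probabilities sums to 1. The helper names `viterbi_tables` and `backtrace` are illustrative.

```python
STATES = ["M", "H", "L"]
PI = {"M": 0.50, "H": 0.20, "L": 0.30}
# TRANS[i][j] = P(j at time t | i at time t-1); M->H, H->L, L->H are inferred.
TRANS = {
    "M": {"M": 0.4, "H": 0.3, "L": 0.3},
    "H": {"M": 0.2, "H": 0.7, "L": 0.1},
    "L": {"M": 0.3, "H": 0.1, "L": 0.6},
}
EMIT = {
    "M": {"s": 0.50, "r": 0.50},
    "H": {"s": 0.75, "r": 0.25},
    "L": {"s": 0.25, "r": 0.75},
}


def viterbi_tables(obs):
    """Return the delta and psi tables as one dict per time step."""
    delta = [{j: PI[j] * EMIT[j][obs[0]] for j in STATES}]
    psi = [{j: None for j in STATES}]      # no predecessor at t=1
    for o in obs[1:]:
        prev = delta[-1]
        back = {j: max(STATES, key=lambda i: prev[i] * TRANS[i][j])
                for j in STATES}
        psi.append(back)
        delta.append({j: prev[back[j]] * TRANS[back[j]][j] * EMIT[j][o]
                      for j in STATES})
    return delta, psi


def backtrace(psi, end_state):
    """Follow the back-pointers from end_state at time T back to time 1."""
    path = [end_state]
    for back in reversed(psi[1:]):
        path.append(back[path[-1]])
    return path[::-1]


delta, psi = viterbi_tables("ssrsr")
best_end = max(delta[-1], key=delta[-1].get)
best_path = backtrace(psi, best_end)
path_from_l = backtrace(psi, "L")
```

Under these assumptions the best path ends in H and stays in H throughout, while the path traced back from L at t=5 passes through M M L L L, so the two backtraces share no states at all.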

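28 Viterbi Search: Speech Example
Example: "hi", modeled as sil-h+ay h-ay+sil
observed features: O = {o_1, o_2, o_3}
What is the best state sequence for O, given λ?
[Figure: HMM with states A and B, with the observation likelihoods for each frame]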

29 Viterbi Search: Speech Example
1. initialization:
δ_1(A) = 1.0·0.76 = 0.76, δ_1(B) = 0.0·0.28 = 0.0
ψ_1(A) = 0, ψ_1(B) = 0
2. recursion at time = 2:
δ_2(A) = 0.76·0.3·0.45 ≈ 0.10, δ_2(B) = 0.76·0.7·0.38 ≈ 0.20
ψ_2(A) = A, ψ_2(B) = A
3. recursion at time = 3:
δ_3(A) = 0.10·0.3·0.51 ≈ 0.015, δ_3(B) = 0.20·1.0·0.51 ≈ 0.10
ψ_3(A) = A, ψ_3(B) = B

30 Viterbi Search: Speech Example
4. termination:
P* = max[δ_3(j)] ≈ 0.10
q*_T = argmax[δ_3(j)] = B
5. path backtracking:
q*_2 = ψ_3(q*_3) = ψ_3(B) = B
q*_1 = ψ_2(q*_2) = ψ_2(B) = A
Answer: the best state sequence is A B B.
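The speech example differs from the weather examples in that each frame supplies its own observation likelihood per state, rather than a symbol looked up in a table. A Python sketch of that variant follows, using the numbers that appear in the products above. The B→A transition is taken to be 0.0 on the assumption that B's outgoing probabilities sum to 1, and `viterbi_frames` is an illustrative name.

```python
STATES = ["A", "B"]
PI = {"A": 1.0, "B": 0.0}
# B->A = 0.0 is inferred from B->B = 1.0 (assumption: rows sum to 1).
TRANS = {"A": {"A": 0.3, "B": 0.7}, "B": {"A": 0.0, "B": 1.0}}
# FRAME_LIK[t][j] = b_j(o_t): one likelihood per state for each frame.
FRAME_LIK = [
    {"A": 0.76, "B": 0.28},
    {"A": 0.45, "B": 0.38},
    {"A": 0.51, "B": 0.51},
]


def viterbi_frames(states, pi, trans, frame_lik):
    """Viterbi search over precomputed per-frame state likelihoods."""
    delta = {j: pi[j] * frame_lik[0][j] for j in states}
    backptrs = []
    for lik in frame_lik[1:]:
        # Best predecessor of each state j, using the previous column only.
        back = {j: max(states, key=lambda i: delta[i] * trans[i][j])
                for j in states}
        delta = {j: delta[back[j]] * trans[back[j]][j] * lik[j]
                 for j in states}
        backptrs.append(back)
    last = max(states, key=delta.get)
    path = [last]
    for back in reversed(backptrs):
        path.append(back[path[-1]])
    return delta[last], path[::-1]


p_star, best_path = viterbi_frames(STATES, PI, TRANS, FRAME_LIK)
```

Without intermediate rounding, `p_star` is 0.76 · 0.7 · 0.38 · 1.0 · 0.51, roughly 0.103, and `best_path` is A B B. In a real recognizer the per-frame likelihoods would come from the GMM of each state, and the products would be replaced by sums of logs.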

31 Viterbi Search: Yet Another Example (part 1)
First state: P(A) = 0.7, P(B) = 0.3. Second state: P(A) = 0.1, P(B) = 0.9.
π_1 = 0.5, π_2 = 0.5
obs. = A A B A (t=1 to t=4)
[Figure: two-state HMM and trellis from t=0 to t=4; the trellis terms use the transition probabilities 0.4, 0.6, 0.2, and 0.8]
best state sequence =

32 Viterbi Search: Yet Another Example (part 2)
First state: P(A) = 0.7, P(B) = 0.3. Second state: P(A) = 0.1, P(B) = 0.9.
π_1 = 0.5, π_2 = 0.5
obs. = A A B A (t=1 to t=4)
[Figure: two-state HMM and trellis from t=0 to t=4; the trellis terms use the transition probabilities 0.9, 0.1, 0.2, and 0.8]
best state sequence =