Page 1 Hidden Markov Models for Automatic Speech Recognition Dr. Mike Johnson Marquette University, EECE Dept.

Page 2 Marquette University Overview
- Intro: The problem with sequential data
- Markov chains
- Hidden Markov Models
- Key HMM algorithms
  - Evaluation
  - Alignment
  - Training / parameter estimation
- Examples / applications

Page 3 Marquette University Big Picture View of Statistical Models
[diagram: progression from basic Gaussian models to HMMs]

Page 4 Marquette University Nonstationary sequential data

Page 5 Marquette University Historical Method: Dynamic Time Warping
- DTW is a dynamic path search of the input sequence against a stored template
- Can be solved using Dynamic Programming

Page 6 Marquette University Alternative: Sequential modeling
- Use a Markov Chain (state machine)
[diagram: data, state distribution models, and the S1-S2-S3 state machine]

Page 7 Marquette University Markov Chains (discrete-time & state)
- A Markov chain is a discrete-time, discrete-state Markov process. The probability of the current RV moving to any new state is determined solely by the current state; these are the transition probabilities.
- Note: since the transition probabilities are fixed, there is also a time-invariance assumption. (Also false of course, but useful.)

Page 8 Marquette University Graphical representation
- Markov chain parameters include:
  - Transition probability values a_ij
  - Initial state probabilities π_1, π_2, π_3
[diagram: three-state chain S1, S2, S3 with self-loops a_11, a_22, a_33 and transitions a_12, a_13, a_21, a_23, a_31, a_32]

Page 9 Marquette University Example: Weather Patterns
- Probability of Rain, Clouds, or Sunshine modeled as a Markov chain with transition matrix A (values shown on the slide).
- Note: A matrix of this form (square, non-negative, each row summing to 1) is called a stochastic matrix.

Page 10 Marquette University Two-step probabilities
- If it's raining today, what's the probability of it raining two days from now? Need two-step probabilities: sum over the possible intermediate states, P(rain in 2 days | rain today) = Σ_k a_rain,k · a_k,rain = 0.58.
- Can also read these directly from A^2 (matrix shown on the slide).

Page 11 Marquette University Steady-state
- The N-step probabilities can be obtained from A^N, so A is sufficient to determine the likelihoods of all possible state sequences.
- What's the limiting case? Does it matter if it was raining 1000 days ago? Look at A^1000 (matrix shown on the slide): its rows converge to the same stationary distribution, so the starting state no longer matters.
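A minimal sketch of reading the two-step and long-run probabilities off matrix powers. The actual transition values live in the slide images; the matrix below is assumed, with the rain row chosen to reproduce the 0.7/0.2/0.1 factors and the 0.58 two-step result, and the other rows invented:

    import numpy as np

    # Assumed (illustrative) weather transition matrix; rows/columns = Rain, Clouds, Sun.
    A = np.array([[0.7, 0.2, 0.1],
                  [0.4, 0.4, 0.2],
                  [0.1, 0.3, 0.6]])

    A2 = np.linalg.matrix_power(A, 2)
    print("P(rain in 2 days | rain today) =", A2[0, 0])   # 0.58 for these values

    # Long-run behavior: the rows of A^N converge to the same stationary distribution,
    # so whether it was raining 1000 days ago no longer matters.
    print(np.linalg.matrix_power(A, 1000).round(3))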

Page 12 Marquette University Probability of state sequence
- The probability of any state sequence S = (s_1, s_2, ..., s_T) is given by P(S) = π_{s_1} · a_{s_1 s_2} · a_{s_2 s_3} · ... · a_{s_{T-1} s_T}.
- Training: Learn the transition probabilities by counting the state transitions observed in the training data.

Page 13 Marquette University Weather classification
- Using a Markov chain for classification: train one Markov chain model for each class, e.g. a weather transition matrix for each city: Milwaukee, Phoenix, and Miami.
- Given a sequence of state observations, identify the most likely city by choosing the model that gives the highest overall probability.
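A sketch of this train-and-compare scheme. The state names, training sequences, and probabilities below are invented; transition matrices are estimated by counting transitions (with add-one smoothing), and classification picks the city whose chain gives the highest log-probability:

    import numpy as np

    STATE_INDEX = {"rain": 0, "clouds": 1, "sun": 2}

    def train_markov_chain(sequences, n_states=3):
        """Estimate a transition matrix by counting state pairs (add-one smoothing)."""
        counts = np.ones((n_states, n_states))             # smoothing avoids zero rows
        for seq in sequences:
            for prev, curr in zip(seq[:-1], seq[1:]):
                counts[STATE_INDEX[prev], STATE_INDEX[curr]] += 1
        return counts / counts.sum(axis=1, keepdims=True)  # each row sums to 1

    def log_prob(seq, A):
        """log P(state sequence | A); the initial-state term is omitted for brevity."""
        return sum(np.log(A[STATE_INDEX[p], STATE_INDEX[c]])
                   for p, c in zip(seq[:-1], seq[1:]))

    # Hypothetical training sequences of observed weather states, one list per city.
    training = {
        "Milwaukee": [["rain", "rain", "clouds", "clouds", "sun", "rain"]],
        "Phoenix":   [["sun", "sun", "sun", "clouds", "sun", "sun"]],
        "Miami":     [["sun", "rain", "sun", "rain", "clouds", "sun"]],
    }
    models = {city: train_markov_chain(seqs) for city, seqs in training.items()}

    observed = ["sun", "sun", "clouds", "sun"]
    best = max(models, key=lambda city: log_prob(observed, models[city]))
    print("Most likely city:", best)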

Page 14 Marquette University Hidden states & HMMs
- What if you can't directly observe the states?
- But… there are measurements/observations that relate to the probability of being in different states.
- States hidden from view = Hidden Markov Model.

Page 15 Marquette University General Case HMM
- s_i : state i
- a_ij : P(s_i → s_j), the transition probability
- o_t : output (observation) at time t
- b_j(o_t) : P(o_t | s_j), the observation probability
- Initial state probabilities: π_1, π_2, π_3
[diagram: states with output distributions b_1(o_t) ... b_4(o_t)]

Page 16 Marquette University Weather HMM
- Extend the weather Markov chain to an HMM: we can't see whether it's raining, cloudy, or sunny.
- But we can make some observations: Humidity H, Temperature T, Pressure P.
- How do we calculate the probability of an observation sequence under a model?
- How do we learn the state transition probabilities for the unseen states, and the observation probabilities in each state?

Page 17 Marquette University Observation models
- How do we characterize these observations?
- Discrete/categorical observations: learn the probability mass function directly.
- Continuous observations: assume a parametric model. Our example assumes a Gaussian distribution, so we need to estimate the mean and variance of the humidity, temperature, and pressure for each state (9 means and 9 variances for each city model).
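A small sketch of such a per-state Gaussian observation model for the (humidity, temperature, pressure) vector. The means and variances below are invented placeholders for the 9 means and 9 variances a real model would estimate from data:

    import numpy as np

    def log_gaussian(o, mean, var):
        """log N(o; mean, diag(var)) for one observation vector o."""
        o, mean, var = np.asarray(o), np.asarray(mean), np.asarray(var)
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (o - mean) ** 2 / var)

    # Invented per-state parameters for (humidity %, temperature F, pressure kPa).
    means = {"rain":   [85.0, 55.0, 100.4],
             "clouds": [70.0, 60.0, 101.0],
             "sun":    [40.0, 75.0, 101.8]}
    variances = {"rain":   [50.0, 30.0, 0.2],
                 "clouds": [60.0, 40.0, 0.2],
                 "sun":    [40.0, 50.0, 0.2]}

    o_t = [65.0, 62.0, 101.2]                      # one observation vector
    log_b = {s: log_gaussian(o_t, means[s], variances[s]) for s in means}
    print(log_b)                                   # log b_j(o_t) for each hidden state j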

Page 18 Marquette University HMM classification
- Using an HMM for classification:
- Training: one HMM for each class, i.e. a transition matrix plus state means and variances (27 parameters) for each city.
- Classification: given a sequence of observations, evaluate P(O | model) for each city (much harder to compute for an HMM than for a Markov chain) and choose the model that gives the highest overall probability.

Page 19 Marquette University Using HMMs for Speech Recognition
- States represent the beginning, middle, and end of a phoneme.
- Gaussian Mixture Model in each state.
[diagram: left-to-right model S1 (start state) through S5 (end state) with self-loops a_22, a_33, a_44, forward transitions a_12, a_23, a_34, a_45, skip transitions a_13, a_24, a_35, and output distributions b_2(), b_3(), b_4()]
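A sketch of the left-to-right topology suggested by the diagram: self-loops, forward transitions, and skip transitions, with S1 and S5 acting as entry and exit states. The numeric values are placeholders; in practice they are re-estimated from data during training:

    import numpy as np

    # Left-to-right phone model: S1 = entry, S5 = exit, S2/S3/S4 model the beginning,
    # middle, and end of the phoneme.  Probabilities below are placeholders.
    A = np.zeros((5, 5))
    A[0, 1], A[0, 2]          = 0.9, 0.1            # a_12 and skip a_13
    A[1, 1], A[1, 2], A[1, 3] = 0.6, 0.3, 0.1       # a_22, a_23, skip a_24
    A[2, 2], A[2, 3], A[2, 4] = 0.6, 0.3, 0.1       # a_33, a_34, skip a_35
    A[3, 3], A[3, 4]          = 0.7, 0.3            # a_44, a_45
    assert np.allclose(A[:4].sum(axis=1), 1.0)      # every non-final row is a distribution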

Page 20 Marquette University Fundamental HMM Computations
- Evaluation: Given a model λ and an observation sequence O = (o_1, o_2, …, o_T), compute P(O | λ).
- Alignment: Given λ and O, compute the 'correct' state sequence S = (s_1, s_2, …, s_T), such as S* = argmax_S { P(S | O, λ) }.
- Training: Given a group of observation sequences, find an estimate of λ, such as λ_ML = argmax_λ { P(O | λ) }.

Page 21 Marquette University Evaluation: Forward/Backward algorithm
- Define α_i(t) = P(o_1 o_2 .. o_t, s_t = i | λ)
- Define β_i(t) = P(o_{t+1} o_{t+2} .. o_T | s_t = i, λ)
- Each of these can be computed efficiently via dynamic programming recursions starting at t=1 (for α) and t=T (for β).
- Putting the forward and backward variables together: P(O | λ) = Σ_i α_i(t) β_i(t), for any t.

Page 22 Marquette University Forward Recursion
1. Initialization: α_i(1) = π_i b_i(o_1)
2. Recursion: α_j(t+1) = [ Σ_i α_i(t) a_ij ] b_j(o_{t+1})
3. Termination: P(O | λ) = Σ_i α_i(T)

Page 23 Marquette University Backward recursion
1. Initialization: β_i(T) = 1
2. Recursion: β_i(t) = Σ_j a_ij b_j(o_{t+1}) β_j(t+1)
3. Termination: P(O | λ) = Σ_i π_i b_i(o_1) β_i(1)
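A compact sketch of both recursions for a discrete-observation HMM; the toy parameters are invented, and the final check confirms the forward and backward terminations agree on P(O | λ):

    import numpy as np

    def forward(pi, A, B, obs):
        """alpha[t, i] = P(o_1..o_t, s_t = i | lambda)."""
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]                       # initialization
        for t in range(1, T):                              # recursion
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        return alpha                                       # termination: alpha[-1].sum()

    def backward(pi, A, B, obs):
        """beta[t, i] = P(o_{t+1}..o_T | s_t = i, lambda)."""
        T, N = len(obs), len(pi)
        beta = np.zeros((T, N))
        beta[-1] = 1.0                                     # initialization
        for t in range(T - 2, -1, -1):                     # recursion
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return beta

    # Toy 2-state, 3-symbol model (made-up numbers).
    pi = np.array([0.6, 0.4])
    A  = np.array([[0.7, 0.3],
                   [0.4, 0.6]])
    B  = np.array([[0.5, 0.4, 0.1],     # B[j, k] = b_j(v_k)
                   [0.1, 0.3, 0.6]])
    obs = [0, 2, 1]

    alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
    p_fwd = alpha[-1].sum()                                # P(O | lambda) from alpha
    p_bwd = (pi * B[:, obs[0]] * beta[0]).sum()            # P(O | lambda) from beta
    assert np.isclose(p_fwd, p_bwd)
    print("P(O | lambda) =", p_fwd)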

Page 24 Marquette University Note: Computation improvement
- Direct computation: P(O | λ) is the sum of the observation probabilities over all possible state sequences, and there are N^T of them. Time complexity = O(T · N^T).
- F/B algorithm: for each state at each time step, sum over all state values from the previous time step. Time complexity = O(T · N^2).

Page 25 Marquette University From α_i(t) and β_i(t):
- One-state occupancy probability: γ_i(t) = P(s_t = i | O, λ) = α_i(t) β_i(t) / P(O | λ)
- Two-state occupancy probability: ξ_ij(t) = P(s_t = i, s_{t+1} = j | O, λ) = α_i(t) a_ij b_j(o_{t+1}) β_j(t+1) / P(O | λ)

Page 26 Marquette University Alignment: Viterbi algorithm
To find the single most likely state sequence S, use the Viterbi dynamic programming algorithm:
1. Initialization: δ_i(1) = π_i b_i(o_1), ψ_i(1) = 0
2. Recursion: δ_j(t) = max_i [ δ_i(t-1) a_ij ] b_j(o_t), ψ_j(t) = argmax_i [ δ_i(t-1) a_ij ]
3. Termination: P* = max_i δ_i(T); backtrace s*_T = argmax_i δ_i(T), s*_t = ψ_{s*_{t+1}}(t+1)
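A sketch of the Viterbi recursion in the log domain (sums of logs replace products for numerical stability), using the same invented toy model as the forward/backward sketch:

    import numpy as np

    def viterbi(pi, A, B, obs):
        """Most likely state sequence via dynamic programming (log domain)."""
        T, N = len(obs), len(pi)
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
        delta = np.zeros((T, N))           # best log-score of a path ending in state i at t
        psi = np.zeros((T, N), dtype=int)  # backpointers
        delta[0] = log_pi + log_B[:, obs[0]]                     # initialization
        for t in range(1, T):                                    # recursion
            scores = delta[t - 1][:, None] + log_A               # scores[i, j]
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
        path = [int(delta[-1].argmax())]                         # termination + backtrace
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))
        return path[::-1], delta[-1].max()

    # Same toy 2-state, 3-symbol model as in the forward/backward sketch (made-up numbers).
    pi = np.array([0.6, 0.4])
    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    states, log_p = viterbi(pi, A, B, [0, 2, 1])
    print(states, log_p)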

Page 27 Marquette University Training
We need to learn the parameters of the model given the training data. Possibilities include:
- Maximum a Posteriori (MAP)
- Maximum Likelihood (ML)
- Minimum Error Rate

Page 28 Marquette University Expectation Maximization
Expectation-Maximization (EM) can be used for ML estimation of parameters in the presence of hidden variables. Basic iterative process:
1. Compute the state sequence likelihoods given the current parameters.
2. Estimate new parameter values given the state sequence likelihoods.

Page 29 Marquette University EM Training: Baum-Welch for Discrete Observations (e.g. VQ-coded)
Basic idea: using the current λ and the F/B equations, compute the state occupation probabilities γ_i(t) and ξ_ij(t). Then compute new values:
- π̂_i = γ_i(1)
- â_ij = Σ_t ξ_ij(t) / Σ_t γ_i(t)
- b̂_j(k) = Σ_{t: o_t = v_k} γ_j(t) / Σ_t γ_j(t)
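A sketch of one Baum-Welch iteration for the discrete case, restating the forward/backward recursions so the block runs on its own; all numbers are invented:

    import numpy as np

    def forward(pi, A, B, obs):
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N)); alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        return alpha

    def backward(pi, A, B, obs):
        T, N = len(obs), len(pi)
        beta = np.zeros((T, N)); beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return beta

    def baum_welch_step(pi, A, B, obs):
        """One EM iteration: E-step computes gamma and xi, M-step re-estimates pi, A, B."""
        T, N = len(obs), len(pi)
        alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
        p_obs = alpha[-1].sum()                              # P(O | lambda)

        gamma = alpha * beta / p_obs                         # gamma[t, i]: one-state occupancy
        xi = np.zeros((T - 1, N, N))                         # xi[t, i, j]: two-state occupancy
        for t in range(T - 1):
            xi[t] = (alpha[t][:, None] * A *
                     (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / p_obs

        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        new_B = np.zeros_like(B)
        for k in range(B.shape[1]):                          # expected counts of symbol v_k
            new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
        new_B /= gamma.sum(axis=0)[:, None]
        return new_pi, new_A, new_B

    # Toy model and observation sequence (made-up numbers).
    pi = np.array([0.6, 0.4])
    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    pi, A, B = baum_welch_step(pi, A, B, [0, 2, 1, 0, 2])
    print(A.round(3), B.round(3), sep="\n")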

Page 30 Marquette University
- Update equations for Gaussian distributions: mean μ_j = Σ_t γ_j(t) o_t / Σ_t γ_j(t); covariance Σ_j = Σ_t γ_j(t) (o_t − μ_j)(o_t − μ_j)^T / Σ_t γ_j(t)
- GMMs are similar, but need to incorporate the mixture component likelihoods as well as the state likelihoods.

Page 31 Marquette University Toy example: Genie and the urns
- There are N urns in a nearby room; each contains many balls of M different colors.
- A genie picks out a sequence of balls from the urns and shows you the result. Can you determine the sequence of urns they came from?
- Model as an HMM with N states and M outputs: the probabilities of picking from each urn are the state transitions, and the number of balls of each color in an urn makes up the probability mass function (observation distribution) for that state.

Page 32 Marquette University Working out the Genie example
- There are three baskets of colored balls:
  - Basket one: 10 blue and 10 red
  - Basket two: 15 green, 5 blue, and 5 red
  - Basket three: 10 green and 10 red
- The genie chooses from the baskets at random:
  - 25% chance of picking from basket one or basket two
  - 50% chance of picking from basket three

Page 33 Marquette University Genie Example Diagram

Page 34 Marquette University Two Questions
Assume the genie reports a sequence of two balls as {blue, red}. Answer two questions:
- What is the probability that a two-ball sequence will be {blue, red}?
- What is the most likely sequence of baskets to produce {blue, red}?

Page 35 Marquette University Probability of {blue, red} for Specific Basket Sequence

Page 36 Marquette University Probability of {blue, red}
- What is the total probability of {blue, red}? Sum of the matrix values = 0.0744.
- What is the most likely sequence of baskets visited? Argmax over the matrix values = {Basket 1, Basket 3}, with corresponding maximum likelihood = 0.0313.
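A small check of both answers, assuming each draw's basket is chosen independently with the 25/25/50 probabilities from the earlier slide (the per-basket ball counts give the emission probabilities):

    import numpy as np
    from itertools import product

    baskets = ["basket1", "basket2", "basket3"]
    p_basket = {"basket1": 0.25, "basket2": 0.25, "basket3": 0.50}
    # Emission probabilities from the ball counts on slide 32.
    p_ball = {"basket1": {"blue": 10/20, "red": 10/20, "green": 0.0},
              "basket2": {"blue": 5/25,  "red": 5/25,  "green": 15/25},
              "basket3": {"blue": 0.0,   "red": 10/20, "green": 10/20}}

    obs = ["blue", "red"]
    joint = {seq: np.prod([p_basket[b] * p_ball[b][o] for b, o in zip(seq, obs)])
             for seq in product(baskets, repeat=len(obs))}

    print("P({blue, red}) =", sum(joint.values()))          # total probability
    best = max(joint, key=joint.get)
    print("Most likely basket sequence:", best, "with probability", joint[best])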

Page 37 Marquette University Viterbi method
The best path ends in state 3 (basket 3), having come previously from state 1 (basket 1).

Page 38 Marquette University Composite Models
- Training data is at the sentence level, and is generally not annotated at the sub-word (HMM model) level.
- Need to be able to form composite models from a sequence of word or phoneme labels.
[diagram: two left-to-right S1–S5 models joined at their start/end states, illustrating composition]

Page 39 Marquette University Viterbi and Token Passing
[diagram: a recognition network of connected word models, decoded by Viterbi / token passing into a word graph and the best sentence]

Page 40 Marquette University HMM Notation
Discrete HMM case: λ = (A, B, π), where A = {a_ij} with a_ij = P(s_{t+1} = j | s_t = i), B = {b_j(k)} with b_j(k) = P(o_t = v_k | s_t = j), and π = {π_i} with π_i = P(s_1 = i).

Page 41 Marquette University Continuous HMM Case: the discrete output distribution b_j(k) is replaced by a continuous density, e.g. a single Gaussian b_j(o_t) = N(o_t; μ_j, Σ_j).

Page 42 Marquette University Multi-mixture, multi-observation case: b_j(o_t) = Σ_m c_jm N(o_t; μ_jm, Σ_jm), with mixture weights c_jm ≥ 0 summing to 1 over m; with multiple training observation sequences, the re-estimation sums run over all sequences as well as all time steps.