Isolated-Word Speech Recognition Using Hidden Markov Models

Presentation transcript:

Isolated-Word Speech Recognition Using Hidden Markov Models. 6.962 Week 10 Presentation. Irina Medvedev, Massachusetts Institute of Technology, April 19, 2001.

Outline: Markov Processes, Chains, and Models; Isolated-Word Speech Recognition (¤ Feature Analysis ¤ Unit Matching ¤ Training ¤ Recognition); Conclusions

Markov Process. The process x(t) is first-order Markov if, for any set of ordered times t_1 < t_2 < ... < t_n, p(x(t_n) | x(t_{n-1}), ..., x(t_1)) = p(x(t_n) | x(t_{n-1})). The current value of a Markov process contains all of the memory necessary to predict the future; the past does not add any additional information about the future.

Markov Process. The transition probability density provides an important statistical description of a Markov process and is defined as p(x(t_2) | x(t_1)) = p(x(t_1), x(t_2)) / p(x(t_1)) for t_1 < t_2. A complete specification of a Markov process consists of the first-order density p(x(t)) and the transition density p(x(t_2) | x(t_1)).

Markov Chains. A Markov chain describes a system which, at any time, occupies one of N distinct states S_1, S_2, ..., S_N. At regularly spaced times, the system either stays in the same state or transitions to a different state. The state at time t is denoted by q_t. [Figure: fully connected Markov model, in which every state can transition to every other state.]

Markov Chains. State transitions are made according to a set of probabilities associated with each state. These probabilities are stored in the N x N state transition matrix A = {a_ij}, where N is the number of states in the Markov chain. The state transition probabilities are a_ij = P(q_t = S_j | q_{t-1} = S_i) and have the properties a_ij >= 0 and sum_{j=1}^{N} a_ij = 1 for every state i.
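
As a concrete illustration (not from the original slides), the sketch below samples a state sequence from a small fully connected Markov chain; the transition matrix A and initial distribution pi hold made-up example values.

```python
import numpy as np

# Hypothetical 3-state fully connected Markov chain (example values only).
A = np.array([[0.6, 0.3, 0.1],    # rows sum to 1: a_ij = P(q_t = S_j | q_{t-1} = S_i)
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
pi = np.array([0.5, 0.3, 0.2])    # initial state distribution

def sample_chain(A, pi, T, rng=np.random.default_rng(0)):
    """Draw a length-T state sequence q_1, ..., q_T from the chain."""
    q = [rng.choice(len(pi), p=pi)]
    for _ in range(T - 1):
        q.append(rng.choice(A.shape[0], p=A[q[-1]]))
    return np.array(q)

print(sample_chain(A, pi, T=10))   # e.g. [0 0 1 2 ...]
```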

Hidden Markov Models. Hidden Markov Models (HMMs) are used when the states are not observable events. Instead, the observation is a probabilistic function of the state rather than the state itself; the states are described by a probability model. The HMM is a doubly embedded stochastic process: a hidden state sequence and an observed emission sequence.

HMM Example: Coin Toss. How do we build an HMM to explain an observed sequence of heads and tails? Choose a 2-state model; several possibilities exist: a 1-coin model, in which the states are observable, or a 2-coin model, in which the states are hidden.

Hidden Markov Models. Hidden Markov Models are characterized by: N, the number of states in the model; A, the state transition matrix; B, the observation probability distribution for each state; and pi, the initial state distribution. Model parameter set: lambda = (A, B, pi).

Left-Right HMM. The model can only transition to a higher-numbered state or stay in the same state. A no-skip constraint allows states to transition only to the next state or remain in the same state. Zeros in the state transition matrix represent illegal state transitions. [Figure: 4-state left-right HMM with no skip transitions.]
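
A minimal sketch (not from the slides) of building the transition matrix for an N-state left-right HMM with the no-skip constraint; the 0.5/0.5 initial values are illustrative placeholders that training would re-estimate.

```python
import numpy as np

def left_right_no_skip_A(N, stay=0.5):
    """Transition matrix where state i may only stay put or move to state i+1."""
    A = np.zeros((N, N))                 # zeros mark illegal transitions
    for i in range(N - 1):
        A[i, i] = stay                   # self-loop
        A[i, i + 1] = 1.0 - stay         # advance to the next state
    A[N - 1, N - 1] = 1.0                # final state absorbs
    return A

print(left_right_no_skip_A(4))
```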

Isolated-Word Speech Recognition. Recognize one word at a time; assume the incoming signal has the form silence – speech – silence. The system consists of four stages: Feature Analysis, Unit Matching, Training, and Recognition.

Feature Analysis. We perform feature analysis to extract the observation vectors upon which all processing will be performed. The discrete-time speech signal is s[n], n = 0, ..., V-1, with discrete Fourier transform S[k] = sum_{n=0}^{V-1} s[n] e^{-j 2 pi k n / V}. To reduce the dimensionality of the V-dimensional speech vector, we use cepstral coefficients, which serve as the feature observation vector for all further processing.

Cepstral Coefficients. The feature vectors are cepstral coefficients obtained from the sampled speech vector as c[n] = IDFT{ ln P̂[k] }, where P̂[k] = |S[k]|^2 / V is the periodogram estimate of the power spectral density of the speech. We eliminate the zeroth component and keep cepstral coefficients 1 through L-1, giving a dimensionality reduction from V to L-1.
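
A rough sketch of this computation on a single frame; the Hamming window and the small epsilon for numerical safety are my additions, not stated on the slides.

```python
import numpy as np

def cepstral_features(frame, L=13, eps=1e-10):
    """Real cepstrum of one speech frame, keeping coefficients 1..L-1."""
    windowed = frame * np.hamming(len(frame))        # taper the frame
    S = np.fft.rfft(windowed)                        # DFT of the frame
    periodogram = (np.abs(S) ** 2) / len(frame)      # PSD estimate
    c = np.fft.irfft(np.log(periodogram + eps))      # inverse transform of log PSD
    return c[1:L]                                    # drop c[0], keep 1..L-1

frame = np.random.randn(400)              # stand-in for 25 ms of speech at 16 kHz
print(cepstral_features(frame).shape)     # (12,)
```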

Properties of Cepstral Coefficients. They serve to undo the convolution between the pitch excitation and the vocal tract response. High-order cepstral components carry speaker-dependent pitch information, which is not relevant for speech recognition. Cepstral coefficients are well approximated by a Gaussian probability density function (pdf), and the correlations between cepstral coefficients are very low, which justifies a diagonal covariance model.

Modeling of Cepstral Coefficients. The HMM assumes that the Markovian states generate the cepstral vectors: each state i represents a Gaussian source with mean vector mu_i and covariance matrix Sigma_i. Each feature vector of cepstral coefficients can therefore be modeled as a sample of an L-dimensional Gaussian random vector with mean vector mu_i and diagonal covariance matrix Sigma_i.
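
Per-state emission scoring then reduces to a diagonal-covariance Gaussian log-density; a minimal sketch, with variable names of my own choosing:

```python
import numpy as np

def diag_gauss_logpdf(x, mu, var):
    """log N(x; mu, diag(var)) for one cepstral feature vector x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

# Example: score a feature vector against one state's Gaussian.
x = np.zeros(12)
mu, var = np.zeros(12), np.ones(12)
print(diag_gauss_logpdf(x, mu, var))   # about -11.03 for these values
```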

Formulation of the Feature Vectors

Unit Matching. Initial goal: obtain an HMM for each speech recognition unit. For a large vocabulary (300 words), the recognition units are phonemes; for a small vocabulary (10 words), the recognition units are words. We will consider an isolated-word speech recognition system for a small vocabulary of M words.

Notation. The observation is O = (o_1, o_2, ..., o_T), where each o_t is a cepstral feature vector and T is the number of feature vectors in an observation. The state sequence is q = (q_1, q_2, ..., q_T), where each q_t is one of the states S_1, ..., S_N. Indices: i for states, v for words, t for time. The term model will be used for both the HMM and the parameter set describing the HMM, lambda.

Training. We need to obtain an HMM for each of the M words; the process of building the HMMs is called training. Each HMM is characterized by the number of states, N, and the model parameter set lambda_v. Each cepstral feature vector o_t in state S_i can be modeled by an L-dimensional Gaussian pdf b_i(o_t) = N(o_t; mu_i, Sigma_i), where mu_i is the mean vector and Sigma_i is the covariance matrix of state i.

Training. A Gaussian pdf is completely characterized by its mean vector and covariance matrix, so the model parameter set can be rewritten as lambda = (A, {mu_i}, {Sigma_i}, pi). The training procedure is the same for each word; for convenience, we will drop the word subscript from lambda_v.

Building the HMM. To build the HMM, we need to determine the parameter set that maximizes the likelihood of the observation for that word. Objective: maximize P(O, q | lambda) jointly over the model lambda and the state sequence q. The double maximization can be performed by alternately optimizing over the state sequence and over the model.

Uniform Segmentation. Determining the initial state sequence: the feature vectors are divided evenly among the states. [Figure: uniform segmentation of 50 segments onto 8 states.]
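
A small sketch of uniform segmentation as used for initialization; this is my own formulation of the idea on the slide.

```python
import numpy as np

def uniform_segmentation(T, N):
    """Assign T feature-vector indices evenly to N states (initial state sequence)."""
    return np.minimum((np.arange(T) * N) // T, N - 1)

print(uniform_segmentation(T=50, N=8))   # first few frames -> state 0, last few -> state 7
```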

Maximization over the Model. Given the initial state sequence, we maximize P(O, q | lambda) over the model. The maximization entails estimating the model parameters from the observation given the state sequence. Estimation is performed using the Baum-Welch re-estimation formulas.

Re-estimation Formulas. Given the state sequence: initial state distribution pi_i = 1 if q_1 = S_i and 0 otherwise; mean vector per state mu_i = (1/N_i) sum_{t: q_t = S_i} o_t; covariance matrix per state Sigma_i = (1/N_i) sum_{t: q_t = S_i} (o_t - mu_i)(o_t - mu_i)^T; state transition matrix a_ij = (number of transitions from S_i to S_j) / (number of transitions out of S_i). Here N_i is the number of feature vectors in state S_i.
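
A sketch of these updates given a hard state assignment (segmental-style re-estimation with diagonal covariances; the variance floor and variable names are my additions):

```python
import numpy as np

def reestimate(O, q, N, var_floor=1e-3):
    """Re-estimate (pi, A, means, vars) from features O (T x L) and state sequence q."""
    q = np.asarray(q)
    T, L = O.shape
    pi = np.zeros(N); pi[q[0]] = 1.0                      # initial state distribution
    A = np.zeros((N, N))
    for t in range(T - 1):
        A[q[t], q[t + 1]] += 1.0                          # count transitions
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1.0)    # normalize each row
    means = np.zeros((N, L)); vars_ = np.ones((N, L))
    for i in range(N):
        frames = O[q == i]                                # feature vectors in state i
        if len(frames):
            means[i] = frames.mean(axis=0)
            vars_[i] = np.maximum(frames.var(axis=0), var_floor)
    return pi, A, means, vars_
```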

Model Estimation

Maximization over the State Sequence. Given the model, we maximize P(O, q | lambda) over the state sequence. The probability expression can be rewritten as P(O, q | lambda) = pi_{q_1} b_{q_1}(o_1) prod_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(o_t).

Maximization over the State Sequence. Applying the logarithm transforms the maximization of a product into the maximization of a sum: log P(O, q | lambda) = log pi_{q_1} + log b_{q_1}(o_1) + sum_{t=2}^{T} [log a_{q_{t-1} q_t} + log b_{q_t}(o_t)]. We are still looking for the state sequence that maximizes this expression; the optimal state sequence can be determined using the Viterbi algorithm.

Trellis Structure of HMMs. Redrawing the HMM as a trellis makes it easy to see the state sequence as a path through the trellis. The optimal state sequence is determined by the Viterbi algorithm as the single best path that maximizes P(O, q | lambda).
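
A compact log-domain Viterbi sketch over that trellis; the emission log-likelihoods are assumed precomputed, and this is my illustration rather than code from the presentation.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Best state path. log_B[t, i] = log b_i(o_t); returns (path, best log score)."""
    T, N = log_B.shape
    delta = log_pi + log_B[0]                    # best score ending in each state at t=0
    psi = np.zeros((T, N), dtype=int)            # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A          # scores[i, j]: come from i, move to j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):               # backtrack along the stored pointers
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta.max()
```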

Training Procedure (flowchart): Cepstral Calculation → Uniform Segmentation → Estimation of lambda (Baum-Welch) → State Sequence Segmentation (Viterbi) → Converged? If no, repeat the estimation and segmentation steps; if yes, stop.
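
Putting the pieces together, a sketch of that loop, assuming the helper functions outlined earlier (uniform_segmentation, reestimate, diag_gauss_logpdf, viterbi) are in scope; the convergence test and iteration cap are my additions.

```python
import numpy as np

def train_word_hmm(O, N=8, max_iter=20):
    """Segmental training for one word: alternate re-estimation and Viterbi segmentation."""
    q = uniform_segmentation(len(O), N)                   # initial state sequence
    for _ in range(max_iter):
        pi, A, means, vars_ = reestimate(O, q, N)         # maximize over the model
        log_B = np.array([[diag_gauss_logpdf(o, means[i], vars_[i])
                           for i in range(N)] for o in O])
        with np.errstate(divide="ignore"):                # zeros in pi/A become -inf
            new_q, _ = viterbi(np.log(pi), np.log(A), log_B)
        if np.array_equal(new_q, q):                      # converged: segmentation unchanged
            break
        q = new_q
    return pi, A, means, vars_
```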

Recognition. We have a set of HMMs, one for each word. Objective: choose the word model that maximizes the probability of the observation given the model (maximum likelihood detection rule). The classifier for observation O is v* = argmax_{1 <= v <= M} P(O | lambda_v). The likelihood can be written as a summation over all state sequences: P(O | lambda_v) = sum_q P(O, q | lambda_v).

Recognition. We replace the full likelihood by an approximation that takes into account only the most probable state sequence capable of producing the observation. Treating the most probable state sequence as the best path in the HMM trellis allows us to use the Viterbi algorithm to maximize this probability. The best-path classifier for observation O is v* = argmax_{1 <= v <= M} max_q P(O, q | lambda_v).
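
A sketch of this best-path classifier, scoring the observation against every word model with the Viterbi routine sketched above; the list-of-models packaging is my own choice.

```python
import numpy as np

def recognize(O, word_models):
    """word_models: list of (pi, A, means, vars) tuples, one per vocabulary word.
    Returns the index of the word whose best Viterbi path scores highest."""
    scores = []
    for pi, A, means, vars_ in word_models:
        N = len(pi)
        log_B = np.array([[diag_gauss_logpdf(o, means[i], vars_[i])
                           for i in range(N)] for o in O])
        with np.errstate(divide="ignore"):
            _, score = viterbi(np.log(pi), np.log(A), log_B)
        scores.append(score)
    return int(np.argmax(scores))    # index of the recognized word
```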

Recognition (flowchart): Cepstral Calculation → likelihood computation for each word model → Select Maximum → index of recognized word.

Conclusion. Introduced hidden Markov models and described the process of isolated-word speech recognition: ¤ Feature vectors ¤ Unit matching ¤ Training ¤ Recognition. Other considerations: ¤ Artificial Neural Networks (ANNs) for speech recognition ¤ Hybrid HMM/ANN models ¤ Minimum classification error HMM design.