Extended Baum-Welch algorithm

Presentation transcript:

Extended Baum-Welch algorithm. Presented by Shih-Hung Liu, 2006/01/21.

References:
- "A generalization of the Baum algorithm to rational objective functions," Gopalakrishnan et al., IEEE ICASSP, 1989.
- "An inequality for rational functions with applications to some statistical estimation problems," Gopalakrishnan et al., IEEE Transactions on Information Theory, 1991.
- "HMMs, MMIE, and the Speech Recognition Problem," Normandin, PhD dissertation, 1991.
- "Function maximization," Povey, PhD thesis, Chapter 4.5, 2004.

Outline:
- Introduction
- Extended Baum-Welch algorithm [Gopalakrishnan et al.]
- EBW from discrete to continuous [Normandin]
- EBW for discrete HMMs [Povey]
- Example of function optimization [Gopalakrishnan et al.]
- Conclusion

Introduction. The well-known Baum-Eagon inequality provides an effective iterative scheme for finding a local maximum of a homogeneous polynomial with nonnegative coefficients over a domain of probability values. However, we are interested in maximizing a general rational function, so we extend the Baum-Eagon inequality to rational functions.

Extended Baum-Welch algorithm (1/6) [Gopalakrishnan 1989]. Let $P(x)$ be an arbitrary homogeneous polynomial with nonnegative coefficients of degree $d$ in the variables $x = \{x_{ij}\}$. Assuming that this polynomial is defined over a domain $D$ of probability values, they show how to construct a transformation $T_P: D \to D$ such that the following property holds. Property A: for any $x \in D$, $P(T_P(x)) > P(x)$ unless $T_P(x) = x$.

Extended Baum-Welch algorithm (2/6) [Gopalakrishnan 1989]. $R(x) = P_1(x)/P_2(x)$ is a ratio of two polynomials in the variables $x$, defined over a domain $D$ of probability values. We are looking for a growth transformation $T_R$ such that $R(T_R(x)) > R(x)$ for any $x \in D$, unless $T_R(x) = x$. Reduction of the rational-function case to the polynomial case: we reduce the problem of finding a growth transformation for a rational function to that of finding one for a specially formed polynomial; this reduces further to a non-homogeneous polynomial with nonnegative coefficients, and the Baum-Eagon inequality is extended to non-homogeneous polynomials with nonnegative coefficients.

Extended Baum-Welch algorithm (3/6) [Gopalakrishnan 1989]. Step 1:
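A sketch of Step 1 in its standard form (the notation $Q_{x'}$ is an assumption made here): given $R(x) = P_1(x)/P_2(x)$ and the current point $x'$, define
\[
Q_{x'}(x) \;=\; P_1(x) \;-\; R(x')\,P_2(x), \qquad Q_{x'}(x') = 0 .
\]
Because $P_2 > 0$ on the domain, $Q_{x'}(x) > 0$ exactly when $R(x) > R(x')$, so any transformation that increases $Q_{x'}$ also increases $R$.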

Extended Baum-Welch algorithm (4/6) [Gopalakrishnan 1989]. Step 2:
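A sketch of Step 2 in its standard form (the notation $S_{x'}$ and the use of the row-sum constraint are assumptions made here): add a constant to make all coefficients nonnegative,
\[
S_{x'}(x) \;=\; Q_{x'}(x) \;+\; C ,
\]
where $C$ is chosen large enough that, once the constant is rewritten as $C\,\big(\sum_j x_{ij}\big)^{d}$ (which equals $C$ on the probability domain), every monomial of $S_{x'}$ has a nonnegative coefficient. Since $C$ is constant over the domain, $S_{x'}$ and $Q_{x'}$ have the same growth transformations.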

Extended Baum-Welch algorithm (5/6) [Gopalakrishnan 1989]. Step 3: finding a growth transformation for a polynomial with nonnegative coefficients can be reduced to the same problem for a homogeneous polynomial with nonnegative coefficients.
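The homogenization behind this step, stated in its usual form: each monomial $x^{\alpha}$ of total degree less than $d$ is multiplied by a power of a row sum that equals one on the domain,
\[
x^{\alpha} \;\longmapsto\; x^{\alpha}\,\Big(\sum_j x_{ij}\Big)^{d - \deg(x^{\alpha})},
\]
which leaves the polynomial unchanged on the probability domain but makes every term of degree $d$, yielding a homogeneous polynomial with nonnegative coefficients to which the Baum-Eagon inequality applies.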

Extended Baum-Welch algorithm (6/6) [Gopalakrishnan 1989]. Baum-Eagon inequality: for a homogeneous polynomial $P$ with nonnegative coefficients in variables $x_{ij}$ satisfying $x_{ij} \ge 0$ and $\sum_j x_{ij} = 1$, the transformation
\[
T(x)_{ij} \;=\; \frac{x_{ij}\,\dfrac{\partial P}{\partial x_{ij}}(x)}{\sum_{j'} x_{ij'}\,\dfrac{\partial P}{\partial x_{ij'}}(x)}
\]
satisfies $P(T(x)) \ge P(x)$, with equality only if $T(x) = x$.
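A minimal numerical check of the inequality, using a hand-picked homogeneous polynomial with nonnegative coefficients over two probability rows (the polynomial and the starting point are illustrative only):

# P(x, y) = x1^2*y1 + 2*x1*x2*y2 + x2^2*y1, with x1 + x2 = 1 and y1 + y2 = 1.
def P(x, y):
    return x[0]**2 * y[0] + 2 * x[0] * x[1] * y[1] + x[1]**2 * y[0]

def grad_x(x, y):
    return [2 * x[0] * y[0] + 2 * x[1] * y[1],   # dP/dx1
            2 * x[0] * y[1] + 2 * x[1] * y[0]]   # dP/dx2

def grad_y(x, y):
    return [x[0]**2 + x[1]**2,                   # dP/dy1
            2 * x[0] * x[1]]                     # dP/dy2

def baum_eagon_row(row, grad):
    # T(x)_ij = x_ij * dP/dx_ij / sum_j' x_ij' * dP/dx_ij'
    weighted = [p * g for p, g in zip(row, grad)]
    total = sum(weighted)
    return [w / total for w in weighted]

x, y = [0.3, 0.7], [0.8, 0.2]
for it in range(5):
    print(it, round(P(x, y), 6))             # the printed value never decreases
    gx, gy = grad_x(x, y), grad_y(x, y)      # gradients at the current point
    x, y = baum_eagon_row(x, gx), baum_eagon_row(y, gy)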

EBW for CDHMM – from discrete to continuous (1/3) [Normandin 1991]. Discrete case for the emission probability update:
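A sketch of the discrete emission update in its standard EBW form (the objective $F$ and the smoothing constant $D$ are the usual notation, assumed here):
\[
\hat b_j(k) \;=\;
\frac{\Big(\dfrac{\partial F}{\partial b_j(k)} + D\Big)\, b_j(k)}
     {\sum_{k'} \Big(\dfrac{\partial F}{\partial b_j(k')} + D\Big)\, b_j(k')},
\]
with $D$ chosen large enough that every numerator stays positive; a larger $D$ gives smaller, more conservative steps.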

EBW for CDHMM – from discrete to continuous (2/3) [Normandin 1991]. For each state $j$, partition the observation space into $M$ subintervals $I_k$ of width $\Delta$, treat the continuous emission density as a discrete distribution over these subintervals, apply the discrete update, and take the limit $\Delta \to 0$ ($M \to \infty$).

EBW for CDHMM – from discrete to continuous (3/3) [Normandin 1991]. The EBW update in the continuous limit:
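A sketch of the Gaussian re-estimation obtained from this limiting argument, in the usual EBW form; the function name, the names of the sufficient statistics, the toy numbers, and the value of D are illustrative, not taken from the slides:

# gamma_* are numerator/denominator occupation counts, sum_x_* and sum_x2_*
# the corresponding first- and second-order data statistics, D the constant.
def ebw_gaussian_update(gamma_num, gamma_den, sum_x_num, sum_x_den,
                        sum_x2_num, sum_x2_den, mu, var, D):
    denom = (gamma_num - gamma_den) + D
    new_mu = ((sum_x_num - sum_x_den) + D * mu) / denom
    new_var = ((sum_x2_num - sum_x2_den) + D * (var + mu ** 2)) / denom - new_mu ** 2
    return new_mu, new_var

# Toy usage with made-up statistics; D is set large enough to keep the
# denominator and the new variance positive.
mu, var = ebw_gaussian_update(gamma_num=12.0, gamma_den=9.5,
                              sum_x_num=30.0, sum_x_den=20.0,
                              sum_x2_num=90.0, sum_x2_den=50.0,
                              mu=2.0, var=1.5, D=25.0)
print(mu, var)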

EBW for discrete HMMs (1/6) [Povey 2004]. The Baum-Eagon inequality is formulated for the case where the variables $x_{ij}$ form a matrix whose rows obey a sum-to-one constraint $\sum_j x_{ij} = 1$, and we are maximizing a sum of polynomial terms in $x_{ij}$ with nonnegative coefficients. For ML training, we can find an auxiliary function and optimize it. Finding the maximum of the auxiliary function (e.g. using a Lagrange multiplier) leads to the following update, which is a growth transformation for the polynomial:
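The update in question, stated in its standard form (assumed here):
\[
\hat x_{ij} \;=\;
\frac{x_{ij}\,\dfrac{\partial F}{\partial x_{ij}}}
     {\sum_{j'} x_{ij'}\,\dfrac{\partial F}{\partial x_{ij'}}},
\]
which for ML training of a discrete HMM reduces to the familiar normalized expected counts, $\hat x_{ij} = c_{ij} / \sum_{j'} c_{ij'}$.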

EBW for discrete HMMs (2/6) [Povey 2004]. The Baum-Welch update is an update procedure for HMMs which uses this growth transformation together with the forward-backward algorithm, which computes the relevant differentials efficiently.
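A minimal sketch of the forward-backward computation for a small discrete HMM, producing the state occupation probabilities that feed the update; the model and the observation sequence are toy values:

# A: transition probabilities, B: emission probabilities (2 symbols),
# pi: initial state probabilities, obs: observed symbol sequence.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.5], [0.1, 0.9]]
pi = [0.6, 0.4]
obs = [0, 1, 1, 0]
N, T = len(pi), len(obs)

# Forward pass: alpha[t][i] = P(o_1..o_t, q_t = i)
alpha = [[0.0] * N for _ in range(T)]
for i in range(N):
    alpha[0][i] = pi[i] * B[i][obs[0]]
for t in range(1, T):
    for j in range(N):
        alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]

# Backward pass: beta[t][i] = P(o_{t+1}..o_T | q_t = i)
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    for i in range(N):
        beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))

likelihood = sum(alpha[T - 1][i] for i in range(N))
gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(N)] for t in range(T)]
print(likelihood)
print(gamma)   # occupation probabilities, each row sums to one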

EBW for discrete HMMs (3/6) [Povey 2004]. An update rule as convenient and provably correct as the Baum-Welch update is not available for discriminative training of HMMs, which is a harder optimization problem. The Extended Baum-Welch update equations, as originally derived, are applicable to rational functions of parameters which are subject to sum-to-one constraints. The MMI objective function for discrete-probability HMMs is an example of such a function.
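For concreteness, the MMI objective can be written as follows (notation assumed):
\[
F_{\mathrm{MMI}}(\lambda) \;=\;
\sum_{r} \log
\frac{p_\lambda(O_r \mid s_r)\, P(s_r)}
     {\sum_{s} p_\lambda(O_r \mid s)\, P(s)},
\]
where $O_r$ is the $r$-th training utterance and $s_r$ its correct transcription. For discrete-probability HMMs each likelihood $p_\lambda(O_r \mid s)$ is a polynomial in the transition and emission probabilities, so the quantity inside each logarithm (and hence $\exp F_{\mathrm{MMI}}$) is a rational function of the parameters.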

EBW for discrete HMMs (4/6) [Povey 2004]. Two essential points are used to derive the EBW update for MMI. 1. Instead of maximizing the ratio $P_1(x)/P_2(x)$ for positive $P_1$ and $P_2$, we can instead maximize $P_1(x) - \frac{P_1(x')}{P_2(x')}\,P_2(x)$, where $x'$ holds the values from the previous iteration; increasing this expression will cause the ratio to increase, because it is a strong-sense auxiliary function for the ratio around $x'$. 2. If some terms in the resulting polynomial are negative, we can add to the expression a constant $C$ times a further polynomial which is constrained to be a constant (e.g. $(\sum_j x_{ij})^d$), so as to ensure that no product of terms in the final expression has a negative coefficient.

EBW for discrete HMMs (5/6) [Povey 2004]. By applying these two ideas:
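The resulting update for parameters $x_{ij}$ subject to $\sum_j x_{ij} = 1$, stated in its standard form (assumed here):
\[
\hat x_{ij} \;=\;
\frac{x_{ij}\Big(\dfrac{\partial F_{\mathrm{MMI}}}{\partial x_{ij}} + C\Big)}
     {\sum_{j'} x_{ij'}\Big(\dfrac{\partial F_{\mathrm{MMI}}}{\partial x_{ij'}} + C\Big)},
\]
where the constant $C$ is chosen large enough that all the smoothed terms are positive; a larger $C$ gives more conservative steps.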

EBW equivalent smooth function (6/6) [Povey 2004]

Example: consider the constant $C$.

Example (continued)
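Purely as an illustration (the function, the starting point, and the constant C below are chosen for demonstration and are not the example from the paper), an EBW-style update applied to a toy rational function over one probability row:

def R(x):
    # rational objective: both polynomials are positive on the simplex
    p1 = 2.0 * x[0] * x[1] + x[1] ** 2
    p2 = x[0] ** 2 + x[1] ** 2 + 0.5
    return p1 / p2

def grad_R(x, eps=1e-6):
    # central-difference gradient, to keep the sketch self-contained
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((R(xp) - R(xm)) / (2 * eps))
    return g

def ebw_step(x, C):
    # x_i <- x_i * (dR/dx_i + C) / sum_j x_j * (dR/dx_j + C)
    g = grad_R(x)
    w = [xi * (gi + C) for xi, gi in zip(x, g)]
    s = sum(w)
    return [wi / s for wi in w]

x = [0.9, 0.1]
for it in range(10):
    print(it, round(R(x), 6), [round(v, 4) for v in x])
    x = ebw_step(x, C=5.0)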

Conclusion. We presented an algorithm for the maximization of certain rational functions defined over a domain of probability values. This algorithm is very useful in practical situations for training HMM parameters.

MPE: Final Auxiliary Function. The final auxiliary function combines a weak-sense auxiliary function, a strong-sense auxiliary function, and a smoothing function; the combined function is itself a weak-sense auxiliary function.
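For reference, the definitions behind these terms, paraphrased from Povey (2004): a strong-sense auxiliary function $G(\lambda, \lambda')$ for $F(\lambda)$ around $\lambda'$ satisfies
\[
F(\lambda) - F(\lambda') \;\ge\; G(\lambda, \lambda') - G(\lambda', \lambda')
\quad \text{for all } \lambda,
\]
so any increase in $G$ guarantees an increase in $F$. A weak-sense auxiliary function only matches the gradient of the objective at the current point,
\[
\left.\frac{\partial G(\lambda, \lambda')}{\partial \lambda}\right|_{\lambda=\lambda'}
\;=\;
\left.\frac{\partial F(\lambda)}{\partial \lambda}\right|_{\lambda=\lambda'},
\]
so its repeated maximization converges only to a stationary point of $F$. A smoothing function has zero gradient at $\lambda = \lambda'$ and can therefore be added to a weak-sense auxiliary function without destroying the weak-sense property.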

EBW derived from auxiliary function
