Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition (Maximum a Posteriori, MAP)
Qiang Huo(*) and Chorkin Chan(**)
(*) Department of Computer Science, The University of Hong Kong, Hong Kong
(**) Department of Radio and Electronics, University of Science and Technology of China, P.R.C.
Presenter: Hsu Ting-Wei, 2006/02/16

Outline
Introduction
Maximum a Posteriori (MAP) Estimate for Discrete HMM
Maximum a Posteriori (MAP) Estimate for Semi-continuous HMM
Conclusion

Introduction
The widespread popularity of the HMM framework can mainly be attributed to the existence of efficient training procedures for HMMs.
Traditionally, HMM parameter estimators have been derived purely from the training observation sequences, without any prior information included.
Baum-Welch and segmental k-means are the two most commonly used procedures for estimating HMM parameters.
The Bayesian inference approach provides a convenient method for combining sample and prior information.

Introduction (cont.)
[Figure: an example contrasting the ML estimate with the prior density; the equation images were not captured in the transcript.]

Introduction (cont.)
f function: the distribution of X given λ (the ML part).
Prior: a distribution over λ, collected from the distributions obtained over many λ's and then log-transformed; the parameters of the prior are called hyperparameters.
When the model parameters π, a, b in f are assumed to be independent:
1. In a DHMM, the matching prior is the Dirichlet distribution.
2. In an SCHMM, the matching prior is a Dirichlet combined with a Normal-Wishart distribution.
The state parameters within each HMM combine into a distribution of some standard form, e.g. a normal (Gaussian) distribution.
Q auxiliary function: this is the ML idea; EM estimates λ through the Q-function, but the estimated probabilities are unreliable.
R auxiliary function: this is the MAP idea; EM estimates λ through the R-function instead, and the estimates are more reliable because more knowledge is incorporated.
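In symbols (a reconstruction; the equation images on this slide were not captured, and the notation below is standard MAP-EM notation rather than necessarily the slide's own):
\[
\hat\lambda_{\mathrm{ML}} = \arg\max_{\lambda} f(X \mid \lambda),
\qquad
\hat\lambda_{\mathrm{MAP}} = \arg\max_{\lambda} f(X \mid \lambda)\, g(\lambda),
\]
\[
R(\lambda, \bar\lambda) = Q(\lambda, \bar\lambda) + \log g(\lambda),
\]
where Q is the usual EM auxiliary function built at the current estimate λ̄ and g(λ) is the prior.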

MAP Estimate for Discrete HMM
Inference:
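A minimal reconstruction of the Bayesian inference step, assuming the usual posterior formulation:
\[
g(\lambda \mid X) = \frac{f(X \mid \lambda)\, g(\lambda)}{\int f(X \mid \lambda')\, g(\lambda')\, d\lambda'}
\;\propto\; f(X \mid \lambda)\, g(\lambda).
\]
The MAP estimate is the mode of this posterior; the normalizing integral can be ignored when maximizing over λ.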

MAP Estimate for Discrete HMM (cont.)
Definition:
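The definitions on this slide were equation images; the quantities the later reestimation formulas rely on are presumably the standard ones. For a discrete HMM λ = (π, A, B) and observation sequence X = o_1 … o_T:
\[
\gamma_t(i) = P(q_t = i \mid X, \bar\lambda),
\qquad
\xi_t(i,j) = P(q_t = i,\, q_{t+1} = j \mid X, \bar\lambda),
\]
both computed with the forward-backward algorithm under the current estimate λ̄.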

MAP Estimate for Discrete HMM (cont.)
Prior (its parameters are the hyperparameters):
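As noted in the introduction, the matching prior for a DHMM is Dirichlet. Assuming hyperparameters η_i, η_ij, and ν_ik (my symbols; the slide's own were not captured), the prior factorizes as:
\[
g(\lambda) \;\propto\; \prod_i \pi_i^{\eta_i - 1}
\;\cdot\; \prod_i \prod_j a_{ij}^{\eta_{ij} - 1}
\;\cdot\; \prod_i \prod_k b_i(k)^{\nu_{ik} - 1}.
\]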

MAP Estimate for Discrete HMM (cont.)
Q-function (E step):
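The standard Baum auxiliary function, which decomposes over the three parameter groups, is presumably what this slide shows:
\[
Q(\lambda, \bar\lambda) = \sum_i \gamma_1(i)\log\pi_i
+ \sum_{i,j} \sum_{t=1}^{T-1} \xi_t(i,j)\log a_{ij}
+ \sum_i \sum_{t=1}^{T} \gamma_t(i)\log b_i(o_t).
\]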

MAP Estimate for Discrete HMM (cont.)
R-function:
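Adding the log-prior to Q gives an R-function of the same Dirichlet form in each parameter group (a sketch, using the hyperparameter symbols introduced above):
\[
R(\lambda,\bar\lambda) = \sum_i \bigl[\gamma_1(i) + \eta_i - 1\bigr]\log\pi_i
+ \sum_{i,j}\Bigl[\textstyle\sum_{t=1}^{T-1}\xi_t(i,j) + \eta_{ij} - 1\Bigr]\log a_{ij}
+ \sum_{i,k}\Bigl[\textstyle\sum_{t:\,o_t = v_k}\gamma_t(i) + \nu_{ik} - 1\Bigr]\log b_i(k) + \text{const}.
\]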

MAP Estimate for Discrete HMM (cont.)
M step: maximize R, using a Lagrange multiplier to enforce each sum-to-one constraint.
Initial probability:
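Maximizing R over π under the sum-to-one constraint gives the MAP reestimate (a hedged reconstruction in the notation above):
\[
\hat\pi_i = \frac{\eta_i - 1 + \gamma_1(i)}{\sum_j \bigl(\eta_j - 1 + \gamma_1(j)\bigr)}.
\]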

MAP Estimate for Discrete HMM (cont.)
Transition probability:
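The transition probabilities follow the same pattern:
\[
\hat a_{ij} = \frac{\eta_{ij} - 1 + \sum_{t=1}^{T-1}\xi_t(i,j)}
{\sum_{j'}\bigl(\eta_{ij'} - 1 + \sum_{t=1}^{T-1}\xi_t(i,j')\bigr)}.
\]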

MAP Estimate for Discrete HMM (cont.)
Observation probability:
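And likewise the discrete observation probabilities, where v_k is the k-th codebook symbol:
\[
\hat b_i(k) = \frac{\nu_{ik} - 1 + \sum_{t:\,o_t = v_k}\gamma_t(i)}
{\sum_{k'}\bigl(\nu_{ik'} - 1\bigr) + \sum_{t=1}^{T}\gamma_t(i)}.
\]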

MAP Estimate for Discrete HMM (cont.)
How should the initial estimate of the parameters be chosen? One reasonable choice is the mode of the prior density.

MAP Estimate for Discrete HMM (cont.)
What is the mode? Applying a Lagrange multiplier, the modes are easily derived. Example:
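For a Dirichlet density with parameters α_1, …, α_K (all greater than 1), the mode is a standard result:
\[
\text{mode}_k = \frac{\alpha_k - 1}{\sum_{j=1}^{K}\alpha_j - K},
\]
so, for example, the initial estimate of π_i taken from its prior would be (η_i - 1)/(Σ_j η_j - N) for an N-state model.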

MAP Estimate for Discrete HMM (cont.)
Another reasonable choice of the initial estimate is the mean of the prior density.
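The Dirichlet mean is simpler, since no constants are subtracted:
\[
\text{mean}_k = \frac{\alpha_k}{\sum_{j=1}^{K}\alpha_j},
\qquad\text{e.g.}\quad \pi_i^{(0)} = \frac{\eta_i}{\sum_j \eta_j}.
\]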

MAP Estimate for Semi-continuous HMM
[Figure: Model 1, Model 2, …, Model M; diagram not captured in the transcript.]
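In a semi-continuous HMM, all models and states share one codebook of K Gaussians; only the mixture weights are state-specific:
\[
b_i(x) = \sum_{k=1}^{K} c_{ik}\, \mathcal{N}(x;\, m_k, \Sigma_k),
\qquad \sum_k c_{ik} = 1.
\]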

MAP Estimate for Semi-continuous HMM (cont.)
Definition: mean and precision (the precision is the inverse of the covariance).

MAP Estimate for Semi-continuous HMM (cont.)
Prior: the parameter groups are assumed independent.
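Under this independence assumption, the joint prior is a product of Dirichlet densities (for π, the rows of A, and each weight vector c_i) and a Normal-Wishart density for each codeword's mean and precision r_k = Σ_k⁻¹. A sketch, with hyperparameter symbols τ_k, μ_k, α_k, u_k of my choosing:
\[
g(m_k, r_k) \;\propto\; |r_k|^{(\alpha_k - p)/2}
\exp\!\Bigl(-\tfrac{\tau_k}{2}(m_k - \mu_k)^{\top} r_k (m_k - \mu_k)\Bigr)
\exp\!\Bigl(-\tfrac{1}{2}\,\mathrm{tr}(u_k r_k)\Bigr),
\]
where p is the feature dimension.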

MAP Estimate for Semi-continuous HMM (cont.)
Q-function (E step):

MAP Estimate for Semi-continuous HMM (cont.)
R-function:

MAP Estimate for Semi-continuous HMM (cont.)
M step: Initial probability:

MAP Estimate for Semi-continuous HMM (cont.)
Transition probability:

MAP Estimate for Semi-continuous HMM (cont.)
Mixture weight:
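With γ_t(i,k) denoting the posterior probability of occupying state i with codeword k at time t, the mixture weights take the same Dirichlet-smoothed form as the discrete case (a hedged reconstruction):
\[
\hat c_{ik} = \frac{\nu_{ik} - 1 + \sum_t \gamma_t(i,k)}
{\sum_{k'}\bigl(\nu_{ik'} - 1 + \sum_t \gamma_t(i,k')\bigr)}.
\]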

MAP Estimate for Semi-continuous HMM (cont.)
Differentiate with respect to the remaining (Gaussian) parameters and equate the result to zero.

MAP Estimate for Semi-continuous HMM (cont.)
Case 1: full covariance matrix.
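The equation images for this case were not captured. The MAP reestimates of this form are well known (Gauvain and Lee, 1994, give essentially this result); with c_k = Σ_t γ_t(k) the total occupancy of codeword k pooled over states, and x̄_k the occupancy-weighted sample mean:
\[
\hat m_k = \frac{\tau_k \mu_k + c_k \bar x_k}{\tau_k + c_k},
\]
\[
\hat\Sigma_k = \frac{u_k + \sum_t \gamma_t(k)\,(x_t - \hat m_k)(x_t - \hat m_k)^{\top}
+ \tau_k(\hat m_k - \mu_k)(\hat m_k - \mu_k)^{\top}}{\alpha_k - p + c_k}.
\]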

MAP Estimate for Semi-continuous HMM (cont.)
Case 1: full covariance matrix (cont.). The initial estimate can be chosen as the mode of the prior PDF, or alternatively as its mean.

MAP Estimate for Semi-continuous HMM (cont.)
Case 2: diagonal covariance matrix.
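For the diagonal case the prior on each dimension d is typically normal-gamma, with a Gamma(α_kd, β_kd) prior on each precision. A per-dimension analogue of the full-covariance result, offered only as a hedged sketch:
\[
\hat m_{kd} = \frac{\tau_k \mu_{kd} + c_k \bar x_{kd}}{\tau_k + c_k},
\qquad
\hat\sigma^2_{kd} = \frac{2\beta_{kd} + \sum_t \gamma_t(k)(x_{td} - \hat m_{kd})^2
+ \tau_k(\hat m_{kd} - \mu_{kd})^2}{2\alpha_{kd} - 1 + c_k}.
\]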

MAP Estimate for Semi-continuous HMM (cont.)
Case 2: diagonal covariance matrix (cont.). The initial estimate can be chosen as the mode of the prior PDF, or alternatively as its mean.

Conclusion
The important issue of choosing the prior density was discussed.
Some applications: model adaptation, HMM training, and related tasks.