Flexible Speaker Adaptation using Maximum Likelihood Linear Regression
Authors: C. J. Leggetter and P. C. Woodland
Presenter: 陳亮宇
Proc. ARPA Spoken Language Technology Workshop, 1995

2 Outline
Introduction
MLLR Overview
Fixed and Dynamic Regression Classes
Supervised Adaptation vs. Unsupervised Adaptation
Evaluation on WSJ Data
Conclusion

3 Introduction
Speaker Independent (SI) recognition systems
– Poor performance
– Easy to get lots of training data
Speaker Dependent (SD) recognition systems
– Better performance
– Difficult to get enough training data
Solution: SI system + adaptation with a little SD data
– Advantage: only a little SD data is required
– Problem: some models are not updated (models unseen in the adaptation data keep their SI parameters)

4 Introduction (aim of the paper)
MLLR (Maximum Likelihood Linear Regression) approach
– A parameter transformation technique
– All models are updated with only a little adaptation data
– Adapts the SI system by transforming the mean parameters with a set of linear transforms
Dynamic regression classes approach
– Optimizes the adaptation procedure at run time
– Allows all modes of adaptation to be performed in a single framework

5 MLLR Overview
Regression classes
– The set of Gaussians that share the same transformation
[Figure: the SD data and the mixture components are grouped into regression classes; a transformation matrix W is estimated from the data and used to transform the components in each class.]

6 MLLR Overview (cont.)
The SI mean is mapped to the SD mean through a linear transform of the extended mean vector: $\hat{\mu}_j = W_j \xi_j$, with $\xi_j = [\omega, \mu_{j1}, \ldots, \mu_{jn}]^T$.
Therefore, for a single Gaussian distribution, the probability density function of state j generating a speech observation vector o of dimension n is:
$b_j(\mathbf{o}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_j|^{1/2}} \exp\left\{-\tfrac{1}{2}(\mathbf{o} - W_j \xi_j)^T \Sigma_j^{-1} (\mathbf{o} - W_j \xi_j)\right\}$
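To make the mean transformation concrete, the following is a minimal NumPy sketch, not code from the paper: it assumes a unit offset term ω = 1 in the extended mean vector and diagonal covariances, and all function names are illustrative.

```python
import numpy as np

def adapt_mean(W, mu_si, omega=1.0):
    """Map an SI mean to an SD mean: mu_sd = W @ [omega, mu_si].

    W has shape (n, n+1); the leading omega entry lets W apply an offset
    as well as a rescaling/rotation of the original mean."""
    xi = np.concatenate(([omega], mu_si))        # extended mean vector
    return W @ xi

def log_gaussian_diag(o, mu, var):
    """Log density of observation o under a diagonal-covariance Gaussian."""
    n = o.shape[0]
    return -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(var))
                   + np.sum((o - mu) ** 2 / var))

# Toy usage: an identity transform with zero offset leaves the SI mean unchanged.
n = 3
mu_si = np.array([0.5, -1.0, 2.0])
var = np.ones(n)
W = np.hstack([np.zeros((n, 1)), np.eye(n)])     # shape (n, n+1)
mu_sd = adapt_mean(W, mu_si)
print(log_gaussian_diag(np.zeros(n), mu_sd, var))
```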

7 Estimation of MLLR matrices
Assume the Gaussian covariance matrices are diagonal
A set of T frames of adaptation data $O = \mathbf{o}_1 \mathbf{o}_2 \ldots \mathbf{o}_T$
$W_j$ is tied across R Gaussians $j_1, j_2, \ldots, j_R$
$W_j$ can be updated column by column, one dimension at a time: each column satisfies $G_i \mathbf{w}_i = \mathbf{z}_i$

8 Estimation of MLLR matrices (cont.)
– $\mathbf{z}_i$: the i-th column of Z, accumulated from the adaptation data, the state occupation probabilities and the extended mean vectors
– $\gamma_{j_r}(t)$: the probability of occupying state $j_r$ at time t while generating O
– $c_{ii}^{(r)}$: the i-th diagonal element of the r-th tied-state inverse covariance, scaled by the total state occupation probability; $G_i$ is built from the $c_{ii}^{(r)}$ and the outer products $\xi_{j_r}\xi_{j_r}^T$
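The closed-form solution itself did not survive in this transcript, so the following is a hedged NumPy sketch of the standard diagonal-covariance MLLR estimate for one regression class: accumulate Z and one G matrix per dimension over the R tied Gaussians, then solve one small linear system per dimension. It is written per row of W (whether these are called rows or columns depends only on the orientation convention chosen for W), and all variable names are illustrative.

```python
import numpy as np

def estimate_mllr_transform(gammas, obs, means, variances, omega=1.0):
    """Estimate W for one regression class of R diagonal-covariance Gaussians.

    gammas    : array (R, T), occupation probability of Gaussian r at frame t
    obs       : array (T, n), adaptation observations o_1 ... o_T
    means     : array (R, n), SI means of the tied Gaussians
    variances : array (R, n), diagonal covariances of the tied Gaussians
    Returns W of shape (n, n+1) so that the adapted mean is W @ [omega, mu].
    """
    n = obs.shape[1]
    Z = np.zeros((n, n + 1))
    G = np.zeros((n, n + 1, n + 1))                 # one G matrix per dimension
    for r in range(len(means)):
        xi = np.concatenate(([omega], means[r]))    # extended mean vector
        occ = gammas[r].sum()                       # total occupation of Gaussian r
        weighted_obs = gammas[r] @ obs              # sum_t gamma_r(t) * o_t
        outer = np.outer(xi, xi)
        for i in range(n):
            inv_var = 1.0 / variances[r][i]         # inverse-covariance weighting
            Z[i] += inv_var * weighted_obs[i] * xi
            G[i] += inv_var * occ * outer
    W = np.zeros((n, n + 1))
    for i in range(n):
        W[i] = np.linalg.solve(G[i], Z[i])          # one linear system per dimension
    return W
```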

9 MLLR for Incremental Adaptation
Can be implemented by accumulating the time-dependent components separately
Accumulate the observation vectors associated with each Gaussian and the associated occupation probabilities
– The MLLR equations can then be evaluated at any time
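A minimal sketch of the per-Gaussian accumulators this implies; the class and variable names are illustrative assumptions, not the paper's notation:

```python
import numpy as np

class MLLRAccumulator:
    """Sufficient statistics for incremental MLLR.

    Since the extended means and covariances do not change over time, only
    sum_t gamma_r(t) and sum_t gamma_r(t) * o_t are needed per Gaussian, so
    the transform can be (re-)estimated at any point during adaptation.
    """
    def __init__(self, num_gaussians, dim):
        self.occ = np.zeros(num_gaussians)               # sum_t gamma_r(t)
        self.obs_sum = np.zeros((num_gaussians, dim))    # sum_t gamma_r(t) * o_t

    def add_frame(self, gammas_t, o_t):
        """gammas_t[r] is the occupation probability of Gaussian r for this frame."""
        self.occ += gammas_t
        self.obs_sum += np.outer(gammas_t, o_t)

# Usage: accumulate frame by frame, re-estimate W whenever enough data has arrived.
acc = MLLRAccumulator(num_gaussians=4, dim=3)
acc.add_frame(np.array([0.7, 0.2, 0.1, 0.0]), np.array([0.1, -0.3, 1.2]))
```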

10 Fixed Regression Classes
Regression classes are predetermined by assessing the amount of adaptation data, using a mixture component clustering procedure based on a likelihood measure
The number of regression classes is roughly proportional to the amount of adaptation data
Disadvantages:
– The adaptation data needs to be known in advance
– Some regression classes might not receive a sufficient amount of data, giving poor estimates of the transformations, and a class may be dominated by a specific mixture component

11 Dynamic Regression Classes
Mixture components are arranged into a tree
Leaves of the tree are:
– for a small HMM system: individual mixture components
– for a large HMM system: base classes containing sets of mixture components that are similar under a divergence measure
Leaves are then merged into groups of similar components based on a distance measure (divergence); a sketch of such a divergence computation is given below
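The slides do not give the divergence formula; a common choice for diagonal-covariance Gaussians, shown here purely as an assumption, is the symmetric Kullback-Leibler divergence, together with the pairwise search used in a bottom-up merge step:

```python
import numpy as np

def symmetric_divergence(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    ratio = var1 / var2 + var2 / var1 - 2.0
    mean_term = (mu1 - mu2) ** 2 * (1.0 / var1 + 1.0 / var2)
    return 0.5 * np.sum(ratio + mean_term)

def nearest_pair(components):
    """Return the indices of the two most similar components (one merge step)."""
    best, pair = np.inf, None
    for i in range(len(components)):
        for j in range(i + 1, len(components)):
            d = symmetric_divergence(components[i][0], components[i][1],
                                     components[j][0], components[j][1])
            if d < best:
                best, pair = d, (i, j)
    return pair, best
```

Repeatedly merging the nearest pair and treating the merged group as a new node yields the bottom-up grouping of similar components described above.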

12 Supervised Adaptation vs. Unsupervised Adaptation
Note: the fixed regression class approach was used here
[Figure: supervised vs. unsupervised adaptation results on the RM corpus]

13 Evaluation on WSJ Data
Experiment settings
– Dynamic regression classes approach
– Baseline speaker-independent system (see Section 5.1 of the paper)
S3 test: static supervised adaptation for non-native speakers
S4 test: incremental unsupervised adaptation for native speakers

14 Regression Class Tree Settings
Distance measure: divergence between mixture components
A clustering algorithm is used to generate 750 base classes:
– 750 mixture components are chosen as seeds
– the nearest 10 components are assigned to each base class
– the remaining components are assigned to base classes using an average distance measure over all existing members (see the sketch below)
The regression tree is then built with a similar distance measure:
– base classes are compared pairwise using the average divergence between all members of each class
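As a rough illustration of this base-class construction, here is a hedged sketch; the greedy seed-by-seed assignment order is an assumption, and `symmetric_divergence` from the earlier sketch can be supplied as the distance function:

```python
import numpy as np

def build_base_classes(components, seed_ids, divergence, per_seed=10):
    """Sketch of the base-class construction described above.

    components : list of (mean, var) pairs for all mixture components
    seed_ids   : indices of the components chosen as base-class seeds
    divergence : pairwise distance function, e.g. symmetric_divergence
    """
    seeds = set(seed_ids)
    classes = {s: [s] for s in seed_ids}
    unassigned = [i for i in range(len(components)) if i not in seeds]

    # Step 1: give each seed its `per_seed` nearest unassigned components.
    for s in seed_ids:
        nearest = sorted(unassigned,
                         key=lambda i: divergence(*components[s], *components[i]))
        for i in nearest[:per_seed]:
            classes[s].append(i)
            unassigned.remove(i)

    # Step 2: assign the remainder using the average divergence to the
    # existing members of each base class.
    for i in unassigned:
        avg = {s: np.mean([divergence(*components[m], *components[i])
                           for m in members])
               for s, members in classes.items()}
        classes[min(avg, key=avg.get)].append(i)
    return classes
```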

15 S3 Test Results
[Table: % word error on S3-dev'94 and S3-Nov'94 for a native-speaker recognizer, the unadapted baseline, tree-based regression classes with different numbers of MLLR iterations, and a single global transform; numeric values not preserved here.]

16 S4 Test Results
[Table: % word error on S4-dev'94 and S4-Nov'94 for the baseline, tree-based regression classes with different update intervals, and a global transform; numeric values not preserved here.]
Note: increasing the update interval gives a large reduction in adaptation computation with only a small drop in performance.

17 Number of classes vs. number of sentences (S4 Test)

18 Adaptation in the Nov'94 Hi-P0 HTK System
Unsupervised adaptation
Adapted on 15 sentences from each speaker, drawn from unfiltered newspaper articles
About 15 million parameters in this HMM set
Used 750 base classes
[Table: % word error on H2-dev'94 and H1 Nov'94 with and without adaptation; numeric values not preserved here.]

19 Conclusion
The MLLR approach can be used for both static and incremental adaptation
The MLLR approach can be used for both supervised and unsupervised adaptation
Dynamic regression classes optimize the adaptation procedure at run time and allow all of these modes of adaptation to be handled in a single framework