A Study on Speaker Adaptation of Continuous Density HMM Parameters
Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang
Presented by: 陳亮宇
1990 ICASSP / IEEE
2/23 Outline
- Introduction
- Adaptive estimation of CDHMM parameters
- Bayesian adaptation of Gaussian parameters
- Experimental setup and recognition results
- Summary
3/23 Introduction
Adaptive learning: adapting reference speech patterns or models to handle situations unseen in the training phase, for example:
- varying channel characteristics
- changing environmental noise
- varying transducers
4/23 Introduction – MAP
Maximum a posteriori (MAP) estimation, also called Bayesian adaptation: given a prior distribution over the parameters, MAP maximizes the posterior distribution of the parameters given the adaptation data.
5/23 Adaptive estimation of CDHMM parameters
Let Y = {y_1, y_2, …, y_T} be a sequence of SD observations and λ the parameter set of the distribution function. Given training data Y, we want to estimate λ. If λ is assumed random with a prior distribution P₀(λ), the MAP estimate of λ is obtained by solving

λ_MAP = argmax_λ p(Y | λ) · P₀(λ)

Here p(Y | λ) is the likelihood function (maximizing it alone yields the MLE), P₀(λ) is the prior distribution, and their product is proportional to the posterior distribution; the prior plays a role analogous to the language model in recognition.
6/23 Adaptive segmental k-means algorithm
Maximize the state-likelihood of the observation sequences iteratively using the segmental k-means training algorithm, with s denoting the state sequence:
1. For a given model, find the optimal state sequence s.
2. Based on that state sequence, find the MAP estimate of the parameters.
A sketch of this loop is given below.
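A minimal sketch of the two-step loop, assuming hypothetical model methods viterbi_align() and map_update() (illustrative stand-ins, not the paper's implementation):

```python
import numpy as np

def adaptive_segmental_kmeans(model, utterances, n_iters=5):
    """Iterate: (1) optimal state alignment, (2) MAP re-estimation.

    model.viterbi_align(y) and model.map_update(frames_by_state) are
    hypothetical stand-ins for the alignment and MAP-update steps.
    """
    for _ in range(n_iters):
        # Step 1: for the current model, find the optimal state
        # sequence of each adaptation utterance.
        alignments = [model.viterbi_align(y) for y in utterances]

        # Step 2: pool frames by aligned state and re-estimate each
        # state's observation-density parameters via MAP.
        frames_by_state = {}
        for y, states in zip(utterances, alignments):
            for frame, s in zip(y, states):
                frames_by_state.setdefault(s, []).append(frame)
        model.map_update({s: np.array(f) for s, f in frames_by_state.items()})
    return model
```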
7/23 The choices of prior distributions
Non-informative prior: the parameters are fixed but unknown and are to be estimated from the data, with no preference as to what their values should be. With a flat prior, argmax_λ p(Y | λ) · P₀(λ) = argmax_λ p(Y | λ), so MAP = MLE.
Informative prior: knowledge about the parameters to be estimated is available. The choice of prior distribution depends on the acoustic models used to characterize the data.
8/23 Conjugate prior
- The prior and the posterior belong to the same distribution family.
- Analytical forms of some conjugate priors are available.
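As a standard textbook instance of conjugacy (not shown on the slide): a Gaussian prior on the mean of a Gaussian likelihood with known variance yields a Gaussian posterior,

```latex
p(\mu \mid y_1,\dots,y_n)
  \propto \Big[\prod_{i=1}^{n}\mathcal{N}(y_i \mid \mu,\sigma^2)\Big]\,
          \mathcal{N}(\mu \mid \mu_0,\tau^2)
  \propto \mathcal{N}\!\left(\mu \;\middle|\;
      \frac{\tau^2\,\bar{y} + (\sigma^2/n)\,\mu_0}{\tau^2 + \sigma^2/n},\;
      \Big(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\Big)^{-1}\right)
```

This is exactly the form exploited in the mean-adaptation slides that follow.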
9/23 Bayesian adaptation of Gaussian parameters
Three implementations of Bayesian adaptation:
1. Gaussian mean
2. Gaussian variance
3. Gaussian mean and precision
Here μ is the mean and σ² the variance of one component of a state observation distribution; the precision is θ = 1/σ².
10/23 Bayesian adaptation of the Gaussian mean
Given observations y_1, …, y_n, μ is random and σ² is fixed and known. With a Gaussian prior N(μ₀, τ²) on μ, the MAP estimate of μ is (slide equation reconstructed from the standard conjugate result):

μ̂ = (τ² / (τ² + σ²/n)) · ȳ + ((σ²/n) / (τ² + σ²/n)) · μ₀

where ȳ is the sample mean of the adaptation data and μ₀, τ² are the prior mean and variance.
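A small sketch of this interpolation in code, using the conjugate-normal form above (parameter names are illustrative):

```python
import numpy as np

def map_mean(y, mu0, tau2, sigma2):
    """MAP estimate of a Gaussian mean: known variance sigma2,
    prior N(mu0, tau2), adaptation samples y (1-D array)."""
    n = len(y)
    w = tau2 / (tau2 + sigma2 / n)      # weight on the sample mean
    return w * np.mean(y) + (1.0 - w) * mu0

# With many samples, or tau2 >> sigma2/n, the weight w -> 1 and the
# MAP estimate approaches the MLE (the sample mean), as the next
# slide notes.
y = np.random.default_rng(0).normal(1.0, 0.5, size=20)
print(map_mean(y, mu0=0.0, tau2=1.0, sigma2=0.25))
```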
11/23 Bayesian adaptation of the Gaussian mean (cont.)
The MAP estimate converges to the MLE when:
- a large number of samples is used for training, or
- a relatively large value of the prior variance τ² is chosen (τ² >> σ²/n), i.e., a non-informative prior.
12/23 Bayesian adaptation of the Gaussian variance
The mean μ is estimated by the sample mean; the variance σ² is given an informative prior whose parameter σ²_min is estimated from a large collection of speech data.
13/23 Bayesian adaptation of the Gaussian variance (cont.)
The variance estimate is floored at σ²_min:

σ̂² = max(S_y², σ²_min)

where S_y² is the sample variance. This is effective when an insufficient amount of sample data is available.
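A one-line sketch of this floor, assuming the max() form reconstructed above:

```python
import numpy as np

def map_variance(y, sigma2_min):
    """Sample variance floored at sigma2_min (a value estimated
    offline from a large speech corpus); guards against variances
    underestimated from scarce adaptation data."""
    return max(np.var(y), sigma2_min)
```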
14/23 Bayesian adaptation of both Gaussian mean and precision
Both the mean and the precision θ = 1/σ² are treated as random. The joint conjugate prior P₀(μ, θ) is a normal-gamma distribution, i.e., a normal distribution on μ given θ multiplied by a gamma distribution on θ:

P₀(μ, θ) ∝ N(μ | μ₀, (κθ)⁻¹) · G(θ | α, β)
15/23 Bayesian adaptation of both Gaussian mean and precision (cont.)
The MAP estimates of μ and σ² can be derived in closed form, and the prior parameters can in turn be estimated from data (the slide's equations were not preserved; a reconstruction of the standard update follows).
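A sketch of the joint MAP update under the normal-gamma prior above; the closed form below is the standard conjugate result with assumed hyperparameters (κ, α, β), not necessarily the paper's exact expressions (3.7)–(3.9):

```python
import numpy as np

def map_mean_precision(y, mu0, kappa, alpha, beta):
    """Joint MAP estimate of (mu, sigma^2) under a normal-gamma prior:
    mu | theta ~ N(mu0, 1/(kappa*theta)), theta ~ Gamma(alpha, beta).
    Standard conjugate-posterior result; names are assumptions."""
    n = len(y)
    ybar = np.mean(y)
    ss = np.sum((y - ybar) ** 2)

    # Posterior (and MAP) mean: interpolate prior mean and sample mean.
    mu_hat = (kappa * mu0 + n * ybar) / (kappa + n)

    # MAP precision is the mode of the posterior gamma factor;
    # invert it to report a variance.
    beta_n = beta + 0.5 * ss + 0.5 * (kappa * n / (kappa + n)) * (ybar - mu0) ** 2
    sigma2_hat = beta_n / (alpha + 0.5 * n - 0.5)
    return mu_hat, sigma2_hat
```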
16/23 Experimental setup
- 39-word vocabulary: 26 English letters, 10 digits, 3 command words (stop, error, repeat)
- 2 sets of speech data: SI data for the SI model (100 speakers, 50 female / 50 male); SD data for adaptation (4 speakers, 2 female / 2 male)
- SD training data: 5 utterances per word for each male speaker, 7 for each female speaker
- SD testing data: 10 utterances per word per speaker
- Recorded over local dialed-up telephone lines; sampling rate = 6.67 kHz
17/23 Experimental setup (cont.)
- Models are obtained using the segmental k-means training procedure
- Maximum number of mixture components per state = 9
- Diagonal covariance matrices
- 5-state HMMs
- 2 sets of SI models: the 1st as described above, the 2nd with a single Gaussian distribution per state
18/23 Experimental results 1
Baseline recognition rates (table not preserved). SD models: 2 Gaussian mixture components per state per word.
19/23 Experimental results 2
5 adaptation experiments:
- EXP1: SD mean and SD variance (regular MLE)
- EXP2: SD mean and a fixed variance estimate
- EXP3: SA mean (3.1) with prior parameters (3.2–3.3)
- EXP4: SD mean and SA variance (3.5)
- EXP5: SA estimates (3.7) with prior parameters (3.8–3.9)
20/23 Experimental results 3 (figure not preserved)
21/23 Experimental results 4
Figure legend (plot not preserved): SD mean, SD variance (MLE); SD mean, fixed variance; SA mean (method 1); SD mean, SA variance (method 2); SA mean and precision (method 3)
22/23 Experimental results 5 (figure not preserved)
23/23 Conclusions
- Average recognition rate with all tokens incorporated = 96.1%
- Performance improves when more adaptation data are used and when both the mean and the precision are adapted