1 Maximum Likelihood Adaptation of Semi-Continuous HMMs by Latent Variable Decomposition of State Distributions
LTI Student Research Symposium 2004, Antoine Raux. Work done in collaboration with Rita Singh.

2 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

3 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

4 HMMs for Speech Recognition
Generative probabilistic model of speech. States represent sub-phonemic units. In general, two types of parameters:
- Temporal aspect: transition probabilities
- Spectral aspect: output distributions (means, variances, and mixing weights of mixtures of Gaussians)
Two broad types of structure: Continuous Density and Semi-Continuous.
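For reference, the textbook form of the quantities just listed (not from the slides): the HMM likelihood combines the transition probabilities a_ij with the per-state output densities b_j(o).

```latex
% Likelihood of an observation sequence O = o_1 ... o_T under an HMM \lambda:
P(O \mid \lambda) = \sum_{s_1, \dots, s_T} \prod_{t=1}^{T} a_{s_{t-1} s_t}\, b_{s_t}(o_t),
\qquad
b_j(o) = \sum_{k} w_{jk}\, \mathcal{N}\!\left(o; \mu_{jk}, \Sigma_{jk}\right)
```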

5 Continuous Density HMMs
[Diagram: three states Si, Sj, Sk, each with its own Gaussians N(mi1,vi1), N(mi2,vi2), N(mi3,vi3), ..., N(mk3,vk3) and its own mixture weights, e.g. wi1 = P(Ci1|S=Si).]

6 Semi-Continuous HMMs
[Diagram: states Si, Sj, Sk all share one codebook of Gaussians N(m1,v1), ..., N(m7,v7); only the mixture weights wi1 ... wi7 are state-specific.]

7 SCHMMs vs CDHMMs
Less powerful (continuous density models do better with large amounts of training data), BUT faster to compute (fewer Gaussian computations, see the sketch below) and able to train well on less data. Training of the codebook and of the mixture weights can be decoupled.
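A minimal sketch of the computational saving, assuming diagonal-covariance Gaussians; the function and variable names are illustrative, not from the talk:

```python
import numpy as np
from scipy.special import logsumexp

def log_gaussian(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def schmm_state_log_likelihoods(x, codebook_means, codebook_vars, log_weights):
    """Output log-likelihoods of all S states for a single frame x.

    codebook_means, codebook_vars: (K, D) shared codebook of K Gaussians.
    log_weights: (S, K) state-specific log mixture weights.
    """
    # The K codebook densities are evaluated once per frame...
    log_dens = np.array([log_gaussian(x, m, v)
                         for m, v in zip(codebook_means, codebook_vars)])
    # ...and each state only mixes the cached densities with its own weights.
    # A CDHMM would instead evaluate S * K distinct Gaussians per frame.
    return logsumexp(log_weights + log_dens, axis=1)
```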

8 Acoustic Adaptation
Both CDHMMs and SCHMMs need a large amount of training data. Such amounts are not available for every condition (domain, speakers, environment). Acoustic adaptation: modify models trained on a large amount of data to match a different condition, using only a small amount of data from that condition.

9 Model-based (ML) Adaptation
Tie the parameters of different states so that all states can be adapted even with little data. Typical method: Maximum Likelihood Linear Regression (MLLR), used to adapt the means and variances of CDHMMs.
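For reference, the standard MLLR mean transform (Leggetter & Woodland, 1995): all states s in a regression class r(s) share one affine transform, estimated to maximize the likelihood of the adaptation data O:

```latex
\hat{\mu}_s = A_{r(s)}\,\mu_s + b_{r(s)},
\qquad
(\hat{A}_r, \hat{b}_r) = \operatorname*{arg\,max}_{A_r,\,b_r} \; P\bigl(O \mid \lambda(A_r, b_r)\bigr)
```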

10 Adapting Mixture Weights
Problem: MLLR does not work for the mixture weights of SCHMMs. The weights are not evenly distributed (their sum always equals 1), so standard clustering algorithms are ineffective. The problem becomes: tie states with similar weight distributions.

11 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

12 Parallel with Information Retrieval
A typical problem in Information Retrieval: identify similar documents. Documents can be represented as distributions over the vocabulary: tie documents with similar word distributions.

13 Word Document Representation
[Diagram: documents Di, Dj, Dk, each represented by a distribution wi1 ... wi7 over the words of the vocabulary, mirroring the SCHMM weight diagram above.]

14 Problems with Word Document Representation
The word distribution of a single document is sparse. Ambiguous words, synonyms… Distributions cannot be reliably compared as a way of comparing documents.

15 PLSA for IR
Solution proposed by Hofmann (1999): Probabilistic Latent Semantic Analysis (PLSA). Express documents and words as distributions over a latent variable (a "topic"?). The latent variable takes a small number of values compared to the numbers of words/documents. Similar to standard LSA, but guarantees proper probability distributions.

16 PLSA for IR
[Diagram: words Word1 ... Word7 connect to latent values Z1 ... Z4 through weights such as wz11 = P(Word1|Z=Z1); documents Di, Dj, Dk connect to the latent values through weights such as wdi1 = P(Z1|D=Di).]

17 PLSA Decomposition
Decompose the joint probability: Pd(d,w) = sum_z P(z) P(d|z) P(w|z)
Independence assumption: d and w are independent given z! Pd(d,w) lies on a sub-space of the probability simplex (the PLS-space). Estimate the parameters with the EM algorithm so as to minimize the KL-divergence between P(d,w) and Pd(d,w).
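A minimal EM sketch of this decomposition, written to match the factorization above; function and variable names are illustrative, and the dense (Z, D, W) posterior is kept only for clarity (it suits small problems only):

```python
import numpy as np

def plsa_em(counts, n_topics, n_iter=100, seed=0, eps=1e-12):
    """EM for PLSA on a count matrix counts[d, w] = n(d, w).

    Returns p_z (Z,), p_d_given_z (Z, D) and p_w_given_z (Z, W) with
    P(d, w) ~ sum_z p_z[z] * p_d_given_z[z, d] * p_w_given_z[z, w].
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_given_z = rng.random((n_topics, n_docs))
    p_d_given_z /= p_d_given_z.sum(axis=1, keepdims=True)
    p_w_given_z = rng.random((n_topics, n_words))
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (Z, D, W).
        joint = (p_z[:, None, None]
                 * p_d_given_z[:, :, None]
                 * p_w_given_z[:, None, :])
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)
        # M-step: reweight the posteriors by the observed counts n(d, w).
        weighted = counts[None, :, :] * post
        p_d_given_z = weighted.sum(axis=2)
        p_d_given_z /= p_d_given_z.sum(axis=1, keepdims=True) + eps
        p_w_given_z = weighted.sum(axis=1)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + eps
        p_z = weighted.sum(axis=(1, 2))
        p_z /= p_z.sum() + eps
    return p_z, p_d_given_z, p_w_given_z
```

Each EM iteration never increases the KL-divergence between the empirical P(d,w) and the decomposed Pd(d,w), which is what maximizing the PLSA likelihood amounts to.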

18 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

19 Back to Speech Recognition…
[Diagram: the SCHMM again: states Si, Sj, Sk share the codebook of Gaussians N(m1,v1) ... N(m7,v7), with state-specific weights wi1 ... wi7, the same structure as the word/document diagram.]

20 PLSA for SCHMMs
[Diagram: the codebook Gaussians N(m1,v1) ... N(m7,v7) connect to latent values Z1 ... Z4 through weights such as wz11 = P(C1|Z=Z1); the states Si, Sj, Sk connect to the latent values through weights such as wsi1 = P(Z1|S=Si).]
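In equation form, read off the diagram labels (states in the role of documents, codebook Gaussians in the role of words):

```latex
w_{ik} \;=\; P(C_k \mid S_i) \;=\; \sum_{z} P(C_k \mid Z = z)\, P(Z = z \mid S_i)
```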

21 Adaptation through PLSA
[Flow diagram:
1. Train an SCHMM (Baum-Welch) on the Large Database: Transitions I, Means I, Variances I, Weights I.
2. Decompose Weights I using PLSA: P(Z) I, P(C|Z) I, P(S|Z) I.
3. Retrain the SCHMM (Baum-Welch) on the Small Database: Transitions II, Means II, Variances II, Weights II.
4. Decompose Weights II using PLSA, keeping each factor from pass I or re-estimating it: P(Z) I/II, P(C|Z) I/II, P(S|Z) I/II.
5. Recompose the weights from the chosen factors, giving the adapted model: Transitions II, Means II, Variances II, Weights III.]
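A hedged sketch of the decompose/recompose step, reusing the plsa_em routine sketched earlier; the occupancy matrix, its dimensions, and the number of latent values are illustrative stand-ins, not figures from the talk:

```python
import numpy as np

# Stand-in for the state/codeword soft counts collected while retraining
# on the small database (the counts behind Weights II); in practice these
# come from Baum-Welch, not from a random generator.
occupancy = np.random.default_rng(0).random((40, 16))  # (S states, K codewords)

# Decompose: states play the role of documents, codewords the role of words.
p_z, p_s_given_z, p_c_given_z = plsa_em(occupancy, n_topics=4)

# Recompose the mixture weights: P(c|s) = sum_z P(z|s) P(c|z), where
# P(z|s) is obtained from P(s|z) and P(z) by Bayes' rule.
p_z_given_s = (p_s_given_z * p_z[:, None]).T        # (S, Z), unnormalized
p_z_given_s /= p_z_given_s.sum(axis=1, keepdims=True)
weights_adapted = p_z_given_s @ p_c_given_z         # (S, K); each row sums to 1
```

Recomposing from re-estimated factors in this way is the projection onto the PLS-space that the conclusion slide describes as smoothing the retrained weight distributions.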

22 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

23 Evaluation Experiment
Training data / original models: 50 hours of calls to the Communicator system, mostly native speakers; 4000 states, 256 Gaussian components.
Adaptation data: 3 hours of calls to the Let’s Go system, non-native speakers.
Evaluation data: 449 utterances (20 min) from calls to Let’s Go.

24 Evaluation results
[Results chart; the accuracy figures are not preserved in the transcript.]

25 Evaluation results
[Results chart, highlighting the baseline configuration: Transitions I, Means I, Variances I, Weights I.]

26 Evaluation results
[Results chart, highlighting the retrained configuration: Transitions II, Means II, Variances II, Weights II.]

27 Evaluation results
[Results chart, highlighting the best configuration: Transitions II, Means II, Variances II, Weights III.]
Best result: readapt everything!

28 Reestimating all three distributions: P(Z), P(C|Z) and P(S|Z)
[Same flow diagram as slide 21, with all three PLSA distributions P(Z), P(C|Z), and P(S|Z) re-estimated on the Small Database before recomposing Weights III.]

29 Conclusion
PLSA ties the states of SCHMMs by introducing a latent variable. PLSA adaptation improves accuracy. The best method is equivalent to smoothing the retrained weight distributions by projecting them onto the PLS-space. Future direction: learn the PLSA parameters directly within Baum-Welch training.

30 Thank you… Questions?

