1 Maximum Likelihood Adaptation of Semi-Continuous HMMs by Latent Variable Decomposition of State Distributions
LTI Student Research Symposium 2004, Antoine Raux. Work done in collaboration with Rita Singh.

2 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

3 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

4 HMMs for Speech Recognition
Generative probabilistic model of speech. States represent sub-phonemic units. In general, two types of parameters:
- Temporal aspect: transition probabilities
- Spectral aspect: output distributions (means, variances, and mixing weights of mixtures of Gaussians)
Two broad types of structure: Continuous Density and Semi-Continuous.
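For reference, the textbook form of the quantities just listed (not from the slides): the HMM likelihood combines the transition probabilities a_ij with the per-state output densities b_j(o).

```latex
% Likelihood of an observation sequence O = o_1 ... o_T under an HMM \lambda:
P(O \mid \lambda) = \sum_{s_1, \dots, s_T} \prod_{t=1}^{T} a_{s_{t-1} s_t}\, b_{s_t}(o_t),
\qquad
b_j(o) = \sum_{k} w_{jk}\, \mathcal{N}\!\left(o; \mu_{jk}, \Sigma_{jk}\right)
```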

5 Continuous Density HMMs
[Diagram: three states Si, Sj, Sk, each with its own Gaussians N(mi1,vi1), N(mi2,vi2), N(mi3,vi3), ..., N(mk3,vk3) and its own mixture weights, e.g. wi1 = P(Ci1|S=Si).]

6 Semi-Continuous HMMs
[Diagram: states Si, Sj, Sk all share one codebook of Gaussians N(m1,v1), ..., N(m7,v7); only the mixture weights wi1 ... wi7 are state-specific.]

7 SCHMMs vs CDHMMs
Less powerful (continuous density models do better with large amounts of training data), BUT faster to compute (fewer Gaussian computations, see the sketch below) and able to train well on less data. Training of the codebook and of the mixture weights can be decoupled.
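A minimal sketch of the computational saving, assuming diagonal-covariance Gaussians; the function and variable names are illustrative, not from the talk:

```python
import numpy as np
from scipy.special import logsumexp

def log_gaussian(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def schmm_state_log_likelihoods(x, codebook_means, codebook_vars, log_weights):
    """Output log-likelihoods of all S states for a single frame x.

    codebook_means, codebook_vars: (K, D) shared codebook of K Gaussians.
    log_weights: (S, K) state-specific log mixture weights.
    """
    # The K codebook densities are evaluated once per frame...
    log_dens = np.array([log_gaussian(x, m, v)
                         for m, v in zip(codebook_means, codebook_vars)])
    # ...and each state only mixes the cached densities with its own weights.
    # A CDHMM would instead evaluate S * K distinct Gaussians per frame.
    return logsumexp(log_weights + log_dens, axis=1)
```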

8 Acoustic Adaptation
Both CDHMMs and SCHMMs need a large amount of training data. Such amounts are not available for every condition (domain, speakers, environment). Acoustic adaptation: modify models trained on a large amount of data to match a different condition, using only a small amount of data from that condition.

9 Model-based (ML) Adaptation
Tie the parameters of different states so that all states can be adapted even with little data. Typical method: Maximum Likelihood Linear Regression (MLLR), used to adapt the means and variances of CDHMMs.
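For reference, the standard MLLR mean transform (Leggetter & Woodland, 1995): all states s in a regression class r(s) share one affine transform, estimated to maximize the likelihood of the adaptation data O:

```latex
\hat{\mu}_s = A_{r(s)}\,\mu_s + b_{r(s)},
\qquad
(\hat{A}_r, \hat{b}_r) = \operatorname*{arg\,max}_{A_r,\,b_r} \; P\bigl(O \mid \lambda(A_r, b_r)\bigr)
```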

10 Adapting Mixture Weights
Problem: MLLR does not work for the mixture weights of SCHMMs. The weights are not evenly distributed (their sum always equals 1), so standard clustering algorithms are ineffective. The problem becomes: tie states with similar weight distributions.

11 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

12 Parallel with Information Retrieval
A typical problem in Information Retrieval: identify similar documents. Documents can be represented as distributions over the vocabulary: tie documents with similar word distributions.

13 Word Document Representation
[Diagram: documents Di, Dj, Dk, each represented by a distribution wi1 ... wi7 over the words of the vocabulary, mirroring the SCHMM weight diagram above.]

14 Problems with Word Document Representation
The word distribution of a single document is sparse. Ambiguous words, synonyms… Distributions cannot be reliably compared as a way of comparing documents.

15 PLSA for IR
Solution proposed by Hofmann (1999): Probabilistic Latent Semantic Analysis (PLSA). Express documents and words as distributions over a latent variable (a "topic"?). The latent variable takes a small number of values compared to the numbers of words/documents. Similar to standard LSA, but guarantees proper probability distributions.

16 PLSA for IR
[Diagram: words Word1 ... Word7 connect to latent values Z1 ... Z4 through weights such as wz11 = P(Word1|Z=Z1); documents Di, Dj, Dk connect to the latent values through weights such as wdi1 = P(Z1|D=Di).]

17 PLSA Decomposition
Decompose the joint probability: Pd(d,w) = sum_z P(z) P(d|z) P(w|z)
Independence assumption: d and w are independent given z! Pd(d,w) lies on a sub-space of the probability simplex (the PLS-space). Estimate the parameters with the EM algorithm so as to minimize the KL-divergence between P(d,w) and Pd(d,w).
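A minimal EM sketch of this decomposition, written to match the factorization above; function and variable names are illustrative, and the dense (Z, D, W) posterior is kept only for clarity (it suits small problems only):

```python
import numpy as np

def plsa_em(counts, n_topics, n_iter=100, seed=0, eps=1e-12):
    """EM for PLSA on a count matrix counts[d, w] = n(d, w).

    Returns p_z (Z,), p_d_given_z (Z, D) and p_w_given_z (Z, W) with
    P(d, w) ~ sum_z p_z[z] * p_d_given_z[z, d] * p_w_given_z[z, w].
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_given_z = rng.random((n_topics, n_docs))
    p_d_given_z /= p_d_given_z.sum(axis=1, keepdims=True)
    p_w_given_z = rng.random((n_topics, n_words))
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (Z, D, W).
        joint = (p_z[:, None, None]
                 * p_d_given_z[:, :, None]
                 * p_w_given_z[:, None, :])
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)
        # M-step: reweight the posteriors by the observed counts n(d, w).
        weighted = counts[None, :, :] * post
        p_d_given_z = weighted.sum(axis=2)
        p_d_given_z /= p_d_given_z.sum(axis=1, keepdims=True) + eps
        p_w_given_z = weighted.sum(axis=1)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + eps
        p_z = weighted.sum(axis=(1, 2))
        p_z /= p_z.sum() + eps
    return p_z, p_d_given_z, p_w_given_z
```

Each EM iteration never increases the KL-divergence between the empirical P(d,w) and the decomposed Pd(d,w), which is what maximizing the PLSA likelihood amounts to.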

18 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

19 Back to Speech Recognition…
[Diagram: the SCHMM again: states Si, Sj, Sk share the codebook of Gaussians N(m1,v1) ... N(m7,v7), with state-specific weights wi1 ... wi7, the same structure as the word/document diagram.]

20 PLSA for SCHMMs
[Diagram: the codebook Gaussians N(m1,v1) ... N(m7,v7) connect to latent values Z1 ... Z4 through weights such as wz11 = P(C1|Z=Z1); the states Si, Sj, Sk connect to the latent values through weights such as wsi1 = P(Z1|S=Si).]
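In equation form, read off the diagram labels (states in the role of documents, codebook Gaussians in the role of words):

```latex
w_{ik} \;=\; P(C_k \mid S_i) \;=\; \sum_{z} P(C_k \mid Z = z)\, P(Z = z \mid S_i)
```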

21 Adaptation through PLSA
[Flow diagram:
1. Train an SCHMM (Baum-Welch) on the Large Database: Transitions I, Means I, Variances I, Weights I.
2. Decompose Weights I using PLSA: P(Z) I, P(C|Z) I, P(S|Z) I.
3. Retrain the SCHMM (Baum-Welch) on the Small Database: Transitions II, Means II, Variances II, Weights II.
4. Decompose Weights II using PLSA, keeping each factor from pass I or re-estimating it: P(Z) I/II, P(C|Z) I/II, P(S|Z) I/II.
5. Recompose the weights from the chosen factors, giving the adapted model: Transitions II, Means II, Variances II, Weights III.]
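A hedged sketch of the decompose/recompose step, reusing the plsa_em routine sketched earlier; the occupancy matrix, its dimensions, and the number of latent values are illustrative stand-ins, not figures from the talk:

```python
import numpy as np

# Stand-in for the state/codeword soft counts collected while retraining
# on the small database (the counts behind Weights II); in practice these
# come from Baum-Welch, not from a random generator.
occupancy = np.random.default_rng(0).random((40, 16))  # (S states, K codewords)

# Decompose: states play the role of documents, codewords the role of words.
p_z, p_s_given_z, p_c_given_z = plsa_em(occupancy, n_topics=4)

# Recompose the mixture weights: P(c|s) = sum_z P(z|s) P(c|z), where
# P(z|s) is obtained from P(s|z) and P(z) by Bayes' rule.
p_z_given_s = (p_s_given_z * p_z[:, None]).T        # (S, Z), unnormalized
p_z_given_s /= p_z_given_s.sum(axis=1, keepdims=True)
weights_adapted = p_z_given_s @ p_c_given_z         # (S, K); each row sums to 1
```

Recomposing from re-estimated factors in this way is the projection onto the PLS-space that the conclusion slide describes as smoothing the retrained weight distributions.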

22 Outline
CDHMMs, SCHMMs, and Adaptation
A Little Visit to IR
PLSA Adaptation Scheme
Evaluation

23 Evaluation Experiment
Training data / original models: 50 hours of calls to the Communicator system, mostly native speakers; 4000 states, 256 Gaussian components.
Adaptation data: 3 hours of calls to the Let’s Go system, non-native speakers.
Evaluation data: 449 utterances (20 min) from calls to Let’s Go.

24 Evaluation results
[Results chart; the accuracy figures are not preserved in the transcript.]

25 Evaluation results
[Results chart, highlighting the baseline configuration: Transitions I, Means I, Variances I, Weights I.]

26 Evaluation results
[Results chart, highlighting the retrained configuration: Transitions II, Means II, Variances II, Weights II.]

27 Evaluation results
[Results chart, highlighting the best configuration: Transitions II, Means II, Variances II, Weights III.]
Best result: readapt everything!

28 Reestimating all three distributions: P(Z), P(C|Z) and P(S|Z)
[Same flow diagram as slide 21, with all three PLSA distributions P(Z), P(C|Z), and P(S|Z) re-estimated on the Small Database before recomposing Weights III.]

29 Conclusion
PLSA ties the states of SCHMMs by introducing a latent variable. PLSA adaptation improves accuracy. The best method is equivalent to smoothing the retrained weight distributions by projecting them onto the PLS-space. Future direction: learn the PLSA parameters directly within Baum-Welch training.

30 Thank you… Questions?

