Gaussian Mixture Language Models for Speech Recognition Mohamed Afify, Olivier Siohan and Ruhi Sarikaya
Introduction
Two issues for n-gram models:
–Generalizability & adaptability
Generalizability
–Word classes / parsing
–Measure word similarity in a continuous space
Adaptability
–Large number of parameters in the LM
–Use the continuous space to reduce the number of parameters
Approach
Word
–Each word is mapped to a word vector of M dimensions
–A projected word vector of lower dimension
History: concatenation of the preceding words
–History vector: the N-1 history word vectors stacked, giving (N-1)*M dimensions
–Projected history vector y of L dimensions
(Figure: history word vectors uh_1, uh_2, …, uh_{N-1}, each of dimension M, stacked and projected to y of dimension L)
Approach (cont.)
–The probability density of the history y given the word w, p(y | w), is modeled as a Gaussian mixture
–The probability of w given y follows from Bayes' rule: P(w | y) ∝ p(y | w) P(w)
–The prior P(w) comes from a smoothed n-gram or a smoothed clustered n-gram
*Exponents can be used to control the dynamic ranges of the n-gram and Gaussian mixture probabilities
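A minimal sketch of this scoring rule, assuming p(y|w) is a per-word Gaussian mixture log-likelihood and P(w) a smoothed n-gram probability; the function names and the exponents alpha/beta are illustrative, not taken from the paper:

```python
import numpy as np

def gmm_lm_prob(w, y, gmm_loglik, ngram_prob, vocab, alpha=1.0, beta=1.0):
    """Sketch of P(w | y) via Bayes' rule.

    gmm_loglik(v, y): log p(y | v) under word v's Gaussian mixture (assumed given)
    ngram_prob(v):    smoothed n-gram probability of v given the word history
    alpha, beta:      assumed exponents controlling the dynamic ranges
    """
    # Unnormalized log score for every vocabulary word
    scores = np.array([alpha * gmm_loglik(v, y) + beta * np.log(ngram_prob(v))
                       for v in vocab])
    scores -= scores.max()                       # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()  # normalize over the vocabulary
    return probs[vocab.index(w)]
```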
Implementation
Word co-occurrence matrix E
–E(i, j): count of word i following word j
–SVD, keeping 100 dimensions
To create a trigram model
–The two history word vectors are stacked to form a 200-d vector
LDA + MLLT
–Reduce dimensionality to 50
GMM training
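The pipeline above can be sketched end-to-end with toy dimensions (the paper's 100-d SVD, 200-d stacked vectors, and 50-d output are shrunk here, the MLLT step is omitted, and all data is synthetic):

```python
import numpy as np
from numpy.linalg import svd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy word co-occurrence matrix E: E[i, j] = count of word i following word j
V, D = 30, 5                       # vocabulary size, SVD dimensionality (paper: 100)
E = rng.poisson(1.0, size=(V, V)).astype(float)

# SVD gives each word a D-dimensional vector (rows of U scaled by singular values)
U, S, _ = svd(E, full_matrices=False)
word_vecs = U[:, :D] * S[:D]

# Trigram history: stack the two preceding word vectors (200-d in the paper)
n_hist = 200
pairs = rng.integers(0, V, size=(n_hist, 2))
hist = np.hstack([word_vecs[pairs[:, 0]], word_vecs[pairs[:, 1]]])

# LDA reduces the stacked vectors, using the predicted word as the class label
labels = rng.integers(0, 4, size=n_hist)   # toy "next word" classes
lda = LinearDiscriminantAnalysis(n_components=3).fit(hist, labels)
hist_low = lda.transform(hist)

# Fit one GMM per word class on the reduced histories: the model p(y | w)
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(hist_low[labels == c])
        for c in np.unique(labels)}
```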
Experimental results
–5-best rescoring
A discriminative training framework using n-best speech recognition transcriptions and scores for spoken utterance classification Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alex Acero
Introduction
Conventionally, a two-phase approach is adopted for the SUC (spoken utterance classification) task:
–ASR transcription
–Semantic classification
It has been reported that reductions in WER (word error rate) do not necessarily translate into reductions in CER (classification error rate)
A novel discriminative training framework for jointly learning the language model and the classification model is proposed
DT framework Using the N-best Lists
–As long as enough words are recognized to trigger the correct salient phrase, the correct meaning is assigned to the utterance
–An ME (maximum entropy) classifier is used
–A joint association score combines recognition and classification scores
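A hedged sketch of how a joint association score might combine the ASR score of an N-best sentence with the ME classifier's score for a class; the interpolation weight alpha and the exact functional form are assumptions, not taken from the paper:

```python
import math

def joint_association_score(asr_logprob, class_posterior, alpha=0.5):
    """Assumed form of the joint association score for an
    (utterance, sentence, class) triple: a weighted combination of the
    ASR log-probability of the sentence and the ME classifier's
    log-posterior of the class given that sentence."""
    return alpha * asr_logprob + (1.0 - alpha) * math.log(class_posterior)

def best_hypothesis(nbest, target_class):
    """Pick the N-best sentence most likely to yield the target class.

    nbest: list of (sentence, asr_logprob, {class: posterior}) triples
    """
    return max(nbest,
               key=lambda h: joint_association_score(h[1], h[2][target_class]))[0]
```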
DT framework Using the N-best Lists (cont.)
–The sentence most likely to yield the correct class is first extracted from the N-best list, based on the joint association score
–The remaining sentences in the N-best list are then assigned to classes
–This assignment is an effective mechanism for discriminating the sentence most likely to yield the correct class from those more likely to yield other, wrong classes
DT framework Using the N-best Lists (cont.) Discriminant function & loss function Approximation loss
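The slide names a discriminant function, a loss function, and an approximation loss; a standard MCE-style instantiation, which is an assumption about the specific forms used rather than a transcription of the paper's equations, looks like:

```python
import math

def discriminant(scores, correct, eta=2.0):
    """MCE-style discriminant: the correct-class score minus a soft maximum
    over the competing class scores (eta controls the softness).
    scores: {class: score}; correct: the true class label."""
    wrong = [s for c, s in scores.items() if c != correct]
    soft_max = math.log(sum(math.exp(eta * s) for s in wrong) / len(wrong)) / eta
    return scores[correct] - soft_max

def smooth_loss(d, gamma=1.0):
    """Sigmoid loss: a differentiable approximation of the 0-1 classification
    error, approaching 1 when the discriminant is strongly negative and 0
    when the correct class clearly wins."""
    return 1.0 / (1.0 + math.exp(gamma * d))
```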
DT framework Using the N-best Lists (cont.)
–Assignment of classes
DT framework Using the N-best Lists (cont.)
–DT of LM parameters
–DT of classifier parameters
Experimental Results
Conclusions
–A new discriminative training framework for spoken utterance classification was proposed
–The use of N-best transcriptions is motivated by the fact that the same class is often associated with many variants of spoken utterances