Topic-independent Speaking-Style Transformation of Language Model for Spontaneous Speech Recognition (Yuya Akita, Tatsuya Kawahara)
Introduction
- Spoken style vs. written style
- Combination of document and spontaneous corpora: irrelevant linguistic expressions
- Model transformation
  - Simulated spoken-style text by randomly inserting fillers
  - Weighted finite-state transducer framework (?)
  - Statistical machine translation framework
- Problem with model transformation methods: small corpus, data sparseness
  - One solution: POS tags
Statistical Transformation of Language Model
- Posterior formulation: X = source language model (document style), Y = target language model (spoken style), so by Bayes' rule P(Y|X) = P(X|Y) P(Y) / P(X)
- P(X|Y) and P(Y|X) are the transformation models
- Transformation models can be estimated from a parallel corpus
- n-gram count: the spoken-style count is obtained by applying the transformation model to the document-style count, N_spoken(y) = Σ_x P(y|x) N_doc(x)
Statistical Transformation of Language Model (cont.)
- Data sparseness problem for the parallel corpus
- POS information
- Linear interpolation
- Maximum entropy
Training
- Use an aligned corpus
- Word-based transformation probability
- POS-based transformation probability
- P_word(x|y) and P_POS(x|y) are estimated accordingly (estimation sketched below)
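A minimal sketch of the relative-frequency estimation from a word-aligned parallel corpus. The function name and the data format (tuples of document word, document POS, spoken word, spoken POS) are assumptions for illustration, not the paper's implementation; the document-to-spoken direction P(y|x) is shown to match the count transformation above, and the slide's P_word(x|y) is estimated the same way with the roles swapped.

from collections import defaultdict

def estimate_transformation_probs(aligned_pairs):
    # Relative-frequency estimates of the word- and POS-based transformation
    # probabilities P(y|x) (document-style x -> spoken-style y).
    # aligned_pairs: iterable of (x_word, x_pos, y_word, y_pos) tuples
    # (assumed data format for illustration).
    word_pair, word_x = defaultdict(int), defaultdict(int)
    pos_pair, pos_x = defaultdict(int), defaultdict(int)

    for x, x_pos, y, y_pos in aligned_pairs:
        word_pair[(x, y)] += 1
        word_x[x] += 1
        pos_pair[(x_pos, y_pos)] += 1
        pos_x[x_pos] += 1

    p_word = {(x, y): c / word_x[x] for (x, y), c in word_pair.items()}
    p_pos = {(xp, yp): c / pos_x[xp] for (xp, yp), c in pos_pair.items()}
    return p_word, p_pos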
Training (cont.)
- Back-off scheme
- Linear interpolation scheme
- Maximum entropy scheme
- The ME model is applied to every n-gram entry of the document-style model; a spoken-style n-gram is generated if the transformation probability is larger than a threshold (a back-off variant of this step is sketched below)
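A sketch of the thresholded transformation step applied to document-style n-gram entries. It uses the back-off scheme (word-based probability, falling back to POS-based) rather than the maximum-entropy model named on the slide; the `variants` lookup table, the data layout, and the threshold value are assumptions.

from itertools import product

def transform_probability(x, y, x_pos, y_pos, p_word, p_pos):
    # Back-off combination: word-based probability if the word pair was seen
    # in the aligned corpus (identity pairs included), otherwise fall back to
    # the POS-based probability. (The paper also uses linear interpolation
    # and a maximum-entropy model for this step.)
    if (x, y) in p_word:
        return p_word[(x, y)]
    return p_pos.get((x_pos, y_pos), 0.0)

def generate_spoken_ngrams(doc_ngrams, variants, p_word, p_pos, threshold=0.1):
    # doc_ngrams: {((word, pos), ...): count} from the document-style model.
    # variants:   {(word, pos): [(spoken_word, spoken_pos), ...]} candidate
    #             spoken-style realizations (an assumed lookup table).
    spoken_counts = {}
    for ngram, count in doc_ngrams.items():
        options = [variants.get(tok, [tok]) for tok in ngram]
        for spoken in product(*options):
            p = 1.0
            for (x, xp), (y, yp) in zip(ngram, spoken):
                p *= transform_probability(x, y, xp, yp, p_word, p_pos)
            if p > threshold:
                key = tuple(w for w, _ in spoken)
                spoken_counts[key] = spoken_counts.get(key, 0.0) + p * count
    return spoken_counts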
Experiments
- Training corpora:
  - Baseline corpus: National Congress of Japan, 71M words
  - Parallel corpus: Budget Committee in 2003, 666K words
  - Corpus of Spontaneous Japanese, 2.9M words
- Test corpus: another meeting of the Budget Committee in 2003, 63K words
Experiments (cont.)
- Evaluation of the generality of the transformation model (LM)
Experiments (cont.)
Conclusions
- Proposed a novel statistical transformation approach for the language model
Non-stationary n-gram model
Concept
- Probability of a sentence: P(W) = Π_i P(w_i | w_1 ... w_{i-1})
- In practice, an n-gram LM approximates this as P(W) ≈ Π_i P(w_i | w_{i-n+1} ... w_{i-1})
- The Markov assumption misses long-distance and word-position information
Concept (cont.)
- The non-stationary n-gram conditions on the word position in the sentence: P(w_i | w_{i-n+1} ... w_{i-1}, t_i), where the position t_i is grouped into bins
Training (cont.)
- ML estimation (sketched below)
- Smoothing
  - Use a lower order
  - Use small bins
  - Transform with a smoothed normal n-gram
- Combination
  - Linear interpolation
  - Back-off
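A minimal sketch of ML estimation for a position-dependent (non-stationary) bigram with position bins, combined with the normal bigram by linear interpolation. The bin mapping, the number of bins, and the interpolation weight are assumptions for illustration.

from collections import defaultdict

NUM_BINS = 4  # assumed number of position bins

def position_bin(i, length, num_bins=NUM_BINS):
    # Map word position i (0-based) in a sentence of the given length to a bin.
    return min(i * num_bins // max(length, 1), num_bins - 1)

def train_ns_bigram(sentences):
    # ML estimation of the position-dependent counts for P(w_i | w_{i-1}, bin)
    # and of the normal bigram counts for P(w_i | w_{i-1}).
    ns_counts, ns_ctx = defaultdict(int), defaultdict(int)
    bi_counts, bi_ctx = defaultdict(int), defaultdict(int)
    for sent in sentences:
        for i in range(1, len(sent)):
            b = position_bin(i, len(sent))
            ns_counts[(sent[i - 1], sent[i], b)] += 1
            ns_ctx[(sent[i - 1], b)] += 1
            bi_counts[(sent[i - 1], sent[i])] += 1
            bi_ctx[sent[i - 1]] += 1
    return ns_counts, ns_ctx, bi_counts, bi_ctx

def ns_bigram_prob(w_prev, w, b, model, lam=0.7):
    # Linear interpolation of the position-dependent estimate with the
    # position-independent bigram (one of the combination schemes above).
    ns_counts, ns_ctx, bi_counts, bi_ctx = model
    p_ns = ns_counts.get((w_prev, w, b), 0) / ns_ctx.get((w_prev, b), 1)
    p_bi = bi_counts.get((w_prev, w), 0) / bi_ctx.get(w_prev, 1)
    return lam * p_ns + (1 - lam) * p_bi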
Smoothing with lower order (cont.)
- Additive smoothing
- Back-off smoothing
- Linear interpolation
(standard forms sketched below)
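The formulas below are the standard forms of these three schemes written with the position-conditioned counts C(w_{i-1}, w_i, t); they sketch the idea only, and the discounting constants and back-off weights used in the paper are not reproduced here.

% Standard smoothing forms for the non-stationary bigram, backing off to the
% position-independent bigram P(w_i | w_{i-1}); delta, alpha, lambda are
% illustrative smoothing parameters.
\begin{align*}
\text{Additive:}\quad
  & P(w_i \mid w_{i-1}, t) = \frac{C(w_{i-1}, w_i, t) + \delta}{C(w_{i-1}, t) + \delta |V|} \\
\text{Back-off:}\quad
  & P(w_i \mid w_{i-1}, t) =
    \begin{cases}
      \dfrac{C(w_{i-1}, w_i, t)}{C(w_{i-1}, t)} & \text{if } C(w_{i-1}, w_i, t) > 0 \\[1ex]
      \alpha(w_{i-1}, t)\, P(w_i \mid w_{i-1}) & \text{otherwise}
    \end{cases} \\
\text{Interpolation:}\quad
  & P(w_i \mid w_{i-1}, t) = \lambda\, P_{\mathrm{ML}}(w_i \mid w_{i-1}, t) + (1-\lambda)\, P(w_i \mid w_{i-1})
\end{align*}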
Smoothing with small bins (k=1) (cont.)
- Back-off smoothing
- Linear interpolation
- Hybrid smoothing
Transformation with smoothed n-gram
- Novel method
- If t-mean(w) decreases, the word is more important
- Var(w) is used to balance t-mean(w) for active words (both statistics sketched below)
  - Active word: a word that can appear at any position in a sentence
- Back-off smoothing & linear interpolation
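A sketch of the positional statistics referred to above: t-mean(w) as the mean position of word w over its occurrences and Var(w) as the variance of those positions. The use of relative (length-normalized) position is an assumption, since the slide does not give the normalization.

from collections import defaultdict

def positional_statistics(sentences):
    # t-mean(w): mean relative position of w across all its occurrences;
    # Var(w): variance of those positions. A low t-mean marks a word that
    # tends to occur early (treated as more important on the slide), while
    # a high variance marks "active" words that appear at any position.
    positions = defaultdict(list)
    for sent in sentences:
        for i, w in enumerate(sent):
            positions[w].append(i / max(len(sent) - 1, 1))  # relative position in [0, 1]

    stats = {}
    for w, pos in positions.items():
        mean = sum(pos) / len(pos)
        var = sum((p - mean) ** 2 for p in pos) / len(pos)
        stats[w] = (mean, var)
    return stats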
Experiments
- Observation: marginal (sentence-initial/final) positions vs. middle positions
Experiments (cont.)
- NS (non-stationary) bigram results
Experiments (cont.)
- Comparison of the three smoothing techniques
Experiments (cont.)
- Error rate with different numbers of bins
Conclusions
- The traditional n-gram model is enhanced by relaxing its stationarity assumption and exploiting word-position information in language modeling
Two-way Poisson Mixture model
- Basic Poisson distribution
- Poisson mixture model (diagram): a document x = (x_1, ..., x_p) is modeled as a multivariate Poisson of dimension p (lexicon size); each class k is a mixture of R_k Poisson components with weights π_k1, π_k2, ..., π_kR_k
- Word clustering reduces the Poisson dimension => two-way mixtures
(class-conditional density sketched below)
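A sketch of the class-conditional density the diagram depicts, assuming conditionally independent Poisson word counts within each mixture component and rates tied within word clusters (the "two-way" structure); the cluster map c(j) and the notation are assumptions.

% Class-conditional density of a document x = (x_1, ..., x_p) under class k:
% a mixture of R_k multivariate Poisson components with weights \pi_{kr},
% where word clustering ties the rate of word j to its cluster c(j).
P(x \mid \text{class } k)
  = \sum_{r=1}^{R_k} \pi_{kr} \prod_{j=1}^{p}
      \frac{\lambda_{kr,\,c(j)}^{\,x_j}\, e^{-\lambda_{kr,\,c(j)}}}{x_j!},
\qquad \sum_{r=1}^{R_k} \pi_{kr} = 1 .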