Sphinx 3.X (X=4)
Four-Layer Categorization Scheme of Fast GMM Computation Techniques in Large Vocabulary Continuous Speech Recognition Systems
http://sourceforge.net/projects/cmusphinx/
Arthur Chan, Jahanzeb Sherwani, Ravishankar Mosur and Alex Rudnicky
Computer Science Department, Carnegie Mellon University

Sphinx 3.X (X=4)

Sphinx:
- Speaker-independent large vocabulary speech recognition system
- Open source under a Berkeley-style license: one can distribute, modify and use it freely

Sphinx 3.X: Reengineering of Sphinx 3 to create a real-time large vocabulary speech recognizer
- S3.3: Tree lexicon, histogram pruning and live-mode decoder (R. Mosur 1999)
- S3.4 (released Jul 04): Fast GMM computation, phoneme look-ahead (A. Chan 2004, this paper)
- S3.5 (to be released soon): MLLR-based speaker adaptation, live-mode APIs, alignment, phoneme recognition, lattice rescoring, best-path search in lattice

4 Levels of GMM Computation

[Figure: the four levels of GMM computation: GMMs, Frames, Gaussians, Feature Components]

Gaussian-Level:
- VQ-based Gaussian Selection (Bocchieri 93)
- SVQ-based Gaussian Selection (Mosur 99)

Feature-Level:
- Sub-vector quantization or SDCHMM method (Mosur 97, Mak 97)
- LDA, PCA

Frame-Level:
- Discount alternative frames
- Down Sampling (Wycesna 95)

GMM-Level:
- Only compute important GMMs (e.g. those with a high CI score)
- CI-GMM Selection (Lee 01)

Sphinx 3.4: Fast GMM Computation

Our approach:
- Divide GMM computation into 4 levels
- Implement representative techniques in each level
- Inspired by 4-level state tying (Sagayama 95)

Observation: In each level, the full computation can be approximated by computing only some of the components.

Advantages: Provides a general framework for understanding fast GMM computation.

Experiment Results

Experiment Summary:
1. CI-based GMM Selection seems to be the most effective.
2. Many Gaussian-level techniques seem to have too much overhead.
Algorithm            WER    Total  GMM   Search  Overhead
BL                   18.65  6.9    5.85  0.85    -
Down Sampling        19.10  4.35   3.99  0.96
CIGMMS               18.82  3.25   1.18  2.06
Gaussian Selection   18.95  3.95   2.84  0.89    0.22
SVQ                  18.69  4.20   2.04  0.98    1.08

Note: Results combined with pruning can be found in the paper.
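
The GMM-level idea listed on the poster (only compute important GMMs, chosen by their CI score) can be illustrated with a minimal Python sketch. This is not the Sphinx 3.4 implementation. The function names, the `beam` parameter, the diagonal-covariance assumption, the toy models and the choice to back off to the parent CI score for skipped GMMs are all assumptions made for illustration.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def ci_gmm_selection(x, ci_gmms, cd_gmms, cd_parent, beam):
    """Score one frame, computing a CD GMM only if its CI parent scores well.

    cd_parent[i] is the index of the CI GMM that CD GMM i belongs to.
    Returns (cd_scores, number_of_cd_gmms_fully_computed).
    """
    # All CI GMMs are always computed (there are few of them).
    ci_scores = [gmm_log_likelihood(x, g) for g in ci_gmms]
    threshold = max(ci_scores) - beam

    cd_scores, computed = [], 0
    for gmm, parent in zip(cd_gmms, cd_parent):
        if ci_scores[parent] >= threshold:
            cd_scores.append(gmm_log_likelihood(x, gmm))
            computed += 1
        else:
            # Assumed back-off: reuse the parent CI score instead.
            cd_scores.append(ci_scores[parent])
    return cd_scores, computed


# Toy (hypothetical) models: 2 CI GMMs and 3 CD GMMs in 2 dimensions.
ci_gmms = [
    [(1.0, [0.0, 0.0], [1.0, 1.0])],
    [(1.0, [5.0, 5.0], [1.0, 1.0])],
]
cd_gmms = [
    [(0.5, [-0.5, 0.0], [1.0, 1.0]), (0.5, [0.5, 0.0], [1.0, 1.0])],
    [(1.0, [0.2, 0.1], [1.0, 1.0])],
    [(1.0, [5.0, 4.5], [1.0, 1.0])],
]
cd_parent = [0, 0, 1]
frame = [0.1, -0.2]

scores, computed = ci_gmm_selection(frame, ci_gmms, cd_gmms, cd_parent, beam=5.0)
print(f"fully computed {computed} of {len(cd_gmms)} CD GMMs")
```

A tighter `beam` skips more CD GMMs and saves more GMM computation, while increasing the risk of a higher WER. This is the same kind of trade-off the table reports for CIGMMS against BL.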