Sphinx 3.X (X=4): Four-Layer Categorization Scheme of Fast GMM Computation Techniques in Large Vocabulary Continuous Speech Recognition Systems
http://sourceforge.net/projects/cmusphinx/

Presentation transcript:
Sphinx 3.X (X=4): Four-Layer Categorization Scheme of Fast GMM Computation Techniques in Large Vocabulary Continuous Speech Recognition Systems
Arthur Chan, Jahanzeb Sherwani, Ravishankar Mosur and Alex Rudnicky
Computer Science Department, Carnegie Mellon University

Sphinx:
-Speaker-independent large-vocabulary speech recognition system
-Open source under a Berkeley-style license: anyone can distribute, modify and use it freely

Sphinx 3.X: re-engineering of Sphinx 3 to create a real-time large-vocabulary speech recognizer
-S3.3: tree lexicon, histogram pruning and live-mode decoder (R. Mosur 1999)
-S3.4 (released Jul 04): fast GMM computation, phoneme look-ahead (A. Chan 2004, this paper)
-S3.5 (to be released soon): MLLR-based speaker adaptation, live-mode APIs, alignment, phoneme recognition, lattice rescoring, best-path search in lattice

Four levels of GMM computation: frames, GMMs, Gaussians, feature components

Frame level:
-Discount alternate frames
-Down-sampling (Woszczyna 95)

GMM level:
-Compute only the important GMMs (e.g. those with a high CI score)
-CI-based GMM selection (Lee 01)

Gaussian level:
-VQ-based Gaussian selection (Bochieri 93)
-SVQ-based Gaussian selection (Mosur 99)

Feature level:
-Subvector quantization, i.e. the SDCHMM method (Mosur 97, Mak 97)
-LDA, PCA

Sphinx 3.4: Fast GMM Computation
Our approach:
-Divide GMM computation into four levels
-Implement representative techniques at each level
-Inspired by four-level state tying (Sagayama 95)
Observation: at each level, the full computation can be approximated by computing only part of the components.
Advantage: provides a general framework for understanding fast GMM computation.

Experiment Results
Summary:
1. CI-based GMM selection appears to be the most effective technique.
2. Many Gaussian-level techniques carry too much overhead to pay off.
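The CI-based GMM selection idea singled out above can be sketched as follows. This is an illustrative reconstruction, not the actual Sphinx 3.4 code: score the small set of context-independent (CI) GMMs first, then fully evaluate a context-dependent (CD) GMM only if its CI parent scores within a beam of the best CI score, otherwise back off to the cheap CI score. The function names, the data layout (diagonal-covariance GMMs as tuples) and the `beam` parameter are all assumptions made for the sketch.

```python
import numpy as np

def log_gmm(x, means, inv_vars, log_weights):
    # Diagonal-covariance GMM log-likelihood via log-sum-exp.
    # means, inv_vars: (n_mix, dim); log_weights: (n_mix,).
    # Constant terms (0.5*sum(log inv_vars) - 0.5*dim*log(2*pi)) are
    # assumed folded into log_weights to keep the sketch short.
    diff = x - means
    log_probs = log_weights - 0.5 * np.sum(diff * diff * inv_vars, axis=1)
    m = np.max(log_probs)
    return m + np.log(np.sum(np.exp(log_probs - m)))

def ci_gmm_select(x, ci_gmms, cd_gmms, ci_parent, beam):
    """CI-based GMM selection (in the spirit of Lee 01): score all CI
    GMMs, then fully compute only those CD GMMs whose CI parent lies
    within `beam` of the best CI score; others reuse the parent's CI
    score as an approximation."""
    ci_scores = np.array([log_gmm(x, *g) for g in ci_gmms])
    best = ci_scores.max()
    cd_scores = np.empty(len(cd_gmms))
    for i, g in enumerate(cd_gmms):
        parent = ci_parent[i]
        if ci_scores[parent] >= best - beam:
            cd_scores[i] = log_gmm(x, *g)      # full computation
        else:
            cd_scores[i] = ci_scores[parent]   # cheap back-off
    return cd_scores
```

A wide beam recovers the exact scores at full cost; a narrow beam trades a small WER increase for skipping most CD GMM evaluations, which matches the trend in the results table below.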
Algorithm           WER (%)  Total  GMM   Srch  Ovhd
Baseline            18.65    6.9    5.85  0.85  -
Down-sampling       19.10    4.35   3.99  0.96
CIGMMS              18.82    3.25   1.18  2.06
Gaussian selection  18.95    3.95   2.84  0.89  0.22
SVQ                 18.69    4.20   2.04  0.98  1.08

(Total = total decoding time, GMM = GMM computation time, Srch = search time, Ovhd = overhead of the fast-computation technique.)
Note: Results combined with pruning can be found in the paper.

