Progress Report of Sphinx in Summer 2004 (July 1st to Aug 31st)
By Arthur Chan

New features in Sphinx 3.5
- Live-mode APIs
- Speaker adaptation using linear transformation
- Incorporation of Sphinx 3.0 tools into Sphinx 3.x
SphinxTrain
- Better support and documentation
(In progress)
- More support for training scripts
- Documentation of Sphinx 3.x and SphinxTrain

Live-mode APIs
- The live-mode API is now stable and officially released.
- Developer's API for using the Sphinx 3.x recognizer
  - The recognizer was built for high-performance 10xRT speech recognition, as used in CMU's evaluations.
  - Uses fully continuous HMMs (30% relative performance gain over SCHMM).
  - Now runs at close to 1xRT (measured on a >1 GHz CPU) on tasks of less than 10k words.
  - Capable of speaker adaptation.
- Well documented and commented.

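The 10xRT and 1xRT figures above are real-time factors: decoding time divided by the duration of the audio decoded. A minimal sketch of that ratio follows. The function name and the timings are hypothetical, chosen only to illustrate the two regimes, and are not measurements from Sphinx.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return decoding time divided by audio duration (the 'xRT' figure)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical timings for a 10-second utterance (illustration only).
evaluation_style = real_time_factor(processing_seconds=100.0, audio_seconds=10.0)
live_style = real_time_factor(processing_seconds=10.5, audio_seconds=10.0)

print(f"evaluation-style decoding: {evaluation_style:.2f}xRT")
print(f"live-mode decoding: {live_style:.2f}xRT")
```

A factor at or below 1 means the recognizer keeps up with incoming speech, which is what a live-mode API needs.
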
Speaker Adaptation
- Acoustic-level learning is now enabled.
- Incorporated from the speaker adaptation routines of CMU's Robust group.
- Allows transformation-based speaker adaptation: Y = AX + b
- In SphinxTrain
  - mllr_solve: estimates the regression matrix or matrices.
  - mllr_transform: applies the mean transformation offline, given a set of regression matrices.
- In Sphinx 3.5
  - Allows mean transformation on-line.
  - Possible to support per-utterance speaker adaptation.
  - Interface not yet exposed (part of the Q4 plan).

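The mean transformation Y = AX + b can be sketched in a few lines. This is an illustration only, not the code in mllr_transform or Sphinx 3.5: the function names are hypothetical, the values of A and b are made up rather than estimated from adaptation data, and a single transform shared by all Gaussian means is assumed for simplicity.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def transform_mean(A: Matrix, b: Vector, x: Vector) -> Vector:
    """Apply y = A x + b to one Gaussian mean vector x."""
    if len(A) != len(b) or any(len(row) != len(x) for row in A):
        raise ValueError("shape mismatch between A, b and x")
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]


def adapt_means(A: Matrix, b: Vector, means: List[Vector]) -> List[Vector]:
    """Apply the same transform to every mean (one regression class assumed)."""
    return [transform_mean(A, b, mean) for mean in means]


# Toy 2-dimensional example; A and b are invented values, not estimates.
A = [[1.0, 0.5],
     [0.0, 2.0]]
b = [0.5, -1.0]
means = [[1.0, 2.0], [0.0, 1.0]]

adapted = adapt_means(A, b, means)
print(adapted)
```

Only the means move; variances and mixture weights are left untouched, which matches the "mean transformation" wording above.
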
Incorporation of s3.0 tools
- Recognizer for research
- Includes research tools for the speech recognizer
  - align: word/phoneme-based aligner
  - astar: N-best hypotheses generator
  - allphone: phoneme recognizer
  - dag: best-path search in a lattice
- N-best rescoring is now viable.
- Will benefit research on incorporating high-level information.

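As a rough illustration of N-best rescoring, the sketch below re-ranks a list of hypotheses after adding a score from an extra knowledge source. Everything here is an assumption made for the example: the data layout, the function names, the toy scoring rule, and the convention that scores are log-domain values where higher is better. It does not reflect the output format of astar.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (word string, recognizer score; higher is better)


def rescore_nbest(
    nbest: List[Hypothesis],
    extra_score: Callable[[str], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Add a weighted extra score to each hypothesis and sort best-first."""
    rescored = [(hyp, score + weight * extra_score(hyp)) for hyp, score in nbest]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def toy_knowledge_score(hyp: str) -> float:
    """Hypothetical high-level knowledge source: favor the word 'speech'."""
    return 5.0 if "speech" in hyp.split() else 0.0


# Invented N-best list, ordered by the recognizer's original score.
nbest = [
    ("wreck a nice beach", -100.0),
    ("recognize speech", -102.0),
    ("recognize peach", -103.0),
]

best_hyp, best_score = rescore_nbest(nbest, toy_knowledge_score)[0]
print(best_hyp, best_score)
```

The extra score stands in for whatever high-level information a researcher wants to test, which is why an N-best generator makes this kind of experiment practical.
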
SphinxTrain
- Now with better support and documentation
- Every tool now supports the options
  - -help: a help string
  - -example: a string that shows how to use the tool
- Eliminates possible mismatches between the feature extraction routines of Sphinx3 and SphinxTrain.

Documentation of Sphinx: Project Hieroglyph
- Goal: build a comprehensive set of documentation for using Sphinx, SphinxTrain, and the CMU LM Toolkit.
- 3 out of 11 chapters are now completed.
- They can be found at www.cs.cmu.edu/~archan/sphinxDoc.html

Q4 Outlook
Three major goals
- Better speaker adaptation support
  - MAP, multiple regression class support
- Enable dynamic addition and deletion of language models
- Further speed-up of the recognizer (we can still be faster)
Other goals
- Incorporating speaker normalization into feature extraction