FSG Implementation in Sphinx2
Mosur Ravishankar
Jul 15, 2004

Outline
- Input specification
- FSG related API
- Application examples
- Implementation issues

FSG Specification
- "Assembly language" for specifying FSGs
  - Low-level
  - Most standards should compile down to this level
- Set of N states, numbered 0 .. N-1
- Transitions:
  - Emitting or non-emitting (aka null or epsilon)
  - Each emitting transition emits one word
  - Fixed probability 0 < p <= 1
- One start state, and one final state
  - Null transitions can effectively give you as many as needed
- Goal: Find the highest likelihood path from the start state to the final state, given some input speech
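
The specification above can be illustrated with a small sketch. It is not the Sphinx2 decoder: it replaces the input speech with an already known word sequence and scores paths using the transition probabilities alone, so it only shows how states, emitting transitions, null transitions, and the "best path from start to final" goal fit together. The `Transition` tuple and the function name are hypothetical.

```python
import math
from collections import namedtuple

# Hypothetical representation: word is None for a null (epsilon) transition.
Transition = namedtuple("Transition", "src dst prob word")


def best_path_logprob(transitions, start, final, words):
    """Best log-probability of a path from start to final that emits
    exactly `words`, or None if the grammar does not accept them."""
    for t in transitions:
        if not (0.0 < t.prob <= 1.0):
            raise ValueError("transition probability must satisfy 0 < p <= 1")

    def follow_nulls(scores):
        # Relax null transitions until no state score improves.
        changed = True
        while changed:
            changed = False
            for t in transitions:
                if t.word is None and t.src in scores:
                    cand = scores[t.src] + math.log(t.prob)
                    if cand > scores.get(t.dst, -math.inf):
                        scores[t.dst] = cand
                        changed = True
        return scores

    scores = follow_nulls({start: 0.0})
    for w in words:
        nxt = {}
        for t in transitions:
            if t.word == w and t.src in scores:
                cand = scores[t.src] + math.log(t.prob)
                if cand > nxt.get(t.dst, -math.inf):
                    nxt[t.dst] = cand  # Viterbi: keep only the best entry
        scores = follow_nulls(nxt)
    return scores.get(final)


# Two "logical" final states (3 and 4) funnel into the single final state 5
# through null transitions, as the slide suggests.
demo = [
    Transition(0, 1, 0.5, "to"),
    Transition(0, 2, 0.5, "from"),
    Transition(1, 3, 1.0, "boston"),
    Transition(2, 4, 1.0, "boston"),
    Transition(3, 5, 1.0, None),
    Transition(4, 5, 1.0, None),
]
```

Calling `best_path_logprob(demo, 0, 5, ["to", "boston"])` scores the upper path; a word sequence the grammar does not accept yields `None`.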

An FSG Example

    FSG_BEGIN leg
    NUM_STATES 10
    START_STATE 0
    FINAL_STATE 9
    # Transitions
    T to
    T city1
    …
    T cityN
    T from
    T city1
    …
    T cityN
    T
    T from
    T city1
    …
    T cityN
    T to
    T city1
    …
    T cityN
    T
    FSG_END

[Figure: state graph of this grammar, with arcs labeled to, from, and city1 … cityN]
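
A minimal sketch of reading such a file follows. The slide listing does not show state numbers or probabilities on the `T` lines, so the field layout assumed here (`T from-state to-state probability [word]`, with the word omitted for a null transition) is an assumption, as are the concrete numbers and the two city names in the sample text. `parse_fsg` is a hypothetical helper, not part of Sphinx2.

```python
def parse_fsg(text):
    """Parse one FSG_BEGIN ... FSG_END block into a plain dict.
    Assumed transition layout: T src dst prob [word]."""
    fsg = {"name": None, "num_states": None, "start": None,
           "final": None, "transitions": []}
    inside = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        key = fields[0]
        if key == "FSG_BEGIN":
            inside = True
            fsg["name"] = fields[1] if len(fields) > 1 else None
        elif not inside:
            continue
        elif key == "FSG_END":
            break
        elif key == "NUM_STATES":
            fsg["num_states"] = int(fields[1])
        elif key == "START_STATE":
            fsg["start"] = int(fields[1])
        elif key == "FINAL_STATE":
            fsg["final"] = int(fields[1])
        elif key == "T":
            src, dst, prob = int(fields[1]), int(fields[2]), float(fields[3])
            word = fields[4] if len(fields) > 4 else None  # None = null
            fsg["transitions"].append((src, dst, prob, word))
        else:
            raise ValueError("unrecognized line: " + line)
    return fsg


# Hypothetical numbering of the "leg" grammar with two cities.
SAMPLE = """\
FSG_BEGIN leg
NUM_STATES 10
START_STATE 0
FINAL_STATE 9
# Transitions
T 0 1 0.5 to
T 1 2 0.5 boston
T 1 2 0.5 chicago
T 2 3 1.0 from
T 3 4 0.5 boston
T 3 4 0.5 chicago
T 4 9 1.0
T 0 5 0.5 from
T 5 6 0.5 boston
T 5 6 0.5 chicago
T 6 7 1.0 to
T 7 8 0.5 boston
T 7 8 0.5 chicago
T 8 9 1.0
FSG_END
"""

leg = parse_fsg(SAMPLE)
```

With this numbering the ten states split into two five-state paths ("to … from …" and "from … to …") that rejoin at the final state through the two null transitions.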

A Better Representation
- Composition of FSGs

[Figure: top-level FSG with arcs to, from, and [city]; [city] is a separate two-state FSG (states 0 and 1) with arcs pittsburgh, chicago, boston, buffalo, seattle]
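
One way to picture composition is as substitution: every arc labeled `[city]` in the top-level FSG is replaced by a private copy of the city sub-FSG, connected with null transitions. The sketch below illustrates that idea only; the slide does not say how (or whether) Sphinx2 performs composition, and all names here are hypothetical.

```python
def expand_reference(top, top_num_states, label, sub, sub_num_states,
                     sub_start, sub_final):
    """Replace each arc labeled `label` in `top` with a copy of `sub`.
    Transitions are (src, dst, prob, word); word None means a null arc.
    Returns (new_transitions, new_num_states)."""
    out = []
    next_state = top_num_states
    for src, dst, prob, word in top:
        if word != label:
            out.append((src, dst, prob, word))
            continue
        offset = next_state            # renumber the copy's states
        next_state += sub_num_states
        # Null arc into the copy carries the original arc's probability.
        out.append((src, offset + sub_start, prob, None))
        for s, d, p, w in sub:
            out.append((offset + s, offset + d, p, w))
        # Null arc out of the copy back to the original destination.
        out.append((offset + sub_final, dst, 1.0, None))
    return out, next_state


cities = ["pittsburgh", "chicago", "boston", "buffalo", "seattle"]
city_fsg = [(0, 1, 1.0 / len(cities), c) for c in cities]  # assumed uniform
top_fsg = [(0, 1, 1.0, "to"), (1, 2, 1.0, "[city]")]

flat, flat_states = expand_reference(top_fsg, 3, "[city]", city_fsg, 2, 0, 1)
```

The benefit the slide points at is that the city list is written once, rather than repeated for every place a city can occur.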

Multiple Pronunciations and Filler Words
- Alternative pronunciations added automatically
- Filler word transitions (silence and noise) added automatically
  - A filler self-transition at every state
  - Noise words added only if noise penalty (probability) > 0

[Figure: the to/from/[city] grammar with [filler] self-transitions added]
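
The filler rule on this slide can be sketched as a simple pass over the states. The filler word names and probability values below are hypothetical, and the sketch covers only the filler self-transitions, not the alternative pronunciations.

```python
SILENCE = "<sil>"  # hypothetical name for the silence filler word


def add_fillers(transitions, num_states, silence_prob, noise_words, noise_prob):
    """Return the transitions plus filler self-loops on every state.
    Transitions are (src, dst, prob, word) tuples."""
    out = list(transitions)
    for state in range(num_states):
        out.append((state, state, silence_prob, SILENCE))
        if noise_prob > 0:  # noise words only when the penalty is > 0
            for word in noise_words:
                out.append((state, state, noise_prob, word))
    return out


base = [(0, 1, 1.0, "to"), (1, 2, 1.0, "boston")]
silence_only = add_fillers(base, 3, 0.1, ["++NOISE++"], 0.0)
with_noise = add_fillers(base, 3, 0.1, ["++NOISE++", "++COUGH++"], 0.05)
```

Because the loops are added at every state, the grammar author never has to write filler words into the FSG file.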

FSG Related API
- Loading during initialization (i.e., fbs_init()):
  - -fsgfn flag specifying an FSG file to load (similar to the -lmfn flag)
  - Difference: the FSG name is contained in the file
- Dynamic loading:
  - char *uttproc_load_fsgfile(char *fsgfile);
  - Returns the FSG string name contained in the file
- Switching to an FSG:
  - uttproc_set_fsg(char *fsgname);
- Deleting a previously loaded FSG:
  - uttproc_del_fsg(char *fsgname);
- Old demos could be run with FSGs, simply by recompiling with new libraries

Mixed LM/FSG Decoding Example
- (See lm_fsg_test.c)

Another Example: Garbage Models
- Extraneous speech could be absorbed using an allphone "garbage model"

[Figure: the to/from/[city] grammar with [allphone] transitions added]

B/W Training and Forced Alignment
- Consolidate code for FSGs, Baum-Welch training, and forced alignment?
  - Sentence HMMs for training and alignment are essentially linear FSGs
  - Alternative pronunciations and filler words handled automatically
- Differences:
  - B/W uses forward (and backward) algorithm instead of Viterbi
  - Alignment has to produce phone and state segmentation as well

Implementation
- Straightforward expansion of word-level FSG into a triphone HMM network
- Viterbi beam search over this HMM network
- No major optimizations attempted (so far)
  - No lextree implementation (What?)
  - Static allocation of all HMMs; not allocated "on demand" (Oh, no!)
  - FSG transitions represented by NxN matrix (You can't be serious!!)
  - Speed/Memory usage profile needs to be evaluated
- Mostly new set of data structures, separate from existing ones
  - Should be easily ported to Sphinx3
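
To see why the NxN transition matrix draws the "You can't be serious!!" remark, the following sketch stores a small grammar that way and counts how many cells are actually used. The layout (a list of transitions per cell) is an assumption made for illustration, not the Sphinx2 data structure.

```python
def to_matrix(num_states, transitions):
    """Store (src, dst, prob, word) transitions in an N x N grid of lists."""
    matrix = [[[] for _ in range(num_states)] for _ in range(num_states)]
    for src, dst, prob, word in transitions:
        matrix[src][dst].append((prob, word))
    return matrix


def occupancy(matrix):
    """Return (cells holding at least one transition, total cells)."""
    used = sum(1 for row in matrix for cell in row if cell)
    return used, len(matrix) ** 2


# A mostly linear 4-state grammar with two parallel city arcs.
chain = [
    (0, 1, 1.0, "to"),
    (1, 2, 0.5, "boston"),
    (1, 2, 0.5, "chicago"),
    (2, 3, 1.0, None),
]
used_cells, total_cells = occupancy(to_matrix(4, chain))
```

Lookup of the transitions between any two states is a direct index, but memory grows with the square of the state count even though grammars like this one touch only a handful of cells.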

Implementation: FSG Expansion to HMMs

[Figure: two word transitions, word1 and word2, each replaced by its chain of phone HMMs (p1 p2 p3 p4 for one word, q1 q2 … for the other)]

Implementation: Triphone HMMs
- Multiple root HMMs for different left contexts
- Multiple leaf HMMs for different right contexts
- Special case for 2-phone words
- 1-phone words use SIL as right context

[Figure: word1 (phones p1 p2 p3 p4) between states 0 and 1, expanded with root variants p1, p1', p1'' and leaf variants p4, p4', p4''; a 2-phone word shown with root variants p1, p1', p1'' and leaf variants p2, p2', p2'']
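
The root/leaf scheme above can be sketched by listing the triphones one word needs. Each triphone is written as a (left context, base phone, right context) tuple; the phone symbols and the function name are hypothetical, and the sketch lists triphones only, without building HMMs.

```python
def expand_word(phones, left_contexts, right_contexts):
    """List the triphones needed for one word transition."""
    if len(phones) == 1:
        # 1-phone word: the single HMM is both root and leaf; it still
        # varies with left context but uses SIL as its right context.
        roots = [(left, phones[0], "SIL") for left in left_contexts]
        return {"roots": roots, "internal": [], "leaves": []}
    # One root HMM per possible left context.
    roots = [(left, phones[0], phones[1]) for left in left_contexts]
    # Internal phones have fully known contexts (none for a 2-phone word).
    internal = [(phones[i - 1], phones[i], phones[i + 1])
                for i in range(1, len(phones) - 1)]
    # One leaf HMM per possible right context.
    leaves = [(phones[-2], phones[-1], right) for right in right_contexts]
    return {"roots": roots, "internal": internal, "leaves": leaves}


four = expand_word(["p1", "p2", "p3", "p4"], ["SIL", "a"], ["SIL", "b", "c"])
two = expand_word(["p1", "p2"], ["SIL", "a"], ["SIL", "b", "c"])
one = expand_word(["p1"], ["SIL", "a"], ["SIL", "b", "c"])
```

For a 2-phone word the roots connect directly to the leaves, which is presumably why the slide singles it out as a special case.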

Possible Optimization: Lextrees

[Figure: a lextree associated with a source state, covering word1 … wordN with phone nodes p1 p2 p3 p4 and q1 q2 q3]
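
A lextree is a prefix tree over the pronunciations of the words leaving a state, so words that start with the same phones share HMMs. The sketch below builds one from context-independent phones and counts the nodes; the pronunciations are illustrative, and a real implementation would work on triphones.

```python
def build_lextree(pronunciations):
    """Build a prefix tree from {word: [phone, ...]}."""
    root = {"children": {}, "words": []}
    for word, phones in pronunciations.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []})
        node["words"].append(word)  # the word is identified at its last phone
    return root


def count_nodes(node):
    """Number of phone nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


# Hypothetical pronunciations for two words leaving the same source state.
prons = {
    "boston": ["B", "AO", "S", "T", "AX", "N"],
    "buffalo": ["B", "AH", "F", "AX", "L", "OW"],
}
tree = build_lextree(prons)
flat_count = sum(len(p) for p in prons.values())
tree_count = count_nodes(tree)
```

Here the flat expansion needs 12 phone HMMs and the tree 11, since only the initial B is shared; the saving grows with larger word lists.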

Possible Optimization: Path Pruning
- If there are two transitions with the same label into the same state, the one starting out with a worse score can be pruned
- But reconciling with lextrees is tricky, since labels are now blurred

[Figure: two transitions labeled w entering the same state]
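
The pruning rule can be sketched as follows. Two transitions with the same word label and the same destination traverse identical HMM chains, so under Viterbi scoring the one that starts with the worse score cannot end with the better one. The tuple layout and function name are hypothetical.

```python
def prune_duplicates(candidates):
    """candidates: (entry_log_score, word, dst_state) for transitions about
    to be entered. Keep the best-scoring one per (word, dst_state)."""
    best = {}
    for score, word, dst in candidates:
        key = (word, dst)
        if key not in best or score > best[key]:
            best[key] = score
    return [(score, word, dst) for (word, dst), score in best.items()]


entering = [
    (-10.0, "w", 3),
    (-12.0, "w", 3),   # same label, same destination, worse score: pruned
    (-11.0, "w", 4),   # same label but different destination: kept
    (-9.0, "v", 3),    # different label: kept
]
kept = sorted(prune_duplicates(entering))
```

With a lextree the word identity is not known until deep in the tree, so the `(word, dst_state)` key is not available at entry time, which is the difficulty the slide mentions.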

Other Issues Pending
- Dynamic allocation and management of HMMs
- Implementation of absolute pruning
- Lattice generation
- N-best list generation
- …

Where Is It?
- My copy of the open source version of Sphinx2
  - Someone needs to update the SourceForge copy
- HTML documentation has been updated