1
15-Jul-04 FSG Implementation in Sphinx2 (rkm@cs.cmu.edu)
FSG Implementation in Sphinx2
Mosur Ravishankar
Jul 15, 2004
2
Outline
- Input specification
- FSG related API
- Application examples
- Implementation issues
3
FSG Specification
- “Assembly language” for specifying FSGs
  - Low-level
  - Most standards should compile down to this level
- Set of N states, numbered 0 .. N-1
- Transitions:
  - Emitting or non-emitting (aka null or epsilon)
  - Each emitting transition emits one word
  - Fixed probability 0 < p <= 1
- One start state and one final state
  - Null transitions can effectively give you as many as needed
- Goal: find the highest-likelihood path from the start state to the final state, given some input speech
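The constraints on this slide can be sketched as a small C data structure with a validity check. The type and function names (`fsg_trans_t`, `fsg_t`, `fsg_is_valid`) are hypothetical and chosen for illustration; they are not the Sphinx2 data structures.

```c
#include <stddef.h>

/* Hypothetical types for illustration only; not the Sphinx2 structures. */
typedef struct {
    int from;          /* source state, 0 .. n_states-1 */
    int to;            /* destination state, 0 .. n_states-1 */
    double prob;       /* fixed transition probability, 0 < p <= 1 */
    const char *word;  /* emitted word, or NULL for a null (epsilon) transition */
} fsg_trans_t;

typedef struct {
    int n_states;             /* N; states are numbered 0 .. N-1 */
    int start_state;          /* the single start state */
    int final_state;          /* the single final state */
    size_t n_trans;           /* number of entries in trans */
    const fsg_trans_t *trans; /* all transitions, emitting and null */
} fsg_t;

/* Return 1 if the FSG satisfies the constraints listed on the slide, else 0. */
int fsg_is_valid(const fsg_t *g)
{
    size_t i;

    if (g == NULL || g->n_states <= 0)
        return 0;
    if (g->start_state < 0 || g->start_state >= g->n_states)
        return 0;
    if (g->final_state < 0 || g->final_state >= g->n_states)
        return 0;

    for (i = 0; i < g->n_trans; i++) {
        const fsg_trans_t *t = &g->trans[i];
        if (t->from < 0 || t->from >= g->n_states)
            return 0;
        if (t->to < 0 || t->to >= g->n_states)
            return 0;
        if (!(t->prob > 0.0 && t->prob <= 1.0))
            return 0;
    }
    return 1;
}
```

A null transition is modeled here simply as a transition whose `word` is NULL, which is one possible encoding among several.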
4
An FSG Example
    FSG_BEGIN leg
    NUM_STATES 10
    START_STATE 0
    FINAL_STATE 9
    # Transitions
    T 0 1 0.5 to
    T 1 2 0.1 city1
    …
    T 1 2 0.1 cityN
    T 2 3 1.0 from
    T 3 4 0.1 city1
    …
    T 3 4 0.1 cityN
    T 4 9 1.0
    T 0 5 0.5 from
    T 5 6 0.1 city1
    …
    T 5 6 0.1 cityN
    T 6 7 1.0 to
    T 7 8 0.1 city1
    …
    T 7 8 0.1 cityN
    T 8 9 1.0
    FSG_END
[Figure: state diagram of the FSG above, states 0 to 9, with arcs labeled to, from, and city1 … cityN]
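Judging only from the example on this slide, a transition line appears to have the form `T <from> <to> <prob> [word]`, where a missing word marks a null transition. A minimal sketch of parsing one such line follows. The function name `parse_transition` and the 63-character word limit are assumptions made for illustration, and this is not the Sphinx2 parser.

```c
#include <stdio.h>

/*
 * Parse one line of the form "T <from> <to> <prob> [word]".
 * Returns 1 for an emitting transition (word filled in),
 *         0 for a null transition (word set to the empty string),
 *        -1 if the line is not a well-formed transition line.
 * Hypothetical helper; the field layout is inferred from the slide's example.
 */
int parse_transition(const char *line, int *from, int *to,
                     double *prob, char word[64])
{
    char tag[8];
    int n;

    word[0] = '\0';
    n = sscanf(line, "%7s %d %d %lf %63s", tag, from, to, prob, word);

    /* Need at least the tag and three numeric fields, and the tag must be "T". */
    if (n < 4 || tag[0] != 'T' || tag[1] != '\0')
        return -1;

    return (n == 5) ? 1 : 0;
}
```

Lines such as `FSG_BEGIN leg`, `# Transitions`, and `FSG_END` are rejected with -1, so a caller would need to handle those separately.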
5
A Better Representation
- Composition of FSGs
[Figure: the state diagram with states 0 to 9 and arcs labeled to, from, and [city]; a separate [city] FSG with states 0 and 1 and the words pittsburgh, chicago, boston, buffalo, seattle]
6
Multiple Pronunciations and Filler Words
- Alternative pronunciations added automatically
- Filler word transitions (silence and noise) added automatically
  - A filler self-transition at every state
  - Noise words added only if noise penalty (probability) > 0
[Figure: the state diagram with states 0 to 9, arcs labeled to, from, and [city], and a [filler] self-transition]
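The filler rule on this slide might be sketched as follows: every state gets a silence self-transition, and noise-word self-transitions are added only when the noise probability is positive. The names `fsg_arc_t` and `add_filler_loops`, the use of `SIL` as the silence label, and the separate silence probability parameter are assumptions made for illustration, not the Sphinx2 implementation.

```c
#include <stddef.h>

/* Hypothetical arc type for illustration. */
typedef struct {
    int from;
    int to;
    double prob;
    const char *word;
} fsg_arc_t;

/*
 * Write filler self-transitions for states 0 .. n_states-1 into out[].
 * A silence loop is always added; noise loops only if noise_prob > 0.
 * Returns the number of arcs written, or -1 if out[] is too small.
 */
int add_filler_loops(int n_states, double sil_prob, double noise_prob,
                     const char *const *noise_words, int n_noise,
                     fsg_arc_t *out, int max_out)
{
    int s, k, n = 0;

    for (s = 0; s < n_states; s++) {
        if (n >= max_out)
            return -1;
        out[n].from = s;
        out[n].to = s;              /* self-transition */
        out[n].prob = sil_prob;
        out[n].word = "SIL";        /* assumed silence label */
        n++;

        if (noise_prob > 0.0) {     /* noise words are optional */
            for (k = 0; k < n_noise; k++) {
                if (n >= max_out)
                    return -1;
                out[n].from = s;
                out[n].to = s;
                out[n].prob = noise_prob;
                out[n].word = noise_words[k];
                n++;
            }
        }
    }
    return n;
}
```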
7
FSG Related API
- Loading during initialization (i.e., fbs_init()):
  - -fsgfn flag specifying an FSG file to load (similar to the -lmfn flag)
  - Difference: the FSG name is contained in the file
- Dynamic loading: char *uttproc_load_fsgfile(char *fsgfile);
  - Returns the FSG string name contained in the file
- Switching to an FSG: uttproc_set_fsg(char *fsgname);
- Deleting a previously loaded FSG: uttproc_del_fsg(char *fsgname);
- Old demos could be run with FSGs simply by recompiling with the new libraries
8
Mixed LM/FSG Decoding Example
- See lm_fsg_test.c
9
Another Example: Garbage Models
- Extraneous speech could be absorbed using an allphone “garbage model”
[Figure: the state diagram with states 0 to 9, arcs labeled to, from, and [city], and an [allphone] transition]
10
B/W Training and Forced Alignment
- Consolidate code for FSGs, Baum-Welch training, and forced alignment?
  - Sentence HMMs for training and alignment are essentially linear FSGs
  - Alternative pronunciations and filler words handled automatically
- Differences:
  - B/W uses the forward (and backward) algorithm instead of Viterbi
  - Alignment has to produce phone and state segmentation as well
11
Implementation
- Straightforward expansion of word-level FSG into a triphone HMM network
- Viterbi beam search over this HMM network
- No major optimizations attempted (so far)
  - No lextree implementation (What?)
  - Static allocation of all HMMs; not allocated “on demand” (Oh, no!)
  - FSG transitions represented by NxN matrix (You can’t be serious!!)
- Speed/memory usage profile needs to be evaluated
- Mostly new set of data structures, separate from existing ones
  - Should be easily ported to Sphinx3
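The beam step of a Viterbi beam search can be illustrated in isolation: at each frame, hypotheses whose log score falls more than a fixed beam width below the best score are deactivated. The sketch below is a generic illustration under that assumption. The name `beam_prune` and the use of `double` log scores are choices made here, not details of the Sphinx2 code.

```c
#include <stddef.h>

/*
 * Mark as active only those hypotheses whose log score is within `beam`
 * of the best log score in the frame. Returns the number kept.
 * Generic illustration; not the Sphinx2 implementation.
 */
size_t beam_prune(const double *logscore, size_t n, double beam,
                  unsigned char *active)
{
    size_t i, kept = 0;
    double best;

    if (n == 0)
        return 0;

    /* Find the best (largest) log score in this frame. */
    best = logscore[0];
    for (i = 1; i < n; i++) {
        if (logscore[i] > best)
            best = logscore[i];
    }

    /* Keep everything within the beam of the best score. */
    for (i = 0; i < n; i++) {
        active[i] = (unsigned char)(logscore[i] >= best - beam);
        kept += active[i];
    }
    return kept;
}
```

This threshold is relative to the best score in the frame; a fixed cap on the number of surviving hypotheses would be a different kind of pruning.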
12
Implementation: FSG Expansion to HMMs
[Figure: a word-level FSG with states 0, 1, 2 and arcs word1 and word2, shown expanded into the phone HMM sequences p1 p2 p3 p4 and q1 q2 q3]
13
Implementation: Triphone HMMs
- Multiple root HMMs for different left contexts
- Multiple leaf HMMs for different right contexts
- Special case for 2-phone words
- 1-phone words use SIL as right context
[Figure: word1 between states 0 and 1 as phones p1 p2 p3 p4, with root variants p1’, p1’’ and leaf variants p4’, p4’’; a 2-phone word p1 p2 with variants p1’, p1’’, p2’, p2’’]
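As a rough reading of this slide, the number of HMMs needed for one word transition could be estimated from its phone count and the number of distinct left and right contexts. The function name `word_hmm_count` and the assumption that each interior phone needs exactly one HMM are illustrative and not taken from the Sphinx2 source.

```c
/*
 * Estimate the HMM count for one word transition:
 *   - one root HMM per distinct left context,
 *   - one leaf HMM per distinct right context,
 *   - one HMM per interior phone (assumed),
 *   - 1-phone words: right context fixed to SIL, so only left contexts vary.
 * Returns 0 for non-positive arguments.
 */
int word_hmm_count(int n_phones, int n_left_ctx, int n_right_ctx)
{
    if (n_phones <= 0 || n_left_ctx <= 0 || n_right_ctx <= 0)
        return 0;

    if (n_phones == 1)
        return n_left_ctx;   /* single phone, SIL as right context */

    /* 2-phone words have no interior phones: roots feed leaves directly. */
    return n_left_ctx + (n_phones - 2) + n_right_ctx;
}
```

Under these assumptions, a 4-phone word with 3 left and 3 right contexts would need 3 + 2 + 3 = 8 HMMs.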
14
Possible Optimization: Lextrees
[Figure: a lextree (associated with the source state) over phone HMMs p1 p2 p3 p4 and q1 q2 q3, covering word1 … wordN]
15
Possible Optimization: Path Pruning
- If there are two transitions with the same label into the same state, the one starting out with a worse score can be pruned
- But reconciling with lextrees is tricky, since labels are now blurred
[Figure: two transitions labeled w entering the same state]
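The pruning idea on this slide can be sketched as a deduplication pass over pending transition entries: for each (label, destination state) pair, only the entry with the best starting score is kept. The struct `entry_t`, the function `prune_duplicate_paths`, and the use of integer word ids are hypothetical. The quadratic scan is chosen for brevity rather than speed.

```c
#include <stddef.h>

/* Hypothetical record of a path about to enter a word transition. */
typedef struct {
    int src_state;    /* state the transition leaves */
    int word_id;      /* label on the transition */
    int dest_state;   /* state the transition enters */
    double logscore;  /* path score at entry; larger is better */
} entry_t;

/*
 * Compact e[0..n-1] in place so that each (word_id, dest_state) pair
 * appears once, keeping the entry with the best entry score.
 * Returns the new number of entries.
 */
size_t prune_duplicate_paths(entry_t *e, size_t n)
{
    size_t i, j, kept = 0;

    for (i = 0; i < n; i++) {
        /* Look for an already-kept entry with the same label and destination. */
        for (j = 0; j < kept; j++) {
            if (e[j].word_id == e[i].word_id &&
                e[j].dest_state == e[i].dest_state)
                break;
        }
        if (j == kept)
            e[kept++] = e[i];            /* first of its kind: keep */
        else if (e[i].logscore > e[j].logscore)
            e[j] = e[i];                 /* better start: replace the worse one */
    }
    return kept;
}
```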
16
Other Issues Pending
- Dynamic allocation and management of HMMs
- Implementation of absolute pruning
- Lattice generation
- N-best list generation
- …
17
Where Is It?
- My copy of the open source version of Sphinx2
  - Someone needs to update the SourceForge copy
- HTML documentation has been updated