
FSG Implementation in Sphinx2
Mosur Ravishankar, Jul 15, 2004


1. FSG Implementation in Sphinx2
Mosur Ravishankar (rkm@cs.cmu.edu)
Jul 15, 2004

2. Outline
- Input specification
- FSG-related API
- Application examples
- Implementation issues

3. FSG Specification
- "Assembly language" for specifying FSGs
  - Low-level
  - Most standards should compile down to this level
- Set of N states, numbered 0..N-1
- Transitions:
  - Emitting or non-emitting (aka null or epsilon)
  - Each emitting transition emits one word
  - Fixed probability 0 < p <= 1
- One start state and one final state
  - Null transitions can effectively provide as many start or final states as needed
- Goal: find the highest-likelihood path from the start state to the final state, given some input speech
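A minimal sketch of how this specification might be held in memory, written in C to match the API later in the deck. The type names, field names and the validity check are hypothetical, not Sphinx2's own data structures; a transition whose word is NULL stands for a null (epsilon) transition.

```c
#include <stddef.h>

/* Hypothetical in-memory form of the low-level FSG specification. */
typedef struct {
    int from, to;        /* source and destination state, 0..N-1 */
    double prob;         /* fixed transition probability, 0 < p <= 1 */
    const char *word;    /* emitted word, or NULL for a null transition */
} fsg_trans_t;

typedef struct {
    int n_states;
    int start_state, final_state;
    int n_trans;
    const fsg_trans_t *trans;
} fsg_t;

/* Returns 1 if the FSG satisfies the constraints on the slide, else 0. */
int fsg_is_valid(const fsg_t *fsg)
{
    if (fsg->n_states <= 0)
        return 0;
    if (fsg->start_state < 0 || fsg->start_state >= fsg->n_states ||
        fsg->final_state < 0 || fsg->final_state >= fsg->n_states)
        return 0;
    for (int i = 0; i < fsg->n_trans; i++) {
        const fsg_trans_t *t = &fsg->trans[i];
        if (t->from < 0 || t->from >= fsg->n_states ||
            t->to < 0 || t->to >= fsg->n_states)
            return 0;
        if (!(t->prob > 0.0 && t->prob <= 1.0))
            return 0;
    }
    return 1;
}

/* Counts emitting transitions, i.e. those that carry a word. */
int fsg_n_emitting(const fsg_t *fsg)
{
    int n = 0;
    for (int i = 0; i < fsg->n_trans; i++)
        if (fsg->trans[i].word != NULL)
            n++;
    return n;
}
```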

4. An FSG Example
    FSG_BEGIN leg
    NUM_STATES 10
    START_STATE 0
    FINAL_STATE 9
    # Transitions
    T 0 1 0.5 to
    T 1 2 0.1 city1
    …
    T 1 2 0.1 cityN
    T 2 3 1.0 from
    T 3 4 0.1 city1
    …
    T 3 4 0.1 cityN
    T 4 9 1.0
    T 0 5 0.5 from
    T 5 6 0.1 city1
    …
    T 5 6 0.1 cityN
    T 6 7 1.0 to
    T 7 8 0.1 city1
    …
    T 7 8 0.1 cityN
    T 8 9 1.0
    FSG_END
Transitions without a word (T 4 9 1.0 and T 8 9 1.0) are null transitions.
[Figure: state diagram of this grammar, states 0-9, with arcs labeled to, from and city1 … cityN.]
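Each line of this file is simple enough to read with sscanf. The sketch below is a hypothetical reader for the header and T lines of the example, not Sphinx2's own parser; it assumes that a T line with no word field is a null transition, and that the caller skips # comment lines and stops at FSG_END.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one line "T <from> <to> <prob> [word]". */
typedef struct {
    int from, to;
    double prob;
    char word[64];          /* "" means a null (non-emitting) transition */
} fsg_line_t;

/* Header fields that precede the transitions. */
typedef struct {
    char name[64];          /* from "FSG_BEGIN <name>" */
    int n_states, start_state, final_state;
} fsg_header_t;

/* Parses one transition line.  Returns 1 on success, 0 if the line is not
 * a well-formed T line or the probability is outside 0 < p <= 1. */
int parse_trans_line(const char *line, fsg_line_t *out)
{
    out->word[0] = '\0';    /* stays empty when the word field is absent */
    int n = sscanf(line, " T %d %d %lf %63s",
                   &out->from, &out->to, &out->prob, out->word);
    if (n < 3)
        return 0;
    return out->prob > 0.0 && out->prob <= 1.0;
}

/* Parses one header line into h.  Returns 1 if the line was recognized. */
int parse_header_line(const char *line, fsg_header_t *h)
{
    return sscanf(line, " FSG_BEGIN %63s", h->name) == 1 ||
           sscanf(line, " NUM_STATES %d", &h->n_states) == 1 ||
           sscanf(line, " START_STATE %d", &h->start_state) == 1 ||
           sscanf(line, " FINAL_STATE %d", &h->final_state) == 1;
}
```

Because the FSG name is read from the FSG_BEGIN line, a loader built this way can report the grammar's name without relying on the file name.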

5. A Better Representation
- Composition of FSGs
[Figure: the same state diagram (states 0-9, arcs to and from) with the city arcs replaced by [city]; the [city] FSG has states 0 and 1 and transitions pittsburgh, chicago, boston, buffalo and seattle.]
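One way to realize composition is to splice a private copy of the sub-FSG into the parent wherever a transition carries its label, connecting it with null transitions. This is a hypothetical sketch: the slide does not say how, or whether, Sphinx2's file format expresses composition, and the names fsg_substitute and MAX_ARCS are illustrative.

```c
#include <string.h>

#define MAX_ARCS 256

typedef struct { int from, to; double prob; const char *word; } arc_t;

typedef struct {
    int n_states, start, final, n_arcs;
    arc_t arcs[MAX_ARCS];
} fsg_t;

/* Replaces every arc of g labeled `label` (e.g. "[city]") with a copy of sub.
 * The copy's states are appended to g; a null arc with the original
 * probability enters the copy's start state, and a null arc with probability
 * 1.0 leads from its final state to the original destination.  Returns the
 * number of arcs expanded, or -1 (g unchanged) if MAX_ARCS would be exceeded.
 * Labels inside sub are not expanded recursively. */
int fsg_substitute(fsg_t *g, const char *label, const fsg_t *sub)
{
    int n_orig = g->n_arcs, matches = 0;
    for (int i = 0; i < n_orig; i++)
        if (g->arcs[i].word != NULL && strcmp(g->arcs[i].word, label) == 0)
            matches++;
    /* Each expansion reuses the matched slot and appends sub->n_arcs + 1 arcs. */
    if (g->n_arcs + matches * (sub->n_arcs + 1) > MAX_ARCS)
        return -1;

    for (int i = 0; i < n_orig; i++) {
        arc_t a = g->arcs[i];
        if (a.word == NULL || strcmp(a.word, label) != 0)
            continue;
        int off = g->n_states;            /* first state of this copy */
        g->n_states += sub->n_states;
        g->arcs[i] = (arc_t){a.from, off + sub->start, a.prob, NULL};
        for (int j = 0; j < sub->n_arcs; j++) {
            arc_t s = sub->arcs[j];
            g->arcs[g->n_arcs++] = (arc_t){off + s.from, off + s.to, s.prob, s.word};
        }
        g->arcs[g->n_arcs++] = (arc_t){off + sub->final, a.to, 1.0, NULL};
    }
    return matches;
}
```

Copying the sub-FSG per use keeps the result a plain FSG of the kind shown earlier, at the cost of duplicating states; sharing one copy would need extra bookkeeping to know where to return.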

6. Multiple Pronunciations and Filler Words
- Alternative pronunciations are added automatically
- Filler-word transitions (silence and noise) are added automatically
  - A filler self-transition at every state
  - Noise words are added only if the noise penalty (probability) is > 0
[Figure: the same state diagram with [city] arcs and a [filler] self-transition at each state.]
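The filler rule can be sketched as a single pass over the states. The names are hypothetical, the filler words and probabilities are supplied by the caller (nothing here reflects Sphinx2's actual filler dictionary or defaults), and the sketch does not renormalize each state's outgoing probabilities.

```c
#define MAX_ARCS 512

typedef struct { int from, to; double prob; const char *word; } arc_t;
typedef struct { int n_states, n_arcs; arc_t arcs[MAX_ARCS]; } fsg_t;

/* Adds a silence self-loop at every state, plus one self-loop per noise word
 * only when noise_prob > 0.  Returns the number of arcs added, or -1
 * (g unchanged) if MAX_ARCS would be exceeded. */
int add_filler_loops(fsg_t *g, const char *sil, double sil_prob,
                     const char *const *noise, int n_noise, double noise_prob)
{
    int per_state = 1 + (noise_prob > 0.0 ? n_noise : 0);
    if (g->n_arcs + per_state * g->n_states > MAX_ARCS)
        return -1;
    for (int s = 0; s < g->n_states; s++) {
        g->arcs[g->n_arcs++] = (arc_t){s, s, sil_prob, sil};
        for (int k = 0; noise_prob > 0.0 && k < n_noise; k++)
            g->arcs[g->n_arcs++] = (arc_t){s, s, noise_prob, noise[k]};
    }
    return per_state * g->n_states;
}
```

For example, add_filler_loops(&g, "<sil>", 0.1, noise, n_noise, 0.0) adds only silence loops because the noise probability is zero; "<sil>" is a placeholder word string.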

7. FSG-Related API
- Loading during initialization (i.e., fbs_init()):
  - The -fsgfn flag specifies an FSG file to load (similar to the -lmfn flag)
  - Difference: the FSG name is contained in the file
- Dynamic loading:
  - char *uttproc_load_fsgfile(char *fsgfile); returns the FSG name string contained in the file
- Switching to an FSG:
  - uttproc_set_fsg(char *fsgname);
- Deleting a previously loaded FSG:
  - uttproc_del_fsg(char *fsgname);
- Old demos can be run with FSGs simply by recompiling with the new libraries
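A sketch of the dynamic-loading sequence. Because the FSG name comes from inside the file, an application has to keep the string returned by uttproc_load_fsgfile() to switch to or delete that grammar later. The bodies marked as stand-ins are toys so the example compiles without Sphinx2; the int return type of uttproc_set_fsg() and uttproc_del_fsg() (0 on success) and the helper load_and_switch() are assumptions, not documented API.

```c
#include <string.h>

/* ---- Toy stand-ins so the sketch compiles on its own; NOT Sphinx2 code. ----
 * Only uttproc_load_fsgfile()'s prototype is taken from the slide.  The int
 * return type of the other two calls and every body below are assumptions;
 * drop these stubs when linking against the real libraries. */
static char stub_name[64];
static int stub_loaded = 0, stub_active = 0;

char *uttproc_load_fsgfile(char *fsgfile)
{
    (void)fsgfile;              /* a real loader would parse this file */
    strcpy(stub_name, "leg");   /* ...and return the name on its FSG_BEGIN line */
    stub_loaded = 1;
    return stub_name;
}

int uttproc_set_fsg(char *fsgname)
{
    if (!stub_loaded || strcmp(fsgname, stub_name) != 0)
        return -1;
    stub_active = 1;
    return 0;
}

int uttproc_del_fsg(char *fsgname)
{
    if (!stub_loaded || strcmp(fsgname, stub_name) != 0)
        return -1;
    stub_loaded = stub_active = 0;
    return 0;
}
/* ---- end of stand-ins ---- */

/* Hypothetical application helper: load a grammar at run time and switch to
 * it.  Returns the FSG's name (needed for later switch/delete calls), or NULL. */
char *load_and_switch(char *fsgfile)
{
    char *name = uttproc_load_fsgfile(fsgfile);
    if (name == NULL)
        return NULL;
    return uttproc_set_fsg(name) == 0 ? name : NULL;
}
```

Whether the returned name string must be copied, and who owns it, is not stated on the slide; an application may prefer to copy it.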

8. Mixed LM/FSG Decoding Example
- (See lm_fsg_test.c)

9. Another Example: Garbage Models
- Extraneous speech can be absorbed using an allphone "garbage model"
[Figure: the same state diagram with [city] arcs and [allphone] arcs.]

10. B/W Training and Forced Alignment
- Consolidate the code for FSGs, Baum-Welch (B/W) training and forced alignment?
- Sentence HMMs for training and alignment are essentially linear FSGs
  - Alternative pronunciations and filler words handled automatically
- Differences:
  - B/W uses the forward (and backward) algorithm instead of Viterbi
  - Alignment has to produce phone and state segmentations as well
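The first difference shows up on a single linear chain of states, the shape of a sentence HMM: the forward algorithm sums over all paths into a state, while Viterbi keeps only the best one. The sketch works in the probability domain for brevity (real implementations use log or scaled scores to avoid underflow), and the function and its parameters are hypothetical.

```c
#define MAX_STATES 16

/* Scores T frames against a linear (left-to-right) HMM of S states.
 * b[t*S + s] is P(frame t | state s); self[s] is the self-loop probability
 * of state s, and 1 - self[s] moves to s + 1.  Paths start in state 0 at
 * frame 0 and must end in state S-1 at frame T-1.
 * use_max == 0: forward recursion (sum over all paths), as in Baum-Welch.
 * use_max == 1: Viterbi recursion (best single path), as in decoding.
 * Returns 0.0 for unsupported sizes. */
double linear_hmm_score(const double *b, int T, int S, const double *self, int use_max)
{
    double prev[MAX_STATES], cur[MAX_STATES];
    if (T < 1 || S < 1 || S > MAX_STATES)
        return 0.0;
    for (int s = 0; s < S; s++)
        prev[s] = 0.0;
    prev[0] = b[0];
    for (int t = 1; t < T; t++) {
        for (int s = 0; s < S; s++) {
            double stay = prev[s] * self[s];
            double enter = s > 0 ? prev[s - 1] * (1.0 - self[s - 1]) : 0.0;
            double in = use_max ? (stay > enter ? stay : enter) : stay + enter;
            cur[s] = in * b[t * S + s];
        }
        for (int s = 0; s < S; s++)
            prev[s] = cur[s];
    }
    return prev[S - 1];
}
```

With two states, self-loop probability 0.5 and all emission probabilities set to 1, three frames give a forward score of 0.5 (two paths of 0.25 each) and a Viterbi score of 0.25.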

11. Implementation
- Straightforward expansion of the word-level FSG into a triphone HMM network
- Viterbi beam search over this HMM network
- No major optimizations attempted (so far)
  - No lextree implementation (What?)
  - Static allocation of all HMMs; not allocated "on demand" (Oh, no!)
  - FSG transitions represented by an NxN matrix (You can't be serious!!)
- Speed/memory usage profile needs to be evaluated
- Mostly a new set of data structures, separate from the existing ones
  - Should be easily ported to Sphinx3
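A minimal sketch of the per-frame beam pruning step, assuming integer log-domain scores where higher is better, so the beam is a non-positive offset from the frame's best score. With all HMMs allocated statically, pruning here only clears an active flag. The record layout and names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-HMM record; nothing is freed when an HMM is pruned. */
typedef struct {
    int32_t score;   /* best path score in this HMM (scaled log-prob; higher is better) */
    int active;      /* 1 if the HMM is searched in the next frame */
} hmm_inst_t;

/* Deactivates every active HMM whose score falls more than |beam| below the
 * frame's best score.  Returns the number of HMMs left active. */
int prune_beam(hmm_inst_t *hmms, int n, int32_t beam)
{
    int found = 0, n_active = 0;
    int32_t best = 0;
    for (int i = 0; i < n; i++)
        if (hmms[i].active && (!found || hmms[i].score > best)) {
            best = hmms[i].score;
            found = 1;
        }
    if (!found)
        return 0;
    int64_t thresh = (int64_t)best + beam;   /* 64-bit to avoid overflow */
    for (int i = 0; i < n; i++) {
        if (!hmms[i].active)
            continue;
        if (hmms[i].score < thresh)
            hmms[i].active = 0;
        else
            n_active++;
    }
    return n_active;
}
```

For scores -100, -250 and -400 with a beam of -200, the threshold is -300 and only the third HMM is deactivated.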

12. Implementation: FSG Expansion to HMMs
[Figure: a word-level FSG with states 0, 1 and 2 and transitions word1 and word2, and its expansion in which word1 becomes the phone HMM sequence p1 p2 p3 p4 and word2 becomes q1 q2 q3.]
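The expansion amounts to replacing each word transition with a chain of phone HMMs taken from the word's pronunciation. The sketch is context-independent (the triphone handling on the next slide is ignored) and uses hypothetical types with phones as integer ids; null transitions produce no HMMs and are marked with first[i] == -1.

```c
#define MAX_NODES 256

/* One phone HMM in the expanded network (hypothetical, minimal). */
typedef struct {
    int phone;       /* phone id */
    int word_arc;    /* index of the word transition this node belongs to */
    int next;        /* next node in the same word, or -1 at the word's end */
} phone_node_t;

/* A word-level FSG transition and its pronunction length; 0 for a null arc. */
typedef struct { int from, to; const int *pron; int pron_len; } word_arc_t;

/* Expands every word transition into a chain of phone nodes (nodes must hold
 * MAX_NODES entries).  first[i] gets the first node of arc i, entered from
 * state arcs[i].from, or -1 for a null arc; the chain's last node exits to
 * arcs[i].to.  Returns the number of nodes, or -1 if MAX_NODES is exceeded. */
int expand_arcs(const word_arc_t *arcs, int n_arcs, phone_node_t *nodes, int *first)
{
    int n = 0;
    for (int i = 0; i < n_arcs; i++) {
        first[i] = arcs[i].pron_len > 0 ? n : -1;
        if (n + arcs[i].pron_len > MAX_NODES)
            return -1;
        for (int k = 0; k < arcs[i].pron_len; k++) {
            nodes[n].phone = arcs[i].pron[k];
            nodes[n].word_arc = i;
            nodes[n].next = (k + 1 < arcs[i].pron_len) ? n + 1 : -1;
            n++;
        }
    }
    return n;
}
```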

13. Implementation: Triphone HMMs
[Figure: word1 (p1 p2 p3 p4) between states 0 and 1, expanded with root copies p1’, p1’’ and leaf copies p4’, p4’’; a 2-phone word p1 p2 expanded with copies p1’, p1’’ and p2’, p2’’.]
- Multiple root HMMs for different left contexts
- Multiple leaf HMMs for different right contexts
- Special case for 2-phone words
- 1-phone words use SIL as right context
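The number of root and leaf copies follows from the grammar: root copies for a word leaving state s correspond to the distinct last phones of words entering s, and leaf copies for a word entering state d to the distinct first phones of words leaving d. A hypothetical sketch of that bookkeeping, assuming SIL as the context at the start and final states, ignoring null transitions and filler loops, and not merging copies that would share the same triphone model.

```c
#define MAX_PHONES 64
#define SIL 0                 /* hypothetical phone id for silence */

/* Word transition with its pronunciation as phone ids (len >= 1). */
typedef struct { int from, to; const int *pron; int len; } warc_t;

static int mark(int *ctx, int phone)
{
    if (ctx[phone])
        return 0;
    ctx[phone] = 1;
    return 1;
}

/* Left contexts for words leaving s: last phone of each word entering s, plus
 * SIL at the start state.  Fills ctx[MAX_PHONES] with 0/1 flags and returns
 * the number of distinct phones (= root HMM copies). */
int left_contexts(const warc_t *a, int n, int s, int start, int *ctx)
{
    int count = 0;
    for (int p = 0; p < MAX_PHONES; p++)
        ctx[p] = 0;
    if (s == start)
        count += mark(ctx, SIL);
    for (int i = 0; i < n; i++)
        if (a[i].to == s)
            count += mark(ctx, a[i].pron[a[i].len - 1]);
    return count;
}

/* Right contexts for words entering d: first phone of each word leaving d,
 * plus SIL at the final state (= leaf HMM copies). */
int right_contexts(const warc_t *a, int n, int d, int final, int *ctx)
{
    int count = 0;
    for (int p = 0; p < MAX_PHONES; p++)
        ctx[p] = 0;
    if (d == final)
        count += mark(ctx, SIL);
    for (int i = 0; i < n; i++)
        if (a[i].from == d)
            count += mark(ctx, a[i].pron[0]);
    return count;
}

/* HMMs for one word: root copies + interior phones + leaf copies.  A 2-phone
 * word has no interior phones; a 1-phone word gets one copy per left context,
 * with SIL as its right context. */
int n_word_hmms(const warc_t *w, int n_left, int n_right)
{
    if (w->len == 1)
        return n_left;
    return n_left + (w->len - 2) + n_right;
}
```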

14. Possible Optimization: Lextrees
[Figure: the pronunciations of the words word1 … wordN leaving an FSG state (e.g. p1 p2 p3 p4, q1 q2 q3) organized into a lextree associated with the source state.]
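A lextree merges the shared prefixes of the pronunciations of all words leaving a state, so common initial phones are searched once. The sketch builds such a tree with first-child/next-sibling links; the names are hypothetical, and word-end markers (needed when one pronunciation is a prefix of another, or for homophones) are omitted, so it only counts phone nodes.

```c
#define MAX_TREE 256

typedef struct { int phone, first_child, next_sibling; } tnode_t;
typedef struct { tnode_t n[MAX_TREE]; int count; } lextree_t;

/* Finds the node with `phone` in the sibling list starting at *head, or
 * creates it at the front of that list.  Returns its index, or -1 if full. */
static int child(lextree_t *t, int *head, int phone)
{
    for (int c = *head; c != -1; c = t->n[c].next_sibling)
        if (t->n[c].phone == phone)
            return c;
    if (t->count >= MAX_TREE)
        return -1;
    int c = t->count++;
    t->n[c] = (tnode_t){phone, -1, *head};
    *head = c;
    return c;
}

/* Inserts one pronunciation into the tree whose root list starts at *roots
 * (-1 when empty).  Returns 0 on success, -1 if the tree is full. */
int lextree_add(lextree_t *t, int *roots, const int *pron, int len)
{
    int *head = roots;
    for (int k = 0; k < len; k++) {
        int c = child(t, head, pron[k]);
        if (c < 0)
            return -1;
        head = &t->n[c].first_child;
    }
    return 0;
}
```

Inserting the pronunciations {1,2,3}, {1,2,4}, {1,5} and {6} creates 6 nodes instead of the 9 phone HMMs of the flat expansion.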

15. Possible Optimization: Path Pruning
- If there are two transitions with the same label into the same state, the one starting out with the worse score can be pruned
- But reconciling this with lextrees is tricky, since labels are now blurred
[Figure: two transitions, both labeled w, entering the same state.]
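At the word level the rule can be sketched as a filter applied when paths enter words from FSG states in a given frame. This is a hypothetical, quadratic-time sketch for clarity; it works on word labels, so it does not address the lextree complication noted on the slide.

```c
#include <string.h>

typedef struct {
    int from, to;
    const char *word;     /* NULL for a null transition */
    double log_prob;      /* log transition probability */
} ptrans_t;

/* state_score[] holds each FSG state's current log-domain score (higher is
 * better).  Among transitions with the same (destination, word) pair, only
 * the one with the best entry score state_score[from] + log_prob survives;
 * ties go to the lower index.  Null transitions are always kept.  keep[i]
 * receives 1 for survivors; returns their count. */
int prune_same_label(const ptrans_t *t, int n, const double *state_score, int *keep)
{
    int kept = 0;
    for (int i = 0; i < n; i++) {
        keep[i] = 1;
        if (t[i].word != NULL) {
            double si = state_score[t[i].from] + t[i].log_prob;
            for (int j = 0; j < n && keep[i]; j++) {
                if (j == i || t[j].word == NULL || t[j].to != t[i].to ||
                    strcmp(t[j].word, t[i].word) != 0)
                    continue;
                double sj = state_score[t[j].from] + t[j].log_prob;
                if (sj > si || (sj == si && j < i))
                    keep[i] = 0;
            }
        }
        kept += keep[i];
    }
    return kept;
}
```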

16. Other Issues Pending
- Dynamic allocation and management of HMMs
- Implementation of absolute pruning
- Lattice generation
- N-best list generation
- …

17. Where Is It?
- In my copy of the open-source version of Sphinx2
- Someone needs to update the SourceForge copy
- HTML documentation has been updated

