Juicer: A weighted finite-state transducer speech decoder


1 Juicer: A weighted finite-state transducer speech decoder
D. Moore, J. Dines, M. Magimai Doss, J. Vepa, O. Cheng (IDIAP Research Institute) and T. Hain (Department of Computer Science, University of Sheffield)

2 Overview
The speech decoding problem
Why develop another decoder?
WFST theory and practice
What is Juicer?
Benchmarking experiments
The future of Juicer

3 The speech decoding problem
Given a recording and models of speech and language, generate a text transcription of what was said.
[Diagram: recording + models → Decoder → "She had your dark suit…"]

4 The speech decoding problem
Or…

5 The speech decoding problem
Or…

6 The speech decoding problem
ASR system building blocks:
Grammar: N-gram language model
Lexical knowledge: pronunciation dictionary
Phonetic knowledge: context dependency, phonological rules
Acoustic knowledge: state distributions
Naive combination of these knowledge sources leads to a large, inefficient representation of the search space.

7 The speech decoding problem
The main issue in decoding is carrying out an efficient search of the space defined by the knowledge sources.
Two ways we can do this:
Avoid performing redundant search
Don't pursue unpromising hypotheses
An additional issue: flexibility of the decoder

8 Why develop another decoder?
Need for a state-of-the-art speech decoder that is also suitable for on-going research.
At present, such software is not freely available to the research community.
Open-source development and distribution framework.

9 WFST theory and practice
A WFST maps sequences of input symbols to sequences of output symbols; each transition carries an input/output label pair and an associated weight.
In the example: input sequence I = {a b c d} maps to output sequence O = {X Y Z W}, with the path weight a function of all transition weights associated with that path, f(0.1, 0.2, 0.5, 0.1). (A minimal sketch follows below.)
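As a concrete illustration, here is a minimal sketch of this transducer in Python, assuming the tropical semiring so that a path's weight is the sum of its arc weights; the state numbering is hypothetical, while the labels and weights mirror the example above.

```python
# transitions: (state, input_label) -> (next_state, output_label, weight)
TRANSITIONS = {
    (0, "a"): (1, "X", 0.1),
    (1, "b"): (2, "Y", 0.2),
    (2, "c"): (3, "Z", 0.5),
    (3, "d"): (4, "W", 0.1),
}

def transduce(inputs, start=0):
    """Map an input symbol sequence to (output sequence, path weight)."""
    state, outputs, weight = start, [], 0.0
    for sym in inputs:
        state, out, w = TRANSITIONS[(state, sym)]
        outputs.append(out)
        weight += w  # tropical semiring: path weight is the sum of arc weights
    return outputs, weight

print(transduce(["a", "b", "c", "d"]))  # -> (['X', 'Y', 'Z', 'W'], ~0.9)
```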

10 WFST theory and practice WFST operations
Composition: combination of transducers (see the sketch below)
Determinisation: only one transition per input label leaving any state
Minimisation: least number of states and transitions
Weight pushing to aid in minimisation
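A minimal sketch of epsilon-free composition in the same dict convention as the earlier sketch; it pairs every label-compatible state combination, reachable or not, which is enough to show the idea. The names A, B and compose are hypothetical.

```python
def compose(A, B):
    """Epsilon-free WFST composition: pair states of A and B, matching
    A's output labels against B's input labels. Weights combine with the
    semiring product, which is addition in the tropical semiring."""
    C = {}
    for (sa, i), (na, mid_a, wa) in A.items():
        for (sb, mid_b), (nb, o, wb) in B.items():
            if mid_a == mid_b:
                C[((sa, sb), i)] = ((na, nb), o, wa + wb)
    return C

# A maps a->X, b->Y; B maps X->1, Y->2; their composition maps a->1, b->2.
A = {(0, "a"): (1, "X", 0.1), (1, "b"): (2, "Y", 0.2)}
B = {(0, "X"): (1, "1", 0.3), (1, "Y"): (2, "2", 0.4)}
print(compose(A, B))
```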

11 WFST theory and practice Composition

12 WFST theory and practice Determinisation

13 WFST theory and practice Weight pushing & minimisation

14 WFST theory and practice WFST and speech decoding
ASR system building blocks: grammar, lexical knowledge, phonetic knowledge, acoustic knowledge.
Each of these knowledge sources has a WFST representation (see the toy lexicon sketch below).
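For instance, a pronunciation lexicon can be viewed as a transducer from phone sequences to word sequences. A toy sketch in the same dict convention as the earlier examples, assuming a single made-up pronunciation and using an empty string to mean "no output"; real lexicon transducers also need epsilon transitions and disambiguation of homophones.

```python
# Toy lexicon transducer: phones on the input side, words on the output side.
LEXICON = {
    (0, "d"):  (1, "data", 0.0),  # emit the word on the first phone
    (1, "ey"): (2, "",     0.0),
    (2, "t"):  (3, "",     0.0),
    (3, "ax"): (0, "",     0.0),  # back to start, ready for the next word
}
```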

15 WFST theory and practice WFST and speech decoding
Requires some special considerations:
The lexicon and grammar composition L ∘ G cannot be determinised as-is, and nor can the context dependency transducer C, where G, L and C are WFSTs for the grammar, lexicon and context dependency.
In practice, auxiliary disambiguation symbols are added so that det(L ∘ G) exists, giving the integrated network C ∘ det(L ∘ G) (an illustrative build script follows below).
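As an illustration only, here is how such a cascade might be built with OpenFst's pywrapfst. This is an assumption for illustration: Juicer itself interfaces with the AT&T and MIT FSM tools rather than OpenFst, and the file names below are hypothetical.

```python
import pywrapfst as fst

# Hypothetical inputs: G.fst (grammar), L.fst (lexicon with auxiliary
# disambiguation symbols already added) and C.fst (context dependency).
G = fst.Fst.read("G.fst")
L = fst.Fst.read("L.fst")
C = fst.Fst.read("C.fst")

L.arcsort(sort_type="olabel")            # composition expects sorted arcs
LG = fst.determinize(fst.compose(L, G))  # det(L o G), enabled by the symbols
C.arcsort(sort_type="olabel")
CLG = fst.determinize(fst.compose(C, LG))
CLG.minimize()                           # min(det(C o det(L o G)))
CLG.write("CLG.fst")
```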

16 WFST theory and practice WFST and speech decoding
Pros:
Flexibility
Simple decoder architecture
Optimised search space
Cons:
Transducer size
Knowledge sources are fixed during composition
WFST-only knowledge sources

17 What is Juicer?
A time-synchronous Viterbi decoder
Tools for WFST construction
An interface to 3rd-party FSM tools

18 What is Juicer? Decoder
Pruning: beam search, histogram (see the sketch below)
1-best output: word and model timing information
Lattice generation: phone-level lattice output
State-to-phone transducer is not optimised; it is incorporated at run time
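To ground these terms, here is a minimal sketch of time-synchronous Viterbi decoding with beam pruning, assuming tropical (negative log) weights where lower is better; the arcs structure, the acoustic_score interface and the beam value are hypothetical placeholders, and histogram pruning (capping the number of active hypotheses) is omitted.

```python
def decode(arcs, acoustic_score, n_frames, start=0, beam=10.0):
    """arcs: state -> list of (next_state, model_id, word_label, weight).
    acoustic_score(t, model_id): negative log-likelihood of frame t.
    Assumes at least one hypothesis survives every frame."""
    hyps = {start: (0.0, [])}  # state -> (path cost, words so far)
    for t in range(n_frames):
        new = {}
        for state, (cost, words) in hyps.items():
            for nxt, model, label, w in arcs.get(state, []):
                c = cost + w + acoustic_score(t, model)
                out = words + [label] if label else words
                if nxt not in new or c < new[nxt][0]:
                    new[nxt] = (c, out)  # Viterbi: keep the best path per state
        best = min(c for c, _ in new.values())
        # Beam pruning: drop hypotheses too far behind the frame's best score.
        hyps = {s: h for s, h in new.items() if h[0] <= best + beam}
    return min(hyps.values())  # (best cost, best word sequence)
```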

19 What is Juicer? WFST tools
gramgen: word-loop, word-pair and N-gram language models
lexgen: multiple pronunciations
cdgen: monophone, word-internal n-phone and cross-word triphone; HTK CDHMM and hybrid HMM/ANN model support
build-wfst: composition, determinisation and minimisation using 3rd-party tools (AT&T, MIT)

20 Benchmarking experiments
Experiments were conducted in order to:
Compare with existing state-of-the-art decoders
Assess the current capabilities and limitations of the decoder
Guide future development and research directions

21 Benchmarking experiments 20k Wall Street Journal Task
Equivalent performance at wide beam settings
HDecode wins out at narrow beam-widths
Only part of the story…

22 Benchmarking experiments …but what’s the catch?
Composition of large static networks:
Is practically infeasible due to memory limitations
Is slow
And may not always be necessary

Word error rates (%):
System       TOT   Sub   Del   Ins
P1.HDecode   41.1  21.1  14.7  5.3
P1.Juicer    43.5  23.0  13.7  7.8
P2.HDecode   33.1  15.9  13.4  3.9
P2.Juicer    34.5  16.9  13.6  4.0

Network sizes and build times (lexicon L = 127,048 arcs and context dependency C = 1,065,766 arcs in all cases):
Language model  # of arcs (G)  FSM tool    # of arcs (L o G)  # of arcs (C o L o G)  Time required
Pruned-07       4,145,199      AT&T + MIT  7,008,333          14,945,731             30 mins
Pruned-08       13,692,081     MIT         23,160,795         50,654,758             1:44
Pruned-09       35,895,383                 59,626,339         120,060,629            5:38
Unpruned        98,288,579                 DNF                                       10:33+

23 Benchmarking experiments AMI Meeting Room Recogniser
Decoding for the NIST Rich Transcription evaluations
Juicer uses pruned LMs
Good trade-off between RTF and WER at the chosen operating point

24 The future of Juicer
Further benchmarking:
Testing against HDecode
Trade-off between pruned LMs and performance
Added capabilities:
'On the fly' network expansion
Word lattice generation
Support for MLLR transforms, feature transforms
Distribution and support:
Currently only available to AMI and IM2 partners

25 Summary
I have presented today:
WFST theory and practice
The Juicer tools and decoder
Preliminary experiments
But more importantly, we hope to have generated interest in Juicer.
Questions?

