Tight Coupling between ASR and MT in Speech-to-Speech Translation Arthur Chan Prepared for Advanced Machine Translation Seminar
This Seminar Introduction (4 slides)
A Conceptual Model of Speech- to-Speech Translation Speech Recognizer Machine Translator Speech Synthesizer waveforms Decoding Result(s) Translation waveforms
Motivation of Tight Coupling between ASR and MT One best of ASR could be wrong MT could be benefited from wide range of supplementary information provided by ASR N-best list Lattice Sentenced/Word-based Confidence Scores E.g. Word posterior probability Confusion network Or consensus decoding (Mangu 1999) Some observed that MT quality depends on WER.
Scope of this talk Speech Recognizer Machine Translator Speech Synthesizer waveforms 1-best? Translation waveforms Lattice? N-best? Confusion network? 1, Should we combine the two? 2, How tight should be the coupling?
Topics Covered Today The concept of Coupling The “tightness” of coupling between ASR and X (Ringger 95) Interfaces between ASR and MT in loose coupling What could ASR provide? What could MT use? Very tight coupling Ney’s formulae AT&T Approach Combination of features of ASR and MT Direct Modeling
The Concept of Coupling
Classification of Coupling of ASR and Natural Language Understanding (NLU) Proposed in Ringger 95, Harper 94 3 Dimensions of ASR/NLU Complexity of the search algorithm Simple N-gram? Incrementality of the coupling On-line? Left-to-right? Tightness of the coupling Tight? Loose? Semi-tight?
Tightness of Coupling Tight Semi- Tight Loose
Summary of Coupling between ASR and NLU
Implication on ASR/MT coupling Generalize many systems Loose coupling Any system which uses 1-best, n-best, lattice for 1-way module communication Tight coupling AT&T FST-based system Semi-tight coupling [Filled in a quote here]
Interfaces in Loose Coupling
Perspectives What output could an ASR generates? Not all of them are used but it could mean opportunity in future. What algorithms could MT uses given a certain inputs? On-line algorithm is a focus
Decoding of HMM-based ASR Searching the best path in a huge HMM-state lattice. 1-best ASR result The best path one could find from backtracking. State Lattice (Next page)
Things one could extract from the state lattice From the backtracking information: N-best list The N best decoding results from the state lattice Lattice A lattice of the decoding but in the word level From the lattice N-best list Confusion network. Or “consensus decoding” (Mangu 99)
Other things one could extract from the decoder Begin time and end time Useful in time-sensitive application E.g. multi-modal applications Sentence/Word-based Confidence Scores Found to be pretty useful in many other occasions
Experimental Results
How MT used the output? What decoding algorithms are using?
Tight Coupling
Literature Eric K. Ringger, “A Robust Loose Coupling for Speech Recognition and Natural Language Understanding”, Technical Report 592, Computer Science Department, Rochester University, 1995 [The AT&T paper]