Spoken Language Translation
Intelligent Robot Lecture Note


Spoken Language Translation
Spoken language translation (SLT) directly translates spoken utterances in one language into another language.
Major components
► Automatic Speech Recognition (ASR)
► Machine Translation (MT)
► Text-to-Speech (TTS)
Pipeline: source speech → ASR → source sentence → MT → target sentence → TTS → target speech
Example: 버스 정류장이 어디에 있나요? → "Where is the bus stop?"

Spoken Language Translation
In comparison with written language,
► speech, and especially spontaneous speech, poses additional difficulties for automatic translation.
► Typically, these difficulties are caused by errors of the speech recognition step, which is carried out before the translation process.
► As a result, the sentence to be translated is not necessarily well-formed from a syntactic point of view.
Why a statistical approach for machine translation?
► Even without recognition errors, the structures of spontaneous speech differ from those of written language.
► The statistical approach
◦ avoids hard decisions at any level of the translation process
◦ guarantees that, for any source sentence, a translated sentence in the target language is generated.

Coupling ASR to MT
Motivation
► ASR cannot guarantee an error-free transcription
◦ The 1-best ASR hypothesis may be wrong
◦ SLT must be designed to be robust to speech recognition errors
◦ MT can benefit from the wide range of supplementary information provided by ASR
► MT quality depends on the WER of ASR
◦ There is a strong correlation between recognition and translation quality
◦ The effective WER decreases when a set of hypotheses is considered instead of the single best one
◦ Idea: exploit multiple transcriptions
SLT systems vary in the degree to which SMT and ASR are integrated within the overall translation process.

Coupling ASR to MT
Loose coupling
► SMT uses the ASR output (1-best, N-best, lattice, or confusion network) as its input; the modules communicate one-way
Tight coupling
► The whole search space of ASR and MT is integrated
Loose: source speech → ASR → 1-best, N-best, lattice, or CN → SMT → target sentence → TTS → target speech
Tight: source speech → ASR + SMT → target sentence → TTS → target speech
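As a concrete picture of the loose pipeline above, here is a minimal sketch in Python; the asr/smt/tts objects and their method names are illustrative assumptions, not a real toolkit API.

```python
# A minimal sketch of a loosely coupled SLT pipeline. The component
# objects and their method names are illustrative assumptions.
def speech_to_speech(source_audio, asr, smt, tts):
    transcription = asr.recognize(source_audio)     # e.g., the 1-best hypothesis
    target_sentence = smt.translate(transcription)  # text-based MT step
    return tts.synthesize(target_sentence)          # target-language audio
```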

Coupling ASR to MT
Statistical spoken language translation
► Given a speech input x in the source language, find the best translation $\hat{e}$:
$\hat{e} = \arg\max_e \Pr(e \mid x) = \arg\max_e \sum_{f \in F(x)} \Pr(f, e \mid x)$
► $F(x)$ is the set of possible transcriptions of x
◦ Loose coupling: 1-best, N-best, lattice, or confusion network
◦ Tight coupling: the full search space
► $\Pr(f, e \mid x)$: the speech translation model
◦ Combines acoustic and translation features

Coupling ASR to MT
Loose coupling vs. tight coupling

| | Loose Coupling | Tight Coupling |
| --- | --- | --- |
| Modularity of knowledge sources | Each KS in a stand-alone module | All KSs integrated in a single model |
| Inter-module communication | Typically one-way (pipelined) | N/A |
| Scalability | Easy | Not easy |
| Complexity | Feasible | Feasible only for very small domains |

ASR Outputs
Automatic speech recognition (ASR) is the process by which an acoustic speech signal is converted into a sequence of words.
Architecture
► Runtime: speech signals → feature extraction → decoding → ASR outputs (1-best, N-best, lattice, or CN)
► Decoding is driven by three knowledge sources combined by network construction: an acoustic model (HMMs estimated from a speech DB), a pronunciation model (built by G2P), and a language model (estimated from text corpora)

ASR Outputs
Network structure
Decoding in HMM-based ASR
► Searching for the best path in a huge HMM-state lattice
► The search network is hierarchical: a sentence HMM chains word HMMs (e.g., ONE, TWO, THREE), and each word HMM chains phone HMMs (e.g., ONE = W AH N)

ASR Outputs
1-best
► The best path can be found by backtracking
► Why a 1-best word sequence rather than a state sequence?
◦ Storing the backtracking pointer table for state sequences takes a lot of memory
◦ Usually a backtrack pointer stores only the previous word before the current word
N-best
► Trace back not only from the 1st best, but also from the 2nd best, 3rd best, etc.
► Methods (see the sketch below)
◦ Directly from the search backtrack pointer table: the exact N-best algorithm, the word-pair N-best algorithm, or A* search using the Viterbi score as heuristic
◦ Generate a lattice first, then generate the N-best list from the lattice
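A minimal sketch of the second method, extracting the N best paths from a word lattice by best-first search; the lattice encoding (node → outgoing arcs with negative log-probability costs) is an illustrative assumption.

```python
import heapq

# A minimal sketch of "generate a lattice first, then N-best from the
# lattice". The lattice format (node -> list of (word, cost, next_node),
# costs ~ negative log probabilities) is an illustrative assumption.
def n_best(lattice, start, end, n):
    """Enumerate the n lowest-cost word sequences from start to end."""
    results = []
    queue = [(0.0, [], start)]  # (accumulated cost, words, node)
    while queue and len(results) < n:
        cost, words, node = heapq.heappop(queue)
        if node == end:
            results.append((cost, words))
            continue
        for word, arc_cost, nxt in lattice.get(node, []):
            heapq.heappush(queue, (cost + arc_cost, words + [word], nxt))
    return results

# Toy lattice with two competing words on the second arc.
lattice = {0: [("where", 1.0, 1)],
           1: [("is", 0.5, 2), ("was", 0.9, 2)],
           2: [("the", 0.2, 3)],
           3: [("bus", 0.4, 4)],
           4: [("stop", 0.3, 5)]}
print(n_best(lattice, start=0, end=5, n=2))  # "is" path first, then "was"
```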

ASR Outputs
Lattice
► A word-based lattice
◦ A compact representation of the state lattice
◦ Only word nodes are involved
► From the decoding backtracking pointer table
◦ Record only the links between word nodes
► From an N-best list
◦ Becomes a compact representation of the N-best list

ASR Outputs
Confusion network (L. Mangu et al., 2000)
► Also called a "sausage network" or "consensus network"
► A weighted directed graph with a start node, an end node, and word labels on its edges
► Every path from the start node to the end node goes through all the other nodes
► Constructed from a lattice by multiple alignment
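The real construction clusters lattice links by time overlap and word similarity; the toy sketch below only illustrates the resulting column ("sausage") structure, using a crude positional alignment of equal-length weighted hypotheses instead of Mangu et al.'s clustering.

```python
from collections import defaultdict

# A crude sketch of consensus-network construction, NOT the Mangu et al.
# algorithm: equal-length hypotheses are aligned by position and their
# posteriors summed per column.
def naive_confusion_network(hypotheses):
    """hypotheses: list of (word sequence, posterior weight)."""
    columns = defaultdict(lambda: defaultdict(float))
    for words, weight in hypotheses:
        for pos, word in enumerate(words):
            columns[pos][word] += weight
    return [sorted(columns[pos].items(), key=lambda wp: -wp[1])
            for pos in sorted(columns)]

hyps = [(["where", "is", "the"], 0.6), (["wear", "is", "the"], 0.4)]
print(naive_confusion_network(hyps))
```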

Loose Coupling: 1-best
The best hypothesis produced by the ASR system is passed as text to the MT system.
► Baseline
► Simple structure
► Fast translation
The speech recognition module and the translation module run rather independently
► Lacks joint optimality
No use of multiple transcriptions
► Supplementary information easily available from the ASR system is not exploited in the translation process

Loose Coupling: 1-best
Structure
► Recognition: source speech → ASR → 1-best hypothesis
► Translation: 1-best hypothesis → SMT → target sentence → TTS → target speech

Loose Coupling: N-best
N hypotheses are translated by a text MT decoder and re-ranked according to ASR and SMT scores (R. Zhang et al., 2004)
Structure: source speech → ASR → N-best → SMT → N×M translations → rescoring → best translation

Loose Coupling: N-best
ASR module
► Generates the N best speech recognition hypotheses
► $f_n$: the n-th best speech recognition hypothesis
SMT module
► Generates the M best translation hypotheses for each $f_n$
► $e_{n,m}$: the m-th best translation hypothesis produced from $f_n$
Rescore module
► Rescores all N×M translations
► The key component of the approach
► Log-linear model
◦ Features derived from ASR and SMT are combined in this module to rescore the translation candidates

Loose Coupling: N-best
Rescoring: log-linear models
► $\hat{e} = \arg\max_{e \in \mathcal{E}} \sum_m \lambda_m h_m(e)$
► $\mathcal{E}$: the set of all candidate translation hypotheses
► $h_m$: the m-th feature, in log value
◦ ASR features: acoustic model, source language model
◦ SMT features: target language model, phrase translation model, distortion model, length model, …
► $\lambda_m$: the weight of each feature
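A minimal sketch of this rescoring step; the feature names, candidate format, and weights are illustrative assumptions, with all features assumed to be in the log domain.

```python
# A minimal sketch of log-linear rescoring over the N x M candidate
# translations: score each candidate by a weighted sum of log-domain
# features and return the argmax.
def rescore(candidates, weights):
    def score(candidate):
        return sum(weights[name] * value
                   for name, value in candidate["features"].items())
    return max(candidates, key=score)

candidates = [
    {"translation": "where is the bus stop",
     "features": {"acoustic": -120.3, "source_lm": -14.2,
                  "target_lm": -11.8, "phrase_tm": -9.1}},
    {"translation": "where was the bus stop",
     "features": {"acoustic": -118.9, "source_lm": -16.0,
                  "target_lm": -13.5, "phrase_tm": -10.4}},
]
weights = {"acoustic": 0.6, "source_lm": 1.0, "target_lm": 1.0, "phrase_tm": 0.8}
print(rescore(candidates, weights)["translation"])
```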

Loose Coupling: N-best
Parameter optimization (F.J. Och, 2003)
► Objective function: $\hat{\lambda} = \arg\min_{\lambda} E\big(r, \hat{e}(\lambda)\big)$, accumulated over a development set
► $\hat{e}(\lambda)$: the translation output after log-linear-model rescoring
► $r$: the reference English sentences
► $E$: an automatic translation quality metric
◦ BLEU: a weighted geometric mean of the n-gram matches between test and reference sentences, combined with a short-sentence (brevity) penalty
◦ NIST: an arithmetic mean of the n-gram matches between test and reference sentences
◦ mWER: multiple-reference word error rate
◦ mPER: multiple-reference position-independent word error rate

Loose Coupling: N-best
Parameter optimization: direction set methods
► Starting from an initial λ, run a local one-dimensional optimization along the current direction, change the direction, and repeat until a local optimum λ is reached; restarting from different initial λ values and keeping the best λ guards against poor local optima
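A minimal sketch of this scheme with coordinate directions and a grid-based line search. Och's MERT performs an exact line optimization that exploits the piecewise-linear error surface; here error_fn, the step grid, and the restart counts are illustrative assumptions.

```python
import random

# A direction-set style optimizer sketch for feature weights.
# error_fn is assumed to return, e.g., 1 - BLEU on a development set.
def optimize_weights(error_fn, dim, restarts=5, sweeps=10):
    best_lambda, best_err = None, float("inf")
    for _ in range(restarts):                      # change initial lambda
        lam = [random.uniform(-1, 1) for _ in range(dim)]
        for _ in range(sweeps):
            for d in range(dim):                   # change direction
                # Local 1-D optimization along coordinate d (grid search).
                grid = [lam[d] + step for step in
                        (-0.5, -0.1, -0.01, 0.0, 0.01, 0.1, 0.5)]
                lam[d] = min(grid,
                             key=lambda v: error_fn(lam[:d] + [v] + lam[d+1:]))
        err = error_fn(lam)                        # local optimum lambda
        if err < best_err:                         # keep the best lambda
            best_lambda, best_err = lam, err
    return best_lambda
```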

Loose Coupling: Lattice
Lattice-based MT
► Input
◦ Word lattices produced by the ASR system
► Directly integrates all models in the decoding process
◦ Phrase-based lexica, single-word-based lexica, recognition features
► Problem
◦ How do we translate word lattices?
Approaches
► Joint probability approach
◦ WFST (E. Matusov et al., 2005)
► Phrase-based approach
◦ Log-linear model (E. Matusov et al., 2005)
◦ WFST (L. Mathias et al., 2006)

Loose Coupling: Lattice
Structure: source speech → ASR → word lattice → lattice-based SMT decoding with rescoring → best translation

Loose Coupling: Lattice
From the derived decision rule $\hat{e} = \arg\max_e \big\{ p(e) \cdot \max_f p(x \mid f)\, p(f \mid e) \big\}$:
► $p(x \mid f)$: standard acoustic model
► $p(e)$: target language model
► $p(f \mid e)$: translation model
A source language model?
► To take into account the requirement that the source sentence be well formed, the translation model has to include context dependency on the previous source words
► This dependency over the whole sentence can be approximated by including a source language model

Loose Coupling: Lattice (Joint Probability Approach: WFST)
Joint probability approach
► The conditional probability terms $p(f \mid e)$ and $p(e)$ can be rewritten using a joint probability translation model: $p(f \mid e)\, p(e) = p(f, e)$
► This simplifies coupling the systems
◦ The joint probability translation model can be used instead of the usual LM in ASR

Loose Coupling: Lattice (Joint Probability Approach: WFST)
WFST-based joint probability system
► The joint probability MT system is implemented with WFSTs
► First, the training corpus is transformed into sequences of bilingual tokens based on a word alignment, e.g.:
vorrei|I'd_like del|some gelato|ice_cream per|ε favore|please
► Then, a statistical m-gram model is trained on this bilingual corpus
► This language model is represented as a finite-state transducer, which is the final translation model
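A minimal sketch of the corpus transformation step; the alignment format (source index → list of aligned target indices) is an illustrative assumption.

```python
from collections import Counter

# Build the bilingual-token corpus used by the joint-probability system:
# each source word is paired with its aligned target word(s); unaligned
# source words pair with epsilon.
EPS = "ε"

def bilingual_tokens(src, tgt, alignment):
    tokens = []
    for j, f_word in enumerate(src):
        e_words = [tgt[i] for i in alignment.get(j, [])]
        tokens.append(f"{f_word}|{'_'.join(e_words) if e_words else EPS}")
    return tokens

src = ["vorrei", "del", "gelato", "per", "favore"]
tgt = ["I'd", "like", "some", "ice", "cream", "please"]
alignment = {0: [0, 1], 1: [2], 2: [3, 4], 4: [5]}  # "per" is unaligned
corpus = bilingual_tokens(src, tgt, alignment)
print(corpus)  # vorrei|I'd_like del|some gelato|ice_cream per|ε favore|please

# An m-gram LM over these tokens (here: toy bigram counts) then plays
# the role of the translation model.
bigrams = Counter(zip(corpus, corpus[1:]))
```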

Loose Coupling: Lattice (Joint Probability Approach: WFST)
WFST-based joint probability system
► Searching for the best target sentence is done by composing the input, represented as a WFST, with the translation transducer
► Coupling the FSA system with ASR is simple
◦ The output of the ASR, represented as a WFST, can be used directly as input to the MT search
◦ Features: only the acoustic and translation probabilities
◦ No source LM scores are included; the joint m-gram translation probability serves as a source LM

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)
Probability distributions are represented as features in a log-linear model
► The translation model probability is decomposed into several probabilities
► Acoustic model and source language model probabilities are also included
► For a hypothesized recognized source sentence $f_1^J$ and a hypothesized translation $e_1^I$, let $k \to (j_k, i_k)$, $k = 1, \ldots, K$ be a monotone segmentation of the sentence pair into K bilingual phrases

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)
Features
► The m-gram target language model
► The phrasal lexicon models
◦ The phrase translation probabilities are computed as a log-linear interpolation of the relative frequencies
► The single-word-based lexicon models

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)
Features (cont'd)
► $c_1$, $c_2$: word and phrase penalty features
► The recognition model
◦ The acoustic model probability
◦ The m-gram source language model probability
Optimization
► All features are scaled with a set of exponents $\lambda = \{\lambda_1, \ldots, \lambda_7\}$ and $\mu = \{\mu_1, \mu_2\}$
► The scaling factors are optimized iteratively in a minimum-error-training framework by performing 100 to 200 translations of a development set
► Criteria: WER, BLEU, mWER, mPER
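Putting the pieces together, a sketch of how these scaled features enter the lattice decision rule; the grouping of seven translation features under λ and two recognition features under μ follows the slide, while the exact form below is an assumption in the spirit of Matusov et al. (2005):

$$\hat{e} \;=\; \arg\max_{e,\; f \in \text{lattice}} \Big\{ \sum_{i=1}^{7} \lambda_i \log h_i(e, f) \;+\; \mu_1 \log p(x \mid f) \;+\; \mu_2 \log p(f) \Big\}$$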

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)
Practical aspects of lattice translation
► Generation of word lattices
◦ In a first step, all entities that are not spoken words are mapped onto the empty arc label ε
◦ The time information is not used, so it is removed from the lattices
◦ The structure is compressed by applying ε-removal, determinization, and minimization
◦ This step significantly reduces runtime without changing the results
► Phrase extraction (see the sketch below)
◦ The number of different phrase pairs is very large
◦ Candidate phrase pairs have to be kept in main memory
◦ For ASR word-lattice input, the lattice for each test utterance is traversed, and only phrases which match sequences of arcs in the lattice are extracted
◦ Thus only phrases which can actually be used in translation are loaded
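A minimal sketch of lattice-constrained phrase extraction; the lattice format (node → list of (word, next node)) and the phrase-table layout are illustrative assumptions.

```python
# Traverse the word lattice and keep only source phrases (up to max_len
# words) that match some sequence of arcs, so only usable phrase pairs
# are loaded into memory.
def phrases_in_lattice(lattice, max_len=4):
    found = set()
    for start in lattice:
        stack = [((), start)]  # depth-first expansion of arc sequences
        while stack:
            words, node = stack.pop()
            if words:
                found.add(words)
            if len(words) < max_len:
                for word, nxt in lattice.get(node, []):
                    stack.append((words + (word,), nxt))
    return found

def filter_phrase_table(phrase_table, lattice):
    usable = phrases_in_lattice(lattice)
    return {src: tgt for src, tgt in phrase_table.items() if src in usable}

lattice = {0: [("where", 1), ("wear", 1)], 1: [("is", 2)], 2: [("it", 3)]}
table = {("where", "is"): "wo ist", ("wear", "it"): "trag es"}
print(filter_phrase_table(table, lattice))  # only ("where", "is") survives
```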

Loose Coupling: Lattice (Phrase-based Approach: Log-linear Model)
Practical aspects of lattice translation (cont'd)
► Pruning
◦ A high-density word lattice as input leads to an enormous search space, so pruning is necessary
◦ Coverage pruning and histogram pruning
◦ Based on the total cost of a hypothesis
◦ It may also be necessary to prune the input word lattices
Advantages
► The utilization of multiple features
► Direct optimization for an objective error measure
Disadvantages
► A less efficient search
► Heavy pruning is unavoidable

Loose Coupling: Lattice (Phrase-based Approach: WFST)
Statistical modeling for text translation
► Ω: all foreign phrase sequences that could have generated the foreign text
► The translation system effectively translates phrase sequences, rather than word sequences
◦ This is done by first mapping the sentence into all of its phrase sequences

Loose Coupling: Lattice (Phrase-based Approach: WFST)
The phrase sequence lattice contains the phrase sequences that can be extracted from the text
► All phrase sequences correspond to the unique foreign sentence
► Here, a phrase is a sequence of words which can be translated
► Different phrase sequences lead to different translations
► The lattice is unweighted

Loose Coupling: Lattice (Phrase-based Approach: WFST)
Statistical modeling for speech translation
► The target phrase mapping transducer Ω is applied to the foreign-language ASR word lattice L
► L ∘ Ω: the likely foreign phrase sequences that could have generated the foreign speech
► The translation system still effectively translates phrase sequences, rather than word sequences
◦ These are extracted from the ASR lattice, with ASR scores, rather than from a text sentence

Loose Coupling: Lattice (Phrase-based Approach: WFST)
The phrase sequence lattice contains the phrase sequences that can be extracted from the ASR word lattice
► Phrase sequences correspond to the translatable word sequences in the lattice
► The lattice contains weights from the ASR system
► Translating this foreign phrase lattice is MAP translation of the foreign speech under the generative model

Loose Coupling: Lattice (Phrase-based Approach: WFST)
Spoken language translation is recast as an ASR analysis problem in which the goal is to extract translatable foreign-language phrases from ASR word lattices
► Step 1. Perform foreign-language ASR to generate a foreign-language word lattice L
► Step 2. Analyze the foreign-language word lattice and extract the phrases to be translated
► Step 3. Build the target-language phrase mapping transducer Ω
► Step 4. Compose L and Ω to create the foreign-language ASR phrase lattice L ∘ Ω (see the sketch below)
► Step 5. Translate the foreign-language phrase lattice
ASR and MT must be very compatible for this approach
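A minimal sketch of Step 4 without a WFST toolkit: the composition is simulated by expanding arc sequences of the word lattice and keeping those that form a phrase from the inventory. All data formats are illustrative assumptions.

```python
# Compose a word lattice with a phrase inventory to obtain a phrase
# lattice: node -> list of (phrase, accumulated ASR cost, next node).
def compose_phrase_lattice(lattice, phrases, max_len=3):
    """lattice: node -> list of (word, cost, next_node).
    phrases: set of translatable source phrases (tuples of words)."""
    phrase_lattice = {}
    for start in lattice:
        stack = [((), 0.0, start)]
        while stack:
            words, cost, node = stack.pop()
            if words in phrases:  # this arc sequence forms a known phrase
                phrase_lattice.setdefault(start, []).append((words, cost, node))
            if len(words) < max_len:
                for word, c, nxt in lattice.get(node, []):
                    stack.append((words + (word,), cost + c, nxt))
    return phrase_lattice

lattice = {0: [("bus", 0.2, 1)], 1: [("stop", 0.1, 2), ("top", 0.8, 2)]}
phrases = {("bus", "stop"), ("top",)}
print(compose_phrase_lattice(lattice, phrases))
```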

Loose Coupling: Confusion Network
CN-based decoder (N. Bertoldi et al., 2005)
► Input
◦ A confusion network represented as a matrix
◦ Text vs. CN, e.g. for 나는 소년입니다. ("I am a boy."):

| 나 1.0 | 는 0.7 | 소녀 0.6 | 입니다 0.5 |
| | 은 0.3 | 소년 0.4 | 입니까 0.3 |
| | | | 합니다 0.2 |

► Problem
◦ How do we translate confusion network input?
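A minimal sketch of this matrix representation, together with reading off the consensus (1-best) hypothesis; the data layout is an illustrative assumption.

```python
# A confusion network as a matrix: one column per position, each column
# a list of (word, posterior) alternatives summing to one.
cn = [
    [("나", 1.0)],
    [("는", 0.7), ("은", 0.3)],
    [("소녀", 0.6), ("소년", 0.4)],
    [("입니다", 0.5), ("입니까", 0.3), ("합니다", 0.2)],
]

def best_path(cn):
    """1-best consensus hypothesis: the top word in every column."""
    words, prob = [], 1.0
    for column in cn:
        word, p = max(column, key=lambda wp: wp[1])
        words.append(word)
        prob *= p
    return words, prob

print(best_path(cn))  # (['나', '는', '소녀', '입니다'], 0.21)
```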

Loose Coupling: Confusion Network
Solution
► Simple!
► A CN-based SLT decoder can be developed starting from a phrase-based SMT decoder
► The CN-based SLT decoder is substantially the same as the phrase-based SMT decoder, apart from the way the input is managed
Comparison with N-best methods
► N-best decoder
◦ Does not take advantage of overlaps among the N-best hypotheses
► CN decoder
◦ Exploits overlaps among hypotheses

Loose Coupling: Confusion Network
Phrase-based translation model
► Phrase
◦ A sequence of consecutive words
► Alignment
◦ A map between the CN and the target phrases: one word per column, aligned with a target phrase
► Search criterion: $\hat{e} = \arg\max_e \Pr(e \mid \text{CN})$, where $\Pr(e \mid \text{CN})$ is a log-linear phrase-based model

Loose Coupling: Confusion Network
Log-linear phrase-based translation model
► The conditional distribution is determined through suitable real-valued feature functions $h_i$ and takes the parametric form:
$\Pr(e \mid \text{CN}) \propto \exp\Big(\sum_i \lambda_i h_i(e, \text{CN})\Big)$
► Feature functions
◦ Language model
◦ Fertility models
◦ Distortion models
◦ Lexicon model
◦ Likelihood of the path within the CN
◦ True length of the path

Loose Coupling: Confusion Network
Step-wise translation process
► Translation is performed with a step-wise process (sketched in code after the next two slides)
► Each step translates a sub-CN and produces a target phrase
► The process starts with an empty translation
► After each step, we obtain a partial translation
► A partial translation is complete when the whole input CN has been translated
Complexity reduction
► Recombining hypotheses (theories)
► Beam search
► Reordering constraints
► Lexicon pruning
► Confusion network pruning

Loose Coupling: Confusion Network
Algorithms
[Slide figure: pseudo-code of the CN decoding algorithm]

Loose Coupling: Confusion Network
Step-wise translation process
[Slide figure: worked example of translating a confusion network step by step]
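A minimal monotone beam-search sketch of the step-wise process: each step consumes a span of CN columns, picks one word per column, and appends a target phrase. The phrase-table format (source word tuple → list of (target phrase, log-probability)) and all scores are illustrative assumptions; real decoders add reordering, hypothesis recombination, and richer features.

```python
import heapq
import math

def span_paths(columns, prefix=(), logp=0.0):
    """Enumerate (source word tuple, log prob), one word per column."""
    if not columns:
        yield prefix, logp
        return
    for word, p in columns[0]:
        yield from span_paths(columns[1:], prefix + (word,), logp + math.log(p))

def decode_cn(cn, phrase_table, beam=5, max_span=2):
    beams = {0: [(0.0, [])]}                      # columns covered -> hypotheses
    for covered in range(len(cn)):
        for score, output in beams.get(covered, []):
            for span in range(1, max_span + 1):   # translate a sub-CN
                end = covered + span
                if end > len(cn):
                    break
                for src, path_logp in span_paths(cn[covered:end]):
                    for tgt, tm_logp in phrase_table.get(src, []):
                        hyp = (score + path_logp + tm_logp, output + [tgt])
                        beams.setdefault(end, []).append(hyp)
        if covered + 1 in beams:                  # beam pruning
            beams[covered + 1] = heapq.nlargest(beam, beams[covered + 1])
    return max(beams.get(len(cn), [(-math.inf, ["<no translation>"])]))

cn = [[("나", 1.0)], [("는", 0.7), ("은", 0.3)],
      [("소년", 0.4), ("소녀", 0.6)],
      [("입니다", 0.5), ("입니까", 0.3), ("합니다", 0.2)]]
table = {("나", "는"): [("I am", -0.2)], ("나", "은"): [("I am", -0.9)],
         ("소년", "입니다"): [("a boy", -0.3)],
         ("소녀", "입니다"): [("a girl", -0.4)]}
print(decode_cn(cn, table))  # best-scoring complete translation
```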

Loose Coupling
Summary of loose-coupling variants (O = yes, X = no)

| | 1-Best | N-Best | Lattice | CN |
| --- | --- | --- | --- | --- |
| Multiple hypotheses? | X | O | O | O |
| ASR features in MT decoding? | X | X | O | O |
| Overlaps among hypotheses? | X | O | O | X |
| Approximation of the word lattice? | X | O | X | O |

Tight Coupling
Theory (H. Ney, 1999)
$\hat{e} = \arg\max_e \Pr(e \mid x)$
$= \arg\max_e \Pr(e)\,\Pr(x \mid e)$ (Bayes' rule)
$= \arg\max_e \Pr(e) \sum_f \Pr(x, f \mid e)$ (introduce f as a hidden variable)
$= \arg\max_e \Pr(e) \sum_f \Pr(f \mid e)\,\Pr(x \mid f, e)$ (Bayes' rule)
$= \arg\max_e \Pr(e) \sum_f \Pr(f \mid e)\,\Pr(x \mid f)$ (assume x does not depend on the target language given f)
$\approx \arg\max_e \Pr(e) \max_f \Pr(f \mid e)\,\Pr(x \mid f)$ (sum to max)
► Three factors
◦ $\Pr(e)$: target language model
◦ $\Pr(f \mid e)$: translation model
◦ $\Pr(x \mid f)$: acoustic model

Tight Coupling
ASR vs. tight coupling (SLT)
► ASR decoding combines an acoustic model with a source LM; tightly coupled SLT combines an acoustic model with a target LM and a translation model
► Brute-force method
◦ Instead of incorporating the source LM into the standard Viterbi algorithm, incorporate $\Pr(e)$ and $\Pr(f \mid e)$
◦ Very complicated
◦ Not feasible

Tight Coupling
WFST-based joint probability system (full integration)
► The ASR search network is a composition of WFSTs, $N = H \circ C \circ L \circ G$
◦ $H$: the HMM topology
◦ $C$: the context dependency
◦ $L$: the lexicon
◦ $G$: the LM
► To build the speech translation search network $ST$, one only needs to replace the source LM $G$ by the translation model
► Results
◦ Small improvement in translation quality
◦ But very slow

Tight Coupling
BLEU scores against lattice density (S. Saleem et al., 2004)
► Improvements from tighter coupling may only be observed when ASR lattices are sparse, i.e., when there are only a few hypothesized words per spoken word in the lattice
► This would mean that a fully integrated speech translation would not work at all.

Tight Coupling
Possible issues with tight coupling
► In ASR, the source n-gram LM is already very close to the best configuration
► The complexity of the algorithm is too high; approximations are still necessary to make it work
► Current approaches still have not really implemented tight coupling
Conclusion
► The approach seems to be hampered by the very high complexity of search algorithm construction

Reading List
L. Mangu, E. Brill, A. Stolcke. 2000. Finding consensus in speech recognition: word error minimization and other applications of confusion networks. Computer Speech and Language 14(4).
V. H. Quan, M. Federico, M. Cettolo. 2005. Integrated N-best re-ranking for spoken language translation. In Proc. of EuroSpeech.
R. Zhang, G. Kikui, H. Yamamoto, T. Watanabe, F. Soong, W. K. Lo. 2004. A unified approach in speech-to-speech translation: Integrating features of speech recognition and machine translation. In Proc. of Coling.
F. J. Och. 2003. Minimum error rate training in statistical machine translation. In Proc. of ACL.
E. Matusov, S. Kanthak, H. Ney. 2005. On the integration of speech recognition and statistical machine translation. In Proc. of Interspeech.
E. Matusov, H. Ney, R. Schlüter. 2005. Phrase-based translation of speech recognizer word lattices using loglinear model combination. In Proc. of ASRU.

E. Matusov, H. Ney, R. Schlüter. 2006. Integrating speech recognition and machine translation: Where do we stand? In Proc. of ICASSP.
L. Mathias, W. Byrne. 2006. Statistical phrase-based speech translation. In Proc. of ICASSP.
N. Bertoldi, M. Federico. 2005. A new decoder for spoken language translation based on confusion networks. In Proc. of IEEE ASRU Workshop.
H. Ney. 1999. Speech translation: Coupling of recognition and translation. In Proc. of ICASSP.
S. Saleem, S. C. Jou, S. Vogel, T. Schultz. 2004. Using word lattice information for a tighter coupling in speech translation systems. In Proc. of ICSLP.