
1 Language Model Adaptation in Machine Translation from Speech
Ivan Bulyko, Spyros Matsoukas, Richard Schwartz, Long Nguyen, and John Makhoul

2 Why is LM adaptation needed in MT?
• To model genre/style variations
• LM training data sources are not homogeneous
  – The largest corpora (e.g., Gigaword), which may not be the most relevant, dominate when n-gram counts are merged
• Style and topics vary depending on
  – source of data (i.e., publisher)
  – epoch
  – original medium (newswire vs. broadcast)
  – processing (data created in English vs. translated from other languages)

3 Types of Adaptation Studied Here
• LM interpolation: combine LMs by interpolating their probabilities, with weights estimated by
  – minimizing perplexity on a tuning set (supervised)
  – minimizing perplexity on the 1-best output (unsupervised)
  – optimizing MT performance criteria (TER or BLEU) directly using tuning-set n-best lists (discriminative)
• Log-linear combination of scores from different LMs (contrasted with interpolation below)
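
To make the two combination schemes concrete, these are the standard definitions (λ_i are the combination weights and P_i the component LMs; the notation is mine, not the slides'):

    Linear interpolation (probability space):  P(w | h) = Σ_i λ_i · P_i(w | h),  with Σ_i λ_i = 1, λ_i ≥ 0
    Log-linear combination:                    score(w | h) = Σ_i λ_i · log P_i(w | h)

Linear interpolation yields a proper probability distribution directly; the log-linear score is unnormalized and its weights need not lie on the simplex.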

4 LM Adaptation via Linear Interpolation
• Build a separate LM from each training corpus
• Choose a tuning set similar in topics/style to the test material
• Interpolate the corpus LMs using weights estimated by minimizing perplexity on the tuning set (a weight-estimation sketch follows this slide)
[Diagram: Corpus 1 … Corpus N → Estimate LM → Corpus LM 1 … Corpus LM N → Interpolate LMs, with weights tuned on the tuning set → Interpolated LM]
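
A minimal sketch of the weight-estimation step, assuming the per-token probabilities of the tuning set under each component LM have been precomputed (the function names and data layout are illustrative, not from the paper). Minimizing tuning-set perplexity over the mixture weights has a standard EM solution:

    import math

    def estimate_interpolation_weights(token_probs, iters=50):
        """EM for weights lambda minimizing the perplexity of the mixture
        P(w|h) = sum_i lambda_i * P_i(w|h) on a tuning set.

        token_probs: one tuple per tuning-set token; element i of the tuple
        is P_i(token | history) under component LM i.
        """
        k = len(token_probs[0])
        lam = [1.0 / k] * k                       # start from uniform weights
        for _ in range(iters):
            post = [0.0] * k
            for probs in token_probs:
                mix = sum(l * p for l, p in zip(lam, probs))
                for i in range(k):                # E-step: component posteriors
                    post[i] += lam[i] * probs[i] / mix
            lam = [s / len(token_probs) for s in post]   # M-step: renormalize
        return lam

    def perplexity(token_probs, lam):
        log_sum = sum(math.log(sum(l * p for l, p in zip(lam, probs)))
                      for probs in token_probs)
        return math.exp(-log_sum / len(token_probs))

Each EM iteration cannot increase the tuning-set perplexity, so a few dozen iterations typically suffice.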

5 Unsupervised Adaptation
• Use the unadapted 3-gram LM to decode the input, either one document at a time or the entire test set
• Produce n-best lists and 1-best hypotheses
• Adapt the 5-gram LM by optimizing the LM interpolation weights on the 1-best output
• Rescore the n-best lists with the adapted 5-gram LM (the loop is sketched below)
[Diagram: input (one document or the entire test set) → MT decoder with unadapted 3-gram LM → n-best lists and 1-best hypotheses → LM adaptation on the 1-best → adapted 5-gram LM → rescore n-best lists → output translations]
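
A schematic of that loop, reusing estimate_interpolation_weights from the previous sketch; decoder.decode_nbest, token_probs_under, and rescore_with_weights are placeholder interfaces I am assuming, not BBN's actual API:

    def unsupervised_adapt(decoder, component_lms, documents, per_doc=False):
        """Decode with the unadapted 3-gram LM, fit interpolation weights on
        the system's own 1-best output, then rescore the n-best lists with
        the adapted 5-gram LM.  per_doc selects between the two granularities
        on slide 5 (per document vs. entire test set)."""
        groups = [[d] for d in documents] if per_doc else [documents]
        translations = []
        for group in groups:
            nbests = [decoder.decode_nbest(doc, n=300) for doc in group]
            one_best = [nb[0] for nb in nbests]
            # Per-token probabilities of the 1-best text under each component LM
            tok_probs = token_probs_under(component_lms, one_best)
            lam = estimate_interpolation_weights(tok_probs)
            translations += [rescore_with_weights(nb, component_lms, lam)
                             for nb in nbests]
        return translations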

6 Discriminative Adaptation
• Hill-climbing optimization (maxBLEU or minTER) using tuning-set n-best lists (a sketch follows this slide)
  – Log-linear space: treat each LM component as an independent knowledge source
  – Probability space: identical to standard LM interpolation
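
One simple way to realize the hill climbing is coordinate ascent over one weight at a time; the slides do not specify the exact optimizer, so this is an assumed, minimal variant. metric is any corpus-level objective to maximize (BLEU, or negated TER), and lm_scores holds precomputed component log-probabilities for every n-best hypothesis:

    def combined_score(weights, comp_logprobs):
        # Log-linear combination: weighted sum of component LM log-probabilities
        return sum(w * lp for w, lp in zip(weights, comp_logprobs))

    def pick_1best(weights, nbest_lists, lm_scores):
        # For each segment, select the hypothesis with the best combined score
        chosen = []
        for hyps, seg in zip(nbest_lists, lm_scores):
            j = max(range(len(hyps)),
                    key=lambda j: combined_score(weights, seg[j]))
            chosen.append(hyps[j])
        return chosen

    def hill_climb(nbest_lists, lm_scores, refs, metric, rounds=5, steps=20):
        """Coordinate ascent on LM weights to maximize metric on the tuning set.
        nbest_lists[s][j]: j-th hypothesis of segment s
        lm_scores[s][j][i]: log P_i of that hypothesis under component LM i"""
        k = len(lm_scores[0][0])
        w = [1.0 / k] * k
        grid = [s / steps for s in range(steps + 1)]
        for _ in range(rounds):
            for i in range(k):
                best_v = w[i]
                best_m = metric(pick_1best(w, nbest_lists, lm_scores), refs)
                for v in grid:
                    w[i] = v
                    m = metric(pick_1best(w, nbest_lists, lm_scores), refs)
                    if m > best_m:
                        best_v, best_m = v, m
                w[i] = best_v
        return w

For the probability-space variant, the same search runs over simplex-constrained weights applied to probabilities rather than log-probabilities, which, as the slide notes, is standard LM interpolation.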

7 Evaluation Task
• Translation from Arabic speech to English text
  – Broadcast News (BN) and Broadcast Conversations (BC), around 30K words in each genre
  – BN and BC are transcribed/translated jointly but scored separately
  – Translation performance is reported both for reference transcriptions and for STT outputs
  – Two scoring metrics are used: BLEU and TER (translation edit rate, similar to WER but allowing phrase shifts; definition below)
• Tuning
  – The tuning set (BNC-tune) is similar to the test set in epoch and sources
  – The MT system is optimized using reference transcripts
  – Two sets of weights are computed with different optimization criteria: minTER and maxBLEU
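
For reference, TER as defined by Snover et al. (cited on the final slide) counts a phrase shift as a single edit:

    TER = (# insertions + # deletions + # substitutions + # phrase shifts) / (average # of reference words)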

8 Experimental Setup
• BBN Arabic STT system
  – 1300 hours of acoustic training (SCTM, MPFE, SAT)
  – 1B words of LM training (2/3-gram decoding, 4-gram rescoring)
• BBN MT translation engine
  – 140M words of Arabic-English parallel text
  – 6B words of English LM training data
  – Phrase translations are obtained by running GIZA++
  – Phrase translations are generalized using POS classes
  – Features used by the decoder include (see the sketch below):
    · backward and forward translation probabilities
    · pruned 3-gram LM score
    · penalty for phrase reordering
    · phrase segmentation score
    · word insertion penalty
  – N-best lists (n=300) are rescored with an unpruned 5-gram LM
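
A minimal sketch of how these features might score a hypothesis, assuming the usual log-linear feature combination of phrase-based decoders (the slides list the features but not the combination; Hypothesis, the weights lam, and five_gram_logprob are illustrative):

    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        fwd_trans: float   # forward translation log-probability
        bwd_trans: float   # backward translation log-probability
        lm_score: float    # pruned 3-gram LM log-probability (decoding pass)
        reorder: float     # phrase reordering penalty
        seg_score: float   # phrase segmentation score
        num_words: int     # drives the word insertion penalty
        text: str = ""

    def features(h, lm_score=None):
        lm = h.lm_score if lm_score is None else lm_score
        return [h.fwd_trans, h.bwd_trans, lm, h.reorder, h.seg_score,
                float(h.num_words)]

    def decoder_score(h, lam):
        # Weighted sum of the decoder features listed on slide 8
        return sum(l * f for l, f in zip(lam, features(h)))

    def rescore_nbest(nbest, lam, five_gram_logprob):
        """Swap the pruned 3-gram score for an unpruned 5-gram score and
        re-rank the n=300 list (the rescoring pass described above)."""
        return max(nbest, key=lambda h: sum(
            l * f for l, f in zip(lam, features(h, five_gram_logprob(h.text)))))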

9 English LM Training
• English training texts totaling around 6B words
  – Gigaword v2
  – News articles from on-line archives
  – UW Web corpus: web text of conversation-like style
  – CNN talk show transcripts from CNN.com
  – News articles from a variety of on-line publishers, downloaded daily (02/ /2006)
  – English side of the news portion of the parallel data
• 5-gram Kneser-Ney LM, without pruning (see the smoothing sketch below)
  – 4.3B n-grams
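
For readers unfamiliar with the smoothing method, here is a compact sketch of interpolated Kneser-Ney for the bigram case (the slides name the method but not the toolkit; a production 5-gram model extends this recursion to higher orders, usually with modified, count-dependent discounts):

    from collections import Counter

    def train_kn_bigram(tokens, D=0.75):
        """Interpolated Kneser-Ney bigram LM with one fixed discount D."""
        bigrams = Counter(zip(tokens, tokens[1:]))
        ctx_count = Counter()                 # c(v): bigram tokens starting with v
        for (v, w), n in bigrams.items():
            ctx_count[v] += n
        followers = Counter(v for (v, _) in bigrams)  # N1+(v,.): distinct continuations
        cont = Counter(w for (_, w) in bigrams)       # N1+(.,w): distinct left contexts
        total_types = len(bigrams)                    # N1+(.,.)

        def prob(w, v):
            p_cont = cont[w] / total_types            # continuation probability
            c_v = ctx_count[v]
            if c_v == 0:
                return p_cont                         # unseen context: full back-off
            discounted = max(bigrams[(v, w)] - D, 0.0) / c_v
            backoff_mass = D * followers[v] / c_v     # mass freed by discounting
            return discounted + backoff_mass * p_cont
        return prob

Usage: lm = train_kn_bigram("the cat sat on the mat".split()); lm("cat", "the") returns the smoothed probability of "cat" following "the".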

10 LM Component Weights after Adaptation

LM component      Size (tokens)
Gigaword (NYT)    1.3B
Gigaword (AFP)    400M
Gigaword (APW)    900M
Gigaword (XIN)    200M
NewsArchives      1.3B
ConvWeb           650M
CNNtrans          60M
DailyNews         1.0B
Parallel          14M

[The slide's table also gives, for each component, the weights obtained by min-perplexity optimization (on the MT02 and BNC tuning sets), unsupervised adaptation (from reference transcriptions and from STT hypotheses), and discriminative adaptation (log-linear* and probability space); the numeric weight values did not survive the transcript.]
* The log-linear combination weights are normalized for ease of comparison.

11 MT Performance

[Table: TER and BLEU on BN and BC, for reference transcriptions and for STT hypotheses (BN WER = 20.1%, BC WER = 29.7%), under each LM component weight optimization method: merged counts; min-perplexity on MT02; min-perplexity on BNC; unsupervised, all docs; unsupervised, per doc; discriminative, log-linear; discriminative, probability space. The numeric scores did not survive the transcript.]

12 Conclusions and Future Work
• LM adaptation leads to improvements (up to half a point in TER or BLEU) in MT performance from speech
  – Discriminative adaptation gave the largest gains
  – Gains from unsupervised adaptation diminish as WER increases
  – Unsupervised adaptation at the document level did not outperform adaptation on the full test set
• A larger impact from adaptation is likely if:
  – both the decoding and rescoring LMs are adapted, especially when the decoding LM is pruned
  – unsupervised adaptation is performed on groups of similar documents

13 Related Work at BBN
• S. Matsoukas, I. Bulyko, B. Xiang, R. Schwartz and J. Makhoul, “Integrating speech recognition and machine translation,” in Proc. ICASSP, 2007. (lecture tomorrow, 3:45pm)
• B. Xiang, J. Xu, R. Bock, I. Bulyko, J. Maguire, S. Matsoukas, A. Rosti, R. Schwartz, R. Weischedel and J. Makhoul, “The BBN machine translation system for the NIST 2006 MT evaluation,” presentation, NIST MT06 Workshop.
• J. Ma and S. Matsoukas, “Unsupervised training on a large amount of Arabic broadcast news data,” in Proc. ICASSP, 2007. (poster this morning)
• M. Snover, B. Dorr, R. Schwartz, L. Micciulla and J. Makhoul, “A study of translation edit rate with targeted human annotation,” in Proc. AMTA, 2006.