Active Learning for Statistical Phrase-based Machine Translation Gholamreza Haffari Joint work with: Maxim Roy, Anoop Sarkar Simon Fraser University NAACL talk, Boulder, June 2009

2 The Problem
Statistical Machine Translation (SMT): M_{F→E} is a standard log-linear model and is composed of two main components:
–Phrase tables
–Language model
Good phrase tables are typically learned from large bilingual (F,E)-text
–What if we don’t have large bilingual text?
[Diagram: the model M_{F→E} translates from language F into language E]

3 A Solution
Suppose we are given a large monolingual text in the source language F
Pay a human expert to translate these sentences into the target language E
–This way, we will have a bigger bilingual text
But our budget is limited!
–We cannot afford to translate all monolingual sentences

4 A Better Solution
Choose a subset of monolingual sentences for which: if we had the translation, the SMT performance would increase the most
Only ask the human expert for the translation of these highly informative sentences
This is the goal of Active Learning
–Workshop on Active Learning for NLP

5 Active Learning for SMT
[Diagram: the AL loop. Train M_{F→E} on the bilingual (F,E) text; decode the monolingual text F into translated text (F,E); select informative sentences F; have them translated by a human into (F,E) pairs; add these to the bilingual text and re-train]
For more details, see the paper
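The loop in this diagram can be sketched in Python. This is a minimal illustration, not the paper's implementation: `train`, `select_informative`, and `human_translate` are hypothetical stand-ins for the SMT trainer, a sentence-selection strategy, and the human expert.

```python
def active_learning_loop(bitext, monotext, budget, batch_size,
                         train, select_informative, human_translate):
    """Generic AL loop for SMT: repeatedly pick informative source
    sentences, have them translated, and retrain on the enlarged bitext."""
    model = train(bitext)
    while budget >= batch_size and monotext:
        # score and rank the monolingual pool, take the top batch
        batch = select_informative(model, monotext, batch_size)
        monotext = [s for s in monotext if s not in batch]
        # pay the human expert for translations of the chosen sentences
        bitext = bitext + [(s, human_translate(s)) for s in batch]
        model = train(bitext)  # retrain with the enlarged bilingual text
        budget -= batch_size
    return model, bitext
```

Any of the selection strategies on the following slides can be plugged in as `select_informative`.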

6 Outline
General idea of active learning (AL) for statistical machine translation (SMT)
Sentence Selection Strategies
–Similarity, Decoder’s Confidence
–Hierarchical Adaptive Sampling
–Sentence merit based on the translation units
Experiments
–The simulated AL setting
–The real AL setting

7 Intuitive Underpinnings for Sentence Selection
Sentences for which the model is not confident about their translations
–Hopefully, high-confidence translations are good ones
Sentences similar to the bilingual text are easy for the model to translate
–Select the ones dissimilar to the bilingual text
Cluster monolingual sentences
–Choose some representative sentences from each cluster
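One simple way to realize the "dissimilar to the bilingual text" idea is n-gram overlap. This sketch is our own illustration of the intuition, not necessarily the exact similarity measure used in the paper.

```python
def ngrams(sentence, n=2):
    """Set of word n-grams of a whitespace-tokenized sentence."""
    toks = sentence.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def dissimilarity(sentence, bilingual_src, n=2):
    """1 minus the fraction of the sentence's n-grams already seen on the
    source side of the bilingual text; higher = less similar = more
    informative under this heuristic."""
    seen = set()
    for s in bilingual_src:
        seen |= ngrams(s, n)
    grams = ngrams(sentence, n)
    if not grams:
        return 0.0
    return 1.0 - len(grams & seen) / len(grams)
```

Ranking the monolingual pool by this score and taking the top batch gives the similarity-based strategy of the recap slide.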

8 Sentence Selection Strategies
Baseline: randomly choose sentences from the pool of monolingual sentences
Previous work: decoder’s confidence for the translations (Kato & Barnard, 2007)
Our proposed methods:
–Similarity to the bilingual training data
–Reverse model
–Hierarchical Adaptive Sampling (HAS)
–Utility of the translation units

10 Reverse Model
Comparing the original sentence and the final round-trip sentence tells us something about the value of the sentence
Example:
–Original: “I will let you know about the issue later”
–M_{E→F}: “Je vais vous faire plus tard sur la question”
–Reverse model (M_{F→E}): “I will later on the question”
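A minimal sketch of round-trip scoring, under stated assumptions: `forward` and `backward` are hypothetical callables wrapping the two translation models, and the comparison here is a simple token-overlap ratio rather than a full MT evaluation metric.

```python
def round_trip_score(sentence, forward, backward):
    """Translate with the forward model, translate back with the reverse
    model, and compare the round trip to the original sentence; low
    overlap suggests the model handles this sentence poorly, making it a
    good candidate for human translation."""
    translated = forward(sentence)
    round_trip = backward(translated)
    orig = sentence.split()
    back = round_trip.split()
    if not orig:
        return 0.0
    matched = sum(1 for tok in back if tok in orig)
    return matched / max(len(orig), len(back))
```

Sentences with the lowest round-trip score would be sent to the human expert first.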

11 Hierarchical Adaptive Sampling (Dasgupta & Hsu, 2008)
[Diagram: the monolingual sentences U_0, sorted w.r.t. similarity to the bilingual text, are recursively split into nodes U_1, U_2, then U_2,1, U_2,2, …; sentences are sampled from two sibling nodes, and the average decoder’s score on those samples decides which node to explore further]
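A simplified sketch of the idea behind HAS, with our own simplifications loudly flagged: the exact node-splitting and scoring rules of Dasgupta & Hsu (2008) differ, and `similarity` and `decoder_score` are hypothetical scoring functions.

```python
import random

def has_select(pool, similarity, decoder_score, k, samples_per_node=5):
    """Sort the pool by similarity to the bilingual text, then repeatedly
    descend into the half whose sampled sentences get the lower average
    decoder score (lower score ~ less confident ~ more informative)."""
    node = sorted(pool, key=similarity)
    while len(node) > k:
        mid = len(node) // 2
        left, right = node[:mid], node[mid:]
        def avg(sentences):
            sample = random.sample(sentences,
                                   min(samples_per_node, len(sentences)))
            return sum(decoder_score(s) for s in sample) / len(sample)
        node = left if avg(left) <= avg(right) else right
    return node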

12 Utility of the Translation Units
Phrases are the basic units of translation in phrase-based SMT
–Example: “I will let you know about the issue later”, segmented into phrases occurring in the monolingual text and the bilingual text
The more frequent a phrase is in the monolingual text, the more important it is
The more frequent a phrase is in the bilingual text, the less important it is

13 Generative Models for Phrases
Two unigram models over phrases, estimated from counts:
–θ_m: phrase counts → probabilities from the monolingual text
–θ_b: phrase counts → probabilities from the bilingual text

14 Averaged Probability Ratio Score
For a monolingual sentence S:
–Consider the bag of its phrases
–Score: normalized probability ratio P(S|θ_m)/P(S|θ_b)
–We will refer to it as Geom-Phrase
Dividing the phrase probabilities captures our intuition about the utility of the translation units
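The score above can be sketched as follows. The smoothing constant is our own choice (a tiny floor so unseen phrases do not zero out the ratio), and the geometric mean is computed in log space for numerical stability.

```python
import math

def phrase_probs(counts):
    """Relative frequencies with a small additive floor (assumed
    smoothing, not from the paper) so unseen phrases keep a nonzero
    probability."""
    smoothing = 1e-3
    total = sum(counts.values())
    return lambda p: (counts.get(p, 0) + smoothing) / (total + smoothing)

def geom_phrase_score(phrases, p_mono, p_bi):
    """Geometric mean of P(x|theta_m)/P(x|theta_b) over the bag of
    phrases of a sentence: high when its phrases are frequent in the
    monolingual text but rare in the bilingual text."""
    log_ratio = sum(math.log(p_mono(x) / p_bi(x)) for x in phrases)
    return math.exp(log_ratio / len(phrases))
```

A sentence built from phrases unseen in the bilingual text gets a large score and is selected for human translation first.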

15 Sentence Segmentation
How to prepare the bag of phrases for a sentence S?
–For the bilingual text, we have the segmentation from the training phase of the SMT model
–For the monolingual text, we run the SMT model to produce the top-n translations and the corresponding segmentations

16 Extensions of the Score
Instead of using phrases, we may use n-grams
We may alternatively average the per-phrase probability ratios arithmetically: Score(S) = (1/|S|) Σ_{x∈S} P(x|θ_m)/P(x|θ_b)
–We will refer to it as Arithmetic Average
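A minimal sketch of the arithmetic variant, assuming `p_mono` and `p_bi` are phrase-probability functions estimated from the monolingual and bilingual text:

```python
def arith_phrase_score(phrases, p_mono, p_bi):
    """Arithmetic mean of the per-phrase probability ratios; unlike a
    geometric mean, a single phrase with a very high ratio can dominate
    the sentence's score."""
    return sum(p_mono(x) / p_bi(x) for x in phrases) / len(phrases)
```

Swapping phrases for word n-grams in the bag gives the n-gram variants (e.g. the Geom 1-gram method used in the experiments).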

17 Sentence Selection Strategies (Recap)
Baseline: randomly choose sentences from the pool of monolingual sentences
Previous work: decoder’s confidence for the translations (Kato & Barnard, 2007)
Our proposed methods:
✓Similarity to the bilingual training data
✓Reverse model
✓Hierarchical Adaptive Sampling (HAS)
✓Utility of the translation units

18 Outline
General idea of active learning (AL) for statistical machine translation (SMT)
Sentence Selection Strategies
–Similarity, Decoder’s Confidence
–Hierarchical Adaptive Sampling
–Sentence merit based on the translation units
Experiments
–The simulated AL setting
–The real AL setting

19 Experimental Setup
We select 200 (or 100) sentences from the monolingual sentence set for 25 (or 5) iterations
We use Portage from NRC as the underlying SMT system (Ueffing et al., 2007)
Dataset sizes:
Language pair      Bilingual text   Monolingual text   Test
Bangla-English     11K              20K                1K
Fr/Gr/Sp-English   5K               20K                2K

20 The Simulated AL Setting
[Plot: learning curves, higher is better; Geometric Phrase outperforms Decoder’s Confidence and Random]

21 The Real AL Setting
Our human translator is different from the text author
–The methods are good at adapting to the new writing style
[Plot: learning curves for Geometric Phrase vs. Random]

22 Domain Adaptation
Now suppose both the test and monolingual text are out-of-domain with respect to the bilingual text
–The ‘Decoder’s Confidence’ does a good job
–The ‘Geom 1-gram’ outperforms other methods since it quickly expands the lexicon set in an effective manner
[Plot: learning curves for Geom 1-gram, Decoder’s Confidence, and Random]

23 Analysis
The coverage of the bilingual text is important but is not the only factor
–Notice the Geom 1-gram and Geom-Phrase methods
[Plot: coverage of the test set by the selected sentences]

24 Analysis

25 Conclusions
We presented different sentence selection methods for SMT in an AL setting
Using knowledge about the internal architecture of the SMT system is crucial
Yet, we are after better sentence selection strategies
–See our upcoming paper in ACL09

26 Merci Thank You

27 Domain Adaptation
Selecting sentences based on:
–The ‘Confidence’ does a good job
–The ‘1-gram’ outperforms other methods since it quickly expands the lexicon set in an effective manner
[Table: BLEU%, PER%, WER% for Geom 1-gram, Confidence, and Random]

28 The Simulated AL Setting
Using measures other than BLEU:
–wer: word error rate
–per: position-independent word error rate
[Table: BLEU%, PER%, WER% for Geometric Average vs. Random (Baseline) on French-English, German-English, and Spanish-English]