Encouraging Consistent Translation Choices
Ferhan Ture, Douglas W. Oard, Philip Resnik
University of Maryland
NAACL-HLT’12, June 5, 2012

Introduction
- MT systems typically operate at the sentence level
- Useful information is available at higher levels
- Goal: “one translation per discourse” in MT (Carpuat’09)
  - similar to “one sense per discourse” in WSD

Related Work
- Limited focus on super-sentential context in MT
  - Post-process translation output to impose the “one translation per discourse” heuristic (Carpuat’09)
  - Replace each ambiguous translation within a document by the most frequent one (Xiao et al.’11)
  - Use a translation memory to find similar source sentences (Ma et al.’11)
- Domain adaptation biases the TM/LM using in-domain data (Bertoldi & Federico’09; Hildebrand et al.’05; Sanchis-Trilles & Casacuberta’10; Tiedemann’10; Zhao et al.’04)
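The post-processing idea attributed to Xiao et al.’11 (replace each ambiguous translation in a document by its most frequent one) can be sketched as follows. This is a minimal illustration under our own assumptions, not the cited authors’ code: a document is a list of sentences, each a list of (source phrase, translation) choices; ties go to the translation seen first; `enforce_one_translation` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def enforce_one_translation(doc):
    """Hypothetical helper: doc is a list of sentences, each a list of
    (source_phrase, translation) pairs chosen by the decoder."""
    counts = defaultdict(Counter)
    for sentence in doc:
        for src, tgt in sentence:
            counts[src][tgt] += 1
    # Most frequent translation per source phrase (ties: first one seen).
    best = {src: c.most_common(1)[0][0] for src, c in counts.items()}
    # Rewrite every occurrence with that translation.
    return [[(src, best[src]) for src, _ in sentence] for sentence in doc]

# Toy document: source phrase "X" is translated two different ways.
doc = [
    [("X", "troop")],
    [("X", "guard"), ("Y", "violence")],
    [("X", "troop")],
]
print(enforce_one_translation(doc))
```

Such a rewrite enforces consistency outright, whereas the approach in this talk only encourages it through model features.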

Exploratory Analysis
- Goal: does bitext exhibit “one translation per discourse”?
- Forced decoding: find the most probable derivation (using an SCFG) that produces a given source-target sentence pair
- Experiments on the Ar-En MT08 dataset
  - assume discourse = document
  - 74 documents / 813 sentences
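Forced decoding can be illustrated with a simplified stand-in. The talk uses an SCFG; the sketch below instead assumes a monotone phrase-based model (no reordering, no empty phrases), a toy phrase table with made-up probabilities, and a hypothetical function `forced_decode`. It searches for the highest-scoring derivation that produces exactly the given target sentence, which is the core idea of forced decoding.

```python
import math
from functools import lru_cache

def forced_decode(src, tgt, phrase_table, max_len=3):
    """Best monotone derivation of tgt from src (hypothetical helper).

    phrase_table maps (source tokens, target tokens) tuples to a probability.
    Returns (log_prob, derivation) or None if tgt cannot be produced.
    """
    src, tgt = tuple(src), tuple(tgt)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best way to cover src[i:] and tgt[j:] exactly.
        if i == len(src) and j == len(tgt):
            return 0.0, ()
        result = None
        for i2 in range(i + 1, min(i + max_len, len(src)) + 1):
            for j2 in range(j + 1, min(j + max_len, len(tgt)) + 1):
                p = phrase_table.get((src[i:i2], tgt[j:j2]))
                if p is None:
                    continue
                rest = best(i2, j2)
                if rest is None:
                    continue
                score = math.log(p) + rest[0]
                if result is None or score > result[0]:
                    result = (score, ((src[i:i2], tgt[j:j2]),) + rest[1])
        return result

    return best(0, 0)

# Toy phrase table over placeholder tokens.
table = {
    (("a",), ("x",)): 0.5,
    (("b",), ("y",)): 0.5,
    (("a", "b"), ("x", "y")): 0.4,
}
print(forced_decode(["a", "b"], ["x", "y"], table))
```

The rules (here, phrase pairs) in the returned derivation are what the exploratory analysis keeps track of per document.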

Exploratory Analysis: Method

Exploratory Analysis: Counting cases
(Figure: grammar rules used in forced decoding, pairing Arabic source sides such as قتلوا, مقتل, قتل and بهجوم with English target sides such as “[X1] killed”, “to kill”, “killing of”, “in an attack”, “assault” and “offensive”.)

Source phrase | Doc # | Translations used
مقتول | 566 | killed = 2, killing of = 1
الرهائن | 782 | hostages = 2
الرهائن | 138 | hostage = 1, hostages = 2
من | 30 | from = 2
التي | 30 | the = 1, which = 1

Case labels on the slide: NO, YES, NO

Exploratory Analysis: Results
- 176 cases, occurring in 512 sentences (63% of the test set)
  - consistent translation in 128/176 cases (73%)
  - analysis of the remaining 48 cases: 29 content-bearing words, 19 other words

Exploratory Analysis: Conclusions
- The data supports “one translation per discourse”, so there is potential for improvement
- Inconsistent translations may reflect stylistic choices, so fixing such cases will not degrade accuracy
- Encourage consistency; do not enforce it
  - sentence-structure conventions may require the same phrase to be translated differently

Approach
Inspired by Information Retrieval (IR): count words in a document
- Example document: “… house … caterpillar … House … cat … houses … Dog … dogs”
- Per-word statistics (word | TF | DF): house | 3 | 116/10^6; cat, caterpillar, dog | … | …/10^6
Analogously, count translations in a document pair
- Source document “… X … Y … X … X … Y … Z …”, paired with the target document above
- Per-pair statistics (pair | TF | DF): (X, house) | 3 | 116/10^6; (X, cat), (Y, caterpillar), (Z, dog), (Y, dog) | … | …/10^6
Weight the counts with the Okapi BM25 term weight
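A minimal sketch of the IR-style counting and the Okapi BM25 term weight, assuming the common textbook form with k1 = 1.2 and b = 0.75 (the talk does not give its parameters or exact variant). The `normalize` function is a hypothetical crude normalizer (lowercase, strip a plural “s”) chosen only so that “house”, “House” and “houses” count as one term, matching TF = 3 on the slide; DF = 116 out of N = 10^6 documents is read from the slide’s 116/10^6.

```python
import math
from collections import Counter

def normalize(token):
    # Hypothetical normalization: lowercase and strip a plural "s",
    # so "house", "House" and "houses" count as the same term.
    token = token.lower()
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 weight of a term with frequency tf in one document."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    tf_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_part

doc = "house caterpillar House cat houses Dog dogs".split()
tf = Counter(normalize(t) for t in doc)
print(tf["house"])  # 3
print(bm25_weight(tf["house"], df=116, n_docs=10**6,
                  doc_len=len(doc), avg_doc_len=len(doc)))
```

The same function applies unchanged when the counted “terms” are translation pairs such as (X, house) instead of words.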

Approach
- Goal: bias the translation model toward consistency, given document-level translation information
- Three MT consistency features, C1, C2, and C3, each implementing a variant of this idea
- A two-pass decoding approach:
  - first pass: translate without any consistency feature
  - second pass: compute a feature score for each rule, based on per-document counts from the first pass, and add it to the model
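The two-pass scheme can be illustrated with a toy stand-in for the decoder. Everything here is a hypothetical simplification: each sentence is a list of independent slots, each slot a list of (rule, base score) candidates; the “decoder” just picks the best candidate per slot; and the second-pass feature is the raw first-pass rule count with a made-up weight (C1, C2 and C3 use other scores). “f” is a placeholder source side.

```python
from collections import Counter

def decode(document, score):
    # Toy decoder: in every sentence, pick the best-scoring candidate per slot.
    return [[max(slot, key=score) for slot in sentence] for sentence in document]

def two_pass_decode(document, weight=0.1):
    # First pass: base model scores only, no consistency feature.
    first = decode(document, score=lambda cand: cand[1])
    counts = Counter(rule for sentence in first for rule, _ in sentence)
    # Second pass: add a document-level feature computed from first-pass counts.
    return decode(document, score=lambda cand: cand[1] + weight * counts[cand[0]])

# One document, three sentences, one ambiguous slot each.
document = [
    [[("f ||| britain", 0.60), ("f ||| uk", 0.50)]],
    [[("f ||| britain", 0.70), ("f ||| uk", 0.40)]],
    [[("f ||| britain", 0.45), ("f ||| uk", 0.50)]],
]
for sentence in two_pass_decode(document):
    print([rule for rule, _ in sentence])
```

In the first pass the third sentence picks “uk”; in the second pass the document-level count tips it to “britain”, while a strong enough base preference would still win, so consistency is encouraged rather than enforced.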

C1: Counting rules
Example rules for بريطانيا (“Britain”), by target side: R1: britain, [X,1]; R2: britain [X,1]; R3: uk [X,1]; R4: britain; R5: the uk
- count occurrences of the string “LHS ||| RHS” for each rule used in the first pass
- award more frequent rules
(Figure: first-pass counts for the rules used in the first pass.)
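A minimal sketch of C1 under stated assumptions: rules are keyed by their “LHS ||| RHS” string as on the slide, and the score is assumed to be the BM25-style saturating term frequency tf(k1 + 1)/(tf + k1) with k1 = 1.2 (the slide only says “award more frequent rules”). “F” is a hypothetical stand-in for the Arabic source side, and the first-pass usage is made up.

```python
from collections import Counter

def rule_key(lhs, rhs):
    # A rule is identified by the string "LHS ||| RHS".
    return f"{lhs} ||| {rhs}"

def c1_counts(first_pass_rules):
    # Count how often each rule string was used in the document's first pass.
    return Counter(rule_key(lhs, rhs) for lhs, rhs in first_pass_rules)

def c1_score(lhs, rhs, counts, k1=1.2):
    # Assumed scoring: saturating term frequency (0 for unused rules).
    tf = counts[rule_key(lhs, rhs)]
    return tf * (k1 + 1) / (tf + k1)

# "F" is a hypothetical stand-in for the Arabic source side.
first_pass = [("F [X,1]", "britain [X,1]"), ("F [X,1]", "britain [X,1]"), ("F", "the uk")]
counts = c1_counts(first_pass)
for lhs, rhs in [("F [X,1]", "britain [X,1]"), ("F [X,1]", "uk [X,1]"), ("F", "the uk")]:
    print(rhs, round(c1_score(lhs, rhs, counts), 3))
```

Because “uk [X,1]” and “the uk” are different strings, R3 gets no credit from uses of R5; counting target tokens (C2) addresses this.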

C2: Counting target tokens
- count each target token e of each rule used in the first pass
- award tokens that are frequent in the document and rare in the collection
- e.g., R3 (… ||| uk [X,1]) and R5 (… ||| the uk) both contribute to the count of “uk”

C2: Counting target tokens (continued)
- R6: [X,1] الاخيرة علي [X,2] ||| [X,1] on a life support [X,2]
- R7: يؤيد ||| support
- both rules contain the target token “support”, so they share a count even though their source phrases differ
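A minimal sketch of C2 under stated assumptions: only terminal target tokens are counted (nonterminals like [X,1] are skipped); each token gets a BM25-style weight (frequent in the document, rare in the collection, with k1 = 1.2); and a rule’s score is assumed to be the maximum over its tokens, an aggregation we chose for illustration. The document-frequency table is made up.

```python
import math
import re
from collections import Counter

NONTERMINAL = re.compile(r"\[X,\d+\]")

def target_tokens(rhs):
    # Terminal tokens of a rule's target side; nonterminals like [X,1] are skipped.
    return [t for t in rhs.split() if not NONTERMINAL.fullmatch(t)]

def c2_counts(first_pass_targets):
    # Count every target token of every rule used in the document's first pass.
    return Counter(t for rhs in first_pass_targets for t in target_tokens(rhs))

def c2_score(rhs, counts, df, n_docs, k1=1.2):
    # Assumed scoring: BM25-style weight per token, max over the rule's tokens.
    best = 0.0
    for t in target_tokens(rhs):
        tf, d = counts[t], df.get(t, 0)
        idf = math.log((n_docs - d + 0.5) / (d + 0.5))
        best = max(best, idf * tf * (k1 + 1) / (tf + k1))
    return best

df = {"the": 900_000, "uk": 5_000, "britain": 4_000, "support": 20_000}  # made-up
counts = c2_counts(["uk [X,1]", "uk [X,1]"])        # R3 used twice in the first pass
print(c2_score("the uk", counts, df, 10**6) > 0)    # R5 shares the token "uk"
print(c2_score("britain", counts, df, 10**6))
```

The same mechanism also gives R7 credit whenever R6 was used, since both produce “support”; counting aligned token pairs (C3) separates such cases.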

C3: Counting token pairs
- count occurrences of each pair of tokens aligned to each other in a rule used in the first pass
- award more frequent pairs and rare target sides
- R6: [X,1] الاخيرة علي [X,2] ||| [X,1] on a life support [X,2]
- R7: يؤيد ||| support
(Figure: word alignments within R6 and R7 over the source tokens الاخيرة, علي and يؤيد.)
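A minimal sketch of C3 under stated assumptions: each rule carries its terminal source tokens, terminal target tokens, and word alignment as (i, j) index pairs; a pair’s score is its saturating in-document frequency times the IDF of the target token (“rare target sides”), maximized over the rule’s aligned pairs. The source token “F6” and R6’s alignment are hypothetical placeholders, and the document frequencies are made up.

```python
import math
from collections import Counter

def aligned_pairs(rule):
    # rule = (source terminals, target terminals, alignment as (i, j) index pairs)
    src, tgt, alignment = rule
    return [(src[i], tgt[j]) for i, j in alignment]

def c3_counts(first_pass_rules):
    # Count each aligned (source token, target token) pair in the first-pass rules.
    return Counter(p for rule in first_pass_rules for p in aligned_pairs(rule))

def c3_score(rule, counts, df, n_docs, k1=1.2):
    # Assumed scoring: saturating pair frequency times IDF of the target token,
    # maximized over the rule's aligned pairs.
    best = 0.0
    for f, e in aligned_pairs(rule):
        tf, d = counts[(f, e)], df.get(e, 0)
        idf = math.log((n_docs - d + 0.5) / (d + 0.5))
        best = max(best, idf * tf * (k1 + 1) / (tf + k1))
    return best

df = {"support": 20_000}  # made-up collection statistics
r6 = (["F6"], ["on", "a", "life", "support"], [(0, 3)])  # hypothetical token and alignment
r7 = (["يؤيد"], ["support"], [(0, 0)])
counts = c3_counts([r6])  # only R6 was used in the first pass
print(c3_score(r6, counts, df, 10**6) > 0)
print(c3_score(r7, counts, df, 10**6))
```

Unlike C2, R7 receives no credit here, because its pair (يؤيد, support) was never used in the first pass.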

Evaluation: Setup
- Experiments using cdec with a Hiero-style SCFG
- GIZA++ for word alignments, MIRA for tuning feature weights, SRILM for a 5-gram English LM

Setting | Arabic-English | Chinese-English
Preprocess | simple punctuation + ATBv3 segmentation (lattice of the two) | Stanford segmenter
Train | 3.4M sentences from GALE, NIST | 1.6M sentences from NIST
Tune | MT… docs, 1797 sentences | MT… docs, 878 sentences
Test | MT08: 74 docs, 813 sentences | MT06: 79 docs, 1664 sentences
Baseline BLEU (4 references) | …st in MT… | …th in MT06

Evaluation: BLEU score improvement
(Figure: BLEU score improvement)

Evaluation: Case-by-case changes
- Sample of 60 of 197 cases: 26 with higher BLEU, 14 with lower BLEU
- C2 is the most aggressive (16+, 9-)
- C1 is the most conservative in number of changes (8+, 5-)
- C3 strikes a good balance (16+, 4-)
(Table: number of cases and % of the test set for each method, C1, C2, C3, and Any = C1 or C2 or C3, on Arabic-English and Chinese-English.)

Evaluation: Examples
- Source phrase: organizational/regulatory
  - Context: organizational groups supporting terrorism
  - Base: 1 “organizational”, 1 “regulatory”; C1, C2: 2 “organizational”
  - Refs: “organized” and “organizational”
  - Outcome: +
- Source phrase: border/frontier troops/guards
  - Context: violence along the India-Nepal border
  - Base: 1 “frontier guard”, 1 “border troop”; C1, C2, C3: “border” → “frontier”
  - Refs: all use the word “border”
  - Outcome: -
- Source phrase: sneak/infiltrate/enter without permission
  - Context: Turkey trying to enter the European Union
  - Base: 1 “sneak”, 1 “infiltrate”; C2, C3: 2 “infiltrate”
  - Refs: each consistent: “worm its way”, “sneak”, “sneak into”, “enter”
  - Outcome: - ?

Conclusions
- A novel technique to test “one translation per discourse”
- Three consistency features in the translation model bring solid and consistent improvements in MT
- Future ideas:
  - try alternatives to BM25, max-token, BLEU…
  - choosing the right discourse: document or collection?
  - learning other patterns from forced decoding

Thank you!

Exploratory Analysis: Forced decoding example
(Figure: an Arabic source sentence and its English target sentence.)

Exploratory Analysis: Method
1. Keep track of all grammar rules used in forced decoding (i.e., R)
2. Count unique (f, d) pairs such that source phrase f appears in multiple rules in R_d
3. Group together rules with minor differences, e.g., “الرهائن, ||| hostages” and “الرهائن ||| hostage”
4. Remove cases where the source phrase has no alternative translation
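The four steps can be sketched as follows. This is a minimal illustration, not the authors’ code: the function names, the “minor difference” rule in `group_key` (lowercase, drop punctuation, strip a plural “s”), and the `alternatives` table standing in for the grammar are all hypothetical. A case counts as consistent when all of its uses collapse to a single grouped translation; the example input echoes two rows of the counting-cases table.

```python
import re
from collections import Counter, defaultdict

def group_key(text):
    # Hypothetical "minor difference" grouping: lowercase, drop punctuation,
    # and strip a plural "s" from each token.
    tokens = re.sub(r"[^\w\s]", "", text.lower()).split()
    return " ".join(t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens)

def count_cases(used_rules, alternatives):
    """used_rules: {doc_id: [(source_phrase, target_phrase), ...]} from forced decoding.
    alternatives: {grouped source phrase: number of distinct translations in the grammar}.
    Returns (number of cases, number of consistent cases)."""
    cases = consistent = 0
    for rules in used_rules.values():
        translations = defaultdict(Counter)
        for f, e in rules:                      # steps 1 and 3
            translations[group_key(f)][group_key(e)] += 1
        for f, counts in translations.items():
            if sum(counts.values()) < 2:        # step 2: f used in multiple rules
                continue
            if alternatives.get(f, 0) < 2:      # step 4: f has no alternative
                continue
            cases += 1
            consistent += len(counts) == 1
    return cases, consistent

used_rules = {
    "566": [("مقتول", "killed"), ("مقتول", "killed"), ("مقتول", "killing of")],
    "138": [("الرهائن", "hostage"), ("الرهائن,", "hostages"), ("الرهائن", "hostages")],
}
alternatives = {"مقتول": 3, "الرهائن": 2}  # made-up grammar statistics
print(count_cases(used_rules, alternatives))
```

Dividing the second number by the first gives the consistency rate reported on the results slide (128/176 on MT08).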