Why Generative Phrase Models Underperform Surface Heuristics
John DeNero, Dan Gillick, James Zhang, and Dan Klein
UC Berkeley Natural Language Processing
Overview: Learning Phrases
Heuristic pipeline: sentence-aligned corpus → directional word alignments → intersected and grown word alignments → phrase table (translation model)
Example phrase table entries:
  cat ||| chat ||| 0.9
  the cat ||| le chat ||| 0.8
  dog ||| chien ||| 0.8
  house ||| maison ||| 0.6
  my house ||| ma maison ||| 0.9
  language ||| langue ||| 0.9
  …
Overview: Learning Phrases
Generative pipeline: sentence-aligned corpus → phrase-level generative model → phrase table (translation model)
  - An early, successful phrase-based SMT system took this approach [Marcu & Wong '02]
  - Challenging to train
  - Underperforms the heuristic approach
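Either pipeline ends in the same artifact: a weighted mapping from source phrases to target phrases, stored in the "source ||| target ||| probability" text format shown above. A minimal loader sketch in Python (function and variable names are ours, not from the talk):

  from collections import defaultdict

  def load_phrase_table(lines):
      """Parse 'source ||| target ||| probability' lines into a nested
      dict mapping each source phrase to its translation distribution."""
      table = defaultdict(dict)
      for line in lines:
          src, tgt, prob = (field.strip() for field in line.split("|||"))
          table[src][tgt] = float(prob)
      return table

  entries = [
      "cat ||| chat ||| 0.9",
      "the cat ||| le chat ||| 0.8",
      "dog ||| chien ||| 0.8",
  ]
  table = load_phrase_table(entries)
  print(table["cat"])  # {'chat': 0.9}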
Outline
I)   Generative phrase-based alignment
     - Motivation
     - Model structure and training
     - Performance results
II)  Error analysis
     - Properties of the learned phrase table
     - Contributions to increased error rate
III) Proposed Improvements
Motivation for Learning Phrases
Translate!
  Input sentence:  J'ai un chat.  (French: "I have a cat.")
  Output sentence: I have a spade.
Motivation for Learning Phrases
[Figure: word-aligned pair "appelle un chat un chat" ↔ "call a spade a spade", with the phrase pairs appelle ↔ call and un chat ↔ a spade highlighted]
Motivation for Learning Phrases
Extractable phrase pairs from "appelle un chat un chat" ↔ "call a spade a spade":
  appelle ↔ call
  appelle un ↔ call a
  appelle un chat ↔ call a spade
  un ↔ a (×2)
  un chat ↔ a spade (×2)
  un chat un ↔ a spade a
  chat ↔ spade (×2)
  chat un ↔ spade a
  chat un chat ↔ spade a spade
  …
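Because this example pair aligns word for word, every contiguous span and its co-indexed counterpart is an extractable phrase pair. A toy sketch that reproduces the list above, assuming a one-to-one monotone alignment (the function name and length cap are ours):

  from collections import Counter

  def cooccurring_phrases(src, tgt, max_len=3):
      """Enumerate co-indexed contiguous spans of a one-to-one,
      monotonically aligned sentence pair, counting repeats."""
      assert len(src) == len(tgt)
      pairs = Counter()
      for i in range(len(src)):
          for j in range(i + 1, min(i + max_len, len(src)) + 1):
              pairs[(" ".join(src[i:j]), " ".join(tgt[i:j]))] += 1
      return pairs

  french = "appelle un chat un chat".split()
  english = "call a spade a spade".split()
  for (f, e), count in cooccurring_phrases(french, english).items():
      print(f"{f} ||| {e}" + (f"  (x{count})" if count > 1 else ""))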
A Phrase Alignment Model Compatible with Pharaoh
[Example sentence pair: "les chats aiment le poisson frais." ↔ "cats like fresh fish."]
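In symbols, a conditional model of this family (a sketch in our own notation, not verbatim from the talk) scores a French sentence f given an English sentence e by summing over latent segmentations and phrase alignments:

  P(f \mid e) = \sum_{\bar{e}_1^K,\, a} \; \prod_{k=1}^{K} \phi(\bar{f}_{a(k)} \mid \bar{e}_k)\, d(a)

where \phi is the phrase translation distribution learned by EM and d(a) is a distortion term over the reordering a. Pharaoh compatibility means the multinomial \phi can be exported directly as the decoder's phrase table.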
Training Regimen That Respects Word Alignment
[Figure: two candidate phrase segmentations of "les chats aiment le poisson frais." ↔ "cats like fresh fish."; segmentations whose phrase pairs violate the word alignment are rejected (✗)]
Only 46% of training sentences contributed to training.
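The restriction is the standard phrase-extraction consistency condition: a phrase pair is licensed only if no word inside it is aligned to a word outside it. A Python sketch of that check (the alignment and span encodings are ours):

  def consistent(alignment, f_span, e_span):
      """A (French span, English span) pair is licensed by a word
      alignment iff no alignment link crosses the pair's boundary."""
      f_lo, f_hi = f_span  # inclusive word-index ranges
      e_lo, e_hi = e_span
      inside = False
      for f_i, e_i in alignment:
          f_in = f_lo <= f_i <= f_hi
          e_in = e_lo <= e_i <= e_hi
          if f_in != e_in:      # a link leaves the box: inconsistent
              return False
          inside = inside or f_in
      return inside             # require at least one link inside

  # Plausible links for "les chats aiment le poisson frais ." / "cats like fresh fish .":
  # les-cats, chats-cats, aiment-like, poisson-fish, frais-fresh
  alignment = [(0, 0), (1, 0), (2, 1), (4, 3), (5, 2)]
  print(consistent(alignment, (0, 1), (0, 0)))  # True:  "les chats" / "cats"
  print(consistent(alignment, (1, 2), (1, 1)))  # False: "chats" aligns outside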
Performance Results
[Chart: BLEU with heuristically generated parameters]
Performance Results
  - Lost training data is not the whole story
  - Parameters learned from 4× as much training data still underperform the heuristic
Outline
I)   Generative phrase-based alignment
     - Model structure and training
     - Performance results
II)  Error analysis
     - Properties of the learned phrase table
     - Contributions to increased error rate
III) Proposed Improvements
Example: Maximizing Likelihood with Competing Segmentations
Training corpus:
  French: carte sur la table  ↔  English: map on the table
  French: carte sur la table  ↔  English: notice on the chart
[Figure: candidate phrase pairs for "carte sur la table" (carte, carte sur, carte sur la, sur, sur la, sur la table, la, la table, table) paired with their English counterparts (map/notice, on, the, table/chart, ...)]
Likelihood of each sentence pair under the ambiguous (word-level) solution: 0.25
Example: Maximizing Likelihood with Competing Segmentations
[Figure: a determinized solution in which some phrase pairs, e.g. carte sur ↔ notice on, receive probability 1.0]
Likelihood of the "notice on the chart" pair: 1.0 × 2/7 ≈ 0.28 > 0.25
Likelihood of the "map on the table" pair: 1.0 × 2/7 ≈ 0.28 > 0.25
By determinizing some phrase pairs and routing each sentence through a different segmentation, EM raises the corpus likelihood above the ambiguous solution's.
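Such likelihoods come from summing over segmentations. The toy function below computes a sentence pair's likelihood by summing, over all monotone phrase segmentations, the product of phrase translation probabilities (a simplification: the model in the talk also permits reordering; phi and all names are ours):

  from functools import lru_cache

  def pair_likelihood(f_words, e_words, phi, max_len=4):
      """Sum, over all monotone segmentations into phrase pairs, of the
      product of translation probabilities phi[(f_phrase, e_phrase)]."""
      F, E = len(f_words), len(e_words)

      @lru_cache(maxsize=None)
      def total(i, j):  # probability of covering f_words[i:], e_words[j:]
          if i == F and j == E:
              return 1.0
          prob = 0.0
          for i2 in range(i + 1, min(i + max_len, F) + 1):
              for j2 in range(j + 1, min(j + max_len, E) + 1):
                  pair = (" ".join(f_words[i:i2]), " ".join(e_words[j:j2]))
                  if pair in phi:
                      prob += phi[pair] * total(i2, j2)
          return prob

      return total(0, 0)

  # Ambiguous solution: "carte" and "table" each split their mass two ways.
  phi = {("carte", "map"): 0.5, ("carte", "notice"): 0.5,
         ("sur", "on"): 1.0, ("la", "the"): 1.0,
         ("table", "table"): 0.5, ("table", "chart"): 0.5}
  print(pair_likelihood("carte sur la table".split(),
                        "map on the table".split(), phi))  # 0.25

Under a determinized table, the same computation concentrates mass on a single segmentation per sentence, which is exactly how EM pushes the likelihood above 0.25.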
EM Training Significantly Decreases Entropy of the Phrase Table
[Chart: distribution of French phrase entropies after EM training]
10% of French phrases end up with deterministic (zero-entropy) translation distributions.
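Entropy here is measured per French phrase, over its conditional distribution of English translations; zero means the phrase has been determinized. A sketch of the measurement:

  import math

  def phrase_entropy(translations):
      """Shannon entropy (bits) of one French phrase's distribution over
      English translations; 0.0 means a deterministic phrase."""
      return -sum(p * math.log2(p) for p in translations.values() if p > 0)

  print(phrase_entropy({"degree": 1.0}))              # 0.0 (deterministic)
  print(phrase_entropy({"map": 0.5, "notice": 0.5}))  # 1.0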
Effect 1: Useful Phrase Pairs Are Lost Due to Critically Small Probabilities
In 10k translated sentences, no phrases with weight below a critically small threshold were used by the decoder.
Effect 2: Determinized Phrases Override Better Candidates During Decoding
  Heuristic: the situation varies to an enormous degree → the situation varie d ' une immense degré
  Learned:   the situation varies to an enormous degree → the situation varie d ' une immense caractérise

  French f      English e                                  Learned   Heuristic
  degré         amount / extent / level / degree           ~0        0.02
  caractérise   degree                                     0.998     ~0
  caractérise   features / characterized / characterizes   ~0        0.05

The learned table determinizes caractérise → degree, so the decoder substitutes "caractérise" where "degré" would be the better candidate.
Effect 3: Ambiguous Foreign Phrases Become Active During Decoding
Deterministic phrases can be used by the decoder with no cost.
[Table: learned translations for the French apostrophe (')]
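Effects 2 and 3 both follow from the decoder's scoring: each phrase pair contributes −log p to a hypothesis's cost, so a determinized pair (p ≈ 1.0) is essentially free. A toy comparison (the probabilities are illustrative, in the spirit of the table above):

  import math

  def tm_cost(phrase_probs):
      """Translation-model cost of a hypothesis: sum of -log p over its phrase pairs."""
      return sum(-math.log(p) for p in phrase_probs)

  # A determinized pair (e.g. caractérise -> degree at 0.998) adds almost
  # nothing to the cost, while a competing pair with realistic ambiguity
  # (e.g. probability 0.05) is heavily penalized.
  print(tm_cost([0.998]))  # ~0.002: effectively free
  print(tm_cost([0.05]))   # ~3.0:   the ambiguous alternative costs far more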
Outline
I)   Generative phrase-based alignment
     - Model structure and training
     - Performance results
II)  Error analysis
     - Properties of the learned phrase table
     - Contributions to increased error rate
III) Proposed Improvements
Motivation for Reintroducing Entropy to the Phrase Table
1. Useful phrase pairs are lost due to critically small probabilities.
2. Determinized phrases override better candidates.
3. Ambiguous foreign phrases become active during decoding.
Reintroducing Lost Phrases
Interpolating the learned phrase table with the heuristic one yields up to a 1.0 BLEU improvement.
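One concrete form of table interpolation (the weight lam and the example entries below are hypothetical):

  def interpolate_tables(learned, heuristic, lam=0.5):
      """Linearly interpolate two phrase tables:
      p(e|f) = lam * p_learned(e|f) + (1 - lam) * p_heuristic(e|f).
      Pairs present in only one table survive with down-weighted mass,
      which reintroduces phrases the learned table had driven to ~0."""
      merged = {}
      for f in set(learned) | set(heuristic):
          l_dist = learned.get(f, {})
          h_dist = heuristic.get(f, {})
          merged[f] = {e: lam * l_dist.get(e, 0.0) + (1 - lam) * h_dist.get(e, 0.0)
                       for e in set(l_dist) | set(h_dist)}
      return merged

  learned   = {"caractérise": {"degree": 0.998, "characterizes": 0.002}}
  heuristic = {"caractérise": {"characterizes": 0.6, "features": 0.4}}
  print(interpolate_tables(learned, heuristic, lam=0.5)["caractérise"])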
Smoothing Phrase Probabilities
Smoothing reserves probability mass for unseen translations, based on the length of the French phrase.
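One simple realization: discount each observed translation and hold back the freed mass for unseen translations, with the held-back fraction tied to the French phrase's length (a sketch; the length schedule here is illustrative, not necessarily the talk's exact scheme):

  def smooth_distribution(translations, french_phrase, base_reserve=0.1):
      """Reserve probability mass for unseen translations of a French phrase.
      Shorter French phrases tend to be more ambiguous, so they reserve
      more mass; the schedule below is an assumption for illustration."""
      n_words = len(french_phrase.split())
      reserve = base_reserve / n_words          # e.g. 0.1 for 1 word, 0.05 for 2
      smoothed = {e: p * (1.0 - reserve) for e, p in translations.items()}
      smoothed["<unseen>"] = reserve            # mass held back for unseen translations
      return smoothed

  print(smooth_distribution({"degree": 1.0}, "degré"))
  # {'degree': 0.9, '<unseen>': 0.1}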
Conclusion
  - Generative phrase models determinize the phrase table via the latent segmentation variable.
  - A determinized phrase table introduces errors at decoding time.
  - Modest improvement can be realized by reintroducing entropy to the phrase table.
Questions?