Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem Mikhail Zaslavskiy Marc Dymetman Nicola Cancedda ACL 2009.

Presentation transcript:

Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem Mikhail Zaslavskiy Marc Dymetman Nicola Cancedda ACL 2009

Introduction
Word-based and phrase-based machine translation (MT)
  – Statistical machine translation (SMT)
Successful in practice
  – Open-source Moses, Google Translate, etc.
Example: cette traduction automatique est curieuse (this automatic translation is curious)
[Figure: biphrase table]

Decoding Complexity
Decoding: perform MT given the models
  – Translation, language, distortion, etc.
Word-based SMT decoding is NP-hard
  – Any problem in NP can be reduced to the Traveling Salesman Problem (TSP)
  – Any TSP instance can be reduced to word-based SMT decoding
It is also in NP, so it is NP-complete
  – Kevin Knight. Decoding Complexity in Word-Replacement Translation Models. Computational Linguistics.

Goal
TSP is NP-complete
Word-based SMT is in NP
So SMT can, in theory, be reduced to TSP
Goal
  – Reduce SMT to TSP
  – Directly apply existing TSP solvers to SMT

Traveling Salesman Problem
STSP (Symmetric TSP)
  – The most standard and most studied variant
  – Undirected graph G on N nodes, where the edges carry real-valued costs
  – Goal: find a Hamiltonian circuit of minimal cost
ATSP (Asymmetric TSP)
  – Graph G is directed
  – Edges (i,j) and (j,i) may carry different costs
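To make the two definitions concrete, here is a minimal brute-force sketch. It is only an illustration for tiny graphs, not how a real solver works; the function name and both cost matrices are assumptions made up for this example. The same routine handles STSP and ATSP, since a symmetric instance is simply one where cost[i][j] == cost[j][i].

```python
from itertools import permutations

def solve_tsp_brute_force(cost):
    """Return (best_cost, tour) for a full cost matrix, cost[i][j] = cost of edge i -> j.

    The tour is a tuple of node indices starting at node 0; the closing edge
    back to node 0 is included in the cost.
    """
    n = len(cost)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        total = sum(cost[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if total < best_cost:
            best_cost, best_tour = total, tour
    return best_cost, best_tour

# Symmetric toy instance (STSP): cost[i][j] == cost[j][i].
SYM = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]

# Asymmetric toy instance (ATSP): going around one way is cheap, the other way expensive.
ASYM = [
    [0, 1, 9, 9],
    [9, 0, 1, 9],
    [9, 9, 0, 1],
    [1, 9, 9, 0],
]

sym_cost, sym_tour = solve_tsp_brute_force(SYM)
asym_cost, asym_tour = solve_tsp_brute_force(ASYM)
```

Enumerating all (N-1)! tours is only feasible for a handful of nodes, which is why dedicated solvers matter for the sentence-sized graphs discussed later.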

Traveling Salesman Problem (2)
SGTSP (Symmetric Generalized TSP)
  – Undirected graph G of |G| nodes
  – Given a partition of these |G| nodes into m non-empty, disjoint clusters
  – Find a circular sequence of m nodes of minimal total cost, where each cluster is visited exactly once
[Figure: clusters C1, C2, C3, C4, Cm]
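The generalized variant can be sketched the same way: choose an order of the clusters and exactly one node from each. The following brute-force sketch is an illustration only; the function name, the cost matrix, and the cluster layout are assumptions. It accepts directed costs, so it covers the symmetric case as well.

```python
from itertools import permutations, product

def solve_gtsp_brute_force(cost, clusters):
    """Return (best_cost, tour) visiting exactly one node from every cluster.

    cost[i][j] is the cost of edge i -> j; clusters is a list of lists of node
    indices forming a partition of the nodes.
    """
    best_cost, best_tour = float("inf"), None
    first, rest = clusters[0], clusters[1:]
    # Fix the first cluster as the starting point and permute the others.
    for order in permutations(rest):
        # Pick one representative node from each cluster, in the chosen order.
        for tour in product(first, *order):
            m = len(tour)
            total = sum(cost[tour[k]][tour[(k + 1) % m]] for k in range(m))
            if total < best_cost:
                best_cost, best_tour = total, tour
    return best_cost, best_tour

# Toy instance: 4 nodes in 3 clusters; the middle cluster offers a choice.
GTSP_COST = [
    [0, 5, 1, 9],
    [9, 0, 0, 1],
    [9, 0, 0, 2],
    [1, 9, 9, 0],
]
GTSP_CLUSTERS = [[0], [1, 2], [3]]

gtsp_cost, gtsp_tour = solve_gtsp_brute_force(GTSP_COST, GTSP_CLUSTERS)
```

Here the cheapest circuit uses node 2 rather than node 1 from the middle cluster, so only m = 3 of the 4 nodes are visited.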

Traveling Salesman Problem (3)
AGTSP (Asymmetric Generalized TSP)
  – Directed version of SGTSP
  – Edges (i,j) and (j,i) may carry different costs
Reductions
  – SMT --> AGTSP: this paper
  – AGTSP --> ATSP: C. Noon and J.C. Bean. An efficient transformation of the generalized traveling salesman problem. INFOR, pages 39–44.
  – ATSP --> STSP: David L. Applegate et al. The Traveling Salesman Problem: A Computational Study (Princeton Series in Applied Mathematics). Princeton University Press, January.
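As an illustration of the last reduction, the sketch below uses one well-known ATSP-to-STSP construction: every node i is split into an "in" copy and an "out" copy joined by an edge of cost -M, and the directed edge i -> j becomes an undirected edge between out_i and in_j. Whether this is the exact variant used in the presented work is not stated in the slides, so treat it as an assumption; the function names, the value of M, and the toy matrix are likewise made up for the example.

```python
from itertools import permutations

def atsp_to_stsp(cost, big_m=1000, inf=10**6):
    """Build a symmetric 2N x 2N matrix from an asymmetric N x N one.

    Node i is the "in" copy of original node i, node n + i its "out" copy.
    big_m must be large compared to the tour costs so that every optimal
    symmetric tour uses all N in/out edges.
    """
    n = len(cost)
    sym = [[inf] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        sym[i][n + i] = sym[n + i][i] = -big_m          # in_i -- out_i
        for j in range(n):
            if i != j:
                sym[n + i][j] = sym[j][n + i] = cost[i][j]  # out_i -- in_j
    return sym

def brute_force_cost(cost):
    """Optimal tour cost by enumeration (tiny instances only)."""
    n = len(cost)
    return min(
        sum(cost[t[k]][t[(k + 1) % n]] for k in range(n))
        for t in ((0,) + p for p in permutations(range(1, n)))
    )

ASYM = [
    [0, 1, 5],
    [5, 0, 1],
    [1, 5, 0],
]
BIG_M = 1000

atsp_opt = brute_force_cost(ASYM)
stsp_opt = brute_force_cost(atsp_to_stsp(ASYM, big_m=BIG_M))
# Each of the N in/out edges contributes -M, so add N * M back.
recovered = stsp_opt + len(ASYM) * BIG_M
```

The symmetric instance has twice as many nodes, which is the price paid for being able to hand the problem to a solver that only accepts symmetric input.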

Phrase-based Decoding as AGTSP
Translating the French sentence "cette traduction automatique est curieuse" into English.
[Figure: biphrase table]

Clusters in AGTSP
Graph nodes are all the possible pairs (w, b)
  – b = biphrase, w = source word contained in b
  – Example: biphrase ht contributes the nodes (cette, ht) and (traduction, ht)
Clusters are the subsets of graph nodes that share a common source word w
Number of clusters = number of words in the source sentence
  – 5 words in this case
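The node and cluster construction can be sketched in a few lines. The biphrase table from the slide did not survive in this transcript, so the table below is a hypothetical stand-in (only the identifier ht and its two source words come from the slide text). The sketch also assumes, for simplicity, that every source word occurs only once in the sentence.

```python
# Hypothetical biphrase table: id -> (source words, target words).
# Only "ht" is named in the slides; the other entries are illustrative.
BIPHRASES = {
    "h": (("cette",), ("this",)),
    "t": (("traduction",), ("translation",)),
    "ht": (("cette", "traduction"), ("this", "translation")),
    "mt": (("traduction", "automatique"), ("machine", "translation")),
    "a": (("automatique",), ("automatic",)),
    "i": (("est",), ("is",)),
    "s": (("curieuse",), ("strange",)),
}

SENTENCE = ["cette", "traduction", "automatique", "est", "curieuse"]

def build_clusters(sentence, biphrases):
    """Map each source word w to its cluster: the list of nodes (w, b)
    for every biphrase b whose source side contains w."""
    clusters = {w: [] for w in sentence}
    for biphrase_id, (source_words, _target_words) in biphrases.items():
        for w in source_words:
            if w in clusters:
                clusters[w].append((w, biphrase_id))
    return clusters

clusters = build_clusters(SENTENCE, BIPHRASES)
```

With this toy table the cluster for "traduction" holds three nodes, one for each of the biphrases t, ht, and mt.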

Example Graph
[Figure: AGTSP graph with a start cluster and one cluster per source word: cette, traduction, automatique, est, curieuse]

Transition Cost
Transition between nodes M and N
a. M is (w1, b) and N is (w2, b), where w1 and w2 are consecutive words in b
  – Source side of b is "... w1 w2 ..."
  – Cost = 0, because both nodes belong to the same biphrase

Transition Cost
b. M is (w1, b1), where w1 is the rightmost source word in b1, and N is (w2, b2), where w2 is the leftmost source word in b2
  – Meaning: combine biphrases b1 and b2
  – Costs of b1 and b2: language model, translation model, etc.
  – Costs of combining them: language model, distortion model
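Cases (a) and (b) can be put together in one function, sketched below. Several things here are assumptions and not taken from the slides: transitions that match neither case are given infinite cost, only the static cost of b1 is charged when leaving it (so no biphrase is counted twice along a circuit), and the numeric costs and the toy_combine_cost helper are placeholders for real model scores.

```python
INF = float("inf")

def transition_cost(node_m, node_n, source_side, static_cost, combine_cost):
    """Cost of the directed edge node_m -> node_n, with nodes given as (word, biphrase_id).

    source_side maps a biphrase id to its tuple of source words (assumed to
    contain each word at most once); static_cost maps a biphrase id to its own
    cost; combine_cost(b1, b2) scores placing b2 right after b1.
    """
    (w1, b1), (w2, b2) = node_m, node_n
    src1, src2 = source_side[b1], source_side[b2]
    if b1 == b2:
        # Case (a): moving to the next source word inside the same biphrase is free.
        k = src1.index(w1)
        is_next = k + 1 < len(src1) and src1[k + 1] == w2
        return 0.0 if is_next else INF
    if w1 == src1[-1] and w2 == src2[0]:
        # Case (b): leave b1 at its rightmost word and enter b2 at its leftmost word.
        return static_cost[b1] + combine_cost(b1, b2)
    # Assumption: every other transition is forbidden.
    return INF

# Placeholder data for illustration only.
SOURCE_SIDE = {"ht": ("cette", "traduction"), "a": ("automatique",)}
STATIC_COST = {"ht": 1.5, "a": 0.5}

def toy_combine_cost(b1, b2):
    """Stand-in for language-model plus distortion cost."""
    return 0.25

inside = transition_cost(("cette", "ht"), ("traduction", "ht"),
                         SOURCE_SIDE, STATIC_COST, toy_combine_cost)
across = transition_cost(("traduction", "ht"), ("automatique", "a"),
                         SOURCE_SIDE, STATIC_COST, toy_combine_cost)
forbidden = transition_cost(("cette", "ht"), ("automatique", "a"),
                            SOURCE_SIDE, STATIC_COST, toy_combine_cost)
```

Because moving inside a biphrase is free and leaving it early is forbidden, any finite-cost circuit consumes each chosen biphrase as a whole.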

Example Circuit
[Figure: example circuit through the graph, spelling out "This machine translation is strange"]
Output: This machine translation is strange

Experiment 1
Given an English (target) word sequence in French (source) order, the goal is to turn "bad English" into "good English" using only a language model.
One node per cluster.
Example
  – Input: this translation automatic is curious (cette traduction automatique est curieuse)
  – Reordered into: this automatic translation is curious
Corpus
  – Training: sentences from the NewsCommentary corpus
  – Testing: 170 sentences, average length 17 words
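The reordering task can be sketched as a small TSP in which each word is a node, a sentence-boundary node closes the circuit, and the edge u -> v costs what a bigram language model charges for v following u. The bigram costs below are invented for the example (a real system would use negative log-probabilities from a trained model), and the brute-force search stands in for a real TSP solver.

```python
from itertools import permutations

BOUNDARY = "<s>"  # sentence-boundary node that opens and closes the circuit

# Hypothetical bigram costs; every unlisted pair gets DEFAULT_COST.
TOY_BIGRAM_COST = {
    (BOUNDARY, "this"): 1.0,
    ("this", "automatic"): 1.0,
    ("automatic", "translation"): 1.0,
    ("translation", "is"): 1.0,
    ("is", "curious"): 1.0,
    ("curious", BOUNDARY): 1.0,
}
DEFAULT_COST = 5.0

def bigram_cost(u, v):
    """Cost of word v directly following word u."""
    return TOY_BIGRAM_COST.get((u, v), DEFAULT_COST)

def reorder(words):
    """Return the word order whose circuit through the boundary node is cheapest."""
    def tour_cost(order):
        path = (BOUNDARY,) + order + (BOUNDARY,)
        return sum(bigram_cost(u, v) for u, v in zip(path, path[1:]))
    return list(min(permutations(words), key=tour_cost))

source_order = ["this", "translation", "automatic", "is", "curious"]
best_order = reorder(source_order)
```

With one node per cluster the generalized problem collapses to an ordinary ATSP, which is what makes an exact solver practical for this experiment.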

Experiment 1
Exact TSP solver (Concorde) vs. SMT decoder (Moses)
Better performance with both bigram and trigram language models
A wrong sentence may still receive a higher score than the correct sentence
[Figure: results for the bigram and trigram language models]

Experiment 2
Machine translation task
LK (Lin-Kernighan) TSP solver as implemented in Concorde
  – Not an exact solver; used because the graphs have too many nodes
Data: Europarl
  – Training: 2.81 million sentences
  – Testing: 500 sentences

Comment
Main contribution
  – Transforms SMT into TSP
  – Solves MT directly with a TSP solver
Problems
  – Experiment 1: pure word reordering is of limited practical use
  – Experiment 2: no significance test, and the BLEU difference is below 1; the BLEU score is too low (30 in 2003)
  – Experiments: test sentences are short (length 17) and test sets are small (170 and 500 sentences)