Statistical Machine Translation Part IIIb – Phrase-based Model

Slides:



Advertisements
Similar presentations
Statistical Machine Translation
Advertisements

Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Information Extraction Lecture 12 – Multilingual Extraction CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.
Machine Translation Course 9 Diana Trandab ă ț Academic year
Likelihood Ratio, Wald, and Lagrange Multiplier (Score) Tests
Information Extraction Lecture 9 – Multilingual Extraction CIS, LMU München Winter Semester Dr. Alexander Fraser.
DP-based Search Algorithms for Statistical Machine Translation My name: Mauricio Zuluaga Based on “Christoph Tillmann Presentation” and “ Word Reordering.
Hidden Markov Models. Decoding GIVEN x = x 1 x 2 ……x N We want to find  =  1, ……,  N, such that P[ x,  ] is maximized  * = argmax  P[ x,  ] We.
Machine Translation (II): Word-based SMT Ling 571 Fei Xia Week 10: 12/1/05-12/6/05.
A Phrase-Based, Joint Probability Model for Statistical Machine Translation Daniel Marcu, William Wong(2002) Presented by Ping Yu 01/17/2006.
Statistical Phrase-Based Translation Authors: Koehn, Och, Marcu Presented by Albert Bertram Titles, charts, graphs, figures and tables were extracted from.
ACL 2005 WORKSHOP ON BUILDING AND USING PARALLEL TEXTS (WPT-05), Ann Arbor, MI. June Competitive Grouping in Integrated Segmentation and Alignment.
Hidden Markov Models.
Machine Translation A Presentation by: Julie Conlonova, Rob Chase, and Eric Pomerleau.
Symmetric Probabilistic Alignment Jae Dong Kim Committee: Jaime G. Carbonell Ralf D. Brown Peter J. Jansen.
Maximum Entropy Model LING 572 Fei Xia 02/07-02/09/06.
Application of RNNs to Language Processing Andrey Malinin, Shixiang Gu CUED Division F Speech Group.
Parameter estimate in IBM Models: Ling 572 Fei Xia Week ??
Statistical Machine Translation Part IX – Better Word Alignment, Morphology and Syntax Alexander Fraser ICL, U. Heidelberg CIS, LMU München
Jan 2005Statistical MT1 CSA4050: Advanced Techniques in NLP Machine Translation III Statistical MT.
MACHINE TRANSLATION AND MT TOOLS: GIZA++ AND MOSES -Nirdesh Chauhan.
THE MATHEMATICS OF STATISTICAL MACHINE TRANSLATION Sriraman M Tallam.
Natural Language Processing Expectation Maximization.
Statistical Machine Translation Part VIII – Log-Linear Models Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alex Fraser Institute for Natural Language Processing University of Stuttgart
An Introduction to SMT Andy Way, DCU. Statistical Machine Translation (SMT) Translation Model Language Model Bilingual and Monolingual Data* Decoder:
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Statistical Machine Translation Part IV – Log-Linear Models Alex Fraser Institute for Natural Language Processing University of Stuttgart Seminar:
An Integrated Approach for Arabic-English Named Entity Translation Hany Hassan IBM Cairo Technology Development Center Jeffrey Sorensen IBM T.J. Watson.
Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.
Statistical Machine Translation Part IV – Log-Linear Models Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Statistical Machine Translation Part V – Better Word Alignment, Morphology and Syntax Alexander Fraser CIS, LMU München Seminar: Open Source.
Statistical Machine Translation Part III – Phrase-based SMT Alexander Fraser CIS, LMU München WSD and MT.
Sequence Models With slides by me, Joshua Goodman, Fei Xia.
Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alexander Fraser Institute for Natural Language Processing Universität Stuttgart.
LREC 2008 Marrakech 29 May Caroline Lavecchia, Kamel Smaïli and David Langlois LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France Phrase-Based Machine.
Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.
Haitham Elmarakeby.  Speech recognition
MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011.
Statistical Machine Translation Part III – Many-to-Many Alignments Alexander Fraser CIS, LMU München WSD and MT.
NLP. Machine Translation Tree-to-tree – Yamada and Knight Phrase-based – Och and Ney Syntax-based – Och et al. Alignment templates – Och and Ney.
A Statistical Approach to Machine Translation ( Brown et al CL ) POSTECH, NLP lab 김 지 협.
Jan 2009Statistical MT1 Advanced Techniques in NLP Machine Translation III Statistical MT.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
September 2004CSAW Extraction of Bilingual Information from Parallel Texts Mike Rosner.
Spring 2010 Lecture 4 Kristina Toutanova MSR & UW With slides borrowed from Philipp Koehn and Hwee Tou Ng LING 575: Seminar on statistical machine translation.
1 Minimum Bayes-risk Methods in Automatic Speech Recognition Vaibhava Geol And William Byrne IBM ; Johns Hopkins University 2003 by CRC Press LLC 2005/4/26.
Ling 575: Machine Translation Yuval Marton Winter 2016 January 19: Spill-over from last class, some prob+stats, word alignment, phrase-based and hierarchical.
Machine Translation Statistical Machine Translation Part VI – Better Word Alignment, Morphology and Syntax Alexander Fraser CIS, LMU München.
Neural Machine Translation
Statistical Machine Translation Part II: Word Alignments and EM
Statistical Machine Translation Part IV – Log-Linear Models
Alexander Fraser CIS, LMU München Machine Translation
Likelihood Ratio, Wald, and Lagrange Multiplier (Score) Tests
Statistical Machine Translation Part III – Phrase-based SMT / Decoding
CSCI 5832 Natural Language Processing
Section 2.4 notes continued
Statistical Machine Translation
Bayesian Models in Machine Learning
CSCI 5832 Natural Language Processing
Word-based SMT Ling 580 Fei Xia Week 1: 1/3/06.
Machine Translation and MT tools: Giza++ and Moses
Statistical Machine Translation Papers from COLING 2004
Machine Translation and MT tools: Giza++ and Moses
Statistical Machine Translation Part VI – Phrase-based Decoding
Presented By: Sparsh Gupta Anmol Popli Hammad Abdullah Ayyubi
Pushpak Bhattacharyya CSE Dept., IIT Bombay 31st Jan, 2011
CS224N Section 2: PA2 & EM Shrey Gupta January 21,2011.
Presentation transcript:

Statistical Machine Translation Part IIIb – Phrase-based Model Alexander Fraser CIS, LMU München 2015.11.10 WSD and MT

Where we have been We defined the overall problem and talked about evaluation We have now covered word alignment IBM Model 1, true Expectation Maximization Briefly mentioned: IBM Model 4, approximate Expectation Maximization Symmetrization Heuristics (such as Grow) Applied to two Viterbi alignments (typically from Model 4) Results in final word alignment

Where we are going We will discuss the "traditional" phrase-based model (which noone actually uses, but gives a good intuition) Then we will define a high performance translation model (next slide set) Finally, we will show how to solve the search problem for this model (= decoding)

Outline Phrase-based translation Model Estimating parameters Decoding

We could use IBM Model 4 in the direction p(f|e), together with a language model, p(e), to translate argmax P( e | f ) = argmax P( f | e ) P( e ) e e

However, decoding using Model 4 doesn’t work well in practice One strong reason is the bad 1-to-N assumption Another problem would be defining the search algorithm If we add additional operations to allow the English words to vary, this will be very expensive Despite these problems, Model 4 decoding was briefly state of the art We will now define a better model…

Slide from Koehn 2008

Slide from Koehn 2008

Language Model Often a trigram language model is used for p(e) P(the man went home) = p(the | START) p(man | START the) p(went | the man) p(home | man went) Language models work well for comparing the grammaticality of strings of the same length However, when comparing short strings with long strings they favor short strings For this reason, an important component of the language model is the length bonus This is a constant > 1 multiplied for each English word in the hypothesis It makes longer strings competitive with shorter strings

d Modified from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

z^n Slide from Koehn 2008