Statistical Machine Translation Part IIIb – Phrase-based Model

Slides:

Advertisements

Similar presentations

Statistical Machine Translation

Advertisements

Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser Institute for Natural Language Processing University of Stuttgart

Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.

Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart

Information Extraction Lecture 12 – Multilingual Extraction CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.

Machine Translation Course 9 Diana Trandab ă ț Academic year

Likelihood Ratio, Wald, and Lagrange Multiplier (Score) Tests

Information Extraction Lecture 9 – Multilingual Extraction CIS, LMU München Winter Semester Dr. Alexander Fraser.

DP-based Search Algorithms for Statistical Machine Translation My name: Mauricio Zuluaga Based on “Christoph Tillmann Presentation” and “ Word Reordering.

Hidden Markov Models. Decoding GIVEN x = x 1 x 2 ……x N We want to find  =  1, ……,  N, such that P[ x,  ] is maximized  * = argmax  P[ x,  ] We.

Machine Translation (II): Word-based SMT Ling 571 Fei Xia Week 10: 12/1/05-12/6/05.

A Phrase-Based, Joint Probability Model for Statistical Machine Translation Daniel Marcu, William Wong(2002) Presented by Ping Yu 01/17/2006.

Statistical Phrase-Based Translation Authors: Koehn, Och, Marcu Presented by Albert Bertram Titles, charts, graphs, figures and tables were extracted from.

ACL 2005 WORKSHOP ON BUILDING AND USING PARALLEL TEXTS (WPT-05), Ann Arbor, MI. June Competitive Grouping in Integrated Segmentation and Alignment.

Hidden Markov Models.

Machine Translation A Presentation by: Julie Conlonova, Rob Chase, and Eric Pomerleau.

Symmetric Probabilistic Alignment Jae Dong Kim Committee: Jaime G. Carbonell Ralf D. Brown Peter J. Jansen.

Maximum Entropy Model LING 572 Fei Xia 02/07-02/09/06.

Application of RNNs to Language Processing Andrey Malinin, Shixiang Gu CUED Division F Speech Group.

Parameter estimate in IBM Models: Ling 572 Fei Xia Week ??

Statistical Machine Translation Part IX – Better Word Alignment, Morphology and Syntax Alexander Fraser ICL, U. Heidelberg CIS, LMU München

Jan 2005Statistical MT1 CSA4050: Advanced Techniques in NLP Machine Translation III Statistical MT.

MACHINE TRANSLATION AND MT TOOLS: GIZA++ AND MOSES -Nirdesh Chauhan.

THE MATHEMATICS OF STATISTICAL MACHINE TRANSLATION Sriraman M Tallam.

Natural Language Processing Expectation Maximization.

Statistical Machine Translation Part VIII – Log-Linear Models Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.

Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alex Fraser Institute for Natural Language Processing University of Stuttgart

An Introduction to SMT Andy Way, DCU. Statistical Machine Translation (SMT) Translation Model Language Model Bilingual and Monolingual Data* Decoder:

English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.

Statistical Machine Translation Part IV – Log-Linear Models Alex Fraser Institute for Natural Language Processing University of Stuttgart Seminar:

An Integrated Approach for Arabic-English Named Entity Translation Hany Hassan IBM Cairo Technology Development Center Jeffrey Sorensen IBM T.J. Watson.

Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.

Statistical Machine Translation Part IV – Log-Linear Models Alexander Fraser Institute for Natural Language Processing University of Stuttgart

Statistical Machine Translation Part V – Better Word Alignment, Morphology and Syntax Alexander Fraser CIS, LMU München Seminar: Open Source.

Statistical Machine Translation Part III – Phrase-based SMT Alexander Fraser CIS, LMU München WSD and MT.

Sequence Models With slides by me, Joshua Goodman, Fei Xia.

Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alexander Fraser Institute for Natural Language Processing Universität Stuttgart.

LREC 2008 Marrakech 29 May Caroline Lavecchia, Kamel Smaïli and David Langlois LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France Phrase-Based Machine.

Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.

Haitham Elmarakeby.  Speech recognition

MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011.

Statistical Machine Translation Part III – Many-to-Many Alignments Alexander Fraser CIS, LMU München WSD and MT.

NLP. Machine Translation Tree-to-tree – Yamada and Knight Phrase-based – Och and Ney Syntax-based – Och et al. Alignment templates – Och and Ney.

A Statistical Approach to Machine Translation ( Brown et al CL ) POSTECH, NLP lab 김 지 협.

Jan 2009Statistical MT1 Advanced Techniques in NLP Machine Translation III Statistical MT.

Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart

September 2004CSAW Extraction of Bilingual Information from Parallel Texts Mike Rosner.

Spring 2010 Lecture 4 Kristina Toutanova MSR & UW With slides borrowed from Philipp Koehn and Hwee Tou Ng LING 575: Seminar on statistical machine translation.

1 Minimum Bayes-risk Methods in Automatic Speech Recognition Vaibhava Geol And William Byrne IBM ； Johns Hopkins University 2003 by CRC Press LLC 2005/4/26.

Ling 575: Machine Translation Yuval Marton Winter 2016 January 19: Spill-over from last class, some prob+stats, word alignment, phrase-based and hierarchical.

Machine Translation Statistical Machine Translation Part VI – Better Word Alignment, Morphology and Syntax Alexander Fraser CIS, LMU München.

Neural Machine Translation

Statistical Machine Translation Part II: Word Alignments and EM

Statistical Machine Translation Part IV – Log-Linear Models

Alexander Fraser CIS, LMU München Machine Translation

Likelihood Ratio, Wald, and Lagrange Multiplier (Score) Tests

Statistical Machine Translation Part III – Phrase-based SMT / Decoding

CSCI 5832 Natural Language Processing

Section 2.4 notes continued

Statistical Machine Translation

Bayesian Models in Machine Learning

CSCI 5832 Natural Language Processing

Word-based SMT Ling 580 Fei Xia Week 1: 1/3/06.

Machine Translation and MT tools: Giza++ and Moses

Statistical Machine Translation Papers from COLING 2004

Machine Translation and MT tools: Giza++ and Moses

Statistical Machine Translation Part VI – Phrase-based Decoding

Presented By: Sparsh Gupta Anmol Popli Hammad Abdullah Ayyubi

Pushpak Bhattacharyya CSE Dept., IIT Bombay 31st Jan, 2011

CS224N Section 2: PA2 & EM Shrey Gupta January 21,2011.

Presentation transcript:

Statistical Machine Translation Part IIIb – Phrase-based Model Alexander Fraser CIS, LMU München 2015.11.10 WSD and MT

Where we have been We defined the overall problem and talked about evaluation We have now covered word alignment IBM Model 1, true Expectation Maximization Briefly mentioned: IBM Model 4, approximate Expectation Maximization Symmetrization Heuristics (such as Grow) Applied to two Viterbi alignments (typically from Model 4) Results in final word alignment

Where we are going We will discuss the "traditional" phrase-based model (which noone actually uses, but gives a good intuition) Then we will define a high performance translation model (next slide set) Finally, we will show how to solve the search problem for this model (= decoding)

Outline Phrase-based translation Model Estimating parameters Decoding

We could use IBM Model 4 in the direction p(f|e), together with a language model, p(e), to translate argmax P( e | f ) = argmax P( f | e ) P( e ) e e

However, decoding using Model 4 doesn’t work well in practice One strong reason is the bad 1-to-N assumption Another problem would be defining the search algorithm If we add additional operations to allow the English words to vary, this will be very expensive Despite these problems, Model 4 decoding was briefly state of the art We will now define a better model…

Slide from Koehn 2008

Slide from Koehn 2008

Language Model Often a trigram language model is used for p(e) P(the man went home) = p(the | START) p(man | START the) p(went | the man) p(home | man went) Language models work well for comparing the grammaticality of strings of the same length However, when comparing short strings with long strings they favor short strings For this reason, an important component of the language model is the length bonus This is a constant > 1 multiplied for each English word in the hypothesis It makes longer strings competitive with shorter strings

d Modified from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

z^n Slide from Koehn 2008