
Statistical Machine Translation Part III – Many-to-Many Alignments and Phrase-Based SMT
Alexander Fraser, CIS, LMU München
Machine Translation, 2017.05.17

Next week
Next week, exercise and lecture are swapped:
Lecture Tuesday, location: C003
Exercise Wednesday, location: here in 131
Check the web page, all details will be there

Last time, we discussed Model 1 and Expectation Maximization. Today we will discuss getting useful alignments for translation, and a translation model.

Slide from Koehn 2008

Slide from Koehn 2009

Slide from Koehn 2009

HMM Model
Model 4 requires local search (making small changes to an initial alignment and rescoring).
Another popular model is the HMM model, which is similar to Model 2 except that it uses relative alignment positions (like Model 4).
It is popular because it supports inference via the forward-backward algorithm.
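To make the forward-backward connection concrete, here is a minimal sketch of the forward pass for such an HMM alignment model, where the hidden states are target positions and transitions depend only on the jump width. The probability tables are assumed given (in practice they come from EM training); this is toy code of my own, not GIZA++.

```python
# Minimal sketch of the forward pass for an HMM alignment model.
# Hidden states = target (English) positions; emissions = source words.

def forward(src, tgt, t_prob, jump_prob):
    """Compute p(src | tgt) under a simple HMM alignment model.

    t_prob[f][e] : translation probability p(f | e)   (assumed given)
    jump_prob[d] : probability of a jump of width d   (assumed given)
    """
    I, J = len(tgt), len(src)
    # Initialization: uniform over the first alignment position.
    alpha = [t_prob[src[0]][tgt[i]] / I for i in range(I)]
    for j in range(1, J):
        new_alpha = []
        for i in range(I):
            # Sum over all previous positions, weighted by the jump width i - ip.
            s = sum(alpha[ip] * jump_prob.get(i - ip, 1e-10) for ip in range(I))
            new_alpha.append(s * t_prob[src[j]][tgt[i]])
        alpha = new_alpha
    return sum(alpha)  # total probability of the source sentence
```

The backward pass is symmetric, and combining the two yields the posterior probability of each alignment link, which is exactly what EM re-estimation needs.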

Overcoming 1-to-N
We'll now discuss overcoming the poor assumption behind alignment functions.

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

IBM Models: 1-to-N Assumption
Multi-word “cepts” (words in one language translated as a unit) are only allowed on the target side; the source side is limited to single-word cepts.
We are therefore forced to create M-to-N alignments using heuristics.
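To see why the assumption is baked in, note that the IBM models represent an alignment as a function from source positions to target positions. A toy illustration of my own (not from the slides):

```python
# IBM-style alignment function a(j) = i, positions 1-based (0 = NULL word).
# One target word may receive several source words (1-to-N)...
a = {1: 1, 2: 3, 3: 3}   # source words 2 and 3 both align to target word 3
# ...but each source word gets exactly one target link: a dict maps every
# key to a single value, so an M-to-N unit such as German "zum" aligning
# to both "to" and "the" is simply not expressible here.
```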

Slide from Koehn 2008

Slide from Koehn 2009

Slide from Koehn 2009

Discussion
Most state-of-the-art SMT systems are built as I presented:
Use the IBM Models to generate both a one-to-many alignment and a many-to-one alignment
Combine these two alignments using a symmetrization heuristic (see the sketch below)
The output is a many-to-many alignment, used for building the decoder
Moses toolkit for implementation: www.statmt.org
Uses Och and Ney's GIZA++ tool for Model 1, HMM, Model 4
However, there is newer work on alignment that is interesting!
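As a rough sketch of what symmetrization does, here is a simplified "grow"-style heuristic, assuming the two directional alignments have already been converted to sets of (source, target) link pairs. The real grow-diag-final heuristic in Moses adds further conditions (e.g. only linking previously unaligned words); this toy version just grows the intersection with neighboring links from the union.

```python
# Sketch of alignment symmetrization over sets of (src, tgt) link pairs.

def symmetrize(src2tgt, tgt2src):
    """Combine two directional alignments into one many-to-many alignment."""
    intersection = src2tgt & tgt2src      # high precision
    union = src2tgt | tgt2src             # high recall
    alignment = set(intersection)
    added = True
    while added:                          # grow until nothing changes
        added = False
        for (s, t) in sorted(union - alignment):
            # Accept a union link only if it neighbors an accepted link.
            neighbors = {(s - 1, t), (s + 1, t), (s, t - 1), (s, t + 1)}
            if neighbors & alignment:
                alignment.add((s, t))
                added = True
    return alignment

# Example: a one-to-many and a many-to-one alignment of a 3-word pair.
a1 = {(0, 0), (1, 1), (2, 1)}   # source -> target direction
a2 = {(0, 0), (1, 1), (1, 2)}   # target -> source direction
print(sorted(symmetrize(a1, a2)))   # many-to-many result
```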

Where we have been
We defined the overall problem and talked about evaluation.
We have now covered word alignment:
IBM Model 1, true Expectation Maximization
Briefly mentioned: IBM Model 4, approximate Expectation Maximization
Symmetrization heuristics (such as Grow), applied to two Viterbi alignments (typically from Model 4), resulting in the final word alignment

Where we are going
We will discuss the "traditional" phrase-based model (which no one actually uses, but which gives a good intuition).
Then we will define a high-performance translation model (next slide set).
Finally, we will show how to solve the search problem for this model (= decoding).

Outline
Phrase-based translation
Model
Estimating parameters
Decoding

We could use IBM Model 4 in the direction p(f|e), together with a language model, p(e), to translate:

argmax_e P(e | f) = argmax_e P(f | e) P(e)
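A minimal sketch of this noisy-channel decision rule over a tiny, made-up candidate list (the probability values are illustrative assumptions; in reality the argmax ranges over an enormous search space, which is exactly the decoding problem):

```python
import math

# Noisy-channel decision rule: e* = argmax_e p(f | e) * p(e).
# Toy scores for three candidate translations of one fixed source f.
candidates = {
    "the house is small":   {"tm": 0.20, "lm": 0.050},  # p(f|e), p(e) -- made up
    "small the house is":   {"tm": 0.20, "lm": 0.001},
    "the building is tiny": {"tm": 0.05, "lm": 0.040},
}

def score(entry):
    # Work in log space to avoid underflow on realistic sentence lengths.
    return math.log(entry["tm"]) + math.log(entry["lm"])

best = max(candidates, key=lambda e: score(candidates[e]))
print(best)   # -> "the house is small"
```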

However, decoding using Model 4 doesn't work well in practice.
One strong reason is the bad 1-to-N assumption.
Another problem would be defining the search algorithm: if we add additional operations to allow the English words to vary, this will be very expensive.
Despite these problems, Model 4 decoding was briefly state of the art.
We will now define a better model…

Slide from Koehn 2008

Slide from Koehn 2008

Language Model
Often a trigram language model is used for p(e):
P(the man went home) = p(the | START) p(man | START the) p(went | the man) p(home | man went)
Language models work well for comparing the grammaticality of strings of the same length.
However, when comparing short strings with long strings, they favor short strings.
For this reason, an important component of the language model is the length bonus: a constant > 1 multiplied in once for each English word in the hypothesis.
It makes longer strings competitive with shorter strings.
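A minimal sketch of this computation with a hypothetical trigram table (the table contents and the bonus value are assumptions, not trained values). Note that a constant multiplied in once per word becomes, in log space, a constant bonus added per word:

```python
import math

def lm_score(words, trigram_prob, length_bonus=2.0):
    """Trigram LM score with a per-word length bonus, in log space.

    trigram_prob maps (w1, w2, w3) -> p(w3 | w1 w2); "<s>" pads the start.
    """
    padded = ["<s>", "<s>"] + list(words)
    logp = 0.0
    for i in range(2, len(padded)):
        p = trigram_prob.get(tuple(padded[i - 2:i + 1]), 1e-7)  # floor for unseen trigrams
        logp += math.log(p)
    # One bonus per word: longer hypotheses are no longer automatically
    # dominated by the product of many probabilities < 1.
    return logp + len(words) * math.log(length_bonus)
```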

Modified from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008

Slide from Koehn 2008