
Ling 575: Machine Translation. Yuval Marton, Winter 2016. January 19: spill-over from last class, some probability and statistics, word alignment, phrase-based and hierarchical models. Much of the material was borrowed from course slides by Chris Callison-Burch (2014) and Nizar Habash (2013).

Survey Results. The majority had issues but think they can finish setting up the baseline. Most people preferred more guided / structured assignments over the freedom to choose. We will adjust accordingly; expect a new (more structured) assignment.

Probability. Refresh your probability concepts with Chris Callison-Burch's slides, and/or Koehn's book, Chapter 3, and/or any probability and statistics intro (some are aimed at the NLP crowd).

Noisy Channel and Bayes' Rule. Russian is actually "garbled English": noise in the communication channel twists sounds / letters, causes accidental omissions and accidental additions, and some signals arrive faster than others ("wrong" order). Learn the probability of each "error". Bayes' rule: p(E|F) = p(F|E) x p(E) / p(F), so E_best = argmax_E p(E|F) = argmax_E p(F|E) x p(E); we can drop the constant p(F). This gives us TM x LM. "Distortion" p(ePos | fPos, E_Len) and "fertility" handle reordering and one-to-many mappings. Potentially confusing at first sight: we model the probability of the source given the target language, which we don't have before we translate...
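To make the decision rule concrete, here is a minimal sketch (not from the original slides; the toy phrases and probabilities are made up) of picking E_best = argmax_E p(F|E) x p(E), summing in log space:

    import math

    # Hypothetical toy models; the numbers are illustrative, not estimated.
    tm = {("la maison", "the house"): 0.8,   # translation model p(F|E)
          ("la maison", "house the"): 0.2}
    lm = {"the house": 0.9,                  # language model p(E)
          "house the": 0.01}

    def best_translation(f, candidates):
        # argmax_E p(F|E) * p(E); the constant p(F) is dropped
        return max(candidates,
                   key=lambda e: math.log(tm[(f, e)]) + math.log(lm[e]))

    print(best_translation("la maison", ["the house", "house the"]))  # -> the house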


Word Alignment. Will be explained together with phrases; learned with Expectation Maximization (EM).

Expectation Maximization (EM). Initialize parameters (e.g., uniform alignment probabilities). Then repeat: Expectation step: calculate expected counts of the unseen events (word A aligned with word B). Maximization step: update the parameters to maximize the likelihood of the (not really) observed events, using the expected counts as a proxy for observed counts. Rinse, repeat (until no change, or you've had enough of this). The (alignment) likelihood is guaranteed to be monotonically increasing (more precisely, non-decreasing). In some cases there are computational tricks to make it faster (IBM Model 1).

Expectation Maximization (EM), cont. Expected value of the likelihood function: the probability-weighted average of all possible values. Marginalize: sum over all alignments containing the link of interest (words A-B), and divide by the sum over all possible alignments: P(A|B) / sum[P(A|*)] (in principle, exponentially many alignments!). Model parameters: probabilities of word alignments, e.g., P(A|B). See Adam Lopez's tutorial.
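Putting the two EM slides together, here is a minimal sketch of EM for IBM Model 1 word alignment, assuming a toy parallel corpus (the NULL word is omitted for brevity):

    from collections import defaultdict

    def ibm1_em(bitext, iterations=10):
        # bitext: list of (foreign_tokens, english_tokens) sentence pairs
        f_vocab = {f for fs, _ in bitext for f in fs}
        t = defaultdict(lambda: 1.0 / len(f_vocab))  # t[(f, e)] = p(f|e), uniform init
        for _ in range(iterations):
            count = defaultdict(float)  # expected counts c(f, e)
            total = defaultdict(float)  # expected counts c(e)
            for fs, es in bitext:
                for f in fs:
                    # E-step: posterior that f aligns to each e (uniform alignment prior)
                    z = sum(t[(f, e)] for e in es)
                    for e in es:
                        p = t[(f, e)] / z
                        count[(f, e)] += p
                        total[e] += p
            for (f, e), c in count.items():
                t[(f, e)] = c / total[e]  # M-step: renormalize per English word
        return t

    bitext = [("das haus".split(), "the house".split()),
              ("das buch".split(), "the book".split()),
              ("ein buch".split(), "a book".split())]
    t = ibm1_em(bitext)
    print(round(t[("das", "the")], 3))  # rises toward 1.0 as EM iterates

Note how each iteration can only increase (or keep) the likelihood, as stated above; for IBM Model 1 the objective is convex, so this converges to the global maximum.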

IBM Models. IBM Model 1: lexical translation (requires EM for word alignment; a generative model). IBM Model 2: adds an absolute reordering model. IBM Model 3: adds a fertility model (0 = deletion, >1 = expansion / one-to-many). IBM Model 4: relative reordering model. IBM Model 5: fixes deficiency (keeps track of available positions). Only IBM Model 1 has a global maximum. Training of each higher IBM model builds on the previous model. IBM Models 3-4 are deficient: some impossible translations have positive probability, because multiple output words may be placed in the same position, so probability mass is wasted. IBM Model 5 fixes this deficiency by keeping track of vacancies (available positions).

Statistical MT: IBM Model (Word-based Model).

Shortcomings of Word-Based Models. Weak reordering model -- output is not fluent. Many decisions -- many things can go wrong. IBM Model 1 is convex (easy to find the maximum), but its reordering is poor. IBM Model 4 has fertility (word-to-phrase) and better, local reordering, but it is not convex and not tractable. Words that form a phrase are not moved together, as is often needed in translation (typically yielding worse "word salads").

Phrase-Based Statistical MT. The foreign input is segmented into phrases; a "phrase" is any sequence of words. Each phrase is probabilistically translated into English, e.g., P(to the conference | zur Konferenz), P(into the meeting | zur Konferenz). Phrases may be distorted (reordered). This is state-of-the-art! Example: Morgen fliege ich nach Kanada zur Konferenz -> Tomorrow I will fly to the conference in Canada. Slide courtesy of Kevin Knight.
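The phrase translation probabilities above can be thought of as rows of a phrase table. A tiny hypothetical fragment (the numbers are illustrative, not estimated from data):

    # Hypothetical phrase-table fragment: f-phrase -> [(e-phrase, p(e|f)), ...]
    phrase_table = {
        "zur Konferenz": [("to the conference", 0.7), ("into the meeting", 0.3)],
    }

    def p_phrase(e_phrase, f_phrase):
        # look up P(e-phrase | f-phrase); 0.0 if the pair was never extracted
        return dict(phrase_table.get(f_phrase, [])).get(e_phrase, 0.0)

    print(p_phrase("to the conference", "zur Konferenz"))  # 0.7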

Phrase-Based Statistical MT. P(phrase segmentation) x P(phrase translation) x P(phrase distortion).

Word Alignment Induced Phrases. Maria no dió una bofetada a la bruja verde -> Mary did not slap the green witch. Extracted phrase pairs: (Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green). Slide courtesy of Kevin Knight.

Word Alignment Induced Phrases (cont.). Same sentence pair; all pairs from the previous slide, plus: (a la, the) (dió una bofetada a, slap the). Slide courtesy of Kevin Knight.

Word Alignment Induced Phrases (cont.). All previous pairs, plus: (Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch). Slide courtesy of Kevin Knight.

Word Alignment Induced Phrases (cont.). All previous pairs, plus: (Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) … Slide courtesy of Kevin Knight.

Word Alignment Induced Phrases (cont.). All previous pairs, plus the whole sentence pair: (Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch). Slide courtesy of Kevin Knight. (A code sketch of the extraction procedure follows.)
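The extraction behind these slides can be sketched as follows. This is a simplified version of standard consistent-phrase-pair extraction: it does not expand over unaligned boundary words, which is the step that also yields pairs like (a la, the); the alignment links below encode the example sentence pair:

    def extract_phrases(f_words, e_words, links, max_len=7):
        # links: set of (f_idx, e_idx) word-alignment points
        pairs = []
        for e1 in range(len(e_words)):
            for e2 in range(e1, min(e1 + max_len, len(e_words))):
                fs = [f for (f, e) in links if e1 <= e <= e2]
                if not fs:
                    continue
                f1, f2 = min(fs), max(fs)
                # consistency: no link may connect [f1,f2] to outside [e1,e2]
                if any(f1 <= f <= f2 and not (e1 <= e <= e2) for (f, e) in links):
                    continue
                if f2 - f1 + 1 <= max_len:
                    pairs.append((" ".join(f_words[f1:f2 + 1]),
                                  " ".join(e_words[e1:e2 + 1])))
        return pairs

    f = "Maria no dió una bofetada a la bruja verde".split()
    e = "Mary did not slap the green witch".split()
    links = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
             (6, 4), (7, 6), (8, 5)}
    print(extract_phrases(f, e, links))
    # includes ('Maria', 'Mary'), ('bruja verde', 'green witch'), ...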

Phrase-Based Models. Sentence f is decomposed into J phrases: f_1^J = f_1, ..., f_j, ..., f_J. Sentence e is decomposed into I phrases: e_1^I = e_1, ..., e_i, ..., e_I. We choose the sentence with the highest probability: e_best = argmax_{e_1^I} P(e_1^I | f_1^J).

Phrase-Based Models. Model the posterior probability using a log-linear combination of feature functions. We have a set of M feature functions h_m(e_1^I, f_1^J), m = 1, ..., M. For each feature function there is a model parameter (weight) lambda_m, m = 1, ..., M. The decision rule is: e_best = argmax_{e_1^I} sum_{m=1}^{M} lambda_m * h_m(e_1^I, f_1^J). The features cover the main components: the phrase-translation model, the reordering model, and the language model.
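A minimal sketch of this log-linear decision rule; the candidate translations, feature values, and weights below are made-up numbers (weights such as these would come out of tuning, e.g., by MERT):

    import math

    def loglinear_score(feats, weights):
        # score(e, f) = sum_m lambda_m * h_m(e, f)
        return sum(weights[name] * val for name, val in feats.items())

    # Hypothetical feature values h_m(e, f) for two candidate translations:
    candidates = {
        "tomorrow I fly to the conference":
            {"tm": math.log(0.6), "lm": math.log(0.2), "dist": -1.0},
        "tomorrow fly I to the conference":
            {"tm": math.log(0.6), "lm": math.log(0.01), "dist": 0.0},
    }
    weights = {"tm": 1.0, "lm": 1.0, "dist": 0.5}  # lambda_m

    best = max(candidates, key=lambda e: loglinear_score(candidates[e], weights))
    print(best)  # -> tomorrow I fly to the conference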

Advantages of Phrase-Based SMT. Many-to-many mappings can handle non-compositional phrases. Local context is very useful for disambiguating: "interest rate" -> …, "interest in" -> … The more data, the longer the learned phrases; sometimes whole sentences. Slide courtesy of Kevin Knight.

Bottom-up hypothesis building.
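The figure from this slide is not reproduced here; the idea is that decoding builds translation hypotheses bottom-up, extending partial translations phrase by phrase. A minimal monotone (no-reordering) sketch with a hypothetical toy phrase table; real decoders add reordering, stacks organized by coverage, beam pruning, and a language model:

    import math

    def monotone_decode(f_words, phrase_table, max_phrase_len=3):
        # best[i] = (log-prob, English phrases) covering f_words[:i]
        best = {0: (0.0, [])}
        for i in range(len(f_words)):
            if i not in best:
                continue
            score_i, out_i = best[i]
            for j in range(i + 1, min(i + max_phrase_len, len(f_words)) + 1):
                f_phrase = " ".join(f_words[i:j])
                for e_phrase, logp in phrase_table.get(f_phrase, []):
                    cand = (score_i + logp, out_i + [e_phrase])
                    if j not in best or cand[0] > best[j][0]:
                        best[j] = cand  # keep the best hypothesis per coverage point
        goal = best.get(len(f_words))
        return " ".join(goal[1]) if goal else None

    pt = {"morgen": [("tomorrow", math.log(0.9))],
          "fliege ich": [("I will fly", math.log(0.5))],
          "zur konferenz": [("to the conference", math.log(0.7))]}
    print(monotone_decode("morgen fliege ich zur konferenz".split(), pt))
    # -> tomorrow I will fly to the conference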


Probability. Refresh your probability concepts with Chris Callison-Burch's slides, and/or Koehn's book, Chapter 3, and/or any probability and statistics intro (some are aimed at the NLP crowd).