Ling 575: Machine Translation
Yuval Marton, Winter 2016
January 19: Spill-over from last class, some probability and statistics, word alignment, phrase-based and hierarchical models.
Much of the material was borrowed from course slides by Chris Callison-Burch (2014) and Nizar Habash (2013).
Survey Results
The majority had issues but think they can finish setting up the baseline. Most people preferred more guided / structured assignments over the freedom to choose. We will adjust accordingly: expect a new (more structured) assignment.
Probability
Refresh your probability concepts with Chris Callison-Burch's slides, and/or Chapter 3 of Koehn's book, and/or any probability and statistics intro (some are aimed at the NLP crowd).
Noisy Channel and Bayes' Rule
Russian is actually "garbled English": noise in communication twists sounds / letters, causes accidental omissions and accidental additions, and some signals arrive faster than others ("wrong" order). Learn the probability of each "error".
Bayes' rule: $p(E \mid F) = \frac{p(F \mid E)\, p(E)}{p(F)}$
$E_{best} = \arg\max_E p(E \mid F) = \arg\max_E p(F \mid E)\, p(E)$ (we can drop the constant $p(F)$).
That is, a translation model (TM) times a language model (LM). "Distortion" $p(ePos \mid fPos, E\_len)$ and "fertility" handle reordering and one-to-many mappings.
Potentially confusing at first sight! We model given the target language, which we don't have before we translate...
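To make the decision rule concrete, here is a minimal sketch in Python: a toy noisy-channel scorer that picks $\arg\max_E p(F \mid E)\, p(E)$. The phrase pair, candidates, and all probabilities below are invented for illustration.

```python
import math

# Toy noisy-channel scorer: pick the English candidate E maximizing
# p(F|E) * p(E), i.e., translation-model score times language-model score.
# All probabilities are made up for illustration.

def best_translation(f, candidates, tm, lm):
    """argmax_E p(F|E) * p(E), computed in log space for stability."""
    def score(e):
        return math.log(tm[(f, e)]) + math.log(lm[e])
    return max(candidates, key=score)

tm = {("la bruja verde", "the green witch"): 0.6,
      ("la bruja verde", "the witch green"): 0.4}
lm = {"the green witch": 0.01, "the witch green": 0.0001}

print(best_translation("la bruja verde",
                       ["the green witch", "the witch green"], tm, lm))
# -> "the green witch": the LM fixes word order even when the TM is unsure
```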
Word Alignment
Will be explained together with phrases. Uses Expectation Maximization (EM).
Expectation Maximization (EM)
Initialize parameters (e.g., uniform alignment probabilities). Then repeat:
Expectation: calculate the expected counts of the unseen events (word A aligned with word B).
Maximization: update the parameters to maximize the likelihood of the (not really) observed events, using the expected counts as a proxy for observed counts.
Rinse, repeat (until nothing changes, or until you've had enough).
The (alignment) likelihood is guaranteed to be monotonically increasing (more precisely, non-decreasing). In some cases there are computational tricks to make it faster (e.g., IBM Model 1).
Expectation Maximization (EM), cont.
Expected value of the likelihood function: the probability-weighted average of all possible values.
Marginalize: sum over all alignments containing the link of interest (words A-B), and divide by the sum over all possible alignments: $P(A \mid B) / \sum P(A \mid *)$ (in principle, exponentially many alignments!).
Model parameters: probabilities of word alignments, e.g., $P(A \mid B)$.
See Adam Lopez's tutorial: http://www.statmt.org/mtm12/pub/mtm-tutorial-2012.pdf
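As a concrete illustration of these E- and M-steps, here is a minimal sketch of IBM Model 1 EM on a toy three-sentence corpus (the classic das Haus / the house example); the corpus is invented and the NULL word is omitted. It uses the Model 1 trick mentioned above: the sum over alignments factorizes per source word, so the E-step is cheap.

```python
from collections import defaultdict

corpus = [("das haus", "the house"), ("das buch", "the book"),
          ("ein buch", "a book")]
corpus = [(f.split(), e.split()) for f, e in corpus]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}

# t[f][e] = p(f | e), initialized uniformly
t = {f: {e: 1.0 / len(f_vocab) for e in e_vocab} for f in f_vocab}

for _ in range(20):                      # fixed number of EM iterations
    count = defaultdict(lambda: defaultdict(float))  # expected counts c(f, e)
    total = defaultdict(float)                       # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            # Model 1 trick: the sum over alignments factorizes per word
            z = sum(t[f][e] for e in es)
            for e in es:
                p = t[f][e] / z          # posterior that f aligns to e
                count[f][e] += p         # E-step: collect expected counts
                total[e] += p
    for f in f_vocab:                    # M-step: renormalize
        for e in e_vocab:
            t[f][e] = count[f][e] / total[e] if total[e] else 0.0

print(round(t["haus"]["house"], 3))      # should be close to 1.0 after EM
```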
IBM Models
IBM Model 1: lexical translation (requires EM for word alignment; generative model).
IBM Model 2: adds an absolute reordering model.
IBM Model 3: adds a fertility model (0 = deletion, >1 = expansion / one-to-many).
IBM Model 4: relative reordering model.
IBM Model 5: fixes deficiency (keeps track of available positions).
Only IBM Model 1 has a global maximum. Training of each higher IBM model builds on the previous model.
IBM Models 3-4 are deficient: some impossible translations have positive probability, multiple output words may be placed in the same position, and probability mass is wasted. IBM Model 5 fixes this deficiency by keeping track of vacancies (available positions).
Statistical MT: IBM Model (Word-based Model)
http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
Shortcomings of Word-Based Models
Weak reordering model -- output is not fluent. Many decisions -- many things can go wrong.
IBM Model 1 is convex (easy to find the maximum), but its reordering is not so good. IBM Model 4 has fertility (word-to-phrase) and local reordering (better), but it is not convex and not tractable.
Phrases are not moved together, as is often needed in translation (typically yielding worse word salads).
Phrase-Based Statistical MT
Foreign input is segmented into phrases -- a "phrase" is any sequence of words.
Each phrase is probabilistically translated into English:
–P(to the conference | zur Konferenz)
–P(into the meeting | zur Konferenz)
Phrases are reordered ("phrase distortion"). This is state-of-the-art!
Example: "Morgen fliege ich nach Kanada zur Konferenz" -> "Tomorrow I will fly to the conference in Canada"
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
Phrase-Based Statistical MT
P(phrase segmentation) × P(phrase translation) × P(phrase distortion)
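A sketch of what scoring one candidate segmentation might look like: the product (in log space) of phrase translation probabilities with a simple distance-based distortion penalty of the form alpha^|start_i - end_{i-1} - 1|. The phrase table, spans, and alpha below are invented for illustration.

```python
import math

# Toy phrase table: p(english phrase | foreign phrase), made up.
phrase_table = {
    ("Morgen", "Tomorrow"): 0.7,
    ("fliege ich", "I will fly"): 0.4,
    ("zur Konferenz", "to the conference"): 0.5,
    ("zur Konferenz", "into the meeting"): 0.3,
}

def segmentation_score(phrase_pairs, spans, alpha=0.9):
    """phrase_pairs: [(foreign, english), ...] in English output order;
    spans: [(start, end), ...] foreign-side word spans, one per pair."""
    log_p = 0.0
    prev_end = -1
    for (f, e), (start, end) in zip(phrase_pairs, spans):
        log_p += math.log(phrase_table[(f, e)])                # translation
        log_p += abs(start - prev_end - 1) * math.log(alpha)   # distortion
        prev_end = end
    return log_p

# Monotone two-phrase segmentation: no distortion penalty is incurred.
print(segmentation_score([("Morgen", "Tomorrow"),
                          ("fliege ich", "I will fly")],
                         [(0, 0), (1, 2)]))
```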
Word Alignment Induced Phrases
Maria no dió una bofetada a la bruja verde ↔ Mary did not slap the green witch
Phrases induced from the word alignment, from smallest to largest:
(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap the)
(Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch)
(Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch)
…
(Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
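The standard way to turn a word alignment into such phrase pairs is consistency-based extraction: a source/target span pair is extracted iff no alignment link crosses the box boundary. Below is a minimal sketch; the alignment points are hand-set (plausible, but not taken from the slide), and it omits the usual extension over unaligned boundary words, so its output will not match the slide's illustrative list exactly.

```python
src = "Maria no dió una bofetada a la bruja verde".split()
tgt = "Mary did not slap the green witch".split()
# Hand-set alignment links as (spanish_pos, english_pos), 0-indexed;
# "a" (position 5) is left unaligned.
alignment = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
             (6, 4), (7, 6), (8, 5)}

def extract_phrases(src, tgt, alignment):
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, len(src)):
            # Target positions linked to the source span [s1, s2].
            ts = [t for (s, t) in alignment if s1 <= s <= s2]
            if not ts:
                continue
            t1, t2 = min(ts), max(ts)
            # Consistency: no link from inside the target span may
            # point outside the source span.
            if all(s1 <= s <= s2
                   for (s, t) in alignment if t1 <= t <= t2):
                pairs.add((" ".join(src[s1:s2 + 1]),
                           " ".join(tgt[t1:t2 + 1])))
    return pairs

for f, e in sorted(extract_phrases(src, tgt, alignment)):
    print(f, "<->", e)
```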
Phrase-Based Models
Sentence $f$ is decomposed into $J$ phrases: $f_1^J = f_1, \ldots, f_j, \ldots, f_J$.
Sentence $e$ is decomposed into $I$ phrases: $e_1^I = e_1, \ldots, e_i, \ldots, e_I$.
We choose the sentence with the highest probability:
$\hat{e}_1^I = \arg\max_{e_1^I} p(e_1^I \mid f_1^J)$
Phrase-Based Models
Model the posterior probability using a log-linear combination of feature functions. We have a set of $M$ feature functions $h_m(e_1^I, f_1^J)$, $m = 1, \ldots, M$. For each feature function there is a model parameter $\lambda_m$, $m = 1, \ldots, M$.
The decision rule is:
$\hat{e}_1^I = \arg\max_{e_1^I} \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)$
The features cover the main components: the phrase translation model, the reordering model, and the language model.
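A sketch of the log-linear decision rule in code: pick the candidate maximizing $\sum_m \lambda_m h_m(e, f)$. The two feature functions here are toy stand-ins for the real translation-model and language-model scores, and the weights are made up (in practice they are tuned on held-out data).

```python
# Toy feature functions h_m(e, f); real systems use TM/LM/reordering scores.
def h_tm(e, f):
    return -1.0 if "green witch" in e else -3.0   # fake translation score

def h_lm(e, f):
    return -2.0 if e.startswith("the") else -4.0  # fake language-model score

features = [h_tm, h_lm]
weights = [0.7, 0.3]          # lambda_m, normally tuned, not hand-set

def loglinear_score(e, f):
    return sum(lam * h(e, f) for lam, h in zip(weights, features))

candidates = ["the green witch", "the witch green", "green witch the"]
print(max(candidates, key=lambda e: loglinear_score(e, "la bruja verde")))
# -> "the green witch"
```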
Advantages of Phrase-Based SMT
Many-to-many mappings can handle non-compositional phrases.
Local context is very useful for disambiguation: "interest rate" … vs. "interest in" …
The more data, the longer the learned phrases -- sometimes whole sentences.
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
Bottom-up hypothesis building
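The figure for this slide did not survive extraction. As a stand-in, here is a minimal sketch of what bottom-up hypothesis building looks like in a heavily simplified, monotone phrase-based decoder: partial hypotheses cover a growing prefix of the foreign sentence, are extended one phrase at a time, and only the best few per prefix length are kept (a beam). The phrase table and scores are invented.

```python
# Toy phrase table: foreign word tuple -> [(english, log prob), ...]
phrase_table = {
    ("morgen",): [("tomorrow", -0.4)],
    ("fliege", "ich"): [("i will fly", -0.9), ("i fly", -0.7)],
    ("nach", "kanada"): [("to canada", -0.5)],
}

def decode(src, beam_size=3, max_phrase=3):
    # hypotheses[j] = best (score, partial translation) covering src[:j]
    hypotheses = {0: [(0.0, "")]}
    for j in range(1, len(src) + 1):
        cands = []
        for k in range(max(0, j - max_phrase), j):  # last phrase = src[k:j]
            for score, text in hypotheses.get(k, []):
                for e, logp in phrase_table.get(tuple(src[k:j]), []):
                    cands.append((score + logp, (text + " " + e).strip()))
        hypotheses[j] = sorted(cands, reverse=True)[:beam_size]  # prune
    return hypotheses[len(src)][0] if hypotheses.get(len(src)) else None

print(decode("morgen fliege ich nach kanada".split()))
# -> roughly (-1.6, 'tomorrow i fly to canada')
```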