Computational Linguistics Seminar LING-696G Week 6
Administrivia
Next time (March 2nd): return of our guest lecturer, Bryan Heidorn (School of Information; ISTA 455/555: Applied NLP).
Today: IBM Model 1; we begin the "meaty" part of the seminar course.
Chapter 4 errata
- p. 84: the text before Equation 4.5 should read "at position j to a German output at position i"; the indexes are erroneously flipped in the book.
- p. 90: the text before Equation 4.13 should read "the same simplifications as in Equation (4.10)"; the book makes an erroneous reference to Equation 4.11.
- p. 91: the two "end for" on lines 12 and 13 should be one indentation to the left.
- p. 91: the left-hand side of Equation 4.14 should be corrected.
- p. 93: the first sum should be a product.
- p. 93: the result of the computation of Equation 4.20 should be corrected.
- p. 93: the text should clarify that in Equation 4.21 perplexity is computed over the whole corpus, not as an average per sentence.
4.2 Learning Lexical Translation Models
Expectation Maximization (EM) algorithm; IBM Model 1 worked example.
Critical ideas:
- Begin with no assumptions (equal probability for every translation)
- Run over the training data again and again
- On each pass, update the translation probabilities
- Convergence
4.2 Learning Lexical Translation Models
e = English sentence e_1 … e_{l_e}
f = foreign sentence f_1 … f_{l_f}
a = a (specific) alignment
p(e,a|f) = probability of translating f into e with alignment a:

p(e,a|f) = ε / (l_f + 1)^{l_e} × t(e_1|f_{a(1)}) t(e_2|f_{a(2)}) … t(e_{l_e}|f_{a(l_e)})

The factor ε / (l_f + 1)^{l_e} scales by the number of possible alignments, assuming each of NULL + the f_j can be freely mapped onto any single e_i.
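As a sketch (function and variable names are my own, not from the slides), p(e,a|f) can be computed directly from a t-table, assuming position 0 of the foreign sentence is the NULL token and ε = 1:

```python
def p_e_a_given_f(e_words, f_words, alignment, t, epsilon=1.0):
    """p(e,a|f) = epsilon / (l_f + 1)**l_e * prod_j t(e_j | f_a(j)).

    alignment[j] gives the foreign position aligned to English
    position j; position 0 is the NULL token.
    """
    l_e, l_f = len(e_words), len(f_words)
    prob = epsilon / (l_f + 1) ** l_e          # alignment scaling factor
    f_with_null = ["NULL"] + f_words           # position 0 is NULL
    for j, e_word in enumerate(e_words):
        prob *= t[(e_word, f_with_null[alignment[j]])]
    return prob
```

With the uniform initialization t = 0.25 used in the worked example, aligning "the house" to "das Haus" word for word gives 1/9 × 0.25 × 0.25 ≈ 0.0069.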
4.2 Learning Lexical Translation Models
e = English sentence e_1 … e_{l_e}
f = foreign sentence f_1 … f_{l_f}
p(e|f) = probability of translating f into e with any alignment. Summing p(e,a|f) over all alignments, the sum and product can be swapped:

p(e|f) = ε / (l_f + 1)^{l_e} × ∏_{j=1..l_e} ∑_{i=0..l_f} t(e_j|f_i)
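A corresponding sketch for p(e|f) (again with my own names, ε = 1, NULL at position 0): the sum over all alignments reduces to a product of per-word sums.

```python
def p_e_given_f(e_words, f_words, t, epsilon=1.0):
    """p(e|f) = epsilon / (l_f+1)**l_e * prod_j sum_i t(e_j|f_i),
    where i ranges over NULL plus the foreign words."""
    l_e, l_f = len(e_words), len(f_words)
    prob = epsilon / (l_f + 1) ** l_e
    for e_word in e_words:
        prob *= sum(t.get((e_word, f_word), 0.0)
                    for f_word in ["NULL"] + f_words)
    return prob
```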
4.2 Learning Lexical Translation Models
p(a|e,f) = probability of alignment a given the translation sentence pair e and f:

p(a|e,f) = p(e,a|f) / p(e|f)
Worked example.
Words:
- German: das, ein, Haus, Buch
- English: the, a, house, book
Corpus:
- das Haus / the house
- das Buch / the book
- ein Buch / a book
Initialization step: each German word can be translated into one of 4 English words, so let the translation probability t(e_i|f_j) = 0.25 for all combinations of e_i and f_j.
4.2 Learning Lexical Translation Models
Assume free alignment. E-step, iteration 1 (current t values per sentence pair, then fractional counts; running totals carry across pairs):

das Haus / the house: t(the|das) = 0.25, t(house|das) = 0.25, t(the|Haus) = 0.25, t(house|Haus) = 0.25
  normalizers: Σ_j t(the|f_j) = 0.5, Σ_j t(house|f_j) = 0.5
  c(the|das) = 0.25/0.5, c(the|Haus) = 0.25/0.5
  c(house|das) = 0.25/0.5, c(house|Haus) = 0.25/0.5

das Buch / the book: t(the|das) = 0.25, t(book|das) = 0.25, t(the|Buch) = 0.25, t(book|Buch) = 0.25
  normalizers: Σ_j t(the|f_j) = 0.5, Σ_j t(book|f_j) = 0.5
  c(the|das) = 0.5 + 0.25/0.5 (running total), c(the|Buch) = 0.25/0.5
  c(book|das) = 0.25/0.5, c(book|Buch) = 0.25/0.5

ein Buch / a book: t(a|ein) = 0.25, t(book|ein) = 0.25, t(a|Buch) = 0.25, t(book|Buch) = 0.25
  normalizers: Σ_j t(a|f_j) = 0.5, Σ_j t(book|f_j) = 0.5
  c(a|ein) = 0.25/0.5, c(a|Buch) = 0.25/0.5
  c(book|ein) = 0.25/0.5, c(book|Buch) = 0.5 + 0.25/0.5 (running total)
4.2 Learning Lexical Translation Models
Update probabilities (M-step), iteration 1: t(e_i|f_j) = c(e_i|f_j) / total(f_j)

Accumulated counts:
c(the|das) = 1.0, c(house|das) = 0.5, c(book|das) = 0.5, total(das) = 2.0
c(the|Haus) = 0.5, c(house|Haus) = 0.5, total(Haus) = 1.0
c(the|Buch) = 0.5, c(book|Buch) = 1.0, c(a|Buch) = 0.5, total(Buch) = 2.0
c(a|ein) = 0.5, c(book|ein) = 0.5, total(ein) = 1.0

New translation probabilities:
das: t(the|das) = 0.5, t(house|das) = 0.25, t(book|das) = 0.25
Haus: t(the|Haus) = 0.5, t(house|Haus) = 0.5
Buch: t(the|Buch) = 0.25, t(book|Buch) = 0.5, t(a|Buch) = 0.25
ein: t(a|ein) = 0.5, t(book|ein) = 0.5
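The iteration-1 numbers above can be checked mechanically. A minimal sketch of one EM iteration on the toy corpus (assumed names; no NULL token, matching the slides):

```python
from collections import defaultdict

corpus = [(["the", "house"], ["das", "Haus"]),
          (["the", "book"], ["das", "Buch"]),
          (["a", "book"], ["ein", "Buch"])]

def em_iteration(corpus, t):
    """One E-step (collect fractional counts) plus M-step (renormalize)."""
    count = defaultdict(float)
    total = defaultdict(float)
    for e_sent, f_sent in corpus:
        for e in e_sent:
            z = sum(t[(e, f)] for f in f_sent)   # normalization for e
            for f in f_sent:
                count[(e, f)] += t[(e, f)] / z
                total[f] += t[(e, f)] / z
    return {(e, f): c / total[f] for (e, f), c in count.items()}

t0 = defaultdict(lambda: 0.25)                   # uniform initialization
t1 = em_iteration(corpus, t0)                    # t(the|das) becomes 0.5, etc.
```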
4.2 Learning Lexical Translation Models
2nd iteration, E-step:

das Haus / the house: t(the|das) = 0.5, t(house|das) = 0.25, t(the|Haus) = 0.5, t(house|Haus) = 0.5
  normalizers: Σ_j t(the|f_j) = 1.0, Σ_j t(house|f_j) = 0.75
  c(the|das) = 0.5/1.0, c(the|Haus) = 0.5/1.0
  c(house|das) = 0.25/0.75, c(house|Haus) = 0.5/0.75

das Buch / the book: t(the|das) = 0.5, t(book|das) = 0.25, t(the|Buch) = 0.25, t(book|Buch) = 0.5
  normalizers: Σ_j t(the|f_j) = 0.75, Σ_j t(book|f_j) = 0.75
  c(the|das) = 0.5 + 0.5/0.75 (running total), c(the|Buch) = 0.25/0.75
  c(book|das) = 0.25/0.75, c(book|Buch) = 0.5/0.75

ein Buch / a book: t(a|ein) = 0.5, t(book|ein) = 0.5, t(a|Buch) = 0.25, t(book|Buch) = 0.5
  normalizers: Σ_j t(a|f_j) = 0.75, Σ_j t(book|f_j) = 1.0
  c(a|ein) = 0.5/0.75, c(a|Buch) = 0.25/0.75
  c(book|ein) = 0.5/1.0, c(book|Buch) = 0.67 + 0.5/1.0 (running total)
4.2 Learning Lexical Translation Models
Update probabilities (2nd iteration): t(e_i|f_j) = c(e_i|f_j) / total(f_j)

Accumulated counts:
c(the|das) = 1.17, c(house|das) = 0.33, c(book|das) = 0.33, total(das) = 1.83
c(the|Haus) = 0.5, c(house|Haus) = 0.67, total(Haus) = 1.17
c(the|Buch) = 0.33, c(book|Buch) = 1.17, c(a|Buch) = 0.33, total(Buch) = 1.83
c(a|ein) = 0.67, c(book|ein) = 0.5, total(ein) = 1.17

New translation probabilities:
das: t(the|das) = 0.64, t(house|das) = 0.18, t(book|das) = 0.18
Haus: t(the|Haus) = 0.43, t(house|Haus) = 0.57
Buch: t(the|Buch) = 0.18, t(book|Buch) = 0.64, t(a|Buch) = 0.18
ein: t(a|ein) = 0.57, t(book|ein) = 0.43
4.2 Learning Lexical Translation Models
3rd iteration, E-step:

das Haus / the house: t(the|das) = 0.64, t(house|das) = 0.18, t(the|Haus) = 0.43, t(house|Haus) = 0.57
  normalizers: Σ_j t(the|f_j) = 1.06, Σ_j t(house|f_j) = 0.75
  c(the|das) = 0.64/1.06, c(the|Haus) = 0.43/1.06
  c(house|das) = 0.18/0.75, c(house|Haus) = 0.57/0.75

das Buch / the book: t(the|das) = 0.64, t(book|das) = 0.18, t(the|Buch) = 0.18, t(book|Buch) = 0.64
  normalizers: Σ_j t(the|f_j) = 0.82, Σ_j t(book|f_j) = 0.82
  c(the|das) = 0.6 + 0.64/0.82 (running total), c(the|Buch) = 0.18/0.82
  c(book|das) = 0.18/0.82, c(book|Buch) = 0.64/0.82

ein Buch / a book: t(a|ein) = 0.57, t(book|ein) = 0.43, t(a|Buch) = 0.18, t(book|Buch) = 0.64
  normalizers: Σ_j t(a|f_j) = 0.75, Σ_j t(book|f_j) = 1.06
  c(a|ein) = 0.57/0.75, c(a|Buch) = 0.18/0.75
  c(book|ein) = 0.43/1.06, c(book|Buch) = 0.78 + 0.64/1.06 (running total)
4.2 Learning Lexical Translation Models
Update probabilities (3rd iteration): t(e_i|f_j) = c(e_i|f_j) / total(f_j)

Accumulated counts:
c(the|das) = 1.38, c(house|das) = 0.24, c(book|das) = 0.22, total(das) = 1.84
c(the|Haus) = 0.4, c(house|Haus) = 0.76, total(Haus) = 1.16
c(the|Buch) = 0.22, c(book|Buch) = 1.38, c(a|Buch) = 0.24, total(Buch) = 1.84
c(a|ein) = 0.76, c(book|ein) = 0.4, total(ein) = 1.16

New translation probabilities:
das: t(the|das) = 0.75, t(house|das) = 0.13, t(book|das) = 0.12
Haus: t(the|Haus) = 0.35, t(house|Haus) = 0.65
Buch: t(the|Buch) = 0.12, t(book|Buch) = 0.75, t(a|Buch) = 0.13
ein: t(a|ein) = 0.65, t(book|ein) = 0.35
4.2 Learning Lexical Translation Models
4th iteration: the t(e|f) values from initialization through iteration 3, side by side:

                init   iter 1  iter 2  iter 3
t(the|das)      0.25   0.5     0.64    0.75
t(house|das)    0.25   0.25    0.18    0.13
t(book|das)     0.25   0.25    0.18    0.12
t(the|Haus)     0.25   0.5     0.43    0.35
t(house|Haus)   0.25   0.5     0.57    0.65
t(the|Buch)     0.25   0.25    0.18    0.12
t(book|Buch)    0.25   0.5     0.64    0.75
t(a|Buch)       0.25   0.25    0.18    0.13
t(a|ein)        0.25   0.5     0.57    0.65
t(book|ein)     0.25   0.5     0.43    0.35
File: ibm1s.py (Simplified IBM Model 1)
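The code of ibm1s.py itself is not reproduced here; the following is a minimal reimplementation sketch of a simplified IBM Model 1 trainer under the same assumptions (no NULL token, uniform initialization, convergence when no t(e|f) changes by more than a threshold). Names are my own.

```python
from collections import defaultdict

def train_ibm1(corpus, threshold=0.01, max_iter=1000):
    """EM training for simplified IBM Model 1 (no NULL token).

    corpus: list of (english_words, foreign_words) pairs.
    Stops when every t(e|f) changes by at most `threshold`
    between successive iterations.
    """
    e_vocab = {e for e_sent, _ in corpus for e in e_sent}
    t = defaultdict(lambda: 1.0 / len(e_vocab))  # uniform initialization
    for _ in range(max_iter):
        count = defaultdict(float)               # E-step: fractional counts
        total = defaultdict(float)
        for e_sent, f_sent in corpus:
            for e in e_sent:
                z = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    count[(e, f)] += t[(e, f)] / z
                    total[f] += t[(e, f)] / z
        delta = 0.0                              # M-step: renormalize
        for (e, f), c in count.items():
            new = c / total[f]
            delta = max(delta, abs(new - t[(e, f)]))
            t[(e, f)] = new
        if delta <= threshold:                   # converged
            break
    return dict(t)
```

On the three-sentence toy corpus, training with a small threshold drives t(the|das), t(book|Buch), t(house|Haus), and t(a|ein) toward 1.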
4.2 Learning Lexical Translation Models
In the E-step, for each English word ew the code first computes the normalization total ∑_j t(ew|fw_j) over the foreign words fw_j of the sentence, then accumulates the fractional count t(ew|fw) / ∑_j t(ew|fw_j) for each pair (ew, fw).
4.2 Learning Lexical Translation Models
Implementation notes:
- The convergence threshold is hardcoded here; it is a parameter variable in the download.
- Probabilities are compared at a precision of 2 (decimal places).
- The convergence test is true if Δt(ew|fw) ≤ threshold.
- Each iteration saves t(ew|fw) to the front of a history, t_history(ew|fw).
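The convergence test can be sketched like this (assumed names; rounding to a precision of 2 mirrors the slide's note):

```python
def converged(t_new, t_old, threshold, precision=2):
    """True if every rounded t(e|f) changed by at most `threshold`
    since the previous iteration."""
    return all(abs(round(t_new[ef], precision) -
                   round(t_old.get(ef, 0.0), precision)) <= threshold
               for ef in t_new)
```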
Number of iterations needed for the difference in t(e|f) between successive iterations to fall below the threshold. Note: simply duplicating the training data does not decrease the number of iterations.
4.2 Learning Lexical Translation Models
Number of iterations needed for the difference in t(e|f) between successive iterations to fall below the threshold: the plotted relationship is a "straight line".
4.2 Learning Lexical Translation Models
Sentence translation probability, assuming no NULL token:
e = English sentence, f = foreign sentence
p(e|f) = probability of translating f into e with any alignment:

p(e|f) = ε / l_f^{l_e} × ∏_{j=1..l_e} ∑_{i=1..l_f} t(e_j|f_i)
4.2 Learning Lexical Translation Models
For a two-word example (l_e = l_f = 2), the product expands to:

(t(e_1|f_1) + t(e_1|f_2)) × (t(e_2|f_1) + t(e_2|f_2))
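A sketch of this sentence translation probability without NULL (my own function name; ε = 1 by default):

```python
def sentence_prob(e_words, f_words, t, epsilon=1.0):
    """p(e|f) = epsilon / l_f**l_e * prod_j sum_i t(e_j|f_i), no NULL token."""
    prob = epsilon / len(f_words) ** len(e_words)
    for e in e_words:
        prob *= sum(t.get((e, f), 0.0) for f in f_words)
    return prob
```

For the two-word case this computes exactly (t(e_1|f_1)+t(e_1|f_2)) × (t(e_2|f_1)+t(e_2|f_2)), scaled by ε / l_f^{l_e}.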
4.2 Learning Lexical Translation Models
Perplexity: measured over all translation sentence pairs:

log_2 PP = − ∑_s log_2 p(e_s|f_s)
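Corpus perplexity can be sketched as follows (assumed names; sentence probability without NULL, ε = 1):

```python
import math

def corpus_perplexity(corpus, t, epsilon=1.0):
    """log2 PP = -sum over sentence pairs of log2 p(e|f); returns PP."""
    log2_p = 0.0
    for e_sent, f_sent in corpus:
        p = epsilon / len(f_sent) ** len(e_sent)
        for e in e_sent:
            p *= sum(t.get((e, f), 0.0) for f in f_sent)
        log2_p += math.log2(p)
    return 2.0 ** (-log2_p)
```

For example, a one-pair corpus with p(e|f) = 0.5 gives perplexity 2.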
4.3 Ensuring Fluent Output
Language model P(e): use an n-gram model for the English translation sentence e.
Fluency example: "small step" (5m/2m) vs. "little step" (0.6m/0.26m).
4.3 Ensuring Fluent Output
Noisy channel model (Bayes rule):

argmax_e p(e|f) = argmax_e p(f|e) p(e) / p(f) = argmax_e p(f|e) p(e)

since p(f) is constant across candidate translations e.
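As a sketch of the resulting decision rule (tm and lm are placeholder scoring functions I introduce for illustration, not from the slides):

```python
def best_translation(candidates, f_words, tm, lm):
    """Noisy channel: argmax_e p(e|f) = argmax_e p(f|e) * p(e),
    since p(f) is constant across candidates (Bayes rule)."""
    return max(candidates, key=lambda e: tm(f_words, e) * lm(e))
```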