Computational Linguistics Seminar LING-696G Week 6.


1 Computational Linguistics Seminar LING-696G Week 6

2 Administrivia
Next time (March 2nd): return of our guest lecturer
– Bryan Heidorn, School of Information
– ISTA 455/555: Applied NLP
Today:
– IBM Model 1: we begin the "meaty" part of the seminar course

3 Chapter 4 errata
– p. 84: Text before Equation 4.5 should read "at position j to a German output at position i"; the indexes are erroneously flipped in the book.
– p. 90: Text before Equation 4.13 should read "the same simplifications as in Equation (4.10)"; the book makes an erroneous reference to Equation 4.11.
– p. 91: The two "end for" on lines 12 and 13 should be one indentation to the left.
– p. 91: The left-hand side of Equation 4.14 should be … instead of ….
– p. 93: The first sum should be a product.
– p. 93: The result of the computation of Equation 4.20 should be … instead of ….
– p. 93: The text should clarify that in Equation 4.21 perplexity is computed over the whole corpus, not as an average per sentence.

4 4.2 Learning Lexical Translation Models
Expectation Maximization (EM) algorithm
IBM Model 1 worked example
Critical ideas:
– Begin with no assumptions (equal probability of translation)
– Run over the training data again and again
– Each time, update the probabilities
– Convergence

5 4.2 Learning Lexical Translation Models
e = English sentence e_1 … e_l_e
f = foreign sentence f_1 … f_l_f
a = a (specific) alignment
p(e,a|f) = probability of translating f into e with alignment a:
p(e,a|f) = ε / (l_f + 1)^l_e × t(e_1|f_a(1)) × t(e_2|f_a(2)) × … × t(e_l_e|f_a(l_e))
Scaling by the number of possible alignments: assuming each of NULL + f_j can be freely mapped onto any single e_i, there are (l_f + 1)^l_e alignments.
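The quantity above can be written out directly. A minimal sketch (the function name is illustrative; ε and the NULL-at-position-0 convention follow standard IBM Model 1):

```python
def p_e_a_given_f(e_words, f_words, a, t, epsilon=1.0):
    """p(e,a|f) = epsilon / (l_f + 1)**l_e * prod_i t(e_i | f_a(i)).

    a[i] is an index into [NULL] + f_words for English word i,
    so a[i] == 0 aligns e_i to the NULL word.
    t is a dict mapping (english_word, foreign_word) to a probability.
    """
    l_e, l_f = len(e_words), len(f_words)
    f_with_null = [None] + f_words          # position 0 is NULL
    prob = epsilon / (l_f + 1) ** l_e       # scale by # of alignments
    for i, e in enumerate(e_words):
        prob *= t[(e, f_with_null[a[i]])]
    return prob
```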

6 4.2 Learning Lexical Translation Models
e = English sentence e_1 … e_l_e
f = foreign sentence f_1 … f_l_f
p(e|f) = probability of translating f into e with any alignment:
p(e|f) = ∑_a p(e,a|f) = ε / (l_f + 1)^l_e × ∏_i ∑_j t(e_i|f_j)
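Because the sum over all alignments factorizes into a product of per-word sums, p(e|f) is cheap to compute. A sketch (function name illustrative; j ranges over NULL plus the foreign words, as in the model):

```python
def p_e_given_f(e_words, f_words, t, epsilon=1.0):
    """p(e|f) = epsilon / (l_f + 1)**l_e * prod_i sum_j t(e_i | f_j)."""
    l_e, l_f = len(e_words), len(f_words)
    f_with_null = [None] + f_words          # position 0 is the NULL word
    prob = epsilon / (l_f + 1) ** l_e
    for e in e_words:
        # sum over all foreign positions e could align to
        prob *= sum(t.get((e, f), 0.0) for f in f_with_null)
    return prob
```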

7 4.2 Learning Lexical Translation Models
p(a|e,f) = probability of alignment a given the sentence pair (e,f):
p(a|e,f) = p(e,a|f) / p(e|f)

8 4.2 Learning Lexical Translation Models

9 t(e|f)

10 4.2 Learning Lexical Translation Models

11 Words:
– German: das, ein, Haus, Buch
– English: the, a, house, book
Corpus:
– das Haus / the house
– das Buch / the book
– ein Buch / a book
Initialization step: each German word can be translated into one of 4 English words, so let the translation probability t(e_i|f_j) = 0.25 for all combinations of e_i and f_j.
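The initialization step as code (the names `german`, `english`, and `t` are illustrative, not from the course download):

```python
# Uniform initialization: every German word may translate to any of the
# four English words with probability 1/4 = 0.25.
german = ["das", "ein", "Haus", "Buch"]
english = ["the", "a", "house", "book"]

t = {(e, f): 1.0 / len(english) for f in german for e in english}
```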

12 4.2 Learning Lexical Translation Models
Assume free alignment.
das Haus / the house: t(the|das)=0.25, t(house|das)=0.25, t(the|Haus)=0.25, t(house|Haus)=0.25
– normalizers: ∑_j t(the|f_j) = 0.5, ∑_j t(house|f_j) = 0.5
– counts: c(the|das) = 0.25/0.5, c(the|Haus) = 0.25/0.5, c(house|das) = 0.25/0.5, c(house|Haus) = 0.25/0.5
das Buch / the book: t(the|das)=0.25, t(book|das)=0.25, t(the|Buch)=0.25, t(book|Buch)=0.25
– normalizers: ∑_j t(the|f_j) = 0.5, ∑_j t(book|f_j) = 0.5
– counts: c(the|das) = 0.5 + 0.25/0.5 (running total), c(the|Buch) = 0.25/0.5, c(book|das) = 0.25/0.5, c(book|Buch) = 0.25/0.5
ein Buch / a book: t(a|ein)=0.25, t(book|ein)=0.25, t(a|Buch)=0.25, t(book|Buch)=0.25
– normalizers: ∑_j t(a|f_j) = 0.5, ∑_j t(book|f_j) = 0.5
– counts: c(a|ein) = 0.25/0.5, c(a|Buch) = 0.25/0.5, c(book|ein) = 0.25/0.5, c(book|Buch) = 0.5 + 0.25/0.5 (running total)

13 4.2 Learning Lexical Translation Models
Update probabilities: t(e_i|f_j) = c(e_i|f_j) / total(f_j)
Counts: c(the|das) = 1.0, c(the|Haus) = 0.5, c(house|das) = 0.5, c(house|Haus) = 0.5, c(the|Buch) = 0.5, c(book|das) = 0.5, c(a|ein) = 0.5, c(a|Buch) = 0.5, c(book|ein) = 0.5, c(book|Buch) = 1.0
Totals: total(das) = 2.0, total(ein) = 1.0, total(Haus) = 1.0, total(Buch) = 2.0
Translation of das: t(the|das) = 0.5, t(house|das) = 0.25, t(book|das) = 0.25
Translation of ein: t(a|ein) = 0.5, t(book|ein) = 0.5
Translation of Haus: t(the|Haus) = 0.5, t(house|Haus) = 0.5
Translation of Buch: t(the|Buch) = 0.25, t(book|Buch) = 0.5, t(a|Buch) = 0.25
Per sentence pair:
– das Haus / the house: t(the|das)=0.5, t(house|das)=0.25, t(the|Haus)=0.5, t(house|Haus)=0.5
– das Buch / the book: t(the|das)=0.5, t(book|das)=0.25, t(the|Buch)=0.25, t(book|Buch)=0.5
– ein Buch / a book: t(a|ein)=0.5, t(book|ein)=0.5, t(a|Buch)=0.25, t(book|Buch)=0.5

14 4.2 Learning Lexical Translation Models
2nd iteration (using the updated t values):
das Haus / the house: t(the|das)=0.5, t(house|das)=0.25, t(the|Haus)=0.5, t(house|Haus)=0.5
– normalizers: ∑_j t(the|f_j) = 1.0, ∑_j t(house|f_j) = 0.75
– counts: c(the|das) = 0.5/1.0, c(the|Haus) = 0.5/1.0, c(house|das) = 0.25/0.75, c(house|Haus) = 0.5/0.75
das Buch / the book: t(the|das)=0.5, t(book|das)=0.25, t(the|Buch)=0.25, t(book|Buch)=0.5
– normalizers: ∑_j t(the|f_j) = 0.75, ∑_j t(book|f_j) = 0.75
– counts: c(the|das) = 0.5 + 0.5/0.75 (running total), c(the|Buch) = 0.25/0.75, c(book|das) = 0.25/0.75, c(book|Buch) = 0.5/0.75
ein Buch / a book: t(a|ein)=0.5, t(book|ein)=0.5, t(a|Buch)=0.25, t(book|Buch)=0.5
– normalizers: ∑_j t(a|f_j) = 0.75, ∑_j t(book|f_j) = 1.0
– counts: c(a|ein) = 0.5/0.75, c(a|Buch) = 0.25/0.75, c(book|ein) = 0.5/1.0, c(book|Buch) = 0.67 + 0.5/1.0 (running total)

15 4.2 Learning Lexical Translation Models
Update probabilities (2nd iteration): t(e_i|f_j) = c(e_i|f_j) / total(f_j)
Counts: c(the|das) = 1.17, c(the|Haus) = 0.5, c(house|das) = 0.33, c(house|Haus) = 0.67, c(the|Buch) = 0.33, c(book|das) = 0.33, c(a|ein) = 0.67, c(a|Buch) = 0.33, c(book|ein) = 0.5, c(book|Buch) = 1.17
Totals: total(das) = 1.83, total(ein) = 1.17, total(Haus) = 1.17, total(Buch) = 1.83
Translation of das: t(the|das) = 0.64, t(house|das) = 0.18, t(book|das) = 0.18
Translation of ein: t(a|ein) = 0.57, t(book|ein) = 0.43
Translation of Haus: t(the|Haus) = 0.43, t(house|Haus) = 0.57
Translation of Buch: t(the|Buch) = 0.18, t(book|Buch) = 0.64, t(a|Buch) = 0.18
Per sentence pair:
– das Haus / the house: t(the|das)=0.64, t(house|das)=0.18, t(the|Haus)=0.43, t(house|Haus)=0.57
– das Buch / the book: t(the|das)=0.64, t(book|das)=0.18, t(the|Buch)=0.18, t(book|Buch)=0.64
– ein Buch / a book: t(a|ein)=0.57, t(book|ein)=0.43, t(a|Buch)=0.18, t(book|Buch)=0.64

16 4.2 Learning Lexical Translation Models
3rd iteration:
das Haus / the house: t(the|das)=0.64, t(house|das)=0.18, t(the|Haus)=0.43, t(house|Haus)=0.57
– normalizers: ∑_j t(the|f_j) = 1.06, ∑_j t(house|f_j) = 0.75
– counts: c(the|das) = 0.64/1.06, c(the|Haus) = 0.43/1.06, c(house|das) = 0.18/0.75, c(house|Haus) = 0.57/0.75
das Buch / the book: t(the|das)=0.64, t(book|das)=0.18, t(the|Buch)=0.18, t(book|Buch)=0.64
– normalizers: ∑_j t(the|f_j) = 0.82, ∑_j t(book|f_j) = 0.82
– counts: c(the|das) = 0.6 + 0.64/0.82 (running total), c(the|Buch) = 0.18/0.82, c(book|das) = 0.18/0.82, c(book|Buch) = 0.64/0.82
ein Buch / a book: t(a|ein)=0.57, t(book|ein)=0.43, t(a|Buch)=0.18, t(book|Buch)=0.64
– normalizers: ∑_j t(a|f_j) = 0.75, ∑_j t(book|f_j) = 1.06
– counts: c(a|ein) = 0.57/0.75, c(a|Buch) = 0.18/0.75, c(book|ein) = 0.43/1.06, c(book|Buch) = 0.78 + 0.64/1.06 (running total)

17 4.2 Learning Lexical Translation Models
Update probabilities (3rd iteration): t(e_i|f_j) = c(e_i|f_j) / total(f_j)
Counts: c(the|das) = 1.38, c(the|Haus) = 0.4, c(house|das) = 0.24, c(house|Haus) = 0.76, c(the|Buch) = 0.22, c(book|das) = 0.22, c(a|ein) = 0.76, c(a|Buch) = 0.24, c(book|ein) = 0.4, c(book|Buch) = 1.38
Totals: total(das) = 1.84, total(ein) = 1.16, total(Haus) = 1.16, total(Buch) = 1.84
Translation of das: t(the|das) = 0.75, t(house|das) = 0.13, t(book|das) = 0.12
Translation of ein: t(a|ein) = 0.65, t(book|ein) = 0.35
Translation of Haus: t(the|Haus) = 0.35, t(house|Haus) = 0.65
Translation of Buch: t(the|Buch) = 0.12, t(book|Buch) = 0.75, t(a|Buch) = 0.13
Per sentence pair:
– das Haus / the house: t(the|das)=0.75, t(house|das)=0.13, t(the|Haus)=0.35, t(house|Haus)=0.65
– das Buch / the book: t(the|das)=0.75, t(book|das)=0.12, t(the|Buch)=0.12, t(book|Buch)=0.75
– ein Buch / a book: t(a|ein)=0.65, t(book|ein)=0.35, t(a|Buch)=0.13, t(book|Buch)=0.75

18 4.2 Learning Lexical Translation Models
4th iteration: the per-pair t values so far, from initialization onward:
Initialization (all 0.25):
– das Haus / the house: t(the|das)=0.25, t(house|das)=0.25, t(the|Haus)=0.25, t(house|Haus)=0.25
– das Buch / the book: t(the|das)=0.25, t(book|das)=0.25, t(the|Buch)=0.25, t(book|Buch)=0.25
– ein Buch / a book: t(a|ein)=0.25, t(book|ein)=0.25, t(a|Buch)=0.25, t(book|Buch)=0.25
After 1st iteration:
– das Haus / the house: t(the|das)=0.5, t(house|das)=0.25, t(the|Haus)=0.5, t(house|Haus)=0.5
– das Buch / the book: t(the|das)=0.5, t(book|das)=0.25, t(the|Buch)=0.25, t(book|Buch)=0.5
– ein Buch / a book: t(a|ein)=0.5, t(book|ein)=0.5, t(a|Buch)=0.25, t(book|Buch)=0.5
After 2nd iteration:
– das Haus / the house: t(the|das)=0.64, t(house|das)=0.18, t(the|Haus)=0.43, t(house|Haus)=0.57
– das Buch / the book: t(the|das)=0.64, t(book|das)=0.18, t(the|Buch)=0.18, t(book|Buch)=0.64
– ein Buch / a book: t(a|ein)=0.57, t(book|ein)=0.43, t(a|Buch)=0.18, t(book|Buch)=0.64
After 3rd iteration:
– das Haus / the house: t(the|das)=0.75, t(house|das)=0.13, t(the|Haus)=0.35, t(house|Haus)=0.65
– das Buch / the book: t(the|das)=0.75, t(book|das)=0.12, t(the|Buch)=0.12, t(book|Buch)=0.75
– ein Buch / a book: t(a|ein)=0.65, t(book|ein)=0.35, t(a|Buch)=0.13, t(book|Buch)=0.75
The probabilities are converging: t(the|das), t(book|Buch), t(house|Haus), and t(a|ein) all grow toward 1.

19 4.2 Learning Lexical Translation Models

20

21

22

23

24 File: ibm1s.py (Simplified IBM Model 1)
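The course file ibm1s.py itself is not reproduced in this transcript. The following is a minimal sketch of a simplified IBM Model 1 EM trainer in its spirit (assumed names and structure, no NULL word, uniform initialization, run on the slides' toy corpus):

```python
from collections import defaultdict

# Toy corpus from the slides: (foreign, English) sentence pairs.
corpus = [("das Haus", "the house"),
          ("das Buch", "the book"),
          ("ein Buch", "a book")]
pairs = [(f.split(), e.split()) for f, e in corpus]

f_vocab = {fw for fs, _ in pairs for fw in fs}
e_vocab = {ew for _, es in pairs for ew in es}

# Initialization: uniform t(e|f) = 1/|e_vocab| for every word pair.
t = {(ew, fw): 1.0 / len(e_vocab) for fw in f_vocab for ew in e_vocab}

for iteration in range(10):
    count = defaultdict(float)   # c(e|f): expected counts
    total = defaultdict(float)   # total(f): per-foreign-word normalizer
    # E-step: collect fractional counts from every sentence pair.
    for fs, es in pairs:
        for ew in es:
            z = sum(t[(ew, fw)] for fw in fs)   # sum_j t(ew|fw_j)
            for fw in fs:
                c = t[(ew, fw)] / z             # fractional count
                count[(ew, fw)] += c
                total[fw] += c
    # M-step: re-estimate t(e|f) = c(e|f) / total(f).
    for ew, fw in t:
        t[(ew, fw)] = count[(ew, fw)] / total[fw]
```

After one pass this reproduces the worked example's first update (t(the|das) = 0.5), and after a few more iterations the table converges toward the intuitive word translations.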

25 4.2 Learning Lexical Translation Models
E-step normalizer: ∑_j t(ew|fw_j)
Fractional count collected: t(ew|fw) / ∑_j t(ew|fw_j)

26 4.2 Learning Lexical Translation Models
– threshold: hardcoded here, a parameter variable in the download
– precision: 2 decimal places
– convergence test: true if ∆t(ew|fw) ≤ threshold
– save t(ew|fw) to the front of t_history(ew|fw)
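A sketch of the convergence test the slide describes (assumed structure: `t_history` maps (ew, fw) to a list whose newest value is at the front; here `threshold` is a parameter rather than hardcoded):

```python
def converged(t_history, threshold=0.01):
    """True once every t(ew|fw) changed by at most `threshold`
    between the last two iterations."""
    return all(len(hist) >= 2 and abs(hist[0] - hist[1]) <= threshold
               for hist in t_history.values())
```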

27 4.2 Learning Lexical Translation Models

28 Number of iterations needed until the difference in t(e|f) between successive iterations is < threshold. Note: simply duplicating the data doesn't decrease the number of iterations.

29 4.2 Learning Lexical Translation Models Number of iterations needed until the difference in t(e|f) between successive iterations is < threshold: the plot is a "straight line".

30 4.2 Learning Lexical Translation Models
Sentence translation probability (assuming no NULL word):
e = English sentence, f = foreign sentence
p(e|f) = probability of translating f into e with any alignment:
p(e|f) = ε / (l_f)^l_e × ∏_i ∑_j t(e_i|f_j)

31 4.2 Learning Lexical Translation Models
For a two-word example, the product of sums expands to:
(t(e_1|f_1) + t(e_1|f_2)) × (t(e_2|f_1) + t(e_2|f_2))
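That product-of-sums score can be sketched directly (function name illustrative; no NULL word, and the ε/(l_f)^l_e scaling factor is omitted so only the proportional score is returned):

```python
from math import prod

def sentence_score(e_words, f_words, t):
    """Unnormalized p(e|f): product over English words of the
    sum over foreign words of t(e|f)."""
    return prod(sum(t.get((e, f), 0.0) for f in f_words)
                for e in e_words)
```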

32 4.2 Learning Lexical Translation Models
Perplexity: measured over all translation sentence pairs (the whole corpus, not per sentence):
log_2 PP = −∑_s log_2 p(e_s|f_s)
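The corpus-level perplexity computation is a one-liner; a sketch (function name illustrative, taking the per-sentence probabilities p(e_s|f_s) as input):

```python
import math

def perplexity(sentence_probs):
    """PP = 2 ** (-sum_s log2 p(e_s|f_s)), over the whole corpus."""
    return 2.0 ** (-sum(math.log2(p) for p in sentence_probs))
```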

33 4.3 Ensuring Fluent Output
Language model p(e): use an n-gram model for the English translation sentence e
Example: "small step" (5m/2m) vs. "little step" (0.6m/0.26m)

34 4.3 Ensuring Fluent Output
Noisy channel model (Bayes' Rule): p(e|f) = p(f|e) p(e) / p(f); the best translation maximizes p(f|e) p(e)

