Computational Linguistics Seminar LING-696G Week 6
Administrivia
Next time (March 2nd): return of our guest lecturer, Bryan Heidorn (School of Information; ISTA 455/555: Applied NLP).
Today: IBM Model 1; we begin the "meaty" part of the seminar course.
Chapter 4 errata
- p. 84: the text before Equation 4.5 should read "at position j to a German output at position i"; the indexes are erroneously flipped in the book.
- p. 90: the text before Equation 4.13 should read "the same simplifications as in Equation (4.10)"; the book makes an erroneous reference to Equation 4.11.
- p. 91: the two "end for" on lines 12 and 13 should be one indentation to the left.
- p. 91: the left-hand side of Equation 4.14 should be corrected.
- p. 93: the first sum should be a product.
- p. 93: the result of the computation of Equation 4.20 should be corrected.
- p. 93: the text should clarify that in Equation 4.21 perplexity is computed over the whole corpus, not as an average per sentence.
4.2 Learning Lexical Translation Models
Expectation Maximization (EM) algorithm; IBM Model 1 worked example.
Critical ideas:
- Begin with no assumptions (equal probability for every translation)
- Run over the training data again and again
- On each pass, update the translation probabilities
- Convergence
4.2 Learning Lexical Translation Models
e = English sentence e_1 … e_{l_e}
f = foreign sentence f_1 … f_{l_f}
a = a (specific) alignment
p(e,a|f) = probability of translating f into e with alignment a:

p(e,a|f) = ε / (l_f + 1)^{l_e} × t(e_1|f_{a(1)}) t(e_2|f_{a(2)}) … t(e_{l_e}|f_{a(l_e)})

The factor ε / (l_f + 1)^{l_e} scales by the number of possible alignments, assuming each of NULL + the f_j can be freely mapped onto any single e_i.
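As a sketch (function and variable names are my own, not from the slides), p(e,a|f) can be computed directly from a t-table, assuming position 0 of the foreign sentence is the NULL token and ε = 1:

```python
def p_e_a_given_f(e_words, f_words, alignment, t, epsilon=1.0):
    """p(e,a|f) = epsilon / (l_f + 1)**l_e * prod_j t(e_j | f_a(j)).

    alignment[j] gives the foreign position aligned to English
    position j; position 0 is the NULL token.
    """
    l_e, l_f = len(e_words), len(f_words)
    prob = epsilon / (l_f + 1) ** l_e          # alignment scaling factor
    f_with_null = ["NULL"] + f_words           # position 0 is NULL
    for j, e_word in enumerate(e_words):
        prob *= t[(e_word, f_with_null[alignment[j]])]
    return prob
```

With the uniform initialization t = 0.25 used in the worked example, aligning "the house" to "das Haus" word for word gives 1/9 × 0.25 × 0.25 ≈ 0.0069.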
4.2 Learning Lexical Translation Models
e = English sentence e_1 … e_{l_e}
f = foreign sentence f_1 … f_{l_f}
p(e|f) = probability of translating f into e with any alignment. Summing p(e,a|f) over all alignments, the sum and product can be swapped:

p(e|f) = ε / (l_f + 1)^{l_e} × ∏_{j=1..l_e} ∑_{i=0..l_f} t(e_j|f_i)
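A corresponding sketch for p(e|f) (again with my own names, ε = 1, NULL at position 0): the sum over all alignments reduces to a product of per-word sums.

```python
def p_e_given_f(e_words, f_words, t, epsilon=1.0):
    """p(e|f) = epsilon / (l_f+1)**l_e * prod_j sum_i t(e_j|f_i),
    where i ranges over NULL plus the foreign words."""
    l_e, l_f = len(e_words), len(f_words)
    prob = epsilon / (l_f + 1) ** l_e
    for e_word in e_words:
        prob *= sum(t.get((e_word, f_word), 0.0)
                    for f_word in ["NULL"] + f_words)
    return prob
```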
4.2 Learning Lexical Translation Models
p(a|e,f) = probability of alignment a given the translation sentence pair e and f:

p(a|e,f) = p(e,a|f) / p(e|f)
Worked example.
Words:
- German: das, ein, Haus, Buch
- English: the, a, house, book
Corpus:
- das Haus / the house
- das Buch / the book
- ein Buch / a book
Initialization step: each German word can be translated into one of 4 English words, so let the translation probability t(e_i|f_j) = 0.25 for all combinations of e_i and f_j.
4.2 Learning Lexical Translation Models
Assume free alignment. E-step, iteration 1 (current t values per sentence pair, then fractional counts; running totals carry across pairs):

das Haus / the house: t(the|das) = 0.25, t(house|das) = 0.25, t(the|Haus) = 0.25, t(house|Haus) = 0.25
  normalizers: Σ_j t(the|f_j) = 0.5, Σ_j t(house|f_j) = 0.5
  c(the|das) = 0.25/0.5, c(the|Haus) = 0.25/0.5
  c(house|das) = 0.25/0.5, c(house|Haus) = 0.25/0.5

das Buch / the book: t(the|das) = 0.25, t(book|das) = 0.25, t(the|Buch) = 0.25, t(book|Buch) = 0.25
  normalizers: Σ_j t(the|f_j) = 0.5, Σ_j t(book|f_j) = 0.5
  c(the|das) = 0.5 + 0.25/0.5 (running total), c(the|Buch) = 0.25/0.5
  c(book|das) = 0.25/0.5, c(book|Buch) = 0.25/0.5

ein Buch / a book: t(a|ein) = 0.25, t(book|ein) = 0.25, t(a|Buch) = 0.25, t(book|Buch) = 0.25
  normalizers: Σ_j t(a|f_j) = 0.5, Σ_j t(book|f_j) = 0.5
  c(a|ein) = 0.25/0.5, c(a|Buch) = 0.25/0.5
  c(book|ein) = 0.25/0.5, c(book|Buch) = 0.5 + 0.25/0.5 (running total)
4.2 Learning Lexical Translation Models
Update probabilities (M-step), iteration 1: t(e_i|f_j) = c(e_i|f_j) / total(f_j)

Accumulated counts:
c(the|das) = 1.0, c(house|das) = 0.5, c(book|das) = 0.5, total(das) = 2.0
c(the|Haus) = 0.5, c(house|Haus) = 0.5, total(Haus) = 1.0
c(the|Buch) = 0.5, c(book|Buch) = 1.0, c(a|Buch) = 0.5, total(Buch) = 2.0
c(a|ein) = 0.5, c(book|ein) = 0.5, total(ein) = 1.0

New translation probabilities:
das: t(the|das) = 0.5, t(house|das) = 0.25, t(book|das) = 0.25
Haus: t(the|Haus) = 0.5, t(house|Haus) = 0.5
Buch: t(the|Buch) = 0.25, t(book|Buch) = 0.5, t(a|Buch) = 0.25
ein: t(a|ein) = 0.5, t(book|ein) = 0.5
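The iteration-1 numbers above can be checked mechanically. A minimal sketch of one EM iteration on the toy corpus (assumed names; no NULL token, matching the slides):

```python
from collections import defaultdict

corpus = [(["the", "house"], ["das", "Haus"]),
          (["the", "book"], ["das", "Buch"]),
          (["a", "book"], ["ein", "Buch"])]

def em_iteration(corpus, t):
    """One E-step (collect fractional counts) plus M-step (renormalize)."""
    count = defaultdict(float)
    total = defaultdict(float)
    for e_sent, f_sent in corpus:
        for e in e_sent:
            z = sum(t[(e, f)] for f in f_sent)   # normalization for e
            for f in f_sent:
                count[(e, f)] += t[(e, f)] / z
                total[f] += t[(e, f)] / z
    return {(e, f): c / total[f] for (e, f), c in count.items()}

t0 = defaultdict(lambda: 0.25)                   # uniform initialization
t1 = em_iteration(corpus, t0)                    # t(the|das) becomes 0.5, etc.
```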
4.2 Learning Lexical Translation Models
2nd iteration, E-step:

das Haus / the house: t(the|das) = 0.5, t(house|das) = 0.25, t(the|Haus) = 0.5, t(house|Haus) = 0.5
  normalizers: Σ_j t(the|f_j) = 1.0, Σ_j t(house|f_j) = 0.75
  c(the|das) = 0.5/1.0, c(the|Haus) = 0.5/1.0
  c(house|das) = 0.25/0.75, c(house|Haus) = 0.5/0.75

das Buch / the book: t(the|das) = 0.5, t(book|das) = 0.25, t(the|Buch) = 0.25, t(book|Buch) = 0.5
  normalizers: Σ_j t(the|f_j) = 0.75, Σ_j t(book|f_j) = 0.75
  c(the|das) = 0.5 + 0.5/0.75 (running total), c(the|Buch) = 0.25/0.75
  c(book|das) = 0.25/0.75, c(book|Buch) = 0.5/0.75

ein Buch / a book: t(a|ein) = 0.5, t(book|ein) = 0.5, t(a|Buch) = 0.25, t(book|Buch) = 0.5
  normalizers: Σ_j t(a|f_j) = 0.75, Σ_j t(book|f_j) = 1.0
  c(a|ein) = 0.5/0.75, c(a|Buch) = 0.25/0.75
  c(book|ein) = 0.5/1.0, c(book|Buch) = 0.67 + 0.5/1.0 (running total)
4.2 Learning Lexical Translation Models
Update probabilities (2nd iteration): t(e_i|f_j) = c(e_i|f_j) / total(f_j)

Accumulated counts:
c(the|das) = 1.17, c(house|das) = 0.33, c(book|das) = 0.33, total(das) = 1.83
c(the|Haus) = 0.5, c(house|Haus) = 0.67, total(Haus) = 1.17
c(the|Buch) = 0.33, c(book|Buch) = 1.17, c(a|Buch) = 0.33, total(Buch) = 1.83
c(a|ein) = 0.67, c(book|ein) = 0.5, total(ein) = 1.17

New translation probabilities:
das: t(the|das) = 0.64, t(house|das) = 0.18, t(book|das) = 0.18
Haus: t(the|Haus) = 0.43, t(house|Haus) = 0.57
Buch: t(the|Buch) = 0.18, t(book|Buch) = 0.64, t(a|Buch) = 0.18
ein: t(a|ein) = 0.57, t(book|ein) = 0.43
4.2 Learning Lexical Translation Models
3rd iteration, E-step:

das Haus / the house: t(the|das) = 0.64, t(house|das) = 0.18, t(the|Haus) = 0.43, t(house|Haus) = 0.57
  normalizers: Σ_j t(the|f_j) = 1.06, Σ_j t(house|f_j) = 0.75
  c(the|das) = 0.64/1.06, c(the|Haus) = 0.43/1.06
  c(house|das) = 0.18/0.75, c(house|Haus) = 0.57/0.75

das Buch / the book: t(the|das) = 0.64, t(book|das) = 0.18, t(the|Buch) = 0.18, t(book|Buch) = 0.64
  normalizers: Σ_j t(the|f_j) = 0.82, Σ_j t(book|f_j) = 0.82
  c(the|das) = 0.6 + 0.64/0.82 (running total), c(the|Buch) = 0.18/0.82
  c(book|das) = 0.18/0.82, c(book|Buch) = 0.64/0.82

ein Buch / a book: t(a|ein) = 0.57, t(book|ein) = 0.43, t(a|Buch) = 0.18, t(book|Buch) = 0.64
  normalizers: Σ_j t(a|f_j) = 0.75, Σ_j t(book|f_j) = 1.06
  c(a|ein) = 0.57/0.75, c(a|Buch) = 0.18/0.75
  c(book|ein) = 0.43/1.06, c(book|Buch) = 0.78 + 0.64/1.06 (running total)
4.2 Learning Lexical Translation Models
Update probabilities (3rd iteration): t(e_i|f_j) = c(e_i|f_j) / total(f_j)

Accumulated counts:
c(the|das) = 1.38, c(house|das) = 0.24, c(book|das) = 0.22, total(das) = 1.84
c(the|Haus) = 0.4, c(house|Haus) = 0.76, total(Haus) = 1.16
c(the|Buch) = 0.22, c(book|Buch) = 1.38, c(a|Buch) = 0.24, total(Buch) = 1.84
c(a|ein) = 0.76, c(book|ein) = 0.4, total(ein) = 1.16

New translation probabilities:
das: t(the|das) = 0.75, t(house|das) = 0.13, t(book|das) = 0.12
Haus: t(the|Haus) = 0.35, t(house|Haus) = 0.65
Buch: t(the|Buch) = 0.12, t(book|Buch) = 0.75, t(a|Buch) = 0.13
ein: t(a|ein) = 0.65, t(book|ein) = 0.35
4.2 Learning Lexical Translation Models
4th iteration: the t(e|f) values from initialization through iteration 3, side by side:

                init   iter 1  iter 2  iter 3
t(the|das)      0.25   0.5     0.64    0.75
t(house|das)    0.25   0.25    0.18    0.13
t(book|das)     0.25   0.25    0.18    0.12
t(the|Haus)     0.25   0.5     0.43    0.35
t(house|Haus)   0.25   0.5     0.57    0.65
t(the|Buch)     0.25   0.25    0.18    0.12
t(book|Buch)    0.25   0.5     0.64    0.75
t(a|Buch)       0.25   0.25    0.18    0.13
t(a|ein)        0.25   0.5     0.57    0.65
t(book|ein)     0.25   0.5     0.43    0.35
File: ibm1s.py (Simplified IBM Model 1)
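The code of ibm1s.py itself is not reproduced here; the following is a minimal reimplementation sketch of a simplified IBM Model 1 trainer under the same assumptions (no NULL token, uniform initialization, convergence when no t(e|f) changes by more than a threshold). Names are my own.

```python
from collections import defaultdict

def train_ibm1(corpus, threshold=0.01, max_iter=1000):
    """EM training for simplified IBM Model 1 (no NULL token).

    corpus: list of (english_words, foreign_words) pairs.
    Stops when every t(e|f) changes by at most `threshold`
    between successive iterations.
    """
    e_vocab = {e for e_sent, _ in corpus for e in e_sent}
    t = defaultdict(lambda: 1.0 / len(e_vocab))  # uniform initialization
    for _ in range(max_iter):
        count = defaultdict(float)               # E-step: fractional counts
        total = defaultdict(float)
        for e_sent, f_sent in corpus:
            for e in e_sent:
                z = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    count[(e, f)] += t[(e, f)] / z
                    total[f] += t[(e, f)] / z
        delta = 0.0                              # M-step: renormalize
        for (e, f), c in count.items():
            new = c / total[f]
            delta = max(delta, abs(new - t[(e, f)]))
            t[(e, f)] = new
        if delta <= threshold:                   # converged
            break
    return dict(t)
```

On the three-sentence toy corpus, training with a small threshold drives t(the|das), t(book|Buch), t(house|Haus), and t(a|ein) toward 1.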
4.2 Learning Lexical Translation Models
In the E-step, for each English word ew the code first computes the normalization total ∑_j t(ew|fw_j) over the foreign words fw_j of the sentence, then accumulates the fractional count t(ew|fw) / ∑_j t(ew|fw_j) for each pair (ew, fw).
4.2 Learning Lexical Translation Models
Implementation notes:
- The convergence threshold is hardcoded here; it is a parameter variable in the download.
- Probabilities are compared at a precision of 2 (decimal places).
- The convergence test is true if Δt(ew|fw) ≤ threshold.
- Each iteration saves t(ew|fw) to the front of a history, t_history(ew|fw).
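The convergence test can be sketched like this (assumed names; rounding to a precision of 2 mirrors the slide's note):

```python
def converged(t_new, t_old, threshold, precision=2):
    """True if every rounded t(e|f) changed by at most `threshold`
    since the previous iteration."""
    return all(abs(round(t_new[ef], precision) -
                   round(t_old.get(ef, 0.0), precision)) <= threshold
               for ef in t_new)
```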
Number of iterations needed for the difference in t(e|f) between successive iterations to fall below the threshold. Note: simply duplicating the training data does not decrease the number of iterations.
4.2 Learning Lexical Translation Models
Number of iterations needed for the difference in t(e|f) between successive iterations to fall below the threshold: the plotted relationship is a "straight line".
4.2 Learning Lexical Translation Models
Sentence translation probability, assuming no NULL token:
e = English sentence, f = foreign sentence
p(e|f) = probability of translating f into e with any alignment:

p(e|f) = ε / l_f^{l_e} × ∏_{j=1..l_e} ∑_{i=1..l_f} t(e_j|f_i)
4.2 Learning Lexical Translation Models
For a two-word example (l_e = l_f = 2), the product expands to:

(t(e_1|f_1) + t(e_1|f_2)) × (t(e_2|f_1) + t(e_2|f_2))
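A sketch of this sentence translation probability without NULL (my own function name; ε = 1 by default):

```python
def sentence_prob(e_words, f_words, t, epsilon=1.0):
    """p(e|f) = epsilon / l_f**l_e * prod_j sum_i t(e_j|f_i), no NULL token."""
    prob = epsilon / len(f_words) ** len(e_words)
    for e in e_words:
        prob *= sum(t.get((e, f), 0.0) for f in f_words)
    return prob
```

For the two-word case this computes exactly (t(e_1|f_1)+t(e_1|f_2)) × (t(e_2|f_1)+t(e_2|f_2)), scaled by ε / l_f^{l_e}.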
4.2 Learning Lexical Translation Models
Perplexity: measured over all translation sentence pairs:

log_2 PP = − ∑_s log_2 p(e_s|f_s)
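Corpus perplexity can be sketched as follows (assumed names; sentence probability without NULL, ε = 1):

```python
import math

def corpus_perplexity(corpus, t, epsilon=1.0):
    """log2 PP = -sum over sentence pairs of log2 p(e|f); returns PP."""
    log2_p = 0.0
    for e_sent, f_sent in corpus:
        p = epsilon / len(f_sent) ** len(e_sent)
        for e in e_sent:
            p *= sum(t.get((e, f), 0.0) for f in f_sent)
        log2_p += math.log2(p)
    return 2.0 ** (-log2_p)
```

For example, a one-pair corpus with p(e|f) = 0.5 gives perplexity 2.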
4.3 Ensuring Fluent Output
Language model P(e): use an n-gram model for the English translation sentence e.
Fluency example: "small step" (5m/2m) vs. "little step" (0.6m/0.26m).
4.3 Ensuring Fluent Output
Noisy channel model (Bayes rule):

argmax_e p(e|f) = argmax_e p(f|e) p(e) / p(f) = argmax_e p(f|e) p(e)

since p(f) is constant across candidate translations e.
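As a sketch of the resulting decision rule (tm and lm are placeholder scoring functions I introduce for illustration, not from the slides):

```python
def best_translation(candidates, f_words, tm, lm):
    """Noisy channel: argmax_e p(e|f) = argmax_e p(f|e) * p(e),
    since p(f) is constant across candidates (Bayes rule)."""
    return max(candidates, key=lambda e: tm(f_words, e) * lm(e))
```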