Published by Lambert Barber. Modified over 8 years ago.
NLP
Machine Translation
Source-channel model of communication
Parametric probabilistic models of language and translation
Given f, guess e
–Source-channel setup: e (in E) → encoder → f (in F) → decoder → e’ (in E)
–e’ = argmax P(e|f) = argmax P(f|e) P(e), with the argmax taken over e
–P(f|e) is the translation model; P(e) is the language model
Translate from French: “une fleur rouge”?

  e               p(e)   p(f|e)   p(e)*p(f|e)
  a flower red    low    high     low
  red flower a    low    high     low
  flower red a    low    high     low
  a red dog       high   low      low
  dog cat mouse   low    low      low
  a red flower    high   high     high

The winner is “a red flower”: it is fluent English (high p(e)) and also explains the French (high p(f|e)).
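The decision rule behind this example can be sketched in code. This is a toy noisy-channel decoder; the numeric probabilities below are made-up stand-ins for the table's high/low values, not trained models.

```python
# Toy noisy-channel decoder for "une fleur rouge".
# The probabilities are illustrative stand-ins for "high"/"low" in the table.

lm = {  # p(e): is this fluent English?
    "a flower red": 0.01,
    "red flower a": 0.01,
    "flower red a": 0.01,
    "a red dog":    0.30,
    "dog cat mouse": 0.01,
    "a red flower": 0.30,
}
tm = {  # p(f|e): does this English explain the French?
    "a flower red": 0.40,
    "red flower a": 0.40,
    "flower red a": 0.40,
    "a red dog":    0.01,
    "dog cat mouse": 0.001,
    "a red flower": 0.40,
}

def decode(candidates):
    # e' = argmax_e p(e) * p(f|e)
    return max(candidates, key=lambda e: lm[e] * tm[e])

print(decode(lm.keys()))  # -> a red flower
```

“a red dog” loses despite its high p(e) because the translation model vetoes it, and the word salads lose on p(e): both models must agree.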
p(English|Chinese) ~ p(Chinese|English) × p(English)
Text-to-text (summarization)
–also text-to-signal: speech recognition, OCR, spelling correction
Example (OCR)
–P(text|pixels) ∝ P(text) P(pixels|text)
Generative story example: start from the English “I watched an interesting play”; fertility replicates words (“I watched watched an interesting play play play”); distortion reorders them (“I watched watched an play play play interesting”); word translation then yields the French “J’ai vu une pièce de théâtre intéressante”.
The IBM models
–Model 1: word translation
–Model 2: local alignment
–Model 3: fertilities
–Model 4: class-based alignment
–Model 5: non-deficient algorithm (avoids overlaps and overflow)
Tokenization
Sentence alignment (1-1, 2-2, 2-1 mappings)
–Church and Gale (based on sentence length)
–Church (sequences of 4-grams, based on cognates)
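The length-based idea behind Church and Gale can be sketched as dynamic programming over alignment “beads”. The cost below is a crude character-length difference plus a penalty for non-1-1 beads; it is an illustrative stand-in for the probabilistic score in the original method, and the penalty values are made up.

```python
# Sketch of length-based sentence alignment in the spirit of Church and Gale:
# dynamic programming over alignment "beads" (1-1, 1-2, 2-1, 2-2). The cost is
# a crude character-length difference plus a bead penalty, standing in for the
# probabilistic score of the original algorithm.

BEADS = {(1, 1): 0.0, (1, 2): 2.0, (2, 1): 2.0, (2, 2): 2.0}

def align(src, tgt):
    """src, tgt: lists of sentences. Returns a list of (src_group, tgt_group)."""
    INF = float("inf")
    n, m = len(src), len(tgt)
    slen = [len(s) for s in src]
    tlen = [len(t) for t in tgt]
    best = [[INF] * (m + 1) for _ in range(n + 1)]  # best[i][j]: cost for src[:i], tgt[:j]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == INF:
                continue
            for (di, dj), penalty in BEADS.items():
                if i + di > n or j + dj > m:
                    continue
                cost = abs(sum(slen[i:i + di]) - sum(tlen[j:j + dj])) + penalty
                if best[i][j] + cost < best[i + di][j + dj]:
                    best[i + di][j + dj] = best[i][j] + cost
                    back[i + di][j + dj] = (di, dj)
    beads, i, j = [], n, m
    while i or j:  # trace back the cheapest path
        di, dj = back[i][j]
        beads.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    return beads[::-1]
```

For example, aligning ["Mr.", "Smith arrived.", "He was late."] with ["Herr Smith kam an.", "Er war spät."] groups the first two source sentences into one 2-1 bead, because their combined length matches the first target sentence.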
[Church/Gale 1993]
Alignments
–La maison bleue / The blue house
–Alignments: {1,2,3}, {1,3,2}, {1,3,3}, {1,1,1}
–All are equally likely
Conditional probabilities
–P(f|A,e) = ?
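For this three-word example, the alignments can be enumerated directly. Under a Model-1-style uniform alignment prior, P(f,a|e) is proportional to the product of word-translation probabilities t(f_j|e_aj); the t table below is made up for illustration.

```python
from itertools import product

# Enumerate every alignment of "la maison bleue" to "the blue house":
# each French position picks one English position, so there are 3^3 = 27
# alignments. The t(f|e) values are illustrative, not trained.

t = {
    ("la", "the"): 0.7,     ("la", "blue"): 0.1,     ("la", "house"): 0.1,
    ("maison", "the"): 0.1, ("maison", "blue"): 0.1, ("maison", "house"): 0.8,
    ("bleue", "the"): 0.1,  ("bleue", "blue"): 0.8,  ("bleue", "house"): 0.1,
}

def alignment_scores(f, e):
    """Score every alignment a (one English position per French word)."""
    scores = {}
    for a in product(range(len(e)), repeat=len(f)):
        p = 1.0
        for j, i in enumerate(a):
            p *= t[(f[j], e[i])]  # product of t(f_j | e_{a_j})
        scores[a] = p
    return scores

scores = alignment_scores(["la", "maison", "bleue"], ["the", "blue", "house"])
best = max(scores, key=scores.get)  # (0, 2, 1): la->the, maison->house, bleue->blue
```

Even though all 27 alignments are equally likely a priori, the word-translation probabilities make one alignment far more probable a posteriori.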
Algorithm
–Pick the length of the translation
–Choose an alignment
–Pick the French words
–That gives you P(f,A|e)
–We need P(f|e), which sums over all alignments
–Use EM (expectation-maximization) to handle the hidden alignment variables
We need p(f|e) but we don’t know the word alignments (which are assumed to be equally likely)
Corpus:
–green house / casa verde
–the house / la casa
Uniform translation model: every t(f|e) starts with the same value
E-step 1: compute the expected counts E[count(t(f|e))] for all word pairs (f_j, e_aj)
–E-step 1a: compute P(a,f|e) by multiplying all the t probabilities
–E-step 1b: normalize P(a,f|e) to get P(a|e,f)
–E-step 1c: compute expected fractional counts, weighting each count by P(a|e,f)
M-step 1: compute the MLE probability parameters by normalizing the t counts to sum to 1
E-step 2a: recompute P(a,f|e) by multiplying the updated t probabilities
Repeat until convergence
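The E-step/M-step loop can be sketched for a Model-1-style t(f|e) table on the two-pair corpus above. This is a minimal sketch: NULL words and sentence-length terms are omitted, and initialization is uniform as on the earlier slide.

```python
from collections import defaultdict

# EM for a Model-1-style word-translation table t(f|e) on the toy corpus.
# t is initialized uniformly ("uniform translation model"); NULL is omitted.

corpus = [
    (["green", "house"], ["casa", "verde"]),
    (["the", "house"], ["la", "casa"]),
]
e_vocab = {e for es, _ in corpus for e in es}
f_vocab = {f for _, fs in corpus for f in fs}
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for _ in range(10):  # EM iterations
    count = defaultdict(float)  # expected fractional counts (E-step 1c)
    total = defaultdict(float)
    for es, fs in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)  # normalizer over alignments (1a/1b)
            for e in es:
                c = t[(f, e)] / z           # fractional count P(a_j = i | e, f)
                count[(f, e)] += c
                total[e] += c
    # M-step: renormalize so that t(.|e) sums to 1
    t = {(f, e): count[(f, e)] / total[e] for f in f_vocab for e in e_vocab}

# t[("casa", "house")] climbs toward 1 across iterations
```

Because "house" co-occurs with "casa" in both sentence pairs, the expected counts gradually concentrate there, and the competing words ("verde", "la") are explained by "green" and "the".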
Distortion parameters D(i|j,l,m)
–i and j are word positions in the two sentences
–l and m are the lengths of those sentences
Example
–D(“boy”|”garçon”,5,6)
Fertility P(φ_i|e_i): how many French words each English word generates
Examples
–(a) play = pièce de théâtre
–(to) place = mettre en place
p_1 is an extra parameter that defines φ_0, the fertility of the NULL word
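The fertility step can be illustrated with a toy expansion, matching the earlier “I watched an interesting play” diagram. The fertility values below are made up for illustration (e.g. “play” → “pièce de théâtre” gives fertility 3), not learned parameters.

```python
# Toy illustration of the fertility step: each English word e_i is replicated
# phi_i times before word-for-word translation. Fertility values are made up
# to match the "play = pièce de théâtre" (3) and "watched = ai vu" (2) examples.

fertility = {"I": 1, "watched": 2, "an": 1, "interesting": 1, "play": 3}

def apply_fertility(words, fertility):
    out = []
    for w in words:
        out += [w] * fertility.get(w, 1)  # default fertility 1
    return out

apply_fertility(["I", "watched", "an", "interesting", "play"], fertility)
# -> ['I', 'watched', 'watched', 'an', 'interesting', 'play', 'play', 'play']
```

Distortion would then reorder this expanded string before each copy is translated into one French word.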
http://www.isi.edu/natural-language/mt/wkbk.rtf (an awesome tutorial by Kevin Knight)
http://www.statmt.org/ (a comprehensive site, including references to the old IBM papers, pointers to Moses, etc.)