
1 Natural Language Processing Expectation Maximization

2 Word-Based Model How to translate a word → look up in the dictionary – Haus — house, building, home, household, shell Multiple translations – some more frequent than others – for instance: house and building are the most common – special cases: the Haus of a snail is its shell
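As a minimal Python sketch, such a lexical translation distribution can be stored as a table mapping each target word to a probability; the numbers below are illustrative values, not estimates from a real corpus.

# Illustrative lexical translation distribution t(e | f) for f = "Haus".
# The probability values are invented for illustration only.
t_haus = {
    "house": 0.8,
    "building": 0.16,
    "home": 0.02,
    "household": 0.015,
    "shell": 0.005,
}
# A proper probability distribution over translations sums to 1.
assert abs(sum(t_haus.values()) - 1.0) < 1e-9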

3 Collect Statistics Look at a parallel corpus (German text along with English translation)

4 Estimate Translation Probabilities
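A standard way to do this, assuming we can count aligned word pairs in the corpus, is the relative-frequency (maximum likelihood) estimate:

t(e \mid f) = \frac{\mathrm{count}(e, f)}{\sum_{e'} \mathrm{count}(e', f)}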

5 Alignment In a parallel text (or when we translate), we align words in one language with the words in the other

6 Alignment Function Formalizing alignment with an alignment function Mapping an English target word at position i to a German source word at position j with a function a : i → j Example a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}

7 Reordering Words may be reordered during translation a : {1 → 3, 2 → 4, 3 → 2, 4 → 1}

8 One-to-Many Translation A source word may translate into multiple target words a : {1 → 1, 2 → 2, 3 → 3, 4 → 4, 5 → 4}

9 Dropping Words Words may be dropped when translated (German article das is dropped) a : {1 → 2, 2 → 3, 3 → 4}

10 Inserting Words Words may be added during translation – The English just does not have an equivalent in German – We still need to map it to something: special null token a : {1 → 1, 2 → 2, 3 → 3, 4 → 0, 5 → 4}
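A minimal Python sketch of how such an alignment function can be represented, following the word-insertion example above and using 0 for the special NULL token:

# Alignment function a: English target positions 1..5 → German source positions.
# Position 0 stands for the NULL token (English "just" has no German equivalent).
alignment = {1: 1, 2: 2, 3: 3, 4: 0, 5: 4}

# Every English position must point to exactly one German position (or NULL).
assert all(isinstance(i, int) and i >= 0 for i in alignment.values())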

11 IBM Model 1 Generative model: break up the translation process into smaller steps – IBM Model 1 only uses lexical translation Translation probability – for a foreign sentence f = (f_1, ..., f_{l_f}) of length l_f – to an English sentence e = (e_1, ..., e_{l_e}) of length l_e – with an alignment of each English word e_j to a foreign word f_i according to the alignment function a : j → i
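Written out, the standard Model 1 translation probability is

p(e, a \mid f) = \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})

where ε is a normalization constant and the (l_f + 1) term accounts for the NULL token.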

12 Example

13 Learning Lexical Translation Model We would like to estimate the lexical translation probabilities t(e|f) from a parallel corpus... but we do not have the alignments Chicken and egg problem – if we had the alignments, → we could estimate the parameters of our generative model – if we had the parameters, → we could estimate the alignments

14 EM Algorithm Incomplete data – if we had complete data, we could estimate model – if we had model, we could fill in the gaps in the data Expectation Maximization (EM) in a nutshell – initialize model parameters (e.g. uniform) – assign probabilities to the missing data – estimate model parameters from completed data – iterate steps 2–3 until convergence

15 EM Algorithm Initial step: all alignments equally likely Model learns that, e.g., la is often aligned with the

16 EM Algorithm After one iteration Alignments, e.g., between la and the are more likely

17 EM Algorithm Convergence Inherent hidden structure revealed by EM

18 EM Algorithm Parameter estimation from the aligned corpus

19 EM Algorithm EM Algorithm consists of two steps Expectation-Step: Apply model to the data – parts of the model are hidden (here: alignments) – using the model, assign probabilities to possible values Maximization-Step: Estimate model from data – take assigned values as fact – collect counts (weighted by probabilities) – estimate model from counts Iterate these steps until convergence

20 EM Algorithm We need to be able to compute: – Expectation-Step: probability of alignments – Maximization-Step: count collection

21 EM Algorithm

22 EM Algorithm: Expectation Step We need to compute p(a|e, f) We already have the formula for p(e, a|f) (the definition of Model 1) Applying the chain rule:
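p(a \mid e, f) = \frac{p(e, a \mid f)}{p(e \mid f)}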

23 EM Algorithm: Expectation Step We need to compute p(e|f)
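Summing the Model 1 probability over all alignment functions, the sum factorizes into a product over English positions:

p(e \mid f) = \sum_{a} p(e, a \mid f) = \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i)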

24 EM Algorithm: Expectation Step Combining what we have:
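p(a \mid e, f) = \frac{p(e, a \mid f)}{p(e \mid f)} = \prod_{j=1}^{l_e} \frac{t(e_j \mid f_{a(j)})}{\sum_{i=0}^{l_f} t(e_j \mid f_i)}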

25 EM Algorithm: Maximization Step Now we have to collect counts Evidence from a sentence pair E, F that word e is a translation of word f:
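In the standard Model 1 derivation this expected count takes the form

c(e \mid f; E, F) = \frac{t(e \mid f)}{\sum_{i=0}^{l_f} t(e \mid f_i)} \sum_{j=1}^{l_e} \delta(e, e_j) \sum_{i=0}^{l_f} \delta(f, f_i)

where δ(x, y) is 1 if x = y and 0 otherwise.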

26 EM Algorithm: Maximization Step After collecting these counts over the corpus, we can re-estimate the model and iterate until convergence:
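t(e \mid f) = \frac{\sum_{(E, F)} c(e \mid f; E, F)}{\sum_{e'} \sum_{(E, F)} c(e' \mid f; E, F)}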

27

28 Expectation and counting for sentence pair 1: das Haus — the house

29 Expectation and counting for sentence pair 2: das Buch — the book

30 Expectation and counting for sentence pair 3: ein Buch — a book

31 Maximization
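To make the worked example concrete, here is a minimal, self-contained Python sketch of the EM loop run on the three-sentence toy corpus above. The NULL token and the ε constant are omitted for brevity, so this is an illustrative sketch rather than a full Model 1 implementation.

from collections import defaultdict

# Toy parallel corpus from the worked example (German, English).
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]

# Initialize t(e|f) uniformly over all word pairs.
foreign_vocab = {f for f_sent, _ in corpus for f in f_sent}
english_vocab = {e for _, e_sent in corpus for e in e_sent}
t = {(e, f): 1.0 / len(english_vocab) for f in foreign_vocab for e in english_vocab}

for iteration in range(50):
    count = defaultdict(float)   # expected counts c(e|f)
    total = defaultdict(float)   # expected counts summed over e, per foreign word f

    # Expectation step: collect counts weighted by alignment probabilities.
    for f_sent, e_sent in corpus:
        for e in e_sent:
            # Normalization: sum of t(e|f_i) over the foreign words in this sentence.
            z = sum(t[(e, f)] for f in f_sent)
            for f in f_sent:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c

    # Maximization step: re-estimate t(e|f) from the expected counts.
    for (e, f) in t:
        t[(e, f)] = count[(e, f)] / total[f] if total[f] > 0 else 0.0

# The probability mass concentrates on the intuitive word pairs.
print(round(t[("house", "Haus")], 3))  # approaches 1.0
print(round(t[("book", "Buch")], 3))   # approaches 1.0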

32 Machine Translation Our translation model cannot decide between small and little Sometimes one is preferred over the other: – small step: 2,070,000 occurrences in the Google index – little step: 257,000 occurrences in the Google index Language model – estimate how likely a string is English – based on n-gram statistics
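A minimal sketch of such an n-gram language model, here a bigram model with relative-frequency estimates and no smoothing; the tiny training set is invented for illustration.

from collections import defaultdict

# Illustrative training sentences (made up), with sentence boundary markers.
training = [
    "<s> the small step </s>".split(),
    "<s> the small house </s>".split(),
    "<s> a little house </s>".split(),
]

bigram_count = defaultdict(float)
unigram_count = defaultdict(float)
for sent in training:
    for w1, w2 in zip(sent, sent[1:]):
        bigram_count[(w1, w2)] += 1
        unigram_count[w1] += 1

def prob(sentence):
    # p(sentence) ≈ product of p(w2 | w1) over adjacent word pairs (no smoothing).
    p = 1.0
    for w1, w2 in zip(sentence, sentence[1:]):
        denom = unigram_count[w1]
        p *= bigram_count[(w1, w2)] / denom if denom else 0.0
    return p

print(prob("<s> the small step </s>".split()))   # higher: all bigrams were seen
print(prob("<s> the little step </s>".split()))  # zero: "the little" is unseen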

33 Machine Translation We would like to integrate a language model using Bayes' rule:
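Applying Bayes' rule to p(e|f) and dropping the constant denominator gives the standard noisy-channel decomposition into translation model and language model:

\operatorname*{argmax}_{e} p(e \mid f) = \operatorname*{argmax}_{e} \frac{p(f \mid e)\, p(e)}{p(f)} = \operatorname*{argmax}_{e} p(f \mid e)\, p(e)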

