Natural Language Processing: Expectation Maximization

Word-Based Model
How to translate a word → look it up in the dictionary
– Haus: house, building, home, household, shell
Multiple translations
– some are more frequent than others
– for instance: house and building are the most common
– special case: the Haus of a snail is its shell

Collect Statistics
Look at a parallel corpus (German text along with its English translation).

Estimate Translation Probabilities
From such statistics, estimate the lexical translation probabilities t(e|f) by relative frequency.

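A minimal Python sketch of this relative-frequency estimate, assuming the input is a list of already word-aligned token pairs; the Haus counts below are made-up illustrative numbers, not figures from the original slide:

```python
from collections import defaultdict

def mle_translation_probs(aligned_pairs):
    """Estimate t(e|f) by relative frequency from word-aligned data."""
    count = defaultdict(float)   # how often f was aligned to e
    total = defaultdict(float)   # how often f was aligned to anything
    for f, e in aligned_pairs:
        count[(f, e)] += 1.0
        total[f] += 1.0
    # t(e|f) = count(f, e) / count(f)
    return {(f, e): c / total[f] for (f, e), c in count.items()}

# Hypothetical alignment counts for Haus (illustrative only):
pairs = (
    [("Haus", "house")] * 8
    + [("Haus", "building")] * 6
    + [("Haus", "home")] * 2
    + [("Haus", "household")]
    + [("Haus", "shell")]
)
t = mle_translation_probs(pairs)
print(t[("Haus", "house")])  # 8/18 = 0.444...
```
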
Alignment
In a parallel text (or when we translate), we align words in one language with the words in the other.

Alignment Function
Formalizing alignment with an alignment function
– mapping an English target word at position i to a German source word at position j with a function a : i → j
Example: a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}

Reordering
Words may be reordered during translation
a : {1 → 3, 2 → 4, 3 → 2, 4 → 1}

One-to-Many Translation
A source word may translate into multiple target words
a : {1 → 1, 2 → 2, 3 → 3, 4 → 4, 5 → 4}

Dropping Words
Words may be dropped when translated (here, the German article das is dropped)
a : {1 → 2, 2 → 3, 3 → 4}

Inserting Words
Words may be added during translation
– the English just does not have an equivalent in German
– we still need to map it to something: the special NULL token
a : {1 → 1, 2 → 2, 3 → 3, 4 → 0, 5 → 4}

IBM Model 1
Generative model: break up the translation process into smaller steps
– IBM Model 1 uses only lexical translation
Translation probability
– for a foreign sentence f = (f_1, ..., f_{l_f}) of length l_f
– to an English sentence e = (e_1, ..., e_{l_e}) of length l_e
– with an alignment of each English word e_j to a foreign word f_i according to the alignment function a : j → i

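The formula itself did not survive this transcript; in the standard IBM Model 1 notation it is:

```latex
p(\mathbf{e}, a \mid \mathbf{f}) =
  \frac{\epsilon}{(l_f + 1)^{l_e}}
  \prod_{j=1}^{l_e} t\left(e_j \mid f_{a(j)}\right)
```

Here \epsilon is a normalization constant, and the l_f + 1 accounts for the NULL token at source position 0.
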
Example

Learning the Lexical Translation Model
We would like to estimate the lexical translation probabilities t(e|f) from a parallel corpus... but we do not have the alignments.
A chicken-and-egg problem
– if we had the alignments, we could estimate the parameters of our generative model
– if we had the parameters, we could estimate the alignments

EM Algorithm
Incomplete data
– if we had complete data, we could estimate the model
– if we had the model, we could fill in the gaps in the data
Expectation Maximization (EM) in a nutshell
1. initialize model parameters (e.g., uniform)
2. assign probabilities to the missing data
3. estimate model parameters from the completed data
4. iterate steps 2–3 until convergence

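As a concrete illustration, here is a minimal, runnable sketch of this loop for IBM Model 1 in Python (ignoring the constant \epsilon, which cancels in the E-step; the function name and corpus format are my own, not from the slides):

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """EM training of lexical translation probabilities t(e|f).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    A NULL token is prepended to every foreign sentence so that
    inserted English words can align to position 0.
    """
    corpus = [(["NULL"] + f, e) for f, e in corpus]

    # Initialize t(e|f) uniformly over the English vocabulary.
    e_vocab = {e for _, es in corpus for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e|f)
        total = defaultdict(float)  # expected counts summed over e

        # E-step: assign each alignment link a probability and
        # collect fractional counts weighted by those probabilities.
        for fs, es in corpus:
            for e in es:
                z = sum(t[(e, f)] for f in fs)  # sum_i t(e | f_i)
                for f in fs:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c

        # M-step: re-estimate t(e|f) from the expected counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]

    return t
```
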
EM Algorithm
Initial step: all alignments are equally likely.
The model learns that, e.g., la is often aligned with the.

EM Algorithm
After one iteration: alignments, e.g., between la and the, become more likely.

EM Algorithm
Convergence: the inherent hidden structure is revealed by EM.

EM Algorithm
Parameter estimation from the aligned corpus.

EM Algorithm
The EM algorithm consists of two steps.
Expectation step: apply the model to the data
– parts of the model are hidden (here: the alignments)
– using the model, assign probabilities to possible values
Maximization step: estimate the model from the data
– take the assigned values as fact
– collect counts (weighted by probabilities)
– estimate the model from the counts
Iterate these steps until convergence.

EM Algorithm
We need to be able to compute:
– Expectation step: the probability of alignments
– Maximization step: count collection

EM Algorithm: Expectation Step
We need to compute p(a|e, f). We already have a formula for p(e, a|f) (the definition of Model 1). Applying the chain rule:

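The slide's equation is not preserved here; the chain-rule decomposition is just:

```latex
p(a \mid \mathbf{e}, \mathbf{f}) =
  \frac{p(\mathbf{e}, a \mid \mathbf{f})}{p(\mathbf{e} \mid \mathbf{f})}
```
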
EM Algorithm: Expectation Step
We still need to compute the denominator p(e|f).

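Reconstructing the standard derivation: the sum over all alignments factorizes into a product of sums, which makes the computation tractable:

```latex
p(\mathbf{e} \mid \mathbf{f})
  = \sum_a p(\mathbf{e}, a \mid \mathbf{f})
  = \frac{\epsilon}{(l_f + 1)^{l_e}}
    \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i)
```

This reduces the naive sum over (l_f + 1)^{l_e} alignments to l_e · (l_f + 1) terms.
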
EM Algorithm: Expectation Step
Combining what we have:

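Dividing the Model 1 formula by the marginal, the \epsilon and length terms cancel:

```latex
p(a \mid \mathbf{e}, \mathbf{f})
  = \prod_{j=1}^{l_e}
    \frac{t\left(e_j \mid f_{a(j)}\right)}{\sum_{i=0}^{l_f} t(e_j \mid f_i)}
```
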
EM Algorithm: Maximization Step
Now we have to collect counts.
Evidence from a sentence pair E, F that word e is a translation of word f:

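In the standard notation (writing \mathbf{e}, \mathbf{f} for the sentence pair E, F, and \delta for the Kronecker delta):

```latex
c(e \mid f; \mathbf{e}, \mathbf{f})
  = \frac{t(e \mid f)}{\sum_{i=0}^{l_f} t(e \mid f_i)}
    \sum_{j=1}^{l_e} \delta(e, e_j)
    \sum_{i=0}^{l_f} \delta(f, f_i)
```
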
EM Algorithm: Maximization Step
After collecting these counts over the whole corpus, we can re-estimate the model.
Then iterate until convergence.

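The re-estimation formula, reconstructed in the same notation (the sums over (\mathbf{e}, \mathbf{f}) run over all sentence pairs in the corpus):

```latex
t(e \mid f)
  = \frac{\sum_{(\mathbf{e}, \mathbf{f})} c(e \mid f; \mathbf{e}, \mathbf{f})}
         {\sum_{e'} \sum_{(\mathbf{e}, \mathbf{f})} c(e' \mid f; \mathbf{e}, \mathbf{f})}
```
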
Worked Example
Expectation and counting, Sentence 1: das Haus ↔ the house (fractional counts collected over the possible word alignments)
Expectation and counting, Sentence 2: das Buch ↔ the book (fractional counts collected over the possible word alignments)
Expectation and counting, Sentence 3: ein Buch ↔ a book (fractional counts collected over the possible word alignments)

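Running the train_ibm_model1 sketch from above on this three-sentence toy corpus reproduces the qualitative behavior (exact values depend on the number of iterations and on how much mass the NULL token absorbs):

```python
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus, iterations=20)
# Probabilities sharpen toward das -> the, Haus -> house,
# Buch -> book, ein -> a as EM iterates.
print(round(t[("the", "das")], 3))
print(round(t[("book", "Buch")], 3))
```
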
Maximization
Re-estimate the translation probabilities from the collected counts.

Machine Translation
Our translation model cannot decide between small and little.
Sometimes one is preferred over the other:
– small step: 2,070,000 occurrences in the Google index
– little step: 257,000 occurrences in the Google index
Language model
– estimates how likely a string is to be English
– based on n-gram statistics

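A minimal sketch of such an n-gram estimate (an unsmoothed bigram model; real language models add smoothing for unseen n-grams):

```python
from collections import defaultdict

def bigram_lm(sentences):
    """Unsmoothed bigram model: p(w2 | w1) by relative frequency."""
    bigram = defaultdict(float)
    unigram = defaultdict(float)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            bigram[(w1, w2)] += 1.0
            unigram[w1] += 1.0
    return {(w1, w2): c / unigram[w1] for (w1, w2), c in bigram.items()}

lm = bigram_lm(["a small step", "a small step", "a little step"])
print(lm[("a", "small")])  # 2/3: "small" follows "a" in 2 of 3 sentences
```
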
Machine Translation
We would like to integrate a language model.
Bayes rule:

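The decomposition itself is missing from this transcript; the standard noisy-channel formulation is:

```latex
\operatorname{argmax}_{\mathbf{e}} \, p(\mathbf{e} \mid \mathbf{f})
  = \operatorname{argmax}_{\mathbf{e}} \,
    \frac{p(\mathbf{f} \mid \mathbf{e}) \, p(\mathbf{e})}{p(\mathbf{f})}
  = \operatorname{argmax}_{\mathbf{e}} \, p(\mathbf{f} \mid \mathbf{e}) \, p(\mathbf{e})
```

where p(f|e) is the translation model, p(e) is the language model, and p(f) is constant for a given input sentence.
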