
1 Natural Language Processing Expectation Maximization

2 Word-Based Model How to translate a word → look up in the dictionary – Haus — house, building, home, household, shell Multiple translations – some more frequent than others – for instance: house and building are the most common – special cases: the Haus of a snail is its shell
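As a minimal Python sketch, such a lexical translation distribution can be stored as a table mapping each target word to a probability; the numbers below are illustrative values, not estimates from a real corpus.

# Illustrative lexical translation distribution t(e | f) for f = "Haus".
# The probability values are invented for illustration only.
t_haus = {
    "house": 0.8,
    "building": 0.16,
    "home": 0.02,
    "household": 0.015,
    "shell": 0.005,
}
# A proper probability distribution over translations sums to 1.
assert abs(sum(t_haus.values()) - 1.0) < 1e-9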

3 Collect Statistics Look at a parallel corpus (German text along with English translation)

4 Estimate Translation Probabilities
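A standard way to do this, assuming we can count aligned word pairs in the corpus, is the relative-frequency (maximum likelihood) estimate:

t(e \mid f) = \frac{\mathrm{count}(e, f)}{\sum_{e'} \mathrm{count}(e', f)}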

5 Alignment In a parallel text (or when we translate), we align words in one language with the words in the other

6 Alignment Function Formalizing alignment with an alignment function Mapping an English target word at position i to a German source word at position j with a function a : i → j Example a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}

7 Reordering Words may be reordered during translation a : {1 → 3, 2 → 4, 3 → 2, 4 → 1}

8 One-to-Many Translation A source word may translate into multiple target words a : {1 → 1, 2 → 2, 3 → 3, 4 → 4, 5 → 4}

9 Dropping Words Words may be dropped when translated (German article das is dropped) a : {1 → 2, 2 → 3, 3 → 4}

10 Inserting Words Words may be added during translation – The English just does not have an equivalent in German – We still need to map it to something: special null token a : {1 → 1, 2 → 2, 3 → 3, 4 → 0, 5 → 4}
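A minimal Python sketch of how such an alignment function can be represented, following the word-insertion example above and using 0 for the special NULL token:

# Alignment function a: English target positions 1..5 → German source positions.
# Position 0 stands for the NULL token (English "just" has no German equivalent).
alignment = {1: 1, 2: 2, 3: 3, 4: 0, 5: 4}

# Every English position must point to exactly one German position (or NULL).
assert all(isinstance(i, int) and i >= 0 for i in alignment.values())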

11 IBM Model 1 Generative model: break up the translation process into smaller steps – IBM Model 1 only uses lexical translation Translation probability – for a foreign sentence f = (f_1, ..., f_{l_f}) of length l_f – to an English sentence e = (e_1, ..., e_{l_e}) of length l_e – with an alignment of each English word e_j to a foreign word f_i according to the alignment function a : j → i
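Written out, the standard Model 1 translation probability is

p(e, a \mid f) = \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})

where ε is a normalization constant and the (l_f + 1) term accounts for the NULL token.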

12 Example

13 Learning Lexical Translation Model We would like to estimate the lexical translation probabilities t(e|f) from a parallel corpus... but we do not have the alignments Chicken and egg problem – if we had the alignments, → we could estimate the parameters of our generative model – if we had the parameters, → we could estimate the alignments

14 EM Algorithm Incomplete data – if we had complete data, we could estimate model – if we had model, we could fill in the gaps in the data Expectation Maximization (EM) in a nutshell – initialize model parameters (e.g. uniform) – assign probabilities to the missing data – estimate model parameters from completed data – iterate steps 2–3 until convergence

15 EM Algorithm Initial step: all alignments equally likely Model learns that, e.g., la is often aligned with the

16 EM Algorithm After one iteration Alignments, e.g., between la and the are more likely

17 EM Algorithm Convergence Inherent hidden structure revealed by EM

18 EM Algorithm Parameter estimation from the aligned corpus

19 EM Algorithm EM Algorithm consists of two steps Expectation-Step: Apply model to the data – parts of the model are hidden (here: alignments) – using the model, assign probabilities to possible values Maximization-Step: Estimate model from data – take assigned values as fact – collect counts (weighted by probabilities) – estimate model from counts Iterate these steps until convergence

20 EM Algorithm We need to be able to compute: – Expectation-Step: probability of alignments – Maximization-Step: count collection

21 EM Algorithm

22 EM Algorithm: Expectation Step We need to compute p(a|e, f) We already have the formula for p(e, a|f) (the definition of Model 1) Applying the chain rule:
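p(a \mid e, f) = \frac{p(e, a \mid f)}{p(e \mid f)}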

23 EM Algorithm: Expectation Step We need to compute p(e|f)
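Summing the Model 1 probability over all alignment functions, the sum factorizes into a product over English positions:

p(e \mid f) = \sum_{a} p(e, a \mid f) = \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i)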

24 EM Algorithm: Expectation Step Combining what we have:
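p(a \mid e, f) = \frac{p(e, a \mid f)}{p(e \mid f)} = \prod_{j=1}^{l_e} \frac{t(e_j \mid f_{a(j)})}{\sum_{i=0}^{l_f} t(e_j \mid f_i)}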

25 EM Algorithm: Maximization Step Now we have to collect counts Evidence from a sentence pair E, F that word e is a translation of word f:
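In the standard Model 1 derivation this expected count takes the form

c(e \mid f; E, F) = \frac{t(e \mid f)}{\sum_{i=0}^{l_f} t(e \mid f_i)} \sum_{j=1}^{l_e} \delta(e, e_j) \sum_{i=0}^{l_f} \delta(f, f_i)

where δ(x, y) is 1 if x = y and 0 otherwise.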

26 EM Algorithm: Maximization Step After collecting these counts over the corpus, we can re-estimate the model and iterate until convergence:
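t(e \mid f) = \frac{\sum_{(E, F)} c(e \mid f; E, F)}{\sum_{e'} \sum_{(E, F)} c(e' \mid f; E, F)}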

27

28 Expectation and counting for sentence pair 1: das Haus — the house

29 Expectation and counting for sentence pair 2: das Buch — the book

30 Expectation and counting for sentence pair 3: ein Buch — a book

31 Maximization
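To make the worked example concrete, here is a minimal, self-contained Python sketch of the EM loop run on the three-sentence toy corpus above. The NULL token and the ε constant are omitted for brevity, so this is an illustrative sketch rather than a full Model 1 implementation.

from collections import defaultdict

# Toy parallel corpus from the worked example (German, English).
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]

# Initialize t(e|f) uniformly over all word pairs.
foreign_vocab = {f for f_sent, _ in corpus for f in f_sent}
english_vocab = {e for _, e_sent in corpus for e in e_sent}
t = {(e, f): 1.0 / len(english_vocab) for f in foreign_vocab for e in english_vocab}

for iteration in range(50):
    count = defaultdict(float)   # expected counts c(e|f)
    total = defaultdict(float)   # expected counts summed over e, per foreign word f

    # Expectation step: collect counts weighted by alignment probabilities.
    for f_sent, e_sent in corpus:
        for e in e_sent:
            # Normalization: sum of t(e|f_i) over the foreign words in this sentence.
            z = sum(t[(e, f)] for f in f_sent)
            for f in f_sent:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c

    # Maximization step: re-estimate t(e|f) from the expected counts.
    for (e, f) in t:
        t[(e, f)] = count[(e, f)] / total[f] if total[f] > 0 else 0.0

# The probability mass concentrates on the intuitive word pairs.
print(round(t[("house", "Haus")], 3))  # approaches 1.0
print(round(t[("book", "Buch")], 3))   # approaches 1.0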

32 Machine Translation Our translation model cannot decide between small and little Sometimes one is preferred over the other: – small step: 2,070,000 occurrences in the Google index – little step: 257,000 occurrences in the Google index Language model – estimate how likely a string is English – based on n-gram statistics
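A minimal sketch of such an n-gram language model, here a bigram model with relative-frequency estimates and no smoothing; the tiny training set is invented for illustration.

from collections import defaultdict

# Illustrative training sentences (made up), with sentence boundary markers.
training = [
    "<s> the small step </s>".split(),
    "<s> the small house </s>".split(),
    "<s> a little house </s>".split(),
]

bigram_count = defaultdict(float)
unigram_count = defaultdict(float)
for sent in training:
    for w1, w2 in zip(sent, sent[1:]):
        bigram_count[(w1, w2)] += 1
        unigram_count[w1] += 1

def prob(sentence):
    # p(sentence) ≈ product of p(w2 | w1) over adjacent word pairs (no smoothing).
    p = 1.0
    for w1, w2 in zip(sentence, sentence[1:]):
        denom = unigram_count[w1]
        p *= bigram_count[(w1, w2)] / denom if denom else 0.0
    return p

print(prob("<s> the small step </s>".split()))   # higher: all bigrams were seen
print(prob("<s> the little step </s>".split()))  # zero: "the little" is unseen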

33 Machine Translation We would like to integrate a language model using Bayes' rule:
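Applying Bayes' rule to p(e|f) and dropping the constant denominator gives the standard noisy-channel decomposition into translation model and language model:

\operatorname*{argmax}_{e} p(e \mid f) = \operatorname*{argmax}_{e} \frac{p(f \mid e)\, p(e)}{p(f)} = \operatorname*{argmax}_{e} p(f \mid e)\, p(e)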

