CS224N Section 2: PA2 & EM
Shrey Gupta
January 21, 2011
Outline for today
- Interactive session!
- Brief review of MT
- Examples
- Brief EM review
Statistical Machine Translation
P(e|f) = P(f|e) * P(e) / P(f)
argmax_e P(e|f) = argmax_e P(f|e) * P(e), since P(f) is constant with respect to e and can be dropped
Language models (P(e)) help alleviate shortcomings of the translation model P(f|e)
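As a toy illustration of this argmax (the candidate list and probability tables below are made-up stand-ins for a real translation model and language model), a minimal Python sketch:

```python
import math

# Made-up log-probability tables, for illustration only.
def tm_logprob(f, e):
    # Translation model log P(f|e): here indifferent to English word order.
    table = {("la maison", "the house"): math.log(0.9),
             ("la maison", "house the"): math.log(0.9)}
    return table.get((f, e), math.log(1e-6))

def lm_logprob(e):
    # Language model log P(e): prefers grammatical English.
    table = {"the house": math.log(0.1), "house the": math.log(0.001)}
    return table.get(e, math.log(1e-9))

def decode(f, candidates):
    # argmax_e P(f|e) * P(e), computed in log space to avoid underflow.
    return max(candidates, key=lambda e: tm_logprob(f, e) + lm_logprob(e))

print(decode("la maison", ["the house", "house the"]))  # -> the house
```

Here the translation model alone cannot choose between the two word orders; the language model breaks the tie in favor of the grammatical one.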
Concepts
- Translation probabilities (t)
- Distortion probabilities (d)
- Fertility (φ)
- NULL (a special empty word, so that words with no real counterpart can still be aligned)
PA2 Requirements
- Naïve model
- IBM Model 1
- IBM Model 2
- Integration with the decoder
IBM Model 1
- Simplest of the IBM models
- Does not consider word order (bag-of-words approach)
- Does not model one-to-many alignments
- Computationally inexpensive
- Useful for parameter estimates that are passed on to more elaborate models
IBM Model 1
We learn only the translation probabilities t(f|e).
IBM Model 1 Steps
1. Initialize the translation probabilities uniformly.
2. E-step: for each sentence pair, collect fractional counts for every (f, e) word pair, weighting each possible alignment by its probability under the current parameters.
3. M-step: re-estimate t(f|e) by normalizing the fractional counts.
4. Repeat until convergence.
Let's do an example.
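As a concrete example, here is a minimal Python sketch of Model 1 EM on a toy parallel corpus (the sentence pairs, NULL handling, and uniform initialization are illustrative choices, not PA2 starter code):

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Train IBM Model 1 translation probabilities t(f|e) with EM.

    pairs: list of (french_tokens, english_tokens); each English side
    includes a NULL token so French words can align to nothing.
    """
    # Initialize t(f|e) uniformly over the French vocabulary.
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of each (f, e) pair
        total = defaultdict(float)   # expected count of each e

        # E-step: distribute each French word's count over the English words.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)   # normalizer for this f
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: re-normalize fractional counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

pairs = [("la maison".split(), "NULL the house".split()),
         ("la fleur".split(), "NULL the flower".split())]
t = train_ibm1(pairs)
print(t[("maison", "house")])   # climbs toward 1 as EM disambiguates "la"/"the"
```

Because "la" co-occurs with "the" in both pairs, EM gradually explains "la" with "the", which forces "maison" onto "house" and "fleur" onto "flower".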
IBM Model 2
In Model 2 we learn the translation probabilities and also the distortion probabilities.
IBM Model 2
IBM Model 2 tries to learn the alignment probabilities in addition to the translation probabilities. The alignment probabilities are handled at an abstract level, by grouping alignment pairs into buckets. Let the number of buckets be N (indexed 0 to N-1). For an alignment pair of positions (i, j), let n be the position distance (e.g. n = |i - j|); the pair is placed in bucket n if n < N-1, or in the last bucket (N-1) otherwise.
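A minimal sketch of the bucketing, assuming the distance measure is the absolute position difference |i - j|:

```python
def bucket(i, j, num_buckets):
    """Map an alignment of French position i to English position j to a
    distortion bucket: bucket |i - j|, capped at the last bucket."""
    return min(abs(i - j), num_buckets - 1)

# With N = 5 buckets: nearby positions get their own bucket, distant ones share.
print(bucket(0, 0, 5))  # distance 0 -> bucket 0 (monotone alignment)
print(bucket(1, 3, 5))  # distance 2 -> bucket 2
print(bucket(0, 9, 5))  # distance 9 -> capped at bucket 4
```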
IBM Model 2
In Model 2, during the E-step we also collect fractional counts for each bucket, and in the M-step we normalize them to obtain a true probability distribution (see the sketch below). Many possible implementations:
- Variable number of buckets
- Signed buckets
- Hand-fixed weights
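A sketch of one Model 2 EM iteration under the same assumptions (absolute-distance buckets and a single global distortion distribution d; this is one possible design among the variants listed above, not the required one):

```python
from collections import defaultdict

def bucket(i, j, num_buckets):
    # Same bucketing as the previous sketch.
    return min(abs(i - j), num_buckets - 1)

def em_step_model2(pairs, t, d, num_buckets):
    """One EM iteration: fractional counts for t(f|e) and for the buckets."""
    t_count, t_total = defaultdict(float), defaultdict(float)
    d_count = [0.0] * num_buckets

    # E-step: weight of aligning f at i to e at j is t(f|e) * d(bucket(i, j)).
    for fs, es in pairs:
        for i, f in enumerate(fs):
            weights = [t[(f, e)] * d[bucket(i, j, num_buckets)]
                       for j, e in enumerate(es)]
            z = sum(weights)
            for j, e in enumerate(es):
                c = weights[j] / z
                t_count[(f, e)] += c
                t_total[e] += c
                d_count[bucket(i, j, num_buckets)] += c

    # M-step: normalize both sets of fractional counts into distributions.
    for (f, e), c in t_count.items():
        t[(f, e)] = c / t_total[e]
    total = sum(d_count)
    for b in range(num_buckets):
        d[b] = d_count[b] / total
    return t, d
```

Initializing d uniformly (d = [1.0 / N] * N) and seeding t from a trained Model 1 run matches the earlier point that Model 1 estimates are passed on to more elaborate models.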
EM Revisited
- Similar to k-means
- Soft counts vs. hard counts: k-means assigns each point entirely to its closest cluster (a hard count), while EM spreads each point fractionally across clusters in proportion to their posterior probabilities (soft counts)
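To make the distinction concrete, a toy sketch with two 1-D Gaussian clusters (the means and shared variance are arbitrary choices):

```python
import math

def soft_assign(x, means, var=1.0):
    """EM-style soft counts: the posterior responsibility of each cluster
    for point x, assuming equal-weight 1-D Gaussians with shared variance."""
    likes = [math.exp(-(x - m) ** 2 / (2 * var)) for m in means]
    z = sum(likes)
    return [l / z for l in likes]

means = [0.0, 5.0]
resp = soft_assign(2.0, means)
print(resp)                    # soft counts: roughly [0.92, 0.08]
print(resp.index(max(resp)))   # k-means-style hard count: all mass on cluster 0
```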
Tips
- Start early
- Read Knight's tutorial
- Plan your approach before you start
Questions?