LING 575: Seminar on Statistical Machine Translation
Spring 2010, Lecture 2
Kristina Toutanova, MSR & UW
With slides borrowed from Philipp Koehn, Kevin Knight, and Chris Quirk
Overview
- Centauri/Arcturan puzzle
- Word-level translation models: IBM Model 1, IBM Model 2, HMM Model, IBM Model 3, IBM Models 4 & 5 (brief overview)
- Word alignment evaluation: definition, measures, symmetrization
- Translation using the noisy channel
Centauri/Arcturan [Knight, 1997]
Think how to translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp
Centauri/Arcturan [Knight, 1997]
1a. ok-voon ororok sprok. / 1b. at-voon bichat dat.
2a. ok-drubel ok-voon anok plok sprok. / 2b. at-drubel at-voon pippat rrat dat.
3a. erok sprok izok hihok ghirok. / 3b. totat dat arrat vat hilat.
4a. ok-voon anok drok brok jok. / 4b. at-voon krat pippat sat lat.
5a. wiwok farok izok stok. / 5b. totat jjat quat cat.
6a. lalok sprok izok jok stok. / 6b. wat dat krat quat cat.
7a. lalok farok ororok lalok sprok izok enemok. / 7b. wat jjat bichat wat dat vat eneat.
8a. lalok brok anok plok nok. / 8b. iat lat pippat rrat nnat.
9a. wiwok nok izok kantok ok-yurp. / 9b. totat nnat quat oloat at-yurp.
10a. lalok mok nok yorok ghirok clok. / 10b. wat nnat gat mat bat hilat.
11a. lalok nok crrrok hihok yorok zanzanok. / 11b. wat nnat arrat mat zanzanat.
12a. lalok rarok nok izok hihok mok. / 12b. wat nnat forat arrat vat gat.
Translation of farok crrrok hihok yorok clok kantok ok-yurp: jjat arrat mat bat oloat at-yurp
It was really Spanish/English
1a. Garcia and associates. / 1b. Garcia y asociados.
2a. Carlos Garcia has three associates. / 2b. Carlos Garcia tiene tres asociados.
3a. his associates are not strong. / 3b. sus asociados no son fuertes.
4a. Garcia has a company also. / 4b. Garcia tambien tiene una empresa.
5a. its clients are angry. / 5b. sus clientes estan enfadados.
6a. the associates are also angry. / 6b. los asociados tambien estan enfadados.
7a. the clients and the associates are enemies. / 7b. los clientes y los asociados son enemigos.
8a. the company has three groups. / 8b. la empresa tiene tres grupos.
9a. its groups are in Europe. / 9b. sus grupos estan en Europa.
10a. the modern groups sell strong pharmaceuticals. / 10b. los grupos modernos venden medicinas fuertes.
11a. the groups do not sell zenzanine. / 11b. los grupos no venden zanzanina.
12a. the small groups are not modern. / 12b. los grupos pequenos no son modernos.
Translate: Clients do not sell pharmaceuticals in Europe.
Principles applied
- Derive word-level correspondences between sentences
- Prefer one-to-one translation
- Prefer consistent translation (a small number of senses per word)
- Prefer monotone translation
- Words can be dropped
- Look at target sentences to estimate fluency
Word-based translation models
Word-level translation models
The IBM word translation models assign a probability to a target sentence e given a source sentence f, using word-level correspondences.
We will discuss the following models:
- IBM Model 1
- IBM Model 2
- HMM Model (not an IBM model, but related)
- IBM Model 3
- IBM Models 4 & 5 (only briefly)
Alignment in IBM Models: one source word for each target word
For every target word token e at position j, there exists a unique source word token f at position i such that f is a translation of e.
Alignment function
An alignment is a function a that maps each target position j = 1, ..., l_e to a source position i in {0, 1, ..., l_f}; we write a(j) = i, meaning target word e_j is a translation of source word f_i. Position 0 is reserved for a special NULL token, introduced below.
Words may be reordered
The alignment does not need to be monotone: there can be crossing correspondences.
One-to-many translation A source word may correspond to more than one target word
Deleting words
Not all words in the source need to have a corresponding target word (some source words are deleted); for example, the German article das may be dropped.
Inserting words
Some target words may not correspond to any word in the source. A special NULL token is introduced at source position 0; all inserted target words are aligned to it.
Disadvantage of the alignment function
The IBM models and the HMM use this definition of translation correspondence. Problem: it cannot represent one target word token corresponding to multiple source word tokens. E.g., with English source very small house and German target klitzeklein Haus, the target word klitzeklein corresponds to both very and small. A more general alignment would let each target word token correspond to a set of source word tokens.
IBM Model 1
(Figure: example alignments between source positions 0-2, including NULL at position 0, and target positions 1-2.)
Generative process for IBM-1
(Worked example, originally a figure: the target length is l_e = 4; for each target position j = 1, ..., 4, an alignment a(j) is selected uniformly with probability a(i|4) = 0.2, i.e. 1/(l_f + 1) for a source of 4 words plus NULL; each target word of "the house is small" is then generated from its aligned source word.)
IBM Model 1
Target words depend only on their corresponding source words, not on any other source or target words. The lexical translation probabilities t(e|f) are the only parameters of the model.
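For reference, the standard Model 1 formulation (Brown et al. 1993; Koehn's Chapter 4, this course's reading) assigns a joint probability to a target sentence and an alignment:

$$ p(e, a \mid f) = \frac{\epsilon}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)}) $$

where l_e and l_f are the target and source lengths, \epsilon is a normalization constant, and the (l_f + 1)^{l_e} factor reflects the uniform choice of each alignment position over the source words plus NULL.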
Example
(Figure: example lexical translation probability tables t(e|f) for the running example.)
IBM Model 1 translation probability
The probability of a target sentence given a source sentence is obtained by summing over all alignments, using the law of total probability.
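In symbols, marginalizing over the hidden alignments:

$$ p(e \mid f) = \sum_{a} p(e, a \mid f) = \frac{\epsilon}{(l_f + 1)^{l_e}} \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)}) $$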
How to estimate parameters
If we observed parallel sentences with alignments, we could estimate the lexical probabilities by relative frequency. This is maximum likelihood estimation for multinomials (remember the homework assignments from 570).
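That is, with observed alignments the maximum likelihood estimate is a normalized count:

$$ t(e \mid f) = \frac{\mathrm{count}(e, f)}{\sum_{e'} \mathrm{count}(e', f)} $$

where count(e, f) is the number of times source word f is aligned to target word e in the data.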
Estimating parameters with incomplete data
- We don't have parallel sentences with word alignments: the alignments are hidden (the data is incomplete)
- We can still estimate the model parameters by maximum likelihood
- Not as straightforward as counting and normalizing, but not too bad
- The EM algorithm is a simple, intuitive method to maximize likelihood
- Other general non-linear optimization algorithms (projected gradient, L-BFGS, etc.) can also be used
EM algorithm
Incomplete data:
- If we had complete data, we could estimate the model parameters
- If we had the model parameters, we could compute the probabilities of the missing data (hidden variables)
Expectation Maximization (EM) in a nutshell:
1. Initialize the model parameters (e.g. uniformly, or break symmetries)
2. Assign probabilities to the missing data
3. Estimate new parameters given the completed data
4. Repeat steps 2-3 until convergence
EM Example
(Figure: EM iterations on a toy parallel corpus; the initially uniform lexical translation probabilities t(e|f) sharpen toward the correct word translations, converging after several iterations.)
EM for IBM Model 1
For the E-step we need the posterior probability of an alignment given a sentence pair: p(a | e, f) = p(e, a | f) / p(e | f).
EM for IBM 1: example
A worked example of the posterior computation, ignoring the NULL word in the source for simplicity, and ignoring a constant factor (independent of a) shared by all alignments.
EM for IBM Model 1
Computing the normalizer p(e | f) requires summing p(e, a | f) over all (l_f + 1)^{l_e} alignments. This doesn't look easy to sum up: exponentially many things to add!
Re-arranging the sum
Due to the strong independence assumptions of Model 1, we can swap the sum and the product and sum over the alignments efficiently, as shown below.
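The key identity: the exponential sum over alignments factors into a product of small per-word sums, computable in O(l_e l_f) time:

$$ \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)}) = \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i) $$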
Collecting counts for the M-step
Here is our final expression for the probability of an alignment given a sentence pair.
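Dividing p(e, a | f) by the rearranged p(e | f), the constant factors cancel and the posterior decomposes over target positions:

$$ p(a \mid e, f) = \prod_{j=1}^{l_e} \frac{t(e_j \mid f_{a(j)})}{\sum_{i=0}^{l_f} t(e_j \mid f_i)} $$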
Collecting counts for the M-step
The expected count for word f translating to word e given a sentence pair (e, f) can be computed efficiently, using similar rearranging.
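In the standard notation (with \delta(x, y) = 1 if x = y and 0 otherwise), the expected count collected from a sentence pair (\mathbf{e}, \mathbf{f}) is:

$$ c(e \mid f;\, \mathbf{e}, \mathbf{f}) = \frac{t(e \mid f)}{\sum_{i=0}^{l_f} t(e \mid f_i)} \sum_{j=1}^{l_e} \delta(e, e_j) \sum_{i=0}^{l_f} \delta(f, f_i) $$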
M-step for IBM Model 1
After collecting counts from all sentence pairs, we add them up and re-normalize to get the new lexical translation probabilities.
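That is:

$$ t(e \mid f) = \frac{\sum_{(\mathbf{e}, \mathbf{f})} c(e \mid f;\, \mathbf{e}, \mathbf{f})}{\sum_{e'} \sum_{(\mathbf{e}, \mathbf{f})} c(e' \mid f;\, \mathbf{e}, \mathbf{f})} $$

Putting the E-step and M-step together, here is a minimal Python sketch of the full training loop (the function and variable names are our own, not from the slides; a corpus is a list of (target_words, source_words) pairs):

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """EM training of IBM Model 1 lexical probabilities t(e|f).

    corpus: list of (target_words, source_words) sentence pairs.
    A NULL token is prepended to every source sentence.
    """
    NULL = "<NULL>"
    # Initialize t(e|f) uniformly over the target vocabulary.
    target_vocab = {e for e_sent, _ in corpus for e in e_sent}
    t = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e|f)
        total = defaultdict(float)  # counts summed over e, for each f
        # E-step: fractional counts via the rearranged per-word sums.
        for e_sent, f_sent in corpus:
            f_sent = [NULL] + f_sent
            for e in e_sent:
                denom = sum(t[e, f] for f in f_sent)
                for f in f_sent:
                    c = t[e, f] / denom
                    count[e, f] += c
                    total[f] += c
        # M-step: re-normalize the expected counts into probabilities.
        t = defaultdict(float)
        for (e, f), c in count.items():
            t[e, f] = c / total[f]
    return t

# Toy usage: after a few iterations t['house', 'Haus'] dominates,
# even though 'house' and 'Haus' always co-occur with 'the'/'das'.
corpus = [("the house".split(), "das Haus".split()),
          ("the book".split(), "das Buch".split()),
          ("a book".split(), "ein Buch".split())]
t = train_ibm1(corpus)
```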
IBM Model 2
Generative process for IBM-2
(Worked example, originally a figure: the target length is l_e = 4; for each target position j = 1, ..., 4, an alignment a(j) is selected with the learned, position-dependent probability a(i | j, 4, 4), which conditions on the target position j and the two sentence lengths; each target word is then generated from its aligned source word using t(e|f).)
Parameter estimation for IBM Model 2
- Very similar to IBM Model 1: the model factors in the same way
- The only difference is that instead of uniform alignment probabilities, we use learned position-dependent probabilities (sometimes called distortion probabilities)
- Collect expected counts for the lexical translation and distortion probabilities for the M-step
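For reference, Model 2 replaces Model 1's uniform alignment choice with a learned distribution over source positions:

$$ p(e, a \mid f) = \epsilon \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})\; a(a(j) \mid j, l_e, l_f) $$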
HMM Model
Generative process for HMM
(Worked example, originally a figure: for each target position j, the alignment a(j) is selected with a distortion probability d(i | a(j-1), l_f) conditioned on the previous alignment position, so a(1) is chosen with d(i | -1, 4), a(2) with d(i | a(1), 4), and so on for the running example "the house is small"; each target word is then generated from its aligned source word using t(e|f).)
HMM alignment model
A Hidden Markov Model like the ones used for POS tagging, with some differences:
- The state space is the set of integers from 0 to the source length (the alignment positions)
- The model is globally conditioned on the source sentence
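For reference, the HMM alignment model (Vogel et al. 1996) keeps Model 1's lexical probabilities but makes each alignment decision depend on the previous one:

$$ p(e, a \mid f) = \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})\; d(a(j) \mid a(j-1), l_f) $$

In practice the distortion d is usually parameterized by the jump width a(j) - a(j-1).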
Parameter estimation for the HMM model
Because each alignment decision depends on the previous one, the alignments no longer decompose independently over target positions; the expected counts for EM are computed efficiently with the forward-backward (Baum-Welch) algorithm.
Local optima in IBM-2 and HMM
- These models have multiple local optima in the general case
- Good starting points are important for local search algorithms
- Initialize the parameters of IBM-2 using a trained IBM-1 model
- Initialize the HMM from IBM-1 or IBM-2
- Such initialization schemes can have a large impact on performance (some results later)
- See Och & Ney 03 [from the optional word translation models readings] for more details
IBM Model 3: motivation
- For IBM models 1 and 2, the alignments of all target words are independent; for the HMM, the alignment of a target word depends only on the alignment of the previous target word
- This may lead to situations where one source word is aligned to a large number of target words, because the model does not remember how many target words have already been aligned to a source word
- These models can't encode a preference for one-to-one alignment
- IBM Model 3 adds the capability to keep track of the fertility of source words: how many target words each source word generates
IBM Model 3 generative process
(Worked example, originally a figure: the source "Mary did not slap the green witch" generates the target "Maria no daba una bofetada a la bruja verde". Each source word first chooses a fertility: e.g., slap generates three target words (daba una bofetada), while the NULL token inserts a. For each target word placeholder, a target word is then generated given the aligned source word using t(e|f).)
IBM 3 probability
There are multiple ways to generate the same sentence e and alignment a given the source sentence f, due to words with fertility > 1 (e.g., the three target words slap generates, daba una bofetada, can be produced in any order) and the unobserved source of inserted words (e.g., the NULL-inserted a).
IBM Model 3 probability
To get the probability of a target sentence and alignment, we sum over all ways to generate them.
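One common way to write the result (following Koehn's presentation; \phi_i is the fertility of source word i, \phi_0 the number of NULL-generated words, and p_1 = 1 - p_0 the NULL-insertion probability):

$$ p(e, a \mid f) = \binom{l_e - \phi_0}{\phi_0}\, p_0^{\,l_e - 2\phi_0}\, p_1^{\,\phi_0} \prod_{i=1}^{l_f} \phi_i!\; n(\phi_i \mid f_i) \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})\; d(j \mid a(j), l_e, l_f) $$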
Dependencies among hidden variables
With fertility, the alignment decisions for different target words are coupled, so the sum over alignments no longer factorizes; the expected counts for EM must be approximated, e.g. by summing over a neighborhood of high-probability alignments found with simpler models.
IBM Models 4 & 5
- The distortion model in IBM 3 is absolute: target position j depends only on the corresponding source position i
- IBM 4 adds a relative distortion model, capturing the intuition that words move in groups: the placement of target words aligned to i depends on the placement of target words aligned to i-1
- IBM 3 and IBM 4 are deficient: words in the target could get placed on top of each other with non-zero probability, so some probability mass is lost
- IBM Model 5 addresses the deficiency
IBM Model 4 example
(Figure: a worked example of relative distortion.)
Word alignment evaluation
Evaluating IBM Models
- We can use them for translation, but we can also evaluate their performance on the derived word-to-word correspondences
- We will use this evaluation method to compare models
- Requires manually defined (gold-standard) word alignments
- Requires a measure to compare the model's output to the gold standard
Evaluation of word alignment
The model's alignment A is compared against a gold standard that distinguishes sure (S) and possible (P) links; the standard measure is the alignment error rate (AER), defined below.
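The definition from Och & Ney (2003), where A is the model's alignment, S the set of sure gold links, and P ⊇ S the set of possible gold links:

$$ \mathrm{AER}(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|} $$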
Word alignment standards
Gold-standard alignments can be many-to-many: one source word may link to several target words, and one target word to several source words.
Symmetrizing word alignments
Because of the asymmetric nature of these models, performance can be improved by running them in both directions and combining the resulting alignments.
Symmetrizing word alignments
Besides the intersection of the two directional alignments, we can also use their union, or a selective union obtained with a growing heuristic, as sketched below.
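As an illustration, a minimal Python sketch over directional alignments represented as sets of (source_pos, target_pos) links (the names and the simplified growing rule are our own; published heuristics such as grow-diag-final also consider diagonal neighbors and unaligned words):

```python
def symmetrize(e2f, f2e):
    """Intersection and union of two directional alignments,
    both given as sets of (i, j) links in the same convention."""
    return e2f & f2e, e2f | f2e  # high precision, high recall

def grow(intersection, union):
    """Selectively grow the intersection toward the union by
    repeatedly adding union links adjacent to a selected link."""
    aligned = set(intersection)
    added = True
    while added:
        added = False
        for i, j in sorted(union - aligned):
            neighbors = {(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)}
            if neighbors & aligned:
                aligned.add((i, j))
                added = True
    return aligned
```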
Comparison of models on alignment
Summary of model characteristics (from Och & Ney 03): the alignment dependencies each model uses, whether it models fertility, and whether the E-step can be computed exactly.
Comparison of models on alignment
(Table: AER of the models, from Och & Ney 03, for training corpora of 0.5K, 2K, 8K, and 34K sentence pairs.)
Effect of symmetrization
Performance of the models with symmetrization (from Och & Ney 03). Other improvements by Och & Ney: smoothing is very important, and adding a dictionary can help (see the paper for more details).
Translation with word-based models
Using word-based models for translation
- We can use the word-based model directly
- It is more accurate to use a noisy-channel model, which incorporates a target language model to improve fluency
- The target language model can be trained on monolingual data, of which we usually have much more
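Concretely, the noisy-channel decomposition applies Bayes' rule, so that a translation model trained on parallel data is combined with a language model trained on monolingual data:

$$ \hat{e} = \arg\max_{e}\, p(e \mid f) = \arg\max_{e}\, p(f \mid e)\, p(e) $$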
Using word-based models for translation
- We have introduced a set of models that can be used to score candidate translations for a given source sentence
- We haven't talked about how to find the best possible translation; we will discuss this when we talk about decoding in phrase-based models
- In brief, decoding is very hard even for IBM-1
Summary
- Introduced word-based translation models and the concept of alignment:
  - IBM Model 1 (uniform alignment)
  - IBM Model 2 (absolute distortion model)
  - HMM Model (relative distortion model)
  - IBM Model 3 (fertility and absolute distortion)
  - IBM Model 4 (fertility and relative distortion)
  - IBM Model 5 (like IBM-4, but fixes deficiency)
- Parameter estimation for word-based translation models:
  - Exact if we have strong independence assumptions for the hidden variables
  - Approximate for models with fertility
  - Use simpler models to initialize more complex ones and to find good alignments
- Translation using a word-based model: the noisy-channel model allows the incorporation of a language model
Assignments and next time
- HW1 will be posted online tomorrow, April 7, and will be due midnight PST on April 21
- Next time:
  - A brief overview of other word-alignment models (for paper presentation ideas)
  - Phrase translation models
- Reading: finish Chapter 4, read Chapter 5