Statistical Machine Translation: Word Alignment
Stephan Vogel
MT Class, Spring Semester 2011
Overview
- Word alignment – some observations
- Models IBM2 and IBM1: 0th-order position model
- HMM alignment model: 1st-order position model
- IBM3: fertility
- IBM4: plus relative distortion
Alignment Example
Observations:
- Mostly 1-1
- Some 1-to-many
- Some 1-to-nothing
- Often monotone
- Not always clear-cut
  - English ‘eight’ is a time expression
  - German has ‘acht Uhr’
  - Could also leave ‘Uhr’ unaligned
Evaluating Alignment
Given some manually aligned data (ref) and automatically aligned data (hyp), links can be:
- Correct, i.e. link in hyp matches link in ref: true positive (tp)
- Wrong, i.e. link in hyp but not in ref: false positive (fp)
- Missing, i.e. link in ref but not in hyp: false negative (fn)
Evaluation measures:
- Precision: P = tp / (tp + fp) = correct / links_in_hyp
- Recall: R = tp / (tp + fn) = correct / links_in_ref
- Alignment Error Rate: AER = 1 – F = 1 – 2tp / (2tp + fp + fn)
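These measures are easy to compute once both alignments are represented as sets of links. A minimal Python sketch, assuming hyp and ref are sets of (source position, target position) pairs; the function name and representation are illustrative, not from the slides:

    def alignment_scores(hyp_links, ref_links):
        # hyp_links, ref_links: sets of (j, i) link pairs
        tp = len(hyp_links & ref_links)   # links in hyp that match ref
        fp = len(hyp_links - ref_links)   # links in hyp but not in ref
        fn = len(ref_links - hyp_links)   # links in ref but not in hyp
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        aer = 1.0 - 2.0 * tp / (2.0 * tp + fp + fn)   # AER = 1 - F
        return precision, recall, aer

    # Example: hyp = {(1, 1), (2, 3)}, ref = {(1, 1), (2, 2)}
    # gives P = R = 0.5 and AER = 0.5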
Sure and Possible Links
- Sometimes it is difficult for human annotators to decide
- Differentiate between sure and possible links
  - En: Det Noun – Ch: Noun; don’t align Det, or align it to NULL?
  - En: Det Noun – Ar: DetNoun; should Det be aligned to DetNoun?
- Alignment Error Rate with sure and possible links (Och 2000), written out below
  - A = generated links
  - S = sure links (not finding a sure link is an error)
  - P = possible links (putting a link which is not possible is an error)
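The formula images are not in the transcript; the standard definitions from Och & Ney (2000), with A the generated links, S the sure links, and P (a superset of S) the possible links, are:

    \text{precision} = \frac{|A \cap P|}{|A|}, \qquad
    \text{recall} = \frac{|A \cap S|}{|S|}, \qquad
    \mathrm{AER} = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}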
Word Alignment Models
- IBM1 – lexical probabilities only
- IBM2 – lexicon plus absolute position
- IBM3 – plus fertilities
- IBM4 – inverted relative position alignment
- IBM5 – non-deficient version of Model 4
- HMM – lexicon plus relative position
- BiBr – Bilingual Bracketing, lexical probabilities plus reordering via parallel segmentation
- Syntactic alignment models
[Brown et al. 1993, Vogel et al. 1996, Och et al. 2000, Wu 1997, Yamada et al. 2003, and many others]
GIZA++ Alignment Toolkit
- All standard alignment models (IBM1 … IBM5, HMM) are implemented in GIZA++
- The toolkit was started (as GIZA) at the Johns Hopkins University workshop in 1998
- Extended and improved by Franz Josef Och
- Now used by many groups
- Known problems:
  - Memory when training on large corpora
  - Writes many large files (depends on your parameter settings)
- Extensions for large corpora (Qin Gao):
  - Distributed GIZA: runs on many machines, I/O bound
  - Multithreaded GIZA: runs on one machine, multiple cores
Notation
- Source language
  - f: source (French) word
  - J: length of source sentence
  - j: position in source sentence (target position)
  - f_1^J = f_1 … f_J: source sentence
- Target language
  - e: target (English) word
  - I: length of target sentence
  - i: position in target sentence (source position)
  - e_1^I = e_1 … e_I: target sentence
- Alignment: relation mapping source to target positions
  - i = a_j: position i of e_i which is aligned to j
  - a_1^J = a_1 … a_J: whole alignment
SMT – Principle
- Translate a ‘French’ string into an ‘English’ string
- Bayes’ decision rule for translation (written out below)
- Why this inversion of the translation direction?
  - Decomposition of dependencies: makes modeling easier
  - Cooperation of two knowledge sources for the final decision
- Note: the IBM paper and GIZA call e the source and f the target
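The decision rule itself is not in the transcript; in the notation of the previous slide it is the standard noisy-channel formulation:

    \hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \Pr(e_1^I \mid f_1^J)
                = \operatorname*{argmax}_{e_1^I} \Pr(e_1^I)\,\Pr(f_1^J \mid e_1^I)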
Alignment as Hidden Variable
- ‘Hidden alignments’ to capture word-to-word correspondences
- Mapping A is a subset of [1, …, J] x [1, …, I]
- Number of connections: J * I (each source word with each target word)
- Number of alignments: 2^(J*I) (each connection yes/no)
- Summation over all alignments
- Too many alignments, summation not feasible
Restricted Alignment
- Each source word has one connection
- The alignment mapping becomes a function: j -> i = a_j
- Number of alignments is now: I^J
- Sum over all alignments (written out below): not possible to enumerate
- In some situations full summation is possible through dynamic programming
- In other situations: take only the best alignment and perhaps some alignments close to the best one
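A reconstruction of the omitted sum, restricted to function-valued alignments a_1^J:

    \Pr(f_1^J \mid e_1^I) = \sum_{a_1^J} \Pr(f_1^J, a_1^J \mid e_1^I)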
Empty Position (Null Word)
- Sometimes a word has no correspondence
- The alignment function aligns each source word to one target word, i.e. it cannot skip a source word
- Solution:
  - Introduce an empty position 0 with the null word e_0
  - ‘Skip’ source word f_j by aligning it to e_0
  - The target sentence is extended to: e_0^I = e_0 e_1 … e_I
  - The alignment range is extended to: a_j in {0, 1, …, I}
Translation Model
- Sum over all alignments
- 3 probability distributions (see the decomposition below):
  - Length
  - Alignment
  - Lexicon
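The slide’s formulas are not in the transcript; the general chain-rule decomposition that introduces these three distributions (following Brown et al. 1993) is:

    \Pr(f_1^J \mid e_1^I) = \sum_{a_1^J} \Pr(f_1^J, a_1^J \mid e_1^I)

    \Pr(f_1^J, a_1^J \mid e_1^I) = p(J \mid e_1^I)\,
        \prod_{j=1}^{J} p(a_j \mid a_1^{j-1}, f_1^{j-1}, J, e_1^I)\;
                        p(f_j \mid a_1^{j}, f_1^{j-1}, J, e_1^I)

Here the first factor is the length model, the factors p(a_j | …) form the alignment model, and the factors p(f_j | …) form the lexicon model.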
Model Assumptions
Decompose the interaction into pairwise dependencies:
- Length: source length only dependent on target length (very weak)
- Alignment:
  - Zero-order model: target position only dependent on source position
  - First-order model: target position only dependent on previous target position
- Lexicon: source word only dependent on the aligned word
Mixture Model
- Interpretation as mixture model by direct decomposition (see below)
- Again, simplifying model assumptions applied
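The mixture form itself is not in the transcript; under the zero-order assumptions it is the IBM2 model, where the sum over alignments decomposes into a product of per-position mixtures:

    \Pr(f_1^J \mid e_1^I) = p(J \mid I) \prod_{j=1}^{J} \sum_{i=0}^{I} p(i \mid j, I, J)\, p(f_j \mid e_i)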
Training IBM2
- Expectation-Maximization (EM) algorithm
- Define the posterior weight (i.e. the weights in each column sum to 1)
- Lexicon probabilities: count how often word pairs are aligned, then turn the counts into probabilities
- Alignment probabilities: accumulated and normalized the same way (update equations below)
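A reconstruction of the update equations, writing the lexicon model as p(f | e) and the alignment model as p(i | j, I, J) to match the slides:

    posterior weight:
    p_j(i) = \frac{p(i \mid j, I, J)\, p(f_j \mid e_i)}{\sum_{i'=0}^{I} p(i' \mid j, I, J)\, p(f_j \mid e_{i'})}

    lexicon counts and probabilities:
    c(f, e) = \sum_{\text{sentence pairs}} \sum_{j:\, f_j = f} \; \sum_{i:\, e_i = e} p_j(i), \qquad
    p(f \mid e) = \frac{c(f, e)}{\sum_{f'} c(f', e)}

    alignment counts and probabilities:
    c(i, j, I, J) = \sum_{\text{sentence pairs of lengths } I, J} p_j(i), \qquad
    p(i \mid j, I, J) = \frac{c(i, j, I, J)}{\sum_{i'} c(i', j, I, J)}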
IBM1 Model
- Assume a uniform probability for the position alignment
- Alignment probability (written out below)
- In training: only collect counts for word pairs
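A reconstruction of the IBM1 formulas: with a uniform position model the alignment probability drops out of the per-position mixture, so

    p(i \mid j, I, J) = \frac{1}{I + 1}

    \Pr(f_1^J \mid e_1^I) = \frac{p(J \mid I)}{(I + 1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} p(f_j \mid e_i)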
Training for IBM1 Model – Pseudo Code

# Accumulation (over corpus)
For each sentence pair
  For each source position j
    Sum = 0.0
    For each target position i
      Sum += p(f_j | e_i)
    For each target position i
      Count(f_j, e_i) += p(f_j | e_i) / Sum

# Re-estimate probabilities (over count table)
For each target word e
  Sum = 0.0
  For each source word f
    Sum += Count(f, e)
  For each source word f
    p(f | e) = Count(f, e) / Sum

# Repeat for several iterations
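The same loop as runnable Python, a sketch assuming the corpus is a list of (source_tokens, target_tokens) pairs and that a NULL token has already been prepended to each target sentence where the empty word should be modeled; all names are illustrative:

    from collections import defaultdict

    def train_ibm1(corpus, iterations=5):
        # Initialize p(f|e) uniformly over the source vocabulary
        src_vocab = {f for src, _ in corpus for f in src}
        t = defaultdict(lambda: 1.0 / len(src_vocab))   # t[(f, e)] = p(f|e)

        for _ in range(iterations):
            count = defaultdict(float)   # expected counts Count(f, e)
            total = defaultdict(float)   # per-target-word normalizer
            # Accumulation (E-step)
            for src, tgt in corpus:
                for f in src:
                    norm = sum(t[(f, e)] for e in tgt)   # Sum over target positions
                    for e in tgt:
                        c = t[(f, e)] / norm             # posterior weight
                        count[(f, e)] += c
                        total[e] += c
            # Re-estimation (M-step)
            t = defaultdict(float,
                            {(f, e): count[(f, e)] / total[e] for (f, e) in count})
        return t

    # Example usage:
    # corpus = [(["das", "haus"], ["NULL", "the", "house"]), ...]
    # t = train_ibm1(corpus); t[("haus", "house")] grows with each iteration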
HMM Alignment Model
- Idea: relative position model
- Entire word groups (phrases) are moved with respect to the source position
[Figure: source positions vs. target positions]
HMM Alignment
- First-order model: target position dependent on previous target position (captures movement of entire phrases)
- Alignment probability (written out below)
- Maximum approximation (written out below)
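A reconstruction of the missing formulas, in the form of Vogel et al. (1996), with the transition model written as p(a_j | a_{j-1}, I, J) to match the pseudo code on the next slide:

    \Pr(f_1^J \mid e_1^I) = \sum_{a_1^J} \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I, J)\, p(f_j \mid e_{a_j})

    maximum approximation:
    \Pr(f_1^J \mid e_1^I) \approx \max_{a_1^J} \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I, J)\, p(f_j \mid e_{a_j})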
Viterbi Training on HMM Model – Pseudo Code

# Accumulation (over corpus)
# Find the Viterbi path
For each sentence pair
  For each source position j
    For each target position i
      P_best = 0; t = p(f_j | e_i)
      For each target position i'
        P_prev = P(j-1, i')
        a = p(i | i', I, J)
        P_new = P_prev * t * a
        if (P_new > P_best)
          P_best = P_new
          BackPointer(j, i) = i'
      P(j, i) = P_best
  # Update counts along the best path
  i = argmax_i' { P(J, i') }
  For each j from J down to 1
    Count(f_j, e_i)++
    i_prev = BackPointer(j, i)
    Count(i, i_prev, I, J)++
    i = i_prev

# Renormalize counts into probabilities
…

[Trellis figure: P_new = P_prev * a * t, with a = p(i | i', I, J) and t = p(f_j | e_i)]
HMM Forward-Backward Training
- Gamma: probability to emit f_j when in state i in sentence s
- Sum over all paths through (j, i)
HMM Forward-Backward Training
- Epsilon: probability to transit from state i' into i
- Sum over all paths through (j-1, i') and (j, i), emitting f_j
Forward Probabilities
- Defined as the total probability of the prefix f_1 … f_j with f_j emitted from state i
- Recursion and initial condition: see the reconstruction below
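A reconstruction of the forward quantities (the standard HMM forward recursion; the choice of initial alignment distribution p_0 is an assumption, e.g. uniform):

    \alpha_j(i) = \Pr(f_1^j, a_j = i \mid e_1^I)

    \alpha_j(i) = \Big[ \sum_{i'} \alpha_{j-1}(i')\, p(i \mid i', I, J) \Big]\, p(f_j \mid e_i)

    \alpha_1(i) = p_0(i)\, p(f_1 \mid e_i), \qquad \text{e.g. } p_0(i) = \frac{1}{I}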
Backward Probabilities
- Defined as the total probability of the suffix f_{j+1} … f_J, given that f_j was emitted from state i
- Recursion and initial condition: see the reconstruction below
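A reconstruction of the backward quantities (the standard HMM backward recursion):

    \beta_j(i) = \Pr(f_{j+1}^J \mid a_j = i, e_1^I)

    \beta_j(i) = \sum_{i'} p(i' \mid i, I, J)\, p(f_{j+1} \mid e_{i'})\, \beta_{j+1}(i')

    \beta_J(i) = 1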
Forward-Backward
- Calculate the Gammas and Epsilons from the Alphas and Betas (see below)
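The formulas are not in the transcript; the standard Baum-Welch form, normalized by the total sentence probability, is:

    \Pr(f_1^J \mid e_1^I) = \sum_{i} \alpha_J(i)

    \gamma_j(i) = \frac{\alpha_j(i)\, \beta_j(i)}{\Pr(f_1^J \mid e_1^I)}

    \epsilon_j(i', i) = \frac{\alpha_{j-1}(i')\, p(i \mid i', I, J)\, p(f_j \mid e_i)\, \beta_j(i)}{\Pr(f_1^J \mid e_1^I)}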
Parameter Re-Estimation
- Lexicon probabilities: re-estimated from the accumulated Gammas (see below)
- Alignment probabilities: re-estimated from the accumulated Epsilons (see below)
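A reconstruction of the re-estimation formulas (counts are pooled over all sentence pairs s; in practice the transition counts are often parameterized by the jump width i − i' rather than by absolute positions):

    p(f \mid e) = \frac{\sum_s \sum_{j, i:\; f_j = f,\; e_i = e} \gamma_j(i)}{\sum_s \sum_{j, i:\; e_i = e} \gamma_j(i)}

    p(i \mid i', I, J) = \frac{\sum_s \sum_{j} \epsilon_j(i', i)}{\sum_s \sum_{j} \sum_{i''} \epsilon_j(i', i'')}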
Forward-Backward Training – Pseudo Code

# Accumulation
For each sentence pair {
  Forward pass (calculate Alphas)
  Backward pass (calculate Betas)
  Calculate Gammas and Epsilons
  For each source position j, target position i {
    Increase LexiconCount(f_j | e_i) by Gamma(j, i)
    Increase AlignCount(i | i') by Epsilon(j, i, i')
  }
}

# Update
Normalize LexiconCount to get p(f_j | e_i)
Normalize AlignCount to get p(i | i')
Example HMM Training