Published by Neil Neve. Modified over 10 years ago.
1
Enriched translation model using morphology in MT
Luong Minh Thang
WING group meeting – 07 July, 2009
11/20/2014
2
Overview
Brief recap on SMT & morphological analysis
Motivation
Enriched translation model
– Twin phrase-table construction
– Merging phrase tables
Experiments
Conclusion
3
SMT overview – alignment
Parallel data:
English: These are, first and foremost, messages of concern at the economic and social problems that we are experiencing, in spite of a period of sustained growth stemming from years of efforts by all our fellow citizens.
Finnish: Ensinnäkin kohtaamiemme taloudellisten ja sosiaalisten vaikeuksien vuoksi on havaittavissa huolestumista, vaikka kasvu on kestävällä pohjalla ja tulosta vuosien ponnisteluista, kaikkien kansalaistemme taholta.
Alignment: one-to-many (1-M)
Maria no daba una bofetada a la bruja verde
NULL Mary did not slap the green witch
[Figure: alignment links between the Source and Target words]
4
SMT overview – translation model
Intersect alignments: 1-M + M-1 → M-M
Extract phrases from the M-M alignment → translation model (phrase table).
Phrase-table entries (English e ||| Foreign f ||| scores):
problems ||| ongelmat ||| 0.372611 0.597858 0.114146 0.13882 2.718
problems ||| ongelmasta ||| 0.352941 0.423077 0.000836237 0.0012435 2.718
…
problems ||| vaikeuksista ||| 0.0696946 0.105991 0.0124042 0.0130002 2.718
problems ||| vaikeuksien ||| 0.0410959 0.062069 0.000836237 0.0010174 2.718
Scores: translation probabilities, lexical probabilities, and the phrase penalty (the final 2.718).
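The entries above can be read mechanically. The following is a minimal sketch, assuming each line has exactly three `|||`-separated fields as shown on the slide; the function name `parse_phrase_table_line` is illustrative and not part of any toolkit.

```python
def parse_phrase_table_line(line):
    """Split one 'e ||| f ||| scores' line into (e, f, [scores])."""
    # Assumption: exactly three fields separated by '|||', as on the slide.
    source, target, scores = [field.strip() for field in line.split("|||")]
    return source, target, [float(s) for s in scores.split()]


entry = parse_phrase_table_line(
    "problems ||| ongelmat ||| 0.372611 0.597858 0.114146 0.13882 2.718"
)
english, foreign, scores = entry
phrase_penalty = scores[-1]  # the last score on every slide entry is 2.718
```

Keeping the scores as a plain list avoids committing to a column order that the slide does not spell out.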
5
Recap - Morphological analysis
Morpheme: minimal meaning-bearing unit
English: machine + s, present + ed, etc.
Finnish: oppositio + kansa + n + edusta + ja = opposition of parliament member
Morfessor (Creutz & Lagus, 2007): segments words in an unsupervised manner
un/PRE + fortunate/STM + ly/SUF
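To make the tagged notation concrete, here is a small sketch that reads a string in the `morph/TAG + morph/TAG` form shown on the slide. It only parses the notation and does not call Morfessor; the helper names `parse_segmentation` and `surface_form` are assumptions made for illustration.

```python
def parse_segmentation(tagged):
    """Turn 'un/PRE + fortunate/STM + ly/SUF' into [(morph, tag), ...]."""
    pairs = []
    for piece in tagged.split("+"):
        # rsplit keeps any '/' inside the morph itself intact.
        morph, tag = piece.strip().rsplit("/", 1)
        pairs.append((morph, tag))
    return pairs


def surface_form(pairs):
    """Glue the morphs back together into the original word."""
    return "".join(morph for morph, _ in pairs)


segments = parse_segmentation("un/PRE + fortunate/STM + ly/SUF")
word = surface_form(segments)
```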
6
Motivation
Problem:
– Multiple word forms in morphologically complex languages, e.g. ongelmat, ongelmasta, etc.
– Rare words occur often and are hard to align → incorrect entries in the normal (word-aligned) phrase table.
Solution:
– Construct a morpheme-aligned phrase table (PT) to aggregate better statistics for rare words.
– Combine the word- and morpheme-aligned PTs in a principled way to produce an even better translation model.
7
Overview
Brief recap on SMT & morphological analysis
Motivation
Enriched translation model
– Twin phrase-table construction
– Merging phrase tables
Experiments
Conclusion
8
Twin phrase-table (PT) construction
[Figure: pipeline. Word side: GIZA++ word alignment → phrase extraction → PT w → morphological segmentation → PT wm. Morpheme side: GIZA++ morpheme alignment → phrase extraction → PT m. PT wm and PT m → PT merging → decoding.]
Example entries:
Morpheme level: problem/STM+ s/SUF ||| ongelma/STM+ t/SUF
Word level: problems ||| vaikeuksista
Word level after segmentation: problem/STM+ s/SUF ||| vaikeu/STM+ ksi/SUF+ sta/SUF
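The segmentation step that turns a word-level entry into a morpheme-level one can be sketched as a lookup. This is an illustration only: the `SEGMENTATIONS` dictionary is a hypothetical stand-in for a real segmenter's output, and passing unknown words through unchanged is an assumption rather than something the slides specify.

```python
# Hypothetical segmentation lookup, copied from the slide's example entries.
SEGMENTATIONS = {
    "problems": "problem/STM+ s/SUF",
    "vaikeuksista": "vaikeu/STM+ ksi/SUF+ sta/SUF",
}


def segment_phrase(phrase, segmentations):
    """Replace each word by its segmentation; unknown words pass through."""
    return " ".join(segmentations.get(word, word) for word in phrase.split())


def segment_entry(entry, segmentations):
    """Segment both sides of a word-level 'e ||| f' entry."""
    source, target = [side.strip() for side in entry.split("|||")]
    return (
        segment_phrase(source, segmentations)
        + " ||| "
        + segment_phrase(target, segmentations)
    )


segmented = segment_entry("problems ||| vaikeuksista", SEGMENTATIONS)
```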
9
Existing PT-merging methods
Add-feature (Nakov, 2008; Chen et al., 2009):
– F1 = 1 if from 1st PT, 0.5 otherwise
– F2 = 1 if from 2nd PT, 0.5 otherwise
– F3 = 1 if from both PTs, 0.5 otherwise
→ heuristic-driven
Interpolation (Wu & Wang, 2007):
– tran(f|e) = α * tran_1(f|e) + (1 - α) * tran_2(f|e)
– lex(f|e) = β * lex_1(f|e) + (1 - β) * lex_2(f|e)
→ does not consider the "meaning" of the scores
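The add-feature scheme can be illustrated with a short sketch. It assumes each phrase table is a dictionary keyed by `(e, f)` pairs, a representation chosen here for illustration and not taken from the cited papers; `add_source_features` is likewise a made-up name.

```python
def add_source_features(table1, table2):
    """Attach the indicator features (F1, F2, F3) to every phrase pair."""
    merged = {}
    for pair in set(table1) | set(table2):
        in_first, in_second = pair in table1, pair in table2
        f1 = 1.0 if in_first else 0.5
        f2 = 1.0 if in_second else 0.5
        f3 = 1.0 if (in_first and in_second) else 0.5
        merged[pair] = (f1, f2, f3)
    return merged


# Toy tables: only the keys matter for the indicator features.
pt_first = {("problems", "vaikeuksista"): None, ("problems", "ongelmasta"): None}
pt_second = {("problems", "ongelmat"): None, ("problems", "vaikeuksista"): None}
features = add_source_features(pt_first, pt_second)
```

The features record only where a pair came from, so the original probability scores are left untouched.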
10
Our merging method – normalizing translation probabilities
PT wm entries:
problem + s ||| vaikeu + ksi + sta
problem + s ||| ongelma + sta
PT m entries:
problem + s ||| ongelma + t
problem + s ||| vaikeu + ksi + sta
MLE:
tran_1(e|f) = count_1(e, f) / ∑_e count_1(e, f)
tran_2(e|f) = count_2(e, f) / ∑_e count_2(e, f)
11
Our merging method – normalizing translation probabilities
MLE from PT wm (problem + s ||| vaikeu + ksi + sta; problem + s ||| ongelma + sta):
tran(vaikeuksista | problems) = 1/2 = 0.5
tran(ongelmasta | problems) = 1/2 = 0.5
MLE from PT m (problem + s ||| ongelma + t; problem + s ||| vaikeu + ksi + sta):
tran(ongelmat | problems) = 3/4 = 0.75
tran(vaikeuksista | problems) = 1/4 = 0.25
Interpolation (ratio = 0.5):
tran(vaikeuksista | problems) = (0.5 + 0.25)/2 = 0.375
tran(ongelmat | problems) = (0 + 0.75)/2 = 0.375
tran(ongelmasta | problems) = (0.5 + 0)/2 = 0.25
→ Undesired translation!
12
Our merging method – normalizing translation probabilities
PT wm entries:
problem + s ||| vaikeu + ksi + sta
problem + s ||| ongelma + sta
PT m entries:
problem + s ||| ongelma + t
problem + s ||| vaikeu + ksi + sta
MLE:
tran_1(e|f) = count_1(e, f) / ∑_e count_1(e, f)
tran_2(e|f) = count_2(e, f) / ∑_e count_2(e, f)
Normalization:
tran(e|f) = [count_1(e, f) + count_2(e, f)] / [∑_e count_1(e, f) + ∑_e count_2(e, f)]
13
Our merging method – normalizing translation probabilities
MLE from PT wm (problem + s ||| vaikeu + ksi + sta; problem + s ||| ongelma + sta):
tran(vaikeuksista | problems) = 1/2 = 0.5
tran(ongelmasta | problems) = 1/2 = 0.5
MLE from PT m (problem + s ||| ongelma + t; problem + s ||| vaikeu + ksi + sta):
tran(ongelmat | problems) = 3/4 = 0.75
tran(vaikeuksista | problems) = 1/4 = 0.25
Normalization:
tran(vaikeuksista | problems) = (1 + 1)/(2 + 4) = 0.33
tran(ongelmat | problems) = (0 + 3)/(2 + 4) = 0.5
tran(ongelmasta | problems) = (1 + 0)/(2 + 4) = 0.17
→ Desired translation!
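The contrast between interpolating probabilities and normalizing over pooled counts can be checked with the slide's own counts for "problems" (PT wm: vaikeuksista 1, ongelmasta 1; PT m: ongelmat 3, vaikeuksista 1). The sketch below is an illustration under that reading of the slide; the names `count_wm`, `count_m`, `mle`, `interpolate` and `normalize_counts` are hypothetical.

```python
from collections import Counter

# Counts of each Finnish translation of "problems", read off the slide.
count_wm = Counter({"vaikeuksista": 1, "ongelmasta": 1})
count_m = Counter({"ongelmat": 3, "vaikeuksista": 1})


def mle(counts):
    """Relative-frequency estimate within a single phrase table."""
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


def interpolate(p1, p2, alpha=0.5):
    """Linear interpolation; a pair missing from one table contributes 0."""
    return {
        f: alpha * p1.get(f, 0.0) + (1 - alpha) * p2.get(f, 0.0)
        for f in set(p1) | set(p2)
    }


def normalize_counts(c1, c2):
    """Pool the raw counts of both tables, then renormalize once."""
    total = sum(c1.values()) + sum(c2.values())
    return {f: (c1[f] + c2[f]) / total for f in set(c1) | set(c2)}


interpolated = interpolate(mle(count_wm), mle(count_m))
normalized = normalize_counts(count_wm, count_m)
best_by_normalization = max(normalized, key=normalized.get)
```

With interpolation, vaikeuksista and ongelmat tie at 0.375. With pooled counts, ongelmat comes out on top because the three supporting observations in PT m keep their weight.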
14
Our merging method – full lexical probability interpolation
PT w lexical model: P(vaikeuksista|problems), P(ongelmasta|problems)
PT m lexical model: P(vaikeu|problem), P(ongelma|problem), P(t|s), P(ksi|s), P(sta|s)
Word-level scores:
lex(vaikeuksista | problems) = w_1
lex(ongelmasta | problems) = w_2
Morpheme-level scores:
lex(vaikeu + ksi + sta | problem + s) = m_1
lex(ongelma + t | problem + s) = m_3
Normal interpolation (ratio = 0.5):
lex(vaikeuksista | problems) = (w_1 + m_1)/2
lex(ongelmasta | problems) = (w_2 + 0)/2
lex(ongelmat | problems) = (0 + m_3)/2
→ Missing interpolated probabilities!
Full interpolation:
– Estimate lex(ongelma + sta | problem + s) using the PT m lexical model → m_2
– Estimate lex(ongelmat | problems) using the PT w lexical model → w_3
– Then lex(ongelmasta | problems) = (w_2 + m_2)/2 and lex(ongelmat | problems) = (w_3 + m_3)/2
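One way to fill in a missing score such as m_2 is to recompute it from the morpheme-level lexical model. The sketch below uses the standard phrase-based lexical weighting (the product over target tokens of the average probability of their aligned source tokens). Whether the talk used exactly this formula is an assumption, and every probability value, the alignment, and the value of w_2 are invented for illustration.

```python
# Hypothetical morpheme-level lexical probabilities P(f_morph | e_morph).
MORPH_LEX = {
    ("ongelma", "problem"): 0.5,
    ("sta", "s"): 0.2,
}


def lexical_weight(target, source, alignment, lex):
    """Product over target tokens of the mean P(f|e) of aligned source tokens."""
    score = 1.0
    for i, f in enumerate(target):
        linked = [source[j] for (ti, j) in alignment if ti == i]
        if not linked:
            linked = ["NULL"]  # unaligned tokens are scored against NULL
        score *= sum(lex.get((f, e), 0.0) for e in linked) / len(linked)
    return score


# Estimate m_2 = lex(ongelma + sta | problem + s), assuming the alignment
# ongelma-problem and sta-s.
m_2 = lexical_weight(["ongelma", "sta"], ["problem", "s"], [(0, 0), (1, 1)], MORPH_LEX)

w_2 = 0.3  # hypothetical word-level score lex(ongelmasta | problems)
full_interpolated = (w_2 + m_2) / 2  # ratio = 0.5, as on the slide
```

Unlike normal interpolation, neither term is zero here, so the pair is no longer penalized merely for being absent from one table.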
15
Overview
Brief recap on SMT & morphological analysis
Motivation
Enriched translation model
– Twin phrase-table construction
– Merging phrase tables
Experiments
Conclusion
16
Experiments – dataset
2005 ACL shared task (Koehn & Monz, 2005)
17
Experiments – baselines
w-system: uses PT w to translate at the word level
m-system: uses PT m to translate at the morpheme level
m-BLEU: BLEU where each token unit is a morpheme
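The effect of scoring on morphemes rather than words can be seen with clipped n-gram precision, the building block of BLEU. This is a sketch and not a full BLEU or m-BLEU implementation: the geometric mean over n-gram orders and the brevity penalty are omitted. The segmentations ongelma + sta and ongelma + t come from the slides; the function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, reference, n=1):
    """Share of hypothesis n-grams that also occur in the reference (clipped)."""
    hyp_counts, ref_counts = ngrams(hypothesis, n), ngrams(reference, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return matched / total


# Word tokens: the two forms share nothing.
word_precision = clipped_precision(["ongelmasta"], ["ongelmat"])
# Morpheme tokens: the shared stem "ongelma" earns partial credit.
morpheme_precision = clipped_precision(["ongelma", "sta"], ["ongelma", "t"])
```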
18
Experiments – our system
Improvements over the m-system and the w-system are statistically significant according to the sign test of Collins et al. (2005).
19
Conclusion
Our contributions:
– Enrich the translation model without using additional data.
– Propose a principled way to merge phrase tables generated at different granularities.
20
Q & A
Thank you!