Enriched translation model using morphology in MT
Luong Minh Thang
WING group meeting – 07 July
11/20/2014
Overview
Brief recap on SMT & morphological analysis
Motivation
Enriched translation model
– Twin phrase-table construction
– Merging phrase tables
Experiments
Conclusion
SMT overview – alignment
Parallel data:
English: These are, first and foremost, messages of concern at the economic and social problems that we are experiencing, in spite of a period of sustained growth stemming from years of efforts by all our fellow citizens.
Finnish: Ensinnäkin kohtaamiemme taloudellisten ja sosiaalisten vaikeuksien vuoksi on havaittavissa huolestumista, vaikka kasvu on kestävällä pohjalla ja tulosta vuosien ponnisteluista, kaikkien kansalaistemme taholta.
Alignment: one-to-many (1-M)
Example alignment (figure, source and target rows): "Maria no daba una bofetada a la bruja verde" aligned with "NULL Mary did not slap the green witch"
SMT overview – translation model
Intersect alignments: 1-M + M-1 → M-M
Extract phrases from the M-M alignment → translation model (phrase table)
Phrase-table entries (English e ||| Foreign f ||| scores):
problems ||| ongelmat |||
problems ||| ongelmasta |||
…
problems ||| vaikeuksista |||
problems ||| vaikeuksien |||
Scores per entry: translation probabilities, lexical probabilities, phrase penalty
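A minimal sketch of reading entries in the `e ||| f ||| scores` layout shown above. The slide does not give the score values or their order, so the numbers in the usage line are invented for illustration, and `PhraseEntry` and `parse_phrase_table_line` are hypothetical names.

```python
from typing import List, NamedTuple


class PhraseEntry(NamedTuple):
    source: str          # English phrase e
    target: str          # foreign phrase f
    scores: List[float]  # numeric scores in file order (order is an assumption)


def parse_phrase_table_line(line: str) -> PhraseEntry:
    """Split one 'e ||| f ||| scores' line into its fields."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) < 2:
        raise ValueError(f"expected at least 'e ||| f', got: {line!r}")
    source, target = fields[0], fields[1]
    # The score field may be empty, as on the slide where the values are not shown.
    score_field = fields[2] if len(fields) > 2 else ""
    scores = [float(token) for token in score_field.split()]
    return PhraseEntry(source, target, scores)


# Hypothetical score values, for illustration only.
entry = parse_phrase_table_line("problems ||| ongelmat ||| 0.5 0.4 0.3 0.2 2.718")
empty_scores = parse_phrase_table_line("problems ||| vaikeuksista |||")
```

Keeping the scores as an ordered list rather than named fields avoids committing to a score order that the slide does not state.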
Recap - Morphological analysis
Morpheme: minimal meaning-bearing unit
English: machine + s, present + ed, etc.
Finnish: oppositio + kansa + n + edusta + ja = opposition member of parliament
Morfessor (Creutz & Lagus, 2007): segments words in an unsupervised manner
un/PRE + fortunate/STM + ly/SUF
Motivation
Problem:
– Multiple word forms in morphologically complex languages, e.g. ongelmat, ongelmasta, etc.
– Rare words occur often and are hard to align → incorrect entries in the normal (word-aligned) phrase table.
Solution:
– Construct a morpheme-aligned phrase table (PT) to aggregate better statistics for rare words.
– Combine the word- and morpheme-aligned PTs in a principled way to produce an even better translation model.
Twin phrase-table (PT) construction
Pipeline (from the figure):
– Word: GIZA++ → word alignment → phrase extraction → PT_w → morphological segmentation → PT_wm
– Morpheme: GIZA++ → morpheme alignment → phrase extraction → PT_m
– PT_wm + PT_m → PT merging → decoding
Example entries:
– PT_w: problems ||| vaikeuksista
– PT_wm: problem/STM+ s/SUF ||| vaikeu/STM+ ksi/SUF+ sta/SUF
– PT_m: problem/STM+ s/SUF ||| ongelma/STM+ t/SUF
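The step that turns a word-level entry into its morpheme-level form can be sketched as follows. The lookup table `SEGMENTATIONS` is a hypothetical stand-in for a segmenter's output, and treating an unknown word as a single stem is an assumption made only for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical segmentation lookup standing in for an unsupervised segmenter's output.
SEGMENTATIONS: Dict[str, List[str]] = {
    "problems": ["problem/STM", "s/SUF"],
    "vaikeuksista": ["vaikeu/STM", "ksi/SUF", "sta/SUF"],
}


def segment_phrase(phrase: str, segmentations: Dict[str, List[str]]) -> str:
    """Rewrite each word of a phrase as its tagged morphemes.

    Morphemes of one word are joined with '+ ', matching the
    'problem/STM+ s/SUF' notation used on the slide.
    """
    pieces = []
    for word in phrase.split():
        # Assumption: a word with no known segmentation is kept as one stem.
        morphemes = segmentations.get(word, [word + "/STM"])
        pieces.append("+ ".join(morphemes))
    return " ".join(pieces)


def segment_entry(source: str, target: str,
                  segmentations: Dict[str, List[str]]) -> Tuple[str, str]:
    """Segment both sides of a word-level phrase pair."""
    return segment_phrase(source, segmentations), segment_phrase(target, segmentations)


segmented = segment_entry("problems", "vaikeuksista", SEGMENTATIONS)
```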
Existing PT-merging methods
Add-feature (Nakov, 2008; Chen et al., 2009): heuristic-driven
– F1 = 1 if from 1st PT, 0.5 otherwise
– F2 = 1 if from 2nd PT, 0.5 otherwise
– F3 = 1 if from both PTs, 0.5 otherwise
Interpolation (Wu & Wang, 2007): does not consider the "meaning" of the scores
– $\mathrm{tran}(f \mid e) = \alpha \, \mathrm{tran}_1(f \mid e) + (1 - \alpha) \, \mathrm{tran}_2(f \mid e)$
– $\mathrm{lex}(f \mid e) = \beta \, \mathrm{lex}_1(f \mid e) + (1 - \beta) \, \mathrm{lex}_2(f \mid e)$
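The add-feature scheme above can be sketched as three indicator-style features per phrase pair. This is only an illustration of the 1 / 0.5 rule stated on the slide; the function name and the toy tables are hypothetical.

```python
from typing import Set, Tuple

PhrasePair = Tuple[str, str]  # (English phrase, foreign phrase)


def add_feature_scores(pair: PhrasePair,
                       pt1: Set[PhrasePair],
                       pt2: Set[PhrasePair]) -> Tuple[float, float, float]:
    """Return (F1, F2, F3) for a phrase pair found in the merged table."""
    in_first = pair in pt1
    in_second = pair in pt2
    f1 = 1.0 if in_first else 0.5                 # from the 1st PT?
    f2 = 1.0 if in_second else 0.5                # from the 2nd PT?
    f3 = 1.0 if in_first and in_second else 0.5   # from both PTs?
    return f1, f2, f3


# Toy tables for illustration only.
pt_first = {("problems", "vaikeuksista"), ("problems", "ongelmasta")}
pt_second = {("problems", "ongelmat"), ("problems", "vaikeuksista")}

both = add_feature_scores(("problems", "vaikeuksista"), pt_first, pt_second)
second_only = add_feature_scores(("problems", "ongelmat"), pt_first, pt_second)
```

The features only record where an entry came from; the original probabilities of each table are left untouched.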
Our merging method – normalizing translation probabilities
MLE within each table:
$\mathrm{tran}_1(e \mid f) = \mathrm{count}_1(e, f) \,/\, \sum_{e} \mathrm{count}_1(e, f)$
$\mathrm{tran}_2(e \mid f) = \mathrm{count}_2(e, f) \,/\, \sum_{e} \mathrm{count}_2(e, f)$
PT_wm entries:
problem + s ||| vaikeu + ksi + sta
problem + s ||| ongelma + sta
PT_m entries:
problem + s ||| ongelma + t
problem + s ||| vaikeu + ksi + sta
Our merging method – normalizing translation probabilities
PT_wm (problem + s ||| vaikeu + ksi + sta; problem + s ||| ongelma + sta), MLE:
tran(vaikeuksista | problems) = 1/2 = 0.5
tran(ongelmasta | problems) = 1/2 = 0.5
PT_m (problem + s ||| ongelma + t; problem + s ||| vaikeu + ksi + sta), MLE:
tran(ongelmat | problems) = 3/4 = 0.75
tran(vaikeuksista | problems) = 1/4 = 0.25
Interpolation (ratio = 0.5):
tran(vaikeuksista | problems) = (0.5 + 0.25)/2 = 0.375
tran(ongelmat | problems) = (0 + 0.75)/2 = 0.375
tran(ongelmasta | problems) = (0.5 + 0)/2 = 0.25
Undesired translation!
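The interpolation arithmetic on this slide can be checked with a short sketch. The counts are the ones implied by the fractions 1/2, 1/2, 3/4 and 1/4 above; the function names are hypothetical.

```python
from typing import Dict


def mle(counts: Dict[str, int]) -> Dict[str, float]:
    """Relative-frequency estimate within a single phrase table."""
    total = sum(counts.values())
    return {target: count / total for target, count in counts.items()}


def interpolate(p1: Dict[str, float], p2: Dict[str, float],
                ratio: float = 0.5) -> Dict[str, float]:
    """Linear interpolation; a target missing from one table contributes 0."""
    targets = set(p1) | set(p2)
    return {t: ratio * p1.get(t, 0.0) + (1.0 - ratio) * p2.get(t, 0.0)
            for t in targets}


# Counts for translations of "problems", implied by the slide's fractions.
counts_wm = {"vaikeuksista": 1, "ongelmasta": 1}
counts_m = {"ongelmat": 3, "vaikeuksista": 1}

interpolated = interpolate(mle(counts_wm), mle(counts_m), ratio=0.5)
```

With these counts, `vaikeuksista` and `ongelmat` both come out at 0.375, even though `ongelmat` was observed three times and `vaikeuksista` only twice across the two tables.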
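Our merging method – normalizing translation probabilities
MLE within each table (1 = PT_wm, 2 = PT_m):
$\mathrm{tran}_1(e \mid f) = \mathrm{count}_1(e, f) \,/\, \sum_{e} \mathrm{count}_1(e, f)$
$\mathrm{tran}_2(e \mid f) = \mathrm{count}_2(e, f) \,/\, \sum_{e} \mathrm{count}_2(e, f)$
Normalization:
$\mathrm{tran}(e \mid f) = \left[ \mathrm{count}_1(e, f) + \mathrm{count}_2(e, f) \right] \,/\, \left[ \sum_{e} \mathrm{count}_1(e, f) + \sum_{e} \mathrm{count}_2(e, f) \right]$
PT_wm entries:
problem + s ||| vaikeu + ksi + sta
problem + s ||| ongelma + sta
PT_m entries:
problem + s ||| ongelma + t
problem + s ||| vaikeu + ksi + sta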
Our merging method – normalizing translation probabilities
PT_wm (problem + s ||| vaikeu + ksi + sta; problem + s ||| ongelma + sta), MLE:
tran(vaikeuksista | problems) = 1/2 = 0.5
tran(ongelmasta | problems) = 1/2 = 0.5
PT_m (problem + s ||| ongelma + t; problem + s ||| vaikeu + ksi + sta), MLE:
tran(ongelmat | problems) = 3/4 = 0.75
tran(vaikeuksista | problems) = 1/4 = 0.25
Normalization:
tran(vaikeuksista | problems) = (1 + 1)/(2 + 4) = 0.33
tran(ongelmat | problems) = (0 + 3)/(2 + 4) = 0.5
tran(ongelmasta | problems) = (1 + 0)/(2 + 4) = 0.17
Desired translation!
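A minimal sketch of the count-based normalization worked through above: the counts of the two tables are pooled before dividing, instead of averaging two already-normalized probabilities. The function name is hypothetical; the counts are those in the slide's fractions.

```python
from typing import Dict


def merge_by_count_normalization(counts1: Dict[str, int],
                                 counts2: Dict[str, int]) -> Dict[str, float]:
    """tran = (count1 + count2) / (total1 + total2) for each target phrase."""
    total = sum(counts1.values()) + sum(counts2.values())
    if total == 0:
        raise ValueError("both tables are empty for this source phrase")
    targets = set(counts1) | set(counts2)
    return {t: (counts1.get(t, 0) + counts2.get(t, 0)) / total for t in targets}


# Counts for translations of "problems" in the two tables.
counts_wm = {"vaikeuksista": 1, "ongelmasta": 1}
counts_m = {"ongelmat": 3, "vaikeuksista": 1}

merged = merge_by_count_normalization(counts_wm, counts_m)
best = max(merged, key=merged.get)
```

Pooling counts lets the table with more evidence weigh more, which is why `ongelmat` ends up ranked first here.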
Our merging method – full lexical probability interpolation
Lexical scores available in PT_wm:
lex(vaikeuksista | problems) = w1
lex(ongelmasta | problems) = w2
Lexical scores available in PT_m:
lex(vaikeu + ksi + sta | problem + s) = m1
lex(ongelma + t | problem + s) = m3
Normal interpolation (ratio = 0.5):
lex(vaikeuksista | problems) = (w1 + m1)/2
lex(ongelmasta | problems) = (w2 + 0)/2
lex(ongelmat | problems) = (0 + m3)/2
Missing interpolated probabilities!
Lexical models:
– PT_w lexical model: P(vaikeuksista|problems), P(ongelmasta|problems)
– PT_m lexical model: P(vaikeu|problem), P(ongelma|problem), P(t|s), P(ksi|s), P(sta|s)
Full interpolation:
– Estimate lex(ongelma + sta | problem + s) = m2 using the PT_m lexical model, so lex(ongelmasta | problems) = (w2 + m2)/2
– Estimate lex(ongelmat | problems) = w3 using the PT_w lexical model, so lex(ongelmat | problems) = (w3 + m3)/2
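One way to picture how a missing score such as m2 could be estimated is sketched below. It assumes the standard phrase-based lexical weighting (a product over foreign tokens of the average P(f|e) over aligned English tokens); the slide does not state which formula was used, and every probability and alignment here is invented for illustration.

```python
from typing import Dict, List, Tuple

LexModel = Dict[Tuple[str, str], float]  # (foreign token, English token) -> P(f|e)


def lexical_weight(f_tokens: List[str], e_tokens: List[str],
                   alignment: List[Tuple[int, int]], model: LexModel) -> float:
    """Product over foreign tokens of the mean P(f_i | e_j) over aligned e_j."""
    score = 1.0
    for i, f_tok in enumerate(f_tokens):
        links = [j for (fi, j) in alignment if fi == i]
        if not links:
            # Unaligned tokens would be scored against NULL; left out of this sketch.
            raise ValueError(f"token {f_tok!r} is unaligned")
        score *= sum(model.get((f_tok, e_tokens[j]), 0.0) for j in links) / len(links)
    return score


def interpolate_lex(w_score: float, m_score: float, ratio: float = 0.5) -> float:
    """Interpolate the word-level and morpheme-level lexical scores."""
    return ratio * w_score + (1.0 - ratio) * m_score


# Hypothetical morpheme-level lexical model and word-level score w2.
morpheme_model: LexModel = {("ongelma", "problem"): 0.4, ("sta", "s"): 0.2}
w2 = 0.3

# Estimate m2 = lex(ongelma + sta | problem + s) with a one-to-one alignment.
m2 = lexical_weight(["ongelma", "sta"], ["problem", "s"], [(0, 0), (1, 1)], morpheme_model)

normal = interpolate_lex(w2, 0.0)  # missing m2 treated as 0
full = interpolate_lex(w2, m2)     # missing m2 estimated from the lexical model
```

In this toy setup the full interpolation gives a higher score than the normal one, because the morpheme-level evidence is no longer counted as zero.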
Experiments – dataset
2005 ACL shared task (Koehn & Monz, 2005)
Experiments – baselines
w-system: uses PT_w; translates at the word level
m-system: uses PT_m; translates at the morpheme level
m-BLEU: BLEU computed with morphemes as the token unit
Experiments – our system
Improvements over the m-system and the w-system are statistically significant according to the sign test of Collins et al. (2005).
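For readers unfamiliar with sign tests, here is a sketch of the textbook exact two-sided version over per-sentence wins and losses, with ties dropped. The slide does not spell out the exact procedure followed, so this is an assumption about the general technique, and the win/loss counts in the usage lines are invented.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided sign test under H0: a win and a loss are equally likely."""
    n = wins + losses
    if n == 0:
        return 1.0  # no non-tied comparisons, so no evidence either way
    k = min(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical counts: sentences where one system scored better than the other.
p_balanced = sign_test_p_value(5, 5)
p_lopsided = sign_test_p_value(10, 0)
```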
Conclusion
Our contributions:
– Enrich the translation model without using additional data.
– Propose a principled way to merge phrase tables generated at different granularities.
Q & A
Thank you!