Enriched translation model using morphology in MT
Luong Minh Thang
WING group meeting – 07 July, 2009
11/20/2014

Overview
– Brief recap on SMT & morphological analysis
– Motivation
– Enriched translation model
  – Twin phrase-table construction
  – Merging phrase tables
– Experiments
– Conclusion

SMT overview – alignment
Parallel data:
These are, first and foremost, messages of concern at the economic and social problems that we are experiencing, in spite of a period of sustained growth stemming from years of efforts by all our fellow citizens.
Ensinnäkin kohtaamiemme taloudellisten ja sosiaalisten vaikeuksien vuoksi on havaittavissa huolestumista, vaikka kasvu on kestävällä pohjalla ja tulosta vuosien ponnisteluista, kaikkien kansalaistemme taholta.
Alignment: one-to-many (1-M), e.g.
Source: Maria no daba una bofetada a la bruja verde
Target: NULL Mary did not slap the green witch

SMT overview – translation model
Intersect the 1-M and M-1 alignments → M-M alignment.
Extract phrases from the M-M alignment → translation model (phrase table).
Each entry: English phrase e ||| Foreign phrase f ||| translation and lexical probabilities, followed by the phrase penalty (2.718):
problems ||| ongelmat ||| 0.372611 0.597858 0.114146 0.13882 2.718
problems ||| ongelmasta ||| 0.352941 0.423077 0.000836237 0.0012435 2.718
…
problems ||| vaikeuksista ||| 0.0696946 0.105991 0.0124042 0.0130002 2.718
problems ||| vaikeuksien ||| 0.0410959 0.062069 0.000836237 0.0010174 2.718
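The phrase-extraction step can be sketched as follows. This is a minimal illustration, assuming the symmetrized M-M alignment is given as (English index, Foreign index) pairs; the function name `extract_phrases` and the `max_len` limit are hypothetical, and the sketch omits the usual extension of phrases over unaligned boundary words.

```python
from typing import Iterator, List, Tuple

Alignment = List[Tuple[int, int]]  # (English index e, Foreign index f) pairs


def extract_phrases(
    e_len: int, alignment: Alignment, max_len: int = 7
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (e_start, e_end, f_start, f_end) spans consistent with the alignment.

    Simplified: phrases are not extended over unaligned boundary words.
    """
    for e_start in range(e_len):
        for e_end in range(e_start, min(e_len, e_start + max_len)):
            # Foreign positions linked to the English span.
            f_points = [f for (e, f) in alignment if e_start <= e <= e_end]
            if not f_points:
                continue
            f_start, f_end = min(f_points), max(f_points)
            if f_end - f_start + 1 > max_len:
                continue
            # Consistency: no Foreign word in the span may link outside the English span.
            if any(
                f_start <= f <= f_end and not (e_start <= e <= e_end)
                for (e, f) in alignment
            ):
                continue
            yield (e_start, e_end, f_start, f_end)


# Toy 3-word example: English word 1 links to Foreign word 2, and word 2 to word 1.
pairs = list(extract_phrases(3, [(0, 0), (1, 2), (2, 1)]))
print(pairs)
# [(0, 0, 0, 0), (0, 2, 0, 2), (1, 1, 2, 2), (1, 2, 1, 2), (2, 2, 1, 1)]
```

The span (0, 1) is rejected because Foreign word 1, which falls inside its Foreign span, is linked to English word 2 outside it.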

Recap – Morphological analysis
Morpheme: minimal meaning-bearing unit
– English: machine + s, present + ed, etc.
– Finnish: oppositio + kansa + n + edusta + ja = opposition member of parliament
Morfessor (Creutz & Lagus, 2007): segments words in an unsupervised manner, e.g. un/PRE + fortunate/STM + ly/SUF
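To make the Morfessor-style notation concrete, the sketch below parses an analysis such as un/PRE + fortunate/STM + ly/SUF into (morph, category) pairs and checks that the morphs concatenate back to the surface word. The string format is taken from the slide; the function names `parse_analysis` and `surface_form` are hypothetical and not part of Morfessor's own API.

```python
from typing import List, Tuple


def parse_analysis(analysis: str) -> List[Tuple[str, str]]:
    """Split 'un/PRE + fortunate/STM + ly/SUF' into [(morph, category), ...]."""
    pairs = []
    for item in analysis.split("+"):
        # The category tag follows the last '/' of each item.
        morph, category = item.strip().rsplit("/", 1)
        pairs.append((morph, category))
    return pairs


def surface_form(pairs: List[Tuple[str, str]]) -> str:
    """Concatenate the morphs to recover the original word."""
    return "".join(morph for morph, _ in pairs)


pairs = parse_analysis("un/PRE + fortunate/STM + ly/SUF")
print(pairs)                # [('un', 'PRE'), ('fortunate', 'STM'), ('ly', 'SUF')]
print(surface_form(pairs))  # unfortunately
```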

Motivation
Problem:
– Morphologically complex languages have many word forms, e.g. ongelmat, ongelmasta, etc.
– Rare words occur often and are hard to align → incorrect entries in the normal (word-aligned) phrase table.
Solution:
– Construct a morpheme-aligned phrase table (PT) to aggregate better statistics for rare words.
– Combine the word- and morpheme-aligned PTs in a proper way to produce an even better translation model.

Twin phrase-table (PT) construction
– Word level: GIZA++ → word alignment → phrase extraction → PT_w → Word→Morpheme conversion → PT_wm
– Morpheme level: morphological segmentation → GIZA++ → morpheme alignment → phrase extraction → PT_m
– PT_wm + PT_m → PT merging → decoding
Example: the PT_w entry problems ||| vaikeuksista becomes problem/STM+ s/SUF ||| vaikeu/STM+ ksi/SUF+ sta/SUF in PT_wm; PT_m contains entries such as problem/STM+ s/SUF ||| ongelma/STM+ t/SUF.
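The Word→Morpheme step that turns PT_w into PT_wm can be sketched as rewriting every word of a phrase pair with its segmentation. This assumes a precomputed segmentation lookup (here a hypothetical dict standing in for Morfessor output) and the slide's notation, in which every non-final morpheme carries a trailing "+"; the names `SEGMENTS`, `to_morphemes` and `convert_entry` are illustrative only, and only the phrase strings are converted.

```python
from typing import Dict, List, Tuple

# Hypothetical segmentation lookup: word -> [(morph, category), ...].
SEGMENTS: Dict[str, List[Tuple[str, str]]] = {
    "problems": [("problem", "STM"), ("s", "SUF")],
    "vaikeuksista": [("vaikeu", "STM"), ("ksi", "SUF"), ("sta", "SUF")],
}


def to_morphemes(word: str) -> str:
    """Rewrite one word as 'm1/C1+ m2/C2' (non-final morphs get a trailing '+')."""
    # Words without a known segmentation are treated as a single stem.
    morphs = SEGMENTS.get(word, [(word, "STM")])
    return "+ ".join(f"{m}/{c}" for m, c in morphs)


def convert_entry(entry: str) -> str:
    """Convert a word-level 'source ||| target' pair to the morpheme level."""
    source, target = (side.strip() for side in entry.split("|||")[:2])

    def convert(phrase: str) -> str:
        return " ".join(to_morphemes(w) for w in phrase.split())

    return f"{convert(source)} ||| {convert(target)}"


print(convert_entry("problems ||| vaikeuksista"))
# problem/STM+ s/SUF ||| vaikeu/STM+ ksi/SUF+ sta/SUF
```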

Existing PT-merging methods
Add-feature (Nakov, 2008; Chen et al., 2009): three extra features
– F1 = 1 if the entry is from the 1st PT, 0.5 otherwise
– F2 = 1 if the entry is from the 2nd PT, 0.5 otherwise
– F3 = 1 if the entry is from both PTs, 0.5 otherwise
→ heuristic-driven
Interpolation (Wu & Wang, 2007):
– $\mathrm{tran}(f|e) = \alpha \, \mathrm{tran}_1(f|e) + (1-\alpha) \, \mathrm{tran}_2(f|e)$
– $\mathrm{lex}(f|e) = \beta \, \mathrm{lex}_1(f|e) + (1-\beta) \, \mathrm{lex}_2(f|e)$
→ does not consider the “meaning” of the scores
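The add-feature scheme can be sketched as a function that assigns the three indicator features to one entry of the merged table. The function name and the two-boolean representation of table membership are assumptions for illustration; "from the 1st PT" is read here as "present in the 1st PT", so an entry found in both tables gets 1 for all three features.

```python
from typing import Tuple


def add_features(in_first: bool, in_second: bool) -> Tuple[float, float, float]:
    """Return (F1, F2, F3) for a phrase pair in the merged table."""
    if not (in_first or in_second):
        raise ValueError("a merged entry must come from at least one table")
    f1 = 1.0 if in_first else 0.5
    f2 = 1.0 if in_second else 0.5
    f3 = 1.0 if (in_first and in_second) else 0.5
    return f1, f2, f3


print(add_features(True, False))  # (1.0, 0.5, 0.5)
print(add_features(True, True))   # (1.0, 1.0, 1.0)
```

The values 1 and 0.5 are fixed by hand rather than estimated from data, which is what makes the scheme heuristic-driven.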

Our merging method – normalizing translation probabilities
MLE in each table:
$\mathrm{tran}_1(f|e) = \mathrm{count}_1(e,f) / \sum_{f'} \mathrm{count}_1(e,f')$
$\mathrm{tran}_2(f|e) = \mathrm{count}_2(e,f) / \sum_{f'} \mathrm{count}_2(e,f')$
PT_wm: problem + s ||| vaikeu + ksi + sta (count 1); problem + s ||| ongelma + sta (count 1)
PT_m: problem + s ||| ongelma + t (count 3); problem + s ||| vaikeu + ksi + sta (count 1)
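Maximum-likelihood estimation from phrase-pair counts can be sketched as normalizing the counts of the Finnish phrases paired with one English phrase. The PT_m counts used below (3 and 1 for problem + s) are the ones from the worked example in this section; the function name `mle` and the dict-of-counts representation are illustrative assumptions.

```python
from typing import Dict


def mle(counts: Dict[str, int]) -> Dict[str, float]:
    """Relative frequencies of the target phrases paired with one source phrase."""
    total = sum(counts.values())
    return {target: c / total for target, c in counts.items()}


# PT_m counts for the source phrase 'problem + s'.
pt_m = {"ongelma + t": 3, "vaikeu + ksi + sta": 1}
print(mle(pt_m))  # {'ongelma + t': 0.75, 'vaikeu + ksi + sta': 0.25}
```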

Our merging method – normalizing translation probabilities
MLE in PT_wm: tran(vaikeuksista | problems) = 1/2 = 0.5; tran(ongelmasta | problems) = 1/2 = 0.5
MLE in PT_m: tran(ongelmat | problems) = 3/4 = 0.75; tran(vaikeuksista | problems) = 1/4 = 0.25
Interpolation (ratio = 0.5):
– tran(vaikeuksista | problems) = (0.5 + 0.25)/2 = 0.375
– tran(ongelmat | problems) = (0 + 0.75)/2 = 0.375
– tran(ongelmasta | problems) = (0.5 + 0)/2 = 0.25
→ Undesired translation! ongelmat, seen 3 times in PT_m, only ties with vaikeuksista, seen once in each table.
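The interpolation numbers on this slide can be reproduced with a short script. It assumes the MLE distributions listed on the slide for PT_wm and PT_m (phrases written in surface form for readability) and that a phrase missing from one table contributes 0 from that table; `interpolate` and `alpha` are illustrative names.

```python
from typing import Dict


def interpolate(p1: Dict[str, float], p2: Dict[str, float], alpha: float) -> Dict[str, float]:
    """alpha * p1 + (1 - alpha) * p2, with missing entries treated as 0."""
    keys = set(p1) | set(p2)
    return {k: alpha * p1.get(k, 0.0) + (1 - alpha) * p2.get(k, 0.0) for k in keys}


pt_wm = {"vaikeuksista": 0.5, "ongelmasta": 0.5}
pt_m = {"ongelmat": 0.75, "vaikeuksista": 0.25}
merged = interpolate(pt_wm, pt_m, alpha=0.5)
print({k: merged[k] for k in sorted(merged)})
# {'ongelmasta': 0.25, 'ongelmat': 0.375, 'vaikeuksista': 0.375}
```

Because each table is normalized before mixing, the two counts behind PT_wm weigh as much as the four behind PT_m.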

Our merging method – normalizing translation probabilities
MLE in each table: $\mathrm{tran}_i(f|e) = \mathrm{count}_i(e,f) / \sum_{f'} \mathrm{count}_i(e,f')$, for $i = 1, 2$
Normalization (pool the counts of both tables before dividing):
$\mathrm{tran}(f|e) = \dfrac{\mathrm{count}_1(e,f) + \mathrm{count}_2(e,f)}{\sum_{f'} \mathrm{count}_1(e,f') + \sum_{f'} \mathrm{count}_2(e,f')}$
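The count-based normalization can be sketched as pooling the counts of both tables before dividing. The counts are those used in the worked example of this section; the function name `normalize_counts` is an illustrative assumption, and `Fraction` keeps the results exact.

```python
from fractions import Fraction
from typing import Dict


def normalize_counts(c1: Dict[str, int], c2: Dict[str, int]) -> Dict[str, Fraction]:
    """[c1(e,f) + c2(e,f)] / [sum_f c1(e,f) + sum_f c2(e,f)] for one source phrase e."""
    total = sum(c1.values()) + sum(c2.values())
    keys = set(c1) | set(c2)
    return {f: Fraction(c1.get(f, 0) + c2.get(f, 0), total) for f in keys}


counts_wm = {"vaikeuksista": 1, "ongelmasta": 1}  # PT_wm counts for 'problems'
counts_m = {"ongelmat": 3, "vaikeuksista": 1}     # PT_m counts for 'problems'
merged = normalize_counts(counts_wm, counts_m)
for f in sorted(merged):
    print(f, merged[f], round(float(merged[f]), 2))
# ongelmasta 1/6 0.17
# ongelmat 1/2 0.5
# vaikeuksista 1/3 0.33
```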

Our merging method – normalizing translation probabilities
MLE in PT_wm: tran(vaikeuksista | problems) = 1/2 = 0.5; tran(ongelmasta | problems) = 1/2 = 0.5
MLE in PT_m: tran(ongelmat | problems) = 3/4 = 0.75; tran(vaikeuksista | problems) = 1/4 = 0.25
Normalization:
– tran(vaikeuksista | problems) = (1 + 1)/(2 + 4) = 0.33
– tran(ongelmat | problems) = (0 + 3)/(2 + 4) = 0.5
– tran(ongelmasta | problems) = (1 + 0)/(2 + 4) = 0.17
→ Desired translation! ongelmat, backed by 3 of the 6 pooled counts, now ranks first.

Experiments – dataset: 2005 ACL shared task (Koehn & Monz, 2005)

Experiments – baselines
– w-system: uses PT_w and translates at the word level
– m-system: uses PT_m and translates at the morpheme level
– m-BLEU: BLEU where each token unit is a morpheme
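To see why m-BLEU can give credit to a partially correct word, the sketch below compares clipped unigram precision (one ingredient of BLEU, not the full metric) at the word level and at the morpheme level. The example pair and the convention of marking non-final morphemes with a trailing "+" are assumptions for illustration.

```python
from collections import Counter
from typing import List


def unigram_precision(hyp: List[str], ref: List[str]) -> float:
    """Clipped unigram precision: matched hypothesis tokens / hypothesis tokens."""
    ref_counts = Counter(ref)
    matched = sum(min(c, ref_counts[tok]) for tok, c in Counter(hyp).items())
    return matched / len(hyp)


# Reference 'ongelmat' vs. hypothesis 'ongelmasta' (same stem, different suffix).
word_p = unigram_precision(["ongelmasta"], ["ongelmat"])
morph_p = unigram_precision("ongelma+ sta".split(), "ongelma+ t".split())
print(word_p, morph_p)  # 0.0 0.5
```

At the word level the wrong suffix makes the whole token a miss; at the morpheme level the correct stem still matches.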

Experiments – our system
Improvements over the m-system and the w-system are statistically significant according to the sign test (Collins et al., 2005).
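A generic exact two-sided sign test can be sketched as below: given how many test items one system wins and loses against another (ties dropped), it returns the binomial p-value under the null hypothesis that wins and losses are equally likely. How wins and losses are counted is left open; this illustrates the test itself, not the exact procedure of Collins et al. (2005), and the counts in the usage line are hypothetical.

```python
from math import comb


def sign_test(wins: int, losses: int) -> float:
    """Exact two-sided sign test p-value (ties are assumed to be discarded)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, not results from the talk.
print(sign_test(9, 1))  # 0.021484375
```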

Conclusion
Our contributions:
– Enrich the translation model without using additional data.
– Propose a principled way to merge phrase tables generated at different granularities.

Q & A
Thank you!!!
