Bridging the Gap: Machine Translation for Lesser Resourced Languages


1 Bridging the Gap: Machine Translation for Lesser Resourced Languages
Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst

2 Inupiaq Katrina Quechua Mapudungun
Inupiaq: 100’s of Speakers
Katrina: 100’s of Speakers
Quechua: 6 Million Speakers
Mapudungun: 900,000 Speakers

3 Machine Translation (MT)
Source Language Target Language

4 Machine Translation (MT)
Direct: Statistical MT, Example Based MT
Source Language, Target Language

5 Machine Translation (MT)
Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation
Direct: Statistical MT, Example Based MT
Source Language, Target Language

6 Machine Translation (MT)
Interlingua: Semantic Analysis, Sentence Planning
Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation
Direct: Statistical MT, Example Based MT
Source Language, Target Language

7 Machine Translation (MT)
Interlingua: Semantic Analysis
Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation
Direct: Statistical MT, Example Based MT
Source Language, Target Language
+ High quality
- Expertise intensive development cycle

8 Machine Translation (MT)
Interlingua: Semantic Analysis
Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation
Direct: Statistical MT, Example Based MT
Source Language, Target Language
+ Short development time
- Requires large bilingual corpus

9 Machine Translation (MT)
Interlingua: Semantic Analysis
Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation
Direct: Statistical MT, Example Based MT
Source Language, Target Language
Our Approach

10 Machine Translation (MT)
Interlingua: Semantic Analysis
Transfer Rule Based MT: Morphological Analysis + Syntactic Parsing, Text Generation
Direct: Statistical MT, Example Based MT
Source Language, Target Language
+ High quality
- Expertise intensive development cycle

11 Machine Translation (MT)
Interlingua: Semantic Analysis
Morphological Analysis + Syntactic Parsing, Text Generation
Source Language, Target Language
+ High quality
- Expertise intensive development cycle
Automate the development of deep-analysis MT

12 Our Position
Linguistic Structure and Bilingual Informants help automate the development of deep-analysis machine translation systems

13 Sub-Problems Morphology Induction Syntax Refinement

14 Morphology Induction 1. Linguistic Structure 2. Bilingual Informants

15 Morphology Induction 1. Linguistic Structure 2. Bilingual Informants

16 Paradigms Organize Morphology
Mapudungun verb suffix slots: Loc, Asp, Hab, Mode, Report, Pol / Mood, Tense, Obj Agr, Subj Agr / Mood
Suffixes shown: pa, tu, pu, ka, Ø, ke, pe, (ü)rke, la, a, fi, ki, fu, Ø, nu, afu, (ü)n, li, chi, yu

17 Paradigm Discovery in 3 Steps
1. Search out partial paradigms in a network of candidates
2. Cluster overlapping partial paradigms
3. Filter the clusters, keeping the largest clusters most likely to model true paradigms

A portion of a Spanish paradigm candidate network (suffix set, number of stems, example stems):
e.er.erá.ido.ieron.ió 28: deb, escog, ofrec, reconoc, vend, ...
e.ido.ieron.ir.irá.ió 28: asist, dirig, exig, ocurr, sufr, ...
e.erá.ido.ieron.ió 28: deb, escog, ...
e.er.ido.ieron.ió 46: deb, parec, recog, ...
e.ido.ieron.irá.ió 28: asist, dirig, ...
e.ido.ieron.ir.ió 39: asist, bat, sal, ...
e.er.erá.ieron.ió 32: deb, padec, romp, ...
e.ido.ieron.ió 86: asist, deb, hund, ...
e.erá.ieron.ió 32: deb, padec, ...
er.ido.ieron.ió 58: ascend, ejerc, recog, ...
ido.ieron.ir.ió 44: interrump, sal, ...
azar.e.ido.ieron.ir.ió 1: sal
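To make the idea of a paradigm candidate concrete, here is a minimal Python sketch that builds candidates of the kind listed in the network: each candidate pairs a set of suffixes with the stems that occur with every suffix in the set. It is not the presenters' algorithm. The function name, the length limit, and the brute-force enumeration of suffix subsets are assumptions made for illustration, and the enumeration is only practical on a toy word list.

```python
from collections import defaultdict
from itertools import combinations


def candidate_schemes(words, max_suffix_len=5, min_stems=2):
    """Map each tuple of co-occurring suffixes to the stems that take all of them."""
    # Treat every split of a word into stem + suffix as a possible analysis.
    suffixes_of = defaultdict(set)
    for word in words:
        for cut in range(1, len(word)):
            stem, suffix = word[:cut], word[cut:]
            if len(suffix) <= max_suffix_len:
                suffixes_of[stem].add(suffix)

    # A stem supports every subset (size >= 2) of the suffixes it was seen with.
    schemes = defaultdict(set)
    for stem, suffixes in suffixes_of.items():
        ordered = sorted(suffixes)
        for size in range(2, len(ordered) + 1):
            for subset in combinations(ordered, size):
                schemes[subset].add(stem)

    # Keep only candidates supported by at least min_stems stems.
    return {s: stems for s, stems in schemes.items() if len(stems) >= min_stems}


words = ["debe", "debido", "debieron", "debió",
         "asiste", "asistido", "asistieron", "asistió"]
schemes = candidate_schemes(words)
print(sorted(schemes[("e", "ido", "ieron", "ió")]))
```

Joining a key with "." gives the notation used in the network, for example e.ido.ieron.ió. The same toy input also yields the competing candidate do.eron.ó with stems debi and asisti, which is why the later clustering and filtering steps are needed.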

18 Morpho Challenge 2007 Unsupervised Morphology Induction Competition
English: 3rd Place Overall; Bested the Strong Baseline Morfessor (Creutz, 2006)
German: 1st Place when Combined with Morfessor

19 Morpho Challenge 2007 Unsupervised Morphology Induction Competition
English: 3rd Place Overall; Bested the Strong Baseline Morfessor (Creutz, 2006)
German: 1st Place when Combined with Morfessor
No Mapudungun yet
Agglutinative sequences of suffixes coming soon

20 Our Machine Translation Architecture
INPUT TEXT
Finish feedback loop: propagate corrections to the underlying representations that produce translations.
Given an arbitrarily small set of linguistic resources, for example a small grammar and a small lexicon, if we add a rule refinement (RR) component at the end of our translation process, we can use bilingual speaker feedback to augment and improve the initial resources (the grammar, G, and the lexicon, L). The approach I am proposing can be generalized to any rule-based system. We chose to implement our work on this system, developed at CMU.

21 Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Morphology Analysis

22 Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System

23 Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System Morphology Generation Lexicon Morphology Generation

24 Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System Morphology Generation Lexicon Morphology Generation OUTPUT TEXT

25 Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System Morphology Generation Lexicon Morphology Generation OUTPUT TEXT

26 Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System Morphology Generation Lexicon Morphology Generation OUTPUT TEXT

27 Sub-Problems Morphology Induction Syntax Refinement

28 Syntax Refinement 1. Linguistic Structure 2. Bilingual Informants

29 Syntax Refinement 1. Linguistic Structure 2. Bilingual Informants

30 Linguistic Structure: Syntax
English: I didn’t see Maria
Mapudungun: pelafiñ Maria
Spanish: No vi a María

31 Linguistic Structure: Syntax
English: I didn’t see Maria
Mapudungun: pelafiñ Maria
  pe -la -fi -ñ Maria
  see -neg -3.obj -1.subj.indicative Maria
Spanish: No vi a María
  No vi a María
  neg see.1.subj.past.indicative acc Maria

32 pe-la-fi-ñ Maria
V: pe

33 pe-la-fi-ñ Maria
V: pe
VSuff: la (negation = +)

34 pe-la-fi-ñ Maria
V: pe
VSuffG -> VSuff (la): pass all features up

35 pe-la-fi-ñ Maria
V: pe
VSuffG -> VSuff (la)
VSuff: fi (object person = 3)

36 pe-la-fi-ñ Maria
V: pe
VSuffG -> VSuffG VSuff (fi): pass all features up from both children

37 pe-la-fi-ñ Maria
V: pe
VSuffG
VSuff: ñ (person = 1, number = sg, mood = ind)

38 pe-la-fi-ñ Maria
V: pe
VSuffG -> VSuffG VSuff (ñ): pass all features up from both children

39 pe-la-fi-ñ Maria
V -> V (pe) VSuffG: pass all features up from both children
Check that:
1) negation = +
2) tense is undefined

40 pe-la-fi-ñ Maria
NP -> N (Maria): person = 3, number = sg, human = +

41 pe-la-fi-ñ Maria
S -> VP -> V NP: check that NP is human = +; pass features up from V
[S [VP [V [V pe] [VSuffG [VSuffG [VSuffG [VSuff la]] [VSuff fi]] [VSuff ñ]]] [NP [N Maria]]]]
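The bottom-up analysis on slides 32 to 41 can be sketched as feature unification over a segmented verb. The sketch below is illustrative only: the dictionaries, function names, and flat feature representation are assumptions, while the feature values and the two checks (negation = +, tense undefined; object NP human = +) are the ones shown on the slides.

```python
# Hypothetical toy lexicon; the feature values are the ones shown on the slides.
ROOTS = {"pe": {"root": "pe"}}
SUFFIXES = {
    "la": {"negation": "+"},
    "fi": {"object_person": 3},
    "ñ": {"person": 1, "number": "sg", "mood": "ind"},
}
NOUNS = {"Maria": {"root": "Maria", "person": 3, "number": "sg", "human": "+"}}


def unify(left, right):
    """Merge two flat feature structures, failing on conflicting values."""
    merged = dict(left)
    for key, value in right.items():
        if merged.get(key, value) != value:
            raise ValueError(f"feature clash on {key!r}")
        merged[key] = value
    return merged


def analyze_verb(segmented):
    """Pass features up from the root and each suffix, e.g. 'pe-la-fi-ñ'."""
    root, *suffixes = segmented.split("-")
    features = dict(ROOTS[root])
    for suffix in suffixes:  # innermost suffix first: la, then fi, then ñ
        features = unify(features, SUFFIXES[suffix])
    return features


def negated_verb_checks(verb):
    """The two V-level checks: negation = + and tense is undefined."""
    return verb.get("negation") == "+" and "tense" not in verb


def analyze_sentence(segmented_verb, noun):
    """Combine verb and object NP, applying the checks from the slides."""
    verb = analyze_verb(segmented_verb)
    obj = NOUNS[noun]
    if not negated_verb_checks(verb):
        raise ValueError("V-level checks failed")
    if obj.get("human") != "+":
        raise ValueError("object NP must be human = +")
    return dict(verb, object=obj)


sentence = analyze_sentence("pe-la-fi-ñ", "Maria")
print(sentence)
```

A real grammar would use nested feature structures and many competing rules; the flat dictionaries here only show how features accumulate as the tree is built upward.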

42 Transfer to Spanish: Top-Down
VP::VP (Mapudungun VP paired with Spanish VP)

43 Transfer to Spanish: Top-Down
S::S: pass all features to Spanish side

44 Transfer to Spanish: Top-Down
Pass all features down
VP::VP [V NP] -> [V “a” NP]

45 Transfer to Spanish: Top-Down
Pass object features down
VP::VP [V NP] -> [V “a” NP]

46 Transfer to Spanish: Top-Down
Accusative marker on objects is introduced because human = +
VP::VP [V NP] -> [V “a” NP]

47 Transfer to Spanish: Top-Down
VP::VP [VBar NP] -> [VBar "a" NP]
(
  (X1::Y1)
  (X2::Y3)
  ((X2 type) = (*NOT* personal))
  ((X2 human) =c +)
  (X0 = X1)
  ((X0 object) = X2)
  (Y0 = X0)
  ((Y0 object) = (X0 object))
  (Y1 = Y0)
  (Y3 = (Y0 object))
  ((Y1 objmarker person) = (Y3 person))
  ((Y1 objmarker number) = (Y3 number))
  ((Y1 objmarker gender) = (Y3 gender))
)
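One way to read the rule above is as a function from the Mapudungun constituents (X1 the verb, X2 the object NP) to the Spanish constituents (Y1, the inserted "a", Y3). The Python sketch below follows that reading. It assumes that `=c` requires the feature to be present already and that `(*NOT* personal)` excludes objects whose type is personal; both are interpretations of the notation, not statements about the actual transfer engine, and the function name and dictionary layout are hypothetical.

```python
def transfer_vp(verb, obj):
    """Toy reading of VP::VP [VBar NP] -> [VBar "a" NP]."""
    # ((X2 type) = (*NOT* personal)) and ((X2 human) =c +)
    if obj.get("type") == "personal" or obj.get("human") != "+":
        return None  # the rule does not apply

    # (X0 = X1), ((X0 object) = X2), (Y0 = X0), ((Y0 object) = (X0 object))
    y0 = dict(verb, object=dict(obj))

    # (Y1 = Y0): the Spanish verb receives the features of the whole VP.
    y1 = dict(y0)
    # (Y3 = (Y0 object)): the Spanish NP receives the object features.
    y3 = y0["object"]
    # ((Y1 objmarker F) = (Y3 F)) for F in person, number, gender.
    y1["objmarker"] = {f: y3[f] for f in ("person", "number", "gender") if f in y3}

    return [y1, "a", y3]


verb = {"root": "pe", "negation": "+", "person": 1, "number": "sg", "mood": "ind"}
maria = {"root": "Maria", "person": 3, "number": "sg", "human": "+"}
spanish_vp = transfer_vp(verb, maria)
print(spanish_vp[1], spanish_vp[0]["objmarker"])
```

Returning None when the constraints fail mirrors the idea that another rule, one without the accusative marker, would handle non-human objects.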

48 Transfer to Spanish: Top-Down
Pass person, number, and mood features to Spanish Verb
Assign tense = past
Spanish V -> “no” V

49 Transfer to Spanish: Top-Down
“no” is introduced because negation = +
Spanish V -> “no” V

50 Transfer to Spanish: Top-Down
pe -> ver

51 Transfer to Spanish: Top-Down
ver -> vi: person = 1, number = sg, mood = indicative, tense = past

52 Transfer to Spanish: Top-Down
Pass features over to Spanish side
N::N: Maria -> María

53 I didn’t see Maria
Mapudungun: [S [VP [V [V pe] [VSuffG [VSuffG [VSuffG [VSuff la]] [VSuff fi]] [VSuff ñ]]] [NP [N Maria]]]]
Spanish: [S [VP [V “no” [V vi]] “a” [NP [N María]]]]
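The last transfer steps (introduce “no” for negation, translate pe as ver, inflect ver as vi, introduce “a” for a human object) can be sketched as a small generation function. The lookup tables below are hypothetical and cover only the forms that appear on the slides. Defaulting an undefined tense to past follows the "Assign tense = past" step, though the slides do not say exactly how that decision is encoded.

```python
# Hypothetical toy resources covering only the forms that appear on the slides.
LEXICON = {"pe": "ver", "Maria": "María"}
SPANISH_FORMS = {("ver", 1, "sg", "indicative", "past"): "vi"}


def generate_spanish(verb, obj):
    """Produce the Spanish sentence from transferred verb and object features."""
    feats = dict(verb)
    feats.setdefault("tense", "past")  # tense was undefined on the source side
    lemma = LEXICON[feats["root"]]
    key = (lemma, feats["person"], feats["number"], feats["mood"], feats["tense"])

    words = []
    if feats.get("negation") == "+":
        words.append("no")          # introduced because negation = +
    words.append(SPANISH_FORMS[key])
    if obj.get("human") == "+":
        words.append("a")           # accusative marker for human objects
    words.append(LEXICON[obj["root"]])

    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:]


verb = {"root": "pe", "negation": "+", "person": 1, "number": "sg", "mood": "indicative"}
obj = {"root": "Maria", "person": 3, "number": "sg", "human": "+"}
print(generate_spanish(verb, obj))  # No vi a María
```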

54 Syntax Refinement 1. Linguistic Structure 2. Bilingual Informants

55 Syntax Refinement Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Run-Time MT System Morphology Generation Lexicon Morphology Generation OUTPUT TEXT

56 Syntax Refinement Architecture
INPUT TEXT Rule Refinement Grammar & Lexicon Morphology Analysis Online Translation Correction Tool Run-Time MT System Morphology Generation OUTPUT TEXT

57 Syntax Refinement Architecture
INPUT TEXT Rule Refinement Grammar & Lexicon Morphology Analysis Online Translation Correction Tool Run-Time MT System

58 Syntax Refinement Architecture
INPUT TEXT Rule Refinement Grammar & Lexicon Morphology Analysis Online Translation Correction Tool Run-Time MT System Morphology Generation OUTPUT TEXT

59 Children played a game
Translation Correction Tool (TCTool): online GUI to elicit correction of MT output from non-expert bilingual speakers

62 The children played a game

63 Refining the Grammar
(Parse tree of “niños jugaron un juego”, with nodes S, NP, VP, PolP, V, Det, and N.)

64 Refining the Grammar
(The same parse tree with “los” added: “los niños jugaron un juego”.)

65 Refining the Grammar
(The same parse tree with “los” added: “los niños jugaron un juego”.)
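A first step toward turning a speaker's correction into a grammar refinement is to find which words the correction changed. The sketch below does this with Python's standard `difflib`, comparing the MT output with the corrected sentence from the slides. It is an assumption about how such a comparison could be made, not a description of how the TCTool or the rule refinement module records corrections, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def correction_edits(mt_output, corrected):
    """List word-level edits that turn the MT output into the corrected sentence."""
    mt_words, fixed_words = mt_output.split(), corrected.split()
    matcher = SequenceMatcher(a=mt_words, b=fixed_words, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # (operation, position in MT output, removed words, added words)
            edits.append((tag, i1, mt_words[i1:i2], fixed_words[j1:j2]))
    return edits


edits = correction_edits("niños jugaron un juego", "los niños jugaron un juego")
print(edits)  # [('insert', 0, [], ['los'])]
```

An inserted determiner at the start of the subject NP is the kind of evidence that could motivate expanding the NP rule, as in the grammar refinement shown on these slides.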

66 Syntax Refinement Summary
Increases translation quality on unseen data: English-Spanish experiments (Font Llitjós et al., 2007, MT Summit)
Generalizes to a Mapudungun-Spanish machine translation system
Today I’ve shown you an example of grammar expansion, but the ARR can also automatically augment the lexicon (see paper).

67 Overall Summary
Linguistic Structure and Bilingual Informants help automate the development of deep-analysis machine translation systems:
Morphology Induction
Syntax Refinement

68 Thank You!

