Published by Alexis Barton. Modified over 8 years ago.
1
Bridging the Gap: Machine Translation for Lesser Resourced Languages
Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst
2
Inupiaq Katrina Quechua Mapudungun
100’s of Speakers Katrina 100’s of Speakers Quechua 6 Million Speakers Mapudungun 900,000 Speakers
3
Machine Translation (MT)
Source Language Target Language
4
Machine Translation (MT)
Source Language Target Language Direct Statistical MT Example Based MT
5
Machine Translation (MT)
Transfer Rule Based MT Morphological Analysis Syntactic Parsing Text Generation + Source Language Target Language Direct Statistical MT Example Based MT
6
Machine Translation (MT)
Interlingua Semantic Analysis Sentence Planning Transfer Rule Based MT Morphological Analysis Syntactic Parsing Text Generation + Source Language Target Language Direct Statistical MT Example Based MT
7
Machine Translation (MT)
Interlingua + High quality - Expertise intensive development cycle Semantic Analysis Transfer Rule Based MT Morphological Analysis Syntactic Parsing Text Generation + Source Language Target Language Direct Statistical MT Example Based MT
8
Machine Translation (MT)
Interlingua + Short development time - Requires large bilingual corpus Semantic Analysis Transfer Rule Based MT Morphological Analysis Syntactic Parsing Text Generation + Source Language Target Language Direct Statistical MT Example Based MT
9
Machine Translation (MT)
Interlingua Semantic Analysis Our Approach Transfer Rule Based MT Morphological Analysis Syntactic Parsing Text Generation + Source Language Target Language Direct Statistical MT Example Based MT
10
Machine Translation (MT)
Interlingua + High quality - Expertise intensive development cycle Semantic Analysis Transfer Rule Based MT Morphological Analysis Syntactic Parsing Text Generation + Source Language Target Language Direct Statistical MT Example Based MT
11
Machine Translation (MT)
Interlingua + High quality - Expertise intensive development cycle Semantic Analysis Morphological Analysis Syntactic Parsing Text Generation + Automate the development of deep-analysis MT Source Language Target Language
12
Our Position
Linguistic Structure and Bilingual Informants help automate the development of deep-analysis machine translation systems
13
Sub-Problems Morphology Induction Syntax Refinement
14
Morphology Induction 1. Linguistic Structure 2. Bilingual Informants
15
Morphology Induction 1. Linguistic Structure 2. Bilingual Informants
16
Paradigms Organize Morphology
Mapudungun Loc Asp pa tu pu ka Ø Hab Mode Report Pol / Mood Tense Obj Agr ke pe (ü)rke la a fi ki fu Ø nu afu Subj Agr / Mood (ü)n li chi yu …
17
Paradigm Discovery in 3 Steps
1. Search out partial paradigms in a network of candidates
2. Cluster overlapping partial paradigms
3. Filter the clusters, keeping the largest clusters most likely to model true paradigms
A portion of a Spanish paradigm candidate network:
e.er.erá.ido.ieron.ió 28: deb, escog, ofrec, roconoc, vend, ...
e.ido.ieron.ir.irá.ió 28: asist, dirig, exig, ocurr, sufr, ...
e.erá.ido.ieron.ió 28: deb, escog, ...
e.er.ido.ieron.ió 46: deb, parec, recog...
e.ido.ieron.irá.ió 28: asist, dirig, ...
e.ido.ieron.ir.ió 39: asist, bat, sal, ...
e.er.erá.ieron.ió 32: deb, padec, romp, ...
e.ido.ieron.ió 86: asist, deb, hund,...
e.erá.ieron.ió 32: deb, padec, ...
er.ido.ieron.ió 58: ascend, ejerc, recog, ...
ido.ieron.ir.ió 44: interrump, sal, ...
azar.e.ido.ieron.ir.ió 1: sal
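The entries in the candidate network above pair a set of suffixes with the stems that occur with every suffix in the set. The following is a minimal sketch of how such candidates could be computed from a word list; it is an illustration only, not the system presented in the talk. The function names, the `max_suffix_len` cutoff, and the toy word list are assumptions.

```python
from collections import defaultdict

def stem_suffix_table(words, max_suffix_len=5):
    """Map every possible stem to the set of suffixes it is attested with."""
    table = defaultdict(set)
    for word in set(words):
        # Try every split point; the suffix is never empty in this sketch.
        for cut in range(1, len(word)):
            if len(word) - cut <= max_suffix_len:
                table[word[:cut]].add(word[cut:])
    return table

def stems_for(table, suffixes):
    """Return the stems attested with every suffix in the candidate set."""
    needed = set(suffixes)
    return sorted(stem for stem, seen in table.items() if needed <= seen)

# Toy data (assumed), built from stems that appear on the slide.
WORDS = [
    "debe", "deber", "debido", "debieron", "debió",
    "asiste", "asistir", "asistido", "asistieron", "asistió",
    "sale", "salir", "salido", "salieron", "salió",
]

if __name__ == "__main__":
    table = stem_suffix_table(WORDS)
    for candidate in (["e", "ido", "ieron", "ió"],
                      ["e", "er", "ido", "ieron", "ió"],
                      ["e", "ido", "ieron", "ir", "ió"]):
        print(".".join(candidate), stems_for(table, candidate))
```

As in the network on the slide, a smaller suffix set is compatible with more stems than any of its supersets, which is why overlapping partial paradigms have to be clustered and filtered afterwards.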
18
Morpho Challenge 2007 Unsupervised Morphology Induction Competition
English: 3rd place overall; bested the strong baseline Morfessor (Creutz, 2006)
German: 1st place when combined with Morfessor
19
Morpho Challenge 2007 Unsupervised Morphology Induction Competition
English: 3rd place overall; bested the strong baseline Morfessor (Creutz, 2006)
German: 1st place when combined with Morfessor
No Mapudungun yet: agglutinative sequences of suffixes coming soon
20
Our Machine Translation Architecture
INPUT TEXT
Finish feedback loop. Given an arbitrarily small set of linguistic resources, for example a small grammar and a small lexicon, if we add a rule refinement (RR) component at the end of our translation process, we can use bilingual speaker feedback to augment and improve the initial resources (grammar and lexicon). The approach I am proposing can be generalized to any rule-based system. We chose to implement our work on this system developed at CMU. Propagate corrections to the underlying representations that produce translations.
21
Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Morphology Analysis
22
Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System
23
Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System Morphology Generation Lexicon Morphology Generation
24
Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System Morphology Generation Lexicon Morphology Generation OUTPUT TEXT
25
Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System Morphology Generation Lexicon Morphology Generation OUTPUT TEXT
26
Our Machine Translation Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Machine Translation System Morphology Generation Lexicon Morphology Generation OUTPUT TEXT
27
Sub-Problems Morphology Induction Syntax Refinement
28
Syntax Refinement 1. Linguistic Structure 2. Bilingual Informants
29
Syntax Refinement 1. Linguistic Structure 2. Bilingual Informants
30
Linguistic Structure: Syntax
English: I didn’t see Maria
Mapudungun: pelafiñ Maria
Spanish: No vi a María
31
Linguistic Structure: Syntax
English: I didn’t see Maria
Mapudungun: pelafiñ Maria
  pe -la -fi -ñ Maria
  see -neg -3.obj -1.subj.indicative Maria
Spanish: No vi a María
  No vi a María
  neg see.1.subj.past.indicative acc Maria
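The morpheme-by-morpheme gloss above can be reproduced with a small lookup-based segmenter. This is a sketch under stated assumptions: the two dictionaries contain only the morphemes from this one example, the function names are hypothetical, and the ordering constraints between suffix slots are ignored.

```python
# Glosses copied from the slide; the dictionaries are illustrative, not a real lexicon.
ROOT_GLOSS = {"pe": "see"}
SUFFIX_GLOSS = {"la": "neg", "fi": "3.obj", "ñ": "1.subj.indicative"}

def split_suffixes(rest, suffixes):
    """Split `rest` into known suffixes, or return None if that is impossible."""
    if not rest:
        return []
    for suffix in suffixes:
        if rest.startswith(suffix):
            tail = split_suffixes(rest[len(suffix):], suffixes)
            if tail is not None:
                return [suffix] + tail
    return None

def segment(word, roots=ROOT_GLOSS, suffixes=SUFFIX_GLOSS):
    """Return [root, suffix, ...] for a word, or None if no analysis is found."""
    for root in roots:
        if word.startswith(root):
            parts = split_suffixes(word[len(root):], suffixes)
            if parts is not None:
                return [root] + parts
    return None

def gloss(morphemes):
    """Gloss the root and prefix each suffix gloss with a hyphen, as on the slide."""
    root, *suffixes = morphemes
    return [ROOT_GLOSS[root]] + ["-" + SUFFIX_GLOSS[s] for s in suffixes]

if __name__ == "__main__":
    morphemes = segment("pelafiñ")
    print(morphemes)
    print(" ".join(gloss(morphemes)))
```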
32
pe-la-fi-ñ Maria V pe
33
pe-la-fi-ñ Maria V pe VSuff Negation = + la
34
pe-la-fi-ñ Maria V pe VSuffG Pass all features up VSuff la
35
pe-la-fi-ñ Maria V pe VSuffG VSuff object person = 3 VSuff fi la
36
pe-la-fi-ñ Maria V pe VSuffG Pass all features up from both children
37
pe-la-fi-ñ Maria V pe VSuffG VSuff person = 1 number = sg mood = ind
38
pe-la-fi-ñ Maria V VSuffG pe VSuffG VSuff
Pass all features up from both children VSuffG VSuff ñ VSuff fi la
39
pe-la-fi-ñ Maria
V: Pass all features up from both children. Check that: 1) negation = + 2) tense is undefined
[V [V pe] [VSuffG [VSuffG [VSuffG [VSuff la]] [VSuff fi]] [VSuff ñ]]]
40
pe-la-fi-ñ Maria V NP V VSuffG N person = 3 number = sg human = + pe
41
pe-la-fi-ñ Maria S Check that NP is human = + Pass features up from V
VP V NP V VSuffG N pe VSuffG VSuff N VSuffG VSuff ñ Maria VSuff fi la
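The bottom-up analysis walked through on the preceding slides (each suffix contributes features, parents collect the features of their children, and rules check constraints) can be sketched with plain dictionaries. The feature values come from the slides; the function names, the dictionary representation, and the decision to return `None` when a check fails are assumptions made for illustration.

```python
# Features contributed by each suffix, as annotated on the slides.
SUFFIX_FEATURES = {
    "la": {"negation": "+"},
    "fi": {"object_person": 3},
    "ñ": {"person": 1, "number": "sg", "mood": "ind"},
}
# Features of the object noun, as annotated on the slides.
MARIA = {"person": 3, "number": "sg", "human": "+"}

def unify(*feature_sets):
    """Merge feature dictionaries; conflicting values raise ValueError."""
    merged = {}
    for features in feature_sets:
        for name, value in features.items():
            if name in merged and merged[name] != value:
                raise ValueError(f"conflicting values for {name!r}")
            merged[name] = value
    return merged

def build_verb(suffixes):
    """Pass all suffix features up, then apply the checks from the V rule."""
    features = unify(*(SUFFIX_FEATURES[s] for s in suffixes))
    if features.get("negation") != "+" or "tense" in features:
        return None  # this particular rule does not apply
    return features

def build_clause(verb_features, np_features):
    """Check that the NP is human = + and pass the verb features up."""
    if np_features.get("human") != "+":
        return None
    return dict(verb_features, object=np_features)

if __name__ == "__main__":
    verb = build_verb(["la", "fi", "ñ"])
    print(build_clause(verb, MARIA))
```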
42
Transfer to Spanish: Top-Down
VP VP V NP V VSuffG N pe VSuffG VSuff N VSuffG VSuff ñ Maria VSuff fi la
43
Transfer to Spanish: Top-Down
Pass all features to Spanish side S S VP VP V NP V “a” NP V VSuffG N pe VSuffG VSuff N VSuffG VSuff ñ Maria VSuff fi la
44
Transfer to Spanish: Top-Down
Pass all features down VP VP V NP V “a” NP V VSuffG N pe VSuffG VSuff N VSuffG VSuff ñ Maria VSuff fi la
45
Transfer to Spanish: Top-Down
Pass object features down VP VP V NP V “a” NP V VSuffG N pe VSuffG VSuff N VSuffG VSuff ñ Maria VSuff fi la
46
Transfer to Spanish: Top-Down
VP VP V NP V “a” NP V VSuffG N Accusative marker on objects is introduced because human = + pe VSuffG VSuff N VSuffG VSuff ñ Maria VSuff fi la
47
Transfer to Spanish: Top-Down
VP::VP [VBar NP] -> [VBar "a" NP]
(
  (X1::Y1)
  (X2::Y3)
  ((X2 type) = (*NOT* personal))
  ((X2 human) =c +)
  (X0 = X1)
  ((X0 object) = X2)
  (Y0 = X0)
  ((Y0 object) = (X0 object))
  (Y1 = Y0)
  (Y3 = (Y0 object))
  ((Y1 objmarker person) = (Y3 person))
  ((Y1 objmarker number) = (Y3 number))
  ((Y1 objmarker gender) = (Y3 gender))
)
VP VP V NP V “a” NP V VSuffG N pe VSuffG VSuff N VSuffG VSuff ñ Maria VSuff fi la
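To make the constraints of the transfer rule above easier to follow, here is a rough Python paraphrase. It is not the rule formalism's real semantics: it assumes that `=c` means the feature must already be present with that value, that `(*NOT* personal)` excludes objects whose type is `personal`, and that feature structures can be modeled as dictionaries. The function name is hypothetical.

```python
def transfer_vp(vbar, np):
    """Source [VBar NP] -> target [VBar "a" NP] when the constraints hold."""
    if np.get("type") == "personal":      # ((X2 type) = (*NOT* personal))
        return None
    if np.get("human") != "+":            # ((X2 human) =c +)
        return None
    x0 = dict(vbar, object=np)            # (X0 = X1), ((X0 object) = X2)
    y0 = dict(x0)                         # (Y0 = X0), object carried along
    y1 = dict(y0)                         # (Y1 = Y0)
    y3 = dict(y0["object"])               # (Y3 = (Y0 object))
    # ((Y1 objmarker F) = (Y3 F)) for person, number and gender, when present.
    y1["objmarker"] = {
        name: y3[name] for name in ("person", "number", "gender") if name in y3
    }
    return [y1, "a", y3]

if __name__ == "__main__":
    verb = {"lemma": "pe", "negation": "+"}
    maria = {"lemma": "Maria", "person": 3, "number": "sg", "human": "+"}
    print(transfer_vp(verb, maria))
```

In this reading the accusative marker "a" appears only because the object is human, which matches the annotation on the previous slide.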
48
Transfer to Spanish: Top-Down
Pass person, number, and mood features to Spanish Verb VP VP V NP V “a” NP Assign tense = past V VSuffG N “no” V pe VSuffG VSuff N VSuffG VSuff ñ Maria VSuff fi la
49
Transfer to Spanish: Top-Down
VP VP V NP V “a” NP V VSuffG N “no” V pe VSuffG VSuff N VSuffG VSuff ñ Maria Introduced because negation = + VSuff fi la
50
Transfer to Spanish: Top-Down
VP VP V NP V “a” NP V VSuffG N “no” V pe VSuffG VSuff N ver VSuffG VSuff ñ Maria VSuff fi la
51
Transfer to Spanish: Top-Down
VP VP V NP V “a” NP V VSuffG N “no” V pe VSuffG VSuff N ver vi VSuffG VSuff ñ Maria person = 1 number = sg mood = indicative tense = past VSuff fi la
52
Transfer to Spanish: Top-Down
Pass features over to Spanish side VP VP V NP V “a” NP V VSuffG N “no” V N pe VSuffG VSuff N vi N VSuffG VSuff ñ Maria María VSuff fi la
53
I didn’t see Maria S S VP VP V NP V “a” NP V VSuffG N “no” V N pe
vi N VSuffG VSuff ñ Maria María VSuff fi la
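The top-down transfer steps on the preceding slides end in the Spanish sentence "No vi a María". The sketch below assembles that output from the analyzed features. The two lookup tables contain only the entries needed for this example and are assumptions, as are the function names. Treating an undefined tense as past in every case is my simplification of the "Assign tense = past" step.

```python
# Hypothetical bilingual lexicon and Spanish form table for this one example.
LEXICON = {"pe": "ver", "Maria": "María"}
SPANISH_FORMS = {("ver", 1, "sg", "indicative", "past"): "vi"}

def generate_verb(lemma, features):
    """Look up the inflected Spanish form for a lemma and its features."""
    key = (lemma, features["person"], features["number"],
           features["mood"], features["tense"])
    return SPANISH_FORMS[key]

def transfer_clause(verb_root, verb_features, object_word, object_features):
    """Build the Spanish word sequence for a verb plus object clause."""
    features = dict(verb_features)
    features.setdefault("tense", "past")       # assumed default: undefined -> past
    words = []
    if features.get("negation") == "+":
        words.append("no")                     # introduced because negation = +
    words.append(generate_verb(LEXICON[verb_root], features))
    if object_features.get("human") == "+":
        words.append("a")                      # accusative marker for human objects
    words.append(LEXICON[object_word])
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:]

if __name__ == "__main__":
    verb_features = {"negation": "+", "person": 1, "number": "sg", "mood": "indicative"}
    print(transfer_clause("pe", verb_features, "Maria", {"human": "+"}))
```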
54
Syntax Refinement 1. Linguistic Structure 2. Bilingual Informants
55
Syntax Refinement Architecture
INPUT TEXT Morphology Analysis Lexicon Grammar & Lexicon Morphology Analysis Run-Time MT System Morphology Generation Lexicon Morphology Generation OUTPUT TEXT
56
Syntax Refinement Architecture
INPUT TEXT Rule Refinement Grammar & Lexicon Morphology Analysis Online Translation Correction Tool Run-Time MT System Morphology Generation OUTPUT TEXT
57
Syntax Refinement Architecture
INPUT TEXT Rule Refinement Grammar & Lexicon Morphology Analysis Online Translation Correction Tool Run-Time MT System
58
Syntax Refinement Architecture
INPUT TEXT Rule Refinement Grammar & Lexicon Morphology Analysis Online Translation Correction Tool Run-Time MT System Morphology Generation OUTPUT TEXT
59
Children played a game
Translation Correction Tool (TCTool): online GUI to elicit corrections of MT output from non-expert bilingual speakers
62
The children played a game
63
Refining the Grammar S NP VP N VP N PolP NP niños V Det N V un N
jugaron juego
64
Refining the Grammar los S NP VP N VP N PolP NP niños V Det N V un N
jugaron juego
65
Refining the Grammar los S NP VP N VP N PolP NP niños V Det N V un N
jugaron juego
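One way to picture the first step of grammar refinement is to compare the MT output with the informant's corrected sentence and classify what changed. The sketch below does this with Python's standard `difflib`. It assumes the MT output is the leaf sequence of the tree above ("niños jugaron un juego") and that the correction adds "los". The part-of-speech table and function names are hypothetical, and the sketch stops short of modifying any grammar rule.

```python
from difflib import SequenceMatcher

# Hypothetical part-of-speech lookup for the words in this example.
POS = {"los": "Det", "un": "Det", "niños": "N", "juego": "N", "jugaron": "V"}

def correction_ops(mt_output, corrected):
    """List the word-level edits that turn the MT output into the correction."""
    source, target = mt_output.split(), corrected.split()
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append((tag, source[i1:i2], target[j1:j2], i1))
    return ops

def describe(ops):
    """Attach a part of speech to each inserted word, where one is known."""
    described = []
    for tag, removed, added, position in ops:
        if tag == "insert":
            for word in added:
                described.append(("insert", POS.get(word), word, position))
    return described

if __name__ == "__main__":
    ops = correction_ops("niños jugaron un juego", "los niños jugaron un juego")
    print(ops)
    print(describe(ops))
```

An edit such as "insert a determiner before the subject noun" is the kind of signal a rule refiner could map back onto the grammar rule that produced the noun phrase.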
66
Syntax Refinement Summary
Increases translation quality on unseen data: English-Spanish experiments (Font Llitjós et al., 2007, MT Summit).
Generalizes to a Mapudungun-Spanish machine translation system.
Today I’ve shown you an example of grammar expansion, but the ARR can also automatically augment the lexicon (see paper).
67
Overall Summary
Linguistic Structure and Bilingual Informants help automate the development of deep-analysis machine translation systems: Morphology Induction, Syntax Refinement
68
Thank You!