Developing affordable technologies for resource-poor languages
Ariadna Font Llitjós
Language Technologies Institute, Carnegie Mellon University
September 22, 2004
October 11, 2002, AMTA 2002
[World map: each dot = one language]
Motivation
Resource-poor scenarios:
- Indigenous communities have difficult access to crucial information that directly affects their lives (such as land laws, health warnings, etc.)
- Formalizing a potentially endangered language
Affordable technologies, such as:
- spell-checkers
- on-line dictionaries
- Machine Translation (MT) systems
- computer-assisted tutoring
AVENUE Partners

Language                 | Country     | Institutions
Mapudungun (in place)    | Chile       | Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (started)        | Peru        | Ministry of Education
Iñupiaq (discussion)     | US (Alaska) | Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center
Siona (discussion)       | Colombia    | OAS-CICAD, Plante, Department of the Interior
Mapudungun for the Mapuche
Chile
- Official language: Spanish
- Population: ~15 million, including ~1/2 million Mapuche people
- Their language: Mapudungun
What's Machine Translation (MT)?
[Diagram: a Japanese sentence is automatically translated into a Swahili sentence]
Speech-to-Speech MT
Why Machine Translation for resource-poor (indigenous) languages?
Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers).
Benefits include:
- Better government access to indigenous communities (epidemics, crop failures, etc.)
- Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
- Language preservation
- Civilian and military applications (disaster relief)
MT for resource-poor languages: Challenges
- Minimal amount of parallel text (oral tradition)
- Possibly competing standards for orthography/spelling
- Often relatively few trained linguists
- Access to native informants is possible
- Need to minimize development time and cost
Machine Translation Pyramid
[Diagram: the MT pyramid. Corpus-based methods sit at the base (direct translation, e.g. "I saw you" → "Yo vi tú"), transfer rules in the middle, and interlingua-based interpretation at the apex; analysis climbs the source side and generation descends the target side.]
AVENUE MT system overview

Example training sentence:
  \spa Una mujer se quedó en casa
  \map Kiñe domo mülewey ruka mew
  \eng One woman stayed at home.

Example transfer rules:

  {VP,3}
  VP::VP : [VP NP] -> [VP NP]
  (
    (X1::Y1) (X2::Y2)
    ((x2 case) = acc)
    ((x0 obj) = x2)
    ((x0 agr) = (x1 agr))
    (y2 == (y0 obj))
    ((y0 tense) = (x0 tense))
    ((y0 agr) = (y1 agr))
  )

  V::V |: [stayed] -> [quedó]
  (
    (X1::Y1)
    ((x0 form) = stay)
    ((x0 actform) = stayed)
    ((x0 tense) = past-pp)
    ((y0 agr pers) = 3)
    ((y0 agr num) = sg)
  )
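To make the rule notation above concrete, here is a minimal sketch of how the lexical V::V rule behaves: source-side (x*) constraints act as tests on the input feature structure, and target-side (y*) constraints assert features on the output. The dict representation and function name are illustrative only, not the actual AVENUE implementation.

```python
def apply_v_rule(x1):
    """Toy version of the V::V rule above: 'stayed' -> 'quedó'.

    Source-side constraints ((x0 form) = stay, (x0 tense) = past-pp)
    are checked as tests; target-side constraints ((y0 agr pers) = 3,
    (y0 agr num) = sg) are asserted on the output.
    """
    if x1.get("form") != "stay" or x1.get("tense") != "past-pp":
        return None  # rule does not apply
    return {"surface": "quedó", "agr": {"pers": 3, "num": "sg"}}

x1 = {"form": "stay", "actform": "stayed", "tense": "past-pp"}
y0 = apply_v_rule(x1)
print(y0)
```

The VP rule works the same way one level up, additionally copying agreement and tense between the x-side and y-side constituents.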
AVENUE MT system overview
[System diagram: an Elicitation Tool collects an Elicitation Corpus, yielding a word-aligned parallel corpus; a Rule Learning module induces Transfer Rules (alongside handcrafted rules and a morphological analyzer); the Run-Time Transfer System applies the Transfer Rules and Lexical Resources to produce a translation lattice; the Translation Correction Tool feeds user corrections to the Rule Refinement module.]
AVENUE overview: my research
[Same system diagram, with the Translation Correction Tool and the Rule Refinement module highlighted as the focus of this research.]
Interactive and Automatic Refinement of Translation Rules
Or: how to recycle corrections of MT output back into the MT system by adjusting and adapting the grammar and lexical rules
Error correction by non-expert bilingual users
Interactive elicitation of MT errors
Assumption: non-expert bilingual users can reliably detect and minimally correct MT errors, given:
- the SL sentence (I saw you)
- the TL sentence (Yo vi tú)
- word-to-word alignments (I-yo, saw-vi, you-tú)
- (context)
using an online GUI: the Translation Correction Tool (TCTool).
Goal: simplify the MT correction task as much as possible.
Translation Correction Tool
[Screenshot of the TCTool user interface, listing the available correction actions.]
[TCTool screenshot: the SL sentence and the best TL hypothesis, as picked by the user.]
[TCTool screenshot: changing word order.]
[TCTool screenshot: changing "grande" into "gran".]
Automatic Rule Refinement Framework
Find the best rule refinement (RR) operations given:
- a grammar (G),
- a lexicon (L),
- (a set of) source language sentence(s) (SL),
- (a set of) target language sentence(s) (TL),
- its parse tree (P), and
- a minimal correction of TL (TL'),
such that translation quality improves: TQ2 > TQ1.
This can also be expressed as: max TQ(TL | TL', P, SL, RR(G, L))
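The "max TQ" objective can be sketched as a search loop: treat each candidate refinement operation as a black-box edit to (G, L), re-translate SL, and keep the refinement whose output scores best against the user's correction TL'. The word-overlap score below is a stand-in assumption for the real TQ metric, and all function names are hypothetical.

```python
def tq(tl_hyp, tl_prime):
    """Toy translation-quality score: fraction of corrected words matched."""
    return sum(1 for a, b in zip(tl_hyp, tl_prime) if a == b) / max(len(tl_prime), 1)

def best_refinement(translate, grammar, lexicon, sl, tl_prime, candidate_ops):
    """Pick the RR op maximizing TQ(translate(RR(G, L), SL) | TL')."""
    baseline = tq(translate(grammar, lexicon, sl), tl_prime)  # TQ1
    best_op, best_score = None, baseline
    for op in candidate_ops:
        g2, l2 = op(grammar, lexicon)            # RR(G, L)
        score = tq(translate(g2, l2, sl), tl_prime)
        if score > best_score:                   # require TQ2 > TQ1
            best_op, best_score = op, score
    return best_op, best_score

# Toy word-for-word "translator" and one candidate lexical fix:
translate = lambda g, l, sl: [l.get(w, w) for w in sl]
op_fix = lambda g, l: (g, {**l, "red": "rojo"})
op, score = best_refinement(translate, {}, {"red": "roja"},
                            ["el", "auto", "red"], ["el", "auto", "rojo"],
                            [op_fix])
print(op is op_fix, score)
```

In the real framework the candidate operations are not enumerated blindly; they are proposed from the error information elicited by the TCTool.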
Types of RR operations
Grammar:
- R0 → R0 + R1 [= R0' + constr]  (Cov[R0] ⊂ Cov[R0, R1])
- R0 → R1 [= R0 + constr]  (Cov[R0] ⊃ Cov[R1])
- R0 → R1 [= R0 + constr = −] + R2 [= R0' + constr = c+]  (Cov[R0] = Cov[R1, R2])
Lexicon:
- Lex0 → Lex0 + Lex1 [= Lex0 + constr]
- Lex0 → Lex1 [= Lex0 + constr]
- Lex0 → Lex1 [= Lex0 + TL word]
- ∅ → Lex1 (adding a lexical item)
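Two of the grammar operations above can be illustrated on a toy rule representation (a rule name plus a dict of feature constraints): refining a rule by adding a constraint, and bifurcating a rule into a constrained pair. This only sketches the shape of the operations; the real AVENUE rules are the structured transfer rules shown earlier.

```python
def refine(rule, constr):
    """R0 -> R1 [= R0 + constr]: narrow R0 by adding a constraint."""
    name, constraints = rule
    return (name + "'", {**constraints, **constr})

def bifurcate(rule, feature, value):
    """R0 -> R1 [constr = -] + R2 [constr = value]: split R0 into one rule
    that blocks the triggering feature value and one that requires it."""
    r1 = refine(rule, {feature: "-"})
    r2 = refine(rule, {feature: value})
    return r1, r2

# Hypothetical NP rule split on a gender feature:
r0 = ("NP,2", {"agr-num": "sg"})
r1, r2 = bifurcate(r0, "agr-gen", "fem")
print(r1, r2)
```

The coverage relations listed above fall out of this: refining shrinks a rule's coverage, while a bifurcated pair jointly covers what the original rule did.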
Questions & Discussion
Thanks!
Formalizing Error Information
Wi = error
Wi' = correction
Wc = clue word
Example:
SL: the red car
TL: *el auto roja
TL': el auto rojo
Wi = roja, Wi' = rojo, Wc = auto
Finding Triggering Features
Once we have the user's correction (Wi'), we can compare it with Wi at the feature level and find the triggering feature. If the resulting set is empty, we need to postulate a new binary feature.
Delta function: the set of features on which Wi and Wi' differ.
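One possible reading of the delta function is a straightforward feature-structure comparison, shown below on the roja/rojo example from the previous slide. The feature names are illustrative assumptions about how the lexicon encodes these words.

```python
def delta(wi_feats, wi_prime_feats):
    """Set of features whose values differ between Wi and Wi'."""
    keys = set(wi_feats) | set(wi_prime_feats)
    return {f for f in keys if wi_feats.get(f) != wi_prime_feats.get(f)}

# "the red car" -> *"el auto roja", corrected to "el auto rojo":
wi = {"lemma": "rojo", "gen": "fem", "num": "sg"}        # roja (error)
wi_prime = {"lemma": "rojo", "gen": "masc", "num": "sg"}  # rojo (correction)
print(delta(wi, wi_prime))  # the triggering feature is gender
```

An empty result (identical feature structures) is exactly the case where no existing feature explains the correction, so a new binary feature must be postulated.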