Developing affordable technologies for resource-poor languages Ariadna Font Llitjós Language Technologies Institute Carnegie Mellon University September.

1 Developing affordable technologies for resource-poor languages Ariadna Font Llitjós Language Technologies Institute Carnegie Mellon University September 22, 2004

2 October 11, 2002 AMTA 2002 dot = language

3 Motivation
Resource-poor scenarios
- Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, health warnings, etc.)
- Formalize a potentially endangered language
Affordable technologies, such as
- spell-checkers,
- on-line dictionaries,
- Machine Translation (MT) systems,
- computer-assisted tutoring

4 AVENUE Partners
Language | Country | Institutions
Mapudungun (in place) | Chile | Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (started) | Peru | Ministry of Education
Iñupiaq (discussion) | US (Alaska) | Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center
Siona (discussion) | Colombia | OAS-CICAD, Plante, Department of the Interior

5 Chile
Official language: Spanish
Population: ~15 million
~1/2 million Mapuche people
Language: Mapudungun
Mapudungun for the Mapuche

6 What’s Machine Translation (MT)?
[Diagram labels: Japanese sentence, Swahili sentence]

7 Speech to Speech MT

8 Why Machine Translation for resource-poor (indigenous) languages?
Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers).
Benefits include:
- Better government access to indigenous communities (epidemics, crop failures, etc.)
- Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
- Language preservation
- Civilian and military applications (disaster relief)

9 MT for resource-poor languages: Challenges
- Minimal amount of parallel text (oral tradition)
- Possibly competing standards for orthography/spelling
- Often relatively few trained linguists
- Access to native informants possible
- Need to minimize development time and cost

10 Machine Translation Pyramid
[Diagram labels: Interlingua; Transfer rules; Corpus-based methods; analysis, interpretation, generation; example: I saw you / Yo vi tú]

11 AVENUE MT system overview
\spa Una mujer se quedó en casa
\map Kie domo mlewey ruka mew
\eng One woman stayed at home.
{VP,3}
VP::VP : [VP NP] -> [VP NP]
(
  (X1::Y1)
  (X2::Y2)
  ((x2 case) = acc)
  ((x0 obj) = x2)
  ((x0 agr) = (x1 agr))
  (y2 == (y0 obj))
  ((y0 tense) = (x0 tense))
  ((y0 agr) = (y1 agr))
)
V::V |: [stayed] -> [quedó]
(
  (X1::Y1)
  ((x0 form) = stay)
  ((x0 actform) = stayed)
  ((x0 tense) = past-pp)
  ((y0 agr pers) = 3)
  ((y0 agr num) = sg)
)

12 AVENUE MT system overview
[Diagram labels: Learning Module, Transfer Rules, Lexical Resources, Run Time Transfer System, Lattice, Translation Correction Tool, Word-Aligned Parallel Corpus, Elicitation Tool, Elicitation Corpus, Elicitation, Rule Learning, Run-Time System, Rule Refinement, Rule Refinement Module, Handcrafted rules, Morphology, Morphological analyzer]

13 AVENUE overview: my research
[Diagram labels: Learning Module, Transfer Rules, Lexical Resources, Run Time Transfer System, Lattice, Translation Correction Tool, Word-Aligned Parallel Corpus, Elicitation Tool, Elicitation Corpus, Elicitation, Rule Learning, Run-Time System, Rule Refinement, Rule Refinement Module, Handcrafted rules, Morphology, Morphological analyzer]

14 Interactive and Automatic Refinement of Translation Rules Or: How to recycle corrections of MT output back into the MT system by adjusting and adapting the grammar and lexical rules

15 Error correction by non-expert bilingual users

16 Interactive elicitation of MT errors
Assumptions: non-expert bilingual users can reliably detect and minimally correct MT errors, given:
- SL sentence (I saw you)
- TL sentence (Yo vi tú)
- word-to-word alignments (I-yo, saw-vi, you-tú)
- (context)
using an online GUI: the Translation Correction Tool (TCTool)
Goal: simplify the MT correction task maximally

17 Translation Correction Tool Actions:

18 SL + best TL picked by user

19 Changing word order

20 Changing “grande” into “gran”

23 Automatic Rule Refinement Framework
Find the best RR operations given:
- a grammar (G),
- a lexicon (L),
- a (set of) source language sentence(s) (SL),
- a (set of) target language sentence(s) (TL),
- its parse tree (P), and
- a minimal correction of TL (TL’)
such that TQ2 > TQ1.
This can also be expressed as: max TQ(TL | TL’, P, SL, RR(G, L))

24 Types of RR operations
Grammar:
- R0 -> R0 + R1 [= R0’ + constr]    Cov[R0] -> Cov[R0, R1]
- R0 -> R1 [= R0 + constr]    Cov[R0] -> Cov[R1]
- R0 -> R1 [= R0 + constr = -] -> R2 [= R0’ + constr =c +]    Cov[R0] -> Cov[R1, R2]
Lexicon:
- Lex0 -> Lex0 + Lex1 [= Lex0 + constr]
- Lex0 -> Lex1 [= Lex0 + constr]
- Lex0 -> Lex1 [  Lex0 +  TLword]
- ∅ -> Lex1 (adding lexical item)
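The first two grammar operations can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Rule` class, the function names, the rule identifiers, and the constraint strings are hypothetical and are not taken from the AVENUE system. The sketch also treats R0’ as a plain copy of R0, whereas the slide leaves open how R0’ differs from R0.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    """Toy stand-in for a transfer rule: a name plus a tuple of constraints."""
    name: str
    constraints: Tuple[str, ...] = ()


def constrained_copy(rule: Rule, constraint: str, new_name: str) -> Rule:
    """Return a copy of `rule` with one extra constraint appended."""
    return replace(rule, name=new_name,
                   constraints=rule.constraints + (constraint,))


def bifurcate(grammar: List[Rule], rule: Rule,
              constraint: str, new_name: str) -> List[Rule]:
    """R0 -> R0 + R1: keep the original rule and add a more specific copy."""
    return grammar + [constrained_copy(rule, constraint, new_name)]


def refine(grammar: List[Rule], rule: Rule,
           constraint: str, new_name: str) -> List[Rule]:
    """R0 -> R1: replace the original rule by its more specific copy."""
    r1 = constrained_copy(rule, constraint, new_name)
    return [r1 if r == rule else r for r in grammar]


# Hypothetical rule and constraint, written in the style of slide 11.
r0 = Rule("NP,1", ("((x1 agr) = (x2 agr))",))
constr = "((y1 gender) = (y2 gender))"
grammar = [r0]

bifurcated = bifurcate(grammar, r0, constr, "NP,2")  # two rules: R0 and R1
refined = refine(grammar, r0, constr, "NP,2")        # one rule: R1 only
```

Both functions return a new list and leave the input grammar untouched, so a candidate refinement can be tried and discarded without side effects.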

25 Questions & Discussion Thanks!

26 Formalizing Error Information
W_i = error
W_i’ = correction
W_c = clue word
Example:
SL: the red car - TL: *el auto roja -> TL’: el auto rojo
W_i = roja, W_i’ = rojo, W_c = auto
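As a rough illustration of the notation on this slide, the error word and its correction can be recovered by comparing TL with TL’ word by word. The sketch below assumes a single substituted word and sentences of equal length; the `ErrorInfo` class and `find_single_word_error` function are hypothetical names, and the clue word is simply passed in by hand because the slide does not say how it is identified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorInfo:
    error: str                  # W_i: the incorrect word in TL
    correction: str             # W_i': the word the user replaced it with
    clue: Optional[str] = None  # W_c: clue word, supplied by the caller here


def find_single_word_error(tl: str, tl_corrected: str,
                           clue: Optional[str] = None) -> ErrorInfo:
    """Compare TL and TL' position by position and return the one changed word."""
    tl_words = tl.split()
    fixed_words = tl_corrected.split()
    if len(tl_words) != len(fixed_words):
        raise ValueError("this sketch only handles same-length sentences")
    diffs = [(a, b) for a, b in zip(tl_words, fixed_words) if a != b]
    if len(diffs) != 1:
        raise ValueError("this sketch expects exactly one changed word")
    error, correction = diffs[0]
    return ErrorInfo(error=error, correction=correction, clue=clue)


# The example from the slide: "el auto roja" corrected to "el auto rojo".
info = find_single_word_error("el auto roja", "el auto rojo", clue="auto")
print(info.error, info.correction, info.clue)  # roja rojo auto
```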

27 Finding Triggering Features
Once we have the user’s correction (W_i’), we can compare it with W_i at the feature level and find which is the triggering feature.
If the delta set is empty, we need to postulate a new binary feature.
Delta function:
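The formula itself did not survive in this transcript, so the following is only a plausible reading of the comparison described above: take the feature structures of W_i and W_i’ and return the set of features whose values differ. The function name `delta`, the feature names, and the toy lexicon entries are assumptions made for illustration.

```python
from typing import Dict, Set

Features = Dict[str, str]


def delta(error_feats: Features, correction_feats: Features) -> Set[str]:
    """Return the names of features whose values differ between the two words."""
    names = set(error_feats) | set(correction_feats)
    return {n for n in names if error_feats.get(n) != correction_feats.get(n)}


# Hypothetical lexicon entries; the feature inventory is an assumption.
LEXICON: Dict[str, Features] = {
    "roja":   {"pos": "adj", "gender": "fem",  "num": "sg"},
    "rojo":   {"pos": "adj", "gender": "masc", "num": "sg"},
    "grande": {"pos": "adj", "num": "sg"},
    "gran":   {"pos": "adj", "num": "sg"},
}

# roja vs. rojo differ only in gender, so gender is the triggering feature.
gender_delta = delta(LEXICON["roja"], LEXICON["rojo"])
print(gender_delta)  # {'gender'}

# grande vs. gran have identical features in this toy lexicon, so the
# delta set is empty and a new binary feature would have to be postulated.
empty_delta = delta(LEXICON["grande"], LEXICON["gran"])
print(empty_delta)  # set()
```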

