
Slide 1: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources. Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking. Students and Staff: Erik Peterson, Christian Monson, Ariadna Font Llitjós, Alison Alvarez, Roberto Aranovich, Rodolfo Vega

Slide 2: Outline
– Scientific Objectives
– Framework Overview
– Learning Morphology
– Elicitation
– Learning Transfer Rules
– Automatic Rule Refinement
– Language Prototypes
– New Directions

Slide 3: Why Machine Translation for Languages with Limited Resources? We are in the age of the information explosion: the internet, the web, and Google mean anyone can get the information they want, anytime. But what about the text in all those other languages? How do those communities read all this English material, and how do we read what they put online? MT for these languages would enable:
– Better government access to native, indigenous, and minority communities
– Better minority and native community participation in information-rich activities (health care, education, government) without giving up their languages
– Civilian and military applications (e.g., disaster relief)
– Language preservation

Slide 4: The Roadmap to Learning-based MT
– Automatic acquisition of necessary language resources and knowledge using machine learning methodologies
– A framework for integrating the acquired MT resources into effective MT prototype systems
– Effective integration of acquired knowledge with statistical/distributional information

Slide 5: CMU's AVENUE Approach
– Elicitation: use bilingual native informants to produce a small, high-quality, word-aligned bilingual corpus of translated phrases and sentences
– Transfer-rule Learning: apply ML-based methods to automatically acquire syntactic transfer rules for translation between the two languages; learn from major language to minor language, translate from minor language to major language
– XFER + Decoder: the XFER engine produces a lattice of possible transferred structures at all levels; the decoder searches and selects the best-scoring combination
– Rule Refinement: automatically refine and correct the acquired transfer rules via a process of interaction with bilingual informants, which helps the system identify translation errors
– Morphology Learning: unsupervised learning of the morpheme structure of words based on their organization into paradigms and on distributional information

Slide 6: AVENUE MT Approach (MT pyramid diagram: interlingua at the apex, reached via semantic analysis and sentence planning; transfer rules at the mid level, between syntactic parsing of the source, e.g. Quechua, and text generation of the target, e.g. English; direct approaches, SMT and EBMT, at the base). AVENUE automates the learning of the mid-level transfer rules.

Slide 7: AVENUE Architecture (diagram). The elicitation tool and elicitation corpus yield a word-aligned parallel corpus; the rule-learning and morphology-learning modules produce learned transfer rules, which join handcrafted rules and lexical resources in the run-time transfer system; the decoder turns input text into output text; a translation correction tool feeds the rule refinement module, which updates the learned rules.

Slide 8: Transfer Rule Formalism. A rule carries type information, part-of-speech/constituent information, alignments, x-side constraints, y-side constraints, and xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)).
; SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
 (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER))
)

Slide 9: Transfer Rule Formalism (II). The same rule, annotated for its two kinds of constraints: value constraints, e.g. ((X1 AGR) = *3-SING), and agreement constraints, e.g. ((Y2 GENDER) = (Y4 GENDER)).
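To make the formalism concrete, here is a minimal Python sketch of how a rule like the NP::NP example above could be represented in code. This is not the actual XFER engine; the class and field names are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class TransferRule:
    # Hypothetical container for one transfer rule; not AVENUE's own format.
    src_type: str                 # e.g. "NP" (x-side constituent type)
    tgt_type: str                 # e.g. "NP" (y-side constituent type)
    src_seq: list                 # e.g. ["DET", "ADJ", "N"]
    tgt_seq: list                 # e.g. ["DET", "N", "DET", "ADJ"]
    alignments: list              # (x, y) index pairs, 1-based as on the slide
    constraints: list = field(default_factory=list)

np_rule = TransferRule(
    "NP", "NP",
    ["DET", "ADJ", "N"],
    ["DET", "N", "DET", "ADJ"],
    [(1, 1), (1, 3), (2, 4), (3, 2)],
    # One value constraint and one agreement constraint, mirroring the slide:
    [("X1 AGR", "=", "*3-SING"), ("Y2 GENDER", "=", "Y4 GENDER")],
)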

Slide 10: Transfer and Decoding (AVENUE architecture diagram repeated; see Slide 7).

Slide 11: The Transfer Engine.
– Analysis: source text is parsed into its grammatical structure, which determines transfer application ordering. Example: ראיתי את האיש הזקן, "(I) saw *acc the man the old", parsed as S over V, P, and NP (D N D Adj).
– Transfer: a target-language tree is created by reordering, insertion, and deletion, e.g. S [NP [N I] VP [V saw] NP [DET the ADJ old N man]]; source words are translated with the transfer lexicon.
– Generation: target-language constraints are checked, target morphology is applied, and the final translation is produced; e.g. "saw" in past tense is selected. Final translation: "I saw the old man".

Slide 12: Symbolic Decoder.
– The system rarely finds a full parse/transfer for a complete input sentence
– The XFER engine produces a comprehensive lattice of segment translations
– The decoder selects the best combination of translation segments: a search for the optimal-scoring path of partial translations, based on multiple features (target language model scores, XFER rule scores, path fragmentation, other features)
– Symbolic decoding is essential in scenarios with insufficient data for training a large target LM; effective rule scoring is crucial
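The decoder's search can be pictured with a small sketch. Assuming each lattice edge is a scored translation of a source span (real scores would combine the target LM, rule scores, and other features; the numbers below are illustrative), a simple dynamic program finds the best-scoring path, with a per-segment penalty standing in for the path-fragmentation feature:

def best_path(n_words, edges, frag_penalty=-0.5):
    """edges: (start, end, translation, score) spans over [0, n_words)."""
    best = {0: (0.0, [])}                      # position -> (score, segments)
    for pos in range(n_words):
        if pos not in best:
            continue
        score, segs = best[pos]
        for start, end, text, s in edges:
            if start == pos:
                cand = (score + s + frag_penalty, segs + [text])
                if end not in best or cand[0] > best[end][0]:
                    best[end] = cand
    return best.get(n_words)

edges = [(0, 1, "I", -0.1), (1, 3, "saw the", -0.3), (1, 2, "saw", -0.2),
         (2, 3, "the", -0.1), (3, 5, "old man", -0.4)]
print(best_path(5, edges))   # prefers fewer, longer segments

Because every segment pays the fragmentation penalty, the search naturally prefers a less fragmented covering of the input, as described on the slide.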

Slide 13: Morphology Learning (AVENUE architecture diagram repeated; see Slide 7).

Slide 14: The Challenge of Morphology. Mapudungun (indigenous language of Chile and Argentina, ~1 million speakers): Allkütulekefun

Slide 15: Segmented: Allkütu -le -ke -fu -n

Slide 16: Glossed: Allkütu "listen", -le progressive, -ke habitual, -fu past, -n indicative.1sg

Slides 17-19: The translation is assembled word by word: "I", "I used to", "I used to listen"

Slide 20: Allkütu-le-ke-fu-n, listen-prog.-habitual-past-indic.1sg: "I used to listen". Tasks for morphology: segment words; map morphemes onto features.

Slide 21: The Challenge of Morphology. Tasks for morphology: segment words; map morphemes onto features. Learn these tasks unsupervised, from data, for any language.

Slide 22: Our Approach: leverage the natural structure of morphology. A paradigm is a set of affixes that interchangeably attach to a set of stems.

Slide 23: Example vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving. First candidate scheme: Ø.s (stems: blame, solve).

Slides 24-25: Adding scheme Ø.s.d (stem: blame).

Slides 26-27: Adding scheme s (stems: blame, roam, solve).

Slide 28: Adding scheme e.es (stems: blam, solv).

Slide 29: Schemes so far: Ø.s (blame, solve); Ø.s.d (blame); s (blame, roam, solve); e.es (blam, solv).

Slide 30: The full network of candidate schemes for this vocabulary, each suffix set followed by its stems:
Ø.s.d: blame
Ø.s: blame, solve
Ø.d: blame
s.d: blame
Ø: blame, blames, blamed, roams, roamed, roaming, solve, solves, solving
s: blame, roam, solve
d: blame, roame
ed: blam, roam
e.es.ed: blam
e.es: blam, solv
e.ed: blam
es.ed: blam
e: blam, solv
es: blam, solv
me.mes.med: bla
me.mes: bla
me.med: bla
mes.med: bla
me: bla
mes: bla
med: bla, roa
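A minimal sketch of the scheme idea behind this network, assuming only the paradigm definition above (an illustration, not the actual induction code): enumerate every stem+suffix split of each vocabulary word, then ask which stems support a given candidate suffix set.

from collections import defaultdict

def stem_suffix_table(vocab):
    table = defaultdict(set)
    for word in vocab:
        for i in range(1, len(word) + 1):
            table[word[:i]].add(word[i:])      # the suffix may be empty (Ø)
    return table

def scheme(suffixes, table):
    """Stems observed with every suffix in the candidate set."""
    return sorted(s for s, sufs in table.items() if suffixes <= sufs)

vocab = ["blame", "blamed", "blames", "roamed", "roaming", "roams",
         "solve", "solves", "solving"]
table = stem_suffix_table(vocab)
print(scheme({"", "s"}, table))       # ['blame', 'solve']        (Ø.s)
print(scheme({"s"}, table))           # ['blame', 'roam', 'solve'] (s)
print(scheme({"e", "es"}, table))     # ['blam', 'solv']           (e.es)

Each (suffix set, stems) pair printed here corresponds to one node of the scheme network on the slide.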

Slide 31: Schemes induced from a Spanish newswire corpus (40,011 tokens, 6,975 types). Each entry is a suffix set, its stem type count, and example stems:
a.as.o.os.tro (1): cas
a.as.o.os (43): african, cas, jurídic, l, ...
a.as.os (50): afectad, cas, jurídic, l, ...
a.as.o (59): cas, citad, jurídic, l, ...
as.o.os (54): cas, implicad, jurídic, l, ...
a.o.os (105): impuest, indonesi, italian, jurídic, ...
a.as (199): huelg, incluid, industri, inundad, ...
a.os (134): impedid, impuest, indonesi, inundad, ...
as.os (68): cas, implicad, inundad, jurídic, ...
a.o (214): id, indi, indonesi, inmediat, ...
as.o (85): intern, jurídic, just, l, ...
o.os (268): human, implicad, indici, indocumentad, ...
a.tro (2): cas, cen
a (1237): huelg, ib, id, iglesi, ...
as (404): huelg, huelguist, incluid, industri, ...
os (534): humorístic, human, hígad, impedid, ...
o (1139): hub, hug, human, huyend, ...
tro (16): catas, ce, cen, cua, ...

Slide 32: The same network, annotated: each node lists suffixes, a stem type count, and stems; a node's level is its number of suffixes (level 5 = 5 suffixes).

Slide 33: The same network, annotated: a.as.o.os (43 stems) is the genuine adjective paradigm, while nodes such as a.as.o.os.tro and a.tro arise from the spurious suffix "tro" (cut from stems like cas, cen, catas).

Slide 34: Basic search procedure over the network: move in the direction of increasing suffix count and decreasing stem count.

Slide 35: Examples and evaluation of automatically selected suffix sets (the slide's correct/wrong color key is not recoverable here): Ø.ba.n.ndo; ada.adas.ado.ados.aron.ó; a.aba.ado.ados.ar.ará.arán; ada.ado.ados.ar.o; a.aciones.ación.adas.ado.ar; ado.adores.o; a.ada.adas.ado.ar.ará; ado.ados.arse.e; a.adas.ado.an.ar; ado.ar.aron.arse.ará; a.ado.ados.ar.ó; do.dos.ndo.r.ron; a.ado.an.arse.ó; e.ida.ido; a.ado.aron.arse.ó; emos.ido.ía.ían; aba.ada.ado.ar.o.os; ida.ido.idos.ir.ió; aciones.ación.ado.ados; ido.iendo.ir; aciones.ado.ados.ará; ido.ir.ro; ación.ado.an.e. Global suffix evaluation: precision 0.506, recall 0.517, F1 0.511.

Slide 36: Next Steps for Morphology Induction
– Improve the quality of induced paradigms (current work)
– Convert paradigms into a segmenter (soon)
– Learn mappings from morphemes to features (future goal)

Slide 37: Elicitation (AVENUE architecture diagram repeated; see Slide 7).

Slide 38: Purpose of Elicitation. Provide a small but highly targeted corpus of hand-aligned data, to support machine learning from a small data set and to cover all basic morpho-syntactic phenomena. Example records:
newpair
srcsent: Tú caíste
tgtsent: eymi ütrünagimi
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell
newpair
srcsent: Tú estás cayendo
tgtsent: eymi petu ütünagimi
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling
newpair
srcsent: Tú caíste
tgtsent: eymi ütrunagimi
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
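A hedged sketch of reading this record format in Python. The field names (newpair, srcsent, tgtsent, aligned, context, comment) follow the slide; the exact parsing details are an assumption about the file layout.

def parse_elicitation(text):
    pairs, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "newpair":          # each record starts with this marker
            current = {}
            pairs.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return pairs

sample = """newpair
srcsent: Tú caíste
tgtsent: eymi ütrünagimi
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell"""
print(parse_elicitation(sample))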

Slide 39: Purpose of Elicitation. To get data from someone who is bilingual, literate, and not experienced with linguistics.

Slide 40: English-Hindi Example (screenshot of the elicitation tool).

Slide 41: English-Chinese Example (screenshot).

Slide 42: English-Arabic Example (screenshot).

Slide 43: The Elicitation Tool has been used with these languages: Mapudungun, Hindi, Hebrew, Quechua, Aymara, Thai, Japanese, Chinese, Dutch, Arabic.

Slide 44: Elicitation Corpus: a list of minimal pairs of sentences in a major language. Eliciting from Spanish: Canto; Canté; Estoy cantando; Cantaste. Eliciting from English: I sing; I sang; I am singing; You sang.

Slide 45: AVENUE Elicitation Corpora
– The Functional-Typological Corpus: designed to elicit elements of meaning that may have morpho-syntactic realization
– The Structural Elicitation Corpus: based on sentence structures from the Penn TreeBank

Slide 46: The Process (diagram). A feature specification (the list of semantic features and values) feeds feature maps, which state which combinations of features and values are of interest (clause-level, noun-phrase, tense and aspect, modality); these yield feature structure sets, which are reverse-annotated with English sentences to form the corpus; sampling then produces a smaller corpus.

Slide 47: Feature Structures. Example:
srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker.
((actor ((np-function fn-actor) (np-animacy anim-human) (np-biological-gender bio-gender-female) (np-general-type proper-noun-type) (np-identifiability identifiable) (np-specificity specific) ...))
(pred ((np-function fn-predicate-nominal) (np-animacy anim-human) (np-biological-gender bio-gender-female) (np-general-type common-noun-type) (np-specificity specificity-neutral) ...))
(c-v-lexical-aspect state) (c-copula-type copula-role) (c-secondary-type secondary-copula) (c-solidarity solidarity-neutral) (c-v-grammatical-aspect gram-aspect-neutral) (c-v-absolute-tense past) (c-v-phase-aspect phase-aspect-neutral) (c-general-type declarative-clause) (c-polarity polarity-negative) (c-my-causer-intentionality intentionality-n/a) (c-comparison-type comparison-n/a) (c-relative-tense relative-n/a) (c-our-boundary boundary-n/a) ...)

Slide 48: Feature Specification. Defines features and their values; sets default values for features; specifies feature requirements and restrictions. Written in XML.

Slide 49: Feature Specification example. Feature: c-copula-type (a copula is a verb like "be"; some languages do not have copulas). Values:
– copula-n/a. Restriction: ~(c-secondary-type secondary-copula)
– copula-role. Restriction: (c-secondary-type secondary-copula). A role is something like a job or a function: "He is a teacher", "This is a vegetable peeler"
– copula-identity. Restriction: (c-secondary-type secondary-copula). "Clark Kent is Superman", "Sam is the teacher"
– copula-location. Restriction: (c-secondary-type secondary-copula). "The book is on the table"; there is a long list of locative relations later in the feature specification
– copula-description. Restriction: (c-secondary-type secondary-copula). A description is an attribute: "The children are happy", "The books are long"
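The slides state only that the specification is written in XML; the element and attribute names below are hypothetical, a sketch of what one entry for c-copula-type might look like and how it could be loaded in Python.

import xml.etree.ElementTree as ET

# Hypothetical markup; the real AVENUE schema is not shown on the slides.
spec = """
<feature name="c-copula-type" default="copula-n/a">
  <value name="copula-role">
    <restriction>(c-secondary-type secondary-copula)</restriction>
  </value>
  <value name="copula-identity">
    <restriction>(c-secondary-type secondary-copula)</restriction>
  </value>
</feature>
"""
root = ET.fromstring(spec)
print(root.get("name"), root.get("default"),
      [v.get("name") for v in root.findall("value")])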

Slide 50: Feature Maps. Some features interact in the grammar:
– English -s reflects person and number of the subject and tense of the verb
– In expressing the English present progressive, the auxiliary verb is in a different place in a question and a statement: "He is running." / "Is he running?"
We need to check many, but not all, combinations of features and values; using unlimited feature combinations leads to an unmanageable number of sentences.
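A hedged sketch of the feature-map idea: rather than crossing every feature with every other, only enumerate combinations for feature groups a map declares as interacting, holding everything else at its default. The feature names and values here are illustrative, not the corpus's actual inventory.

from itertools import product

features = {
    "tense": ["past", "present", "future"],
    "aspect": ["perfective", "progressive", "habitual", "neutral"],
    "polarity": ["positive", "negative"],
}
# Feature map: which feature groups to cross; other features keep defaults.
interacting = [("tense", "aspect"), ("polarity",)]

combos = []
for group in interacting:
    for values in product(*(features[f] for f in group)):
        combos.append(dict(zip(group, values)))
print(len(combos), "combinations instead of",
      len(features["tense"]) * len(features["aspect"]) * len(features["polarity"]))

Even in this toy setting the map cuts 24 exhaustive combinations down to 14; on the real corpus the savings are what keep the sentence count manageable.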


Slide 52: Evidentiality Map. Features and values crossed by the map:
– Lexical Aspect: activity-accomplishment
– Assertiveness: assertiveness-asserted, assertiveness-neutral
– Polarity: polarity-positive, polarity-negative
– Source: hearsay, quotative, inferred, assumption; visual, auditory, non-visual-or-auditory
– Tense: past; present, future
– Grammatical Aspect: perfective, progressive, habitual, neutral; habitual, neutral, progressive
(The map pairs each evidential source with the tense and grammatical-aspect values relevant to it.)

Slide 53: Current Work: Navigation.
– Start: the large search space of all possible feature combinations
– Finish: each feature has been eliminated as irrelevant or has been explored
– Goal: dynamically find the most efficient path through the search space for each language

Slide 54: Current Work: Feature Detection. Which features have an effect on morphosyntax? What is the effect? Feature detection drives the navigation process.

Slide 55: Feature Detection: Spanish.
The girl saw a red book. ((1,1)(2,2)(3,3)(4,4)(5,6)(6,5)) La niña vió un libro rojo
A girl saw a red book. ((1,1)(2,2)(3,3)(4,4)(5,6)(6,5)) Una niña vió un libro rojo
I saw the red book. ((1,1)(2,2)(3,3)(4,5)(5,4)) Yo vi el libro rojo
I saw a red book. ((1,1)(2,2)(3,3)(4,5)(5,4)) Yo vi un libro rojo
Resulting record: Feature: definiteness. Values: definite, indefinite. Function-of-*: subj, obj. Marked-on-head-of-*: no. Marked-on-dependent: yes. Marked-on-governor: no. Marked-on-other: no. Add/delete-word: no. Change-in-alignment: no.

Slide 56: Feature Detection: Chinese.
A girl saw a red book. ((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8)) 有 一个 女人 看见 了 一本 红色 的 书 。
The girl saw a red book. ((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7)) 女人 看见 了 一本 红色的 书
Resulting record: Feature: definiteness. Values: definite, indefinite. Function-of-*: subject. Marked-on-head-of-*: no. Marked-on-dependent: no. Marked-on-governor: no. Add/delete-word: yes. Change-in-alignment: no.

Slide 57: Feature Detection: Chinese.
I saw the red book. ((1,3)(2,4)(2,5)(4,1)(5,2)) 红色的 书, 我 看见 了
I saw a red book. ((1,1)(2,2)(2,3)(2,4)(4,5)(5,6)) 我 看见 了 一本 红色的 书 。
Resulting record: Feature: definiteness. Values: definite, indefinite. Function-of-*: object. Marked-on-head-of-*: no. Marked-on-dependent: no. Marked-on-governor: no. Add/delete-word: yes. Change-in-alignment: yes.

Slide 58: Feature Detection: Hebrew.
A girl saw a red book. ((2,1)(3,2)(5,4)(6,3)) ילדה ראתה ספר אדום
The girl saw a red book. ((1,1)(2,1)(3,2)(5,4)(6,3)) הילדה ראתה ספר אדום
I saw a red book. ((2,1)(4,3)(5,2)) ראיתי ספר אדום
I saw the red book. ((2,1)(3,3)(3,4)(4,4)(5,3)) ראיתי את הספר האדום
Resulting record: Feature: definiteness. Values: definite, indefinite. Function-of-*: subj, obj. Marked-on-head-of-*: yes. Marked-on-dependent: yes. Marked-on-governor: no. Add-word: no. Change-in-alignment: no.
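A minimal sketch of the kind of check feature detection performs: compare a minimal pair of translations that differ only in one feature value and record whether the target side added or deleted words or changed its alignment. The mechanics here are an assumption; the slides show only the resulting records.

def detect(feature, pair_a, pair_b):
    (tgt_a, align_a), (tgt_b, align_b) = pair_a, pair_b
    return {
        "feature": feature,
        "add/delete-word": len(tgt_a.split()) != len(tgt_b.split()),
        "change-in-alignment": sorted(align_a) != sorted(align_b),
    }

# Spanish minimal pair: "the girl" vs "a girl" (marked on the determiner only)
print(detect("definiteness",
             ("La niña vió un libro rojo",
              [(1,1),(2,2),(3,3),(4,4),(5,6),(6,5)]),
             ("Una niña vió un libro rojo",
              [(1,1),(2,2),(3,3),(4,4),(5,6),(6,5)])))
# -> no word added/deleted, no alignment change, matching the Spanish record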

Slide 59: Feature Detection feeds into:
– Corpus navigation: which minimal pairs to pursue next (don't pursue gender in Mapudungun; do pursue definiteness in Hebrew)
– Morphology learning: the morphological learner identifies the forms of the morphemes; feature detection identifies their functions
– Rule learning: the rule learner will have to learn a constraint for each morpho-syntactic marker that is discovered, e.g. adjectives and nouns agree in gender, number, and definiteness in Hebrew

Slide 60: Rule Learning (AVENUE architecture diagram repeated; see Slide 7).

Slide 61: Rule Learning: Overview. Goal: acquire syntactic transfer rules, using available knowledge from the major-language side (grammatical structure). Three steps:
1. Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
2. Compositionality Learning: use previously learned rules to learn hierarchical structure
3. Constraint Learning: refine rules by learning appropriate feature constraints

Slide 62: Flat Seed Rule Generation. Learning example (NP): Eng "the big apple", Heb "ha-tapuax ha-gadol". Generated seed rule:
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

Slide 63: Flat Seed Rule Generation. Create a "flat" transfer rule specific to the sentence pair, partially abstracted to POS:
– Words that are aligned word-to-word and have the same POS in both languages are generalized to their POS
– Words that have complex alignments (or differing POS) remain lexicalized
One seed rule is generated for each translation example. No feature constraints are associated with seed rules, but the example(s) from which each rule was learned are recorded.
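A hedged sketch of the generalization step just described. Note that "the" stays lexicalized here because its alignment is one-to-many; the slide's seed rule further abstracts the determiners to ART, which this simplified version does not attempt. POS tags for the minor language are assumed given.

from collections import Counter

def seed_rule(src, tgt, alignments, src_pos, tgt_pos):
    """alignments: 1-based (x, y) pairs, as in the slides."""
    x_count = Counter(x for x, _ in alignments)
    y_count = Counter(y for _, y in alignments)
    src_seq, tgt_seq = list(src), list(tgt)
    for x, y in alignments:
        if (x_count[x] == y_count[y] == 1            # strictly word-to-word
                and src_pos[x - 1] == tgt_pos[y - 1]):
            src_seq[x - 1] = src_pos[x - 1]          # generalize to POS
            tgt_seq[y - 1] = tgt_pos[y - 1]
    return src_seq, tgt_seq, alignments

print(seed_rule(["the", "big", "apple"], ["ha", "tapuax", "ha", "gadol"],
                [(1, 1), (1, 3), (2, 4), (3, 2)],
                ["ART", "ADJ", "N"], ["ART", "N", "ART", "ADJ"]))
# (['the', 'ADJ', 'N'], ['ha', 'N', 'ha', 'ADJ'], ...)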

Slide 64: Compositionality Learning. Initial flat rules:
S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
Generated compositional rule:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))

Slide 65: Compositionality Learning.
– Detection: traverse the c-structure of the English sentence and add compositional structure for translatable chunks
– Generalization: adjust constituent sequences and alignments
Two implemented variants:
– Safe compositionality: there exists a transfer rule that correctly translates the sub-constituent
– Maximal compositionality: generalize the rule if supported by the alignments, even in the absence of an existing transfer rule for the sub-constituent

Slide 66: Constraint Learning. Input: rules and their example sets.
S::S [NP V NP] -> [NP V P NP] {ex1, ex12, ex17, ex26}
((X1::Y1) (X2::Y2) (X3::Y4))
NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2, ex3, ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N] {ex4, ex5, ex6, ex8, ex10, ex11}
((X1::Y1) (X2::Y2))
Output: rules with feature constraints.
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
 ((X1 NUM) = (X2 NUM))
 ((Y1 NUM) = (Y2 NUM))
 ((X1 NUM) = (Y1 NUM)))

Slide 67: Constraint Learning. Goal: add appropriate feature constraints to the acquired rules. Methodology:
– Preserve general structural transfer; learn specific feature constraints from the example set
– Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)
– Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary
– The seed rules in a group form the specific boundary of the version space; the general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
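A minimal sketch of one step of this process: keep only those candidate agreement constraints that hold in every example of a rule's cluster, moving from the specific boundary toward the general one. Feature structures are flat dicts here; the real system unifies much richer structures.

def surviving_agreements(examples, candidate_pairs):
    kept = []
    for (c1, feat1), (c2, feat2) in candidate_pairs:
        # Keep the constraint only if it holds in every example.
        if all(ex[c1].get(feat1) == ex[c2].get(feat2) for ex in examples):
            kept.append(((c1, feat1), (c2, feat2)))
    return kept

examples = [
    {"X1": {"NUM": "sg"}, "X2": {"NUM": "sg"}, "Y1": {"NUM": "sg"}},
    {"X1": {"NUM": "pl"}, "X2": {"NUM": "pl"}, "Y1": {"NUM": "pl"}},
]
candidates = [(("X1", "NUM"), ("X2", "NUM")), (("X1", "NUM"), ("Y1", "NUM"))]
print(surviving_agreements(examples, candidates))
# both NUM agreements survive, as in the S::S rule on slide 66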

Slide 68: Rule Refinement (AVENUE architecture diagram repeated; see Slide 7).

Slide 69: Interactive and Automatic Refinement of Translation Rules.
– Problem: improve machine translation quality
– Proposed solution: put bilingual speakers back into the loop; use their corrections to detect the source of an error and automatically improve the lexicon and the grammar
– Approach: automate post-editing efforts by feeding them back into the MT system, going beyond post-editing by automatically refining the translation rules that caused the error
– Goal: improve MT coverage and overall quality

Slide 70: Technical Challenges.
– Elicit minimal MT information from non-expert users
– Automatically refine and expand translation rules, both manually written and automatically learned, with minimal changes
– Automatic evaluation of the refinement process

Slide 71: Error Typology for Automatic Rule Refinement (simplified).
– Missing word; extra word; wrong word order (local vs. long distance; word vs. phrase)
– Incorrect word (sense, form, selectional restrictions, idiom; with or without a word change)
– Wrong agreement (missing constraint; extra constraint)

Slide 72: TCTool (demo). Interactive elicitation of error information. Actions: add a word, delete a word, modify a word, change word order. Results:
– Error detection: 90% precision, 89% recall
– Error classification: 72% precision, 71% recall

Slide 73: Types of Refinement Operations (Automatic Rule Adaptation). 1. Refine a translation rule: R0 -> R1 (change R0 to make it more specific or more general). Example: R0: NP [DET ADJ N] -> NP [DET N ADJ] produces "a nice house" -> *"una casa bonito"; R1 adds the constraint N gender = ADJ gender, yielding "una casa bonita".

Slide 74: Types of Refinement Operations (Automatic Rule Adaptation). 2. Bifurcate a translation rule: keep R0 (the same, general rule) and add R1, a new, more specific rule. Example: R0: NP [DET ADJ N] -> NP [DET N ADJ] ("a nice house" -> "una casa bonita"); R1: NP [DET ADJ N] -> NP [DET ADJ N] for pre-nominal adjectives ("a great artist" -> "un gran artista"; ADJ type: pre-nominal).
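A hedged sketch of the bifurcate operation, with rules as plain dicts (an illustration, not AVENUE's representation): R0 is left untouched and a more specific copy R1 is added, gated on the newly postulated feature.

import copy

def bifurcate(r0, extra_constraint, new_tgt_seq=None):
    """Return the untouched general rule plus a more specific variant R1."""
    r1 = copy.deepcopy(r0)
    if new_tgt_seq is not None:
        r1["tgt_seq"] = new_tgt_seq             # e.g. keep the ADJ pre-nominal
    r1["constraints"].append(extra_constraint)  # e.g. ("Y2 feat1", "=c", "+")
    return [r0, r1]

np_rule = {"src_seq": ["DET", "ADJ", "N"], "tgt_seq": ["DET", "N", "ADJ"],
           "constraints": [("Y2 agr", "=", "Y3 agr")]}
# "a great artist" -> "un gran artista": flip to DET ADJ N, but only for
# adjectives carrying the postulated pre-nominal feature feat1.
print(bifurcate(np_rule, ("Y2 feat1", "=c", "+"), ["DET", "ADJ", "N"]))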

Slide 75: A concrete example of error information elicitation and the refinement operation typology (Automatic Rule Adaptation). SL: "Gaudí was a great artist". MT system output: "Gaudí era un artista grande" (error word: "grande"). User correction, changing the word order and the word: "Gaudí era un gran artista" (clue word: "gran").

Slide 76: Blame assignment (from the MT system output). Tree:
<((S,1 (NP,2 (N,5:1 "GAUDI")) (VP,3 (VB,2 (AUX,17:2 "ERA")) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE")))))>
The grammar rules involved are S,1; NP,1; NP,8. Finding the triggering feature(s): the feature difference between the error word and the corrected word, Δ(error word, corrected word), is empty, so a new binary feature (feat1) must be postulated. The two lexical entries differ only in the target side:
ADJ::ADJ |: [great] -> [grande]
((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc))
ADJ::ADJ |: [great] -> [gran]
((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc))

Slide 77: Refining Rules. Bifurcate NP,8 into NP,8 (R0) and NP,8' (R1, with the order of ADJ and N flipped):
{NP,8'}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
 (X1::Y1) (X2::Y2) (X3::Y3)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y3 agr)) ; det-noun agreement
 ((y2 agr) = (y3 agr)) ; adj-noun agreement
 (y2 = x3)
 ((y2 feat1) =c +)
)

Slide 78: Refining Lexical Entries.
ADJ::ADJ |: [great] -> [grande]
((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = -))
ADJ::ADJ |: [great] -> [gran]
((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = +))

Slide 79: Evaluating Improvement (Automatic Rule Adaptation). Given the initial and final translation lattices, the rule refinement module needs to check whether the corrected translation sentence and the original translation sentence (labelled as incorrect by the user) are present. Initial lattice: un artista gran; un gran artista; un grande artista; *un artista grande.

Slide 80: Evaluating Improvement (Automatic Rule Adaptation). After refinement, the incorrect paths are ruled out: *un artista gran; un gran artista; *un grande artista; *un artista grande.

Slide 81: Challenges and Future Work.
– Credit and blame assignment from the TCTool log files and the XFER engine's trace
– The order of corrections matters; explore rule interactions
– Explore the space between batch mode and a fully interactive system
– An online TCTool always running to collect corrections from bilingual speakers; make it into a game with rewards for the best users

Slide 82: AVENUE Prototypes. The general XFER framework has been under development for the past three years. Prototype systems so far: German-to-English, Dutch-to-English, Chinese-to-English, Hindi-to-English, Hebrew-to-English. In progress or planned: Mapudungun-to-Spanish, Quechua-to-Spanish, native Alaskan languages (Inupiaq) to English, native Bolivian languages (Aymara) to Spanish, native Brazilian languages to Brazilian Portuguese.

Slide 83: Mapudungun: indigenous language of Chile and Argentina, ~1 million Mapuche speakers.

Slide 84: Collaboration. Mapuche language experts at the Universidad de la Frontera (UFRO), Instituto de Estudios Indígenas (IEI, Institute for Indigenous Studies): Eliseo Cañulef, Rosendo Huisca, Hugo Carrasco, Hector Painequeo, Flor Caniupil, Luis Caniupil Huaiquiñir, Marcela Collio Calfunao, Cristian Carrillan Anton, Salvador Cañulef. Chilean funding: Chilean Ministry of Education (Mineduc), Bilingual and Multicultural Education Program: Carolina Huenchullan Arrúe, Claudio Millacura Salas.

Slide 85: Accomplishments: Corpora Collection.
– Spoken corpus (collected by Luis Caniupil Huaiquiñir): medical domain; 3 of 4 Mapudungun dialects (120 hours of Nguluche, 30 hours of Lafkenche, 20 hours of Pwenche); transcribed in Mapudungun and translated into Spanish. Example:
nmlch-nmjm1_x_0405_nmjm_00:
M: no pütokovilu kay ko / C: no, si me lo tomaba con agua
M: chumgechi pütokoki femuechi pütokon pu / C: como se debe tomar, me lo tomé pués
nmlch-nmjm1_x_0406_nmlch_00:
M: Chengewerkelafuymiürke / C: Ya no estabas como gente entonces!
– Written corpus: ~200,000 words of bilingual Mapudungun-Spanish historical and newspaper text

Slide 86: Accomplishments, developed at UFRO:
– Bilingual dictionary with examples: 1,926 entries
– Spelling-corrected Mapudungun word list: 117,003 fully-inflected word forms
– Segmented word list: 15,120 forms, with stems translated into Spanish

Slide 87: Accomplishments, developed at LTI using Mapudungun language resources from UFRO:
– Spelling checker, integrated into OpenOffice
– Hand-built morphological analyzer
– Prototype machine translation systems: rule-based and example-based
– Website: LenguasAmerindias.org

Slide 88: Quechua-to-Spanish MT. V-Unit funded a summer project in Cusco, Peru, June-August 2005 (preparations and data collection started earlier). Intensive Quechua course at the Centro Bartolome de las Casas (CBC). Worked together with two native Quechua speakers and one non-native speaker on developing infrastructure (correcting elicited translations, segmenting and translating a list of the most frequent words).

Slide 89: Quechua-to-Spanish Prototype MT System.
– Stem lexicon (semi-automatically generated): 753 lexical entries
– Suffix lexicon: 21 suffixes (of the ~150 described by Cusihuaman)
– Quechua morphology analyzer
– 25 translation rules
– Spanish morphology generation module
– User studies: 10 sentences, 3 users (2 native, 1 non-native)

Slide 90: Challenges for Hebrew MT. Paucity of existing language resources for Hebrew:
– No publicly available broad-coverage morphological analyzer
– No publicly available bilingual lexicons or dictionaries
– No POS-tagged corpus or parse tree-bank corpus for Hebrew
– No large Hebrew/English parallel corpus
This scenario is well suited to the CMU transfer-based MT framework for languages with limited resources.

Slide 91: Hebrew Morphology Example. Input word: B$WRH, character positions 0-4. Possible segmentations:
|--------B$WRH--------|
|-----B-----|$WR|--H--|
|--B--|-H--|--$WRH---|

Slide 92: Hebrew Morphology Example: the analyses behind those segmentations, as span-annotated feature bundles.
Y0: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
Y2: ((SPANSTART 1) (SPANEND 3) (LEX $WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
Y3: ((SPANSTART 3) (SPANEND 4) (LEX $LH) (POS POSS))
Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
Y6: ((SPANSTART 2) (SPANEND 4) (LEX $WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y7: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS LEX))
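A hedged sketch of treating these analyses as a lattice: each analysis covers a span, and a reading of B$WRH is any sequence of analyses that tiles the word. The spans below are simplified (one per surface segment) so that exactly the three segmentations of slide 91 come out; the real analyses carry full feature bundles as above.

analyses = [
    (0, 4, "B$WRH"),     # single-word reading (noun or lexicalized)
    (0, 1, "B"),         # preposition B-
    (1, 4, "$WR+H"),     # noun $WR plus possessive -H
    (1, 2, "H"),         # determiner H-
    (2, 4, "$WRH"),      # noun $WRH
]

def readings(pos=0, end=4, path=()):
    if pos == end:
        yield path
    for start, stop, seg in analyses:
        if start == pos:
            yield from readings(stop, end, path + (seg,))

print(list(readings()))
# [('B$WRH',), ('B', '$WR+H'), ('B', 'H', '$WRH')]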

Slide 93: Sample Output (dev data, raw system output): "maxwell anurpung comes from ghana for israel four years ago and since worked in cleaning in hotels in eilat a few weeks ago announced if management club hotel that for him to leave israel according to the government instructions and immigration police in a letter in broken english which spread among the foreign workers thanks to them hotel for their hard work and announced that will purchase for hm flight tickets for their countries from their money"

Slide 94: Future Research Directions: Automatic Transfer Rule Learning.
– In the "large-data" scenario: from large volumes of uncontrolled parallel text, automatically word-aligned
– In the absence of morphology or POS-annotated lexica
– Learning mappings for non-compositional structures
– Effective models for rule scoring, both for decoding (using scores at runtime) and for pruning the large collections of learned rules
– Learning unification constraints
Also: an integrated XFER engine and decoder, with improved models for scoring tree-to-tree mappings and integration with the LM and other knowledge sources in the course of the search.

Slide 95: Future Research Directions: automatic rule refinement; morphology learning; feature detection and corpus navigation; prototypes for new languages.

Slide 96: Publications
– Carbonell, J. G., A. Lavie, L. Levin and A. Black. "Language Technologies for Humanitarian Aid". In Technology for Humanitarian Action, K. M. Cahill (ed.), pp. 111-138, Fordham University Press, ISBN 0-8232-2393-0, 2005.
– Font Llitjós, A., R. Aranovich and L. Levin. "Building Machine Translation Systems for Indigenous Languages". Second Conference on the Indigenous Languages of Latin America (CILLA II), 27-29 October 2005, Texas, USA.
– Font Llitjós, A., J. G. Carbonell and A. Lavie. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation". In Proceedings of the 10th Annual Conference of the European Association for Machine Translation (EAMT-2005), Budapest, Hungary, May 2005.
– Lavie, A., S. Wintner, Y. Eytani, E. Peterson and K. Probst. "Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System". In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-2004), Baltimore, MD, October 2004, pp. 1-10.
– Probst, K. and A. Lavie. "A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings between Languages". In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA-2004), Washington, DC, September 2004.

Slide 97: Publications (continued)
– Font Llitjós, A., K. Probst and J. G. Carbonell. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement". In Proceedings of AMTA-2004, Washington, DC, September 2004.
– Monson, C., A. Lavie, J. Carbonell and L. Levin. "Unsupervised Induction of Natural Language Morphology Inflection Classes". In Proceedings of the Workshop on Current Themes in Computational Phonology and Morphology at ACL-2004, Barcelona, Spain, July 2004.
– Monson, C., L. Levin, R. Vega, R. Brown, A. Font Llitjós, A. Lavie, J. Carbonell, E. Cañulef and R. Huisca. "Data Collection and Analysis of Mapudungun Morphology for Spelling Correction". In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC-2004), Lisbon, Portugal, May 2004.
– Font Llitjós, A. and J. G. Carbonell. "The Translation Correction Tool: English-Spanish User Studies". In Proceedings of LREC-2004, Lisbon, Portugal, May 2004.
– Lavie, A., K. Probst, E. Peterson, S. Vogel, L. Levin, A. Font Llitjós and J. Carbonell. "A Trainable Transfer-based Machine Translation Approach for Languages with Limited Resources". In Proceedings of the Workshop of the European Association for Machine Translation (EAMT-2004), Valletta, Malta, April 2004.

Slide 98: Publications (continued)
– Lavie, A., S. Vogel, L. Levin, E. Peterson, K. Probst, A. Font Llitjós, R. Reynolds, J. Carbonell and R. Cohen. "Experiments with a Hindi-to-English Transfer-based MT System under a Miserly Data Scenario". ACM Transactions on Asian Language Information Processing (TALIP), 2(2), June 2003, pp. 143-163.
– Probst, K., L. Levin, E. Peterson, A. Lavie and J. Carbonell. "MT for Minority Languages Using Elicitation-Based Learning of Syntactic Transfer Rules". Machine Translation, 17(4), pp. 245-270, 2002.
– Carbonell, J., K. Probst, E. Peterson, C. Monson, A. Lavie, R. Brown and L. Levin. "Automatic Rule Learning for Resource-Limited MT". In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas (AMTA-2002), Tiburon, CA, October 2002.
– Levin, L., R. Vega, J. Carbonell, R. Brown, A. Lavie, E. Cañulef and C. Huenchullan. "Data Collection and Language Technologies for Mapudungun". In Proceedings of the International Workshop on Resources and Tools in Field Linguistics at LREC-2002, Las Palmas, Canary Islands, Spain, June 2002.
– Probst, K., R. Brown, J. Carbonell, A. Lavie, L. Levin and E. Peterson. "Design and Implementation of Controlled Elicitation for Machine Translation of Low-density Languages". In Proceedings of the MT-2010 Workshop at MT-Summit VIII, Santiago de Compostela, Spain, September 2001.

Slide 99: Mapudungun-to-Spanish Example. Mapudungun: "pelafiñ Maria". Spanish: "No vi a María". English: "I didn't see Maria".

Slide 100: Mapudungun-to-Spanish Example, glossed.
Mapudungun: pe-la-fi-ñ Maria / see-neg-3.obj-1.subj.indicative Maria
Spanish: No vi a María / neg see.1.subj.past.indicative acc Maria
English: I didn't see Maria

Slides 101-110: Analysis of "pe-la-fi-ñ Maria", built bottom-up:
– Slide 101: V pe
– Slide 102: VSuff la (negation = +)
– Slide 103: VSuffG over la; pass all features up
– Slide 104: VSuff fi (object person = 3)
– Slide 105: VSuffG over la and fi; pass all features up from both children
– Slide 106: VSuff ñ (person = 1, number = sg, mood = ind)
– Slide 107: VSuffG over the suffix group and ñ; pass all features up from both children
– Slide 108: V over pe and the suffix group; check that (1) negation = + and (2) tense is undefined
– Slide 109: NP over N Maria (person = 3, number = sg, human = +)
– Slide 110: S over VP [V] and the object NP; check that the NP is human = +; pass features up from V

Slides 111-116: Transfer to Spanish, top-down:
– Slide 111: start from S
– Slide 112: the Spanish VP is built as [V "a" NP]; pass all features to the Spanish side
– Slide 113: pass all features down
– Slide 114: pass object features down
– Slide 115: the accusative marker "a" on the object is introduced because human = +
– Slide 116: the transfer rule responsible (a sketch of what it does at transfer time follows below):
VP::VP [VBar NP] -> [VBar "a" NP]
(
 (X1::Y1) (X2::Y3)
 ((X2 type) = (*NOT* personal))
 ((X2 human) =c +)
 (X0 = X1)
 ((X0 object) = X2)
 (Y0 = X0)
 ((Y0 object) = (X0 object))
 (Y1 = Y0)
 (Y3 = (Y0 object))
 ((Y1 objmarker person) = (Y3 person))
 ((Y1 objmarker number) = (Y3 number))
 ((Y1 objmarker gender) = (Y3 gender))
)
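A minimal sketch of what the VP::VP rule above does at transfer time: if the object NP is human, insert the Spanish accusative marker "a" and copy the object's agreement features onto the verb's objmarker. Feature structures are plain dicts here; real unification and the =c value test are simplified.

def apply_vp_rule(vbar, obj_np):
    if obj_np.get("human") != "+":      # ((X2 human) =c +) fails: rule no-op
        return None
    return {
        "seq": [vbar["form"], "a", obj_np["form"]],   # [VBar "a" NP]
        "objmarker": {k: obj_np[k] for k in ("person", "number", "gender")},
    }

obj = {"form": "María", "human": "+", "person": "3", "number": "sg",
       "gender": "fem"}
print(apply_vp_rule({"form": "no vi"}, obj))
# {'seq': ['no vi', 'a', 'María'], 'objmarker': {...}}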

Slides 117-122: Generation of the Spanish verb and final output:
– Slide 117: pass the person, number, and mood features to the Spanish verb; assign tense = past
– Slide 118: "no" is introduced because negation = +
– Slide 119: the Spanish verb lexeme "ver" is selected
– Slide 120: "vi" is generated (person = 1, number = sg, mood = indicative, tense = past)
– Slide 121: "María" is passed over to the Spanish side
– Slide 122: final result: "No vi a María", "I didn't see Maria"

