AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students and Staff: Erik Peterson, Christian Monson, Ariadna Font Llitjós, Alison Alvarez, Roberto Aranovich, Rodolfo Vega

Mar 1, 2006AVENUE/LETRAS2 Outline: Scientific Objectives; Framework Overview; Learning Morphology; Elicitation; Learning Transfer Rules; Automatic Rule Refinement; Language Prototypes; New Directions

Mar 1, 2006AVENUE/LETRAS3 Why Machine Translation for Languages with Limited Resources? We are in the age of information explosion: the internet, the web and Google mean that anyone can get the information they want, anytime. But what about the text in all those other languages? How do their speakers read all this English material? How do we read all the material that they put online? MT for these languages would enable: better government access to native, indigenous and minority communities; better minority and native community participation in information-rich activities (health care, education, government) without giving up their languages; civilian and military applications (e.g. disaster relief); and language preservation.

Mar 1, 2006AVENUE/LETRAS4 The Roadmap to Learning-based MT: automatic acquisition of the necessary language resources and knowledge using machine-learning methodologies; a framework for integrating the acquired MT resources into effective MT prototype systems; effective integration of the acquired knowledge with statistical/distributional information.

Mar 1, 2006AVENUE/LETRAS5 CMU's AVENUE Approach. Elicitation: use bilingual native informants to produce a small, high-quality, word-aligned bilingual corpus of translated phrases and sentences. Transfer-rule Learning: apply ML-based methods to automatically acquire syntactic transfer rules for translation between the two languages; learn from the major language to the minor language; translate from the minor language to the major language. XFER + Decoder: the XFER engine produces a lattice of possible transferred structures at all levels; the decoder searches and selects the best-scoring combination. Rule Refinement: automatically refine and correct the acquired transfer rules through interaction with bilingual informants, which helps the system identify translation errors. Morphology Learning: unsupervised learning of the morpheme structure of words based on their organization into paradigms and on distributional information.

Mar 1, 2006AVENUE/LETRAS6 AVENUE MT Approach. [Diagram: the MT pyramid, from Direct approaches (SMT, EBMT) through Transfer Rules up to Interlingua; the source side (e.g. Quechua) is analyzed via Syntactic Parsing and Semantic Analysis, the target side (e.g. English) is generated via Sentence Planning and Text Generation. AVENUE: automate rule learning at the transfer level.]

Mar 1, 2006AVENUE/LETRAS7 Avenue Architecture. [Diagram: the Elicitation Tool and Elicitation Corpus produce a Word-Aligned Parallel Corpus; the Rule Learning stage (Learning Module plus Morphology Analyzer and its Learning Module) produces Learned Transfer Rules alongside Lexical Resources and Handcrafted rules; the Run-Time System (Run-Time Transfer System plus Decoder) turns INPUT TEXT into OUTPUT TEXT; the Rule Refinement stage (Translation Correction Tool plus Rule Refinement Module) feeds corrections back into the learned rules.]

Mar 1, 2006AVENUE/LETRAS8 Transfer Rule Formalism. Each rule carries type information, part-of-speech/constituent information, alignments, x-side constraints, y-side constraints, and xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)).
; SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
 (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER))
)

Mar 1, 2006AVENUE/LETRAS9 Transfer Rule Formalism (II). The same rule, distinguishing value constraints (e.g. ((X1 AGR) = *3-SING)) from agreement constraints (e.g. ((Y2 GENDER) = (Y4 GENDER))).
; SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
 (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER))
)

Mar 1, 2006AVENUE/LETRAS10 Transfer and Decoding. [The AVENUE architecture diagram again (see slide 7), with the Run-Time System (Run-Time Transfer System and Decoder) highlighted.]

Mar 1, 2006AVENUE/LETRAS11 The Transfer Engine. Analysis: source text is parsed into its grammatical structure, which determines the order in which transfer rules are applied. Example: ראיתי את האיש הזקן, glossed '(I) saw *acc the man the old', parsed as an S whose VP contains V (ראיתי), P (את) and an NP with the structure D N D Adj (האיש הזקן). Transfer: a target-language tree is created by reordering, insertion, and deletion, here S over NP (N 'I') and VP (V 'saw' plus NP 'the old man' with structure DET Adj N); source words are translated with the transfer lexicon. Generation: target-language constraints are checked, target morphology is applied, and the final translation is produced; e.g. the past-tense form "saw" is selected. Final translation: "I saw the old man".

Mar 1, 2006AVENUE/LETRAS12 Symbolic Decoder. The system rarely finds a full parse/transfer for a complete input sentence. The XFER engine produces a comprehensive lattice of segment translations, and the decoder selects the best combination of translation segments: a search for the optimal-scoring path of partial translations, based on multiple features: target language model scores, XFER rule scores, path fragmentation, and other features. Symbolic decoding is essential in scenarios where there is insufficient data to train a large target LM, so effective rule scoring is crucial.
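The path search over the segment lattice can be pictured with a small dynamic program. The following Python sketch is illustrative only, not the AVENUE decoder: the Edge class, the lm_score stand-in, and the linear combination of weights are assumptions made for the example.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Edge:
    start: int          # first source-word position covered by this segment
    end: int            # one past the last source-word position covered
    target: str         # candidate target-language string for this span
    rule_score: float   # score of the transfer rule(s) that produced it

def lm_score(text: str) -> float:
    # Stand-in for a target language model score (here, shorter output scores higher).
    return -0.1 * len(text.split())

def decode(edges: List[Edge], n_words: int, w_lm: float = 1.0,
           w_rule: float = 1.0, frag_penalty: float = 0.5) -> Optional[str]:
    # best[i] holds (score, translation) of the best path covering source words 0..i-1.
    best = [None] * (n_words + 1)
    best[0] = (0.0, "")
    for i in range(n_words):
        if best[i] is None:
            continue
        score_i, text_i = best[i]
        for e in edges:
            if e.start != i:
                continue
            cand_text = (text_i + " " + e.target).strip()
            # Every segment appended to the path pays one fragmentation penalty.
            cand_score = (score_i + w_rule * e.rule_score
                          + w_lm * lm_score(e.target) - frag_penalty)
            if best[e.end] is None or cand_score > best[e.end][0]:
                best[e.end] = (cand_score, cand_text)
    return best[n_words][1] if best[n_words] else None

# Toy lattice for a three-word input: one full-span edge and two partial edges.
lattice = [Edge(0, 3, "I saw the old man", 1.0),
           Edge(0, 1, "I saw", 0.6),
           Edge(1, 3, "the old man", 0.7)]
print(decode(lattice, 3))   # -> I saw the old man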

Mar 1, 2006AVENUE/LETRAS13 Morphology Learning. [The AVENUE architecture diagram again (see slide 7), with the Morphology Analyzer and its Learning Module highlighted.]

Mar 1, 2006AVENUE/LETRAS14-20 The Challenge of Morphology. Mapudungun (indigenous language of Chile and Argentina, ~1 million speakers). Example word: Allkütulekefun. The word segments into Allkütu 'listen' plus the suffixes -le (progressive), -ke (habitual), -fu (past), and -n (first-person singular indicative), yielding the English translation 'I used to listen'. Tasks for Morphology: segment words; map morphemes onto features.

Mar 1, 2006AVENUE/LETRAS21 The Challenge of Morphology Tasks for Morphology Segment Words Map Morphemes onto Features Learn these tasks –unsupervised –from data –for any language

Mar 1, 2006AVENUE/LETRAS22-30 Our Approach: Leverage the Natural Structure of Morphology. A paradigm is a set of affixes that interchangeably attach to a set of stems. Example vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving. Candidate paradigms proposed from this vocabulary (suffix set, followed by the stems that take every suffix in the set):
Ø.s: blame, solve
Ø.s.d: blame
s: blame, roam, solve
e.es: blam, solv
[Diagram on slide 30: the full lattice of candidate paradigms over this vocabulary, including suffix sets such as e.ed, es, Ø, e.es.ed, ed, d, Ø.d, s.d, es.ed, e, me.mes, me.med, mes, me.mes.med, med, mes.med and me, each paired with its stems.]
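The candidate-paradigm idea above can be made concrete in a few lines of Python. This is a minimal sketch under simplifying assumptions, not the AVENUE morphology learner: it splits every word at every position and, for a hand-picked suffix set, returns the stems that take every suffix in that set; the real system searches the whole lattice of such sets automatically.

from collections import defaultdict

def stem_suffix_table(vocab):
    # Split every word at every position; record which suffixes each candidate stem takes.
    table = defaultdict(set)
    for word in vocab:
        for i in range(1, len(word) + 1):
            table[word[:i]].add(word[i:] or "Ø")   # "Ø" marks the empty suffix
    return table

def scheme(suffix_set, table):
    # A candidate paradigm: all stems that take every suffix in suffix_set.
    return sorted(stem for stem, sufs in table.items() if suffix_set <= sufs)

vocab = ["blame", "blamed", "blames", "roamed", "roaming", "roams",
         "solve", "solves", "solving"]
table = stem_suffix_table(vocab)
for sufs in [{"Ø", "s"}, {"Ø", "s", "d"}, {"s"}, {"e", "es"}]:
    print(".".join(sorted(sufs, key=lambda s: (s != "Ø", s))), "->", scheme(sufs, table))
# Ø.s -> ['blame', 'solve'];  Ø.d.s -> ['blame'];  s -> ['blame', 'roam', 'solve'];  e.es -> ['blam', 'solv']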

Mar 1, 2006AVENUE/LETRAS31-34 Candidate suffix sets induced from a Spanish newswire corpus (40,011 tokens, 6,975 types). Each entry gives a suffix set, the number of stem types it attaches to, and example stems:
a.as.o.os (43): african, cas, jurídic, l, ...
a.as.o.os.tro (1): cas
a.as.os (50): afectad, cas, jurídic, l, ...
a.as.o (59): cas, citad, jurídic, l, ...
a.o.os (105): impuest, indonesi, italian, jurídic, ...
as.o.os (54): cas, implicad, jurídic, l, ...
a.as (199): huelg, incluid, industri, inundad, ...
a.os (134): impedid, impuest, indonesi, inundad, ...
as.os (68): cas, implicad, inundad, jurídic, ...
a.o (214): id, indi, indonesi, inmediat, ...
as.o (85): intern, jurídic, just, l, ...
o.os (268): human, implicad, indici, indocumentad, ...
a.tro (2): cas, cen
a (1237): huelg, ib, id, iglesi, ...
as (404): huelg, huelguist, incluid, industri, ...
os (534): humorístic, human, hígad, impedid, ...
o (1139): hub, hug, human, huyend, ...
tro (16): catas, ce, cen, cua, ...
The level of a set is its number of suffixes (a.as.o.os.tro is at level 5). The set a.as.o.os corresponds to the Spanish adjective paradigm, while sets containing the spurious suffix tro (split off from words such as catastro, centro, cuatro) are noise. The basic search procedure moves through this lattice of candidate sets, trading decreasing stem counts against increasing suffix counts.

Mar 1, 2006AVENUE/LETRAS35 Examples and Evaluation of Automatically Selected Suffix Sets. Selected sets include: Ø.ba.n.ndo; ada.adas.ado.ados.aron.ó; a.aba.ado.ados.ar.ará.arán; ada.ado.ados.ar.o; a.aciones.ación.adas.ado.ar; ado.adores.o; a.ada.adas.ado.ar.ará; ado.ados.arse.e; a.adas.ado.an.ar; ado.ar.aron.arse.ará; a.ado.ados.ar.ó; do.dos.ndo.r.ron; a.ado.an.arse.ó; e.ida.ido; a.ado.aron.arse.ó; emos.ido.ía.ían; aba.ada.ado.ar.o.os; ida.ido.idos.ir.ió; aciones.ación.ado.ados; ido.iendo.ir; aciones.ado.ados.ará; ido.ir.ro; ación.ado.an.e. On the original slide each set is keyed as correct or wrong. Global suffix evaluation: precision 0.506, recall 0.517, F1 0.511.

Mar 1, 2006AVENUE/LETRAS36 Next Steps for Morphology Induction: improve the quality of induced paradigms (current work); convert paradigms into a segmenter (soon); learn mappings from morphemes to features (future goal).

Mar 1, 2006AVENUE/LETRAS37 Elicitation. [The AVENUE architecture diagram again (see slide 7), with the Elicitation stage (Elicitation Tool and Elicitation Corpus) highlighted.]

Mar 1, 2006AVENUE/LETRAS38 Purpose of Elicitation: provide a small but highly targeted corpus of hand-aligned data, to support machine learning from a small data set and to cover all basic morpho-syntactic phenomena. Example records:
newpair
srcsent: Tú caíste
tgtsent: eymi ütrünagimi
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell
newpair
srcsent: Tú estás cayendo
tgtsent: eymi petu ütünagimi
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling
newpair
srcsent: Tú caíste
tgtsent: eymi ütrunagimi
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
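The record format shown above is simple enough to read with a few lines of Python. This sketch is based only on the fields visible in the example (newpair, srcsent, tgtsent, aligned, context, comment); the actual AVENUE corpus format may contain additional fields.

def parse_elicitation(text):
    # Split the file into records at each "newpair" marker; store "key: value" lines as fields.
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "newpair":
            current = {}
            records.append(current)
        elif ":" in line and current is not None:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return records

sample = """newpair
srcsent: Tú caíste
tgtsent: eymi ütrünagimi
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell"""
print(parse_elicitation(sample))
# [{'srcsent': 'Tú caíste', 'tgtsent': 'eymi ütrünagimi', 'aligned': '((1,1),(2,2))', ...}]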

Mar 1, 2006AVENUE/LETRAS39 Purpose of Elicitation To get data from someone who is –Bilingual –Literate –Not experienced with linguistics

Mar 1, 2006AVENUE/LETRAS40 English-Hindi Example

Mar 1, 2006AVENUE/LETRAS41 English-Chinese Example

Mar 1, 2006AVENUE/LETRAS42 English-Arabic Example

Mar 1, 2006AVENUE/LETRAS43 The Elicitation Tool has been used with these languages Mapudungun Hindi Hebrew Quechua Aymara Thai Japanese Chinese Dutch Arabic

Mar 1, 2006AVENUE/LETRAS44 Elicitation Corpus: a list of minimal pairs of sentences in a major language. Eliciting from Spanish: Canto; Canté; Estoy cantando; Cantaste. Eliciting from English: I sing; I sang; I am singing; You sang.

Mar 1, 2006AVENUE/LETRAS45 AVENUE Elicitation Corpora The Functional-Typological Corpus –Designed to elicit elements of meaning that may have morpho-syntactic realization The Structural Elicitation Corpus –Based on sentence structures from the Penn TreeBank

Mar 1, 2006 The Process. [Diagram: the Feature Specification (a list of semantic features and values, organized into clause-level, noun-phrase, tense-and-aspect, and modality features) plus Feature Maps (which combinations of features and values are of interest) yield Feature Structure Sets; reverse annotation adds English sentences to the feature structure sets, producing the Corpus; sampling produces a Smaller Corpus.]

Mar 1, 2006AVENUE/LETRAS47 Feature Structures.
srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker.
((actor ((np-function fn-actor) (np-animacy anim-human) (np-biological-gender bio-gender-female) (np-general-type proper-noun-type) (np-identifiability identifiable) (np-specificity specific) ...))
 (pred ((np-function fn-predicate-nominal) (np-animacy anim-human) (np-biological-gender bio-gender-female) (np-general-type common-noun-type) (np-specificity specificity-neutral) ...))
 (c-v-lexical-aspect state) (c-copula-type copula-role) (c-secondary-type secondary-copula) (c-solidarity solidarity-neutral) (c-v-grammatical-aspect gram-aspect-neutral) (c-v-absolute-tense past) (c-v-phase-aspect phase-aspect-neutral) (c-general-type declarative-clause) (c-polarity polarity-negative) (c-my-causer-intentionality intentionality-n/a) (c-comparison-type comparison-n/a) (c-relative-tense relative-n/a) (c-our-boundary boundary-n/a) ...)

Mar 1, 2006AVENUE/LETRAS48 Feature Specification: defines features and their values; sets default values for features; specifies feature requirements and restrictions; written in XML.

Mar 1, 2006AVENUE/LETRAS49 Feature Specification example. Feature: c-copula-type (a copula is a verb like "be"; some languages do not have copulas). Values:
copula-n/a. Restrictions: 1. ~(c-secondary-type secondary-copula).
copula-role. Restrictions: 1. (c-secondary-type secondary-copula). Notes: a role is something like a job or a function: "He is a teacher", "This is a vegetable peeler".
copula-identity. Restrictions: 1. (c-secondary-type secondary-copula). Notes: "Clark Kent is Superman", "Sam is the teacher".
copula-location. Restrictions: 1. (c-secondary-type secondary-copula). Notes: "The book is on the table"; there is a long list of locative relations later in the feature specification.
copula-description. Restrictions: 1. (c-secondary-type secondary-copula). Notes: a description is an attribute: "The children are happy", "The books are long."

Mar 1, 2006AVENUE/LETRAS50 Feature Maps Some features interact in the grammar –English –s reflects person and number of the subject and tense of the verb. –In expressing the English present progressive tense, the auxiliary verb is in a different place in a question and a statement: He is running. Is he running? We need to check many, but not all combinations of features and values. Using unlimited feature combinations leads to an unmanageable number of sentences
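The combinatorial pressure described above is easy to quantify. The feature names and value counts in this Python snippet are invented for illustration; the point is only the size of the full cross-product compared with a restricted feature map.

from math import prod

# Hypothetical features with their number of values.
features = {"tense": 3, "aspect": 4, "polarity": 2, "person": 3,
            "number": 2, "definiteness": 2, "evidentiality": 5}

full_cross_product = prod(features.values())
print(full_cross_product)            # 1440 sentences if every combination were elicited

# A feature map might instead cross evidentiality only with tense and aspect:
print(features["evidentiality"] * features["tense"] * features["aspect"])   # 60 sentences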

Mar 1, 2006AVENUE/LETRAS51

Mar 1, 2006AVENUE/LETRAS52 Evidentiality Map. The map restricts which feature values are crossed. Lexical aspect: activity-accomplishment. Assertiveness: assertiveness-asserted, assertiveness-neutral. Polarity: polarity-positive, polarity-negative. Source: hearsay, quotative, inferred, assumption; visual, auditory, non-visual-or-auditory. Tense: past, present, future. Grammatical aspect: perfective, progressive, habitual, neutral. Different evidential sources are crossed with different subsets of the tense and grammatical-aspect values.

Mar 1, 2006AVENUE/LETRAS53 Current Work Navigation –Start: large search space of all possible feature combinations –Finish: each feature has been eliminated as irrelevant or has been explored –Goal: dynamically find the most efficient path through the search space for each language.

Mar 1, 2006AVENUE/LETRAS54 Current Work Feature Detection –Which features have an effect on morphosyntax? –What is the effect? –Drives the Navigation process

Mar 1, 2006AVENUE/LETRAS55 Feature Detection: Spanish.
The girl saw a red book. ((1,1)(2,2)(3,3)(4,4)(5,6)(6,5)) La niña vió un libro rojo
A girl saw a red book. ((1,1)(2,2)(3,3)(4,4)(5,6)(6,5)) Una niña vió un libro rojo
I saw the red book. ((1,1)(2,2)(3,3)(4,5)(5,4)) Yo vi el libro rojo
I saw a red book. ((1,1)(2,2)(3,3)(4,5)(5,4)) Yo vi un libro rojo
Feature: definiteness. Values: definite, indefinite. Function-of-*: subj, obj. Marked-on-head-of-*: no. Marked-on-dependent: yes. Marked-on-governor: no. Marked-on-other: no. Add/delete-word: no. Change-in-alignment: no.

Mar 1, 2006AVENUE/LETRAS56 Feature Detection: Chinese.
A girl saw a red book. ((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8)) 有 一个 女人 看见 了 一本 红色 的 书 。
The girl saw a red book. ((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7)) 女人 看见 了 一本 红色的 书
Feature: definiteness. Values: definite, indefinite. Function-of-*: subject. Marked-on-head-of-*: no. Marked-on-dependent: no. Marked-on-governor: no. Add/delete-word: yes. Change-in-alignment: no.

Mar 1, 2006AVENUE/LETRAS57 Feature Detection: Chinese.
I saw the red book. ((1,3)(2,4)(2,5)(4,1)(5,2)) 红色的 书, 我 看见 了
I saw a red book. ((1,1)(2,2)(2,3)(2,4)(4,5)(5,6)) 我 看见 了 一本 红色的 书 。
Feature: definiteness. Values: definite, indefinite. Function-of-*: object. Marked-on-head-of-*: no. Marked-on-dependent: no. Marked-on-governor: no. Add/delete-word: yes. Change-in-alignment: yes.

Mar 1, 2006AVENUE/LETRAS58 Feature Detection: Hebrew.
A girl saw a red book. ((2,1)(3,2)(5,4)(6,3)) ילדה ראתה ספר אדום
The girl saw a red book. ((1,1)(2,1)(3,2)(5,4)(6,3)) הילדה ראתה ספר אדום
I saw a red book. ((2,1)(4,3)(5,2)) ראיתי ספר אדום
I saw the red book. ((2,1)(3,3)(3,4)(4,4)(5,3)) ראיתי את הספר האדום
Feature: definiteness. Values: definite, indefinite. Function-of-*: subj, obj. Marked-on-head-of-*: yes. Marked-on-dependent: yes. Marked-on-governor: no. Add/delete-word: no. Change-in-alignment: no.
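The comparison illustrated in these examples can be sketched in Python. This is not the AVENUE detector; the field names and the three coarse tests are illustrative assumptions, meant only to show how two elicited translations of a minimal pair can be compared automatically.

def compare_minimal_pair(pair_a, pair_b):
    # pair_a and pair_b are elicited translations of a minimal pair differing in one feature value.
    words_a, words_b = pair_a["tgt"].split(), pair_b["tgt"].split()
    return {
        "has_effect": words_a != words_b,                 # any surface difference at all
        "add_delete_word": len(words_a) != len(words_b),  # e.g. Chinese adds a word for an indefinite subject
        "change_in_alignment": pair_a["aligned"] != pair_b["aligned"],
    }

spanish = compare_minimal_pair(
    {"tgt": "La niña vió un libro rojo", "aligned": "((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))"},
    {"tgt": "Una niña vió un libro rojo", "aligned": "((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))"})
print(spanish)   # definiteness has an effect, no word added or deleted, alignment unchanged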

Mar 1, 2006AVENUE/LETRAS59 Feature Detection Feeds into… Corpus Navigation: which minimal pairs to pursue next. –Don’t pursue gender in Mapudungun –Do pursue definiteness in Hebrew Morphology Learning: –Morphological learner identifies the forms of the morphemes –Feature detection identifies the functions Rule learning: –Rule learner will have to learn a constraint for each morpho- syntactic marker that is discovered E.g., Adjectives and nouns agree in gender, number, and definiteness in Hebrew.

Mar 1, 2006AVENUE/LETRAS60 Rule Learning. [The AVENUE architecture diagram again (see slide 7), with the Rule Learning stage (Learning Module) highlighted.]

Mar 1, 2006AVENUE/LETRAS61 Rule Learning - Overview. Goal: acquire syntactic transfer rules, using available knowledge from the major-language side (grammatical structure). Three steps: 1. Flat Seed Generation: first guesses at transfer rules, with flat syntactic structure. 2. Compositionality Learning: use previously learned rules to learn hierarchical structure. 3. Constraint Learning: refine rules by learning appropriate feature constraints.

Mar 1, 2006AVENUE/LETRAS62 Flat Seed Rule Generation. Learning example (NP): Eng: the big apple; Heb: ha-tapuax ha-gadol. Generated seed rule:
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

Mar 1, 2006AVENUE/LETRAS63 Flat Seed Rule Generation Create a “flat” transfer rule specific to the sentence pair, partially abstracted to POS –Words that are aligned word-to-word and have the same POS in both languages are generalized to their POS –Words that have complex alignments (or not the same POS) remain lexicalized One seed rule for each translation example No feature constraints associated with seed rules (but mark the example(s) from which it was learned)
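A minimal Python sketch of the seed-generation step described above, under the simplifying assumption that POS tags for both sides are already available; the function and variable names are illustrative, not the AVENUE implementation.

def flat_seed_rule(src_words, tgt_words, src_pos, tgt_pos, alignment, label="NP"):
    # Words whose aligned counterpart carries the same POS are abstracted to that POS;
    # everything else stays lexicalized. Alignment positions are 1-based, as in AVENUE rules.
    src_syms, tgt_syms = list(src_words), list(tgt_words)
    constraints = []
    for i, j in alignment:
        if src_pos[i - 1] == tgt_pos[j - 1]:
            src_syms[i - 1] = src_pos[i - 1]
            tgt_syms[j - 1] = tgt_pos[j - 1]
        constraints.append(f"(X{i}::Y{j})")
    return (f"{label}::{label} [{' '.join(src_syms)}] -> [{' '.join(tgt_syms)}]\n"
            f"({' '.join(constraints)})")

# The slide's example: English "the big apple" vs. Hebrew "ha-tapuax ha-gadol".
print(flat_seed_rule(["the", "big", "apple"], ["ha", "tapuax", "ha", "gadol"],
                     ["ART", "ADJ", "N"], ["ART", "N", "ART", "ADJ"],
                     [(1, 1), (1, 3), (2, 4), (3, 2)]))
# NP::NP [ART ADJ N] -> [ART N ART ADJ]
# ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))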

Mar 1, 2006AVENUE/LETRAS64 Compositionality Learning. Initial flat rules:
S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
Generated compositional rule:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))

Mar 1, 2006AVENUE/LETRAS65 Compositionality Learning Detection: traverse the c-structure of the English sentence, add compositional structure for translatable chunks Generalization: adjust constituent sequences and alignments Two implemented variants: –Safe Compositionality: there exists a transfer rule that correctly translates the sub-constituent –Maximal Compositionality: Generalize the rule if supported by the alignments, even in the absence of an existing transfer rule for the sub-constituent
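One way to picture the generalization step is the following Python sketch; it handles a single sub-span and uses a hand-supplied constituent label, whereas the real learner traverses the English c-structure, so treat it only as an illustration of the alignment check.

def generalize_span(src, tgt, alignments, span, label="NP"):
    # Replace source positions span=(lo, hi) (1-based, inclusive) and the target positions
    # they align to with a single constituent label, provided the target side is contiguous.
    lo, hi = span
    tgt_positions = sorted(j for i, j in alignments if lo <= i <= hi)
    t_lo, t_hi = tgt_positions[0], tgt_positions[-1]
    if tgt_positions != list(range(t_lo, t_hi + 1)):
        return None                          # alignments do not support the generalization
    new_src = src[:lo - 1] + [label] + src[hi:]
    new_tgt = tgt[:t_lo - 1] + [label] + tgt[t_hi:]
    return new_src, new_tgt

src = ["ART", "ADJ", "N", "V", "ART", "N"]
tgt = ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"]
alignments = [(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)]
print(generalize_span(src, tgt, alignments, (1, 3)))
# (['NP', 'V', 'ART', 'N'], ['NP', 'V', 'P', 'ART', 'N'])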

Mar 1, 2006AVENUE/LETRAS66 Constraint Learning. Input: rules and their example sets:
S::S [NP V NP] -> [NP V P NP] {ex1,ex12,ex17,ex26}
((X1::Y1) (X2::Y2) (X3::Y4))
NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
((X1::Y1) (X2::Y2))
Output: rules with feature constraints:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
 ((X1 NUM) = (X2 NUM))
 ((Y1 NUM) = (Y2 NUM))
 ((X1 NUM) = (Y1 NUM)))

Mar 1, 2006AVENUE/LETRAS67 Constraint Learning Goal: add appropriate feature constraints to the acquired rules Methodology: –Preserve general structural transfer –Learn specific feature constraints from example set Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments) Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary The seed rules in a group form the specific boundary of a version space The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
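The agreement-constraint part of this idea can be sketched as follows in Python; the feature dictionaries are hand-made stand-ins for elicited feature values, and the real learner works over full version spaces rather than this simple pairwise check.

def propose_agreement_constraints(examples, i, j):
    # examples: list of dicts mapping constituent index -> {feature: value} for one seed example.
    shared = set(examples[0][i]) & set(examples[0][j])
    constraints = []
    for feat in sorted(shared):
        if all(ex[i].get(feat) == ex[j].get(feat) for ex in examples):
            constraints.append(f"((X{i} {feat.upper()}) = (X{j} {feat.upper()}))")
    return constraints

# Two toy seed examples for S::S [NP V NP]: the subject NP (1) and the verb (2) agree in number.
examples = [
    {1: {"num": "sg", "def": "+"}, 2: {"num": "sg", "tense": "past"}, 3: {"num": "pl"}},
    {1: {"num": "pl", "def": "-"}, 2: {"num": "pl", "tense": "past"}, 3: {"num": "sg"}},
]
print(propose_agreement_constraints(examples, 1, 2))   # ['((X1 NUM) = (X2 NUM))']
print(propose_agreement_constraints(examples, 1, 3))   # [] (no consistent agreement with the object)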

Mar 1, 2006AVENUE/LETRAS68 Rule Refinement. [The AVENUE architecture diagram again (see slide 7), with the Rule Refinement stage (Translation Correction Tool and Rule Refinement Module) highlighted.]

Mar 1, 2006AVENUE/LETRAS69 Interactive and Automatic Refinement of Translation Rules. Problem: improve machine translation quality. Proposed solution: put bilingual speakers back into the loop; use their corrections to detect the source of the error and automatically improve the lexicon and the grammar. Approach: automate post-editing efforts by feeding them back into the MT system, i.e. automatic refinement of the translation rules that caused an error, going beyond post-editing. Goal: improve MT coverage and overall quality.

Mar 1, 2006AVENUE/LETRAS70 Technical Challenges: elicit minimal MT information from non-expert users; automatically refine and expand translation rules (manually written or automatically learned) with minimal changes; automatic evaluation of the refinement process.

71 Error Typology for Automatic Rule Refinement (simplified): missing word; extra word; wrong word order (local vs. long distance, word vs. phrase); incorrect word (word change: sense, form, selectional restrictions, idiom); wrong agreement; missing constraint; extra constraint.

Mar 1, 2006AVENUE/LETRAS72 TCTool (Demo). Actions available for interactive elicitation of error information: add a word, delete a word, modify a word, change word order. Accuracy of the elicited error information: error detection, 90% precision and 89% recall; error classification, 72% precision and 71% recall.

Mar 1, 2006AVENUE/LETRAS73 Types of Refinement Operations (Automatic Rule Adaptation). 1. Refine a translation rule: R0 -> R1 (change R0 to make it more specific or more general). Example: R0 maps English NP [DET ADJ N] to Spanish NP [DET N ADJ] with no agreement constraint, so "a nice house" comes out as "una casa bonito"; R1 adds the constraint N gender = ADJ gender, giving "una casa bonita".

Mar 1, 2006AVENUE/LETRAS74 Types of Refinement Operations (Automatic Rule Adaptation). 2. Bifurcate a translation rule: R0 -> R0 (the same, general rule) + R1 (a new, more specific rule). Example: R0 maps NP [DET ADJ N] to NP [DET N ADJ], as in "a nice house" -> "una casa bonita"; R1 maps NP [DET ADJ N] to NP [DET ADJ N], keeping the adjective pre-nominal (ADJ type: pre-nominal), as in "a great artist" -> "un gran artista".

AVENUE/LETRAS75 A Concrete Example (Error Information Elicitation, Refinement Operation Typology; Automatic Rule Adaptation). Change word order. SL: Gaudí was a great artist. MT system output (TL): *Gaudí era un artista grande. User correction: Gaudí era un gran artista. The correction provides the error word, the corrected word, and a clue word.

76 Blame Assignment (from the MT system output; Automatic Rule Adaptation). Finding the triggering feature(s): Δ(error word, corrected word) = ∅, so a new binary feature must be postulated: feat1. Tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>. Grammar rules involved: S,1 ... NP,1 ... NP,8 ... Lexical entries:
ADJ::ADJ |: [great] -> [grande]
((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc))
ADJ::ADJ |: [great] -> [gran]
((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc))
The two entries carry identical feature constraints, so no existing feature distinguishes grande from gran.
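The delta computation above can be sketched in Python; the dictionaries stand in for the feature constraints of the two lexical entries, and feat1 is the slide's new binary feature, not a name drawn from the real grammar.

def feature_delta(error_entry, corrected_entry):
    # Features on which the two entries disagree (including features present in only one of them).
    keys = set(error_entry) | set(corrected_entry)
    return {f: (error_entry.get(f), corrected_entry.get(f))
            for f in keys if error_entry.get(f) != corrected_entry.get(f)}

grande = {"agr num": "sg", "agr gen": "masc"}      # ADJ great -> grande
gran   = {"agr num": "sg", "agr gen": "masc"}      # ADJ great -> gran
if not feature_delta(grande, gran):                # delta is empty: no existing feature separates them
    grande["feat1"], gran["feat1"] = "-", "+"      # postulate a new binary feature
print(grande, gran)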

Mar 1, 2006AVENUE/LETRAS77 Refining Rules (Automatic Rule Adaptation). Bifurcate NP,8 -> NP,8 (R0) + NP,8' (R1) (flip order of ADJ-N):
{NP,8'}
NP::NP : [DET ADJ N] -> [DET ADJ N]
( (X1::Y1) (X2::Y2) (X3::Y3)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y3 agr)) ; det-noun agreement
 ((y2 agr) = (y3 agr)) ; adj-noun agreement
 (y2 = x3)
 ((y2 feat1) =c + ))

Mar 1, 2006AVENUE/LETRAS78 Refining Lexical Entries (Automatic Rule Adaptation).
ADJ::ADJ |: [great] -> [grande]
((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = -))
ADJ::ADJ |: [great] -> [gran]
((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = +))

Mar 1, 2006AVENUE/LETRAS79 Evaluating Improvement (Automatic Rule Adaptation). Given the initial and final translation lattices, the Rule Refinement module needs to check whether the corrected translation sentence and the original translation sentence (labelled as incorrect by the user) are present. Initial lattice for the example: un artista gran, un gran artista, un grande artista, *un artista grande.

Mar 1, 2006AVENUE/LETRAS80 Evaluating Improvement (Automatic Rule Adaptation), continued. After refinement the lattice is: *un artista gran, un gran artista, *un grande artista, *un artista grande; only the corrected translation, un gran artista, remains unmarked.

Mar 1, 2006AVENUE/LETRAS81 Challenges and Future Work: credit and blame assignment from the TCTool log files and the XFER engine's trace; the order of corrections matters, so explore rule interactions; explore the space between batch mode and a fully interactive system; keep an online TCTool always running to collect corrections from bilingual speakers, possibly making it into a game with rewards for the best users.

Mar 1, 2006AVENUE/LETRAS82 AVENUE Prototypes General XFER framework under development for past three years Prototype systems so far: –German-to-English, Dutch-to-English –Chinese-to-English –Hindi-to-English –Hebrew-to-English In progress or planned: –Mapudungun-to-Spanish –Quechua-to-Spanish –Native Alaskan languages (Inupiaq) to English –Native-Bolivian languages (Aymara) to Spanish –Native-Brazilian languages to Brazilian Portuguese

Mar 1, 2006AVENUE/LETRAS83 Mapudungun Indigenous Language of Chile and Argentina ~ 1 Million Mapuche Speakers

Mar 1, 2006AVENUE/LETRAS84 Collaboration Mapuche Language Experts –Universidad de la Frontera (UFRO) Instituto de Estudios Indígenas (IEI) –Institute for Indigenous Studies Chilean Funding –Chilean Ministry of Education (Mineduc) Bilingual and Multicultural Education Program Eliseo Cañulef Rosendo Huisca Hugo Carrasco Hector Painequeo Flor Caniupil Luis Caniupil Huaiquiñir Marcela Collio Calfunao Cristian Carrillan Anton Salvador Cañulef Carolina Huenchullan Arrúe Claudio Millacura Salas

Mar 1, 2006AVENUE/LETRAS85 Accomplishments Corpora Collection –Spoken Corpus Collected: Luis Caniupil Huaiquiñir Medical Domain 3 of 4 Mapudungun Dialects –120 hours of Nguluche –30 hours of Lafkenche –20 hours of Pwenche Transcribed in Mapudungun Translated into Spanish –Written Corpus ~ 200,000 words Bilingual Mapudungun – Spanish Historical and newspaper text nmlch-nmjm1_x_0405_nmjm_00: M: no pütokovilu kay ko C: no, si me lo tomaba con agua M: chumgechi pütokoki femuechi pütokon pu C: como se debe tomar, me lo tomé pués nmlch-nmjm1_x_0406_nmlch_00: M: Chengewerkelafuymiürke C: Ya no estabas como gente entonces!

Mar 1, 2006AVENUE/LETRAS86 Accomplishments Developed At UFRO –Bilingual Dictionary with Examples 1,926 entries –Spelling Corrected Mapudungun Word List 117,003 fully-inflected word forms –Segmented Word List 15,120 forms Stems translated into Spanish

Mar 1, 2006AVENUE/LETRAS87 Accomplishments Developed at LTI using Mapudungun language resources from UFRO –Spelling Checker Integrated into OpenOffice –Hand-built Morphological Analyzer –Prototype Machine Translation Systems Rule-Based Example-Based –Website: LenguasAmerindias.org

Mar 1, 2006AVENUE/LETRAS88 Quechua-to-Spanish MT. V-Unit: funded summer project in Cusco (Peru), June-August 2005 (preparations and data collection started earlier). Intensive Quechua course at the Centro Bartolome de las Casas (CBC). Worked together with two native and one non-native Quechua speakers on developing infrastructure (correcting elicited translations, segmenting and translating the list of most frequent words).

Mar 1, 2006AVENUE/LETRAS89 Quechua-to-Spanish Prototype MT System. Stem lexicon (semi-automatically generated): 753 lexical entries. Suffix lexicon: 21 suffixes (of the 150 described by Cusihuaman). Quechua morphology analyzer. 25 translation rules. Spanish morphology generation module. User studies: 10 sentences, 3 users (2 native, 1 non-native).

Mar 1, 2006AVENUE/LETRAS90 Challenges for Hebrew MT Paucity in existing language resources for Hebrew –No publicly available broad coverage morphological analyzer –No publicly available bilingual lexicons or dictionaries –No POS-tagged corpus or parse tree-bank corpus for Hebrew –No large Hebrew/English parallel corpus Scenario well suited for CMU transfer-based MT framework for languages with limited resources

Mar 1, 2006AVENUE/LETRAS91 Hebrew Morphology Example. Input word: B$WRH. Possible segmentations: B$WRH as a single unit; B + $WR + H; B + H + $WRH.

Mar 1, 2006AVENUE/LETRAS92 Hebrew Morphology Example (analyses produced for B$WRH):
Y0: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
Y2: ((SPANSTART 1) (SPANEND 3) (LEX $WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
Y3: ((SPANSTART 3) (SPANEND 4) (LEX $LH) (POS POSS))
Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
Y6: ((SPANSTART 2) (SPANEND 4) (LEX $WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y7: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS LEX))
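Taken together, the analyses form a lattice over character spans of the input word; the short Python sketch below enumerates the complete segmentations. The span values follow the reconstructed analyses above, and the dictionary representation is illustrative, not the XFER engine's input format.

def full_segmentations(analyses, start, end):
    # Enumerate sequences of analyses whose spans tile the input from start to end.
    if start == end:
        return [[]]
    paths = []
    for a in analyses:
        if a["SPANSTART"] == start:
            for rest in full_segmentations(analyses, a["SPANEND"], end):
                paths.append([a["LEX"]] + rest)
    return paths

analyses = [
    {"LEX": "B$WRH", "SPANSTART": 0, "SPANEND": 4},   # the whole word as one unit
    {"LEX": "B",     "SPANSTART": 0, "SPANEND": 1},   # preposition
    {"LEX": "H",     "SPANSTART": 1, "SPANEND": 2},   # definite article
    {"LEX": "$WRH",  "SPANSTART": 2, "SPANEND": 4},   # noun
]
print(full_segmentations(analyses, 0, 4))   # [['B$WRH'], ['B', 'H', '$WRH']]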

Mar 1, 2006AVENUE/LETRAS93 Sample Output (dev-data) maxwell anurpung comes from ghana for israel four years ago and since worked in cleaning in hotels in eilat a few weeks ago announced if management club hotel that for him to leave israel according to the government instructions and immigration police in a letter in broken english which spread among the foreign workers thanks to them hotel for their hard work and announced that will purchase for hm flight tickets for their countries from their money

Mar 1, 2006AVENUE/LETRAS94 Future Research Directions Automatic Transfer Rule Learning: –In the “large-data” scenario: from large volumes of uncontrolled parallel text automatically word-aligned –In the absence of morphology or POS annotated lexica –Learning mappings for non-compositional structures –Effective models for rule scoring for Decoding: using scores at runtime Pruning the large collections of learned rules –Learning Unification Constraints Integrated Xfer Engine and Decoder –Improved models for scoring tree-to-tree mappings, integration with LM and other knowledge sources in the course of the search

Mar 1, 2006AVENUE/LETRAS95 Future Research Directions Automatic Rule Refinement Morphology Learning Feature Detection and Corpus Navigation Prototypes for New Languages

Mar 1, 2006AVENUE/LETRAS96 Publications
Carbonell, J. G., A. Lavie, L. Levin and A. Black. "Language Technologies for Humanitarian Aid". In Technology for Humanitarian Action, K. M. Cahill (ed.), Fordham University Press, 2005.
Font Llitjós, A., R. Aranovich and L. Levin. "Building Machine Translation Systems for Indigenous Languages". Second Conference on the Indigenous Languages of Latin America (CILLA II), Texas, USA, October 2005.
Font Llitjós, A., J. G. Carbonell and A. Lavie. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation". In Proceedings of the 10th Annual Conference of the European Association for Machine Translation (EAMT-2005), Budapest, Hungary, May 2005.
Lavie, A., S. Wintner, Y. Eytani, E. Peterson and K. Probst. "Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System". In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-2004), Baltimore, MD, October 2004, pages 1-10.
Probst, K. and A. Lavie. "A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings between Languages". In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA-2004), Washington, DC, September 2004.

Mar 1, 2006AVENUE/LETRAS97 Publications
Font Llitjós, A., K. Probst and J. G. Carbonell. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement". In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA-2004), Washington, DC, September 2004.
Monson, C., A. Lavie, J. Carbonell and L. Levin. "Unsupervised Induction of Natural Language Morphology Inflection Classes". In Proceedings of the Workshop on Current Themes in Computational Phonology and Morphology at the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-2004), Barcelona, Spain, July 2004.
Monson, C., L. Levin, R. Vega, R. Brown, A. Font Llitjós, A. Lavie, J. Carbonell, E. Cañulef and R. Huisca. "Data Collection and Analysis of Mapudungun Morphology for Spelling Correction". In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC-2004), Lisbon, Portugal, May 2004.
Font Llitjós, A. and J. G. Carbonell. "The Translation Correction Tool: English-Spanish User Studies". In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC-2004), Lisbon, Portugal, May 2004.
Lavie, A., K. Probst, E. Peterson, S. Vogel, L. Levin, A. Font Llitjós and J. Carbonell. "A Trainable Transfer-based Machine Translation Approach for Languages with Limited Resources". In Proceedings of the Workshop of the European Association for Machine Translation (EAMT-2004), Valletta, Malta, April 2004.

Mar 1, 2006AVENUE/LETRAS98 Publications
Lavie, A., S. Vogel, L. Levin, E. Peterson, K. Probst, A. Font Llitjós, R. Reynolds, J. Carbonell and R. Cohen. "Experiments with a Hindi-to-English Transfer-based MT System under a Miserly Data Scenario". ACM Transactions on Asian Language Information Processing (TALIP), 2(2), June 2003.
Probst, K., L. Levin, E. Peterson, A. Lavie and J. Carbonell. "MT for Minority Languages Using Elicitation-Based Learning of Syntactic Transfer Rules". Machine Translation, 17(4), 2002.
Carbonell, J., K. Probst, E. Peterson, C. Monson, A. Lavie, R. Brown and L. Levin. "Automatic Rule Learning for Resource-Limited MT". In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas (AMTA-2002), Tiburon, CA, October 2002.
Levin, L., R. Vega, J. Carbonell, R. Brown, A. Lavie, E. Cañulef and C. Huenchullan. "Data Collection and Language Technologies for Mapudungun". In Proceedings of the International Workshop on Resources and Tools in Field Linguistics at the Third International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas, Canary Islands, Spain, June 2002.
Probst, K., R. Brown, J. Carbonell, A. Lavie, L. Levin and E. Peterson. "Design and Implementation of Controlled Elicitation for Machine Translation of Low-density Languages". In Proceedings of the MT-2010 Workshop at MT-Summit VIII, Santiago de Compostela, Spain, September 2001.

Mar 1, 2006AVENUE/LETRAS99 Mapudungun-to-Spanish Example. Mapudungun: pelafiñ Maria. Spanish: No vi a María. English: I didn't see Maria.

Mar 1, 2006AVENUE/LETRAS100 Mapudungun-to-Spanish Example. Mapudungun: pelafiñ Maria, segmented pe-la-fi-ñ Maria, glossed see-neg-3.obj-1.subj.indicative Maria. Spanish: No vi a María, glossed neg see.1.subj.past.indicative acc Maria. English: I didn't see Maria.
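Before the tree-by-tree walkthrough on the following slides, the morpheme-to-feature mapping itself can be sketched in a few lines of Python; the feature names mirror the glosses above, the simple dictionary update stands in for real feature unification, and no clash checking is done.

suffix_features = {
    "la": {"negation": "+"},
    "fi": {"object person": 3},
    "ñ":  {"person": 1, "number": "sg", "mood": "ind"},
}

def analyze(stem, suffixes):
    # Collect the features contributed by each suffix ("pass all features up").
    fs = {"stem": stem}
    for suf in suffixes:
        fs.update(suffix_features[suf])
    return fs

print(analyze("pe", ["la", "fi", "ñ"]))
# {'stem': 'pe', 'negation': '+', 'object person': 3, 'person': 1, 'number': 'sg', 'mood': 'ind'}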

Mar 1, 2006AVENUE/LETRAS101-110 Source-side analysis of pe-la-fi-ñ Maria (features passed up the tree, bottom-up):
1. V pe: the verb stem.
2. VSuff la: contributes negation = +.
3. VSuffG over la: pass all features up.
4. VSuff fi: contributes object person = 3.
5. VSuffG over la and fi: pass all features up from both children.
6. VSuff ñ: contributes person = 1, number = sg, mood = ind.
7. VSuffG over the previous suffix group and ñ: pass all features up from both children.
8. V over pe and the suffix group: pass all features up from both children; check that negation = + and that tense is undefined.
9. NP over N Maria: person = 3, number = sg, human = +.
10. S over the VP (V plus NP): pass features up from V; check that the NP has human = +.

Mar 1, 2006AVENUE/LETRAS111-122 Transfer to Spanish (top-down):
1. S -> S: pass all features to the Spanish side.
2. VP -> [V "a" NP]: pass all features down, then pass the object features down; the accusative marker "a" on the object is introduced because human = +.
3. The VP transfer rule:
VP::VP [VBar NP] -> [VBar "a" NP]
((X1::Y1) (X2::Y3)
 ((X2 type) = (*NOT* personal))
 ((X2 human) =c +)
 (X0 = X1)
 ((X0 object) = X2)
 (Y0 = X0)
 ((Y0 object) = (X0 object))
 (Y1 = Y0)
 (Y3 = (Y0 object))
 ((Y1 objmarker person) = (Y3 person))
 ((Y1 objmarker number) = (Y3 number))
 ((Y1 objmarker gender) = (Y3 gender)))
4. V: pass the person, number, and mood features to the Spanish verb and assign tense = past; "no" is introduced because negation = +.
5. The Spanish verb ver is generated as vi (person = 1, number = sg, mood = indicative, tense = past).
6. N Maria -> N María: pass the features over to the Spanish side.
Result: No vi a María, "I didn't see Maria".
