FROM BITS TO BOTS: Women Everywhere, Leading the Way Lenore Blum, Anastassia Ailamaki, Manuela Veloso, Sonya Allin, Bernardine Dias, Ariadna Font Llitjós School of Computer Science Carnegie Mellon University
AVENUE Automatic Machine Translation for low-density languages Ariadna Font Llitjós Language Technologies Institute
Automatic Machine Translation: Interlingua, Transfer rules, Corpus-based methods [diagram labels: analysis, interpretation, generation]
Low-density languages: not endangered languages, but languages with little or no presence on the web and few or no linguistic resources. AVENUE is currently working with: –Mapudungun [Chile] –Inupiaq [Alaska] –Aymara, Quechua and Aguaruna [Peru] –Siona [Colombia]
Mapudungun for the Mapuche. Chile: –Official language: Spanish –Population: ~15 million –~1/2 million Mapuche people –Language: Mapudungun
The language: Mapudungun –Oral tradition (170 hours of recorded speech in the medical domain) –Just a few written texts exist –Need to standardize the alphabet, determine the phoneme set and writing rules, and develop an electronic dictionary –We provide the local team with linguistic and technical advice, plus tools such as a morphological analyzer, a parser and ultimately an MT system (MTS) –We work in collaboration with a local team in Temuco
Our last meeting in Temuco, May 2002
New approach to MT –Fully automatic (no human intervention) –Very little electronic data available, hence an elicitation corpus –Machine learning techniques: seeded version space algorithm to automatically learn transfer rules; interactive and automatic refinement of transfer rules
Elicitation corpus sample
…
\spa Una mujer se quedó en casa
\map Kie domo mlewey ruka mew
\eng One woman stayed at home.
\spa Vi una mujer
\map Pen kie domo
\eng I saw one woman.
\spa Hay suficiente comida para una mujer
\map Mley iagel i yochiluwam kie domo
\eng There is enough food for one woman.
…
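The sample above uses three field markers per entry (\spa, \map, \eng). As a minimal sketch, assuming every entry starts with \spa and that no other markers occur, the records could be read into dictionaries like this. The function name `parse_elicitation` is hypothetical and is not part of the AVENUE tools.

```python
import re

def parse_elicitation(text):
    """Group \\spa / \\map / \\eng fields into one dict per corpus entry.

    Assumption: each entry begins with a \\spa field.
    """
    # re.split with a capture group keeps the marker names in the result:
    # [prefix, tag, value, tag, value, ...]
    tokens = re.split(r"\\(spa|map|eng)\b", text)
    entries, current = [], {}
    for tag, value in zip(tokens[1::2], tokens[2::2]):
        if tag == "spa" and current:
            # A new Spanish prompt closes the previous entry.
            entries.append(current)
            current = {}
        current[tag] = value.strip()
    if current:
        entries.append(current)
    return entries

# Two entries copied from the slide (raw string so backslashes stay literal).
sample = r"""\spa Una mujer se quedó en casa \map Kie domo mlewey ruka mew \eng One woman stayed at home.
\spa Hay suficiente comida para una mujer \map Mley iagel i yochiluwam kie domo \eng There is enough food for one woman."""

entries = parse_elicitation(sample)
for entry in entries:
    print(entry["spa"], "|", entry["map"], "|", entry["eng"])
```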
Automatic Learning of a Transfer-based MTS [diagram components: Elicitation corpus; SVS algorithm; Transfer module; tentative Transfer rules; Rule Refinement module; SL sentences; (tentative) TL sentences. Names shown: Katharina Probst, Erik Peterson, Ariadna Font]
Interactive and Automatic rule refinement 1. Given an MTS, translate sentences and present them to the users for minimal correction (interface design, MT error classification) 2. Determine blame assignment 3. Structure learning, as opposed to binary feedback, to automatically refine the existing rules
Interactive Learning –Translation Correction Tool, a web application –Bilingual informants (no knowledge of linguistics assumed) –User-friendly and intuitive interface –Can naïve users reliably pinpoint the source of errors? –Is the MT error classification realistic? –Need for user studies: Spanish - English, English - Spanish, English - Chinese
User studies snapshot
Structure learning –Given user feedback (correction + error classification) and blame assignment, modify the appropriate transfer rule(s) to obtain the correct translation –Need to evaluate based on cross-validation and on the number of sentences it can translate correctly (elicitation corpus) –Learn the mapping between incorrect structures and correct structures: She saw high woman -> She saw the tall woman
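One way to picture the raw material for this mapping is a word-level diff between the MT output and the user's correction. The sketch below uses Python's standard `difflib` and is only an illustration, not the project's method. The helper name `correction_ops` is hypothetical.

```python
from difflib import SequenceMatcher

def correction_ops(mt_output, corrected):
    """Return the non-matching word spans between MT output and its correction."""
    src, tgt = mt_output.split(), corrected.split()
    matcher = SequenceMatcher(None, src, tgt, autojunk=False)
    return [
        (op, src[i1:i2], tgt[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]

ops = correction_ops("She saw high woman", "She saw the tall woman")
print(ops)  # [('replace', ['high'], ['the', 'tall'])]
```

The diff reports a single span in which "high" becomes "the tall". Separating that span into a missing determiner and a wrong lexical choice needs the user's error classification, which is why structured feedback carries more information than a binary right/wrong signal.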
A simple example
Spanish SLS: Ella vio a la mujer alta
English TLS: She saw high woman
Corrected TLS: She saw the tall woman
MT error classification: missing determiner + wrong lexical selection
Blame assignment: NP rule that generated the direct object + selectional restrictions
Rule refinement: the Noun Phrase (NP) rule that generated the error, NP -> Adj N, needs to be refined into 2 different cases:
–NP -> Det Adj N[sg] (the tall woman)
–NP -> (Det) Adj N[pl] ((the)? tall women)
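The refinement step in this example can be sketched in code. The rule representation below (`Constituent`, `Rule`) and the function `refine_missing_determiner` are hypothetical and chosen only for illustration. They do not reflect the actual AVENUE transfer rule formalism.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Constituent:
    category: str
    number: str = ""        # "sg", "pl", or "" when unconstrained
    optional: bool = False

    def __str__(self):
        label = self.category + (f"[{self.number}]" if self.number else "")
        return f"({label})" if self.optional else label

@dataclass(frozen=True)
class Rule:
    lhs: str
    rhs: Tuple[Constituent, ...]

    def __str__(self):
        return f"{self.lhs} -> " + " ".join(str(c) for c in self.rhs)

def refine_missing_determiner(rule):
    """Split a rule into a singular case (Det required) and a plural case (Det optional)."""
    refined = []
    for number, det_optional in (("sg", False), ("pl", True)):
        det = Constituent("Det", optional=det_optional)
        # Constrain the noun's number; leave other constituents unchanged.
        body = tuple(
            Constituent(c.category, number, c.optional) if c.category == "N" else c
            for c in rule.rhs
        )
        refined.append(Rule(rule.lhs, (det,) + body))
    return refined

original = Rule("NP", (Constituent("Adj"), Constituent("N")))
refined = refine_missing_determiner(original)
for r in refined:
    print(r)
# NP -> Det Adj N[sg]
# NP -> (Det) Adj N[pl]
```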
Blame assignment –Once an MT error has been detected, we need to trace back which rule generated it –The Transfer module has a built-in trace option –Some errors might be due to interference between rules
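A minimal sketch of tracing an error back to candidate rules, assuming the trace can be reduced to (rule, start, end) spans over the target words. That trace format, the `blame` function, and the S and VP rules are assumptions made for illustration. The Transfer module's real trace output is not shown in these slides.

```python
def blame(trace, error_positions):
    """Return rules whose output span covers an error position, narrowest span first."""
    candidates = [
        (end - start, rule)
        for rule, start, end in trace
        if any(start <= pos < end for pos in error_positions)
    ]
    return [rule for _, rule in sorted(candidates)]

# Hypothetical trace for the output "She saw high woman" (word indices 0..3).
trace = [
    ("S -> NP VP", 0, 4),
    ("NP -> Pro", 0, 1),
    ("VP -> V NP", 1, 4),
    ("NP -> Adj N", 2, 4),
]

# The user flagged the word at index 2 ("high").
suspects = blame(trace, [2])
print(suspects)  # ['NP -> Adj N', 'VP -> V NP', 'S -> NP VP']
```

Ranking the narrowest span first puts the NP rule at the top, but the enclosing rules stay on the list because an error may also come from interference between rules.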
AVENUE project members
LTI team:
–Researchers: Jaime Carbonell, Lori Levin, Alon Lavie, Ralf Brown
–Ph.D. students: Ariadna Font Llitjós, Christian Monson, Erik Peterson, Katharina Probst
Avenue External Project Coordinator: Rodolfo M Vega
Chilean team: Eliseo Cañulef, Luis Caniupil Huaiquiñir, Hugo Carrasco, Marcela Collio Calfunao, Rosendo Huisca, Cristian Carrillan Anton, Hector Painequeo, Salvador Cañulef, Flor Caniupil, Claudio Millacura
Thanks! For more information: