1
Machine Translation with Scarce Resources: The AVENUE Project
2
Scarce Resources
Not much text in electronic form.
Very few linguists who can write computational rules.
No standard orthography
– Kudaw, kusaw (work) (Mapudungun, Chile)
– Not even sure of pronunciation: EH-nvelope, AH-nvelope (envelope) (English, US, not a language with scarce resources)
3
Our Approach
Learn rules from a controlled corpus.
Corpus is elicited from bilingual speakers.
The informant only needs to translate and align words.
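As a rough illustration of what "translate and align words" could produce, the sketch below stores one elicited sentence pair together with its word alignment. The class name `ElicitedExample`, its fields, and the particular alignment links are assumptions made for this example; the slides do not describe the project's actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ElicitedExample:
    """One elicited sentence pair with informant-supplied word alignment (hypothetical format)."""
    source: List[str]                      # tokens of the elicitation-language sentence
    target: List[str]                      # tokens of the informant's translation
    alignment: List[Tuple[int, int]] = field(default_factory=list)  # (source index, target index)

    def aligned_pairs(self) -> List[Tuple[str, str]]:
        """Return the aligned word pairs in the order the links were recorded."""
        return [(self.source[i], self.target[j]) for i, j in self.alignment]

    def unaligned_source(self) -> List[str]:
        """Return source tokens that the informant linked to nothing."""
        covered = {i for i, _ in self.alignment}
        return [w for i, w in enumerate(self.source) if i not in covered]


# Sentence pair taken from the elicitation corpus slides; the links are illustrative.
example = ElicitedExample(
    source="The rock fell".split(),
    target="Trani chi kura".split(),
    alignment=[(0, 1), (1, 2), (2, 0)],
)

for src_word, tgt_word in example.aligned_pairs():
    print(src_word, "<->", tgt_word)
```

Keeping the alignment as index pairs rather than word pairs lets the same word appear more than once in a sentence without ambiguity.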
4
AVENUE Project
New Ideas
Use machine learning to learn translation rules from native speakers who are not trained in linguistics or computer science.
Multi-Engine translation architecture can flexibly take advantage of whatever resources are available.
Research partnerships with indigenous communities in Latin America and Alaska (Mapudungun (Chile), Siona (Colombia), Inupiaq (Alaska)).
Carnegie Mellon University, Language Technologies Institute: L. Levin, J. Carbonell, A. Lavie, R. Brown
Impact
Rapid and low-cost development of machine translation for languages with scarce resources.
Policy makers can get input from indigenous people.
Indigenous people can participate in government and internet.
Schedule
Year 1: Seeded Version Space learning – first version
Year 2: Example-Based Machine Translation of Mapudungun (Chile).
Year 3: Multi-Engine Mapudungun system (EBMT and partially learned transfer rules)
Interface for data elicitation
5
Elicitation Interface
6
Elicitation Corpus: example
English: I fell.
Spanish: Caí
Mapudungun: Tranün
English: I am falling.
Spanish: Estoy cayendo
Mapudungun: Tranmeken
7
Elicitation Corpus: example
English: You (John) fell.
Spanish: Tú (Juan) caíste
Mapudungun: Eymi tranimi (Kuan)
English: You (Mary) fell.
Spanish: Tú (María) caíste
Mapudungun: Eymi tranimi (Maria)
English: The rock fell.
Spanish: La piedra cayó
Mapudungun: Trani chi kura
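One way to read the examples above is as minimal pairs: two prompts that differ in a single feature, whose translations show whether the target language marks that feature. The sketch below compares the Mapudungun translations from the slides after dropping the parenthesized context names. The helper names `strip_parenthetical` and `translations_differ` are hypothetical, and plain string comparison is only a stand-in for whatever comparison the project actually uses.

```python
import re


def strip_parenthetical(sentence: str) -> str:
    """Remove parenthesized context such as '(Kuan)' and surrounding whitespace."""
    return re.sub(r"\s*\([^)]*\)", "", sentence).strip()


def translations_differ(first: str, second: str) -> bool:
    """True if two translations differ once parenthesized context is ignored."""
    return strip_parenthetical(first).lower() != strip_parenthetical(second).lower()


# Minimal pairs from the elicitation corpus slides (Mapudungun side only).
# The feature labels are this sketch's own descriptions of each pair.
minimal_pairs = {
    "addressee: John vs. Mary": ("Eymi tranimi (Kuan)", "Eymi tranimi (Maria)"),
    "I fell vs. I am falling": ("Tranün", "Tranmeken"),
}

for feature, (first, second) in minimal_pairs.items():
    status = "changes" if translations_differ(first, second) else "stays the same"
    print(f"{feature}: translation {status}")
```

In this small sample the translation is unchanged when only the addressee's name differs, while the pair "I fell" / "I am falling" yields two different verb forms.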