Machine Translation with Scarce Resources: The AVENUE Project

Scarce Resources
Not much text in electronic form.
Very few linguists who can write computational rules.
No standard orthography:
– Kudaw, kusaw (work) (Mapudungun, Chile)
– Not even sure of pronunciation: EH-nvelope, AH-nvelope (envelope) (English, US, not a language with scarce resources)

Our Approach
Learn rules from a controlled corpus.
The corpus is elicited from bilingual speakers.
The informant only needs to translate and align words.
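As a rough illustration of what the informant produces, the sketch below stores one elicited pair as two token lists plus a set of word links, which is all the informant is asked to supply. The class `ElicitedPair` and its methods are hypothetical and are not the AVENUE data format. The sentence pair is taken from the elicitation examples in this deck; the two links shown are assumptions read off those examples.

```python
from dataclasses import dataclass, field


@dataclass
class ElicitedPair:
    """One elicited sentence pair plus the informant's word alignment.

    Hypothetical structure for illustration only; not the AVENUE data format.
    """
    source: list[str]  # tokens of the sentence shown to the informant
    target: list[str]  # tokens of the informant's translation
    # Each link is (source token index, target token index).
    alignment: set[tuple[int, int]] = field(default_factory=set)

    def align(self, src_word: str, tgt_word: str) -> None:
        """Link the first occurrence of src_word to the first occurrence of tgt_word."""
        self.alignment.add((self.source.index(src_word), self.target.index(tgt_word)))

    def links(self) -> list[tuple[str, str]]:
        """Return the aligned word pairs, ordered by source position."""
        return [(self.source[i], self.target[j]) for i, j in sorted(self.alignment)]


pair = ElicitedPair(source=["The", "rock", "fell"], target=["Trani", "chi", "kura"])
pair.align("rock", "kura")
pair.align("fell", "Trani")
print(pair.links())  # [('rock', 'kura'), ('fell', 'Trani')]
```

Keeping links as index pairs rather than word pairs means repeated words in a sentence stay distinguishable once a fuller alignment tool records positions directly.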

AVENUE Project
New Ideas
– Use machine learning to learn translation rules from native speakers who are not trained in linguistics or computer science.
– A multi-engine translation architecture can flexibly take advantage of whatever resources are available.
– Research partnerships with indigenous communities in Latin America and Alaska: Mapudungun (Chile), Siona (Colombia), Inupiaq (Alaska).
Carnegie Mellon University, Language Technologies Institute: L. Levin, J. Carbonell, A. Lavie, R. Brown
Impact
– Rapid and low-cost development of machine translation for languages with scarce resources.
– Policy makers can get input from indigenous people.
– Indigenous people can participate in government and on the internet.
Schedule
– Year 1: Seeded Version Space learning, first version
– Year 2: Example-Based Machine Translation of Mapudungun (Chile)
– Year 3: Multi-Engine Mapudungun system (EBMT and partially learned transfer rules)
– Interface for data elicitation
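One simple reading of "take advantage of whatever resources are available" is to run every engine that can handle the input and keep the best-scoring output. The sketch assumes each engine returns a `(translation, score)` pair, or `None` when it lacks the resources for that input, and that scores are comparable across engines. Both are assumptions; real multi-engine systems may instead combine partial outputs from several engines. The two toy engines are hypothetical stand-ins for EBMT and a partially learned transfer grammar.

```python
from typing import Callable, Optional

# Hypothetical engine signature: sentence -> (translation, score), or None if
# the engine cannot translate this input. Scores are ASSUMED comparable.
Engine = Callable[[str], Optional[tuple[str, float]]]


def multi_engine_translate(sentence: str, engines: dict[str, Engine]) -> Optional[tuple[str, str]]:
    """Run every engine and return (engine name, translation) for the best score."""
    best: Optional[tuple[str, str, float]] = None
    for name, engine in engines.items():
        result = engine(sentence)
        if result is None:
            continue  # this engine has no usable resources for this input
        text, score = result
        if best is None or score > best[2]:
            best = (name, text, score)
    return None if best is None else (best[0], best[1])


# Toy example base (taken from the elicitation examples in this deck).
examples = {"The rock fell.": "Trani chi kura"}


def ebmt(sentence: str) -> Optional[tuple[str, float]]:
    # Toy example-based engine: exact match against the tiny example base.
    return (examples[sentence], 0.9) if sentence in examples else None


def transfer(sentence: str) -> Optional[tuple[str, float]]:
    # Toy stand-in for a partially learned transfer grammar covering one sentence.
    return ("Tranün", 0.6) if sentence == "I fell." else None


engines = {"ebmt": ebmt, "transfer": transfer}
print(multi_engine_translate("The rock fell.", engines))  # ('ebmt', 'Trani chi kura')
print(multi_engine_translate("I fell.", engines))         # ('transfer', 'Tranün')
print(multi_engine_translate("You fell.", engines))       # None
```

Because each engine may simply decline, adding a new resource only means adding a new entry to `engines`; nothing else has to change.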

Elicitation Interface

Elicitation Corpus: example
English: I fell.
Spanish: Caí
Mapudungun: Tranün

English: I am falling.
Spanish: Estoy cayendo
Mapudungun: Tranmeken
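The two sentences above form a minimal pair: they differ only in aspect (simple past vs. progressive). A learner can use such pairs to hypothesize which part of a word form carries the contrasting feature. The sketch is a deliberately naive character-level split using the longest shared prefix. It is an illustration of the minimal-pair idea, not the AVENUE learning algorithm, and the function name is hypothetical.

```python
import os.path


def split_minimal_pair(a: str, b: str) -> tuple[str, str, str]:
    """Split two word forms into their longest shared prefix and the two residues."""
    stem = os.path.commonprefix([a, b])  # character-wise common prefix
    return stem, a[len(stem):], b[len(stem):]


# Mapudungun forms for "I fell." and "I am falling." from the example above.
print(split_minimal_pair("Tranün", "Tranmeken"))  # ('Tran', 'ün', 'meken')
```

The residues are only candidates; deciding whether they are one suffix or several needs more elicited forms.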

Elicitation Corpus: example
English: You (John) fell.
Spanish: Tú (Juan) caíste
Mapudungun: Eymi tranimi (Kuan)

English: You (Mary) fell.
Spanish: Tú (María) caíste
Mapudungun: Eymi tranimi (Maria)

English: The rock fell.
Spanish: La piedra cayó
Mapudungun: Trani chi kura
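These examples vary the addressee's gender and the subject's person. The sketch compares the Mapudungun verb forms across every pair of sentences and reports which features changed and whether the verb changed. The verb forms are copied from the slide; the feature labels (`person`, `gender`, with "neuter" for "the rock") are assumptions added for illustration, and the function names are hypothetical.

```python
from itertools import combinations

# (features of the elicited English sentence, Mapudungun verb form from the slide).
# The feature annotations are ASSUMED for illustration.
elicited = [
    ({"person": 2, "gender": "masc"}, "tranimi"),   # You (John) fell.
    ({"person": 2, "gender": "fem"}, "tranimi"),    # You (Mary) fell.
    ({"person": 3, "gender": "neuter"}, "trani"),   # The rock fell.
]


def compare(data: list[tuple[dict[str, object], str]]) -> list[tuple[list[str], bool]]:
    """For every pair of sentences: (features that differ, whether the verb differs)."""
    rows = []
    for (f1, v1), (f2, v2) in combinations(data, 2):
        changed = sorted(k for k in f1 if f1[k] != f2[k])
        rows.append((changed, v1 != v2))
    return rows


def unmarked_features(data: list[tuple[dict[str, object], str]]) -> set[str]:
    """Features that, when changed alone, left the verb form unchanged in this data."""
    return {changed[0] for changed, verb_differs in compare(data)
            if len(changed) == 1 and not verb_differs}


print(compare(elicited))
# [(['gender'], False), (['gender', 'person'], True), (['gender', 'person'], True)]
print(unmarked_features(elicited))  # {'gender'}
```

Changing gender alone leaves "tranimi" unchanged, while the contrasts that also change person give "tranimi" vs. "trani". Under the additional assumption that the features do not interact, a learner would attribute that difference to person.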