AVENUE Automatic Machine Translation for low-density languages Ariadna Font Llitjós Language Technologies Institute SCS Carnegie Mellon University.



2 HCI project proposals
- Interface to Online Bilingual and Multilingual Dictionaries
- Translation Correction Tool interface: design, implementation and user studies

Online Bilingual and Multilingual Dictionaries
- Bilingual and multilingual dictionaries for indigenous languages (Mapudungun [Chile], Inupiaq [Alaska], Aymara, Quechua and Aguaruna [Peru])
- For each bilingual/multilingual dictionary, we (will) have an Excel database created by the local teams (Mapudungun: from a spoken corpus transcribed and translated into Spanish)

Online Bilingual and Multilingual Dictionaries (cont.)
For each entry, we give the translation in Spanish, some other linguistic information (POS), and a link to the actual sentence where it appears in the corpus. For example:
Püñpüñkünuukey: se manifiesta en forma de ronchas
nmlch-nmfhp1_x_0031_nmfhp_00
Mapu: Fey itrofillpüle kuerpu, ta pichike püñpüñkünuukey ta kalül may, peñi.
Sp: Así es en todas partes del cuerpo, pequeñas ronchas se forman en el cuerpo pues, hermano

Online Bilingual and Multilingual Dictionaries (cont.)
Currently, users can search for:
- Mapudungun words
- Spanish words
- all the words starting with a letter
- all the words containing a word or a string of characters
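The four search modes listed above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the `Entry` record, its field names, and the function names are all assumptions, and the single entry is the example from the previous slide. A real dictionary would load its entries from the database built by the local team.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical record layout; the field names are not from the slides.
    mapudungun: str   # headword
    spanish: str      # Spanish translation
    corpus_ref: str   # identifier of the corpus sentence containing the word


ENTRIES = [
    Entry(
        mapudungun="Püñpüñkünuukey",
        spanish="se manifiesta en forma de ronchas",
        corpus_ref="nmlch-nmfhp1_x_0031_nmfhp_00",
    ),
]


def norm(text):
    """Normalize Unicode form and case so that accented letters compare reliably."""
    return unicodedata.normalize("NFC", text).casefold()


def search_mapudungun(entries, word):
    """Entries whose Mapudungun headword equals `word`."""
    return [e for e in entries if norm(e.mapudungun) == norm(word)]


def search_spanish_word(entries, word):
    """Entries whose Spanish translation contains `word` as a whole word."""
    return [e for e in entries if norm(word) in norm(e.spanish).split()]


def search_starts_with(entries, letter):
    """Entries whose headword starts with `letter`."""
    return [e for e in entries if norm(e.mapudungun).startswith(norm(letter))]


def search_contains(entries, fragment):
    """Entries whose headword contains `fragment` anywhere."""
    return [e for e in entries if norm(fragment) in norm(e.mapudungun)]


if __name__ == "__main__":
    for hit in search_spanish_word(ENTRIES, "ronchas"):
        print(hit.mapudungun, "->", hit.spanish, "|", hit.corpus_ref)
```

Normalizing before comparing matters here because Mapudungun spelling uses letters such as ü and ñ, which can be encoded in more than one way in Unicode.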

Online Bilingual and Multilingual Dictionaries (cont.)
Primary users:
- people in the indigenous communities
- researchers in these countries, inside and outside the indigenous communities
Chilean case:
- product of the Ministry of Education
- students and teachers, mostly Mapuche, but maybe some Spanish users as well

Online Bilingual and Multilingual Dictionaries (cont.)
Secondary users:
- Linguistics, Lexicography and Anthropology researchers from all over the world
- random people browsing the web

Online Dictionaries: Tasks for HCII project
- analyze design of the basic web interface: given a query for a word in either language, it presents the information for that entry to the user in the other language
- how to incorporate an audio file with the word as it was pronounced in the spoken corpus
- how to make it interactive, i.e. have bilingual users comment on the entries and possibly add new entries (need profile info)

Translation Correction Tool (TCTool)
- AVENUE is a project which develops Automatic Machine Translation Systems (MTS) for low-density languages
- Since translations are automatic, i.e. not perfect, we need to refine them
- Instead of having a professional translator, we want to find an automatic way to refine the output of the MTS -> TCTool

TCTool
We can use the TCTool to automatically learn a refinement of the transfer rules in our MTS from users' input.
Challenges:
- users most likely not familiar with computers -> user-friendly and intuitive interface
- bilingual informants can't be assumed to have any linguistic knowledge

Automatic Machine Translation
[Diagram labels: Interlingua; Transfer rules; Corpus-based methods; analysis; interpretation; generation]

Automatic Learning of a Transfer-based MTS
[Diagram labels: Elicitation corpus; SVS algorithm; Transfer module; tentative Transfer rules; Rule Refinement module; SL sentences; (tentative) TL sentences]

Interactive and Automatic rule refinement
- Interactive step (TCTool): given an MTS, translate sentences and present them to the users for minimal correction (interface design, MT error classification)
- Automatic step: machine learning data structures and algorithms to map user input to refined transfer rules
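One way to picture what the interactive step could hand to the automatic step is a word-level comparison between the MT output and the user's corrected sentence. The sketch below uses Python's standard `difflib` module; it is an assumption about how corrections might be recorded, not the TCTool's actual mechanism, and the function name `correction_edits` and the English example sentence are made up for illustration.

```python
import difflib


def correction_edits(mt_output, corrected):
    """List the word-level edits that turn the MT output into the user's correction.

    Each edit is a tuple (operation, words_removed, words_added), where the
    operation is one of difflib's tags: "replace", "delete" or "insert".
    """
    source = mt_output.split()
    target = corrected.split()
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # unchanged stretches are not corrections
            edits.append((tag, source[i1:i2], target[j1:j2]))
    return edits


if __name__ == "__main__":
    # Hypothetical MT output and the user's minimal correction of it.
    for edit in correction_edits("She have a book", "She has a book"):
        print(edit)
```

A short edit list suggests the user changed only what was wrong, which is the intent behind asking for minimal correction. A long one may mean the sentence was retranslated from scratch and is harder to map back to a specific transfer rule.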

User studies snapshot

TCTool: Tasks for HCII project
- analyze design of the basic web interface: given a translated sentence, it asks the user to minimally correct it, if incorrect, and to classify the error(s)
- how to explain what minimal correction is
- what is the right error classification for non-expert and non-linguist users
- can naïve users reliably pinpoint the source of errors?
- design user studies to show reliability of user input (Spanish – English, English – Spanish, English – Chinese)
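The slides do not say how the reliability of user input would be measured. One common option, offered here purely as an assumption, is to have two users classify the same errors and compute Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The error labels in the example are hypothetical placeholders, since the error classification itself is one of the open tasks.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical error labels chosen by two users for the same four sentences.
    user_1 = ["word_order", "agreement", "word_order", "missing_word"]
    user_2 = ["word_order", "agreement", "missing_word", "missing_word"]
    print(round(cohen_kappa(user_1, user_2), 2))
```

Values near 1 indicate that users classify errors consistently; values near 0 indicate agreement no better than chance, which would suggest the error categories are too fine-grained or poorly explained for non-expert users.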

AVENUE project members
LTI team:
- Researchers: Jaime Carbonell, Lori Levin, Alon Lavie, Ralf Brown
- Ph.D. students: Ariadna Font Llitjós, Christian Monson, Erik Peterson, Katharina Probst
Avenue External Project Coordinator: Rodolfo M Vega
Chilean team: Eliseo Cañulef, Luis Caniupil Huaiquiñir, Hugo Carrasco, Marcela Collio Calfunao, Rosendo Huisca, Cristian Carrillan Anton, Hector Painequeo, Salvador Cañulef, Flor Caniupil, Claudio Millacura

Questions? For more information: