FROM BITS TO BOTS: Women Everywhere, Leading the Way Lenore Blum, Anastassia Ailamaki, Manuela Veloso, Sonya Allin, Bernardine Dias, Ariadna Font Llitjós.


FROM BITS TO BOTS: Women Everywhere, Leading the Way Lenore Blum, Anastassia Ailamaki, Manuela Veloso, Sonya Allin, Bernardine Dias, Ariadna Font Llitjós School of Computer Science Carnegie Mellon University

AVENUE Automatic Machine Translation for low-density languages Ariadna Font Llitjós Language Technologies Institute

Automatic Machine Translation
Approaches: Interlingua, Transfer rules, Corpus-based methods
[Diagram labels: analysis, interpretation, generation]

Low-density languages
Not endangered languages, but languages with little or no presence on the web and few or no linguistic resources
AVENUE is currently working with:
–Mapudungun [Chile]
–Inupiaq [Alaska]
–Aymara, Quechua and Aguaruna [Peru]
–Siona [Colombia]

Mapudungun for the Mapuche
Chile
–Official language: Spanish
–Population: ~15 million
Mapuche
–~1/2 million Mapuche people
–Language: Mapudungun

The language: Mapudungun
Oral tradition (170 hours of recorded speech in the medical domain)
Only a few written texts exist
Need to standardize the alphabet, determine the phoneme set and writing rules, and develop an electronic dictionary
We provide them with linguistic and technical advice, plus tools such as a morphological analyzer, a parser and, ultimately, a machine translation system (MTS)
We work in collaboration with a local team in Temuco

Our last meeting in Temuco, May 2002

New approach to MT
Fully automatic (no human intervention)
Very little electronic data available -> elicitation corpus
Machine learning techniques
–Seeded version space algorithm to automatically learn transfer rules
–Interactive and automatic refinement of transfer rules

Elicitation corpus sample
…
\spa Una mujer se quedó en casa
\map Kie domo mlewey ruka mew
\eng One woman stayed at home.
\spa Vi una mujer
\map Pen kie domo
\eng I saw one woman.
\spa Hay suficiente comida para una mujer
\map Mley iagel i yochiluwam kie domo
\eng There is enough food for one woman.
…
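The sample above uses one tagged line per language. As an illustration only, here is a minimal Python sketch of how such records could be grouped into triples. The function name `parse_elicitation` and the rule "a repeated tag starts a new record" are assumptions made for this sketch; the slides do not describe the project's actual tooling or file format beyond the sample.

```python
from typing import Dict, List

# Tags seen in the sample slide: Spanish, Mapudungun, English.
TAGS = ("spa", "map", "eng")


def parse_elicitation(text: str) -> List[Dict[str, str]]:
    """Group backslash-tagged lines into one dict per elicited sentence.

    Assumption (not from the slides): a record ends when a tag repeats.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and "…" elision markers
        tag, _, sentence = line[1:].partition(" ")
        if tag not in TAGS:
            continue  # ignore tags this sketch does not know about
        if tag in current:  # repeated tag: the previous record is complete
            records.append(current)
            current = {}
        current[tag] = sentence.strip()
    if current:
        records.append(current)
    return records


SAMPLE = r"""
\spa Una mujer se quedó en casa
\map Kie domo mlewey ruka mew
\eng One woman stayed at home.
\spa Hay suficiente comida para una mujer
\map Mley iagel i yochiluwam kie domo
\eng There is enough food for one woman.
"""

records = parse_elicitation(SAMPLE)
```

Each resulting dict holds the aligned Spanish, Mapudungun and English sentences for one elicited item, which is the unit a rule learner would consume.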

Automatic Learning of a Transfer-based MTS
[Diagram components: Elicitation corpus, SVS algorithm, tentative Transfer rules, Transfer module, Rule Refinement module, SL sentences, (tentative) TL sentences]
Kathrin Probst, Erik Peterson, Ariadna Font

Interactive and Automatic rule refinement
1. Given an MTS, translate sentences and present them to the users for minimal correction (interface design, MT error classification)
2. Determine blame assignment
3. Structure learning, as opposed to binary feedback, to automatically refine the existing rules

Interactive Learning
Translation Correction Tool (web application)
Bilingual informants (no knowledge of linguistics assumed)
User-friendly and intuitive interface
Can naïve users reliably pinpoint the source of errors?
Is the MT error classification realistic?
User studies needed:
–Spanish - English
–English - Spanish
–English - Chinese

User studies snapshot

Structure learning
Given user feedback (correction + error classification) and blame assignment, modify the appropriate transfer rule(s) to obtain the correct translation
Need to evaluate based on cross-validation and on the number of sentences translated correctly (elicitation corpus)
Learn the mapping between incorrect and correct structures:
She saw high woman -> She saw the tall woman

A simple example
Spanish SLS: Ella vio a la mujer alta
English TLS: She saw high woman
Corrected TLS: She saw the tall woman
MT error classification: missing determiner + wrong lexical selection
Blame assignment: NP rule that generated the direct object + selectional restrictions
Rule refinement: the Noun Phrase (NP) rule that generated the error, NP -> Adj N, needs to be refined into 2 different cases:
–NP -> Det Adj N[sg] (the tall woman)
–NP -> (Det) Adj N[pl] ((the)? tall women)
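The refinement in this example can be pictured with a toy data structure. The sketch below is only an illustration: the `Rule` class, its fields, and `refine_missing_determiner` are hypothetical names invented here, not the AVENUE transfer-rule formalism, and the printing helper assumes the head noun is the last symbol on the right-hand side.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Toy phrase-structure rule (hypothetical, not the AVENUE format)."""
    lhs: str
    rhs: Tuple[str, ...]
    number: Optional[str] = None      # "sg", "pl", or None if unconstrained
    optional: Tuple[str, ...] = ()    # right-hand-side symbols that may be omitted


def refine_missing_determiner(rule: Rule) -> List[Rule]:
    """Split a determiner-less NP rule into a singular and a plural case."""
    if "Det" in rule.rhs:
        return [rule]  # nothing to refine
    new_rhs = ("Det",) + rule.rhs
    return [
        Rule(rule.lhs, new_rhs, number="sg"),                     # Det required
        Rule(rule.lhs, new_rhs, number="pl", optional=("Det",)),  # Det optional
    ]


def show(rule: Rule) -> str:
    """Render a rule in the slide's notation, e.g. NP -> (Det) Adj N[pl]."""
    parts = [f"({s})" if s in rule.optional else s for s in rule.rhs]
    if rule.number:
        parts[-1] += f"[{rule.number}]"  # assumes the head noun comes last
    return f"{rule.lhs} -> {' '.join(parts)}"


original = Rule("NP", ("Adj", "N"))
refined = [show(r) for r in refine_missing_determiner(original)]
```

Here `refined` holds the two cases listed on the slide. Covering the second part of the correction (high -> tall) would need a separate lexical-selection step, which this sketch does not model.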

Blame assignment
Once an MT error has been detected, we need to trace back which rule generated it
The Transfer module has a built-in trace option
Some errors might be due to interference between rules
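One way to picture tracing an error back to a rule is sketched below. The slides do not show the Transfer module's trace output, so the `TraceEntry` layout (a rule id plus the span of target words it produced) and the rule ids are purely hypothetical. The function returns every rule whose output covers the erroneous word, narrowest span first, which leaves room for errors caused by more than one rule.

```python
from typing import List, NamedTuple


class TraceEntry(NamedTuple):
    """Hypothetical trace record: which rule produced which target words."""
    rule_id: str
    start: int  # index of the first target word produced by the rule
    end: int    # one past the last target word produced by the rule


def blame(trace: List[TraceEntry], error_index: int) -> List[str]:
    """Return ids of rules covering the erroneous word, narrowest span first."""
    covering = [e for e in trace if e.start <= error_index < e.end]
    covering.sort(key=lambda e: e.end - e.start)
    return [e.rule_id for e in covering]


# Target sentence from the example slide: "She saw high woman"
# word indices:                             0   1   2    3
trace = [
    TraceEntry("S", 0, 4),    # whole sentence
    TraceEntry("VP", 1, 4),   # "saw high woman"
    TraceEntry("NP", 2, 4),   # "high woman" (the direct object)
]

candidates = blame(trace, error_index=2)  # the user flagged "high"
```

The first candidate is the NP rule that built the direct object, matching the blame assignment in the example; the wider rules stay on the list as fallbacks.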

AVENUE project members
LTI team
–Researchers: Jaime Carbonell, Lori Levin, Alon Lavie, Ralf Brown
–Ph.D. students: Ariadna Font Llitjós, Christian Monson, Erik Peterson, Katharina Probst
AVENUE External Project Coordinator: Rodolfo M Vega
Chilean team: Eliseo Cañulef, Luis Caniupil Huaiquiñir, Hugo Carrasco, Marcela Collio Calfunao, Rosendo Huisca, Cristian Carrillan Anton, Hector Painequeo, Salvador Cañulef, Flor Caniupil, Claudio Millacura

Thanks! For more information: