Avenue Architecture Learning Module Learned Transfer Rules Lexical Resources Run Time Transfer System Decoder Translation Correction Tool Word- Aligned.

Slides:



Advertisements
Similar presentations
Machine Translation The Translator s Choice Heidi Düchting Sylke Krämer Johann Roturier.
Advertisements

CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 2 (06/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Part of Speech (PoS)
CSA2050: DCG I1 CSA2050 Introduction to Computational Linguistics Lecture 8 Definite Clause Grammars.
Introducing COMPARA The Portuguese-English Parallel Corpus Ana Frankenberg-Garcia ISLA, Lisbon & Diana Santos SINTEF, Oslo.
How do we work in a virtual multilingual classroom? A virtual multilingual classroom with Moodle and Apertium Cultural and Linguistic Practices in the.
Enabling MT for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University.
NICE: Native language Interpretation and Communication Environment Lori Levin, Jaime Carbonell, Alon Lavie, Ralf Brown Carnegie Mellon University.
“Applying Morphology Generation Models to Machine Translation” By Kristina Toutanova, Hisami Suzuki, Achim Ruopp (Microsoft Research). UW Machine Translation.
Automatic Rule Learning for Resource-Limited Machine Translation Alon Lavie, Katharina Probst, Erik Peterson, Jaime Carbonell, Lori Levin, Ralf Brown Language.
Evaluating an MT French / English System Widad Mustafa El Hadi Ismaïl Timimi Université de Lille III Marianne Dabbadie LexiQuest - Paris.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
Language Technologies Institute School of Computer Science Carnegie Mellon University NSF August 6, 2001 NICE: Native language Interpretation and Communication.
MACHINE TRANSLATION TRANSLATION(5) LECTURE[1-1] Eman Baghlaf.
© 2014 The MITRE Corporation. All rights reserved. Stacey Bailey and Keith Miller On the Value of Machine Translation Adaptation LREC Workshop: Automatic.
Czech-to-English Translation: MT Marathon 2009 Session Preview Jonathan Clark Greg Hanneman Language Technologies Institute Carnegie Mellon University.
Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software.
Building NLP Systems for Two Resource Scarce Indigenous Languages: Mapudungun and Quechua, and some other languages Christian Monson, Ariadna Font Llitjós,
Eliciting Features from Minor Languages The elicitation tool provides a simple interface for bilingual informants with no linguistic training and limited.
Can Controlled Language Rules increase the value of MT? Fred Hollowood & Johann Rotourier Symantec Dublin.
Transfer-based MT with Strong Decoding for a Miserly Data Scenario Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:
Coping with Surprise: Multiple CMU MT Approaches Alon Lavie Lori Levin, Jaime Carbonell, Alex Waibel, Stephan Vogel, Ralf Brown, Robert Frederking Language.
Statistical XFER: Hybrid Statistical Rule-based Machine Translation Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:
Recent Major MT Developments at CMU Briefing for Joe Olive February 5, 2008 Alon Lavie and Stephan Vogel Language Technologies Institute Carnegie Mellon.
Machine Translation  Machine translation is of one of the earliest uses of AI  Two approaches:  Traditional approach using grammars, rewrite rules,
Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System Alon Lavie Language Technologies Institute Carnegie Mellon University.
Rule Learning - Overview Goal: Syntactic Transfer Rules 1) Flat Seed Generation: produce rules from word- aligned sentence pairs, abstracted only to POS.
AMTEXT: Extraction-based MT for Arabic Faculty: Alon Lavie, Jaime Carbonell Students and Staff: Laura Kieras, Peter Jansen Informant: Loubna El Abadi.
Transfer-based MT with Strong Decoding for a Miserly Data Scenario Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:
AVENUE Automatic Machine Translation for low-density languages Ariadna Font Llitjós Language Technologies Institute SCS Carnegie Mellon University.
AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students.
Ideas for 100K Word Data Set for Human and Machine Learning Lori Levin Alon Lavie Jaime Carbonell Language Technologies Institute Carnegie Mellon University.
Carnegie Mellon Goal Recycle non-expert post-editing efforts to: - Refine translation rules automatically - Improve overall translation quality Proposed.
Data Collection and Language Technologies for Mapudungun Lori Levin, Rodolfo Vega, Jaime Carbonell, Ralf Brown, Alon Lavie Language Technologies Institute.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
Hebrew-to-English XFER MT Project - Update Alon Lavie June 2, 2004.
An Overview of the AVENUE Project Presented by Lori Levin Language Technologies Institute School of Computer Science Carnegie Mellon University Pittsburgh,
A Trainable Transfer-based MT Approach for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University Joint.
Automating Post-Editing To Improve MT Systems Ariadna Font Llitjós and Jaime Carbonell APE Workshop, AMTA – Boston August 12, 2006.
LREC 2004, 26 May 2004, Lisbon 1 Multimodal Multilingual Resources in the Subtitling Process S.Piperidis, I.Demiros, P.Prokopidis, P.Vanroose, A. Hoethker,
Coping with Surprise: Multiple CMU MT Approaches Alon Lavie Lori Levin, Jaime Carbonell, Alex Waibel, Stephan Vogel, Ralf Brown, Robert Frederking Language.
Semi-Automated Elicitation Corpus Generation The elicitation tool provides a simple interface for bilingual informants with no linguistic training and.
A Trainable Transfer-based MT Approach for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University Joint.
The CMU Mill-RADD Project: Recent Activities and Results Alon Lavie Language Technologies Institute Carnegie Mellon University.
Error Analysis of Two Types of Grammar for the purpose of Automatic Rule Refinement Ariadna Font Llitjós, Katharina Probst, Jaime Carbonell Language Technologies.
Objective: Enabling students to translate from English into Arabic and vice versa. Why teach translation: It develops accuracy, fluency, clarity, and.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Bridging the Gap: Machine Translation for Lesser Resourced Languages
October 10, 2003BLTS Kickoff Meeting1 Transfer with Strong Decoding Learning Module Transfer Rules {PP,4894} ;;Score: PP::PP [NP POSTP] -> [PREP.
CMU Statistical-XFER System Hybrid “rule-based”/statistical system Scaled up version of our XFER approach developed for low-resource languages Large-coverage.
Eliciting a corpus of word- aligned phrases for MT Lori Levin, Alon Lavie, Erik Peterson Language Technologies Institute Carnegie Mellon University.
Seed Generation and Seeded Version Space Learning Version 0.02 Katharina Probst Feb 28,2002.
CMU MilliRADD Small-MT Report TIDES PI Meeting 2002 The CMU MilliRADD Team: Jaime Carbonell, Lori Levin, Ralf Brown, Stephan Vogel, Alon Lavie, Kathrin.
AVENUE: Machine Translation for Resource-Poor Languages NSF ITR
Developing affordable technologies for resource-poor languages Ariadna Font Llitjós Language Technologies Institute Carnegie Mellon University September.
FROM BITS TO BOTS: Women Everywhere, Leading the Way Lenore Blum, Anastassia Ailamaki, Manuela Veloso, Sonya Allin, Bernardine Dias, Ariadna Font Llitjós.
Review: Review: Translating without in-domain corpus: Machine translation post-editing with online learning techniques Antonio L. Lagarda, Daniel Ortiz-Martínez,
Semi-Automatic Learning of Transfer Rules for Machine Translation of Minority Languages Katharina Probst Language Technologies Institute Carnegie Mellon.
Learning to Generate Complex Morphology for Machine Translation Einat Minkov †, Kristina Toutanova* and Hisami Suzuki* *Microsoft Research † Carnegie Mellon.
The AVENUE Project: Automatic Rule Learning for Resource-Limited Machine Translation Faculty: Alon Lavie, Jaime Carbonell, Lori Levin, Ralf Brown Students:
Eliciting a corpus of word-aligned phrases for MT
Faculty: Alon Lavie, Jaime Carbonell, Lori Levin, Ralf Brown Students:
Ariadna Font Llitjós March 10, 2004
Urdu-to-English Stat-XFER system for NIST MT Eval 2008
Alon Lavie, Jaime Carbonell, Lori Levin,
Towards Interactive and Automatic Refinement of Translation Rules
Towards Interactive and Automatic Refinement of Translation Rules
Towards Interactive and Automatic Refinement of Translation Rules
Towards Interactive and Automatic Refinement of Translation Rules
Presentation transcript:

Avenue Architecture Learning Module Learned Transfer Rules Lexical Resources Run Time Transfer System Decoder Translation Correction Tool Word- Aligned Parallel Corpus Elicitation Tool Elicitation Corpus ElicitationRule Learning Run-Time System Rule Refinement Rule Refinement Module Morphology Analyzer Learning Module Handcrafted rules INPUT TEXT OUTPUT TEXT

Interactive and Automatic Refinement of translation Rules Problem: Improve Machine Translation Quality. Proposed Solution: Put bilingual speakers back into the loop; use their corrections to detect the source of the error and automatically improve the lexicon and the grammar. Approach: Automate post-editing efforts by feeding them back into the MT system.  Automatic refinement of translation rules that caused an error beyond post-editing. Goal: Improve MT coverage and overall quality.

Technical Challenges Elicit minimal MT information from non-expert users Automatically Refine and Expand Translation Rules minimally Manually written Automatically Learned Automatic Evaluation of Refinement process

Error Typology for Automatic Rule Refinement (simplified) Missing word Extra word Wrong word order Incorrect word Wrong agreement Interactive elicitation of error information Local vs Long distance Word vs. phrase + Word change Sense Form Selectional restrictions Idiom Missing constraint Extra constraint

TCTool (Demo)Demo Add a word Delete a word Modify a word Change word order Actions: Interactive elicitation of error information precisionrecall error detection90%89% error classification72%71%

1. Refine a translation rule: R0  R1 (change R0 to make it more specific or more general) Types of Refinement Operations Automatic Rule Adaptation R0: R1: NP DET N ADJ NP DET ADJ N a nice house una casa bonito NP DET N ADJ NP DET ADJ N a nice house una casa bonita N gender = ADJ gender

2. Bifurcate a translation rule: R0  R0 (same, general rule)  R1 (add a new more specific rule) Types of Refinement Operations Automatic Rule Adaptation R0: NP DET N ADJ NP DET ADJ N NP DET ADJ N NP DET ADJ N R1: a nice house una casa bonita a great artist un gran artista ADJ type: pre-nominal

Error Information Elicitation Refinement Operation Typology Automatic Rule Adaptation Change word order SL: Gaudí was a great artist MT system output: TL: Gaudí era un artista grande Ucorrection: *Gaudí era un artista grande Gaudí era un gran artista A concrete example clue word error correction

Finding Triggering Feature(s):  (error word, corrected word ) =   need to postulate a new binary feature: feat1 Blame assignment (from MT system output) tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )> Automatic Rule Adaptation S,1 … NP,1 … NP,8 … Grammar ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc)) ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc))

Refining Rules Bifurcate NP,8  NP,8 (R0) + NP,8’ (R1) (flip order of ADJ-N) {NP,8’} NP::NP : [DET ADJ N] -> [DET ADJ N] ( (X1::Y1) (X2::Y2) (X3::Y3) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y3 agr)) ; det-noun agreement ((y2 agr) = (y3 agr)) ; adj-noun agreement (y2 = x3) ((y2 feat1) =c + )) Automatic Rule Adaptation

Refining Lexical Entries ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = -)) ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = +)) Automatic Rule Adaptation

Evaluating Improvement Automatic Rule Adaptation -Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account, whether the following are present: -Corrected Translation Sentence -Original Translation Sentence (labelled as incorrect by the user) un artista gran un gran artista un grande artista *un artista grande

Evaluating Improvement Automatic Rule Adaptation -Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account, whether the following are present: -Corrected Translation Sentence -Original Translation Sentence (labelled as incorrect by the user) *un artista gran un gran artista *un grande artista *un artista grande

Challenges and future work Credit and Blame assignment from TCTool Log Files and Xfer engine’s trace Order of corrections matters ~ explore rule interactions Explore the space between batch mode and fully interactive system Online TCTool always running to collect corrections from bilingual speakers  make it into a game with rewards for the best users

Publications Font Llitjós, A., J.G. Carbonell and A. Lavie. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation" EAMT 10th Annual Conference May 2005, Budapest, Hungary. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation" Font Llitjós, A., R. Aranovich and L. Levin. "Building Machine translation systems for indigenous languages". Second Conference on the Indigenous Languages of Latin America (CILLA II), October 2005, Texas, USA. "Building Machine translation systems for indigenous languages" Font Llitjós, A., K. Probst and J.G. Carbonell. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement". AMTA, 2004, Washington, USA. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement" Font Llitjós, A. and J.G. Carbonell. "The Translation Correction Tool: English-Spanish user studies“. LREC, Lisbon, Portugal. "The Translation Correction Tool: English-Spanish user studies“

Quechua  Spanish MT V-Unit: funded Summer project in Cusco (Peru) June-August 2005 [preparations and data collection started earlier] Intensive Quechua course in Centro Bartolome de las Casas (CBC) Worked together with two Quechua native and one non-native speakers on developing infrastructure (correcting elicited translations, segmenting and translating list of most frequent words)

Quechua  Spanish prototype MT system Stem Lexicon (semi-automatically generated): 753 lexical entries Suffix lexicon: 21 suffixes (150 Cusihuaman) Quechua morphology analyzer 25 translation rules Spanish morphology generation module User-Studies: 10 sentences, 3 users (2 native, 1 non- native)