Developing affordable technologies for resource-poor languages Ariadna Font Llitjós Language Technologies Institute Carnegie Mellon University September.


Developing affordable technologies for resource-poor languages Ariadna Font Llitjós Language Technologies Institute Carnegie Mellon University September 22, 2004

October 11, 2002 AMTA
[Figure legend: dot = language]

Motivation
Resource-poor scenarios
- Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, health warnings, etc.)
- Formalize a potentially endangered language
Affordable technologies, such as
- spell-checkers,
- on-line dictionaries,
- Machine Translation (MT) systems,
- computer-assisted tutoring

AVENUE Partners
Language | Country | Institutions
Mapudungun (in place) | Chile | Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (started) | Peru | Ministry of Education
Iñupiaq (discussion) | US (Alaska) | Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center
Siona (discussion) | Colombia | OAS-CICAD, Plante, Department of the Interior

Chile
Official Language: Spanish
Population: ~15 million
~1/2 million Mapuche people
Language: Mapudungun
Mapudungun for the Mapuche

What’s Machine Translation (MT)?
[Figure labels: Japanese sentence, Swahili sentence]

Speech to Speech MT

Why Machine Translation for resource-poor (indigenous) languages?
Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers)
Benefits include:
- Better government access to indigenous communities (epidemics, crop failures, etc.)
- Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
- Language preservation
- Civilian and military applications (disaster relief)

MT for resource-poor languages: Challenges
- Minimal amount of parallel text (oral tradition)
- Possibly competing standards for orthography/spelling
- Often relatively few trained linguists
- Access to native informants possible
- Need to minimize development time and cost

Machine Translation Pyramid
[Figure: pyramid. Levels: Interlingua, Transfer rules, Corpus-based methods. Labels: analysis, interpretation, generation. Example: I saw you / Yo vi tú]

AVENUE MT system overview

\spa Una mujer se quedó en casa
\map Kie domo mlewey ruka mew
\eng One woman stayed at home.

{VP,3}
VP::VP : [VP NP] -> [VP NP]
(
  (X1::Y1)
  (X2::Y2)
  ((x2 case) = acc)
  ((x0 obj) = x2)
  ((x0 agr) = (x1 agr))
  (y2 == (y0 obj))
  ((y0 tense) = (x0 tense))
  ((y0 agr) = (y1 agr))
)

V::V |: [stayed] -> [quedó]
(
  (X1::Y1)
  ((x0 form) = stay)
  ((x0 actform) = stayed)
  ((x0 tense) = past-pp)
  ((y0 agr pers) = 3)
  ((y0 agr num) = sg)
)

Avenue MT system overview
[Diagram. Stages: Elicitation, Rule Learning, Run-Time System, Rule Refinement, Morphology. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module, Handcrafted rules, Morphological analyzer, Transfer Rules, Lexical Resources, Run Time Transfer System, Lattice, Translation Correction Tool, Rule Refinement Module]

Avenue overview: my research
[Diagram. Stages: Elicitation, Rule Learning, Run-Time System, Rule Refinement, Morphology. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module, Handcrafted rules, Morphological analyzer, Transfer Rules, Lexical Resources, Run Time Transfer System, Lattice, Translation Correction Tool, Rule Refinement Module]

Interactive and Automatic Refinement of Translation Rules
Or: How to recycle corrections of MT output back into the MT system by adjusting and adapting the grammar and lexical rules

Error correction by non-expert bilingual users

Interactive elicitation of MT errors
Assumptions: non-expert bilingual users can reliably detect and minimally correct MT errors, given:
- SL sentence (I saw you)
- TL sentence (Yo vi tú)
- word-to-word alignments (I-yo, saw-vi, you-tú)
- (context)
using an online GUI: the Translation Correction Tool (TCTool)
Goal: simplify MT correction task maximally

Translation Correction Tool Actions:

SL + best TL picked by user

Changing word order

Changing “grande” into “gran”

Automatic Rule Refinement Framework
Find best RR operations given a:
- grammar (G),
- lexicon (L),
- (set of) source language sentence(s) (SL),
- (set of) target language sentence(s) (TL),
- its parse tree (P), and
- minimal correction of TL (TL’)
such that TQ2 > TQ1
Which can also be expressed as: max TQ(TL | TL’, P, SL, RR(G, L))
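The search described on this slide can be sketched in a few lines of Python. This is only an illustration under loud assumptions: TQ is read here as a translation-quality score (the slide does not define it), and `choose_refinement`, `translate`, `quality`, the toy word-for-word lexicon, and the example sentence are all hypothetical stand-ins, not the AVENUE implementation.

```python
def choose_refinement(grammar, candidates, translate, quality, sl, tl_corrected):
    """Return the candidate refinement whose output beats the baseline score.

    All arguments are hypothetical stand-ins for G/L, RR, and TQ on the slide.
    """
    best_name, best_grammar = None, grammar
    best_score = quality(translate(grammar, sl), tl_corrected)  # TQ1
    for name, refine in candidates:
        refined = refine(grammar)
        score = quality(translate(refined, sl), tl_corrected)   # TQ2
        if score > best_score:  # keep only strict improvements (TQ2 > TQ1)
            best_name, best_grammar, best_score = name, refined, score
    return best_name, best_grammar


def translate(lexicon, sl):
    """Toy word-for-word translation; unknown words are copied through."""
    return " ".join(lexicon.get(word, word) for word in sl.split())


def quality(tl, reference):
    """Toy score: fraction of positions where TL matches the corrected TL'."""
    tl_words, ref_words = tl.split(), reference.split()
    matches = sum(a == b for a, b in zip(tl_words, ref_words))
    return matches / max(len(tl_words), len(ref_words))


# Hypothetical example sentence built around the grande/gran correction.
toy_lexicon = {"a": "un", "great": "grande", "artist": "artista"}
candidates = [
    ("no-op", lambda lex: dict(lex)),
    ("great->gran", lambda lex: {**lex, "great": "gran"}),
]
name, refined = choose_refinement(
    toy_lexicon, candidates, translate, quality,
    "a great artist", "un gran artista",
)
print(name)                                  # great->gran
print(translate(refined, "a great artist"))  # un gran artista
```

The strict comparison mirrors the TQ2 > TQ1 condition: a refinement that leaves the output unchanged is never selected.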

Types of RR operations
Grammar:
- R0 -> R0 + R1 [=R0’ + constr]    Cov[R0] -> Cov[R0,R1]
- R0 -> R1 [=R0 + constr]    Cov[R0] -> Cov[R1]
- R0 -> R1 [=R0 + constr= -]  R2 [=R0’ + constr=c +]    Cov[R0] -> Cov[R1,R2]
Lexicon:
- Lex0 -> Lex0 + Lex1 [=Lex0 + constr]
- Lex0 -> Lex1 [=Lex0 + constr]
- Lex0 -> Lex1 [  Lex0 +  TLword]
- ∅ -> Lex1 (adding lexical item)

Questions & Discussion
Thanks!

Formalizing Error Information
Wi = error
Wi’ = correction
Wc = clue word
Example:
SL: the red car
TL: *el auto roja -> TL’: el auto rojo
Wi = roja
Wi’ = rojo
Wc = auto
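As a rough illustration of how the error word and its correction could be read off a user's edit, the following Python sketch compares TL and TL’ position by position. The function name `find_word_corrections` is hypothetical, and the sketch assumes the correction is a one-for-one word substitution (no insertions, deletions, or reordering). It does not attempt to find the clue word, which presumably needs alignment or grammar information that this sketch does not model.

```python
def find_word_corrections(tl, tl_corrected):
    """Return (position, W_i, W_i') for every word the user changed.

    Hypothetical helper: only one-for-one substitutions are supported.
    """
    tl_words = tl.split()
    corrected_words = tl_corrected.split()
    if len(tl_words) != len(corrected_words):
        raise ValueError("this sketch only handles one-for-one word substitutions")
    return [
        (position, wrong, right)
        for position, (wrong, right) in enumerate(zip(tl_words, corrected_words))
        if wrong != right
    ]


# The slide's example: TL "*el auto roja" corrected to TL' "el auto rojo".
corrections = find_word_corrections("el auto roja", "el auto rojo")
print(corrections)  # [(2, 'roja', 'rojo')]
```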

Finding Triggering Features
Once we have the user’s correction (Wi’), we can compare it with Wi at the feature level and find the triggering feature.
If the delta set is empty, we need to postulate a new binary feature.
Delta function:
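The formula for the delta function is not preserved in this transcript. One plausible reading, sketched below in Python, is a set difference over feature values: the delta set holds every feature on which the error word and the corrected word disagree. The function name `delta` and all feature names and values in the dictionaries are assumptions made for illustration, not the lexicon entries used in AVENUE.

```python
def delta(features_wrong, features_corrected):
    """Return the names of features whose values differ between W_i and W_i'.

    Hypothetical reading of the slide's delta function.
    """
    names = set(features_wrong) | set(features_corrected)
    return {
        name
        for name in names
        if features_wrong.get(name) != features_corrected.get(name)
    }


# Assumed feature values for the roja/rojo example.
roja = {"pos": "adj", "gender": "fem", "number": "sg"}
rojo = {"pos": "adj", "gender": "masc", "number": "sg"}
print(delta(roja, rojo))  # {'gender'}

# Assumed feature values for grande/gran: no existing feature tells them
# apart, so the delta set is empty and a new binary feature is needed.
grande = {"pos": "adj", "number": "sg"}
gran = {"pos": "adj", "number": "sg"}
print(delta(grande, gran))  # set()
```

In the first case the triggering feature is the single element of the delta set. In the second, the empty result corresponds to the situation where a new binary feature has to be postulated.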