Improving Statistical Machine Translation by Means of Transfer Rules Nurit Melnik.

Slides:



Advertisements
Similar presentations
Keyboarding Objective Apply language skills in keyed documents
Advertisements

Projecting Grammatical Features in Nominals: 23 March 2010 Jerry T. Ball Senior Research Psychologist 711 th HPW / RHAC Air Force Research Laboratory DISTRIBUTION.
Statistical NLP: Lecture 3
Stat-XFER: A General Framework for Search-based Syntax-driven MT Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:
Enabling MT for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University.
Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System Alon Lavie Language Technologies Institute Carnegie Mellon University.
NICE: Native language Interpretation and Communication Environment Lori Levin, Jaime Carbonell, Alon Lavie, Ralf Brown Carnegie Mellon University.
Automatic Rule Learning for Resource-Limited Machine Translation Alon Lavie, Katharina Probst, Erik Peterson, Jaime Carbonell, Lori Levin, Ralf Brown Language.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002.
MT Summit VIII, Language Technologies Institute School of Computer Science Carnegie Mellon University Pre-processing of Bilingual Corpora for Mandarin-English.
Machine Translation Challenges and Language Divergences Alon Lavie Language Technologies Institute Carnegie Mellon University : Machine Translation.
Does Syntactic Knowledge help English- Hindi SMT ? Avinesh. PVS. K. Taraka Rama, Karthik Gali.
Stat-XFER: A General Framework for Search-based Syntax-driven MT Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:
Keyboarding Objective 3.01 Interpret Proofreader Marks
Machine Translation History of Machine Translation Difficulties in Machine Translation Structure of Machine Translation System Research methods for Machine.
SI485i : NLP Set 9 Advanced PCFGs Some slides from Chris Manning.
MT for Languages with Limited Resources Machine Translation April 20, 2011 Based on Joint Work with: Lori Levin, Jaime Carbonell, Stephan Vogel,
Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software.
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
Proofreading Skills Keyboarding Objective Apply language skills in keyed documents.
Form and Meaning As we know that translation is changing from one state or form to another, to turn into one’s own or another’s language( Merriam-Webster.
Eliciting Features from Minor Languages The elicitation tool provides a simple interface for bilingual informants with no linguistic training and limited.
Can Controlled Language Rules increase the value of MT? Fred Hollowood & Johann Rotourier Symantec Dublin.
Transfer-based MT with Strong Decoding for a Miserly Data Scenario Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:
Coping with Surprise: Multiple CMU MT Approaches Alon Lavie Lori Levin, Jaime Carbonell, Alex Waibel, Stephan Vogel, Ralf Brown, Robert Frederking Language.
Statistical XFER: Hybrid Statistical Rule-based Machine Translation Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:
Recent Major MT Developments at CMU Briefing for Joe Olive February 5, 2008 Alon Lavie and Stephan Vogel Language Technologies Institute Carnegie Mellon.
Approaches to Machine Translation CSC 5930 Machine Translation Fall 2012 Dr. Tom Way.
Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System Alon Lavie Language Technologies Institute Carnegie Mellon University.
Transfer-based MT with Strong Decoding for a Miserly Data Scenario Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:
LANGUAGE TRANSFER SRI SURYANTI WORD ORDER STUDIES OF TRANSFER ODLIN (1989;1990) UNIVERSAL POSITION WHAT EXTENT WORD ORDER IN INTERLANGUAGE IS.
AVENUE Automatic Machine Translation for low-density languages Ariadna Font Llitjós Language Technologies Institute SCS Carnegie Mellon University.
Error Correction: For Dummies? Ellen Pratt, PhD. UPR Mayaguez.
2007CLINT-LIN-FEATSTR1 Computational Linguistics for Linguists Feature Structures.
Carnegie Mellon Goal Recycle non-expert post-editing efforts to: - Refine translation rules automatically - Improve overall translation quality Proposed.
Data Collection and Language Technologies for Mapudungun Lori Levin, Rodolfo Vega, Jaime Carbonell, Ralf Brown, Alon Lavie Language Technologies Institute.
Linguistic Essentials
Hebrew-to-English XFER MT Project - Update Alon Lavie June 2, 2004.
A Trainable Transfer-based MT Approach for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University Joint.
Semi-Automated Elicitation Corpus Generation The elicitation tool provides a simple interface for bilingual informants with no linguistic training and.
A Trainable Transfer-based MT Approach for Languages with Limited Resources Alon Lavie Language Technologies Institute Carnegie Mellon University Joint.
Error Analysis of Two Types of Grammar for the purpose of Automatic Rule Refinement Ariadna Font Llitjós, Katharina Probst, Jaime Carbonell Language Technologies.
Bridging the Gap: Machine Translation for Lesser Resourced Languages
Avenue Architecture Learning Module Learned Transfer Rules Lexical Resources Run Time Transfer System Decoder Translation Correction Tool Word- Aligned.
October 10, 2003BLTS Kickoff Meeting1 Transfer with Strong Decoding Learning Module Transfer Rules {PP,4894} ;;Score: PP::PP [NP POSTP] -> [PREP.
CMU Statistical-XFER System Hybrid “rule-based”/statistical system Scaled up version of our XFER approach developed for low-resource languages Large-coverage.
Eliciting a corpus of word- aligned phrases for MT Lori Levin, Alon Lavie, Erik Peterson Language Technologies Institute Carnegie Mellon University.
September 26, : Grammars and Lexicons Lori Levin.
CMU MilliRADD Small-MT Report TIDES PI Meeting 2002 The CMU MilliRADD Team: Jaime Carbonell, Lori Levin, Ralf Brown, Stephan Vogel, Alon Lavie, Kathrin.
AVENUE: Machine Translation for Resource-Poor Languages NSF ITR
Developing affordable technologies for resource-poor languages Ariadna Font Llitjós Language Technologies Institute Carnegie Mellon University September.
FROM BITS TO BOTS: Women Everywhere, Leading the Way Lenore Blum, Anastassia Ailamaki, Manuela Veloso, Sonya Allin, Bernardine Dias, Ariadna Font Llitjós.
A method to restrict the blow-up of hypotheses... A method to restrict the blow-up of hypotheses of a non-disambiguated shallow machine translation system.
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Minority Languages Katharina Probst Language Technologies Institute Carnegie Mellon.
Learning to Generate Complex Morphology for Machine Translation Einat Minkov †, Kristina Toutanova* and Hisami Suzuki* *Microsoft Research † Carnegie Mellon.
LingWear Language Technology for the Information Warrior Alex Waibel, Lori Levin Alon Lavie, Robert Frederking Carnegie Mellon University.
1 Proofreading & Language Skills Keyboarding Objective Apply language skills in keyed documents.
Eliciting a corpus of word-aligned phrases for MT
Approaches to Machine Translation
Statistical NLP: Lecture 3
Keyboarding Objective Interpret Proofreaders’ Marks in Documents
Alon Lavie, Jaime Carbonell, Lori Levin,
Stat-Xfer מציגים: יוגב וקנין ועומר טבח, 05/01/2012
Stat-XFER: A General Framework for Search-based Syntax-driven MT
Approaches to Machine Translation
Keyboarding Objective Interpret Proofreaders’ Marks in Documents
Keyboarding Objective Interpret Proofreaders’ Marks in Documents
Presentation transcript:

Improving Statistical Machine Translation by Means of Transfer Rules Nurit Melnik

Hebrew to English Machine Translation Language Technologies Institute Carnegie Mellon University Headed by Alon Lavie Computational Linguistics Group University of Haifa Headed by Shuly Wintner This research was made possible by support from the Caesarea Rothschild Institute at Haifa University and was funded in part by NSF grant number IIS With Danny Shacham (Haifa U.) and Erik Peterson (CMU)

Hebrew-specific challenges for MT High lexical & morphological ambiguity Limited electronic linguistic resources Lack of comprehensive electronic open-source bilingual dictionaries Consequently: State of the art technologies are not applicable to Hebrew.

The AVENUE Project The goal  The design and rapid development of new MT methods for languages for which only limited resources are available Projects  Aymara (Bolivia)  Quechua (Peru)  Mapudungun (Chile) Language Technologies Institute, CMU

Lavie, Peterson, Probst, Wintner and Eytani Rapid Prototyping of a Transfer-based Hebrew-to- English Machine Translation System. Proceedings of The 10th International Conference on Theoretical and Methodological Issues in Machine Translation, pages 1-10, Baltimore, MD, October THE ARCHITECTURE

Lavie, Peterson, Probst, Wintner and Eytani Rapid Prototyping of a Transfer-based Hebrew-to- English Machine Translation System. Proceedings of The 10th International Conference on Theoretical and Methodological Issues in Machine Translation, pages 1-10, Baltimore, MD, October A HYBRID APPROACH Rule-based Corpus-based

Syntactic Transfer Rules Transfer rules embody the 3 stages of translation  Analysis of source language  Transfer  Generation of target language Currently: 33 transfer rules (The original version written by Alon Lavie)

The Lattice N ha-sara the-minister.3SF V pagSa met.3SF N ha-nasi the-president.3SM P et ACC V hiSra soaked.3SM PRO at you.2SF N et spade H$RHPG$HATHN$IA NP(subj) Verb (0,1) the minister met NP (0,0) the minister NP (3,3) the president NP (2,3) the president’s spade NP(2,2) spade NP (2,2) you NP[acc] (2,3) the president

The Decoder The decoder uses the statistical Language Model of English to pick the most likely translation. N ha-sara the-minister.3SF V pagSa met.3SF N ha-nasi the-president.3SM P et ACC V hiSra soaked.3SM PRO at you.2SF N et spade H$RHPG$HATHN$IA NP(subj) Verb (0,1) the minister met NP (0,0) the minister NP (3,3) the president NP (2,3) the president’s spade NP(2,2) spade NP (2,2) you NP[acc] (2,3) the president

Some Syntactic Challenges for Hebrew- English MT The structure of Noun Phrases Subject-Verb inversion Pro-drop Argument Structure (valency) nistaymuha-bxirot endedthe-elections The elections ended. hitsbatiba-bxirot voted.1Sin-the-elections I voted in the elections. Danihimlitsalha-seret Dannyrecommendedonthe-movie Danny recommended the movie.

Some Syntactic Challenges for Hebrew- English MT Possessor Dative Construction Anaphor resolution hitkalkelala-nuha-mexonit broke-downto-usthe-car Our car broke down. ha-memSalaarxaetyeSivata ha-riSona the-government held ACC her-meeting the-first The government held its first meeting.

Hebrew-English Syntactic Transfer Noun Phrases Subject-Verb inversion

Transfer Rules for NPs Hebrew NPEnglish NP the morphological level (only Hebrew) syntactic specifiers (only English)

def+ def- DEF Feature Percolation In Construct State NPs

Possessor Feature Structure Percolation

{NP0,2} NP0::NP0 [N PRO] -> [N] ( (X1::Y1) ((X2 case) = possessive) ((X0 possessor) = X2) ((X0 def) = +) ((Y1 num) = (X1 num)) (X0 = X1) (Y0 = X0) ) פגישתם PGI$TM pgiSat-am meeting.3SF-POSS.3PM ( ( SPANSTART 0 ) ( SPANEND 1 ) ( SCORE 1 ) ( LEX PGI$H ) ( POS N ) ( GEN feminine ) ( NUM singular ) ( STATUS absolute ) ) ( ( SPANSTART 1 ) ( SPANEND 2 ) ( SCORE 1 ) ( LEX *PRO* ) ( POS PRO ) ( TRANS *PRO* ) ( GEN masculine ) ( NUM plural ) ( PER 3 ) ( CASE possessive ) ) Transfer RulesMorph. AnalysisInput {NP,3} NP::NP [NP2] -> [PRO NP2] ( (X1::Y2) ((X1 possessor) =c *DEFINED*) ((Y1 case) = (X1 possessor case)) ((Y1 per) = (X1 possessor person)) ((Y1 num) = (X1 possessor num)) ((Y1 gen) = (X1 possessor gen)) (X0 = X1) (Y0 = Y2) ) THEIR MEETING Output

Noun Phrases – Construct State decision.3SF-CSthe-president.3SMthe-first.3SM החלטת הנשיא הראשון החלטת הנשיא הראשונה decision.3SF-CSthe-president.3SMthe-first.3SF THE DECISION OF THE FIRST PRESIDENT THE FIRST DECISION OF THE PRESIDENT

Noun Phrases - Possessives HNSIAHKRIZ$HM$IMHHRA$WNH$LWTHIH the-presidentannouncedthat-the-task.3SFthe-first.3SFof-himwill.3SF LMCWA PTRWNLSKSWKBAZWRNW to-findsolutionto-the-conflictin-region-POSS.1P הנשיא הכריז שהמשימה הראשונה שלו תהיה למצוא פתרון לסכסוך באזורנו Without transfer grammar: THE PRESIDENT ANNOUNCED THAT THE TASK THE BEST OF HIM WILL BE TO FIND SOLUTION TO THE CONFLICT IN REGION OUR With transfer grammar: THE PRESIDENT ANNOUNCED THAT HIS FIRST TASK WILL BE TO FIND A SOLUTION TO THE CONFLICT IN OUR REGION

Subject-Verb Inversion ATMWLHWDI&HHMM$LH yesterdayannounced.3SFthe-government.3SF אתמול הודיעה הממשלה שתערכנה בחירות בחודש הבא $T&RKNHBXIRWTBXWD$HBA that-will-be-held.3PFelections.3PFin-the-monththe-next Without transfer grammar: YESTERDAY ANNOUNCED THE GOVERNMENT THAT WILL RESPECT OF THE FREEDOM OF THE MONTH THE NEXT With transfer grammar: YESTERDAY THE GOVERNMENT ANNOUNCED THAT ELECTIONS WILL ASSUME IN THE NEXT MONTH

Subject-Verb Inversion LPNIKMH$BW&WTHWDI&HHNHLTHMLWN beforeseveralweeksannounced.3SFmanagement.3SF.CSthe-hotel לפני כמה שבועות הודיעה הנהלת המלון שהמלון יסגר בסוף השנה $HMLWNISGRBSWFH$NH that-the-hotel.3SMwill-be-closed.3SMat-end.3SM.CSthe-year Without transfer grammar: IN FRONT OF A FEW WEEKS ANNOUNCED ADMINISTRATION THE HOTEL THAT THE HOTEL WILL CLOSE AT THE END THIS YEAR With transfer grammar: SEVERAL WEEKS AGO THE MANAGEMENT OF THE HOTEL ANNOUNCED THAT THE HOTEL WILL CLOSE AT THE END OF THE YEAR

Qualitative Evaluation Error Types  Syntactic errors  Lexical errors  Language Model errors

Syntactic errors Syntactic structures that are not covered by the current grammar  Passive  Pro-drop  Participles  Negation  Copula-less constructions  …

Lexical Errors Complex lexical items that are missing from the lexicon Multi-word phrases  axar kax after like-this ‘later’ (Semi-)fixed expressions  magi’alomaskoret reaches.3SFto-himsalary.3SF ‘he deserves a salary’  ha-yeledbensheva the-boysonseven ‘the boy is seven years old’

Language Model Errors The English Language Model is used to pick the most likely translation from a set of options in the lattice. “LM errors” occur when the LM does not pick the best option.

Language Model Errors Wrong lexical choices  אני רוצה את השכר...  Selected: … I want the charter…  Better: … I want the salary… Wrong syntactic choices ...שמארגנת מנהלת ההגירה  Selected: … that the organizer of the management of the immigration  Better: … that the administration of the immigration organizes

Conclusion Purely statistical MT is not possible for languages with limited resources. The solution: A hybrid system  Transfer-rule-based methods for the resource-poor source language  Statistical methods for the resource-rich target language