Machine Translation Challenges and Language Divergences. Alon Lavie, Language Technologies Institute, Carnegie Mellon University. 11-731: Machine Translation.

Machine Translation Challenges and Language Divergences
Alon Lavie, Language Technologies Institute, Carnegie Mellon University
11-731: Machine Translation, January 12, 2011

Major Sources of Translation Problems
Lexical Differences:
– Multiple possible translations for an SL word, or difficulty expressing the SL word's meaning in a single TL word
Structural Differences:
– The syntax of the SL is different from the syntax of the TL: word order, sentence and constituent structure
Differences in Mappings of Syntax to Semantics:
– Meaning in the TL is conveyed using a different syntactic structure than in the SL
Idioms and Constructions

Lexical Differences
An SL word has several different meanings that translate differently into the TL
– Ex: financial bank vs. river bank
Lexical Gaps: an SL word reflects a unique meaning that cannot be expressed by a single word in the TL
– Ex: English snub doesn’t have a corresponding verb in French or German
The TL has finer distinctions than the SL → the SL word should be translated differently in different contexts
– Ex: English wall can be German Wand (internal) or Mauer (external)

Google at Work…

Lexical Differences
Lexical gaps:
– Examples: these have no direct equivalent in English:
gratiner (v., French, “to cook with a cheese coating”)
ōtosanrin (n., Japanese, “three-wheeled truck or van”)

Lexical Differences [From Hutchins & Somers]

MT Handling of Lexical Differences
Direct MT and Syntactic Transfer:
– The Lexical Transfer stage uses a bilingual lexicon
– An SL word can have multiple translation entries, possibly augmented with disambiguation features or probabilities
– Lexical Transfer can involve the use of limited context (on the SL side, the TL side, or both)
– Lexical gaps can partly be addressed via phrasal lexicons
Semantic Transfer:
– Ambiguity of the SL word must be resolved during analysis → a correct symbolic representation at the semantic level
– TL Generation must select an appropriate word or structure for correctly conveying the concept in the TL
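
Below is a minimal sketch, not from the lecture itself, of what such a lexical-transfer stage might look like: each SL word has several TL entries guarded by simple context features, and a small phrasal lexicon partially covers lexical gaps. The Python formulation, all names (LEXICON, lexical_transfer, etc.), and the toy English-to-German entries are illustrative assumptions.

# Minimal sketch of a lexical-transfer stage: an SL word may have several
# TL entries, each guarded by simple context features. The lexicon entries
# and glosses are illustrative, not from the lecture.

LEXICON = {
    "bank": [
        {"tl": "Bank", "features": {"topic": "finance"}},   # financial bank
        {"tl": "Ufer", "features": {"topic": "river"}},     # river bank
    ],
    "wall": [
        {"tl": "Wand",  "features": {"location": "internal"}},
        {"tl": "Mauer", "features": {"location": "external"}},
    ],
}

# Phrasal lexicon: a partial answer to lexical gaps (single SL word -> TL phrase).
PHRASAL_LEXICON = {
    "snub": "kurz abfertigen",   # illustrative multi-word German rendering
}

def lexical_transfer(sl_word, context):
    """Pick a TL entry whose features are consistent with the given context."""
    if sl_word in PHRASAL_LEXICON:
        return PHRASAL_LEXICON[sl_word]
    entries = LEXICON.get(sl_word, [])
    for entry in entries:
        if all(context.get(k) == v for k, v in entry["features"].items()):
            return entry["tl"]
    # Fall back to the first entry if no feature matches; pass through unknowns.
    return entries[0]["tl"] if entries else sl_word

print(lexical_transfer("bank", {"topic": "river"}))        # -> Ufer
print(lexical_transfer("wall", {"location": "external"}))  # -> Mauer
print(lexical_transfer("snub", {}))                        # -> kurz abfertigen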

Structural Differences
The syntax of the SL is different from the syntax of the TL:
– Word order within constituents:
English NPs: art adj n (the big boy)
Hebrew NPs: art n art adj (ha yeled ha gadol)
– Constituent structure:
English is SVO: Subj Verb Obj (I saw the man)
Modern Arabic is VSO: Verb Subj Obj
– Different verb syntax: verb complexes in English vs. in German
I can eat the apple / Ich kann den Apfel essen
– Case marking and free constituent order: German and other languages that mark case
den Apfel esse ich: the (acc) apple eat I (nom)

MT Handling of Structural Differences
Direct MT Approaches:
– No explicit treatment: phrasal lexicons and sentence-level matches or templates
Syntactic Transfer:
– Structural Transfer Grammars: a rule is triggered by matching against the syntactic structure on the SL side; the rule specifies how to reorder and re-structure the syntactic constituents to reflect the syntax of the TL side
Semantic Transfer:
– The SL semantic representation abstracts away from SL syntax to functional roles → done during analysis
– TL Generation maps semantic structures to the correct TL syntax
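
As a concrete, hypothetical illustration of the structural-transfer idea, the sketch below encodes the NP word-order divergence from the previous slide (English art adj n vs. Hebrew art n art adj) as a single reordering rule: it triggers only when the SL constituent pattern matches and then emits the constituents in TL order. The rule format and all names are assumptions made for illustration; real transfer grammars operate on full parse trees.

# Minimal sketch of a structural-transfer rule: match a constituent pattern
# on the SL side and emit the constituents in the TL order. Illustrative only.

# English NP: ART ADJ N  ->  Hebrew NP: ART N ART ADJ  (slide example)
NP_RULE = {
    "sl_pattern": ["ART", "ADJ", "N"],
    "tl_order":   [0, 2, 0, 1],   # indices into the matched SL constituents
}

def apply_rule(rule, constituents):
    """constituents: list of (category, word) pairs parsed on the SL side."""
    cats = [c for c, _ in constituents]
    if cats != rule["sl_pattern"]:
        return None                      # rule does not trigger
    return [constituents[i] for i in rule["tl_order"]]

sl_np = [("ART", "the"), ("ADJ", "big"), ("N", "boy")]
tl_np = apply_rule(NP_RULE, sl_np)
# After lexical transfer this ordering would yield: ha yeled ha gadol
print(tl_np)  # [('ART', 'the'), ('N', 'boy'), ('ART', 'the'), ('ADJ', 'big')]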

Syntax-to-Semantics Differences
Meaning in the TL is conveyed using a different syntactic structure than in the SL
– Changes in the verb and its arguments
– Passive constructions
– Motion verbs and state verbs
– Case creation and case absorption
Main distinction from Structural Differences:
– Structural differences are mostly independent of lexical choices and their semantic meaning → addressed by transfer rules that are syntactic in nature
– Syntax-to-semantics mapping differences are meaning-specific: they require the presence of specific words (and meanings) in the SL

Syntax-to-Semantics Differences
Structure-change example:
I like swimming
“Ich schwimme gern” (lit: I swim gladly)

Syntax-to-Semantics Differences
Verb-argument example:
Jones likes the film.
“Le film plaît à Jones.” (lit: “the film pleases to Jones”)
Use of case roles can eliminate the need for this type of transfer:
– Jones = Experiencer
– film = Theme
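
A minimal sketch, assuming a toy frame representation, of how case roles sidestep this divergence: analysis produces one predicate-argument frame with an Experiencer and a Theme, and language-specific generation realizes it either as English like (Experiencer as subject) or French plaire (Theme as subject, Experiencer marked with à). The frame format and function names are illustrative, not the lecture's formalism.

# One semantic frame with case roles, two language-specific generators.
# The frame abstracts away from syntax, so "X likes Y" and "Y plaît à X"
# are just two realizations of the same roles. All names are illustrative.

frame = {
    "predicate": "LIKE",
    "Experiencer": "Jones",
    "Theme": "the film",
}

def generate_english(f):
    # English realizes the Experiencer as subject: "Jones likes the film."
    return f"{f['Experiencer']} likes {f['Theme']}."

def generate_french(f):
    # French "plaire" realizes the Theme as subject and marks the
    # Experiencer with "à": "Le film plaît à Jones."
    theme_fr = {"the film": "Le film"}.get(f["Theme"], f["Theme"])
    return f"{theme_fr} plaît à {f['Experiencer']}."

print(generate_english(frame))  # Jones likes the film.
print(generate_french(frame))   # Le film plaît à Jones.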

Syntax-to-Semantics Differences
Passive constructions example: French reflexive passives:
Ces livres se lisent facilement
*“These books read themselves easily”
These books are easily read
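
A possible, purely illustrative transfer treatment of this divergence is sketched below: a rule keyed on the reflexive clitic se maps the French reflexive passive onto an English passive with be + adverb + participle. The clause representation, the tiny glossary, and all names are assumptions, not the lecture's rule formalism.

# Illustrative rule for the French reflexive-passive divergence above:
# "Ces livres se lisent facilement" -> "These books are easily read".

GLOSS = {"ces": "These", "livres": "books", "lisent": "read", "facilement": "easily"}

def transfer_reflexive_passive(sl_clause):
    """sl_clause: {'det', 'subj', 'clitic', 'verb', 'adv'} parsed on the SL side."""
    if sl_clause.get("clitic") != "se":
        return None                         # rule does not trigger
    return "{} {} are {} {}".format(
        GLOSS[sl_clause["det"]], GLOSS[sl_clause["subj"]],
        GLOSS[sl_clause["adv"]], GLOSS[sl_clause["verb"]],
    )

print(transfer_reflexive_passive(
    {"det": "ces", "subj": "livres", "clitic": "se",
     "verb": "lisent", "adv": "facilement"}))   # These books are easily read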

Same intention, different syntax (romanization of Arabic from CallHome Egypt):
rigly bitiwgacny – my leg hurts
candy wagac fE rigly – I have pain in my leg
rigly bitiClimny – my leg hurts
fE wagac fE rigly – there is pain in my leg
rigly bitinqaH calya – my leg bothers on me

MT Handling of Syntax-to-Semantics Differences
Direct MT Approaches:
– No explicit treatment: phrasal lexicons and sentence-level matches or templates
Syntactic Transfer:
– “Lexicalized” Structural Transfer Grammars: a rule is triggered by matching against a “lexicalized” syntactic structure on the SL side (lexical and functional features); the rule specifies how to reorder and re-structure the syntactic constituents to reflect the syntax of the TL side
Semantic Transfer:
– The SL semantic representation abstracts away from SL syntax to functional roles → done during analysis
– TL Generation maps semantic structures to the correct TL syntax
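
The sketch below illustrates, hypothetically, what a “lexicalized” transfer rule might look like for the earlier structure-change example, I like swimming vs. Ich schwimme gern: the rule fires only on the lexical head like with a gerund complement, promotes the embedded verb to main verb, and contributes the adverb gern. The clause representation, the toy verb lexicon, and all names are assumptions rather than the lecture's rule formalism.

# "Lexicalized" structural-transfer rule for the "I like V-ing" ->
# German "Ich V-e gern" divergence. Toy lexicon and names are illustrative.

GERMAN_VERBS = {"swim": "schwimme", "read": "lese"}   # 1st person singular forms

def transfer_like_ving(sl_clause):
    """sl_clause: {'subj': ..., 'verb': 'like', 'comp': gerund verb stem}."""
    if sl_clause["verb"] != "like" or "comp" not in sl_clause:
        return None                           # rule does not trigger
    return {
        "subj": "Ich" if sl_clause["subj"] == "I" else sl_clause["subj"],
        "verb": GERMAN_VERBS.get(sl_clause["comp"], sl_clause["comp"]),
        "adv":  "gern",                       # carries the meaning of "like"
    }

tl = transfer_like_ving({"subj": "I", "verb": "like", "comp": "swim"})
print(" ".join([tl["subj"], tl["verb"], tl["adv"]]))   # Ich schwimme gern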

Idioms and Constructions
Main distinction: the meaning of the whole is not directly compositional from the meaning of its sub-parts → no compositional translation
Examples:
– George is a bull in a china shop
– He kicked the bucket
– Can you please open the window?

Formulaic Utterances (romanization of Arabic from CallHome Egypt):
Good night.
tisbaH cala xEr (lit: waking up on good)

Constructions
Identifying speaker intention rather than literal meaning for formulaic and task-oriented sentences:
– How about… : suggestion
– Why don’t you… : suggestion
– Could you tell me… : request info
– I was wondering… : request info

MT Handling of Constructions and Idioms
Direct MT Approaches:
– No explicit treatment: phrasal lexicons and sentence-level matches or templates
Syntactic Transfer:
– No effective treatment
– “Highly lexicalized” structural transfer rules can handle some constructions: a rule is triggered by matching against the entire construction, including its structure, on the SL side; the rule specifies how to generate the correct construction on the TL side
Semantic Transfer:
– Analysis must capture a non-compositional representation of the idiom or construction → specialized rules
– TL Generation maps construction semantic structures to the correct TL syntax and lexical words
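
As a rough sketch of the phrasal-lexicon idea for idioms, and of intention matching for constructions, the code below matches whole non-compositional units before any word-by-word transfer and tags formulaic sentence openers with a speaker intention. All entries, glosses, and names are illustrative assumptions.

# Phrasal-lexicon pass: whole non-compositional units are matched before
# word-by-word transfer; construction patterns map to speaker intentions.

IDIOM_LEXICON = {
    ("kick", "the", "bucket"): "sterben",   # illustrative German gloss: "to die"
    ("good", "night"):          "gute Nacht",
}

CONSTRUCTION_PATTERNS = [
    # (sentence-initial words, speaker intention rather than literal meaning)
    (("how", "about"),               "SUGGESTION"),
    (("could", "you", "tell", "me"), "REQUEST_INFO"),
]

def match_idioms(tokens):
    """Greedily replace known multi-word units; return (tokens, intentions)."""
    out, intentions, i = [], [], 0
    for prefix, intent in CONSTRUCTION_PATTERNS:
        if tuple(tokens[:len(prefix)]) == prefix:
            intentions.append(intent)
    while i < len(tokens):
        for phrase, tl in IDIOM_LEXICON.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.append(tl)
                i += len(phrase)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out, intentions

print(match_idioms(["he", "kick", "the", "bucket"]))   # (['he', 'sterben'], [])
print(match_idioms(["how", "about", "a", "walk"]))     # ([...], ['SUGGESTION'])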

Take-Home Messages
Remember these types of language divergences as you learn about and apply the various steps in the MT system pipelines of different approaches!
Ask yourself how capable these various steps and approaches are of addressing these types of divergences:
– Can the step/approach handle these divergences?
– If so, does it model the divergence at the appropriate level of abstraction?
Keep these language divergences in mind when you analyze the errors of the MT system that you have put together and trained:
– Are the errors attributable to a particular divergence?
– What would be required for the system to address this type of error?

Summary
Main challenges for current state-of-the-art MT approaches, coverage and accuracy:
– Acquiring broad-coverage, high-accuracy translation lexicons (for words and phrases)
– Learning syntactic mappings between languages from parallel word-aligned data
– Overcoming syntax-to-semantics differences and dealing with constructions
– Effective target-language modeling

Homework Assignment #

Questions…