MT with an Interlingua Lori Levin April 13, 2009.

Interlingua
“An interlingua is a notation for representing the content of a text that abstracts away from the characteristics of the language itself and focuses on the meaning (semantics) alone. Interlinguas are typically used as pivot representations in machine translation, allowing the contents of a source text to be generated in many different target languages. Due to the complexities involved, few interlinguas are more than demonstration prototypes, and only one has been used in a commercial MT system.”
–Dorr, Hovy, and Levin, “Machine Translation: Interlingual Methods,” Natural Language Processing and Machine Translation, Encyclopedia of Language and Linguistics, 2nd ed. (ELL2).
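To make the "pivot" idea concrete, here is a minimal sketch, assuming a toy interlingua that is just a dict of meaning features and a one-word vocabulary. The names `GREETING`, `make_analyzer`, `make_generator` and `translate` are hypothetical illustrations, not taken from any system in these slides. Each language contributes one analyzer (text to interlingua) and one generator (interlingua to text), and any source/target pair is served by chaining the two through the pivot.

```python
from typing import Callable, Dict

# Toy interlingua: a dict of language-neutral meaning features (assumption).
Interlingua = Dict[str, str]

# Hypothetical one-word lexicon per language (Korean is romanized).
GREETING = {"en": "hello", "it": "ciao", "ko": "annyeonghaseyo"}


def make_analyzer(lang: str) -> Callable[[str], Interlingua]:
    """Build the analyzer for one language: surface text -> interlingua."""
    def analyze(text: str) -> Interlingua:
        return {"act": "greet"} if text.lower() == GREETING[lang] else {"act": "unknown"}
    return analyze


def make_generator(lang: str) -> Callable[[Interlingua], str]:
    """Build the generator for one language: interlingua -> surface text."""
    def generate(meaning: Interlingua) -> str:
        return GREETING[lang] if meaning["act"] == "greet" else "<untranslatable>"
    return generate


# One analyzer and one generator per language; no pair-specific components.
ANALYZERS = {lang: make_analyzer(lang) for lang in GREETING}
GENERATORS = {lang: make_generator(lang) for lang in GREETING}


def translate(text: str, source: str, target: str) -> str:
    """Translate through the pivot: analyze the source, then generate the target."""
    return GENERATORS[target](ANALYZERS[source](text))
```

For example, `translate("Ciao", "it", "ko")` goes Italian to interlingua to Korean without any Italian-Korean component.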

KANT (Mitamura and Nyberg): “If the error persists, service is required.”
(*BE-PREDICATE
  (attribute (*REQUIRED (degree positive)))
  (mood declarative)
  (predicate-role attribute)
  (punctuation period)
  (qualification
    (*QUALIFYING-EVENT
      (event
        (*PERSIST
          (argument-class theme)
          (mood declarative)
          (tense present)
          (theme (*ERROR (number (:OR mass singular)) (reference definite)))))
      (extent (*CONJ-if))
      (topic +)))
  (tense present)
  (theme (*SERVICE (number (:OR mass singular)) (reference no-reference))))
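The KANT representation above is a nested s-expression in which concept frames are marked with a leading `*`. As a sketch of how such a structure can be read back into a tree, the hypothetical helpers below (`tokenize`, `parse`, `read`, `concept_heads`; not KANT's own code) parse the expression into nested Python lists and list the concepts in pre-order. Treating every `*`-prefixed atom as a concept is an assumption based only on this one example.

```python
import re
from typing import List, Union

SExpr = Union[str, List["SExpr"]]

KANT_IR = """
(*BE-PREDICATE
  (attribute (*REQUIRED (degree positive)))
  (mood declarative)
  (predicate-role attribute)
  (punctuation period)
  (qualification
    (*QUALIFYING-EVENT
      (event
        (*PERSIST
          (argument-class theme)
          (mood declarative)
          (tense present)
          (theme (*ERROR (number (:OR mass singular)) (reference definite)))))
      (extent (*CONJ-if))
      (topic +)))
  (tense present)
  (theme (*SERVICE (number (:OR mass singular)) (reference no-reference))))
"""


def tokenize(text: str) -> List[str]:
    """Parentheses are separate tokens; any other run of non-space characters is an atom."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens: List[str]) -> SExpr:
    """Consume tokens from the front and build one nested list (or atom)."""
    token = tokens.pop(0)
    if token == "(":
        node: List[SExpr] = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the matching ")"
        return node
    if token == ")":
        raise ValueError("unexpected ')'")
    return token


def read(text: str) -> SExpr:
    """Parse exactly one s-expression and reject trailing tokens."""
    tokens = tokenize(text)
    tree = parse(tokens)
    if tokens:
        raise ValueError("trailing tokens after expression")
    return tree


def concept_heads(node: SExpr) -> List[str]:
    """Collect atoms that start with '*' (concept names), in pre-order."""
    if isinstance(node, str):
        return [node] if node.startswith("*") else []
    return [head for child in node for head in concept_heads(child)]
```

Here `concept_heads(read(KANT_IR))` yields the seven concepts from `*BE-PREDICATE` down to `*SERVICE`; everything else (mood, tense, reference, and so on) is a feature on those concepts.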

NESPOLE! (Levin et al.)
 “I want to know what time the flight leaves Pittsburgh.”
 “What time does the flight leave Pittsburgh?”
Both sentences receive the same interlingua representation:
 request-information+departure (time = (clock = question), transportation-spec = (flight, id = yes), origin = name=Pittsburgh)
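The point of the NESPOLE! slide is that two different surface forms map to one representation. A minimal sketch, assuming the domain action is a speech act followed by `+`-joined concepts (as in `request-information+departure`), is below. The "analyzer" is a hypothetical lookup table standing in for a real parser, and the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InterchangeFormat:
    speech_act: str
    concepts: Tuple[str, ...]
    arguments: Tuple[Tuple[str, str], ...]  # (argument name, value) pairs, kept as text


def parse_domain_action(domain_action: str) -> Tuple[str, Tuple[str, ...]]:
    """Split 'speech-act+concept+...' into the speech act and its concepts."""
    speech_act, *concepts = domain_action.split("+")
    return speech_act, tuple(concepts)


# Hypothetical stand-in for an analyzer: both paraphrases from the slide.
PARAPHRASES = {
    "I want to know what time the flight leaves Pittsburgh.",
    "What time does the flight leave Pittsburgh?",
}


def analyze(utterance: str) -> InterchangeFormat:
    """Map either paraphrase to the single representation shown on the slide."""
    if utterance not in PARAPHRASES:
        raise ValueError(f"no analysis for: {utterance!r}")
    act, concepts = parse_domain_action("request-information+departure")
    args = (
        ("time", "clock = question"),
        ("transportation-spec", "flight, id = yes"),
        ("origin", "name=Pittsburgh"),
    )
    return InterchangeFormat(act, concepts, args)
```

Because the dataclass is frozen and compares by value, the two analyses are equal objects, so a generator for any target language sees no trace of which paraphrase was used.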

Not Just for MT anymore

Vauquois Triangle

Reasons for using an interlingua
N² vs. 2N
 –For all-ways translation between N languages, a transfer approach needs a separate component for each ordered language pair, N(N−1) in all, which grows as N². With an interlingua you need only an analyzer (L to interlingua) and a synthesizer (interlingua to L) for each language: 2N components.
Monolingual development teams
 –Each developer needs to know only his/her language and the interlingua.
 –NESPOLE! project: Italian-to-Korean translation worked as well as Italian-to-English, even though nobody on the team was bilingual in Korean and Italian.
 –The same may be true for SMT?
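The N² vs. 2N argument is simple arithmetic, sketched below under the assumption that a transfer system needs one component per ordered (source, target) pair and an interlingua system needs one analyzer plus one generator per language. The function names are illustrative.

```python
def transfer_components(n: int) -> int:
    """Transfer MT: one component per ordered (source, target) language pair."""
    return n * (n - 1)


def interlingua_components(n: int) -> int:
    """Interlingua MT: one analyzer and one generator per language."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, transfer_components(n), interlingua_components(n))
```

The two counts are equal at N = 3 (6 each); with 2 languages transfer is cheaper (2 vs. 4), and with 10 languages the interlingua needs 20 components against 90 for transfer.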

MT Divergences
Translating word-by-word, node-by-node, or dependency-by-dependency does not work:
 –Mi chiamo Lori (literally “I call myself Lori”) → My name is Lori
 –to be jealous → tener celos (to have jealousy)
 –to kick → dar una patada (give a kick)
 –to enter the house → entrar en la casa (enter in the house)
 –to run in → entrar corriendo (enter running)
 –meet someone / meet with someone
 –decide / make a decision
Which of these are handled well by phrase-based SMT or syntax-based SMT (with or without morphology, e.g. dar, doy)?
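The "run in / entrar corriendo" pair is a head-swapping divergence: English puts the manner of motion in the verb and the path in a particle, while Spanish puts the path in the verb and the manner in a gerund (the standard satellite-framed vs. verb-framed contrast). A node-by-node mapping cannot produce both, but generating from a language-neutral event can. The sketch below assumes a hypothetical `MotionIn` event (only the path "in" is modeled) and tiny hand-written lexicons; it is an illustration, not any system's generator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MotionIn:
    """Hypothetical language-neutral event: movement along the path IN."""
    manner: Optional[str] = None  # e.g. "RUN"
    goal: Optional[str] = None    # e.g. "HOUSE"


# Toy lexicons (assumptions for illustration only).
EN_MANNER_VERB = {"RUN": "run"}
ES_GERUND = {"RUN": "corriendo"}
EN_NOUN = {"HOUSE": "the house"}
ES_NOUN = {"HOUSE": "la casa"}


def generate_en(event: MotionIn) -> str:
    """English: manner in the main verb, path as particle/preposition."""
    if event.manner is None:
        words: List[str] = ["enter"]  # no manner: use the path verb
        if event.goal:
            words.append(EN_NOUN[event.goal])
    else:
        words = [EN_MANNER_VERB[event.manner], "into" if event.goal else "in"]
        if event.goal:
            words.append(EN_NOUN[event.goal])
    return " ".join(words)


def generate_es(event: MotionIn) -> str:
    """Spanish: path in the main verb, manner as a gerund adjunct."""
    words = ["entrar"]
    if event.manner:
        words.append(ES_GERUND[event.manner])
    if event.goal:
        words += ["en", ES_NOUN[event.goal]]
    return " ".join(words)
```

From `MotionIn(manner="RUN")` the two generators produce "run in" and "entrar corriendo"; from `MotionIn(goal="HOUSE")` they produce "enter the house" and "entrar en la casa", matching the examples above.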

Interlingua Example: Mikrokosmos
request-action-69
  agent             human-72
  theme             accept-70
  beneficiary       organization-71
  source-root-word  ask
  time              (< (find-anchor-time))
accept-70
  theme             war-73
  theme-of          request-action-69
  source-root-word  authorize
organization-71
  has-name          united-nations
  beneficiary-of    request-action-69
  source-root-word  UN
human-72
  has-name          colin powell
  agent-of          request-action-69
  source-root-word  he   ; ref. resolution has been carried out
war-73
  theme-of          accept-70
  source-root-word  war
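In the Mikrokosmos frames, every relation is stored in both directions: `request-action-69` has `agent human-72`, and `human-72` has `agent-of request-action-69`. The sketch below encodes the frames as dicts and checks that each `agent`/`theme`/`beneficiary` link has its `-of` inverse and vice versa. The checker and the `RELATIONS` list are hypothetical, inferred only from this example; the `time` slot is omitted for simplicity.

```python
from typing import Dict, List, Tuple

Frames = Dict[str, Dict[str, str]]

# The slide's frames (time slot of request-action-69 omitted).
TMR: Frames = {
    "request-action-69": {"agent": "human-72", "theme": "accept-70",
                          "beneficiary": "organization-71", "source-root-word": "ask"},
    "accept-70": {"theme": "war-73", "theme-of": "request-action-69",
                  "source-root-word": "authorize"},
    "organization-71": {"has-name": "united-nations", "beneficiary-of": "request-action-69",
                        "source-root-word": "UN"},
    "human-72": {"has-name": "colin powell", "agent-of": "request-action-69",
                 "source-root-word": "he"},
    "war-73": {"theme-of": "accept-70", "source-root-word": "war"},
}

# Relations assumed to have an inverse named "<relation>-of".
RELATIONS = ("agent", "theme", "beneficiary")


def missing_inverses(frames: Frames) -> List[Tuple[str, str, str]]:
    """Return (frame, slot, filler) triples whose inverse link is absent or wrong."""
    problems = []
    for name, slots in frames.items():
        for slot, filler in slots.items():
            if slot in RELATIONS:
                inverse = slot + "-of"
            elif slot.endswith("-of") and slot[:-3] in RELATIONS:
                inverse = slot[:-3]
            else:
                continue  # attributes such as has-name have no inverse here
            if frames.get(filler, {}).get(inverse) != name:
                problems.append((name, slot, filler))
    return problems
```

On the frames as given, the check finds no problems; deleting one direction of a link (for example `theme-of` on `war-73`) makes it report the orphaned `theme` slot on `accept-70`.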

Interlingua Example: Lexical Conceptual Structure
(event cause
  (thing[agent] reporter+)
  (go loc
    (thing[theme] +)
    (path to loc
      (thing +)
      (position at loc
        (thing +)
        (thing[goal] aljazeera+)))
    (manner send+ingly)))
Figure 10: LCS Representation of The reporter ed Al-Jazeera
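In the LCS above, the thematic roles are the bracketed labels on `thing` nodes (`thing[agent]`, `thing[theme]`, `thing[goal]`), nested inside primitives such as `cause`, `go`, `path` and `position`. As a sketch, the structure can be written as nested Python lists and the roles pulled out with a recursive walk. The list encoding and `thematic_roles` are illustrative assumptions; fillers such as `reporter+` are copied verbatim without interpreting the `+` marker.

```python
import re
from typing import Dict, List, Union

LCSNode = Union[str, List["LCSNode"]]

# The slide's LCS, one Python list per parenthesized node.
LCS: LCSNode = [
    "event", "cause",
    ["thing[agent]", "reporter+"],
    ["go", "loc",
     ["thing[theme]", "+"],
     ["path", "to", "loc",
      ["thing", "+"],
      ["position", "at", "loc",
       ["thing", "+"],
       ["thing[goal]", "aljazeera+"]]],
     ["manner", "send+ingly"]],
]

ROLE_LABEL = re.compile(r"^thing\[(\w+)\]$")


def thematic_roles(node: LCSNode) -> Dict[str, str]:
    """Map each role label (agent, theme, goal, ...) to its filler, searching depth-first."""
    roles: Dict[str, str] = {}
    if isinstance(node, list) and node:
        head = node[0]
        match = ROLE_LABEL.match(head) if isinstance(head, str) else None
        if match and len(node) > 1 and isinstance(node[1], str):
            roles[match.group(1)] = node[1]
        for child in node[1:]:
            roles.update(thematic_roles(child))
    return roles
```

Applied to `LCS`, this returns the agent `reporter+`, the theme `+`, and the goal `aljazeera+`; unlabeled `thing` nodes inside `path` and `position` are skipped.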

Issues in Interlingua design
 –Grain size of meaning
 –Domain specificity of meaning
 –Ambiguity
 –Lack of agreement among humans
From the EACL 2009 workshop:
 –Russell-Frege: Meaning can be broken down into pieces that combine logically.
 –Wittgenstein-Quine: Meaning = use. Use is represented by a corpus.

Interlingua: annotated corpora
Many annotated corpora can be considered part of an interlingua:
 –Named entities and co-reference
 –Semantic roles
 –Temporal expressions
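One way to see annotated corpora as pieces of an interlingua is to stack their layers over the same text. The sketch below uses standoff annotation: each layer is a list of token spans with labels, and `labels_at` collects every layer's labels for one token. The sentence, the layer names, and the role labels (`mover`, `source`, `time`, for the predicate "leaves" only) are hypothetical illustrations, not taken from any particular annotation scheme.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Layer:
    name: str
    spans: List[Tuple[int, int, str]]  # (start token, end token exclusive, label)


# Hypothetical example text, tokenized.
TOKENS = ["The", "flight", "leaves", "Pittsburgh", "at", "noon", ".",
          "It", "arrives", "late", "."]

LAYERS = [
    Layer("named-entities", [(3, 4, "LOCATION")]),
    Layer("coreference", [(0, 2, "chain-1"), (7, 8, "chain-1")]),   # "The flight" ... "It"
    Layer("semantic-roles", [(0, 2, "mover"), (3, 4, "source"), (4, 6, "time")]),
    Layer("temporal-expressions", [(5, 6, "TIME")]),
]


def labels_at(layers: List[Layer], i: int) -> Dict[str, List[str]]:
    """Collect, per layer, the labels of all spans covering token i."""
    result: Dict[str, List[str]] = {}
    for layer in layers:
        labels = [label for start, end, label in layer.spans if start <= i < end]
        if labels:
            result[layer.name] = labels
    return result
```

For token 3 ("Pittsburgh") the layers contribute a named-entity label and a semantic role; token 5 ("noon") gets a role and a temporal tag. Keeping layers separate lets each corpus be annotated independently and combined afterwards.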

IAMTC: Interlingual Annotation of Multilingual Text Corpora
14 PIs. One year ( ). Still publishing. See other set of slides.

Elicitation Corpus
3,000 feature structures, with an English sentence for each one. The LDC translated the English sentences into 13 languages, and a few other sites translated them into a few more languages.

SCALE 2009: MT and HIVEs
High Information Value Elements (HIVEs): named entities, negation, modality
Urdu to English
Modality:
 –H firmly believes [R is true/false]
 –H believes [R may be true/false]
 –H requires [R to be true/false]
 –H permits [R to be true/false]
 –H intends [to make R true/false]
 –H does not intend [to make R true/false]
 –H is trying [to make R true/false]
 –H is able [to make R true/false and succeeds]
 –H is able [to make R true/false and fails]
 –H is able [to make R true/false]
 –H wants [R to be true/false]
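The modality list reads like a small annotation schema: a holder H, a proposition R, one of eleven modalities, and a target truth value. The sketch below encodes the slide's templates as data and renders a gloss for one annotation. The enum member names and `ModalityAnnotation` are hypothetical labels chosen for illustration, not the official tag names of the SCALE 2009 scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    # Templates copied from the slide; {r} = proposition, {v} = "true"/"false".
    FIRM_BELIEF = "firmly believes [{r} is {v}]"
    BELIEF = "believes [{r} may be {v}]"
    REQUIREMENT = "requires [{r} to be {v}]"
    PERMISSION = "permits [{r} to be {v}]"
    INTENTION = "intends [to make {r} {v}]"
    NON_INTENTION = "does not intend [to make {r} {v}]"
    EFFORT = "is trying [to make {r} {v}]"
    SUCCESS = "is able [to make {r} {v} and succeeds]"
    FAILURE = "is able [to make {r} {v} and fails]"
    ABILITY = "is able [to make {r} {v}]"
    WANT = "wants [{r} to be {v}]"


@dataclass(frozen=True)
class ModalityAnnotation:
    holder: str
    modality: Modality
    proposition: str
    target_true: bool

    def gloss(self) -> str:
        """Render the annotation with the slide's template for its modality."""
        value = "true" if self.target_true else "false"
        return f"{self.holder} " + self.modality.value.format(r=self.proposition, v=value)
```

For instance, `ModalityAnnotation("H", Modality.REQUIREMENT, "R", False).gloss()` renders "H requires [R to be false]". Storing the truth value separately from the modality keeps the eleven templates from doubling into true/false variants.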