Example-Based Machine Translation
Kevin Duh
April 21, 2005
UW Machine Translation Reading Group


1 Outline
1. Basics of Example-based MT
2. System example: ATR system (1991): E. Sumita & H. Iida paper
3. General EBMT issues
4. Different flavors of EBMT
5. Connections to Statistical MT, Rule-based MT, Speech synthesis, Case-based Reasoning…

2 EBMT Basic Philosophy
"Man does not translate a simple sentence by doing deep linguistic analysis, rather, man does translation, first, by properly decomposing an input sentence into certain fragmental phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase will be done by the analogy translation principle with proper examples as its reference."
-- Nagao (1984)

3 Example run of EBMT
Input: He buys a book on international politics
Examples:
1. He buys a notebook -- Kare wa noto o kau
2. I read a book on international politics -- Watashi wa kokusai seiji nitsuite kakareta hon o yomu
Output: Kare wa kokusai seiji nitsuite kakareta hon o kau

3 main components:
1. Matching input to a database of real examples
2. Identifying corresponding translation fragments
3. Recombining fragments into target text
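The three components above can be sketched in a few lines. This is a toy illustration, not the ATR system: the fragment inventory and the "X" placeholder convention are invented for clarity, and a real system would induce the fragment alignments from the corpus rather than list them by hand.

```python
# Toy sketch of the three EBMT steps on the example above.
# The fragment table is hand-aligned; "X" marks an open slot.
FRAGMENTS = {
    "he buys X": "kare wa X o kau",
    "a book on international politics": "kokusai seiji nitsuite kakareta hon",
}

def translate(sentence):
    # 1. Matching: recognise that the input decomposes into known fragments.
    head, obj = "he buys X", "a book on international politics"
    assert sentence == head.replace("X", obj)
    # 2. Alignment: look up the target side of each matched fragment.
    frame, filler = FRAGMENTS[head], FRAGMENTS[obj]
    # 3. Recombination: splice the fragment translations together.
    return frame.replace("X", filler)

print(translate("he buys a book on international politics"))
# -> kare wa kokusai seiji nitsuite kakareta hon o kau
```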

4 EBMT "Pyramid"
[Figure: a Vauquois-style pyramid. MATCHING (analysis) climbs the source side, ALIGNMENT (transfer) crosses between source and target, RECOMBINATION (generation) descends the target side; an exact match passes straight along the base (direct translation).]
From: H. Somers, 2003, "An Overview of EBMT," in Recent Advances in Example-Based Machine Translation (ed. M. Carl, A. Way), Kluwer

5 ATR System (1991)
"Experiments and Prospects of Example-based Machine Translation," Eiichiro Sumita and Hitoshi Iida. In 29th Annual Meeting of the Association for Computational Linguistics, 1991.
Overview:
1. Japanese-English translation: the "N1 no N2" problem
2. When EBMT is better suited than rule-based MT
3. EBMT in action: distance calculation, etc.
4. Evaluation

6 Translating "N1 no N2"
"no" の is an adnominal particle. Variants: "deno" での, "madeno" までの, etc.
"Noun no Noun" => "Noun of Noun"
Youka no gogo -- The afternoon of the 8th
Kaigi no mokuteki -- The objective of the conference
Kaigi no sankaryou -- The application fee for the conference
Kyouto deno kaigi -- The conference in Kyoto
Isshukan no kyuka -- A week's holiday
Mittsu no hoteru -- Three hotels

7 Difficult linguistic phenomena
It is difficult to hand-craft linguistic rules for the "N1 no N2" translation phenomenon: it requires deep semantic analysis of each word.
Other difficult phenomena:
- Optional case particles ("-de", "-ni")
- Sentences lacking a main verb ("-onegaishimasu")
- Fragmental expressions ("hai", "soudesu")
- "N1 wa N2 da" ("N1 be N2")
- Spanish "de"
- German compound nouns

8 When EBMT works better than rule-based MT
1. The translation rule is difficult to formulate
2. A general rule cannot accurately describe the phenomena due to special cases (e.g. idioms)
3. Translation cannot be made in a compositional way using target words
4. The sentence to be translated has a close match in the database
Question: when does Statistical MT work well?

9 EBMT in action
Required resources:
- Sentence-aligned parallel corpora
- (Hierarchical) thesaurus for calculating the semantic distance between content words of the input and the example sentences
Distance calculation: input and example sentences (I, E) are represented by a set of attributes.
For "N1 no N2":
- For N1/N2: lexical subcategory of the noun, existence of prefix/suffix, semantic class in the thesaurus
- For "no": "no", "deno", "madeno" binary variables
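The distance calculation can be sketched as a weighted sum of per-attribute distances, normalised by the total weight. All concrete values below (attribute names, weights, thesaurus distances) are invented toy numbers; the real system reads semantic distances off a hierarchical thesaurus and derives weights from corpus statistics.

```python
# Toy sketch of the attribute distance between input I and example E
# for the "N1 no N2" pattern. Values are invented for illustration.
THESAURUS_DIST = {("conference", "meeting"): 0.25}  # assumed toy distance

def d_attr(a, b):
    """Per-attribute distance: 0 if equal, thesaurus distance if known, else 1."""
    if a == b:
        return 0.0
    return THESAURUS_DIST.get((a, b), THESAURUS_DIST.get((b, a), 1.0))

def distance(input_attrs, example_attrs, weights):
    """Weighted sum of attribute distances, normalised by total weight."""
    total = sum(w * d_attr(input_attrs[k], example_attrs[k])
                for k, w in weights.items())
    return total / sum(weights.values())

I = {"N1": "conference", "particle": "no",   "N2": "meeting"}
E = {"N1": "conference", "particle": "deno", "N2": "conference"}
w = {"N1": 2.0, "particle": 1.0, "N2": 2.0}
print(distance(I, E, w))  # (2*0 + 1*1 + 2*0.25) / 5 = 0.3
```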

10 EBMT in action: Attribute distance & weight
Attribute distance:
- For "no": d("no", "deno") = 1, d("no", "no") = 0
- For Noun1, Noun2: use the thesaurus
Weight for each attribute (translation-pattern frequencies per attribute value):
- Timei 地名 [place]: B in A (freq: 12/27), A B (4/27), B from A (2/27), B A (2/27), B to A (1/27)
- Deno での [in]: B in A (3/3)
- Soudan 相談 [meeting]: B (9/24), A's B (1/24), …, B on A (1/24)

11 Evaluation
Corpus: conversations about conference registration; 3000 words, 2550 examples.
Jackknife evaluation: average success rate 78% (min 70%, max 89%).
- Success rate improves as examples are added
- Success rate for low-distance sentences is higher
Failures due to:
- Lack of similar examples
- Retrieval of dissimilar examples due to the current distance metric
In practice: EBMT is used as a subsystem within rule-based MT to handle special cases like "N1 no N2".
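The jackknife setup amounts to a leave-one-out loop: each example is held out in turn and the system is run on the remainder. The `translate` and `is_correct` callables below are hypothetical stand-ins, since the slide does not give the actual scoring; the demo "translates" by majority vote just to make the loop runnable.

```python
from collections import Counter

def jackknife_success_rate(examples, translate, is_correct):
    """Leave-one-out evaluation: hold out each example, run on the rest."""
    hits = 0
    for i, held_out in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        hits += is_correct(translate(held_out, rest), held_out)
    return hits / len(examples)

# Toy demo: "translate" by majority vote over the remaining examples.
data = ["of", "of", "in", "of"]
guess_majority = lambda ex, rest: Counter(rest).most_common(1)[0][0]
rate = jackknife_success_rate(data, guess_majority, lambda out, ref: out == ref)
print(rate)  # 0.75 -- only the minority item "in" is missed
```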

12 General EBMT Issues: Granularity for locating matches
Sentence or sub-sentence?
Sentence:
- Better quality translation
- Boundaries are easy to determine
- Harder to find a match
Sub-sentence:
- Studies suggest this is how humans translate
- "Boundary friction":
  The handsome boy ate his breakfast -- Der schöne Junge aß sein Frühstück
  I saw the handsome boy -- Ich sah den schönen Jungen
The following slides are based on: H. Somers, 2003, "An Overview of EBMT," in Recent Advances in Example-Based Machine Translation (ed. M. Carl, A. Way), Kluwer

13 General EBMT Issues: Suitability of Examples
Some EBMT systems do not use the raw corpus directly, but use manually-constructed examples or a carefully-filtered set of real-world examples.
Real-world examples may contain:
- Examples that mutually reinforce each other (overgeneration)
- Examples that conflict
- Examples that mislead the distance metric:
  Watashi wa kompyuta o kyoyosuru -- I share the use of a computer
  Watashi wa kuruma o tsukau -- I use a car
  Watashi wa dentaku o shiyosuru --> * I share the use of a calculator

14 General EBMT Issues: Matching
String matching / IR-style matching:
- "This is shown as A in the diagram" = "This is shown as B in the diagram"
- "The large paper tray holds 400 sheets of paper" =? "The small paper tray holds 300 sheets of paper"
Matching by meaning: use a thesaurus and a distance based on semantic similarity.
Matching by structure: tree edit distance, etc.
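As a minimal illustration of IR-style matching, a simple word-overlap score (the Dice coefficient, chosen arbitrarily here) already ranks the paper-tray pair from the slide well above an unrelated example:

```python
def dice(a, b):
    """Dice coefficient over word sets: 2|A∩B| / (|A| + |B|)."""
    A, B = set(a.split()), set(b.split())
    return 2 * len(A & B) / (len(A) + len(B))

db = [
    "this is shown as a in the diagram",
    "the large paper tray holds 400 sheets of paper",
]
query = "the small paper tray holds 300 sheets of paper"
best = max(db, key=lambda e: dice(query, e))
print(best)   # the paper-tray example ranks highest
print(dice(query, best))  # 0.75
```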

15 General EBMT Issues: Alignment and Recombination
Once a set of examples close to the input is found, we need to:
1. Identify which portion of the associated translation corresponds to the input (alignment)
2. Stitch together these fragments to create smooth output (recombination)
Some interesting solutions:
- "Adaptation-guided retrieval": scores an example based on both sentence similarity and ability to align/recombine well
- Statistical language model solution: Pangloss EBMT (R.D. Brown, 1996)
- Post-processing (cf. the "der schöne Junge" example)

16 Flavors of EBMT
Different uses of EBMT:
- Fully automatic translation system
- System component for handling difficult cases (ATR)
- EBMT as one engine in multi-engine MT (Pangloss)
Different approaches to EBMT:
- Run-time approach: sub-sentential alignment of the input sentence and mapping onto translation examples are computed at run-time (translation knowledge lies implicit in the corpus)
- "Compile-time" approach: explicitly extracts translation patterns from corpora
  - Template-driven: uses taggers, analyzers, etc.
  - Structural representations: abstracts sentences into a more structured representation, such as LF or a dependency tree

17 Connections to other areas
- Statistical MT
- Rule-based MT
- Unit Selection Speech Synthesis
- Case-based Reasoning
What do you think?