
Wrapper Syntax for Example-Based Machine Translation Karolina Owczarzak, Bart Mellebeek, Declan Groves, Josef Van Genabith, Andy Way National Centre for Language Technology School of Computing Dublin City University

Overview

TransBooster – wrapper technology for MT
 – motivation
 – decomposition process
 – variables and template contexts
 – recomposition
Example-Based Machine Translation
 – marker-based EBMT
Experiment
 – English-Spanish
 – Europarl, Wall Street Journal section of the Penn II Treebank
 – automatic and manual evaluation
Comparison with previous experiments

TransBooster – wrapper technology for MT

Assumption: MT systems perform better at translating short sentences than long ones.
Decompose long sentences into shorter, syntactically simpler chunks; send the chunks off for translation; recompose the translations on output.
Decomposition is linguistically guided by a syntactic parse of the sentence.

TransBooster – wrapper technology for MT

TransBooster technology is universal and can be applied to any MT system.
Experiments to date:
 – TB and Rule-Based MT (Mellebeek et al., 2005a,b)
 – TB and Statistical MT (Mellebeek et al., 2006a)
 – TB and Multi-Engine MT (Mellebeek et al., 2006b)
TransBooster outperforms the baseline MT systems.

TransBooster – decomposition

Input: a syntactically parsed sentence (Penn II format)
Decompose the sentence into a pivot and satellites
 – pivot: usually the main predicate (plus additional material)
 – satellites: arguments and adjuncts
Recursively decompose any satellite longer than x leaves
Replace the satellites around the pivot with variables
 – static: simple same-type phrases with known translations
 – dynamic: simplified versions of the original satellites
 – send off for translation
Insert each satellite into a template context
 – static: a simple predicate with a known translation
 – dynamic: a simpler version of the original clause (pivot + simplified arguments, no adjuncts)
 – send off for translation
(a code sketch follows the worked example below)

TransBooster – decomposition example

(S (NP (NP (DT the) (NN chairman)) (, ,)
      (NP (NP (DT a) (JJ long-time) (NN rival))
          (PP (IN of) (NP (NNP Bill) (NNP Gates)))) (, ,))
   (VP (VBZ likes)
       (NP (ADJP (JJ fast) (CC and) (JJ confidential)) (NNS deals)))
   (. .))

[The chairman, a long-time rival of Bill Gates,] ARG1 [likes] pivot [fast and confidential deals] ARG2.

Strings sent off to the MT engine:
[The chairman] V1 [likes] pivot [deals] V2.
[The chairman, a long-time rival of Bill Gates,] ARG1 [likes deals] V1.
[The chairman likes] V1 [fast and confidential deals] ARG2.
[The man] V1 [likes] pivot [cars] V2.
[The chairman, a long-time rival of Bill Gates,] ARG1 [is sleeping] V1.
[The man sees] V1 [fast and confidential deals] ARG2.
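
To make the decomposition step concrete, here is a minimal, self-contained Python sketch of the idea, not the authors' implementation: trees are nested (label, children) tuples with (tag, word) leaves, the pivot is taken to be just the VP's verb, the placeholder phrases in STATIC_NPS and the single "the man sees" context are assumptions chosen to mirror the slide, and recursion into long satellites and dynamic variables are omitted.

    # Illustrative sketch of TransBooster-style decomposition (simplified).
    STATIC_NPS = ["the man", "cars"]   # hypothetical static variables with known translations

    def leaves(node):
        _, rest = node
        if isinstance(rest, str):                       # (tag, word) leaf
            return [rest]
        return [w for child in rest for w in leaves(child)]

    def decompose(s_tree):
        """Flatten an S tree into ordered (role, text) pieces: the VP's verb
        is the pivot; the subject and the verb's arguments are satellites."""
        pieces = []
        for child in s_tree[1]:
            if child[0] == "VP":
                verb, *args = child[1]
                pieces.append(("pivot", " ".join(leaves(verb))))
                pieces += [("sat", " ".join(leaves(a))) for a in args]
            elif child[0] not in {",", "."}:
                pieces.append(("sat", " ".join(leaves(child))))
        return pieces

    def requests(pieces):
        """Strings sent off for translation: a skeleton with each satellite
        replaced by a static variable, plus each satellite in a static context."""
        vars_iter = iter(STATIC_NPS)
        out = [" ".join(next(vars_iter) if role == "sat" else text
                        for role, text in pieces)]
        out += ["the man sees " + text for role, text in pieces if role == "sat"]
        return out

    tree = ("S", [("NP", [("DT", "the"), ("NN", "chairman")]),
                  ("VP", [("VBZ", "likes"),
                          ("NP", [("JJ", "fast"), ("CC", "and"),
                                  ("JJ", "confidential"), ("NNS", "deals")])])])
    print(requests(decompose(tree)))
    # ['the man likes cars', 'the man sees the chairman',
    #  'the man sees fast and confidential deals']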

TransBooster – recomposition

MT output: a set of translations, with dynamic and static variables and contexts, for a sentence S
Remove the translations of the dynamic variables and contexts from the translation of S
If that is unsuccessful, back off to the translations with static variables and contexts, and remove those
Recombine the translated pivot and satellites into the output sentence
(see the sketch after the example below)

TransBooster – recomposition example

[The chairman] V1 [likes] pivot [deals] V2. -> El presidente tiene gusto de repartos.
[The chairman, a long-time rival of Bill Gates,] ARG1 [likes deals] V1. -> El presidente, un rival de largo plazo de Bill Gates, tiene gusto de repartos.
[The chairman likes] V1 [fast and confidential deals] ARG2. -> El presidente tiene gusto de repartos rápidos y confidenciales.
[The man] V1 [likes] pivot [cars] V2. -> El hombre tiene gusto de automóviles.
[The chairman, a long-time rival of Bill Gates,] ARG1 [is sleeping] V1. -> El presidente, un rival de largo plazo de Bill Gates, está durmiendo.
[The man sees] V1 [fast and confidential deals] ARG2. -> El hombre ve repartos rápidos y confidenciales.

Recomposed output:
[El presidente, un rival de largo plazo de Bill Gates,] [tiene gusto de] [repartos rápidos y confidenciales].

Original (unboosted) translation:
El presidente, rival de largo plazo de Bill Gates, gustos ayuna y los repartos confidenciales.

Source: The chairman, a long-time rival of Bill Gates, likes fast and confidential deals.
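
A minimal sketch of the recomposition idea, again illustrative rather than the authors' code. It assumes the engine's translations of the static variables and of the static context are known in advance, so they can be located in the returned strings and stripped out; plain string replacement stands in for whatever alignment-based extraction the real system uses, and the example strings are a shortened, hypothetical version of the slide's output.

    def recompose(skeleton_trans, sats_in_context, var_trans, context_trans):
        """Strip the known context translation from each satellite translation,
        then substitute the result for the matching variable translation in
        the translated skeleton."""
        result = skeleton_trans
        for var, sat in zip(var_trans, sats_in_context):
            sat_only = sat.replace(context_trans, "").strip(" .")
            result = result.replace(var, sat_only, 1)
        return result

    # Shortened version of the slide's example (hypothetical engine output):
    print(recompose(
        "el hombre tiene gusto de automóviles.",               # translated skeleton
        ["el hombre ve el presidente.",                         # satellites in context
         "el hombre ve repartos rápidos y confidenciales."],
        ["el hombre", "automóviles"],                           # variable translations
        "el hombre ve"))                                        # context translation
    # -> el presidente tiene gusto de repartos rápidos y confidenciales.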

EBMT – Overview

An aligned bilingual corpus; input text is matched against this corpus; the best match is found and a translation is produced.

[diagram: an input EX is matched against the English side (E1–E4) of the aligned corpus; the corresponding French entries (F2, F4) yield the output FX]

Given in corpus:
John went to school -> Jean est allé à l'école
The butcher's is next to the baker's -> La boucherie est à côté de la boulangerie

Isolate useful fragments:
John went to -> Jean est allé à
the baker's -> la boulangerie

We can now translate:
John went to the baker's -> Jean est allé à la boulangerie
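
A toy sketch of the matching idea above: the fragment table is the slide's own example, while the greedy longest-match cover is an illustrative simplification of how EBMT reuses fragments (real systems score competing matches and recombinations rather than taking the first cover).

    # Fragments isolated from the aligned corpus (slide's example):
    FRAGMENTS = {
        "john went to": "jean est allé à",
        "the baker's": "la boulangerie",
    }

    def ebmt_translate(sentence):
        """Cover the input with the longest matching fragments, left to right."""
        words, out, i = sentence.lower().split(), [], 0
        while i < len(words):
            for j in range(len(words), i, -1):            # longest match first
                src = " ".join(words[i:j])
                if src in FRAGMENTS:
                    out.append(FRAGMENTS[src]); i = j; break
            else:
                out.append(words[i]); i += 1              # pass unknown words through
        return " ".join(out)

    print(ebmt_translate("John went to the baker's"))
    # -> jean est allé à la boulangerie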

EBMT – Marker-Based Chunking

<DET> = {the, a, these, …}    <DET> = {le, la, l', une, un, ces, …}
<PREP> = {on, of, …}          <PREP> = {sur, d', …}

English phrase: on virtually all uses of asbestos
French translation: sur virtuellement tous usages d'asbeste

Marker chunks:
on virtually : sur virtuellement
all uses : tous usages
of asbestos : d'asbeste

Lexical chunks:
on : sur
virtually : virtuellement
all : tous
uses : usages
of : d'
asbestos : asbeste
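
A small sketch of the chunking step, assuming the usual marker-based segmentation rule (open a new chunk at every marker word). The marker lists here are tiny samples, and tagging "all" as a quantifier marker is an assumption made so the output matches the slide's chunks.

    # Tiny sample marker sets; real systems use full closed-class word lists.
    MARKERS = {
        "the": "<DET>", "a": "<DET>", "these": "<DET>",
        "on": "<PREP>", "of": "<PREP>",
        "all": "<QUANT>",            # assumed quantifier marker for this example
    }

    def marker_chunks(sentence):
        """Start a new chunk at every marker word; attach other words to
        the current chunk."""
        chunks = []
        for word in sentence.lower().split():
            if word in MARKERS or not chunks:
                chunks.append([MARKERS.get(word, ""), word])
            else:
                chunks[-1].append(word)
        return [" ".join(filter(None, c)) for c in chunks]

    print(marker_chunks("on virtually all uses of asbestos"))
    # -> ['<PREP> on virtually', '<QUANT> all uses', '<PREP> of asbestos']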

EBMT – System Overview

Experiment

English -> Spanish
Two test sets:
 – Wall Street Journal section of the Penn II Treebank: 800 sentences
 – Europarl: 800 sentences
"Out-of-domain" factor:
 – TransBooster was developed on perfect Penn II trees
 – EBMT was trained on 958K English-Spanish Europarl sentences

Experiment – Results: automatic evaluation

EBMT vs TransBooster on the 741-sentence test set from Europarl:

Europarl              BLEU     NIST
EBMT                  –        –
TransBooster          –        –
Percent of baseline   101%     100.2%

EBMT vs TransBooster on the 800-sentence test set from the Penn II Treebank:

Wall Street Journal   BLEU     NIST
EBMT                  –        –
TransBooster          –        –
Percent of baseline   103.8%   100.5%
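
The "percent of baseline" rows are simply the TransBooster score divided by the baseline score; a quick check with hypothetical BLEU values (the slide's absolute scores are not reproduced here):

    # Hypothetical scores purely to illustrate how "percent of baseline" is read.
    ebmt_bleu, tb_bleu = 0.200, 0.202                        # assumed numbers
    print(f"{100 * tb_bleu / ebmt_bleu:.1f}% of baseline")   # -> 101.0% of baseline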

Experiment – Results: manual evaluation

100 randomly selected sentences from the Europarl test set:
 – source English sentence
 – EBMT translation
 – EBMT + TransBooster translation
3 judges, native speakers of Spanish fluent in English
Accuracy and fluency: relative scale comparing the two translations
Inter-judge agreement (Kappa): Fluency > 0.948, Accuracy > …

              Fluency   Accuracy
TB > EBMT     35.33%    35%
EBMT > TB     16%       19.33%

Absolute quality gain when using TransBooster:
 – Fluency: 19.33% of sentences
 – Accuracy: 15.67% of sentences

Experiment – Results

TB improvements:

Example 1
Source: women have decided that they wish to work, that they wish to make their work compatible with their family life.
EBMT: hemos decidido su deseo de trabajar, su deseo de hacer su trabajo compatible con su vida familiar. empresarias
TB: mujeres han decidido su deseo de trabajar, su deseo de hacer su trabajo compatible con su vida familiar.

Example 2
Source: if this global warming continues, then part of the territory of the eu member states will become sea or desert.
EBMT: si esto continúa calentamiento global, tanto dentro del territorio de los estados miembros tendrán tornarse altamar o desértico
TB: si esto calentamiento global perdurará, entonces parte del territorio de los estados miembros de la unión europea tendrán tornarse altamar o desértico

Previous experiments

TB vs. SMT on the 800-sentence test set from Europarl:
EP                    BLEU     NIST
SMT                   –        –
TransBooster          –        –
% of baseline         103.3%   100.6%

TB vs. EBMT on the 800-sentence test set from Europarl:
EP                    BLEU     NIST
EBMT                  –        –
TransBooster          –        –
% of baseline         101%     100.2%

TB vs. RBMT on the 800-sentence test set from the Penn II Treebank:
WSJ                   BLEU     NIST
Rule-Based MT         –        –
TransBooster          –        –
% of baseline         101.7%   100.6%

TB vs. SMT on the 800-sentence test set from the Penn II Treebank:
WSJ                   BLEU     NIST
SMT                   –        –
TransBooster          –        –
% of baseline         102.7%   99.7%

TB vs. EBMT on the 800-sentence test set from the Penn II Treebank:
WSJ                   BLEU     NIST
EBMT                  –        –
TransBooster          –        –
% of baseline         103.8%   100.5%


Summary

TransBooster is a universal technology for decomposing MT input and recomposing MT output
Net improvement in translation quality over EBMT:
 – Fluency: 19.33% of sentences
 – Accuracy: 15.67% of sentences
Successful experiments to date: rule-based MT, phrase-based SMT, multi-engine MT, EBMT
Journal article in preparation

Thank You