Presentation transcript:

Evaluating Syntax-Driven Approaches to Phrase Extraction for Machine Translation
Ankit Srivastava (CNGL, DCU), Sergio Penkale (CNGL, DCU), Declan Groves (Traslán), John Tinsley (NCLT, DCU)
3rd International Workshop on Example-Based Machine Translation, 12-13 November 2009, Dublin, Ireland

what this talk is about
- EBMT-influenced data sources used in a PB-SMT model (Moses): the Marker Hypothesis and parallel treebanks
- Lessons learned from work carried out over a number of years at DCU
- Focus on techniques for supplementing Moses phrases with syntactically motivated phrases (PB-SMT, EBMT)
- Relating this work to the theme of the workshop
What this talk is not about
- The phrase extraction methods themselves are only briefly touched on; the focus is on combining them and using them in PB-SMT
First, a brief overview of the different approaches to phrase extraction in training MT.

pb-smt system
- Moses framework [Koehn et al., 2007]
- Translation model:
  - Heuristics-based phrase extraction from bidirectional word alignments
  - Syntactically motivated phrase extraction: marker / treebank
[Pipeline diagram: source → translation model / language model → tuning → decoding (Moses) → post-processing & evaluation → target]
We explore alternative ways of extracting phrases for the translation model and then combine them with the PB-SMT (baseline) phrases. The next slide covers an example of the Moses way of extracting phrases.

moses phrases: an example
Official journal of the European Communities
Journal officiel des Communautés européennes
Official journal ↔ Journal officiel
Official journal of ↔ Journal officiel des
Official journal of the European Communities ↔ Journal officiel des Communautés européennes
of ↔ des
of the European Communities ↔ des Communautés européennes
the European Communities ↔ Communautés européennes
European ↔ européennes
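The baseline heuristic is the standard one of keeping every phrase pair that is consistent with the bidirectional word alignment. A minimal sketch of that consistency check (illustrative Python, not the actual Moses extractor; it omits Moses' handling of unaligned boundary words):

    # Sketch of consistent phrase-pair extraction from a word alignment.
    # 'alignment' is a set of (src_index, tgt_index) links from a word aligner.
    def extract_phrase_pairs(src_tokens, tgt_tokens, alignment, max_len=7):
        pairs = []
        for s1 in range(len(src_tokens)):
            for s2 in range(s1, min(s1 + max_len, len(src_tokens))):
                # target positions linked to the source span [s1, s2]
                linked = [t for (s, t) in alignment if s1 <= s <= s2]
                if not linked:
                    continue
                t1, t2 = min(linked), max(linked)
                # consistency: no link may connect the target span to a source
                # word outside [s1, s2]
                if any(t1 <= t <= t2 and not s1 <= s <= s2 for (s, t) in alignment):
                    continue
                if t2 - t1 < max_len:
                    pairs.append((" ".join(src_tokens[s1:s2 + 1]),
                                  " ".join(tgt_tokens[t1:t2 + 1])))
        return pairs

    # With a plausible alignment for the sentence pair above, this yields pairs
    # such as "Official journal" <-> "Journal officiel" and "of" <-> "des".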

marker-based
- Chunk sentences on encountering a 'marker' word
  - Founded on the Marker Hypothesis [Green, 1979]
  - Marker words are a closed class of lexemes / morphemes
  - Each marker word is associated with a marker category (tag); 7 marker categories identified, e.g. DET, PREP, PRON
  - Each marker chunk must contain at least 1 non-marker word
- Align bilingual marker chunks
  - Use marker tag and relative positions in the sentence
  - Use cognate and MI scores
- Obtain marker-based phrase pairs

marker-based: an example
Identify marker words in the sentence; each marker word is associated with a marker category (tag):
  That is almost a personal record for me this autumn
  C' est pratiquement un record personnel pour moi cet automne
  <DET>That is almost <DET>a personal record <PREP>for <PRON>me <DET>this autumn
  <DET>C' est pratiquement <DET>un record personnel <PREP>pour <PRON>moi <DET>cet automne
Get marker chunks; each marker chunk must contain at least 1 non-marker word:
  <DET>That is almost <DET>a personal record <PREP>for me this autumn
  <DET>C' est pratiquement <DET>un record personnel <PREP>pour moi cet automne
Align bilingual marker chunks and obtain marker-based phrase pairs:
  That is almost ↔ C' est pratiquement
  a personal record ↔ un record personnel
  for me this autumn ↔ pour moi cet automne
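A minimal sketch of the chunking step (illustrative only; the toy marker lexicon below is an assumption, the real system uses full closed-class word lists for all 7 categories):

    # Toy marker lexicon (assumed for illustration); tags follow the DET/PREP/PRON example.
    MARKERS = {"that": "DET", "a": "DET", "this": "DET", "the": "DET",
               "for": "PREP", "of": "PREP", "me": "PRON", "it": "PRON"}

    def marker_chunks(tokens):
        # start a new chunk at every marker word
        chunks = []
        for tok in tokens:
            tag = MARKERS.get(tok.lower())
            if tag is not None or not chunks:
                chunks.append((tag, [tok]))
            else:
                chunks[-1][1].append(tok)
        # enforce "at least one non-marker word per chunk": a chunk made up only
        # of marker words absorbs the chunk that follows it, keeping its own tag
        merged = []
        for tag, toks in chunks:
            if merged and all(t.lower() in MARKERS for t in merged[-1][1]):
                merged[-1][1].extend(toks)
            else:
                merged.append((tag, toks))
        return merged

    # marker_chunks("That is almost a personal record for me this autumn".split())
    # -> [('DET', ['That', 'is', 'almost']),
    #     ('DET', ['a', 'personal', 'record']),
    #     ('PREP', ['for', 'me', 'this', 'autumn'])]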

marker-based: direct
- Merging phrase pairs in a single phrase table
- Fr-En Europarl data (3-gram language model, Pharaoh decoder): system performance as training data increases
- 13% new phrases added via marker-based phrases
- Fr-En, En-Fr, Es-En Europarl data (200K train, 5-gram language model, Moses decoder)

treebank-based
- Monolingual parsing of sentences
  - Parse both sides
  - Requires constituency-structure parsers
- Align bilingual parse trees
  - Requires a sub-tree aligner [Zhechev & Way, 2008]
- Get aligned phrases
  - Extract surface-level chunks
- Also implemented using dependency structure
  - Using off-the-shelf dependency parsers
  - Head percolation of constituency trees [Magerman, 1995]

treebank-based: an example [con]
the green witch
la bruja verde
the green witch ↔ la bruja verde
green witch ↔ bruja verde
the ↔ la
green ↔ verde
witch ↔ bruja
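A minimal sketch of how surface phrase pairs are read off once the sub-tree aligner has linked node pairs (the toy Node structure and the hand-specified links below are assumptions for illustration; in the real system the links come from the Zhechev & Way aligner):

    from typing import NamedTuple, List, Tuple

    class Node(NamedTuple):
        label: str
        children: tuple = ()          # empty tuple => leaf (a surface word)

    def yield_of(node):
        """Surface string covered by a (sub)tree node."""
        if not node.children:
            return node.label
        return " ".join(yield_of(c) for c in node.children)

    def phrases_from_links(links: List[Tuple[Node, Node]]):
        # each aligned (source node, target node) pair contributes one phrase pair
        return [(yield_of(s), yield_of(t)) for s, t in links]

    # toy constituency trees for "the green witch" / "la bruja verde"
    the, green, witch = Node("the"), Node("green"), Node("witch")
    la, bruja, verde = Node("la"), Node("bruja"), Node("verde")
    en_np = Node("NP", (the, Node("Nbar", (green, witch))))
    es_np = Node("NP", (la, Node("Nbar", (bruja, verde))))

    links = [(en_np, es_np), (en_np.children[1], es_np.children[1]),
             (the, la), (green, verde), (witch, bruja)]
    # phrases_from_links(links) reproduces the five phrase pairs listed above.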

treebank-based: direct
- En-Es Europarl data (700K train, 5-gram language model, Moses decoder)
- Moses (Baseline), Constituency (Syntax)
- Merging phrase pairs in a single phrase table
- 24M phrases in Baseline vs 6M phrases in Syntax
- 4.87% overlap between Moses and Syntax
- 16.79% new phrases added

treebank-based: direct
- Fr-En Europarl data (100K train, 5-gram language model, Moses decoder)
- Moses (B), Constituency (C), Dependency (D), Percolated (P)
- Merging phrase pairs in a single phrase table (1 / 2 / 3 / 4 tables)
- Compare sizes of B with C/D/P
- Overlap between tables

recap
- Baseline system (Moses): source_phrase ||| target_phrase ||| [feature_value]
- Alternate phrase pairs:
  - Marker-based: src ||| tgt
  - Treebank-based (con, dep): src ||| tgt
- Experiments on direct combination: merging phrase pairs and re-estimating probabilities
- Other ways to supplement the Moses phrase table with alternate phrase segmentation approaches
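A minimal sketch of the direct combination step under the phrase-table format above (file names are illustrative; real Moses tables carry several feature values per entry, here only the re-estimated P(tgt | src) is written):

    from collections import Counter, defaultdict

    def read_pairs(path):
        # each line holds one extracted phrase-pair instance: src ||| tgt
        with open(path, encoding="utf-8") as f:
            for line in f:
                if "|||" not in line:
                    continue
                src, tgt = [x.strip() for x in line.split("|||")][:2]
                yield src, tgt

    def direct_combination(paths, out_path):
        # pool all phrase-pair instances, then re-estimate relative frequencies
        counts = Counter()
        for path in paths:
            counts.update(read_pairs(path))
        src_totals = defaultdict(int)
        for (src, _), c in counts.items():
            src_totals[src] += c
        with open(out_path, "w", encoding="utf-8") as out:
            for (src, tgt), c in sorted(counts.items()):
                p = c / src_totals[src]                # re-estimated P(tgt | src)
                out.write(f"{src} ||| {tgt} ||| {p:.6f}\n")

    # direct_combination(["moses.pairs", "marker.pairs"], "combined.phrase-table")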

combining strategies
- Direct combination
- Weighted combination
- Prioritised combination
- Feature-based combination
- System combination

weighted combination
- Instead of simple merging, add 'n' copies of a type of phrase pair
- This modifies the relative frequency of the syntax-based phrase pairs
- Generally does not improve over direct combination
- Experiments on adding n copies of marker-based phrases
- Experiments on adding n copies of constituency-based phrases
- Half-weighting helps: it reduces the probability of low-quality syntax phrases while not affecting the intersection (compare with baseline prioritised)
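A minimal sketch of the weighting (illustrative; each syntax-based instance is simply counted n times before the same relative-frequency estimation used in direct combination):

    from collections import Counter

    def weighted_counts(baseline_pairs, syntax_pairs, n=0.5):
        # n = 1 reproduces direct combination; n = 0.5 is the "half-weights" setting
        counts = Counter()
        for pair in baseline_pairs:
            counts[pair] += 1.0
        for pair in syntax_pairs:
            counts[pair] += n      # each syntax-based instance counts as n copies
        return counts              # feed into the relative-frequency estimation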

prioritised combination
- A phrase pair consists of src ||| tgt ||| [feature_value]
- Alternative to direct combination
- Prioritise set A over set B: add only those B phrase pairs whose src does not occur in A
- Experiments on baseline & constituency
- No improvements over direct combination
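A minimal sketch of the prioritised merge (illustrative; each table entry is assumed to be a (src, tgt, features) tuple):

    def prioritised_combination(table_a, table_b):
        # keep all of A; add a B entry only if its source phrase never occurs in A
        combined = list(table_a)
        a_sources = {src for src, _, _ in combined}
        combined += [entry for entry in table_b if entry[0] not in a_sources]
        return combined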

feature-based combination
- A phrase pair consists of src ||| tgt ||| [feature_value]
- Add a new feature
  - Binary: type of phrase pair
  - MERT tuning assigns it a weight like the other features
- Merging as in direct combination
- Experiments on marker-based phrases
- Improvements in translation quality
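A minimal sketch of attaching the origin feature (illustrative; the flag values are placeholders, and the note on how Moses-style tables often encode binary features is a general convention rather than a detail from these slides):

    def add_origin_feature(entries, is_syntax):
        # append one extra binary feature marking where the phrase pair came from;
        # a Moses-style table would often store it exponentiated (e.g. 2.718 vs 1)
        # so the log-linear model sees a 0/1 indicator after taking logs
        flag = "1" if is_syntax else "0"
        for src, tgt, feats in entries:
            yield src, tgt, feats + [flag]

    def feature_based_combination(baseline_entries, syntax_entries):
        # merge as in direct combination, but with the origin flag attached;
        # MERT then learns a weight for the flag like for any other feature
        merged = list(add_origin_feature(baseline_entries, False))
        merged += add_origin_feature(syntax_entries, True)
        return merged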

system combination
- So far, all methods have altered how phrases are merged into one phrase table
- An alternative is to combine translated sentences (after decoding) rather than phrase pairs (during training)
- Use MBR-CN system combination [Du et al., 2009]
- Experiments on B/C/D/P
- The output sentences are distinct enough to profit from combination: 7.16% relative improvement (4 systems), 12.3% relative improvement (15 systems)

lessons learned
- Syntax-based phrase pairs are a unique knowledge source
  - Overlap between phrase pairs
- Using only syntax-based phrases deteriorates performance
  - Large coverage of the PB-SMT method
- Supplementing PB-SMT with syntax-based phrases helps
  - Explored 5 different strategies for combining; system combination helps the most
- Decrease in gains as training data increases

endnote
- Examined a number of different phrase segmentation approaches for MT
- Explored ways of using linguistic information (borrowed from EBMT research) in a PB-SMT system
- Level of improvement depends on the amount of training data
  - Useful for languages with limited training data and for MT systems with a smaller footprint
- Difficult to improve the PB-SMT alignment / extraction / decoding pipeline without significant remodelling
- Tabulate relative BLEU improvement for each method?
- EBMT knowledge base creation: marker, treebank

thank you!
Questions?
Contact info:
- Declan: dgroves @ traslan.ie
- Sergio: spenkale @ computing.dcu.ie
- John: jtinsley @ computing.dcu.ie
- Ankit: asrivastava @ computing.dcu.ie
Acknowledgements?

Bonus Slide: Sample Output
REF:   Does the commission intend to seek more transparency in this area?
MOSES: Will the commission ensure that more than transparency in this respect?
CON:   The commission will the commission ensure greater transparency in this respect?
DEP:   The commission will the commission ensure greater transparency in this respect?
PERC:  Does the commission intend to ensure greater transparency in this regard?