Phrase Extraction in PB-SMT
Ankit K Srivastava, NCLT/CNGL
Presentation: May 6, 2009


About
Phrase-based statistical machine translation
Methods for phrase extraction
Phrase induction via percolated dependencies
Experimental setup & evaluation results
Other facts & figures
Moses customization
Ongoing & future work
Endnote

PB-SMT Modeling

PB-SMT Process
Process sequences of words (phrases) as opposed to mere words
Segment the input, translate each segment, reorder the output
Translation model, language model, decoder
argmax_e p(e|f) = argmax_e p(f|e) p(e)
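The search objective above follows from Bayes' rule; a brief derivation (standard textbook material, included here for completeness):

```latex
\hat{e} = \arg\max_{e} p(e \mid f)
        = \arg\max_{e} \frac{p(f \mid e)\, p(e)}{p(f)}
        = \arg\max_{e} \underbrace{p(f \mid e)}_{\text{translation model}} \; \underbrace{p(e)}_{\text{language model}}
```

The denominator p(f) is dropped because it is constant for a fixed input sentence f.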

Learning Phrase Translations

Extraction I
Input is a sentence-aligned parallel corpus
Most approaches use word alignments
Extract (learn) phrase pairs
Build a phrase translation table

Extraction II
Get word alignments in both directions (src2tgt, tgt2src)
Apply the grow-diag-final heuristic
Extract phrase pairs consistent with the word alignments
Non-syntactic phrases :: STR [Koehn et al., '03]
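A minimal sketch of the consistency criterion behind this extraction step (my illustration on a toy sentence pair, not the actual Moses extractor; the grow-diag-final treatment of unaligned boundary words is omitted):

```python
# Extract all phrase pairs consistent with a word alignment, in the spirit of
# Koehn et al. (2003): no word inside the pair may be aligned to a word outside it.

def extract_phrase_pairs(src, tgt, alignment, max_len=7):
    """`alignment` is a set of (i, j) links meaning src[i] is aligned to tgt[j]."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2]
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            # Consistency: nothing inside the target span may link outside [i1, i2]
            violated = any(j1 <= j <= j2 and not i1 <= i <= i2
                           for (i, j) in alignment)
            if violated or j2 - j1 + 1 > max_len:
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

# Toy (hypothetical) sentence pair and alignment
src = "le conseil d' administration".split()
tgt = "the board of directors".split()
links = {(0, 0), (1, 1), (2, 2), (3, 3)}
for s, t in sorted(extract_phrase_pairs(src, tgt, links)):
    print(f"{s} ||| {t}")
```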

Extraction III
Sentence-aligned and word-aligned text
Monolingual parsing of both SRC & TGT
Align subtrees and extract string pairs
Syntactic phrases

Extraction IV
Parse using a constituency parser
Phrases are syntactic constituents :: CON [Tinsley et al., '07]
(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP (NNP Nov) (CD 29))))))
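A small illustration (using NLTK here, which is an assumption rather than the tooling of the original experiments) of reading such a bracketed parse and listing the yield of every constituent, i.e. the candidate CON-style phrases on one side of the sentence pair:

```python
from nltk import Tree

parse = ("(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) "
         "(NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) "
         "(NN director))) (NP (NNP Nov) (CD 29))))))")

tree = Tree.fromstring(parse)
for node in tree.subtrees():
    if node.height() > 2:                  # skip the (POS word) pre-terminals
        print(f"{node.label():5s} {' '.join(node.leaves())}")
```

In the bilingual setting, these monolingual constituents only become phrase pairs when the source and target subtrees can be aligned (the tree-aligner step listed later in the system setup).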

Extraction V
Parse using a dependency parser
Phrases have head-dependent relationships :: DEP [Hearne et al., '08]

HEAD       DEPENDENT
join       Vinken
join       will
board      the
join       board
join       as
director   a
director   nonexecutive
as         director
29         Nov
join       29

Extraction VI
Numerous other phrase extraction methods exist:
Estimate phrase translations directly [Marcu & Wong '02]
Use heuristics other than grow-diag-final
Use marker-based chunks [Groves & Way '05]
Only string-to-string translation models are considered herein

Head Percolation and Phrase Extraction

Percolation I
It is straightforward to convert a constituency tree into an unlabeled dependency tree [Gaifman '65]
Use head percolation tables to identify the head child in a constituency representation [Magerman '95]
The dependency tree is obtained by recursively applying head-child and non-head-child heuristics [Xia & Palmer '01]

Percolation II
Constituent: (NP (DT the) (NN board))
Rule: NP → right, NN / NNP / CD / JJ
Head-marked: (NP-board (DT the) (NN board))
⇒ the is dependent on board

Percolation III
INPUT:
(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP (NNP Nov) (CD 29))))))

Head percolation rules:
NP   right   NN / NNP / CD / JJ
PP   left    IN / PP
S    right   VP / S
VP   left    VB / VP

OUTPUT:
HEAD       DEPENDENT
join       Vinken
join       will
board      the
join       board
join       as
director   a
director   nonexecutive
as         director
29         Nov
join       29
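A hedged sketch of that conversion (my reconstruction for this example, not the DCU implementation): scan each node's children in the direction given by the table, take the first child whose category is listed, percolate its lexical head upward, and emit unlabeled head-dependent word pairs.

```python
from nltk import Tree

# Head percolation rules from the slide: (scan direction, allowed categories).
PERC_TABLE = {
    "S":  ("right", {"VP", "S"}),
    "VP": ("left",  {"VB", "VP"}),
    "NP": ("right", {"NN", "NNP", "CD", "JJ"}),
    "PP": ("left",  {"IN", "PP"}),
}

def lexical_head(node, deps):
    """Return the head word of `node`; append (head, dependent) word pairs to `deps`."""
    if isinstance(node[0], str):                       # pre-terminal: (POS word)
        return node[0]
    direction, cats = PERC_TABLE.get(node.label(), ("left", set()))
    ordered = list(node) if direction == "left" else list(reversed(node))
    head_child = next((c for c in ordered if c.label() in cats), ordered[0])
    head = lexical_head(head_child, deps)
    for child in node:
        if child is not head_child:
            deps.append((head, lexical_head(child, deps)))
    return head

tree = Tree.fromstring(
    "(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) "
    "(PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP (NNP Nov) (CD 29))))))")
deps = []
lexical_head(tree, deps)
for head, dep in deps:
    print(f"{head:10s} {dep}")       # prints the ten head-dependent pairs shown above
```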

Percolation IV
cf. slide Extraction III (syntactic phrases)
Parse by applying head percolation tables to constituency-annotated trees
Align trees, extract surface chunks
Phrases have head-dependent relations :: PERC

Tools, Resources, and MT System Performance

System setup I
RESOURCE TYPE        NAME                DETAILS
Corpora              JOC                 Chiao et al., '06
                     EUROPARL            Koehn, '05
Parsers              Berkeley Parser     Petrov et al., '06
                     Syntex Parser       Bourigault et al., '05
                     Head Percolation    Xia & Palmer, '01
Alignment Tools      GIZA++              Och & Ney, '03
                     Phrase Heuristics   Koehn et al., '03
                     Tree Aligner        Zhechev, '09
Lang Modeling        SRILM Toolkit       Stolcke, '02
Decoder              Moses               Koehn et al., '07
Evaluation Scripts   BLEU                Papineni et al., '02
                     NIST                Doddington, '02
                     METEOR              Banerjee & Lavie, '05
                     WER, PER            —

System setup II
All 4 "systems" are run with the same configuration (with MERT tuning) on 2 different datasets
They differ only in their phrase tables (number of chunks)

Corpus size (sentence pairs):
CORPORA     TRAIN     DEV     TEST
JOC         7 K       —       —
EUROPARL    100,000   1,889   2,000

Phrase-table size (number of chunks):
CORPORA     STR       CON     DEP     PERC
JOC         236 K     79 K    74 K    72 K
EUROPARL    2,145 K   663 K   583 K   565 K

System setup III
[Table: BLEU, NIST, METEOR, WER, and PER scores for the STR, CON, DEP, and PERC systems on the JOC (7K) and EUROPARL (100K) data]

Analyzing Str, Con, Dep, and Perc
Analysis w.r.t. Europarl data only

Analysis I
Number of common & unique phrase pairs
Maybe we should combine the phrase tables…

PHRASE TYPES   COMMON TO BOTH   UNIQUE IN 1st TYPE   UNIQUE IN 2nd TYPE
DEP & PERC     369 K            213 K                195 K
CON & PERC     492 K            171 K                72 K
STR & PERC     127 K            2,018 K              437 K
CON & DEP      391 K            271 K                191 K
STR & DEP      128 K            2,016 K              454 K
STR & CON      144 K            2,000 K              518 K

Analysis II
Concatenate phrase tables and re-estimate probabilities
15 different phrase-table combinations: Σ C(4, r) for 1 ≤ r ≤ 4, i.e. 4 + 6 + 4 + 1 = 15
S = STR, C = CON, D = DEP, P = PERC

       UNI   BI           TRI             QUAD
S      S     SC, SD, SP   SCD, SCP, SDP   SCDP
C      C     CD, CP       CDP             –
D      D     DP           –               –
P      P     –            –               –
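A rough sketch of how the combinations could be enumerated and how a concatenated table might be re-estimated by pooling counts and renormalising (a simplification of the real setup, which re-estimates the full set of Moses phrase-table scores; the toy counts below are hypothetical):

```python
from collections import defaultdict
from itertools import combinations

def all_subsets(names):
    """Yield every non-empty subset of the four extractions: 4 + 6 + 4 + 1 = 15."""
    for r in range(1, len(names) + 1):
        yield from combinations(names, r)

def reestimate(count_tables):
    """Pool (src, tgt) -> count dictionaries and renormalise p(tgt | src)."""
    pooled = defaultdict(float)
    for table in count_tables:
        for pair, c in table.items():
            pooled[pair] += c
    totals = defaultdict(float)
    for (src, _), c in pooled.items():
        totals[src] += c
    return {(s, t): c / totals[s] for (s, t), c in pooled.items()}

print(len(list(all_subsets("SCDP"))))      # -> 15 combinations

# Toy counts standing in for two of the extracted tables
str_counts = {("plus de transparence", "greater transparency"): 3.0,
              ("plus de transparence", "more transparency"): 1.0}
con_counts = {("plus de transparence", "greater transparency"): 1.0}
print(reestimate([str_counts, con_counts]))
```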

Analysis III
All 15 "systems" are run with the same configuration (with MERT tuning)
They differ only in their phrase tables
This is combining at the "translation model" level

Analysis IV
Performance on Europarl

Analysis V
REF: Does the commission intend to seek more transparency in this area?
S: Will the commission ensure that more than transparency in this respect?
C: The commission will the commission ensure greater transparency in this respect?
D: The commission will the commission ensure greater transparency in this respect?
P: Does the commission intend to ensure greater transparency in this regard?
SC: Will the commission ensure that more transparent in this respect?
SD: Will the commission ensure that more transparent in this respect?
SP: Does the commission intend to take to ensure that more than openness in this regard?
CD: The commission will the commission ensure greater transparency in this respect?
CP: The commission will the commission ensure greater transparency in this respect?
DP: The commission will the commission ensure greater transparency in this respect?
SCD: Does the commission intend to take to ensure that more transparent commit?
SCP: Does the commission intend to take in this regard to ensure greater transparency?
SDP: Does the commission intend to take in this regard to ensure greater transparency?
CDP: The commission will the commission ensure greater transparency in this respect?
SCDP: Does the commission intend to take to ensure that more transparent suspected?

Analysis VI
Which phrases does the decoder use?
Decoder trace on S+C+D+P
Out of 11,748 phrases: S (5204); C (2441); D (2319); P (2368)

Analysis VII
Automatic per-sentence evaluation using TER on the test set of 2,000 sentences [Snover et al., '06]
C (1120); P (331); D (301); S (248)
Manual per-sentence evaluation on a random test set of 100 sentences using pairwise system comparison
P = C (27%); P > D (5%); SC > SCP (11%)

Analysis VIII
Treat the different phrase-table combinations as individual MT systems
Perform system combination using the MBR-CN framework [Du et al., 2009]
This is combining at the "system" level
[Table: BLEU, NIST, METEOR, WER, and PER scores for STR, CON, DEP, PERC, ||MBR||, and ||CN||]
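For reference, the MBR half of that framework picks, from the pool E of system outputs, the hypothesis with minimum expected loss under the model (a standard formulation rather than the exact Du et al. configuration; the loss L is typically 1 − BLEU):

```latex
\hat{e} \;=\; \arg\min_{e' \in E} \sum_{e \in E} L(e, e') \, P(e \mid f)
```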

Analysis IX
Using the Moses baseline phrases (STR) is essential for coverage. SIZE matters!
However, adding any system to STR increases the baseline score. Symbiotic!
Hence, do not replace STR, but supplement it.

Analysis X
CON seems to be the best combination with STR (S+C seems to be the best-performing system)
It has the most chunks in common with PERC
Does PERC harm a CON system? Needs more analysis (bias between CON & PERC)

Analysis XI
DEP chunks are different from PERC chunks, despite being equivalent in syntactic representation
DEP can be substituted by PERC
Difference between knowledge induced from dependency vs. constituency parsing. A different aligner?

Analysis XII
PERC is a unique knowledge source. Is it just a simple case of parser combination?
Sometimes it helps. Needs more work on finding the connection with CON / DEP

Customizing Moses for Syntax-Supplemented Phrase Tables

Moses customization
Incorporating syntax (CON, DEP, PERC) into Moses:
Reordering model
Phrase scoring (new features)
Decoder parameters
Log-linear combination of T-tables
Good phrase translations may be lost by the decoder. How can we ensure they remain intact?
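The log-linear combination mentioned above is the standard Moses decoding model: each T-table contributes its own feature functions h_i, whose weights λ_i are tuned with MERT:

```latex
\hat{e} \;=\; \arg\max_{e} \sum_{i=1}^{n} \lambda_i \, h_i(e, f)
```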

Work in Progress and Future Plans

Ongoing & future work
Scaling (data size, language pair, language direction)
Bias between CON & PERC
Combining phrase pairs
Combining systems
Classify performance by sentence type
Improve the quality of phrase pairs in PB-SMT

Endnote…

Endnote
Explored 3 linguistically motivated phrase extractions against the Moses baseline phrases
Improves the baseline; the highest recorded gain is a 10% relative increase in BLEU on the 100K data
Rather than pursuing ONE way, combine options
Need more analysis of supplementing the phrase table with multiple syntactic T-tables

Thank You!

Phrase Extraction in PB-SMT (abstract)
Phrase-based Statistical Machine Translation (PB-SMT) models – the most widely researched paradigm in MT today – rely heavily on the quality of phrase pairs induced from large amounts of training data. There are numerous methods for extracting these phrase translations from parallel corpora. In this talk I will describe phrase pairs induced from percolated dependencies and contrast them with three pre-existing phrase extractions. I will also present the performance of the individual phrase tables and their combinations in a PB-SMT system. I will then conclude with ongoing experiments and future research directions.

Thanks!
Andy Way
Patrik Lambert
John Tinsley
Sylwia Ozdowska
Ventsislav Zhechev
Sergio Penkale
Jinhua Du