A Hierarchical Phrase-Based Model for Statistical Machine Translation Author: David Chiang Presented by Achim Ruopp Formulas/illustrations/numbers extracted.


A Hierarchical Phrase-Based Model for Statistical Machine Translation Author: David Chiang Presented by Achim Ruopp Formulas/illustrations/numbers extracted from referenced papers

Outline Phrase Order in Phrase-based Statistical MT Using synchronous CFGs to solve the issue Integrating the idea into an SMT system Results Conclusions Future work My Thoughts/Questions

Phrase Order in Phrase-based Statistical MT Example from [Chiang2005]

Phrase Order in Phrase-based Statistical MT Translation of the example with a phrase-based SMT system (Pharaoh, [Koehn2004]) Source: [Aozhou] [shi] [yu] [Bei Han] [you] [bangjiao]1 [de shaoshu guojia zhiyi] Output: [Australia] [is] [dipl. rels.]1 [with] [North Korea] [is] [one of the few countries] Uses learned phrase translations Accomplishes local phrase reordering Fails on overall reordering of phrases Not only applicable to Chinese, but also to Japanese (SOV order) and German (scrambling)

Idea: Rules for Subphrases Motivation: “phrases are good for learning reorderings of words, we can use them to learn reorderings of phrases as well” Rules with “placeholders” for subphrases – Learned automatically from bitext without syntactic annotation Formally syntax-based but not linguistically syntax-based: “the result sometimes resembles a syntactician’s grammar but often does not”

Synchronous CFGs Developed in the 1960s for programming-language compilation [Aho1969] Separate tutorial by Chiang describing them [Chiang2005b] In NLP, synchronous CFGs have been used for – Machine translation – Semantic interpretation

Synchronous CFGs Like CFGs, but productions have two right-hand sides – Source side – Target side – Related through linked non-terminal symbols, e.g. VP → One-to-one correspondence: a non-terminal of type X is always linked to one of the same type Productions are applied in parallel to linked non-terminals on both sides
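A minimal sketch of the idea (the data structures and function names are hypothetical, not Chiang's code): a synchronous rule stores one left-hand side and two right-hand sides whose non-terminals are linked by a shared index, and rewriting substitutes each linked pair in parallel. The rule below is the paper's running example X → ⟨yu X1 you X2, have X2 with X1⟩.

```python
def is_nt(sym):
    """Non-terminals are (category, link-index) tuples; terminals are strings."""
    return isinstance(sym, tuple)

# Chiang's running example: X -> < yu X1 you X2 , have X2 with X1 >
rule = {
    "lhs": "X",
    "src": ["yu", ("X", 1), "you", ("X", 2)],
    "tgt": ["have", ("X", 2), "with", ("X", 1)],
}

def apply_rule(rule, subphrases):
    """Rewrite both sides in parallel: each linked non-terminal with index i
    is replaced by the same subphrase pair subphrases[i] = (src_words, tgt_words)."""
    src = []
    for sym in rule["src"]:
        src += subphrases[sym[1]][0] if is_nt(sym) else [sym]
    tgt = []
    for sym in rule["tgt"]:
        tgt += subphrases[sym[1]][1] if is_nt(sym) else [sym]
    return src, tgt

src, tgt = apply_rule(rule, {1: (["Bei", "Han"], ["North", "Korea"]),
                             2: (["bangjiao"], ["diplomatic", "relations"])})
print(" ".join(src))  # yu Bei Han you bangjiao
print(" ".join(tgt))  # have diplomatic relations with North Korea
```

Because the two sides share link indices, the swap of X1 and X2 on the target side is exactly what captures the phrase reordering the baseline system missed.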

Synchronous CFGs

Synchronous CFGs: Limitations – No Chomsky normal form Has implications for the complexity of the decoder – Only limited closure under composition – Sister-reordering only

Model Using the log-linear model [Och2002]Och2002 –Presented by Bill last week
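A toy illustration of the log-linear framework of [Och2002]: a candidate translation is scored by a weighted sum of feature values, Σ_m λ_m h_m(e, f). The feature names and numeric values below are invented for illustration only.

```python
# Minimal sketch of log-linear scoring (illustrative features and weights,
# not values from the paper).

def score(features, weights):
    """log P(e|f) up to a constant: sum of lambda_m * h_m(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

candidate = {"log_p_rules": -4.2, "log_p_lm": -7.1, "word_penalty": -6.0}
weights = {"log_p_rules": 1.0, "log_p_lm": 0.8, "word_penalty": -0.2}
print(score(candidate, weights))  # -8.68
```

The decoder then searches for the candidate maximizing this sum; minimum error rate training (used in the Results slide) tunes the λ values against BLEU.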

Model – Rule Features P(γ|α) and P(α|γ) Lexical weights P_w(γ|α) and P_w(α|γ) – Estimate of how well the words in α translate to the words in γ Phrase penalty exp(1) – Allows the model to learn a preference for longer or shorter derivations Exception: glue rule weights – w(S → ⟨X₁, X₁⟩) = 1 – w(S → ⟨S₁ X₂, S₁ X₂⟩) = exp(−λ_g) – λ_g controls the model’s preference for hierarchical phrases over serial phrase combination
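The lexical-weight feature can be sketched in the Koehn-style form Chiang adopts: for each target word, average the word-translation probabilities of the source words it is aligned to. The probability table and alignment below are made-up illustrative values.

```python
# Sketch of the lexical weight P_w(gamma | alpha): product over target
# words of the averaged word-translation probabilities of their aligned
# source words (illustrative values, not a real translation table).

def lexical_weight(src, tgt, align, w_table):
    """align: set of (i, j) pairs, target word i aligned to source word j."""
    p = 1.0
    for i, t_word in enumerate(tgt):
        links = [j for (ti, j) in align if ti == i]
        if not links:  # unaligned target words would use w(t|NULL)
            p *= w_table.get((t_word, None), 1e-9)
        else:
            p *= sum(w_table.get((t_word, src[j]), 1e-9) for j in links) / len(links)
    return p

w_table = {("North", "Bei"): 0.9, ("Korea", "Han"): 0.8}
print(lexical_weight(["Bei", "Han"], ["North", "Korea"],
                     {(0, 0), (1, 1)}, w_table))  # ~0.72
```

This is what gives the model a soft check that the words inside a hierarchical phrase pair actually translate each other, even when the pair itself is rare.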

Model – Additional Features Separated out from the rule weights – Notational convenience – Conceptually cleaner (necessary for polynomial-time decoding) Derivation D – Set of triples ⟨r, i, j⟩: apply grammar rule r to rewrite a non-terminal spanning f(D) from i to j – Ambiguous
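The rule-weight part of a derivation's score can be sketched as a product over its triples (rule ids and weights below are hypothetical placeholders):

```python
import math

# Sketch: a derivation D is a set of triples (r, i, j) -- rule r rewrites
# a non-terminal over source span i..j. The rule-feature contribution to
# the model score is the product of the weights of the rules used.

def derivation_weight(derivation, rule_weights):
    """Product of w(r) over the triples in D (hypothetical rule ids)."""
    return math.prod(rule_weights[r] for (r, i, j) in derivation)

rule_weights = {"r_glue": 1.0, "r_yu_you": 0.25, "r_lex": 0.5}
D = {("r_glue", 0, 8), ("r_yu_you", 2, 6), ("r_lex", 3, 4)}
print(derivation_weight(D, rule_weights))  # 1.0 * 0.25 * 0.5 = 0.125
```

The "ambiguous" point on the slide is that many distinct sets of triples can yield the same sentence pair, which is why the additional features must be kept separable for polynomial-time decoding.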

Training Training starts from a symmetrized, word-aligned corpus Adopted from [Och2004] and [Koehn2003] – How to get from a one-directional alignment to a symmetrized alignment – How to find initial phrase pairs An alternative would be the Marcu & Wong model that Ping presented [Marcu2002]
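The standard consistency heuristic for finding initial phrase pairs can be sketched as follows (a simplified version under the assumption that a pair is kept iff no alignment link crosses the box boundary; not the authors' code):

```python
# Sketch of initial phrase-pair extraction from a symmetrized word
# alignment: a source span and target span form a pair iff no alignment
# link connects a word inside the box to a word outside it.

def extract_phrases(n_src, n_tgt, align, max_len=10):
    """align: set of (i, j) links, source word i <-> target word j."""
    pairs = []
    for i1 in range(n_src):
        for i2 in range(i1, min(i1 + max_len, n_src)):
            # target words linked to the source span i1..i2
            js = [j for (i, j) in align if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            # consistency: nothing in the target span links outside the source span
            if all(i1 <= i <= i2 for (i, j) in align if j1 <= j <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs

# toy alignment with a local reordering: 0-0, 1-2, 2-1
print(extract_phrases(3, 3, {(0, 0), (1, 2), (2, 1)}))
```

Hierarchical rules are then obtained by subtracting smaller consistent pairs out of larger ones and replacing them with linked non-terminals.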

Training

Unfortunately, the scheme leads to – A large number of rules – False ambiguity The grammar is filtered to – Balance grammar size and performance – Five filter criteria, e.g. rules contain at most two non-terminals Initial phrase length limited to 10

Decoding Our good old friend - the CKY parser Enhanced with –Beam search –Postprocessor to map French derivations to English derivations
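A highly simplified sketch of the decoding idea: CKY-style dynamic programming over source spans, keeping only the best-scoring items per span (beam pruning). All names and the grammar format are invented for illustration, and real decoding also intersects the chart with an n-gram language model, which this toy omits.

```python
import heapq

def cky_beam(words, lexical, binary, beam=5):
    """lexical: {word: [(cat, tgt, logprob)]};
    binary: {(catL, catR): [(cat, order, logprob)]}, where order '<>' keeps
    the children in source order and anything else swaps them."""
    n = len(words)
    chart = {}  # (i, j) -> list of (score, cat, tgt_string)
    for i, w in enumerate(words):
        chart[(i, i + 1)] = [(lp, c, t) for (c, t, lp) in lexical.get(w, [])]
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            items = []
            for k in range(i + 1, j):
                for (sl, cl, tl) in chart.get((i, k), []):
                    for (sr, cr, tr) in chart.get((k, j), []):
                        for (c, order, lp) in binary.get((cl, cr), []):
                            t = tl + " " + tr if order == "<>" else tr + " " + tl
                            items.append((sl + sr + lp, c, t))
            chart[(i, j)] = heapq.nlargest(beam, items)  # beam pruning
    return chart.get((0, n), [])

lexical = {"a": [("X", "A", 0.0)], "b": [("X", "B", 0.0)]}
binary = {("X", "X"): [("X", "><", -0.1)]}
print(cky_beam(["a", "b"], lexical, binary))  # [(-0.1, 'X', 'B A')]
```

The mapping from source-side derivations to target-side output is what the postprocessor on the slide does: because both sides of every rule are linked, reading out the target string from a completed source parse is deterministic.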

Results Baseline – Pharaoh [Koehn2003], [Koehn2004] – Minimum error rate training on the BLEU measure Hierarchical model – 2.2 million rules after filtering, down from 24 million – 7.5% relative improvement Additional constituent feature – Additional feature favoring syntactic parses – Trained on 250k sentences of the Penn Chinese Treebank – Improved accuracy only on the development set

Learned Feature Weights Word = word penalty Phr = phrase penalty (pp) λ_g penalizes glue rules much less than λ_pp penalizes regular rules – i.e. “This suggests that the model will prefer serial combination of phrases, unless some other factor supports the use of hierarchical phrases”

Conclusions Hierarchical phrase pairs can be learned from data without syntactic annotation Hierarchical phrase pairs improve translation accuracy significantly Added syntactic information (the constituent feature) did not provide a statistically significant gain

Future Work Move to a more syntactically motivated grammar Reduce grammar size to allow more aggressive training settings

My Thoughts/Questions Really interesting approach to bringing “syntactic” information into SMT The example sentence was not translated correctly – Missing words are problematic Can phrase reordering also be learned by lexicalized phrase reordering models [Och2004]? Why did the constituent feature improve accuracy only on the development set, but not on the test set? Does data sparseness influence the learned feature weights? What syntactic features are already built into Pharaoh?

References
[Aho1969] Aho, A. V. and J. D. Ullman. 1969. Syntax directed translations and the pushdown assembler. Journal of Computer and System Sciences, 3:37–56.
[Chiang2005] Chiang, David. 2005. A Hierarchical Phrase-Based Model for Statistical Machine Translation. In Proceedings of ACL 2005, pages 263–270.
[Chiang2005b]:
[Koehn2003] Koehn, Philipp. 2003. Noun Phrase Translation. Ph.D. thesis, University of Southern California.
[Koehn2004] Koehn, Philipp. 2004. Pharaoh: a beam search decoder for phrase-based statistical machine translation models. In Proceedings of the Sixth Conference of the Association for Machine Translation in the Americas, pages 115–124.
[Marcu2002] Marcu, Daniel and William Wong. 2002. A phrase-based, joint probability model for statistical machine translation. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 133–139.
[Och2002] Och, Franz Josef and Hermann Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting of the ACL, pages 295–302.
[Och2004] Och, Franz Josef and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30:417–449.