Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation — Kei Hashimoto, Hirofumi Yamamoto, Hideo Okuma, Eiichiro Sumita, and Keiichi Tokuda


Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation
Kei Hashimoto, Hirofumi Yamamoto, Hideo Okuma, Eiichiro Sumita, and Keiichi Tokuda
Nagoya Institute of Technology / National Institute of Information and Communications Technology / Kinki University / ATR Spoken Language Communication Research Labs

Background (1/2)
- Phrase-based statistical machine translation
  - Can model local word reordering
    - Short idioms
    - Insertions and deletions of words
  - Makes errors in global word reordering
- Word reordering constraint techniques
  - Linguistically syntax-based approaches
    - Use a source tree, a target tree, or both tree structures
  - Formal constraints on word permutations
    - IBM distortion model, lexical reordering model, ITG

Background (2/2)
- Imposing a source tree on ITG (IST-ITG)
  - Extension of the ITG constraints
  - Introduces the source-sentence tree structure
  - Cannot evaluate the accuracy of the target word orders
- Reordering model using syntactic information (this work)
  - Extension of the IST-ITG constraints
  - Models the rotation of the source-side parse tree
  - Can easily be introduced into a phrase-based translation system

Outline
- Background
- ITG & IST-ITG constraints
- Proposed reordering model
- Training of the proposed model
- Decoding using the proposed model
- Experiments
- Conclusions and future work

Inversion transduction grammar
- ITG constraints
  - All possible binary tree structures are generated from the source word sequence
  - The target sentence is obtained by rotating any node of the generated binary trees
  - Reduces the number of possible target word orders
  - Does not take any particular tree-structure instance into account

Imposing a source tree on ITG (IST-ITG)
- Directly introduces the source-sentence tree structure into ITG
  - Example source sentence: "This is a pen", together with its parse tree
  - The target sentence is obtained by rotating any node of the source-sentence tree structure
  - The number of admissible target word orders is reduced further than under plain ITG

Non-binary trees
- Parsing results sometimes produce non-binary trees
- Any reordering of the child nodes of a non-binary subtree is allowed
  - e.g., a subtree with children c, d, e admits all six child orderings: cde, ced, dce, dec, ecd, edc

Problem of IST-ITG
- IST-ITG cannot evaluate the accuracy of the target word reordering
  - ⇒ It assigns an equal probability to every rotation of the source-sentence tree
- Proposal: a reordering model that uses syntactic information


Abstract of the proposed method
- The rotation of each subtree type is modeled
  - Source sentence: "This is a pen", with source-side parse-tree subtree types S+NP+VP, VP+AUX+NP, and NP+DT+NN
  - Reordering probability: P(monotone or swap | subtree type)

Related work 1
- Statistical syntax-directed translation with extended domain of locality [Liang Huang et al. 2006]
  - Extracts rules for tree-to-string translation
  - Considers syntactic information
  - Considers multi-level trees on the source side
  - Example rule: S(x1:NP, VP(x2:VB, x3:NP)) → …

Related work 2
- Proposed reordering model
  - Used in phrase-based translation
  - Estimation of the proposed model is conducted independently of phrase extraction
  - Models child-node reordering within a one-level subtree
    - Cannot represent complex reorderings
  - Reordering using syntactic information can easily be introduced into phrase-based translation

Training algorithm (1/3)
- Reordering model training
  1. Word alignment between the source and target sentences
  2. Parsing of the source sentence

Training algorithm (2/3)
  3. The word alignments and the source-side parse trees are combined
  4. The rotation position of each subtree is checked (monotone or swap)
    - S+NP+VP ⇒ monotone
    - VP+AUX+NP ⇒ swap
    - NP+DT+NN ⇒ monotone
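Step 4 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the helper name `rotation_position` and the sets of target-word positions passed to it are assumptions. A binary subtree is monotone when the target span aligned to its left child wholly precedes the span aligned to its right child, swapped when the spans are reversed, and undecidable otherwise.

```python
# Hypothetical sketch: classify the rotation position of a binary subtree
# from word alignments. left_targets / right_targets are the sets of
# target-word positions aligned to the left and right child of the subtree.
def rotation_position(left_targets, right_targets):
    """Return 'monotone', 'swap', or None when no rotation yields the order."""
    if not left_targets or not right_targets:
        return None  # an unaligned child leaves the position undecidable
    if max(left_targets) < min(right_targets):
        return 'monotone'  # left child's target span precedes the right's
    if max(right_targets) < min(left_targets):
        return 'swap'      # target spans appear in reversed order
    return None            # overlapping spans: not derivable by rotation

print(rotation_position({1}, {2, 3}))   # monotone
print(rotation_position({3, 4}, {2}))   # swap
print(rotation_position({1, 3}, {2}))   # None
```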

Training algorithm (3/3)
  5. The reordering probability of each subtree type is estimated by counting each rotation position:
     P(t | s) = c(s, t) / Σ_t′ c(s, t′)
     where c(s, t) is the count of rotation position t over all training samples for subtree type s
- Non-binary subtrees
  - Any ordering of the child nodes is allowed
  - Rotation positions are categorized into only two types ⇒ monotone or other (swap)
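Step 5 is plain relative-frequency estimation, which can be sketched as follows; the sample list is invented for illustration and is not the paper's data.

```python
from collections import Counter

# Minimal sketch of step 5: estimate P(position | subtree type) by
# relative frequency over counted (subtree type, rotation position) samples.
samples = [('S+NP+VP', 'monotone'), ('VP+AUX+NP', 'swap'),
           ('NP+DT+NN', 'monotone'), ('VP+AUX+NP', 'swap'),
           ('VP+AUX+NP', 'monotone')]

counts = Counter(samples)                  # c(s, t)
totals = Counter(s for s, _ in samples)    # sum over t' of c(s, t')

def reorder_prob(subtree_type, position):
    # P(t | s) = c(s, t) / sum_t' c(s, t')
    return counts[(subtree_type, position)] / totals[subtree_type]

print(reorder_prob('VP+AUX+NP', 'swap'))      # 2/3
print(reorder_prob('S+NP+VP', 'monotone'))    # 1.0
```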

Remove subtree samples
- Some target word orders cannot be derived by rotating nodes of the source-side parse tree
  - Linguistic reasons: differences between the sentence structures of the two languages
  - Non-linguistic reasons: errors in the word alignments and in the syntactic analysis
- Subtrees whose rotation position can be determined are used as training samples; a subtree whose target order cannot be derived by rotation is removed from the training samples

Clustering of subtree types
- The number of possible subtree types is large
  - Unseen subtree types
  - Subtree types observed only a few times ⇒ cannot be modeled reliably
- Clustering of subtree types
  - A subtree type is clustered when its number of training samples is less than a heuristic threshold
  - The clustered model is estimated from the pooled counts of the clustered subtree types
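A hedged sketch of this clustering step, assuming the simplest reading of the slide: types below the threshold pool their counts into one shared model. The function name `build_models` and the example counts are invented.

```python
from collections import Counter

# Sketch: subtree types with fewer training samples than `threshold`
# contribute their counts to a single clustered model; frequent types
# keep their own relative-frequency estimates.
def build_models(counts, threshold=10):
    totals = Counter()
    for (stype, pos), c in counts.items():
        totals[stype] += c
    models, pooled = {}, Counter()
    for (stype, pos), c in counts.items():
        if totals[stype] >= threshold:
            models[(stype, pos)] = c / totals[stype]  # type-specific estimate
        else:
            pooled[pos] += c  # rare type: pool counts into the cluster
    total = sum(pooled.values())
    clustered = {pos: c / total for pos, c in pooled.items()} if total else {}
    return models, clustered

example = {('S+NP+VP', 'monotone'): 90, ('S+NP+VP', 'swap'): 10,
           ('X+Y+Z', 'monotone'): 1, ('X+Y+Z', 'swap'): 2}
models, clustered = build_models(example, threshold=10)
print(models[('S+NP+VP', 'monotone')])  # 0.9
print(clustered)                        # rare type X+Y+Z falls back here
```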

Decoding using the proposed model
- Phrase-based decoder constrained by the IST-ITG constraints
  - The target sentence is generated by rotating nodes of the source-side parse tree
  - A target word ordering that breaks up a source phrase is not allowed
- During decoding, the rotation positions of the subtrees are checked and the reordering probabilities are calculated

Decoding using the proposed model (cont.)
- Calculating the reordering probability of a candidate target sentence
  - (Figure: for source sentence A B C D E and one candidate target order, the subtree rotation positions are monotone, swap, and monotone, and P(monotone or swap | subtree type) is applied to each)

Decoding using the proposed model (cont.)
- Calculating the reordering probability of another candidate target sentence
  - (Figure: for the same source sentence A B C D E, this candidate target order gives subtree rotation positions swap and monotone)

Rotation positions included in a phrase
- The rotation position cannot always be determined
  - Word alignments inside a phrase are not observed
  - ⇒ Assign the higher of the monotone and swap probabilities
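The decoding-time scoring described on the last few slides can be sketched as follows; this is an assumption-laden illustration, not the decoder's code. The probability table, the `'in_phrase'` marker, and the function names are invented; only the scheme — multiply P(position | subtree type) over the fixed subtrees, and take the higher of monotone/swap when the rotation is hidden inside a phrase — comes from the slides.

```python
import math

# Invented monotone probabilities per subtree type (illustrative only).
MONOTONE_PROB = {'S+NP+VP': 0.95, 'VP+AUX+NP': 0.30, 'NP+DT+NN': 0.90}

def subtree_score(subtree_type, position):
    p_mono = MONOTONE_PROB[subtree_type]
    if position == 'monotone':
        return p_mono
    if position == 'swap':
        return 1.0 - p_mono
    return max(p_mono, 1.0 - p_mono)  # rotation hidden inside a phrase

def reordering_score(fixed_positions):
    # Product of per-subtree probabilities, accumulated in the log domain
    # as decoders typically combine feature scores.
    return math.exp(sum(math.log(subtree_score(s, t)) for s, t in fixed_positions))

score = reordering_score([('S+NP+VP', 'monotone'),
                          ('VP+AUX+NP', 'swap'),
                          ('NP+DT+NN', 'in_phrase')])
print(round(score, 4))  # 0.95 * 0.70 * 0.90 = 0.5985
```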


Experimental conditions
- Compared methods
  - Baseline: IBM distortion and lexical reordering models
  - IST-ITG: Baseline + IST-ITG constraints
  - Proposed: Baseline + proposed reordering model
- Training
  - GIZA++ toolkit for word alignment
  - SRI language model toolkit
  - Minimum error rate training (BLEU-4)
  - Charniak parser

Experimental conditions (E-J)
- English-to-Japanese translation experiment: JST Japanese-English paper abstract corpus

                             English    Japanese
  Training      Sentences    1.0M
                Words        24.6M      28.8M
  Development   Sentences    2.0K
                Words        50.1K      58.7K
  Test          Sentences    2.0K
                Words        49.5K      58.0K

- Development and test data: single reference

Experimental results (E-J)
- Proposed reordering model
  - Subtree samples: 13M, of which 3M (25.38%) were removed
  - Subtree types: 54K, with threshold 10
  - Number of models: 6K + the clustered model
  - Coverage: 99.29%
- Test-set results: the proposed model improved BLEU by 0.49 points over IST-ITG

Experimental conditions (E-C)
- English-to-Chinese translation experiment: NIST MT08 English-to-Chinese translation track

                             English    Chinese
  Training      Sentences    4.6M
                Words        79.6M      73.4M
  Development   Sentences    1.6K
                Words        46.4K      39.0K
  Test          Sentences    1.9K
                Words        45.7K      47.0K (avg.)

- Test data: 4 references; development data: single reference

Experimental results (E-C)
- Proposed reordering model
  - Subtree samples: 50M, of which 10M (20.36%) were removed
  - Subtree types: 2M, with threshold 10
  - Number of models: 19K + the clustered model
  - Coverage: 99.45%
- Test-set results: the proposed model improved BLEU by 0.33 points over IST-ITG

Conclusions and future work
- Conclusions
  - An extension of the IST-ITG constraints
  - Reordering using syntactic information can easily be introduced into phrase-based translation
  - Improved BLEU by 0.49 points (English-to-Japanese) over IST-ITG
- Future work
  - Simultaneous training of the translation and reordering models
  - Dealing with complex reorderings caused by differences between sentence tree structures

Thank you very much!

Number of target word orders
- Number of target word orders for a target word sequence (binary source tree)

  # of words   IST-ITG    ITG            No constraint
  8            128        8,558          40,320
  10           512        206,098        3,628,800
  15           16,384     745,387,038    1,307,674,368,000
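The counts this slide compares can be recomputed. As a hedged sketch (assuming, consistently with the slide's figures, that IST-ITG with a fixed binary tree over n words admits 2^(n-1) target orders, plain ITG admits the large Schröder number R_(n-1), and the unconstrained case admits n!):

```python
from math import factorial

def schroeder(m):
    """Large Schröder numbers R_0..R_m via the convolution recurrence
    R_n = R_(n-1) + sum_k R_k * R_(n-1-k); R_0 = 1, R_1 = 2, R_2 = 6, ..."""
    r = [1]
    for n in range(1, m + 1):
        r.append(r[n - 1] + sum(r[k] * r[n - 1 - k] for k in range(n)))
    return r[m]

for n in (8, 10, 15):
    # columns: IST-ITG (2^(n-1)), ITG (R_(n-1)), no constraint (n!)
    print(n, 2 ** (n - 1), schroeder(n - 1), factorial(n))
```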

Example of subtree models
- Monotone probability per subtree type

  Subtree type s    Monotone probability
  S+PP+,NP+VP
  NP+DT+NN+NN       0.816
  VP+AUX+VP         0.664
  VP+VBN+PP         0.864
  NP+NP+PP          0.837
  NP+DT+JJ+NN       0.805

- Swap probability = 1.0 − monotone probability