Linguistically-motivated Tree-based Probabilistic Phrase Alignment
Toshiaki Nakazawa, Sadao Kurohashi (Kyoto University)


Outline
- Background
- Tree-based Probabilistic Phrase Alignment Model
- Model Training
- Symmetrization Algorithm
- Experiments
- Conclusions

Background
- Many state-of-the-art SMT systems are built on "word-based" alignment results
  - Phrase-based SMT [Koehn et al., 2003]
  - Hierarchical Phrase-based SMT [Chiang, 2005]
  - and so on
- Some of them incorporate syntactic information only "after" word-based alignment
  - [Quirk et al., 2005], [Galley et al., 2006] and so on
- Is that enough? Can it achieve "practical" translation quality?

Background (cont.)
- The word-based alignment model works well for structurally similar language pairs
- It is not effective for language pairs with great differences in linguistic structure, such as Japanese and English (SOV versus SVO)
- For such language pairs, syntactic information is necessary even during the alignment process

Related Work
- Syntactic tree-based models: [Yamada and Knight, 2001], [Gildea, 2003], ITG by Wu
  - Incorporate operations that manipulate sub-trees (re-order, insert, delete, clone) to reproduce the opposite tree structure
  - Our model does not require any such operations, and utilizes dependency trees
- Dependency tree-based model: [Cherry and Lin, 2003]
  - Word-to-word, one-to-one alignment
  - Our model makes phrase-to-phrase alignments and can create many-to-many links

Features of the Proposed Tree-based Probabilistic Phrase Alignment Model
- A generative model similar to the IBM models
- Uses phrase dependency structures
  - "phrase" means a linguistic phrase (cf. phrase-based SMT)
- Phrase-to-phrase alignment model
  - Each phrase (node) consists of basically 1 content word and 0 or more function words
  - Source-side content words can be aligned only to target-side content words (and likewise for function words); see the sketch below
- Generation starts from the root node and ends at one of the leaf nodes (cf. the IBM models generate from the first word to the last word)
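To make the node definition concrete, here is a minimal Python sketch (hypothetical names, not the authors' code) of the phrase-node structure and the content/function alignment constraint:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PhraseNode:
    """One node of a phrase dependency tree: basically one content word
    plus zero or more function words."""
    content_word: str
    function_words: list[str] = field(default_factory=list)
    pos: int = 0                               # word-order position in the sentence
    parent: Optional["PhraseNode"] = None      # head node (None only at the root)
    children: list["PhraseNode"] = field(default_factory=list)

def may_align(src_is_content: bool, tgt_is_content: bool) -> bool:
    # Source content words align only to target content words,
    # and source function words only to target function words.
    return src_is_content == tgt_is_content
```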

Outline (recap): Background / Tree-based Probabilistic Phrase Alignment Model / Model Training / Symmetrization Algorithm / Experiments / Conclusions

Dependency Analysis of Sentences
- Source (Japanese): プロピレングリコールは血中グルコースインスリンを上昇させ、血中NEFA濃度を減少させる
- Target (English): Propylene glycol increases in blood glucose and insulin and decreases in NEFA concentration in the blood
[Figure: phrase dependency trees of the two sentences, annotated with word order, head nodes, and the root node]

IBM Model vs. Tree-based Model
- IBM Model [Brown et al., 93]: P(f | e; θ) = Σ_a P(f, a | e; θ)
- Tree-based Model: P(T_f | T_e; θ) = Σ_a P(T_f, a | T_e; θ)
- Notation: f: source sentence, e: target sentence, a: alignment, θ: parameters, T_f: source tree, T_e: target tree

Model Decomposition: Lexicon Probability
- Suppose T_f consists of m nodes and T_e consists of n nodes
- P(T_f, a | T_e) is calculated as a product of two probabilities
- The first is the phrase translation probability, e.g. p_t(濃度 を | in concentration), p_t(上昇 さ せ | increase)

Model Decomposition: Alignment Probability
- Define the parent node of f_i as f_{p(i)}
- The alignment probability is decomposed as a product of target-side dependency relation probabilities, each conditioned on the corresponding source-side relation
- If the parent node f_{p(i)} has been aligned to NULL, f_{p(i)} instead indicates the grandparent of f_i, and this continues up the tree until an ancestor aligned to something other than NULL is found
- The dependency relation probability models tree-based reordering
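Putting the two decomposition slides together, the model can be written out as below; the notation (f_i, e_{a_i}, p(i), rel) is a reconstruction supplied here, since the original symbols were lost in extraction:

```latex
% Sketch of the decomposition; symbols reconstructed, not verbatim from the slides.
\[
P(T_f, a \mid T_e) =
\underbrace{\prod_{i=1}^{m} p_t\bigl(f_i \mid e_{a_i}\bigr)}_{\text{phrase translation}}
\times
\underbrace{\prod_{i=1}^{m} p_d\bigl(\mathrm{rel}(e_{a_i}, e_{a_{p(i)}}) \,\big|\, \mathrm{rel}(f_i, f_{p(i)})\bigr)}_{\text{dependency relation}}
\]
```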

Outline (recap): Background / Tree-based Probabilistic Phrase Alignment Model / Model Training / Symmetrization Algorithm / Experiments / Conclusions

Model Training
- The proposed model is trained by the EM algorithm
- First, the phrase translation probability is learned (Model 1)
  - Model 1 can be learned efficiently without approximation (cf. IBM models 1 and 2)
- Next, the dependency relation probability is learned (Model 2), using the probabilities learned in Model 1 as initial parameters
  - Model 2 needs some approximation (cf. IBM models 3 and above); we use a beam-search algorithm

Model 1
- Each phrase on the source side can correspond to an arbitrary phrase on the target side, or to a NULL phrase
- The probability of one possible alignment is: P(T_f, a | T_e) = Π_{i=1..m} p_t(f_i | e_{a_i})
- Then the tree translation probability is: P(T_f | T_e) = Σ_a Π_{i=1..m} p_t(f_i | e_{a_i})
- Efficiently calculated as: Π_{i=1..m} Σ_{j=0..n} p_t(f_i | e_j)
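Because Model 1 scores each source phrase independently, the sum over alignments factorizes and the EM expectations come out in closed form; a minimal sketch of one training iteration (not the authors' implementation, analogous to IBM Model 1 training):

```python
from collections import defaultdict

def em_iteration(corpus, pt):
    """One EM step. corpus: iterable of (src_phrases, tgt_phrases) pairs;
    pt: dict mapping (f, e) -> current phrase translation probability."""
    counts = defaultdict(float)   # expected co-occurrence counts
    totals = defaultdict(float)   # normalizers per target phrase
    for src, tgt in corpus:
        tgt = ["NULL"] + list(tgt)                  # index 0 is the NULL phrase
        for f in src:
            z = sum(pt.get((f, e), 1e-12) for e in tgt)
            for e in tgt:
                gamma = pt.get((f, e), 1e-12) / z   # posterior of aligning f to e
                counts[(f, e)] += gamma
                totals[e] += gamma
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}
```

In use, pt would be initialized uniformly over co-occurring phrase pairs and em_iteration applied 5 times, matching the iteration count reported in the experiments.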

Model 2 (imaginary ROOT node)
- The root node of a sentence is supposed to depend on an imaginary ROOT node, which works like the Start-Of-Sentence (SOS) marker in word-based models
- The ROOT node of the source tree always corresponds to that of the target tree
[Figure: dependency trees of the example pair 「事例を通して援助の視点に必要なポイントを確認した」 and its English translation, each headed by an imaginary ROOT node]

Model 2 (beam-search algorithm)
- It is impossible to enumerate all possible alignments
- Consider only a subset of "good-looking" alignments using a beam-search algorithm (e.g. beam width = 4)
[Figure: the example sentence pair with candidate alignment points and NULL]

Model 2 (beam-search algorithm, cont.)
[Figure: the four beam hypotheses for the example sentence pair, one per beam slot]
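A rough sketch of the beam search just described, with a hypothetical score function standing in for the combined phrase translation and dependency relation probabilities:

```python
import heapq

def beam_search(src_nodes, tgt_nodes, score, beam_width=4):
    """src_nodes: source phrases in top-down tree order; score(hyp) -> float
    is assumed to combine translation and relation probabilities."""
    beam = [{}]                                   # start from the empty alignment
    for i, f in enumerate(src_nodes):             # extend one source node at a time
        candidates = []
        for hyp in beam:
            for j in [None] + list(range(len(tgt_nodes))):  # None = NULL
                new_hyp = dict(hyp)
                new_hyp[i] = j
                candidates.append(new_hyp)
        beam = heapq.nlargest(beam_width, candidates, key=score)  # prune
    return max(beam, key=score)                   # best complete alignment
```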

Model 2 (parameter notations)
- The dependency relation between two phrases x and y is defined as a path from x to y, using the following notations:
  - "c-" if y is a pre-child of x
  - "c+" if y is a post-child of x
  - "p-" if x is a post-child of y
  - "p+" if x is a pre-child of y
  - "INCL" if x and y are the same phrase
  - "ROOT" if y is the imaginary ROOT node
  - "NULL" if the phrase is aligned to NULL
[Figure: a small tree illustrating the c-, c+, p-, p+ and ROOT relations]

Model 2 (parameter notations, cont.)
- Where x and y are two or more nodes distant from each other, the relation is described by combining the notations, e.g. "c-;c+" or "p-;c+;c-"
[Figure: example trees illustrating combined relation paths]
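The path label can be computed mechanically from the trees; a sketch under the assumption that each node carries its parent link and word-order position (as in the PhraseNode sketch earlier), with the ROOT and NULL special cases handled outside:

```python
def step_up(n):
    # "p-" when climbing from a post-child, "p+" from a pre-child
    return "p-" if n.pos > n.parent.pos else "p+"

def step_down(parent, child):
    # "c-" when descending to a pre-child, "c+" to a post-child
    return "c-" if child.pos < parent.pos else "c+"

def relation_path(x, y):
    """Relation label between phrases x and y as a ';'-joined path,
    e.g. 'c-', 'p-;c+;c-', or 'INCL'."""
    if x is y:
        return "INCL"
    ancestors, node = set(), y          # index the ancestors of y
    while node is not None:
        ancestors.add(id(node))
        node = node.parent
    labels, node = [], x                # climb from x to the common ancestor
    while id(node) not in ancestors:
        labels.append(step_up(node))
        node = node.parent
    down, m = [], y                     # collect the downward path to y
    while m is not node:
        down.append(m)
        m = m.parent
    for child in reversed(down):
        labels.append(step_down(child.parent, child))
    return ";".join(labels)
```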

Dependency Relation Probability Examples
[Figure: four alignment configurations of the example sentence pair, each illustrating a dependency relation probability]

Example
[Figure: the full phrase alignment of the example sentence pair, including the ROOT nodes]

Outline (recap): Background / Tree-based Probabilistic Phrase Alignment Model / Model Training / Symmetrization Algorithm / Experiments / Conclusions

Symmetrization Algorithm
- Since our model is directed, we run it bi-directionally and symmetrize the two alignment results heuristically
- The symmetrization algorithm is similar to [Koehn et al., 2003], which uses the 1-best GIZA++ word alignment result of each direction
- Our algorithm exploits the n-best alignment results of each direction
- Three steps (a condensed code sketch follows the step slides):
  - Superimposition
  - Growing
  - Handling isolations

Symmetrization Algorithm 1. Superimposition
[Figure: the source-to-target 5-best and target-to-source 5-best alignment matrices superimposed into a single score matrix]

Symmetrization Algorithm 1. Superimposition (cont.)
- Definitive alignment points are adopted: points which have no point of the same or higher score in their row or column
- Conflicting points are discarded: points which lie in the same row or column as an adopted point and are not contiguous to it on the tree

Symmetrization Algorithm 2. Growing
- Adopt points contiguous to already-adopted points in both the source and target trees
  - In descending order of score, from top to bottom, from left to right
- Discard conflicting points: points which have an adopted point in both the same row and the same column

Symmetrization Algorithm 3. Handling Isolations
- Adopt points whose source and target phrases are both still unaligned
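A condensed sketch of the three steps (the exact top-to-bottom, left-to-right tie-breaking is simplified here to a single pass in score order):

```python
def symmetrize(points, contiguous_on_tree):
    """points: dict mapping (i, j) -> superimposed score for aligning
    source phrase i to target phrase j; contiguous_on_tree(p, q): True
    if p and q are neighbors in both the source and target trees."""
    # 1. Superimposition: adopt definitive points, i.e. points with no
    #    equal-or-higher-scored point in the same row or column.
    adopted = {p for p, s in points.items()
               if all(points[q] < s for q in points
                      if q != p and (q[0] == p[0] or q[1] == p[1]))}
    # 2. Growing: in descending score order, adopt points contiguous to an
    #    adopted point; discard points blocked in both row and column.
    for p in sorted(set(points) - adopted, key=points.get, reverse=True):
        row_taken = any(a[0] == p[0] for a in adopted)
        col_taken = any(a[1] == p[1] for a in adopted)
        if row_taken and col_taken:
            continue                                  # conflicting point
        if any(contiguous_on_tree(p, a) for a in adopted):
            adopted.add(p)
    # 3. Handling isolations: adopt points whose source and target
    #    phrases are both still unaligned.
    for p in sorted(set(points) - adopted, key=points.get, reverse=True):
        if not any(a[0] == p[0] or a[1] == p[1] for a in adopted):
            adopted.add(p)
    return adopted
```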

Alignment Experiment
- Training corpus: a Japanese-English paper abstract corpus provided by JST, consisting of about 1M parallel sentences
- Gold-standard alignment: 100 manually annotated sentence pairs from the training corpus, Sure (S) alignments only [Och and Ney, 2003]
- Evaluation unit: morpheme-based for Japanese, word-based for English
- Iterations: 5 for Model 1, then 5 for Model 2

Alignment Experiment (cont.)
- Comparative experiment (word-based alignment): GIZA++ with various symmetrization heuristics [Koehn et al., 2007]
- Default settings for GIZA++
- Original word forms used for both Japanese and English

Results
[Table: precision, recall, and F-measure of the proposed model (1-best-intersection and three n-best-grow settings) vs. GIZA++ heuristics (intersection, grow, grow-final, grow-final-and, grow-diag, grow-diag-final, grow-diag-final-and); the numeric scores are not recoverable]

Example of Alignment Improvement
[Figure: alignment matrices from the proposed model (left) and word-based alignment (right)]

Example of Alignment Error
[Figure: an error case from the proposed model compared with word-based alignment]

Translation Experiments
- Training corpus: same as in the alignment experiments
- Test corpus: 500 paper abstract sentences
- Decoder: Moses [Koehn et al., 2007]
  - Default options except for the phrase table limit (20 -> 10) and the distortion limit (6 -> -1)
  - No minimum error rate training
- Evaluation: BLEU, with punctuation removed and case-insensitive matching

Results
[Table: precision, recall, F-measure, and BLEU for the proposed model (1-best-intersection, n-best-grow) and GIZA++ (intersection, grow-diag, grow-diag-final-and); the numeric scores are not recoverable]
- The definition of function words is improper (articles? auxiliary verbs? ...)
- A tree-based decoder is necessary; BLEU is essentially insensitive to syntactic structure
- Translation quality is potentially improved

Potentially Improved Example
- Input: これ は LB 膜 の 厚み が アビジン を 吸着 する こと で 増加 した こと に よる 。
- Proposed (30.13): this is due to the increase in the thickness of the lb film avidin adsorb
- GIZA++ (33.78): the thickness of the lb film avidin to adsorption increased by it
- Reference: this was due to increased thickness of the lb film by adsorbing avidin

Conclusion
- A tree-based probabilistic phrase alignment model using dependency tree structures
  - Phrase translation probability
  - Dependency relation probability
- An n-best symmetrization algorithm
- Achieves high alignment accuracy compared to word-based models
  - Syntactic information is useful during the alignment process
- BUT: unable to improve the BLEU scores of translation

Future Work
- A more flexible model: content words sometimes correspond to function words and vice versa
- Integrate parsing probabilities into the model: parsing errors easily lead to alignment errors; with integrated parsing probabilities, parsing results and alignments can be revised complementarily
- More syntactic information: incorporate POS or phrase categories into the model

Thank You!