Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation
Kei Hashimoto, Hirohumi Yamamoto, Hideo Okuma, Eiichiro Sumita, and Keiichi Tokuda
Nagoya Institute of Technology; National Institute of Information and Communications Technology; Kinki University; ATR Spoken Language Communication Research Labs
Background (1/2)
- Phrase-based statistical machine translation
  - Can model local word reordering
  - Short idioms
  - Insertions and deletions of words
  - Errors in global word reordering
- Word reordering constraint techniques
  - Linguistically syntax-based approaches: source tree, target tree, or both tree structures
  - Formal constraints on word permutations: IBM distortion, lexical reordering model, ITG
Background (2/2)
- Imposing a source tree on ITG (IST-ITG)
  - Extension of the ITG constraints
  - Introduces the tree structure of the source sentence
  - Cannot evaluate the accuracy of the target word orders
- Reordering model using syntactic information
  - Extension of the IST-ITG constraints
  - Models the rotation of nodes in the source-side parse tree
  - Can be easily introduced into a phrase-based translation system
Outline
- Background
- ITG & IST-ITG constraints
- Proposed reordering model
- Training of the proposed model
- Decoding using the proposed model
- Experiments
- Conclusions and future work
Inversion transduction grammar (ITG) constraints
- All possible binary tree structures are generated from the source word sequence
- The target sentence is obtained by rotating any node of the generated binary trees
- Reduces the number of possible target word orders
- Does not consider the actual tree structure of the source sentence
Imposing source tree on ITG
- Directly introduces the tree structure of the source sentence into ITG
- The target sentence is obtained by rotating any node of the source sentence tree structure
- For a binary tree over n source words, the number of word orders is reduced to 2^(n-1)
[Figure: source sentence "This is a pen" and its source sentence tree structure]
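The rotation constraint can be illustrated with a minimal sketch. It assumes a binary tree written as nested pairs, and the bracketing of "This is a pen" used here is hypothetical, not the parse from the slide. A binary tree over n words has n-1 internal nodes, each either kept or rotated, which gives 2^(n-1) orders.

```python
from itertools import product

# A leaf is a word (str); an internal node is a pair (left, right).
def target_orders(tree):
    """Return every word order reachable by rotating nodes of a binary tree."""
    if isinstance(tree, str):
        return [[tree]]
    left, right = tree
    orders = []
    for l, r in product(target_orders(left), target_orders(right)):
        orders.append(l + r)  # node kept as is (monotone)
        orders.append(r + l)  # node rotated (swap)
    return orders

# Hypothetical bracketing: (This (is (a pen)))
tree = ("This", ("is", ("a", "pen")))
orders = target_orders(tree)
```

With this bracketing, "This" can only appear first or last, so an order such as "is This a pen" is excluded even though the ITG constraints alone would allow it.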
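Non-binary tree
- Parsing sometimes produces non-binary trees
- Any reordering of the child nodes in a non-binary subtree is allowed
- The number of orders in a non-binary subtree with k child nodes is k!
[Figure: source sentence "A B C D E" with a three-child subtree over c, d, e; its 3! = 6 allowed orders are cde, dce, ecd, ced, dec, edc]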
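As a small sketch of how the count generalizes: if every internal node may permute its k children freely, the number of reachable word orders is the product of k! over all internal nodes. The tree shape below is a hypothetical example, not the one drawn on the slide.

```python
from math import factorial

# A leaf is a word (str); an internal node is a tuple of child nodes.
def count_orders(tree):
    """Count word orders when each node may permute its children freely."""
    if isinstance(tree, str):
        return 1
    total = factorial(len(tree))  # k! orderings of this node's children
    for child in tree:
        total *= count_orders(child)
    return total

# Hypothetical tree: a binary root over (A B) and a three-child node (C D E).
n_orders = count_orders((("A", "B"), ("C", "D", "E")))
```

For a purely binary tree this reduces to 2 per internal node, matching the binary case.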
Problem of IST-ITG
- Cannot evaluate the accuracy of the target word reordering
  ⇒ An equal probability is assigned to all rotations
- Proposal: a reordering model using syntactic information
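Abstract of proposed method
- Reordering model using syntactic information
- The rotation of each subtree type is modeled
- Rotation position: monotone or swap
- Subtree types in the source-side parse tree of "This is a pen": S+NP+VP, VP+AUX+NP, NP+DT+NN
[Figure: source-side parse tree of the source sentence "This is a pen" with nodes S, NP, VP, AUX, NP, DT, NN, and the reordering probability over the rotation positions of the three subtree types]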
Related work 1
- Statistical syntax-directed translation with extended domain of locality [Liang Huang et al. 2006]
  - Extracts rules for tree-to-string translation
  - Considers syntactic information
  - Considers multi-level trees on the source side
[Figure: example tree-to-string rule over the nodes S, NP, VP, VB, NP: S( :NP, VP( :VB, :NP)) →]
Related work 2
- Proposed reordering model
  - Used in phrase-based translation
  - The model is estimated independently of phrase extraction
  - Models child node reordering within a one-level subtree
  - Cannot represent complex reordering
- Reordering using syntactic information can be easily introduced into phrase-based translation
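Training algorithm (1/3)
Reordering model training
1. Word alignment between the source and target sentences
2. Parsing of the source sentence
[Figure: word-aligned source and target sentences, and the source-side parse tree with nodes S, NP, VP, AUX, NP, DT, NN]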
Training algorithm (2/3)
3. Word alignments and source-side parse trees are combined
4. The rotation position of each subtree is checked (monotone or swap)
   - S+NP+VP ⇒ monotone
   - VP+AUX+NP ⇒ swap
   - NP+DT+NN ⇒ monotone
[Figure: source-side parse tree with each node annotated with the target word positions it covers]
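Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the tree encoding, the function name `rotation_positions`, and the word alignment for "This is a pen" are all hypothetical, every source word is assumed to be aligned, and child spans are assumed not to overlap. A subtree is labeled monotone when its children's target spans appear in source order, and swap otherwise.

```python
# A node is (label, body); body is a source word index for a pre-terminal,
# or a list of child nodes for an internal node.
def rotation_positions(node, alignment, out):
    """Append (subtree_type, position) pairs to out; return the target span."""
    label, body = node
    if isinstance(body, int):  # pre-terminal covering one source word
        positions = alignment[body]
        return (min(positions), max(positions))
    spans = [rotation_positions(child, alignment, out) for child in body]
    subtree_type = "+".join([label] + [child[0] for child in body])
    in_source_order = all(a[1] < b[0] for a, b in zip(spans, spans[1:]))
    out.append((subtree_type, "monotone" if in_source_order else "swap"))
    return (min(s[0] for s in spans), max(s[1] for s in spans))

# "This is a pen" with a hypothetical alignment: source index -> target positions.
tree = ("S", [("NP", 0), ("VP", [("AUX", 1), ("NP", [("DT", 2), ("NN", 3)])])])
alignment = {0: [0], 1: [3], 2: [1], 3: [2]}
samples = []
span = rotation_positions(tree, alignment, samples)
```

With this alignment the auxiliary moves to the end of the target sentence, so only VP+AUX+NP is labeled swap.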
Training algorithm (3/3)
5. The reordering probability of each subtree type is estimated by counting rotation positions:
   P(t | s) = count(s, t) / Σ_t' count(s, t')
   where count(s, t) is the count of rotation position t over all training samples of subtree type s
- Non-binary subtrees
  - Any ordering of the child nodes is allowed
  - Rotation positions are categorized into only two types ⇒ monotone or other (swap)
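The counting step amounts to a relative-frequency estimate per subtree type. A minimal sketch, assuming the training samples are already available as (subtree type, rotation position) pairs; the function name and the toy samples are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_reordering_model(samples):
    """Relative-frequency estimate of P(position | subtree type)."""
    counts = defaultdict(Counter)
    for subtree_type, position in samples:
        counts[subtree_type][position] += 1
    model = {}
    for subtree_type, c in counts.items():
        total = sum(c.values())
        model[subtree_type] = {t: c[t] / total for t in ("monotone", "swap")}
    return model

# Toy samples, for illustration only.
samples = [("VP+AUX+NP", "swap")] * 3 + [("VP+AUX+NP", "monotone")] \
        + [("NP+DT+NN", "monotone")] * 2
model = estimate_reordering_model(samples)
```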
Remove subtree samples
- Some target word orders cannot be derived by rotating nodes of the source-side parse tree
  - Linguistic reasons: differences in sentence structure
  - Non-linguistic reasons: errors in word alignment and syntactic analysis
- Subtrees whose target order can be derived by rotation are used as training samples; subtrees whose target order cannot be derived are removed
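One simple way to approximate this filter is to test whether the target spans of a subtree's children are pairwise disjoint: if two children interleave on the target side, no reordering of the child nodes can produce that order. This is a simplified assumption about the criterion, and the function name and span encoding are hypothetical.

```python
def derivable_by_rotation(child_spans):
    """True if the children's (start, end) target spans do not overlap."""
    ordered = sorted(child_spans)  # sort by start position
    return all(a[1] < b[0] for a, b in zip(ordered, ordered[1:]))

keep = derivable_by_rotation([(1, 3), (0, 0)])  # swapped but disjoint
drop = derivable_by_rotation([(0, 2), (1, 3)])  # children interleave
```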
Clustering of subtree type
- The number of possible subtree types is large
  - Unseen subtree types
  - Subtree types observed only a few times ⇒ cannot be modeled reliably
- Clustering of subtree types
  - Applied to subtree types whose number of training samples is less than a heuristic threshold
  - A clustered model is estimated from the counts of the clustered subtree types
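The slide does not say how rare subtree types are grouped, so the sketch below makes a loud assumption: all types below the threshold are pooled into a single clustered model, which is also used for unseen types. The function names, the pooling scheme, and the toy counts are hypothetical; the threshold of 10 is the value reported in the experiments.

```python
from collections import Counter

def build_models(counts, threshold=10):
    """Return (per-type monotone probabilities, pooled monotone probability)."""
    models, pooled = {}, Counter()
    for subtree_type, c in counts.items():
        total = c["monotone"] + c["swap"]
        if total < threshold:
            pooled.update(c)  # assumption: one shared cluster for rare types
        else:
            models[subtree_type] = c["monotone"] / total
    pooled_total = pooled["monotone"] + pooled["swap"]
    clustered = pooled["monotone"] / pooled_total if pooled_total else 0.5
    return models, clustered

def p_monotone(subtree_type, models, clustered):
    """Look up a type, falling back to the clustered model if rare or unseen."""
    return models.get(subtree_type, clustered)

# Toy counts, for illustration only.
counts = {
    "NP+DT+NN": Counter(monotone=90, swap=10),
    "X+A+B": Counter(monotone=2, swap=1),
    "Y+A+B": Counter(monotone=1, swap=2),
}
models, clustered = build_models(counts)
```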
Decode using proposed model
- Phrase-based decoder constrained by the IST-ITG constraints
  - The target sentence is generated by rotating any node of the source-side parse tree
  - A target word ordering that destroys a source phrase is not allowed
- Check the rotation positions of the subtrees
- Calculate the reordering probabilities
Decode using proposed model: calculate reordering probability
[Figure: source sentence "A B C D E" translated as target sentence "b a c d e"; the rotation positions of the three subtrees are monotone, swap, monotone]
Decode using proposed model: calculate reordering probability
[Figure: source sentence "A B C D E" translated as target sentence "c d e a b"; the rotation positions of the subtrees include swap and monotone]
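A sketch of how a hypothesis could be scored from its rotation positions. Combining the subtrees by multiplying their probabilities (summing log probabilities) is an assumption here, since the formula itself is not legible in the slides. The monotone probabilities are taken from the example subtree models at the end of the deck, but pairing them with this sequence of rotations is purely illustrative.

```python
import math

def reordering_log_prob(rotations, p_monotone):
    """Sum log P(position | subtree type) over the subtrees of a hypothesis."""
    log_prob = 0.0
    for subtree_type, position in rotations:
        p = p_monotone[subtree_type]
        log_prob += math.log(p if position == "monotone" else 1.0 - p)
    return log_prob

p_monotone = {"NP+NP+PP": 0.837, "VP+AUX+VP": 0.664, "VP+VBN+PP": 0.864}
rotations = [
    ("NP+NP+PP", "monotone"),
    ("VP+AUX+VP", "swap"),
    ("VP+VBN+PP", "monotone"),
]
score = reordering_log_prob(rotations, p_monotone)
```

Working in log space keeps the score numerically stable for long sentences and lets it be added to the other log-linear features of a phrase-based decoder.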
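Rotation position included in a phrase
- The rotation position cannot be determined, because the word alignments inside a phrase are not clear
  ⇒ The higher of the two probabilities (monotone or swap) is assigned
[Figure: example with source sentence "A B C D E" in which one subtree is assigned swap and a subtree inside a phrase is assigned the higher probability]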
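This rule is easy to state in code. A minimal sketch, assuming the rotation position is passed as `None` when the subtree lies inside a phrase; the function name and that encoding are hypothetical.

```python
def rotation_prob(p_monotone, position):
    """Probability of a rotation position; None means 'inside a phrase'."""
    if position is None:
        # Position is unknown, so take the higher of monotone and swap.
        return max(p_monotone, 1.0 - p_monotone)
    return p_monotone if position == "monotone" else 1.0 - p_monotone

inside_phrase = rotation_prob(0.25, None)
```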
Experimental conditions
- Compared methods
  - Baseline: IBM distortion and lexical reordering models
  - IST-ITG: Baseline + IST-ITG constraint
  - Proposed: Baseline + proposed reordering model
- Training
  - GIZA++ toolkit
  - SRI language model toolkit
  - Minimum error rate training (BLEU-4)
  - Charniak parser
Experimental conditions (E-J)
- English-to-Japanese translation experiment
- JST Japanese-English paper abstract corpus
  Data          Sentences   Words (English)   Words (Japanese)
  Training      1.0M        24.6M             28.8M
  Development   2.0K        50.1K             58.7K
  Test          2.0K        49.5K             58.0K
- Development and test data: single reference
Experimental results (E-J)
- Proposed reordering model
  Subtree samples    13M
  Removed samples    3M (25.38%)
  Subtree types      54K
  Threshold          10
  Number of models   6K + clustered
  Coverage           99.29%
- Results on the test set (BLEU; Baseline, IST-ITG, Proposed): improved by 0.49 points over IST-ITG
Experimental conditions (E-C)
- English-to-Chinese translation experiment
- NIST MT08 English-to-Chinese translation track
  Data          Sentences   Words (English)   Words (Chinese)
  Training      4.6M        79.6M             73.4M
  Development   1.6K        46.4K             39.0K
  Test          1.9K        45.7K             47.0K (Ave.)
- Development data: single reference; test data: 4 references
Experimental results (E-C)
- Proposed reordering model
  Subtree samples    50M
  Removed samples    10M (20.36%)
  Subtree types      2M
  Threshold          10
  Number of models   19K + clustered
  Coverage           99.45%
- Results on the test set (BLEU; Baseline, IST-ITG, Proposed): improved by 0.33 points over IST-ITG
Conclusions and future work
- Conclusions
  - Extension of the IST-ITG constraints
  - Reordering using syntactic information can be easily introduced into phrase-based translation
  - Improved BLEU by 0.49 points over IST-ITG (English-to-Japanese)
- Future work
  - Simultaneous training of the translation and reordering models
  - Handling complex reordering caused by differences in sentence tree structures
Thank you very much!
Number of target word orders
Number of target word orders in a target word sequence (binary tree)
  # of words   IST-ITG   ITG           No constraint
  8            128       8,558         40,320
  10           512       206,098       3,628,800
  15           16,384    745,387,038   1,307,674,368,000
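Counts of this kind can be computed directly. The sketch below assumes a binary source tree for IST-ITG (2^(n-1) orders), uses the large Schröder numbers for the number of permutations reachable under the ITG constraints, and n! for unconstrained reordering. The function names are hypothetical.

```python
from math import factorial

def ist_itg_orders(n):
    """Orders from rotating the n-1 internal nodes of one fixed binary tree."""
    return 2 ** (n - 1)

def itg_orders(n):
    """Orders allowed by ITG for n words: the large Schroeder number S(n-1)."""
    s = [1]  # S(0)
    for m in range(1, n):
        # S(m) = S(m-1) + sum_{k=0}^{m-1} S(k) * S(m-1-k)
        s.append(s[m - 1] + sum(s[k] * s[m - 1 - k] for k in range(m)))
    return s[n - 1]

def unconstrained_orders(n):
    """All permutations of n words."""
    return factorial(n)

rows = [(n, ist_itg_orders(n), itg_orders(n), unconstrained_orders(n))
        for n in (8, 10, 15)]
```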
Example of subtree model
  Subtree type s   Monotone probability
  S+PP+,NP+VP
  NP+DT+NN+NN      0.816
  VP+AUX+VP        0.664
  VP+VBN+PP        0.864
  NP+NP+PP         0.837
  NP+DP+JJ+NN      0.805
Swap probability = 1.0 - Monotone probability