Published by Shanna Morton. Modified over 8 years ago.
1
A Syntax-Driven Bracketing Model for Phrase-Based Translation Deyi Xiong, et al. ACL 2009
2
Introduction
Machine Translation
–Chinese to English
–Chinese: 把 7 月 11 日 設立 為 航海 節
An ideal case: to establish July 11 as Sailing Festival day
3
Wrong Linguistic Structure
航海 節 is a syntactic constituent
Source: 把 7 月 11 日 設立 為 航海 節
Translation that breaks this constituent: to set up for navigation on July 11 knots
4
A Naive Solution
Employ syntactic constraints
–Fully respect linguistic structures
5
A Naive Solution (2)
Unfortunately, it damages performance
–Non-syntactic translations are sometimes useful
Example: in 把 今天 設立 為 航海 節, the non-syntactic phrase 把 今天 設立 為 translates as "establish today as", and 航海 節 as "Sailing Festival day"
6
Syntax-Driven Bracketing Model
SDB model
The translation unit is more important
–Whether it is syntactic or non-syntactic
Includes, but is not limited to, constituent matching/violation
Protects the strength of the phrase-based system
7
Translation Unit
A bracketable source phrase and its corresponding translation
Bracketable
–A source phrase is bracketable if its translation is contiguous
–A pair of neighboring phrases is bracketable if their translations are contiguous once combined
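The bracketability test can be sketched against a word alignment. This is a minimal sketch, not the authors' extraction code: the alignment links below are hypothetical, the function names are illustrative, and two choices are assumptions, namely that "contiguous" means no target word inside the translated range links outside the source phrase, and that a fully unaligned phrase is not counted as bracketable.

```python
def is_bracketable(alignment, start, end):
    """Check whether source[start:end] has a contiguous translation.

    alignment is a set of (source_index, target_index) links.
    """
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return False  # assumption: unaligned phrases are not counted
    lo, hi = min(targets), max(targets)
    # Every aligned target word in [lo, hi] must link back into the phrase.
    return all(start <= s < end for s, t in alignment if lo <= t <= hi)


def pair_is_bracketable(alignment, left, right):
    """Check a pair of neighboring source spans, given as (start, end)."""
    if left[1] != right[0]:
        raise ValueError("spans must be adjacent")
    # The pair is bracketable when the combined span translates contiguously.
    return is_bracketable(alignment, left[0], right[1])


# Hypothetical alignment for 把 今天 設立 為 -> establish today as
# source: 0=把 1=今天 2=設立 3=為 ; target: 0=establish 1=today 2=as
ALIGNMENT = {(1, 1), (2, 0), (3, 2)}

print(is_bracketable(ALIGNMENT, 0, 4))               # 把 今天 設立 為
print(pair_is_bracketable(ALIGNMENT, (0, 3), (3, 4)))  # 把 今天 設立 + 為
print(is_bracketable(ALIGNMENT, 2, 4))               # 設立 為
```

Under this assumed alignment, the first two checks succeed, while 設立 為 fails because "today", which sits between "establish" and "as", is aligned to a word outside the phrase.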
8
Translation Unit Examples
Bracketable
Source: 把 今天 設立 為
Translation: establish today as
–The pair 把 今天 設立 and 為 is bracketable
–把 今天 設立 為 is bracketable
9
Translation Unit Examples
Unbracketable
Source: 把 今天 設立 為
Translation: establish today as
–The pair 設立 and 為 is unbracketable
–設立 為 is unbracketable
10
Bracketing Instances Extraction
Extract bracketable and unbracketable instances from training data
–Aligned sentence pair + parsed source sentence
Estimate whether a source phrase is bracketable at run time
12
SDB Features
13
Rule Features
Rule Features (RF)
–CFG rule
–Horizontal context
14
Rule Features (2)
S1: ADVP → AD
S2: VP → VV AS NP
S: VP → ADVP VP
15
Path Features
Path Features (PF)
–Path to roots
  S1 to the root of S
  S2 to the root of S
  S to the root of the whole tree
–Vertical context
16
Path Features (2)
S1: ADVP VP
S2: VP VP
S: VP IP
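The path features can be read as the label sequence collected while walking up the parse tree. A minimal sketch follows, assuming a tree stored as parent pointers; the node ids, the tree shape and the function name are hypothetical, chosen only so that the three paths on this slide come out.

```python
def path_to_ancestor(node, ancestor, parent, label):
    """Return the labels on the path from node up to ancestor, inclusive.

    Raises KeyError if ancestor is not actually above node.
    """
    path = [label[node]]
    while node != ancestor:
        node = parent[node]
        path.append(label[node])
    return path


# Hypothetical tree: IP(0) -> VP(1); VP(1) -> ADVP(2) VP(3)
LABEL = {0: "IP", 1: "VP", 2: "ADVP", 3: "VP"}
PARENT = {1: 0, 2: 1, 3: 1}

print(path_to_ancestor(2, 1, PARENT, LABEL))  # S1 up to the root of S
print(path_to_ancestor(3, 1, PARENT, LABEL))  # S2 up to the root of S
print(path_to_ancestor(1, 0, PARENT, LABEL))  # S up to the root of the tree
```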
17
Constituent Boundary Matching Features
Constituent Boundary Matching Features (CBMF)
–Exact match: the source phrase exactly matches the boundaries of a subtree
–Inside match: the source phrase covers a sequence of subtrees
–Crossing match: the source phrase crosses a subtree boundary
18
Constituent Boundary Matching Features (3)
[Figure: examples of exact match, inside match, and crossing match]
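The three match types can be distinguished from constituent spans alone. The sketch below is a simplified reading of the definitions, not the authors' feature extractor: spans are half-open (start, end) word ranges, the constituent set is assumed to include part-of-speech nodes, and the example tree is hypothetical.

```python
def boundary_match(span, constituents):
    """Classify a source span as 'exact', 'inside' or 'crossing'."""
    if span in constituents:
        return "exact"
    s, e = span
    for a, b in constituents:
        overlaps = a < e and s < b
        nested = (a <= s and e <= b) or (s <= a and b <= e)
        if overlaps and not nested:
            return "crossing"  # the span cuts through this constituent
    # No constituent is cut, so the span is a run of complete subtrees.
    return "inside"


# Hypothetical tree over 5 words: IP -> NP VP; VP -> ADVP VP; VP -> VV AS NP
CONSTITUENTS = {
    (0, 5),  # IP
    (0, 1),  # NP
    (1, 5),  # VP
    (1, 2),  # ADVP
    (2, 5),  # VP
    (2, 3),  # VV
    (3, 4),  # AS
    (4, 5),  # NP
}

print(boundary_match((1, 5), CONSTITUENTS))  # the outer VP
print(boundary_match((2, 4), CONSTITUENTS))  # VV AS
print(boundary_match((0, 2), CONSTITUENTS))  # NP plus ADVP
```

The last span is a crossing match because it takes only the first word of the outer VP.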
19
Integration into Phrase-based MT
The SDB model estimates the probability that a source phrase is bracketable
–Whether it can be translated as a unit
Integrated into a BTG MT system
–Bracketing Transduction Grammar (Wu, 1997)
Straight: 把 今天 設立 為 → establish today as
Inverted: 把 今天 設立 為 → as establish today
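The two BTG merge orders, and one way a bracketing probability could enter the decoder score, can be sketched as follows. This is an illustration under stated assumptions rather than the system described in the talk: the function names are hypothetical, the SDB probability is treated as one more weighted log-probability feature, and the numbers in the usage lines are made up.

```python
import math


def merge(left, right, order):
    """Combine the translations of two neighboring source phrases."""
    if order == "straight":
        return left + right  # keep source order on the target side
    if order == "inverted":
        return right + left  # swap the two translations
    raise ValueError("order must be 'straight' or 'inverted'")


def score_with_sdb(base_score, p_bracketable, sdb_weight):
    """Add a weighted log-probability of bracketability to a model score."""
    return base_score + sdb_weight * math.log(p_bracketable)


left, right = ["establish", "today"], ["as"]  # 把 今天 設立 | 為
print(" ".join(merge(left, right, "straight")))
print(" ".join(merge(left, right, "inverted")))

# Made-up numbers: a span judged likely bracketable is penalized less.
print(score_with_sdb(-2.0, 0.9, 1.0) > score_with_sdb(-2.0, 0.1, 1.0))
```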
20
Experiment
Comparing models
–Baseline: BTG system
–XP+ (Marton and Resnik, 2008): for NP, VP, PP, ADVP, ..., adds a penalty each time a syntactic boundary is violated (soft constraint)
–UniSDB: only S features
–BiSDB: S1, S2 and S features
21
Experiment (2)
Chinese parser
–Lexicalized PCFG parser (Xiong et al., 2005)
Parallel corpus
–FBIS corpus
Word alignment
–GIZA++
Four-gram language model
–Built with SRILM
–Xinhua section of the English Gigaword corpus
Maximum Entropy (ME) trainer
–Zhang (2004)
22
Result
SDB receives the largest feature weight
–This implies its impact on the decoder
[Table: feature weights for the baseline features (common to phrase-based systems), XP+ and SDB]
23
Result (2)
NIST MT-05 test set
–Improvement of 1.67 BLEU over the baseline
–Improvement of 0.59 BLEU over XP+
24
Result (3)
On top of CBMF, adding rule and path features achieves further improvement
BiSDB is consistently better than UniSDB
–Inner contexts (S1 and S2) are useful
25
XP+ and SDB
Same
–Both consider syntactic constituents
Different
–XP+ only penalizes non-syntactic source phrases
–SDB can encourage a non-syntactic phrase if it is bracketable
26
XP+ and SDB
27
Conclusion
The SDB model predicts whether a source phrase can be translated as a unit
Appropriate constituent violations are helpful
–Because they better inherit the strength of the phrase-based approach