1
Linguistically-motivated Tree-based Probabilistic Phrase Alignment Toshiaki Nakazawa, Sadao Kurohashi (Kyoto University)
2
Outline: Background, Tree-based Probabilistic Phrase Alignment Model, Model Training, Symmetrization Algorithm, Experiments, Conclusions
3
Background
Many state-of-the-art SMT systems are based on "word-based" alignment results: Phrase-based SMT [Koehn et al., 2003], Hierarchical Phrase-based SMT [Chiang, 2005], and so on.
Some of them incorporate syntactic information "after" word-based alignment: [Quirk et al., 2005], [Galley et al., 2006], and so on.
Is this enough? Can it achieve "practical" translation quality?
4
Background (cont.)
The word-based alignment model works well for structurally similar language pairs.
It is not effective for language pairs with great differences in linguistic structure, such as Japanese and English (SOV versus SVO).
For such language pairs, syntactic information is necessary even during the alignment process.
5
Related Work
Syntactic tree-based models [Yamada and Knight, 2001], [Gildea, 2003], ITG by Wu: these incorporate operations that manipulate sub-trees (re-order, insert, delete, clone) to reproduce the opposite tree structure. Our model does not require any such operations, and it utilizes dependency trees.
Dependency tree-based model [Cherry and Lin, 2003]: word-to-word, one-to-one alignment. Our model makes phrase-to-phrase alignments and can make many-to-many links.
6
Features of the Proposed Tree-based Probabilistic Phrase Alignment Model
A generation model similar to the IBM models, using phrase dependency structures; here "phrase" means a linguistic phrase (cf. phrase-based SMT).
A phrase-to-phrase alignment model: each phrase (node) basically consists of 1 content word and 0 or more function words (see the sketch below).
Source-side content words can be aligned only to content words on the target side (likewise for function words).
Generation starts from the root node and ends at one of the leaf nodes (cf. the IBM models generate from the first word to the last word).
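As a rough illustration of the phrase-node structure described above, here is a minimal sketch of one possible representation in Python; the class name and fields are illustrative and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PhraseNode:
    """One node of a phrase dependency tree: basically one content word
    plus zero or more function words (hypothetical representation)."""
    content_word: str
    function_words: List[str] = field(default_factory=list)
    children: List["PhraseNode"] = field(default_factory=list)
    parent: Optional["PhraseNode"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "PhraseNode") -> None:
        child.parent = self
        self.children.append(child)

# Example: the phrase "濃度 を" = content word 濃度 + function word を
node = PhraseNode(content_word="濃度", function_words=["を"])
```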
7
Outline: Background, Tree-based Probabilistic Phrase Alignment Model, Model Training, Symmetrization Algorithm, Experiments, Conclusions
8
Dependency Analysis of Sentences
Source: プロピレングリコールは血中グルコースインスリンを上昇させ、血中 NEFA 濃度を減少させる
Target: Propylene glycol increases in blood glucose and insulin and decreases in NEFA concentration in the blood
(figure: phrase dependency trees of the two sentences, with word order, head nodes, and root nodes marked)
9
IBM Model vs. Tree-based Model
IBM Model [Brown et al., 93]: defined over a source sentence, a target sentence, a word alignment, and model parameters.
Tree-based Model: defined over a source tree, a target tree, a phrase alignment, and model parameters.
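The formulas themselves were images on the original slide; a plausible reconstruction in standard IBM-style notation (the symbols are assumed here, not copied from the slide) is:

```latex
% IBM model [Brown et al., 93]:
% f: source sentence, e: target sentence, a: word alignment, \theta: parameters
P(f \mid e; \theta) = \sum_{a} P(f, a \mid e; \theta)

% Tree-based model:
% F: source tree, E: target tree, A: phrase alignment, \theta: parameters
P(F \mid E; \theta) = \sum_{A} P(F, A \mid E; \theta)
```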
10
Model Decomposition: Lexicon Probability
Suppose the source tree consists of source phrases (nodes) and the target tree consists of target phrases (nodes). The lexicon probability is calculated, for each aligned phrase pair, as a product of two probabilities: a translation probability for the content words and one for the function words.
Ex) 濃度 を - in concentration, 上昇 さ せ - increase (phrase translation probability)
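One hedged way to write this decomposition down (the symbols and the split into content-word and function-word factors are an assumption consistent with the slide text, not a quotation of the original formula):

```latex
% Lexicon probability: product over source phrases f_i, each aligned to e_{a_i}
P(F \mid E, A) = \prod_{i} P_t\bigl(f_i \mid e_{a_i}\bigr),
\qquad
P_t\bigl(f_i \mid e_{a_i}\bigr)
  = P_c\bigl(f_i^{c} \mid e_{a_i}^{c}\bigr)\,
    P_f\bigl(f_i^{f} \mid e_{a_i}^{f}\bigr)
% f_i^c, f_i^f: content and function words of phrase f_i (notation assumed)
```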
11
Model Decomposition: Alignment Probability
Define the parent node of each source phrase in the dependency tree. The alignment probability is decomposed into a product of dependency relation probabilities: the target-side dependency relation conditioned on the source-side relation.
If the parent node has been aligned to NULL, its grandparent is used instead, and this continues until an ancestor aligned to something other than NULL is found.
The dependency relation probability models tree-based reordering.
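A sketch of the decomposition in the same assumed notation (again a reconstruction, not the slide's original formula), where p(i) denotes the nearest ancestor of f_i that is aligned to something other than NULL:

```latex
% Alignment probability: product of dependency relation probabilities,
% target-side relation conditioned on the source-side relation
P(A \mid E) = \prod_{i}
  P_{rel}\Bigl( rel\bigl(e_{a_i}, e_{a_{p(i)}}\bigr)
                \Bigm| rel\bigl(f_i, f_{p(i)}\bigr) \Bigr)
```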
12
Outline: Background, Tree-based Probabilistic Phrase Alignment Model, Model Training, Symmetrization Algorithm, Experiments, Conclusions
13
Model Training
The proposed model is trained with the EM algorithm.
First, the phrase translation probability is learned (Model 1). Model 1 can be learned efficiently without approximation (cf. IBM models 1 and 2).
Next, the dependency relation probability is learned (Model 2), with the probabilities learned in Model 1 as initial parameters. Model 2 needs some approximation (cf. IBM model 3 or greater), so we use a beam-search algorithm.
14
Model 1
Each phrase on the source side can correspond to an arbitrary phrase on the target side, or to a NULL phrase.
The probability of one possible alignment $A$ is $P(F, A \mid E) = \prod_i t(f_i \mid e_{a_i})$; the tree translation probability is $P(F \mid E) = \sum_A \prod_i t(f_i \mid e_{a_i})$, which is efficiently calculated as $\prod_i \sum_j t(f_i \mid e_j)$, with $j$ ranging over the target phrases and NULL (as in IBM Model 1).
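A minimal sketch of Model-1-style EM for the phrase translation probabilities, operating on pre-extracted lists of phrase strings rather than on the trees themselves; the data format and function name are illustrative, not from the paper.

```python
from collections import defaultdict

NULL = "<NULL>"

def train_model1(corpus, iterations=5):
    """corpus: list of (source_phrases, target_phrases) pairs, where each
    element is a list of phrase strings taken from the dependency trees.
    Returns t[(f, e)], an estimate of the phrase translation probability."""
    # Initialize uniformly over all co-occurring phrase pairs (and NULL).
    t = defaultdict(float)
    for src, tgt in corpus:
        for f in src:
            for e in tgt + [NULL]:
                t[(f, e)] = 1.0
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for src, tgt in corpus:
            candidates = tgt + [NULL]
            for f in src:
                # In a Model-1-style model each source phrase aligns
                # independently, so its alignment posterior can be computed
                # exactly without enumerating whole alignments.
                z = sum(t[(f, e)] for e in candidates)
                for e in candidates:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        for (f, e) in count:
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# Toy usage with hypothetical phrase segmentations.
corpus = [(["濃度 を", "上昇 さ せ"], ["in concentration", "increase"])]
t = train_model1(corpus)
print(t[("濃度 を", "in concentration")])
```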
15
Model 2 (imaginary ROOT node)
The root node of a sentence is supposed to depend on an imaginary ROOT node, which plays the same role as the Start-Of-Sentence (SOS) marker in word-based models.
The ROOT node of the source tree always corresponds to the ROOT node of the target tree.
(figure: the Japanese example sentence 事例 を 通して 援助 の 視点 に 必要な ポイント を 確認 した and its English translation as phrase dependency trees, each headed by an imaginary ROOT node)
16
Model 2 (beam-search algorithm)
It is impossible to enumerate all the possible alignments, so we consider only a subset of promising alignments using a beam-search algorithm.
Ex) beam width = 4
(figure: the example sentence pair with candidate alignments, including a NULL node)
17
Model 2 (beam-search algorithm, cont.)
(figure: the four highest-scoring alignment candidates kept in the beam for the example sentence pair)
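A rough sketch of a beam search over alignments in this spirit: each hypothesis aligns one source node at a time and only the top beam_width partial alignments survive each step. The scoring function is a stand-in; the actual model would score candidates with the phrase translation and dependency relation probabilities.

```python
import heapq
import math

NULL = None

def beam_search_alignment(src_nodes, tgt_nodes, score, beam_width=4):
    """src_nodes, tgt_nodes: lists of phrase strings.
    score(partial, i, j): log-score of aligning source node i to target
    node j (or to NULL) given the partial alignment built so far.
    Returns the best complete alignment found: one target index (or None)
    per source node."""
    beams = [(0.0, [])]  # (cumulative log-score, partial alignment)
    for i in range(len(src_nodes)):
        candidates = []
        for logp, partial in beams:
            for j in list(range(len(tgt_nodes))) + [NULL]:
                candidates.append((logp + score(partial, i, j), partial + [j]))
        # Keep only the beam_width highest-scoring partial alignments.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beams, key=lambda c: c[0])[1]

# Toy usage with a dummy scoring function.
def dummy_score(partial, i, j):
    return math.log(0.5) if j is not None else math.log(0.1)

best = beam_search_alignment(["事例 を", "確認 した"],
                             ["the case", "was confirmed"], dummy_score)
print(best)
```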
18
Model 2 (parameter notations)
The dependency relation between two phrases is defined as the path from one to the other, described with the following notations:
"c-": a step down to a pre-child
"c+": a step down to a post-child
"p-": a step up from a post-child to its parent
"p+": a step up from a pre-child to its parent
"INCL": the two phrases are the same phrase
"ROOT": the other phrase is the imaginary ROOT node
"NULL": the phrase is aligned to NULL
(figure: illustration of the c-, c+, p-, p+ and ROOT relations)
19
Model 2 (parameter notations, cont.)
When the two phrases are two or more nodes apart, the relation is described by combining the notations, e.g. "c-;c+" or "p-;c+;c-".
(figure: example trees illustrating the combined relations c-;c+ and p-;c+;c-)
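The following sketch shows how such a path label could be computed; the procedure (climb to the lowest common ancestor, then descend) and the data layout are assumptions for illustration, not code from the paper. The ROOT and NULL cases are omitted.

```python
def relation_path(x, y, parent, position):
    """Path label from node x to node y in a dependency tree.
    parent[n]   -> parent node of n (None for the root)
    position[n] -> position of n in the sentence (word order index)
    A pre-child appears before its parent; a post-child appears after it."""
    if x == y:
        return "INCL"
    # Collect ancestors of y (including y) to find the lowest common ancestor.
    ancestors_of_y = []
    n = y
    while n is not None:
        ancestors_of_y.append(n)
        n = parent[n]
    labels = []
    n = x
    while n not in ancestors_of_y:     # climb from x up to the LCA
        labels.append("p-" if position[n] > position[parent[n]] else "p+")
        n = parent[n]
    lca = n
    # Descend from the LCA down to y along y's ancestor chain.
    down = ancestors_of_y[:ancestors_of_y.index(lca)]  # nodes strictly below LCA
    for n in reversed(down):
        labels.append("c-" if position[n] < position[parent[n]] else "c+")
    return ";".join(labels)

# Toy tree: node 0 is the root; 1 and 2 are its pre- and post-children.
parent = {0: None, 1: 0, 2: 0}
position = {0: 1, 1: 0, 2: 2}
print(relation_path(1, 2, parent, position))  # prints "p+;c+"
```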
20
Dependency Relation Probability Examples
(figure: four alignment candidates for the example sentence pair, illustrating the dependency relations used to compute the relation probabilities)
21
Example
(figure: the example sentence pair as phrase dependency trees with imaginary ROOT nodes and the resulting phrase alignment)
22
Outline: Background, Tree-based Probabilistic Phrase Alignment Model, Model Training, Symmetrization Algorithm, Experiments, Conclusions
23
Symmetrization Algorithm
Since our model is directional, we run it in both directions and symmetrize the two alignment results heuristically.
The symmetrization algorithm is similar to [Koehn et al., 2003], which uses the 1-best GIZA++ word alignment of each direction; our algorithm exploits the n-best alignment results of each direction.
Three steps: superimposition, growing, handling isolations.
24
Symmetrization Algorithm 1. Superimposition
(figure: the source-to-target 5-best and target-to-source 5-best alignments superimposed on one alignment matrix, with a score attached to each alignment point)
25
Symmetrization Algorithm 1. Superimposition (cont.)
Definitive alignment points are adopted: points that have no point with the same or a higher score in their row or column.
Conflicting points are discarded: points that are in the same row or column as an adopted point and are not contiguous to it on the tree.
(figure: the scored alignment matrix before and after adopting and discarding points)
26
Symmetrization Algorithm 2. Growing
Adopt points that are contiguous to already-adopted points in both the source and the target tree, in descending order of score, from top to bottom and from left to right.
Discard conflicting points: points that have an adopted point both in the same row and in the same column.
(figure: the alignment matrix before and after the growing step)
27
Symmetrization Algorithm 3. Handling Isolation
Adopt points whose source and target phrases are both not yet aligned to any phrase.
(figure: the alignment matrix before and after handling isolated points)
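A compressed sketch of the three steps over a scored alignment matrix (a dict mapping (source index, target index) pairs to superimposition scores). The contiguity and conflict tests are simplified stand-ins for the tree-based definitions in the slides.

```python
def symmetrize(scores, adjacent_src, adjacent_tgt):
    """scores: {(i, j): score} obtained by superimposing the n-best
    alignments of both directions. adjacent_src / adjacent_tgt tell whether
    two phrases are contiguous (neighbours in the dependency tree).
    Returns the set of adopted alignment points. Illustrative only."""
    def contiguous(p, q):
        return adjacent_src(p[0], q[0]) and adjacent_tgt(p[1], q[1])
    # Step 1a: adopt definitive points, i.e. points with no same-or-higher
    # scored point in their row or column.
    adopted = {p for p, s in scores.items()
               if all(s > s2 for q, s2 in scores.items()
                      if q != p and (q[0] == p[0] or q[1] == p[1]))}
    # Step 1b: discard points that share a row or column with an adopted
    # point and are not contiguous to it (simplified conflict test).
    discarded = {p for p in scores
                 if p not in adopted
                 and any((q[0] == p[0] or q[1] == p[1]) and not contiguous(p, q)
                         for q in adopted)}
    # Step 2: growing. In descending score order, adopt points contiguous
    # to an adopted point, unless both their row and column are taken.
    for p, s in sorted(scores.items(), key=lambda x: -x[1]):
        if p in adopted or p in discarded:
            continue
        row_taken = any(q[0] == p[0] for q in adopted)
        col_taken = any(q[1] == p[1] for q in adopted)
        if any(contiguous(p, q) for q in adopted) and not (row_taken and col_taken):
            adopted.add(p)
    # Step 3: handling isolations. Adopt points whose source and target
    # phrases are both still unaligned.
    for p, s in sorted(scores.items(), key=lambda x: -x[1]):
        if not any(q[0] == p[0] or q[1] == p[1] for q in adopted):
            adopted.add(p)
    return adopted

# Toy usage: index adjacency stands in for tree contiguity.
def adjacent(a, b):
    return abs(a - b) == 1

print(symmetrize({(0, 0): 5, (0, 1): 1, (1, 1): 4}, adjacent, adjacent))
```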
28
Alignment Experiment
Training corpus: a Japanese-English paper abstract corpus provided by JST, consisting of about 1M parallel sentences.
Gold-standard alignment: 100 sentence pairs from the training corpus, manually annotated with Sure (S) alignments only [Och and Ney, 2003].
Evaluation unit: morpheme-based for Japanese, word-based for English.
Iterations: 5 for Model 1 and 5 for Model 2.
29
Alignment Experiment (cont.)
Comparative experiment (word-based alignment): GIZA++ with various symmetrization heuristics [Koehn et al., 2007], using the default GIZA++ settings and the original forms of words for both Japanese and English.
30
Results

System / heuristic             Precision  Recall  F-measure
proposed, 1-best-intersection      90.92   41.69      57.17
proposed, 1-best-grow              83.30   54.33      65.76
proposed, 3-best-grow              81.21   56.52      66.65
proposed, 5-best-grow              80.59   57.33      67.00
GIZA++, intersection               88.14   40.18      55.20
GIZA++, grow                       83.50   49.65      62.27
GIZA++, grow-final                 67.19   56.91      61.63
GIZA++, grow-final-and             78.00   52.93      63.06
GIZA++, grow-diag                  77.34   53.18      63.03
GIZA++, grow-diag-final            67.24   56.63      61.48
GIZA++, grow-diag-final-and        74.95   54.26      62.95
31
Example of Alignment Improvement
(figure: the proposed model's alignment vs. the word-based alignment)
32
Example of Alignment Error
(figure: the proposed model's alignment vs. the word-based alignment)
33
Translation Experiments
Training corpus: same as in the alignment experiments.
Test corpus: 500 paper abstract sentences.
Decoder: Moses [Koehn et al., 2007], with default options except for the phrase table limit (20 -> 10) and the distortion limit (6 -> -1); no minimum error rate training.
Evaluation: BLEU, with no punctuation and case-insensitive.
34
Results

System / heuristic             Precision  Recall  F-measure   BLEU
proposed, 1-best-intersection      90.92   41.69      57.17  12.73
proposed, 5-best-grow              80.59   57.33      67.00  15.40
GIZA++, intersection               88.14   40.18      55.20  16.35
GIZA++, grow-diag                  77.34   53.18      63.03  17.89
GIZA++, grow-diag-final-and        74.95   54.26      62.95  17.76

Discussion:
The definition of function words is improper (articles? auxiliary verbs? ...).
A tree-based decoder is necessary: BLEU is essentially insensitive to syntactic structure, while the translation quality is potentially improved.
35
Potentially Improved Example
Input: これ は LB 膜 の 厚み が アビジン を 吸着 する こと で 増加 した こと に よる 。
Proposed (30.13): this is due to the increase in the thickness of the lb film avidin adsorb
GIZA++ (33.78): the thickness of the lb film avidin to adsorption increased by it
Reference: this was due to increased thickness of the lb film by adsorbing avidin
36
Conclusion
A tree-based probabilistic phrase alignment model using dependency tree structures, built from a phrase translation probability and a dependency relation probability, together with an n-best symmetrization algorithm.
It achieves higher alignment accuracy than word-based models, showing that syntactic information is useful during the alignment process.
BUT: it was unable to improve the BLEU scores of translation.
37
Future Work
A more flexible model: content words sometimes correspond to function words and vice versa.
Integrate parsing probabilities into the model: parsing errors easily lead to alignment errors; by integrating parsing probabilities, parsing results and alignments can be revised in a complementary way.
More syntactic information: use POS tags or phrase categories in the model.
38
Thank You!