A Word-Class Approach to Labeling PSCFG Rules for Machine Translation (ACL 2011) Andreas Zollmann and Stephan Vogel Presented by Yun Huang 01/07/2011.


1 A Word-Class Approach to Labeling PSCFG Rules for Machine Translation (ACL 2011) Andreas Zollmann and Stephan Vogel Presented by Yun Huang 01/07/2011

2 Background
PSCFG (Chiang 2005, 2007)
–Rules: X => (γ / α / w)
  X => ( / held talk with Sharon)
  X => (X1 / held talk with X1)
  X => (X1 X2 / held X2 with X1)
–Glue rules:
  S => (X / X)
  S => (S X / S X)
–Decoding: cube-pruning, etc.
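
A minimal sketch of how the co-indexed gaps in a rule such as X => (X1 X2 / held X2 with X1) reorder their fillers. The `substitute` helper and the `<src-1>` / `<src-2>` source strings are hypothetical placeholders (the source side did not survive in this transcript); only the target strings come from the slide.

```python
def substitute(side, fillers):
    """Replace each co-indexed gap (X1, X2, ...) in one side of a rule."""
    return " ".join(fillers.get(tok, tok) for tok in side.split())

# The rule X => (X1 X2 / held X2 with X1) from the slide.
source_side = "X1 X2"
target_side = "held X2 with X1"

# Sub-derivations for the two gaps as (source string, target string).
# The source strings are placeholders, not real data.
sub = {"X1": ("<src-1>", "Sharon"), "X2": ("<src-2>", "talk")}

src = substitute(source_side, {gap: pair[0] for gap, pair in sub.items()})
tgt = substitute(target_side, {gap: pair[1] for gap, pair in sub.items()})
print(src, "|||", tgt)
```

The same index on both sides ties a source gap to its target gap, which is what lets the rule swap the order of X1 and X2 on the target side.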

3 Motivation
Only S and X are used in PSCFG, so it cannot model different rule categories.
Example:
–X => (X1 X2 / held X2 with X1)
–No difference between X1 and X2
Maybe we want …
–VP => (PRP NP / held NP with PRP)
Idea: multi-label PSCFG.
How to label hierarchical phrases?

4 Labeling from word classes (1/4)
Simple: boundary (POS) tags
–I[PRP] saw[VBD] him[PRP]
Extracted rules:
–PRP-PRP => (Ich / I)
–PRP-PRP => (ihn / him)
–VBD-VBD => (gesehen / saw)
–VBD-PRP => (habe ihn gesehen / saw him)
–PRP-PRP => (Ich habe ihn gesehen / I saw him)
–VBD-PRP => (habe PRP-PRP gesehen / saw PRP-PRP)
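
The boundary-tag scheme can be sketched as follows: a target phrase is labeled with the tag of its first word and the tag of its last word. The function name `boundary_label` is an assumption made for illustration. The loop enumerates every contiguous target span; in a real extractor only spans that form alignment-consistent phrase pairs would yield rules.

```python
def boundary_label(tags, start, end):
    """Label the span tags[start:end] (end exclusive) by its first and last tag."""
    return f"{tags[start]}-{tags[end - 1]}"

# Tagged target sentence from the slide: I[PRP] saw[VBD] him[PRP]
words = ["I", "saw", "him"]
tags = ["PRP", "VBD", "PRP"]

# Label every contiguous span of the target sentence.
labels = {}
for start in range(len(words)):
    for end in range(start + 1, len(words) + 1):
        labels[" ".join(words[start:end])] = boundary_label(tags, start, end)

for phrase, label in labels.items():
    print(f"{label:8} {phrase}")
```

Note that "I", "him", and "I saw him" all receive the label PRP-PRP, which motivates making the label sensitive to phrase length.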

5 Labeling from word classes (2/4)
Accounting for phrase size
–1-word
  PRP => (Ich / I)
  PRP => (ihn / him)
–2-word
  VBD-PRP => (habe ihn gesehen / saw him)
  VBD-PRP => (habe PRP gesehen / saw PRP)
–multiple-word
  PRP..PRP => (Ich habe ihn gesehen / I saw him)
  PRP..PRP => (Ich habe PRP gesehen / I saw PRP)
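
A small sketch of the size-aware variant, assuming the three cases on the slide: a single tag for one-word phrases, T1-T2 for two-word phrases, and T1..Tn for longer ones, always taken from the first and last tag of the target phrase. The name `size_aware_label` is hypothetical.

```python
def size_aware_label(tags, start, end):
    """Label tags[start:end] (end exclusive) so that the label encodes phrase size."""
    first, last = tags[start], tags[end - 1]
    length = end - start
    if length == 1:
        return first                 # 1-word phrase: the tag itself
    if length == 2:
        return f"{first}-{last}"     # 2-word phrase: both tags, joined by "-"
    return f"{first}..{last}"        # longer phrase: boundary tags, joined by ".."

tags = ["PRP", "VBD", "PRP"]  # I saw him
print(size_aware_label(tags, 0, 1))  # I
print(size_aware_label(tags, 1, 3))  # saw him
print(size_aware_label(tags, 0, 3))  # I saw him
```

With this scheme the one-word phrase "I" and the full sentence "I saw him" no longer share a label, even though both start and end with PRP.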

6 Labeling from word classes (3/4)
Bilingually tagged corpus
–Ich[PRP] habe[AUX] ihn[PRP] gesehen[VBN]
–I[PRP] saw[VBD] him[PRP]
Extracted rules: (src label+tgt label)
–PRP+PRP => (Ich / I)
–PRP+PRP => (ihn / him)
–VBN+VBD => (gesehen / saw)
–AUX..VBN+VBD-PRP => (habe ihn gesehen / saw him)
–PRP..VBN+PRP..PRP => (Ich habe ihn gesehen / I saw him)
–AUX..VBN+VBD-PRP => (habe PRP+PRP gesehen / saw PRP+PRP)
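
The bilingual labels above can be reproduced with a short sketch: label the source span and the target span separately, join the two with "+", and replace an embedded phrase pair by its own label to form a hierarchical rule. The helper names and the hard-coded span indices are assumptions for illustration; a real extractor would derive the spans from word alignments.

```python
def span_label(tags, start, end):
    """Size-aware label for tags[start:end]: T, T1-T2, or T1..Tn."""
    first, last = tags[start], tags[end - 1]
    length = end - start
    if length == 1:
        return first
    return f"{first}-{last}" if length == 2 else f"{first}..{last}"

def bilingual_label(src_tags, src_span, tgt_tags, tgt_span):
    """Join the source-side and target-side labels with '+'."""
    return span_label(src_tags, *src_span) + "+" + span_label(tgt_tags, *tgt_span)

src_words = ["Ich", "habe", "ihn", "gesehen"]
src_tags = ["PRP", "AUX", "PRP", "VBN"]
tgt_words = ["I", "saw", "him"]
tgt_tags = ["PRP", "VBD", "PRP"]

# Outer phrase pair: habe ihn gesehen / saw him
lhs = bilingual_label(src_tags, (1, 4), tgt_tags, (1, 3))
# Embedded phrase pair that becomes the gap: ihn / him
gap = bilingual_label(src_tags, (2, 3), tgt_tags, (2, 3))

# Replace the embedded pair by its label on both sides.
src_side = src_words[1:2] + [gap] + src_words[3:4]
tgt_side = tgt_words[1:2] + [gap]
rule = f"{lhs} => ({' '.join(src_side)} / {' '.join(tgt_side)})"
print(rule)
```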

7 Labeling from word classes (4/4)
Unsupervised word class clustering
–MKCLS
–Morphological information
Problems of word classes:
–Huge grammar size
–Data sparseness
–Solution: directly clustering rules

8 Clustering phrase pairs
–Directly clustering phrase pairs
–K-means clustering algorithm
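
As a rough illustration of the idea, here is a plain K-means loop that assigns each phrase pair to a cluster and uses the cluster index as its rule label. The transcript does not say which features are used, so the two-dimensional feature vectors, the initial centroids, and the `X0` / `X1` label names below are all hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means; returns (cluster index per point, final centroids)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    assignments = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignments, centroids

# Hypothetical feature vectors for four phrase pairs (not from the paper).
phrase_pairs = ["Ich / I", "ihn / him", "gesehen / saw", "habe ihn gesehen / saw him"]
features = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.2, 0.8)]

# Fixed initial centroids keep the run deterministic.
assignments, centroids = kmeans(features, [(1.0, 0.0), (0.0, 1.0)])
for pair, cluster in zip(phrase_pairs, assignments):
    print(f"X{cluster} => ({pair})")
```

The number of clusters then directly controls the number of nonterminal labels, and with it the size of the grammar.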

9 Experiments
–Baseline
–PTB POS Tags
–Word Class Clustering
–Phrase Clustering

10 Experiments

11 Related Work
JHU workshop 2010
–http://www.clsp.jhu.edu/workshops/ws10/groups/msgismt/
Other approaches
–Phrase clustering
–Syntax-augmented MT
Source code:
–SAMT system

12 Problems
Too simple, sometimes naïve.
–Simple features
–Simple clustering method
–How to control model complexity
Future work
–Other learning methods instead of clustering
–Combining the hierarchical phrase-based model with syntactic trees

