Published by Brooke Bartlett, modified over 11 years ago
1
A Word-Class Approach to Labeling PSCFG Rules for Machine Translation (ACL 2011)
Andreas Zollmann and Stephan Vogel
Presented by Yun Huang, 01/07/2011
2 Background
PSCFG (Chiang 2005, 2007)
– Rules: X => (γ / α / w), where γ is the source side and α the target side, e.g.
   X => ( / held talk with Sharon)
   X => (X1 / held talk with X1)
   X => (X1 X2 / held X2 with X1)
– Glue rules: S => (X / X), S => (S X / S X)
– Decoding: cube pruning, etc.
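To make the rule format concrete, here is a minimal Python sketch of applying one hierarchical rule synchronously. It assumes rules are stored as plain token lists with nonterminal occurrences written as X1, X2; the names `Rule`, `Pair` and `apply_rule` are hypothetical, and the source tokens are placeholders because the source-side words of the example did not survive on the slide. Only substitution is shown; rule weights and cube pruning are left out.

```python
from typing import Dict, List, Tuple

# A synchronous rule: (left-hand side, source side, target side).
# Nonterminal occurrences are written as "X1", "X2", ...
Rule = Tuple[str, List[str], List[str]]

# A derived sub-phrase pair: (source tokens, target tokens).
Pair = Tuple[List[str], List[str]]


def apply_rule(rule: Rule, children: Dict[str, Pair]) -> Pair:
    """Substitute child phrase pairs for the linked nonterminals on both sides."""
    _lhs, src, tgt = rule

    def expand(side: List[str], index: int) -> List[str]:
        out: List[str] = []
        for token in side:
            if token in children:
                # index 0 picks the child's source side, 1 its target side
                out.extend(children[token][index])
            else:
                out.append(token)  # terminal: copy unchanged
        return out

    return expand(src, 0), expand(tgt, 1)


# X => (X1 X2 / held X2 with X1): the target side swaps the two nonterminals.
reorder_rule: Rule = ("X", ["X1", "X2"], ["held", "X2", "with", "X1"])

# Hypothetical placeholder source tokens; target words follow the slide.
children = {
    "X1": (["<src:Sharon>"], ["Sharon"]),
    "X2": (["<src:talk>"], ["talk"]),
}

source, target = apply_rule(reorder_rule, children)
print(" ".join(source))  # <src:Sharon> <src:talk>
print(" ".join(target))  # held talk with Sharon
```

The glue rules work the same way: S => (S X / S X) simply concatenates the two children in the same order on both sides.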
3 Motivation
Chiang's PSCFG uses only the nonterminals S and X, so it cannot distinguish between different categories of rules. Example:
– X => (X1 X2 / held X2 with X1)
– Nothing distinguishes X1 from X2
Maybe we want rules like:
– VP => (PRP NP / held NP with PRP)
Idea: a multi-label PSCFG. But how should hierarchical phrases be labeled?
4 Labeling from word classes (1/4)
Simple: boundary (POS) tags, i.e. the tags of the first and last word of the target phrase
– I[PRP] saw[VBD] him[PRP]
Extracted rules:
– PRP-PRP => (Ich / I)
– PRP-PRP => (ihn / him)
– VBD-VBD => (gesehen / saw)
– VBD-PRP => (habe ihn gesehen / saw him)
– PRP-PRP => (Ich habe ihn gesehen / I saw him)
– VBD-PRP => (habe PRP-PRP gesehen / saw PRP-PRP)
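The boundary-tag scheme can be sketched in a few lines of Python: a phrase's label is the tag of its first target word, a hyphen, and the tag of its last target word. The function name `boundary_label` is hypothetical, and the sketch assumes the target side is already POS-tagged.

```python
from typing import List, Sequence


def boundary_label(tags: Sequence[str]) -> str:
    """Label a target phrase by the tags of its first and last word."""
    if not tags:
        raise ValueError("a phrase must contain at least one word")
    return f"{tags[0]}-{tags[-1]}"


# Target sentence from the slide: I[PRP] saw[VBD] him[PRP]
words: List[str] = ["I", "saw", "him"]
tags: List[str] = ["PRP", "VBD", "PRP"]

# Label every contiguous target span [i, j). Real rule extraction would also
# keep only spans consistent with the word alignment, which is not modelled here.
for i in range(len(words)):
    for j in range(i + 1, len(words) + 1):
        print(boundary_label(tags[i:j]), "=>", " ".join(words[i:j]))
```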
5 Labeling from word classes (2/4)
Accounting for phrase size
– 1 word: PRP => (Ich / I), PRP => (ihn / him)
– 2 words: VBD-PRP => (habe ihn gesehen / saw him), VBD-PRP => (habe PRP gesehen / saw PRP)
– 3 or more words: PRP..PRP => (Ich habe ihn gesehen / I saw him), PRP..PRP => (Ich habe PRP gesehen / I saw PRP)
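A size-aware variant only changes how the two boundary tags are joined. This is a minimal sketch (the name `sized_label` is hypothetical) that follows the three cases on the slide: a single tag for one word, "A-B" for two words, "A..B" for three or more.

```python
from typing import Sequence


def sized_label(tags: Sequence[str]) -> str:
    """Boundary label that also encodes phrase length.

    1 word   -> "T"     (the word's own tag)
    2 words  -> "A-B"   (first and last tag)
    3+ words -> "A..B"  (first and last tag, words in between)
    """
    if not tags:
        raise ValueError("a phrase must contain at least one word")
    if len(tags) == 1:
        return tags[0]
    separator = "-" if len(tags) == 2 else ".."
    return f"{tags[0]}{separator}{tags[-1]}"


print(sized_label(["PRP"]))                # PRP       (him)
print(sized_label(["VBD", "PRP"]))         # VBD-PRP   (saw him)
print(sized_label(["PRP", "VBD", "PRP"]))  # PRP..PRP  (I saw him)
```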
6 Labeling from word classes (3/4)
Bilingually tagged corpus
– Ich[PRP] habe[AUX] ihn[PRP] gesehen[VBN]
– I[PRP] saw[VBD] him[PRP]
Extracted rules (source label + target label):
– PRP+PRP => (Ich / I)
– PRP+PRP => (ihn / him)
– VBN+VBD => (gesehen / saw)
– AUX..VBN+VBD-PRP => (habe ihn gesehen / saw him)
– PRP..VBN+PRP..PRP => (Ich habe ihn gesehen / I saw him)
– AUX..VBN+VBD-PRP => (habe PRP+PRP gesehen / saw PRP+PRP)
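Combining source and target labels needs only one label function per side. The sketch below redefines the size-aware boundary label so it runs on its own, joins the two sides with "+", and adds a hypothetical `generalize` helper that replaces an aligned sub-phrase pair with its label on both sides, reproducing the last rule on the slide. Spans are half-open token ranges; the word alignment that licenses them is assumed, not computed.

```python
from typing import List, Sequence, Tuple


def sized_label(tags: Sequence[str]) -> str:
    """Tag for 1 word, "A-B" for 2 words, "A..B" for 3+ words."""
    if not tags:
        raise ValueError("a phrase must contain at least one word")
    if len(tags) == 1:
        return tags[0]
    separator = "-" if len(tags) == 2 else ".."
    return f"{tags[0]}{separator}{tags[-1]}"


def bilingual_label(src_tags: Sequence[str], tgt_tags: Sequence[str]) -> str:
    """Source label and target label joined by '+'."""
    return f"{sized_label(src_tags)}+{sized_label(tgt_tags)}"


def generalize(src: List[str], tgt: List[str],
               src_span: Tuple[int, int], tgt_span: Tuple[int, int],
               label: str) -> Tuple[List[str], List[str]]:
    """Replace the sub-phrase pair at the given half-open spans with a nonterminal."""
    (s0, s1), (t0, t1) = src_span, tgt_span
    return src[:s0] + [label] + src[s1:], tgt[:t0] + [label] + tgt[t1:]


# Phrase pair "habe ihn gesehen / saw him" from the slide.
src_words, src_tags = ["habe", "ihn", "gesehen"], ["AUX", "PRP", "VBN"]
tgt_words, tgt_tags = ["saw", "him"], ["VBD", "PRP"]

lhs = bilingual_label(src_tags, tgt_tags)            # AUX..VBN+VBD-PRP
sub = bilingual_label(src_tags[1:2], tgt_tags[1:2])  # PRP+PRP (ihn / him)
rhs_src, rhs_tgt = generalize(src_words, tgt_words, (1, 2), (1, 2), sub)
print(f"{lhs} => ({' '.join(rhs_src)} / {' '.join(rhs_tgt)})")
# AUX..VBN+VBD-PRP => (habe PRP+PRP gesehen / saw PRP+PRP)
```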
7 Labeling from word classes (4/4)
Unsupervised word class clustering
– mkcls
– Morphological information
Problems of word classes:
– Huge grammar size
– Data sparseness
– Solution: cluster the rules directly
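A quick count suggests why the grammar grows so fast. Under the size-aware scheme of slide 5, a side with k word classes can yield at most k single-word labels, k² "A-B" labels and k² "A..B" labels, and the bilingual labels of slide 6 multiply the two sides. The sketch below computes these upper bounds (function names are hypothetical); the labels actually observed depend on which combinations occur in the training data.

```python
def max_labels_one_side(k: int) -> int:
    """Upper bound on distinct size-aware boundary labels with k classes."""
    return k + 2 * k * k  # k single-word + k*k "A-B" + k*k "A..B"


def max_labels_bilingual(k_src: int, k_tgt: int) -> int:
    """Upper bound on distinct "source+target" labels."""
    return max_labels_one_side(k_src) * max_labels_one_side(k_tgt)


for k in (10, 50):
    print(k, max_labels_one_side(k), max_labels_bilingual(k, k))
```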
8 Clustering phrase pairs
– Cluster phrase pairs directly
– K-means clustering algorithm
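The slide does not say how a phrase pair is turned into a vector, so the following is only a generic sketch of K-means (Lloyd's algorithm) in plain Python, run on hypothetical feature vectors standing in for phrase pairs; each resulting cluster index would then play the role of a nonterminal label. Centroids are seeded from the first k points purely to keep the example deterministic; a real system would more likely use a seeding such as k-means++.

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iters: int = 100) -> List[int]:
    """Cluster points into k groups; returns one cluster index per point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # deterministic seeding
    assignment: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical feature vectors, one per phrase pair.
phrase_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.9],
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 1.0],
]
labels = kmeans(phrase_vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```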
9 Experiments
– Baseline
– PTB POS tags
– Word class clustering
– Phrase clustering
10 Experiments
11 Related Work
JHU workshop 2010
– http://www.clsp.jhu.edu/workshops/ws10/groups/msgismt/
Other approaches
– Phrase clustering
– Syntax-augmented MT
Source code:
– SAMT system
12 Problems
Too simple, sometimes naïve:
– Simple features
– Simple clustering method
– Unclear how to control model complexity
Future work
– Other learning methods instead of clustering
– Combining the hierarchical phrase-based model with syntactic trees