Published by Elaine Wyatt. Modified over 9 years ago.
1
Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang; Chris Quirk; Robert C. Moore; Daniel Gildea
Zhonghua Li
Mentor: Jun Lang
2011-10-21
I2R SMT-Reading Group
2
Paper info
Title: Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Venue: ACL-08, long paper
Cited by: 37
Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea
3
Core Ideas
Variational Bayes
Tic-tac-toe pruning
Word-to-phrase bootstrapping
4
Outline
Paper presentation
– Pipeline
– Model
– Training
– Parsing (pruning)
– Results
Shortcomings
Discussion
5
Summary of the Pipeline
1. Run IBM Model 1 on sentence-aligned data.
2. Use tic-tac-toe pruning to prune the bitext space.
3. Train a word-based ITG with Variational Bayes and get the Viterbi alignment.
4. Apply non-compositional constraints to restrict the space of phrase pairs.
5. Train a phrasal ITG with VB and run a Viterbi pass to get the phrasal alignment.
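Step 1 can be sketched as the textbook EM procedure for IBM Model 1. This is a minimal illustration, not the authors' code: the function name `ibm_model1`, the toy German-English bitext, the NULL word and the uniform initialisation are assumptions made for the example.

```python
from collections import defaultdict


def ibm_model1(bitext, iterations=10):
    """EM for IBM Model 1: returns t[(f, e)] = P(f | e), where e = None is NULL.

    bitext is a list of (foreign_words, english_words) sentence pairs.
    """
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform start
    for _ in range(iterations):
        counts = defaultdict(float)  # expected count of (f, e) links
        totals = defaultdict(float)  # expected count of e generating anything
        for fs, es in bitext:
            es = [None] + list(es)  # prepend the NULL word
            for f in fs:
                # E-step: posterior over which e generated this f
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / norm
                    counts[(f, e)] += p
                    totals[e] += p
        # M-step: renormalise the expected counts for each e
        t = defaultdict(float, {(f, e): c / totals[e] for (f, e), c in counts.items()})
    return t


bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
table = ibm_model1(bitext)
print(round(table[("Haus", "house")], 3), round(table[("das", "house")], 3))
```

Because "das" is already explained by "the", EM shifts the mass of "house" towards "Haus"; these translation probabilities are what the pruning step consumes.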
6
Phrasal Inversion Transduction Grammar
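For reference, here is the phrasal ITG as we read it from the later slides (the M-step slide states that X has s = 3 right-hand sides). The notation is ours, and the rule set should be checked against the paper.

```latex
\begin{aligned}
X &\rightarrow [\,X\;X\,] && \text{straight: children in the same order on both sides}\\
X &\rightarrow \langle X\;X \rangle && \text{inverted: children swapped on the target side}\\
X &\rightarrow C && \text{emit a phrase pair}\\
C &\rightarrow e/f && \text{a (possibly multi-word) phrase pair}
\end{aligned}
```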
7
Dirichlet Prior for Phrasal ITG
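A minimal statement of the prior, assuming symmetric Dirichlet priors on the X-rule distribution and on the phrase-pair distribution (the α_C setting on the evaluation slide suggests the latter). The symbols θ_X, θ_C and α_X are our labels.

```latex
\begin{aligned}
\theta_X &\sim \mathrm{Dir}(\alpha_X, \alpha_X, \alpha_X) && \text{over the } s = 3 \text{ rules for } X\\
\theta_C &\sim \mathrm{Dir}(\alpha_C, \ldots, \alpha_C) && \text{over the } m \text{ phrase pairs}\\
p(\theta_C) &\propto \prod_{k=1}^{m} \theta_{C,k}^{\,\alpha_C - 1}
\end{aligned}
```

With α_C far below 1, the density piles up near the corners of the simplex, which favours a small set of reusable phrase pairs over many rare ones.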
8
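Review: Inside-Outside Algorithm
[Figure: a state chain X1 … Xn-1, Zn, Xn+1 … XN, and a bitext parse tree from the root 0/0 to T/V down to a node i spanning s/u to t/v.]
The forward-backward algorithm is not limited to HMMs; it applies to any state-space model.
The inside-outside algorithm generalizes forward-backward from chains to trees: forward-backward is the special case of inside-outside for a right-branching (regular) grammar.
Adapted from Shujie Liu's slides.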
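The correspondence can be checked numerically on an HMM: at every position t, Σ_i α_t(i) β_t(i) equals the sequence likelihood, just as inside × outside recovers the sentence probability in a parse chart (roughly, β plays the role of inside and α of outside). The two-state model below is made up for illustration.

```python
def forward_backward(pi, A, B, obs):
    """Forward (alpha) and backward (beta) tables for a discrete HMM.

    pi[i]: initial probability of state i
    A[i][j]: transition probability i -> j
    B[i][o]: probability that state i emits symbol o
    """
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]  # beta at the last position is 1
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(n))
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta, sum(alpha[T - 1])


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0]
alpha, beta, likelihood = forward_backward(pi, A, B, obs)
for t in range(len(obs)):
    # roughly: beta ~ "inside" (suffix), alpha ~ "outside" (prefix)
    print(t, sum(a * b for a, b in zip(alpha[t], beta[t])), likelihood)
```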
9
VB Algorithm for Training SITGs - E1
Inside probabilities: initialization and recursion
[Figure: node i spans the bitext cell s/u to t/v and is split at S/U into child j (s/u to S/U) and child k (S/U to t/v).]
Adapted from Shujie Liu's slides.
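The inside recursion drawn in the figure can be written out as below for a grammar with X → [X X], X → ⟨X X⟩ and X → C. This is our reconstruction under stated assumptions: a cell (s, t, u, v) covers f_{s+1..t} and e_{u+1..v}, θ denotes the rule weights (under VB, the mean-field weights from the M-step), null alignments are omitted, and T = |f|, V = |e| as in the root label 0/0 to T/V.

```latex
\begin{aligned}
\mathrm{In}(s,t,u,v) ={}& \theta_{[\,]} \sum_{s<S<t}\,\sum_{u<U<v} \mathrm{In}(s,S,u,U)\,\mathrm{In}(S,t,U,v)\\
&+ \theta_{\langle\rangle} \sum_{s<S<t}\,\sum_{u<U<v} \mathrm{In}(s,S,U,v)\,\mathrm{In}(S,t,u,U)\\
&+ \theta_{C}\, P\big(C \rightarrow e_{u+1..v}/f_{s+1..t}\big)\\
Z ={}& \mathrm{In}(0,T,0,V)
\end{aligned}
```

For single-word cells the two sums are empty, so the initialization is just the last term.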
10
VB Algorithm for Training SITGs - E2
Outside probabilities: initialization and recursion
[Figure: the outside probability of cell i is built from its parent j and its sibling k.]
Adapted from Shujie Liu's slides.
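Under the same assumptions (cell (s, t, u, v) covers f_{s+1..t} and e_{u+1..v}; θ are rule weights; no nulls), the outside recursion sums over the four ways a cell can be a child: left or right child of a straight rule, and either child of an inverted rule. Each product pairs the parent's outside score with the inside score of the sibling that completes it.

```latex
\begin{aligned}
\mathrm{Out}(0,T,0,V) ={}& 1\\
\mathrm{Out}(s,t,u,v) ={}& \theta_{[\,]} \sum_{t'>t}\sum_{v'>v} \mathrm{Out}(s,t',u,v')\,\mathrm{In}(t,t',v,v')
 + \theta_{[\,]} \sum_{s'<s}\sum_{u'<u} \mathrm{Out}(s',t,u',v)\,\mathrm{In}(s',s,u',u)\\
&+ \theta_{\langle\rangle} \sum_{t'>t}\sum_{u'<u} \mathrm{Out}(s,t',u',v)\,\mathrm{In}(t,t',u',u)
 + \theta_{\langle\rangle} \sum_{s'<s}\sum_{v'>v} \mathrm{Out}(s',t,u,v')\,\mathrm{In}(s',s,v,v')
\end{aligned}
```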
14
VB Algorithm for Training SITGs - E2
Outside probabilities: initialization and recursion
[Figure: two configurations of cell i, its parent j and its sibling k.]
Adapted from Shujie Liu's slides.
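Both recursions can be checked on a small word-based ITG. This sketch assumes X → [X X], X → ⟨X X⟩ and X → C with C emitting a single word pair, no null alignments and no pruning; the function names, the toy lexicon and the rule weights are hypothetical. Every derivation has exactly one terminal per source word, so the expected number of terminal cells (Σ In·Out / Z over 1×1 cells) must equal the sentence length, which makes a convenient sanity check.

```python
def bitext_cells(n, m):
    """All cells (s, t, u, v) with f[s:t] and e[u:v] non-empty, smallest first."""
    cells = [(s, t, u, v)
             for s in range(n) for t in range(s + 1, n + 1)
             for u in range(m) for v in range(u + 1, m + 1)]
    return sorted(cells, key=lambda c: (c[1] - c[0]) + (c[3] - c[2]))


def child_pairs(cell):
    """Yield (straight_pair, inverted_pair) for every split point (S, U)."""
    s, t, u, v = cell
    for S in range(s + 1, t):
        for U in range(u + 1, v):
            straight = ((s, S, u, U), (S, t, U, v))
            inverted = ((s, S, U, v), (S, t, u, U))
            yield straight, inverted


def inside_outside(f, e, lex, p_straight, p_inverted, p_term):
    """Inside and outside tables for X -> [X X] | <X X> | C, C -> f_word/e_word."""
    order = bitext_cells(len(f), len(e))
    inside = {}
    for c in order:  # bottom-up: children are strictly smaller than parents
        s, t, u, v = c
        if t - s == 1 and v - u == 1:
            inside[c] = p_term * lex.get((f[s], e[u]), 0.0)
            continue
        total = 0.0
        for (sl, sr), (il, ir) in child_pairs(c):
            total += p_straight * inside[sl] * inside[sr]
            total += p_inverted * inside[il] * inside[ir]
        inside[c] = total
    root = (0, len(f), 0, len(e))
    outside = dict.fromkeys(order, 0.0)
    outside[root] = 1.0
    for c in reversed(order):  # top-down: parents finish before their children
        for (sl, sr), (il, ir) in child_pairs(c):
            outside[sl] += p_straight * outside[c] * inside[sr]
            outside[sr] += p_straight * outside[c] * inside[sl]
            outside[il] += p_inverted * outside[c] * inside[ir]
            outside[ir] += p_inverted * outside[c] * inside[il]
    return inside, outside, inside[root]


f = ["a", "b", "c"]
e = ["x", "y", "z"]
lex = {(fw, ew): 0.1 for fw in f for ew in e}
lex.update({("a", "x"): 0.6, ("b", "z"): 0.6, ("c", "y"): 0.6})
inside, outside, Z = inside_outside(f, e, lex, p_straight=0.3, p_inverted=0.2, p_term=0.5)
expected_terminals = sum(inside[c] * outside[c] for c in inside
                         if c[1] - c[0] == 1 and c[3] - c[2] == 1) / Z
print(round(expected_terminals, 6))  # 3.0
```

In a VB E-step the same Out·In/Z products, accumulated per rule, give the expected counts that the M-step turns into new weights.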
16
VB Algorithm for Training SITGs - M
s = 3 is the number of right-hand sides for X
m is the number of observed phrase pairs
ψ is the digamma function
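The symbols above fit the standard mean-field update for a Dirichlet-multinomial, which we assume is the one meant here: θ̃(X → r) = exp(ψ(E[c(X → r)] + α_X)) / exp(ψ(E[c(X)] + s·α_X)), and likewise for C with m and α_C. The sketch below is illustrative; `digamma` is a small hand-rolled approximation (recurrence plus asymptotic series) so the block needs only the standard library, and `vb_weights` and the counts are hypothetical.

```python
import math


def digamma(x):
    """psi(x) for x > 0: shift x up with psi(x) = psi(x + 1) - 1/x, then use
    the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def vb_weights(expected_counts, alpha, num_outcomes):
    """Mean-field weights exp(psi(c_r + alpha) - psi(sum_c + K * alpha)).

    expected_counts maps each outcome to its expected count from the E-step;
    num_outcomes is K (s = 3 for X, m for C). Outcomes not listed have
    count 0 but still contribute alpha to the normaliser.
    """
    norm = digamma(sum(expected_counts.values()) + num_outcomes * alpha)
    return {r: math.exp(digamma(c + alpha) - norm) for r, c in expected_counts.items()}


print(round(digamma(1.0), 6))  # -0.577216 (psi(1) = -Euler's gamma)
x_weights = vb_weights({"[X X]": 6.0, "<X X>": 2.0, "C": 8.0}, alpha=1.0, num_outcomes=3)
print(sum(x_weights.values()) < 1.0)  # True: the weights are not renormalised
c_weights = vb_weights({"pair_a": 5.0, "pair_b": 0.001}, alpha=1e-9, num_outcomes=1000)
print(c_weights["pair_b"])  # 0.0: a barely-used phrase pair is driven to zero
```

With a tiny α_C, ψ(c + α_C) drops sharply as c approaches zero, so rarely used phrase pairs are discounted far more than under EM; this is the sparsity effect the prior is meant to provide.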
17
Pruning
Tic-tac-toe pruning (Hao Zhang, 2005)
Fast tic-tac-toe pruning (Hao Zhang, 2008)
High-precision alignment pruning (Haghighi, ACL 2009)
– Prune all bitext cells that would invalidate more than 8 of the high-precision alignments.
1-1 alignment posterior pruning (Haghighi, ACL 2009)
– Prune all 1-1 bitext cells whose posterior is below 10^-4 in both HMM models.
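The two Haghighi-style filters can be sketched as a single cell test. Assumptions for illustration: a cell invalidates a link when exactly one end of the link falls inside it, and both directional HMM posteriors are stored as [source][target] matrices. The function names and matrix layout are hypothetical; the thresholds 8 and 10^-4 come from the slide.

```python
def invalidated_links(cell, links):
    """Count links with exactly one endpoint inside the cell.

    cell = (s, t, u, v) covers source positions s..t-1 and target u..v-1;
    links are (source_index, target_index) pairs.
    """
    s, t, u, v = cell
    return sum(1 for i, j in links if (s <= i < t) != (u <= j < v))


def keep_cell(cell, sure_links, post_f2e, post_e2f, max_invalid=8, min_post=1e-4):
    """Apply both filters; return False if the cell should be pruned."""
    s, t, u, v = cell
    if invalidated_links(cell, sure_links) > max_invalid:
        return False
    if t - s == 1 and v - u == 1:
        # 1-1 cell: prune only if BOTH directional posteriors are low
        if post_f2e[s][u] < min_post and post_e2f[s][u] < min_post:
            return False
    return True


sure = {(0, 0), (1, 1)}
post = [[0.9, 1e-5], [1e-5, 0.9]]
print(invalidated_links((0, 1, 1, 2), sure))  # 2
print(keep_cell((0, 1, 1, 2), sure, post, post))  # False
print(keep_cell((0, 1, 0, 1), sure, post, post))  # True
```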
18
Tic-tac-toe pruning (Hao Zhang, 2005)
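A minimal sketch of the tic-tac-toe idea: score each bitext cell by IBM Model 1 for the words inside the cell times Model 1 for the words outside it, then drop cells far below the best cell of the same size. The beam ratio, grouping by total size, NULL handling and the FLOOR constant are assumptions for illustration, not values from the slides; the function names and the toy translation table are hypothetical.

```python
FLOOR = 1e-6  # hypothetical probability for unseen word pairs (and NULL)


def model1_score(f_words, e_words, ttable):
    """Model 1 probability of f_words given e_words plus a NULL word."""
    candidates = list(e_words) + [None]
    score = 1.0
    for f in f_words:
        score *= sum(ttable.get((f, e), FLOOR) for e in candidates) / len(candidates)
    return score


def tic_tac_toe_prune(f, e, ttable, beam=1e-3):
    """Keep cells (s, t, u, v) covering f[s:t] and e[u:v] whose score is
    at least beam times the best score among cells of the same total size."""
    scores = {}
    for s in range(len(f)):
        for t in range(s + 1, len(f) + 1):
            for u in range(len(e)):
                for v in range(u + 1, len(e) + 1):
                    inside = model1_score(f[s:t], e[u:v], ttable)
                    outside = model1_score(f[:s] + f[t:], e[:u] + e[v:], ttable)
                    scores[(s, t, u, v)] = inside * outside
    best = {}
    for (s, t, u, v), score in scores.items():
        size = (t - s) + (v - u)
        best[size] = max(best.get(size, 0.0), score)
    return {c for c, score in scores.items()
            if score >= beam * best[(c[1] - c[0]) + (c[3] - c[2])]}


f = ["la", "maison"]
e = ["the", "house"]
ttable = {("la", "the"): 0.9, ("maison", "house"): 0.9,
          ("la", "house"): 0.01, ("maison", "the"): 0.01}
kept = tic_tac_toe_prune(f, e, ttable)
print(sorted(kept))
```

The outside factor is what distinguishes this from scoring cells in isolation: a cell is kept only if the rest of the sentence pair can also be explained well around it.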
19
Non-compositional Phrases Constraint
e(i,j): number of alignment links emitted from the substring e_i … e_j
f(l,m): number of alignment links emitted from the substring f_l … f_m
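The two counts are straightforward to compute from a word alignment. The sketch also checks the standard phrase-extraction consistency condition (no link leaves the box, so e(i,j) = f(l,m) = the number of links inside it); that condition is our assumption for illustration and is not necessarily the exact non-compositional constraint used in the paper. Links are (e_index, f_index) pairs with inclusive, 0-based spans; the function names are hypothetical.

```python
def e_links(links, i, j):
    """e(i, j): number of links emitted from e_i .. e_j (inclusive)."""
    return sum(1 for a, _ in links if i <= a <= j)


def f_links(links, l, m):
    """f(l, m): number of links emitted from f_l .. f_m (inclusive)."""
    return sum(1 for _, b in links if l <= b <= m)


def is_consistent(links, i, j, l, m):
    """True if every link touching either substring stays inside the box."""
    inside = sum(1 for a, b in links if i <= a <= j and l <= b <= m)
    return e_links(links, i, j) == inside == f_links(links, l, m)


links = {(0, 0), (1, 2), (2, 1)}
print(e_links(links, 0, 1), f_links(links, 1, 2))  # 2 2
print(is_consistent(links, 1, 2, 1, 2))  # True
print(is_consistent(links, 0, 1, 0, 1))  # False
```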
20
Word Alignment Evaluation
Both EM and VB are trained for 10 iterations.
EM: the lowest AER, 0.40, is reached after the second iteration; by iteration 10, AER rises to 0.42.
VB: with α_C = 1e-9, AER is close to 0.35 at iteration 10.
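For reference, AER is usually computed from a predicted link set A, sure links S and possible links P (with S ⊆ P) as AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|), so lower is better. The link sets below are made up; `aer` is a hypothetical helper.

```python
def aer(predicted, sure, possible):
    """Alignment error rate; sure links are also treated as possible."""
    a, s, p = set(predicted), set(sure), set(possible) | set(sure)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


sure = {(0, 0), (1, 1)}
possible = sure | {(2, 2)}
print(aer({(0, 0), (1, 1)}, sure, possible))  # 0.0
print(aer({(0, 0), (1, 2)}, sure, possible))  # 0.5
```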
21
End-to-end Evaluation
Training data: NIST Chinese-English training data
NIST 2002 evaluation datasets were used for tuning and evaluation:
– a 10-reference development set was used for MERT;
– a 4-reference test set was used for evaluation.
22
Shortcomings
Grammar is not perfect
ITG ordering is context-independent
Phrase pairs are sparse
23
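Grammar is not perfect
Over-counting problem: different ITG parse trees can yield the same word alignment, so that alignment is counted more than once. This is called the over-counting problem.
[Figure: several trees in the ITG parse-tree space map to a single point in the word-alignment space, illustrated with the sentence "I am rich !".]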
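The over-counting can be made concrete by enumerating every binary ITG derivation over four words (as in "I am rich !") and grouping derivations by the target order they produce. This illustrative sketch assumes one-to-one word links and no null alignments; `itg_trees` is a hypothetical helper. For four words there are 40 derivations (5 bracketings × 2³ orientation choices) but only 22 distinct orderings, and the monotone alignment alone is produced by 5 different trees.

```python
from collections import Counter


def itg_trees(lo, hi):
    """Enumerate every binary ITG derivation over source words lo..hi-1.

    Each derivation comes with the target-side order of source indices it
    produces: straight nodes ([]) keep child order, inverted nodes (<>) swap it.
    """
    if hi - lo == 1:
        return [(lo, (lo,))]
    results = []
    for mid in range(lo + 1, hi):
        for left, left_order in itg_trees(lo, mid):
            for right, right_order in itg_trees(mid, hi):
                results.append((("[]", left, right), left_order + right_order))
                results.append((("<>", left, right), right_order + left_order))
    return results


trees = itg_trees(0, 4)  # e.g. the four tokens of "I am rich !"
counts = Counter(order for _, order in trees)
print(len(trees))            # 40 derivations
print(counts[(0, 1, 2, 3)])  # 5 of them give the monotone alignment
print(len(counts))           # 22 distinct alignments
```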
24
A better-constrained grammar
In this grammar, a series of nested constituents with the same orientation always has a left-heavy derivation, so the second parse tree of the previous example is not generated.
C -> 1/3, C -> 2/4, C -> 3/2, C -> 4/1
A -> [C C]
B -> ?
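The left-heavy restriction can be sketched by forbidding a right child from repeating its parent's orientation, reading A as the straight and B as the inverted nonterminal. Under that assumption (one-to-one links, no nulls, hypothetical helper `canonical_itg_trees`), every reachable ordering of four words has exactly one derivation, and for the example C -> 1/3, 2/4, 3/2, 4/1 only the tree ⟨⟨[1 2] 3⟩ 4⟩ survives, while ⟨[1 2] ⟨3 4⟩⟩ is excluded.

```python
from collections import Counter


def canonical_itg_trees(lo, hi, parent=None):
    """Left-heavy ITG derivations over source words lo..hi-1.

    Returns (tree, target_order) pairs. `parent` is the orientation of the
    enclosing node when this span is its right child; repeating it there is
    forbidden, so same-orientation chains can only grow along the left spine.
    """
    if hi - lo == 1:
        return [(lo, (lo,))]
    results = []
    for orient in ("[]", "<>"):
        if orient == parent:
            continue  # this would be a right-heavy same-orientation chain
        for mid in range(lo + 1, hi):
            for left, left_order in canonical_itg_trees(lo, mid):
                for right, right_order in canonical_itg_trees(mid, hi, orient):
                    if orient == "[]":
                        order = left_order + right_order  # straight: keep order
                    else:
                        order = right_order + left_order  # inverted: swap
                    results.append(((orient, left, right), order))
    return results


trees = canonical_itg_trees(0, 4)
orders = Counter(order for _, order in trees)
print(len(trees), len(orders))  # 22 22
# Example C -> 1/3, 2/4, 3/2, 4/1 has 0-based target order (3, 2, 0, 1)
print([tree for tree, order in trees if order == (3, 2, 0, 1)])
# [('<>', ('<>', ('[]', 0, 1), 2), 3)]
```

Since straight concatenation and inverted concatenation are both associative, rotating every same-orientation chain to the left never changes the alignment, which is why the restriction removes only the duplicate trees.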
25
Thanks
Q&A