Rule Markov Models for Fast Tree-to-String Translation
Authors: Ashish Vaswani (USC), Liang Huang (USC), David Chiang (USC), Haitao Mi (Chinese Academy of Sciences)
Presenter: Justin Chiu
Recall Last Week: Binarized Forest-to-Tree Translation
Tree-to-string translation
Constructing composed rules
  Weakens the independence assumptions
  Makes the grammar redundant
  Training and decoding need more time
This Week
Tree-to-string translation
Focus on minimal rules: rules that cannot be formed out of other rules
Constructing a rule Markov model for translation
Contributions
A comparison between the rule Markov model (RMM) and composed-rule methods:
  RMM > minimal rules
  RMM = vertically composed rules, 30% faster
  RMM ≈ fully composed rules, while saving space and time
Methods for pruning the rule Markov model
A fast decoder with the rule Markov model
RULE MARKOV MODEL
Tree-to-string Grammar (example derivation, built up rule by rule; each step shows the English order ⟷ the Chinese order)
IP@є ⟷ IP@є
NP@1 VP@2 ⟷ NP@1 VP@2
Bush VP@2.2 PP@2.1 ⟷ 布希 PP@2.1 VP@2.2
Bush held talks P@2.1.1 NP@2.1.2 ⟷ 布希 P@2.1.1 NP@2.1.2 VV(舉行) AS(了) NP(會談)
Bush held talks with Sharon ⟷ 布希 與 夏隆 舉行 了 會談
Probability of a derivation tree T
For any node r:
  anc1(r) = parent of r (є means no parent)
  anc2(r) = grandparent of r
  anc(r) = anc1(r), anc2(r), …, ancn(r): the ancestor chain of r
P(T) = ∏_{r ∈ T} P(r | anc(r))
P(r1 | є) = probability of generating the root rule
Example (derivation tree for "Bush held talks with Sharon")
r1: IP@є → NP@1 VP@2
r2: NP@1 → Bush ⟷ 布希
r3: VP@2 → VP@2.2 PP@2.1 ⟷ PP@2.1 VP@2.2
r4: PP@2.1 → P@2.1.1 NP@2.1.2
r5: VP@2.2 → held talks ⟷ VV(舉行) AS(了) NP(會談)
r6: P@2.1.1 → with ⟷ 與
r7: NP@2.1.2 → Sharon ⟷ 夏隆
P(T) = P(r1|є) P(r2|r1) P(r3|r1) P(r4|r1,r3) P(r6|r1,r3,r4) P(r7|r1,r3,r4) P(r5|r1,r3)
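The derivation probability is just a product of conditional rule probabilities, each conditioned on the rule's ancestor chain truncated to the Markov order. A minimal sketch (order 2 here for brevity; the probability table and parent map are made-up illustrations, not trained values):

```python
from math import prod

# Hypothetical order-2 rule Markov model: P(rule | up-to-two nearest ancestors),
# with () meaning no ancestors (the root). Values are invented for illustration.
cond_prob = {
    ("r1", ()): 0.5,
    ("r2", ("r1",)): 0.4,
    ("r3", ("r1",)): 0.3,
    ("r4", ("r1", "r3")): 0.6,
    ("r5", ("r1", "r3")): 0.5,
    ("r6", ("r3", "r4")): 0.7,
    ("r7", ("r3", "r4")): 0.2,
}

# Parent of each rule in the derivation tree (None = root).
parent = {"r1": None, "r2": "r1", "r3": "r1", "r4": "r3",
          "r5": "r3", "r6": "r4", "r7": "r4"}

def context(rule, order=2):
    """Ancestor chain of `rule`, nearest ancestor last, truncated to `order`."""
    chain = []
    p = parent[rule]
    while p is not None and len(chain) < order:
        chain.append(p)
        p = parent[p]
    return tuple(reversed(chain))

def derivation_prob(rules):
    """P(T) = product over all rules r in T of P(r | anc(r))."""
    return prod(cond_prob[(r, context(r))] for r in rules)
```

With these toy numbers, `derivation_prob(["r1", ..., "r7"])` multiplies the seven conditional probabilities exactly as in the formula on the slide.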
Training the rule Markov Model
Minimal-rule derivation trees from Galley et al. (2004), "What's in a translation rule?"
The model can be trained on the path set of these derivation trees
Smoothing for rule Markov Model
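One standard way to smooth such conditional distributions is Witten-Bell interpolation, recursively backing off to shorter ancestor contexts. A sketch under that assumption (the counts below are invented for illustration, not from the paper's training data):

```python
from collections import defaultdict

# Illustrative counts: count[context][rule] = how often `rule` was observed
# under ancestor chain `context` (nearest ancestor last) in training.
count = defaultdict(lambda: defaultdict(int))
count[("r1", "r3")]["r4"] = 3
count[("r1", "r3")]["r5"] = 1
count[("r3",)]["r4"] = 5
count[("r3",)]["r5"] = 3
count[()]["r4"] = 8
count[()]["r5"] = 6
count[()]["r1"] = 2

def wb_prob(rule, ctx):
    """Witten-Bell interpolated P(rule | ctx): mix the maximum-likelihood
    estimate with the shorter context's estimate, weighting by how many
    distinct rules the context was seen with."""
    c = count[ctx]
    total = sum(c.values())
    if ctx == ():
        return c.get(rule, 0) / total        # unigram base case (unsmoothed here)
    if total == 0:
        return wb_prob(rule, ctx[1:])        # unseen context: back off entirely
    distinct = len(c)
    lam = total / (total + distinct)         # Witten-Bell interpolation weight
    backoff = wb_prob(rule, ctx[1:])         # drop the most distant ancestor
    return lam * (c.get(rule, 0) / total) + (1 - lam) * backoff
```

With these counts, `wb_prob("r4", ("r1", "r3"))` blends the context's own estimate 3/4 with the backed-off estimate, giving mass to rules the full context never saw.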
Pruning the rule Markov Model
RM-A: keep a context only if more than P unique rules were observed with it (P = 12)
RM-B: keep a context only if it was observed more than P times (P = 12)
RM-C: a context is added only if the KL divergence between its predictive distribution and that of its parent context is above a threshold
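The RM-C criterion can be sketched as a KL-divergence check between a context's predictive distribution and that of its parent (one-ancestor-shorter) context. The threshold and distributions below are illustrative, not the paper's settings:

```python
from math import log

def kl_divergence(p, q):
    """KL(p || q) over a shared support; assumes q[r] > 0 wherever p[r] > 0."""
    return sum(pr * log(pr / q[r]) for r, pr in p.items() if pr > 0)

def keep_context(child_dist, parent_dist, threshold=0.1):
    """RM-C: keep the longer context only if its predictive distribution
    differs enough from that of its parent context."""
    return kl_divergence(child_dist, parent_dist) > threshold

# Illustrative predictive distributions: the longer context sharpens the
# prediction relative to its parent, so it is worth storing.
child = {"r4": 0.9, "r5": 0.1}
parent = {"r4": 0.5, "r5": 0.5}
```

If the longer context predicts nearly the same distribution as its parent, storing it adds size without adding information, and RM-C drops it.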
DECODING WITH RULE MARKOV MODEL
Decoding Algorithm
Input: source parse tree, with tree addresses on its nodes
The decoder maintains a stack of active rules
The dot (.) indicates the next symbol to process, in English word order
Sample Decoding
Step | Stack | Hyp. | RMM prob.
0 | [<s> . IP@є </s>] | <s> |
1 | [<s> . IP@є </s>] [. NP@1 VP@2] | | P(r1|є)
2 | [<s> . IP@є </s>] [. NP@1 VP@2] [. Bush] | | P(r2|r1)
3 | [<s> . IP@є </s>] [. NP@1 VP@2] [Bush .] | … Bush |
4 | [<s> . IP@є </s>] [NP@1 . VP@2] | |
5 | [<s> . IP@є </s>] [NP@1 . VP@2] [. VP@2.2 PP@2.1] | | P(r3|r1)
6 | [<s> . IP@є </s>] [NP@1 . VP@2] [. VP@2.2 PP@2.1] [. held talks] | | P(r5|r1,r3)
7 | [<s> . IP@є </s>] [NP@1 . VP@2] [. VP@2.2 PP@2.1] [held . talks] | … held |
8 | [<s> . IP@є </s>] [NP@1 . VP@2] [. VP@2.2 PP@2.1] [held talks .] | … talks |
9 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] | |
10 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] | | P(r4|r1,r3)
Sample Decoding (continued; primed steps are a competing hypothesis using r'6: P@2.1.1 → and)
Step | Stack | Hyp. | RMM prob.
10 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] | … talks | P(r4|r1,r3)
11 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] [. with] | … with | P(r6|r3,r4)
12 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] [with .] | … with |
11' | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] [. and] | … and | P(r'6|r3,r4)
12' | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] [and .] | … and |
(Steps 13', 14', …: the "and" hypothesis continues in parallel with the "with" hypothesis below.)
13 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [P@2.1.1 . NP@2.1.2] | |
14 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [P@2.1.1 . NP@2.1.2] [. Sharon] | | P(r7|r3,r4)
15 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [P@2.1.1 . NP@2.1.2] [Sharon .] | … Sharon |
16 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [P@2.1.1 NP@2.1.2 .] | |
17 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 PP@2.1 .] | |
18 | [<s> . IP@є </s>] [NP@1 VP@2 .] | |
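The trace above follows a simple stack discipline: push a rule for the variable after the dot, scan a terminal, and on completing a rule pop it and advance the parent rule's dot. A minimal sketch of that loop, using a simplified, monotone version of the example grammar (the '@'-prefix encoding and rule table are illustrations, not the paper's data structures):

```python
# Each active rule is (symbols, dot); symbols starting with '@' are variables,
# everything else is a terminal to emit. The stack top drives the next action.
def step(stack, output, rules):
    symbols, dot = stack[-1]
    if dot == len(symbols):            # rule complete: pop, advance parent's dot
        stack.pop()
        if stack:
            psyms, pdot = stack.pop()
            stack.append((psyms, pdot + 1))
        return
    sym = symbols[dot]
    if sym.startswith("@"):            # variable: push the rule chosen for it
        stack.append((rules[sym], 0))
    else:                              # terminal: emit it and advance the dot
        stack.pop()
        stack.append((symbols, dot + 1))
        output.append(sym)

def decode(root_rule, rules):
    """Run the stack to completion, returning the emitted English words."""
    stack, output = [(root_rule, 0)], []
    while stack:
        step(stack, output, rules)
    return output

# Simplified grammar for the running example (reordering already applied).
rules = {
    "@IP": ["@NP", "@VP"],
    "@NP": ["Bush"],
    "@VP": ["held", "talks", "@PP"],
    "@PP": ["with", "Sharon"],
}
```

In the real decoder, the rule chosen for each variable is a search decision scored by the rule Markov model and language model; here it is fixed in `rules` to keep the stack mechanics visible.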
Handling Branching
Complexity Analysis
CKY-style bottom-up decoder: O(nc|V|^(g-1))
  n: sentence length; c: max number of incoming hyperedges per node; V: target-language vocabulary; g: order of the n-gram language model
CKY-style bottom-up decoder with the rule Markov model: O(nC^(m-1)|V|^(4(g-1)))
  C: max number of outgoing hyperedges per node; m: order of the rule Markov model
EXPERIMENTS
Experiment Setup
Training corpus: 1.5M sentence pairs; 38M/32M words of Chinese/English
Development set: 2006 NIST MT Evaluation test set (616 sentences)
Test set: 2008 NIST MT Evaluation test set (691 sentences)
Main Results
Main Results: the rule Markov model score is obtained by computing the product of the conditional rule probabilities
Analysis: effect of pruning
Analysis: robustness & vertically composed rules
Analysis: rule Markov model with fully composed rules
Discussion