Rule Markov Models for Fast Tree-to-String Translation

1 Rule Markov Models for Fast Tree-to-String Translation
Authors: Ashish Vaswani (USC), Liang Huang (USC), David Chiang (USC), Haitao Mi (Chinese Academy of Sciences). Presenter: Justin Chiu.

2 Recall Last Week: Binarized Forest-to-Tree Translation
Tree-to-string translation with composed rules: composing rules weakens the independence assumptions, but it yields a redundant grammar, and training and decoding take more time.

3 This Week
Tree-to-string translation, focusing on minimal rules (rules that cannot be formed by composing other rules), and constructing a rule Markov model for translation.

4 Contribution
A comparison between the rule Markov model (RMM) and composed-rule methods: RMM beats minimal rules; RMM matches vertically composed rules while decoding 30% faster; RMM roughly matches fully composed rules while saving space and time. Also: methods for pruning the rule Markov model, and a fast decoder that uses it.

5 RULE MARKOV MODEL

6 Tree-to-string Grammar

7 Tree-to-string Grammar

8 Tree-to-string Grammar
(Rule diagram: Bush ↔ 布希.)

9 Tree-to-string Grammar
(Tree diagram: rules deriving "Bush held talks"; Chinese side 布希 … 舉行 … 會談, with VV, AS, NP nodes.)

10 Tree-to-string Grammar
(Tree diagram: rules deriving "Bush held talks with Sharon"; Chinese side 布希 … 夏隆 … 舉行 … 會談.)

11 Probability of a derivation tree T
For any rule r in the derivation tree, let anc1(r) be its parent rule, anc2(r) its grandparent, and anc(r) = anc1(r), anc2(r), …, ancn(r) its chain of ancestors (ε means no parent). The model scores
P(T) = ∏ over r in T of P(r | anc(r)),
where P(r1 | ε) is the probability of generating the root rule.

12 Example
(Derivation tree for "Bush held talks with Sharon": r1 at IP@ε; its children r2 at NP@1 and r3 at VP@2; r3's children r4 and r5; r4's children r6 and r7; Chinese side 布希 … 夏隆 … 舉行 … 會談, with VV, AS, NP nodes.)

13 Example
P(T) = P(r1|ε) P(r2|r1) P(r3|r1) P(r4|r1,r3) P(r6|r1,r3,r4) P(r7|r1,r3,r4) P(r5|r1,r3)
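The factorization on this slide can be made concrete. Below is a minimal sketch of scoring a derivation tree under an order-3 (trigram) rule Markov model; the Rule class, the probability table, and the numeric values are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    name: str
    children: list = field(default_factory=list)

def derivation_prob(root, rule_probs, order=3):
    """Multiply P(r | nearest (order-1) ancestors) over every rule
    in the derivation tree, visiting nodes top-down."""
    prob = 1.0
    stack = [(root, ())]  # (rule, ancestor chain from the root)
    while stack:
        rule, ancestors = stack.pop()
        context = ancestors[-(order - 1):]  # truncate to the Markov order
        prob *= rule_probs[(context, rule.name)]
        for child in rule.children:
            stack.append((child, ancestors + (rule.name,)))
    return prob

# Derivation from the slide: r1 -> (r2, r3), r3 -> (r5, r4), r4 -> (r6, r7)
r6, r7 = Rule("r6"), Rule("r7")
r4 = Rule("r4", [r6, r7])
r5 = Rule("r5")
r3 = Rule("r3", [r5, r4])
r2 = Rule("r2")
r1 = Rule("r1", [r2, r3])

probs = {
    ((), "r1"): 0.5, (("r1",), "r2"): 0.4, (("r1",), "r3"): 0.3,
    (("r1", "r3"), "r4"): 0.6, (("r1", "r3"), "r5"): 0.7,
    (("r3", "r4"), "r6"): 0.8, (("r3", "r4"), "r7"): 0.9,
}
p = derivation_prob(r1, probs)
```

Note how r6's full ancestor chain (r1, r3, r4) is truncated to the trigram context (r3, r4), matching P(r6|r3,r4) as used later in decoding.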

14 Training the rule Markov Model
Rules are extracted as in Galley et al. (2004), "What's in a translation rule?"; the rule Markov model can then be trained on the path sets of these derivation trees.
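Training on the path sets amounts to counting (ancestor context, rule) events in the extracted derivation trees and normalizing. A minimal relative-frequency sketch (the tree encoding and toy derivations are assumptions for illustration):

```python
from collections import defaultdict

def train_rule_markov(derivations, order=3):
    """Relative-frequency estimates of P(rule | ancestor context)
    from the path sets of extracted derivation trees."""
    ngram = defaultdict(int)   # (context, rule) counts
    ctx = defaultdict(int)     # context counts
    def walk(rule, ancestors):
        name, children = rule
        context = tuple(ancestors[-(order - 1):])
        ngram[(context, name)] += 1
        ctx[context] += 1
        for child in children:
            walk(child, ancestors + [name])
    for tree in derivations:
        walk(tree, [])
    return {k: n / ctx[k[0]] for k, n in ngram.items()}

# Two toy derivations, encoded as (rule_name, [children]):
t1 = ("r1", [("r2", []), ("r3", [("r4", [])])])
t2 = ("r1", [("r2", []), ("r3", [("r5", [])])])
probs = train_rule_markov([t1, t2])
# P(r4 | r1, r3) = 1/2: under context (r1, r3), r4 occurs in one of two trees
```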

15 Smoothing for rule Markov Model
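The body of this slide did not survive extraction. As a generic illustration of smoothing a rule Markov model (not the paper's actual scheme), one can recursively interpolate the order-n estimate with the lower-order estimate obtained by dropping the oldest ancestor; the fixed weight lam is an assumption:

```python
def smoothed_prob(rule, context, probs, lam=0.8):
    """Interpolate P(rule | full context) with the backoff estimate
    P(rule | shorter context), recursing down to the unigram."""
    p = probs.get((context, rule), 0.0)
    if not context:
        return p  # unigram base case
    return lam * p + (1 - lam) * smoothed_prob(rule, context[1:], probs, lam)

# Toy table: trigram, bigram, and unigram estimates for the same rule
probs = {(("r1", "r3"), "r4"): 0.5, (("r3",), "r4"): 0.25, ((), "r4"): 0.1}
p = smoothed_prob("r4", ("r1", "r3"), probs)
```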

16 Pruning the rule Markov Model
RM-A: keep a context only if more than P unique rules were observed with it (P = 12).
RM-B: keep a context only if it was observed more than P times (P = 12).
RM-C: add a context only if the KL divergence between its predictive distribution and that of its parent context is above a threshold.
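The three pruning criteria can be sketched as simple predicates over the training counts and distributions (thresholds and the distribution encoding are illustrative):

```python
import math

def keep_context_rm_a(rules_seen_in_context, P=12):
    # RM-A: keep a context only if more than P unique rules follow it
    return len(set(rules_seen_in_context)) > P

def keep_context_rm_b(context_count, P=12):
    # RM-B: keep a context only if it was observed more than P times
    return context_count > P

def keep_context_rm_c(dist, parent_dist, threshold=0.1):
    # RM-C: keep a context if KL(dist || parent context's dist) is large,
    # i.e. the longer context actually changes the predictive distribution
    kl = sum(p * math.log(p / parent_dist[r])
             for r, p in dist.items() if p > 0)
    return kl > threshold
```

RM-C is the most selective: a long context whose distribution matches its shorter parent context adds no predictive power and is dropped.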

17 DECODING WITH RULE MARKOV MODEL

18 Decoding Algorithm
Input: the source parse tree, annotated with tree addresses.
The decoder maintains a stack of active rules; the dot (·) marks the next symbol to process, in English word order.

19–38 Sample Decoding
(Animation frames consolidated.) Decoding "Bush held talks with Sharon", one action per step; the stack starts as [<s> . IP@ε </s>]. Columns: Step | Action | Hyp. | MR prob.

1  | apply r1 at IP@ε        |           | P(r1|ε)
2  | apply r2 at NP@1        |           | P(r2|r1)
3  | scan "Bush"             | … Bush    |
4  | complete NP@1           |           |
5  | apply r3 at VP@2        |           | P(r3|r1)
6  | apply r5                |           | P(r5|r1,r3)
7  | scan "held"             | … held    |
8  | scan "talks"            | … talks   |
9  | complete                |           |
10 | apply r4                |           | P(r4|r1,r3)
11 | apply r6                |           | P(r6|r3,r4)
12 | scan "with"             | … with    |
11'| apply competing rule r'6|           | P(r'6|r3,r4)
12'| scan "and"              | … and     |
13 | complete                |           |
14 | apply r7                |           | P(r7|r3,r4)
15 | scan "Sharon"           | … Sharon  |
16–17 | complete             |           |
18 | final item [<s> … . </s>]: derivation complete

Steps 11'–13' show an alternative derivation (a competing rule r'6 translating the preposition as "and") that the decoder explores in parallel.

39 Handling Branching

40 Complexity Analysis
Rule Markov model decoder: O(nc|V|^(g−1)), where n = sentence length, c = max number of incoming hyperedges per node, |V| = target-language vocabulary size, g = order of the n-gram language model.
CKY-style bottom-up decoder: O(nC^(m−1)|V|^(4(g−1))), where C = max number of outgoing hyperedges per node and m = order of the rule model.

41 EXPERIMENTS

42 Experiment Setup
Training corpus: 1.5M sentence pairs (38M/32M words of Chinese/English).
Development set: 2006 NIST MT Evaluation test set (616 sentences).
Test set: 2008 NIST MT Evaluation test set (691 sentences).

43 Main Results

44 Main Results
The rule Markov model score of a derivation is the product of its rules' conditional probabilities.

45 Analysis Effect of pruning

46 Analysis: Robustness & vertically composed rules

47 Analysis: Rule Markov model with fully composed rules

48 Discussion

