Rule Markov Models for Fast Tree-to-String Translation


Rule Markov Models for Fast Tree-to-String Translation
Authors: Ashish Vaswani (USC), Liang Huang (USC), David Chiang (USC), Haitao Mi (Chinese Academy of Sciences)
Presenter: Justin Chiu

Recall Last Week
- Binarized Forest-to-Tree Translation
- Tree-to-string translation
- Constructing composed rules
  - weakens the independence assumptions of minimal rules
  - but yields redundant grammars
  - training and decoding need more time

This Week
- Tree-to-string translation
- Focus on minimal rules: rules that cannot be formed by composing other rules
- Constructing a rule Markov model for translation

Contributions
- A comparison between the rule Markov model (RMM) and composed-rule methods:
  - RMM outperforms minimal rules
  - RMM matches vertically composed rules while decoding about 30% faster
  - RMM is comparable to fully composed rules while saving space and time
- Methods for pruning the rule Markov model
- A fast decoder that uses the rule Markov model

RULE MARKOV MODEL

Tree-to-string Grammar (figure, built up over several slides): the source parse tree's nodes are identified by tree addresses: IP@є at the root with children NP@1 (布希) and VP@2; VP@2 has children PP@2.1 and VP@2.2; PP@2.1 has children P@2.1.1 (與) and NP@2.1.2 (夏隆); VP@2.2 dominates VV AS NP (舉行 了 會談). Tree-to-string rules applied at these addresses translate the source sentence 布希 與 夏隆 舉行 了 會談 into "Bush held talks with Sharon".

Probability of a derivation tree T
For any rule r in T:
- anc1(r) = parent of r (є means no parent)
- anc2(r) = grandparent of r, and so on up to ancn(r)
P(T) = product over all rules r in T of P(r | anc1(r), ..., ancn(r))
P(r1 | є) = probability of generating the root rule

Example (figure): the derivation above with its rules labeled: r1 expands IP@є into NP@1 VP@2; r2 rewrites NP@1 (布希) as "Bush"; r3 expands VP@2 into PP@2.1 VP@2.2; r4 expands PP@2.1 into P@2.1.1 NP@2.1.2; r5 rewrites VP@2.2 (VV AS NP: 舉行 了 會談) as "held talks"; r6 rewrites P@2.1.1 (與) as "with"; r7 rewrites NP@2.1.2 (夏隆) as "Sharon". The output is "Bush held talks with Sharon".

Example: P(T) = P(r1|є) P(r2|r1) P(r3|r1) P(r4|r1,r3) P(r6|r1,r3,r4) P(r7|r1,r3,r4) P(r5|r1,r3)
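To make the factorization concrete, here is a small Python sketch (the nested-tuple tree representation and the toy probabilities are illustrative, not the authors' code) that scores the example derivation by conditioning each rule on its nearest ancestors; with an order-3 model the contexts truncate to the two nearest ancestors, matching the conditioning used later in the decoding trace (e.g. P(r6|r3,r4)).

```python
def derivation_prob(tree, rule_prob, order=3, path=()):
    """P(T) = product over rules r in T of P(r | up to (order-1) nearest ancestors of r).

    `tree` is a nested tuple (rule_name, [child_trees]); `path` holds the names of the
    ancestors of the current rule, from the root down to its parent.
    """
    name, children = tree
    ctx = path[-(order - 1):] if order > 1 else ()    # truncate to the Markov order
    p = rule_prob(name, ctx)
    for child in children:
        p *= derivation_prob(child, rule_prob, order, path + (name,))
    return p

# The example derivation as a nested tuple, with toy probabilities (illustrative numbers).
T = ("r1", [("r2", []),
            ("r3", [("r4", [("r6", []), ("r7", [])]),
                    ("r5", [])])])
toy = {("r1", ()): 1.0, ("r2", ("r1",)): 0.6, ("r3", ("r1",)): 0.7,
       ("r4", ("r1", "r3")): 0.5, ("r5", ("r1", "r3")): 0.8,
       ("r6", ("r3", "r4")): 0.9, ("r7", ("r3", "r4")): 0.4}
print(derivation_prob(T, lambda r, ctx: toy.get((r, ctx), 1e-6)))  # 0.06048
```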

Training the rule Markov Model
- Minimal rules are extracted as in Galley et al. (2004), "What's in a translation rule?"
- The rule Markov model can be trained on the path sets of these derivation trees
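A hedged sketch of relative-frequency training from the root-to-node paths of a set of derivation trees (the function names and the nested-tuple tree format follow the sketch above and are illustrative; the actual model is also smoothed, as the next slide discusses).

```python
from collections import defaultdict

def collect_counts(derivations, order=3):
    """Count (ancestor-context, rule) events along every root-to-node path.

    Each derivation is a nested tuple (rule_name, [child_trees]); the context of a rule
    is the tuple of its up-to-(order-1) nearest ancestors, most distant first.
    """
    counts = defaultdict(lambda: defaultdict(int))

    def visit(node, path):
        name, children = node
        ctx = path[-(order - 1):] if order > 1 else ()
        counts[ctx][name] += 1
        for child in children:
            visit(child, path + (name,))

    for tree in derivations:
        visit(tree, ())
    return counts

def relative_frequency(counts):
    """Unsmoothed maximum-likelihood estimates P(rule | context)."""
    return {(rule, ctx): c / sum(rules.values())
            for ctx, rules in counts.items() for rule, c in rules.items()}

# With the single example derivation T from the previous sketch:
# relative_frequency(collect_counts([T]))[("r6", ("r3", "r4"))] == 1.0
```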

Smoothing for the rule Markov Model
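As a generic illustration of how such a model can be smoothed (interpolated absolute discounting over progressively shorter ancestor contexts; this scheme is assumed for illustration and is not necessarily the one used in the paper):

```python
def smoothed_prob(rule, ctx, counts, discount=0.5, n_rules=10000):
    """Back off from the full ancestor context to shorter ones (illustrative scheme).

    `counts[ctx][rule]` is the table built by collect_counts above; `ctx` is ordered with
    the most distant ancestor first, so ctx[1:] drops the most distant ancestor.
    `n_rules` is a hypothetical grammar size used as a uniform floor at the base case.
    """
    if not ctx:                                       # base case: add-one unigram estimate
        rules = counts.get((), {})
        total = sum(rules.values())
        return (rules.get(rule, 0) + 1) / (total + n_rules)
    rules = counts.get(ctx, {})
    total = sum(rules.values())
    if total == 0:                                    # unseen context: back off entirely
        return smoothed_prob(rule, ctx[1:], counts, discount, n_rules)
    discounted = max(rules.get(rule, 0) - discount, 0) / total
    backoff_weight = discount * len(rules) / total    # probability mass freed by discounting
    return discounted + backoff_weight * smoothed_prob(rule, ctx[1:], counts, discount, n_rules)
```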

Pruning the rule Markov Model
- RM-A: keep a context only if more than P unique rules were observed with it (P = 12)
- RM-B: keep a context only if it was observed more than P times (P = 12)
- RM-C: keep a context only if the KL divergence between its predictive distribution and that of its parent context is above a threshold
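A sketch of the three strategies in terms of the count table from the training sketch above (the KL computation for RM-C follows the slide's description; treating the context with its most distant ancestor dropped as the "parent" context is an assumption).

```python
import math

def kl_divergence(child_counts, parent_counts, eps=1e-9):
    """KL(P_child || P_parent) between two contexts' predictive rule distributions."""
    child_total = sum(child_counts.values()) or 1
    parent_total = sum(parent_counts.values()) or 1
    kl = 0.0
    for rule, c in child_counts.items():
        p = c / child_total
        q = parent_counts.get(rule, 0) / parent_total + eps
        kl += p * math.log(p / q)
    return kl

def prune_contexts(counts, strategy="RM-A", P=12, kl_threshold=0.1):
    """Return the set of ancestor contexts to keep in the rule Markov model."""
    kept = {()}                                        # always keep the empty context
    for ctx, rules in counts.items():
        if not ctx:
            continue
        if strategy == "RM-A" and len(rules) > P:              # > P unique rules observed
            kept.add(ctx)
        elif strategy == "RM-B" and sum(rules.values()) > P:   # context observed > P times
            kept.add(ctx)
        elif strategy == "RM-C" and kl_divergence(rules, counts.get(ctx[1:], {})) > kl_threshold:
            kept.add(ctx)
    return kept
```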

DECODING WITH RULE MARKOV MODEL

Decoding Algorithm
- Input: the source parse tree, with nodes identified by tree addresses
- The decoder maintains a stack of active rules
- The dot (.) marks the next symbol to process, in English word order
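A minimal, deterministic Python sketch of this stack of dotted rules, replayed on the running example (see the trace on the following slides). There is exactly one rule per tree address here, so no beam search, pruning, or language model; the grammar dictionary, rule names, and probabilities are illustrative rather than the paper's implementation.

```python
GRAMMAR = {   # tree address -> (rule name, English-order right-hand side)
    "IP@є":     ("r1", ["NP@1", "VP@2"]),
    "NP@1":     ("r2", ["Bush"]),
    "VP@2":     ("r3", ["VP@2.2", "PP@2.1"]),   # English order swaps the two subtrees
    "PP@2.1":   ("r4", ["P@2.1.1", "NP@2.1.2"]),
    "VP@2.2":   ("r5", ["held", "talks"]),
    "P@2.1.1":  ("r6", ["with"]),
    "NP@2.1.2": ("r7", ["Sharon"]),
}

RM_PROB = {   # P(rule | two nearest ancestors), toy values
    ("r1", ()): 1.0, ("r2", ("r1",)): 0.6, ("r3", ("r1",)): 0.7,
    ("r4", ("r1", "r3")): 0.5, ("r5", ("r1", "r3")): 0.8,
    ("r6", ("r3", "r4")): 0.9, ("r7", ("r3", "r4")): 0.4,
}

def decode(root="IP@є", order=3):
    output, score = [], 1.0
    stack = [(None, ["<s>", root, "</s>"], 0)]   # virtual top-level rule: <s> . IP@є </s>
    ancestors = []                               # names of the currently open predicted rules
    while stack:
        name, rhs, dot = stack[-1]
        if dot == len(rhs):                      # rule finished: pop it, advance the dot below
            stack.pop()
            if name is not None:
                ancestors.pop()
            if stack:
                below_name, below_rhs, below_dot = stack.pop()
                stack.append((below_name, below_rhs, below_dot + 1))
        elif "@" in rhs[dot]:                    # nonterminal (a tree address): predict a rule
            new_name, new_rhs = GRAMMAR[rhs[dot]]
            ctx = tuple(ancestors[-(order - 1):])
            score *= RM_PROB.get((new_name, ctx), 1e-6)
            ancestors.append(new_name)
            stack.append((new_name, new_rhs, 0))
        else:                                    # terminal: emit the next English word
            output.append(rhs[dot])
            stack[-1] = (name, rhs, dot + 1)
    return " ".join(output), score

print(decode())   # ('<s> Bush held talks with Sharon </s>', 0.06048...)
```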

Sample Decoding (the slides build this trace one step at a time; it is shown here once, in full)

Step | Stack | Hyp. | MR prob.
0 | [<s> . IP@є </s>] | <s> |
1 | [<s> . IP@є </s>] [. NP@1 VP@2] | | P(r1|є)
2 | [<s> . IP@є </s>] [. NP@1 VP@2] [. Bush] | | P(r2|r1)
3 | [<s> . IP@є </s>] [. NP@1 VP@2] [Bush .] | … Bush |
4 | [<s> . IP@є </s>] [NP@1 . VP@2] | |
5 | [<s> . IP@є </s>] [NP@1 . VP@2] [. VP@2.2 PP@2.1] | | P(r3|r1)
6 | [<s> . IP@є </s>] [NP@1 . VP@2] [. VP@2.2 PP@2.1] [. held talks] | | P(r5|r1,r3)
7 | [<s> . IP@є </s>] [NP@1 . VP@2] [. VP@2.2 PP@2.1] [held . talks] | … held |
8 | [<s> . IP@є </s>] [NP@1 . VP@2] [. VP@2.2 PP@2.1] [held talks .] | … talks |
9 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] | |
10 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] | … talks | P(r4|r1,r3)
11 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] [. with] | … with | P(r6|r3,r4)
11' | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] [. and] | … and | P(r'6|r3,r4)
12 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] [with .] | … with |
12' | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [. P@2.1.1 NP@2.1.2] [and .] | … and |
13 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [P@2.1.1 . NP@2.1.2] | |
14 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [P@2.1.1 . NP@2.1.2] [. Sharon] | | P(r7|r3,r4)
15 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [P@2.1.1 . NP@2.1.2] [Sharon .] | Sharon |
16 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 . PP@2.1] [P@2.1.1 NP@2.1.2 .] | |
17 | [<s> . IP@є </s>] [NP@1 . VP@2] [VP@2.2 PP@2.1 .] | |
18 | [<s> . IP@є </s>] [NP@1 VP@2 .] | |

(Rows 11' and 12' show a competing hypothesis that expands P@2.1.1 with rule r'6, producing "and" instead of "with".)

Handling Branch

Complexity Analysis
- Rule Markov model with the top-down decoder: O(n c |V|^(g−1))
  - n: sentence length; c: max number of incoming edges per node; V: target-language vocabulary; g: order of the n-gram language model
- CKY-style bottom-up decoder: O(n C^(m−1) |V|^(4(g−1)))
  - C: max number of outgoing hyperedges per node; m: order of the rule Markov model

EXPERIMENTS

Experiment Setup
- Training corpus: 1.5M sentence pairs; 38M/32M words of Chinese/English
- Development set: 2006 NIST MT Evaluation test set (616 sentences)
- Test set: 2008 NIST MT Evaluation test set (691 sentences)

Main Results

Main Results: the rule Markov model score of a derivation is obtained by computing the product of its rule probabilities.

Analysis: effect of pruning

Analysis: robustness & vertically composed rules

Analysis: rule Markov model with fully composed rules

Discussion