Approximate Factoring for A* Search. Aria Haghighi, John DeNero, and Dan Klein. Computer Science Division, University of California, Berkeley
Inference for NLP Tasks A* Search
Inference as Search [Figure: search graph with partial hypothesis y and edges a1, a2, a3]
Bitext Parsing as Search. Sentence pair: "translation is hard" / "la traducción es difícil". Weighted synchronous grammar parsing: O(n^6), modified CKY over bi-spans (X[i,j], X’[i’,j’]). [Figure: source tree (S, NP, VP) and target tree, joined at the root pair (S, S’)]
A* Search [Figure: hypothesis y with its Score So Far and its Completion Score]
A* Search Heuristic Design: Tight (small), Admissible, Efficient to compute. [Figure: "A* Heuristic Man" telling a hypothesis "This way, hypothesis!" on the way to the Optimal Result]
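The A* loop described on the slides above can be sketched in a few lines. This is a minimal illustration on a toy graph, not the bitext parser from the talk: the names `a_star`, `EDGES`, and `H` are hypothetical, costs are assumed non-negative, and the heuristic is assumed admissible with a value of 0 at the goal.

```python
import heapq

def a_star(start, goal, edges, h):
    """Return the cost of the cheapest path from start to goal, or None.

    edges maps a state to a list of (next_state, edge_cost) pairs;
    h(state) must be a lower bound on the remaining cost to the goal.
    """
    # Agenda items are (priority, cost_so_far, state), where
    # priority = cost so far + heuristic completion estimate.
    agenda = [(h(start), 0.0, start)]
    best = {start: 0.0}
    while agenda:
        _, cost, state = heapq.heappop(agenda)
        if state == goal:
            return cost
        if cost > best.get(state, float("inf")):
            continue  # stale entry: a cheaper route to state was found later
        for nxt, edge_cost in edges.get(state, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(agenda, (new_cost + h(nxt), new_cost, nxt))
    return None

# Toy graph (hypothetical): two routes from y to the goal.
EDGES = {"y": [("a", 1.0), ("b", 4.0)], "a": [("goal", 5.0)], "b": [("goal", 1.0)]}
# Exact completion costs, which are trivially admissible.
H = {"y": 5.0, "a": 5.0, "b": 1.0, "goal": 0.0}

print(a_star("y", "goal", EDGES, H.get))
```

With a tight heuristic like `H`, the search pops `y`, `b`, and the goal, and never expands `a`; a looser admissible heuristic would still return the same cost but explore more of the graph.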
A* Example: Bitext Search. Cost So Far of a Bi-Span: its Viterbi Inside Score
A* Bitext Search. Completion Score: the Viterbi Outside Score is the Ideal Heuristic, but costs O(n^6)
Of Stately Projections: π [Figure: a bitext tree rooted at (S, S’) and its projections onto the source tree (S, NP, VP) and the target tree (S’, NP’, VP’)]
A* Bitext Search: Suppose, Then, [Figure: bitext tree rooted at (S, S’) with its source projection (S, NP, VP) and target projection (S’, NP’, VP’)]
Projection Heuristic: O(n^3) and O(n^3) for the two projections, versus O(n^6) for the full problem. Klein and Manning [2003]
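The projection heuristic rests on a short argument, sketched here under the assumption that every edge cost splits exactly into a source part and a target part, as in the factored models of Klein and Manning [2003]. The symbols c_s, c_t, h^*, h_s^*, and h_t^* are notation introduced for this sketch, not taken from the slides.

```latex
% Assumption: c(a) = c_s(\pi_s(a)) + c_t(\pi_t(a)) for every edge a.
% h^*(y) is the true completion cost of y; h_s^* and h_t^* are the
% true completion costs in the source and target projections.
\begin{align*}
h^*(y) &= \min_{P:\, y \to \text{goal}} \sum_{a \in P}
          \bigl[ c_s(\pi_s(a)) + c_t(\pi_t(a)) \bigr] \\
       &\ge \min_{P_s:\, \pi_s(y) \to \text{goal}} \sum_{a_s \in P_s} c_s(a_s)
          + \min_{P_t:\, \pi_t(y) \to \text{goal}} \sum_{a_t \in P_t} c_t(a_t) \\
       &= h_s^*(\pi_s(y)) + h_t^*(\pi_t(y)).
\end{align*}
```

The inequality holds because every completion of y projects to a completion of π_s(y) and of π_t(y), so each projected problem minimizes over at least as many paths. The sum of the two projected completion costs is therefore an admissible estimate.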
When models don’t factorize
Pointwise Admissibility [Figure: edge a with cost c(a) between y and x; source projection with cost φ_s(a) between π_s(y) and π_s(x); target projection with cost φ_t(a) between π_t(y) and π_t(x)]
When models don’t factorize: Admissibility [Figure: hypothesis y with its projections π_s(y) and π_t(y)]
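When costs do not factor, one way to read the pointwise condition is as a per-edge inequality, and a short chain shows why that is enough for admissibility. The form of the condition and the symbols h^*, h_{φ_s}, and h_{φ_t} (completion costs in the projections under the costs φ_s and φ_t) are assumptions of this sketch rather than text recovered from the slides.

```latex
% Pointwise admissibility (assumed form): for every edge a,
%   \phi_s(\pi_s(a)) + \phi_t(\pi_t(a)) \le c(a).
% Summing over any completion P of y and minimizing gives:
\begin{align*}
h^*(y) = \min_{P} \sum_{a \in P} c(a)
  &\ge \min_{P} \sum_{a \in P}
       \bigl[ \phi_s(\pi_s(a)) + \phi_t(\pi_t(a)) \bigr] \\
  &\ge h_{\phi_s}(\pi_s(y)) + h_{\phi_t}(\pi_t(y)).
\end{align*}
```

So a heuristic built from the two projected searches never overestimates the true completion cost, even though c itself is not a sum of projected costs.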
Finding Factored Costs: Pointwise Gap. How to find φ_s and φ_t?
Finding Factored Costs Small gaps
Finding Factored Costs Pointwise Admissibility
Finding Factored Costs
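To make the two ingredients concrete (pointwise admissibility as the constraint, small gaps as the objective), here is a deliberately simple sketch. It is not the optimization problem solved in the talk: it swaps in a "half of the cheapest edge" rule that is always pointwise admissible but generally leaves larger gaps than an optimized solution would. All names (`half_min_factoring`, `gaps`, `COSTS`) and the toy edges are hypothetical.

```python
def half_min_factoring(costs, proj_s, proj_t):
    """Build projected costs with phi_s[pi_s(a)] + phi_t[pi_t(a)] <= c(a).

    costs maps each edge a to c(a); proj_s and proj_t map a to its
    source-side and target-side projected edge.
    """
    phi_s, phi_t = {}, {}
    for a, c in costs.items():
        s, t = proj_s[a], proj_t[a]
        # Each side gets half the cost of the cheapest edge projecting onto it,
        # so the two halves can never add up to more than c(a).
        phi_s[s] = min(phi_s.get(s, float("inf")), c / 2.0)
        phi_t[t] = min(phi_t.get(t, float("inf")), c / 2.0)
    return phi_s, phi_t

def gaps(costs, proj_s, proj_t, phi_s, phi_t):
    """Pointwise gap c(a) - [phi_s + phi_t] for every edge a."""
    return {a: c - (phi_s[proj_s[a]] + phi_t[proj_t[a]]) for a, c in costs.items()}

# Toy edges (hypothetical): each edge pairs a source rule with a target rule.
COSTS = {("s1", "t1"): 1.0, ("s1", "t2"): 3.0, ("s2", "t3"): 2.0}
PROJ_S = {a: a[0] for a in COSTS}
PROJ_T = {a: a[1] for a in COSTS}

phi_s, phi_t = half_min_factoring(COSTS, PROJ_S, PROJ_T)
for edge, gap in gaps(COSTS, PROJ_S, PROJ_T, phi_s, phi_t).items():
    print(edge, gap)
```

Every gap is non-negative by construction, which is the admissibility constraint. An optimizer would instead search over φ_s and φ_t directly to shrink those gaps, for example by minimizing their sum subject to the same constraint.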
Bitext Experiments. Synchronous Tree-to-Tree Transducer: trained on 40k sentences of English-Spanish Europarl [Galley et al., 2004]; rare words replaced with POS tags; tested on 1,200 sentences, max length 5-15. Optimization Problem: solved only once per grammar; 206K variables; 160K constraints; 29 minutes.
Bitext Experiments
Zhang and Gildea (2006)
Bitext Experiments Zhang and Gildea (2006)
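Lexicalized Parsing [Figure: lexicalized tree with nodes S-(is,VBZ), NP-(translation,NN), VP-(is,VBZ); projections onto the unlexicalized tree (S, NP, VP) and onto the word-tag pairs (is,VBZ), (translation,NN)] Klein and Manning [2003]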
Lexicalized Parsing
Too many constraints to efficiently solve! Over 64e 13 possible lexicalized rules
Lexicalized Parsing
Lexicalized Model Experiments. Standard Setup: train on sections 2-21 of the treebank; test on section 23 (length ≤ 40). Models Tested: Factored Model [Klein and Manning, 2003]; Non-Factored Model.
Lexicalized Parsing Factored Model [Klein and Manning, 2003]
Lexicalized Parsing Non-Factored Model
Conclusions: General technique for generating A* estimates. Can explicitly control the admissibility/tightness trade-off. Future Work: explore different objectives and applications.
Thanks