Learning and Inference for Hierarchically Split PCFGs
Slav Petrov and Dan Klein
The Game of Designing a Grammar
Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson ’98] (see the sketch below)
- Head lexicalization [Collins ’99, Charniak ’00]
- Automatic clustering?
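To make parent annotation concrete, here is a minimal Java sketch; the Tree class is a toy stand-in for illustration, not the Berkeley Parser's actual data structures:

    import java.util.*;

    class Tree {
        String label;
        List<Tree> children = new ArrayList<>();
        Tree(String label, Tree... kids) { this.label = label; children.addAll(Arrays.asList(kids)); }
        boolean isLeaf() { return children.isEmpty(); }

        // Parent annotation: refine each nonterminal with its parent's label.
        static void annotateParent(Tree node, String parent) {
            if (node.isLeaf()) return;               // words stay unannotated
            String original = node.label;
            if (parent != null) node.label = original + "^" + parent;
            for (Tree child : node.children)
                annotateParent(child, original);     // pass the pre-annotation label down
        }
    }

Calling Tree.annotateParent(root, null) turns an NP under S into NP^S while leaving the words untouched.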
Learning Latent Annotations [Matsuzaki et al. ’05]
EM algorithm:
- Brackets are known
- Base categories are known
- Only induce subcategories
Just like Forward-Backward for HMMs, with inside (forward) and outside (backward) passes over trees.
[Figure: parse tree with latent subcategory variables X1-X7 over the sentence "He was right."]
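As a hedged illustration of the E-step, here is a sketch of the inside (forward-like) pass over a single observed tree, reusing the toy Tree class above; the Grammar interface and its methods are assumptions of this sketch, not the actual trainer's API:

    interface Grammar {
        int numSubcats(String category);
        double lexiconProb(String tag, int subcat, String word);
        double ruleProb(String a, int x, String b, int y, String c, int z);
    }

    class InsidePass {
        // Inside pass over one observed tree: brackets and base categories are
        // fixed, only the latent subcategory index is summed over. Assumes
        // binarized trees (every internal node has one or two children).
        static double[] inside(Tree node, Grammar g) {
            int n = g.numSubcats(node.label);
            double[] score = new double[n];
            // preterminal: score[x] = P(word | tag_x)
            if (node.children.size() == 1 && node.children.get(0).isLeaf()) {
                String word = node.children.get(0).label;
                for (int x = 0; x < n; x++)
                    score[x] = g.lexiconProb(node.label, x, word);
                return score;
            }
            // binary node: sum over the children's subcategories
            double[] left = inside(node.children.get(0), g);
            double[] right = inside(node.children.get(1), g);
            for (int x = 0; x < n; x++)
                for (int y = 0; y < left.length; y++)
                    for (int z = 0; z < right.length; z++)
                        score[x] += g.ruleProb(node.label, x,
                                               node.children.get(0).label, y,
                                               node.children.get(1).label, z)
                                    * left[y] * right[z];
            return score;
        }
    }

The outside (backward-like) pass and the accumulation of expected rule counts are analogous; together they provide the EM statistics for the latent subcategories.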
Overview
[Plot: accuracy as the grammar is refined, up to the limit of computational resources]
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Refinement of the DT tag
[Figure: DT split directly into DT-1, DT-2, DT-3, DT-4]

Hierarchical refinement of the DT tag
[Figure: DT split in two, then each subcategory split again, yielding a binary hierarchy of subcategories]
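A minimal sketch of one hierarchical split step, assuming tag-emission probabilities are stored as plain arrays (the real trainer splits rule probabilities in the same spirit):

    import java.util.Random;

    class Splitter {
        // Split every subcategory of a tag in two: each child inherits the
        // parent's word distribution, jittered by up to ±1% so that EM can
        // drive the two halves apart (identical halves would stay identical).
        static double[][] splitEmissions(double[][] p, Random rng) {
            double[][] out = new double[p.length * 2][];
            for (int x = 0; x < p.length; x++)
                for (int half = 0; half < 2; half++) {
                    double[] row = new double[p[x].length];
                    for (int w = 0; w < row.length; w++)
                        row[w] = p[x][w] * (1.0 + 0.01 * (rng.nextDouble() - 0.5));
                    out[2 * x + half] = normalize(row);
                }
            return out;
        }

        static double[] normalize(double[] v) {
            double s = 0;
            for (double d : v) s += d;
            for (int i = 0; i < v.length; i++) v[i] /= s;
            return v;
        }
    }

The jitter matters: without it, the two halves of each subcategory would receive identical EM updates and never diverge.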
Hierarchical Estimation Results

Model                   F1
Baseline                87.3
Hierarchical Training   88.4
Refinement of the "," (comma) tag
Splitting all categories the same amount is wasteful:
Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, roll back the splits which were least useful
- Criterion: compare the likelihood with the split against the likelihood with the split reversed (see the sketch below)
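A hedged sketch of the roll-back test: reversing a split at one node swaps that node's two inside/outside contributions for a single merged one. The variable names here are assumptions of this sketch:

    class MergeTest {
        // For one occurrence of a category whose two subcategories are merge
        // candidates: approximate the sentence likelihood with the split
        // reversed. p1 and p2 are the relative frequencies of the two
        // subcategories across the treebank (p1 + p2 = 1).
        static double likelihoodWithSplitReversed(double sentenceLikelihood,
                                                  double in1, double out1,
                                                  double in2, double out2,
                                                  double p1, double p2) {
            double inMerged  = p1 * in1 + p2 * in2; // weighted average of inside scores
            double outMerged = out1 + out2;         // outside scores simply add up
            return sentenceLikelihood
                   - (in1 * out1 + in2 * out2)      // remove the split subcategories
                   + inMerged * outMerged;          // add back the merged category
        }
    }

A split is rolled back when reversing it across all of its occurrences costs little likelihood, i.e. the ratio of the two likelihoods stays near 1.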
Adaptive Splitting Results

Model             F1
Previous          88.4
With 50% Merging  89.5
Number of Phrasal Subcategories
[Bar chart: subcategories allocated per phrasal category; NP, VP, and PP receive the most splits, while rare categories such as X and NAC receive the fewest]
Number of Lexical Subcategories
[Bar chart: subcategories allocated per part-of-speech tag; NNP, JJ, NNS, and NN receive many splits, while tags such as TO, ",", and POS receive few]
Smoothing
- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics across subcategories (see the sketch below)
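A minimal sketch of the smoothing step, assuming the subcategory probabilities of one base category sit in a plain array; the small interpolation constant alpha is an assumed value here, not necessarily the parser's:

    class Smoother {
        // Linear smoothing: interpolate each subcategory's probability with
        // the mean over all subcategories of the same base category, so rare
        // subcategories pool statistics with their siblings.
        static void smoothTowardsMean(double[] probs, double alpha) {
            double mean = 0;
            for (double p : probs) mean += p;
            mean /= probs.length;
            for (int x = 0; x < probs.length; x++)
                probs[x] = (1 - alpha) * probs[x] + alpha * mean;
        }
    }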
Result Overview

Model           F1
Previous        89.5
With Smoothing  90.7
Linguistic Candy

Proper Nouns (NNP):
NNP-14  Oct.   Nov.       Sept.
NNP-12  John   Robert     James
NNP-2   J.     E.         L.
NNP-1   Bush   Noriega    Peters
NNP-15  New    San        Wall
NNP-3   York   Francisco  Street

Personal pronouns (PRP):
PRP-0   It     He    I
PRP-1   it     he    they
PRP-2   it     them  him
Relative adverbs (RBR):
RBR-0   further  lower    higher
RBR-1   more     less     More
RBR-2   earlier  Earlier  later

Cardinal Numbers (CD):
CD-7    one      two      Three
CD-11   million  billion  trillion
Inference
"She heard the noise."
Exhaustive parsing: 1 min per sentence
Coarse-to-Fine Parsing [Goodman ’97, Charniak & Johnson ’05]
[Diagram: parse the sentence with a coarse grammar (NP, VP, ...), prune unlikely constituents, then parse with the refined grammar (NP-17, NP-12, NP-1, VP-6, VP-31, ...)]
Hierarchical Pruning
Consider again the span 5 to 12:
coarse:          ... QP  NP  VP ...
split in two:    ... QP1 QP2 NP1 NP2 VP1 VP2 ...
split in four:   ... QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 ...
split in eight:  ...
At each stage, subcategories whose posterior probability for the span falls below a threshold t are pruned before the next, finer pass.
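A minimal sketch of the pruning test between two passes, assuming posteriors from the coarser pass are stored in a dense array; none of this mirrors the parser's real data structures:

    class Pruner {
        // posterior[sym][start][end] holds inside*outside / sentenceProb from
        // the previous, coarser pass; parentOf maps each refined symbol to its
        // coarse ancestor. A refined symbol is kept for a span only if its
        // ancestor's posterior clears the threshold t.
        static boolean keep(double[][][] posterior, int[] parentOf,
                            int refined, int start, int end, double t) {
            return posterior[parentOf[refined]][start][end] >= t;
        }
    }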
Intermediate Grammars
X-Bar = G0
Learning: G0 → G1 → G2 → G3 → G4 → G5 → G6 = G
[Figure: DT refined across stages: DT → DT1, DT2 → DT1..DT4 → DT1..DT8]
Projected Grammars
X-Bar = G0
Learning: G0 → G1 → G2 → G3 → G4 → G5 → G6 = G
Projection πi: π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), G
Instead of the intermediate grammars from learning, each coarser grammar is obtained by projecting the final grammar G.
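A hedged sketch of how a projection πi can collapse refined rule probabilities into coarse ones; treating the refined-symbol weights as given is an assumption of this sketch (the parser derives them from the grammar's own distribution over trees):

    class Projection {
        // Estimate the coarse rule probability P(A -> B C) as the expectation
        // of all refined rules that project onto it: weight[x] = P(A_x | A) is
        // the probability of refined symbol A_x given its coarse ancestor.
        static double projectRule(double[][] refinedRuleProbs, // [x][(y,z) pairs]
                                  double[] weight) {           // sums to 1 over x
            double total = 0;
            for (int x = 0; x < weight.length; x++)
                for (double p : refinedRuleProbs[x])
                    total += weight[x] * p;
            return total;
        }
    }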
Final Results (Efficiency)
Parsing the development set (1600 sentences):
- Berkeley Parser: 10 min (implemented in Java)
- Charniak & Johnson ’05 Parser: 19 min (implemented in C)
Final Results (Accuracy)
F1 reported for sentences of ≤ 40 words and for all sentences:
- ENG: Charniak & Johnson ’05 (generative) vs. This Work
- GER: Dubey ’05 vs. This Work
- CHN: Chiang et al. ’02 vs. This Work
Extensions
- Acoustic modeling [Petrov, Pauls & Klein ’07]
- Infinite grammars: nonparametric Bayesian learning [Liang, Petrov, Jordan & Klein ’07]
Conclusions
- Split & Merge Learning
  - Hierarchical Training
  - Adaptive Splitting
  - Parameter Smoothing
- Hierarchical Coarse-to-Fine Inference
  - Projections
  - Marginalization
- Multi-lingual Unlexicalized Parsing
Thank You!