Published by Tomas Sorrells. Modified over 9 years ago.
1
Learning and Inference for Hierarchically Split PCFGs Slav Petrov and Dan Klein
4
The Game of Designing a Grammar
Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson ’98]
- Head lexicalization [Collins ’99, Charniak ’00]
- Automatic clustering?
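Parent annotation, the first refinement listed above, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes trees are nested tuples `(label, child, ...)` with plain strings as words, and the `^` separator and the name `parent_annotate` are hypothetical choices. For simplicity it annotates part-of-speech tags as well as phrasal nodes.

```python
def parent_annotate(tree, parent=None):
    """Return a copy of `tree` with every label suffixed by its parent's label.

    Assumed tree format (illustrative): (label, child1, child2, ...),
    where a child is either another tuple or a word given as a string.
    """
    if isinstance(tree, str):  # a word: leave it unchanged
        return tree
    label, *children = tree
    new_label = label if parent is None else f"{label}^{parent}"
    # Children are annotated with this node's ORIGINAL label.
    return (new_label, *(parent_annotate(c, label) for c in children))


tree = ("S",
        ("NP", ("PRP", "He")),
        ("VP", ("VBD", "was"), ("ADJP", ("JJ", "right"))))
annotated = parent_annotate(tree)
```

After annotation an NP under S (`NP^S`) and an NP under VP would be distinct symbols, so the grammar can give them different expansion probabilities.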
5
Learning Latent Annotations [Matsuzaki et al. ’05]
EM algorithm:
- Brackets are known
- Base categories are known
- Only induce subcategories
Just like Forward-Backward for HMMs.
[Figure: parse tree of "He was right." with latent nodes X1 to X7, showing the forward and backward passes]
6
Overview
Limit of computational resources
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
7
Refinement of the DT tag
DT is split into DT-1, DT-2, DT-3, DT-4
9
Hierarchical refinement of the DT tag DT
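One hierarchical split step can be sketched as below: every subcategory is split in two, each rule's probability is shared among the new rule variants, and a little random noise breaks the symmetry so that EM can pull the halves apart. The grammar representation (a dict keyed by `(parent, rhs)` with symbols as `(base, index)` pairs), the noise level, and the function names are assumptions made for illustration. The sketch also splits every symbol, which a real system may not do.

```python
import random
from itertools import product


def split_symbol(sym):
    """Split subcategory (base, i) into (base, 2i) and (base, 2i + 1)."""
    base, idx = sym
    return [(base, 2 * idx), (base, 2 * idx + 1)]


def split_grammar(rules, noise=0.01, seed=0):
    """Split every nonterminal in two and return the new rule probabilities.

    rules: {(parent, rhs): prob}; parent is (base, idx), rhs is a tuple whose
    items are (base, idx) nonterminals or plain-string terminals (assumed format).
    """
    rng = random.Random(seed)
    new_rules = {}
    for (parent, rhs), p in rules.items():
        # Nonterminals on the right-hand side get two variants; words stay as-is.
        options = [split_symbol(s) if isinstance(s, tuple) else [s] for s in rhs]
        combos = list(product(*options))
        for new_parent in split_symbol(parent):
            for combo in combos:
                jitter = 1.0 + rng.uniform(-noise, noise)  # symmetry breaking
                new_rules[(new_parent, combo)] = p / len(combos) * jitter
    # Renormalize so each new parent's rules sum to one again.
    totals = {}
    for (parent, _), p in new_rules.items():
        totals[parent] = totals.get(parent, 0.0) + p
    return {key: p / totals[key[0]] for key, p in new_rules.items()}


rules = {
    (("NP", 0), (("DT", 0), ("NN", 0))): 1.0,
    (("DT", 0), ("the",)): 0.6,
    (("DT", 0), ("a",)): 0.4,
    (("NN", 0), ("dog",)): 1.0,
}
split_rules = split_grammar(rules)
```

Applying `split_grammar` repeatedly, with EM retraining between applications, gives the 2, 4, 8, ... subcategory sequence shown for DT.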
10
Hierarchical Estimation Results
Model                  F1
Baseline               87.3
Hierarchical Training  88.4
11
Refinement of the "," tag
Splitting all categories the same amount is wasteful.
12
Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, roll back splits which were least useful
- Criterion: likelihood with split reversed vs. likelihood with split
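The roll-back decision can be sketched as a ranking problem: for each split, compare the data likelihood with that split in place against the likelihood with it reversed, then undo the splits whose reversal costs the least. The sketch assumes the log-likelihoods have already been computed elsewhere; the function name, the split identifiers, and the numbers are made up for illustration.

```python
def choose_merges(log_lik_split, log_lik_merged, fraction=0.5):
    """Pick the splits to roll back.

    log_lik_split:  log-likelihood of the data with all splits in place.
    log_lik_merged: {split_id: log-likelihood with that one split reversed}.
    fraction:       share of splits to undo (0.5 means half of them).
    """
    # Loss of reversing a split = how much log-likelihood is given up.
    loss = {s: log_lik_split - ll for s, ll in log_lik_merged.items()}
    ranked = sorted(loss, key=loss.get)  # least useful splits first
    return ranked[: int(len(ranked) * fraction)]


# Illustrative numbers only.
merged = {
    "DT-0/1": -1040.0,
    ",-0/1": -1000.1,
    "NP-0/1": -1100.0,
    ".-0/1": -1000.3,
}
to_merge = choose_merges(-1000.0, merged, fraction=0.5)
```

With these numbers the punctuation splits are rolled back while the DT and NP splits are kept, which matches the intuition that complex categories deserve more subcategories.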
14
Adaptive Splitting Results
Model             F1
Previous          88.4
With 50% Merging  89.5
15
Number of Phrasal Subcategories
16
Number of Phrasal Subcategories
[Figure: subcategory counts per phrasal category; NP, VP and PP highlighted]
17
Number of Phrasal Subcategories
[Figure: subcategory counts per phrasal category; NAC and X highlighted]
18
Number of Lexical Subcategories
[Figure: subcategory counts per part-of-speech tag; TO, "," and POS highlighted]
19
[Figure: subcategory counts per part-of-speech tag; NNP, JJ, NNS and NN highlighted]
20
Smoothing
- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics
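One simple way to pool statistics is to pull each subcategory's rule probability toward the average over all subcategories of the same base symbol. The sketch below assumes this linear-interpolation form. The slide does not spell out the formula, and the value of `alpha` is purely illustrative.

```python
def smooth(probs, alpha=0.1):
    """Shrink per-subcategory probabilities toward their shared mean.

    probs: one probability per subcategory of the same base symbol, all for
           the same rule (assumed layout, e.g. P(DT-k -> "the") for each k).
    alpha: interpolation weight; 0 leaves the estimates untouched,
           1 makes all subcategories identical.
    """
    mean = sum(probs) / len(probs)
    return [(1 - alpha) * p + alpha * mean for p in probs]


raw = [0.0, 0.2, 0.4]
smoothed = smooth(raw, alpha=0.5)
```

The interpolation keeps the total probability mass across subcategories unchanged, and a subcategory that never saw a rule in training no longer assigns it probability zero.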
21
Result Overview
Model           F1
Previous        89.5
With Smoothing  90.7
22
Linguistic Candy
Proper Nouns (NNP):
NNP-14  Oct.   Nov.       Sept.
NNP-12  John   Robert     James
NNP-2   J.     E.         L.
NNP-1   Bush   Noriega    Peters
NNP-15  New    San        Wall
NNP-3   York   Francisco  Street
Personal pronouns (PRP):
PRP-0   It     He         I
PRP-1   it     he         they
PRP-2   it     them       him
23
Relative adverbs (RBR):
RBR-0   further  lower    higher
RBR-1   more     less     More
RBR-2   earlier  Earlier  later
Cardinal Numbers (CD):
CD-7    one      two      Three
CD-4    1989     1990     1988
CD-11   million  billion  trillion
CD-0    1        50       100
CD-3    1        30       31
CD-9    78       58       34
24
Inference
Example sentence: "She heard the noise."
Exhaustive parsing: 1 min per sentence
25
Coarse-to-Fine Parsing [Goodman ’97, Charniak & Johnson ’05]
- Treebank yields a coarse grammar (NP … VP) and a refined grammar (NP-17, NP-12, NP-1, VP-6, VP-31, …)
- Parse with the coarse grammar
- Prune
- Parse with the refined grammar
26
Hierarchical Pruning
Consider again the span 5 to 12:
coarse:          … QP NP VP …
split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight:  … … …
Items with posterior < t are pruned at each level.
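The pruning cascade for a single span can be sketched as below. The sketch assumes that posteriors for each refinement level are already available as dictionaries (a real parser would compute them with inside-outside passes, and only for items that survived the previous level), and that subcategory `i` at one level splits into `2i` and `2i + 1` at the next. The function name and all numbers are hypothetical.

```python
def hierarchical_prune(posteriors_by_level, threshold):
    """Return the symbols still allowed for one span at the finest level.

    posteriors_by_level: list ordered coarse to fine; element k maps
        (base, idx) -> posterior probability of that symbol over the span.
    threshold: items with posterior below this value are pruned.
    """
    allowed = None  # None means "no pruning has happened yet"
    for level in posteriors_by_level:
        survivors = set()
        for (base, idx), post in level.items():
            # A refined symbol is considered only if its coarser parent survived.
            parent_ok = allowed is None or (base, idx // 2) in allowed
            if parent_ok and post >= threshold:
                survivors.add((base, idx))
        allowed = survivors
    return allowed


levels = [
    # coarse grammar: QP is very unlikely over this span
    {("QP", 0): 1e-9, ("NP", 0): 0.7, ("VP", 0): 0.3},
    # split in two: the QP entry is never reached because its parent was pruned
    {("QP", 0): 0.5, ("NP", 0): 0.69, ("NP", 1): 1e-7,
     ("VP", 0): 0.2, ("VP", 1): 0.1},
]
kept = hierarchical_prune(levels, threshold=1e-4)
```

Because each level only examines the children of surviving symbols, most of the refined chart is never built.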
27
Intermediate Grammars
Learning: X-Bar = G0, then G1, G2, G3, G4, G5, G6 = G
Example: DT, then DT1 DT2, then DT1 DT2 DT3 DT4, then DT1 DT2 DT3 DT4 DT5 DT6 DT7 DT8
28
Projected Grammars
Learning: X-Bar = G0, then G1, G2, G3, G4, G5, G6 = G
Projection πi: π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), G
29
Final Results (Efficiency)
Parsing the development set (1600 sentences):
- Berkeley Parser: 10 min (implemented in Java)
- Charniak & Johnson ’05 Parser: 19 min (implemented in C)
30
Final Results (Accuracy)
Lang  Parser                               F1 (≤ 40 words)  F1 (all)
ENG   Charniak & Johnson ’05 (generative)  90.1             89.6
ENG   This Work                            90.6             90.1
GER   Dubey ’05                            76.3             -
GER   This Work                            80.8             80.1
CHN   Chiang et al. ’02                    80.0             76.6
CHN   This Work                            86.3             83.4
31
Extensions
- Acoustic modeling [Petrov, Pauls & Klein ’07]
- Infinite Grammars, Nonparametric Bayesian Learning [Liang, Petrov, Jordan & Klein ’07]
32
Conclusions
- Split & Merge Learning: Hierarchical Training, Adaptive Splitting, Parameter Smoothing
- Hierarchical Coarse-to-Fine Inference: Projections, Marginalization
- Multi-lingual Unlexicalized Parsing
33
Thank You! http://nlp.cs.berkeley.edu