Learning and Inference for Hierarchically Split PCFGs. Slav Petrov and Dan Klein.

1 Learning and Inference for Hierarchically Split PCFGs Slav Petrov and Dan Klein

2 The Game of Designing a Grammar  Annotation refines base treebank symbols to improve statistical fit of the grammar  Parent annotation [Johnson ’98]

3 The Game of Designing a Grammar  Annotation refines base treebank symbols to improve statistical fit of the grammar  Parent annotation [Johnson ’98]  Head lexicalization [Collins ’99, Charniak ’00]

4 The Game of Designing a Grammar  Annotation refines base treebank symbols to improve statistical fit of the grammar  Parent annotation [Johnson ’98]  Head lexicalization [Collins ’99, Charniak ’00]  Automatic clustering?
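One of these annotation strategies, parent annotation, is easy to show concretely. Below is a minimal sketch, assuming trees are represented as (label, children) tuples with a plain string as the child of a preterminal; the representation and the `parent_annotate` helper are illustrative, not taken from any parser's actual code.

```python
# A minimal sketch of parent annotation [Johnson '98]: every phrasal node is
# relabeled with its parent's (unannotated) category, e.g. an NP under S
# becomes NP^S. The tuple tree representation is an assumption for illustration.
def parent_annotate(tree, parent="ROOT"):
    label, children = tree
    if isinstance(children, str):        # preterminal over a word: leave as-is
        return (label, children)
    annotated = label + "^" + parent     # e.g. NP under S -> "NP^S"
    return (annotated, [parent_annotate(child, label) for child in children])

# Example: (S (NP (PRP He)) (VP (VBD was) (ADJP (JJ right))))
tree = ("S", [("NP", [("PRP", "He")]),
              ("VP", [("VBD", "was"), ("ADJP", [("JJ", "right")])])])
print(parent_annotate(tree))
# -> ('S^ROOT', [('NP^S', [('PRP', 'He')]),
#                ('VP^S', [('VBD', 'was'), ('ADJP^VP', [('JJ', 'right')])])])
```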

5 Learning Latent Annotations [Matsuzaki et al. '05]  EM algorithm (figure: a parse tree for "He was right." with latent subcategory variables X1 ... X7 at its nodes, with forward and backward passes over the tree)  Brackets are known  Base categories are known  Only induce subcategories  Just like Forward-Backward for HMMs.
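Because the brackets and base categories are fixed, the E-step reduces to forward-backward-style message passing over each training tree. Below is a minimal sketch under assumed data structures (a binarized tuple tree plus hypothetical `binary` and `lexicon` probability tables); it illustrates the idea rather than the Berkeley parser's actual implementation.

```python
# Minimal sketch of the E-step for inducing latent subcategories on one tree
# with known brackets and base categories. Assumed (hypothetical) structures:
#   binary[(A, B, C)]  -> k x k x k array with P(A_x -> B_y C_z)
#   lexicon[(A, word)] -> length-k array with P(A_x -> word)
# The upward pass plays the role of "backward", the downward pass of "forward",
# analogous to Forward-Backward for HMMs but on a fixed tree.
import numpy as np

class Node:
    def __init__(self, label, children=(), word=None):
        self.label, self.children, self.word = label, list(children), word

def inside(node, lexicon, binary):
    """Upward pass: P(words under node | subcategory x of node.label)."""
    if node.word is not None:                                 # preterminal
        node.inside = np.asarray(lexicon[(node.label, node.word)])
        return node.inside
    left, right = node.children                               # binarized tree
    il, ir = inside(left, lexicon, binary), inside(right, lexicon, binary)
    rule = binary[(node.label, left.label, right.label)]      # k x k x k
    node.inside = np.einsum('xyz,y,z->x', rule, il, ir)
    return node.inside

def outside(node, score, binary):
    """Downward pass: P(rest of the tree, subcategory x at node)."""
    node.outside = score
    if node.word is not None:
        return
    left, right = node.children
    rule = binary[(node.label, left.label, right.label)]
    outside(left, np.einsum('xyz,x,z->y', rule, score, right.inside), binary)
    outside(right, np.einsum('xyz,x,y->z', rule, score, left.inside), binary)

def posterior_rule_counts(node, z, counts, binary):
    """Accumulate expected counts of every split rule used in this tree;
    the M-step then renormalizes these counts into new probabilities."""
    if node.word is not None:
        key = (node.label, node.word)
        counts[key] = counts.get(key, 0) + node.inside * node.outside / z
        return
    left, right = node.children
    rule = binary[(node.label, left.label, right.label)]
    post = np.einsum('x,xyz,y,z->xyz', node.outside, rule,
                     left.inside, right.inside) / z
    key = (node.label, left.label, right.label)
    counts[key] = counts.get(key, 0) + post
    posterior_rule_counts(left, z, counts, binary)
    posterior_rule_counts(right, z, counts, binary)

# For one tree: run inside(root, ...), seed outside(root, root_prior, binary)
# with a prior over the root's subcategories, set z = root.inside @ root_prior,
# and accumulate counts; summing over all trees gives the E-step statistics.
```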

6 Overview  Scaling up grammar refinement toward the limit of computational resources: - Hierarchical Training - Adaptive Splitting - Parameter Smoothing

7 Refinement of the DT tag  (figure: DT split directly into DT-1, DT-2, DT-3, DT-4)

8 Refinement of the DT tag  (figure)

9 Hierarchical refinement of the DT tag  (figure: DT is split in two, and each resulting subcategory is split in two again, giving a binary hierarchy of DT subcategories)
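A minimal sketch of one such split step is below, assuming rule probabilities live in a plain dictionary keyed by subcategory names; the function name and the noise scheme are illustrative, not the parser's actual code.

```python
# Minimal sketch of one hierarchical split step over binary rules, assuming
# `rules` maps (A, B, C) -> P(A -> B C) and holds all rules of each parent,
# with base names like "DT". Every subcategory X is split into X+"0" and
# X+"1"; the parent's mass is shared over the 2x2 child combinations and
# jittered slightly so EM can differentiate the otherwise identical copies.
import random
from collections import defaultdict

def split_in_two(rules, noise=0.01, rng=random.Random(0)):
    def halves(symbol):
        return (symbol + "0", symbol + "1")

    split = {}
    for (a, b, c), p in rules.items():
        for a2 in halves(a):
            for b2 in halves(b):
                for c2 in halves(c):
                    jitter = 1.0 + rng.uniform(-noise, noise)
                    split[(a2, b2, c2)] = (p / 4.0) * jitter

    # renormalize so each new parent subcategory is again a proper distribution
    totals = defaultdict(float)
    for (a2, _, _), p in split.items():
        totals[a2] += p
    return {rule: p / totals[rule[0]] for rule, p in split.items()}

# Repeating split_in_two followed by EM re-estimation yields the hierarchy
# DT -> {DT0, DT1} -> {DT00, DT01, DT10, DT11} -> ...
```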

10 Hierarchical Estimation Results
Model                   F1
Baseline                87.3
Hierarchical Training   88.4

11 Refinement of the "," tag  Splitting all categories the same amount is wasteful.

12 Adaptive Splitting  Want to split complex categories more  Idea: split everything, roll back the splits which were least useful  Criterion: compare the likelihood with the split reversed to the likelihood with the split (a sketch of this comparison follows below)
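Here is a minimal sketch of that comparison, under assumed inputs: for every node where the two sibling subcategories occur, their inside and outside scores and the probability of the whole sentence. The function and tuple layout are illustrative simplifications of the paper's approximation, not the Berkeley parser's code.

```python
# Minimal sketch of the merge criterion: approximate log-likelihood lost by
# reversing one split. `occurrences` is a hypothetical list of
# (in1, out1, in2, out2, sent_prob) tuples, one per node where the split symbol
# occurs in the training trees.
import math

def merge_loss(occurrences):
    # relative weights of the two siblings, from their total posterior mass
    m1 = sum(i1 * o1 / s for i1, o1, i2, o2, s in occurrences)
    m2 = sum(i2 * o2 / s for i1, o1, i2, o2, s in occurrences)
    w1, w2 = m1 / (m1 + m2), m2 / (m1 + m2)

    loss = 0.0
    for in1, out1, in2, out2, sent_prob in occurrences:
        merged_in = w1 * in1 + w2 * in2     # inside score of the merged symbol
        merged_out = out1 + out2            # outside mass is pooled
        new_sent_prob = (sent_prob - (in1 * out1 + in2 * out2)
                         + merged_in * merged_out)
        loss += math.log(sent_prob) - math.log(new_sent_prob)
    return loss

# Rank all splits by merge_loss and undo the ~50% with the smallest loss.
```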

14 Adaptive Splitting Results
Model              F1
Previous           88.4
With 50% Merging   89.5

15 Number of Phrasal Subcategories

16 Number of Phrasal Subcategories  (chart; most-split categories highlighted: NP, VP, PP)

17 Number of Phrasal Subcategories  (chart; least-split categories highlighted: X, NAC)

18 Number of Lexical Subcategories  (chart; highlighted tags: TO, ",", POS)

19 Number of Lexical Subcategories  (chart; highlighted tags: NN, NNS, NNP, JJ)

20 Smoothing  Heavy splitting can lead to overfitting  Idea: Smoothing allows us to pool statistics
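A minimal sketch of such a smoothing step is below, assuming each base category's rule probabilities are stored as a subcategories-by-rules array; `alpha` and the dictionary layout are illustrative assumptions, not the parser's actual interface.

```python
# Minimal sketch of smoothing across sibling subcategories. `rules[A]` is a
# hypothetical k x m array: k subcategories of base category A, m possible
# right-hand sides. Each subcategory's distribution is shrunk toward the
# average over A's subcategories, so rare subcategories pool statistics with
# their siblings instead of overfitting.
import numpy as np

def smooth(rules, alpha=0.01):
    smoothed = {}
    for base, table in rules.items():
        mean = table.mean(axis=0, keepdims=True)   # average over subcategories
        smoothed[base] = (1.0 - alpha) * table + alpha * mean
    return smoothed
```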

21 Result Overview
Model            F1
Previous         89.5
With Smoothing   90.7

22 Linguistic Candy
Proper Nouns (NNP):
  NNP-14: Oct.   Nov.   Sept.
  NNP-12: John   Robert   James
  NNP-2:  J.     E.     L.
  NNP-1:  Bush   Noriega   Peters
  NNP-15: New    San    Wall
  NNP-3:  York   Francisco   Street
Personal pronouns (PRP):
  PRP-0: It   He   I
  PRP-1: it   he   they
  PRP-2: it   them   him

23 Linguistic Candy
Relative adverbs (RBR):
  RBR-0: further   lower   higher
  RBR-1: more   less   More
  RBR-2: earlier   Earlier   later
Cardinal Numbers (CD):
  CD-7:  one   two   Three
  CD-4:  1989   1990   1988
  CD-11: million   billion   trillion
  CD-0:  1   50   100
  CD-3:  1   30   31
  CD-9:  78   58   34

24 Inference  Example sentence: "She heard the noise."  Exhaustive parsing with the fully refined grammar: 1 min per sentence

25 Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]  First parse with a coarse treebank grammar (NP, VP, ...), prune unlikely constituents, then parse with the refined grammar (NP-17, NP-12, NP-1, ..., VP-6, VP-31, ...)

26 Hierarchical Pruning  Consider again the span 5 to 12:
  coarse:          ... QP NP VP ...
  split in two:    ... QP1 QP2 NP1 NP2 VP1 VP2 ...
  split in four:   ... QP1 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 ...
  split in eight:  ...
Symbols whose posterior probability on the span falls below a threshold t are pruned before moving to the next, finer level.
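A minimal sketch of one such pruning pass is below, assuming a `chart` of span posteriors from the coarser grammar and a `projection` table listing each coarse symbol's refinements at the next level; the names and threshold are illustrative, not the Berkeley parser's API.

```python
# Minimal sketch of one coarse-to-fine pruning pass. `chart[(i, j)][A]` is the
# (assumed given) posterior probability that some constituent with coarse
# symbol A spans words i..j under the coarser grammar; `projection[A]` lists
# the refined subcategories of A at the next level.

def allowed_symbols(chart, projection, threshold=1e-4):
    """For every span, keep only refined symbols whose coarse parent survived."""
    allowed = {}
    for span, posteriors in chart.items():
        keep = []
        for coarse_symbol, posterior in posteriors.items():
            if posterior >= threshold:       # prune when posterior < t
                keep.extend(projection[coarse_symbol])
            # else: none of this symbol's refinements are built on this span
        allowed[span] = set(keep)
    return allowed

# The finer parse then only builds items (i, j, X) with X in allowed[(i, j)],
# and the procedure repeats level by level down to the fully split grammar.
```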

27 Intermediate Grammars  Learning produces a sequence of grammars: X-Bar = G0, G1, G2, G3, G4, G5, G6 = G (e.g. DT; DT1 DT2; DT1 ... DT4; DT1 ... DT8). The intermediate grammars G1 ... G5 are one candidate for the coarse grammars.

28 Projected Grammars  Instead of the intermediate grammars from learning (X-Bar = G0, G1, ..., G6 = G), use projections πi(G) of the final grammar G: π0(G), π1(G), ..., π5(G), G.
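A minimal sketch of projecting the refined grammar down to a coarser symbol set is below. It assumes refined rule probabilities keyed by (base category, subcategory) pairs and a hypothetical `weight` table saying how much each subcategory is used; the marginalization shown is the basic idea, while the actual parser derives the weights from the grammar's expected rule counts.

```python
# Minimal sketch of grammar projection. `rules[(A, x, B, y, C, z)]` is
# P(A_x -> B_y C_z) in the refined grammar; `weight[(A, x)]` is an (assumed
# given) estimate of how often subcategory x of A is used. The projected
# coarse rule probability is the weighted average over collapsed subcategories.
from collections import defaultdict

def project(rules, weight):
    num = defaultdict(float)   # projected mass for each coarse rule A -> B C
    den = defaultdict(float)   # total projected mass leaving coarse parent A
    for (A, x, B, y, C, z), p in rules.items():
        num[(A, B, C)] += weight[(A, x)] * p
        den[A] += weight[(A, x)] * p
    return {rule: mass / den[rule[0]] for rule, mass in num.items()}
```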

29 Final Results (Efficiency)  Parsing the development set (1600 sentences):  Berkeley Parser: 10 min (implemented in Java)  Charniak & Johnson '05 parser: 19 min (implemented in C)

30 Final Results (Accuracy)
                                          ≤ 40 words F1   all F1
ENG   Charniak & Johnson '05 (generative)      90.1         89.6
      This Work                                90.6         90.1
GER   Dubey '05                                76.3         -
      This Work                                80.8         80.1
CHN   Chiang et al. '02                        80.0         76.6
      This Work                                86.3         83.4

31 Extensions  Acoustic modeling [Petrov, Pauls & Klein '07]  Infinite Grammars / Nonparametric Bayesian Learning [Liang, Petrov, Jordan & Klein '07]

32 Conclusions  Split & Merge Learning  Hierarchical Training  Adaptive Splitting  Parameter Smoothing  Hierarchical Coarse-to-Fine Inference  Projections  Marginalization  Multi-lingual Unlexicalized Parsing

33 Thank You! http://nlp.cs.berkeley.edu

