1
Learning Accurate, Compact, and Interpretable Tree Annotation
Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein
4
The Game of Designing a Grammar
Annotation refines base treebank symbols to improve statistical fit of the grammar:
- Parent annotation [Johnson ’98]
- Head lexicalization [Collins ’99, Charniak ’00]
- Automatic clustering?
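As a concrete illustration of parent annotation, here is a minimal sketch that appends each node's parent label to its own label. The nested-tuple tree encoding, the `^` separator, the function name `parent_annotate`, and the choice to annotate part-of-speech tags as well are all assumptions made for this example, not details taken from the slides.

```python
def parent_annotate(tree, parent=None):
    """Return a copy of `tree` whose labels carry the parent's label.

    A tree is a tuple (label, child, child, ...); a word is a plain string.
    """
    if isinstance(tree, str):
        # A word at the leaf: leave it unchanged.
        return tree
    label, *children = tree
    # The root has no parent, so it keeps its base label.
    new_label = label if parent is None else f"{label}^{parent}"
    # Children are annotated with the *base* label of this node.
    return (new_label, *(parent_annotate(child, label) for child in children))


tree = ("S",
        ("NP", ("PRP", "He")),
        ("VP", ("VBD", "was"), ("ADJP", ("JJ", "right"))))
annotated = parent_annotate(tree)
```

After annotation, an NP directly under S is labeled `NP^S`, so the grammar can give it different rule probabilities from an NP under VP.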
5
Previous Work: Manual Annotation [Klein & Manning ’03]
Manually split categories:
- NP: subject vs object
- DT: determiners vs demonstratives
- IN: sentential vs prepositional
Advantages:
- Fairly compact grammar
- Linguistic motivations
Disadvantages:
- Performance leveled out
- Manually annotated

Model                    F1
Naïve Treebank Grammar   72.6
Klein & Manning ’03      86.3
6
Previous Work: Automatic Annotation Induction [Matsuzaki et al. ’05, Prescher ’05]
Advantages:
- Automatically learned: label all nodes with latent variables, with the same number k of subcategories for all categories.
Disadvantages:
- Grammar gets too large
- Most categories are oversplit while others are undersplit.

Model                    F1
Klein & Manning ’03      86.3
Matsuzaki et al. ’05     86.7
7
Previous work is complementary
Manual Annotation:
- Allocates splits where needed
- Very tedious
- Compact Grammar
- Misses Features
Automatic Annotation:
- Splits uniformly
- Automatically learned
- Large Grammar
- Captures many features
This Work:
- Allocates splits where needed
- Automatically learned
- Compact Grammar
- Captures many features
8
Learning Latent Annotations
EM algorithm:
- Brackets are known
- Base categories are known
- Only induce subcategories
Just like Forward-Backward for HMMs.
[Figure: parse tree over the sentence "He was right." with latent variables X1 to X7 at its nodes; labels "Forward" and "Backward"]
9
Overview
Limit of computational resources
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
10
Refinement of the DT tag
[Figure: DT refined into DT-1, DT-2, DT-3, DT-4]
11
Refinement of the DT tag DT
12
Hierarchical refinement of the DT tag
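The hierarchical idea can be sketched as repeatedly splitting every subcategory in two, so that DT becomes 2, then 4 subcategories. The sketch below is an assumption-laden illustration: the function names, the toy emission table for DT, and the use of a small random perturbation to make the two halves differ are all choices made for this example, and the EM re-training that would run between splits is only marked by a comment.

```python
import random


def split_in_two(emission, noise=0.01, rng=None):
    """Split one subcategory's word distribution into two slightly different copies."""
    if rng is None:
        rng = random.Random(0)
    halves = []
    for _ in range(2):
        # Perturb each probability by up to +/- `noise` (relative), then renormalize.
        perturbed = {w: p * (1.0 + noise * rng.uniform(-1.0, 1.0))
                     for w, p in emission.items()}
        total = sum(perturbed.values())
        halves.append({w: p / total for w, p in perturbed.items()})
    return halves


def hierarchical_split(emission, rounds, rng=None):
    """Apply `rounds` successive binary splits, giving 2**rounds subcategories."""
    if rng is None:
        rng = random.Random(0)
    subcats = [emission]
    for _ in range(rounds):
        subcats = [half for dist in subcats
                   for half in split_in_two(dist, rng=rng)]
        # In actual training, EM would be re-run here before the next split.
    return subcats


# Hypothetical emission probabilities for the base DT tag.
dt = {"the": 0.6, "a": 0.3, "this": 0.1}
subcats = hierarchical_split(dt, rounds=2)
```

Each round starts from the previous round's parameters rather than from scratch, which is what distinguishes this from inducing all subcategories in a single step.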
13
Hierarchical Estimation Results

Model                   F1
Baseline                87.3
Hierarchical Training   88.4
14
Refinement of the "," tag
Splitting all categories the same amount is wasteful:
15
The DT tag revisited Oversplit?
16
Adaptive Splitting
Want to split complex categories more
Idea: split everything, roll back splits which were least useful
19
Adaptive Splitting
Evaluate loss in likelihood from removing each split:
$\text{loss} = \dfrac{\text{data likelihood with split reversed}}{\text{data likelihood with split}}$
No loss in accuracy when 50% of the splits are reversed.
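The roll-back step can be sketched as ranking splits by how much likelihood is lost when each one is reversed, then reversing the least useful half. This is a minimal sketch working in log space, where the likelihood ratio becomes a difference; the function name `select_merges`, the split identifiers, and all numbers are hypothetical and chosen only for illustration.

```python
def select_merges(log_lik_with_split, log_lik_reversed, fraction=0.5):
    """Return the splits to roll back: those whose reversal costs the least.

    log_lik_with_split: log likelihood of the data under the fully split grammar.
    log_lik_reversed:   maps each split to the log likelihood with that split reversed.
    """
    # Loss in log likelihood = -log(ratio of reversed to split likelihood).
    losses = {split: log_lik_with_split - ll
              for split, ll in log_lik_reversed.items()}
    ranked = sorted(losses, key=losses.get)  # smallest loss first
    n_merge = int(len(ranked) * fraction)
    return ranked[:n_merge]


# Made-up numbers for four candidate splits.
reversed_ll = {
    "DT-0/DT-1": -1000.5,
    "NP-0/NP-1": -1040.0,
    ",-0/,-1": -1000.0,
    "VP-0/VP-1": -1025.0,
}
to_merge = select_merges(-1000.0, reversed_ll, fraction=0.5)
```

With these toy values the comma and DT splits are rolled back, while the NP and VP splits, whose reversal costs far more likelihood, are kept.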
20
Adaptive Splitting Results

Model              F1
Previous           88.4
With 50% Merging   89.5
21
Number of Phrasal Subcategories
22
[Figure labels: PP, VP, NP]
23
Number of Phrasal Subcategories
[Figure labels: X, NAC]
24
Number of Lexical Subcategories
[Figure labels: TO, ",", POS]
25
Number of Lexical Subcategories
[Figure labels: IN, DT, RB, VBx]
26
Number of Lexical Subcategories
[Figure labels: NN, NNS, NNP, JJ]
27
Smoothing
Heavy splitting can lead to overfitting
Idea: Smoothing allows us to pool statistics
28
Linear Smoothing
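One way to pool statistics linearly is to pull each subcategory's probability for a given rule toward the average over all subcategories of the same base category. The sketch below assumes that form; the slide text does not give the formula, so the interpolation, the function name `linear_smooth`, and the weight `alpha` are assumptions made for illustration.

```python
def linear_smooth(probs, alpha=0.01):
    """Interpolate each subcategory's probability with the mean over subcategories.

    probs: probabilities of the same rule under each subcategory of one base category.
    alpha: hypothetical smoothing weight in [0, 1]; 0 means no smoothing.
    """
    mean = sum(probs) / len(probs)
    return [(1.0 - alpha) * p + alpha * mean for p in probs]


# Three subcategories; the first has never been seen with this rule.
smoothed = linear_smooth([0.0, 0.2, 0.4], alpha=0.5)
```

Because every value moves toward the same mean, the total probability mass across subcategories is unchanged, and a subcategory with a zero estimate receives a small nonzero share.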
29
Result Overview
31
Model            F1
Previous         89.5
With Smoothing   90.7
33
Final Results

Parser                    F1 ≤ 40 words   F1 all words
Klein & Manning ’03       86.3            85.7
Matsuzaki et al. ’05      86.7            86.1
Collins ’99               88.6            88.2
Charniak & Johnson ’05    90.1            89.6
This Work                 90.2            89.7
34
Linguistic Candy
Proper Nouns (NNP):
NNP-14   Oct.    Nov.        Sept.
NNP-12   John    Robert      James
NNP-2    J.      E.          L.
NNP-1    Bush    Noriega     Peters
NNP-15   New     San         Wall
NNP-3    York    Francisco   Street
Personal pronouns (PRP):
PRP-0    It      He          I
PRP-1    it      he          they
PRP-2    it      them        him
35
Linguistic Candy
Relative adverbs (RBR):
RBR-0    further   lower     higher
RBR-1    more      less      More
RBR-2    earlier   Earlier   later
Cardinal Numbers (CD):
CD-7     one       two       Three
CD-4     1989      1990      1988
CD-11    million   billion   trillion
CD-0     1         50        100
CD-3     1         30        31
CD-9     78        58        34
36
Conclusions
New Ideas:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
State of the Art Parsing Performance:
- Improves from X-Bar initializer 63.4 to 90.2
Linguistically interesting grammars to sift through.
37
Thank You! petrov@eecs.berkeley.edu
38
Other things we tried
- X-Bar vs structurally annotated grammar: X-Bar grammar starts at lower performance, but provides more flexibility
- Better Smoothing: Tried different (hierarchical) smoothing methods, all worked about the same
- (Linguistically) constraining rewrite possibilities between subcategories: Hurts performance
- EM automatically learns that most subcategory combinations are meaningless: ≥ 90% of the possible rewrites have 0 probability