1 Learning Accurate, Compact, and Interpretable Tree Annotation
Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein

2-4 The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
- Parent annotation [Johnson ’98] (see the code sketch after this list)
- Head lexicalization [Collins ’99, Charniak ’00]
- Automatic clustering?
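
For concreteness, here is a minimal sketch of the parent-annotation idea, assuming the nltk library for tree handling; the helper name annotate_parents and the example sentence are ours, not from the talk, and real systems typically restrict the annotation to phrasal categories.

```python
# Parent annotation [Johnson '98]: relabel each nonterminal with its parent's
# category (e.g. NP -> NP^S) so the grammar read off the treebank conditions
# on more context. Uses nltk.Tree; `annotate_parents` is our own helper.
from nltk import Tree

def annotate_parents(tree, parent_label=None):
    """Return a copy of `tree` with every nonterminal label suffixed by its
    parent's label; the leaves (words) are left untouched."""
    if isinstance(tree, str):                    # a leaf / terminal word
        return tree
    label = tree.label() if parent_label is None else f"{tree.label()}^{parent_label}"
    return Tree(label, [annotate_parents(child, tree.label()) for child in tree])

if __name__ == "__main__":
    t = Tree.fromstring("(S (NP (PRP He)) (VP (VBD was) (ADJP (JJ right))) (. .))")
    print(annotate_parents(t))
    # (S (NP^S (PRP^NP He)) (VP^S (VBD^VP was) (ADJP^VP (JJ^ADJP right))) (.^S .))
```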

5 Previous Work: Manual Annotation
- Manually split categories:
  - NP: subject vs. object
  - DT: determiners vs. demonstratives
  - IN: sentential vs. prepositional
- Advantages: fairly compact grammar; linguistic motivations
- Disadvantages: performance leveled out; manually annotated [Klein & Manning ’03]

  Model                    F1
  Naïve Treebank Grammar   72.6
  Klein & Manning ’03      86.3

6 Previous Work: Automatic Annotation Induction [Matsuzaki et al. ’05, Prescher ’05]
- Label all nodes with latent variables; same number k of subcategories for all categories
- Advantages: automatically learned
- Disadvantages: grammar gets too large; most categories are oversplit while others are undersplit

  Model                  F1
  Klein & Manning ’03    86.3
  Matsuzaki et al. ’05   86.7

7 Previous work is complementary
- Manual annotation: allocates splits where needed; very tedious; compact grammar; misses features
- Automatic annotation: splits uniformly; automatically learned; large grammar; captures many features
- This work: allocates splits where needed; automatically learned; compact grammar; captures many features

8 Learning Latent Annotations
EM algorithm, just like Forward-Backward for HMMs (a toy sketch follows below):
- Brackets are known
- Base categories are known
- Only induce subcategories
[Figure: parse tree with latent subcategory variables X1-X7 over the sentence "He was right.", with forward and backward passes indicated]
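
The E-step this slide alludes to can be written compactly: with the tree fixed, an upward (inside) and downward (outside) pass over its nodes yields posteriors over the latent subcategories, in direct analogy to Forward-Backward on a chain. Everything below (the tree encoding, the rules and emissions dictionaries, the implicit root weight of 1 per subcategory) is our own toy setup, not the authors' code.

```python
import numpy as np

def inside(node, rules, emissions, memo):
    """Upward pass: memo[id(node)][x] = P(words under node | subcategory x)."""
    label = node[0]
    if isinstance(node[1], str):                      # preterminal: (tag, word)
        score = emissions[(label, node[1])]
    else:                                             # binary node: (label, left, right)
        left, right = node[1], node[2]
        i_l = inside(left, rules, emissions, memo)
        i_r = inside(right, rules, emissions, memo)
        rule = rules[(label, left[0], right[0])]      # shape (k, k, k): P(A_x -> B_y C_z)
        score = np.einsum('xyz,y,z->x', rule, i_l, i_r)
    memo[id(node)] = score
    return score

def outside(node, o, rules, memo, posteriors, z):
    """Downward pass: collects P(subcategory | sentence) for every node."""
    posteriors.append((node[0], o * memo[id(node)] / z))
    if isinstance(node[1], str):
        return
    left, right = node[1], node[2]
    rule = rules[(node[0], left[0], right[0])]
    i_l, i_r = memo[id(left)], memo[id(right)]
    outside(left,  np.einsum('xyz,x,z->y', rule, o, i_r), rules, memo, posteriors, z)
    outside(right, np.einsum('xyz,x,y->z', rule, o, i_l), rules, memo, posteriors, z)

# Toy run: k = 2 subcategories, a two-word tree, made-up probabilities.
k, rng = 2, np.random.default_rng(0)
tree = ("S", ("NP", "He"), ("VP", "was"))
rules = {("S", "NP", "VP"): rng.dirichlet(np.ones(k * k), size=k).reshape(k, k, k)}
emissions = {("NP", "He"): np.array([0.6, 0.1]), ("VP", "was"): np.array([0.2, 0.3])}

memo, posteriors = {}, []
likelihood = inside(tree, rules, emissions, memo).sum()   # root weight of 1 per subcategory
outside(tree, np.ones(k), rules, memo, posteriors, likelihood)
for label, p in posteriors:
    print(label, p)                                       # each posterior vector sums to 1
```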

9 Overview
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
[Chart annotation: limit of computational resources]

10 Refinement of the DT tag
[Figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4]

11 Refinement of the DT tag
[Figure]

12 Hierarchical refinement of the DT tag
[Figure]
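
A schematic sketch of the hierarchical refinement loop illustrated on slides 10-13, in our own simplified representation: each round splits every subcategory of a symbol in two (copying its parameters and adding a little noise to break symmetry), after which EM would be re-run on the refined grammar. The EM step itself is the inside-outside pass sketched above and is only stubbed here via the `run_em` parameter.

```python
import numpy as np

def split_in_two(rule_probs, noise=0.01, rng=None):
    """rule_probs: array of shape (k, m), one row of normalized rule
    probabilities per subcategory. Returns a (2k, m) array in which each
    subcategory becomes two slightly perturbed copies of itself."""
    if rng is None:
        rng = np.random.default_rng(0)
    doubled = np.repeat(rule_probs, 2, axis=0)                 # children copy the parent
    doubled = doubled * (1.0 + noise * rng.standard_normal(doubled.shape))
    return doubled / doubled.sum(axis=1, keepdims=True)        # renormalize each row

def hierarchical_training(grammar, corpus, run_em, rounds=6):
    """grammar: dict symbol -> rule-probability array as above. `run_em`
    is a stand-in for the inside-outside E-step plus count-normalizing M-step."""
    for _ in range(rounds):                  # 1 -> 2 -> 4 -> ... subcategories per symbol
        grammar = {sym: split_in_two(p) for sym, p in grammar.items()}
        grammar = run_em(grammar, corpus)
    return grammar

# Hierarchy for a single tag: DT -> (DT-1, DT-2) -> (DT-1-1, ..., DT-2-2)
dt = np.array([[0.5, 0.3, 0.2]])             # one DT subcategory over three outcomes
dt = split_in_two(split_in_two(dt))
print(dt.shape)                               # (4, 3)
```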

13 Hierarchical Estimation Results

  Model                  F1
  Baseline               87.3
  Hierarchical Training  88.4

14 Refinement of the ',' tag
- Splitting all categories the same amount is wasteful:

15 The DT tag revisited
- Oversplit?

16-18 Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, roll back splits which were least useful

19 Adaptive Splitting
- Evaluate the loss in likelihood from removing each split (see the sketch below):
  loss = (data likelihood with split reversed) / (data likelihood with split)
- No loss in accuracy when 50% of the splits are reversed.
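
A rough sketch of how such a loss could be computed and used, in our own simplified form: for one occurrence of a split category, the likelihood the sentence would have had with the two sibling subcategories merged can be approximated from their inside/outside scores, and the splits whose total loss is smallest are the ones rolled back. The function names and the exact approximation below are our assumptions, not a transcription of the authors' code.

```python
def merge_loss_at_node(i1, o1, i2, o2, w1, w2, sentence_likelihood):
    """Approximate (likelihood with split reversed) / (likelihood with split)
    for one occurrence of a split category.
    i*, o*: inside/outside scores of the two sibling subcategories at this node;
    w*: their relative frequencies (w1 + w2 = 1)."""
    merged_inside = w1 * i1 + w2 * i2
    merged_outside = o1 + o2
    merged_likelihood = (sentence_likelihood
                         - (i1 * o1 + i2 * o2)       # remove the split pair's contribution
                         + merged_inside * merged_outside)
    return merged_likelihood / sentence_likelihood

def splits_to_reverse(log_loss, fraction=0.5):
    """log_loss: dict mapping each split to the sum of log(merge_loss_at_node)
    over all its occurrences (values near 0 mean the split barely helps).
    Roll back the `fraction` of splits that cost the least likelihood."""
    ranked = sorted(log_loss, key=log_loss.get, reverse=True)   # least useful first
    return ranked[: int(len(ranked) * fraction)]

# Two subcategories that behave almost identically give a ratio near 1,
# so their split is a prime candidate for merging.
print(merge_loss_at_node(i1=0.30, o1=0.02, i2=0.28, o2=0.02, w1=0.5, w2=0.5,
                         sentence_likelihood=0.05))
```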

20 Adaptive Splitting Results

  Model              F1
  Previous           88.4
  With 50% Merging   89.5

21 Number of Phrasal Subcategories

22 Number of Phrasal Subcategories
[Chart: bars for NP, VP, and PP highlighted]

23 Number of Phrasal Subcategories
[Chart: bars for X and NAC highlighted]

24 Number of Lexical Subcategories
[Chart: bars for TO, ',', and POS highlighted]

25 Number of Lexical Subcategories
[Chart: bars for IN, DT, RB, and VBx highlighted]

26 Number of Lexical Subcategories
[Chart: bars for NN, NNS, NNP, and JJ highlighted]

27 Smoothing
- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics

28 Linear Smoothing
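
A minimal sketch of the linear smoothing idea, under our own representation: each subcategory's rule distribution is shrunk toward the mean of all subcategories of the same base category, so rarely observed subcategories pool statistics with their siblings. The interpolation weight alpha below is illustrative, not the value used in the paper.

```python
import numpy as np

def linear_smooth(rule_probs, alpha=0.01):
    """rule_probs: array of shape (k, m), one row of rule probabilities per
    subcategory of one base category. Every row is interpolated with the
    mean row, then renormalized."""
    mean = rule_probs.mean(axis=0, keepdims=True)
    smoothed = (1.0 - alpha) * rule_probs + alpha * mean
    return smoothed / smoothed.sum(axis=1, keepdims=True)

dt = np.array([[0.9, 0.1, 0.0],      # a subcategory that has driven one rule to zero
               [0.2, 0.5, 0.3]])
print(linear_smooth(dt))             # the zero entry becomes small but nonzero
```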

29 Result Overview

30

31
  Model            F1
  Previous         89.5
  With Smoothing   90.7

32-33 Final Results

  Parser                   F1 (≤ 40 words)   F1 (all words)
  Klein & Manning ’03      86.3              85.7
  Matsuzaki et al. ’05     86.7              86.1
  Collins ’99              88.6              88.2
  Charniak & Johnson ’05   90.1              89.6
  This Work                90.2              89.7

34 Linguistic Candy
- Proper nouns (NNP):
  NNP-14: Oct., Nov., Sept.
  NNP-12: John, Robert, James
  NNP-2:  J., E., L.
  NNP-1:  Bush, Noriega, Peters
  NNP-15: New, San, Wall
  NNP-3:  York, Francisco, Street
- Personal pronouns (PRP):
  PRP-0: It, He, I
  PRP-1: it, he, they
  PRP-2: it, them, him

35 Linguistic Candy
- Relative adverbs (RBR):
  RBR-0: further, lower, higher
  RBR-1: more, less, More
  RBR-2: earlier, Earlier, later
- Cardinal numbers (CD):
  CD-7:  one, two, Three
  CD-4:  1989, 1990, 1988
  CD-11: million, billion, trillion
  CD-0:  1, 50, 100
  CD-3:  1, 30, 31
  CD-9:  78, 58, 34

36 Conclusions
- New ideas: hierarchical training, adaptive splitting, parameter smoothing
- State-of-the-art parsing performance: F1 improves from 63.4 (X-Bar initializer) to 90.2
- Linguistically interesting grammars to sift through

37 Thank You! petrov@eecs.berkeley.edu

38 Other things we tried
- X-Bar vs. structurally annotated grammar: the X-Bar grammar starts at lower performance but provides more flexibility
- Better smoothing: tried different (hierarchical) smoothing methods; all worked about the same
- (Linguistically) constraining rewrite possibilities between subcategories: hurts performance
- EM automatically learns that most subcategory combinations are meaningless: ≥ 90% of the possible rewrites have 0 probability (see the sketch below)
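
As a small illustration of how the last point could be checked, here is a sketch that counts the fraction of subcategory-level rewrites A_x -> B_y C_z with (near-)zero probability after training; the grammar representation matches the toy E-step sketch above and is our assumption, not the authors' data structure.

```python
import numpy as np

def fraction_zero_rewrites(rules, tol=1e-10):
    """rules: dict (A, B, C) -> array of shape (k, k, k) holding
    P(A_x -> B_y C_z). Returns the fraction of entries that are ~0."""
    total = sum(r.size for r in rules.values())
    zeros = sum(int((r < tol).sum()) for r in rules.values())
    return zeros / total

# On a trained grammar the slide's claim would correspond to a value >= 0.9.
example = {("S", "NP", "VP"): np.array([[[1.0, 0.0], [0.0, 0.0]],
                                        [[0.0, 0.0], [0.0, 1.0]]])}
print(fraction_zero_rewrites(example))   # 0.75 for this tiny example
```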

