Learning Accurate, Compact, and Interpretable Tree Annotation
Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein

The Game of Designing a Grammar
• Annotation refines base treebank symbols to improve statistical fit of the grammar:
  • Parent annotation [Johnson ’98]
  • Head lexicalization [Collins ’99, Charniak ’00]
  • Automatic clustering?
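
Parent annotation, for example, rewrites each nonterminal with its parent's label so that a subject NP (under S) is distinguished from an object NP (under VP). A minimal sketch of the transform, assuming a toy (label, children) tuple representation of trees rather than the paper's data structures:

def parent_annotate(node, parent_label=None):
    # Johnson ('98)-style parent annotation: NP under S becomes NP^S.
    label, children = node
    new_label = label if parent_label is None else label + "^" + parent_label
    if isinstance(children, str):  # preterminal: children is just the word
        return (new_label, children)
    return (new_label, [parent_annotate(child, label) for child in children])

tree = ("S", [("NP", [("PRP", "He")]),
              ("VP", [("VBD", "was"), ("ADJP", [("JJ", "right")])])])
print(parent_annotate(tree))  # NP becomes NP^S, VP becomes VP^S, ...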

Previous Work: Manual Annotation [Klein & Manning ’03]
• Manually split categories:
  • NP: subject vs. object
  • DT: determiners vs. demonstratives
  • IN: sentential vs. prepositional
• Advantages:
  • Fairly compact grammar
  • Linguistic motivations
• Disadvantages:
  • Performance leveled out
  • Manually annotated

Model                   F1
Naïve Treebank Grammar  72.6
Klein & Manning ’03     86.3

Previous Work: Automatic Annotation Induction [Matsuzaki et al. ’05, Prescher ’05]
• Advantages:
  • Automatically learned: label all nodes with latent variables, using the same number k of subcategories for all categories
• Disadvantages:
  • Grammar gets too large
  • Most categories are oversplit while others are undersplit

Model                 F1
Klein & Manning ’03   86.3
Matsuzaki et al. ’05  86.7

Previous work is complementary
• Manual annotation: allocates splits where needed and gives a compact grammar, but is very tedious and misses features
• Automatic annotation: automatically learned and captures many features, but splits uniformly and gives a large grammar
• This work: automatically learned, allocates splits where needed, gives a compact grammar, and captures many features

Learning Latent Annotations
EM algorithm, just like Forward-Backward for HMMs:
• Brackets are known
• Base categories are known
• Only induce subcategories
[Figure: parse tree with latent-annotated nodes X1 through X7 over the sentence "He was right.", with forward and backward passes marked]
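
Because the tree structure and base categories are fixed, the E-step only has to distribute posterior mass over the latent subcategories at each node, via an inside-outside pass that mirrors forward-backward. A minimal sketch of the inside pass, under assumed toy data structures (Node, rule_prob, and lex_prob are placeholders for this sketch, not the paper's implementation):

from collections import namedtuple

# A leaf holds (label, word); an internal node holds (label, (left, right)).
# rule_prob maps (A, i, B, j, C, k) -> P(A_i -> B_j C_k), and lex_prob maps
# (T, i, word) -> P(word | T_i). All of these are assumptions of the sketch.
Node = namedtuple("Node", ["label", "children", "word"], defaults=(None, None))

def inside(node, n_sub, rule_prob, lex_prob):
    # Returns scores[i] = P(words below node | node carries subcategory i).
    if node.children is None:  # preterminal directly above a word
        return [lex_prob.get((node.label, i, node.word), 1e-12)
                for i in range(n_sub)]
    left, right = node.children
    in_l = inside(left, n_sub, rule_prob, lex_prob)
    in_r = inside(right, n_sub, rule_prob, lex_prob)
    return [sum(rule_prob.get((node.label, i, left.label, j, right.label, k), 0.0)
                * in_l[j] * in_r[k]
                for j in range(n_sub) for k in range(n_sub))
            for i in range(n_sub)]

The outside pass is symmetric, and the M-step reads expected rule counts off products of inside and outside scores, exactly as forward-backward does for HMM transitions.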

Overview
• Hierarchical Training
• Adaptive Splitting
• Parameter Smoothing
[Figure: parsing accuracy vs. grammar size, with a bound marked "limit of computational resources"]

Refinement of the DT tag
[Figure: the DT tag split into subcategories DT-1, DT-2, DT-3, DT-4]

Hierarchical refinement of the DT tag

Hierarchical Estimation Results

Model                  F1
Baseline               87.3
Hierarchical Training  88.4
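
Each round of hierarchical training doubles the number of subcategories: every subcategory is split in two, the new rules copy their parent's probabilities with a little random noise to break symmetry, and EM then refines them before the next round. A sketch of one split round (the (parent, children) integer-id dictionary layout and the noise level are assumptions of this sketch):

import itertools
import random

def split_in_two(rule_prob, noise=0.01):
    # rule_prob maps (parent_sub, children_subs) -> probability, with
    # integer subcategory ids. Subcategory s becomes 2s and 2s + 1.
    new_probs = {}
    for (parent, children), p in rule_prob.items():
        for new_parent in (2 * parent, 2 * parent + 1):
            # every child slot also splits in two; enumerate all combinations
            for new_children in itertools.product(
                    *[(2 * c, 2 * c + 1) for c in children]):
                # copy the old probability, spread over the child splits,
                # with a small jitter so EM can differentiate the halves
                jitter = 1.0 + random.uniform(-noise, noise)
                new_probs[(new_parent, new_children)] = (
                    p * jitter / 2 ** len(children))
    return new_probs

Training then alternates split rounds with several EM iterations, which also renormalize the jittered probabilities.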

Refinement of the, tag  Splitting all categories the same amount is wasteful:

The DT tag revisited: oversplit?

Adaptive Splitting
• Want to split complex categories more
• Idea: split everything, roll back the splits which were least useful

Adaptive Splitting
• Evaluate the loss in likelihood from removing each split:

  loss = (data likelihood with split reversed) / (data likelihood with split)

• No loss in accuracy when 50% of the splits are reversed.
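
The loss for each split can be approximated without retraining: at every node labelled with the split category, compare the sentence likelihood as is against the likelihood with the two subcategories' statistics pooled. A sketch of that score, assuming the inside/outside scores and the subcategories' relative frequencies p1 and p2 were recorded during the E-step (the bookkeeping shown here is an assumption of the sketch):

import math

def merge_loss(occurrences, p1, p2):
    # occurrences: one (in1, out1, in2, out2, sent_prob) tuple per tree node
    # carrying the split category: inside/outside scores for the two
    # subcategories, plus the total probability of that node's sentence.
    # p1 and p2 are the relative frequencies of the two subcategories.
    loss = 0.0
    for in1, out1, in2, out2, sent_prob in occurrences:
        merged_in = p1 * in1 + p2 * in2   # pooled inside score
        merged_out = out1 + out2          # pooled outside score
        split_part = in1 * out1 + in2 * out2
        merged_sent = sent_prob - split_part + merged_in * merged_out
        loss += math.log(sent_prob) - math.log(merged_sent)
    return loss

Splits are then ranked by this loss, and the cheapest half are rolled back.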

Adaptive Splitting Results

Model             F1
Previous          88.4
With 50% Merging  89.5

Number of Phrasal Subcategories
[Figure: bar chart of subcategory counts per phrasal category; NP, VP, and PP receive the most splits, while rare categories such as X and NAC receive few]

Number of Lexical Subcategories
[Figure: bar chart of subcategory counts per part-of-speech tag; highlighted tags include TO, POS, the comma, IN, DT, RB, the VBx tags, NN, NNS, NNP, and JJ]

Smoothing
• Heavy splitting can lead to overfitting
• Idea: smoothing allows us to pool statistics

Linear Smoothing
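
Linear smoothing interpolates each split rule's probability with the average over all subcategories of the same base category, so rare splits share statistics with their siblings: p'(A_x → B C) = (1 − α) · p(A_x → B C) + α · (mean over all x). A minimal sketch (the dictionary layout and α = 0.01 are placeholder assumptions, not the paper's tuned settings):

def smooth(rule_prob, n_sub, alpha=0.01):
    # rule_prob maps (parent_sub, rhs) -> probability for the n_sub
    # subcategories of one base category; pool each rule's probability
    # with the mean across sibling subcategories.
    rhss = {rhs for (_, rhs) in rule_prob}
    smoothed = {}
    for rhs in rhss:
        mean = sum(rule_prob.get((x, rhs), 0.0) for x in range(n_sub)) / n_sub
        for x in range(n_sub):
            smoothed[(x, rhs)] = ((1 - alpha) * rule_prob.get((x, rhs), 0.0)
                                  + alpha * mean)
    return smoothed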

Result Overview

Model           F1
Previous        89.5
With Smoothing  90.7

Final Results

Parser                   F1, ≤ 40 words   F1, all words
Klein & Manning ’03      86.3             85.7
Matsuzaki et al. ’05     86.7             86.1
Collins ’99              88.6             88.2
Charniak & Johnson ’05   90.1             89.6
This Work                90.2             89.7

Linguistic Candy
• Proper nouns (NNP):

  NNP-14  Oct.   Nov.       Sept.
  NNP-12  John   Robert     James
  NNP-2   J.     E.         L.
  NNP-1   Bush   Noriega    Peters
  NNP-15  New    San        Wall
  NNP-3   York   Francisco  Street

• Personal pronouns (PRP):

  PRP-0  It  He    I
  PRP-1  it  he    they
  PRP-2  it  them  him

Linguistic Candy
• Relative adverbs (RBR):

  RBR-0  further  lower    higher
  RBR-1  more     less     More
  RBR-2  earlier  Earlier  later

• Cardinal numbers (CD):

  CD-7   one      two      Three
  CD-11  million  billion  trillion
  [four further CD rows grouped numerals; the digits did not survive extraction]

Conclusions
• New ideas:
  • Hierarchical Training
  • Adaptive Splitting
  • Parameter Smoothing
• State-of-the-art parsing performance: improves from the X-Bar initializer's 63.4 F1 to 90.2
• Linguistically interesting grammars to sift through

Thank You!

Other things we tried
• X-Bar vs. structurally annotated grammar:
  • The X-Bar grammar starts at lower performance, but provides more flexibility
• Better smoothing:
  • Tried different (hierarchical) smoothing methods; all worked about the same
• (Linguistically) constraining rewrite possibilities between subcategories:
  • Hurts performance
  • EM automatically learns that most subcategory combinations are meaningless: ≥ 90% of the possible rewrites have probability 0