Learning and Inference for Hierarchically Split PCFGs
Slav Petrov and Dan Klein

The Game of Designing a Grammar

Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson '98] (sketched below)
- Head lexicalization [Collins '99, Charniak '00]
- Automatic clustering?

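As a quick illustration of the first of these, here is a minimal sketch of parent annotation over tuple-encoded trees; the tree encoding and names are invented for illustration, and real systems often leave the POS tags unannotated:

```python
# Minimal sketch of parent annotation [Johnson '98]: each nonterminal
# is rewritten as "X^P", where P is the label of its parent.
# Trees are (label, child, child, ...) tuples; leaves are plain strings.

def parent_annotate(tree, parent="ROOT"):
    if isinstance(tree, str):          # a word: leave it unchanged
        return tree
    label, *children = tree
    return (f"{label}^{parent}",
            *(parent_annotate(c, label) for c in children))

tree = ("S", ("NP", ("PRP", "He")),
             ("VP", ("VBD", "was"), ("ADJP", ("JJ", "right"))))
print(parent_annotate(tree))
# ('S^ROOT', ('NP^S', ('PRP^NP', 'He')), ('VP^S', ...))
```
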
Learning Latent Annotations [Matsuzaki et al. '05]

EM algorithm, just like Forward-Backward for HMMs:
- Brackets are known
- Base categories are known
- Only induce subcategories

[Figure: parse tree with latent subcategory variables X1 ... X7 over the sentence "He was right.", with forward/backward-style passes over the tree.]

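The EM updates are easiest to see on a single tree. Below is a minimal sketch (toy grammar, random placeholder parameters, invented data layout, not the authors' code) of the upward/inside pass that sums out the latent subcategories; the matching downward/outside pass and count accumulation are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 2  # number of latent subcategories per treebank symbol

# Random stand-ins for the model: lexicon[(tag, word)] is a length-k vector
# of P(word | tag_x); binary[(A, B, C)][x, y, z] is P(A_x -> B_y C_z).
lexicon = {("NP", "He"): rng.random(k), ("VP", "ran"): rng.random(k)}
binary = {("S", "NP", "VP"): rng.random((k, k, k))}

def inside(tree):
    """Inside scores over subcategories for a tree with KNOWN brackets and
    base categories; only the subcategories are summed out."""
    label, *children = tree
    if isinstance(children[0], str):                    # preterminal
        return lexicon[(label, children[0])]
    left, right = children
    return np.einsum("xyz,y,z->x",
                     binary[(label, left[0], right[0])],
                     inside(left), inside(right))

tree = ("S", ("NP", "He"), ("VP", "ran"))
print(inside(tree)[0])  # tree likelihood, root fixed to subcategory 0
# A matching top-down (outside) pass would give the posterior counts for EM.
```
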
Overview

[Figure: splitting every category quickly reaches the limit of computational resources, motivating the three techniques below.]
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing

Refinement of the DT tag

[Figure: the DT tag split into subcategories DT-1, DT-2, DT-3, DT-4.]

Hierarchical refinement of the DT tag

[Figure: DT split hierarchically, first into two subcategories, then each of those into two more, and so on. A single split step is sketched below.]

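A sketch of one split step, under the usual trick (assumed here, with invented names) of dividing each rule's probability among the refined children and adding a little noise so that EM can break the symmetry between the new halves:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_rule_table(table, noise=0.01):
    """table[(A,B,C)] has shape (k,k,k); return a (2k,2k,2k) refined table.
    Each parent subcategory is duplicated, each old child pair (y,z)
    becomes 4 pairs, so dividing by 4 keeps each parent row normalized."""
    out = {}
    for rule, p in table.items():
        p2 = np.repeat(np.repeat(np.repeat(p, 2, 0), 2, 1), 2, 2)
        p2 /= 4.0
        p2 *= 1.0 + noise * rng.uniform(-1, 1, p2.shape)  # break symmetry
        out[rule] = p2
    return out

G0 = {("S", "NP", "VP"): np.ones((1, 1, 1))}  # X-Bar: one subcategory each
G1 = split_rule_table(G0)                     # 2 subcategories per symbol
G2 = split_rule_table(G1)                     # 4, and so on hierarchically
```
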
Hierarchical Estimation Results

Model                   F1
Baseline                87.3
Hierarchical Training   88.4

Refinement of the, tag  Splitting all categories the same amount is wasteful:

Adaptive Splitting  Want to split complex categories more  Idea: split everything, roll back splits which were least useful Likelihood with split reversed Likelihood with split

Adaptive Splitting  Want to split complex categories more  Idea: split everything, roll back splits which were least useful Likelihood with split reversed Likelihood with split

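To make the criterion concrete, here is a small sketch (random stand-in numbers, not the authors' implementation) of the approximate loss computation: at every node where the split symbol occurs, compare the likelihood contribution with the two halves kept separate against the contribution with them merged back:

```python
import numpy as np

# I and O hold inside/outside scores for the two halves of one split
# symbol at each node where it occurs (random stand-ins here); p holds
# the halves' relative frequencies, with p[0] + p[1] = 1.

def merge_loss(I, O, p):
    """Approximate multiplicative drop in likelihood from reversing a split."""
    ratio = 1.0
    for i_n, o_n in zip(I, O):
        split = np.dot(i_n, o_n)            # node's contribution, split kept
        merged = (p @ i_n) * o_n.sum()      # pooled inside x summed outside
        ratio *= merged / split
    return ratio                            # near 1.0 => split barely helps

rng = np.random.default_rng(0)
I, O, p = rng.random((5, 2)), rng.random((5, 2)), np.array([0.6, 0.4])
print(merge_loss(I, O, p))  # roll back the splits whose ratio is closest to 1
```
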
Adaptive Splitting Results

Model               F1
Previous            88.4
With 50% Merging    89.5

Number of Phrasal Subcategories

[Bar chart: subcategories allocated per phrasal category; frequent, diverse categories such as NP, VP, and PP receive the most splits, while rare categories such as X and NAC receive the fewest.]

Number of Lexical Subcategories

[Bar chart: subcategories per part-of-speech tag; open-class tags such as NN, NNS, NNP, and JJ receive many splits, while closed-class tags such as TO, POS, and ',' receive few.]

Smoothing

- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics across subcategories (sketched below)

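A sketch of the smoothing step under one common formulation: shrink each subcategory's rule probability linearly toward the mean over its siblings. The interpolation weight and table layout are invented for illustration:

```python
import numpy as np

# Pool statistics across subcategories by interpolating each parent
# subcategory's rule probability with the mean over all subcategories
# of the same base symbol.

def smooth(table, alpha=0.01):
    out = {}
    for rule, p in table.items():       # axis 0 of p indexes the parent split
        mean = p.mean(axis=0, keepdims=True)
        out[rule] = (1 - alpha) * p + alpha * mean
    return out

table = {("S", "NP", "VP"): np.array([[0.9], [0.1]])}   # two parent splits
print(smooth(table))                    # -> [[0.896], [0.104]]
```
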
Result Overview

Model            F1
Previous         89.5
With Smoothing   90.7

Linguistic Candy

Proper nouns (NNP):
  NNP-14   Oct.    Nov.       Sept.
  NNP-12   John    Robert     James
  NNP-2    J.      E.         L.
  NNP-1    Bush    Noriega    Peters
  NNP-15   New     San        Wall
  NNP-3    York    Francisco  Street

Personal pronouns (PRP):
  PRP-0    It      He         I
  PRP-1    it      he         they
  PRP-2    it      them       him

Relative adverbs (RBR):
  RBR-0    further   lower    higher
  RBR-1    more      less     More
  RBR-2    earlier   Earlier  later

Cardinal numbers (CD):
  CD-7     one       two      Three
  CD-11    million   billion  trillion
  [the remaining CD subcategories contained numeric tokens lost in extraction]

Inference

Example sentence: "She heard the noise."
Exhaustive parsing: about 1 minute per sentence. (A toy version of the exhaustive computation is sketched below.)

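For a sense of what "exhaustive" means here: a plain CKY inside pass, cubic in sentence length, where every rule application multiplies out over all splits. A toy sketch with an invented grammar encoding, not the Berkeley Parser's implementation:

```python
from collections import defaultdict

# Toy CKY inside pass (exhaustive parsing). Grammar format (invented for
# this sketch): binary rules {(A,B,C): prob} and a lexicon {(A,word): prob}.

def cky_inside(words, binary, lexicon):
    n = len(words)
    chart = defaultdict(float)              # chart[(i, j, A)] = inside score
    for i, w in enumerate(words):
        for (a, word), p in lexicon.items():
            if word == w:
                chart[(i, i + 1, a)] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for (a, b, c), p in binary.items():
                for m in range(i + 1, j):   # all midpoints, all rules
                    chart[(i, j, a)] += p * chart[(i, m, b)] * chart[(m, j, c)]
    return chart

lexicon = {("NP", "She"): 1.0, ("VP", "slept"): 1.0}
binary = {("S", "NP", "VP"): 1.0}
print(cky_inside(["She", "slept"], binary, lexicon)[(0, 2, "S")])   # 1.0
```
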
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]

[Figure: parse the sentence with a coarse treebank grammar (NP, VP, ...), prune the chart, then parse again with the refined grammar (NP-1, NP-12, NP-17, VP-6, VP-31, ...). A sketch of one such pass follows.]

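A sketch of one coarse-to-fine pass, reusing `cky_inside` from the sketch above. For brevity, items here are pruned on their raw coarse inside score; the real criterion thresholds inside x outside posteriors. The grammars and the `project` function are toy stand-ins:

```python
from collections import defaultdict

def pruned_inside(words, binary, lexicon, allowed, project):
    """CKY inside pass that only builds items whose coarse projection
    survived the previous, coarser pass."""
    n = len(words)
    chart = defaultdict(float)
    for i, w in enumerate(words):
        for (a, word), p in lexicon.items():
            if word == w and (i, i + 1, project(a)) in allowed:
                chart[(i, i + 1, a)] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for (a, b, c), p in binary.items():
                if (i, j, project(a)) not in allowed:
                    continue                    # pruned by the coarse pass
                for m in range(i + 1, j):
                    chart[(i, j, a)] += p * chart[(i, m, b)] * chart[(m, j, c)]
    return chart

words = ["She", "slept"]
coarse_chart = cky_inside(words, {("S", "NP", "VP"): 1.0},
                          {("NP", "She"): 1.0, ("VP", "slept"): 1.0})
allowed = {key for key, s in coarse_chart.items() if s > 1e-4}
fine_chart = pruned_inside(
    words,
    {("S-0", "NP-0", "VP-0"): 0.6, ("S-0", "NP-1", "VP-0"): 0.4},
    {("NP-0", "She"): 0.7, ("NP-1", "She"): 0.3, ("VP-0", "slept"): 1.0},
    allowed,
    project=lambda a: a.split("-")[0])          # "NP-17" -> "NP"
print(fine_chart[(0, 2, "S-0")])                # 0.7*0.6 + 0.3*0.4 = 0.54
```
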
Hierarchical Pruning

Consider again the span 5 to 12. Each pass prunes the chart for the next, more refined pass; an item whose posterior probability falls below a threshold t is pruned, and its descendants are never built:

  coarse:          ... QP NP VP ...
  split in two:    ... QP1 QP2 NP1 NP2 VP1 VP2 ...
  split in four:   ... QP1 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 ...
  split in eight:  ...

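The same idea iterated over the whole hierarchy, again reusing `cky_inside` and `pruned_inside` from the sketches above. This is a sketch; `grammars` is an invented container of per-level rules, lexicons, and projection functions:

```python
def hierarchical_parse(words, grammars, threshold=1e-4):
    """grammars[i] = (binary_i, lexicon_i, project_to_level_i_minus_1);
    the coarsest level's projection is unused and may be None."""
    allowed = None                      # no pruning for the coarsest pass
    for binary, lexicon, project in grammars:
        if allowed is None:
            chart = cky_inside(words, binary, lexicon)
        else:
            chart = pruned_inside(words, binary, lexicon, allowed, project)
        allowed = {key for key, s in chart.items() if s > threshold}
    return chart                        # chart of the most refined grammar
```
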
Intermediate Grammars

X-Bar = G0; learning produces an increasingly refined sequence G1, G2, G3, G4, G5, G6 = G.

[Figure: DT refined step by step: DT; DT1, DT2; DT1 ... DT4; DT1 ... DT8.]

Projected Grammars

X-Bar = G0 and the learned sequence G1 ... G6 = G are as before; but instead of pruning with the intermediate grammars from learning, project the final grammar G down to each level with a projection π_i, giving π_0(G), π_1(G), ..., π_5(G). One way to estimate such a projection is sketched below.

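A sketch of estimating a projection: map each refined symbol to its ancestor at the target level and combine the rule probabilities, weighting each refined parent by its frequency. Here `freq` and `ancestor` are stand-ins for the expected subcategory counts and the split-tree lookup the real system uses:

```python
from collections import defaultdict

def project_grammar(binary, freq, ancestor):
    """Return coarse rules {(A,B,C): prob}; probabilities stay normalized
    per coarse parent because each refined parent's rules sum to 1."""
    num = defaultdict(float)            # weighted mass per projected rule
    parent_weight = defaultdict(float)  # total weight per projected parent
    seen = set()
    for (a, b, c), p in binary.items():
        num[(ancestor(a), ancestor(b), ancestor(c))] += freq[a] * p
        if a not in seen:
            seen.add(a)
            parent_weight[ancestor(a)] += freq[a]
    return {rule: mass / parent_weight[rule[0]] for rule, mass in num.items()}

binary = {("S-0", "NP-0", "VP-0"): 0.6, ("S-0", "NP-1", "VP-0"): 0.4}
freq = {"S-0": 1.0}
print(project_grammar(binary, freq, lambda a: a.split("-")[0]))
# {('S', 'NP', 'VP'): 1.0}
```
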
Final Results (Efficiency)

Parsing the development set (1600 sentences):
- Berkeley Parser: 10 min (implemented in Java)
- Charniak & Johnson '05 parser: 19 min (implemented in C)

Final Results (Accuracy)

                                            ≤ 40 words F1   all F1
ENG   Charniak & Johnson '05 (generative)       90.1         89.6
      This Work                                 90.6         90.1
GER   Dubey '05                                 76.3         -
      This Work                                 80.8         80.1
CHN   Chiang et al. '02                         80.0         76.6
      This Work                                 86.3         83.4

Extensions

- Acoustic modeling [Petrov, Pauls & Klein '07]
- Infinite grammars via nonparametric Bayesian learning [Liang, Petrov, Jordan & Klein '07]

Conclusions

- Split & Merge Learning
  - Hierarchical Training
  - Adaptive Splitting
  - Parameter Smoothing
- Hierarchical Coarse-to-Fine Inference
  - Projections
  - Marginalization
- Multi-lingual Unlexicalized Parsing

Thank You!