Prototype-Driven Grammar Induction Aria Haghighi and Dan Klein Computer Science Division University of California Berkeley

Grammar Induction

  DT  NN     VBD  DT  NN  IN  NN
  The screen was  a   sea of  red

First Attempt

  DT  NN     VBD  DT  NN  IN  NN
  The screen was  a   sea of  red

Central Questions
- How do we specify what we want to learn? ("What's an NP?")
- How do we fix observed errors? ("That's not quite it!")

Experimental Set-up
- Binary grammar {X1, X2, ..., Xn} plus POS tags; rules of the form Xi → Xj Xk
- Data: WSJ-10 [7k sentences]
- Evaluation: labeled F1
- Grammar upper bound: 86.1
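To make the rule space concrete, here is a rough sketch (my own illustration, not the authors' code) of enumerating this grammar family, with POS tags as terminals:

```python
from itertools import product

def binary_grammar(n, pos_tags):
    """Enumerate the rule family {Xi -> Y Z}: the n abstract symbols
    serve as parents, and children may be symbols or POS tags."""
    symbols = [f"X{i}" for i in range(1, n + 1)]
    children = symbols + pos_tags
    return [(a, (b, c)) for a in symbols
            for b, c in product(children, repeat=2)]

rules = binary_grammar(2, ["DT", "NN", "VBD"])
print(len(rules))  # 2 parents x 5x5 child pairs = 50 candidate rules
```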

Experiment Roadmap
- Unconstrained induction: need a bracket constraint!
- Gold-bracket induction
- Prototypes and similarity
- CCM-bracket induction

  Phrase  Prototype
  NP      DT NN
  PP      IN DT NN

Unconstrained PCFG Induction
- Learn a PCFG with EM: the Inside-Outside algorithm [Lari & Young, 1990]
- Results
[Figure: inside score α(i, j) and outside score β(i, j) over a span 0 ≤ i < j ≤ n]
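As a concrete illustration of the inside pass, here is a minimal CKY-style sketch (my own toy code with made-up rule probabilities, not the authors' implementation):

```python
from collections import defaultdict

def inside(tags, unary, binary):
    """Inside scores alpha[i, j, A] = P(A =>* tags[i:j]) for a PCFG in
    Chomsky normal form over POS-tag terminals.
    unary:  dict (A, tag) -> P(A -> tag)
    binary: dict (A, B, C) -> P(A -> B C)"""
    n = len(tags)
    alpha = defaultdict(float)
    for i, t in enumerate(tags):                      # width-1 spans
        for (A, tag), p in unary.items():
            if tag == t:
                alpha[i, i + 1, A] += p
    for width in range(2, n + 1):                     # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for (A, B, C), p in binary.items():
                for k in range(i + 1, j):             # split point
                    alpha[i, j, A] += p * alpha[i, k, B] * alpha[k, j, C]
    return alpha

# Toy grammar with invented probabilities over the slide's example tags.
unary = {("NP", "DT"): 0.3, ("NP", "NN"): 0.7, ("VP", "VBD"): 1.0}
binary = {("NP", "NP", "NP"): 1.0, ("S", "NP", "VP"): 1.0}
print(inside(["DT", "NN", "VBD"], unary, binary)[0, 3, "S"])  # 0.21
```

The outside pass runs the symmetric top-down recursion, and EM reestimates rule probabilities from the expected rule counts the two passes yield.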

Constrained PCFG Induction
- Gold brackets [Pereira & Schabes, 1992]
- Results

Encoding Knowledge
- What's an NP?
- Semi-supervised learning

Encoding Knowledge
- What's an NP? For instance: DT NN, JJ NNS, NNP
- Prototype learning

Add Prototypes
- Manually constructed
- Grammar induction experiments

How to use prototypes?
[Figure: parse of "¦ The/DT koala/NN sat/VBD in/IN the/DT tree/NN ¦" with every span labeled "?"; candidate labels: NP, PP, VP, S]

How to use prototypes?
[Figure: the prototype spans label themselves (DT NN as NP, IN DT NN as PP, giving VP and S above), but what about the non-prototype span "the hungry koala" (DT JJ NN)? Is it an NP?]

Distributional Similarity
- Context distribution: σ(DT JJ NN) = { ¦ __ VBD : 0.3, VBD __ ¦ : 0.2, IN __ VBD : 0.1, ... }
- Similarity: σ(DT JJ NN) is close to the NP prototype distributions σ(DT NN), σ(JJ NNS), σ(NNP NNP)
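A sketch of how these context distributions could be collected from a tagged corpus (my own illustration; the maximum span width is an assumption):

```python
from collections import Counter, defaultdict

BOUND = "¦"  # sentence-boundary marker, as on the slide

def context_distributions(tagged_sents, max_width=3):
    """Map each tag span, e.g. ('DT', 'JJ', 'NN'), to a normalized
    distribution over its (left tag, right tag) contexts."""
    counts = defaultdict(Counter)
    for tags in tagged_sents:
        padded = [BOUND] + tags + [BOUND]
        for width in range(1, max_width + 1):
            for i in range(1, len(padded) - width):
                span = tuple(padded[i:i + width])
                context = (padded[i - 1], padded[i + width])
                counts[span][context] += 1
    sigma = {}
    for span, ctxs in counts.items():
        total = sum(ctxs.values())
        sigma[span] = {c: k / total for c, k in ctxs.items()}
    return sigma

sigma = context_distributions([["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"]])
print(sigma[("DT", "JJ", "NN")])  # {('¦', 'VBD'): 1.0}
```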

Distributional Similarity
- Prototype approximation: σ(NP) ≈ Uniform(σ(DT NN), σ(JJ NNS), σ(NNP NNP))
- Prototype similarity feature: span(DT JJ NN) emits proto=NP; span(MD NNS) emits proto=NONE
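A sketch of these two steps; cosine similarity and the 0.35 cutoff are stand-ins of mine, not necessarily the paper's actual similarity measure or threshold:

```python
import math
from collections import Counter

def average(dists):
    """sigma(NP) ~ uniform mixture of the NP prototypes' distributions."""
    mix = Counter()
    for d in dists:
        for ctx, p in d.items():
            mix[ctx] += p / len(dists)
    return dict(mix)

def cosine(p, q):
    """Cosine similarity between two sparse context distributions."""
    dot = sum(v * q.get(ctx, 0.0) for ctx, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) \
         * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def proto_feature(span, sigma, proto_sigma, threshold=0.35):
    """Emit proto=<label> for the most similar prototype class,
    or proto=NONE if no class is similar enough.
    proto_sigma: phrase label -> averaged prototype distribution."""
    best_label, best_sim = "NONE", threshold
    for label, proto in proto_sigma.items():
        sim = cosine(sigma.get(span, {}), proto)
        if sim >= best_sim:
            best_label, best_sim = label, sim
    return f"proto={best_label}"
```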

Prototype CFG+ Model
- Each labeled constituent generates both its rewrite and its proto feature: P(DT NP | NP) × P(proto=NP | NP)
[Figure: parse of "¦ The/DT hungry/JJ koala/NN sat/VBD in/IN the/DT tree/NN ¦" with labels NP, PP, VP, S]
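A minimal sketch of scoring one node under this model (the probability tables and numbers are hypothetical):

```python
def node_score(parent, children, proto, rule_probs, proto_probs):
    """One constituent's contribution: P(children | parent) * P(proto | parent)."""
    return rule_probs[(parent, children)] * proto_probs[(parent, proto)]

rule_probs = {("NP", ("DT", "NP")): 0.2}    # hypothetical numbers
proto_probs = {("NP", "proto=NP"): 0.6}
print(node_score("NP", ("DT", "NP"), "proto=NP", rule_probs, proto_probs))
# ~0.12: the product on the slide, P(DT NP | NP) x P(proto=NP | NP)
```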

Prototype CFG+ Induction
- Experimental set-up: BLLIP corpus, gold brackets
- Results

Summary So Far
- Bracket constraint and prototypes give good performance!

Constituent-Context Model [Klein & Manning, 2002]

Product Model
- Different aspects of syntax:
  - CCM: yield and context properties
  - CFG: hierarchical properties
- Intersected EM [Klein, 2005] encourages mass on trees compatible with both the CCM and the CFG
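A sketch of the product idea (my own schematic, with placeholder probability tables): a candidate tree is scored by both models at once, so only analyses plausible under both keep appreciable mass.

```python
def tree_score(rules_used, spans, p_rule, p_yield, p_context):
    """Product-model score for one candidate tree.
    rules_used: (parent, children) rewrites in the tree  -> CFG aspect
    spans: (yield, context) pairs of its constituents    -> CCM aspect"""
    score = 1.0
    for parent, children in rules_used:      # hierarchical properties
        score *= p_rule[(parent, children)]
    for yld, ctx in spans:                   # yield and context properties
        score *= p_yield[yld] * p_context[ctx]
    return score
```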

Grammar Induction Experiments
- Intersected CFG and CCM, no prototypes
- Results

Grammar Induction Experiments
- Intersected CFG+ and CCM, with prototypes added
- Results

Reacting to Errors
- Possessive NPs
[Figure: our tree vs. the correct tree]

Reacting to Errors
- Add prototype: NP-POS with sequence NN POS
[Figure: new analysis]

Error Analysis
- Modal VPs
[Figure: our tree vs. the correct tree]

Reacting to Errors
- Add prototype: VP-INF with sequence VB NN
[Figure: new analysis]

Fixing Errors
- Supplement prototypes with NP-POS and VP-INF
- Results

Results Summary

Conclusion
- Prototype-driven learning: a flexible, weakly supervised framework
- Merges distributional clustering techniques with supervised structured models

Thank You!

Unconstrained PCFG Induction
- Binary grammar {X1, X2, ..., Xn}; rules of the form Xi → Xj Xk
- Learn a PCFG with EM: the Inside-Outside algorithm [Lari & Young, 1990]
[Figure: inside score α(i, j) and outside score β(i, j) over a span 0 ≤ i < j ≤ n, with example subtrees]