Probabilistic Parsing: Enhancements
Ling 571: Deep Processing Techniques for NLP
January 26, 2011

Parser Issues
- PCFGs make many (unwarranted) independence assumptions
- Structural dependency: NP -> Pronoun is much more likely in subject position
- Lexical dependency: verb subcategorization
- Coordination ambiguity

Improving PCFGs: Structural Dependencies
- How can we capture the subject/object asymmetry? E.g., NP_subj -> Pron vs. NP_obj -> Pron
- Parent annotation: annotate each node with its parent in the parse tree
  - E.g., NP^S vs. NP^VP
  - Also annotate pre-terminals: RB^ADVP vs. RB^VP; IN^SBAR vs. IN^PP
- Can also split rules on other conditions
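
A minimal sketch of parent annotation in Python, assuming trees are represented as nested lists of the form [label, child1, child2, ...]; the function name annotate_parents is hypothetical, not from the course code:

def annotate_parents(tree, parent=None):
    """Annotate each non-terminal with its parent label (e.g., NP -> NP^S)."""
    label, children = tree[0], tree[1:]
    new_label = f"{label}^{parent}" if parent is not None else label
    if len(children) == 1 and isinstance(children[0], str):
        # Pre-terminal over a word: annotate the POS tag too (e.g., RB^VP)
        return [new_label, children[0]]
    return [new_label] + [annotate_parents(c, parent=label) for c in children]

# Example: (S (NP (PRP he)) (VP (VBD ran)))
tree = ["S", ["NP", ["PRP", "he"]], ["VP", ["VBD", "ran"]]]
print(annotate_parents(tree))
# ['S', ['NP^S', ['PRP^NP', 'he']], ['VP^S', ['VBD^VP', 'ran']]]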

Parent Annotation [figure]

Parent Annotation: Pre-terminals [figure]

Parent Annotation
- Advantages: captures structural dependencies in the grammar
- Disadvantages:
  - Increases the number of rules in the grammar
  - Decreases the amount of training data per rule
- Strategies exist to search for the optimal number of rules

Improving PCFGs: Lexical Dependencies
- Lexicalized rules are used by the best-known parsers: the Collins and Charniak parsers
- Each non-terminal is annotated with its lexical head, e.g., the verb for a verb phrase, the noun for a noun phrase
- Each rule must identify one RHS element as the head; heads propagate up the tree
- Conceptually like adding one rule per head value:
  - VP(dumped) -> VBD(dumped) NP(sacks) PP(into)
  - VP(dumped) -> VBD(dumped) NP(cats) PP(into)

Lexicalized PCFGs
- Also add a head tag, the part-of-speech tag of the head word, to non-terminals:
  - VP(dumped) -> VBD(dumped) NP(sacks) PP(into)
  - becomes: VP(dumped,VBD) -> VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN)
- Two types of rules:
  - Lexical rules (pre-terminal -> word): deterministic, with probability 1
  - Internal rules (all other expansions): probabilities must be estimated
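
A minimal sketch of head propagation, assuming the same nested-list trees and a hand-written head table mapping each category to the child category that supplies its head (the table entries below are illustrative, not the actual Collins head rules):

HEAD_TABLE = {"S": "VP", "VP": "VBD", "NP": "NNS", "PP": "IN"}

def lexicalize(tree):
    """Return (label, head_word, head_tag, children) with heads percolated up."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0], label, [])   # pre-terminal: the word is its own head
    lex_children = [lexicalize(c) for c in children]
    # Pick the child whose category the head table names; default to the first child
    head_child = next((c for c in lex_children if c[0] == HEAD_TABLE.get(label)),
                      lex_children[0])
    return (label, head_child[1], head_child[2], lex_children)

tree = ["VP", ["VBD", "dumped"], ["NP", ["NNS", "sacks"]], ["PP", ["IN", "into"]]]
label, word, tag, _ = lexicalize(tree)
print(f"{label}({word},{tag})")   # VP(dumped,VBD)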

PLCFGs
- Issue: too many rules; no corpus has enough examples of each
- (Partial) solution: assume independence
  - Condition each rule on the category of the LHS and its head
  - Condition each head on the category of the LHS and the parent's head
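
A minimal sketch of the two resulting maximum-likelihood estimates, P(rule | LHS, head) and P(head | LHS, parent's head), computed from counts; all names and counts below are illustrative:

from collections import Counter

rule_counts = Counter()   # (lhs, head_word, rhs_tuple) -> count
lhs_counts  = Counter()   # (lhs, head_word) -> count
head_counts = Counter()   # (lhs, parent_head, head_word) -> count
cond_counts = Counter()   # (lhs, parent_head) -> count

def p_rule(lhs, head, rhs):
    """MLE of P(LHS -> RHS | LHS, head word)."""
    return rule_counts[(lhs, head, rhs)] / lhs_counts[(lhs, head)]

def p_head(lhs, parent_head, head):
    """MLE of P(head word | LHS, head word of parent)."""
    return head_counts[(lhs, parent_head, head)] / cond_counts[(lhs, parent_head)]

# Toy counts for illustration
lhs_counts[("VP", "dumped")] = 10
rule_counts[("VP", "dumped", ("VBD", "NP", "PP"))] = 6
print(p_rule("VP", "dumped", ("VBD", "NP", "PP")))   # 0.6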

Disambiguation Example [figure]

Improving PCFGs: Tradeoffs
- Tension: increasing accuracy means increasing specificity (e.g., lexicalization, parent annotation, Markovization, etc.), which:
  - Increases grammar size
  - Increases processing time
  - Increases training data requirements
- How can we balance these?

CNF Factorization & Markovization
- CNF factorization converts n-ary branching to binary branching
- It can maintain information about the original structure: neighborhood history and parent
- Issue: potentially explosive; keeping all context can expand 72 non-terminals to roughly 10K
- How much context should we keep? That is, what Markov order should we use?
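
A minimal sketch of binarization with horizontal Markovization of order h: each introduced intermediate symbol remembers at most the last h sisters already generated (the symbol-naming scheme is illustrative):

def binarize(lhs, rhs, h=1):
    """Binarize LHS -> X1 ... Xn, keeping at most h sisters of history."""
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules, prev = [], lhs
    for i, sym in enumerate(rhs[:-2]):
        history = rhs[max(0, i + 1 - h):i + 1]   # last h symbols generated so far
        new_nt = f"{lhs}|<{','.join(history)}>"
        rules.append((prev, (sym, new_nt)))
        prev = new_nt
    rules.append((prev, (rhs[-2], rhs[-1])))
    return rules

for r in binarize("VP", ["VBD", "NP", "PP", "PP"], h=1):
    print(r)
# ('VP', ('VBD', 'VP|<VBD>'))
# ('VP|<VBD>', ('NP', 'VP|<NP>'))
# ('VP|<NP>', ('PP', 'PP'))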

Different Markov Orders [figure]

Markovization & Costs (Mohri & Roark 2006) [figure]

Efficiency
- PCKY is O(Vn^3), and the grammar constant V can be huge
- Grammars can be extremely ambiguous: hundreds of analyses are not unusual, especially for long sentences
- However, we only care about the best parses; the others can be pretty bad
- Can we use this to improve efficiency?

Beam Thresholding
- Inspired by the beam search algorithm
- Assume low-probability partial parses are unlikely to yield a high-probability overall parse
- Keep only the top k most probable partial parses: retain only k choices per cell
  - For large grammars, k could be 50 or 100; for small grammars, 5 or 10
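
A minimal sketch of per-cell beam pruning, assuming each chart cell maps non-terminals to the log probability of their best partial parse; prune_cell is a hypothetical helper:

import heapq

def prune_cell(cell, k):
    """Keep only the k most probable entries in a chart cell.

    cell: dict mapping non-terminal -> log probability of its best partial parse.
    """
    if len(cell) <= k:
        return cell
    survivors = heapq.nlargest(k, cell.items(), key=lambda kv: kv[1])
    return dict(survivors)

cell = {"NP": -2.3, "VP": -7.1, "S": -4.0, "PP": -9.8}
print(prune_cell(cell, k=2))   # {'NP': -2.3, 'S': -4.0}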

Heuristic Filtering
- Intuition: some rules/partial parses are unlikely to end up in the best parse, so don't store them in the table
- Exclusions:
  - Low frequency: exclude singleton productions
  - Low probability: exclude constituents x whose probability p(x) falls below a fixed threshold
  - Low relative probability: exclude x if there exists a y such that p(y) > 100 * p(x)
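
A minimal sketch of the relative-probability exclusion applied to one chart cell; the factor of 100 comes from the slide, and the helper name is hypothetical:

def relative_prune(cell, factor=100.0):
    """Drop any constituent whose probability is more than `factor`
    times smaller than the best one in the same cell."""
    if not cell:
        return cell
    best = max(cell.values())
    return {nt: p for nt, p in cell.items() if p * factor >= best}

cell = {"NP": 0.02, "S": 0.5, "PP": 0.004}
print(relative_prune(cell))   # {'NP': 0.02, 'S': 0.5} -- PP pruned: 0.5 > 100 * 0.004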

Notes on HW#3
- Outline:
  - Induce a grammar from a (small) treebank
  - Implement probabilistic CKY
  - Evaluate the parser
  - Improve the parser
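
A minimal sketch of the grammar-induction step: maximum-likelihood rule probabilities counted from treebank trees, again assuming the nested-list tree representation used above (illustrative, not the assignment's required interface):

from collections import Counter, defaultdict

def count_rules(tree, counts):
    """Accumulate counts of LHS -> RHS productions from one tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1        # lexical rule
        return
    counts[(label, tuple(c[0] for c in children))] += 1
    for c in children:
        count_rules(c, counts)

def induce_pcfg(treebank):
    """MLE rule probabilities: count(LHS -> RHS) / count(LHS)."""
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

treebank = [["S", ["NP", ["PRP", "he"]], ["VP", ["VBD", "ran"]]]]
for rule, p in induce_pcfg(treebank).items():
    print(rule, p)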

Treebank Format
- Adapted from the Penn Treebank format
- Rules simplified:
  - Traces and other null elements removed
  - Complex tags removed
  - POS tags reformatted as non-terminals

Reading the Parses
- POS unary collapse: (NP_NNP Ontario) was (NP (NNP Ontario))
- Binarization: VP -> VP' PP with VP' -> VB PP was VP -> VB PP PP
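
A minimal sketch of undoing both transformations when reading parser output, assuming the naming conventions shown above (underscore for collapsed POS unaries, a trailing apostrophe for binarization-introduced symbols); restore is a hypothetical helper:

def restore(tree):
    """Undo POS unary collapse and binarization on a nested-list tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        if "_" in label:                      # NP_NNP Ontario -> NP (NNP Ontario)
            outer, inner = label.split("_", 1)
            return [outer, [inner, children[0]]]
        return [label, children[0]]
    new_children = []
    for c in (restore(c) for c in children):
        if c[0].endswith("'"):                # splice out VP'-style nodes
            new_children.extend(c[1:])
        else:
            new_children.append(c)
    return [label] + new_children

tree = ["VP", ["VP'", ["VB", "dumped"], ["PP", ["IN", "into"]]], ["PP", ["IN", "by"]]]
print(restore(tree))
# ['VP', ['VB', 'dumped'], ['PP', ['IN', 'into']], ['PP', ['IN', 'by']]]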

Start Early!