PCFG Parsing, Evaluation, & Improvements Ling 571 Deep Processing Techniques for NLP January 24, 2011.



Roadmap Parsing PCFGs: probabilistic CKY parsing. Evaluation: Parseval. Issues: positional and lexical independence assumptions. Improvements: lexicalization: PLCFGs.

Parsing Problem for PCFGs Select the tree T such that the string of words S is the yield of T, and T maximizes the probability of the parse. Extend existing algorithms: CKY & Earley. Most modern PCFG parsers are based on CKY, augmented with probabilities.

Probabilistic CKY Like regular CKY: assume the grammar is in Chomsky Normal Form (CNF), with productions A -> B C or A -> w. Represent the input with indices between words, e.g., 0 Book 1 that 2 flight 3 through 4 Houston 5. For input string length n and non-terminal set V, cell [i,j,A] in an (n+1) x (n+1) x |V| matrix contains the probability that constituent A spans [i,j].

Probabilistic CKY Algorithm
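The algorithm on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's exact pseudocode; the grammar encoding (word -> list of (category, prob) for lexical rules, and (A, B, C, prob) tuples for binary CNF rules) is an assumption of this sketch.

```python
from collections import defaultdict

def pcky(words, lexical, binary):
    """Probabilistic CKY over a CNF grammar (sketch).

    lexical: dict mapping word -> list of (A, prob) for rules A -> word
    binary:  list of (A, B, C, prob) for rules A -> B C
    Returns table where table[i][j][A] is the max probability
    that constituent A spans words[i:j].
    """
    n = len(words)
    table = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    # Base case: fill spans of length 1 from lexical rules A -> w
    for i, w in enumerate(words):
        for A, p in lexical.get(w, []):
            table[i][i + 1][A] = max(table[i][i + 1][A], p)
    # Recursive case: longer spans, bottom-up, maximizing over split points
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # split point
                for A, B, C, p in binary:
                    pb = table[i][k].get(B, 0.0)
                    pc = table[k][j].get(C, 0.0)
                    if pb > 0.0 and pc > 0.0:
                        cand = p * pb * pc
                        if cand > table[i][j][A]:
                            table[i][j][A] = cand
    return table
```

A full parser would also store backpointers in each cell to recover the best tree; this sketch keeps only the probabilities.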

PCKY Grammar Segment

PCKY Matrix: The flight includes a meal Filled cells (category: probability [span]): Det: 0.4 [0,1]; N: 0.02 [1,2]; NP: 0.3*0.4*0.02 = 0.0024 [0,2]; V: 0.05 [2,3]; Det: 0.4 [3,4]; N: 0.01 [4,5]; NP: 0.3*0.4*0.01 = 0.0012 [3,5]; VP: 0.2*0.05*0.0012 = 0.000012 [2,5]; S: P(S -> NP VP) * 0.0024 * 0.000012 [0,5]. The remaining cells ([1,3], [0,3], [2,4], [1,4], [0,4], [1,5]) contain no completed constituents.
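The cell values can be checked with simple arithmetic. The rule probabilities NP -> Det N [0.3] and VP -> V NP [0.2] are read off the computations in the matrix; treat them (and the lexical probabilities) as coming from the toy grammar segment, not as general facts.

```python
# Probabilities read off the matrix above (toy grammar assumption)
p_np, p_vp = 0.3, 0.2              # NP -> Det N, VP -> V NP
np_02 = p_np * 0.4 * 0.02          # [0,2] "the flight":     0.0024
np_35 = p_np * 0.4 * 0.01          # [3,5] "a meal":         0.0012
vp_25 = p_vp * 0.05 * np_35        # [2,5] "includes a meal": 0.000012
print(round(np_02, 6), round(np_35, 6), round(vp_25, 8))
```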

Probabilistic Parser Development Paradigm Training: (large) set of sentences with associated parses (a treebank), e.g., the Wall Street Journal section of the Penn Treebank, sections 2-21 (39,830 sentences); used to estimate rule probabilities. Development (dev): (small) set of sentences with associated parses (WSJ, section 22); used to tune/verify the parser and check for overfitting, etc. Test: (small-medium) set of sentences with parses (WSJ, section 23; 2416 sentences); held out, used for final evaluation.
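Estimating rule probabilities from the training treebank is maximum likelihood estimation: P(A -> beta) = Count(A -> beta) / Count(A). A minimal sketch, assuming rules have already been extracted from the gold trees as (lhs, rhs) pairs (that input format is an assumption here):

```python
from collections import Counter

def estimate_rule_probs(rules):
    """MLE rule probabilities: P(A -> beta) = Count(A -> beta) / Count(A).

    rules: list of (lhs, rhs) pairs read off treebank parse trees,
           e.g. ("NP", ("Det", "N")).
    """
    rule_counts = Counter(rules)
    lhs_counts = Counter(lhs for lhs, _ in rules)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}
```

By construction the probabilities for each left-hand side sum to 1.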

Parser Evaluation Assume a 'gold standard' set of parses for the test set. How can we tell how good the parser is? How can we tell how good a parse is? Maximally strict: identical to the gold standard. Partial credit: constituents in the output match those in the reference (same start point, end point, and non-terminal symbol).

Parseval How can we compute a parse score from constituents? Multiple measures: Labeled recall (LR) = (# of correct constituents in hypothesis parse) / (# of constituents in reference parse). Labeled precision (LP) = (# of correct constituents in hypothesis parse) / (# of total constituents in hypothesis parse).

Parseval (cont'd) F-measure combines precision and recall: F = (1 + β²) * P * R / (β² * P + R); the F1-measure sets β = 1. Crossing brackets: # of constituents where the reference parse has bracketing ((A B) C) and the hypothesis has (A (B C)).

Precision and Recall Gold standard: (S (NP (A a)) (VP (B b) (NP (C c)) (PP (D d)))) Hypothesis: (S (NP (A a)) (VP (B b) (NP (C c) (PP (D d))))) G: S(0,4) NP(0,1) VP(1,4) NP(2,3) PP(3,4) H: S(0,4) NP(0,1) VP(1,4) NP(2,4) PP(3,4) LP: 4/5 LR: 4/5 F1: 4/5
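The computation above is just set intersection over labeled spans. A small sketch, encoding each constituent as a (label, start, end) tuple (that encoding is an assumption of this sketch; evalb-style tools work on bracketed trees directly):

```python
def parseval(gold, hyp):
    """Labeled precision, recall, and F1 over sets of constituents,
    each represented as a (label, start, end) tuple."""
    correct = len(gold & hyp)
    lp = correct / len(hyp)
    lr = correct / len(gold)
    f1 = 2 * lp * lr / (lp + lr) if lp + lr else 0.0
    return lp, lr, f1

# The example above: NP(2,3) vs NP(2,4) is the single mismatch
gold = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 3), ("PP", 3, 4)}
hyp  = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4), ("PP", 3, 4)}
```

Running `parseval(gold, hyp)` reproduces the slide's LP = LR = F1 = 4/5.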

State-of-the-Art Parsing Parsers trained/tested on Wall Street Journal PTB LR: 90%; LP: 90%; Crossing brackets: 1% Standard implementation of Parseval: evalb

Evaluation Issues Constituents? Other grammar formalisms (LFG, dependency structure, ...) require conversion to PTB format. Extrinsic evaluation: how well does this match semantics, etc.?

Parser Issues PCFGs make many (unwarranted) independence assumptions. Structural dependency: NP -> Pronoun is much more likely in subject position. Lexical dependency: verb subcategorization; coordination ambiguity.

Improving PCFGs: Structural Dependencies How can we capture the subject/object asymmetry? E.g., NP_subj -> Pron vs. NP_obj -> Pron. Parent annotation: annotate each node with its parent in the parse tree, e.g., NP^S vs. NP^VP. Also annotate pre-terminals: RB^ADVP vs. RB^VP, IN^SBAR vs. IN^PP. Can also split rules on other conditions.
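Parent annotation is a simple tree transform. A sketch, assuming trees are nested (label, children...) tuples with plain strings as words (that representation is an assumption, not the lecture's):

```python
def parent_annotate(tree, parent=None):
    """Split each non-terminal on its parent category, e.g. NP -> NP^S
    vs. NP^VP; pre-terminals are annotated too (e.g. RB^ADVP vs. RB^VP).
    Trees are (label, children...) tuples; leaves are word strings."""
    if isinstance(tree, str):          # a word: leave unchanged
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, *(parent_annotate(c, label) for c in children))
```

Applied to (S (NP (PRP he)) (VP (VBD ran))), this yields (S (NP^S (PRP^NP he)) (VP^S (VBD^VP ran))): the subject NP is now a different symbol from an object NP, so the grammar can learn different expansion probabilities for each.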

Parent Annotation

Parent Annotation: Pre-terminals

Parent Annotation Advantages: captures structural dependency in grammars. Disadvantages: increases the number of rules in the grammar; decreases the amount of training data per rule. Strategies exist to search for the optimal # of rules.

Improving PCFGs: Lexical Dependencies Lexicalized rules (best-known parsers: the Collins and Charniak parsers): each non-terminal is annotated with its lexical head, e.g., the verb for a verb phrase, the noun for a noun phrase. Each rule must identify an RHS element as head; heads propagate up the tree. Conceptually like adding 1 rule per head value: VP(dumped) -> VBD(dumped) NP(sacks) PP(into); VP(dumped) -> VBD(dumped) NP(cats) PP(into).

Lexicalized PCFGs Also add a head tag to non-terminals; the head tag is the part-of-speech tag of the head word: VP(dumped) -> VBD(dumped) NP(sacks) PP(into) becomes VP(dumped,VBD) -> VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN). Two types of rules: lexical rules (pre-terminal -> word), which are deterministic with probability 1; internal rules (all other expansions), whose probabilities must be estimated.
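Head annotation with propagation can be sketched as a tree transform. The head table below is hypothetical and tiny; real parsers use Collins-style head-finding rules covering the whole treebank grammar:

```python
# Hypothetical head table: which child position heads each rule expansion
HEADS = {("VP", ("VBD", "NP", "PP")): 0,   # the verb heads the VP
         ("VP", ("VBD", "NP")): 0,
         ("NP", ("Det", "N")): 1,          # the noun heads the NP
         ("PP", ("P", "NP")): 0}

def head_word(tree):
    """Find a constituent's head word by recursing into its head child."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]                 # pre-terminal: its word is the head
    idx = HEADS.get((label, tuple(c[0] for c in children)), 0)
    return head_word(children[idx])

def lexicalize(tree):
    """Annotate every node with its head word, e.g. VP -> VP(dumped),
    so heads propagate up the tree as on the slide."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (f"{label}({children[0]})", children[0])
    return (f"{label}({head_word(tree)})", *(lexicalize(c) for c in children))
```

On (VP (VBD dumped) (NP (Det the) (N sacks))) this produces VP(dumped) -> VBD(dumped) NP(sacks), matching the rule schema above; adding the head POS tag alongside the word is a straightforward extension.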

PLCFGs Issue: too many rules; no way to find a corpus with enough examples. (Partial) solution: assume independence. Condition the rule on the category of the LHS and its head; condition the head on the category of the LHS and the parent's head.

Disambiguation Example