
Probabilistic and Lexicalized Parsing CS 4705

Probabilistic CFGs: PCFGs
Weighted CFGs
–Attach weights to rules of CFG
–Compute weights of derivations
–Use weights to choose preferred parses
Utility: pruning and ordering the search space, disambiguation, language model for ASR
Parsing with weighted grammars: find the parse T' that maximizes the weights of the derivations in the parse tree, over all possible parses of S:
T'(S) = argmax_{T ∈ τ(S)} W(T, S)
Probabilistic CFGs are one form of weighted CFG

Rule Probability
Attach probabilities to grammar rules; expansions for a given non-terminal sum to 1
–R1: VP → V        .55
–R2: VP → V NP     .40
–R3: VP → V NP NP  .05
Estimate probabilities from annotated corpora
–E.g. the Penn Treebank
–P(R1) = count(R1) / count(VP)
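A minimal sketch of this maximum-likelihood estimate, assuming rule occurrences have already been read off treebank trees as (LHS, RHS) pairs; the toy data below is invented so that the VP numbers match R1–R3 above:

    from collections import Counter

    def estimate_rule_probs(rule_occurrences):
        """Maximum-likelihood rule probabilities:
        P(LHS -> RHS) = count(LHS -> RHS) / count(LHS)."""
        rule_counts = Counter(rule_occurrences)
        lhs_counts = Counter(lhs for lhs, _ in rule_occurrences)
        return {(lhs, rhs): c / lhs_counts[lhs]
                for (lhs, rhs), c in rule_counts.items()}

    # Hypothetical corpus containing 20 VP expansions
    occurrences = ([("VP", ("V",))] * 11 +
                   [("VP", ("V", "NP"))] * 8 +
                   [("VP", ("V", "NP", "NP"))] * 1)
    probs = estimate_rule_probs(occurrences)
    # probs[("VP", ("V", "NP"))] == 0.4, matching R2 above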

Derivation Probability
For a derivation T = {R_1 … R_n}:
–Probability of the derivation: product of the probabilities of the rules expanded in the tree, P(T) = ∏_{i=1..n} P(R_i)
–Most likely parse: T'(S) = argmax_{T ∈ τ(S)} P(T, S)
–Probability of a sentence: sum over all possible derivations for the sentence, P(S) = Σ_{T ∈ τ(S)} P(T, S)
Note the independence assumption: parse probability does not change based on where in the derivation a rule is expanded.
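A small sketch of the derivation probability under this independence assumption: the parse probability is just the product of the rule probabilities, usually computed in log space to avoid underflow (the rule names and probabilities below are illustrative, not from a real grammar):

    import math

    def derivation_log_prob(rules_used, rule_probs):
        # log P(T) = sum of log P(R_i) over the rules expanded in the tree
        return sum(math.log(rule_probs[r]) for r in rules_used)

    rule_probs = {"S->NP VP": 1.0, "VP->V NP": 0.4,
                  "NP->John": 0.1, "NP->Mary": 0.1, "V->called": 0.3}
    logp = derivation_log_prob(
        ["S->NP VP", "NP->John", "VP->V NP", "V->called", "NP->Mary"],
        rule_probs)
    prob = math.exp(logp)   # P(T) as a product of rule probabilities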

One Approach: CYK Parser
Bottom-up parsing via dynamic programming
–Assign probabilities to constituents as they are completed and placed in a table
–Use the maximum probability for each constituent type going up the tree to S
The intuition:
–We know probabilities for constituents lower in the tree, so as we construct higher-level constituents we don't need to recompute these

CYK (Cocke-Younger-Kasami) Parser
Bottom-up parser with top-down filtering
Uses dynamic programming to store intermediate results (cf. the Earley algorithm for the top-down case)
Input: PCFG in Chomsky Normal Form
–Rules of the form A → w or A → B C; no ε
Chart: array [i, j, A] holds the probability that non-terminal A spans input i–j
–Start state(s): (i, i+1, A) for each A → w_{i+1}
–End state: (1, n, S) where n is the input size
–Next-state rule: (i, k, B) and (k, j, C) ⇒ (i, j, A) if A → B C
Maintain back-pointers to recover the parse
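A minimal sketch of probabilistic CKY along these lines, assuming the grammar is already in CNF and is represented as lexical rules (word → list of (non-terminal, prob)) and binary rules (A, B, C, prob); this is an illustration, not the exact formulation on the slide:

    from collections import defaultdict

    def pcky_parse(words, lexical, binary, start="S"):
        """Probabilistic CKY for a PCFG in Chomsky Normal Form.
        Returns (probability, tree) of the best parse rooted in `start`,
        or (0.0, None) if the input has no parse."""
        n = len(words)
        table = defaultdict(dict)   # table[(i, j)][A] = best prob of A over words[i:j]
        back = defaultdict(dict)    # back-pointers to recover the best tree

        # Base case: A -> w fills the width-1 cells
        for i, w in enumerate(words):
            for A, p in lexical.get(w, []):
                table[(i, i + 1)][A] = p
                back[(i, i + 1)][A] = w

        # Recursive case: A -> B C combines two adjacent smaller spans
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width
                for k in range(i + 1, j):
                    for A, B, C, p in binary:
                        if B in table[(i, k)] and C in table[(k, j)]:
                            cand = p * table[(i, k)][B] * table[(k, j)][C]
                            if cand > table[(i, j)].get(A, 0.0):
                                table[(i, j)][A] = cand
                                back[(i, j)][A] = (k, B, C)

        def build(i, j, A):
            bp = back[(i, j)][A]
            if isinstance(bp, str):      # width-1 cell: lexical rule
                return (A, bp)
            k, B, C = bp
            return (A, build(i, k, B), build(k, j, C))

        if start not in table[(0, n)]:
            return 0.0, None
        return table[(0, n)][start], build(0, n, start)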

Structural Ambiguity
Grammar:
–S → NP VP        VP → V NP        VP → VP PP
–NP → NP PP       PP → P NP
–NP → John | Mary | Denver        V → called        P → from
Sentence: John called Mary from Denver
[Two parse trees: one attaches the PP "from Denver" to the VP (modifying "called"), the other attaches it to the NP (modifying "Mary").]
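As a usage example, the pcky_parse sketch above can be run on this toy grammar; the rule probabilities here are made up for illustration (the slide gives none), chosen so that each non-terminal's expansions sum to 1:

    lexical = {"John":   [("NP", 0.25)],
               "Mary":   [("NP", 0.25)],
               "Denver": [("NP", 0.30)],
               "called": [("V", 1.0)],
               "from":   [("P", 1.0)]}
    binary = [("S",  "NP", "VP", 1.0),
              ("VP", "V",  "NP", 0.7),
              ("VP", "VP", "PP", 0.3),
              ("NP", "NP", "PP", 0.2),
              ("PP", "P",  "NP", 1.0)]
    prob, tree = pcky_parse("John called Mary from Denver".split(), lexical, binary)
    # With these weights the VP-attachment parse wins:
    # P(VP->VP PP) * P(VP->V NP) = 0.3 * 0.7 = 0.21 beats
    # P(VP->V NP) * P(NP->NP PP) = 0.7 * 0.2 = 0.14 for NP attachment.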

Example
[Empty CYK chart for "John called Mary from Denver"; the words label the columns.]

Base Case: A → w
[The width-1 cells of the chart are filled from the lexical rules: John → NP, called → V, Mary → NP, from → P, Denver → NP.]

Recursive Cases: A → B C
[Cells for longer spans are filled by combining pairs of adjacent smaller spans; spans with no licensed non-terminal are marked X.]

[A sequence of slides fills the chart cell by cell: VP "called Mary" from V + NP; PP "from Denver" from P + NP; S "John called Mary" from NP + VP; NP "Mary from Denver" from NP + PP; then two competing VPs over "called Mary from Denver" (VP1 via VP → VP PP, VP2 via VP → V NP); finally S spans the whole sentence, keeping the higher-probability VP analysis.]

Problems with PCFGs
Probability model just based on rules in the derivation.
Lexical insensitivity:
–Doesn't use words in any real way
–But structural disambiguation is lexically driven
–PP attachment often depends on the verb, its object, and the preposition: I ate pickles with a fork. / I ate pickles with relish.
Context insensitivity of the derivation
–Doesn't take into account where in the derivation a rule is used
–Pronouns more often subjects than objects: She hates Mary. / Mary hates her.
Solution: Lexicalization
–Add lexical information to each rule
–I.e., condition the rule probabilities on the actual words

An Example: Phrasal Heads
Phrasal heads can 'take the place of' whole phrases, defining the most important characteristics of the phrase
Phrases are generally identified by their heads
–The head of an NP is a noun, of a VP the main verb, of a PP the preposition
Each PCFG rule's LHS shares a lexical item with a non-terminal in its RHS
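A minimal sketch of head percolation using a toy head-rule table (these rules are simplified assumptions; real systems use the full Magerman/Collins head tables):

    # For each phrase type, the child categories that can supply the head,
    # in priority order (a small illustrative subset).
    HEAD_RULES = {
        "NP": ["NN", "NNS", "NNP", "NP"],
        "VP": ["VBD", "VBZ", "VB", "VP"],
        "PP": ["IN"],
        "S":  ["VP"],
    }

    def find_head(label, children):
        """children: list of (child_label, child_head_word).
        Returns the head word percolated up to this phrase."""
        for cat in HEAD_RULES.get(label, []):
            for child_label, child_head in children:
                if child_label == cat:
                    return child_head
        return children[-1][1]    # fallback: head of the last child

    # VP covering "dumped sacks into a bin": the verb supplies the head
    find_head("VP", [("VBD", "dumped"), ("NP", "sacks"), ("PP", "into")])
    # -> "dumped"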

Increase in Size of Rule Set in Lexicalized CFG
If R is the set of binary branching rules in the CFG and ∑ is the lexicon, the lexicalized rule set is O(2·|∑|·|R|)
For unary rules: O(|∑|·|R|)

Example (correct parse)
[Figure: lexicalized ("attribute grammar" style) parse tree, with the head word of each constituent annotated on its non-terminal.]

Example (less preferred)

Computing Lexicalized Rule Probabilities
We started with rule probabilities as before
–VP → V NP PP with P(rule | VP)
–E.g., the count of this rule divided by the number of VPs in a treebank
Now we want lexicalized probabilities
–VP(dumped) → V(dumped) NP(sacks) PP(into)
–i.e., P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ into is the head of the PP)
–Not likely to have significant counts in any treebank

Exploit the Data You Have
So, exploit the independence assumption and collect the statistics you can…
Focus on capturing
–Verb subcategorization: particular verbs have affinities for particular VPs
–Objects' affinity for their predicates (mostly their mothers and grandmothers): some objects fit better with some predicates than others

Verb Subcategorization
Condition particular VP rules on their heads
–E.g. for a rule r: VP → V NP PP, P(r | VP) becomes P(r | VP, head = dumped)
–How do you get the probability? The number of times rule r was used with head dumped, divided by the total number of VPs headed by dumped
–How predictive of r is the verb dumped?
Captures the affinity between VP heads (verbs) and VP rules
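A small sketch of that relative-frequency estimate; the counts below are invented for illustration:

    from collections import Counter

    def subcat_prob(rule, head, rule_head_counts, head_counts):
        # P(r | VP, head) = count(r used with this head) / count(VPs headed by this head)
        if head_counts[head] == 0:
            return 0.0
        return rule_head_counts[(rule, head)] / head_counts[head]

    # Hypothetical treebank counts for VPs headed by "dumped"
    rule_head_counts = Counter({("VP->V NP PP", "dumped"): 6,
                                ("VP->V NP",    "dumped"): 3,
                                ("VP->V PP",    "dumped"): 1})
    head_counts = Counter({"dumped": 10})
    subcat_prob("VP->V NP PP", "dumped", rule_head_counts, head_counts)
    # -> 0.6: "dumped" strongly prefers the V NP PP frame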

Example (correct parse)

Example (less preferred)

Affinity of Phrasal Heads for Other Heads: PP Attachment
Verbs with prepositions vs. nouns with prepositions
E.g. dumped with into vs. sacks with into
–How often is dumped the head of a VP that includes a PP daughter with into as its head, relative to other PP heads? I.e., P(into | PP, dumped is the mother VP's head)
–Vs. how often is sacks the head of an NP with a PP daughter whose head is into, relative to other PP heads? I.e., P(into | PP, sacks is the mother's head)
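A hedged sketch of using these two conditional probabilities to choose an attachment; the counting scheme and the numbers are illustrative assumptions, not the slides' exact model:

    def attach_to_verb(prep, verb, noun, counts):
        """Prefer verb attachment when the preposition is more likely
        given the verb head than given the noun head.
        counts[(head, prep)]: PP daughters headed by `prep` under `head`;
        counts[head]: total PP daughters under `head`."""
        p_verb = counts.get((verb, prep), 0) / max(counts.get(verb, 0), 1)
        p_noun = counts.get((noun, prep), 0) / max(counts.get(noun, 0), 1)
        return p_verb >= p_noun

    # Hypothetical counts: "dumped ... into" common, "sacks ... into" rare
    counts = {("dumped", "into"): 8, "dumped": 10,
              ("sacks", "into"): 1, "sacks": 12}
    attach_to_verb("into", "dumped", "sacks", counts)   # True: attach the PP to the VP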

But Other Relationships Do Not Involve Heads (Hindle & Rooth '91)
The affinity of gusto for ate is greater than for spaghetti; the affinity of marinara for spaghetti is greater than for ate
[Trees: "ate spaghetti with marinara", where the PP attaches to the NP spaghetti, vs. "ate spaghetti with gusto", where the PP attaches to the VP ate.]

Log-linear Models for Parsing
Why restrict the conditioning to the elements of a rule?
–Use even larger context: word sequence, word types, sub-tree context, etc.
Compute P(y|x) ∝ exp(Σ_i λ_i f_i(x, y)), where f_i(x, y) tests a property of the context and λ_i is the weight of that feature
Use these as scores in the CKY algorithm to find the best parse
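A minimal sketch of a log-linear scorer over candidate analyses; the feature names and weights are invented purely for illustration:

    import math

    def loglinear_probs(candidates, features, weights):
        # P(y|x) proportional to exp(sum_i lambda_i * f_i(x, y))
        scores = [sum(weights.get(name, 0.0) * val
                      for name, val in features(y).items())
                  for y in candidates]
        z = sum(math.exp(s) for s in scores)        # normalization constant
        return [math.exp(s) / z for s in scores]

    # Two candidate attachments for "ate spaghetti with gusto"
    weights = {"pp_attaches_to_verb": 1.2, "verb_prep_pair_seen": 0.8}
    def features(y):
        if y == "vp_attach":
            return {"pp_attaches_to_verb": 1.0, "verb_prep_pair_seen": 1.0}
        return {}
    probs = loglinear_probs(["vp_attach", "np_attach"], features, weights)
    # probs[0] > probs[1]: the VP attachment is preferred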

Supertagging: Almost Parsing
Example: Poachers now control the underground trade
[Figure: each word is associated with a set of candidate elementary trees (supertags), e.g. NP and S/VP trees for poachers, control, and trade, and adverb trees for now; choosing the correct supertag for each word does most of the work of parsing.]

Summary
Parsing context-free grammars
–Top-down and bottom-up parsers
–Mixed approaches (CKY, Earley parsers)
Preferences over parses using probabilities
–Parsing with PCFGs and the probabilistic CKY algorithm
Enriching the probability model
–Lexicalization
–Log-linear models for parsing
–Supertagging