December 2004 · CSA3050: Natural Language Algorithms · Probabilistic Phrase Structure Grammars (PCFGs)



Slide 2: Handling Ambiguities

The Earley algorithm is equipped to represent ambiguities efficiently, but not to resolve them. Methods available for resolving ambiguities include:
– Semantics: choose the parse that makes sense.
– Statistics: choose the parse that is most likely.
Probabilistic context-free grammars (PCFGs) offer a solution.

Slide 3: PCFG

A PCFG is a 5-tuple (NT, T, P, S, D) where D is a function that assigns a probability to each rule p ∈ P. A PCFG augments each rule with a conditional probability:

A → β [p]

Formally, p is the probability of a given expansion given the LHS non-terminal, i.e. P(A → β | A).

Slide 4: Example PCFG

[figure not preserved in the transcript]

Slide 5: Example PCFG Fragment

S → NP VP      [.80]
S → Aux NP VP  [.15]
S → VP         [.05]

The conditional probabilities for a given LHS non-terminal A ∈ NT sum to 1. A PCFG can be used to estimate the probability of each parse tree for a sentence S.

Slide 6: Probability of a Parse Tree

For sentence S, the probability assigned by a PCFG to a parse tree T is given by

P(T) = ∏_{n ∈ T} P(r(n))

where n ranges over the nodes of T and r(n) is the production rule that produced n, i.e. the product of the probabilities of all the rules r used to expand each node n in T.
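The product over nodes can be computed directly by walking a tree and multiplying rule probabilities. A minimal sketch, assuming a toy rule table and a nested-tuple tree encoding (both illustrative, not from the slides):

```python
# Probability of a parse tree under a PCFG: multiply the probability
# of the rule applied at every internal node (hypothetical toy grammar).
rule_prob = {
    ("S", ("NP", "VP")): 0.80,
    ("NP", ("Pron",)): 0.35,
    ("NP", ("N",)): 0.25,
    ("VP", ("V", "NP")): 0.20,
    ("Pron", ("he",)): 0.05,
    ("V", ("saw",)): 0.10,
    ("N", ("Mr. Bush",)): 0.01,
}

def tree_prob(tree):
    """Tree is (label, child1, ..., childk); leaves are plain strings."""
    if isinstance(tree, str):          # a word: no rule applied here
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]        # probability of the rule at this node
    for c in children:
        p *= tree_prob(c)              # product over all nodes n in T
    return p

t = ("S", ("NP", ("Pron", "he")),
          ("VP", ("V", "saw"), ("NP", ("N", "Mr. Bush"))))
print(tree_prob(t))                    # product of the seven rule probabilities
```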

Slide 7: Ambiguous Sentence

P(T_L) = 1.5 × 10⁻⁶
P(T_R) = 1.7 × 10⁻⁶
P(S) = 3.2 × 10⁻⁶

[figure of the two parse trees not preserved in the transcript]

Slide 8: The Parsing Problem for PCFGs

The parsing problem for PCFGs is to produce the most likely parse for a given sentence, i.e. to compute the parse tree T spanning the sentence whose probability is maximal. The CYK algorithm assumes that the grammar is in Chomsky Normal Form:
– no ε-productions
– rules of the form A → B C or A → a

Slide 9: CKY Algorithm – Base Case

Base case: covering input strings of length 1 (i.e. individual words). In CNF, the probability p has to come from that of the corresponding lexical rule A → w [p].

Slide 10: CKY Algorithm – Recursive Case

Recursive case: input strings of length > 1. A ⇒* w_ij if and only if there is a rule A → B C and some k with i < k < j such that B derives w_ik and C derives w_kj. In this case P(w_ij) is obtained by multiplying together P(w_ik) and P(w_kj). These probabilities are found in other parts of the table; take the maximum value over all such splits.

Slide 11: Probabilistic CKY Algorithm for a Sentence of Length n

1. for k := 1 to n do
2.   π[k−1, k, A] := P(A → w_k)
3.   for i := k−2 downto 0 do
4.     for j := k−1 downto i+1 do
5.       π[i, k, A] := max { π[i, j, B] × π[j, k, C] × P(A → BC) } for each A → BC ∈ G
6. return π[0, n, S]

Slide 12: [diagram not preserved in the transcript: the span w_i..k is split at position j into w_i..j and w_j..k]
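The probabilistic CKY recurrence above can be made concrete in a few lines. A runnable sketch for a toy CNF grammar (the grammar, its probabilities, and the dictionary encoding are illustrative assumptions):

```python
from collections import defaultdict

# Toy CNF grammar (illustrative, not from the slides).
# Binary rules: (B, C) -> list of (A, P(A -> B C)); lexical rules: word -> (A, p).
binary = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("Det", "N"): [("NP", 0.6)],
}
lexical = {
    "he": [("NP", 0.4)],
    "saw": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "man": [("N", 1.0)],
}

def cky(words):
    n = len(words)
    pi = defaultdict(float)            # pi[i, k, A] = best prob. of A => w_i..k
    for k, w in enumerate(words, start=1):
        for A, p in lexical[w]:
            pi[k - 1, k, A] = p        # base case: A -> w_k
    for span in range(2, n + 1):       # widths 2..n
        for i in range(0, n - span + 1):
            k = i + span
            for j in range(i + 1, k):  # split point between B and C
                for (B, C), lhss in binary.items():
                    left, right = pi[i, j, B], pi[j, k, C]
                    if left and right:
                        for A, p in lhss:
                            cand = left * right * p
                            if cand > pi[i, k, A]:   # keep the max over splits
                                pi[i, k, A] = cand
    return pi[0, n, "S"]

print(cky(["he", "saw", "the", "man"]))
```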

Slide 13: Probabilistic Earley

Non-probabilistic completer:
  procedure Completer((B → Z •, [j,k]))
    for each (A → X • B Y, [i,j]) in chart[j] do
      enqueue((A → X B • Y, [i,k]), chart[k])

Probabilistic completer:
  procedure Completer((B → Z •, [j,k], P_jk))
    for each (A → X • B Y, [i,j], P_ij) in chart[j] do
      enqueue((A → X B • Y, [i,k], P_ij × P_jk), chart[k])

Slide 14: Discovery of Probabilities – Normal Rules

Use a corpus of already-parsed sentences, e.g. the Penn Treebank (Marcus et al. 1993):
– parse trees for the 1M-word Brown Corpus
– skeleton parsing: partial parses, leaving out the "hard" decisions (such as PP-attachment)
Parse the corpus and take statistics; this has to account for ambiguity.

P(α → β | α) = C(α → β) / Σ_γ C(α → γ) = C(α → β) / C(α)
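The count-based estimate above can be computed by walking treebank trees and tallying rule and LHS occurrences. A minimal sketch, assuming the same nested-tuple tree encoding as before (the two tiny trees are illustrative):

```python
from collections import Counter

def count_rules(tree, rule_counts, lhs_counts):
    """Accumulate C(alpha -> beta) and C(alpha) from one parse tree.
    Trees are (label, child1, ..., childk); leaves are plain strings."""
    if isinstance(tree, str):
        return
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1
    for c in children:
        count_rules(c, rule_counts, lhs_counts)

def estimate(trees):
    rule_counts, lhs_counts = Counter(), Counter()
    for t in trees:
        count_rules(t, rule_counts, lhs_counts)
    # P(alpha -> beta | alpha) = C(alpha -> beta) / C(alpha)
    return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}

trees = [
    ("S", ("NP", "he"), ("VP", ("V", "saw"), ("NP", "it"))),
    ("S", ("NP", "she"), ("VP", ("V", "left"))),
]
probs = estimate(trees)
print(probs[("VP", ("V", "NP"))])   # VP -> V NP occurs in 1 of 2 VP expansions
```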

Slide 15: Penn Treebank – Example 1

((S (NP (NP Pierre Vinken) ,
        (ADJP (NP 61 years) old , ))
    will
    (VP join
        (NP the board)
        (PP as (NP a nonexecutive director))
        (NP Nov 29)))
 .)

Slide 16: Penn Treebank – Example 2

((S (NP (DT The) (NNP Fulton) (NNP County) (NNP Grand) (NNP Jury))
    (VP (VBD said)
        (NP (NNP Friday))
        (SBAR (-NONE- 0)
              (S (NP (DT an) (NN investigation)
                     (PP (IN of)
                         (NP (NP (NNP Atlanta)) (POS 's)
                             (JJ recent) (JJ primary) (NN election))))
                 (VP (VBD produced)
                     (NP (OQUOTE OQUOTE) (DT no) (NN evidence) (CQUOTE CQUOTE)
                         (SBAR (IN that)
                               (S (NP (DT any) (NNS irregularities))
                                  (VP (VBD took) (NP (NN place)))))))))))
 (PERIOD PERIOD))

Slide 17: Problems with PCFGs

Fundamental independence assumption: a PCFG assumes that the expansion of any one non-terminal is independent of the expansion of any other non-terminal. Hence rule probabilities are simply multiplied together. This assumption is not always realistic, however.

Slide 18: Problems with PCFGs (continued)

PCFGs have difficulty representing dependencies between parse-tree nodes:
– Structural dependencies: between the expansion of a node N and anything above N in the parse tree, such as an ancestor M.
– Lexical dependencies: between the expansion of a node N and occurrences of particular words in the text segments dominated by N.

Slide 19: Tree Dependencies

[figure not preserved in the transcript: a tree in which node M spans positions p..s and dominates node N spanning q..r]

Slide 20: Structural Dependency

Examination of text corpora has shown (Kuno 1972) that in English and other languages there is a strong tendency (c. 2:1) for the subject of a sentence to be a pronoun:
– "She's able to take her baby to work" versus
– "Joanna worked until she had a family"
whilst the object tends to be a non-pronoun:
– "All the people signed the confessions" versus
– "Some laws prohibit it"

Slide 21: Expansion Sometimes Depends on Ancestor Nodes

[tree diagram not preserved in the transcript: in "he saw Mr. Bush", the subject NP expands to Pron ("he") while the object NP expands to N ("Mr. Bush")]

Slide 22: Dependencies Cannot Be Stated

These dependencies could be captured if it were possible to say that the probabilities associated with, e.g.,
NP → Pron
NP → N
depend on whether the NP is a subject or an object. However, this cannot normally be said in a standard PCFG.

Slide 23: Lexical Dependencies

Consider the sentence "Moscow sent soldiers into Afghanistan." Suppose the grammar includes
NP → NP PP
VP → VP PP
There will typically be two parse trees.

Slide 24: PP Attachment Ambiguity

[two parse trees not preserved in the transcript for "Moscow sent soldiers into Afghanistan": one attaching the PP to the NP, one to the VP]
67% of PPs attach to NPs; 33% of PPs attach to VPs.

Slide 25: PP Attachment Ambiguity (continued)

[two parse trees not preserved in the transcript for "Moscow sent soldiers from Afghanistan": one attaching the PP to the NP, one to the VP]
67% of PPs attach to NPs; 33% of PPs attach to VPs.

Slide 26: Lexical Properties

Raw statistics on the use of these two rules suggest
NP → NP PP (67%)
VP → VP PP (33%)
In this case the raw statistics are misleading and yield the wrong conclusion. The correct parse should be decided on the basis of the lexical properties of the verb "send ... into" alone, since we know that the basic pattern for this verb is
(NP) send (NP) (PP_into)
where the PP_into attaches to the VP.

Slide 27: Lexicalised PCFGs

Basic idea: each syntactic constituent is associated with a head, which is a single word, and each non-terminal in a parse tree is annotated with that word. See Michael Collins (1999), Head-Driven Statistical Models for Natural Language Parsing, PhD thesis (available on the author's website).

Slide 28: Lexicalised Tree

[figure not preserved in the transcript]

Slide 29: Generating Lexicalised Parse Trees

To generate such a tree, each rule must identify exactly one right-hand-side constituent as the head daughter. The headword of a node is then inherited from the headword of its head daughter. In the case of a lexical item, the head is clearly the item itself (though the word might undergo minor inflectional modification).

Slide 30: Finding the Head Constituent

In some cases this is very easy, e.g.
NP[N] → Det N    (the man)
VP[V] → V NP     (... asked John)
In other cases it isn't:
PP[?] → P NP     (to London)
Many modern linguistic theories include a component that defines what heads are.
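The head-daughter mechanism described above can be sketched as a table mapping each rule to the index of its head daughter, with headwords percolated upward. The head table and tree encoding are illustrative assumptions, not the slides' own:

```python
# Sketch of head-daughter selection: each rule names which RHS
# constituent supplies the headword (hypothetical head table).
head_daughter = {
    ("NP", ("Det", "N")): 1,   # N is the head daughter
    ("VP", ("V", "NP")): 0,    # V is the head daughter
    ("VP", ("V",)): 0,
    ("S", ("NP", "VP")): 1,    # VP is the head daughter
}

def annotate(tree):
    """Return (label, headword, children) with heads percolated upward.
    Input trees are (label, child...) with string leaves."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0], children)      # lexical item: head is itself
    kids = [annotate(c) for c in children]
    rhs = tuple(k[0] for k in kids)
    head = kids[head_daughter[(label, rhs)]][1]    # inherit head daughter's head
    return (label, head, kids)

t = ("S", ("NP", ("Det", "the"), ("N", "man")),
          ("VP", ("V", "slept")))
root = annotate(t)
print(root[1])   # headword of S, inherited via the VP
```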

Slide 31: Discovery of Probabilities – Lexicalised Rules

We need to establish individual probabilities of, e.g.,
VP(dumped) → V(dumped) NP(sacks) PP(into)
VP(dumped) → V(dumped) NP(cats) PP(into)
VP(dumped) → V(dumped) NP(hats) PP(into)
VP(dumped) → V(dumped) NP(sacks) PP(above)
Problem: no corpus is big enough to train with this number of rules (nearly all of them would have zero counts). We need to make independence assumptions that allow counts to be clustered. Which independence assumptions?

Slide 32: Charniak's (1997) Approach

In a normal PCFG, the probability of an expansion is conditioned only on the syntactic category, i.e. p(r(n) | c(n)). Charniak also conditioned the probability of a given rule expansion on the head of the non-terminal:
p(r(n) | c(n), h(n))
N.B. This approach pools the statistics of all the individual rules on the previous slide together, i.e. as
VP(dumped) → V NP PP
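Conditioning on the head as well as the category amounts to a relative-frequency estimate over (category, head) contexts. A minimal sketch, with invented observation tuples standing in for treebank counts:

```python
from collections import Counter

# Sketch of Charniak-style estimation: condition the expansion of a node
# on both its category c(n) and its headword h(n). The observed
# (category, head, rhs) tuples below are illustrative, not real counts.
observations = [
    ("VP", "dumped", ("V", "NP", "PP")),
    ("VP", "dumped", ("V", "NP", "PP")),
    ("VP", "dumped", ("V", "NP")),
    ("VP", "slept", ("V",)),
]

expansion = Counter((c, h, rhs) for c, h, rhs in observations)
context = Counter((c, h) for c, h, _ in observations)

def p_rule(c, h, rhs):
    """p(r(n) | c(n), h(n)) by relative frequency."""
    return expansion[(c, h, rhs)] / context[(c, h)]

print(p_rule("VP", "dumped", ("V", "NP", "PP")))   # 2 of 3 VP(dumped) expansions
```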

Slide 33: Probability of the Head

Now that we have added heads as a conditioning factor, we must also decide how to compute the probability of a head. The null assumption, that all heads are equally probable, is unrealistic (different verbs have different frequencies of occurrence). Charniak therefore adopted a better assumption: the probability of a node n having head h depends on
– the syntactic category of n, and
– the head of n's mother.

Slide 34: Including the Head of the Mother

So instead of equal probabilities for all heads, we have
p(h(n) = wordᵢ | c(n), h(m(n)))
Relating this to the circled node in the previous figure:
p(h(n) = sacks | c(n) = NP, h(m(n)) = dumped)

Slide 35: Probability of a Complete Parse

Standard PCFG: for sentence S, the probability assigned by a PCFG to a parse tree T was given by
P(T) = ∏_{n ∈ T} P(r(n))
where n is a node of T and r(n) is the production rule that produced n.

Head-driven PCFG: to include the probability of the heads in a complete parse,
P(T) = ∏_{n ∈ T} p(r(n) | c(n), h(n)) × p(h(n) | c(n), h(m(n)))

Slide 36: Evaluating Parsers

Let
A = number of correct constituents in the candidate parse
B = number of constituents in the treebank (gold) parse
C = total number of constituents in the candidate parse
Labelled recall = A / B
Labelled precision = A / C
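The two measures can be computed by comparing constituents as labelled spans. A minimal sketch; the candidate and gold constituent sets are illustrative:

```python
# Labelled precision/recall: a constituent is a (label, start, end) span,
# and a candidate constituent is correct if it appears in the gold parse
# with the same label and span. The two sets below are illustrative.
candidate = {("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 5), ("PP", 3, 5)}
gold      = {("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 3), ("PP", 3, 5)}

correct = candidate & gold                 # A: matching label and span
recall = len(correct) / len(gold)          # A / B
precision = len(correct) / len(candidate)  # A / C

print(precision, recall)
```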