Albert Gatt Corpora and Statistical Methods Lecture 11

Probabilistic Context-Free Grammars and beyond Part 1

Context-free grammars: reminder
Many NLP parsing applications rely on the CFG formalism.
Definition: a CFG is a 4-tuple (N, Σ, P, S):
- N = a set of non-terminal symbols (e.g. NP, VP)
- Σ = a set of terminals (e.g. words); N and Σ are disjoint
- P = a set of productions of the form A → β, where A ∈ N and β ∈ (N ∪ Σ)* (any string of terminals and non-terminals)
- S = a designated start symbol (usually, "sentence")

CFG Example
S → NP VP
S → Aux NP VP
NP → Det Nom
NP → Proper-Noun
Det → that | the | a
…

Probabilistic CFGs
A PCFG is a CFG where each production has an associated probability.
A PCFG is a 5-tuple (N, Σ, P, S, D):
- D: P → [0,1] is a function assigning each rule in P a probability
Usually, probabilities are obtained from a corpus; the most widely used corpus is the Penn Treebank.
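To make the definition concrete, here is a minimal sketch (in Python, not part of the original slides) of estimating D by relative frequency, P(A → β) = count(A → β) / count(A); the rule occurrences below are toy data, not drawn from the Penn Treebank.

```python
from collections import Counter

def estimate_pcfg(rule_occurrences):
    """Relative-frequency estimation: P(A -> beta) = count(A -> beta) / count(A)."""
    lhs_counts = Counter(lhs for lhs, _ in rule_occurrences)   # count(A)
    rule_counts = Counter(rule_occurrences)                    # count(A -> beta)
    return {(lhs, rhs): c / lhs_counts[lhs] for (lhs, rhs), c in rule_counts.items()}

# Rule occurrences as they might be read off treebank trees (toy data).
occurrences = [
    ("S", ("NP", "VP")), ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")),
    ("NP", ("Det", "Nom")), ("NP", ("Det", "Nom")), ("NP", ("Proper-Noun",)),
]
print(estimate_pcfg(occurrences))
# {('S', ('NP', 'VP')): 0.67, ('S', ('Aux', 'NP', 'VP')): 0.33,
#  ('NP', ('Det', 'Nom')): 0.67, ('NP', ('Proper-Noun',)): 0.33}   (approximately)
```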

The Penn Treebank
English sentences annotated with syntax trees, built at the University of Pennsylvania.
About 40,000 sentences (roughly a million words) of text from the Wall Street Journal.
Other treebanks exist for other languages (e.g. NEGRA for German).

Example tree

Building a tree: rules
S → NP VP
NP → NNP NNP
NNP → Mr
NNP → Vinken
…
The resulting tree for "Mr Vinken is chairman of Elsevier":
[S [NP [NNP Mr] [NNP Vinken]] [VP [VBZ is] [NP [NP [NN chairman]] [PP [IN of] [NP [NNP Elsevier]]]]]]

Characteristics of PCFGs
In a PCFG, the probability P(A → β) expresses the likelihood that the non-terminal A will expand as β,
e.g. the likelihood that S → NP VP (as opposed to S → VP, or S → NP VP PP, or …).
It can be interpreted as a conditional probability: the probability of the expansion, given the LHS non-terminal: P(A → β) = P(A → β | A).
Therefore, for any non-terminal A, the probabilities of every rule of the form A → β must sum to 1.
If this is the case, we say the PCFG is consistent.
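As a small illustration of this condition, the following sketch (assuming the toy dictionary-of-rules representation used in the earlier snippet; all names and probabilities are illustrative) checks that the rule probabilities for each left-hand side sum to 1.

```python
from collections import defaultdict

def rules_sum_to_one(pcfg, tol=1e-9):
    """Check that, for every non-terminal A, the probabilities of all
    rules A -> beta sum to 1 (the condition stated on this slide)."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in pcfg.items():
        totals[lhs] += prob
    return {lhs: abs(total - 1.0) < tol for lhs, total in totals.items()}

toy_pcfg = {
    ("S", ("NP", "VP")): 0.7,
    ("S", ("Aux", "NP", "VP")): 0.3,
    ("NP", ("Det", "Nom")): 0.6,
    ("NP", ("Proper-Noun",)): 0.4,
}
print(rules_sum_to_one(toy_pcfg))   # {'S': True, 'NP': True}
```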

Uses of probabilities in parsing
Disambiguation: given n legal parses of a string, which is the most likely? e.g. PP-attachment ambiguity can be resolved this way.
Speed: parsing is a search problem, a search through the space of possible applicable derivations; the search space can be pruned by focusing on the most likely sub-parses of a parse.
The parser can also be used as a model to determine the probability of a sentence, given a parse; a typical use is in speech recognition, where an input utterance can be "heard" as several possible sentences.

Using PCFG probabilities
A PCFG assigns a probability to every parse tree t of a string s, i.e. to every possible parse (derivation) of a sentence recognised by the grammar.
Notation:
- G = a PCFG
- s = a sentence
- t = a particular tree for s under our grammar; t consists of several nodes n, and each node is generated by applying some rule r

Probability of a tree vs. a sentence
P(t, s) is simply the product of the probabilities of every rule (node) that gives rise to t (i.e. of the rules in the derivation of t):
P(t, s) = P(t) = Π over nodes n in t of P(r(n)), where r(n) is the rule applied at node n
This is both the joint probability of t and s, and the probability of t alone. Why?

P(t, s) = P(s | t) P(t) = P(t), because P(s | t) must be 1: the tree t is a parse of all the words of s (its yield is exactly s).
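A minimal sketch of this computation, assuming a toy nested-tuple representation of trees and illustrative rule probabilities (not from any treebank): the probability of a tree is the product of the probabilities of the rules applied at its nodes.

```python
def tree_prob(tree, pcfg):
    """P(t) = product over internal nodes n of P(rule applied at n).
    A tree is (label, child1, child2, ...); a leaf is a bare string."""
    if isinstance(tree, str):          # terminal: no rule applied here
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg[(label, rhs)]             # probability of the rule at this node
    for child in children:
        p *= tree_prob(child, pcfg)
    return p

# Toy grammar and a parse of "Matt walks" (illustrative probabilities only).
pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("NNP",)): 0.4,
    ("VP", ("VBD",)): 0.3,
    ("NNP", ("Matt",)): 1.0,
    ("VBD", ("walks",)): 1.0,
}
t = ("S", ("NP", ("NNP", "Matt")), ("VP", ("VBD", "walks")))
print(tree_prob(t, pcfg))   # 1.0 * 0.4 * 0.3 * 1.0 * 1.0 = 0.12
```

In practice one would sum log probabilities rather than multiply raw probabilities, to avoid numerical underflow on large trees.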

Picking the best parse in a PCFG
A sentence will usually have several parses; we usually want them ranked, or only want the n-best parses.
We therefore need to focus on P(t | s, G), the probability of a parse, given our sentence and our grammar.
Definition of the best parse for s:
best-parse(s) = argmax over t ∈ parses(s) of P(t | s, G)
(Since P(s | G) is constant for a given sentence, this is equivalent to maximising P(t, s | G), i.e. P(t | G).)

Picking the best parse in a PCFG
Problem: t can have multiple derivations (e.g. expand left-corner nodes first, expand right-corner nodes first, etc.), so P(t | s, G) should, in principle, be estimated by summing over all possible derivations.
Fortunately, derivation order makes no difference to the final probabilities, so we can assume a "canonical derivation" d of t and define P(t) =def P(d).

Probability of a sentence
Simply the sum of the probabilities of all parses of that sentence (s is only a sentence if it is recognised by G, i.e. if there is some t for s under G):
P(s) = Σ over all trees t that "yield" s of P(t, s) = Σ over those trees of P(t)

Flaws I: Structural independence
The probability of a rule r expanding node n depends only on n; it is independent of other non-terminals.
Example: P(NP → Pro) is independent of where the NP is in the sentence, but we know that NP → Pro is much more likely in subject position.
Francis et al. (1999), using the Switchboard corpus: 91% of subjects are pronouns; only 34% of objects are pronouns.

Flaws II: Lexical independence
Vanilla PCFGs ignore lexical material, e.g. P(VP → V NP PP) is independent of the head of the NP or PP, or of the lexical head V.
Examples:
- Prepositional phrase attachment preferences depend on lexical items; cf.:
  dump [sacks into a bin]
  dump [sacks] [into a bin] (preferred parse)
- Coordination ambiguity:
  [dogs in houses] and [cats]
  [dogs] [in houses and cats]

Weakening the independence assumptions in PCFGs

Lexicalised PCFGs
Attempt to weaken the lexical independence assumption.
Most common technique: mark each phrasal head (N, V, etc.) with the lexical material; this is based on the idea that the most crucial lexical dependencies are between head and dependent.
E.g. Charniak (1997), Collins (1999).

Lexicalised PCFGs: Matt walks
Makes probabilities partly dependent on lexical content: P(VP → VBD | VP) becomes P(VP → VBD | VP, h(VP) = walks).
NB: normally, we cannot assume that all heads of a phrase of category C are equally probable.
Head-annotated tree: [S(walks) [NP(Matt) [NNP(Matt) Matt]] [VP(walks) [VBD(walks) walks]]]
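A rough sketch of what head annotation could look like in code, on the same toy tree representation as before; the head-child table below is a hypothetical simplification (real lexicalised parsers such as Collins 1999 use much more elaborate head-finding rules).

```python
HEAD_CHILD = {"S": "VP", "VP": "VBD", "NP": "NNP"}   # hypothetical head table

def lexicalise(tree):
    """Return (annotated_tree, head_word), marking each node as label(head)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]                       # pre-terminal: head is its word
        return (f"{label}({word})", word), word
    annotated, heads = zip(*(lexicalise(c) for c in children))
    head = heads[[c[0] for c in children].index(HEAD_CHILD[label])]
    return (f"{label}({head})",) + tuple(annotated), head

t = ("S", ("NP", ("NNP", "Matt")), ("VP", ("VBD", "walks")))
annotated, _ = lexicalise(t)
print(annotated)
# ('S(walks)', ('NP(Matt)', ('NNP(Matt)', 'Matt')), ('VP(walks)', ('VBD(walks)', 'walks')))
```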

Practical problems for lexicalised PCFGs
Data sparseness: we don't necessarily see all heads of all phrasal categories often enough in the training data.
Flawed assumptions: lexical dependencies occur elsewhere, not just between head and complement. E.g. in "I got the easier problem of the two to solve", the dependents "of the two" and "to solve" become more likely because of the pre-head modifier "easier".

Structural context
The simple way: calculate P(t | s, G) based on the rules in the canonical derivation d of t; this assumes that P(t) is independent of the derivation.
We could condition on more structural context, but then we could lose the notion of a canonical derivation, i.e. P(t) could really depend on the derivation!

Structural context: probability of a derivation history
How to calculate P(t) based on a derivation d?
Observation: a derivation is a sequence of m rewrite rules r1 … rm that yields s, so we can use the chain rule for multiplication:
P(d) = P(r1, …, rm) = Π over i = 1 … m of P(ri | r1, …, ri−1)

Approach 2: parent annotation
Annotate each node with its parent in the parse tree. E.g. if an NP has parent S, rename the NP to NP^S.
This can partly account for dependencies such as subject-of (NP^S is a subject, NP^VP is an object).
Parent-annotated tree for "Matt walks": [S [NP^S [NNP^NP Matt]] [VP^S [VBD^VP walks]]]
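Parent annotation is a simple tree transformation; below is a minimal sketch on the same toy nested-tuple representation (not code from the original slides).

```python
def parent_annotate(tree, parent=None):
    """Rename each non-terminal X with parent Y to X^Y (parent annotation).
    A tree is (label, child1, ...); a leaf is a bare string."""
    if isinstance(tree, str):          # words are left unchanged
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}" if parent else label
    return (new_label,) + tuple(parent_annotate(c, label) for c in children)

t = ("S", ("NP", ("NNP", "Matt")), ("VP", ("VBD", "walks")))
print(parent_annotate(t))
# ('S', ('NP^S', ('NNP^NP', 'Matt')), ('VP^S', ('VBD^VP', 'walks')))
```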

The main point Many different parsing approaches differ on what they condition their probabilities on

Other grammar formalisms

Phrase structure vs. dependency grammar
PCFGs are in the tradition of phrase-structure grammars.
Dependency grammar describes syntax in terms of dependencies between words:
- no non-terminals or phrasal nodes, only lexical nodes with links between them
- links are labelled, with labels drawn from a finite list

Dependency Grammar
Example: dependency structure for "I gave him my address":
gave (main)
  subj: I
  dat:  him
  obj:  address
          attr: my

Dependency grammar
Often used now in probabilistic parsing.
Advantages:
- dependencies directly encode lexical relations, so disambiguation decisions take lexical material into account directly
- dependencies are a way of decomposing phrase-structure rules and their probability estimates
- estimating the probability of a dependency between 2 words is less likely to lead to data sparseness problems
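As a small illustration of the last point, here is a sketch (toy data, hypothetical triple format) of estimating dependency probabilities by relative frequency over (head, label, dependent) triples.

```python
from collections import Counter

def dependency_probs(dependencies):
    """Relative-frequency estimate of P(label, dependent | head) from a list
    of (head, label, dependent) triples (toy data, for illustration only)."""
    head_counts = Counter(h for h, _, _ in dependencies)
    dep_counts = Counter(dependencies)
    return {d: c / head_counts[d[0]] for d, c in dep_counts.items()}

# Dependencies for "I gave him my address", as in the figure above.
deps = [
    ("gave", "subj", "I"),
    ("gave", "dat", "him"),
    ("gave", "obj", "address"),
    ("address", "attr", "my"),
]
print(dependency_probs(deps))
# each triple gets probability 1/3 given head "gave", and 1.0 given head "address"
```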

Summary
We've taken a tour of PCFGs:
- crucial notion: what the probability of a rule is conditioned on
- flaws in PCFGs: independence assumptions
- several proposals to go beyond these flaws
- dependency grammars are an alternative formalism