Seven Lectures on Statistical Parsing Christopher Manning LSA Linguistic Institute 2007 LSA 354 Lecture 2

Attendee information Please put on a piece of paper: Name: Affiliation: “Status” (undergrad, grad, industry, prof, …): Ling/CS/Stats background: What you hope to get out of the course: Whether the course has so far been too fast, too slow, or about right:

Assessment

Phrase structure grammars = context-free grammars
G = (T, N, S, R)
T is a set of terminals
N is a set of nonterminals (for NLP, we usually distinguish a set P ⊂ N of preterminals, which always rewrite as terminals)
S is the start symbol (one of the nonterminals)
R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence)
A grammar G generates a language L.

A phrase structure grammar

S  → NP VP        N → cats
VP → V NP         N → claws
VP → V NP PP      N → people
NP → NP PP        N → scratch
NP → N            V → scratch
NP → e            P → with
NP → N N
PP → P NP

By convention, S is the start symbol, but in the PTB, we have an extra node at the top (ROOT, TOP)
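
For concreteness, the toy grammar above could be written down as data; here is a minimal sketch in Python (the tuple-based representation and the names GRAMMAR and LEXICON are illustrative choices, not part of the lecture):

```python
# The toy grammar above: syntactic rules plus a lexicon of preterminal tags.
GRAMMAR = [
    ("S",  ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("NP", ("NP", "PP")),
    ("NP", ("N",)),
    ("NP", ()),              # NP -> e (an empty production)
    ("NP", ("N", "N")),
    ("PP", ("P", "NP")),
]

LEXICON = {
    "cats": {"N"},
    "claws": {"N"},
    "people": {"N"},
    "scratch": {"N", "V"},   # "scratch" is ambiguous between noun and verb
    "with": {"P"},
}
```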

Top-down parsing

Bottom-up parsing Bottom-up parsing is data directed The initial goal list of a bottom-up parser is the string to be parsed. If a sequence in the goal list matches the RHS of a rule, then this sequence may be replaced by the LHS of the rule. Parsing is finished when the goal list contains just the start category. If the RHS of several rules match the goal list, then there is a choice of which rule to apply (search problem) Can use depth-first or breadth-first search, and goal ordering. The standard presentation is as shift-reduce parsing.

Shift-reduce parsing: one path
cats scratch people with claws

Stack                Remaining input                Action
cats                 scratch people with claws      SHIFT
N                    scratch people with claws      REDUCE
NP                   scratch people with claws      REDUCE
NP scratch           people with claws              SHIFT
NP V                 people with claws              REDUCE
NP V people          with claws                     SHIFT
NP V N               with claws                     REDUCE
NP V NP              with claws                     REDUCE
NP V NP with         claws                          SHIFT
NP V NP P            claws                          REDUCE
NP V NP P claws                                     SHIFT
NP V NP P N                                         REDUCE
NP V NP P NP                                        REDUCE
NP V NP PP                                          REDUCE
NP VP                                               REDUCE
S                                                   REDUCE

What other search paths are there for parsing this sentence?
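
The question about other search paths can be explored mechanically. Below is a minimal sketch in Python of a nondeterministic shift-reduce recognizer that tries every SHIFT/REDUCE choice by depth-first search; it reuses the toy grammar from above but leaves out the empty production NP → e (as a later slide notes, empties cause termination problems for bottom-up parsing). The function and variable names are illustrative, not from the lecture.

```python
# Explore all shift-reduce paths for the toy grammar (empty production omitted).
GRAMMAR = [
    ("S",  ("NP", "VP")), ("VP", ("V", "NP")), ("VP", ("V", "NP", "PP")),
    ("NP", ("NP", "PP")), ("NP", ("N",)), ("NP", ("N", "N")), ("PP", ("P", "NP")),
]
LEXICON = {"cats": {"N"}, "claws": {"N"}, "people": {"N"},
           "scratch": {"N", "V"}, "with": {"P"}}

def search(stack, words, history, results):
    """Depth-first search over (stack, remaining input) configurations."""
    if not words and stack == ("S",):
        results.append(history)            # a complete derivation of S
        return
    # REDUCE: replace a matching suffix of the stack by a rule's left-hand side
    for lhs, rhs in GRAMMAR:
        if rhs and stack[-len(rhs):] == rhs:
            search(stack[:-len(rhs)] + (lhs,), words,
                   history + [("REDUCE", lhs)], results)
    # REDUCE a shifted word to one of its preterminal tags
    if stack and stack[-1] in LEXICON:
        for tag in LEXICON[stack[-1]]:
            search(stack[:-1] + (tag,), words, history + [("REDUCE", tag)], results)
    # SHIFT the next word onto the stack
    if words:
        search(stack + (words[0],), words[1:],
               history + [("SHIFT", words[0])], results)

results = []
search((), tuple("cats scratch people with claws".split()), [], results)
print(len(results), "successful shift-reduce paths found")
```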

Soundness and completeness A parser is sound if every parse it returns is valid/correct A parser terminates if it is guaranteed to not go off into an infinite loop A parser is complete if for any given grammar and sentence, it is sound, produces every valid parse for that sentence, and terminates (For many purposes, we settle for sound but incomplete parsers: e.g., probabilistic parsers that return a k-best list.)

Problems with bottom-up parsing Unable to deal with empty categories: termination problem, unless rewriting empties as constituents is somehow restricted (but then it's generally incomplete) Useless work: locally possible, but globally impossible. Inefficient when there is great lexical ambiguity (grammar-driven control might help here) Conversely, it is data-directed: it attempts to parse the words that are there. Repeated work: anywhere there is common substructure

Problems with top-down parsing Left recursive rules A top-down parser will do badly if there are many different rules for the same LHS. Consider if there are 600 rules for S, 599 of which start with NP, but one of which starts with V, and the sentence starts with V. Useless work: expands things that are possible top-down but not there Top-down parsers do well if there is useful grammar-driven control: search is directed by the grammar Top-down is hopeless for rewriting parts of speech (preterminals) with words (terminals). In practice that is always done bottom-up as lexical lookup. Repeated work: anywhere there is common substructure

Repeated work…

Principles for success: take 1 If you are going to do parsing-as-search with a grammar as is: Left recursive structures must be found, not predicted Empty categories must be predicted, not found Doing these things doesn't fix the repeated work problem: Both TD (LL) and BU (LR) parsers can (and frequently do) do work exponential in the sentence length on NLP problems.

Principles for success: take 2 Grammar transformations can fix both left-recursion and epsilon productions Then you parse the same language but with different trees Linguists tend to hate you But this is a misconception: they shouldn't You can fix the trees post hoc: The transform-parse-detransform paradigm
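
As a concrete illustration of one such transformation (the standard textbook removal of immediate left recursion, not something spelled out in the lecture), a rule set A → A α | β is rewritten as A → β A′ and A′ → α A′ | ε; a minimal Python sketch, with hypothetical names:

```python
def remove_immediate_left_recursion(lhs, rules):
    """Rewrite  lhs -> lhs alpha | beta  as  lhs -> beta lhs', lhs' -> alpha lhs' | e.
    Rules are (lhs, rhs_tuple) pairs; returns a transformed rule list."""
    recursive = [rhs[1:] for l, rhs in rules if l == lhs and rhs[:1] == (lhs,)]
    others    = [rhs for l, rhs in rules if l == lhs and rhs[:1] != (lhs,)]
    rest      = [(l, rhs) for l, rhs in rules if l != lhs]
    if not recursive:
        return rules
    new = lhs + "'"
    return (rest
            + [(lhs, beta + (new,)) for beta in others]
            + [(new, alpha + (new,)) for alpha in recursive]
            + [(new, ())])           # the new symbol gets an epsilon production

# NP -> NP PP | N   becomes   NP -> N NP',  NP' -> PP NP',  NP' -> e
print(remove_immediate_left_recursion("NP", [("NP", ("NP", "PP")), ("NP", ("N",))]))
```

Note that this transform introduces an epsilon production of its own, which is one reason left-recursion and empty elimination are handled together in practice; the detransform step then restores the original tree shapes.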

Principles for success: take 3 Rather than doing parsing-as-search, we do parsing as dynamic programming This is the most standard way to do things Q.v. CKY parsing, next time It solves the problem of doing repeated work But there are also other ways of solving the problem of doing repeated work Memoization (remembering solved subproblems) Also, next time Doing graph-search rather than tree-search.

Human parsing Humans often do ambiguity maintenance Have the police … eaten their supper? come in and look around. taken out and shot. But humans also commit early and are “garden pathed”: The man who hunts ducks out on weekends. The cotton shirts are made from grows in Mississippi. The horse raced past the barn fell.

Polynomial time parsing of PCFGs

Probabilistic or stochastic context-free grammars (PCFGs)
G = (T, N, S, R, P)
T is a set of terminals
N is a set of nonterminals (for NLP, we usually distinguish a set P ⊂ N of preterminals, which always rewrite as terminals)
S is the start symbol (one of the nonterminals)
R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence)
P(R) gives the probability of each rule
A grammar G generates a language model L.

PCFGs – Notation
w_1n = w_1 … w_n = the word sequence from 1 to n (sentence of length n)
w_ab = the subsequence w_a … w_b
N^j_ab = the nonterminal N^j dominating w_a … w_b (a subtree rooted in N^j whose yield is w_a … w_b)
We'll write P(N^i → ζ^j) to mean P(N^i → ζ^j | N^i)
We'll want to calculate max_t P(t ⇒* w_ab)

The probability of trees and strings
P(t) -- the probability of a tree t is the product of the probabilities of the rules used to generate it.
P(w_1n) -- the probability of the string is the sum of the probabilities of the trees which have that string as their yield:
P(w_1n) = Σ_j P(w_1n, t_j) = Σ_j P(t_j), where t_j is a parse of w_1n

A Simple PCFG (in CNF)

S  → NP VP     1.0        NP → NP PP        0.4
VP → V NP      0.7        NP → astronomers  0.1
VP → VP PP     0.3        NP → ears         0.18
PP → P NP      1.0        NP → saw          0.04
P  → with      1.0        NP → stars        0.18
V  → saw       1.0        NP → telescope    0.1
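
For later use, one illustrative way to encode this PCFG in Python (the names BINARY_RULES and LEXICAL_RULES are our own, not from the lecture):

```python
# The toy PCFG above as rule -> probability maps, with binary and lexical
# (preterminal -> word) rules kept separate.
BINARY_RULES = {
    ("S",  ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")):  0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("PP", ("P", "NP")):  1.0,
    ("NP", ("NP", "PP")): 0.4,
}
LEXICAL_RULES = {
    ("P",  "with"):        1.0,
    ("V",  "saw"):         1.0,
    ("NP", "astronomers"): 0.1,
    ("NP", "ears"):        0.18,
    ("NP", "saw"):         0.04,
    ("NP", "stars"):       0.18,
    ("NP", "telescope"):   0.1,
}
```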

Tree and String Probabilities
w_15 = astronomers saw stars with ears
P(t_1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
P(t_2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
P(w_15) = P(t_1) + P(t_2) = 0.0009072 + 0.0006804 = 0.0015876
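
These products can be checked mechanically. A small sketch building on the BINARY_RULES/LEXICAL_RULES encoding above; the rule sequences for the two trees (t1 attaches the PP inside the object NP, t2 attaches it to the VP) are read off the parses, which the transcript describes but does not draw:

```python
from math import prod

# Rule sequences for the two parses of "astronomers saw stars with ears".
t1 = [("S", ("NP", "VP")), ("NP", "astronomers"), ("VP", ("V", "NP")),
      ("V", "saw"), ("NP", ("NP", "PP")), ("NP", "stars"),
      ("PP", ("P", "NP")), ("P", "with"), ("NP", "ears")]
t2 = [("S", ("NP", "VP")), ("NP", "astronomers"), ("VP", ("VP", "PP")),
      ("VP", ("V", "NP")), ("V", "saw"), ("NP", "stars"),
      ("PP", ("P", "NP")), ("P", "with"), ("NP", "ears")]

RULES = {**BINARY_RULES, **LEXICAL_RULES}   # from the sketch after the PCFG slide

p_t1 = prod(RULES[r] for r in t1)           # ≈ 0.0009072
p_t2 = prod(RULES[r] for r in t2)           # ≈ 0.0006804
print(p_t1, p_t2, p_t1 + p_t2)              # P(w15) = p_t1 + p_t2 ≈ 0.0015876
```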

Chomsky Normal Form
All rules are of the form X → Y Z or X → w.
A transformation to this form doesn't change the weak generative capacity of CFGs.
With some extra book-keeping in symbol names, you can even reconstruct the same trees with a detransform
Unaries/empties are removed recursively
N-ary rules introduce new nonterminals: VP → V NP PP becomes VP → V @VP-V plus @VP-V → NP PP, where @VP-V is a new intermediate symbol
In practice it's a pain
Reconstructing n-aries is easy
Reconstructing unaries can be trickier
But it makes parsing easier/more efficient
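
A minimal sketch of the n-ary part of this transform (right-branching binarization with intermediate symbols that record the children already consumed; the "@" naming convention is illustrative, and unary/empty removal is not shown):

```python
def binarize(lhs, rhs):
    """Turn an n-ary rule lhs -> X1 ... Xn into binary rules, e.g.
       VP -> V NP PP  becomes  VP -> V @VP-V  and  @VP-V -> NP PP."""
    rules, current = [], lhs
    for i, child in enumerate(rhs[:-2]):
        new = "@" + lhs + "-" + "-".join(rhs[: i + 1])   # remembers the siblings seen so far
        rules.append((current, (child, new)))
        current = new
    rules.append((current, tuple(rhs[-2:])))
    return rules

print(binarize("VP", ("V", "NP", "PP")))
# [('VP', ('V', '@VP-V')), ('@VP-V', ('NP', 'PP'))]
```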

[Diagram of the assignment pipeline: n-ary trees in the treebank are binarized by TreeAnnotations.annotateTree (treebank binarization) into binary trees, from which the lexicon and grammar are read off; parsing with them is the part marked TODO: CKY parsing]

[Tree diagram: a parse of "cats scratch people with claws" under ROOT/S, with a flat VP → V NP PP. An example: before binarization…]

[Tree diagram: the same parse after binarization…]

[Tree diagram: the unbinarized tree again, with a rule that is already binary highlighted]

[Tree diagram: the binarized version of that rule] Seems redundant? (the rule was already binary) Reason: it is easier to see how to make finite-order horizontal markovizations – it's like a finite automaton (explained later)

[Tree diagram: the unbinarized tree, with the ternary rule VP → V NP PP highlighted]

[Tree diagrams: successive steps of binarizing the ternary rule]

[Tree diagram: the fully binarized tree] VP → V NP PP: the intermediate symbol introduced for this rule remembers 2 siblings; if there is a rule VP → V NP PP, that symbol will exist.

Treebank: empties and unaries
[Tree diagrams for the one-word sentence "Atone": the original PTB tree (TOP, S-HLN, an NP-SUBJ rewriting as a -NONE- empty, VP, VB), then the same tree with function tags removed (NoFuncTags), with empties removed (NoEmpties), and with unaries removed (NoUnaries), where the last step can keep either the high or the low label (High/Low)]

The CKY algorithm (1960/1965)

function CKY(words, grammar) returns most probable parse/prob
  score = new double[#(words)+1][#(words)+1][#(nonterms)]
  back = new Pair[#(words)+1][#(words)+1][#(nonterms)]
  for i=0; i<#(words); i++
    for A in nonterms
      if A -> words[i] in grammar
        score[i][i+1][A] = P(A -> words[i])
    // handle unaries
    boolean added = true
    while added
      added = false
      for A, B in nonterms
        if score[i][i+1][B] > 0 && A -> B in grammar
          prob = P(A -> B) * score[i][i+1][B]
          if prob > score[i][i+1][A]
            score[i][i+1][A] = prob
            back[i][i+1][A] = B
            added = true

The CKY algorithm (1960/1965), continued

  for span = 2 to #(words)
    for begin = 0 to #(words)-span
      end = begin + span
      for split = begin+1 to end-1
        for A, B, C in nonterms
          prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = new Triple(split, B, C)
      // handle unaries
      boolean added = true
      while added
        added = false
        for A, B in nonterms
          prob = P(A -> B) * score[begin][end][B]
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = B
            added = true
  return buildTree(score, back)
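
A compact runnable rendering of the same idea, as a sketch in Python: it uses the toy PCFG from the earlier slides, handles only lexical and binary rules, and leaves out the unary closure and the backpointer/buildTree step of the pseudocode. Names are illustrative.

```python
from collections import defaultdict

BINARY_RULES = {   # (A, (B, C)) -> P(A -> B C), the toy PCFG from above
    ("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3, ("PP", ("P", "NP")): 1.0,
    ("NP", ("NP", "PP")): 0.4,
}
LEXICAL_RULES = {  # (A, word) -> P(A -> word)
    ("P", "with"): 1.0, ("V", "saw"): 1.0, ("NP", "astronomers"): 0.1,
    ("NP", "ears"): 0.18, ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
    ("NP", "telescope"): 0.1,
}

def cky(words):
    """Probabilistic CKY for a PCFG in CNF (no unaries, no empties).
    score[(i, j)][A] = probability of the best A spanning words[i:j]."""
    n = len(words)
    score = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):                      # fill length-1 spans from the lexicon
        for (A, word), p in LEXICAL_RULES.items():
            if word == w:
                score[(i, i + 1)][A] = p
    for span in range(2, n + 1):                       # then successively longer spans
        for begin in range(n - span + 1):
            end = begin + span
            for split in range(begin + 1, end):
                for (A, (B, C)), p in BINARY_RULES.items():
                    prob = score[(begin, split)][B] * score[(split, end)][C] * p
                    if prob > score[(begin, end)][A]:
                        score[(begin, end)][A] = prob
    return score

chart = cky("astronomers saw stars with ears".split())
print(chart[(0, 5)]["S"])   # ≈ 0.0009072: the best parse is the NP-attachment tree t1
```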

[CKY chart walkthrough for the sentence "cats scratch walls with claws": the diagonal cells score[i][i+1] are first filled from the lexicon (each word gets entries such as N → cats, P → cats, V → cats) by the loop "for i=0; i<#(words); i++ …"; unaries are then handled in each cell (e.g., NP → N); longer spans are filled with prob = score[begin][split][B] * score[split][end][C] * P(A → B C), and for each A only the "A → B C" with the highest probability is kept; unaries are handled again after each cell; finally buildTree(score, back) is called to get the best parse, rooted in ROOT → S.]