CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 29 – CYK; Inside Probability; Parse Tree Construction) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 22nd March, 2011

Penn POS Tags
Sentence: John wrote those words in the Book of Proverbs.
Tagged: [ John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN [ the/DT Book/NN ] of/IN [ Proverbs/NNS ]

Penn Treebank
Sentence: John wrote those words in the Book of Proverbs.
(S (NP-SBJ (NP John))
   (VP wrote
       (NP those words)
       (PP-LOC in
           (NP (NP-TTL (NP the Book)
                       (PP of (NP Proverbs)))))))

PSG Parse Tree
[Figure: phrase-structure parse tree for "Official trading in the shares will start in Paris on Nov 6.": S dominates an NP ("official trading in the shares", with AP and PP substructure) and a VP ("will start in Paris on Nov 6", with Aux, V, and two PPs); the exact tree layout is not recoverable from the transcript.]

Penn POS Tags
Sentence: Official trading in the shares will start in Paris on Nov 6.
Tagged: [ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]

Penn POS Tag Set
Adjective: JJ
Adverb: RB
Cardinal Number: CD
Determiner: DT
Preposition: IN
Coordinating Conjunction: CC
Subordinating Conjunction: IN
Singular Noun: NN
Plural Noun: NNS
Personal Pronoun: PP
Proper Noun: NP
Verb base form: VB
Modal verb: MD
Verb (3sg Pres): VBZ
Wh-determiner: WDT
Wh-pronoun: WP

CYK Parsing (some slides borrowed from Jimmy Lin's "Syntactic Parsing with CFGs")

Shared Sub-Problems
Observation: ambiguous parses still share sub-trees.
We don't want to redo work that's already been done.
Unfortunately, naïve backtracking leads to duplicate work.

Shared Sub-Problems: Example

Efficient Parsing
Dynamic programming to the rescue! Intuition: store partial results in tables, thereby avoiding repeated work on shared sub-problems, and efficiently storing ambiguous structures with shared sub-parts.
Two algorithms: CKY (roughly, bottom-up) and Earley (roughly, top-down).

CKY Parsing: CNF
CKY parsing requires that the grammar consist of ε-free, binary rules, i.e. be in Chomsky Normal Form: every rule has the form A → B C (two non-terminals) or D → w (a single terminal).
What does the tree look like? What if my CFG isn't in CNF?

CKY Parsing with Arbitrary CFGs
Problem: my grammar has rules like VP → NP PP PP, so CKY can't be applied directly. Solution: rewrite the grammar into CNF by introducing new intermediate non-terminals, e.g. A → B C D becomes A → X D and X → B C (where X is a symbol that doesn't occur anywhere else in the grammar).
What does this mean? Weak equivalence: the rewritten grammar accepts (and rejects) the same set of strings as the original grammar, but the resulting derivations (trees) are different. A sketch of this binarization step follows.
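To make the conversion concrete, here is a minimal Python sketch of the binarization just described; the function name binarize and the (LHS, RHS-tuple) rule representation are my own choices, not from the slides.

def binarize(rules):
    """Rewrite every rule A -> B C D ... (RHS longer than 2) into binary
    rules X -> B C and A -> X D ..., introducing fresh symbols X1, X2, ...
    that occur nowhere else in the grammar."""
    new_rules, counter = [], 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            counter += 1
            x = f"X{counter}"                # fresh intermediate symbol
            new_rules.append((x, rhs[:2]))   # X -> B C
            rhs = (x,) + rhs[2:]             # A -> X D ... (one symbol shorter)
        new_rules.append((lhs, rhs))
    return new_rules

# The slide's VP -> NP PP PP becomes X1 -> NP PP and VP -> X1 PP:
print(binarize([("VP", ("NP", "PP", "PP"))]))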

CKY Parsing: Intuition
Consider the rule D → w: a terminal (word) forms a constituent, so it is trivial to apply. Consider the rule A → B C: if there is an A somewhere in the input, there must be a B followed by a C in the input. First, precisely define the span [i, j]: if A spans from i to j in the input, then there must be some k with i < k < j such that B spans [i, k] and C spans [k, j]. Easy to apply: we just need to try different values for k.

CKY Parsing: Table
Any constituent can conceivably span [i, j] for all 0 ≤ i < j ≤ N, where N = length of the input string. We need an N × N table to keep track of all spans, but we only need half of the table (since i < j). Semantics of the table: cell [i, j] contains A iff A spans i to j in the input string; of course, the constituent must be allowed by the grammar!

CKY Parsing: Table-Filling
In order for A to span [i, j]: A → B C must be a rule in the grammar, and there must be a B in [i, k] and a C in [k, j] for some i < k < j. Operationally: to apply rule A → B C, look for a B in [i, k] and a C in [k, j]; in the table, look left in the row and down in the column.

CKY Algorithm
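The pseudocode figure shown on this slide did not survive in the transcript; the following is a minimal Python sketch of the recognizer described on the previous slides (cell semantics and loop order as above; names such as cky_recognize are my own).

from collections import defaultdict

def cky_recognize(words, lexicon, binary_rules, start="S"):
    """CKY recognizer for a CNF grammar.
    lexicon: word -> set of preterminals A with A -> word
    binary_rules: (B, C) -> set of parents A with A -> B C
    Example fragment: lexicon = {"the": {"DT"}}, binary_rules = {("DT", "NN"): {"NP"}}."""
    n = len(words)
    table = defaultdict(set)                  # table[(i, j)] = symbols spanning i..j
    for i, w in enumerate(words):             # length-1 spans: rules A -> a
        table[(i, i + 1)] |= lexicon.get(w, set())
    for length in range(2, n + 1):            # fill shorter spans first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):         # try every split point i < k < j
                for B in table[(i, k)]:       # look left in the row...
                    for C in table[(k, j)]:   # ...and down in the column
                        table[(i, j)] |= binary_rules.get((B, C), set())
    return start in table[(0, n)]             # accept iff S spans the whole input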

CKY Parsing: Recognizer or Parser?
Is this really a parser? Not yet: as described, CKY only recognizes whether the input is in the language. Recognizer to parser: add backpointers, as in the sketch below!
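A sketch of the backpointer idea, under the same assumptions as the recognizer above (the table back and helper build_tree are hypothetical names; the assignment marked in the comment would sit inside the recognizer's innermost loop):

# back[(i, j, A)] records how A came to span (i, j):
#     back[(i, j, A)] = (k, B, C)   # set whenever A is derived from B, C at split k

def build_tree(back, words, i, j, A):
    """Follow backpointers top-down to reconstruct one parse tree as nested tuples."""
    if j == i + 1:                    # preterminal over a single word
        return (A, words[i])
    k, B, C = back[(i, j, A)]
    return (A, build_tree(back, words, i, k, B),
               build_tree(back, words, k, j, C))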

CKY: Algorithmic Complexity
What's the asymptotic complexity of CKY? O(n³): there are O(n²) cells, and filling each cell requires trying O(n) split points.

CKY: Analysis
Since it's bottom-up, CKY populates the table with a lot of "phantom constituents": spans that are constituents, but cannot really occur in the context in which they are suggested. Conversion of the grammar to CNF adds additional non-terminal nodes, leading to weak equivalence w.r.t. the original grammar; these additional non-terminal nodes are not (linguistically) meaningful, but can be cleaned up with post-processing.
Is there a parsing algorithm for arbitrary CFGs that combines dynamic programming and top-down control? Yes: Earley parsing.

Penn Treebank
Sentence: Official trading in the shares will start in Paris on Nov 6.
( (S (NP-SBJ (NP Official trading)
             (PP in (NP the shares)))
     (VP will
         (VP start
             (PP-LOC in (NP Paris))
             (PP-TMP on (NP Nov 6))))))
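As an aside, such bracketings can be inspected programmatically with NLTK's Tree class (assuming nltk is installed; Tree.fromstring is its standard reader for this notation):

from nltk import Tree

t = Tree.fromstring(
    "(S (NP-SBJ (NP Official trading) (PP in (NP the shares)))"
    "   (VP will (VP start (PP-LOC in (NP Paris)) (PP-TMP on (NP Nov 6)))))")
t.pretty_print()      # draws the tree in ASCII
print(t.leaves())     # ['Official', 'trading', 'in', 'the', 'shares', ...]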

Probabilistic Context Free Grammars
S → NP VP      1.0
NP → DT NN     0.5
NP → NNS       0.3
NP → NP PP     0.2
PP → P NP      1.0
VP → VP PP     0.6
VP → VBD NP    0.4
DT → the       1.0
NN → gunman    0.5
NN → building  0.5
VBD → sprayed  1.0
NNS → bullets  1.0
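For use in the sketches below, the same grammar written as Python data; the (LHS, RHS-tuple, probability) format is my choice, and the rule P → with is my addition, implied by the example trees (P 1.0 over "with") but missing from the slide's list.

# The lecture's PCFG; note the unary rule NP -> NNS, which a strictly-CNF
# CYK would have to eliminate or special-case.
PCFG_RULES = [
    ("S",   ("NP", "VP"),   1.0),
    ("NP",  ("DT", "NN"),   0.5),
    ("NP",  ("NNS",),       0.3),
    ("NP",  ("NP", "PP"),   0.2),
    ("PP",  ("P",  "NP"),   1.0),
    ("VP",  ("VP", "PP"),   0.6),
    ("VP",  ("VBD", "NP"),  0.4),
    ("DT",  ("the",),       1.0),
    ("NN",  ("gunman",),    0.5),
    ("NN",  ("building",),  0.5),
    ("VBD", ("sprayed",),   1.0),
    ("NNS", ("bullets",),   1.0),
    ("P",   ("with",),      1.0),   # assumed: implied by the parse trees
]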

Example Parse t1
Sentence: The gunman sprayed the building with bullets.
Here the PP attaches to the VP (rule probabilities annotated on each node):
(S 1.0
  (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
  (VP 0.6
    (VP 0.4 (VBD 1.0 sprayed)
            (NP 0.5 (DT 1.0 the) (NN 0.5 building)))
    (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets)))))
P(t1) = 1.0 × 0.5 × 1.0 × 0.5 × 0.6 × 0.4 × 1.0 × 0.5 × 1.0 × 0.5 × 1.0 × 1.0 × 0.3 × 1.0 = 0.0045

Another Parse t2
Sentence: The gunman sprayed the building with bullets.
Here the PP attaches to the object NP:
(S 1.0
  (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
  (VP 0.4 (VBD 1.0 sprayed)
    (NP 0.2
      (NP 0.5 (DT 1.0 the) (NN 0.5 building))
      (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets))))))
P(t2) = 1.0 × 0.5 × 1.0 × 0.5 × 0.4 × 1.0 × 0.2 × 0.5 × 1.0 × 0.5 × 1.0 × 1.0 × 0.3 × 1.0 = 0.0015
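As a quick sanity check, the two products can be verified mechanically in plain Python (the values follow directly from the grammar):

from math import prod

p_t1 = prod([1.0, 0.5, 1.0, 0.5, 0.6, 0.4, 1.0, 0.5, 1.0, 0.5, 1.0, 1.0, 0.3, 1.0])
p_t2 = prod([1.0, 0.5, 1.0, 0.5, 0.4, 1.0, 0.2, 0.5, 1.0, 0.5, 1.0, 1.0, 0.3, 1.0])
print(p_t1, p_t2)   # ~0.0045 and ~0.0015 (up to float rounding):
                    # the VP-attachment reading t1 is more probable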

Illustrating the CYK [Cocke-Younger-Kasami] Algorithm (on the same PCFG as above)

CYK Chart Filling, Step by Step
Sentence with positions: 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7
(The original slides animate an upper-triangular chart indexed by (from, to); the transcript preserves only which non-terminals enter the chart at each step.)
CYK: Start with (0,1): DT enters cell (0,1), from DT → the.
CYK: Keep filling diagonals: NN in (1,2).
CYK: Try getting higher-level structures: NP in (0,2), from NP → DT NN.
CYK: Diagonal continues: VBD in (2,3); no constituent spans (1,3) or (0,3).
CYK (cont.): DT in (3,4), then NN in (4,5).
CYK: Starts filling the 5th column: NP in (3,5), from NP → DT NN.
CYK (cont.): VP in (2,5), from VP → VBD NP; nothing spans (1,5).
CYK: S found, but NO termination! S in (0,5), from S → NP VP; this S does not span the whole input, so parsing continues.
CYK (cont.): P in (5,6); the cells ending at 6 above it stay empty.
CYK: Control moves to the last column: NNS in (6,7), and NP in (6,7) by NP → NNS.
CYK (cont.): PP in (5,7), from PP → P NP.
CYK (cont.): NP in (3,7), from NP → NP PP.
CYK (cont.): VP in (2,7), both via VP → VP PP (over (2,5) and (5,7)) and via VP → VBD NP (over (2,3) and (3,7)); filling the last column continues upward.
CYK: Terminates with S in (0,7): S → NP VP over (0,2) and (2,7); the sentence is accepted.

CYK: Extracting the Parse Tree
The parse tree is obtained by keeping backpointers. Following them from S in (0,7) yields (spans in parentheses):
(S (0-7)
  (NP (0-2) (DT (0-1) The) (NN (1-2) gunman))
  (VP (2-7) (VBD (2-3) sprayed)
    (NP (3-7)
      (NP (3-5) (DT (3-4) the) (NN (4-5) building))
      (PP (5-7) (P (5-6) with) (NP (6-7) (NNS (6-7) bullets))))))
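Putting the pieces together, here is a hedged sketch of a probabilistic CYK (Viterbi) parser over PCFG_RULES from above: each cell keeps the best probability per non-terminal plus a backpointer, and the unary rule NP → NNS is handled by a closure pass per cell. All names are mine; this reconstructs the technique the slides describe, not the lecturer's exact code.

from collections import defaultdict

def viterbi_cky(words, rules, start="S"):
    lex, unary, binary = defaultdict(list), defaultdict(list), defaultdict(list)
    for lhs, rhs, p in rules:
        if len(rhs) == 2:
            binary[rhs].append((lhs, p))      # A -> B C
        elif rhs[0].islower():
            lex[rhs[0]].append((lhs, p))      # A -> word (terminals are lowercase here)
        else:
            unary[rhs[0]].append((lhs, p))    # A -> B, e.g. NP -> NNS

    n = len(words)
    best = defaultdict(dict)                  # (i, j) -> {A: best probability}
    back = {}                                 # (i, j, A) -> backpointer

    def close_unary(i, j):                    # one pass suffices: no unary chains here
        for B, pb in list(best[(i, j)].items()):
            for A, p in unary[B]:
                if pb * p > best[(i, j)].get(A, 0.0):
                    best[(i, j)][A] = pb * p
                    back[(i, j, A)] = ("unary", B)

    for i, w in enumerate(words):             # length-1 spans
        for A, p in lex[w.lower()]:
            best[(i, i + 1)][A] = p
            back[(i, i + 1, A)] = ("word", w)
        close_unary(i, i + 1)

    for length in range(2, n + 1):            # longer spans, shortest first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):         # every split point
                for B, pb in best[(i, k)].items():
                    for C, pc in best[(k, j)].items():
                        for A, p in binary[(B, C)]:
                            q = p * pb * pc
                            if q > best[(i, j)].get(A, 0.0):
                                best[(i, j)][A] = q
                                back[(i, j, A)] = ("split", k, B, C)
            close_unary(i, j)
    return best[(0, n)].get(start, 0.0), back

sent = "The gunman sprayed the building with bullets".split()
p_best, back = viterbi_cky(sent, PCFG_RULES)
print(p_best)   # ~0.0045: the Viterbi parse is t1

Chasing the ("split", 2, "NP", "VP") backpointer in cell (0,7) with a build_tree-style walk (as sketched earlier) recovers the highest-probability tree. Note that this is the VP-attachment reading t1 (0.0045 versus 0.0015 for t2), whereas the tree drawn on the final slide above follows the NP-attachment derivation.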