Published by Franklin Osborne. Modified over 9 years ago.
1
CS460/626: Natural Language Processing/Speech, NLP and the Web
(Lecture 29: CYK; Inside Probability; Parse Tree Construction)
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
22nd March, 2011
2
Penn POS Tags
John wrote those words in the Book of Proverbs.
[ John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN [ the/DT Book/NN ] of/IN [ Proverbs/NNS ]
3
Penn Treebank
John wrote those words in the Book of Proverbs.
(S (NP-SBJ (NP John))
   (VP wrote
       (NP those words)
       (PP-LOC in
               (NP (NP-TTL (NP the Book)
                           (PP of (NP Proverbs)))))))
4
PSG Parse Tree
Official trading in the shares will start in Paris on Nov 6.
[Figure: phrase-structure parse tree of the sentence, with nodes S, NP, VP, AP, PP, N, A, P, V and Aux]
5
Penn POS Tags
Official trading in the shares will start in Paris on Nov 6.
[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]
6
Penn POS Tag Set
Adjective: JJ
Adverb: RB
Cardinal Number: CD
Determiner: DT
Preposition: IN
Coordinating Conjunction: CC
Subordinating Conjunction: IN
Singular Noun: NN
Plural Noun: NNS
Personal Pronoun: PRP
Proper Noun: NNP
Verb base form: VB
Modal verb: MD
Verb (3sg Pres): VBZ
Wh-determiner: WDT
Wh-pronoun: WP
7
CYK Parsing
(some slides borrowed from Jimmy Lin’s “Syntactic Parsing with CFGs”)
8
Shared Sub-Problems
Observation: ambiguous parses still share sub-trees
We don’t want to redo work that’s already been done
Unfortunately, naïve backtracking leads to duplicate work
9
Shared Sub-Problems: Example
10
Efficient Parsing
Dynamic programming to the rescue!
Intuition: store partial results in tables, thereby:
  Avoiding repeated work on shared sub-problems
  Efficiently storing ambiguous structures with shared sub-parts
Two algorithms:
  CKY: roughly, bottom-up
  Earley: roughly, top-down
11
CKY Parsing: CNF
CKY parsing requires that the grammar consist of ε-free, binary rules, i.e. that it be in Chomsky Normal Form
All rules are of the form A → B C or A → a
What does the tree look like?
[Figure: tree fragments for the rules A → B C and D → w]
What if my CFG isn’t in CNF?
12
CKY Parsing with Arbitrary CFGs
Problem: my grammar has rules like VP → NP PP PP
  Can’t apply CKY!
Solution: rewrite grammar into CNF
  Introduce new intermediate non-terminals into the grammar
  A → B C D becomes A → X D and X → B C (where X is a symbol that doesn’t occur anywhere else in the grammar)
What does this mean? Weak equivalence:
  The rewritten grammar accepts (and rejects) the same set of strings as the original grammar…
  But the resulting derivations (trees) are different
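The rewriting of long rules described on this slide can be sketched in Python. This is only an illustration under stated assumptions, not a full CNF conversion: it splits right-hand sides longer than two symbols and does nothing about ε-rules or unit productions. The function name `binarize` and the generated symbols X1, X2, … are hypothetical, and the fresh symbols are assumed not to occur in the grammar already.

```python
from itertools import count


def binarize(rules):
    """Split every rule whose right-hand side has more than two symbols.

    `rules` is a list of (lhs, rhs_tuple) pairs. The result is a new list in
    which each right-hand side has at most two symbols.
    """
    fresh = count(1)  # numbers the hypothetical new symbols X1, X2, ...
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            new_sym = f"X{next(fresh)}"     # assumed not to clash with the grammar
            out.append((new_sym, rhs[:2]))  # X -> B C
            rhs = (new_sym,) + rhs[2:]      # A -> X D ...
        out.append((lhs, rhs))
    return out


for lhs, rhs in binarize([("VP", ("NP", "PP", "PP"))]):
    print(lhs, "->", " ".join(rhs))
# X1 -> NP PP
# VP -> X1 PP
```

The new grammar generates the same strings, but its trees contain the extra X nodes, which is the weak equivalence the slide refers to.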
13
CKY Parsing: Intuition
Consider the rule D → w
  Terminal (word) forms a constituent
  Trivial to apply
Consider the rule A → B C
  If there is an A somewhere in the input, then there must be a B followed by a C in the input
  First, precisely define span [ i, j ]
  If A spans from i to j in the input, then there must be some k with i<k<j such that B spans [ i, k ] and C spans [ k, j ]
  Easy to apply: we just need to try different values for k
[Figure: A over the span from i to j, split at k into B and C]
14
CKY Parsing: Table
Any constituent can conceivably span [ i, j ] for all 0≤i<j≤N, where N = length of input string
  We need an N × N table to keep track of all spans…
  But we only need half of the table
Semantics of table: cell [ i, j ] contains A iff A spans i to j in the input string
  Of course, must be allowed by the grammar!
15
CKY Parsing: Table-Filling
In order for A to span [ i, j ]:
  A → B C is a rule in the grammar, and
  There must be a B in [ i, k ] and a C in [ k, j ] for some i<k<j
Operationally:
  To apply rule A → B C, look for a B in [ i, k ] and a C in [ k, j ]
  In the table: look left in the row and down in the column
16
CKY Algorithm
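The algorithm itself did not survive in this transcript. As a stand-in, here is a minimal Python sketch of a CKY recognizer that follows the table-filling description from the preceding slides. It is not the slide's own pseudocode; the function name `cky_recognize`, the dictionary layout of the grammar and the small test grammar (a CNF subset of the gunman grammar used later in the lecture) are assumptions made for illustration.

```python
from collections import defaultdict


def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical: dict mapping a word to the set of non-terminals D with D -> word
    binary:  dict mapping (B, C) to the set of non-terminals A with A -> B C
    """
    n = len(words)
    table = defaultdict(set)  # table[i, j] = non-terminals spanning [i, j]
    for j in range(1, n + 1):           # fill column by column, left to right
        table[j - 1, j] |= lexical.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):  # move up the column
            for k in range(i + 1, j):   # try every split point i < k < j
                for b in table[i, k]:   # B to the left in the row
                    for c in table[k, j]:  # C further down in the column
                        table[i, j] |= binary.get((b, c), set())
    return start in table[0, n]


# Hypothetical CNF grammar for the demonstration.
LEXICAL = {"the": {"DT"}, "gunman": {"NN"}, "building": {"NN"}, "sprayed": {"VBD"}}
BINARY = {("NP", "VP"): {"S"}, ("DT", "NN"): {"NP"}, ("VBD", "NP"): {"VP"}}

print(cky_recognize("the gunman sprayed the building".split(), LEXICAL, BINARY))  # True
print(cky_recognize("gunman the sprayed".split(), LEXICAL, BINARY))               # False
```

The three nested loops over j, i and k are where the cubic running time comes from.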
17
CKY Parsing: Recognize or Parse
Is this really a parser?
Recognizer to parser: add backpointers!
18
CKY: Algorithmic Complexity
What’s the asymptotic complexity of CKY?
O(n³) in the sentence length n: the table has O(n²) cells, and each cell tries O(n) split points k
19
CKY: Analysis
Since it’s bottom up, CKY populates the table with a lot of “phantom constituents”
  Spans that are constituents, but cannot really occur in the context in which they are suggested
Conversion of grammar to CNF adds additional non-terminal nodes
  Leads to weak equivalence wrt original grammar
  Additional non-terminal nodes not (linguistically) meaningful, but can be cleaned up with post-processing
Is there a parsing algorithm for arbitrary CFGs that combines dynamic programming and top-down control?
  Yes: Earley Parsing
20
Penn Treebank
Official trading in the shares will start in Paris on Nov 6.
( (S (NP-SBJ (NP Official trading)
             (PP in (NP the shares)))
     (VP will
         (VP start
             (PP-LOC in (NP Paris))
             (PP-TMP on (NP (NP Nov 6)))))))
21
Probabilistic Context Free Grammars
S → NP VP      1.0
NP → DT NN     0.5
NP → NNS       0.3
NP → NP PP     0.2
PP → P NP      1.0
VP → VP PP     0.6
VP → VBD NP    0.4
DT → the       1.0
NN → gunman    0.5
NN → building  0.5
VBD → sprayed  1.0
NNS → bullets  1.0
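A PCFG requires the probabilities of all rules sharing a left-hand side to sum to 1. The short Python sketch below checks this for the grammar on this slide. The representation as (lhs, rhs, probability) triples and the name `lhs_totals` are assumptions made for illustration. The slide lists no rule for the preposition; `P → with 1.0` is added here as an assumption, since the example parses in this lecture show a P node with probability 1.0.

```python
from collections import defaultdict
from math import isclose

# Rules from the slide as (lhs, rhs, probability).
PCFG = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("DT", "NN"), 0.5),
    ("NP", ("NNS",), 0.3),
    ("NP", ("NP", "PP"), 0.2),
    ("PP", ("P", "NP"), 1.0),
    ("VP", ("VP", "PP"), 0.6),
    ("VP", ("VBD", "NP"), 0.4),
    ("DT", ("the",), 1.0),
    ("NN", ("gunman",), 0.5),
    ("NN", ("building",), 0.5),
    ("VBD", ("sprayed",), 1.0),
    ("NNS", ("bullets",), 1.0),
    ("P", ("with",), 1.0),  # assumed: not listed on the slide
]


def lhs_totals(rules):
    """Sum the rule probabilities separately for each left-hand side."""
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    return dict(totals)


for lhs, total in lhs_totals(PCFG).items():
    # isclose guards against floating-point rounding in sums such as 0.5 + 0.3 + 0.2
    print(lhs, "ok" if isclose(total, 1.0) else f"sums to {total}")
```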
22
Example Parse t1
The gunman sprayed the building with bullets.
S 1.0
  NP 0.5
    DT 1.0 The
    NN 0.5 gunman
  VP 0.6
    VP 0.4
      VBD 1.0 sprayed
      NP 0.5
        DT 1.0 the
        NN 0.5 building
    PP 1.0
      P 1.0 with
      NP 0.3
        NNS 1.0 bullets
P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045
23
Another Parse t2
The gunman sprayed the building with bullets.
S 1.0
  NP 0.5
    DT 1.0 The
    NN 0.5 gunman
  VP 0.4
    VBD 1.0 sprayed
    NP 0.2
      NP 0.5
        DT 1.0 the
        NN 0.5 building
      PP 1.0
        P 1.0 with
        NP 0.3
          NNS 1.0 bullets
P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
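The probability of a parse tree is the product of the probabilities of the rules used in it. A minimal Python sketch that recomputes this product for both parses follows. The nested-tuple tree encoding and the names `RULE_PROB` and `tree_prob` are assumptions made for illustration, and the rule `P → with 1.0` is assumed from the P node in the trees, since the grammar slide does not list it.

```python
from math import prod

RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,  # assumed: not listed on the grammar slide
}


def tree_prob(tree):
    """Multiply the probabilities of all rules used in `tree`.

    A tree is a tuple (label, child, ...); a leaf is a plain word string.
    """
    if isinstance(tree, str):  # a word carries no rule of its own
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return RULE_PROB[label, rhs] * prod(tree_prob(c) for c in children)


subj = ("NP", ("DT", "the"), ("NN", "gunman"))
obj = ("NP", ("DT", "the"), ("NN", "building"))
pp = ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))

t1 = ("S", subj, ("VP", ("VP", ("VBD", "sprayed"), obj), pp))  # PP attached to the VP
t2 = ("S", subj, ("VP", ("VBD", "sprayed"), ("NP", obj, pp)))  # PP attached to the NP

print(f"P(t1) = {tree_prob(t1):.4f}")  # P(t1) = 0.0045
print(f"P(t2) = {tree_prob(t2):.4f}")  # P(t2) = 0.0015
```

The two trees differ only in VP → VP PP (0.6) versus NP → NP PP (0.2), so under this grammar the VP attachment is three times as probable.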
24
Illustrating CYK [Cocke, Younger, Kasami] Algo
S → NP VP      1.0
NP → DT NN     0.5
NP → NNS       0.3
NP → NP PP     0.2
PP → P NP      1.0
VP → VP PP     0.6
VP → VBD NP    0.4
DT → the       1.0
NN → gunman    0.5
NN → building  0.5
VBD → sprayed  1.0
NNS → bullets  1.0
25
CYK: Start with (0,1)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
(- marks an unused or empty cell; cells not yet examined are left blank)
From\To  1    2    3    4    5    6    7
0        DT
1        -
2        -    -
3        -    -    -
4        -    -    -    -
5        -    -    -    -    -
6        -    -    -    -    -    -
26
CYK: Keep filling diagonals
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT
1        -    NN
2        -    -
3        -    -    -
4        -    -    -    -
5        -    -    -    -    -
6        -    -    -    -    -    -
27
CYK: Try getting higher level structures
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP
1        -    NN
2        -    -
3        -    -    -
4        -    -    -    -
5        -    -    -    -    -
6        -    -    -    -    -    -
28
CYK: Diagonal continues
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP
1        -    NN
2        -    -    VBD
3        -    -    -
4        -    -    -    -
5        -    -    -    -    -
6        -    -    -    -    -    -
29
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -
1        -    NN   -
2        -    -    VBD
3        -    -    -
4        -    -    -    -
5        -    -    -    -    -
6        -    -    -    -    -    -
30
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -
1        -    NN   -
2        -    -    VBD
3        -    -    -    DT
4        -    -    -    -
5        -    -    -    -    -
6        -    -    -    -    -    -
31
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -
1        -    NN   -    -
2        -    -    VBD  -
3        -    -    -    DT
4        -    -    -    -    NN
5        -    -    -    -    -
6        -    -    -    -    -    -
32
CYK: starts filling the 5th column
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -
1        -    NN   -    -
2        -    -    VBD  -
3        -    -    -    DT   NP
4        -    -    -    -    NN
5        -    -    -    -    -
6        -    -    -    -    -    -
33
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -
1        -    NN   -    -
2        -    -    VBD  -    VP
3        -    -    -    DT   NP
4        -    -    -    -    NN
5        -    -    -    -    -
6        -    -    -    -    -    -
35
CYK: S found, but NO termination!
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
The S in (0,5) covers only “The gunman sprayed the building”; parsing continues until the whole input (0,7) is covered.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -    S
1        -    NN   -    -
2        -    -    VBD  -    VP
3        -    -    -    DT   NP
4        -    -    -    -    NN
5        -    -    -    -    -
6        -    -    -    -    -    -
36
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -    S
1        -    NN   -    -
2        -    -    VBD  -    VP
3        -    -    -    DT   NP
4        -    -    -    -    NN
5        -    -    -    -    -    P
6        -    -    -    -    -    -
37
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -    S
1        -    NN   -    -
2        -    -    VBD  -    VP   -
3        -    -    -    DT   NP   -
4        -    -    -    -    NN   -
5        -    -    -    -    -    P
6        -    -    -    -    -    -
38
CYK: Control moves to last column
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -    S
1        -    NN   -    -
2        -    -    VBD  -    VP   -
3        -    -    -    DT   NP   -
4        -    -    -    -    NN   -
5        -    -    -    -    -    P
6        -    -    -    -    -    -    NP, NNS
39
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -    S
1        -    NN   -    -
2        -    -    VBD  -    VP   -
3        -    -    -    DT   NP   -
4        -    -    -    -    NN   -
5        -    -    -    -    -    P    PP
6        -    -    -    -    -    -    NP, NNS
40
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -    S
1        -    NN   -    -
2        -    -    VBD  -    VP   -
3        -    -    -    DT   NP   -    NP
4        -    -    -    -    NN   -
5        -    -    -    -    -    P    PP
6        -    -    -    -    -    -    NP, NNS
41
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -    S
1        -    NN   -    -
2        -    -    VBD  -    VP   -    VP
3        -    -    -    DT   NP   -    NP
4        -    -    -    -    NN   -
5        -    -    -    -    -    P    PP
6        -    -    -    -    -    -    NP, NNS
42
CYK: filling the last column
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -    S
1        -    NN   -    -
2        -    -    VBD  -    VP   -    VP
3        -    -    -    DT   NP   -    NP
4        -    -    -    -    NN   -
5        -    -    -    -    -    P    PP
6        -    -    -    -    -    -    NP, NNS
43
CYK: terminates with S in (0,7)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.
From\To  1    2    3    4    5    6    7
0        DT   NP   -    -    S    -    S
1        -    NN   -    -
2        -    -    VBD  -    VP   -    VP
3        -    -    -    DT   NP   -    NP
4        -    -    -    -    NN   -
5        -    -    -    -    -    P    PP
6        -    -    -    -    -    -    NP, NNS
44
CYK: Extracting the Parse Tree
The parse tree is obtained by keeping back pointers.
S (0-7)
  NP (0-2)
    DT (0-1) The
    NN (1-2) gunman
  VP (2-7)
    VBD (2-3) sprayed
    NP (3-7)
      NP (3-5)
        DT (3-4) the
        NN (4-5) building
      PP (5-7)
        P (5-6) with
        NP (6-7)
          NNS (6-7) bullets
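One way to keep back pointers is to record, for every non-terminal added to a cell, the split point and the two children that produced it. The Python sketch below does this for the gunman grammar and then enumerates the trees rooted in S over (0,7). It is an illustrative sketch, not the lecture's implementation. The names `cky_chart`, `apply_unary` and `trees` are hypothetical, the entry `with → P` is assumed because the grammar slide omits it, and the unit rule NP → NNS is handled by a single extra pass per cell, which is enough here only because the grammar has no chains of unit rules.

```python
from collections import defaultdict
from itertools import product

BINARY = {  # (B, C) -> [A] for rules A -> B C
    ("NP", "VP"): ["S"], ("DT", "NN"): ["NP"], ("NP", "PP"): ["NP"],
    ("P", "NP"): ["PP"], ("VP", "PP"): ["VP"], ("VBD", "NP"): ["VP"],
}
UNARY = {"NNS": ["NP"]}  # B -> [A] for unit rules A -> B
LEXICON = {"the": ["DT"], "gunman": ["NN"], "building": ["NN"],
           "sprayed": ["VBD"], "bullets": ["NNS"],
           "with": ["P"]}  # "with" -> P is an assumption


def apply_unary(cell):
    """Add A to the cell for every unit rule A -> B with B already present."""
    for b in list(cell):
        for a in UNARY.get(b, []):
            cell[a].append((b,))  # back pointer to the child in the same cell


def cky_chart(words):
    """Build chart[i, j][A] = list of back pointers for A over words[i:j]."""
    chart = defaultdict(lambda: defaultdict(list))
    for j in range(1, len(words) + 1):
        for tag in LEXICON.get(words[j - 1].lower(), []):
            chart[j - 1, j][tag].append(words[j - 1])  # back pointer to the word
        apply_unary(chart[j - 1, j])
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b, c in product(list(chart[i, k]), list(chart[k, j])):
                    for a in BINARY.get((b, c), []):
                        chart[i, j][a].append((k, b, c))  # split point and children
            apply_unary(chart[i, j])
    return chart


def trees(chart, label, i, j):
    """Yield every tree for `label` over [i, j] as a bracketed string."""
    for bp in chart[i, j][label]:
        if isinstance(bp, str):   # lexical rule: label -> word
            yield f"({label} {bp})"
        elif len(bp) == 1:        # unit rule: label -> B
            for sub in trees(chart, bp[0], i, j):
                yield f"({label} {sub})"
        else:                     # binary rule: label -> B C, split at k
            k, b, c = bp
            for left in trees(chart, b, i, k):
                for right in trees(chart, c, k, j):
                    yield f"({label} {left} {right})"


words = "The gunman sprayed the building with bullets".split()
chart = cky_chart(words)
for tree in trees(chart, "S", 0, len(words)):
    print(tree)
```

Because cell (2,7) holds two back pointers for VP, two trees are printed: one with the PP attached to the NP, as in the tree on this slide, and one with the PP attached to the VP.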