CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 29– CYK; Inside Probability; Parse Tree construction) Pushpak Bhattacharyya CSE.


1 CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 29– CYK; Inside Probability; Parse Tree construction) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 22nd March, 2011

2 Penn POS Tags [John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN [ the/DT Book/NN ] of/IN [ Proverbs/NNS ] John wrote those words in the Book of Proverbs.

3 Penn Treebank (S (NP-SBJ (NP John)) (VP wrote (NP those words) (PP-LOC in (NP (NP-TTL (NP the Book) (PP of (NP Proverbs))) John wrote those words in the Book of Proverbs.

4 PSG Parse Tree Official trading in the shares will start in Paris on Nov 6. [Figure: phrase-structure parse tree of this sentence; node labels S, NP, VP, AP, PP, N, A, P, V, Aux]

5 Penn POS Tags [ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ] Official trading in the shares will start in Paris on Nov 6.

6 Penn POS Tag Set Adjective: JJ; Adverb: RB; Cardinal Number: CD; Determiner: DT; Preposition: IN; Coordinating Conjunction: CC; Subordinating Conjunction: IN; Singular Noun: NN; Plural Noun: NNS; Personal Pronoun: PRP; Proper Noun: NNP; Verb base form: VB; Modal verb: MD; Verb (3sg Pres): VBZ; Wh-determiner: WDT; Wh-pronoun: WP

7 CYK Parsing (some slides borrowed from Jimmy Lin’s “Syntactic Parsing with CFGs”)

8 Shared Sub-Problems Observation: ambiguous parses still share sub-trees We don’t want to redo work that’s already been done Unfortunately, naïve backtracking leads to duplicate work

9 Shared Sub-Problems: Example

10 Efficient Parsing Dynamic programming to the rescue! Intuition: store partial results in tables, thereby (1) avoiding repeated work on shared sub-problems and (2) efficiently storing ambiguous structures with shared sub-parts. Two algorithms: CKY (roughly, bottom-up) and Earley (roughly, top-down).

11 CKY Parsing: CNF CKY parsing requires that the grammar consist of ε-free, binary rules = Chomsky Normal Form. All rules are of the form A → B C or A → a (for example, A → B C and D → w). What does the tree look like? What if my CFG isn’t in CNF?

12 CKY Parsing with Arbitrary CFGs Problem: my grammar has rules like VP → NP PP PP. Can’t apply CKY! Solution: rewrite the grammar into CNF by introducing new intermediate non-terminals. For example, A → B C D becomes A → X D and X → B C (where X is a symbol that doesn’t occur anywhere else in the grammar). What does this mean? Weak equivalence: the rewritten grammar accepts (and rejects) the same set of strings as the original grammar, but the resulting derivations (trees) are different.
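A minimal sketch of the rewriting step described on this slide, assuming rules are stored as (left-hand side, right-hand side tuple) pairs. The function name `binarize` and the fresh symbol names X1, X2, ... are illustrative choices, not part of the lecture. The sketch only splits long rules; it does not remove unit or ε rules, which a full CNF conversion would also need to handle.

```python
from itertools import count


def binarize(rules):
    """Split every rule with more than two right-hand-side symbols into binary rules.

    rules: list of (lhs, rhs) pairs, rhs being a tuple of symbols.
    Fresh symbols are named X1, X2, ... and are assumed not to occur in the grammar.
    """
    fresh = count(1)
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            new_sym = f"X{next(fresh)}"
            # A -> B C D ...  becomes  X -> B C  plus  A -> X D ...
            out.append((new_sym, rhs[:2]))
            rhs = (new_sym,) + rhs[2:]
        out.append((lhs, rhs))
    return out


rules = [("A", ("B", "C", "D")), ("VP", ("NP", "PP", "PP"))]
for lhs, rhs in binarize(rules):
    print(lhs, "->", " ".join(rhs))
```

For the two rules shown, this prints X1 -> B C, A -> X1 D, X2 -> NP PP and VP -> X2 PP, which matches the A → X D, X → B C pattern on the slide.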

13 CKY Parsing: Intuition Consider the rule D → w: a terminal (word) forms a constituent. Trivial to apply. Consider the rule A → B C: if there is an A somewhere in the input, then there must be a B followed by a C in the input. First, precisely define the span [ i, j ]. If A spans from i to j in the input, then there must be some k with i<k<j such that B spans from i to k and C spans from k to j. Easy to apply: we just need to try different values for k.

14 CKY Parsing: Table Any constituent can conceivably span [ i, j ] for all 0≤i<j≤N, where N = length of input string We need an N × N table to keep track of all spans… But we only need half of the table Semantics of table: cell [ i, j ] contains A iff A spans i to j in the input string Of course, must be allowed by the grammar!

15 CKY Parsing: Table-Filling In order for A to span [ i, j ]: A → B C is a rule in the grammar, and there must be a B in [ i, k ] and a C in [ k, j ] for some i<k<j. Operationally: to apply rule A → B C, look for a B in [ i, k ] and a C in [ k, j ]. In the table: look left in the row and down in the column.

16 CKY Algorithm
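The algorithm itself was shown as an image and did not survive in this transcript. The table-filling procedure from the preceding slides can be sketched roughly as follows. The function name `cky_recognize`, the dictionary layout and the small toy grammar are assumptions made for illustration; the grammar is assumed to be in CNF already.

```python
from collections import defaultdict

# Toy CNF grammar (illustrative only).
LEXICAL = {  # word -> non-terminals D with a rule D -> word
    "the": {"DT"},
    "gunman": {"NN"},
    "building": {"NN"},
    "sprayed": {"VBD"},
}
BINARY = {  # (B, C) -> non-terminals A with a rule A -> B C
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("VBD", "NP"): {"VP"},
}


def cky_recognize(words, lexical, binary, start="S"):
    """Return True iff `start` spans the whole input under a CNF grammar."""
    n = len(words)
    table = defaultdict(set)  # table[i, j] = non-terminals spanning words[i:j]
    # Width-1 spans: apply the lexical rules D -> w.
    for i, w in enumerate(words):
        table[i, i + 1] |= lexical.get(w, set())
    # Wider spans, shortest first, so both sub-spans are already filled.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # try every split point i < k < j
                for b in table[i, k]:
                    for c in table[k, j]:
                        table[i, j] |= binary.get((b, c), set())
    return start in table[0, n]


print(cky_recognize("the gunman sprayed the building".split(), LEXICAL, BINARY))
```

This version only answers yes or no; it stores no backpointers, so it is a recognizer rather than a parser.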

17 CKY Parsing: Recognize or Parse Is this really a parser? Recognizer to parser: add backpointers!

18 CKY: Algorithmic Complexity What’s the asymptotic complexity of CKY? O(n^3)
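One way to see the cubic bound, as a rough counting argument: the table-filling step tries every split point k for every span [ i, j ], so the work is proportional to the number of index triples, times the number of binary rules checked per triple. The symbol |R| for the number of binary rules is notation introduced here, not from the slides.

```latex
\[
\#\{(i,k,j) : 0 \le i < k < j \le n\}
  = \binom{n+1}{3}
  = \frac{(n+1)\,n\,(n-1)}{6}
  = O(n^{3}),
\qquad
\text{total work} = O\!\left(n^{3}\,|R|\right).
\]
```

For a fixed grammar |R| is a constant, which gives the O(n^3) stated on the slide.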

19 CKY: Analysis Since it’s bottom-up, CKY populates the table with a lot of “phantom constituents”: spans that are constituents, but cannot really occur in the context in which they are suggested. Conversion of the grammar to CNF adds additional non-terminal nodes, which leads to weak equivalence with respect to the original grammar. The additional non-terminal nodes are not (linguistically) meaningful, but they can be cleaned up with post-processing. Is there a parsing algorithm for arbitrary CFGs that combines dynamic programming and top-down control? Yes: Earley parsing.

20 Penn Treebank ( (S (NP-SBJ (NP Official trading) (PP in (NP the shares))) (VP will (VP start (PP-LOC in (NP Paris)) (PP-TMP on (NP (NP Nov 6) Official trading in the shares will start in Paris on Nov 6.

21 Probabilistic Context Free Grammars S → NP VP 1.0; NP → DT NN 0.5; NP → NNS 0.3; NP → NP PP 0.2; PP → P NP 1.0; VP → VP PP 0.6; VP → VBD NP 0.4; DT → the 1.0; NN → gunman 0.5; NN → building 0.5; VBD → sprayed 1.0; NNS → bullets 1.0

22 Example Parse t1 The gunman sprayed the building with bullets. (S 1.0 (NP 0.5 (DT 1.0 The) (NN 0.5 gunman)) (VP 0.6 (VP 0.4 (VBD 1.0 sprayed) (NP 0.5 (DT 1.0 the) (NN 0.5 building))) (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets))))) P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

23 Another Parse t2 The gunman sprayed the building with bullets. (S 1.0 (NP 0.5 (DT 1.0 The) (NN 0.5 gunman)) (VP 0.4 (VBD 1.0 sprayed) (NP 0.2 (NP 0.5 (DT 1.0 the) (NN 0.5 building)) (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets)))))) P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
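The probability of a parse tree under a PCFG is the product of the probabilities of the rules used in it. A small sketch that recomputes this product for the two parses, assuming trees are written as nested tuples (label, child, ...). The rule P → with 1.0 is an assumption: it does not appear in the grammar listing, but both parse trees show P with probability 1.0. The names `PCFG` and `tree_prob` are illustrative.

```python
# Rule probabilities from the PCFG slide, keyed by (lhs, rhs tuple).
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,  # assumption: taken from the parse trees, not the grammar slide
}


def tree_prob(tree):
    """Multiply the probabilities of all rules used in a (label, child, ...) tree."""
    label, *children = tree
    # The right-hand side is the sequence of child labels (or the word itself).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = PCFG[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)
    return p


subject = ("NP", ("DT", "the"), ("NN", "gunman"))
obj = ("NP", ("DT", "the"), ("NN", "building"))
pp = ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))

# t1: the PP attaches to the verb phrase; t2: the PP attaches to the object NP.
t1 = ("S", subject, ("VP", ("VP", ("VBD", "sprayed"), obj), pp))
t2 = ("S", subject, ("VP", ("VBD", "sprayed"), ("NP", obj, pp)))

print(f"{tree_prob(t1):.4f}")  # prints 0.0045
print(f"{tree_prob(t2):.4f}")  # prints 0.0015
```

The two trees differ only in the rules VP → VP PP (0.6) versus NP → NP PP (0.2), so the verb-attachment parse is three times as probable as the noun-attachment parse.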

24 Illustrating CYK [Cocke, Younger, Kasami] Algo S → NP VP 1.0; NP → DT NN 0.5; NP → NNS 0.3; NP → NP PP 0.2; PP → P NP 1.0; VP → VP PP 0.6; VP → VBD NP 0.4; DT → the 1.0; NN → gunman 0.5; NN → building 0.5; VBD → sprayed 1.0; NNS → bullets 1.0

25 CYK: Start with (0,1) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT.

26 CYK: Keep filling diagonals 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (1,2) NN.

27 CYK: Try getting higher level structures 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (1,2) NN.

28 CYK: Diagonal continues 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (1,2) NN; (2,3) VBD.

29 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (1,2) NN; (2,3) VBD. Cells (0,3) and (1,3) stay empty.

30 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (1,2) NN; (2,3) VBD; (3,4) DT.

31 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (1,2) NN; (2,3) VBD; (3,4) DT; (4,5) NN. Cells (0,4), (1,4) and (2,4) stay empty.

32 CYK: starts filling the 5th column 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (1,2) NN; (2,3) VBD; (3,4) DT; (3,5) NP; (4,5) NN.

33 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (1,2) NN; (2,3) VBD; (2,5) VP; (3,4) DT; (3,5) NP; (4,5) NN.

34 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (1,2) NN; (2,3) VBD; (2,5) VP; (3,4) DT; (3,5) NP; (4,5) NN. No new entries.

35 CYK: S found, but NO termination! 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (0,5) S; (1,2) NN; (2,3) VBD; (2,5) VP; (3,4) DT; (3,5) NP; (4,5) NN.

36 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (0,5) S; (1,2) NN; (2,3) VBD; (2,5) VP; (3,4) DT; (3,5) NP; (4,5) NN; (5,6) P.

37 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (0,5) S; (1,2) NN; (2,3) VBD; (2,5) VP; (3,4) DT; (3,5) NP; (4,5) NN; (5,6) P. Cells (2,6), (3,6) and (4,6) stay empty.

38 CYK: Control moves to last column 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (0,5) S; (1,2) NN; (2,3) VBD; (2,5) VP; (3,4) DT; (3,5) NP; (4,5) NN; (5,6) P; (6,7) NP, NNS.

39 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (0,5) S; (1,2) NN; (2,3) VBD; (2,5) VP; (3,4) DT; (3,5) NP; (4,5) NN; (5,6) P; (5,7) PP; (6,7) NP, NNS.

40 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (0,5) S; (1,2) NN; (2,3) VBD; (2,5) VP; (3,4) DT; (3,5) NP; (3,7) NP; (4,5) NN; (5,6) P; (5,7) PP; (6,7) NP, NNS.

41 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (0,5) S; (1,2) NN; (2,3) VBD; (2,5) VP; (2,7) VP; (3,4) DT; (3,5) NP; (3,7) NP; (4,5) NN; (5,6) P; (5,7) PP; (6,7) NP, NNS.

42 CYK: filling the last column 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (0,5) S; (1,2) NN; (2,3) VBD; (2,5) VP; (2,7) VP; (3,4) DT; (3,5) NP; (3,7) NP; (4,5) NN; (5,6) P; (5,7) PP; (6,7) NP, NNS. No new entries.

43 CYK: terminates with S in (0,7) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. Table (rows = From, columns = To), filled cells: (0,1) DT; (0,2) NP; (0,5) S; (0,7) S; (1,2) NN; (2,3) VBD; (2,5) VP; (2,7) VP; (3,4) DT; (3,5) NP; (3,7) NP; (4,5) NN; (5,6) P; (5,7) PP; (6,7) NP, NNS.
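The lecture title mentions inside probability, which can be computed with the same table by storing a probability per non-terminal instead of a plain label. The following is a sketch under stated assumptions, not the lecture's own formulation: P → with 1.0 is assumed (it appears in the parse trees but not in the grammar listing), and the single unit rule NP → NNS is handled by one pass over each finished cell, which is only valid when unit rules do not chain. The names `inside_table`, `LEXICAL`, `BINARY` and `UNARY` are illustrative.

```python
LEXICAL = {  # (tag, word) -> probability of tag -> word
    ("DT", "the"): 1.0, ("NN", "gunman"): 0.5, ("NN", "building"): 0.5,
    ("VBD", "sprayed"): 1.0, ("NNS", "bullets"): 1.0,
    ("P", "with"): 1.0,  # assumption: taken from the parse trees
}
BINARY = {  # (A, B, C) -> probability of A -> B C
    ("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 0.5, ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0, ("VP", "VP", "PP"): 0.6, ("VP", "VBD", "NP"): 0.4,
}
UNARY = {("NP", "NNS"): 0.3}  # (A, B) -> probability of A -> B


def inside_table(words):
    """Return beta with beta[i, j, A] = probability that A derives words[i:j]."""
    n = len(words)
    beta = {}

    def add(i, j, sym, p):
        if p > 0.0:
            beta[i, j, sym] = beta.get((i, j, sym), 0.0) + p

    def apply_unary(i, j):
        # One pass only: assumes no chains of unit rules.
        for (a, b), p in UNARY.items():
            add(i, j, a, p * beta.get((i, j, b), 0.0))

    for i, w in enumerate(words):
        for (tag, word), p in LEXICAL.items():
            if word == w:
                add(i, i + 1, tag, p)
        apply_unary(i, i + 1)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    # Sum over all split points and rules, unlike Viterbi's max.
                    add(i, j, a, p * beta.get((i, k, b), 0.0) * beta.get((k, j, c), 0.0))
            apply_unary(i, j)
    return beta


words = "the gunman sprayed the building with bullets".split()
beta = inside_table(words)
print(f"{beta[0, 7, 'S']:.4f}")  # prints 0.0060
```

The value for S over (0,7) sums the probabilities of all parses of the sentence. The keys of `beta` also reproduce the labelled cells of the final table, including S in (0,5).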

44 CYK: Extracting the Parse Tree The parse tree is obtained by keeping back pointers. (S[0-7] (NP[0-2] (DT[0-1] The) (NN[1-2] gunman)) (VP[2-7] (VBD[2-3] sprayed) (NP[3-7] (NP[3-5] (DT[3-4] the) (NN[4-5] building)) (PP[5-7] (P[5-6] with) (NP[6-7] (NNS[6-7] bullets))))))
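A rough sketch of the back-pointer idea: each table cell records, for every non-terminal, how it was built (a word, a unit rule, or a split point with two children), and trees are read off recursively from the top cell. The data layout and the names `cky_parse` and `trees` are assumptions for illustration. The rule P → with is assumed, and the unit rule NP → NNS is applied once per cell.

```python
LEXICAL = {"the": ["DT"], "gunman": ["NN"], "building": ["NN"],
           "sprayed": ["VBD"], "bullets": ["NNS"],
           "with": ["P"]}  # P -> with is an assumption taken from the parse trees
BINARY = [("S", "NP", "VP"), ("NP", "DT", "NN"), ("NP", "NP", "PP"),
          ("PP", "P", "NP"), ("VP", "VP", "PP"), ("VP", "VBD", "NP")]
UNARY = [("NP", "NNS")]


def cky_parse(words):
    """Fill back[i, j][A] with the list of ways A can be built over words[i:j]."""
    n = len(words)
    back = {}

    def apply_unary(cell):
        for a, b in UNARY:
            if b in cell:
                cell.setdefault(a, []).append(("unary", b))

    for i, w in enumerate(words):
        cell = back.setdefault((i, i + 1), {})
        for tag in LEXICAL.get(w, []):
            cell.setdefault(tag, []).append(("word", w))
        apply_unary(cell)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = back.setdefault((i, j), {})
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if b in back[i, k] and c in back[k, j]:
                        # Back pointer: split point and the two child labels.
                        cell.setdefault(a, []).append(("binary", k, b, c))
            apply_unary(cell)
    return back


def trees(back, i, j, sym):
    """Yield every bracketed tree for `sym` over the span (i, j)."""
    for entry in back.get((i, j), {}).get(sym, []):
        if entry[0] == "word":
            yield f"({sym} {entry[1]})"
        elif entry[0] == "unary":
            for sub in trees(back, i, j, entry[1]):
                yield f"({sym} {sub})"
        else:
            _, k, b, c = entry
            for left in trees(back, i, k, b):
                for right in trees(back, k, j, c):
                    yield f"({sym} {left} {right})"


words = "the gunman sprayed the building with bullets".split()
back = cky_parse(words)
parses = list(trees(back, 0, len(words), "S"))
for p in parses:
    print(p)
```

For this sentence the sketch yields two trees, one per back pointer stored for VP in (2,7): the noun-attachment tree shown on this slide and the verb-attachment tree from the earlier example.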

