Download presentation
Presentation is loading. Please wait.
Published bySara Morgan Modified over 9 years ago
1
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 29– CYK; Inside Probability; Parse Tree construction) Pushpak Bhattacharyya CSE Dept., IIT Bombay 24 th March, 2011
2
CYK Parsing
3
Shared Sub-Problems: Example
4
CKY Parsing: CNF CKY parsing requires that the grammar consist of ε-free, binary rules = Chomsky Normal Form All rules of the form: A BC or A a What does the tree look like? What if my CFG isn’t in CNF? A → B C D → w
5
CKY Algorithm
6
Illustrating CYK [Cocke, Younger, Kashmi] Algo S NP VP1.0 NP DT NN0.5 NP NNS0.3 NP NP PP 0.2 PP P NP1.0 VP VP PP 0.6 VP VBD NP0.4 DT the1.0 NN gunman0.5 NN building0.5 VBD sprayed 1.0 NNS bullets1.0
7
CYK: Start with (0,1) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DT 1------- 2 ------- -- 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
8
CYK: Keep filling diagonals 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DT 1-------NN 2-------------- -- 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
9
CYK: Try getting higher level structures 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP 1-------NN 2-------------- -- 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
10
CYK: Diagonal continues 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP 1-------NN 2-------------- -- VBD 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
11
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - 1-------NN------- - 2-------------- -- VBD 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
12
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - 1-------NN------- - 2-------------- -- VBD 3-------------- -- ------- - DT 4------- - ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
13
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - 1-------NN------- - -------- - 2-------------- -- VBD-------- - 3-------------- -- ------- - DT 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
14
CYK: starts filling the 5 th column 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - 1-------NN------- - -------- - 2-------------- -- VBD-------- - 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
15
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
16
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
17
CYK: S found, but NO termination! 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -
18
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - P 6------- - ------- -- ------- - -------- -
19
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - 3-------------- -- ------- - DTNP-------- - 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - P 6------- - ------- -- ------- - -------- -
20
CYK: Control moves to last column 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - 3-------------- -- ------- - DTNP-------- - 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - P 6------- - ------- -- ------- - -------- - NP NNS
21
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - 3-------------- -- ------- - DTNP-------- - 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS
22
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - 3-------------- -- ------- - DTNP-------- - NP 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS
23
CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - VP 3-------------- -- ------- - DTNP-------- - NP 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS
24
CYK: filling the last column 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - VP 3-------------- -- ------- - DTNP-------- - NP 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS
25
CYK: terminates with S in (0,7) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - VP 3-------------- -- ------- - DTNP-------- - NP 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS
26
CYK: Extracting the Parse Tree The parse tree is obtained by keeping back pointers. S (0-7) NP (0- 2) VP (2- 7) VBD (2- 3) NP (3- 7) DT (0- 1) NN (1- 2) The gunma n sprayed NP (3- 5) PP (5- 7) DT (3- 4) NN (4- 5) P (5- 6) NP (6-7) NNS (6-7) thebuilding with bullets
27
Probabilistic parse tree construction
28
Interesting Probabilities The gunman sprayed the building with bullets 1 2 3 45 6 7 N1N1 NP What is the probability of having a NP at this position such that it will derive “the building” ? - What is the probability of starting from N 1 and deriving “The gunman sprayed”, a NP and “with bullets” ? - Inside Probabilities Outside Probabilities
29
Interesting Probabilities Random variables to be considered The non-terminal being expanded. E.g., NP The word-span covered by the non-terminal. E.g., (4,5) refers to words “the building” While calculating probabilities, consider: The rule to be used for expansion : E.g., NP DT NN The probabilities associated with the RHS non- terminals : E.g., DT subtree’s inside/outside probabilities & NN subtree’s inside/outside probabilities
30
Outside Probability j (p,q) : The probability of beginning with N 1 & generating the non-terminal N j pq and all words outside w p..w q w 1 ………w p-1 w p …w q w q+1 ……… w m N1N1 NjNj
31
Inside Probabilities j (p,q) : The probability of generating the words w p..w q starting with the non-terminal N j pq. w 1 ………w p-1 w p …w q w q+1 ……… w m N1N1 NjNj
32
Outside & Inside Probabilities: example The gunman sprayed the building with bullets 1 2 3 45 6 7 N1N1 NP
33
Calculating Inside probabilities j (p,q) Base case: Base case is used for rules which derive the words or terminals directly E.g., Suppose N j = NN is being considered & NN building is one of the rules with probability 0.5
34
Induction Step: Assuming Grammar in Chomsky Normal Form Induction step : wpwp NjNj NrNr NsNs wdwd w d+1 wqwq Consider different splits of the words - indicated by d E.g., the huge building Consider different non-terminals to be used in the rule: NP DT NN, NP DT NNS are available options Consider summation over all these. Split here for d=2 d=3
35
The Bottom-Up Approach The idea of induction Consider “the gunman” Base cases : Apply unary rules DT the Prob = 1.0 NN gunmanProb = 0.5 Induction : Prob that a NP covers these 2 words = P (NP DT NN) * P (DT deriving the word “the”) * P (NN deriving the word “gunman”) = 0.5 * 1.0 * 0.5 = 0.25 The gunman NP 0.5 DT 1.0 NN 0.5
36
Parse Triangle A parse triangle is constructed for calculating j (p,q) Probability of a sentence using j (p,q):
37
Parse Triangle The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7) 0 1 2 3 4 5 6 Fill diagonals with from to
38
Parse Triangle The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7) 1 2 3 4 5 6 7 Calculate using induction formula
39
Example Parse t 1 S 1.0 NP 0.5 VP 0.6 DT 1.0 NN 0.5 VBD 1.0 NP 0.5 PP 1.0 DT 1.0 NN 0.5 P 1.0 NP 0.3 NNS 1.0 bullet s with buildingth e Thegunman sprayed VP 0.4 Rule used here is VP VP PP The gunman sprayed the building with bullets.
40
Another Parse t 2 S 1.0 NP 0.5 VP 0.4 DT 1.0 NN 0.5 VBD 1.0 NP 0.5 PP 1.0 DT 1.0 NN 0.5 P 1.0 NP 0.3 NNS 1.0 bullet s with buildingth e Thegunmansprayed NP 0.2 Rule used here is VP VBD NP The gunman sprayed the building with bullets.
41
Parse Triangle The (1)gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7) 1 2 3 4 5 6 7
42
Different Parses Consider Different splitting points : E.g., 5th and 3 rd position Using different rules for VP expansion : E.g., VP VP PP, VP VBD NP Different parses for the VP “sprayed the building with bullets” can be constructed this way.
43
The Viterbi-like Algorithm for PCFGs Very similar to calculation of inside probabilities i (p,q) Instead of summing over all ways of constructing the parse for w pq –Choose only the best way (the maximum probability one!)
44
Calculation of i (p,q) This rule is chosen VP 0.4 PP 1.0 VP 0.4 VBD 1.0 NP 0.2 0.6 * 1.0 * 0.3 = 0.18 0.4 * 1.0 * 0.015 = 0.06
45
Viterbi-like Algorithm Base case: Induction : i (p,q) stores RHS of the rule selected Position of splitting Example : VP (3,7) stores VP, PP and split position = 5 because VP VP PP is the rule used. Backtracing : Start from 1 (1,7) and 1 (1,7) and backtrace.
46
Example 1 (1,7) records S NP VP & split position as 2 NP (1,2) records NP DT NN & split position as 1 VP (3,7) records VP VP PP & split position as 5 S NPVP The gunman sprayed the building with bullets 1 2 3 45 6 7
47
Example S NPVP The gunman sprayed the building with bullets 1 2 3 4 5 6 7 DTNN VP PP
48
Grammar Induction Annotated corpora like Penn Treebank Counts used as follows: Sample training data: NP The boy DT NN NP Those cars DT NNS NP Bears NNS NP That book DT NN NP She PRP
49
Grammar Induction for Unannotated Corpora: EM algorithm Start with initial estimates for rule probabilities Compute probability of each parse of a sentence according to current estimates of rule probabilities Compute expectation of how often a rule is used (summing probabilities of rules used in previous step) Refine rule probabilities so that training corpus likelihood increases EXPECTATION PHASE MAXIMIZATION PHASE
50
Outside Probabilities j (p,q) Base case: Inductive step for calculating : wpwp N f pe N j pq N g (q+1) e wqwq w q+1 wewe w p-1 w1w1 wmwm w e+1 N1N1 Summation over f, g & e
51
Probability of a Sentence Joint probability of a sentence w 1m and that there is a constituent spanning words w p to w q is given as: The gunman sprayed the building with bullets 1 2 3 4 5 6 7 N1N1 NP
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.