Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 29– CYK; Inside Probability; Parse Tree construction) Pushpak Bhattacharyya CSE.

Similar presentations


Presentation on theme: "CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 29– CYK; Inside Probability; Parse Tree construction) Pushpak Bhattacharyya CSE."— Presentation transcript:

1 CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 29– CYK; Inside Probability; Parse Tree construction) Pushpak Bhattacharyya CSE Dept., IIT Bombay 24 th March, 2011

2 CYK Parsing

3 Shared Sub-Problems: Example

4 CKY Parsing: CNF CKY parsing requires that the grammar consist of ε-free, binary rules = Chomsky Normal Form All rules of the form: A  BC or A  a What does the tree look like? What if my CFG isn’t in CNF? A → B C D → w

5 CKY Algorithm

6 Illustrating CYK [Cocke, Younger, Kashmi] Algo S  NP VP1.0 NP  DT NN0.5 NP  NNS0.3 NP  NP PP 0.2 PP  P NP1.0 VP  VP PP 0.6 VP  VBD NP0.4 DT  the1.0 NN  gunman0.5 NN  building0.5 VBD  sprayed 1.0 NNS  bullets1.0

7 CYK: Start with (0,1) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DT 1------- 2 ------- -- 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

8 CYK: Keep filling diagonals 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DT 1-------NN 2-------------- -- 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

9 CYK: Try getting higher level structures 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP 1-------NN 2-------------- -- 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

10 CYK: Diagonal continues 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP 1-------NN 2-------------- -- VBD 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

11 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - 1-------NN------- - 2-------------- -- VBD 3-------------- -- ------- - 4 ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

12 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - 1-------NN------- - 2-------------- -- VBD 3-------------- -- ------- - DT 4------- - ------- -- ------- - -------- - 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

13 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - 1-------NN------- - -------- - 2-------------- -- VBD-------- - 3-------------- -- ------- - DT 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

14 CYK: starts filling the 5 th column 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - 1-------NN------- - -------- - 2-------------- -- VBD-------- - 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

15 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

16 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

17 CYK: S found, but NO termination! 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - 6------- - ------- -- ------- - -------- -

18 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP 3-------------- -- ------- - DTNP 4------- - ------- -- ------- - -------- - NN 5------- - ------- -- ------- - -------- - P 6------- - ------- -- ------- - -------- -

19 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - 3-------------- -- ------- - DTNP-------- - 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - P 6------- - ------- -- ------- - -------- -

20 CYK: Control moves to last column 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - 3-------------- -- ------- - DTNP-------- - 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - P 6------- - ------- -- ------- - -------- - NP NNS

21 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - 3-------------- -- ------- - DTNP-------- - 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS

22 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - 3-------------- -- ------- - DTNP-------- - NP 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS

23 CYK (cont…) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - VP 3-------------- -- ------- - DTNP-------- - NP 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS

24 CYK: filling the last column 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - VP 3-------------- -- ------- - DTNP-------- - NP 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS

25 CYK: terminates with S in (0,7) 0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7. To From 1234567 0DTNP------- - -------- - S S 1-------NN------- - -------- - 2-------------- -- VBD-------- - VP-------- - VP 3-------------- -- ------- - DTNP-------- - NP 4------- - ------- -- ------- - -------- - NN-------- - 5------- - ------- -- ------- - -------- - PPP 6------- - ------- -- ------- - -------- - NP NNS

26 CYK: Extracting the Parse Tree The parse tree is obtained by keeping back pointers. S (0-7) NP (0- 2) VP (2- 7) VBD (2- 3) NP (3- 7) DT (0- 1) NN (1- 2) The gunma n sprayed NP (3- 5) PP (5- 7) DT (3- 4) NN (4- 5) P (5- 6) NP (6-7) NNS (6-7) thebuilding with bullets

27 Probabilistic parse tree construction

28 Interesting Probabilities The gunman sprayed the building with bullets 1 2 3 45 6 7 N1N1 NP What is the probability of having a NP at this position such that it will derive “the building” ? - What is the probability of starting from N 1 and deriving “The gunman sprayed”, a NP and “with bullets” ? - Inside Probabilities Outside Probabilities

29 Interesting Probabilities Random variables to be considered The non-terminal being expanded. E.g., NP The word-span covered by the non-terminal. E.g., (4,5) refers to words “the building” While calculating probabilities, consider: The rule to be used for expansion : E.g., NP  DT NN The probabilities associated with the RHS non- terminals : E.g., DT subtree’s inside/outside probabilities & NN subtree’s inside/outside probabilities

30 Outside Probability  j (p,q) : The probability of beginning with N 1 & generating the non-terminal N j pq and all words outside w p..w q w 1 ………w p-1 w p …w q w q+1 ……… w m N1N1 NjNj

31 Inside Probabilities  j (p,q) : The probability of generating the words w p..w q starting with the non-terminal N j pq. w 1 ………w p-1 w p …w q w q+1 ……… w m   N1N1 NjNj

32 Outside & Inside Probabilities: example The gunman sprayed the building with bullets 1 2 3 45 6 7 N1N1 NP

33 Calculating Inside probabilities  j (p,q) Base case: Base case is used for rules which derive the words or terminals directly E.g., Suppose N j = NN is being considered & NN  building is one of the rules with probability 0.5

34 Induction Step: Assuming Grammar in Chomsky Normal Form Induction step : wpwp NjNj NrNr NsNs wdwd w d+1 wqwq Consider different splits of the words - indicated by d E.g., the huge building Consider different non-terminals to be used in the rule: NP  DT NN, NP  DT NNS are available options Consider summation over all these. Split here for d=2 d=3

35 The Bottom-Up Approach The idea of induction Consider “the gunman” Base cases : Apply unary rules DT  the Prob = 1.0 NN  gunmanProb = 0.5 Induction : Prob that a NP covers these 2 words = P (NP  DT NN) * P (DT deriving the word “the”) * P (NN deriving the word “gunman”) = 0.5 * 1.0 * 0.5 = 0.25 The gunman NP 0.5 DT 1.0 NN 0.5

36 Parse Triangle A parse triangle is constructed for calculating  j (p,q) Probability of a sentence using  j (p,q):

37 Parse Triangle The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7) 0 1 2 3 4 5 6 Fill diagonals with from to

38 Parse Triangle The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7) 1 2 3 4 5 6 7 Calculate using induction formula

39 Example Parse t 1 S 1.0 NP 0.5 VP 0.6 DT 1.0 NN 0.5 VBD 1.0 NP 0.5 PP 1.0 DT 1.0 NN 0.5 P 1.0 NP 0.3 NNS 1.0 bullet s with buildingth e Thegunman sprayed VP 0.4 Rule used here is VP  VP PP The gunman sprayed the building with bullets.

40 Another Parse t 2 S 1.0 NP 0.5 VP 0.4 DT 1.0 NN 0.5 VBD 1.0 NP 0.5 PP 1.0 DT 1.0 NN 0.5 P 1.0 NP 0.3 NNS 1.0 bullet s with buildingth e Thegunmansprayed NP 0.2 Rule used here is VP  VBD NP The gunman sprayed the building with bullets.

41 Parse Triangle The (1)gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7) 1 2 3 4 5 6 7

42 Different Parses Consider Different splitting points : E.g., 5th and 3 rd position Using different rules for VP expansion : E.g., VP  VP PP, VP  VBD NP Different parses for the VP “sprayed the building with bullets” can be constructed this way.

43 The Viterbi-like Algorithm for PCFGs Very similar to calculation of inside probabilities  i (p,q) Instead of summing over all ways of constructing the parse for w pq –Choose only the best way (the maximum probability one!)

44 Calculation of  i (p,q) This rule is chosen VP 0.4 PP 1.0 VP 0.4 VBD 1.0 NP 0.2 0.6 * 1.0 * 0.3 = 0.18 0.4 * 1.0 * 0.015 = 0.06

45 Viterbi-like Algorithm Base case: Induction :  i (p,q) stores RHS of the rule selected Position of splitting Example :  VP (3,7) stores VP, PP and split position = 5 because VP  VP PP is the rule used. Backtracing : Start from  1 (1,7) and  1 (1,7) and backtrace.

46 Example  1 (1,7) records S  NP VP & split position as 2  NP (1,2) records NP  DT NN & split position as 1  VP (3,7) records VP  VP PP & split position as 5 S NPVP The gunman sprayed the building with bullets 1 2 3 45 6 7

47 Example S NPVP The gunman sprayed the building with bullets 1 2 3 4 5 6 7 DTNN VP PP

48 Grammar Induction Annotated corpora like Penn Treebank Counts used as follows: Sample training data: NP The boy DT NN NP Those cars DT NNS NP Bears NNS NP That book DT NN NP She PRP

49 Grammar Induction for Unannotated Corpora: EM algorithm Start with initial estimates for rule probabilities Compute probability of each parse of a sentence according to current estimates of rule probabilities Compute expectation of how often a rule is used (summing probabilities of rules used in previous step) Refine rule probabilities so that training corpus likelihood increases EXPECTATION PHASE MAXIMIZATION PHASE

50 Outside Probabilities  j (p,q) Base case: Inductive step for calculating : wpwp N f pe N j pq N g (q+1) e wqwq w q+1 wewe w p-1 w1w1 wmwm w e+1 N1N1 Summation over f, g & e

51 Probability of a Sentence Joint probability of a sentence w 1m and that there is a constituent spanning words w p to w q is given as: The gunman sprayed the building with bullets 1 2 3 4 5 6 7 N1N1 NP


Download ppt "CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 29– CYK; Inside Probability; Parse Tree construction) Pushpak Bhattacharyya CSE."

Similar presentations


Ads by Google