Slide 1: CS626-449: Speech, NLP and the Web / Topics in AI
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 17: Probabilistic parsing; inside-outside probabilities
Slide 2: Probability of a parse tree (cont.)
[Tree diagram: S_{1,l} -> NP_{1,2} VP_{3,l}; NP_{1,2} -> DT_{1,1} N_{2,2}; VP_{3,l} -> V_{3,3} PP_{4,l}; PP_{4,l} -> P_{4,4} NP_{5,l}; the leaves are the words w_1 ... w_l.]

P(t|s) = P(t | S_{1,l})
       = P(NP_{1,2}, DT_{1,1}, w_1, N_{2,2}, w_2, VP_{3,l}, V_{3,3}, w_3, PP_{4,l}, P_{4,4}, w_4, NP_{5,l}, w_{5...l} | S_{1,l})
       = P(NP_{1,2}, VP_{3,l} | S_{1,l}) * P(DT_{1,1}, N_{2,2} | NP_{1,2}) * P(w_1 | DT_{1,1}) * P(w_2 | N_{2,2})
         * P(V_{3,3}, PP_{4,l} | VP_{3,l}) * P(w_3 | V_{3,3}) * P(P_{4,4}, NP_{5,l} | PP_{4,l}) * P(w_4 | P_{4,4}) * P(w_{5...l} | NP_{5,l})

(using the chain rule, context-freeness, and ancestor-freeness)
Slide 3: Example PCFG Rules & Probabilities
S   -> NP VP    1.0
NP  -> DT NN    0.5
NP  -> NNS      0.3
NP  -> NP PP    0.2
PP  -> P NP     1.0
VP  -> VP PP    0.6
VP  -> VBD NP   0.4
DT  -> the      1.0
NN  -> gunman   0.5
NN  -> building 0.5
VBD -> sprayed  1.0
NNS -> bullets  1.0
P   -> with     1.0   (used in the parse trees below)
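This grammar is small enough to encode directly. A minimal Python sketch, assuming a dictionary-based encoding of my own (the names BINARY, UNARY, LEXICAL are not from the lecture); the later sketches in this deck reuse it:

```python
# The slide's PCFG as plain Python dictionaries.

# Binary rules: (LHS, (RHS1, RHS2)) -> probability
BINARY = {
    ("S",  ("NP",  "VP")): 1.0,
    ("NP", ("DT",  "NN")): 0.5,
    ("NP", ("NP",  "PP")): 0.2,
    ("PP", ("P",   "NP")): 1.0,
    ("VP", ("VP",  "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
}

# Unary rule over a non-terminal: (LHS, RHS) -> probability
UNARY = {("NP", "NNS"): 0.3}

# Lexical rules: (tag, word) -> probability
LEXICAL = {
    ("DT",  "the"):      1.0,
    ("NN",  "gunman"):   0.5,
    ("NN",  "building"): 0.5,
    ("VBD", "sprayed"):  1.0,
    ("NNS", "bullets"):  1.0,
    ("P",   "with"):     1.0,
}
```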
Slide 4: Example Parse t1
Sentence: "The gunman sprayed the building with bullets."
[Parse tree t1 (PP attached to the VP):
 S_{1.0} -> NP_{0.5} VP_{0.6}
 NP_{0.5} -> DT_{1.0}(The) NN_{0.5}(gunman)
 VP_{0.6} -> VP_{0.4} PP_{1.0}
 VP_{0.4} -> VBD_{1.0}(sprayed) NP_{0.5}
 NP_{0.5} -> DT_{1.0}(the) NN_{0.5}(building)
 PP_{1.0} -> P_{1.0}(with) NP_{0.3}
 NP_{0.3} -> NNS_{1.0}(bullets)]

P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045
Slide 5: Another Parse t2
Sentence: "The gunman sprayed the building with bullets."
[Parse tree t2 (PP attached to the object NP):
 S_{1.0} -> NP_{0.5} VP_{0.4}
 NP_{0.5} -> DT_{1.0}(The) NN_{0.5}(gunman)
 VP_{0.4} -> VBD_{1.0}(sprayed) NP_{0.2}
 NP_{0.2} -> NP_{0.5} PP_{1.0}
 NP_{0.5} -> DT_{1.0}(the) NN_{0.5}(building)
 PP_{1.0} -> P_{1.0}(with) NP_{0.3}
 NP_{0.3} -> NNS_{1.0}(bullets)]

P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
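A parse tree's probability is simply the product of the probabilities of the rules it uses. A minimal sketch of the two computations above (flattening each tree to a list of rule probabilities is my own simplification):

```python
from math import prod

# t1: PP attached to the VP (rules read off the tree on Slide 4)
t1 = [1.0,             # S   -> NP VP
      0.5, 1.0, 0.5,   # NP  -> DT NN, DT -> the, NN -> gunman
      0.6,             # VP  -> VP PP
      0.4, 1.0,        # VP  -> VBD NP, VBD -> sprayed
      0.5, 1.0, 0.5,   # NP  -> DT NN, DT -> the, NN -> building
      1.0, 1.0,        # PP  -> P NP, P -> with
      0.3, 1.0]        # NP  -> NNS, NNS -> bullets

# t2: PP attached to the object NP (rules read off the tree on Slide 5)
t2 = [1.0,             # S   -> NP VP
      0.5, 1.0, 0.5,   # NP  -> DT NN, DT -> the, NN -> gunman
      0.4, 1.0,        # VP  -> VBD NP, VBD -> sprayed
      0.2,             # NP  -> NP PP
      0.5, 1.0, 0.5,   # NP  -> DT NN, DT -> the, NN -> building
      1.0, 1.0,        # PP  -> P NP, P -> with
      0.3, 1.0]        # NP  -> NNS, NNS -> bullets

print(prod(t1))  # 0.0045
print(prod(t2))  # 0.0015
```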
Slide 6: HMM ↔ PCFG
O  observed sequence  ↔  w_{1m}  sentence
X  state sequence     ↔  t       parse tree
μ  model              ↔  G       grammar

Three fundamental questions:
Slide 7: HMM ↔ PCFG
How likely is a certain observation given the model?  ↔  How likely is a sentence given the grammar?
P(O | μ)  ↔  P(w_{1m} | G)

How to choose a state sequence which best explains the observations?  ↔  How to choose a parse which best supports the sentence?
argmax_X P(X | O, μ)  ↔  argmax_t P(t | w_{1m}, G)
Slide 8: HMM ↔ PCFG
How to choose the model parameters that best explain the observed data?  ↔  How to choose rule probabilities which maximize the probabilities of the observed sentences?
argmax_μ P(O | μ)  ↔  argmax_G P(w_{1m} | G)
Slide 9: Interesting Probabilities
Sentence: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)

Inside probability: what is the probability of having an NP at positions 4-5 such that it derives "the building"? (β_NP(4,5))
Outside probability: what is the probability of starting from N^1 and deriving "The gunman sprayed", an NP, and "with bullets"? (α_NP(4,5))
Slide 10: Interesting Probabilities (cont.)
Random variables to be considered:
- The non-terminal being expanded, e.g., NP
- The word-span covered by the non-terminal, e.g., (4,5) refers to the words "the building"

While calculating probabilities, consider:
- The rule to be used for expansion, e.g., NP -> DT NN
- The probabilities associated with the RHS non-terminals, e.g., the DT subtree's and the NN subtree's inside/outside probabilities
Slide 11: Outside Probabilities
Forward ↔ Outside

α_j(p,q): the probability of beginning with N^1 and generating the non-terminal N^j_{pq} and all the words outside w_p ... w_q.

Forward probability:  α_i(t) = P(o_1 ... o_{t-1}, X_t = i | μ)
Outside probability:  α_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)

[Diagram: N^1 at the root; N^j spans w_p ... w_q; the outside words are w_1 ... w_{p-1} and w_{q+1} ... w_m.]
Slide 12: Inside Probabilities
Backward ↔ Inside

β_j(p,q): the probability of generating the words w_p ... w_q starting from the non-terminal N^j_{pq}.

Backward probability:  β_i(t) = P(o_t ... o_T | X_t = i, μ)
Inside probability:    β_j(p,q) = P(w_{pq} | N^j_{pq}, G)

[Diagram: N^j spans w_p ... w_q inside the sentence w_1 ... w_m rooted at N^1.]
Slide 13: Outside & Inside Probabilities
For the sentence The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7):

α_NP(4,5) = P(The gunman sprayed, NP_{4,5}, with bullets | G)
β_NP(4,5) = P(the building | NP_{4,5}, G)
Slide 14: Inside Probabilities β_j(p,q)
Base case:  β_j(k,k) = P(N^j_{kk} -> w_k | G)

The base case is used for rules which derive the words (terminals) directly.
E.g., suppose N^j = NN is being considered and NN -> building is one of the rules, with probability 0.5; then β_NN(5,5) = 0.5.
Slide 15: Induction Step
Induction step:

β_j(p,q) = Σ_{r,s} Σ_{d=p}^{q-1} P(N^j -> N^r N^s) · β_r(p,d) · β_s(d+1,q)

[Diagram: N^j spans w_p ... w_q, split into N^r over w_p ... w_d and N^s over w_{d+1} ... w_q.]

- Consider the different splits of the words, indicated by d; e.g., for "the huge building", split at d = 2 or d = 3.
- Consider the different non-terminals that can be used in the rule: NP -> DT NN and NP -> DT NNS are available options.
- Sum over all of these.
Slide 16: The Bottom-Up Approach
The idea of induction: consider "the gunman".

Base cases (apply the lexical rules):
DT -> the      prob = 1.0
NN -> gunman   prob = 0.5

Induction: the probability that an NP covers these two words
= P(NP -> DT NN) * P(DT derives "the") * P(NN derives "gunman")
= 0.5 * 1.0 * 0.5 = 0.25

[Tree: NP_{0.5} -> DT_{1.0}(The) NN_{0.5}(gunman)]
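A minimal sketch of the full bottom-up (CYK-style) inside computation, covering the base case of Slide 14 and the induction step of Slide 15. It assumes the BINARY/UNARY/LEXICAL dictionaries from the sketch after Slide 3; the function and variable names are my own:

```python
from collections import defaultdict

def inside(words, binary, unary, lexical):
    """Compute beta[(j, p, q)] = beta_j(p, q), the probability that
    non-terminal j derives words p..q (1-based, inclusive spans)."""
    m = len(words)
    beta = defaultdict(float)

    def apply_unary(p, q):
        # One pass suffices: this grammar has no unary chains or cycles.
        for (lhs, rhs), prob in unary.items():
            beta[(lhs, p, q)] += prob * beta[(rhs, p, q)]

    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for k, word in enumerate(words, start=1):
        for (tag, w), prob in lexical.items():
            if w == word:
                beta[(tag, k, k)] += prob
        apply_unary(k, k)          # e.g. NP -> NNS over "bullets"

    # Induction: sum over split points d and binary rules N^j -> N^r N^s
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for d in range(p, q):
                for (j, (r, s)), prob in binary.items():
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
            apply_unary(p, q)

    return beta
```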
Slide 17: Parse Triangle
A parse triangle is constructed for calculating β_j(p,q).

Probability of a sentence using β_j(p,q):
P(w_{1m} | G) = P(N^1 =>* w_{1m} | G) = β_1(1,m)
Slide 18: Parse Triangle (base cases)
Rows/columns indexed by The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7).

Fill the diagonal with β_j(k,k) = P(N^j -> w_k):
β_DT(1,1) = 1.0, β_NN(2,2) = 0.5, β_VBD(3,3) = 1.0, β_DT(4,4) = 1.0, β_NN(5,5) = 0.5, β_P(6,6) = 1.0, β_NNS(7,7) = 1.0
Slide 19: Parse Triangle (induction)
Rows/columns indexed by The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7).

Calculate the remaining cells using the induction formula, e.g.:
β_NP(1,2) = P(NP -> DT NN) · β_DT(1,1) · β_NN(2,2) = 0.5 * 1.0 * 0.5 = 0.25
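Working up the triangle, larger cells combine smaller ones; for example (all values follow from the grammar on Slide 3):

β_PP(6,7) = P(PP -> P NP) · β_P(6,6) · β_NP(7,7) = 1.0 * 1.0 * 0.3 = 0.3
β_VP(3,5) = P(VP -> VBD NP) · β_VBD(3,3) · β_NP(4,5) = 0.4 * 1.0 * 0.25 = 0.1
β_NP(4,7) = P(NP -> NP PP) · β_NP(4,5) · β_PP(6,7) = 0.2 * 0.25 * 0.3 = 0.015
β_VP(3,7) = P(VP -> VP PP) · β_VP(3,5) · β_PP(6,7) + P(VP -> VBD NP) · β_VBD(3,3) · β_NP(4,7)
          = 0.6 * 0.1 * 0.3 + 0.4 * 1.0 * 0.015 = 0.024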
Slide 20: Example Parse t1
Sentence: "The gunman sprayed the building with bullets."
[Parse tree t1 as on Slide 4.] The rule used here for the upper VP is VP -> VP PP.
Slide 21: Another Parse t2
Sentence: "The gunman sprayed the building with bullets."
[Parse tree t2 as on Slide 5.] The rule used here for the VP is VP -> VBD NP.
Slide 22: Parse Triangle (completed)
Rows/columns indexed by The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7).

Non-zero entries above the diagonal:
β_NP(1,2) = 0.25, β_VP(3,5) = 0.1, β_NP(4,5) = 0.25, β_NP(4,7) = 0.015, β_PP(6,7) = 0.3, β_VP(3,7) = 0.024, β_S(1,7) = 0.006
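Running the inside() sketch from Slide 16 over the example sentence reproduces these entries (assuming the earlier sketches are in scope):

```python
words = "the gunman sprayed the building with bullets".split()
beta = inside(words, BINARY, UNARY, LEXICAL)

print(beta[("NP", 1, 2)])   # 0.25   "the gunman"
print(beta[("VP", 3, 7)])   # 0.024  "sprayed the building with bullets"
print(beta[("S",  1, 7)])   # 0.006  = P(t1) + P(t2) = 0.0045 + 0.0015
```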
Slide 23: Different Parses
Consider:
- Different splitting points, e.g., the 5th and the 3rd position
- Different rules for the VP expansion, e.g., VP -> VP PP and VP -> VBD NP

Different parses for the VP "sprayed the building with bullets" can be constructed this way.
Slide 24: Outside Probabilities α_j(p,q)
Base case:
α_1(1,m) = 1 (the start symbol N^1 spans the whole sentence)
α_j(1,m) = 0 for j ≠ 1

Inductive step (summation over f, g and e):
α_j(p,q) = Σ_{f,g} Σ_{e=q+1}^{m} α_f(p,e) · P(N^f -> N^j N^g) · β_g(q+1,e)
         + Σ_{f,g} Σ_{e=1}^{p-1} α_f(e,q) · P(N^f -> N^g N^j) · β_g(e,p-1)

[Diagram: the parent N^f_{pe} expands as N^j_{pq} N^g_{(q+1)e}; the mirror case has N^j as the right child.]
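A minimal sketch of the outside computation, written in the equivalent "top-down distribution" form (each parent's outside mass is pushed to its children) rather than as the literal recursion above. It assumes the grammar dictionaries and the beta table from the earlier sketches; the names are my own:

```python
from collections import defaultdict

def outside(m, binary, unary, beta, root="S"):
    """Compute alpha[(j, p, q)] = alpha_j(p, q) for a sentence of m words,
    given the inside probabilities beta."""
    alpha = defaultdict(float)
    alpha[(root, 1, m)] = 1.0          # base case: alpha_1(1, m) = 1

    for span in range(m, 0, -1):       # largest spans first
        for p in range(1, m - span + 2):
            q = p + span - 1
            # Unary rules keep the span: alpha flows from LHS to RHS
            for (lhs, rhs), prob in unary.items():
                alpha[(rhs, p, q)] += prob * alpha[(lhs, p, q)]
            # Binary rules N^j -> N^r N^s: push alpha_j(p, q) down to each
            # child, weighted by the sibling's inside probability
            for d in range(p, q):
                for (j, (r, s)), prob in binary.items():
                    out = prob * alpha[(j, p, q)]
                    alpha[(r, p, d)]     += out * beta[(s, d + 1, q)]
                    alpha[(s, d + 1, q)] += out * beta[(r, p, d)]
    return alpha
```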
Slide 25: Probability of a Sentence
The joint probability of a sentence w_{1m} and of there being a constituent spanning words w_p to w_q is:

P(w_{1m}, N_{pq} | G) = Σ_j α_j(p,q) · β_j(p,q)

E.g., for The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7), take the NP spanning positions 4-5 ("the building").
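Continuing the running example, the identity can be checked numerically for the span (4,5), "the building" (assuming the earlier sketches are in scope; NONTERMS is my own list of the grammar's non-terminals):

```python
alpha = outside(len(words), BINARY, UNARY, beta)

NONTERMS = ["S", "NP", "VP", "PP", "DT", "NN", "VBD", "P", "NNS"]
joint = sum(alpha[(j, 4, 5)] * beta[(j, 4, 5)] for j in NONTERMS)
print(joint)  # 0.006: both parses contain a constituent over "the building",
              # so the joint probability equals P(w_1m | G) here
```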