
1 PCFGs
CS 224n / Lx 237 section, Tuesday, May 4, 2004


3 Inside Algorithm
We're calculating the total probability of generating the words w_p … w_q given that one starts with the nonterminal N^j.
[Tree: N^j dominates N^r and N^s, with N^r spanning w_p … w_d and N^s spanning w_(d+1) … w_q]

4 Inside Algorithm
Base case, for rules of the form N^j → w_k:
β_j(k,k) = P(w_k | N^j_kk, G) = P(N^j → w_k | G)
Inductive case, for rules of the form N^j → N^r N^s:
β_j(p,q) = P(w_pq | N^j_pq, G)
= Σ_{r,s} Σ_{d=p}^{q-1} P(N^r_pd, N^s_(d+1)q | N^j_pq, G) · P(w_pd | N^r_pd, G) · P(w_(d+1)q | N^s_(d+1)q, G)
= Σ_{r,s} Σ_{d=p}^{q-1} P(N^j → N^r N^s) · β_r(p,d) · β_s(d+1,q)

5 Inside Algorithm - Base
Base case, for rules of the form N^j → w_k:
β_j(k,k) = P(w_k | N^j_kk, G) = P(N^j → w_k | G)
This deals with the lexical rules.

6 Inside Algorithm - Inductive
Inductive case, for rules of the form N^j → N^r N^s:
β_j(p,q) = P(w_pq | N^j_pq, G)
= Σ_{r,s} Σ_{d=p}^{q-1} P(N^r_pd, N^s_(d+1)q | N^j_pq, G) · P(w_pd | N^r_pd, G) · P(w_(d+1)q | N^s_(d+1)q, G)
= Σ_{r,s} Σ_{d=p}^{q-1} P(N^j → N^r N^s) · β_r(p,d) · β_s(d+1,q)
[Tree: N^j dominates N^r (over w_p … w_d) and N^s (over w_(d+1) … w_q), for each split point d between p and q-1]

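The recursion above translates directly into a chart-based dynamic program. Below is a minimal Python sketch, assuming a grammar in Chomsky normal form represented as two dictionaries (a representation chosen here for illustration; the slides do not specify one):

    from collections import defaultdict

    def inside(words, lexical, binary):
        """Inside algorithm for a PCFG in Chomsky normal form.

        lexical maps (nonterminal, word) -> probability, e.g. ("V", "saw"): 1.0;
        binary maps (parent, left, right) -> probability, e.g. ("VP", "V", "NP"): 0.7.
        Returns beta with beta[(j, p, q)] = P(w_pq | N^j spans p..q); indices are
        1-based to match the slides.
        """
        m = len(words)
        beta = defaultdict(float)
        # Base case: beta_j(k, k) = P(N^j -> w_k)
        for k in range(1, m + 1):
            for (j, w), prob in lexical.items():
                if w == words[k - 1]:
                    beta[(j, k, k)] = prob
        # Inductive case: sum over rules N^j -> N^r N^s and split points d
        for span in range(2, m + 1):
            for p in range(1, m - span + 2):
                q = p + span - 1
                for (j, r, s), prob in binary.items():
                    for d in range(p, q):
                        beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
        return beta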

19 Calculating inside probabilities with CKY: the base case
Lexical rules: NP → astronomers, NP → saw, V → saw, NP → stars, P → with, NP → ears
Chart diagonal for "astronomers saw stars with ears" (positions 1..5):
1 astronomers: β_NP = 0.1
2 saw: β_NP = 0.04, β_V = 1.0
3 stars: β_NP = 0.18
4 with: β_P = 1.0
5 ears: β_NP = 0.18

20 Calculating inside probabilities with CKY: inductive case
VP → V NP:
β_VP(2,3) = P(VP → V NP) · β_V(2,2) · β_NP(3,3) = 0.7 · 1.0 · 0.18 = 0.126
The chart now contains β_VP(2,3) = 0.126 alongside the diagonal entries.

21 Calculating inside probabilities with CKY: inductive case
PP → P NP:
β_PP(4,5) = P(PP → P NP) · β_P(4,4) · β_NP(5,5) = 1.0 · 1.0 · 0.18 = 0.18
The chart now also contains β_PP(4,5) = 0.18.

22 Calculating inside probabilities with CKY
A VP over span 2..5 can be built in two ways, and the inside probability sums over both:
β_VP(2,5) = P(VP → V NP) · β_V(2,2) · β_NP(3,5) + P(VP → VP PP) · β_VP(2,3) · β_PP(4,5)
= 0.7 · 1.0 · 0.01296 + 0.3 · 0.126 · 0.18
= 0.009072 + 0.006804 = 0.015876
Here β_NP(3,5) = P(NP → NP PP) · β_NP(3,3) · β_PP(4,5) = 0.4 · 0.18 · 0.18 = 0.01296. (The remaining rule probabilities, P(S → NP VP) = 1.0, P(VP → VP PP) = 0.3, and P(NP → NP PP) = 0.4, follow the standard textbook version of this example.)
Completed chart: β_S(1,3) = 0.0126, β_NP(3,5) = 0.01296, β_VP(2,5) = 0.015876, β_S(1,5) = 0.0015876, together with β_VP(2,3) = 0.126, β_PP(4,5) = 0.18, and the diagonal.
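Running the inside sketch from earlier on this grammar reproduces the chart. The dictionaries below encode the example; as noted above, the probabilities for S → NP VP, VP → VP PP, and NP → NP PP are assumed from the standard version of the example:

    lexical = {("NP", "astronomers"): 0.1, ("NP", "saw"): 0.04, ("V", "saw"): 1.0,
               ("NP", "stars"): 0.18, ("P", "with"): 1.0, ("NP", "ears"): 0.18}
    binary = {("S", "NP", "VP"): 1.0,    # assumed
              ("VP", "V", "NP"): 0.7,
              ("VP", "VP", "PP"): 0.3,   # assumed
              ("PP", "P", "NP"): 1.0,
              ("NP", "NP", "PP"): 0.4}   # assumed
    beta = inside("astronomers saw stars with ears".split(), lexical, binary)
    print(round(beta[("VP", 2, 5)], 6))  # 0.015876
    print(round(beta[("S", 1, 5)], 7))   # 0.0015876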

23 Outside algorithm
The outside algorithm reflects top-down processing (whereas the inside algorithm reflects bottom-up processing). With the outside algorithm we're calculating the total probability of beginning with the start symbol N^1 and generating the nonterminal N^j_pq together with all the words outside w_p … w_q, i.e., w_1 … w_(p-1) and w_(q+1) … w_m.

24 Outside Algorithm
[Tree: the root N^1_1m spans w_1 … w_m; inside it, a parent N^f_pe dominates N^j_pq (over w_p … w_q) and its right sibling N^g_(q+1)e (over w_(q+1) … w_e)]

25 Outside Algorithm
Base case, for the start symbol:
α_j(1,m) = 1 if j = 1, and 0 otherwise
Inductive case (N^j_pq is either the left or the right branch of its parent):
α_j(p,q) = Σ_{f,g} Σ_{e=q+1}^{m} P(w_1(p-1), w_(q+1)m, N^f_pe, N^j_pq, N^g_(q+1)e) + Σ_{f,g} Σ_{e=1}^{p-1} P(w_1(p-1), w_(q+1)m, N^f_eq, N^g_e(p-1), N^j_pq)
= Σ_{f,g} Σ_{e=q+1}^{m} α_f(p,e) P(N^f → N^j N^g) β_g(q+1,e) + Σ_{f,g} Σ_{e=1}^{p-1} α_f(e,q) P(N^f → N^g N^j) β_g(e,p-1)

26 Outside Algorithm – left branching
[Tree: N^j_pq is the left child of N^f_pe, with right sibling N^g_(q+1)e spanning w_(q+1) … w_e; the root N^1_1m spans w_1 … w_m]

27 Outside Algorithm – right branching
[Tree: N^j_pq is the right child of N^f_eq, with left sibling N^g_e(p-1) spanning w_e … w_(p-1); the root N^1_1m spans w_1 … w_m]

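The two branches of the recursion can be computed top-down over the spans, reusing the inside chart. A minimal Python sketch under the same assumed grammar representation as before:

    def outside(words, binary, beta, start="S"):
        """Outside algorithm; beta is the chart returned by inside().

        Returns alpha with alpha[(j, p, q)] = P(w_1..(p-1), N^j spans p..q, w_(q+1)..m).
        """
        m = len(words)
        alpha = defaultdict(float)
        alpha[(start, 1, m)] = 1.0  # base case: 1 for the start symbol over 1..m
        for span in range(m, 0, -1):  # parents have longer spans, so work top-down
            for p in range(1, m - span + 2):
                q = p + span - 1
                for (f, left, right), prob in binary.items():
                    # N^j is the left child: parent N^f spans p..e, sibling spans q+1..e
                    for e in range(q + 1, m + 1):
                        alpha[(left, p, q)] += alpha[(f, p, e)] * prob * beta[(right, q + 1, e)]
                    # N^j is the right child: parent N^f spans e..q, sibling spans e..p-1
                    for e in range(1, p):
                        alpha[(right, p, q)] += alpha[(f, e, q)] * prob * beta[(left, e, p - 1)]
        return alpha

    alpha = outside("astronomers saw stars with ears".split(), binary, beta)
    print(alpha[("VP", 2, 5)])  # 0.1, via S -> NP VP and beta_NP(1,1) = 0.1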

32 Overall probability of a node
As with HMMs (and their forward/backward algorithms), the overall probability of a node is formed by taking the product of its inside and outside probabilities:
α_j(p,q) · β_j(p,q) = P(w_1(p-1), N^j_pq, w_(q+1)m | G) · P(w_pq | N^j_pq, G) = P(w_1m, N^j_pq | G)
Therefore P(w_1m, N_pq | G) = Σ_j α_j(p,q) β_j(p,q)
In the case of the root node and the terminals, we know there will be some such constituent, so there this sum equals the total probability of the sentence, P(w_1m | G).
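As a small sketch, this identity is one line of code on top of the two charts (labels is the set of nonterminal names; the function name is illustrative):

    def span_probability(alpha, beta, labels, p, q):
        # P(w_1m, N_pq | G) = sum_j alpha_j(p, q) * beta_j(p, q)
        return sum(alpha[(j, p, q)] * beta[(j, p, q)] for j in labels)

For the root span (1, m) this reduces to β_1(1,m), the sentence probability computed earlier.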

33 Viterbi Algorithm and PCFGs
This is like the inside algorithm, but we take the maximum instead of the sum, and we record which choice achieved it.
δ_i(p,q) = the probability of the highest-probability parse of a subtree N^i_pq
Initialization: δ_i(p,p) = P(N^i → w_p)
Induction: δ_i(p,q) = max_{j,k, p≤r<q} P(N^i → N^j N^k) δ_j(p,r) δ_k(r+1,q)
Store backtrace: Ψ_i(p,q) = argmax_{(j,k,r)} P(N^i → N^j N^k) δ_j(p,r) δ_k(r+1,q)
Starting from the start symbol N^1, the probability of the most likely parse t̂ is P(t̂) = δ_1(1,m).
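A minimal Python sketch, mirroring the inside() function above with max in place of sum, plus a backtrace table (the grammar representation is the same assumed one):

    def viterbi(words, lexical, binary):
        """CKY Viterbi: delta[(i, p, q)] is the best parse probability of N^i over
        p..q; psi[(i, p, q)] = (j, k, r) records the best rule and split point."""
        m = len(words)
        delta, psi = defaultdict(float), {}
        # Initialization: delta_i(p, p) = P(N^i -> w_p)
        for p in range(1, m + 1):
            for (i, w), prob in lexical.items():
                if w == words[p - 1]:
                    delta[(i, p, p)] = prob
        # Induction: maximize over rules N^i -> N^j N^k and split points r
        for span in range(2, m + 1):
            for p in range(1, m - span + 2):
                q = p + span - 1
                for (i, j, k), prob in binary.items():
                    for r in range(p, q):
                        cand = prob * delta[(j, p, r)] * delta[(k, r + 1, q)]
                        if cand > delta[(i, p, q)]:
                            delta[(i, p, q)] = cand
                            psi[(i, p, q)] = (j, k, r)  # store backtrace
        return delta, psi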

34 Calculating Viterbi with CKY: initialization
Lexical rules: NP → astronomers, NP → saw, V → saw, NP → stars, P → with, NP → ears
Chart diagonal:
1 astronomers: δ_NP = 0.1
2 saw: δ_NP = 0.04, δ_V = 1.0
3 stars: δ_NP = 0.18
4 with: δ_P = 1.0
5 ears: δ_NP = 0.18

35 Calculating Viterbi with CKY: induction
So far this is the same as calculating the inside probabilities, since each of these cells has only a single derivation:
δ_VP(2,3) = 0.126, δ_PP(4,5) = 0.18, δ_S(1,3) = 0.0126, δ_NP(3,5) = 0.01296

36 Calculating Viterbi with CKY: backpointers
δ_VP(2,5) = max( P(VP → V NP) · δ_V(2,2) · δ_NP(3,5), P(VP → VP PP) · δ_VP(2,3) · δ_PP(4,5) )
= max( 0.7 · 1.0 · 0.01296, 0.3 · 0.126 · 0.18 )
= max( 0.009072, 0.006804 ) = 0.009072
The backpointer Ψ_VP(2,5) records the winning choice (V NP, split r = 2); at the root, δ_S(1,5) = 1.0 · 0.1 · 0.009072 = 0.0009072.
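Reading the parse back out of the Ψ table is a short recursion; a sketch using the viterbi() function above (best_tree is an illustrative helper, not from the slides):

    def best_tree(psi, words, i, p, q):
        """Reconstruct the highest-probability parse from the psi backpointers."""
        if p == q:
            return (i, words[p - 1])
        j, k, r = psi[(i, p, q)]
        return (i, best_tree(psi, words, j, p, r), best_tree(psi, words, k, r + 1, q))

    words = "astronomers saw stars with ears".split()
    delta, psi = viterbi(words, lexical, binary)
    print(best_tree(psi, words, "S", 1, 5))  # NP attachment wins (0.009072 > 0.006804)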

37 Learning PCFGs
Imagine we have a training corpus that contains the treebank given below:
(1) [S [A a] [A a]]
(2) [S [B a] [B a]]
(3) [S [A f] [A g]]
(4) [S [A f] [A a]]
(5) [S [A g] [A f]]

38 Learning PCFGs
Let's say that (1) occurs 40 times, (2) occurs 10 times, (3) occurs 5 times, (4) occurs 5 times, and (5) occurs once. We want to make a PCFG that reflects this treebank. What are the parameters that maximize the joint likelihood of the data, subject to Σ_j P(N^i → ζ_j | N^i) = 1?

39 Learning PCFGs
Rule counts:
S → A A : 40 + 5 + 5 + 1 = 51
S → B B : 10
A → a : 2·40 + 5 = 85
A → f : 5 + 5 + 1 = 11
A → g : 5 + 1 = 6
B → a : 2·10 = 20

40 Learning PCFGs
Parameters that maximize the joint likelihood, G:
Rule      Count   LHS total   Probability
S → A A   51      61          0.836
S → B B   10      61          0.164
A → a     85      102         0.833
A → f     11      102         0.108
A → g     6       102         0.059
B → a     20      20          1.0

41 Learning PCFGs
Given these parameters, what is the most likely parse of the string 'a a'?
(1) [S [A a] [A a]]
(2) [S [B a] [B a]]
P(1) = P(S → A A) · P(A → a) · P(A → a) = 0.836 · 0.833 · 0.833 ≈ 0.580
P(2) = P(S → B B) · P(B → a) · P(B → a) = 0.164 · 1.0 · 1.0 = 0.164
So parse (1) is the more likely one.

