1
Statistical methods in NLP. Diana Trandabat, 2015-2016
2
CKY Parsing
Cocke-Kasami-Younger parsing algorithm:
– (Relatively) efficient bottom-up parsing algorithm based on tabulating substring parses to avoid repeated work
– Approach: use a Chomsky Normal Form grammar; build an (n+1) x (n+1) matrix (upper triangular portion) to store subtrees; incrementally build a parse spanning the whole input string
3
Reminder – A CNF grammar is a Context-Free Grammar in which:
– Every rule's LHS is a single non-terminal
– Every rule's RHS consists of either a single terminal or two non-terminals
Examples:
– A → B C
– NP → Nominal PP
– A → a
– Noun → man
But not:
– NP → the Nominal
– S → VP
4
Reminder Any CFG can be re-written in CNF, without any loss of expressiveness. – That is, for any CFG, there is a corresponding CNF grammar which accepts exactly the same set of strings as the original CFG.
5
Dynamic Programming in CKY
Key idea:
– For a parse spanning substring [i,j], there exists some k such that there are parses spanning [i,k] and [k,j]
– We can construct parses for the whole sentence by building up from these stored partial parses
So:
– To place a rule A → B C in [i,j], we must have B in [i,k] and C in [k,j], for some i<k<j
– The CNF grammar guarantees this for all j>i+1
6
CKY
Given an input string S of length n:
– Build an (n+1) x (n+1) table
– Indexes correspond to inter-word positions, e.g., 0 Book 1 That 2 Flight 3
– Cells [i,j] contain the sets of non-terminals of ALL constituents spanning positions i to j
– Cells [j-1,j] contain the pre-terminals
– If [0,n] contains the start symbol, the input is recognized
7
Recognising strings with CKY
Example input: The flight includes a meal.
The CKY algorithm proceeds by:
1. Splitting the input into words and indexing each position: (0) the (1) flight (2) includes (3) a (4) meal (5)
2. Setting up a table: for a sentence of length n, we need (n+1) rows and (n+1) columns
3. Traversing the input sentence left-to-right
4. Using the table to store constituents and their span
8
The table (rows 0–4, columns 1–5, one column per word: the, flight, includes, a, meal)
[0,1] for “the”: Det. Rule: Det → the
9
The table
[0,1] for “the”: Det. Rule 1: Det → the
[1,2] for “flight”: N. Rule 2: N → flight
10
The table
[0,1] for “the”: Det. Rule 1: Det → the
[1,2] for “flight”: N. Rule 2: N → flight
[0,2] for “the flight”: NP. Rule 3: NP → Det N
11
A CNF CFG for CKY
S → NP VP
NP → Det N
VP → V NP
V → includes
Det → the
Det → a
N → meal
N → flight
12
CKY algorithm: two components
Lexical step:
for j from 1 to length(string) do:
  let w be the word in position j
  find all rules ending in w of the form X → w
  put X in table[j-1, j]
Syntactic step:
for i = j-2 down to 0 do:
  for k = i+1 to j-1 do:
    for each rule of the form A → B C do:
      if B is in table[i,k] and C is in table[k,j] then add A to table[i,j]
13
CKY algorithm: two components
We actually interleave the lexical and syntactic steps:
for j from 1 to length(string) do:
  let w be the word in position j
  find all rules ending in w of the form X → w
  put X in table[j-1, j]
  for i = j-2 down to 0 do:
    for k = i+1 to j-1 do:
      for each rule of the form A → B C do:
        if B is in table[i,k] and C is in table[k,j] then add A to table[i,j]
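To make the procedure concrete, here is a minimal sketch of the interleaved recognizer in Python, using the toy CNF grammar of slide 11. The function and variable names (cky_recognize, LEXICAL, BINARY) are mine, not from the slides; this is an illustration under those assumptions, not the original implementation.

```python
# Minimal CKY recognizer sketch (illustrative; names such as cky_recognize,
# LEXICAL and BINARY are mine).  Grammar: the toy CNF grammar of slide 11.
from collections import defaultdict

LEXICAL = {                      # X -> w rules, indexed by the word w
    "the": {"Det"}, "a": {"Det"},
    "flight": {"N"}, "meal": {"N"},
    "includes": {"V"},
}
BINARY = [                       # A -> B C rules
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("VP", "V", "NP"),
]

def cky_recognize(words, start="S"):
    n = len(words)
    table = defaultdict(set)     # table[(i, j)] = non-terminals spanning i..j
    for j in range(1, n + 1):
        # lexical step: pre-terminals for the word ending at position j
        table[(j - 1, j)] |= LEXICAL.get(words[j - 1], set())
        # syntactic step: combine smaller spans ending at position j
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].add(a)
    return start in table[(0, n)]

print(cky_recognize("the flight includes a meal".split()))   # True
```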
14
CKY: lexical step (j = 1). The flight includes a meal.
Lexical lookup matches Det → the, so Det is added to cell [0,1].
15
CKY: lexical step (j = 2). The flight includes a meal.
Lexical lookup matches N → flight, so N is added to cell [1,2].
16
CKY: syntactic step (j = 2). The flight includes a meal.
Syntactic lookup: look backwards and see if there is any rule that will cover what we have built so far. NP → Det N applies, so NP is added to cell [0,2].
17
CKY: lexical step (j = 3). The flight includes a meal.
Lexical lookup matches V → includes, so V is added to cell [2,3].
18
CKY: syntactic step (j = 3). The flight includes a meal.
Syntactic lookup: there are no rules in our grammar that will combine Det, NP and V, so nothing is added.
19
CKY: lexical step (j = 4). The flight includes a meal.
Lexical lookup matches Det → a, so Det is added to cell [3,4].
20
CKY: lexical step (j = 5). The flight includes a meal.
Lexical lookup matches N → meal, so N is added to cell [4,5].
21
CKY: syntactic step (j = 5). The flight includes a meal.
Syntactic lookup: we find that NP → Det N applies, so NP is added to cell [3,5].
22
CKY: syntactic step (j = 5). The flight includes a meal.
Syntactic lookup: we find that VP → V NP applies, so VP is added to cell [2,5].
23
CKY: syntactic step (j = 5). The flight includes a meal.
Syntactic lookup: we find that S → NP VP applies, so S is added to cell [0,5]. The input is recognized.
24
From recognition to parsing
The procedure so far will recognise a string as a legal sentence in English, but we would like to get a parse tree back!
Solution:
– We can work our way back through the table and collect all the partial solutions into one parse tree.
– Cells need to be augmented with “backpointers”, i.e. with pointers to the cells that the current constituent was built from (a sketch of this idea follows below).
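One possible way to realise this, sketched in Python: each cell entry keeps a backpointer to the split point and the two daughters it was built from, and a parse tree is read off recursively from cell [0,n]. The names (cky_parse, build_tree) are mine, and this sketch keeps only one backpointer per non-terminal, so it returns a single tree even for ambiguous inputs.

```python
# Sketch: CKY with backpointers so that a parse tree can be read off the
# table.  Illustrative only; cky_parse and build_tree are my own names.
from collections import defaultdict

LEXICAL = {"the": {"Det"}, "a": {"Det"}, "flight": {"N"},
           "meal": {"N"}, "includes": {"V"}}
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]

def cky_parse(words, start="S"):
    n = len(words)
    back = defaultdict(dict)   # back[(i, j)][A] = (k, B, C) or the word itself
    for j in range(1, n + 1):
        for x in LEXICAL.get(words[j - 1], set()):
            back[(j - 1, j)][x] = words[j - 1]          # pre-terminal cell
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if b in back[(i, k)] and c in back[(k, j)]:
                        # keep only the first analysis found (one tree)
                        back[(i, j)].setdefault(a, (k, b, c))
    if start not in back[(0, n)]:
        return None

    def build_tree(a, i, j):
        entry = back[(i, j)][a]
        if isinstance(entry, str):                      # a word: leaf node
            return (a, entry)
        k, b, c = entry
        return (a, build_tree(b, i, k), build_tree(c, k, j))

    return build_tree(start, 0, n)

print(cky_parse("the flight includes a meal".split()))
# ('S', ('NP', ('Det', 'the'), ('N', 'flight')),
#       ('VP', ('V', 'includes'), ('NP', ('Det', 'a'), ('N', 'meal'))))
```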
25
From recognition to parsing
The complete table:
       1      2        3          4      5
0      Det    NP                         S
1             N
2                      V                 VP
3                                 Det    NP
4                                        N
       the    flight   includes   a      meal
26
From recognition to parsing
(The table is the same as on the previous slide.)
NB: This algorithm always fills the top “triangle” of the table!
27
What about ambiguity?
The algorithm does not assume that there is only one parse tree for a sentence.
– (Our simple grammar did not admit any ambiguity, but this isn't realistic, of course.)
There is nothing to stop it returning several parse trees.
If there are multiple local solutions, then more than one non-terminal will be stored in a cell of the table.
28
Exercise
Apply the CKY algorithm to the following sentence:
Astronomers saw stars with ears.
given the following grammar:
S → NP VP 1.0
PP → P NP 1.0
VP → V NP 0.7
VP → VP PP 0.3
P → with 1.0
V → saw 1.0
NP → NP PP 0.4
NP → astronomers 0.2
NP → ears 0.18
NP → saw 0.04
NP → stars 0.18
29
Exercise
30
Now run the CKY algorithm considering also the probabilities of the rules. The probability of a constituent A in cell [i,j], built from B in [i,k] and C in [k,j] by a rule A → B C, is P(A → B C) * P(B in [i,k]) * P(C in [k,j]).
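A probabilistic (Viterbi) CKY sketch for checking the exercise, in Python: each cell stores, for every non-terminal, the probability of the best subtree over that span, and cells are combined with max instead of set union. The names and output format are mine; the grammar is the one given in the exercise.

```python
# Sketch of probabilistic (Viterbi) CKY for the exercise grammar above.
# Illustrative only; the names viterbi_cky, LEXICAL, BINARY are my own.
from collections import defaultdict

LEXICAL = {                      # word -> {pre-terminal: probability}
    "astronomers": {"NP": 0.2},
    "ears": {"NP": 0.18},
    "saw": {"NP": 0.04, "V": 1.0},
    "stars": {"NP": 0.18},
    "with": {"P": 1.0},
}
BINARY = [                       # (A, B, C, P(A -> B C))
    ("S", "NP", "VP", 1.0),
    ("PP", "P", "NP", 1.0),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
    ("NP", "NP", "PP", 0.4),
]

def viterbi_cky(words, start="S"):
    n = len(words)
    best = defaultdict(dict)     # best[(i, j)][A] = prob. of best subtree for A
    for j in range(1, n + 1):
        for x, p in LEXICAL.get(words[j - 1], {}).items():
            best[(j - 1, j)][x] = p
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for a, b, c, p in BINARY:
                    if b in best[(i, k)] and c in best[(k, j)]:
                        cand = p * best[(i, k)][b] * best[(k, j)][c]
                        if cand > best[(i, j)].get(a, 0.0):
                            best[(i, j)][a] = cand
    return best[(0, n)].get(start, 0.0)

# Probability of the most likely parse; with the probabilities above it comes
# out at about 0.0018 (the reading where the PP attaches to the object NP).
print(viterbi_cky("astronomers saw stars with ears".split()))
```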
31
CKY Discussions
Running time: O(n³), where n is the length of the input string
– The inner loop grows as the square of the number of non-terminals
Expressiveness:
– As implemented, requires CNF
– Weakly equivalent to the original grammar: doesn't capture the full original structure
– Back-conversion?
» Binarization and terminal conversion can be undone
» Unit non-terminals require a change in CKY
32
Parsing Efficiently
With arbitrary grammars: the Earley algorithm
– Top-down search
– Dynamic programming: tabulated partial solutions
– Some bottom-up constraints
33
Interesting Probabilities
The gunman sprayed the building with bullets (word positions 1–7; N1 is the start symbol; consider an NP spanning “the building”)
– What is the probability of having an NP at this position such that it will derive “the building”? – the Inside probability
– What is the probability of starting from N1 and deriving “The gunman sprayed”, an NP, and “with bullets”? – the Outside probability
34
Interesting Probabilities
Random variables to be considered:
– The non-terminal being expanded, e.g., NP
– The word-span covered by the non-terminal, e.g., (4,5) refers to the words “the building”
While calculating probabilities, consider:
– The rule to be used for expansion, e.g., NP → DT NN
– The probabilities associated with the RHS non-terminals, e.g., the DT subtree's inside/outside probabilities and the NN subtree's inside/outside probabilities
35
Outside Probabilities
Outside probability α_j(p,q): the probability of beginning with N1 and generating the non-terminal N_j spanning words p..q together with all the words outside w_p..w_q, i.e. w_1 … w_(p-1) and w_(q+1) … w_m.
36
Inside Probabilities
Inside probability β_j(p,q): the probability of generating the words w_p..w_q starting from the non-terminal N_j spanning words p..q.
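In symbols, using the standard inside–outside notation (e.g. Manning & Schütze); the Greek letters and subscripts were lost from the slide text, so this is a reconstruction:

```latex
% Outside and inside probabilities of non-terminal N^j spanning words p..q
\alpha_j(p,q) = P\bigl(w_{1(p-1)},\; N^j_{pq},\; w_{(q+1)m} \mid G\bigr)
\qquad
\beta_j(p,q)  = P\bigl(w_{pq} \mid N^j_{pq},\, G\bigr)
```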
37
Outside & Inside Probabilities The gunman sprayed the building with bullets 1 2 3 45 6 7 N1N1 NP
38
Inside probabilities β_j(p,q)
Base case: β_j(k,k) = P(N_j → w_k); the base case is used for rules which derive the words (terminals) directly.
E.g., suppose N_j = NN is being considered and NN → building is one of the rules, with probability 0.5; then β_NN(5,5) = 0.5.
39
Induction Step
Induction step for β_j(p,q): N_j is expanded as N_r N_s over the words w_p … w_q.
– Consider the different splits of the words, indicated by the split point d (e.g., for “the huge building”, split after d = 2 or d = 3)
– Consider the different non-terminals that can be used in the rule, e.g., NP → DT NN and NP → DT NNS are available options
– Sum over all of these (see the formula below).
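Written out, the base case and induction step for the inside probability are as follows (a reconstruction in standard notation; the formulas appeared as images in the original slides):

```latex
% Base case: a single word derived directly by a pre-terminal rule
\beta_j(k,k) = P\bigl(N^j \rightarrow w_k\bigr)
% Induction step: sum over daughter non-terminals N^r, N^s and split points d
\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1}
    P\bigl(N^j \rightarrow N^r N^s\bigr)\, \beta_r(p,d)\, \beta_s(d+1,q)
```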
40
The Bottom-Up Approach
The idea of induction: consider “the gunman”.
Base cases: apply unary rules
– DT → the, Prob = 1.0
– NN → gunman, Prob = 0.5
Induction: Prob that an NP covers these 2 words
= P(NP → DT NN) * P(DT deriving the word “the”) * P(NN deriving the word “gunman”)
= 0.5 * 1.0 * 0.5 = 0.25
41
Parse Triangle
A parse triangle is constructed for calculating β_j(p,q).
Probability of a sentence using inside probabilities: P(w_1 … w_m | G) = β_1(1,m), the inside probability of the start symbol spanning the whole sentence.
42
Example PCFG Rules & Probabilities
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
DT → the 1.0
NN → gunman 0.5
NN → building 0.5
VBD → sprayed 1.0
NNS → bullets 1.0
P → with 1.0
43
Parse Triangle
Words: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)
Fill the diagonal cells with the base-case inside probabilities β_j(k,k) = P(N_j → w_k), e.g. β_DT(1,1) = 1.0, β_NN(2,2) = 0.5, β_VBD(3,3) = 1.0, and so on.
44
Parse Triangle
Words: The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7)
Calculate the remaining cells using the induction formula, e.g. β_NP(1,2) = P(NP → DT NN) * β_DT(1,1) * β_NN(2,2) = 0.5 * 1.0 * 0.5 = 0.25 (see the sketch below).
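A sketch of this computation in Python, using the PCFG of slide 42: it fills the parse triangle bottom-up, summing (rather than maximising) over rules and split points, so the value in the top cell is the inside probability β_S(1,7), i.e. the probability of the sentence. Names are mine; the NP → NNS rule is handled as a unary step on the diagonal, on top of the CNF part of the grammar.

```python
# Sketch: inside probabilities (beta) for the PCFG of slide 42, filled in
# bottom-up over the parse triangle.  Illustrative; the names are my own.
from collections import defaultdict

LEXICAL = {   # word -> {pre-terminal: P(pre-terminal -> word)}
    "the": {"DT": 1.0}, "gunman": {"NN": 0.5}, "building": {"NN": 0.5},
    "sprayed": {"VBD": 1.0}, "bullets": {"NNS": 1.0}, "with": {"P": 1.0},
}
BINARY = [    # (A, B, C, P(A -> B C))
    ("S", "NP", "VP", 1.0), ("NP", "DT", "NN", 0.5), ("NP", "NP", "PP", 0.2),
    ("PP", "P", "NP", 1.0), ("VP", "VP", "PP", 0.6), ("VP", "VBD", "NP", 0.4),
]
UNARY = [("NP", "NNS", 0.3)]   # the one non-CNF rule, applied on the diagonal

def inside(words):
    n = len(words)
    beta = defaultdict(lambda: defaultdict(float))   # beta[(p, q)][A]
    for q in range(1, n + 1):
        for a, prob in LEXICAL.get(words[q - 1].lower(), {}).items():
            beta[(q, q)][a] = prob                   # base case: one word
        for a, b, prob in UNARY:
            beta[(q, q)][a] += prob * beta[(q, q)][b]
        for p in range(q - 1, 0, -1):                # wider spans [p, q]
            for d in range(p, q):                    # split point
                for a, b, c, prob in BINARY:
                    beta[(p, q)][a] += (prob * beta[(p, d)][b]
                                             * beta[(d + 1, q)][c])
    return beta

words = "The gunman sprayed the building with bullets".split()
beta = inside(words)
print(beta[(1, 7)]["S"])   # P(sentence) = beta_S(1,7) = 0.006 = P(t1) + P(t2)
```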
45
Example Parse t1
The gunman sprayed the building with bullets.
(S 1.0 (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
       (VP 0.6 (VP 0.4 (VBD 1.0 sprayed) (NP 0.5 (DT 1.0 the) (NN 0.5 building)))
               (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets)))))
Rule used here (at the higher VP) is VP → VP PP.
46
Another Parse t2
The gunman sprayed the building with bullets.
(S 1.0 (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
       (VP 0.4 (VBD 1.0 sprayed)
               (NP 0.2 (NP 0.5 (DT 1.0 the) (NN 0.5 building))
                       (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets))))))
Rule used here is VP → VBD NP, with the PP attached inside the object NP (NP → NP PP).
47
Parse Triangle
(Figure: the parse triangle for The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7), filled with the inside probabilities of all spans.)
48
Different Parses
Consider:
– Different splitting points, e.g., the 5th and 3rd positions
– Using different rules for the VP expansion, e.g., VP → VP PP and VP → VBD NP
Different parses for the VP “sprayed the building with bullets” can be constructed this way.
49
Outside Probabilities α_j(p,q)
Base case: α_1(1,m) = 1 (the start symbol spans the whole sentence) and α_j(1,m) = 0 for j ≠ 1.
Inductive step for calculating α_j(p,q): sum over the possible parent non-terminals N_f, sibling non-terminals N_g, and extension points e, combining the parent's outside probability with the sibling's inside probability (see the formula below).
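In full, in the standard notation (a reconstruction, e.g. following Manning & Schütze; the first sum corresponds to the configuration drawn on the slide, where N_j has its sibling on the right):

```latex
% Base case
\alpha_1(1,m) = 1, \qquad \alpha_j(1,m) = 0 \ \text{for } j \neq 1
% Inductive step: N^j is the left daughter (first sum) or the right daughter
% (second sum) of the parent N^f, with sibling N^g
\alpha_j(p,q) =
    \sum_{f,g} \sum_{e=q+1}^{m}
        P\bigl(N^f \rightarrow N^j N^g\bigr)\, \alpha_f(p,e)\, \beta_g(q+1,e)
  + \sum_{f,g} \sum_{e=1}^{p-1}
        P\bigl(N^f \rightarrow N^g N^j\bigr)\, \alpha_f(e,q)\, \beta_g(e,p-1)
```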
50
Probability of a Sentence
The joint probability of a sentence w_1 … w_m and of there being a constituent spanning words w_p to w_q is obtained by combining outside and inside probabilities (see below).
E.g., for “The gunman sprayed the building with bullets” (positions 1–7), the probability of the sentence jointly with an NP spanning “the building”.
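The formula (a reconstruction; it combines the outside and inside probabilities of every non-terminal spanning the words w_p..w_q):

```latex
P\bigl(w_{1m},\, N_{pq} \mid G\bigr) \;=\; \sum_j \alpha_j(p,q)\, \beta_j(p,q)
```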
51
Further readings
Michael Collins, The Inside-Outside Algorithm
52
Great! See you next time!