Download presentation
Presentation is loading. Please wait.
Published byMerryl Casey Modified over 8 years ago
1
Probabilistic CKY Roger Levy [thanks to Jason Eisner]
2
Managing Ambiguity John saw Mary Typhoid Mary Phillips screwdriver Mary note how rare rules interact I see a bird is this 4 nouns – parsed like “city park scavenger bird”? rare parts of speech, plus systematic ambiguity in noun sequences Time flies like an arrow Fruit flies like a banana Time reactions like this one Time reactions like a chemist or is it just an NP?
3
Our bane: Ambiguity John saw Mary Typhoid Mary Phillips screwdriver Mary note how rare rules interact I see a bird is this 4 nouns – parsed like “city park scavenger bird”? rare parts of speech, plus systematic ambiguity in noun sequences Time | flies like an arrow NP VP Fruit flies | like a banana NP VP Time | reactions like this one V[stem] NP Time reactions | like a chemist S PP or is it just an NP?
4
How to solve this combinatorial explosion of ambiguity? 1.First try parsing without any weird rules, throwing them in only if needed. 2.Better: every rule has a weight. A tree’s weight is total weight of all its rules. Pick the overall lightest parse of sentence. 3.Can we pick the weights automatically? We’ll get to this later …
5
The plan for the rest of parsing There’s probability (inference) and then there’s statistics (model estimation) These two problems are logically separate, yet interrelated in practice We’ll start with the problem of efficient inference (probabilistic bottom-up parsing) Then we’ll move to the estimation of good (generative) models that permit efficient bottom-up parsing Finally, we’ll mention state-of-the-art work that doesn’t permit exact inference (“reranking” approaches)
6
The critical recursion We’ll do bottom-up parsing (weighted CKY) When combining constituents into a larger constituent: The weight of the new constituent is the sum of the weights of the combined subconstituents… …plus the weight of the rule used to combine the subconstituents
7
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 1 NP4 VP4 2 P2V5P2V5 3 Det1 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
8
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 1 NP4 VP4 2 P2V5P2V5 3 Det1 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
9
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 1 NP4 VP4 2 P2V5P2V5 3 Det1 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
10
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 1 NP4 VP4 2 P2V5P2V5 3 Det1 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
11
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 2 P2V5P2V5 3 Det1 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
12
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 2 P2V5P2V5 3 Det1 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
13
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 2 P2V5P2V5 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
14
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 2 P2V5P2V5 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
15
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 2 P2V5P2V5 PP12 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
16
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
17
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
18
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 NP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
19
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 NP18 S21 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
20
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
21
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
22
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
23
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
24
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
25
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
26
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
27
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
28
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
29
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP
30
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP S Follow backpointers …
31
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP S NP VP
32
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP S NP VP PP
33
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP S NP VP PP PNP
34
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP S NP VP PP PNP DetN
35
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP Which entries do we need?
36
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP Which entries do we need?
37
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP Not worth keeping …
38
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP … since it just breeds worse options
39
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP Keep only best-in-class! “inferior stock”
40
time 1 flies 2 like 3 an 4 arrow 5 NP3 Vst3 NP10 S8 NP24 S22 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP Keep only best-in-class! (and backpointers so you can recover parse)
41
Probabilistic Trees Instead of lightest weight tree, take highest probability tree Given any tree, your assignment 1 generator would have some probability of producing it! Just like using n-grams to choose among strings … What is the probability of this tree? S NP time VP flies PP P like NP Det an N arrow
42
Probabilistic Trees Instead of lightest weight tree, take highest probability tree Given any tree, your assignment 1 generator would have some probability of producing it! Just like using n-grams to choose among strings … What is the probability of this tree? You rolled a lot of independent dice … S NP time VP flies PP P like NP Det an N arrow p( | S )
43
Chain rule: One word at a time p(time flies like an arrow) =p(time) * p(flies | time) * p(like | time flies) * p(an | time flies like) * p(arrow | time flies like an)
44
Chain rule + backoff (to get trigram model) p(time flies like an arrow) =p(time) * p(flies | time) * p(like | time flies) * p(an | time flies like) * p(arrow | time flies like an)
45
Chain rule – written differently p(time flies like an arrow) =p(time) * p(time flies | time) * p(time flies like | time flies) * p(time flies like an | time flies like) * p(time flies like an arrow | time flies like an ) Proof: p(x,y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)
46
Chain rule + backoff p(time flies like an arrow) =p(time) * p(time flies | time) * p(time flies like | time flies) * p(time flies like an | time flies like) * p(time flies like an arrow | time flies like an ) Proof: p(x,y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)
47
Chain rule: One node at a time S NP time VP flies PP P like NP Det an N arrow p( | S ) = p( S NP VP | S ) * p( S NP time VP | S NP VP ) * p( S NP time VP PP | S NP time VP ) * p( S NP time VP flies PP | S NP time VP ) * … VP PP
48
Chain rule + backoff S NP time VP flies PP P like NP Det an N arrow p( | S ) = p( S NP VP | S ) * p( S NP time VP | S NP VP ) * p( S NP time VP PP | S NP time VP ) * p( S NP time VP flies PP | S NP time VP ) * … VP PP
49
Simplified notation S NP time VP flies PP P like NP Det an N arrow p( | S ) = p( S NP VP | S ) * p( NP flies | NP ) * p( VP VP NP | VP ) * p( VP flies | VP ) * …
50
Already have a CKY alg for weights … S NP time VP flies PP P like NP Det an N arrow w( | S ) = w( S NP VP ) + w( NP flies | NP ) + w( VP VP NP ) + w( VP flies ) + … Just let w( X Y Z ) = -log p( X Y Z | X ) Then lightest tree has highest prob
51
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP multiply to get 2 -22 2 -8 2 -12 2 -2
52
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP multiply to get 2 -22 2 -8 2 -12 2 -2 2 -13 Need only best-in-class to get best parse
53
Why probabilities not weights? We just saw probabilities are really just a special case of weights … … but we can estimate them from training data by counting and smoothing! Warning: What kind of training data do we need for this type of estimation?
54
A slightly different task One task: What is probability of generating a given tree with a PCFG generator? To pick tree with highest prob: useful in parsing. But could also ask: What is probability of generating a given string with the generator? To pick string with highest prob: useful in speech recognition, as substitute for an n-gram model. (“Put the file in the folder” vs. “Put the file and the folder”) To get prob of generating string, must add up probabilities of all trees for the string …
55
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S8 S13 NP24 S22 S27 NP24 S27 S22 S27 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP Could just add up the parse probabilities 2 -22 2 -27 2 -22 2 -27 oops, back to finding exponentially many parses
56
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S S 2 -13 NP24 S22 S27 NP24 S27 S 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP 2 -12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 -2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP Any more efficient way? 2 -8 2 -22 2 -27
57
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S NP24 S22 S27 NP24 S27 S 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP 2 -12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 -2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP Add as we go … (the “inside algorithm”) 2 -8 +2 -13 2 -22 +2 -27
58
time 1 flies 2 like 3 an 4 arrow 5 0 NP3 Vst3 NP10 S NP S 1 NP4 VP4 NP18 S21 VP18 2 P2V5P2V5 PP 2 -12 VP16 3 Det1NP10 4 N8N8 1 S NP VP 6 S Vst NP 2 -2 S S PP 1 VP V NP 2 VP VP PP 1 NP Det N 2 NP NP PP 3 NP NP NP 0 PP P NP Add as we go … (the “inside algorithm”) 2 -8 +2 -13 +2 -22 +2 -27 2 -22 +2 -27 2 -22 +2 -27
59
PCFGs and HMMs There is a natural connection between: the inside algorithm for filling out a probabilistic CKY chart and the forward algorithm for filling out an HMM trellis Do you see the relationship? However, bottom-up for PCFGs and left-to-right for HMMs are not the only ways to go for DPs You can go outside-in for PCFGs We’ll see left-to-right (the Earley algorithm) for PCFGs And you can go right-to-left for HMMs
60
Charts and lattices You can equivalently represent a parse chart as a lattice constructed over some initial arcs This will also set the stage for Earley parsing later saltfliesscratch NP N NP S S VP V N NP S VP NP N NP V VP salt N NP V VP fliesscratch N NP V VPN NP NP S NP VP S S S NP VP VP V NP VP V NP N NP N N
61
(Speech) Lattices There was nothing magical about words spanning exactly one position. When working with speech, we generally don’t know how many words there are, or where they break. We can represent the possibilities as a lattice and parse these just as easily. I awe of van eyes saw a ‘ve an Ivan
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.