Constraining Chart Parsing with Partial Tree Bracketing


1 Constraining Chart Parsing with Partial Tree Bracketing
Seth Kulick (with Dan Bikel and Tony Kroch) Clunch, 12/12/05

2 Outline
Motivation
Key Aspects of Implementation
Interaction with Parser
Experiments with Penn Treebank
Experiments with PPCEME (Penn-Helsinki Parsed Corpus of Early Modern English)

3 Motivation
Hard constraints on parser input – prebracketing S, NP, etc.
Becomes useful when such phrases can be separately marked with a high degree of accuracy.
Treebank construction – less-trained treebankers can annotate just S, or NP, etc.
The extra annotation steps will pay off if the resulting parses show a substantial increase in accuracy.

4 Partial Tree Specification
Prebracketing: (S (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
Possible tree: (S (NP (NNP John)) (VP (VBD ate) (NP (DT the) (NN ice) (NN cream))))

5 Partial Tree Specification
Prebracketing: (S (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
Another possible tree: (S (NP (NNP John)) (VP (VBD ate) (NP (NP (DT the) (NN ice)) (NP (NN cream)))))
Easy to imagine various monstrosities.

6 Negative Constraints
Prebracketing should be "gold", with parses having 100% P/R for those constituents.
Allow "negative constraints" – an implicit constraint on every chart item that it cannot be a spurious instance of a constituent type that is prebracketed.

7 Negative Constraints
(S (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
No NP brackets in the parse except with spans (0,0) and (2,4), and those must exist.
Precision, as well as recall, must be 100%.
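To make the negative-constraint check concrete, here is a minimal sketch (hypothetical code, not the parser's own; the name PREBRACKETED is invented): each prebracketed label maps to its set of gold spans, and an item carrying such a label on any other span is spurious.

```python
# Hypothetical sketch (invented names): each prebracketed label maps to its
# set of gold spans; an item with such a label on any other span is spurious.
PREBRACKETED = {"NP": {(0, 0), (2, 4)}}   # from the example above

def violates_negative_constraint(label, span):
    """True if `label` is prebracketed but `span` is not one of its spans."""
    return label in PREBRACKETED and span not in PREBRACKETED[label]

assert not violates_negative_constraint("NP", (2, 4))  # the gold NP
assert violates_negative_constraint("NP", (2, 3))      # spurious NP: killed
assert not violates_negative_constraint("VP", (1, 4))  # VP is unconstrained
```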

8 Wildcard Constraints
Specify that some bracketing exists, regardless of label. Currently only used for the root, to keep the result a tree.
E.g., if only doing NP prebracketing: (* (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
The wildcard means "can be anything except one of the negative constraints" – e.g., the root can't be NP here.

9 Notational Note
(S (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
May also be written:
(S-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
Or, sometimes dropping the terminals:
(S-0-4 NP-0-0 NP-2-4)
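A minimal sketch of how this span-annotated notation might be represented as a constraint tree; the Constraint class and add_child helper are illustrative, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative constraint-tree node for spans like S-0-4 (not the parser's
# actual data structure).
@dataclass
class Constraint:
    label: str                      # "S", "NP", ... or "*" for the wildcard
    start: int                      # index of first covered word
    end: int                        # index of last covered word
    parent: Optional["Constraint"] = None
    children: List["Constraint"] = field(default_factory=list)

def add_child(parent: Constraint, child: Constraint) -> Constraint:
    child.parent = parent
    parent.children.append(child)
    return child

# (S-0-4 NP-0-0 NP-2-4) from the slide:
root = Constraint("S", 0, 4)
add_child(root, Constraint("NP", 0, 0))
add_child(root, Constraint("NP", 2, 4))
```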

10 Not just constraints on items
Crucial to ensure that when an item matches a constraint, all the child constraints in the constraint tree also match some item in the tree.
"and prays delygently for your good speed in your matters"
(*-0-9 (NP-4-9 (NP-4-6 NP-8-9)))
NP(4,9) "your good speed in your matters"
NP(4,6) "your good speed"
NP(8,9) "your matters"

11 Not just constraints on items
(*-0-9 (NP-4-9 (NP-4-6 NP-8-9)))
(IP and prays (ADVP delygently) (PP for (NP-A (NPB your good speed))) (PP in (NP-A (NPB your matters))))
Matches *-0-9, NP-4-6, NP-8-9, but not NP-4-9.
While *-0-9 is satisfied, its child constraint NP-4-9 is not.
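A hedged sketch of this check: a constraint node is satisfied only if some node of the candidate tree has its label and span, and all of its child constraints are satisfied in turn. The helper names are assumptions, trees are nested tuples, spans are word indices, and the NP-A/NPB distinction is simplified away to plain NPs.

```python
def spans(tree, start=0):
    """Return (next_index, [(label, start, end), ...]) for every nonterminal."""
    if isinstance(tree, str):          # a terminal word
        return start + 1, []
    label, kids = tree[0], tree[1:]
    pos, found = start, []
    for kid in kids:
        pos, sub = spans(kid, pos)
        found.extend(sub)
    found.append((label, start, pos - 1))
    return pos, found

def satisfied(constraint, tree):
    """constraint = (label, start, end, [child constraints])."""
    label, s, e, kids = constraint
    _, found = spans(tree)             # recomputed per call; fine for a sketch
    ok = label == "*" or (label, s, e) in found
    return ok and all(satisfied(k, tree) for k in kids)

tree = ("IP", "and", "prays", ("ADVP", "delygently"),
        ("PP", "for", ("NP", "your", "good", "speed")),
        ("PP", "in", ("NP", "your", "matters")))
np_4_9 = ("NP", 4, 9, [("NP", 4, 6, []), ("NP", 8, 9, [])])
print(satisfied(np_4_9, tree))  # False: NP-4-6 and NP-8-9 match, NP-4-9 does not
```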

12 Killing Items ASAP
Can't just check constraints when stop probabilities are added.
Two partial constituents can be "equivalent" yet differ in terms of constraints.

13 Killing Items ASAP
(IP prays (ADVP delygently) (PP for (NP-A (NPB your good speed))) (PP in (NP-A (NPB your matters))))
(IP prays (ADVP delygently) (PP for (NP-A (NPB your good speed))) (PP in (NP-A (NPB your matters))))
No need to wait for the left modifier "and" to be added to kill the first one.
Otherwise the bad one could kill off the "equivalent" good one.

14 Outline
Motivation
Key Aspects of Implementation
Interaction with Parser
Experiments with Penn Treebank
Experiments with PPCEME (Penn-Helsinki Parsed Corpus of Early Modern English)

15 Collins/Bikel recap
Underlying lexicalized PCFG has rules of the form P(h) → Ln(ln) … L1(l1) H(h) R1(r1) … Rm(rm)
Generate each child independently: given that P has been generated, generate H, then generate the modifying nonterminals from head-adjacent outward.
Definition is top-down, derivation is bottom-up.
Seed Chart, addUnaries, joinItems, addStopProbs.
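For reference, the head-driven decomposition this describes, written out in the rough shape of Collins' Model 1 (the actual models also condition on distance and, in Model 2, subcat frames; here L_{n+1} = R_{m+1} = STOP):

```latex
% Head-driven rule: P(h) -> L_n(l_n) ... L_1(l_1) H(h) R_1(r_1) ... R_m(r_m),
% with L_{n+1} = R_{m+1} = STOP. Rough Model-1 shape only.
\begin{align*}
P(\mathrm{RHS}\mid P,h) = P_H(H\mid P,h)
  \times \prod_{i=1}^{n+1} P_L\bigl(L_i(l_i)\mid P,H,h\bigr)
  \times \prod_{i=1}^{m+1} P_R\bigl(R_i(r_i)\mid P,H,h\bigr)
\end{align*}
```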

16 Seeding the Chart
(*-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
A constraint is associated with each chart item.
Items are seeded with the appropriate terminal constraint (NNP-0-0, etc.).

17 addUnaries – assigning constraints
(*-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
Building P(th,wh) over H(th,wh), where H carries constraint c1: if H matches c1 (label and span), the new item gets c1's parent constraint; otherwise it gets c1.

18 addUnaries – assigning constraints
(*-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
(VBD ate): VBD-1-1
(VP ate the ice cream): *-0-4
(S John .. cream): *-0-4
(NN cream): NN-4-4
(NP (NN cream)): NP-2-4
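A small sketch of the assignment rule under assumed names (Item and Node are illustrative stand-ins for chart items and constraint-tree nodes, not the parser's classes):

```python
from collections import namedtuple

# Illustrative stand-ins: a chart item and a constraint-tree node.
Item = namedtuple("Item", "label start end")
Node = namedtuple("Node", "label start end parent")

def matches(item, c):
    """An item matches a constraint when label (or wildcard) and span agree."""
    return c.label in ("*", item.label) and (item.start, item.end) == (c.start, c.end)

def unary_constraint(head_item, c1):
    """addUnaries: the new parent takes c1's parent if the head matched c1,
    otherwise it keeps c1 itself."""
    return c1.parent if matches(head_item, c1) else c1

# (NN cream) matches NN-4-4, so (NP (NN cream)) is assigned NP-2-4.
star = Node("*", 0, 4, None)
np_2_4 = Node("NP", 2, 4, star)
nn_4_4 = Node("NN", 4, 4, np_2_4)
assert unary_constraint(Item("NN", 4, 4), nn_4_4) is np_2_4
```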

19 addUnaries – killing items
Kill the new item if it doesn't match its constraint's label and it matches one of the negative constraints.
E.g., IP and NP are being prebracketed, and both are negative constraints. The new item is NP(3,6) with constraint IP-0-10.
This is not okay: if the item were good, there would have been a prebracketing for NP(3,6).
Constraint tree: (IP-0-10 NP-4-5 NP-8-10); the item (NP (NP 4..5)) spans (3,6) and carries constraint IP-0-10.

20 addUnaries – killing items
Kill the new item if its constraint is the wildcard and the item matches one of the negative constraints.
E.g., the item is NP(4,5), NPs are being prebracketed, and so NP is a negative constraint. The item's constraint is *-0-10.
This is not okay, since * must be something besides NP.
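A sketch combining this kill rule with the one on the previous slide, under assumed names; NEGATIVE stands for the set of prebracketed (hence negatively constrained) labels.

```python
# Sketch of the two addUnaries kill rules (this slide and the previous one).
NEGATIVE = {"IP", "NP"}

def kill_in_add_unaries(item_label, constraint_label):
    # Previous slide: the item's label differs from its constraint's label,
    # yet the label is prebracketed -- a good item here would have had its
    # own prebracketing, so the item is killed.
    if constraint_label != "*" and item_label != constraint_label:
        return item_label in NEGATIVE
    # This slide: the constraint is the wildcard, which must be something
    # other than any negatively constrained label.
    if constraint_label == "*":
        return item_label in NEGATIVE
    return False

assert kill_in_add_unaries("NP", "IP")   # NP(3,6) under constraint IP-0-10
assert kill_in_add_unaries("NP", "*")    # NP(4,5) under the wildcard *-0-10
assert not kill_in_add_unaries("VP", "*")
```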

21 joinItems – assigning constraints
When joining two items, the modificand's constraint is associated with the new item.
Joining modificand P(th,wh), carrying constraint c2, with a modifier R produces a larger P(th,wh) that still carries c2.

22 joinItems – assigning constraints
(*-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
Modificand (NP (NN cream)), constraint NP-2-4; modifier (NN ice), constraint NN-3-3.
The joined item (NP (NN ice) (NN cream)) keeps constraint NP-2-4.
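In sketch form (illustrative only), the propagation itself is a one-liner:

```python
def join_constraint(modificand_constraint, modifier_constraint):
    """joinItems sketch: the joined item keeps the modificand's constraint;
    the modifier's constraint plays no further role."""
    return modificand_constraint

# Slide example: (NP (NN cream)) [NP-2-4] + (NN ice) [NN-3-3]
# -> (NP (NN ice) (NN cream)) still carries NP-2-4.
assert join_constraint("NP-2-4", "NN-3-3") == "NP-2-4"
```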

23 joinItems – killing items
Kill the new item if a child intersects with a constraint span.
(IP (CONJ and) (VBD left) (D this) (N letter) (IP (TO to) (BE be) (VAN sent)…
Modificand: (IP (TO to)), constraint IP-4-9.
Left modifier: (NPB (N letter)).
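A sketch of the span test that seems to underlie this rule (assumed names and word positions): the joined item dies as soon as its span crosses its constraint's span, i.e., the two overlap without either containing the other.

```python
# Span test sketch: "crossing" = overlapping with neither containing the other.
def crosses(a_start, a_end, b_start, b_end):
    overlap = a_start <= b_end and b_start <= a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return overlap and not (a_in_b or b_in_a)

# (IP (TO to)) at (4,4) carries IP-4-9; adding left modifier (NPB (N letter))
# at (3,3) yields span (3,4), which crosses (4,9): the item can never grow
# into a tree that contains IP-4-9, so it is killed immediately.
assert crosses(3, 4, 4, 9)
assert not crosses(4, 6, 4, 9)   # nested: could still grow to match
```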

24 joinItems – killing items
Kill the new item if there is any constraint within the span of the item that does not match some descendant of the item.
(*-0-9 (NP-4-9 (NP-4-6 NP-8-9)))
(IP prays (ADVP delygently-2) (PP for-3 (NP-A (NPB your-4 good-5 speed-6))) (PP in-7 (NP-A (NPB your-8 matters-9))))
PP(7,9) has just been joined to IP(1,6), forming IP(1,9).

25 joinItems – killing items
(*-0-9 (NP-4-9 (NP-4-6 NP-8-9)))
(IP prays (ADVP delygently-2) (PP for-3 (NP-A (NPB your-4 good-5 speed-6))) (PP in-7 (NP-A (NPB your-8 matters-9))))
Check constraints in (1,9): NP-4-9, NP-4-6, NP-8-9.
NP-4-9 is not satisfied – kill the item (it never enters the chart).
Some smarts – actually only NP-4-9 will be checked (we don't go down satisfied constraint-tree branches).
Could be smarter and kill this earlier, when PP(3,6) was added.
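A hedged sketch of this check with illustrative names (as the slide notes, the real code avoids descending into already-satisfied constraint branches; this sketch just scans a flat list):

```python
# Sketch: after a join, every constraint whose span lies inside the new
# item's span must match some descendant of the item.
def live_after_join(item_span, item_nodes, constraints):
    """item_nodes: set of (label, start, end) over the item's descendants;
    constraints: flat list of (label, start, end) from the constraint tree."""
    s, e = item_span
    for lab, cs, ce in constraints:
        if s <= cs and ce <= e and (lab, cs, ce) not in item_nodes:
            return False
    return True

nodes = {("ADVP", 2, 2), ("NP", 4, 6), ("PP", 3, 6), ("NP", 8, 9), ("PP", 7, 9)}
print(live_after_join((1, 9), nodes, [("NP", 4, 9), ("NP", 4, 6), ("NP", 8, 9)]))
# -> False: NP-4-9 lies inside (1,9) but matches no node, so IP(1,9) is killed
```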

26 addStopProbs – killing items
Kill the new item if:
item span != constraint span && constraint label matches item label && constraint is not wildcard && negative(itemLabel)
E.g., NP is being prebracketed, and so NP is a negative constraint. The new item is NP(4,6) and its constraint is NP-3-6.
This is not okay: if the item were good, there would have been a prebracketing for NP(4,6).
This case can't be killed earlier, since the item could still grow to match the constraint's span. (The addUnaries case was when the labels did not match – then it could be killed off earlier.)
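The kill condition above, transcribed as a predicate (a sketch; negative() just asks whether a label is prebracketed):

```python
# The slide's condition, written out directly (sketch only).
NEGATIVE = {"NP"}   # labels being prebracketed

def negative(label):
    return label in NEGATIVE

def kill_at_add_stop_probs(item_label, item_span, c_label, c_span):
    return (item_span != c_span
            and c_label == item_label
            and c_label != "*"
            and negative(item_label))

# NP(4,6) under constraint NP-3-6: the labels match but the spans never
# will, and NP is prebracketed, so the completed item is killed.
assert kill_at_add_stop_probs("NP", (4, 6), "NP", (3, 6))
```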

27 Internal Transformations – Yuch!
Various internal transformations are used by the parser, but the constraints are over untransformed partial trees.
Some simple cases: a node is SG and the constraint is S – consider them the same.
A base NP can be generated, with label NPB. Similarly, this can be considered to match an NP constraint.
But at the end of the parse, NPBs that are a unary child of an NP are deleted. So an NP constraint can refer to *two* nodes in the parse, since they are later collapsed together.

28 Dealing with Base NPs
(NP (NPB 2..4)) – the NPB carries constraint NP-2-4, the NP above it carries NP-0-5.
This is really bad, since the higher NP will now match a negative constraint, and be killed.

29 Dealing with Base NPs
Modify the definition of matching a negative constraint:
Matches one of the negative constraints, and it's not this case: an NP whose only child is an NPB that matches its constraint.
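A sketch of the modified match, with assumed names and a simplified item structure:

```python
from collections import namedtuple

# Simplified item: label, child items, and whether the item matched its
# own constraint. All illustrative.
Item = namedtuple("Item", "label children matched_constraint")

def matches_negative(item, negative_labels):
    """Negative-constraint match, minus the NP-over-NPB exemption."""
    if item.label not in negative_labels:
        return False
    kids = item.children
    if (item.label == "NP" and len(kids) == 1 and kids[0].label == "NPB"
            and kids[0].matched_constraint):
        return False   # NP and NPB will be collapsed into one gold NP later
    return True

npb = Item("NPB", (), True)
assert not matches_negative(Item("NP", (npb,), False), {"NP"})   # exempt
assert matches_negative(Item("NP", (npb, npb), False), {"NP"})   # killed
```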

30 Dealing with Base NPs
Such cases will no longer be killed in addUnaries.
But if more items are joined, the NPB will not be killed, and the higher NP *should* be killed off.
So for the special case of NP, we keep checking at joinItems whether it matches a negative constraint.

31 Outline
Motivation
Key Aspects of Implementation
Interaction with Parser
Experiments with Penn Treebank
Experiments with PPCEME (Penn-Helsinki Parsed Corpus of Early Modern English)

32 Experiments with PTB
What is the effect of forcing certain constituents to be gold? Are there sentences that just cannot be parsed with such constraints?
Trained on sections 02-21, tested on section 23, with gold NP, S, or both.
Ran in the regular way, with an expanded beam, with "relaxation", or both.
"Relaxation" – zero probabilities changed to a low probability.
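A guess at what relaxation amounts to in code; only the idea of flooring zero probabilities comes from the slide, and the constant is invented:

```python
# Probabilities the model would score as zero are floored at a small
# constant, so a constraint-forced analysis the model has never seen
# can still be assembled. RELAX_FLOOR is illustrative, not the parser's.
RELAX_FLOOR = 1e-20

def relaxed(prob):
    return prob if prob > 0.0 else RELAX_FLOOR
```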

33 Experiments with PTB 23
# Nulls (2416 sentences):
       Usual   Wide Beam   Relax   Wide & Relax
None   0                           0
S      35      2           25      2 (0.1%)
NP     143     12          115     4 (0.2%)
NP&S   211     36          167     27 (1.1%)

34 Prebracketing Penn Treebank
Constituent score:
       none          S             NP            NP&S
all    88.53/88.63   92.27/92.09   95.21/94.45   97.17/96.48
NP     91.03/89.90   92.02/90.60   99.52/98.84   99.53/99.11
VP     90.51/90.35   96.57/96.42   93.77/92.56   97.22/96.43
S      89.42/89.52   100.00/99.79  92.96/90.57   100.00/99.78
PP     85.41/84.58   86.88/86.11   97.72/97.06   98.14/96.74
SBAR   84.08/87.29   94.05/97.24   89.46/89.71   94.13/95.75

35 Experiments with PPCEME
1.8 million words; these experiments use 500K words.
New treebanks, using a similar annotation style.
Same questions as with the PTB…
…but the main concern is really to let cheaper "level one" treebankers do the IP and NP levels (also working with the entire 1.8 million words).

36 Experiments with PPCEME
# Nulls (2860 sentences):
        Usual       Wide Beam   Relax   Wide & Relax
None    46 (1.6%)                       17 (0.6%)
IP                                      26 (0.9%)
NP                                      32 (1.1%)
NP&IP                                   74 (2.6%)

37 Prebracketing PPCEME
Constituent score:
      none          IP            NP            NP&IP
all   77.22/76.93   88.36/86.64   89.73/88.48   96.14/93.63
NP    83.50/81.75   86.51/83.32   99.70/98.08   99.69/98.26
IP    72.23/72.19   100.00/99.97  79.94/78.64
PP    78.20/78.14   85.63/85.25   93.61/93.25   96.54/95.00
CP    61.67/63.42   91.73/90.09   75.54/73.76   92.83/(??)

38 Conclusion and Future Stuff
Prebracketing conjuncts.
Using some high-accuracy method for getting Ss or NPs.
Propaganda point: greater success for empty category recovery.
Continue to use it for ongoing treebank construction, and experiment with which constituents are best to prebracket.

39 Key Aspects of Implementation
Partial Tree Specification
Negative Constraints
Wildcard
Killing constraint-violating parses early
Not just constraints on cells
Conflict between constraints and internal tree transformations

40 Overview of Collins’ Model (from Dan’s defense slides)
[Diagram: parent P(th,wh), head child H(th,wh), left modifiers Li, Li–1, …, L1, and the left subcat {subcatL}.]
The model generates Li conditioning on the following items, indicated in red circles in the diagram.


