Constraining Chart Parsing with Partial Tree Bracketing


Constraining Chart Parsing with Partial Tree Bracketing
Seth Kulick (with Dan Bikel and Tony Kroch)
Clunch, 12/12/05

Outline
- Motivation
- Key Aspects of Implementation
- Interaction with Parser
- Experiments with Penn Treebank
- Experiments with PPCEME (Penn-Helsinki Parsed Corpus of Early Modern English)

Motivation
- Hard constraints on parser input: prebracketing S, NP, etc.
- Becomes useful when such phrases can be separately marked with a high degree of accuracy.
- Treebank construction: less-trained treebankers can do just S, or NP, etc.
- The extra annotation step pays off if the resulting parses show a substantial increase in accuracy.

Partial Tree Specification
Prebracketing:
  (S (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
Possible tree:
  (S (NP (NNP John)) (VP (VBD ate) (NP (DT the) (NN ice) (NN cream))))

Partial Tree Specification
Prebracketing:
  (S (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
Another possible tree:
  (S (NP (NNP John)) (VP (VBD ate) (NP (NP (DT the) (NN ice)) (NP (NN cream)))))
Easy to imagine various monstrosities.
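As a sketch of what "possible tree" means here, the check below (Python, illustrative only; the encoding and helper names are mine, not the parser's) tests whether a candidate tree contains every prebracketed constituent. Trees are nested (label, child, ...) tuples with bare strings as words; spans are inclusive word indices.

    def leaves(tree):
        """Flatten a (label, child, ...) tuple tree into its word list."""
        words = []
        for child in tree[1:]:
            if isinstance(child, str):
                words.append(child)
            else:
                words.extend(leaves(child))
        return words

    def spans(tree, start=0):
        """Collect (label, first, last) for every nonterminal, ends inclusive."""
        result, pos = [], start
        for child in tree[1:]:
            if isinstance(child, str):
                pos += 1
            else:
                result.extend(spans(child, pos))
                pos += len(leaves(child))
        result.append((tree[0], start, pos - 1))
        return result

    def satisfies(tree, prebracketing):
        """A possible tree must contain every prebracketed constituent."""
        have = set(spans(tree))
        return all(c in have for c in prebracketing)

    # The slide's prebracketing: S(0,4), NP(0,0), NP(2,4).
    pre = [("S", 0, 4), ("NP", 0, 0), ("NP", 2, 4)]
    tree = ("S", ("NP", ("NNP", "John")),
                 ("VP", ("VBD", "ate"),
                        ("NP", ("DT", "the"), ("NN", "ice"), ("NN", "cream"))))
    assert satisfies(tree, pre)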

Negative Constraints
- Prebracketing should be "gold": parses must have 100% precision/recall for those constituents.
- So allow "negative constraints": an implicit constraint on every chart item that it cannot be a spurious instance of a constituent label that is prebracketed.

Negative Constraints
  (S (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
- No NP brackets in the parse except with spans (0,0) and (2,4), and those must exist.
- Precision, as well as recall, must be 100%.
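Extending the sketch above (reusing its spans() helper), the negative-constraint check also rules out spurious instances of any prebracketed label, giving 100% precision as well as recall:

    def respects_negative(tree, prebracketing):
        """Recall: every prebracketed span appears; precision: no spurious
        span carries a prebracketed label."""
        have = set(spans(tree))
        want = set(prebracketing)
        neg_labels = {label for (label, i, j) in want}
        spurious = {s for s in have if s[0] in neg_labels} - want
        return want <= have and not spurious

    # The "monstrosity" from the previous slide fails: NP(2,3) and NP(4,4)
    # are spurious NP instances.
    pre = [("S", 0, 4), ("NP", 0, 0), ("NP", 2, 4)]
    bad = ("S", ("NP", ("NNP", "John")),
                ("VP", ("VBD", "ate"),
                       ("NP", ("NP", ("DT", "the"), ("NN", "ice")),
                              ("NP", ("NN", "cream")))))
    assert not respects_negative(bad, pre)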

Wildcard Constraints
- Specify that some bracketing exists, regardless of label.
- Currently only used for the root, to keep the result a tree. E.g., if only doing NP prebracketing:
  (* (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
- The wildcard means "can be anything except one of the negative constraints"; e.g., the root can't be an NP here.

Notational Note
The prebracketing
  (S (NP (NNP John)) (VBD ate) (NP (DT the) (NN ice) (NN cream)))
may also be written with explicit spans:
  (S-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
or, sometimes, dropping the terminals:
  (S-0-4 NP-0-0 NP-2-4)
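For concreteness, a small helper in the same illustrative tuple encoding that renders a tree in this LABEL-start-end notation:

    def annotate(tree, start=0):
        """Return (text, n_words) with every label rewritten as LABEL-start-end."""
        parts, n = [], 0
        for child in tree[1:]:
            if isinstance(child, str):
                parts.append(child)
                n += 1
            else:
                text, k = annotate(child, start + n)
                parts.append(text)
                n += k
        text = "(%s-%d-%d %s)" % (tree[0], start, start + n - 1, " ".join(parts))
        return text, n

    pre = ("S", ("NP", ("NNP", "John")), ("VBD", "ate"),
           ("NP", ("DT", "the"), ("NN", "ice"), ("NN", "cream")))
    print(annotate(pre)[0])
    # prints: (S-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))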

Not just constraints on items
- Crucial to ensure that when an item matches a constraint, all the child constraints in the constraint tree also match some item in the tree.
- "and prays delygently for your good speed in your matters"
  (*-0-9 (NP-4-9 (NP-4-6 NP-8-9)))
  NP(4,9) "your good speed in your matters"
  NP(4,6) "your good speed"
  NP(8,9) "your matters"

Not just constraints on items
  (*-0-9 (NP-4-9 (NP-4-6 NP-8-9)))
  (IP and prays (ADVP delygently) (PP for (NP-A (NPB your good speed))) (PP in (NP-A (NPB your matters))))
- This parse matches *-0-9, NP-4-6, and NP-8-9, but not NP-4-9.
- So while *-0-9 is satisfied, its child constraint NP-4-9 is not.
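A sketch of this recursive check, with the constraint tree encoded as (label, start, end, children) tuples and the parse reduced to a simplified set of (label, start, end) items (the NP-A/NPB levels are flattened here); the wildcard * matches an item of any label over its span:

    def unmatched(constraint, items):
        """Return the constraint nodes that no item of the parse matches."""
        label, i, j, children = constraint
        if label == "*":
            ok = any((s, e) == (i, j) for (_, s, e) in items)
        else:
            ok = (label, i, j) in items
        bad = [] if ok else [(label, i, j)]
        for child in children:
            bad.extend(unmatched(child, items))
        return bad

    # (*-0-9 (NP-4-9 (NP-4-6) (NP-8-9))) against the parse on this slide:
    constraint = ("*", 0, 9,
                  [("NP", 4, 9, [("NP", 4, 6, []), ("NP", 8, 9, [])])])
    items = {("IP", 0, 9), ("ADVP", 2, 2), ("PP", 3, 6), ("NP", 4, 6),
             ("PP", 7, 9), ("NP", 8, 9)}
    assert unmatched(constraint, items) == [("NP", 4, 9)]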

Killing Items ASAP Can’t just check constraints when stop probabilities are added Two partial constituents can be “equivalent” yet different in terms of constraints. 12/12/05 Seth Kulick

Killing Items ASAP
  (IP prays (ADVP delygently) (PP for (NP-A (NPB your good speed))) (PP in (NP-A (NPB your matters))))
- This partial item violates NP-4-9. No need to wait for the left modifier "and" to be added before killing it.
- Otherwise the bad item could kill off an "equivalent" good one in the chart.

Outline
- Motivation
- Key Aspects of Implementation
- Interaction with Parser
- Experiments with Penn Treebank
- Experiments with PPCEME (Penn-Helsinki Parsed Corpus of Early Modern English)

Collins/Bikel recap
- The underlying lexicalized PCFG has rules of the form P -> L_n ... L_1 H R_1 ... R_m.
- Each child is generated independently: given that P has been generated, generate the head child H, then generate the modifying nonterminals from the head-adjacent position outward.
- The definition is top-down; the derivation is bottom-up.
- Chart phases: seed chart, addUnaries, joinItems, addStopProbs.
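As a toy schematic of that head-outward generative story (not Bikel's actual code; the function and callback names are mine), with the STOP symbol modeled as None:

    def generate_children(parent, choose_head, choose_modifier):
        """Expand parent as L_n .. L_1 H R_1 .. R_m, generated head-outward."""
        head = choose_head(parent)
        children = [head]
        for side in ("left", "right"):
            prev = head                  # head-adjacent position first
            while True:
                mod = choose_modifier(parent, head, side, prev)
                if mod is None:          # STOP generated for this side
                    break
                if side == "left":
                    children.insert(0, mod)
                else:
                    children.append(mod)
                prev = mod
        return children

    # Toy usage: S -> NP VP, with head VP and a single left modifier NP.
    kids = generate_children(
        "S",
        lambda parent: "VP",
        lambda parent, head, side, prev:
            "NP" if side == "left" and prev == head else None)
    assert kids == ["NP", "VP"]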

Seeding the Chart
  (*-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
- A constraint is associated with each chart item.
- Items are seeded with the appropriate terminal constraint (NNP-0-0, etc.).

addUnaries – assigning constraints
  (*-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
When a unary parent item P(th,wh) is built over a head item H(th,wh) carrying constraint c1: if H matches c1 (label and span), the new item is assigned c1's parent constraint; otherwise it is assigned c1 itself.

addUnaries – assigning constraints
  (*-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
Resulting item/constraint pairs:
  (VBD ate): VBD-1-1
  (VP ate the ice cream): *-0-4
  (S John .. cream): *-0-4
  (NN cream): NN-4-4
  (NP (NN cream)): NP-2-4
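The assignment rule itself is small; a sketch with hypothetical names (not the parser's API), where constraints hold a pointer to their parent in the constraint tree:

    class Constraint:
        def __init__(self, label, start, end, parent=None):
            self.label, self.start, self.end = label, start, end
            self.parent = parent
        def matches(self, label, start, end):
            return (self.label, self.start, self.end) == (label, start, end)

    def unary_parent_constraint(head_label, head_start, head_end, c1):
        """If the head item matches c1, the new parent item gets c1's parent
        constraint; otherwise it inherits c1 unchanged."""
        return c1.parent if c1.matches(head_label, head_start, head_end) else c1

    # (NN cream) carries NN-4-4, whose parent constraint is NP-2-4, so the
    # unary item (NP (NN cream)) is assigned NP-2-4.
    root = Constraint("*", 0, 4)
    np24 = Constraint("NP", 2, 4, parent=root)
    nn44 = Constraint("NN", 4, 4, parent=np24)
    assert unary_parent_constraint("NN", 4, 4, nn44) is np24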

addUnaries – killing items Kill new item if it doesn’t match its constraint label and it matches one of the negative constraints. E.g. IP and NP are being prebracketed, and both are negative constraints. New item is NP(3,6) with constraint IP-0-10. This is not okay, since if the item was good there would have been a prebracketing for NP(3,6) (NP 3 (NP 4..5) 6) (IP-0-10 NP-4-5 NP-8-10) 12/12/05 Seth Kulick

addUnaries – killing items
- Kill the new item if its constraint is the wildcard and the item matches one of the negative constraints.
- E.g., the item is NP(4,5), NPs are being prebracketed (so NP is a negative constraint), and the item's constraint is *-0-10.
- This is not okay, since * must be something besides NP.
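Both addUnaries kill tests reduce to a small predicate over labels; a sketch with hypothetical names:

    def kill_in_add_unaries(item_label, constraint_label, negative_labels):
        """True if the freshly built unary item can be discarded at once."""
        if item_label not in negative_labels:
            return False                 # only prebracketed labels are policed
        if constraint_label == "*":
            return True                  # wildcard may not be a prebracketed label
        return item_label != constraint_label

    # Slide examples: NP(3,6) under IP-0-10 with IP, NP prebracketed; and
    # NP(4,5) under the wildcard *-0-10 with NP prebracketed.
    assert kill_in_add_unaries("NP", "IP", {"IP", "NP"})
    assert kill_in_add_unaries("NP", "*", {"NP"})
    assert not kill_in_add_unaries("NP", "NP", {"NP"})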

joinItems – assigning constraints
When two items are joined, the modificand's constraint is associated with the new item: joining modificand P(th,wh) (carrying constraint c2) with modifier R yields a new, wider P(th,wh) that still carries c2.

joinItems – assigning constraints
  (*-0-4 (NP-0-0 (NNP-0-0 John)) (VBD-1-1 ate) (NP-2-4 (DT-2-2 the) (NN-3-3 ice) (NN-4-4 cream)))
E.g., joining the modificand (NP (NN cream)) (constraint NP-2-4) with the modifier (NN ice) (constraint NN-3-3) yields (NP (NN ice) (NN cream)), which keeps the modificand's constraint NP-2-4.

joinItems – killing items
- Kill the new item if an added child takes the item across its constraint's span boundary.
  (IP (CONJ and) (VBD left) (D this) (N letter) (IP (TO to) (BE be) (VAN sent) …
- Modificand: (IP (TO to)), constraint IP-4-9. (Left) modifier: (NPB (N letter)). The join would extend the item past the left edge of IP-4-9.

joinItems – killing items
- Kill the new item if there is any constraint within the span of the item that does not match some descendant of the item.
  (*-0-9 (NP-4-9 (NP-4-6 NP-8-9)))
  (IP prays-1 (ADVP delygently-2) (PP for-3 (NP-A (NPB your-4 good-5 speed-6))) (PP in-7 (NP-A (NPB your-8 matters-9))))
- PP(7,9) has just been joined to IP(1,6), forming IP(1,9).

joinItems – killing items
  (*-0-9 (NP-4-9 (NP-4-6 NP-8-9)))
  (IP prays-1 (ADVP delygently-2) (PP for-3 (NP-A (NPB your-4 good-5 speed-6))) (PP in-7 (NP-A (NPB your-8 matters-9))))
- Check the constraints within (1,9): NP-4-9, NP-4-6, NP-8-9.
- NP-4-9 is not satisfied, so the item is killed (it never enters the chart).
- Some smarts: only NP-4-9 is actually checked (the check does not descend into the constraint tree's branches).
- Could be smarter still and kill this earlier, when PP(3,6) was added.
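A sketch of the two joinItems kill tests, under one reading of the conditions above (names and encodings are mine): a new item must stay inside its own constraint's span, and every constraint falling inside the new span must be matched by some item.

    def kill_span_overflow(new_start, new_end, c_start, c_end):
        """An item that grows outside its constraint's span can never match it."""
        return new_start < c_start or new_end > c_end

    def kill_unsatisfied_inside(item_spans, constraints, new_start, new_end):
        """Kill if some constraint within the new span matches no item."""
        return any(new_start <= i and j <= new_end
                   and (label, i, j) not in item_spans
                   for (label, i, j) in constraints)

    # First slide's example: (IP (TO to)) spans (4,4) under IP-4-9; adding the
    # left modifier (NPB (N letter)) at (3,3) yields span (3,4), outside IP-4-9.
    assert kill_span_overflow(3, 4, 4, 9)

    # This slide's example: PP(7,9) joins IP(1,6) forming IP(1,9); NP-4-9 lies
    # inside (1,9) but no item has that span, so the new item is killed.
    items = {("NP", 4, 6), ("NP", 8, 9), ("PP", 3, 6), ("PP", 7, 9)}
    assert kill_unsatisfied_inside(
        items, [("NP", 4, 9), ("NP", 4, 6), ("NP", 8, 9)], 1, 9)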

addStopProbs – killing items
Kill the new item if:
  item span != constraint span
  && constraint label matches item label
  && constraint is not the wildcard
  && negative(itemLabel)
- E.g., NP is being prebracketed, so NP is a negative constraint. The new item is NP(4,6) and its constraint is NP-3-6. This is not okay: if the item were good, there would have been a prebracketing for NP(4,6).
- This case can't be killed off earlier, since the item could still grow to match the constraint's span. (The addUnaries case was when the labels did not match; that one could be killed earlier.)
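The same condition as a predicate (sketch, hypothetical names):

    def kill_at_stop_probs(item_label, item_span, c_label, c_span, negative_labels):
        """All four conjuncts from this slide must hold for the item to die here."""
        return (item_span != c_span
                and c_label == item_label
                and c_label != "*"
                and item_label in negative_labels)

    # Slide example: NP prebracketed; the stopped item NP(4,6) under NP-3-6 dies.
    assert kill_at_stop_probs("NP", (4, 6), "NP", (3, 6), {"NP"})
    # But NP(4,6) under NP-4-6 survives (it matches), as does one under *-3-6.
    assert not kill_at_stop_probs("NP", (4, 6), "NP", (4, 6), {"NP"})
    assert not kill_at_stop_probs("NP", (4, 6), "*", (3, 6), {"NP"})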

Internal Transformations – Yuch!
- The parser uses various internal transformations, but the constraints are over untransformed partial trees.
- Some simple cases: if a node is SG and the constraint is S, consider them the same.
- A base NP can be generated, with label NPB. Similarly, this could also be considered to match an NP constraint. But at the end of the parse, NPBs that are a unary child of an NP are deleted. So an NP constraint could refer to *two* nodes in the parse, since they are later collapsed together.

Dealing with Base NPs
- Example: an NP carrying constraint NP-0-5, whose only child is an NPB spanning 2..4 that matches NP-2-4.
- This is really bad, since the higher NP will now match a negative constraint, and be killed.

Dealing with Base NPs
- Modify the definition of matching a negative constraint.
- An item matches one of the negative constraints only if it is not this case: an NP whose only child is an NPB that matches its constraint.

Dealing with Base NPs
- Such cases will no longer be killed in addUnaries.
- But if more items are joined, the NPB will not be killed, and the higher NP *should* be killed off (again the NP with constraint NP-0-5 over an NPB spanning 2..4 matching NP-2-4).
- So for the special case of NP, we keep checking at joinItems whether it matches a negative constraint.
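A rough sketch of this exception (the parser's real bookkeeping around NPB is more involved): before judging an NP against the negative constraints, test for the collapsible NP-over-NPB configuration.

    def exempt_from_negative(label, children):
        """children: (child_label, child_matches_own_constraint) pairs.  An NP
        whose only child is an NPB matching its constraint is exempt, because
        the NP/NPB pair collapses into a single NP node after parsing."""
        return (label == "NP" and len(children) == 1
                and children[0] == ("NPB", True))

    # The configuration from these slides: (NP (NPB ...)) with the NPB
    # matching NP-2-4 is not treated as a spurious NP.
    assert exempt_from_negative("NP", [("NPB", True)])
    assert not exempt_from_negative("NP", [("NPB", False), ("PP", True)])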

Outline
- Motivation
- Key Aspects of Implementation
- Interaction with Parser
- Experiments with Penn Treebank
- Experiments with PPCEME (Penn-Helsinki Parsed Corpus of Early Modern English)

Experiments with PTB
- What is the effect of forcing certain constituents to be gold?
- Are there sentences that simply cannot be parsed under such constraints?
- Trained on sections 02-21, tested on section 23, with gold NP, S, or both.
- Ran in the regular way, with an expanded beam, with "relaxation", or both.
- "Relaxation": zero probabilities are changed to a low probability.

Experiments with PTB
Number of null parses on section 23 (2416 sentences):

  Prebracketing   Usual   Wide Beam   Relax   Wide & Relax
  None                0           –       0              –
  S                  35           2      25      2 (0.1%)
  NP                143          12     115      4 (0.2%)
  NP&S              211          36     167     27 (1.1%)

Prebracketing Penn Treebank
Constituent scores (precision/recall) on section 23, by prebracketing:

  Constituent          none             S            NP          NP&S
  all           88.53/88.63   92.27/92.09   95.21/94.45   97.17/96.48
  NP            91.03/89.90   92.02/90.60   99.52/98.84   99.53/99.11
  VP            90.51/90.35   96.57/96.42   93.77/92.56   97.22/96.43
  S             89.42/89.52  100.00/99.79   92.96/90.57  100.00/99.78
  PP            85.41/84.58   86.88/86.11   97.72/97.06   98.14/96.74
  SBAR          84.08/87.29   94.05/97.24   89.46/89.71   94.13/95.75

Experiments with PPCEME
- 1.8 million words spanning 1500-1710; experiments use 500K words from 1640-1710.
- New treebanks using a similar annotation style.
- Same questions as with the PTB…
- …but the main concern is really to have cheaper "level one" treebankers do the IP and NP levels (also working with the entire 1.8 million words).

Experiments with PPCEME
Number of null parses (2860 sentences); configurations as before (Usual, Wide Beam, Relax, Wide & Relax):

  None    46 (1.6%)   17 (0.6%)
  IP      26 (0.9%)
  NP      32 (1.1%)
  NP&IP   74 (2.6%)

Prebracketing PPCEME
Constituent scores (precision/recall), by prebracketing:

  Constituent          none            IP            NP         NP&IP
  all           77.22/76.93   88.36/86.64   89.73/88.48   96.14/93.63
  NP            83.50/81.75   86.51/83.32   99.70/98.08   99.69/98.26
  IP            72.23/72.19  100.00/99.97   79.94/78.64             –
  PP            78.20/78.14   85.63/85.25   93.61/93.25   96.54/95.00
  CP            61.67/63.42   91.73/90.09   75.54/73.76   92.83/85.93 (??)

Conclusion and Future Stuff
- Prebracketing conjuncts.
- Using some high-accuracy method for getting Ss or NPs.
- Propaganda point: greater success for empty-category recovery.
- Continue to use prebracketing for ongoing treebank construction, and experiment with which constituents are best to prebracket.

Key Aspects of Implementation
- Partial tree specification
- Negative constraints
- Wildcard
- Killing constraint-violating parses early
- Not just constraints on cells
- Conflict between constraints and internal tree transformations

Overview of Collins' Model (from Dan's defense slides)
[Figure: a rule with parent P(th,wh), head child H(th,wh), and left modifiers L1 ... Li-1, Li. The model generates Li conditioning on the items indicated in red circles, including the parent, the head, and the left subcat {subcatL}.]
