AQUAINT Workshop – June 2003
Improved Semantic Role Parsing
Kadri Hacioglu, Sameer Pradhan, Valerie Krugler, Steven Bethard, Ashley Thornton, Wayne Ward, Dan Jurafsky, James Martin
Center for Spoken Language Research
University of Colorado
Boulder, CO

What is Semantic Role Tagging?
Assigning semantic labels to sentence elements.
Elements are arguments of some predicate or participants in some event.
–Who did What to Whom, How, When, Where, Why
[ TEMPORAL In 1901 ] [ THEME President William McKinley ] [ TARGET was shot ] [ AGENT by anarchist Leon Czolgosz ] [ LOCATION at the Pan-American Exposition ]

Parsing Algorithm
From Gildea and Jurafsky (2002)
Generate syntactic parse of sentence (Charniak)
Specify predicate (verb)
For each constituent node in parse tree:
–Extract features relative to predicate: Path, Voice, Headword, Position, Phrase Type, Sub-Cat
–Estimate P(Role | features) for each role and normalize
–Assign role with highest probability
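
The estimate, normalize, and assign steps above can be sketched as follows. This is a minimal illustration, not the estimation scheme of the cited paper: it substitutes plain relative-frequency counts with add-alpha smoothing, and the function names, the feature tuple, and the toy training examples are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count roles per feature tuple. examples: iterable of (features, role)."""
    counts = defaultdict(Counter)
    for features, role in examples:
        counts[features][role] += 1
    return counts

def role_distribution(counts, features, roles, alpha=1.0):
    """Add-alpha smoothed, normalized estimate of P(role | features)."""
    seen = counts.get(features, Counter())
    scores = {role: seen[role] + alpha for role in roles}
    total = sum(scores.values())
    return {role: score / total for role, score in scores.items()}

def assign_role(counts, features, roles):
    """Pick the role with the highest estimated probability."""
    dist = role_distribution(counts, features, roles)
    return max(dist, key=dist.get)

# Hypothetical toy data: (path, voice, position) -> role
ROLES = ["AGENT", "THEME", "NULL"]
FEATS = ("NP^S!VP!VB", "active", "before")
COUNTS = train([(FEATS, "AGENT")] * 3 + [(FEATS, "THEME")])
```

Calling `assign_role(COUNTS, FEATS, ROLES)` selects the role that occurs most often with that feature tuple in the toy data; an unseen feature tuple falls back to a uniform distribution under the smoothing assumed here.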

SVM Classifier
Same basic procedure as (Gildea & Jurafsky 2000)
–Same features, plus the predicate itself as a feature
Change classification step to use SVM
TinySVM software [Kudo & Matsumoto 2000]
Prune constituents with P(Null) > 0.98
–For efficiency in training
–Prunes ~80% of constituents
For each role, train a one-vs-all classifier
–Includes Null role

SVM Classification
Generate syntactic parse (Charniak parser)
For each target (verb):
–Prune constituents with P(Null) > 0.98
–Run each one-vs-all (OVA) classifier on remaining constituents
–Convert SVM output to probabilities by fitting a sigmoid (described in Platt 2000)
–Generate N-best labels for each constituent
–Pick highest-probability sequence of non-overlapping roles
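
The last steps of this slide can be illustrated with a small sketch. It assumes the sigmoid parameters `a` and `b` have already been fitted, and it simplifies the N-best search to the single best non-Null label per constituent: a constituent is worth labeling when its best role is more probable than Null, and a weighted-interval-scheduling pass picks the non-overlapping set with the highest total log-odds. The function names, the span representation, and the probabilities are hypothetical, not taken from the system described.

```python
import math
from bisect import bisect_right

def platt_probability(margin, a, b):
    """Platt's sigmoid: map an SVM margin to a probability (a, b assumed fitted)."""
    return 1.0 / (1.0 + math.exp(a * margin + b))

def best_non_overlapping(constituents):
    """constituents: list of (start, end, probs); spans are half-open [start, end),
    probs maps each label (including "NULL") to a probability > 0.
    Returns {(start, end): role} for the chosen non-overlapping constituents."""
    cands = []
    for start, end, probs in constituents:
        role = max((r for r in probs if r != "NULL"), key=probs.get)
        gain = math.log(probs[role]) - math.log(probs["NULL"])
        if gain > 0:  # labeling beats leaving the constituent as Null
            cands.append((end, start, role, gain))
    cands.sort()  # order by span end for the dynamic program
    ends = [c[0] for c in cands]
    best = [0.0] * (len(cands) + 1)  # best[k]: best total gain over first k candidates
    for k, (end, start, role, gain) in enumerate(cands, 1):
        j = bisect_right(ends, start, 0, k - 1)  # earlier candidates ending at or before start
        best[k] = max(best[k - 1], gain + best[j])
    chosen, k = {}, len(cands)
    while k > 0:  # backtrack through the table
        end, start, role, gain = cands[k - 1]
        j = bisect_right(ends, start, 0, k - 1)
        if gain + best[j] > best[k - 1]:
            chosen[(start, end)] = role
            k = j
        else:
            k -= 1
    return chosen

# Hypothetical word-index spans for "But analysts say IBM is a special case"
EXAMPLE = [
    (1, 2, {"AGENT": 0.90, "NULL": 0.10}),  # analysts
    (3, 8, {"TOPIC": 0.70, "NULL": 0.30}),  # IBM is a special case
    (3, 4, {"AGENT": 0.60, "NULL": 0.40}),  # IBM
    (5, 8, {"THEME": 0.55, "NULL": 0.45}),  # a special case
]
```

With these made-up probabilities, `best_non_overlapping(EXAMPLE)` keeps the whole clause as one role rather than the two smaller constituents nested inside it, because the combined log-odds of the nested pair are lower.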

Features
Target word (verb)
Cluster for target word (64)
Path from constituent to target
Phrase Type
Position (before/after)
Voice
Head Word
Sub-categorization
Example:
–Path: NP↑S↓VP↓VB
–Head Word: He
–Sub-cat: VP -> VB NP
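
The Path and Sub-categorization features from the example can be computed from a parse tree roughly as below. This is a sketch under assumptions: the `Node` class, the function names, and the tiny tree are hypothetical (the slide's parse-tree figure is not available), and the ↑/↓ arrows are assumed to mark upward and downward steps in the tree.

```python
class Node:
    """Minimal parse-tree node with a parent pointer."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain

def tree_path(constituent, predicate):
    """Labels from the constituent up to the lowest common ancestor, then down to the predicate."""
    up = ancestors(constituent)
    down = ancestors(predicate)
    down_ids = {id(n) for n in down}
    i = next(k for k, n in enumerate(up) if id(n) in down_ids)
    common = up[i]
    j = next(k for k, n in enumerate(down) if n is common)
    rising = [n.label for n in up[:i + 1]]
    falling = [n.label for n in reversed(down[:j])]
    path = "↑".join(rising)
    if falling:
        path += "↓" + "↓".join(falling)
    return path

def subcat(predicate):
    """Expansion of the predicate's parent phrase."""
    parent = predicate.parent
    return parent.label + " -> " + " ".join(c.label for c in parent.children)

# Hypothetical tree: (S (NP He) (VP (VB ...) (NP ...)))
VB = Node("VB")
VP = Node("VP", [VB, Node("NP")])
SUBJ = Node("NP")
S = Node("S", [SUBJ, VP])
```

Here `tree_path(SUBJ, VB)` and `subcat(VB)` reproduce the path and sub-categorization strings of the slide's example for this assumed tree shape.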

Role Labels
PropBank Arguments: Arg0, Arg1, Arg2, Arg3, Arg4, Arg5, ArgA, ArgM, ArgM-ADV, ArgM-CAU, ArgM-DIR, ArgM-DIS, ArgM-EXT, ArgM-LOC, ArgM-MNR, ArgM-MOD, ArgM-NEG, ArgM-PRD, ArgM-PRP, ArgM-REC, ArgM-TMP
Thematic Roles: Agent, Actor, Beneficiary, Cause, Degree, Experiencer, Goal, Instrument, Location, Manner, Means, Proposition, Result, State, Stimulus, Source, Temporal, Theme, Topic, Type, Other

Data
PropBank data
–WSJ section of Penn TreeBank
–Annotated with predicate-argument structure
Train on PropBank Training Set
–Sections 00 and 23 withheld
–72,000 annotated roles
Test on PropBank Section 23
–3,800 annotated roles

SVM Performance
Annotate PropBank Arguments, Gold-Standard Parses from TreeBank
[Table: columns Arg ID (P, R, F) and Role Assign; rows SVM, Surdeanu03 (same features), Surdeanu03 (additional features), Gildea & Palmer (2002); the only surviving value is 83 in the Gildea & Palmer (2002) row]

Using Real Parses
[Table: Annotate PropBank Arguments; columns Arg ID and Role A; rows TreeBank Parse, Charniak Parse]
[Table: Annotate Thematic Roles; columns Arg ID and Role A; rows TreeBank Parse, Charniak Parse]

ID and Label
ID and Annotate Thematic Roles Using Charniak Parse
[Table: Top N Classification]

Hard vs Soft Pruning
Soft Pruning
–Train Null-vs-Role classifier on all data
–Prune constituents with P(Null) > 0.98
–Train one-vs-all classifiers (including Null) on remaining constituents
Hard Pruning
–Train Null-vs-Role classifier on all data
–Make Null-vs-Role classification for each constituent
–Train one-vs-all classifiers (no Null) on role constituents
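
The difference between the two schemes lies in where the Null decision is made, which a short sketch can make concrete. It assumes a function `p_null` that returns the Null-vs-Role classifier's probability for a constituent; the 0.5 decision point for hard pruning, the function names, and the scores are assumptions for illustration only.

```python
def soft_prune(constituents, p_null, threshold=0.98):
    """Drop only constituents the Null-vs-Role classifier is very sure are Null.
    Surviving Null constituents stay, so the role classifiers need a Null class."""
    return [c for c in constituents if p_null(c) <= threshold]

def hard_prune(constituents, p_null, decision=0.5):
    """Make a final Null-vs-Role decision; only constituents judged to be roles remain.
    The 0.5 decision point is an assumption of this sketch."""
    return [c for c in constituents if p_null(c) < decision]

def label_set(roles, include_null):
    """Labels for the one-vs-all stage: with Null for soft pruning, without for hard."""
    return list(roles) + (["NULL"] if include_null else [])

# Hypothetical Null probabilities for four constituents
SCORES = {
    "analysts": 0.05,
    "IBM is a special case": 0.20,
    "a special case": 0.70,
    "But": 0.995,
}
SOFT = soft_prune(list(SCORES), SCORES.get)
HARD = hard_prune(list(SCORES), SCORES.get)
```

Under soft pruning the uncertain constituent "a special case" is passed on and the one-vs-all stage can still call it Null; under hard pruning it is discarded before role classification.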

Segment & Classify with SVM
Initial system used Charniak parser to segment
–SVM classified segmented constituents
Use SVM to segment and classify chunks
Features:
–Window of 5 words (+2, target, -2)
–POS tags for words
–Syntactic phrase position tags (B, I, O)
–Path from word to target
–Class assignments for previous words
Assign semantic phrase position tag to each word

SVM Chunking Parser
[Diagram components: input sentence, Syntactic Parser, Path Finder, Active Passive Detector, Target word detector, Chunker]
[Diagram labels (Features): words, POS tags, word positions, path for each word, voice, target word]

Example I
But analysts say IBM is a special case
But [ AGENT analysts ] [ TARGET say ] [ TOPIC IBM is a special case ]

Word      POS  SPP   Path                     Pr   B/A  V  Class
But       CC   O     CC VP->VBP               say  B    A  O
analysts  NNS  B-NP  NNS VP->VBP              say  B    A  B-agent
IBM       NNP  B-NP  VBP SBAR->S->NP->NNP     say  A    A  B-topic
is        AUX  O     VBP SBAR->S->VP->AUX     say  A    A  I-topic
a         DT   B-NP  VBP SBAR->S->VP->NP->DT  say  A    A  I-topic
special   JJ   I-NP  VBP SBAR->S->VP->NP->JJ  say  A    A  I-topic
case      NN   I-NP  VBP SBAR->S->VP->NP->NN  say  A    A  I-topic
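
The per-word Class tags in Example I map back to the bracketed roles by grouping each B- tag with the I- tags that follow it. A minimal sketch of that decoding is below; the function name is hypothetical, and the `B-target` tag for "say" is an assumption, since the example's table has no row for the target word.

```python
def decode_bio(words, tags):
    """Group words into labeled spans from B-/I-/O tags.
    An I- tag that does not continue the current span starts a new one."""
    spans, current = [], None
    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[0] != label)
        )
        if starts_new:
            current = (label, [word])
            spans.append(current)
        elif prefix == "I":
            current[1].append(word)
        else:  # "O": outside any span
            current = None
    return [(label.upper(), " ".join(ws)) for label, ws in spans]

WORDS = "But analysts say IBM is a special case".split()
# "B-target" for the verb is assumed; the other tags are the Class column of Example I
TAGS = ["O", "B-agent", "B-target", "B-topic",
        "I-topic", "I-topic", "I-topic", "I-topic"]
SPANS = decode_bio(WORDS, TAGS)
```

`SPANS` holds one (label, text) pair per bracket in the example, in sentence order.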

SVM Chunking Parser II
[Diagram components: input sentence, POS tagger, Path Finder, Active Passive Detector, Target word detector, Yamcha Chunker]
[Diagram labels (Features): words, POS tags, word positions, path for each word, voice, target word]

Example II
POS tagged & Chunked (only NP and VP)
But analysts say IBM is a special case
But_CC [ NP analysts_NNS ] ( VP say_VBP ) [ NP IBM_NNP ] ( VP is_VBZ ) [ NP a_DT special_JJ case_NN ]

Word      POS  SPP   Path                       Pr   B/A  V  Class
But       CC   O     CC->NP->VP->VBP            say  B    A  O
analysts  NNS  B-NP  NNS->NP->VP->VBP           say  B    A  B-agent
IBM       NNP  B-NP  NNP->NP->VP->VBP           say  A    A  B-topic
is        VBZ  B-VP  VBZ->VP->NP->VP->VBP       say  A    A  I-topic
a         DT   B-NP  DT->NP->VP->NP->VP->VBP    say  A    A  I-topic
special   JJ   I-NP  JJ->NP->VP->NP->VP->VBP    say  A    A  I-topic
case      NN   I-NP  NN->NP->VP->NP->VP->VBP    say  A    A  I-topic
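
The Path column of Example II appears to follow a simple pattern: the word's POS tag, then the chunk labels passed on the way to the target (starting with the word's own chunk), then the target's POS tag. The sketch below reproduces the example's paths under that reading. The function names are hypothetical, and two details are assumptions: the target "say" is tagged `B-VP` (taken from the bracketed line), and words outside any chunk contribute no label.

```python
def chunk_units(tags):
    """Split a BIO tag sequence into units: (chunk label or None, [token indices])."""
    units = []
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and units and units[-1][0] == tag[2:]:
            units[-1][1].append(i)          # continue the current chunk
        elif tag == "O":
            units.append((None, [i]))       # word outside any chunk
        else:
            units.append((tag[2:], [i]))    # start a new chunk
    return units

def flat_path(pos, tags, i, target):
    """POS of word i, chunk labels from its unit to the target's unit, POS of target."""
    units = chunk_units(tags)
    unit_of = {tok: u for u, (_, toks) in enumerate(units) for tok in toks}
    a, b = unit_of[i], unit_of[target]
    step = 1 if b > a else -1
    labels = [units[u][0] for u in range(a, b + step, step)]
    return "->".join([pos[i]] + [lab for lab in labels if lab] + [pos[target]])

POS = ["CC", "NNS", "VBP", "NNP", "VBZ", "DT", "JJ", "NN"]
SPP = ["O", "B-NP", "B-VP", "B-NP", "B-VP", "B-NP", "I-NP", "I-NP"]
TARGET = 2  # index of "say"
PATHS = [flat_path(POS, SPP, i, TARGET) for i in range(len(POS)) if i != TARGET]
```

`PATHS` lists the paths for every word except the target, in sentence order, and can be compared directly with the Path column of Example II.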

Performance
Segment & Annotate Thematic Roles
Train on only first 3000 sentences of PropBank data
[Table: columns 21,000 sentences training and 3000 sentences training]
–SVM Baseline: 80/74
–Chunker-I: 79/71, 67/53
–Chunker-II: 59/44
Chunker-I: syntax features derived from Charniak parse
Chunker-II: syntax features from syntactic SVM chunker

Summary and Future Work
Project has shown continued improvement in semantic parsing
Goals:
–Improve accuracy through new features
–Improve robustness to data sets by improving word sense robustness
–Continue experiments without full syntactic parse
–Apply to Question Answering