SEMANTIC ROLE LABELING BY TAGGING SYNTACTIC CHUNKS
Kadri Hacioglu1, Sameer Pradhan1, Wayne Ward1, James H. Martin1, Daniel Jurafsky2
1 The Center for Spoken Language Research, University of Colorado at Boulder
2 Stanford NLP Group, Stanford University
OUTLINE
- Semantic Role Labeling (SRL)
- Nature of Shared Task Data
- Our Strategy
- System Description & Features
- Experiments
- Concluding Remarks
SEMANTIC ROLE LABELING
Based on predicate-argument structure:
- First explored by (Gildea & Jurafsky, 2000)
- Labels are PropBank style (A0, A1, AM-MNR), with the thematic role in parentheses
Predicate: pursue
- A0 (Agent): we
- A1 (Theme): completion of this transaction
- AM-MNR (Manner): aggressively
Example: [A0 We] are prepared to [PRED pursue] [AM-MNR aggressively] [A1 completion of this transaction] he says
EXAMPLE OF SHARED TASK DATA
Words     POS tags  BP tags (IOB2)  Clause tags  Predicate info  Semantic labels
Sales     NNS       B-NP            (S*          -               (A1*A1)
declined  VBD       B-VP            *            decline         (V*V)
10        CD        B-NP            *            -               (A2*
%         NN        I-NP            *            -               *A2)
to        TO        B-PP            *            -               *
$         $         B-NP            *            -               (A4*
251.2     CD        I-NP            *            -               *
million   CD        I-NP            *            -               *A4)
from      IN        B-PP            *            -               *
$         $         B-NP            *            -               (A3*
287.7     CD        I-NP            *            -               *
million   CD        I-NP            *            -               *A3)
.         .         O               *S)          -               *
OUTLINE OF OUR STRATEGY
Change Shared Task Representation
- make sure that it is reversible
Engineer additional features
- use intuition, experience and data analysis
Optimize system settings
- context size
- SVM parameters: degree of polynomial, C
CHANGE IN REPRESENTATION
Restructure available information
- words collapsed into respective BPs
- only headwords are retained (rightmost words)
- exceptions: VPs with the predicate; Outside (O) chunks
Modify semantic role labeling
- IOB2 scheme instead of bracketing scheme
NEW REPRESENTATION
BPs  Headwords  POS tags  BP tags (IOB2)  Clause tags  Predicate info  Semantic labels (IOB2)
NP   Sales      NNS       B-NP            (S*          -               B-A1
VP   declined   VBD       B-VP            *            decline         B-V
NP   %          NN        I-NP            *            -               B-A2
PP   to         TO        B-PP            *            -               O
NP   million    CD        I-NP            *            -               B-A4
PP   from       IN        B-PP            *            -               O
NP   million    CD        I-NP            *            -               B-A3
O    .          .         O               *S)          -               O
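The change of representation can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (group_chunks, chunk_role, collapse) are hypothetical, the clause-tag column is left out of the output for brevity, and the handling of the exception for VPs containing the predicate (leaving such chunks word by word) is an assumption. Applied to the shared task example, it yields the rows of the new representation.

```python
from typing import List, Optional, Tuple

# One row per word:
# (word, POS, BP tag, clause tag, predicate info, bracketed role label)
Row = Tuple[str, str, str, str, str, str]

ROWS: List[Row] = [
    ("Sales", "NNS", "B-NP", "(S*", "-", "(A1*A1)"),
    ("declined", "VBD", "B-VP", "*", "decline", "(V*V)"),
    ("10", "CD", "B-NP", "*", "-", "(A2*"),
    ("%", "NN", "I-NP", "*", "-", "*A2)"),
    ("to", "TO", "B-PP", "*", "-", "*"),
    ("$", "$", "B-NP", "*", "-", "(A4*"),
    ("251.2", "CD", "I-NP", "*", "-", "*"),
    ("million", "CD", "I-NP", "*", "-", "*A4)"),
    ("from", "IN", "B-PP", "*", "-", "*"),
    ("$", "$", "B-NP", "*", "-", "(A3*"),
    ("287.7", "CD", "I-NP", "*", "-", "*"),
    ("million", "CD", "I-NP", "*", "-", "*A3)"),
    (".", ".", "O", "*S)", "-", "*"),
]


def group_chunks(rows: List[Row]) -> List[List[Row]]:
    """Group consecutive words into base phrases using the IOB2 BP tags."""
    chunks: List[List[Row]] = []
    for row in rows:
        bp = row[2]
        if bp.startswith("I-") and chunks and chunks[-1][-1][2][2:] == bp[2:]:
            chunks[-1].append(row)   # continue the current base phrase
        else:
            chunks.append([row])     # B-x, O, or a stray I-x starts a new unit
    return chunks


def chunk_role(chunk: List[Row], open_role: Optional[str]):
    """Return (IOB2 role tag for the chunk, role still open after the chunk)."""
    tag = f"I-{open_role}" if open_role else "O"
    for row in chunk:
        start, _, end = row[5].partition("*")
        if start:                    # e.g. "(A2" opens role A2
            open_role = start.lstrip("(")
            if tag == "O":
                tag = f"B-{open_role}"
        if end:                      # e.g. "A2)" closes the open role
            open_role = None
    return tag, open_role


def collapse(rows: List[Row]):
    """Collapse words into base phrases, keeping the rightmost word as headword."""
    out, open_role = [], None
    for chunk in group_chunks(rows):
        # Assumption: a chunk containing the predicate is left word by word.
        has_pred = any(r[4] != "-" for r in chunk)
        units = [[r] for r in chunk] if has_pred else [chunk]
        for unit in units:
            tag, open_role = chunk_role(unit, open_role)
            head = unit[-1]
            bp_type = unit[0][2].split("-")[-1]   # "NP", "VP", "PP" or "O"
            out.append((bp_type, head[0], head[1], head[2], head[4], tag))
    return out


for line in collapse(ROWS):
    print(*line)
```

An argument that starts in the middle of a base phrase cannot be represented exactly after collapsing, which is one source of the information loss noted for the new representation.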
DIFFERENCES BETWEEN REPRESENTATIONS
                  Original      New
Tokens            words         base phrases
Lexical Info      all words     headwords
# Tagging Steps   larger        fewer
Context span      narrower      wider
# Role Labels     greater       smaller
Info Loss         -             yes
Performance       worse         better
SYSTEM DESCRIPTION
- Phrase-by-phrase
- Left-to-right
- Binary feature encoding
- Discriminative
- Deterministic
- SVM based (YamCha toolkit, developed by Taku Kudo)
- Simple post-processing (for consistent bracketing)
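The slides do not spell out the post-processing step. One plausible reading, sketched below purely as an assumption, is that stray I- tags are repaired before the IOB2 tags are mapped back to the bracketing scheme, so that every bracket that opens also closes. The function names repair and iob2_to_brackets are hypothetical, and the sketch works on phrase-level tags only.

```python
from typing import List


def repair(tags: List[str]) -> List[str]:
    """Turn an I-X that does not continue an X argument into B-X."""
    fixed, prev = [], "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


def iob2_to_brackets(tags: List[str]) -> List[str]:
    """Map phrase-level IOB2 role tags back to bracketed labels."""
    tags = repair(tags)
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("*")
            continue
        role = tag[2:]
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        opens = tag.startswith("B-")      # argument starts at this token
        closes = nxt != "I-" + role       # argument ends at this token
        out.append(("(" + role if opens else "") + "*" + (role + ")" if closes else ""))
    return out


print(iob2_to_brackets(["B-A1", "B-V", "B-A2", "O", "B-A4", "O", "B-A3", "O"]))
print(iob2_to_brackets(["O", "I-A0", "I-A0", "B-V"]))
```

Distributing the phrase-level brackets back over the individual words of each base phrase is omitted here.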
BASE FEATURES
- Words
- Predicate lemmas
- Part of speech tags
- Base phrase IOB2 tags
- Clause bracketing tags
- Named Entities
ADDITIONAL FEATURES
Token level
- Token position
- Path
- Clause bracket patterns
- Clause Position
- Headword suffixes
- Distance
- Length
Sentence level
- Predicate POS tag
- Predicate Frequency
- Predicate Context (POS, BP)
- Predicate Argument Frames
- Number of predicates
EXPERIMENTAL SET-UP
- Corpus: Flattened PropBank (2004 release)
- Training set: Sections 15-18
- Dev set: Section 20
- Test set: Section 21
- SVMs: 78 OVA (one-versus-all) classes, polynomial kernel, d=2, C=0.01
- Context: sliding window of +2/-2 tokens
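To make the sliding +2/-2 window and the left-to-right, deterministic tagging concrete, here is a minimal sketch under stated assumptions. The feature-string format, the function names window_features and tag_left_to_right, and the classify callback (a stand-in for the trained one-versus-all SVMs) are all hypothetical; this is not YamCha's API. Each returned string corresponds to one binary feature that is switched on.

```python
from typing import Callable, Dict, List


def window_features(tokens: List[Dict[str, str]], i: int,
                    prev_tags: List[str], size: int = 2) -> List[str]:
    """Binary features for token i from a window of -size..+size tokens."""
    feats = []
    for off in range(-size, size + 1):
        j = i + off
        if 0 <= j < len(tokens):
            for name, value in tokens[j].items():
                feats.append(f"{name}[{off:+d}]={value}")
        else:
            feats.append(f"pad[{off:+d}]")        # position outside the sentence
    # Tags already assigned to the left context are also used as features.
    for off in range(-size, 0):
        j = i + off
        if j >= 0:
            feats.append(f"tag[{off:+d}]={prev_tags[j]}")
    return feats


def tag_left_to_right(tokens: List[Dict[str, str]],
                      classify: Callable[[List[str]], str],
                      size: int = 2) -> List[str]:
    """Greedy tagging: one decision per token, never revised."""
    tags: List[str] = []
    for i in range(len(tokens)):
        tags.append(classify(window_features(tokens, i, tags, size)))
    return tags


tokens = [
    {"head": "Sales", "pos": "NNS"},
    {"head": "declined", "pos": "VBD"},
    {"head": "%", "pos": "NN"},
]


def toy_classifier(feats: List[str]) -> str:
    # Toy rule standing in for the SVM decision, for illustration only.
    return "B-V" if "pos[+0]=VBD" in feats else "O"


print(window_features(tokens, 1, ["B-A1"]))
print(tag_left_to_right(tokens, toy_classifier))
```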
RESULTS
Base features, W-by-W (word-by-word) & P-by-P (phrase-by-phrase) approaches, dev set
Method    Precision  Recall   F1
W-by-W    68.34%     45.16%   54.39
P-by-P    69.04%     54.68%   61.02
All features, P-by-P approach
Data      Precision  Recall   F1
Dev set   74.17%     69.42%   71.72
Test set  72.43%     66.77%   69.49
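As a quick check of the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the rounded percentages reported above; the recomputed values agree with the reported ones to within rounding (about 0.01). The dictionary name REPORTED is only for this illustration.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1), copied from the results tables
REPORTED = {
    "W-by-W, base features, dev": (68.34, 45.16, 54.39),
    "P-by-P, base features, dev": (69.04, 54.68, 61.02),
    "P-by-P, all features, dev": (74.17, 69.42, 71.72),
    "P-by-P, all features, test": (72.43, 66.77, 69.49),
}

for name, (p, r, reported) in REPORTED.items():
    print(f"{name}: recomputed F1 = {f1(p, r):.2f}, reported = {reported}")
```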
CONCLUSIONS
We have done SRL by tagging base phrase chunks
- original representation has been changed
- additional features have been engineered
- SVMs have been used
Improved performance with new representation and additional features
Compared to W-by-W approach, our method
- classifies larger units
- uses wider context
- runs faster
- performs better
THANK YOU!
CLAUSE FEATURES
The last column holds the CL pattern to sentence begin for tokens before the predicate and the CL pattern to sentence end for tokens after it.
Word         POS   BP tag  CL markers  Predicate  Clause position  CL pattern to predicate  CL pattern to sentence begin/end
One          CD    B-NP    (S*         -          OUT              (S*(S**S)                -
troubling    VBG   I-NP    *           -          OUT              (S**S)                   (S*
aspect       NN    I-NP    *           -          OUT              (S**S)                   (S*
of           IN    B-PP    *           -          OUT              (S**S)                   (S*
DEC          NNP   B-NP    *           -          OUT              (S**S)                   (S*
's           POS   B-NP    *           -          OUT              (S**S)                   (S*
results      NNS   I-NP    *           -          OUT              (S**S)                   (S*
,            ,     O       *           -          OUT              (S**S)                   (S*
analysts     NNS   B-NP    (S*         -          IN               (S**S)                   (S*
said         VBD   B-VP    *S)         say        IN               -                        -
,            ,     O       *           -          OUT              *S)                      *S)
was          VBD   B-VP    *           -          OUT              *S)                      *S)
its          PRP$  B-NP    *           -          OUT              *S)                      *S)
performance  NN    I-NP    *           -          OUT              *S)                      *S)
in           IN    B-PP    *           -          OUT              *S)                      *S)
Europe       NNP   B-NP    *           -          OUT              *S)                      *S)
.            .     O       *S)         -          OUT              *S)*S)                   -
SUFFIXES
The confusion: B-AM-MNR vs. B-AM-TMP
- single-word cases: fetchingly, tacitly, provocatively
Suffixes of length 2-4 of headwords are tried as features
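The suffix features can be sketched as follows. This is only an illustration: the function name suffix_features, the feature-string format, the lowercasing, and the choice to skip a suffix that would cover the whole word are assumptions, not details given in the slides.

```python
from typing import List


def suffix_features(headword: str, min_len: int = 2, max_len: int = 4) -> List[str]:
    """Suffixes of length min_len..max_len of a headword, as feature strings."""
    word = headword.lower()
    return [f"suf{n}={word[-n:]}"
            for n in range(min_len, max_len + 1)
            if len(word) > n]      # skip suffixes that would be the whole word


for w in ["fetchingly", "tacitly", "provocatively"]:
    print(w, suffix_features(w))
```

All three single-word examples share the length-2 suffix "ly", which is the kind of cue such features give the classifier for manner adverbs.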