Liberal Event Extraction and Event Schema Induction Lifu Huang1, Taylor Cassidy2, Xiaocheng Feng3, Heng Ji1, Clare R. Voss2, Jiawei Han4, Avirup Sil5 1 Rensselaer Polytechnic Institute, 2 US Army Research Lab, 3 Harbin Institute of Technology, 4 University of Illinois at Urbana-Champaign, 5 IBM T.J. Watson Research Center
Task Definition
Event Mention: a string of words in text denoting a particular event;
Event Trigger: the word in the event mention that carries the bulk of its semantic content;
Event Argument: a concept that serves as a participant or attribute with a specific role in an event mention;
Argument Role: the function or purpose of the argument with respect to the corresponding event.
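These definitions can be summarized in a small data structure. A minimal Python sketch (class and field names, and the example values, are illustrative, not taken from the paper):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    """A participant or attribute of an event mention."""
    text: str   # the argument string, e.g. "cameraman"
    role: str   # its function w.r.t. the event, e.g. "Victim"


@dataclass
class EventMention:
    """A string of words in text denoting a particular event."""
    sentence: str                     # the sentence containing the mention
    trigger: str                      # the word carrying the bulk of the semantics
    event_type: str                   # e.g. "Die"
    arguments: List[Argument] = field(default_factory=list)


# Hypothetical example mention:
mention = EventMention(
    sentence="The cameraman died when a tank fired on the hotel.",
    trigger="died",
    event_type="Die",
    arguments=[Argument("cameraman", "Victim"), Argument("hotel", "Place")],
)
```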
Comparison: Traditional Event Extraction vs. Liberal Event Extraction
Motivating Examples - Top 8 most similar words based on lexical embedding
Hypothesis 1: Event triggers that occur in similar contexts and share the same sense tend to have similar types. Apply word sense disambiguation (WSD) to learn a distinct embedding for each sense.
Motivating Examples
Hypothesis 2: Beyond the lexical semantics of a particular event trigger, its type also depends on its arguments and their roles, as well as other words contextually connected to the trigger. This motivates an event structure representation.
Approach Overview: the framework combines Hypothesis 1 (sense-aware lexical embeddings) and Hypothesis 2 (event structure representations).
Identification
Trigger Identification: AMR parsing; map verb and noun concepts to OntoNotes senses; use FrameNet to cover additional triggers such as war, theft, pickpocket.
Argument Identification: AMR parsing; most arguments can be captured by rich semantic parsing.
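A minimal sketch of this identification step, assuming the AMR parse is available as (head, role, dependent) triples; the resource lookups and role set below are placeholders, not the authors' code:

```python
# Hypothetical AMR graph for one sentence: (head_concept, role, dependent_concept) triples.
amr_edges = [
    ("fire-01", ":ARG0", "soldier"),
    ("fire-01", ":ARG1", "cannon"),
    ("fire-01", ":location", "city"),
]

# Placeholder resources; in practice these come from the OntoNotes sense
# inventory and FrameNet lexical units.
ONTONOTES_SENSES = {"fire-01"}
FRAMENET_TRIGGERS = {"war", "theft", "pickpocket"}
CORE_ROLES = {":ARG0", ":ARG1", ":ARG2", ":location", ":instrument", ":time"}


def identify_triggers(edges):
    """Keep concepts that map to an OntoNotes sense or match a FrameNet trigger."""
    concepts = {c for h, _, d in edges for c in (h, d)}
    return {c for c in concepts
            if c in ONTONOTES_SENSES or c.split("-")[0] in FRAMENET_TRIGGERS}


def identify_arguments(edges, triggers):
    """Concepts attached to a trigger by a core AMR role become argument candidates."""
    return [(h, role, d) for h, role, d in edges
            if h in triggers and role in CORE_ROLES]


triggers = identify_triggers(amr_edges)
print(identify_arguments(amr_edges, triggers))
```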
Representation
WSD-based Lexical Embedding:
Preprocessing: IMS (Zhong and Ng, 2010) for word sense disambiguation; skip-gram Word2Vec (Mikolov et al., 2013).
Top 8 most similar words:
Fire-1: firing (0.829), cannon (0.774), grenades (0.767), grenade (0.760), gun (0.757), arm (0.755), explosive (0.742), point-blank (0.740)
Fire-2: rehired (0.790), hire-1 (0.626), resign-1 (0.618), rehire (0.596), sacked (0.591), quit-1 (0.565), sack-1 (0.563), quits
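A minimal sketch of this step, assuming the corpus has already been sense-tagged by IMS so that each ambiguous token carries a sense suffix (e.g., fire-1 vs. fire-2); gensim's skip-gram Word2Vec then learns one vector per sense. The toy corpus is illustrative:

```python
from gensim.models import Word2Vec

# Toy sense-tagged corpus; in practice every ambiguous token would carry
# the sense id assigned by IMS (Zhong and Ng, 2010).
sentences = [
    ["the", "soldiers", "fire-1", "the", "cannon"],
    ["the", "company", "fire-2", "the", "manager"],
    ["troops", "fire-1", "grenades", "at", "the", "building"],
    ["the", "bank", "fire-2", "and", "rehire", "staff"],
]

# sg=1 selects the skip-gram architecture (Mikolov et al., 2013).
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1, epochs=50)

# Each sense now has its own embedding and its own nearest neighbors.
print(model.wv.most_similar("fire-1", topn=5))
print(model.wv.most_similar("fire-2", topn=5))
```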
Representation
Event Structure Representation: Recursive Neural Tensor Autoencoder
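A minimal numpy sketch of one composition step in the spirit of a neural tensor autoencoder: a trigger vector and an argument vector are composed through a tensor layer, decoded back, and the reconstruction error drives learning. Dimensions, initialization, and variable names are illustrative; this is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50  # embedding dimension (illustrative)

# Tensor-layer parameters: one 2d x 2d slice per output dimension,
# plus a standard linear term and bias.
T = rng.normal(scale=0.01, size=(d, 2 * d, 2 * d))
W = rng.normal(scale=0.01, size=(d, 2 * d))
b = np.zeros(d)

# Decoder parameters for reconstructing the two children.
W_dec = rng.normal(scale=0.01, size=(2 * d, d))
b_dec = np.zeros(2 * d)


def compose(x, y):
    """Tensor composition of a trigger/parent vector x with an argument vector y."""
    xy = np.concatenate([x, y])                       # shape (2d,)
    tensor_term = np.einsum("i,kij,j->k", xy, T, xy)  # bilinear term per output dim
    return np.tanh(tensor_term + W @ xy + b)


def reconstruction_error(x, y):
    """Autoencoder objective: decode the composed vector and compare to the inputs."""
    c = compose(x, y)
    xy_hat = np.tanh(W_dec @ c + b_dec)
    return float(np.sum((np.concatenate([x, y]) - xy_hat) ** 2)), c


trigger_vec = rng.normal(size=d)   # e.g., embedding of "fire-1"
arg_vec = rng.normal(size=d)       # e.g., embedding of "cannon"
error, event_repr = reconstruction_error(trigger_vec, arg_vec)
print(error, event_repr.shape)
```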
Joint Constraint Clustering
Jointly optimized trigger and argument clusters: spectral clustering;
Internal measures: cohesion (D_intra) and separation (D_inter).
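A minimal sketch of the clustering step using scikit-learn's SpectralClustering on event-structure vectors, with intra-cluster cohesion and inter-cluster separation computed to compare candidate numbers of clusters; the joint trigger/argument constraints of the full model are omitted, and the data is random for illustration:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # event-structure representations (illustrative)


def cohesion_separation(X, labels):
    """D_intra: mean distance to own centroid; D_inter: mean distance between centroids."""
    centroids = np.stack([X[labels == k].mean(axis=0) for k in np.unique(labels)])
    d_intra = float(np.mean([np.linalg.norm(x - centroids[l]) for x, l in zip(X, labels)]))
    pair_dists = [np.linalg.norm(a - b) for i, a in enumerate(centroids)
                  for b in centroids[i + 1:]]
    return d_intra, float(np.mean(pair_dists))


# Try several cluster counts; prefer low cohesion (tight clusters) and high separation.
for k in (5, 10, 20):
    labels = SpectralClustering(n_clusters=k, affinity="nearest_neighbors",
                                random_state=0).fit_predict(X)
    d_intra, d_inter = cohesion_separation(X, labels)
    print(k, round(d_intra, 3), round(d_inter, 3))
```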
Naming
Trigger Type Naming: label each cluster with the trigger nearest to the cluster centroid.
Argument Role Naming: borrow from existing linguistic resources by mapping AMR roles to roles in FrameNet, VerbNet, and PropBank.
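A minimal sketch of trigger type naming: each cluster is labeled with the trigger whose vector is closest to the cluster centroid. Cosine similarity and the toy cluster below are assumptions for illustration:

```python
import numpy as np


def name_cluster(trigger_words, trigger_vecs):
    """Return the trigger nearest to the cluster centroid as the cluster's type name."""
    vecs = np.stack(trigger_vecs)
    centroid = vecs.mean(axis=0)
    sims = vecs @ centroid / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(centroid))
    return trigger_words[int(np.argmax(sims))]


# Illustrative cluster of attack-like triggers with random vectors.
rng = np.random.default_rng(0)
words = ["fire-1", "shoot-1", "bomb-1"]
vectors = [rng.normal(size=50) for _ in words]
print(name_cluster(words, vectors))
```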
Experiments and Evaluation
Data Sets: ACE (Automatic Content Extraction); ERE (Entities, Relations, and Events).
Experiments and Evaluation
Schema Discovery: Event Types and Arguments
Experiments and Evaluation
Schema Discovery: coverage comparison with ACE and ERE; new event types and arguments.
e.g., arguments for the Attack event: Attacker, Target, Instrument, Time, Place.
"The Dutch government, facing strong public anti-war pressure, said it would not commit fighting forces to the war against Iraq but added it supported the military campaign to disarm Saddam."
Attack event with a new argument role: Purpose.
Experiments and Evaluation
Event Extraction for All Types: fully annotate 100 sentences, with inter-annotator agreement of 83% for triggers and 79% for arguments.
Missing Triggers and Arguments:
(1) Multi-word expressions (e.g., took office) or triggers that are not verb or noun concepts;
e.g.: As well as previously holding senior positions at Barclays Bank, BZW and Kleinwort Benson, McCarthy was formerly a top civil servant at the Department of Trade and Industry.
(2) Arguments that are not directly semantically related to the event trigger;
e.g.: Anti-corruption judge Saul Pena stated Montesinos has admitted to the abuse of authority charge.
Experiments and Evaluation
Impact of Representations: AMR vs. Dependency Parsing
AMR captures "gun" with an Instrument role, whereas dependency parsing only yields a compound-modifier relation ("gun battle").
e.g., Approximately 25 kilometers southwest of Sringar 2 militants were killed in a second gun battle.
Experiments and Evaluation
Event Extraction on ACE and ERE Types
Manually map the extracted triggers to ACE/ERE types.
Comparison with supervised methods:
DMCNN: a dynamic multi-pooling convolutional neural network based on distributed word representations (Chen et al., 2015);
Joint IE: a structured perceptron model based on symbolic semantic features (Li et al., 2013);
LSTM: a long short-term memory neural network (Hochreiter and Schmidhuber, 1997) based on distributed semantic features.
Supervised methods rely heavily on the quality and quantity of training data: ERE training documents contain 1,068 events and 2,448 arguments, while ACE training documents contain more than 4,700 events and 9,700 arguments.
Experiments and Evaluation
Event Extraction on a Biomedical Data Set: 14 biomedical articles (755 sentences) with gold-standard AMR annotations.
Randomly sample 100 sentences and manually assess the correctness of each event and argument:
83.1% precision on trigger labeling (619 events); 78.4% precision on argument labeling (1,124 arguments).
Questions and Comments? Thanks!