(Entity and) Event Extraction CSCI-GA.2591

(Entity and) Event Extraction CSCI-GA.2591
Ralph Grishman, NYU

Entity Tagger

Comparisons to JetLite are difficult because stages are divided differently in many systems:
- some systems combine entity tagging with coreference
- some combine it with NE tagging
The closest match is mention detection and classification: F = 82.5 with either an NN or an MEMM [Nguyen 2017].

Motivations for the JetLite pipeline

Part of the current pipeline:
- Tokenizer: adds Token annotations
- NEtagger: adds Enamex annotations, shadowing some Token annotations
- parse: adds dependents features
- Coref: adds Entity annotations
- EntityTagger: adds semType features to Entity annotations

Why this order?
- Do we want the NE tagger before the parse?
- Do we want the parse before coref?
- Do we want coref before entity tagging?

ACE Events

An ACE event mention reports an event and possibly its time, place, participants, and status:
- "Two soldiers were wounded" → injure event (also, "the wounded soldiers")
- "Jane Doe and John Smith were married on May 9" → marry event

ACE 2005 had 6 types of events and 33 subtypes; most papers report results on subtypes.

Information about the event may appear anywhere in a sentence:
- variable number of arguments
- at longer distance than relations (makes annotation harder)

ACE Event Mentions

Event mentions have a complex structure:
- a trigger word (also called an 'anchor'), typically a verb or nominalization
- a set of arguments (with roles):
  - participants in the event (ACE entity mentions)
  - time of the event (TIMEX)
  - place of the event (GPE or LOCATION)
- a set of features:
  - time (past / present / future)
  - specific / generic
  - ...

ACE Events

ACE requires coreference resolution for entities, relations, and events; we are doing it for entities.

It is straightforward for relations (see the sketch below): two relation mentions corefer if
- they have the same relation type
- their arg1's corefer
- their arg2's corefer

It is much less straightforward for events:
- different mentions of an event may include different subsets of the arguments
- arguments may overlap without being exact matches
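To make the rule for relations concrete, here is a minimal sketch; the RelationMention type and the coreference-chain lookup are illustrative assumptions, not JetLite APIs:

```python
# Minimal sketch of the relation-coreference rule above. RelationMention
# and the chain lookup are hypothetical stand-ins, not JetLite classes.
from dataclasses import dataclass

@dataclass
class RelationMention:
    rel_type: str   # e.g. "EMP-ORG"
    arg1: str       # mention id of the first argument
    arg2: str       # mention id of the second argument

def corefer(m1: str, m2: str, chains: dict) -> bool:
    """True if two entity mentions belong to the same coreference chain."""
    return chains.get(m1) is not None and chains.get(m1) == chains.get(m2)

def relations_corefer(r1: RelationMention, r2: RelationMention, chains: dict) -> bool:
    # Same relation type, and both argument pairs corefer.
    return (r1.rel_type == r2.rel_type
            and corefer(r1.arg1, r2.arg1, chains)
            and corefer(r1.arg2, r2.arg2, chains))
```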

Benchmark

- ACE 2005 corpus
- generally assuming perfect entity mentions on input
- standard set of 40 test documents (out of 600)
- dual annotation and adjudication

Evaluation

- The official ACE 'value' metric is very complicated: it combined everything into one value.
- For R&D, papers report separate trigger and argument metrics.

Using MaxEnt + kNN

Early description of an ACE event extractor [Ahn 2006]:
- did classification in four separate stages: trigger, arguments, features, coreference
- also tried a memory-based learner (kNN)

Trigger classifier:
- applied to most words in the corpus, so very unbalanced training data
- rich feature set, including info on entities in the same sentence
- F = 50 with MaxEnt, F = 60 with kNN

Using MaxEnt + kNN

Argument classifier:
- classify every pair of <event, entity> and <event, TIMEX> in the same sentence
- best result with a separate classifier for each event type
- MaxEnt F = 57, kNN F = 52

Independent classification of triggers and arguments is a simplification:
- the presence of arguments affects the choice of event type:
  (1) "An American tank fired on the Palestine Hotel." → attack
  (2) "He has fired his air defense chief." → endPosition
- the presence of one argument may affect the assignment of other roles (usually only one attacker)

How do we capture this info?

Global Features

There are also more global features for events:
- other events in the same document:
  - having one attack event in a document increases the chance of other attacks
  - having an attack event in a document increases the chance of injure and die events
- other documents:
  - find similar documents (news stories) using bag-of-words
  - events in the retrieved documents increase the likelihood of events in the initial document

How do we capture this? Let's look at specific solutions:
- Ahn: rich features
- Ji, Liao: rules
- Li: structured perceptron
- Nguyen: complex NN

Ahn's response to the problem was to include, as trigger features, information about the potential arguments (entities).

Rule-Based

Ji and Grishman [ACL 08] and Liao and Grishman used rules that distinguished low-confidence from high-confidence extractions.

Structured Prediction

A more principled solution treats this as a structured prediction problem.
- Until now we have decomposed the language analysis task into independent subtasks:
  - each with its own loss function
  - each making a 1-of-N prediction
- Now we would like to predict a larger structure:
  - an event with trigger and arguments
  - a sentence with multiple events

Why is this a problem?

Decoding

When we run a MaxEnt classifier, it computes the probability of each outcome y' and returns the most probable one. For a structured prediction task there are far too many outcomes |Y| to enumerate.

Approximate Decoding

We must approximate the decoding by iterating over likely outcomes. For event extraction:
- loop over the tokens in the sentence, generating all possible event labels for the current token; keep the K best [beam search]
- for each event, loop over the entities in the sentence, assigning the entity to the event with each possible role
- a beam size of 4 was found to be sufficient

(a sketch of this beam search follows)
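Here is a minimal beam-search sketch of the decoding just described; the label sets, data types, and score() function are illustrative assumptions, not the actual Li et al. implementation:

```python
# Minimal beam-search sketch of the approximate decoding above. The label
# sets and score() are toy stand-ins for illustration.

TRIGGER_LABELS = ["None", "Attack", "Injure", "Die"]    # toy subset of the 33 subtypes
ROLE_LABELS = ["None", "Attacker", "Target", "Victim"]  # toy subset of argument roles

def score(hyp):
    """Hypothetical model score of a partial assignment, e.g. w . f(x, z)."""
    return 0.0  # stand-in; a real model scores features of the hypothesis

def prune(candidates, K):
    """Keep the K highest-scoring hypotheses (the beam)."""
    return sorted(candidates, key=score, reverse=True)[:K]

def beam_decode(tokens, entities, K=4):
    # Each hypothesis is a list of (trigger_label, {entity: role}), one per token.
    beam = [[]]
    for tok in tokens:
        # Extend each hypothesis with every possible trigger label for this token.
        beam = prune([hyp + [(trig, {})] for hyp in beam for trig in TRIGGER_LABELS], K)
        # For each entity in turn, try every role, pruning back to K each time.
        for ent in entities:
            candidates = []
            for hyp in beam:
                trig, args = hyp[-1]
                if trig == "None":          # not an event: no arguments to assign
                    candidates.append(hyp)
                    continue
                for role in ROLE_LABELS:
                    new_args = dict(args)
                    if role != "None":
                        new_args[ent] = role
                    candidates.append(hyp[:-1] + [(trig, new_args)])
            beam = prune(candidates, K)
    return beam[0] if beam else []
```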

Global Features

This approach supports arbitrary features within a sentence, but does not support wider scope, such as an event elsewhere in the document.

Perceptron

Basic linear model; the building block of neural networks. Trained using the perceptron algorithm:
- for each training example <x_j, d_j>:
  - compute the output y_j(t) = f(w(t), x_j)
  - update the weights: w_i(t+1) = w_i(t) + (d_j − y_j(t)) x_j,i

(a runnable sketch follows)
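A minimal sketch of this update rule with a step activation; NumPy and the toy OR data are assumptions for illustration:

```python
# Minimal perceptron sketch matching the update above; toy data assumed.
import numpy as np

def step(z):
    return 1.0 if z >= 0 else 0.0

def train_perceptron(X, d, epochs=10, lr=1.0):
    """X: (n_examples, n_features); d: desired 0/1 outputs."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_j, d_j in zip(X, d):
            y_j = step(w @ x_j)              # y_j(t) = f(w(t), x_j)
            w += lr * (d_j - y_j) * x_j      # w(t+1) = w(t) + (d_j - y_j(t)) x_j
    return w

# Toy usage: learn logical OR (with a bias feature in the last column).
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
d = np.array([0, 1, 1, 1], dtype=float)
w = train_perceptron(X, d)
print([step(w @ x) for x in X])   # -> [0.0, 1.0, 1.0, 1.0]
```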

Structured Perceptron

We face a decoding problem in training a structured perceptron; again use argmax:

  z_i = argmax_{z ∈ Y} F(x_i, z) · α
  if z_i ≠ y_i:  α ← α + F(x_i, y_i) − F(x_i, z_i)

Used by Collins & Roark to enhance parser output. (a sketch follows)
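The following is a minimal structured-perceptron sketch matching that update, applied to toy sequence labeling; the feature templates, data, and exhaustive decoder are assumptions (real systems use beam search, as above):

```python
# Minimal structured-perceptron sketch: z = argmax F(x,z).alpha, then
# alpha += F(x,y) - F(x,z) on errors. Features and data are toy stand-ins.
from collections import defaultdict
from itertools import product

def F(x, z):
    """Sparse joint feature vector F(x, z): emission + transition counts."""
    feats = defaultdict(float)
    prev = "<s>"
    for tok, lab in zip(x, z):
        feats[("emit", tok, lab)] += 1
        feats[("trans", prev, lab)] += 1
        prev = lab
    return feats

def dot(feats, alpha):
    return sum(v * alpha[k] for k, v in feats.items())

def decode(x, labels, alpha):
    """z = argmax_{z in Y} F(x, z) . alpha (exhaustive over Y here)."""
    return max(product(labels, repeat=len(x)), key=lambda z: dot(F(x, z), alpha))

def train(data, labels, epochs=5):
    alpha = defaultdict(float)
    for _ in range(epochs):
        for x_i, y_i in data:
            z_i = decode(x_i, labels, alpha)
            if z_i != y_i:  # alpha += F(x_i, y_i) - F(x_i, z_i)
                for k, v in F(x_i, y_i).items():
                    alpha[k] += v
                for k, v in F(x_i, z_i).items():
                    alpha[k] -= v
    return alpha

# Toy usage: tag triggers in two-word "sentences".
data = [(("troops", "attacked"), ("O", "Attack")),
        (("bomb", "exploded"), ("O", "Attack")),
        (("talks", "resumed"), ("O", "O"))]
alpha = train(data, labels=("O", "Attack"))
print(decode(("rebels", "attacked"), ("O", "Attack"), alpha))  # -> ('O', 'Attack')
```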

Neural Net for EE

Basic approach [Nguyen et al., NAACL 2016]: a recurrent NN
- loop with memory (LSTM or GRU), one iteration for each token in the sentence
- bidirectional: two RNNs, one operating L to R, the other R to L
- tried three memories:
  - G_i^trg = trigger subtypes recognized in words 1,…,i
  - G_i^arg = argument roles recognized in words 1,…,i
  - G_i^arg/trg = entities recognized as arguments of an event of a given subtype in words 1,…,i
- only G_i^arg/trg improved performance

(a bidirectional-RNN sketch follows)
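A minimal bidirectional-GRU trigger tagger in the spirit of this model; PyTorch, the dimensions, and the label count are assumptions, and the memory vectors G_i of the actual JRNN model are omitted:

```python
# Minimal bidirectional-GRU trigger tagger; dimensions and label set are
# illustrative assumptions, and the JRNN memory vectors G_i are omitted.
import torch
import torch.nn as nn

class BiRNNTrigger(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, hidden=150, n_labels=34):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Two GRUs in one module: one runs L-to-R, the other R-to-L.
        self.rnn = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        # One trigger label per token (33 ACE subtypes + "not a trigger").
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.rnn(self.emb(token_ids))       # (batch, seq_len, 2*hidden)
        return self.out(h)                         # per-token label logits

# Toy usage with random token ids.
model = BiRNNTrigger()
tokens = torch.randint(0, 5000, (2, 12))           # 2 sentences, 12 tokens each
logits = model(tokens)
print(logits.shape)                                # torch.Size([2, 12, 34])
```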

Results

                        trigger F   argument F
Li's baseline              65.9        43.9
structured perceptron      67.5        52.7
JRNN                       69.3        55.4