Template-Based Event Extraction
Kevin Reschke – Aug 15th 2013
Martin Jankowiak, Mihai Surdeanu, Dan Jurafsky, Christopher Manning

Outline
- Recap from last time: distant supervision; plane crash dataset
- Current work: fully supervised setting; MUC4 terrorism dataset
- Underlying theme: joint inference models

Goal: Knowledge Base Population
[Diagram: extract events from a news corpus, e.g. “… Delta Flight 14 crashed in Mississippi killing 40 …”, to populate a knowledge base.]

Distant Supervision
Use known events (from a training knowledge base) to automatically label training data. Example:
One year after [USAir]_Operator [Flight 11]_FlightNumber crashed in [Toronto]_CrashSite, families of the [200]_Fatalities victims attended a memorial service in [Vancouver]_NIL.
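
The labeling scheme above can be sketched in a few lines. This is a toy illustration, not the paper's actual pipeline: real systems match spans (not single tokens) and handle ambiguity, but the core idea is the same — any mention matching a known slot value in the knowledge-base entry gets that slot label, everything else gets NIL. The `kb` entry below is hypothetical data.

```python
def distant_label(tokens, kb_entry):
    """Label each token with the slot whose known value it matches, else NIL."""
    labels = []
    for tok in tokens:
        slot = "NIL"
        for slot_name, value in kb_entry.items():
            if tok == value:
                slot = slot_name
                break
        labels.append(slot)
    return labels

# Hypothetical knowledge-base entry for one crash event
kb = {"Operator": "USAir", "FlightNumber": "11",
      "CrashSite": "Toronto", "Fatalities": "200"}
sent = "One year after USAir Flight 11 crashed in Toronto killing 200".split()
print(distant_label(sent, kb))
```

Note the failure mode the slide hints at with `[Vancouver]_NIL`: naive matching also mislabels coincidental matches (e.g. an unrelated "200"), which is why distantly supervised labels are noisy.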

Plane Crash Dataset
- 80 plane crashes from Wikipedia infoboxes. Training set: 32; Dev set: 8; Test set: 40.
- Corpus: newswire data from 1989–present.

Extraction Models
- Local Model: train and classify each mention independently.
- Pipeline Model: classify sequentially; use the previous label as a feature. Captures dependencies between labels, e.g., Passengers and Crew go together: “4 crew and 200 passengers were on board.”
- Joint Model: Searn algorithm (Daumé III et al., 2009); jointly models all mentions in a sentence.
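
The pipeline idea — feed the previous prediction in as a feature for the next mention — can be sketched as follows. The classifier here is a hypothetical hand-written stand-in for a trained model, just to show the control flow and the Passengers/Crew dependency:

```python
def pipeline_classify(mentions, classify):
    """Classify mentions left to right; the previous predicted label
    is passed to the classifier as an extra feature."""
    prev = "NONE"
    out = []
    for m in mentions:
        label = classify(m, prev)
        out.append(label)
        prev = label
    return out

def toy_classify(mention, prev_label):
    # Hypothetical rules standing in for a trained classifier.
    # Encodes "Passengers tends to follow Crew".
    if mention.endswith("crew"):
        return "Crew"
    if mention.endswith("passengers") and prev_label == "Crew":
        return "Passengers"
    return "NIL"

print(pipeline_classify(["4 crew", "200 passengers"], toy_classify))
```

A local model would score "200 passengers" in isolation; the pipeline lets the earlier "Crew" decision tip the balance, at the cost of error propagation — one motivation for the joint model.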

Results

Fully Supervised Setting: MUC4 Terrorism Dataset
- 4th Message Understanding Conference (1992). Terrorist activities in Latin America.
- 1700 docs (train / dev / test = 1300 / 200 / 200).
- 50/50 mix of relevant and irrelevant docs.

MUC4 Task
5 slot types:
- Perpetrator Individual (PerpInd)
- Perpetrator Organization (PerpOrg)
- Physical Target (Target)
- Victim (Victim)
- Weapon (Weapon)
Task: Identify all slot fills in each document. Don’t worry about differentiating multiple events.

MUC4 Example
THE ARCE BATTALION COMMAND HAS REPORTED THAT ABOUT 50 PEASANTS OF VARIOUS AGES HAVE BEEN KIDNAPPED BY TERRORISTS OF THE FARABUNDO MARTI NATIONAL LIBERATION FRONT [FMLN] IN SAN MIGUEL DEPARTMENT.
(Gold labels: Victim = “50 PEASANTS”; PerpInd = “TERRORISTS”; PerpOrg = “FMLN”)

MUC4 Example (mention-level view)
Same sentence, with every candidate mention labeled: Victim (“50 PEASANTS”), PerpInd (“TERRORISTS”), PerpOrg (“FMLN”), and NIL for the remaining mentions.

Baseline Results
- Local Mention Model: multiclass logistic regression.
- Pipeline Mention Model: previous non-NIL label (or “none”) is a feature for the current mention.

Observation 1: Local context is insufficient; need a sentence-level relevance measure (Patwardhan & Riloff, 2009).
✓ Two bridges were destroyed... in Baghdad last night in a resurgence of bomb attacks in the capital city.
✗ ... and $50 million in damage was caused by a hurricane that hit Miami on Friday.
✗ ... to make way for modern, safer bridges that will be constructed early next year.

Baseline Models + Sentence Relevance
- Binary relevance classifier: unigram / bigram features.
- HardSent: discard all mentions in irrelevant sentences.
- SoftSent: sentence relevance is a feature for mention classification.
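
The two ways of using the relevance score can be sketched as below. The cue-feature "classifier" is a hypothetical stand-in for the trained binary model over unigram/bigram features:

```python
def featurize(tokens):
    # Unigram and bigram features for the relevance classifier
    return set(tokens) | set(zip(tokens, tokens[1:]))

# Hypothetical cue features standing in for learned classifier weights
CUES = {"bomb", "kidnapped", ("bomb", "attacks")}

def relevance(tokens):
    return 1.0 if featurize(tokens) & CUES else 0.0

def hard_sent(mentions, tokens):
    # HardSent: discard all mentions from irrelevant sentences
    return mentions if relevance(tokens) > 0.5 else []

def soft_sent(mentions, tokens):
    # SoftSent: attach the relevance score as a feature on each mention
    return [(m, {"sent_relevance": relevance(tokens)}) for m in mentions]

s1 = "two bridges were destroyed in bomb attacks".split()
s2 = "a hurricane hit miami on friday".split()
print(hard_sent(["bridges"], s1), hard_sent(["miami"], s2))
```

HardSent makes an irreversible decision per sentence; SoftSent keeps every mention alive and lets the downstream classifier weigh the relevance signal, which is more forgiving when the relevance classifier errs.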

Observation 2: Sentence relevance depends on surrounding context (Huang & Riloff, 2012).
“Obama was attacked.” (political attack vs. terrorist attack?)
“He used a gun.” (weapon in a terrorist event?)

Joint Inference Models
Idea: Model sentence relevance and mention labels jointly, yielding globally optimal decisions.
Machinery: Conditional Random Fields (CRFs). Model the joint probability of relevance labels and mention labels conditioned on input features; encode dependencies among labels.
Software: Factorie. Flexibly design CRF graph structures; learning and classification algorithms with exact and approximate inference.

First Pass
Fully joint model. [Figure: one sentence-relevance node S connected to mention nodes M M M.] Approximate inference is a likely culprit.

Second Pass
Two linear-chain CRFs with a relevance threshold. [Figure: a chain of sentence-relevance nodes S S S and a chain of mention nodes M M M.]
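
Decoding a linear-chain model like the ones above comes down to Viterbi over per-position label scores plus pairwise transition factors. A toy sketch (not Factorie; hand-set scores, purely to show how a pairwise factor encodes a label dependency such as "Passengers follows Crew"):

```python
def viterbi(obs, trans, labels):
    """obs: list of {label: score} per position;
    trans: {(prev_label, cur_label): score} pairwise factors.
    Returns the highest-scoring label sequence."""
    best = [dict(obs[0])]
    back = []
    for scores in obs[1:]:
        cur, ptr = {}, {}
        for lab in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans.get((p, lab), 0.0))
            cur[lab] = best[-1][prev] + trans.get((prev, lab), 0.0) + scores[lab]
            ptr[lab] = prev
        best.append(cur)
        back.append(ptr)
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

labels = ("Crew", "Passengers", "NIL")
obs = [{"Crew": 1.0, "Passengers": 0.2, "NIL": 0.5},
       {"Crew": 0.1, "Passengers": 0.4, "NIL": 0.5}]
trans = {("Crew", "Passengers"): 1.0}  # hypothetical pairwise preference
print(viterbi(obs, trans, labels))
```

Locally, position 2 prefers NIL (0.5 > 0.4), but the Crew→Passengers transition factor flips the joint decision to Passengers — exactly the kind of dependency a local model cannot capture.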

Analysis
Many errors are reasonable extractions but come from irrelevant documents. Example: The kidnappers were accused of kidnapping several businessmen for high sums of money.
[Table of learned CRF model weights (RelLabel and RelRel factors); values not recoverable from the transcript.]

Possibilities for improvement
- Label-specific relevance thresholds.
- Leverage coreference (skip-chain CRFs).
- Incorporate a document-level relevance signal.

State of the art
Huang & Riloff (2012). P / R / F1: 0.58 / 0.60 / 0.59.
- CRF sentence model with local mention classifiers.
- Textual cohesion features to model sentence chains.
- Multiple binary mention classifiers (SVMs).

Future Work
- Apply CRF models to the plane crash dataset.
- New terrorism dataset from Wikipedia.
- Hybrid models: combine supervised MUC4 data with distant supervision on Wikipedia data.

Thanks!