Event Extraction Using Distant Supervision
Kevin Reschke, Martin Jankowiak, Mihai Surdeanu, Christopher D. Manning, Daniel Jurafsky
30 May 2014, Language Resources and Evaluation Conference, Reykjavik, Iceland

Overview
Problem: Information extraction systems require lots of training data, and human annotation is expensive and does not scale.
Distant supervision: generate training data automatically by aligning existing knowledge bases with text. Approach shown for relation extraction: Mintz et al. (2009, ACL); Surdeanu et al. (2012, EMNLP).
Goal: adapt distant supervision to event extraction.

Outline
Present a new dataset and extraction task.
Describe a distant supervision framework.
Evaluate several models within this framework.

Plane Crash Dataset
80 plane crash events from Wikipedia infoboxes (40 train / 40 test).
Newswire corpus from 1988 to present (Tipster/Gigaword).
Download: …ist-sup-event-extraction.shtml

Template-Based Event Extraction
"… Delta Flight 14 crashed in Mississippi killing 40 …"
[Diagram: aligning the News Corpus with the Knowledge Base]

Distant Supervision (Relation Extraction)
Noisy labeling rule: if a slot value and an entity name appear together in a sentence, assume that sentence encodes the relation.
Training fact: Apple founder = Steve Jobs
"Steve Jobs was fired from Apple in …" → founder
"Apple co-founder Steve Jobs passed away in …" → founder (Noise!)
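The sentence-level rule above can be sketched in a few lines; the function name and example data are illustrative, not the authors' code:

```python
# Sketch of the sentence-level distant supervision rule for relation
# extraction: any sentence containing both the entity name and the slot
# value is assumed (noisily) to express the relation.

def label_sentences(sentences, entity, value, relation):
    return [(s, relation) for s in sentences if entity in s and value in s]

sentences = [
    "Steve Jobs was fired from Apple in ...",
    "Apple co-founder Steve Jobs passed away in ...",
    "Apple opened a new store in Berlin.",
]
labeled = label_sentences(sentences, "Apple", "Steve Jobs", "founder")
# Both Jobs sentences get labeled "founder" -- including the noisy
# "was fired from" sentence, which does not actually express founding.
```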

Distant Supervision (Event Extraction)
A sentence-level labeling rule won't work:
1. Many events lack proper names: "The crash of USAir Flight 11"
2. Slot values occur separately from names: "The plane went down in central Texas." "10 died and 30 were injured in yesterday's tragic incident."
Heuristic solution: a document-level labeling rule, using the flight number as a proxy for the event name.
Training fact: { … }
"…Flight 11 crash Sunday… …The plane went down in [Toronto]_CrashSite …"
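The document-level heuristic might look like the following sketch; the data layout and function name are hypothetical:

```python
# Sketch of the document-level labeling rule: the flight number serves as
# a proxy for the event name. Any document mentioning the flight number is
# assumed to describe the event, and mentions inside it are labeled by
# lookup against the event's known slot values (NIL otherwise).

def label_mentions(docs, flight_number, slots):
    training = []
    for doc in docs:
        if flight_number not in doc["text"]:
            continue
        for mention in doc["mentions"]:
            training.append((mention, slots.get(mention, "NIL")))
    return training

docs = [{
    "text": "Flight 11 crash Sunday... The plane went down in Toronto.",
    "mentions": ["Flight 11", "Sunday", "Toronto"],
}]
slots = {"Toronto": "CrashSite"}
labeled = label_mentions(docs, "Flight 11", slots)
# [('Flight 11', 'NIL'), ('Sunday', 'NIL'), ('Toronto', 'CrashSite')]
```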

Automatic Labeling Results
38,000 training instances; 39% noise.
Good: "At least 52 people survived the crash of the Boeing 737."
Bad: "First envisioned in 1964, the Boeing 737 entered service in …"

Model 1: Simple Local Classifier
Multiclass logistic regression.
Features: unigrams, POS, NE types, part of document, dependencies.
Example: "US Airways Flight 133 crashed in Toronto"
Features for the mention "Toronto": LexIncEdge-prep_in-crash-VBD, UnLexIncEdge-prep_in-VBD, PREV_WORD-in, 2ndPREV_WORD-crash, NEType-LOCATION, Sent-NEType-ORGANIZATION, etc.
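Local feature extraction for a mention can be sketched as below; the feature names follow the slide, but the token data and function name are illustrative:

```python
# Sketch of context feature extraction for the mention at token index i.
# Feature-name strings mirror the slide; everything else is a toy.

def mention_features(tokens, ner, i):
    feats = set()
    feats.add("PREV_WORD-" + tokens[i - 1])
    feats.add("2ndPREV_WORD-" + tokens[i - 2])
    feats.add("NEType-" + ner[i])
    # Sentence-level features: NE types of other entities in the sentence.
    for j, tag in enumerate(ner):
        if j != i and tag != "O":
            feats.add("Sent-NEType-" + tag)
    return feats

tokens = ["US", "Airways", "Flight", "133", "crashed", "in", "Toronto"]
ner = ["ORGANIZATION", "ORGANIZATION", "O", "NUMBER", "O", "O", "LOCATION"]
feats = mention_features(tokens, ner, 6)
# Contains PREV_WORD-in, 2ndPREV_WORD-crashed, NEType-LOCATION,
# Sent-NEType-ORGANIZATION, Sent-NEType-NUMBER
```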

Model 2: Sequence Model with Local Inference (SMLI)
Intuition: there are dependencies between labels.
Crew and Passenger go together: "4 crew and 200 passengers were on board."
Site often follows Site: "The plane crash landed in Beijing, China."
Fatalities never follows Fatalities: *"20 died and 30 were killed in last Wednesday's crash."
Solution: a sequence model where the previous non-NIL label is a feature.
At train time: use noisy "gold" labels. At test time: use classifier output.

Motivating Joint Inference
Problem: local sequence models propagate errors.
"20 dead, 15 injured in a USAirways Boeing 747 crash."
Gold: Fat. Inj. Oper. A.Type  /  Pred: Fat. Surv. ?? ??
Gold: Fat. Fat. Oper. A.Type  /  Pred: Fat. Inj. ?? ??
An early mislabel derails the predictions that follow.

Model 3: Conditional Random Fields (CRF)
Linear-chain CRF. Algorithm: Lafferty et al. (2001). Software: Factorie, McCallum et al. (2009).
Jointly models all entity mentions in a sentence.
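The CRF machinery lives in Factorie, but the joint-decoding idea can be illustrated with plain Viterbi over a linear chain; the emission and transition scores below are invented for illustration:

```python
# Minimal Viterbi decoding for a linear-chain model, showing how joint
# inference avoids the local error propagation above. All scores are toys.

def viterbi(emissions, transitions, labels):
    scores = [dict(emissions[0])]
    back = []
    for t in range(1, len(emissions)):
        col, ptrs = {}, {}
        for lab in labels:
            prev = max(labels, key=lambda p: scores[-1][p] + transitions[(p, lab)])
            col[lab] = scores[-1][prev] + transitions[(prev, lab)] + emissions[t][lab]
            ptrs[lab] = prev
        scores.append(col)
        back.append(ptrs)
    last = max(labels, key=lambda lab: scores[-1][lab])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path))

labels = ["Fat.", "Inj."]
emissions = [{"Fat.": 2.0, "Inj.": 0.0},   # "20 dead": clearly Fatalities
             {"Fat.": 0.5, "Inj.": 0.0}]   # "15 injured": locally ambiguous
transitions = {("Fat.", "Fat."): -2.0, ("Fat.", "Inj."): 0.0,
               ("Inj.", "Fat."): 0.0, ("Inj.", "Inj."): -2.0}
best = viterbi(emissions, transitions, labels)
# The Fat.->Fat. penalty steers the second tag to Inj.: ['Fat.', 'Inj.']
```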

Model 4: Search-based Structured Prediction (Searn)
A general framework for infusing global decisions into a structured prediction task (Daumé III et al., 2009).
We use Searn to implement a sequence tagger over a sentence's entity mentions.
Searn's "chicken and egg" problem:
We want to train an optimal classifier based on a set of global costs.
We want global costs computed from the decisions made by an optimal classifier.
Solution: iterate!

A Searn iteration
Start with classifier H_i.
For each training mention:
Try all possible labels.
Based on the label choice, predict the remaining labels using H_i.
Compute a global cost for each choice.
Use the computed costs to train classifier H_i+1.
Example: "20 dead, 15 injured in a USAirways Boeing 747 crash."
Gold: Fat. Fat. Oper. A.Type
H_i has tagged the first mention Fat. For the second mention:
Choose Fat. → H_i fills in NIL NIL → cost: 2
Choose Inj. → H_i fills in Oper. A.Type → cost: 1
etc.
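The per-mention cost computation in one iteration can be sketched as follows; the rollout policy is a toy stand-in for H_i, and the data are illustrative:

```python
# Sketch of Searn's cost computation: fix a label choice for one mention,
# roll out the remaining mentions with the current policy H_i, and count
# disagreements with the (noisy) gold labels.

def searn_costs(mentions, gold, idx, labels, policy):
    costs = {}
    for choice in labels:
        predicted = list(gold[:idx]) + [choice]  # earlier mentions assumed correct
        for j in range(idx + 1, len(mentions)):
            predicted.append(policy(mentions[j], predicted[-1]))
        costs[choice] = sum(p != g for p, g in zip(predicted, gold))
    return costs

def toy_policy(mention, prev):
    """Stand-in for H_i: derails after a Fat. (or NIL) label."""
    if prev in ("Fat.", "NIL"):
        return "NIL"
    return {"USAirways": "Oper.", "Boeing 747": "A.Type"}.get(mention, "NIL")

mentions = ["20", "15", "USAirways", "Boeing 747"]
gold = ["Fat.", "Fat.", "Oper.", "A.Type"]
costs = searn_costs(mentions, gold, 1, ["Fat.", "Inj."], toy_policy)
# {'Fat.': 2, 'Inj.': 1} -- matching the costs on the slide
```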

Evaluation
Task: reconstruct the knowledge base given just flight numbers.
Metric: multiclass precision and recall.
Precision: # correct (non-NIL) guesses / total (non-NIL) guesses.
Recall: # slots correctly filled / # slots possibly filled.
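Concretely, the metric might be computed like this; keying slot fills by (event, slot) and the sample values are illustrative assumptions:

```python
# Sketch of multiclass precision/recall over slot fills. NIL guesses are
# excluded from the precision denominator, per the definitions above.

def evaluate(guesses, gold):
    non_nil = {k: v for k, v in guesses.items() if v != "NIL"}
    correct = sum(1 for k, v in non_nil.items() if gold.get(k) == v)
    precision = correct / len(non_nil) if non_nil else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {("Flight 14", "CrashSite"): "Mississippi",
        ("Flight 14", "Fatalities"): "40"}
guesses = {("Flight 14", "CrashSite"): "Mississippi",
           ("Flight 14", "Operator"): "Delta"}   # not a slot in the gold KB
p, r = evaluate(guesses, gold)
# p == 0.5 (1 of 2 non-NIL guesses correct), r == 0.5 (1 of 2 slots filled)
```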

Feature Ablation
[Ablation results chart not preserved in the transcript.]

Summary
New plane crash dataset and evaluation task.
Distant supervision framework for event extraction.
Evaluated several models in this framework.

Thanks!