The Impact of Task and Corpus on Event Extraction Systems Ralph Grishman New York University Malta, May 2010 NYU.


Event Extraction (“EE”)
EE systems extract from text all instances of a given type of event, along with the event’s participants and modifiers. There has been considerable research over the past decade on how to model such events and how to learn such models, but most advances are tested on only one or two types of events. As a result, we don’t always appreciate the degree to which particular approaches depend on the type of event and on the test corpus.

A Bit of EE History
– MUC scenario templates, 1987 – 1998
  MUC-3/4: terrorist incidents
  MUC-6: executive succession
– Event 99: move towards simpler templates
– ACE 2005: inventory of 33 elementary news events
– Bio-molecular events (BioCreative, BioNLP)

Event Models
Largely based on local syntactic context:
– In the simplest form, SVO patterns or comparable nominal patterns with semantic class constraints:
  organization attacked location
  organization’s attack on location
– Some gain from chain and tree patterns:
  organization launched an attack on location
– May be implemented as a pattern matcher or as a classifier using basically the same features
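The SVO-pattern idea above can be sketched in a few lines. This is a minimal illustration, not the NYU system: the semantic-class table stands in for a real named-entity tagger or ontology, and the triples are assumed to come from a parser.

```python
# Toy semantic classes standing in for a real NE tagger / ontology lookup.
SEM_CLASS = {
    "hezbollah": "organization",
    "the army": "organization",
    "the village": "location",
    "baghdad": "location",
}

# Each pattern: (subject class, trigger verb, object class) -> event type.
PATTERNS = {
    ("organization", "attacked", "location"): "ATTACK",
    ("organization", "raided", "location"): "ATTACK",
}

def match(triple):
    """Return an event record if a parsed (subj, verb, obj) triple matches a pattern."""
    subj, verb, obj = triple
    key = (SEM_CLASS.get(subj.lower()), verb.lower(), SEM_CLASS.get(obj.lower()))
    if key in PATTERNS:
        return {"type": PATTERNS[key], "attacker": subj, "target": obj}
    return None

print(match(("Hezbollah", "attacked", "the village")))
# -> {'type': 'ATTACK', 'attacker': 'Hezbollah', 'target': 'the village'}
print(match(("The army", "praised", "Baghdad")))  # no pattern for 'praised'
# -> None
```

A classifier-based implementation would use the same three pieces of evidence (subject class, trigger word, object class) as features instead of requiring an exact pattern match.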

Impacts we will explore this morning:
– breadth of task vs. learning strategy
– breadth of corpus vs. event model

Breadth of Task
EE fills an event template (with possible sub-templates). How wide a range of information is captured in this template?
– MUC-3/4: an attack and its effect on people and buildings; ACE: attacks and effects reported separately
– MUC-6: leaving a job and starting a new job reported together; ACE: leaving a job and starting a job reported separately

Semi-Supervised Learning Strategies
Supervised EE training is very expensive:
– lots of types of events
– lots of paraphrases of each event
– event annotation is slow (because the information is complex)
So semi-supervised methods are particularly attractive:
– start with a seed set
– grow it incrementally (‘bootstrapping’)
– stop the bootstrapping by using an annotated development sample, or by training multiple mutually exclusive events (counter-training)

Document-Centric Event Discovery
Premise: patterns which occur relatively more frequently in event-relevant documents (than in other documents) are event-relevant patterns [Riloff 1996].
Procedure [Yangarber 2000]:
1. Start with seed patterns
2. Retrieve documents containing the selected patterns
3. Extract all patterns from the retrieved documents
4. Rank patterns by relative frequency
5. Add the top-ranked patterns to the selected set
6. Repeat
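The loop above can be sketched on toy data. This is only an illustration of the document-centric idea: documents are reduced to sets of pattern ids, and the relative-frequency score here is a simplified stand-in for the original ranking functions.

```python
# Toy sketch of Yangarber-style document-centric bootstrapping.
from collections import Counter

def bootstrap(docs, seeds, rounds=2, top_k=1):
    """docs: list of sets of patterns; seeds: initial event-relevant patterns."""
    selected = set(seeds)
    for _ in range(rounds):
        # Retrieve documents containing any selected pattern.
        relevant = [d for d in docs if d & selected]
        if not relevant:
            break
        # Rank candidates by relative frequency: how much more often a pattern
        # appears in relevant documents than in the corpus as a whole.
        rel_counts = Counter(p for d in relevant for p in d)
        all_counts = Counter(p for d in docs for p in d)
        def score(p):
            return (rel_counts[p] / len(relevant)) / (all_counts[p] / len(docs))
        candidates = [p for p in rel_counts if p not in selected]
        candidates.sort(key=score, reverse=True)
        # Add the top-ranked new patterns and repeat.
        selected.update(candidates[:top_k])
    return selected

docs = [
    {"X attacked Y", "X bombed Y"},     # attack report
    {"X bombed Y", "Y was destroyed"},  # attack report
    {"X hired Y"},                      # unrelated
]
print(sorted(bootstrap(docs, {"X attacked Y"})))
# -> ['X attacked Y', 'X bombed Y', 'Y was destroyed']
```

Starting from the single seed “X attacked Y”, the loop first acquires “X bombed Y” (frequent in attack documents) and then “Y was destroyed”, while the unrelated hiring pattern is never pulled in.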

Successes and Difficulties
The document-centric strategy was successful for MUC-3 and MUC-6: it captures related events. But this strategy performs poorly for some ACE events:
– high degree of co-occurrence between selected event types: 47% of documents reporting an attack also report a death
– natural scenarios of related (co-occurring) events: starting and leaving a job; crime and arrest; etc.
– the semi-supervised learner quickly expands from the seed events (representing a single event type) to the related event types in the natural scenario

Alternatives to Document-Centric Strategies
WordNet-based strategy [Stevenson and Greenwood 2005]:
– expand the seed set by replacing words in patterns with the most similar lexical items, based on WordNet synonyms & hypernyms
– encounters problems with highly polysemous words
Combined strategy [S NYU 2010]:
– document-based information reduces the problems of polysemy
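The lexical-expansion idea can be sketched as follows. The tiny hand-made synonym table here is only a stand-in for real WordNet lookups; in practice one would query WordNet synsets and hypernyms for the trigger word.

```python
# Minimal sketch of WordNet-style pattern expansion: derive new candidate
# patterns by swapping a pattern's trigger word for its near-synonyms.
# SYNONYMS is a toy stand-in for WordNet, not a real lexical resource.
SYNONYMS = {
    "attack": {"assault", "raid"},
    "leave": {"depart", "quit"},  # polysemy: 'depart on a trip' vs. 'quit a job'
}

def expand(patterns):
    """Each pattern is a (subject slot, trigger word, object slot) tuple."""
    expanded = set(patterns)
    for subj, trigger, obj in patterns:
        for syn in SYNONYMS.get(trigger, ()):
            expanded.add((subj, syn, obj))
    return expanded

seed = {("organization", "attack", "location")}
print(sorted(expand(seed)))
# -> [('organization', 'assault', 'location'),
#     ('organization', 'attack', 'location'),
#     ('organization', 'raid', 'location')]
```

The polysemy problem is visible even in this toy table: expanding a job-change pattern through “leave” would also generate travel-sense patterns like “depart”, which is exactly where document-based evidence helps filter the candidates.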

Event extraction performance (F measure)

Breadth of Corpora
Are the documents in the test corpus primarily about the events of interest, or are they an unselected, heterogeneous corpus?
Issues: EE corpora are expensive, so EE test corpora are typically enriched to be sure they contain enough relevant events:
– MUC-3 and MUC-6 … over 50% relevant documents
– ACE newswire … an average of 3 attack events/document
This makes evaluation somewhat unrealistic.

Why Does Corpus Breadth Matter?
Event detection is a Word Sense Disambiguation (WSD) problem:
– Fred attacked Mary [physically or verbally?]
– Fred left the Pentagon [retired or went on a trip?]
Local patterns are not sufficient. This may be a minor problem in a selected corpus but a major one in a heterogeneous corpus. An attack event detector trained on the ACE corpus:
– tested on ACE newswire: recall 66%, spurious event rate 8%
– tested on the New York Times: recall 46%, spurious event rate 111%

Handling Heterogeneous Corpora
Add a topic model to do WSD for event triggers:
– a document-level bag-of-words model predicting whether the document contains an attack event
– combined with the traditional local model
– [similar to the Patwardhan & Riloff 2009 relevant-region model]
The attack event detector trained on the ACE corpus, augmented with the topic model:
– tested on ACE newswire: recall 66%, spurious event rate 7%
– tested on the New York Times: recall 33%, spurious event rate 24%
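The combination can be sketched as follows. This is a rough illustration, not the trained ACE model: the word weights and threshold are invented, and the local and document-level models are combined here by simple gating rather than the actual feature combination.

```python
# Toy word weights a trained document-level bag-of-words classifier might
# assign as evidence for / against an 'attack' document (illustrative only).
TOPIC_WEIGHTS = {"troops": 1.0, "bombing": 2.0, "casualties": 1.5,
                 "senate": -1.0, "debate": -1.5}

def doc_relevance(words):
    """Document-level evidence that the document reports an attack event."""
    return sum(TOPIC_WEIGHTS.get(w, 0.0) for w in words)

def detect(trigger_matched, doc_words, threshold=1.0):
    """Report an event only when the local pattern fires AND the topic
    model considers the document attack-relevant."""
    return trigger_matched and doc_relevance(doc_words) >= threshold

war_doc = "troops launched a bombing with heavy casualties".split()
politics_doc = "the senate debate turned into a verbal attack".split()
print(detect(True, war_doc))       # local match + relevant topic -> True
print(detect(True, politics_doc))  # local match, but topic says no -> False
```

The point of the second example is the WSD effect from the previous slide: the local trigger “attack” fires in the politics document, but the document-level model recognizes that the document is not about a physical attack, suppressing the spurious event.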

Conclusion: Implications for EE Evaluation
Continued progress in EE will require:
– appreciating the range of EE tasks, and how the choice of task affects EE strategy
– appreciating the influence of test corpora: evaluating on larger, more heterogeneous corpora, with more selective annotation

Thank you.