Discovering Relations among Named Entities from Large Corpora. Takaaki Hasegawa, Satoshi Sekine, Ralph Grishman. ACL 2004.


Discovering Relations among Named Entities from Large Corpora Takaaki Hasegawa *, Satoshi Sekine 1, Ralph Grishman 1 ACL 2004 * Cyberspace Laboratories, Nippon Telegraph and Telephone Corporation 1 Dept. of Computer Science, New York University

Introduction Internet search engines cannot answer complicated questions such as “a list of recent mergers and acquisitions of companies” or “current leaders of nations from all over the world.” Information Extraction provides methods to extract information such as events and relations between entities, but it is domain dependent. The goal is to automatically discover useful relations among arbitrary entities in large text corpora.

Introduction Define a relation broadly as an affiliation, role, location, part-whole, social relationship, and so on. For example, the following information should be extracted: “George Bush (PERSON) was inaugurated as the president of the United States (GPE).” An unsupervised method needs neither richly annotated corpora nor initial seed instances for weakly supervised learning, since we cannot know the relations in advance. It only needs an NE tagger, and recently developed NE taggers work quite well.

Prior Work Most approaches to the ACE RDC task involved supervised learning, such as kernel methods, which require large annotated corpora. Some adopted a weakly supervised learning approach, but it is unclear how to choose the initial seeds and how many are needed.

Relation Discovery Overview Assume that pairs of entities occurring in similar contexts can be clustered, and that each pair in a cluster is an instance of the same relation.
1. Tag NEs in the text corpora.
2. Get co-occurring NE pairs and their contexts.
3. Measure context similarities among NE pairs.
4. Cluster the NE pairs.
5. Label each cluster of NE pairs.
Run the NE tagger and collect all context words within a certain distance; if the context words of pair A-B and pair C-D are similar, the two pairs are placed into the same cluster (the same relation); in this example, the relation is merger and acquisition.

Relation Discovery

NE tagging Use the extended NE tagger (Sekine, 2001) to detect useful relations. Collect the intervening words between two NEs for each co-occurrence. Two NEs are considered to co-occur if they appear within the same sentence and are separated by at most N intervening words. Different orders are considered different contexts; that is, e1…e2 and e2…e1 are collected as different contexts. Passive voice: collect the base forms of words as stemmed by a POS tagger, but distinguish verb past participles from other verb forms. Less frequent pairs of NEs are eliminated by a frequency threshold.
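The co-occurrence collection step can be sketched as follows. This is a minimal illustration assuming pre-tagged input (the function and type names are hypothetical, and the real system additionally stems words and handles passive voice):

```python
from collections import defaultdict
from itertools import combinations

MAX_GAP = 5    # max intervening words; 5 in the paper's experiments
MIN_FREQ = 30  # frequency threshold on NE pairs; 30 in the paper

def collect_contexts(tagged_sentences, max_gap=MAX_GAP, min_freq=1):
    """Collect intervening words for every ordered co-occurring NE pair.

    tagged_sentences: lists of (token, ne_type) tuples, with ne_type None
    for ordinary words. Order matters, so (A, B) and (B, A) accumulate
    separate context lists, as the slide requires.
    """
    contexts = defaultdict(list)
    for sent in tagged_sentences:
        entities = [(i, tok) for i, (tok, t) in enumerate(sent) if t is not None]
        for (i, e1), (j, e2) in combinations(entities, 2):
            gap = [tok for tok, _ in sent[i + 1:j]]
            if len(gap) <= max_gap:
                contexts[(e1, e2)].append(gap)
    # drop NE pairs below the frequency threshold
    return {pair: ctxs for pair, ctxs in contexts.items() if len(ctxs) >= min_freq}
```

Since `combinations` enumerates entities left to right, a pair that appears in the reverse order in another sentence naturally lands under the reversed key.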

Relation Discovery Calculate the similarity between the sets of contexts of NE pairs, using the vector space model and cosine similarity. Only compare NE pairs of the same types, e.g., one PERSON-GPE pair with another PERSON-GPE pair. Eliminate stop words, words in parallel expressions, and expressions peculiar to particular source documents. A context vector for each NE pair consists of the bag of words formed from all intervening words from all co-occurrences of the two NEs. Different orders: if a word wi occurred L times in e1…e2 and M times in e2…e1, the tf of wi is defined as L−M. If the norm |α| is small due to a lack of context words, the similarity might be unreliable, so a threshold is defined to eliminate short context vectors.
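The signed term frequency and the cosine measure can be sketched directly from the definitions on the slide; this is a toy version with hypothetical names (the norm threshold value is an assumption, since the slide gives none):

```python
import math
from collections import Counter

MIN_NORM = 2.0  # hypothetical |alpha| threshold; the slide sets one but gives no value

def context_vector(fwd_contexts, rev_contexts, stop_words=frozenset()):
    """Bag of intervening words for one NE pair. Per the slide,
    tf(w) = L - M, where L counts occurrences in e1...e2 contexts
    and M counts occurrences in e2...e1 contexts."""
    tf = Counter()
    for ctx in fwd_contexts:
        tf.update(w for w in ctx if w not in stop_words)
    for ctx in rev_contexts:
        tf.subtract(w for w in ctx if w not in stop_words)
    return {w: c for w, c in tf.items() if c != 0}

def norm(v):
    return math.sqrt(sum(c * c for c in v.values()))

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu, nv = norm(u), norm(v)
    return dot / (nu * nv) if nu and nv else 0.0
```

A pair whose vector norm falls below `MIN_NORM` would simply be excluded from the comparison, implementing the short-vector cutoff.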

Relation Discovery We can cluster the NE pairs based on the similarity among their context vectors. We do not know the number of clusters in advance, so we adopt hierarchical clustering, using complete linkage. Label each cluster with the most frequent word in all combinations of the NE pairs in the same cluster; the frequencies are normalized.
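The clustering and labeling steps might look like the sketch below: a toy complete-linkage agglomerative loop (cubic time, fine for illustration) plus a labeler that skips the frequency normalization the slide mentions. All names are hypothetical:

```python
from collections import Counter

def complete_linkage_clusters(items, sim, threshold):
    """Agglomerative clustering with complete linkage: repeatedly merge
    the two clusters whose *minimum* pairwise similarity is largest,
    stopping once no merge stays above `threshold` (so the number of
    clusters need not be known in advance)."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best, best_sim = None, threshold
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = min(sim(a, b) for a in clusters[i] for b in clusters[j])
                if s > best_sim:
                    best, best_sim = (i, j), s
        if best is None:
            break
        i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters

def label_cluster(cluster, contexts):
    """Most frequent intervening word across the cluster's NE pairs
    (the slide also normalizes the frequencies; omitted here)."""
    counts = Counter(w for pair in cluster for ctx in contexts[pair] for w in ctx)
    return counts.most_common(1)[0][0] if counts else None
```

Complete linkage is the conservative choice here: a merge happens only if *every* pair of members is similar enough, which matches the later observation that each pair in a cluster shares at least one common word.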

Experiments We experimented with one year of The New York Times (1995) as our corpus. Maximum context word length: 5 words. Frequency threshold: 30. We used the patterns “,.*,”, “and”, and “or” for parallel expressions, and “) --” as peculiar to The New York Times. Stop words include symbols and highly frequent words.

Experiments We analyzed the data set manually and identified the relations for two domains. PERSON-GPE: 177 distinct pairs, 38 classes (relations). COMPANY-COMPANY: 65 distinct pairs, 10 classes.

Evaluation Errors in NE tagging were eliminated so that clustering could be evaluated correctly. For each cluster, the relation R (the major relation) of the cluster is determined as the most frequently represented relation. NE pairs with relation R in a cluster whose major relation was R were counted as correct. N_correct is the total number of correct pairs in all clusters; N_incorrect is the total number of incorrect pairs in all clusters; N_key is the total number of pairs manually classified in clusters.
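The slide gives only the counts, not the formulas. Under the usual reading (precision over clustered pairs, recall against the manually classified key), the scores follow as below; the function name is mine and the formula mapping is an assumption:

```python
def precision_recall_f(n_correct, n_incorrect, n_key):
    """Precision, recall, and F-measure from the slide's counts,
    assuming the standard definitions: precision over all clustered
    pairs, recall against the manual key, F as their harmonic mean."""
    p = n_correct / (n_correct + n_incorrect)
    r = n_correct / n_key
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```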

Evaluation These values vary depending on the threshold of cosine similarity. The best F-measure was 82 in the PER-GPE domain and 77 in the COM-COM domain, both found near a cosine similarity threshold of 0. (Curves of precision (P), recall (R), and F-measure (F) against the threshold were shown for the two domains.)

Evaluation We also investigated each cluster with the threshold set just above 0, yielding several PER-GPE clusters and 15 COM-COM clusters, with F-measures of 80 and 75, very close to the best. The larger clusters for each domain, and the ratio of the number of pairs bearing the major relation to the total number of pairs, are shown.

Evaluation If two NE pairs in a cluster share a particular context word, they are considered to be linked (with respect to that word). The relative frequency of a word is the number of such links, relative to the maximal possible number of links (N(N−1)/2 for a cluster of N pairs). If the relative frequency is 1.0, the word is shared by all NE pairs. The frequent common words could be regarded as suitable labels for the relations.
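The link-based relative frequency can be computed directly from this definition. A small sketch with hypothetical names, reusing the per-pair context lists from the collection step:

```python
from itertools import combinations

def relative_frequency(word, cluster, contexts):
    """links / (N*(N-1)/2): the fraction of pairs-of-NE-pairs in the
    cluster that are linked by `word`, i.e., both have it somewhere in
    their contexts. 1.0 means every NE pair in the cluster shares it."""
    has_word = [any(word in ctx for ctx in contexts[p]) for p in cluster]
    n = len(cluster)
    links = sum(1 for a, b in combinations(has_word, 2) if a and b)
    return links / (n * (n - 1) / 2)
```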

Discussion The performance was a little higher in the PER-GPE domain, perhaps because there were more NE pairs with high similarity. The COM-COM domain was more difficult to judge due to the similarity of relations: a pair of companies in an M&A relation might also subsequently appear in a parent relation. Asymmetric properties caused further difficulties in the COM-COM domain: in determining the similarity of A→B with C→D versus A→B with D→C, sometimes the wrong correspondence ends up being favored.

Discussion The main reason for undetected or mis-clustered NE pairs is the absence of common words in the pairs’ contexts which explicitly represent the particular relations. Mis-clustered NE pairs were clustered by accidental words. The outer context words may be helpful, but extending the context in this way has to be carefully evaluated.

Discussion We also tried single linkage and average linkage; the best F-measure was obtained with complete linkage, and the best threshold differs between single and average linkage. The best threshold being just above 0 means that each pair in a cluster shares at least one word in common. Sometimes the less frequent pairs might be valuable, and one way to address this defect would be through bootstrapping.

Conclusion The key idea is to cluster pairs of NEs according to the similarity of the context words intervening between them. Experiments show that not only could the relations be detected with high recall and precision, but suitable labels could also be provided automatically. We are planning to discover less frequent pairs of NEs by combining the method with bootstrapping.