Discovering Emerging Entities with Ambiguous Names

Discovering Emerging Entities with Ambiguous Names 2015.08.17

Content
- Motivation
- Approach
- Experiments
- Architecture
- Disambiguation Confidence
- Extended keyphrase model
- Experiments

Motivation
Emerging entities (EE)
- Our world is highly dynamic. Every day, new songs are composed, new movies are released, new companies are founded, and there are new weddings, sports matches, and so on.
- New entities may appear under the same names as existing ones: when hurricane "Sandy" occurred, several singers, cities, and other entities with the name "Sandy" already existed in Wikipedia.
- There can be multiple EEs, all out-of-KB, with the same name.
NERD: named entity recognition and disambiguation

Motivation
Prior methods
- Apply a threshold to the scores they compute for mapping a given mention to its candidate entities
- In difficult situations, the empirical quality is not good [1]
- The threshold is hard to tune in a robust manner
- Can have an adverse effect on other entity linking decisions

Approach
- Assessing the confidence of the NED method's mapping of mentions to in-KB entities
  - Perturbing the mention-entity space of the NED method
- Enriching a possible EE with a keyphrase representation
  - Build a global set of keyphrases
  - Compute a model difference between the global model and the union of all in-KB models

Architecture
- NED based on AIDA [2] and KORE [3]

Disambiguation Confidence
- Normalizing Scores
- Perturbing Mentions
- Perturbing Entities
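The slide only names the three confidence techniques, so the following is a minimal sketch under stated assumptions rather than the authors' procedure. It assumes "normalizing scores" means dividing a candidate's score by the sum over all candidates of the same mention, and it approximates "perturbing mentions" by re-running the NED on random subsets of the mentions and counting how often each mention keeps its original entity. The function names, `keep_prob`, `rounds`, and `toy_ned` are all hypothetical.

```python
import random
from collections import Counter


def normalized_confidence(scores):
    """Turn one mention's raw candidate scores into shares that sum to 1."""
    total = sum(scores.values())
    if total == 0:
        return {entity: 0.0 for entity in scores}
    return {entity: score / total for entity, score in scores.items()}


def perturbation_confidence(mentions, disambiguate, rounds=200, keep_prob=0.5, seed=0):
    """Fraction of perturbed runs in which a mention keeps its baseline entity."""
    rng = random.Random(seed)
    baseline = disambiguate(mentions)
    kept_runs, stable_runs = Counter(), Counter()
    for _ in range(rounds):
        # Perturb the input by keeping only a random subset of the mentions.
        subset = [m for m in mentions if rng.random() < keep_prob]
        if not subset:
            continue
        result = disambiguate(subset)
        for m in subset:
            kept_runs[m] += 1
            if result[m] == baseline[m]:
                stable_runs[m] += 1
    return {m: stable_runs[m] / kept_runs[m] if kept_runs[m] else 0.0 for m in mentions}


def toy_ned(mentions):
    """Hypothetical stand-in for a NED system: 'Sandy' depends on its context."""
    out = {}
    for m in mentions:
        if m == "Sandy":
            out[m] = "Hurricane_Sandy" if "hurricane" in mentions else "Sandy_(singer)"
        else:
            out[m] = m.replace(" ", "_")
    return out


conf = perturbation_confidence(["Sandy", "hurricane", "New York"], toy_ned)
```

In this toy setup "New York" is stable under every perturbation, while "Sandy" flips whenever "hurricane" is dropped from the context, so its confidence is lower. A mention whose mapping changes under small perturbations is a candidate for being an emerging entity.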

Perturbing Entities

Extended keyphrase model
Keyphrases for existing entities
- In-KB entities: Wikipedia categories, href anchor texts
- Harvesting keyphrases from document collections
  - Only for the entities to which the given NED method maps high-confidence mentions
  - Extract all sequences conforming to a set of predefined part-of-speech tag patterns
  - Mainly proper nouns and technical terms
Modeling emerging entities
- Exploiting news streams: mine EE-specific keyphrases from chunks of news articles that are in the vicinity of the publication date and time of the input document
- Model difference
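The two steps above can be sketched roughly as follows. The slide gives neither the actual part-of-speech patterns nor the exact form of the model difference, so this sketch assumes a single illustrative pattern (maximal runs of proper-noun tags on pre-tagged input) and a plain count subtraction between the global keyphrase model and the union of the in-KB models. `extract_keyphrases` and `model_difference` are hypothetical names.

```python
from collections import Counter

# Assumed pattern: maximal runs of Penn Treebank proper-noun tags.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def extract_keyphrases(tagged_tokens):
    """Collect maximal proper-noun runs from [(token, pos_tag), ...]."""
    phrases, current = [], []
    for token, tag in tagged_tokens:
        if tag in PROPER_NOUN_TAGS:
            current.append(token)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def model_difference(global_counts, in_kb_models):
    """Keep keyphrase counts not explained by the union of the in-KB models."""
    union = Counter()
    for model in in_kb_models:
        union.update(model)
    # Counter subtraction drops keys whose remaining count is zero or negative.
    return Counter(global_counts) - union


tagged = [("Hurricane", "NNP"), ("Sandy", "NNP"), ("hit", "VBD"),
          ("New", "NNP"), ("Jersey", "NNP"), (".", ".")]
phrases = extract_keyphrases(tagged)

global_model = Counter({"Hurricane Sandy": 5, "New Jersey": 3, "pop album": 2})
in_kb = [Counter({"pop album": 2}), Counter({"New Jersey": 3})]
ee_model = model_difference(global_model, in_kb)
```

What remains in `ee_model` are the keyphrases that co-occur with the name in recent news but are not accounted for by any existing in-KB entity with that name.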

Experiments: Disambiguation Confidence
- precision@k%conf: discard the mentions with confidence below k% and compute the fraction of correctly disambiguated mentions, with respect to the ground truth, among the remaining mentions
- #men@k%conf: number of mentions with confidence above k%
- mean average precision (MAP)
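As a small illustration of the first two measures, the sketch below computes both from a list of (confidence, predicted entity, gold entity) triples. It assumes confidences lie in [0, 1] and that a mention exactly at the threshold is kept; the function name and data layout are hypothetical.

```python
def precision_at_conf(predictions, k):
    """Return (precision@k%conf, #men@k%conf) for a threshold of k percent.

    predictions: list of (confidence, predicted_entity, gold_entity) triples.
    """
    threshold = k / 100.0
    kept = [(pred, gold) for confidence, pred, gold in predictions
            if confidence >= threshold]
    if not kept:
        return None, 0  # precision is undefined when no mention survives
    correct = sum(1 for pred, gold in kept if pred == gold)
    return correct / len(kept), len(kept)


preds = [
    (0.99, "A", "A"),
    (0.96, "B", "B"),
    (0.90, "C", "D"),  # wrong mapping with fairly high confidence
    (0.50, "E", "E"),
]
p95, n95 = precision_at_conf(preds, 95)
p80, n80 = precision_at_conf(preds, 80)
```

Raising k trades coverage (#men@k%conf) for precision, which is why the two numbers are reported together.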

Experiments: Emerging Entity Discovery
- 150 Associated Press news articles published on October 1st and 150 published on November 1st, 2010, annotated with EE
- Wikipedia 2010-08-17

Experiments: Emerging Entity Discovery
- D: the collection of documents
- G_d: all unique mentions in document d ∈ D annotated by a human annotator with a gold-standard entity
- G_d|EE: the subset of G_d annotated with an emerging entity EE
- G_d|KB: the subset of G_d annotated with an existing, in-KB entity
- A_d: all unique mentions in document d ∈ D automatically annotated by a method
- IW: Illinois Wikifier linker
- 3,436 mentions, out of which 162 are both ambiguous and refer to an emerging entity
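The slide defines the sets but not the measures built from them. Assuming the usual set-based precision and recall over EE-labeled mentions, computed per document and then averaged over D, a sketch could look like this. Here `auto_ee` stands for the subset of A_d that a method labeled as EE, which the slide does not name explicitly; all identifiers are hypothetical.

```python
def ee_precision_recall(gold_ee, auto_ee):
    """Precision and recall of EE labels for one document.

    gold_ee: set of mentions annotated as EE by the human annotator (G_d|EE).
    auto_ee: set of mentions labeled EE by the method (assumed subset of A_d).
    """
    overlap = len(gold_ee & auto_ee)
    precision = overlap / len(auto_ee) if auto_ee else 0.0
    recall = overlap / len(gold_ee) if gold_ee else 0.0
    return precision, recall


def macro_average(documents):
    """Average per-document (precision, recall) over a non-empty collection."""
    pairs = [ee_precision_recall(gold, auto) for gold, auto in documents]
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)


docs = [
    ({"Sandy", "Prism"}, {"Sandy"}),  # precision 1.0, recall 0.5
    ({"X"}, {"X", "Y"}),              # precision 0.5, recall 1.0
]
avg_precision, avg_recall = macro_average(docs)
```

Treating empty sets as 0.0 is one convention among several; skipping such documents when averaging is another reasonable choice.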

References
[1] B. Hachey, W. Radford, J. Nothman, M. Honnibal, and J. R. Curran. Evaluating Entity Linking with Wikipedia. Artificial Intelligence, 194:130-150, 2013.
[2] J. Hoffart, M. A. Yosef, I. Bordino, H. Fürstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, and G. Weikum. Robust Disambiguation of Named Entities in Text. EMNLP, pages 782-792, 2011.
[3] J. Hoffart, S. Seufert, D. B. Nguyen, M. Theobald, and G. Weikum. KORE: Keyphrase Overlap Relatedness for Entity Disambiguation. CIKM, pages 545-554, 2012.
[4] J. Hoffart, Y. Altun, and G. Weikum. Discovering Emerging Entities with Ambiguous Names. WWW, 2014.