Exploiting Wikipedia Inlinks for Linking Entities in Queries

Priya Radhakrishnan (IIIT Hyderabad, India), Manish Gupta* (IIIT Hyderabad, India), Vasudeva Varma (IIIT Hyderabad, India), Romil Bansal (IIIT Hyderabad, India)
* Manish Gupta is a Senior Applied Researcher at Microsoft and Adjunct Faculty at IIIT Hyderabad.

Search and Information Extraction Lab (SIEL), IIIT Hyderabad
ERD 2014: Entity Recognition and Disambiguation Challenge

The objective of an Entity Recognition and Disambiguation (ERD) system is to recognize mentions of entities in a given text, disambiguate them, and map them to the entities in a given entity collection or knowledge base [1]. Our system (Paper ID: erd17) has the lowest latency among the ERD short-track participant systems. Source code and dataset: ty-Recognition-and-Disambiguation-Challenge

CHALLENGE APPROACH

Our system is built from the state-of-the-art TAGME [2] system, with time and performance optimizations at each of its three stages. Minimal Python sketches of these steps appear after the PRUNING section below.
1. Mention Detection: reduce the number of DB look-ups using mention filtering.
2. Disambiguation: (a) consider relatedness (rel) from identical pages; (b) score(p_a) = α * rel_a(p_a) + (1 - α) * Pr(p_a|a); (c) prominent senses restriction.
3. Pruning: ρ(a → p_a) = coherence(a → p_a) + γ * lp(a).

DATA PREPROCESSING AND MEASURES

Index: We process an English Wikipedia dump to create three indexes:
1. In-Link Graph Index
2. Anchor Dictionary
3. WikiTitlePageId Index

Measures:
1. Link frequency, link(a)
2. Total frequency, freq(a)
3. Pages linking to anchor a, Pg(a)
4. Link probability, lp(a)
5. Wikipedia Link-based Measure, δ [3]

Definitions:
1. Wikipedia anchor, a
2. Page linking to anchor a, p_a
3. Prior probability of a linking to page p, Pr(p_a|a)
4. Relatedness score from p_a to a, rel_a(p_a)
5. Disambiguation constant α and pruning constant γ

ERD Knowledge Base [1]: a snapshot of Freebase from 9/29/2013, keeping only those entities that have English Wikipedia pages associated with them.

MENTION DETECTION

1. Stopword Filtering: filter out mentions that contain only stopwords. We use the standard JMLR stopword list.
2. Twitter POS Filtering: the query text is Part-Of-Speech (POS) tagged with a tweet POS tagger [4]. Mentions that do not contain at least one word tagged NN (indicating a noun) are ignored.

RUNS: Run 5 and Run 7. Stopword filtering gave better results (F1 = 0.53) than TPOS filtering (F1 = 0.48).

DISAMBIGUATION

1. Relatedness between Pages: rel(p_a, p_b) = 1 - δ(p_a, p_b). For identical pages, the relatedness should be 1.
2. Prominent Senses Restriction: restrict candidates to senses whose inlinks contribute more than x% of the total inlinks.
3. Disambiguation Score: for a mention a and a candidate sense p_a, score(p_a) = α * rel_a(p_a) + (1 - α) * Pr(p_a|a). We determined α = 0.83 experimentally.

RUNS: Run 3.

PRUNING

1. Coherence: the average relatedness between the given sense p_a and the senses assigned to all other anchors.
2. Pruning Score: combines coherence and link probability: ρ(a → p_a) = coherence(a → p_a) + γ * lp(a). The pruning constant γ = 0.1 in our experiments, and ρ = 0.05 was the threshold value.

RUNS: Run 6.
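To make the preprocessing measures concrete, here is a minimal Python sketch (not the authors' code) of the link probability lp(a) and the Wikipedia Link-based Measure δ [3], assuming inlink sets are available from the In-Link Graph Index. The toy inlinks data and the corpus size WIKI_SIZE are illustrative assumptions, and the formulation lp(a) = link(a) / freq(a) follows TAGME's definition [2].

```python
import math

# inlinks[p]: set of page ids linking to page p (In-Link Graph Index).
# The entries below are toy data for illustration only.
inlinks = {
    "Apple_Inc.": {1, 2, 3, 4},
    "Apple": {3, 4, 5},
}
WIKI_SIZE = 4_500_000  # |W|: assumed total number of Wikipedia pages

def delta(pa, pb):
    """Milne-Witten link-based distance delta(p_a, p_b) [3]."""
    A, B = inlinks[pa], inlinks[pb]
    common = A & B
    if not common:
        return 1.0  # no shared inlinks: treat as maximally distant
    return (math.log(max(len(A), len(B))) - math.log(len(common))) / \
           (math.log(WIKI_SIZE) - math.log(min(len(A), len(B))))

def rel(pa, pb):
    """rel(p_a, p_b) = 1 - delta(p_a, p_b); identical pages get relatedness 1."""
    if pa == pb:
        return 1.0
    return max(0.0, 1.0 - delta(pa, pb))

def lp(link_count, total_count):
    """Link probability lp(a) = link(a) / freq(a) (TAGME's definition [2])."""
    return link_count / total_count if total_count else 0.0
```

Note that rel is clipped at 0 and forced to 1 for identical pages, matching the relatedness treatment in the DISAMBIGUATION section.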
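The following is a minimal sketch of the two mention filters from MENTION DETECTION. STOPWORDS is a tiny stand-in for the JMLR stopword list, and the tagger argument stands in for the tweet POS tagger of Ritter et al. [4], assumed to return (token, tag) pairs with Penn-style tags.

```python
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the"}  # stand-in for the JMLR list

def passes_stopword_filter(mention):
    """Keep a mention only if it contains at least one non-stopword token."""
    return any(tok.lower() not in STOPWORDS for tok in mention.split())

def passes_tpos_filter(mention, tagger):
    """Keep a mention only if at least one token is tagged NN* (a noun).
    `tagger` is a stand-in for the tweet POS tagger of Ritter et al. [4]."""
    return any(tag.startswith("NN") for _, tag in tagger(mention.split()))

# Usage of the stopword filter (the POS filter needs a real tagger):
candidates = ["of the", "total eclipse", "new york"]
surviving = [m for m in candidates if passes_stopword_filter(m)]  # drops "of the"
```

Filtering candidate mentions before the Anchor Dictionary lookup is what reduces the number of DB look-ups, and hence the system's latency.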
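Here is a sketch of the disambiguation step, assuming per-sense inlink counts (from the In-Link Graph Index) and per-sense features rel_a(p_a) and Pr(p_a|a) have already been computed. The 2% prominence cut-off and all toy values are assumptions; α = 0.83 is the experimentally determined constant from the paper.

```python
ALPHA = 0.83          # disambiguation constant, determined experimentally
PROMINENCE_X = 0.02   # the x% inlink-share cut-off; 2% is an assumed value

def prominent_senses(inlink_counts):
    """Prominent senses restriction: keep senses whose inlinks contribute
    more than x% of the total inlinks over all candidate senses."""
    total = sum(inlink_counts.values())
    return [p for p, n in inlink_counts.items()
            if total and n / total > PROMINENCE_X]

def score(rel_a, prior):
    """score(p_a) = alpha * rel_a(p_a) + (1 - alpha) * Pr(p_a | a)."""
    return ALPHA * rel_a + (1 - ALPHA) * prior

# Toy example: inlink counts and (rel_a, Pr(p_a|a)) per candidate sense.
counts = {"Apple_Inc.": 9000, "Apple": 800, "Apple_Records": 50}
features = {"Apple_Inc.": (0.62, 0.71), "Apple": (0.35, 0.20),
            "Apple_Records": (0.10, 0.02)}
best = max(prominent_senses(counts), key=lambda p: score(*features[p]))
```

In the toy example, "Apple_Records" contributes well under 2% of the inlinks and is discarded before scoring, so only the two prominent senses compete.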
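Finally, a sketch of the pruning score, assuming a relatedness function rel such as the one in the measures sketch above; γ = 0.1 and the threshold ρ = 0.05 are the values reported in the PRUNING section.

```python
GAMMA = 0.1           # pruning constant gamma
RHO_THRESHOLD = 0.05  # annotations scoring below this are pruned

def coherence(pa, other_senses, rel):
    """Average relatedness between sense p_a and the senses assigned to all
    other anchors in the query; `rel` is a relatedness function such as the
    one in the measures sketch above."""
    if not other_senses:
        return 0.0
    return sum(rel(pa, pb) for pb in other_senses) / len(other_senses)

def keep_annotation(pa, other_senses, lp_a, rel):
    """rho(a -> p_a) = coherence(a -> p_a) + gamma * lp(a)."""
    rho = coherence(pa, other_senses, rel) + GAMMA * lp_a
    return rho >= RHO_THRESHOLD
```

Combining coherence with the link probability lets the system keep low-coherence annotations whose surface form is almost always used as a link, while discarding incidental matches.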
RESULTS

Run # | Run Description | F1 Score
1 | Base system | —
2 | Disambiguation score uses Pr(p_a|a) instead of lp(a) | —
3 | Threshold combination + stopword filtering + prominent senses restriction | —
4 | Linear combination + non-normalized vote + single-row anchor index + singleton object | —
5 | TPOS filtering | 0.48
6 | Pruning score uses lp(a) instead of Pr(p_a|a) | —
7 | Stopword filtering | 0.53

Each run was tested on a 500-query test set, except Runs 1 and 2, which were tested on a 100-query test set.

ACKNOWLEDGEMENTS

This paper is supported by the SIGIR Donald B. Crouch grant.

REFERENCES

[1] D. Carmel, M.-W. Chang, E. Gabrilovich, B.-J. P. Hsu, and K. Wang. ERD 2014: Entity Recognition and Disambiguation Challenge. SIGIR Forum, 2014.
[2] P. Ferragina and U. Scaiella. TAGME: On-the-fly Annotation of Short Text Fragments. CIKM 2010.
[3] D. Milne and I. H. Witten. An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. AAAI Workshop on Wikipedia and Artificial Intelligence, 2008.
[4] A. Ritter, S. Clark, Mausam, and O. Etzioni. Named Entity Recognition in Tweets: An Experimental Study. EMNLP 2011.
[5] S. Cucerzan. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. EMNLP-CoNLL 2007.