Linking Entities in #Microposts
Romil Bansal, Sandeep Panem, Priya Radhakrishnan, Manish Gupta, Vasudeva Varma
International Institute of Information Technology, Hyderabad
7th April 2014

Introduction
• Entity Linking is the task of associating entity name mentions in text with their correct referent entities in a knowledge base, with the goal of understanding and extracting useful information from the document.
• Entity Linking can help various IR tasks such as document classification and clustering, tag recommendation, and relation extraction.

Motivation
• Social media such as Twitter is a source of a wide variety of information. Identifying entities in tweets can help in tasks such as tracking products and events.
• Tweets are short and noisy, so they lack sufficient context for an entity mention to be disambiguated completely.
• We therefore enhance the context by combining the local context of the entity with the information that other users share about the entity on social media such as Twitter.

Related Work
• Various approaches to tweet entity linking have been proposed in the past.
• Liu et al. [ELFT13] use mention-entry similarity, entry-entry similarity, and mention-mention similarity to resolve a set of mentions from tweets simultaneously.
• Meij et al. [ASMP12] link the entities in tweets based on various n-gram, tweet, and concept features.
• Guo et al. [TLNL13] model entity linking as a structured learning problem, learning mention detection and entity linking simultaneously.

Our Approach (System Architecture)

Mention Detection
• Mention Detection is the task of detecting phrases in the text that could be linked to possible entities in the knowledge base. We used POS patterns from the ARK POS Tagger [POST11] coupled with the T-NER Tagger [NERT11] to find the mentions in the given text.
1. ARK POS Tagger: Extract all sequences of proper nouns, and label the longest continuous sequence as a mention.
2. T-NER POS Tagger: Extract chunks with at least one proper noun, and label them as mentions.
• Merging Mentions: Merge the mention lists from the two systems. In case of conflict, select the longest possible sequence as the entity mention in the text.
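The two detection rules and the merge step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the taggers are not invoked, the input is an already tagged token list, the tag set `PROPER_NOUN_TAGS` is assumed (`^` as the ARK proper-noun tag, `NNP`/`NNPS` as Penn Treebank tags), a "conflict" is interpreted as two overlapping spans, and the function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive

# Assumed proper-noun tags: "^" (ARK tagset), "NNP"/"NNPS" (Penn Treebank).
PROPER_NOUN_TAGS = {"^", "NNP", "NNPS"}


def proper_noun_runs(tags: List[str]) -> List[Span]:
    """Return the maximal runs of consecutive proper-noun tags."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag in PROPER_NOUN_TAGS:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            spans.append((start, i))  # the run ended at the previous token
            start = None
    if start is not None:
        spans.append((start, len(tags)))  # run reaches the end of the tweet
    return spans


def merge_mentions(a: List[Span], b: List[Span]) -> List[Span]:
    """Merge two mention lists; when spans overlap, keep the longest one."""
    merged: List[Span] = []
    # Visit longer spans first so that they win any overlap.
    for span in sorted(set(a) | set(b), key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(span[1] <= kept[0] or span[0] >= kept[1] for kept in merged):
            merged.append(span)
    return sorted(merged)


# Toy input: "Barack Obama visits New York City today"
tags = ["^", "^", "V", "^", "^", "^", "N"]
ark_spans = proper_noun_runs(tags)
other_spans = [(0, 1), (3, 6), (4, 5)]  # hypothetical output of a second tagger
mentions = merge_mentions(ark_spans, other_spans)
```

Sorting by length before filtering is one simple way to realise the "longest possible sequence" rule; other tie-breaking policies are equally plausible.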

Entity Disambiguation
• Entity Disambiguation is the task of selecting the correct candidate from the list of possible candidates for a given entity mention. We treated entity disambiguation as a ranking problem: we extracted ranked entities using three different methods and then merged the ranked lists using a machine learning model.
1. Wikipedia Based Measure (M1): Extract the entities that best match the titles and body text of Wikipedia pages, and rank them according to each page's similarity with the mention.
2. Google Cross-Wiki Based Measure (M2): Extract and rank the entities based on the similarity between the mention and the anchor text [CLDE12] used across various web pages to refer to a Wikipedia entity.
3. Twitter Popularity Based Measure (M3): Extract the entities based on the similarity between the anchor text and the text used when referring to the mention in other tweets on Twitter.
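As a rough illustration of a similarity-based ranking in the spirit of M1, the sketch below scores candidate pages by bag-of-words cosine similarity with the tweet text. The slides do not say which similarity function was used, so cosine similarity is an assumption, and the page texts and the names `bag_of_words`, `cosine`, and `rank_candidates` are hypothetical toy data and helpers.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(context: str, pages: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank candidate pages (title -> body) by similarity to the tweet text."""
    query = bag_of_words(context)
    scored = [
        (title, cosine(query, bag_of_words(title + " " + body)))
        for title, body in pages.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy knowledge base, not real Wikipedia text.
pages = {
    "Apple": "fruit of the tree grown in orchards",
    "Apple Inc.": "technology company that makes the iphone and mac computers",
}
ranking = rank_candidates("new apple iphone launch", pages)
```

In this toy case the shared term "iphone" pushes "Apple Inc." above the fruit, which is the kind of contextual evidence that short tweets often lack.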

Entity Disambiguation (cont.)
• The ranked lists from the three models (Wikipedia Based (M1), Google Cross-Wiki Based (M2), and Twitter Popularity Based (M3)) are merged using a LambdaMART model.
• LambdaMART [ABIR10] combines MART and LambdaRank; here it produces an overall ranking model that combines the ranks from the three individual measures.
• The top-ranked entity is taken as the disambiguated entity for the given entity mention.
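The merging step can be pictured as turning each candidate's position in the three lists into a feature vector and scoring it. The sketch below is not LambdaMART: it uses reciprocal-rank features and a fixed-weight linear scorer as a stand-in for the learned model, purely to show the data flow. The feature choice, the weights, and the function names are all assumptions.

```python
from typing import Dict, List


def reciprocal_rank_features(rankings: List[List[str]]) -> Dict[str, List[float]]:
    """One feature per ranked list: 1/rank, or 0.0 if the candidate is absent."""
    candidates = {c for ranking in rankings for c in ranking}
    return {
        c: [1.0 / (r.index(c) + 1) if c in r else 0.0 for r in rankings]
        for c in candidates
    }


def merge_rankings(rankings: List[List[str]], weights: List[float]) -> List[str]:
    """Merge ranked lists with a linear scorer (stand-in for a trained ranker)."""
    features = reciprocal_rank_features(rankings)
    scores = {
        c: sum(w * f for w, f in zip(weights, feats))
        for c, feats in features.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical ranked lists from M1, M2, and M3 for the mention "apple".
m1 = ["Apple", "Apple Inc.", "Apple Records"]
m2 = ["Apple Inc.", "Apple", "Apple Records"]
m3 = ["Apple Inc.", "Apple Records"]

merged = merge_rankings([m1, m2, m3], weights=[1.0, 1.0, 1.0])
top_entity = merged[0]  # taken as the disambiguated entity
```

In a learned setting, the per-candidate feature vectors would be passed to a trained ranking model in place of the fixed weights.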

Dataset
• The #Microposts2014 NEEL Challenge dataset is used for evaluating the system.
• 2.3K tweets, manually annotated
• 70% training, 30% testing

Results: Entity Mention Detection and Entity Disambiguation

Table 1: Performance for Mention Detection
Method                  Accuracy
ARK POS Tagger          77%
T-NER POS Tagger        92%
ARK + T-NER (Merged)    98%

Table 2: Performance for Entity Disambiguation (F1-measure)
Methods compared: M1, M2, M3, M1+M2, M2+M3, M1+M3, M1+M2+M3 [F1 values missing]

Conclusion
• For effective entity linking, mention detection in tweets is important. We improve the accuracy of mention detection by combining two Twitter POS taggers.
• We resolve multiple mentions, abbreviations, and spelling variations of a named entity using Wikipedia and the Google Cross-Wiki Dictionary.
• We also use the popularity of an entity on Twitter to improve the disambiguation. Our system performed well on the given dataset, with an F1 score of [value missing].

References
[TLNL13] S. Guo, M.-W. Chang, and E. Kıcıman. To Link or Not to Link? A Study on End-to-End Tweet Entity Linking. In Proc. of the Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 2013
[ASMP12] E. Meij, W. Weerkamp, and M. de Rijke. Adding Semantics to Microblog Posts. In Proc. of WSDM, ACM, 2012
[ELFT13] X. Liu, Y. Li, H. Wu, M. Zhou, F. Wei, and Y. Lu. Entity Linking for Tweets. In Proc. of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), 2013
[NERT11] A. Ritter, S. Clark, Mausam, and O. Etzioni. Named Entity Recognition in Tweets: An Experimental Study. In Proc. of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011
[POST11] K. Gimpel, N. Schneider, B. O'Connor, D. Das, D. Mills, J. Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A. Smith. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments. In Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2 (ACL-HLT), pages 42–47, 2011
[CLDE12] V. I. Spitkovsky and A. X. Chang. A Cross-Lingual Dictionary for English Wikipedia Concepts. In Proc. of the 8th Intl. Conf. on Language Resources and Evaluation (LREC), 2012
[ABIR10] Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting Boosting for Information Retrieval Measures. Journal of Information Retrieval, 13(3):254–270, Jun 2010