EDIUM: Improving Entity Disambiguation via User modelling


EDIUM: Improving Entity Disambiguation via User modelling Romil Bansal, Sandeep Panem, Manish Gupta, Vasudeva Varma International Institute of Information Technology, Hyderabad 14th April 2014

Introduction (Entity Disambiguation) Entity Disambiguation is the task of finding the correct entity referent in the knowledge base for the given mention.

Introduction (User modelling) User modelling is the task of categorizing users' activities so as to customize and adapt the system to each user's needs. [Figure: tweets by the user @GameOfThrones (official HBO Game of Thrones TV series handle)]

Motivation Short text from social media (e.g. Twitter, Facebook) is an important source of information. Entities are important for detecting and tracking information shared about products, events and locations, the reputations of companies and people, movies, sports, etc. Named Entity Detection (NED) is difficult in micro-posts because they lack sufficient context. Entities from a user's previous tweets can be used to build interest models, which in turn help disambiguate new entity mentions.

Related Work Many models have been proposed to disambiguate entities in text. Models such as [ASMP12, NERT11, EDTL13] disambiguate entities using two kinds of signals: context-aware entity disambiguation, which uses the text around the entity mention, and popularity-based entity disambiguation, which uses the likelihood of a candidate entity being the target for the given mention. We disambiguate entities by combining contextual models with user models built by analyzing the user's tweeting behavior.

Entity Disambiguation Problem Entity Disambiguation User modelling

Our Approach (System Architecture) [Figure: system architecture diagram; the two combined branches are labeled with weights 1-α and α]

The EDIUM System Self-learn the user's interests. Use an existing context-based method for disambiguation. Add highly confident disambiguations (ratio test, confidence > 90%) from the user's tweets to the user model. Cluster the interests based on the semantic similarity between different entities.
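The confidence filter in the step above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `(entity, confidence)` pair interface, the function name `update_user_model`, and the use of simple entity counts as the user model are all assumptions made for illustration. The ratio test and the interest clustering step are not reproduced here, because the slides do not specify them.

```python
from collections import Counter

# From the slide: only disambiguations with confidence > 90% are kept.
CONFIDENCE_THRESHOLD = 0.9


def update_user_model(user_model, disambiguations, threshold=CONFIDENCE_THRESHOLD):
    """Add confidently disambiguated entities from one tweet to the user model.

    `disambiguations` is a list of (entity, confidence) pairs, a hypothetical
    interface standing in for the output of a context-based entity linker.
    The user model is assumed to be a simple count of linked entities.
    """
    for entity, confidence in disambiguations:
        if confidence > threshold:
            user_model[entity] += 1
    return user_model


# Illustrative values only; these are not taken from the evaluation dataset.
model = Counter()
update_user_model(model, [("Game_of_Thrones", 0.97), ("Throne", 0.41)])
update_user_model(model, [("Game_of_Thrones", 0.93), ("HBO", 0.95)])
```

Low-confidence links such as the hypothetical `"Throne"` entry never enter the model, so early linker mistakes are less likely to be reinforced in later tweets.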

The EDIUM System Compute the user-based disambiguation score $Score_U(C_i^j)$ of each candidate entity $C_i^j$ based on the semantic similarity between the entity and the user's interest topics $IC_u^k$. Compute the context-based disambiguation score $Score_C(C_i^j)$ of the candidate entity from the context-based system. Rank the candidates on both the context and user model scores. Select the candidate entity with the maximum score as the final disambiguated entity for the given mention.
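The ranking step can be sketched as follows. The slides do not give the exact combination rule, so this sketch assumes a convex combination in which the user-based score is weighted by 1-α and the context-based score by α. The function name `rank_candidates`, the dictionary interface, and the example scores are hypothetical.

```python
def rank_candidates(user_scores, context_scores, alpha):
    """Return (candidate, combined_score) pairs sorted best-first.

    Assumption: combined = (1 - alpha) * user_score + alpha * context_score.
    Candidates with no user-based score are treated as scoring 0.0 there.
    """
    combined = {
        candidate: (1 - alpha) * user_scores.get(candidate, 0.0)
        + alpha * context_scores[candidate]
        for candidate in context_scores
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores for the mention "apple"; not taken from the paper.
context_scores = {"Apple_Inc.": 0.55, "Apple_(fruit)": 0.45}
user_scores = {"Apple_Inc.": 0.10, "Apple_(fruit)": 0.80}

ranked = rank_candidates(user_scores, context_scores, alpha=0.5)
best_entity = ranked[0][0]
```

With these illustrative numbers the context-based system alone would pick `Apple_Inc.`, while the combined score favors `Apple_(fruit)` because the hypothetical user's interest topics are closer to food than to technology.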

The EDIUM System Re-calculate the weight α based on the similarity of the topics of the user's new tweet to the topics of the previous m tweets. This reduces the dependency on the user model for entity disambiguation when the user model is incomplete or the user's tweets are too general.

$$R_t = \frac{(1-\alpha_t) \times sim(UM, DM)}{(1-\alpha_t) \times sim(UM, DM) + \alpha_t \times sim(CM, DM)}$$

$$\alpha_{t+1} = \frac{1}{m} \sum_{k=0}^{m-1} R_{t-k}$$

where $sim(UM, DM) = \cos(Score_D(C_i), Score_U(C_i))$ is the cosine similarity between the tweet categories vector obtained by the system and the tweet categories vector given by the user model, and $sim(CM, DM) = \cos(Score_D(C_i), Score_C(C_i))$ is the cosine similarity between the tweet categories vector obtained by the system and the tweet categories vector given by the contextual model.
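The update of α can be traced with a short Python sketch that follows the two formulas on this slide term by term. The helper names (`cosine`, `ratio`, `update_alpha`), the example category vectors, and the starting value of α are assumptions for illustration. Two edge cases are also assumptions, since the slides do not cover them: a zero denominator yields a ratio of 0, and when fewer than m ratios are available the average is taken over those that exist.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def ratio(alpha_t, sim_um_dm, sim_cm_dm):
    """R_t: the (1 - alpha_t)-weighted user-model similarity as a share of the total."""
    numerator = (1 - alpha_t) * sim_um_dm
    denominator = numerator + alpha_t * sim_cm_dm
    return numerator / denominator if denominator > 0 else 0.0


def update_alpha(recent_ratios):
    """alpha_{t+1}: mean of the most recent R values (at most m of them)."""
    return sum(recent_ratios) / len(recent_ratios)


m = 3                      # window size; illustrative value
window = deque(maxlen=m)   # keeps only the last m ratios
alpha = 0.5                # illustrative starting weight

# Illustrative tweet-category vectors; not taken from the paper.
score_d = [0.7, 0.2, 0.1]  # categories from the final disambiguation (DM)
score_u = [0.6, 0.3, 0.1]  # categories predicted by the user model (UM)
score_c = [0.1, 0.2, 0.7]  # categories predicted by the contextual model (CM)

r_t = ratio(alpha, cosine(score_d, score_u), cosine(score_d, score_c))
window.append(r_t)
alpha = update_alpha(window)
```

Using a `deque` with `maxlen=m` drops the oldest ratio automatically, so the average always covers the most recent tweets only.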

Dataset We evaluated the performance of EDIUM on a dataset annotated manually by three individuals. The dataset consists of 200 tweets from each of 20 randomly selected Twitter users.

Results Entity Disambiguation Fig. 1: Performance with Wikipedia Miner Fig. 2: Performance with DBpedia Spotlight

Observations The system works better with Wikipedia Miner [WIKM13] than with DBpedia Spotlight [DSSL11]. The system depends on the underlying contextual modelling system to learn the user's interests initially. More precise contextual systems lead to greater improvement in the results.

Conclusion In this paper, we modeled entity disambiguation based on the user's past interest information. We proposed a way to model the user's interests using entity linking techniques and then use that model to improve disambiguation in entity linking systems. The gain in precision is proportional to the accuracy of the underlying entity linking system.

Future Work Future work requires more analysis of the user modelling aspect of the system. Along with the user's previous tweets, the user's network and demographic information could also be considered to further improve entity disambiguation.

Thank you! Questions?

References
[RESE13] Murnane, E. L., Haslhofer, B., Lagoze, C.: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text. In: Proc. of the 22nd Intl. Conf. on World Wide Web (WWW), Republic and Canton of Geneva, Switzerland (2013)
[ASMP12] Meij, E., Weerkamp, W., de Rijke, M.: Adding Semantics to Microblog Posts. In: WSDM 2012, ACM (2012)
[ELFT13] Liu, X., Li, Y., Wu, H., Zhou, M., Wei, F., Lu, Y.: Entity Linking for Tweets. In: Proc. of the 51st Annual Meeting of the Association for Computational Linguistics (2013)
[NERT11] Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Proc. of the 2011 Conf. on Empirical Methods in Natural Language Processing (EMNLP) (2011)
[DUTI10] Michelson, M., Macskassy, S. A.: Discovering Users’ Topics of Interest on Twitter: A First Look. In: Proc. of the 4th Workshop on Analytics for Noisy Unstructured Text Data, ACM (2010) 73–80
[ABIR10] Wu, Q., Burges, C. J., Svore, K. M., Gao, J.: Adapting Boosting for Information Retrieval Measures. Journal of Information Retrieval 13(3) (Jun 2010) 254–270

[DSSL11] Mendes, P. N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: Shedding Light on the Web of Documents. In: Proc. of the 7th Intl. Conf. on Semantic Systems, New York, NY, USA, ACM (2011)
[WIKM13] Milne, D., Witten, I. H.: An Open-source Toolkit for Mining Wikipedia. Artificial Intelligence 194 (2013) 222–239
[EDTL13] Yerva, S. R., Catasta, M., Demartini, G., Aberer, K.: Entity Disambiguation in Tweets Leveraging User Social Profiles. In: Proc. of the 2013 Intl. Conf. on Information Reuse and Integration (IRI), IEEE (2013) 120–128