Presentation transcript:

Bing-SF-IDF+: Semantics-Driven News Recommendation
Frederik Hogenboom, Michel Capelle, Marnix Moerland, Flavius Frasincar
Erasmus University Rotterdam

Introduction
Content-based news recommendation is traditionally performed using the cosine similarity and the TF-IDF weighting scheme for terms occurring in news messages and user profiles. Semantics-driven variants such as SF-IDF additionally take term meaning into account by exploiting synsets from semantic lexicons. However, they ignore the various semantic relationships between synsets, providing only a limited understanding of news semantics. Moreover, semantics-based weighting techniques cannot handle named entities, which are often crucial yet usually absent from semantic lexicons. Hence, we extend SF-IDF to Bing-SF-IDF+ by also considering the semantic relationships between synsets, and by employing named entity similarities based on Bing page counts.

Classic Recommendation
Classic news recommenders commonly use a Term Frequency – Inverse Document Frequency (TF-IDF) weighting scheme. Here, scores increase proportionally to the number of times a specific term (word) appears in a document, but are scaled down by the frequency of that word in the full set of known documents (the corpus). Hence, terms that occur relatively often in a document, but rarely in the rest of the corpus, are assigned high scores, highlighting their importance, while common terms receive low scores. Usually, term scores are computed for the user profile (i.e., a collection of read documents) and for a new document. These scores are then compared using a similarity measure, such as the cosine similarity. Documents whose term scores are sufficiently similar to the user profile (i.e., whose similarity exceeds a specified cut-off value) are recommended. A recent variant is the Synset Frequency – Inverse Document Frequency (SF-IDF) scheme, which takes semantics into account by operating on synonym sets (synsets) from a semantic lexicon instead of on terms. These synsets are obtained after performing word sense disambiguation (using SSI, Lesk, etc.).

Bing-SF-IDF+ Recommendation
In essence, Bing-SF-IDF+ recommendation is similar to the synset-based SF-IDF method, as synset score similarities between the user profile and a new document are computed. Additionally, however, named entities (derived through a named entity recognizer) and synset relationships are considered. The method computes a weighted average of the entity-based and relationship-based similarity scores. Named entities that have been identified in the user profile and a new document, and that do not occur in a semantic lexicon, are submitted in pairs to the Bing search engine. Page counts are evaluated in order to compute the Point-Wise Mutual Information (PMI) entity co-occurrence similarity, which measures the discrepancy between the probability of the entities co-occurring under their joint distribution and under the assumption of independent individual distributions. Frequently co-occurring entities are assumed to be highly similar, and hence contribute to the similarity between the user profile and a new document. For identified synsets, the similarities are computed identically to SF-IDF, yet the score vectors contain not only the directly occurring synsets, but also the synsets they refer to through their semantic relationships. Additional weighting is applied depending on the relationship type, and the weights are optimized using a genetic algorithm.
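To make the combination of the two similarity components concrete, the sketch below (Python, illustrative only and not the authors' implementation) estimates a PMI co-occurrence score from page counts and averages it with a cosine similarity over synset score vectors. The function names, the dict-based score vectors, and the single mixing parameter alpha are assumptions made for this example; the relationship-specific weights, their genetic-algorithm optimization, and the normalization of PMI used in the actual method are omitted.

import math

def pmi_similarity(count_a, count_b, count_ab, total_pages):
    # Point-wise mutual information between two named entities, estimated
    # from (hypothetical) search engine page counts. Note that raw PMI is
    # not bounded to [0, 1]; the paper's normalization is omitted here.
    if count_a == 0 or count_b == 0 or count_ab == 0:
        return 0.0
    p_a = count_a / total_pages
    p_b = count_b / total_pages
    p_ab = count_ab / total_pages
    return math.log(p_ab / (p_a * p_b), 2)

def cosine(u, v):
    # Cosine similarity between two sparse score vectors stored as dicts,
    # e.g., (extended) synset identifiers mapped to SF-IDF weights.
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def combined_similarity(profile_synsets, doc_synsets, entity_page_counts, alpha=0.5):
    # Weighted average of the synset-based similarity and the average PMI
    # similarity over shared named-entity pairs; alpha is a mixing weight.
    synset_sim = cosine(profile_synsets, doc_synsets)
    if entity_page_counts:
        entity_sim = sum(pmi_similarity(*c) for c in entity_page_counts) / len(entity_page_counts)
    else:
        entity_sim = 0.0
    return alpha * entity_sim + (1.0 - alpha) * synset_sim

# Example usage with made-up page counts (count_a, count_b, joint count, total indexed pages):
profile = {"synset:apple_inc": 0.8, "synset:acquire": 0.4}
document = {"synset:apple_inc": 0.6, "synset:merger": 0.5}
pairs = [(5_000_000, 3_000_000, 1_200_000, 10_000_000_000)]
print(combined_similarity(profile, document, pairs, alpha=0.5))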
Results & Conclusions
Experiments on 100 news documents from a Reuters news feed on technology companies, annotated by three experts for their relevance with respect to eight topics, show that Bing-SF-IDF+ significantly outperforms SF-IDF and TF-IDF in terms of average F1 scores over all topics and cut-off values. As depicted in Fig. 1, Bing-SF-IDF+ obtains a score of 0.58, against 0.37 and 0.43, respectively. Also when comparing kappa statistics, SF-IDF and TF-IDF are outperformed by Bing-SF-IDF+, as the former two methods have averages of 0.32 and 0.34, respectively, while the latter method achieves a higher average. Results are optimal when giving equal weight to the Bing similarities and the extended synsets incorporating semantic relationships.

Fig. 1. Experimental results.

Acknowledgement
This publication was partially supported by the Dutch Organization for Scientific Research (NWO) Physical Sciences Free Competition Project: Financial Events Recognition in News for Algorithmic Trading (FERNAT) and the Dutch national program COMMIT.

Contact
Frederik Hogenboom
Postal: Erasmus University Rotterdam, P.O. Box 1738, NL-3000 DR Rotterdam, The Netherlands
Phone: +31 (0)
Web: