News Recommendation with CF-IDF+

Slides:



Advertisements
Similar presentations
A Comparison Study for Novelty Control Mechanisms Applied to Web News Stories 2012 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2012)
Advertisements

WEB MINING. Why IR ? Research & Fun
Chapter 5: Introduction to Information Retrieval
Polarity Analysis of Texts using Discourse Structure CIKM 2011 Bas Heerschop Erasmus University Rotterdam Frank Goossen Erasmus.
Improved TF-IDF Ranker
Learning Semantic Information Extraction Rules from News The Dutch-Belgian Database Day 2013 (DBDBD 2013) Frederik Hogenboom Erasmus.
Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle
A Linguistic Approach for Semantic Web Service Discovery International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) July 13, 2012 Jordy.
Hermes: News Personalization Using Semantic Web Technologies
Exploiting Discourse Structure for Sentiment Analysis of Text OR 2013 Alexander Hogenboom In collaboration with Flavius Frasincar, Uzay Kaymak, and Franciska.
Search and Retrieval: More on Term Weighting and Document Ranking Prof. Marti Hearst SIMS 202, Lecture 22.
Query Operations: Automatic Local Analysis. Introduction Difficulty of formulating user queries –Insufficient knowledge of the collection –Insufficient.
WebMiningResearch ASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
March 17, 2008SAC WT Hermes: a Semantic Web-Based News Decision Support System* Flavius Frasincar Erasmus University Rotterdam.
Automatically Annotating Web Pages Using Google Rich Snippets 11th Dutch-Belgian Information Retrieval Workshop (DIR 2011) February 4, 2011 Frederik Hogenboom.
WebMiningResearchASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007 Revised.
Evaluating the Performance of IR Sytems
Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011) September.
An Overview of Event Extraction from Text Workhop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE'11) October 23,
News Personalization using the CF-IDF Semantic Recommender International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011) May 25, 2011.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
Analyzing Sentiment in a Large Set of Web Data while Accounting for Negation AWIC 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam.
Chapter 5: Information Retrieval and Web Search
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Sentiment Analysis with a Multilingual Pipeline 12th International Conference on Web Information System Engineering (WISE 2011) October 13, 2011 Daniëlla.
SIGIR’09 Boston 1 Entropy-biased Models for Query Representation on the Click Graph Hongbo Deng, Irwin King and Michael R. Lyu Department of Computer Science.
Erasmus University Rotterdam Introduction Nowadays, emerging news on economic events such as acquisitions has a substantial impact on the financial markets.
Erasmus University Rotterdam Introduction With the vast amount of information available on the Web, there is an increasing need to structure Web data in.
A News-Based Approach for Computing Historical Value-at-Risk International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) Frederik Hogenboom.
Temporal Event Map Construction For Event Search Qing Li Department of Computer Science City University of Hong Kong.
Citation Recommendation 1 Web Technology Laboratory Ferdowsi University of Mashhad.
Ontology Updating Driven by Events Dutch-Belgian Database Day 2012 (DBDBD 2012) November 21, 2012 Frederik Hogenboom Jordy Sangers.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Latent Semantic Analysis Hongning Wang Recap: vector space model Represent both doc and query by concept vectors – Each concept defines one dimension.
Giorgos Giannopoulos (IMIS/”Athena” R.C and NTU Athens, Greece) Theodore Dalamagas (IMIS/”Athena” R.C., Greece) Timos Sellis (IMIS/”Athena” R.C and NTU.
Chapter 6: Information Retrieval and Web Search
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
University of Malta CSA3080: Lecture 6 © Chris Staff 1 of 20 CSA3080: Adaptive Hypertext Systems I Dr. Christopher Staff Department.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Semantics-Based News Recommendation with SF-IDF+ International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013) June 13, 2013 Marnix Moerland.
Erasmus University Rotterdam Introduction Content-based news recommendation is traditionally performed using the cosine similarity and TF-IDF weighting.
Query Expansion By: Sean McGettrick. What is Query Expansion? Query Expansion is the term given when a search engine adding search terms to a user’s weighted.
Vector Space Models.
Lexico-semantic Patterns for Information Extraction from Text The International Conference on Operations Research 2013 (OR 2013) Frederik Hogenboom
Foxtrot seminar Capturing knowledge of user preferences with recommender systems Stuart E. Middleton David C. De Roure, Nigel R. Shadbolt Intelligence,
Semantics-Based News Recommendation International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012) June 14, 2012 Michel Capelle
Term Weighting approaches in automatic text retrieval. Presented by Ehsan.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Microsoft Research Cambridge,
IR 6 Scoring, term weighting and the vector space model.
Personalized Ontology for Web Search Personalization S. Sendhilkumar, T.V. Geetha Anna University, Chennai India 1st ACM Bangalore annual Compute conference,
Queensland University of Technology
Data-Driven Educational Data Mining ---- the Progress of Project
Linguistic Graph Similarity for News Sentence Searching
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Web News Sentence Searching Using Linguistic Graph Similarity
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Information Retrieval and Web Search
Bing-SF-IDF+: A Hybrid Semantics-Driven News Recommender
Multimedia Information Retrieval
Applying Key Phrase Extraction to aid Invalidity Search
Presented by: Prof. Ali Jaoua
Text Categorization Assigning documents to a fixed set of categories
CS 620 Class Presentation Using WordNet to Improve User Modelling in a Web Document Recommender System Using WordNet to Improve User Modelling in a Web.
Chapter 5: Information Retrieval and Web Search
Semantic Similarity Methods in WordNet and their Application to Information Retrieval on the Web Yizhe Ge.
Web Mining Research: A Survey
Presentation transcript:

News Recommendation with CF-IDF+ Emma de Koning 370761ek@student.eur.nl Frederik Hogenboom fhogenboom@ese.eur.nl Flavius Frasincar frasincar@ese.eur.nl Erasmus University Rotterdam PO Box 1738, NL-3000 DR Rotterdam, the Netherlands June 13, 2018 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Introduction (1) Recommender systems help users to plough through a massive and increasing amount of information Recommender systems: Content-based Collaborative filtering Hybrid Content-based systems are often term-based Common measure: Term Frequency – Inverse Document Frequency (TF-IDF) as proposed by [Salton and Buckley, 1988] 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Introduction (2) TF-IDF: Preprocessed documents (stop words removal and stemming) For each term, it takes into consideration: The importance in a single document The inverse of the general importance within a set of documents TF-IDF performance tends to decrease as documents get larger The red, purple, and blue terms are important, whereas the yellow, green, and pink terms are irrelevant 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Introduction (3) Utilizing concepts instead of terms: Reduces noise caused by non-meaningful terms Yields less terms to evaluate Allows for semantic features, e.g., synonyms Therefore, in 2011 we proposed Concept Frequency – Inverse Document Frequency (CF-IDF), showing an improvement over regular TF-IDF [Goossen et al., 2011] The black concepts are important, while the brown and beige concepts are irrelevant 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Introduction (4) Research has shown that relationships like synonymy and hyponymy provide structure and improved interpretability Hence, we coin CF-IDF+, which additionally accounts for semantic relationships CF-IDF+ is implemented in Ceryx (an extension for Hermes [Frasincar et al., 2009], a news processing framework) Results are evaluated in comparison with TF-IDF and CF-IDF The black concepts have become less important due to their relationship with beige concepts 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Introduction (5) Earlier work has been done: CF-IDF-like methods: [Baziz et al., 2005], [Yan and Li, 2007] Frameworks: OntoSeek [Guarino et al., 1999], Quickstep [Middleton et al., 2004], News@hand [Cantador et al., 2008] Although some work shows overlap: Methods are not thoroughly compared with TF-IDF Often, WSD and synonym handling is lacking 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

TF-IDF Term Frequency: the occurrence of a term ti in a document dj, i.e., Inverse Document Frequency: the occurrence of a term ti in a set of documents D, i.e., And hence 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

CF-IDF Concept Frequency: the occurrence of a concept ci in a document dj, i.e., Inverse Document Frequency: the occurrence of a concept ci in a set of documents D, i.e., And hence 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

CF-IDF+ Each concept c from set C has a set of related concepts R(c) A related concept r is associated with a weight wr We focus on domain ontologies, and identify 3 different weights for superclasses, subclasses, and domain relationships For concept ci and related concept ri  R(ci) with weight wr in document dj  D, CF-IDF+ is computed as 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Recommendations Ontology contains a set of concepts and relations User profile consists of (a subset of) these concepts and relations Each concept and relation is associated with all news articles Each article is represented as: TF-IDF: a set containing all terms CF-IDF: a set containing all concepts CF-IDF+: a set containing all concepts and related concepts Then, for each article, weights are calculated Weights of a new article are compared to the user profile using cosine similarity 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Implementation: Hermes Hermes is used for building a news personalization service Its implementation Hermes News Portal (HNP): Is ontology-based Is programmed in Java Uses OWL / SPARQL / Jena / GATE / WordNet Input: RSS feeds of news items Internal processing: Classification News querying Output: news items 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Implementation: Ceryx (1) Ceryx is a plug-in for HNP Main focus is on recommendation support User profiles are constructed TF-IDF (using a stemmer as proposed in [Krovetz, 1993]), CF-IDF, and CF-IDF+ recommendation calculations (using Lesk Word Sense Disambiguation [Jensen and Boss, 2008]) can be performed 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Implementation: Ceryx (2) 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Evaluation (1) Experiment: Cut-off values: {0, 0.01, 0.02, …, 1} For each cut-off value, relationship weights are optimized to maximize F1-scores: Subclass relations receive low weights (too specific) Superclass relations receive higher weights (somewhat generic) Domain relations receive highest weights (just about right) Experts 3 Profiles 8 News items 100 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Evaluation (2) CF-IDF+ has a significantly higher recall than CF-IDF and TF-IDF, at the cost of a slightly reduced precision Overall, F1-scores for CF-IDF+ are higher at high cut-off values, which are the tougher nuts to crack! 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Evaluation (3) CF-IDF+ consistently has a higher classification power (Kappa statistic) than CF-IDF CF-IDF+ mostly outperforms TF-IDF in the higher regions, yet stays behind for low cut-off values 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Conclusions For strict recommendation settings, CF-IDF+ outperforms CF-IDF and TF-IDF significantly Especially classification power and recall show vast improvements, at the cost of a slight loss in precision Hence, using key concepts and semantic relations instead of analyzing all terms could be beneficial for recommender systems Future work: Invest in a more fine-grained weight learning procedure Include a larger collection of relationships Use other large ontological resources like Evaluate on a larger set of news items 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

Questions 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

References (1) Baziz, M., Boughanem, M., Traboulsi, S.: A Concept-Based Approach for Indexing Documents in IR. In: Actes du XXIIIème Congrès Informatique des Organisations et Systèmes d'Information et de Décision (INFORSID 2005). pp. 489-504. HERMES Science Publications (2005) Cantador, I., Bellogín, A., Castells, P.: News@hand: A Semantic Web Approach to Recommending News. In: 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (AH 2008). pp. 279-283. Springer-Verlag, Berlin, Heidelberg (2008) Frasincar, F., Borsje, J., Levering, L.: A Semantic Web-Based Approach for Building Personalized News Services. International Journal of E-Business Research 5(3), 35-53 (2009) 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

References (2) Goossen, F., IJntema, W., Frasincar. F., Hogenboom, F., Kaymak, U.: News Personalization using the CF-IDF Semantic Recommender. In: 1st International Conference on Web Intelligence, Mining and Semantics (WIMS 2011). ACM (2011) Guarino, N., Masolo, C., Vetere, G.: OntoSeek: Content-Based Access to the Web. IEEE Intelligent Systems 14(3), 70-80 (1999) Jensen, A.S., Boss, N.S.: Textual Similarity: Comparing Texts in order to Discover How Closely They Discuss the Same Topics. Bachelor’s thesis, Technical University of Denmark (2008) Krovetz, R.: Viewing Morphology as an Inference Process. In: 26th ACM Conference on Research and Development in Information Retrieval (SIGIR 1993). pp. 191-202. ACM (1993) 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

References (3) Middleton, S.E., Roure, D.D., Shadbolt, N.R.: Ontology-Based Recommender Systems. In: Handbook on Ontologies, pp. 577-498. International Handbooks on Information Systems, Springer (2004) Salton, G., Buckley, C.: Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management 24(5), 513-523 (1988) Yan, L., Li, C.: A Novel Semantic-based Text Representation Method for Improving Text Clustering. In: 3rd Indian International Conference on Artificial Intelligence (IICAI 2007). pp. 1738-1750 (2007) 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)