News Personalization using the CF-IDF Semantic Recommender International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011) May 25, 2011.

Slides:



Advertisements
Similar presentations
A Comparison Study for Novelty Control Mechanisms Applied to Web News Stories 2012 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2012)
Advertisements

A Vector Space Model for Automatic Indexing
Polarity Analysis of Texts using Discourse Structure CIKM 2011 Bas Heerschop Erasmus University Rotterdam Frank Goossen Erasmus.
Improved TF-IDF Ranker
Learning Semantic Information Extraction Rules from News The Dutch-Belgian Database Day 2013 (DBDBD 2013) Frederik Hogenboom Erasmus.
Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle
A Linguistic Approach for Semantic Web Service Discovery International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) July 13, 2012 Jordy.
Hermes: News Personalization Using Semantic Web Technologies
Exploiting Discourse Structure for Sentiment Analysis of Text OR 2013 Alexander Hogenboom In collaboration with Flavius Frasincar, Uzay Kaymak, and Franciska.
Connecting Customer Relationship Management Systems to Social Networks 7th International Conference on Knowledge Management, Services, and Cloud Computing.
Determining Negation Scope and Strength in Sentiment Analysis SMC 2011 Paul van Iterson Erasmus School of Economics Erasmus University Rotterdam
GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.
Query Operations: Automatic Local Analysis. Introduction Difficulty of formulating user queries –Insufficient knowledge of the collection –Insufficient.
WebMiningResearch ASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007.
RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms BNAIC 2009 Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak Erasmus.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
March 17, 2008SAC WT Hermes: a Semantic Web-Based News Decision Support System* Flavius Frasincar Erasmus University Rotterdam.
Automatically Annotating Web Pages Using Google Rich Snippets 11th Dutch-Belgian Information Retrieval Workshop (DIR 2011) February 4, 2011 Frederik Hogenboom.
WebMiningResearchASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007 Revised.
Optimizing RDF Chain Queries using Genetic Algorithms DBDBD 2010 Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak Erasmus University.
Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011) September.
An Overview of Event Extraction from Text Workhop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE'11) October 23,
A Survey of Approaches on Mining the Structure from Unstructured Data Dutch-Belgian Database Day 2009 (DBDBD 2009) 1 Nov. 30, 2009 Frederik Hogenboom
Recommender systems Ram Akella November 26 th 2008.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
Analyzing Sentiment in a Large Set of Web Data while Accounting for Negation AWIC 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam.
Chapter 5: Information Retrieval and Web Search
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Sentiment Analysis with a Multilingual Pipeline 12th International Conference on Web Information System Engineering (WISE 2011) October 13, 2011 Daniëlla.
SIGIR’09 Boston 1 Entropy-biased Models for Query Representation on the Click Graph Hongbo Deng, Irwin King and Michael R. Lyu Department of Computer Science.
Erasmus University Rotterdam Introduction Nowadays, emerging news on economic events such as acquisitions has a substantial impact on the financial markets.
Erasmus University Rotterdam Introduction With the vast amount of information available on the Web, there is an increasing need to structure Web data in.
A News-Based Approach for Computing Historical Value-at-Risk International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) Frederik Hogenboom.
Temporal Event Map Construction For Event Search Qing Li Department of Computer Science City University of Hong Kong.
Citation Recommendation 1 Web Technology Laboratory Ferdowsi University of Mashhad.
Learning Object Metadata Mining Masoud Makrehchi Supervisor: Prof. Mohamed Kamel.
Ontology Updating Driven by Events Dutch-Belgian Database Day 2012 (DBDBD 2012) November 21, 2012 Frederik Hogenboom Jordy Sangers.
A Hybrid Recommender System: User Profiling from Keywords and Ratings Ana Stanescu, Swapnil Nagar, Doina Caragea 2013 IEEE/WIC/ACM International Conferences.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Modern Information Retrieval: A Brief Overview By Amit Singhal Ranjan Dash.
1 A Semantic Web-Based Approach for Personalizing News Flavius Frasincar Erasmus University Rotterdam * Joint work with Kim Schouten,
Similar Document Search and Recommendation Vidhya Govindaraju, Krishnan Ramanathan HP Labs, Bangalore, India JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Chapter 6: Information Retrieval and Web Search
You Are What You Tag Yi-Ching Huang and Chia-Chuan Hung and Jane Yung-jen Hsu Department of Computer Science and Information Engineering Graduate Institute.
University of Malta CSA3080: Lecture 6 © Chris Staff 1 of 20 CSA3080: Adaptive Hypertext Systems I Dr. Christopher Staff Department.
*Erasmus University Rotterdam P.O. Box 1738, NL-3000 DR Rotterdam, the Netherlands † Teezir BV Wilhelminapark 46, NL-3581 NL, Utrecht, the Netherlands.
Semantics-Based News Recommendation with SF-IDF+ International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013) June 13, 2013 Marnix Moerland.
Erasmus University Rotterdam Introduction Content-based news recommendation is traditionally performed using the cosine similarity and TF-IDF weighting.
Towards Cross-Language Sentiment Analysis through Universal Star Ratings KMO 2012 Malissa Bal Erasmus University Rotterdam Flavius.
Lexico-semantic Patterns for Information Extraction from Text The International Conference on Operations Research 2013 (OR 2013) Frederik Hogenboom
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Using Game Reviews to Recommend Games Michael Meidl, Steven Lytinen DePaul University School of Computing, Chicago IL Kevin Raison Chatsubo Labs, Seattle.
Semantics-Based News Recommendation International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012) June 14, 2012 Michel Capelle
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
1 A Medical Information Management System Using the Semantic Web Technology Networked Computing and Advanced INFORMATION MANAGEMENT, NCM '08. Fourth.
Hybrid Content and Tag-based Profiles for recommendation in Collaborative Tagging Systems Latin American Web Conference IEEE Computer Society, 2008 Presenter:
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
An Adaptive User Profile for Filtering News Based on a User Interest Hierarchy Sarabdeep Singh, Michael Shepherd, Jack Duffy and Carolyn Watters Web Information.
1 Text Categorization  Assigning documents to a fixed set of categories  Applications:  Web pages  Recommending pages  Yahoo-like classification hierarchies.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Microsoft Research Cambridge,
Queensland University of Technology
Data-Driven Educational Data Mining ---- the Progress of Project
Linguistic Graph Similarity for News Sentence Searching
Bing-SF-IDF+: A Hybrid Semantics-Driven News Recommender
Multimedia Information Retrieval
News Recommendation with CF-IDF+
Presented by: Prof. Ali Jaoua
Semantic Similarity Methods in WordNet and their Application to Information Retrieval on the Web Yizhe Ge.
Presentation transcript:

News Personalization using the CF-IDF Semantic Recommender International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011) May 25, 2011 Frank Goossen Wouter IJntema Flavius Frasincar Frederik Hogenboom Uzay Kaymak Erasmus University Rotterdam PO Box 1738, NL-3000 DR Rotterdam, the Netherlands

Introduction (1) Recommender systems help users to plough through a massive and increasing amount of information Recommender systems: –Content-based –Collaborative filtering –Hybrid Content-based systems are often term-based Common measure: Term Frequency – Inverse Document Frequency (TF-IDF) as proposed by Salton and Buckley [1988] International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Introduction (2) TF-IDF steps: –Filter stop words from document –Stem remaining words to their roots –Calculate term frequency (i.e., the importance of a term or word within a document) –Calculate inverse document frequency (i.e., the inverse of the general importance of a term in a set of documents) –Multiply term frequency with the inverse document frequency TF-IDF performance tends to decrease as documents get larger International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Introduction (3) The Semantic Web offers new possibilities Utilizing concepts instead of terms: –Reduces noise caused by non-meaningful terms –Yields less terms to evaluate –Allows for semantic features, e.g., synonyms Therefore, we propose Concept Frequency – Inverse Document Frequency (CF-IDF) CF-IDF is implemented in Athena (an extension for Hermes [Frasincar et al., 2009], a news processing framework) Results are evaluated in comparison with TF-IDF International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Introduction (4) Earlier work has been done: –CF-IDF-like methods: Baziz et al. [2005], Yan and Li [2007] –Frameworks: OntoSeek [Guarino et al., 1999], Quickstep [Middleton et al., 2004], [Cantador et al., 2008] Although some work shows overlap: –Methods are not thoroughly compared with TF-IDF –Often, WSD and synonym handling is lacking International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Outline TF-IDF CF-IDF Recommendations Implementation: –Hermes –Athena Evaluation Conclusions International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

TF-IDF Term Frequency: the occurrence of a term t i in a document d j, i.e., Inverse Document Frequency: the occurrence of a term t i in a set of documents D, i.e., And hence International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

CF-IDF Concept Frequency: the occurrence of a concept c i in a document d j, i.e., Inverse Document Frequency: the occurrence of a concept c i in a set of documents D, i.e., And hence International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Recommendations Ontology contains a set of concepts and relations User profile consists of (a subset of) these concepts and relations Each concept and relation is associated with all news articles Each article is represented as: –TF-IDF: a set containing all terms –CF-IDF: a set containing all concepts Then, for each article, weights are calculated Weights of a new article are compared to the user profile using cosine similarity International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Implementation: Hermes Hermes framework is utilized for building a news personalization service Its implementation Hermes News Portal (HNP): –Is ontology-based –Is programmed in Java –Uses OWL / SPARQL / Jena / GATE / WordNet Input: RSS feeds of news items Internal processing: –Classification –News querying Output: news items International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Implementation: Athena (1) Athena is a plug-in for HNP Main focus is on recommendation support User profiles are constructed TF-IDF (using a stemmer as proposed in [Krovetz, 1993]) and CF-IDF recommendation calculations can be performed International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Implementation: Athena (2) Interface: –News browser –Recommendations –Evaluation International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Implementation: Athena (3) International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Implementation: Athena (4) International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Evaluation (1) Experiment: –We let 19 participants evaluate 100 news items –User profile: all articles that are related to Microsoft, its products, and its competitors –Athena computes TF-IDF and CF-IDF and determines interestingness using several cutoff values –Measurements: Accuracy Precision Recall Specificity F 1 -measure Kappa statistic Receiver Operating Characteristic (ROC) curves t-tests for determining significance International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Evaluation (2) Results: –CF-IDF performs significantly better than TF-IDF for accuracy (+4.7%), recall (+24.4%), and F 1 (+21.9%) for threshold 0.5 –Precision and specificity are not significantly different International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Evaluation (3) International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Evaluation (4) International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Conclusions CF-IDF outperforms TF-IDF significantly for many measures: accuracy, recall, F 1, Kappa, and ROC (AUC) Hence, using key concepts and semantics instead of analyzing all terms could be beneficial for recommender systems Future work: –Use different stemmers for TF-IDF –Investigate and compare with TF-IDF variants that account for some limitations (e.g., Okapi BM25) –Implement various concept relationship types International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

Questions International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

References (1) Baziz, M., Boughanem, M., Traboulsi, S.: A Concept- Based Approach for Indexing Documents in IR. In: Actes du XXIIIème Congrès Informatique des Organisations et Systèmes d'Information et de Décision (INFORSID 2005). pp HERMES Science Publications (2005) Cantador, I., Bellogín, A., Castells, P.: A Semantic Web Approach to Recommending News. In: 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (AH 2008). pp Springer-Verlag, Berlin, Heidelberg (2008) International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

References (2) Frasincar, F., Borsje, J., Levering, L.: A Semantic Web-Based Approach for Building Personalized News Services. International Journal of E-Business Research 5(3), (2009) Guarino, N., Masolo, C., Vetere, G.: OntoSeek: Content-Based Access to the Web. IEEE Intelligent Systems 14(3), (1999) Krovetz, R.: Viewing Morphology as an Inference Process. In: 26th ACM Conference on Research and Development in Information Retrieval (SIGIR 1993). pp ACM (1993) International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)

References (3) Middleton, S.E., Roure, D.D., Shadbolt, N.R.: Ontology-Based Recommender Systems. In: Handbook on Ontologies, pp International Handbooks on Information Systems, Springer (2004) Salton, G., Buckley, C.: Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management 24(5), (1988) Yan, L., Li, C.: A Novel Semantic-based Text Representation Method for Improving Text Clustering. In: 3rd Indian International Conference on Artificial Intelligence (IICAI 2007). pp (2007) International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011)