An Adaptive User Profile for Filtering News Based on a User Interest Hierarchy
Sarabdeep Singh, Michael Shepherd, Jack Duffy and Carolyn Watters

Presentation transcript:

An Adaptive User Profile for Filtering News Based on a User Interest Hierarchy
Sarabdeep Singh, Michael Shepherd, Jack Duffy and Carolyn Watters
Web Information Filtering Lab
Faculty of Computer Science
Dalhousie University

Overview
News Reading Behaviour
Related Research
Our Approach
Experiments
Results
Summary

News Reading Behaviour
Uses and Gratifications
– An example of extrinsically motivated behaviour, in that there is some reward to be gained by engaging in the activity
– Based on the assumption that the reader has some underlying goal, outside the reading itself, that reading the news satisfies
Ludic or Play theory of news reading
– An example of intrinsically motivated behaviour, in that the activity appears to be spontaneously initiated by the person in pursuit of no other goal than the activity itself
– This theory asserts that “... the process of news reading is intrinsically pleasurable, … a more casual, spontaneous, and unstructured form of news reading.”

Reading News is a Social Phenomenon
News has a social and context function, in that it provides the information necessary to participate fully as a citizen in the local, national, and international community
Several research projects have focused on fine-grained filtering of news articles
Results suggest that personal profiles need to be offset by community interests for ludic news reading behaviour

Knowledge Acquisition and Modeling
There are many systems for user modeling and news reading described in the literature
The key research issues in modeling ludic behaviour include:
– Implicit or explicit knowledge acquisition
– Long-term and/or short-term interests
– Drifting interests

Our Approach
This research does not filter in the sense of removing articles
Rather, it re-ranks the news articles, bringing articles “of interest” closer to the front of the queue without eliminating articles that may, serendipitously, be of interest to the user

Research Questions
Can we develop a system that learns a user profile?
Can the system adapt to changes in the user’s interests?

User’s Interest Hierarchy
[Figure: tree rooted at Profile, with categories at three levels (Category 1,1 to 1,3; Category 2,1 to 2,4; Category 3,1 onward) and nodes annotated with k, w]

Bigrams
A bigram consists of two words that occur in the same news article
A term may be part of many bigrams
The strength of the relationship between the terms of a bigram is based on the Augmented Expected Mutual Information
– The probability of both terms occurring in the same news article
– Less the probability of one term occurring without the other
Modified by a specificity function that acts like the inverse document frequency to counter the effect of a term occurring in many news articles
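A minimal Python sketch of the bigram weight described on this slide. The slides name Augmented Expected Mutual Information (AEMI) but do not give its formula, so the form used here (the co-occurrence term minus the two "one term without the other" terms) is an assumption. The specificity function is not defined in the slides and is left out. The function name `aemi` and the toy articles are hypothetical.

```python
import math

def aemi(articles, a, b):
    """AEMI of terms a and b over a list of articles (each a set of terms).

    Assumed form: MI(a, b) - MI(a, not b) - MI(not a, b), where each MI term
    is p(x, y) * log(p(x, y) / (p(x) * p(y))).
    """
    n = len(articles)

    def prob(condition):
        return sum(1 for terms in articles if condition(terms)) / n

    def mi_term(joint, p_x, p_y):
        # Treat the term as zero when the logarithm would be undefined.
        if joint == 0 or p_x == 0 or p_y == 0:
            return 0.0
        return joint * math.log(joint / (p_x * p_y))

    p_a = prob(lambda t: a in t)
    p_b = prob(lambda t: b in t)
    both = prob(lambda t: a in t and b in t)
    a_only = prob(lambda t: a in t and b not in t)
    b_only = prob(lambda t: a not in t and b in t)

    return (mi_term(both, p_a, p_b)
            - mi_term(a_only, p_a, 1 - p_b)
            - mi_term(b_only, 1 - p_a, p_b))

# Toy collection: "election" and "vote" always co-occur, "hockey" never joins them.
articles = [{"election", "vote"}, {"election", "vote"}, {"hockey"}, {"hockey"}]
print(aemi(articles, "election", "vote"))    # positive
print(aemi(articles, "election", "hockey"))  # negative
```

Terms that appear together score positively, while terms that tend to appear apart are penalized, which matches the two sub-points on the slide.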

Bigram Graph
[Figure: graph whose nodes are the terms T1 through T12]

Removal of Edges with weight < 0.35
[Figure: the graph over terms T1 through T12 after the low-weight edges are removed]
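The pruning step can be sketched in Python as follows. The 0.35 threshold comes from the slide. Treating each connected group of surviving terms as a candidate interest category is an assumption, since the slides do not say how categories are read off the pruned graph. The function name `prune_and_group` and the edge weights are hypothetical.

```python
from collections import defaultdict

def prune_and_group(edges, threshold=0.35):
    """Drop edges with weight < threshold, then return connected groups of terms.

    edges maps a (term, term) pair to its bigram weight. Terms left with no
    surviving edge are omitted from the result.
    """
    neighbours = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over the surviving edges.
        stack, group = [start], set()
        while stack:
            term = stack.pop()
            if term in group:
                continue
            group.add(term)
            stack.extend(neighbours[term] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical weights; the T5-T6 edge falls below the threshold and is removed.
edges = {("T1", "T2"): 0.60, ("T2", "T5"): 0.40, ("T5", "T6"): 0.20, ("T6", "T3"): 0.50}
groups = prune_and_group(edges)
print(groups)
```

In this toy graph, removing the single weak edge splits the terms into two groups.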

Topics of Interest
[Figure: the interest hierarchy tree rooted at Profile, with categories at three levels (Category 1,1 to 1,3; Category 2,1 to 2,4; Category 3,1 onward) and nodes annotated with k, w]

Process
[Flow diagram; box labels in reading order]
User evaluates 100 news articles in order to initialize profile
100 news articles
Order news articles by profile
Explicit feedback from user
Update profile
User profile

Initialize the Profile
Create a bigram graph from 100 news articles, with keyword weights = 0
Ask the user to rate these news articles as being either “of interest” or “not of interest”
Initialize the weights of the keywords in the graph based on the user evaluations

Adapting the Weights in the Interest Hierarchy
For each article, i, in which term k occurs, the weight in the profile associated with k is modified as follows:
[Equation not preserved in this transcript]
where a_i is the learning rate associated with article i, given as (-0.9, +0.9), and the two other quantities in the equation are the weight of term k in the profile and the weight of term k in the term vector representing news article i.
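The update equation itself is missing from this transcript. The Python sketch below therefore assumes a simple additive rule: new profile weight = old profile weight + a_i × (weight of term k in article i), with a_i = +0.9 for an article rated “of interest” and -0.9 otherwise. Both the additive form and this reading of (-0.9, +0.9) are assumptions, not the authors' confirmed formula. The function name `update_profile` and the sample weights are hypothetical.

```python
def update_profile(profile, article_vector, of_interest, rate=0.9):
    """Return a new profile after one rated article (assumed additive rule).

    profile and article_vector map terms to weights.
    """
    a_i = rate if of_interest else -rate
    updated = dict(profile)
    for term, article_weight in article_vector.items():
        if term in updated:  # only terms already in the profile are adapted
            updated[term] += a_i * article_weight
    return updated

profile = {"election": 1.0, "hockey": 1.0}
# An article rated "of interest" raises the weights of its terms ...
profile = update_profile(profile, {"election": 0.5, "budget": 0.2}, of_interest=True)
# ... and an article rated "not of interest" lowers them.
profile = update_profile(profile, {"hockey": 0.5}, of_interest=False)
print(profile)
```

Under this assumed rule, repeated negative feedback steadily pushes a term's weight down, which is one way a profile could follow drifting interests.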

Ordering the News Articles According to the Profile
Each leaf category in the profile is represented as a vector of weighted terms
Each news article is represented as a weighted term vector, where the weights are tf.idf
The cosine similarity is calculated between an article and every leaf category in the profile
The average of these similarity measures is then taken to be the closeness of that news article to the user’s profile
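A short Python sketch of this ranking step, assuming the tf.idf weights have already been computed and that vectors are stored as term-to-weight dictionaries. The function names and sample data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def closeness(article, leaf_categories):
    """Average cosine similarity between an article and every leaf category."""
    return sum(cosine(article, leaf) for leaf in leaf_categories) / len(leaf_categories)

def rank_articles(articles, leaf_categories):
    """Article names ordered from closest to the profile to furthest."""
    return sorted(articles,
                  key=lambda name: closeness(articles[name], leaf_categories),
                  reverse=True)

leaf_categories = [{"election": 1.0, "vote": 1.0}, {"hockey": 1.0}]
articles = {
    "a1": {"election": 0.8, "vote": 0.6},
    "a2": {"weather": 1.0},
    "a3": {"hockey": 1.0, "vote": 1.0},
}
ranking = rank_articles(articles, leaf_categories)
print(ranking)
```

No article is dropped: one with no overlap with the profile simply sinks to the end of the queue, in line with the re-ranking approach.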

Note
Profile categories are not developed from individual articles. Rather, they are developed from categories of user interests developed from the bigram graph.
As the terms from an article may occur in several different categories, the news articles themselves are not associated with a particular category, but are distributed over multiple categories in the profile.

Updating the Profile
[Diagram; box labels in reading order]
Existing User Profile
Bigram of newest 100 news articles with user feedback
Merge

Merging
For each leaf category in the new hierarchy
    Calculate cosine similarity with each leaf category in existing profile
    Find profile leaf category with max similarity
    If max similarity > Threshold
        Merge new leaf category with profile leaf category
    Else
        Create new leaf category in profile with leaf category from new hierarchy
    Endif
Endfor
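The merging pseudocode can be sketched in Python as follows. The slides give neither the threshold value nor how two leaf categories are combined, so the threshold is left as a parameter and combining by summing term weights is an assumption. The function names and sample categories are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def merge_hierarchies(profile_leaves, new_leaves, threshold):
    """Merge the leaf categories of a new hierarchy into the existing profile."""
    merged = [dict(leaf) for leaf in profile_leaves]
    for new_leaf in new_leaves:
        # Similarity is measured against the leaves of the existing profile.
        sims = [cosine(new_leaf, leaf) for leaf in profile_leaves]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] > threshold:
            # Assumed merge rule: add the new weights to the matched leaf.
            for term, weight in new_leaf.items():
                merged[best][term] = merged[best].get(term, 0.0) + weight
        else:
            merged.append(dict(new_leaf))
    return merged

profile_leaves = [{"election": 1.0, "vote": 1.0}]
new_leaves = [{"election": 1.0, "poll": 1.0}, {"hockey": 1.0}]
result = merge_hierarchies(profile_leaves, new_leaves, threshold=0.4)
print(result)
```

Here the first new category is similar enough to the existing one to be merged into it, while the second becomes a new leaf category in the profile.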

Experiments
3 users with static user interests
Each initialized a profile on those interests
Each then iterated through 5 sets of 100 news articles, evaluating based on these static interests
Each then created a new set of user interests and iterated through another 5 sets of 100 news articles, evaluating based on the new set of user interests

Processing and Measurement
After each set of 100 news articles was evaluated, the Normalized Recall was determined for that set of 100
Normalized Recall measures how close the system was to being perfect, i.e., to ranking all the articles “of interest” before all the articles evaluated as “not of interest”
These 100 news articles and their evaluations were then used to update the user profile

Assume 5 news articles out of 10 are “of interest”
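A Python sketch of Normalized Recall for a case like this one, assuming the slides use the standard definition: R_norm = 1 - (sum of the ranks of the relevant articles - sum of their ideal ranks) / (n × (N - n)), where n is the number of relevant articles and N the total. The slides do not state the formula, and the three rankings below are hypothetical.

```python
def normalized_recall(ranked_relevance):
    """Normalized Recall for a ranked list of booleans (True = "of interest")."""
    total = len(ranked_relevance)
    ranks = [rank for rank, relevant in enumerate(ranked_relevance, start=1) if relevant]
    n = len(ranks)
    if n == 0 or n == total:
        raise ValueError("need at least one relevant and one non-relevant article")
    ideal = n * (n + 1) // 2  # ranks 1..n, i.e. all relevant articles first
    return 1 - (sum(ranks) - ideal) / (n * (total - n))

# 5 of 10 articles are "of interest" in each ranking.
perfect = [True] * 5 + [False] * 5
worst = [False] * 5 + [True] * 5
alternating = [True, False] * 5

print(normalized_recall(perfect))      # 1.0
print(normalized_recall(worst))        # 0.0
print(normalized_recall(alternating))  # 0.6
```

A perfect ordering scores 1, the reverse ordering scores 0, and other orderings fall in between.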

Normalized Recall – User 1
[Table: one row per set (5 sets); columns are Number Relevant, Random R Norm, and R Norm after the training set, training + set 1, training + sets 1-2, training + sets 1-3, and training + sets 1-4; values not preserved in this transcript]

All Users over all Sets

Summary Results
There were significant differences among the users
The system did learn for all users, but not equally
The system stopped learning after 3 iterations on the first set of trials
The system did adapt to the changed profiles
The system appears to be sensitive to the amount of positive feedback (“of interest”) when learning a new set of interests

Conclusions and Discussion
The system did learn the users’ interests and did adapt to changes in interests
Although there were only 3 users, the results are significant for these users, as there were 1000 data points for each user
The results cannot be generalized to other users

Future Research
Larger study with more users and dynamic news feeds
Fine-grained learning rate based on a Likert scale of user evaluations
Collaborative interest hierarchy