1
An Adaptive User Profile for Filtering News Based on a User Interest Hierarchy Sarabdeep Singh, Michael Shepherd, Jack Duffy and Carolyn Watters Web Information Filtering Lab Faculty of Computer Science Dalhousie University
2
Overview News Reading Behaviour Related Research Our Approach Experiments Results Summary
3
News Reading Behaviour
Uses and Gratifications
–An example of extrinsically motivated behaviour, in that there is some reward to be gained by engaging in the activity.
–Based on the assumption that the reader has some underlying goal, outside the reading itself, that reading the news satisfies.
Ludic, or Play, theory of news reading
–An example of intrinsically motivated behaviour, in that the activity appears to be spontaneously initiated by the person in pursuit of no goal other than the activity itself.
–This theory asserts that, “... the process of news reading is intrinsically pleasurable, … a more casual, spontaneous, and unstructured form of news reading.”
4
Reading News is a Social Phenomenon
–News has a social and contextual function, in that it provides the information necessary to participate fully as a citizen in the local, national, and international community.
–Several research projects have focused on fine-grained filtering of news articles.
–Results suggest that personal profiles need to be offset by community interests for ludic news reading behaviour.
5
Knowledge Acquisition and Modeling
There are many systems for user modeling and news reading described in the literature.
The key research issues in modeling for ludic behaviour include:
–Implicit or explicit knowledge acquisition
–Long-term and/or short-term interests
–Drifting interests
6
Our Approach
This research does not filter in the sense of removing articles.
Rather, it re-ranks the news articles, bringing articles “of interest” closer to the front of the queue without eliminating articles that may, serendipitously, be of interest to the user.
7
Research Questions
–Can we develop a system that learns a user profile?
–Can the system adapt to changes in the user’s interests?
8
User’s Interest Hierarchy
[Diagram: a profile tree of interest categories at three levels (Category 1,1 … Category 3,2); each leaf category holds (keyword, weight) pairs (k, w).]
9
Bigrams
–A bigram consists of two words that occur in the same news article.
–A term may be part of many bigrams.
–The strength of the relationship between the terms of a bigram is based on the Augmented Expected Mutual Information:
–the probability of both terms occurring in the same news article,
–less the probability of one term occurring without the other.
–This is modified by a specificity function that acts like the inverse document frequency, to counter the effect of a term occurring in many news articles.
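The slides describe the bigram strength only informally, so the following is a minimal sketch: a mutual-information-style score that rewards co-occurrence, penalizes occurrences of one term without the other, and is damped by an IDF-like specificity factor. The exact AEMI formula and specificity function used in the paper are not given here, so both are assumptions.

```python
import math

def aemi(df_a, df_b, df_ab, n_docs):
    """AEMI-style bigram strength (sketch, not the paper's exact formula).

    df_a, df_b: document frequencies of the two terms
    df_ab: number of articles containing both terms
    n_docs: total number of news articles
    """
    p_a, p_b, p_ab = df_a / n_docs, df_b / n_docs, df_ab / n_docs
    p_a_not_b = (df_a - df_ab) / n_docs
    p_b_not_a = (df_b - df_ab) / n_docs

    def mi(p_xy, p_x, p_y):
        # pointwise-MI term, weighted by the joint probability
        if p_xy == 0 or p_x == 0 or p_y == 0:
            return 0.0
        return p_xy * math.log(p_xy / (p_x * p_y))

    # co-occurrence evidence, minus "one without the other" evidence
    score = mi(p_ab, p_a, p_b) - mi(p_a_not_b, p_a, 1 - p_b) - mi(p_b_not_a, 1 - p_a, p_b)
    # specificity: IDF-like damping for terms that occur in many articles (assumed form)
    specificity = math.log(n_docs / df_a) * math.log(n_docs / df_b)
    return score * specificity
```

Under this sketch, a pair that always co-occurs scores higher than a pair that co-occurs only by chance.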
10
Bigram Graph
[Diagram: terms T1–T12 as nodes, connected by edges weighted with AEMI scores in the range 0.1–0.87.]
11
Removal of Edges with weight < 0.35
[Diagram: the same bigram graph over T1–T12 after pruning; only edges with weight ≥ 0.35 remain.]
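The pruning step on this slide can be sketched directly; treating each connected component of the pruned graph as a candidate interest category is an assumption about how the hierarchy is then formed, since the slides do not spell that step out.

```python
def prune_edges(edges, threshold=0.35):
    """Drop bigram edges weaker than the threshold (0.35 on the slide)."""
    return {pair: w for pair, w in edges.items() if w >= threshold}

def connected_components(edges):
    """Group terms left connected after pruning (assumed to seed categories)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            comp.add(n)
            stack.extend(adj[n] - seen)
        comps.append(comp)
    return comps
```

Pruning a toy graph with one weak edge splits it into separate components, one per surviving cluster of terms.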
12
Topics of Interest
[Diagram: the profile hierarchy built from the pruned bigram graph (Category 1,1 … Category 3,2); each leaf category holds (keyword, weight) pairs (k, w).]
13
Process
1. The user evaluates 100 news articles in order to initialize the profile.
2. For each subsequent set of 100 news articles:
–Order the news articles by the user profile.
–Collect explicit feedback from the user.
–Update the user profile.
14
Initialize the Profile
–Create the bigram graph from 100 news articles, with keyword weights = 0.
–Ask the user to rate these news articles as either “of interest” or “not of interest”.
–Initialize the weights of the keywords in the graph based on the user’s evaluations.
16
Adapting the Weights in the Interest Hierarchy
For each article i in which term k occurs, the weight in the profile associated with k is modified as follows:

    w_k ← w_k + a_i · d_k,i

where a_i is the learning rate associated with article i, in the range (-0.9, +0.9); w_k is the weight of term k in the profile; and d_k,i is the weight of term k in the term vector representing news article i.
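A minimal sketch of this additive update, assuming the rule is w_k ← w_k + a_i · d_k,i applied to every term of the article, and assuming a_i is positive for "of interest" feedback and negative for "not of interest":

```python
def update_profile(profile, article_vec, a_i):
    """Apply w_k <- w_k + a_i * d_ki for every term k in article i.

    profile: dict mapping term -> profile weight w_k
    article_vec: dict mapping term -> weight d_ki in article i's term vector
    a_i: learning rate in (-0.9, +0.9); sign convention is an assumption
    """
    for k, d_ki in article_vec.items():
        profile[k] = profile.get(k, 0.0) + a_i * d_ki
    return profile
```

For example, with a_i = 0.9, a term of weight 0.5 that appears in the article with weight 0.2 moves to 0.68.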
17
Ordering the News Articles According to the Profile
–Each leaf category in the profile is represented as a vector of weighted terms.
–Each news article is represented as a weighted term vector, where the weights are tf·idf.
–The cosine similarity is calculated between an article and every leaf category in the profile.
–The average of these similarity measures is then taken as the closeness of that news article to the user’s profile.
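The ordering step above can be sketched as follows. Sparse term vectors are represented as dicts; how the tf·idf weights themselves are computed is left out.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def closeness(article_vec, leaf_categories):
    """Average cosine similarity between one article and all profile leaves."""
    return sum(cosine(article_vec, leaf) for leaf in leaf_categories) / len(leaf_categories)

def rank_articles(articles, leaf_categories):
    """Re-rank rather than filter: the most profile-like articles come first."""
    return sorted(articles, key=lambda a: closeness(a, leaf_categories), reverse=True)
```

Note that no article is ever dropped; a low-closeness article simply moves toward the back of the queue.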
18
Note
Profile categories are not developed from individual articles. Rather, they are developed from categories of user interests derived from the bigram graph. As the terms from an article may occur in several different categories, the news articles themselves are not associated with a particular category, but are distributed over multiple categories in the profile.
19
Updating the Profile
Merge the existing user profile with the bigram hierarchy built from the newest 100 news articles (with user feedback).
20
Merging
For each leaf category in the new hierarchy:
    Calculate the cosine similarity with each leaf category in the existing profile
    Find the profile leaf category with maximum similarity
    If maximum similarity > threshold:
        Merge the new leaf category with that profile leaf category
    Else:
        Create a new leaf category in the profile from the new hierarchy's leaf category
    Endif
Endfor
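The merging pseudocode above can be sketched as follows. The threshold value and the choice to merge by summing term weights are both assumptions; the slides give neither.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def merge_hierarchies(profile_leaves, new_leaves, threshold=0.5):
    """Merge each new leaf into its most similar profile leaf, else add it.

    threshold=0.5 is illustrative only; merging = summing term weights
    is likewise an assumed interpretation of "merge".
    """
    for new_leaf in new_leaves:
        sims = [cosine(new_leaf, p) for p in profile_leaves]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] > threshold:
            for t, w in new_leaf.items():
                profile_leaves[best][t] = profile_leaves[best].get(t, 0.0) + w
        else:
            profile_leaves.append(dict(new_leaf))
    return profile_leaves
```

A new leaf that overlaps an existing category folds into it; a leaf about wholly new terms becomes a new category, which is how drifting interests enter the profile.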
21
Experiments
–3 users with static user interests.
–Each initialized a profile on those interests.
–Each then iterated through 5 sets of 100 news articles, evaluating based on these static interests.
–Each then created a new set of user interests and iterated through another 5 sets of 100 news articles, evaluating based on the new set of user interests.
22
Processing and Measurement
–After each set of 100 news articles was evaluated, the Normalized Recall was determined for that set of 100.
–Normalized Recall measures how close the system came to a perfect ordering, i.e., one in which all the articles “of interest” are ranked before all the articles evaluated as “not of interest”.
–These 100 news articles and their evaluations were then used to update the user profile.
23
Assume 5 news articles out of 10 are “of interest”
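Normalized Recall can be sketched with the standard formula R_norm = 1 − (Σ ranks of relevant − Σ ideal ranks) / (n·(N−n)); using the example above of 5 articles “of interest” out of 10, a perfect ranking scores 1.0 and the worst possible ranking scores 0.0.

```python
def normalized_recall(ranking, relevant):
    """Normalized Recall: 1.0 when all relevant articles are ranked first.

    ranking: ordered list of article ids, best-ranked first
    relevant: set of ids the user evaluated as "of interest"
    """
    n, big_n = len(relevant), len(ranking)
    ranks = [i + 1 for i, a in enumerate(ranking) if a in relevant]
    ideal = sum(range(1, n + 1))  # relevant articles at ranks 1..n
    return 1.0 - (sum(ranks) - ideal) / (n * (big_n - n))
```

With 5 relevant articles at ranks 1-5 of 10 this gives 1.0; at ranks 6-10 it gives 0.0, and a random ordering averages about 0.5, matching the "Random" column in the table that follows.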
24
Normalized Recall – User 1

| Set   | Number Relevant | Random R_norm | After Training Set | Training + set 1 | Training + sets 1-2 | Training + sets 1-3 | Training + sets 1-4 |
|-------|-----------------|---------------|--------------------|------------------|---------------------|---------------------|---------------------|
| Set 1 | 21              | 0.556         | 0.796              |                  |                     |                     |                     |
| Set 2 | 19              | 0.559         | 0.799              | 0.862            |                     |                     |                     |
| Set 3 | 22              | 0.560         | 0.760              | 0.788            | 0.810               |                     |                     |
| Set 4 | 17              | 0.500         | 0.694              | 0.756            | 0.795               | 0.802               |                     |
| Set 5 | 15              | 0.438         | 0.758              | 0.841            | 0.864               | 0.865               | 0.875               |
25
All Users over all Sets
[Chart: Normalized Recall for all users over all sets.]
26
Summary Results
–There were significant differences among the users.
–The system did learn for all users, but not equally.
–The system stopped learning after 3 iterations on the first set of trials.
–The system did adapt to the changed profiles.
–The system appears to be sensitive to the amount of positive feedback (“of interest”) when learning a new set of interests.
27
Conclusions and Discussion
–The system did learn the users’ interests and did adapt to changes in those interests.
–Although there were only 3 users, the results are significant for these users, as there were 1000 data points for each user.
–The results cannot be generalized to other users.
28
Future Research
–A larger study with more users and dynamic news feeds.
–A fine-grained learning rate based on a Likert scale of user evaluations.
–A collaborative interest hierarchy.