1
Combining Content-Based and Collaborative Filters in an Online Newspaper
Mark Claypool, Anuja Gokhale, Tim Miranda, Pavel Murnikov, Dmitry Netes and Matthew Sartin
Computer Science Department
Worcester Polytechnic Institute
2
Outline
- Introduction
- Approach
- System
- Experiments
- Conclusions
3
Information Overload
- Newspapers
  - 1/2 dozen delivered daily
  - 2500 daily via Web
  - Thousands of articles
  - Personalization
    - (Bogart, 1989)
- Filters
  - Usenet news: GroupLens, NewsWeeder, PHOAKS
  - Aggregate: CRAYON, Fishwrap, My Yahoo!
  - Layout: Krakatoa Chronicle
Callouts: Need Filters! Quantity! Quality?
4
Information Filtering
- Apply power of computers to filtering
- How do we filter information?
  - Get recommendation from friend
  - Most popular newspapers
- Opinions
  - Peers
  - Aggregate opinions
- Collaborative Filters
5
Collaborative Filtering Problems
- Early Rater Problem
- Sparsity Problem
- “Gray Sheep” Problem
- Changing Interests Problem
6
Information Filtering
- How else do we filter information?
  - Skimming the newspaper
  - Picking newspaper section
  - Reading byline
- Item characteristics
  - Like sports
  - Hate field hockey
  - Like reporter
- Content-Based Filters
7
Other Approaches
- ProfBuilder
  - (Wasfi, 1999)
- GroupLens
  - (Sarwar, Konstan et al, 1998)
- Basu, Hirsh and Cohen
  - (1998)
- Fab
  - (Balabanovic and Shoham, 1997)
8
Research Approach
- Combine Collaborative and Content-Based using Weighted Average
- Per-User Weights
  - address “gray sheep” problem
- Per-Item Weights
  - address “early rater” problem
- Other benefits
  - realize individual algorithm improvements
  - extensible
  - hierarchical
9
Collaborative Filtering Algorithm
$$U_x = \bar{U} + \frac{\sum_{J} (J_x - \bar{J})\, r_{uj}}{\sum_{J} |r_{uj}|}$$
- Article x
- User U
- Pearson’s Correlation $r_{uj}$
- (GroupLens, 1994)
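A minimal Python sketch of this kind of prediction may help make the rule concrete. It assumes the GroupLens-style reading of the slide: the prediction for user U on article x is U's mean rating plus the correlation-weighted average of how far each other user J rated x above or below J's own mean. The data layout (a dict of per-user rating dicts) and the names `pearson` and `predict` are illustrative assumptions, not taken from P-Tango.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two users over the articles both rated.

    `a` and `b` map article id -> rating. Returns 0.0 when the correlation
    is undefined (fewer than two shared articles, or no variance).
    """
    common = [x for x in a if x in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[x] for x in common) / len(common)
    mean_b = sum(b[x] for x in common) / len(common)
    cov = sum((a[x] - mean_a) * (b[x] - mean_b) for x in common)
    var_a = sum((a[x] - mean_a) ** 2 for x in common)
    var_b = sum((b[x] - mean_b) ** 2 for x in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, article):
    """Predict `user`'s rating for `article`.

    Assumes `user` has at least one rating. Falls back to the user's own
    mean when no correlated user has rated the article.
    """
    own = ratings[user]
    mean_u = sum(own.values()) / len(own)
    num = 0.0
    den = 0.0
    for other, theirs in ratings.items():
        if other == user or article not in theirs:
            continue
        r = pearson(own, theirs)
        mean_j = sum(theirs.values()) / len(theirs)  # J's mean over all of J's ratings
        num += (theirs[article] - mean_j) * r
        den += abs(r)
    if den == 0:
        return mean_u
    return mean_u + num / den
```

The fallback to the user's mean is one possible way to handle an article nobody comparable has rated yet, which is the "early rater" situation named earlier in the talk.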
10
Content-Based Filter
- Selectable sections
  - business, sports, entertainment …
- Keywords
  - explicit
  - implicit
  - article: stop list, word stems, top 50%
- Keyword match by Overlap Coefficient
  $$M = \frac{2\,|A \cap P|}{\min(|A|, |P|)}$$
- Combine via weighted average (1/3 each)
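The keyword match and the equal-weight average can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: A is taken to be the set of article keywords and P the set of profile keywords, the factor of 2 is kept as it appears on the slide (so the score can exceed 1), and the three averaged components are assumed to be the section match, the explicit-keyword match and the implicit-keyword match. The function names are hypothetical.

```python
def overlap_match(article_keywords, profile_keywords):
    """Overlap coefficient as written on the slide: 2*|A & P| / min(|A|, |P|)."""
    a = set(article_keywords)
    p = set(profile_keywords)
    if not a or not p:
        return 0.0  # nothing to match against
    return 2 * len(a & p) / min(len(a), len(p))


def content_score(section_match, explicit_match, implicit_match,
                  weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three content signals (1/3 each by default)."""
    parts = (section_match, explicit_match, implicit_match)
    return sum(w * s for w, s in zip(weights, parts))
```

For example, an article with keywords {goal, team, coach} and a profile with {team, coach, stocks, bonds} share two keywords, and the smaller set has three, so the match is 2 * 2 / 3.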
11
Combination Filter
- Linear combination of scores
  - (Vogt and Cottrell, 1996, 1998)
- Weights are based on previous accuracy
- Reorder
- “Top-10” Section
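One way to picture a linear combination whose weights track previous accuracy is sketched below in Python. The slides do not say how the weights are computed, so the rule used here (each filter's weight is proportional to the inverse of its mean absolute error on the user's past ratings) is purely an assumption for illustration, as are the names `accuracy_weights` and `combine`.

```python
def accuracy_weights(past, eps=1e-6):
    """Derive per-user weights from past predictions.

    `past` is a list of (collab_pred, content_pred, actual_rating) tuples.
    ASSUMPTION: weight is proportional to 1 / mean absolute error; the
    original system may use a different update rule.
    """
    if not past:
        return 0.5, 0.5  # no history yet: trust both filters equally
    n = len(past)
    collab_err = sum(abs(c - actual) for c, _, actual in past) / n
    content_err = sum(abs(k - actual) for _, k, actual in past) / n
    inv_collab = 1.0 / (collab_err + eps)  # eps avoids division by zero
    inv_content = 1.0 / (content_err + eps)
    total = inv_collab + inv_content
    return inv_collab / total, inv_content / total


def combine(collab_pred, content_pred, weights):
    """Linear combination of the two filter scores."""
    w_collab, w_content = weights
    return w_collab * collab_pred + w_content * content_pred
```

Under this sketch, a user whose tastes correlate with nobody (a "gray sheep") would see the collaborative filter's error grow and its weight shrink, so the content-based score would dominate that user's predictions.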
12
P-Tango System Architecture
[Architecture diagram. Labels: Database; Tango; Web Browser; Front End: User Profile, Login, Ratings; Back End: Correlations, Import, Keywords, Predictions]
13
P-Tango User Profile
14
P-Tango Interface
15
Experiment
- 18 Users
- 3 Week trial
- 1300 Articles
- Density: 0.5%
  - EachMovie: 2%
- User correlation: -0.2 / 0.02 / 1
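To get a feel for what a 0.5% density means, the short Python sketch below reads density as the fraction of (user, article) pairs that carry a rating. The slide does not define the term, so that reading is an assumption, and the helper names are hypothetical.

```python
def density(num_ratings, num_users, num_items):
    """Fraction of user-item pairs that have a rating."""
    return num_ratings / (num_users * num_items)


def implied_ratings(density_fraction, num_users, num_items):
    """Rough number of ratings implied by a given density."""
    return round(density_fraction * num_users * num_items)
```

Under that reading, 18 users and 1300 articles at 0.5% density would correspond to roughly 117 ratings in total, which gives a sense of how sparse the data is for the collaborative filter.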
16
Results
17
Conclusions
- Need to incorporate content-based with collaborative predictions
- Linear mixture of predictions
  - simple
  - effective
  - extensible
  - realizes benefits of individual algorithm improvements
- Online newspapers are a promising domain for exploration
18
Future Work
- More Experiments
  - more users, more time, real newspaper readers
- Confidence in prediction
  - per-item weights
  - better restructuring
  - information to user
- Implicit Ratings
- Restructuring