Collaborative Filtering With Decoupled Models for Preferences and Ratings
Rong Jin¹, Luo Si¹, ChengXiang Zhai² and Jamie Callan¹
¹ Language Technologies Institute, School of Computer Science, Carnegie Mellon University, {rong, lsi,
² Dept. of Computer Science, University of Illinois at Urbana-Champaign

© CIKM 2003
Abstract
Task: a new algorithm that addresses an important problem of collaborative filtering systems (users with similar tastes may use different rating patterns) and improves their performance
Outline:
- Introduction to collaborative filtering
- Previous work
- Decoupled Model (DM): decoupling user preferences from ratings
- Experimental results
- Related work
- Conclusion and future work

What is Collaborative Filtering?
- Collaborative Filtering (CF): making recommendations for a specific user based on the judgments of users with similar tastes
- Content-based filtering: recommends by analyzing the content of the items
- Collaborative filtering: recommends by using the judgments of similar users

Why Collaborative Filtering?
Advantages of collaborative filtering:
- The contents of items belong to third parties (not accessible or not available)
- The contents of items are difficult to index or analyze (e.g., multimedia information)
Applications:

Formal Framework for Collaborative Filtering
What we have:
- Ratings from a set of training users
- A small amount of additional training data provided by the test user (a few of the test user's own ratings)
What we do:
- Predict the test user's ratings based on this training information
[Figure: user-object rating matrix with training users U_1, …, U_N as rows and objects O_1, …, O_M as columns; the test user U_t has rated only a few objects, and the goal is to predict R_{u_t}(O_j) for an unrated object O_j.]

Previous Work: Memory-Based Approaches
- No training procedure
- Compute the similarity of each training user to the test user, using the Pearson Correlation Coefficient (PCC, which is based on each user's average rating) or Vector Space Similarity (VS)
- Prediction: a weighted average of the training users' ratings, with the similarities as weights
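The memory-based recipe can be sketched in Python as follows. The slide's formulas did not survive, so this sketch assumes common forms of the measures: PCC over co-rated objects with each user's average taken over all of that user's ratings, VS as the cosine of the raw rating vectors, and a prediction equal to the test user's average rating plus the similarity-weighted average of the training users' deviations from their own averages. All function names and example data are hypothetical.

```python
import math
from typing import Callable, Dict

Ratings = Dict[str, float]  # object id -> rating, for one user


def mean_rating(r: Ratings) -> float:
    """Average rating of one user over every object the user rated."""
    return sum(r.values()) / len(r)


def pearson(a: Ratings, b: Ratings) -> float:
    """Pearson correlation over co-rated objects (assumed form)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    ma, mb = mean_rating(a), mean_rating(b)
    num = sum((a[o] - ma) * (b[o] - mb) for o in common)
    den = math.sqrt(sum((a[o] - ma) ** 2 for o in common)
                    * sum((b[o] - mb) ** 2 for o in common))
    return num / den if den > 0 else 0.0


def vector_space(a: Ratings, b: Ratings) -> float:
    """Cosine similarity of raw rating vectors; unrated objects count as 0."""
    dot = sum(a[o] * b[o] for o in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return dot / den if den > 0 else 0.0


def predict(test: Ratings, train: Dict[str, Ratings], obj: str,
            sim: Callable[[Ratings, Ratings], float] = pearson) -> float:
    """Test user's mean plus similarity-weighted deviations of training users who rated obj."""
    num = norm = 0.0
    for ratings in train.values():
        if obj not in ratings:
            continue  # only training users who rated obj can vote
        w = sim(test, ratings)
        num += w * (ratings[obj] - mean_rating(ratings))
        norm += abs(w)
    base = mean_rating(test)
    return base + num / norm if norm > 0 else base


train = {"u1": {"o1": 4, "o2": 1, "o3": 4},
         "u2": {"o1": 2, "o2": 4, "o3": 3}}
test = {"o1": 5, "o2": 1}
print(predict(test, train, "o3"))
```

Passing `sim=vector_space` swaps in the VS measure. Subtracting each user's average before averaging is one form of the implicit rating normalization that the Decoupled Model replaces with explicit preference values.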

Previous Work: Model-Based Approaches
- Aspect Model (AM) (Hofmann et al., 1999)
  - Models individual ratings as a convex combination of preference factors
  - A latent factor Z links users U, objects O and ratings R through P(Z), P(u|Z), P(o|Z) and P(r|Z)
- Personality Diagnosis (PD) model (Pennock et al., 1999)
  - A hybrid of memory-based and model-based approaches
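For intuition, the prediction step of both model-based approaches can be sketched as below, assuming the parameters are already known. The aspect-model part uses the decomposition suggested by the slide's graphical model, P(u, o, r) = Σ_z P(z) P(u|z) P(o|z) P(r|z); parameter estimation (e.g. by EM) is omitted. The personality-diagnosis part assumes the usual formulation in which each training user is a candidate "true personality" of the test user and observed ratings carry Gaussian noise with a hypothetical standard deviation `sigma`. Function names and all numbers are illustrative.

```python
import math
from typing import Dict, List

Ratings = Dict[str, int]  # object id -> rating, for one user


def aspect_rating_dist(p_z: List[float], p_u_z: List[Dict[str, float]],
                       p_o_z: List[Dict[str, float]], p_r_z: List[Dict[int, float]],
                       user: str, obj: str) -> Dict[int, float]:
    """P(r | u, o) from P(u, o, r) = sum_z P(z) P(u|z) P(o|z) P(r|z)."""
    scores: Dict[int, float] = {}
    for z, pz in enumerate(p_z):
        w = pz * p_u_z[z][user] * p_o_z[z][obj]
        for r, pr in p_r_z[z].items():
            scores[r] = scores.get(r, 0.0) + w * pr
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}


def pd_rating_dist(test: Ratings, train: Dict[str, Ratings], obj: str,
                   scale: List[int], sigma: float = 1.0) -> Dict[int, float]:
    """Personality diagnosis: mix training users who rated obj, weighted by how well
    they explain the test user's known ratings under Gaussian rating noise."""
    def lik(x: float, y: float) -> float:
        return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

    dist = {x: 0.0 for x in scale}
    for ratings in train.values():
        if obj not in ratings:
            continue
        # Posterior weight of this personality (uniform prior over training users).
        weight = math.prod(lik(test[o], ratings[o]) for o in test.keys() & ratings.keys())
        for x in scale:
            dist[x] += weight * lik(x, ratings[obj])
    total = sum(dist.values())
    return {x: v / total for x, v in dist.items()} if total > 0 else dist


def expected_rating(dist: Dict[int, float]) -> float:
    return sum(r * p for r, p in dist.items())


# Hypothetical two-factor aspect model: factor 0 likes m1, factor 1 dislikes it.
dist = aspect_rating_dist([0.5, 0.5],
                          [{"alice": 0.9, "bob": 0.1}, {"alice": 0.1, "bob": 0.9}],
                          [{"m1": 1.0}, {"m1": 1.0}],
                          [{1: 0.0, 5: 1.0}, {1: 1.0, 5: 0.0}],
                          "alice", "m1")
print(expected_rating(dist))

pd = pd_rating_dist({"m1": 5}, {"a": {"m1": 5, "m2": 4}, "b": {"m1": 1, "m2": 2}},
                    "m2", [1, 2, 3, 4, 5])
print(max(pd, key=pd.get))
```

The aspect model compresses all users into a few factors, while PD keeps every training user as a candidate, which is why the slide calls PD a hybrid of the memory-based and model-based styles.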

Previous Work: Thoughts
- Previous algorithms handle the problem that users with similar tastes may have different rating patterns only implicitly (by normalizing user ratings)
- Our idea: explicitly decouple users' preference values from their rating values, which gives the Decoupled Model (DM)
[Figure: example ratings from a "nice" user and a "mean" user (Nice rating: 5; Mean rating: 2; Nice rating: 3; Mean rating: 1).]

Decoupled Model (DM)
- Task: separate the preference values from the surface rating values
- Preference value (PV): ranges from 0 (disfavor) to 1 (favor)
[Figure: example mapping from a user's ratings to preference values, e.g. PV = 0.667.]
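The slide does not state how a preference value is computed. A minimal sketch, assuming PV is the mid-rank of a rating within the user's own rating distribution, PV(r) = P(r' < r) + 0.5 · P(r' = r), estimated from the user's rating frequencies (the "simple method" named on the next slide). Under this assumed definition, a generous user and a harsh user who order items the same way receive identical preference values. Function names and ratings are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def frequency_vector(ratings: Iterable[int], scale: List[int]) -> Dict[int, float]:
    """Empirical P(rating = r) for one user over the rating scale (the simple estimate)."""
    counts = Counter(ratings)
    n = sum(counts.values())
    return {r: counts[r] / n for r in scale}


def preference_values(freq: Dict[int, float]) -> Dict[int, float]:
    """ASSUMED mid-rank definition: PV(r) = P(r' < r) + 0.5 * P(r' = r), in [0, 1]."""
    pv: Dict[int, float] = {}
    below = 0.0  # probability mass of ratings strictly below r
    for r in sorted(freq):
        pv[r] = below + 0.5 * freq[r]
        below += freq[r]
    return pv


scale = [1, 2, 3, 4, 5]
nice = preference_values(frequency_vector([5, 5, 4, 3], scale))
mean_user = preference_values(frequency_vector([3, 3, 2, 1], scale))
print(nice[5], mean_user[3])  # 0.75 0.75
```

The "nice" user's 5 and the "mean" user's 3 both map to PV 0.75, so a similarity computed on preference values would treat the two users as alike.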

Decoupled Model (DM): Computing Preference Values
- Simple method: based on the user's rating frequency vector
- Smoothed version: based on user rating pattern similarity
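The smoothed version is only named on the slide. One hypothetical way to realize it: interpolate a user's (possibly sparse) frequency vector with a similarity-weighted average of other users' frequency vectors, where rating-pattern similarity is taken to be the cosine between frequency vectors. Both the interpolation weight `lam` and the cosine measure are assumptions, not the paper's method.

```python
import math
from typing import Dict, List

Freq = Dict[int, float]  # rating value -> P(rating) for one user; all share the same keys


def pattern_similarity(p: Freq, q: Freq) -> float:
    """HYPOTHETICAL rating-pattern similarity: cosine between two frequency vectors."""
    dot = sum(p[r] * q[r] for r in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm > 0 else 0.0


def smoothed_frequency(own: Freq, others: List[Freq], lam: float = 0.5) -> Freq:
    """lam * own + (1 - lam) * similarity-weighted average of other users' vectors."""
    weights = [pattern_similarity(own, q) for q in others]
    total = sum(weights)
    out: Freq = {}
    for r in own:
        if total > 0:
            neighbour = sum(w * q[r] for w, q in zip(weights, others)) / total
        else:
            neighbour = own[r]  # no similar users: fall back to the simple estimate
        out[r] = lam * own[r] + (1 - lam) * neighbour
    return out


own = {1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0, 5: 0.0}  # test user gave a single rating of 3
others = [{1: 0.0, 2: 0.0, 3: 0.5, 4: 0.5, 5: 0.0},
          {1: 0.5, 2: 0.5, 3: 0.0, 4: 0.0, 5: 0.0}]
print(smoothed_frequency(own, others))
```

Because the result is a convex combination of probability vectors, it still sums to 1, and ratings the user never gave can receive some probability mass.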

Decoupled Model (DM): Memory-Based Approach with Preference Values
- Predict the test user's preference value on an object, weighting training users by user preference pattern similarity
- Convert the predicted preference value back to a rating value

Experimental Data
Datasets: MovieRating and EachMovie

                                 MovieRating    EachMovie
Number of users
Number of movies
Avg. # of rated items/user
Scale of ratings                 1,2,3,4,5      1,2,3,4,5,6

Evaluation: mean absolute error (MAE), the average absolute deviation of the predicted ratings from the actual ratings on the test items.
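MAE as defined on the slide, in a short Python sketch: MAE = (1/|T|) Σ |predicted − actual| over the held-out items T. It pools all held-out predictions into one average; whether the paper first averages per test user is not stated here, so treat that choice as an assumption. The function name is hypothetical.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted - actual| over all held-out items (pooled)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


print(mean_absolute_error([3.5, 4.0, 2.0], [4, 4, 1]))  # 0.5
```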

Experimental Methodology
- Vary the number of training users, to test how the algorithms behave with different amounts of training data
  - MovieRating: 100 and 200 training users
  - EachMovie: 200 and 400 training users
- Vary the amount of information given by the test user, to test how the algorithms behave with different amounts of given information
  - Both testbeds: 5, 10, or 20 given items
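The protocol above can be sketched as follows. The slide gives the training-set sizes and the 5/10/20 given items; how training users and given items are selected is not stated, so uniform random sampling (seeded for repeatability) is assumed, and the helper names and example user are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Ratings = Dict[str, int]  # object id -> rating, for one user

# Settings listed on the slide.
TRAINING_SIZES = {"MovieRating": (100, 200), "EachMovie": (200, 400)}
GIVEN_N = (5, 10, 20)


def pick_training_users(user_ids: List[str], k: int, rng: random.Random) -> List[str]:
    """ASSUMPTION: the training users are a uniform random subset of size k."""
    return rng.sample(sorted(user_ids), k)


def given_n_split(ratings: Ratings, n: int,
                  rng: random.Random) -> Tuple[Ratings, Ratings]:
    """Reveal n randomly chosen ratings as 'given'; hold out the rest for MAE."""
    if n >= len(ratings):
        raise ValueError("test user needs more than n ratings")
    items = sorted(ratings)
    given_ids = set(rng.sample(items, n))
    given = {o: ratings[o] for o in items if o in given_ids}
    held_out = {o: ratings[o] for o in items if o not in given_ids}
    return given, held_out


rng = random.Random(0)
user = {f"m{i}": 1 + i % 5 for i in range(30)}  # hypothetical test user with 30 ratings
for n in GIVEN_N:
    given, held_out = given_n_split(user, n, rng)
    print(n, len(given), len(held_out))
```

Each (training size, given N) pair then yields one MAE figure, computed only on the held-out ratings.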

Experimental Results: New Model and Other Baseline Algorithms
[Figure: MAE vs. number of given items for PCC, VS, PD, AM and the new model; panels: MovieRating, 100 training users; MovieRating, 200 training users; EachMovie, 200 training users; EachMovie, 400 training users.]

Experimental Results: Comparing Two Methods of Computing Preference Values
[Tables: MAE of the Simple and Smoothed methods with 5, 10 and 20 items given, by training-user set size; results on MovieRating (100 training users shown) and on EachMovie (200 training users shown).]

Conclusion and Future Work
Conclusions:
- We propose the decoupled model, which
  - explicitly extracts preference values from the surface rating values
  - improves performance when combined with the memory-based approach
Our related work:
- Combining the decoupled model with a model-based approach (ICML'03)
- A more formal, unified probabilistic graphical model (UAI'03)
Future work:
- Combine content-based filtering and collaborative filtering recommendation methods