1
Algorithms for Efficient Collaborative Filtering
Vreixo Formoso, Fidel Cacheda, Víctor Carneiro
University of A Coruña (Spain)
2
Glasgow - 30th March 2008, EIIR 2008
Outline
  – Introduction
  – Background in Collaborative Filtering
  – Proposed algorithms
  – Experiments
  – Conclusions
3
Introduction
More and more information is available every day
Personalized retrieval systems are therefore of interest
  – Recommender systems: recommend the items most appropriate for the user's needs or preferences
  – Useful in e-commerce, but we think they could also be useful in Web IR
Recommender systems store some information about the user's preferences: the user profile
  – Explicit or implicit
4
Introduction
Types of recommender systems:
  – Content-based filtering: recommends items based on their content
      - Depends on automatic analysis of the items
      - Unable to determine item quality
      - Serendipitous finds
  – Collaborative filtering: based on other users' evaluations
      - Recommends items rated well by other users with similar interests
      - Problems with computational performance and efficiency
6
Background
User profile: the evaluations made by the user
Evaluation: a numerical value (e.g. 1 to 5)
Evaluation matrix: contains the evaluations of all users
Types of collaborative filtering algorithms:
  – Memory-based: use similarity measures to find related neighbours (users or items)
      - The entire matrix is used in each prediction
  – Model-based: build a model that represents the user's behaviour and predicts their evaluations
      - The parameters of the model are estimated from the evaluation matrix (off-line)
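To make the memory-based idea concrete, here is a minimal sketch of a user-based prediction. The slides do not say which similarity measure or weighting scheme was used, so the cosine similarity over co-rated items, the weighted average, and the toy `ratings` data are all assumptions chosen for illustration.

```python
from math import sqrt

# Hypothetical toy evaluation matrix: user -> {item: rating on a 1-5 scale}.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}

def cosine(a, b):
    """Cosine similarity restricted to the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave to `item`.

    Every other user's profile is scanned for each prediction, which is why
    memory-based methods are described as using the entire matrix.
    """
    num = den = 0.0
    for other, profile in ratings.items():
        if other == user or item not in profile:
            continue
        weight = cosine(ratings[user], profile)
        num += weight * profile[item]
        den += weight
    return num / den if den else None

prediction = predict_user_based(ratings, "u1", "i4")
```

In this toy data `u1` agrees more with `u2` than with `u3`, so the prediction for `i4` lands closer to the rating `u2` gave.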
7
Background
Memory-based
  – Simple, and give reasonably precise results
  – Low scalability
  – More sensitive to common recommender system problems: sparsity, cold-start and spam
Model-based
  – Find underlying characteristics in the data
  – Faster at prediction time
  – Complexity of the models:
      - Sensitive to changes in the data
      - High construction times
      - Model updating when new data are available
8
Background: Notation
Evaluation matrix ($V$): rows are the users $u_1, u_2, \ldots, u_m$ (set $U$), columns are the items $i_1, i_2, \ldots, i_n$ (set $I$), and cell $v_{ui}$ holds the evaluation of user $u$ for item $i$
User profile ($I_1$): the items evaluated by user $u_1$
$U_1$: the users that have evaluated item $i_1$
$p_{mn}$: prediction of the evaluation of user $m$ for item $n$
$v_{u\cdot}$: evaluations of user $u$
$v_{\cdot i}$: evaluations for item $i$
Mean values: $\bar{v}_{u\cdot}$ and $\bar{v}_{\cdot i}$
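The notation can be read as a handful of lookups on a sparse matrix. The sketch below stores the evaluation matrix as a dictionary of user profiles; the toy data and the function names are hypothetical and serve only to illustrate the symbols.

```python
# Hypothetical toy evaluation matrix V: user -> {item: evaluation}.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}

def items_of_user(ratings, u):
    """User profile: the set of items evaluated by user u."""
    return set(ratings[u])

def users_of_item(ratings, i):
    """The set of users that have evaluated item i."""
    return {u for u, profile in ratings.items() if i in profile}

def user_mean(ratings, u):
    """Mean of the evaluations made by user u."""
    values = list(ratings[u].values())
    return sum(values) / len(values)

def item_mean(ratings, i):
    """Mean of the evaluations received by item i."""
    values = [profile[i] for profile in ratings.values() if i in profile]
    return sum(values) / len(values)
```

Missing cells are simply absent keys, which keeps the representation compact when the matrix is sparse.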
10
Proposed algorithms
Objectives:
  – Good behaviour at low density
  – Computational efficiency
  – Constant updating
Item mean algorithm
  – Our baseline: use the mean of an item as its prediction
Simple mean based algorithm
  – The item mean is corrected with the mean of the user
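A minimal sketch of the two mean-based predictors follows. The item mean algorithm is as stated on the slide. The exact correction used by the simple mean based algorithm did not survive in this transcript, so the version below (shifting the item mean by the user's offset from the global mean, then clamping to the 1 to 5 scale) is only one plausible reading and should be treated as an assumption, as should the toy data and function names.

```python
from collections import defaultdict

# Hypothetical toy evaluation matrix: user -> {item: rating on a 1-5 scale}.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}

def train_means(ratings):
    """One pass over all ratings; assumes every user profile is non-empty."""
    item_sum, item_count = defaultdict(float), defaultdict(int)
    user_mean, total, count = {}, 0.0, 0
    for u, profile in ratings.items():
        user_mean[u] = sum(profile.values()) / len(profile)
        for i, v in profile.items():
            item_sum[i] += v
            item_count[i] += 1
            total += v
            count += 1
    item_mean = {i: item_sum[i] / item_count[i] for i in item_sum}
    return user_mean, item_mean, total / count

def predict_item_mean(model, u, i):
    """Item mean algorithm: the prediction is the item's mean rating."""
    _, item_mean, _ = model
    return item_mean[i]

def predict_simple_mean(model, u, i, lo=1.0, hi=5.0):
    """ASSUMED correction: item mean plus the user's offset from the global mean."""
    user_mean, item_mean, global_mean = model
    return min(hi, max(lo, item_mean[i] + (user_mean[u] - global_mean)))

model = train_means(ratings)
```

Training touches each stored rating once, and each prediction is a constant-time lookup, which fits the objectives of computational efficiency and cheap updating.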
11
Proposed algorithms
Tendencies based algorithm
  – Main idea: users tend to evaluate items positively or negatively
      - Include tendencies in the formula
  – Tendency ≠ mean
  – Tendency of a user ($ub_u$) and tendency of an item ($ib_i$)
  – In this algorithm we use the mean of the item and the user as well as their respective tendencies
12
Proposed algorithms
Tendencies based algorithm
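The formulas for the tendencies are not preserved in this transcript. The sketch below assumes a common formulation: the tendency of a user is the average difference between the user's ratings and the corresponding item means, and the tendency of an item is the average difference between its ratings and the corresponding user means. Both definitions, the toy data, and the function names are assumptions. The rule that combines means and tendencies into a prediction is not recoverable here and is deliberately left out.

```python
# Hypothetical toy evaluation matrix: user -> {item: rating on a 1-5 scale}.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def user_means(ratings):
    return {u: mean(profile.values()) for u, profile in ratings.items()}

def item_means(ratings):
    by_item = {}
    for profile in ratings.values():
        for i, v in profile.items():
            by_item.setdefault(i, []).append(v)
    return {i: mean(vs) for i, vs in by_item.items()}

def user_tendency(ratings, u, i_means):
    """ASSUMED definition: does user u rate items above or below their means?"""
    return mean(v - i_means[i] for i, v in ratings[u].items())

def item_tendency(ratings, i, u_means):
    """ASSUMED definition: is item i rated above or below its raters' means?"""
    return mean(p[i] - u_means[u] for u, p in ratings.items() if i in p)

u_means, i_means = user_means(ratings), item_means(ratings)
tendency_u1 = user_tendency(ratings, "u1", i_means)
tendency_i4 = item_tendency(ratings, "i4", u_means)
```

Under these definitions a tendency is a signed offset rather than a rating level, which is one way to read "Tendency ≠ mean": a user with a high mean can still have a tendency near zero if the items they rated are generally well rated.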
14
Experiments
Algorithms evaluated
  – Memory-based: user-based, item-based and similarity fusion
  – Model-based: regression based, slope one, latent semantic indexing and cluster based smoothing
  – Hybrid: personality diagnosis
Dataset: MovieLens
  – Real ratings of films: 1 (very bad) to 5 (excellent)
  – 100,000 evaluations from 943 users for 1,682 movies (1.78 items per user); density 6%
  – Training set: 10%, 50% and 90%
For each algorithm we evaluated (5 times):
  – Training and prediction times
  – Quality of the predictions
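The dataset figures can be checked with a few lines of arithmetic. The variable names are illustrative; the three input numbers are the ones quoted on the slide.

```python
# Figures quoted for the MovieLens dataset on the slide.
n_ratings = 100_000
n_users = 943
n_items = 1_682

# Density: fraction of the user x item matrix that holds an evaluation.
density = n_ratings / (n_users * n_items)

# Ratio of items to users in the matrix.
items_per_user = n_items / n_users

# Average number of evaluations made by each user.
ratings_per_user = n_ratings / n_users

print(f"density = {density:.1%}")
print(f"items per user = {items_per_user:.2f}")
print(f"ratings per user = {ratings_per_user:.0f}")
```

The density works out to roughly 6.3%, consistent with the quoted 6%. The 1.78 figure is the ratio of items to users; the average number of evaluations per user is about 106.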
15
Proposed algorithms
Tendencies based algorithm
  – Only 5% of the predictions with the 10% training set
  – 2% of the predictions with the 90% training set
  – This case represents some unusual elements
  – Tendencies seem a good prediction mechanism
16
Experiments: Computational complexity

Algorithm                  Training complexity   Prediction complexity
User Based                 -                     O(mn)
Item Based                 O(mn²)                O(n)
Similarity Fusion          O(n²m + m²n)          O(mn)
Personality Diagnosis      O(m²n)                O(m)
Regression Based           O(mn²)                O(n)
Slope One                  O(mn²)                O(n)
Latent Semantic Indexing   O((m+n)³)             O(1)
Cluster Based Smoothing    O(mnα + m²n)          O(mn)
Item Mean                  O(mn)                 O(1)
Simple Mean Based          O(mn)                 O(1)
Tendencies Based           O(mn)                 O(1)
17
Experiments: Training time

Algorithm                  10%       50%       90%      (training set size)
User Based                 0         0         0
Item Based                 415       1,060     1,986
Similarity Fusion          987       3,840     5,474
Personality Diagnosis      257       994       2,213
Regression Based           3,302     4,575     7,780
Slope One                  1,246     2,175     2,541
Latent Semantic Indexing   117,758   115,218   102,855
Cluster Based Smoothing    60,247    71,529    44,635
Item Mean                  2         3         3
Simple Mean Based          7         10        5
Tendencies Based           11        15        9
18
Experiments: Prediction time

Algorithm                  10%       50%       90%      (training set size)
User Based                 6,250     15,597    8,915
Item Based                 221       1,864     909
Similarity Fusion          227,736   756,834   264,951
Personality Diagnosis      1,369     3,845     1,400
Regression Based           205       570       265
Slope One                  319       501       116
Latent Semantic Indexing   162       158       20
Cluster Based Smoothing    70,515    251,595   118,552
Item Mean                  24        12        2
Simple Mean Based          25        11        4
Tendencies Based           24        16        4
19
Experiments: Prediction quality

Algorithm                  10%    50%    90%   (training set size)
User Based                 0.99   0.71   0.68
Item Based                 0.92   0.75   0.71
Similarity Fusion          0.84   0.73   0.71
Personality Diagnosis      0.82, 0.78 (only two of the three values survive)
Regression Based           1.03   0.76   0.74
Slope One                  0.90   0.72   0.70
Latent Semantic Indexing   0.85   0.77   0.73
Cluster Based Smoothing    0.97   0.87   0.80
Item Mean                  0.82, 0.79 (only two of the three values survive)
Simple Mean Based          0.79, 0.72 (only two of the three values survive)
Tendencies Based           0.79   0.72   0.71
21
Conclusions
We have presented a couple of algorithms for collaborative filtering:
  – Very simple
      - Good response times
  – Tendencies based algorithm:
      - Quality of the predictions equivalent to the best algorithms
      - Even better with low-density training sets
Next steps: use these algorithms in Web IR
  – Problems: dataset?
22
Thank you! Questions?