Combining Content-based and Collaborative Filtering Department of Computer Science and Engineering, Slovak University of Technology Gabriela Polčicová Pavol Návrat
Overview Information Filtering and its Types Combined Method Experiment with Information Filtering Methods Conclusions
Information Filtering (1) –delivery of relevant information to the people who need it Types of Information Filtering –Content-based - for textual documents –Collaborative - for communities of users Interests –information about interests is stored in profiles –opinions on documents are expressed as ratings Ratings {i, j, r_ij} –for user i and item j, r_ij is the value of the rating
Information Filtering (2) Filter Learning interests Estimating the value of rating Choosing recommendations Rated items {user, item, value} Unrated items {user, item} Recommendations {user, item, estimation}
Content-based Filtering (1) Basic idea –recommending documents based on content and properties of document Profile –consists of keywords with assigned weights –only documents matching profile are recommended Recommendations –based on objective measurable properties
Content-based Filtering (2) Documents rated by the user Documents of interest Documents unrated by the user PROFILE Keywords, phrases with weights Documents matching profile => recommended documents Documents, ratings
Collaborative Filtering (1) Basic idea –automating “word of mouth” –leverage opinions of like-minded users while making decisions Schema –collecting users’ opinions –searching for like-minded users –making recommendations
Collaborative Filtering (2) Profile of current user Profile of user 1 Profile of user 2 Profile of user 3 Profile of user 4 Profile of user 5 Documents from like-minded users’ profiles => recommended documents
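Collaborative Filtering (3)
Similarity measure: Pearson Correlation Coefficient
$$k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \, \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}}$$
Recommendations computation: weighted sum of ratings
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci}}{\sum_{i \in U_{cj}} |k_{ci}|}$$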
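The Pearson coefficient and the weighted sum of ratings on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that I_ci is the set of items rated by both users, that U_cj is the set of other users who rated item j, and that each mean is taken over all of a user's ratings. The function names and the nested-dictionary layout are hypothetical.

```python
from math import sqrt


def mean_rating(ratings):
    """Mean of one user's ratings (assumption: taken over all rated items)."""
    return sum(ratings.values()) / len(ratings)


def pearson(ratings_c, ratings_i):
    """Coefficient k_ci over the items both users rated; 0.0 if undefined."""
    common = set(ratings_c) & set(ratings_i)
    if len(common) < 2:
        return 0.0
    mean_c = mean_rating(ratings_c)
    mean_i = mean_rating(ratings_i)
    num = sum((ratings_c[j] - mean_c) * (ratings_i[j] - mean_i) for j in common)
    den = sqrt(sum((ratings_c[j] - mean_c) ** 2 for j in common)
               * sum((ratings_i[j] - mean_i) ** 2 for j in common))
    return num / den if den else 0.0


def predict(current, item, all_ratings):
    """Weighted-sum estimate for (current, item); None if no neighbour helps."""
    ratings_c = all_ratings[current]
    num = den = 0.0
    for user, ratings_i in all_ratings.items():
        if user == current or item not in ratings_i:
            continue
        k = pearson(ratings_c, ratings_i)
        num += (ratings_i[item] - mean_rating(ratings_i)) * k
        den += abs(k)
    if den == 0:
        return None
    return mean_rating(ratings_c) + num / den


# Toy data, invented for illustration only.
toy = {
    "c": {"m1": 5, "m2": 3, "m3": 4},
    "u1": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "u2": {"m1": 1, "m2": 5, "m4": 2},
}
estimate_m4 = predict("c", "m4", toy)
```

In the toy data, user u1 agrees with the current user and u2 disagrees, so both push the estimate for m4 above the current user's mean. The estimate is not clipped to the rating scale here; whether the original system clipped it is not stated on the slide.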
Combining Content-based and Collaborative Filtering (1) Computing estimates for missing ratings with the Content-based Filtering method for each user Searching for like-minded users –computing coefficient k_ci between the current and the i-th user (from ratings only) –computing coefficient k_ci’ between the current and the i-th user (from both ratings and estimates) New recommendations computation –a weighted sum of ratings and estimates, in which ratings are weighted by the coefficients k_ci and estimates by the coefficients k_ci’
Datasets for Experiments Data: –EachMovie - users' ratings for movies –IMDB - textual information for CBF (movies' descriptions) Datasets: –A - ratings from the period up to Mar 1, 1996 (810 ratings from 71 users) –B - ratings from the period up to Mar 15, 1996 (2407 ratings from 131 users) –C - ratings from the period up to Apr 1, 1996 (12290 ratings from 651 users)
EachMovie Data and Constant Method Constant method: $\hat{r}_{cj} = 5$
Experiments with Combination of Content-based and Collaborative Filtering (2) Dataset –divide the dataset into a training set (90%) and a test set (10%) Apply filtering methods and evaluate their performance –Content-based, Collaborative and Combined Filtering methods: use the training and test sets to produce recommendations –Constant method: uses the test set to produce recommendations Evaluation of methods’ performance
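The 90%/10% division can be sketched as follows. The slide does not say how ratings were assigned to the two sets, so the random shuffle, the fixed seed and the function name `split_ratings` are assumptions made for this illustration.

```python
import random


def split_ratings(ratings, test_fraction=0.1, seed=0):
    """Split (user, item, value) triples into (training, test) lists.

    Assumption: ratings are assigned to the test set uniformly at random.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data, invented for illustration: 10 users x 10 items.
all_triples = [(u, i, 3) for u in range(10) for i in range(10)]
training, test = split_ratings(all_triples)
```

Each filtering method would then learn from `training` and be scored on the triples held out in `test`.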
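Metrics
Coverage = percentage of items for which the method is able to compute estimates
$$\mathrm{Accuracy} = \frac{|R \cap L| + |\bar{R} \cap \bar{L}|}{|L| + |\bar{L}|}$$
$$\mathrm{Precision} = \frac{|R \cap L|}{|R|} \qquad \mathrm{Recall} = \frac{|R \cap L|}{|L|}$$
$$\text{F-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{NMAE} = \frac{\sum |r_{ij} - \hat{r}_{ij}|}{n \cdot s}$$
R - set of recommended items
L - set of liked items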
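A small Python sketch may make the metrics concrete. It is not the authors' evaluation code. It assumes that n is the number of test ratings for which an estimate exists and that s is the width of the rating scale; the slide does not define either symbol. All function names are hypothetical.

```python
def precision_recall_f(recommended, liked):
    """Precision, recall and F-measure for sets R (recommended) and L (liked)."""
    hits = len(recommended & liked)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def accuracy(recommended, liked, all_items):
    """Share of items classified correctly: recommended and liked, or neither."""
    correct = len(recommended & liked)
    correct += len((all_items - recommended) & (all_items - liked))
    return correct / len(all_items)


def coverage(actual, predicted):
    """Percentage of test ratings for which an estimate could be computed."""
    covered = sum(1 for key in actual if predicted.get(key) is not None)
    return 100.0 * covered / len(actual)


def nmae(actual, predicted, scale):
    """Mean absolute error divided by the scale width s (assumed meaning of s)."""
    keys = [key for key in actual if predicted.get(key) is not None]
    total_error = sum(abs(actual[key] - predicted[key]) for key in keys)
    return total_error / (len(keys) * scale)


# Toy data, invented for illustration only.
items = {"a", "b", "c", "d", "e"}
R = {"a", "b", "c"}
L = {"a", "b", "d"}
true_ratings = {("u", "a"): 5, ("u", "b"): 3, ("u", "c"): 1}
estimates = {("u", "a"): 4, ("u", "b"): 3, ("u", "c"): None}
```

With this toy data two of the three recommended items are liked, and one of the three test ratings has no estimate, so it counts against coverage but not against NMAE.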
Results of Experiments
Conclusions Combination of content-based and collaborative filtering might help in the initial phase Future work –weighting of coefficients –comparing the method with additional methods
Content-based Filtering - Vector Representation of Documents and Profiles
D = (…, computer, …, learning, …, machine, …)
Document j (computer, machine, learning), TF-IDF weights: W_j = (0, …, 0, 0.5, 0, …, 0, 0.3, 0, …, 0, 0.2, 0, …, 0)
$$\mathrm{profile}_i = \sum_{j=1}^{n} r_j \, w_{ij}$$
$$\mathrm{Sim}(W, \mathrm{Profile}) = \frac{W \cdot \mathrm{Profile}}{|W| \, |\mathrm{Profile}|}$$
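The vector representation can be illustrated with a short Python sketch. The slide names TF-IDF but does not give the exact weighting, so the variant used here (term frequency divided by document length, multiplied by log(N / document frequency)) is an assumption, as are the function names and the sparse dictionary representation.

```python
from collections import Counter, defaultdict
from math import log, sqrt


def tfidf_vectors(docs):
    """docs: {doc_id: list of tokens} -> {doc_id: {term: TF-IDF weight}}."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        vectors[doc_id] = {
            term: (count / len(tokens)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def build_profile(vectors, user_ratings):
    """Profile component for term i: sum over rated documents j of r_j * w_ij."""
    profile = defaultdict(float)
    for doc_id, rating in user_ratings.items():
        for term, weight in vectors[doc_id].items():
            profile[term] += rating * weight
    return dict(profile)


def cosine(a, b):
    """Normalized dot product of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy documents, invented for illustration only.
docs = {
    "d1": ["machine", "learning", "computer"],
    "d2": ["machine", "learning", "learning"],
    "d3": ["romance", "drama"],
}
vectors = tfidf_vectors(docs)
profile = build_profile(vectors, {"d1": 5})
```

A document that shares keywords with the rated one (d2) gets a positive similarity to the profile, while one with no shared keywords (d3) gets zero.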
Collaborative Filtering - Example [Figure: example with labels A, B, C, D, E, F, G and "current"]
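Combining Content-based and Collaborative Filtering (2)
Similarity measure: Pearson Correlation Coefficient
$$k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \, \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}}$$
Recommendations computation: weighted sum of ratings and estimates
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci} + \sum_{i \in U'_{cj}} (r^{CBF}_{ij} - \bar{r}_i)\, k'_{ci}}{\sum_{i \in U_{cj}} |k_{ci}| + \sum_{i \in U'_{cj}} |k'_{ci}|}$$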
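The combined computation can be sketched as below. Several details are not recoverable from the slide and are assumptions here: U_cj is taken to be the users with a real rating for item j, U'_cj the users who only have a CBF estimate for it, k_ci' is computed over profiles in which CBF estimates fill the unrated items, and the mean in the estimate term is taken over that filled profile. The names `predict_combined`, `ratings` and `estimates` are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson-style coefficient over keys shared by dicts a and b."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a, mean_b = mean(list(a.values())), mean(list(b.values()))
    num = sum((a[j] - mean_a) * (b[j] - mean_b) for j in common)
    den = sqrt(sum((a[j] - mean_a) ** 2 for j in common)
               * sum((b[j] - mean_b) ** 2 for j in common))
    return num / den if den else 0.0


def predict_combined(current, item, ratings, estimates):
    """ratings[u][j]: real ratings; estimates[u][j]: CBF estimates for unrated j."""
    # Assumption: filled profile = CBF estimates overridden by real ratings.
    filled = {u: {**estimates.get(u, {}), **ratings[u]} for u in ratings}
    num = den = 0.0
    for user in ratings:
        if user == current:
            continue
        if item in ratings[user]:
            # User in U_cj: real rating, weighted by k_ci (ratings only).
            k = pearson(ratings[current], ratings[user])
            num += (ratings[user][item] - mean(list(ratings[user].values()))) * k
            den += abs(k)
        elif item in estimates.get(user, {}):
            # User in U'_cj: CBF estimate, weighted by k_ci' (ratings + estimates).
            k2 = pearson(filled[current], filled[user])
            num += (estimates[user][item] - mean(list(filled[user].values()))) * k2
            den += abs(k2)
    if den == 0:
        return None
    return mean(list(ratings[current].values())) + num / den


# Toy data, invented for illustration only.
ratings = {
    "c": {"m1": 5, "m2": 3, "m3": 4},
    "u1": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "u2": {"m1": 5, "m2": 2},
}
cbf = {"c": {"m4": 4.0}, "u2": {"m3": 4.0, "m4": 4.5}, "u1": {"m5": 3.0}}
```

With no estimates supplied the function reduces to the plain collaborative weighted sum. Item m5 has no real rating from anyone, so the collaborative part alone yields nothing, while the CBF estimate lets the combined method still produce a value.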
Experiments with Combination of Content-based and Collaborative Filtering (1) Content-based Filtering Method (CBF) –documents and profiles: vector representation - weighted keywords (TF-IDF) –estimation computation: normalized dot product of document and profile vectors Collaborative Filtering (CF) –Pearson correlation coefficient –weighted sum of ratings Combination of CF and CBF –Pearson correlation coefficients –weighted sum of ratings and CBF estimations Constant Method ($\hat{r}_{cj} = 5$)