Item-Based Collaborative Filtering Recommendation Algorithms. Ali Hamie, Jatin Saluja
What is collaborative filtering? One of the most successful recommendation techniques to date. Collaborative filtering (CF) recommends items, or predicts how much a user will like an item, based on the opinions of other like-minded users. A CF system consists of a set of users, a set of items, and a set of opinions about the items, such as ratings, reviews, or purchases.
Two Types of Collaborative Filtering Algorithms. Memory-based collaborative filtering algorithms, such as user-based CF. Model-based collaborative filtering algorithms, such as item-based CF, the approach of the paper presented here.
Memory-Based Collaborative Filtering. This approach uses the entire user-item data set to form “neighborhoods” of users, i.e. groups of users who liked or disliked the same items. A user-based CF algorithm then recommends items to a user based on the preferences of the users in that user's neighborhood.
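The user-based neighborhood idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the method evaluated in the paper: the ratings table is made up, user similarity is plain cosine over the items both users rated (one simple choice among several), and the names `ratings`, `user_similarity`, `neighborhood`, and `predict_user_based` are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: ratings[user][item] = rating on a 1-5 scale.
# Items a user has not rated are simply absent.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
    "dave":  {"m2": 4, "m3": 1, "m4": 5},
}

def user_similarity(a, b):
    """Cosine similarity between users a and b over the items both have rated."""
    common = ratings[a].keys() & ratings[b].keys()
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = sqrt(sum(ratings[a][i] ** 2 for i in common))
    norm_b = sqrt(sum(ratings[b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def neighborhood(user, k):
    """The k other users most similar to `user`."""
    others = [v for v in ratings if v != user]
    return sorted(others, key=lambda v: user_similarity(user, v), reverse=True)[:k]

def predict_user_based(user, item, k=2):
    """Similarity-weighted average of the neighbors' ratings of `item`."""
    num = den = 0.0
    for v in neighborhood(user, k):
        if item in ratings[v]:
            s = user_similarity(user, v)
            num += s * ratings[v][item]
            den += abs(s)
    return num / den if den else None  # None: no neighbor rated the item

print(neighborhood("alice", 2))           # ['bob', 'dave']
print(predict_user_based("alice", "m4"))  # roughly 4.44
```

Note that every prediction requires comparing the active user against all other users, which is the scalability problem the next slide raises.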
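Challenges of User-Based CF Algorithms. Sparsity: large e-commerce sites such as Amazon.com and CDnow, which recommend books and music, offer so many items that even active users have rated only a small fraction of them; the user-item matrix is therefore very sparse, which can make recommendations inaccurate or impossible for some users. Scalability: the computation required by nearest-neighbor algorithms grows with both the number of users and the number of items.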
Item-Based Recommendation System. Instead of comparing users, the item-based approach looks at the items the target user has rated and computes how similar each of them is to the target item, using one of several similarity measures. It then selects the k most similar items. Finally, a prediction is computed from the user's ratings of those similar items.
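The item lookup step can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes a made-up ratings table, treats each item as a column of the user-item matrix (unrated entries count as 0), uses plain cosine similarity, and the helper names `item_vector`, `cosine`, and `k_most_similar` are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: ratings[user][item] = rating (1-5); unrated items are absent.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
    "dave":  {"m2": 4, "m3": 1, "m4": 5},
}

def item_vector(item):
    """Column of the user-item matrix for `item`, as {user: rating}."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(x, y):
    """Cosine of two sparse vectors; a missing entry counts as 0."""
    dot = sum(x[k] * y[k] for k in x.keys() & y.keys())
    norm_x = sqrt(sum(v * v for v in x.values()))
    norm_y = sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def k_most_similar(user, target, k):
    """The k items rated by `user` that are most similar to `target`, with similarities."""
    t = item_vector(target)
    scored = [(i, cosine(item_vector(i), t)) for i in ratings[user] if i != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

print([(i, round(s, 3)) for i, s in k_most_similar("alice", "m4", 2)])
# [('m2', 0.815), ('m3', 0.575)]
```

The resulting (item, similarity) pairs are what the weighted-sum prediction step consumes. Because item-item similarities do not depend on the active user, they can be precomputed.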
Another Example
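Cosine-Based Similarity. A measure of the similarity between two items. Each item is treated as a vector in the m-dimensional user space (one dimension per user), and the similarity is the cosine of the angle between the two vectors: $\mathrm{sim}(i,j) = \cos(\vec{i},\vec{j}) = \frac{\vec{i}\cdot\vec{j}}{\lVert\vec{i}\rVert_2\,\lVert\vec{j}\rVert_2}$. How does this work?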
Example: Consider the following two texts: (a) "Julie loves me more than Linda loves me." (b) "Jane likes me more than Julie loves me."
Example... word counts for each text (column a: first text, column b: second text):
word   a  b
me     2  2
Jane   0  1
Julie  1  1
Linda  1  0
likes  0  1
loves  2  1
more   1  1
than   1  1
Example... Listing the counts in the table's word order gives two 8-dimensional vectors, one per text: a: [2, 0, 1, 1, 0, 2, 1, 1] b: [2, 1, 1, 0, 1, 1, 1, 1]. The cosine of the angle between them is $9/\sqrt{120} \approx 0.822$. The closer the value is to 1, the more similar the two texts are.
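The worked example can be reproduced in a few lines. This sketch assumes simple whitespace tokenization after stripping the final period, which is enough for these two sentences; a real text pipeline would need more careful tokenization. The names `count_vector` and `cosine` are hypothetical.

```python
from collections import Counter
from math import sqrt

text_a = "Julie loves me more than Linda loves me."
text_b = "Jane likes me more than Julie loves me."
vocabulary = ["me", "Jane", "Julie", "Linda", "likes", "loves", "more", "than"]

def count_vector(text):
    """Word counts of `text`, listed in vocabulary order (missing words count 0)."""
    counts = Counter(text.rstrip(".").split())
    return [counts[w] for w in vocabulary]

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

a, b = count_vector(text_a), count_vector(text_b)
print(a)                       # [2, 0, 1, 1, 0, 2, 1, 1]
print(b)                       # [2, 1, 1, 0, 1, 1, 1, 1]
print(round(cosine(a, b), 3))  # 0.822
```

For item-based CF the same `cosine` function applies unchanged: `a` and `b` become two item columns of the user-item matrix, with one entry per user and 0 where the user has not rated the item.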
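Correlation-Based Similarity. This similarity measure is based on how much the ratings by common users for a pair of items deviate from the average ratings for those items. Let U denote the set of users who rated both items i and j. The correlation (Pearson) similarity is then
$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}$$
where $R_{u,i}$ is the rating of user u on item i and $\bar{R}_i$ is the average rating of the i-th item.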
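The correlation similarity can be sketched directly from its formula. Assumptions: a small made-up ratings table, $\bar{R}_i$ computed over every user who rated item i (as the slide defines it), and a similarity of 0 when U is empty or shows no variation. The function names are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: ratings[user][item] = rating (1-5); unrated items are absent.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
    "dave":  {"m2": 4, "m3": 1, "m4": 5},
}

def item_mean(item):
    """Average rating of `item` over all users who rated it."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values)

def correlation_sim(i, j):
    """Pearson correlation of items i and j over U, the users who rated both."""
    U = [u for u, r in ratings.items() if i in r and j in r]
    mean_i, mean_j = item_mean(i), item_mean(j)
    dev_i = [ratings[u][i] - mean_i for u in U]  # deviations from item i's mean
    dev_j = [ratings[u][j] - mean_j for u in U]  # deviations from item j's mean
    num = sum(x * y for x, y in zip(dev_i, dev_j))
    den = sqrt(sum(x * x for x in dev_i)) * sqrt(sum(y * y for y in dev_j))
    return num / den if den else 0.0  # 0.0 when U is empty or has no variation

print(round(correlation_sim("m1", "m2"), 3))  # -0.967
```

The strongly negative value reflects that the users who liked m1 here tended to rate m2 below its average, and vice versa.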
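Adjusted Cosine Similarity. This is a modified form of cosine-based similarity that takes into account the fact that different users have different rating scales: some users rate items highly in general, while others tend to give lower ratings. To remove this drawback of cosine-based similarity, we subtract each user's average rating from that user's ratings for the pair of items in question. With U the set of users who rated both items i and j, $R_{u,i}$ the rating of user u on item i, and $\bar{R}_u$ the average of user u's ratings:
$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}$$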
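A minimal sketch of adjusted cosine, assuming a made-up ratings table and taking $\bar{R}_u$ as the mean of all of user u's ratings; the function names are hypothetical. The last lines check the property that motivates the measure: shifting one user's ratings by a constant leaves the similarity unchanged.

```python
from math import sqrt

# Hypothetical toy data: ratings[user][item] = rating (1-5); unrated items are absent.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
    "dave":  {"m2": 4, "m3": 1, "m4": 5},
}

def user_mean(user):
    """Average of all ratings given by `user`."""
    values = list(ratings[user].values())
    return sum(values) / len(values)

def adjusted_cosine_sim(i, j):
    """Adjusted cosine of items i and j: each rating is centered on its user's mean."""
    U = [u for u, r in ratings.items() if i in r and j in r]
    dev_i = [ratings[u][i] - user_mean(u) for u in U]
    dev_j = [ratings[u][j] - user_mean(u) for u in U]
    num = sum(x * y for x, y in zip(dev_i, dev_j))
    den = sqrt(sum(x * x for x in dev_i)) * sqrt(sum(y * y for y in dev_j))
    return num / den if den else 0.0

before = adjusted_cosine_sim("m1", "m2")
# A user who rates everything one point higher does not change the result,
# because subtracting the user's mean removes that constant offset.
ratings["carol"] = {item: r + 1 for item, r in ratings["carol"].items()}
after = adjusted_cosine_sim("m1", "m2")
print(abs(before - after) < 1e-9)  # True
```

The only difference from the correlation-based measure is which mean is subtracted: the user's mean ($\bar{R}_u$) instead of the item's mean ($\bar{R}_i$).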
Prediction Computation. Once we have built a model using one of the similarity measures described above, we can predict the rating for any user-item pair using a weighted sum.
Prediction Computation... First, we take all the items similar to the target item i and, among them, pick the items the active user u has rated. Then we weight the user's rating of each of these items by its similarity to the target item. Finally, we divide by the sum of the absolute similarities to normalize the weighted sum into a predicted rating:
$$P_{u,i} = \frac{\sum_{N} s_{i,N}\, R_{u,N}}{\sum_{N} \lvert s_{i,N} \rvert}$$
where N ranges over the items similar to i that u has rated, $s_{i,N}$ is the similarity between i and N, and $R_{u,N}$ is u's rating of N.
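The weighted-sum step can be sketched as a small function. The similarity values below are hypothetical, chosen only for illustration, and `predict_rating` is a hypothetical name; in practice the similarities would come from one of the measures above.

```python
def predict_rating(user_ratings, similarities):
    """
    Weighted-sum prediction for one target item i.
    user_ratings: {item: rating} for the active user u.
    similarities: {item N: s(i, N)}, the items similar to i and their similarities.
    """
    num = den = 0.0
    for item, s in similarities.items():
        if item in user_ratings:           # keep only items the user has rated
            num += s * user_ratings[item]  # weight the rating by the similarity
            den += abs(s)                  # normalize by the sum of |similarities|
    return num / den if den else None      # None: user rated none of the similar items

# Hypothetical values for illustration only.
alice = {"m1": 5, "m2": 3, "m3": 4}
similar_to_m4 = {"m1": 0.41, "m2": 0.82, "m3": 0.58, "m5": 0.90}  # m5 unrated, skipped
print(round(predict_rating(alice, similar_to_m4), 2))  # 3.77
```

Here the prediction is (0.41·5 + 0.82·3 + 0.58·4) / (0.41 + 0.82 + 0.58) = 6.83 / 1.81, so the rating of the most similar item (m2) pulls the result most strongly.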
Advantages and Disadvantages. Advantages: more scalable; better memory usage; fewer sparsity problems; the item-similarity model can be computed offline. Disadvantages: computing item similarities takes a long time (time complexity O(n^3)).
Experimental Data. The MovieLens data set, with 100,000 ratings arranged as a matrix of users and items. The data was read into an item hashtable and a user hashtable.
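Reading the data into two hashtables can be sketched with Python dictionaries. This assumes the tab-separated line layout of the MovieLens 100K `u.data` file (user id, item id, rating, timestamp); the sample lines below are made up, and `load_ratings` is a hypothetical name.

```python
from collections import defaultdict

# Made-up sample lines in the assumed layout: user id, item id, rating, timestamp.
SAMPLE = "1\t10\t4\t100\n1\t20\t3\t101\n2\t10\t5\t102\n"

def load_ratings(lines):
    """Index each (user, item, rating) triple twice: by user and by item."""
    by_user = defaultdict(dict)  # user hashtable: user -> {item: rating}
    by_item = defaultdict(dict)  # item hashtable: item -> {user: rating}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split("\t")
        by_user[int(user)][int(item)] = int(rating)
        by_item[int(item)][int(user)] = int(rating)
    return by_user, by_item

by_user, by_item = load_ratings(SAMPLE.splitlines())
print(dict(by_user))  # {1: {10: 4, 20: 3}, 2: {10: 5}}
print(dict(by_item))  # {10: {1: 4, 2: 5}, 20: {1: 3}}
```

With a real file, an open file object can be passed in place of `SAMPLE.splitlines()`, since iterating over a file yields its lines; any trailing newline ends up in the ignored timestamp field. The item hashtable gives direct access to item columns for similarity computation, and the user hashtable gives each user's ratings for prediction.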
Similarity Algorithms: Mean Absolute Error
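Mean absolute error (MAE) is the average absolute difference between predicted and actual ratings over the N test pairs, $\mathrm{MAE} = \frac{1}{N}\sum_{k=1}^{N} \lvert p_k - q_k \rvert$; lower is better. A minimal sketch with made-up numbers (`mean_absolute_error` is a hypothetical name):

```python
def mean_absolute_error(predicted, actual):
    """MAE: average of |prediction - actual rating| over all test pairs."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)

print(mean_absolute_error([3.5, 4.0, 2.0], [4, 4, 3]))  # 0.5
```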
Model Size Sensitivity
Conclusion. User-based CF vs. item-based CF: adjusted cosine similarity has the lowest error, and item-based CF offers better performance and efficiency.
References
http://stackoverflow.com/questions/1746501/can-someone-give-an-example-of-cosine-similarity-in-a-very-simple-graphical-wa
https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf