Distributed Networks & Systems Lab
Outline: Introduction; Collaborative filtering; Characteristics and challenges; Memory-based CF; Model-based CF; Hybrid CF; Recent advances in CF; Conclusion
Recommendation System. Helps users discover new items that would otherwise be hard to find. It is a subclass of information filtering systems that seeks to predict the ‘rating’ or ‘preference’ that a user would give to an item. Recommender systems identify recommendations autonomously for individual users based on past purchases and searches, and on other users' behavior.
Recommendation System approaches. Content-based: based on a description of the item and a profile of the user’s preferences. Collaborative filtering: based on collecting and analyzing a large amount of information on users’ behaviors, activities, or preferences and predicting what users will like based on their similarity to other users. Hybrid: a combination of the collaborative filtering and content-based approaches.
Recommendation System: Content-based, Collaborative Filtering, Hybrid. Collaborative Filtering: Memory-based, Model-based, Hybrid.
Collaborative filtering faces performance challenges that stem from its distinguishing characteristics: data sparsity, scalability, synonymy, gray sheep, and shilling attacks.
Data sparsity. In internet markets, the sheer variety of products makes the user-item matrix sparse. How can sparse data be processed and matched?
Cold start problem: a new user or item has just entered the system, and it is hard to find similar ones because there is not enough information. Reduced coverage: when users' ratings are too few compared to the large number of items in the system, coverage is reduced.
Users with the same tastes may not be identified as such if they have no co-rated items.
Dimensionality reduction techniques, such as Singular Value Decomposition (SVD), remove unrepresentative or insignificant users or items to reduce the dimensionality of the user-item matrix directly. This reduces sparsity but has drawbacks: meaningful data may also be discarded, which can decrease recommendation quality.
Scalability. Large data sizes lead to long computation times under limited resources. Dimensionality reduction can help with this problem, but it requires an extra matrix factorization step, which is expensive. Incremental SVD algorithms have been suggested to reduce the cost of this step.
Synonymy. The same kind of product can appear under different names, for example “children movie” and “children film”. Memory-based CF systems are vulnerable to this problem. Attempts have been made to solve it: intellectual or automatic term expansion offers a partial solution but has some drawbacks.
Gray sheep. These are users who are not ordinary, so it is hard to make predictions for them. There is no full solution; per-user approaches have been proposed to reduce the problem.
Shilling attacks. Product sellers deliberately inject good ratings or negative ratings into the system. The item-based CF algorithm was much less affected by these attacks than the user-based CF algorithm.
Privacy: observing users' personal habits can invade their privacy. Noise: noise increases as diversity increases. Explainability: let users know why the system recommends a specific item.
Memory-based CF. Memorizes the rating matrix and issues recommendations based on the relationship between the queried user and item and the rest of the matrix. It uses the entire user-item database, or a sample of it, to make predictions. Every user is treated as part of a group of people with similar interests.
Neighborhood-based CF is the most popular memory-based CF method. It predicts ratings by referring to users whose ratings are similar to the queried user's, or to items that are similar to the queried item. First calculate the similarity or weight, then aggregate the neighbors to get the top-N most frequent items as the recommendation.
Similarity computation is a critical step. For item-based CF, compute the similarity between items. For user-based CF, compute the similarity between users u and v who have both rated the same items.
To get the similarity $w_{u,v}$ between two users u and v, or $w_{i,j}$ between two items i and j, Pearson correlation can be used. It measures the extent to which two variables (here, users or items) are linearly related, as a function of their attributes.
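User-based algorithm: the Pearson correlation between users u and v is
$$w_{u,v} = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}}$$
where the $i \in I$ summations are over the items that both users u and v have rated, and $\bar{r}_u$ is the average rating of the co-rated items of the u-th user.
Item-based algorithm: the Pearson correlation between items i and j is
$$w_{i,j} = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_j)^2}}$$
where the $u \in U$ summations are over the users who have rated both items i and j, $r_{u,i}$ is the rating of user u on item i, and $\bar{r}_i$ is the average rating of the i-th item by those users.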
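A minimal sketch of the user-based Pearson correlation may make the computation concrete. It assumes each user's ratings are stored as a dictionary from item id to rating; that layout and the function name `pearson_user_similarity` are illustrative choices, not something the slides specify. Returning 0.0 when fewer than two items are co-rated, or when a user's co-rated ratings have no variance, is also an assumption.

```python
from math import sqrt


def pearson_user_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users over their co-rated items.

    ratings_u, ratings_v: dicts mapping item id -> rating (hypothetical layout).
    """
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0  # assumption: too little overlap means "no evidence"

    # Averages are taken over the co-rated items only.
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)

    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: constant ratings carry no correlation signal
    return num / (den_u * den_v)
```

The item-based variant follows the same pattern with the roles swapped: pass two dictionaries that map user id to rating, one per item.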
Cosine similarity is used to find the similarity between two documents by treating each document as a vector of word frequencies and computing the cosine of the angle formed by the frequency vectors. For collaborative filtering, treat users or items as vectors of ratings and compute the cosine of the angle formed by the rating vectors.
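Similarity between two items i and j: $w_{i,j} = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\| \, \|\vec{j}\|}$. Example: for vectors A = {x1, y1} and B = {x2, y2}, $\cos(A, B) = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$.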
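As an illustration, cosine similarity between two rating vectors can be sketched as follows. The function name and the choice to return 0.0 for an all-zero vector are assumptions made for this example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector has no direction
    return dot / (norm_a * norm_b)
```

For the two-dimensional case on the slide, `cosine_similarity([x1, y1], [x2, y2])` evaluates the dot product of A and B divided by the product of their lengths.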
In neighborhood-based CF, a subset of the active user's nearest neighbors is chosen based on their similarity to him or her, and a weighted aggregate of their ratings is used to generate predictions for the active user.
To make a prediction for the active user a on a certain item i, we can take a weighted average of all the ratings on that item: $P_{a,i} = \bar{r}_a + \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u)\, w_{a,u}}{\sum_{u \in U} |w_{a,u}|}$, where the summations are over all users $u \in U$ who have rated item i, $\bar{r}_a$ and $\bar{r}_u$ are the average ratings of user a and user u on all other rated items, and $w_{a,u}$ is the weight between user a and user u.
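The weighted-average prediction can be sketched as below. The data layout (user to item-rating dictionary), the function name, and the fallback to the active user's own mean when no neighbor has rated the item are all assumptions for illustration. Each user's mean is taken over items other than the target, which is one reading of "all other ratings".

```python
def predict_user_based(active, item, ratings, weights):
    """Predict the active user's rating of `item` from weighted neighbor ratings.

    ratings: user -> {item: rating} (hypothetical layout)
    weights: neighbor -> similarity weight with the active user
    """
    def mean_excluding_target(user):
        values = [r for i, r in ratings[user].items() if i != item]
        return sum(values) / len(values) if values else None

    mean_a = mean_excluding_target(active)
    if mean_a is None:
        return None  # the active user has no other ratings to anchor on

    num = den = 0.0
    for u, w in weights.items():
        if u == active or item not in ratings.get(u, {}):
            continue  # only neighbors who rated the target item contribute
        mean_u = mean_excluding_target(u)
        if mean_u is None:
            continue
        num += (ratings[u][item] - mean_u) * w
        den += abs(w)

    # Assumption: with no usable neighbors, fall back to the user's own mean.
    return mean_a if den == 0 else mean_a + num / den
```

Subtracting each neighbor's mean before weighting compensates for users who rate consistently high or low.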
To predict the rating for U1 on I2,
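For item-based prediction, we can use a simple weighted average to predict the rating $P_{u,i}$ for user u on item i: $P_{u,i} = \frac{\sum_{n \in N} r_{u,n}\, w_{i,n}}{\sum_{n \in N} |w_{i,n}|}$, where the summations are over all other items $n \in N$ rated by user u, $w_{i,n}$ is the weight between items i and n, and $r_{u,n}$ is the rating of user u on item n.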
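A short sketch of the item-based weighted average, under the assumption that the user's ratings and the item-to-item weights are both available as dictionaries (the names here are hypothetical):

```python
def predict_item_based(user_ratings, item_weights):
    """Weighted average of a user's ratings, weighted by item similarity.

    user_ratings: item -> rating given by the target user (hypothetical layout)
    item_weights: item -> similarity weight between that item and the target item
    """
    num = den = 0.0
    for item, rating in user_ratings.items():
        w = item_weights.get(item)
        if w is None:
            continue  # no known similarity to the target item
        num += rating * w
        den += abs(w)
    # Assumption: return None when no rated item has a usable weight.
    return num / den if den else None
```

Unlike the user-based formula, no mean offset is applied here; the prediction is a plain weighted average of the user's own ratings.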
Top-N recommendation: recommend a set of N top-ranked items that will be of interest to a certain user. For example, a returning customer may get a list of recommendations. Top-N recommendation techniques analyze the user-item matrix to discover relations between different users or items and use them to compute recommendations. Association rule mining can also be used to make Top-N recommendations.
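One simple way to produce a Top-N list is to count how often each item appears among the active user's nearest neighbors and keep the N most frequent items the user does not already have. The sketch below assumes the neighbors have already been selected; the function name and the alphabetical tie-break are illustrative choices.

```python
from collections import Counter


def top_n_from_neighbors(active_items, neighbor_item_sets, n):
    """Return the n items most frequent among neighbors and new to the user.

    active_items: items the active user already rated or purchased
    neighbor_item_sets: one collection of items per selected neighbor
    """
    seen = set(active_items)
    counts = Counter()
    for items in neighbor_item_sets:
        counts.update(set(items) - seen)  # count each neighbor at most once per item
    # Sort by descending frequency, then by item id for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]
```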
Model-based CF. The design and development of models (such as machine learning and data mining algorithms) allows the system to learn to recognize complex patterns from training data and then make predictions from the learned models. Classification algorithms can be used as CF models if the user ratings are categorical; regression models and SVD methods can be used for numerical ratings.
A naïve Bayes (NB) strategy can be used to make predictions. Assuming the features are independent given the class, the probability of each class given all of the features can be computed. The class with the highest probability is then taken as the predicted class.
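A minimal sketch of a naïve Bayes predictor for categorical ratings follows. It treats the target item's rating as the class and the active user's ratings on other items as features. The data layout, the function name, and the use of Laplace smoothing (`alpha`) are assumptions for this example rather than details from the slides.

```python
from math import log


def naive_bayes_predict(active, target, ratings, classes, alpha=1.0):
    """Predict the rating class of `target` for user `active`.

    ratings: user -> {item: class label} (hypothetical layout)
    classes: list of possible class labels
    alpha: Laplace smoothing constant (an assumption, to avoid zero probabilities)
    """
    # Training rows: other users who have rated the target item.
    train = [r for u, r in ratings.items() if u != active and target in r]
    if not train:
        return None

    k = len(classes)
    best_class, best_score = None, float("-inf")
    for c in classes:
        rows = [r for r in train if r[target] == c]
        # Log prior of class c.
        score = log((len(rows) + alpha) / (len(train) + alpha * k))
        # Add the log likelihood of each observed feature, assumed independent given c.
        for item, value in ratings[active].items():
            if item == target:
                continue
            observed = [r[item] for r in rows if item in r]
            matches = sum(1 for v in observed if v == value)
            score += log((matches + alpha) / (len(observed) + alpha * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Working in log space avoids numerical underflow when many features are multiplied together.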
Clustering-based CF shows better scalability because it makes predictions within much smaller clusters rather than over the entire customer base.
Memory-based and model-based CF approaches are combined to form hybrid CF approaches, which show some improvement. Examples are probabilistic memory-based CF (PMCF) and personality diagnosis (PD).
Probabilistic memory-based CF (PMCF) combines memory-based and model-based techniques. To address the new user problem, an active learning extension to the PMCF system can be used to actively query a user for additional information. To reduce computation time, PMCF selects a small subset, the ‘profile space’, from the entire database of user ratings and makes predictions from this small profile space rather than the whole database. It achieves better accuracy than Pearson correlation-based CF and than model-based CF using naïve Bayes.
Personality diagnosis (PD) combines memory-based and model-based CF and keeps the advantages of both. Given the active user’s known ratings, we can calculate the probability that he or she is of the same “personality type” as other users, and predict whether he or she will like new items.
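The idea can be sketched as follows, under stated assumptions: observed ratings are modeled as a user's "true" ratings plus Gaussian noise with a hypothetical spread `sigma`, and every other user is an equally likely personality type a priori. The slides do not give these details, so the function below is an illustration rather than the reference method.

```python
from math import exp


def personality_diagnosis(active_ratings, others, target, rating_values, sigma=1.0):
    """Estimate a probability for each possible rating of `target`.

    active_ratings: item -> rating known for the active user
    others: user -> {item: rating} (hypothetical layout)
    rating_values: the possible rating values, e.g. [1, 2, 3, 4, 5]
    sigma: assumed Gaussian noise spread (not taken from the slides)
    """
    def noise(x, y):
        # Unnormalized likelihood of observing x when the true rating is y.
        return exp(-((x - y) ** 2) / (2 * sigma ** 2))

    # Likelihood that the active user shares each other user's personality type.
    type_weight = {}
    for user, r in others.items():
        if target not in r:
            continue
        w = 1.0
        for item, x in active_ratings.items():
            if item in r:
                w *= noise(x, r[item])
        type_weight[user] = w

    scores = {
        v: sum(w * noise(v, others[u][target]) for u, w in type_weight.items())
        for v in rating_values
    }
    total = sum(scores.values())
    if total == 0:
        return None, {}
    probs = {v: s / total for v, s in scores.items()}
    return max(probs, key=probs.get), probs
```

The function returns both the most probable rating and the full distribution, so a caller can also read off how confident the prediction is.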