Collaborative Filtering
Zaffar Ahmed
Overview
A method of making automatic predictions (filtering) about a user or item by collecting preference information from many users or items (collaborating).
It analyzes data from numerous sources to build profiles of people with similar tastes and spending habits.
It is based on the "word-of-mouth" idea.
It gives reliable recommendations when enough rating data is available.
Facts
It needs a lot of stored data to make reliable recommendations for the active user.
Bigger population: more useful and effective recommendations are produced (Smart Mobs).
Small data: shows false connections or poor predictions of the active user's tastes.
Suffers from the cold-start problem: the database needs to be populated first.
Methodology
Divided into two steps:
1. Look for users who share the same rating patterns as the active user.
2. Use the ratings from the like-minded users found in step 1 to calculate a prediction for the active user.
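
The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular system: the toy `RATINGS` data, the choice of Pearson correlation as the measure of "same rating patterns", and the helper names `pearson` and `predict` are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
    "dave": {"A": 5, "B": 3, "C": 4, "D": 5},
}


def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    dev_u = [u[i] - mean_u for i in common]
    dev_v = [v[i] - mean_v for i in common]
    denom = sqrt(sum(d * d for d in dev_u)) * sqrt(sum(d * d for d in dev_v))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dev_u, dev_v)) / denom


def predict(ratings, active, item):
    """Predict the active user's rating for an item they have not rated."""
    # Step 1: find like-minded users (positive correlation) who rated the item.
    neighbors = []
    for user, their in ratings.items():
        if user == active or item not in their:
            continue
        sim = pearson(ratings[active], their)
        if sim > 0:
            neighbors.append((sim, their[item]))
    if not neighbors:
        return None  # no usable neighbors for this user/item pair
    # Step 2: similarity-weighted average of the neighbors' ratings.
    total = sum(sim for sim, _ in neighbors)
    return sum(sim * r for sim, r in neighbors) / total


prediction = predict(RATINGS, "alice", "D")
print(round(prediction, 2))
```

In this toy data, carol's ratings run opposite to alice's, so she is dropped in step 1. The prediction for item D is then a blend of bob's 4 and dave's 5, pulled toward dave because his ratings match alice's exactly.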
Types of Collaborative Filtering
Memory-based: uses user rating data to compute the similarity between users or items.
  Neighborhood-based CF: calculates the similarity between two users or items and produces a prediction for the active user by taking the weighted average of all the ratings.
  Item/user-based top-N recommendation: identifies the K most similar users (or items) using a similarity-based vector model.
  Locality-sensitive hashing: implements the nearest-neighbor mechanism in linear time.
Advantages: 1) explainability of the results, 2) easy to create and use, 3) new data can be added easily and incrementally
Disadvantages: 1) depends on human ratings, 2) performance decreases when data gets sparse, 3) it cannot handle new users or items
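
As an illustration of the item-based top-N idea, the sketch below treats each item as a sparse vector of user ratings and ranks unrated items by their cosine similarity to the items the active user already rated. The toy `RATINGS` data, the unnormalized scoring rule, and the helper names `item_vectors`, `cosine`, and `top_n` are assumptions made for this example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 5, "B": 5, "C": 5},
    "u3": {"A": 4, "C": 4, "D": 1},
    "u4": {"B": 1, "D": 5},
}


def item_vectors(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    vectors = {}
    for user, rated in ratings.items():
        for item, r in rated.items():
            vectors.setdefault(item, {})[user] = r
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; missing entries count as 0."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def top_n(ratings, user, n):
    """Rank items the user has not rated by similarity to the ones they have."""
    vectors = item_vectors(ratings)
    rated = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in rated:
            continue
        # Each rated item votes for the candidate, weighted by similarity.
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[i]) * r for i, r in rated.items()
        )
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


recommendations = top_n(RATINGS, "u1", 2)
print([item for item, _ in recommendations])  # ['C', 'D']
```

Item C ranks first for u1 because the users who liked A and B also rated C highly, while D was mostly rated by users whose tastes differ. Every similarity is computed against the stored ratings at query time, which is why this family is called memory-based.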
Types of Collaborative Filtering
Model-based: models (ontologies) are developed using data mining and machine learning algorithms to find patterns based on training data. It has the more holistic goal of uncovering latent factors that explain observed ratings.
  Bayesian networks
  Clustering models
  Latent semantic models
  Singular value decomposition
  Probabilistic latent semantic analysis
  Multiple multiplicative factor
  Latent Dirichlet allocation
  Markov decision process based models
Advantages
  Handles sparsity better than memory-based algorithms, which improves scalability and prediction performance.
Disadvantages
  Expensive model building
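
To make "latent factors" concrete, here is a small sketch of a latent-factor model. It is not a true singular value decomposition: it uses matrix factorization trained with stochastic gradient descent, a common relative of SVD that works directly on sparse ratings. The `OBSERVED` triples, the hyperparameters (`k`, `epochs`, `lr`, `reg`), and the function names are assumptions chosen for illustration.

```python
import random

# Hypothetical observed ratings as (user index, item index, rating) triples.
OBSERVED = [
    (0, 0, 5), (0, 1, 3), (0, 2, 4),
    (1, 0, 4), (1, 2, 5), (1, 3, 4),
    (2, 0, 1), (2, 1, 5), (2, 3, 2),
    (3, 1, 3), (3, 2, 4), (3, 3, 5),
]


def dot(p, q):
    """Dot product of a user factor vector and an item factor vector."""
    return sum(a * b for a, b in zip(p, q))


def rmse(observed, P, Q):
    """Root-mean-square error of the model on the observed ratings."""
    squared = [(r - dot(P[u], Q[i])) ** 2 for u, i, r in observed]
    return (sum(squared) / len(squared)) ** 0.5


def factorize(observed, n_users, n_items, k=2, epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn k latent factors per user (P) and per item (Q) with SGD."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


P0, Q0 = factorize(OBSERVED, 4, 4, epochs=0)  # untrained starting point
P, Q = factorize(OBSERVED, 4, 4)
print("RMSE before:", round(rmse(OBSERVED, P0, Q0), 3))
print("RMSE after: ", round(rmse(OBSERVED, P, Q), 3))
print("Predicted rating, user 0 on item 3:", round(dot(P[0], Q[3]), 2))
```

The training loop is the "expensive model building" part. Once `P` and `Q` are learned, predicting any user/item pair, including pairs with no observed rating, is a single dot product.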
Types of Collaborative Filtering
Hybrid
  Combines model-based and memory-based CF algorithms.
  Overcomes the limitations of native CF approaches.
Advantages
  Improves prediction performance
Disadvantages
  Increased complexity
  Expensive to implement
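
One simple way to combine the two families, shown here only as an assumption-laden sketch, is a weighted blend of their predictions with a fallback when one of them has nothing to offer. The function name `hybrid_predict` and the weight `alpha` are hypothetical; real hybrid systems may combine the approaches in quite different ways.

```python
def hybrid_predict(memory_pred, model_pred, alpha=0.5):
    """Blend two predictions; alpha is the weight on the memory-based one."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Fall back to whichever approach produced a prediction.
    if memory_pred is None:
        return model_pred
    if model_pred is None:
        return memory_pred
    return alpha * memory_pred + (1.0 - alpha) * model_pred


print(hybrid_predict(4.0, 3.0, alpha=0.5))  # 3.5
print(hybrid_predict(None, 3.0))            # 3.0
```

The fallback covers the case where the memory-based side finds no neighbors and the model-based prediction is used alone. The price is the added complexity noted above: two predictors have to be built and maintained, and `alpha` has to be tuned.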
Thank you