MATRIX FACTORIZATION TECHNIQUES FOR RECOMMENDER SYSTEMS
Badsha Chandra, Deepak Manoharan, Nishant Negi
Recommender System Strategies
■ Content filtering
□ Builds a profile for each user or product (movie profiles, user profiles)
□ Profiles associate users with matching products
■ Collaborative filtering
□ Analyzes relationships between users and interdependencies among products
□ Generally more accurate than content-based techniques
□ Two primary areas: neighbourhood methods and latent factor models
Latent Factor Models
■ Find latent features that describe the characteristics of rated objects
■ Item characteristics and user preferences are described with numerical factor values
■ Assumption: ratings can be inferred from a model built from a smaller number of parameters
Latent Factor Models
■ Each item i and each user u is associated with a factor vector (q_i and p_u)
■ The dot product captures the user’s estimated interest in the item: r̂_ui = q_iᵀ p_u
■ Challenge: how to compute the mapping of items and users to factor vectors?
■ Approaches:
□ Singular Value Decomposition (SVD)
□ Matrix Factorization
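A minimal sketch of the dot-product prediction. The factor values below are illustrative (e.g. two hypothetical dimensions such as "action" vs. "romance" affinity), not taken from the slides:

```python
import numpy as np

# Hypothetical 2-factor vectors for one item and one user
q_item = np.array([0.9, 0.2])   # item factor vector q_i
p_user = np.array([0.8, 0.1])   # user factor vector p_u

# Estimated interest of the user in the item: r_hat = q_i . p_u
r_hat = float(q_item @ p_user)
print(round(r_hat, 2))  # 0.74
```

A larger dot product means the user's preferences line up with the item's characteristics along the latent dimensions.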
Temporal Dynamics
■ Ratings may be affected by temporal effects
□ The popularity of an item may change
□ A user’s identity and preferences may change
■ Modelling temporal effects can improve accuracy significantly
■ Rating predictions become a function of time: r̂_ui(t) = μ + b_i(t) + b_u(t) + q_iᵀ p_u(t)
Biases
■ Item- or user-specific rating variations are called biases
■ Example:
□ Alice rates no movie with more than 2 (out of 5)
□ Movie X is hyped and rated only with 5
■ Matrix factorization allows modelling of biases
■ Including bias parameters in the prediction: r̂_ui = μ + b_i + b_u + q_iᵀ p_u
(μ is the overall average rating; b_i and b_u are the item’s and user’s deviations from it)
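A sketch of the bias-aware prediction using the Alice / Movie X example; all numbers are illustrative assumptions, not values from the slides:

```python
import numpy as np

mu = 3.5       # hypothetical global average rating
b_user = -1.5  # Alice rates far below average (user bias b_u)
b_item = 1.2   # Movie X is hyped (item bias b_i)
q_item = np.array([0.9, 0.2])  # illustrative item factors q_i
p_user = np.array([0.8, 0.1])  # illustrative user factors p_u

# r_hat_ui = mu + b_i + b_u + q_i . p_u
r_hat = mu + b_item + b_user + float(q_item @ p_user)
print(round(r_hat, 2))  # 3.94
```

The biases absorb systematic rating shifts, so the factor vectors only have to explain the remaining user-item interaction.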
Confidence Levels
■ Not all observed ratings deserve the same weight or confidence
■ Example: massive advertising might influence votes for certain items without reflecting their longer-term characteristics
■ A system might also face adversarial users who try to tilt the ratings of certain items
Learning Algorithms
■ Stochastic gradient descent
□ Loop over the ratings and compute the prediction error: error = actual rating − predicted rating
□ Modify the parameters q_i and p_u in the opposite direction of the gradient
□ Step magnitude proportional to the learning rate γ
■ Alternating least squares
□ Fixing one set of unknowns makes the optimization problem quadratic, so it can be solved optimally
□ Allows massive parallelization
□ Better suited for densely filled matrices
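A minimal sketch of the stochastic-gradient-descent variant on a toy rating set; the sizes, hyperparameters, and ratings are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2
gamma, lam, epochs = 0.02, 0.02, 500  # learning rate, regularization, passes

# Observed (user, item, rating) triples; everything else is missing
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 3, 2.0), (3, 4, 1.0)]

P = 0.1 * rng.standard_normal((n_users, k))  # user factor vectors p_u
Q = 0.1 * rng.standard_normal((n_items, k))  # item factor vectors q_i

for _ in range(epochs):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]  # prediction error e_ui
        # step against the gradient, scaled by gamma, with L2 regularization
        P[u] += gamma * (err * Q[i] - lam * P[u])
        Q[i] += gamma * (err * P[u] - lam * Q[i])

# After training, predictions for the observed ratings approach the targets
print(float(P[0] @ Q[0]), float(P[1] @ Q[0]))
```

Note that only observed ratings are visited, which is exactly how matrix factorization sidesteps the missing-value problem of conventional SVD.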
Data for Experimentation
■ MovieLens dataset files: users.dat, movies.dat, ratings.dat
Matrix Factorization Methods
■ Characterize both items and users by vectors of factors inferred from item rating patterns
■ High correspondence between item and user factors leads to a recommendation
■ Input data are placed in a matrix with one dimension representing users and the other representing items of interest
■ Matrix factorization models map both users and items to a joint latent factor space of dimensionality f
■ Advantages:
□ Good scalability combined with high predictive accuracy
□ Much flexibility for modelling various real-life situations
Singular Value Decomposition (SVD)
■ Decomposes a matrix R so that truncating the decomposition yields the best lower-rank approximation of the original R
■ Mathematically, it decomposes R into two unitary matrices and a diagonal matrix: R = UΣVᵀ
□ R is the user-ratings matrix
□ U is the user "features" matrix
□ Σ is the diagonal matrix of singular values (essentially weights)
□ Vᵀ is the movie "features" matrix
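A small sketch of the decomposition and a rank-k truncation with NumPy, on a toy ratings matrix (values are illustrative):

```python
import numpy as np

# Toy user x movie ratings matrix R (rows: users, columns: movies)
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [1.0, 0.0, 0.0, 4.0]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)  # R = U Sigma V^T

# Keep only the top-k singular values for a rank-k approximation
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# R_k is the best rank-k approximation of R in the least-squares sense
print(np.round(R_k, 2))
```

The singular values in `s` come sorted in decreasing order, so truncation keeps the directions that explain the most rating variance.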
Singular Value Decomposition (SVD)
■ Build a predictions matrix for every user
■ Build a function to recommend movies for any user: return the movies with the highest predicted rating that the specified user hasn’t already rated
■ Advantages:
□ Scales significantly better to larger datasets
□ The SVD can be approximated with gradient descent
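One way the recommendation step could look; `recommend` and the toy predictions matrix are hypothetical, not from the slides:

```python
import numpy as np

def recommend(preds, rated_mask, user, n=3):
    """Return the indices of the n movies with the highest predicted
    rating that the given user has not already rated (hypothetical helper)."""
    scores = preds[user].copy()
    scores[rated_mask[user]] = -np.inf  # exclude already-rated movies
    return list(np.argsort(scores)[::-1][:n])

# Toy predictions matrix and "already rated" mask
preds = np.array([[4.2, 3.9, 1.0, 4.8],
                  [2.0, 4.5, 3.3, 0.5]])
rated = np.array([[True, False, False, True],
                  [False, True, False, False]])

print(recommend(preds, rated, user=0, n=2))  # [1, 2]
```

Movies 0 and 3 have the highest predictions for user 0, but both are masked out because the user already rated them.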
Limitations of SVD
■ Conventional SVD is undefined for incomplete matrices!
■ Imputation can fill in the missing values, but it significantly increases the amount of data, and inaccurate imputation can distort it
■ We need an approach that can simply ignore missing ratings
Alternate Implementation – Content Filtering
(Pipeline: Data → Cluster → Group)
Content Filtering
■ Based on the properties of items; similarity of items is determined by measuring the similarity of their properties
■ A profile is constructed for each item (records representing its important characteristics), e.g.:
□ The genres or movie type — most viewers prefer movies based on genre
□ The set of actors — some viewers prefer movies with their favourite actors
■ Genres are assigned based on movie reviews; for example, IMDb assigns genres to every movie
■ The implementation uses hierarchical clustering to group related movies based on their genres
■ Given a user’s preference for a movie, similar movies are recommended from the cluster group to which the rated movie belongs
■ The code is implemented in R
■ Data: http://files.grouplens.org/datasets/movielens/ml-100k/u.item
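The original implementation is in R; a Python sketch of the same idea, clustering hypothetical binary genre vectors with SciPy's hierarchical clustering, might look like this:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical binary genre vectors (rows: movies, columns: genres)
# e.g. columns = [Action, Comedy, Drama, Sci-Fi]
genres = np.array([[1, 0, 0, 1],   # movie A: Action / Sci-Fi
                   [1, 0, 0, 1],   # movie B: Action / Sci-Fi
                   [0, 1, 1, 0],   # movie C: Comedy / Drama
                   [0, 1, 0, 0]])  # movie D: Comedy

Z = linkage(genres, method="ward")               # agglomerative clustering
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 groups

# Movies in the same group as a movie the user liked become candidates
print(labels)
```

Here movies A and B land in one cluster and C and D in the other, so a user who rated A highly would be recommended B.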
Hybrid Recommendation (for Online Recommender Systems)
■ In high demand in the current industry
■ The system uses four matrices:
□ User-user proximity matrix
□ Item-item proximity matrix
□ User-user similarity matrix
□ Item-item similarity matrix
■ Model-based: it combines the information from these four matrices in an ordinal logistic regression model to predict ratings on a 5-point scale (say)
■ Logic: express the recommendation problem as a discrete choice problem, where the alternatives are given by the ordinal information of the 5-point scale, while the decision making and the choices are described by a set of content-based and collaboration-based features
■ This captures some degree of closeness between an unknown user/item combination and the known users/items in two spaces: the attribute (feature) space and the neighbourhood (memory) space
Hybrid Recommendation (cont.)
■ Who is the user most similar to u2?
■ The linear correlation coefficient does not preserve information about the number of shared items
■ The Jaccard distance ignores the covariances from the rating scale altogether, preserving only information about the extent of the shared ratings
■ The proximity measure describes a user’s taste via a frequency distribution, by binarizing his/her ratings
■ The similarity measure captures the content-based approach: a user’s similarity to other users in terms of the same attributes that describe him/her
■ Additionally, adding user demographics to the binary vector before computing the similarities would enrich the representation
■ Final step: use neighbourhoods of different sizes from these matrices (say, X similar users from the content-based proximity and memory-based user-user similarity matrices, and Y similar items from the content-based proximity and memory-based item-item similarity matrices) as regressors in an ordinal logistic model to predict a 5-point rating for novel user-item combinations
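A small sketch of the point about shared items: the Jaccard measure directly reflects how much of two users' rating histories overlap. The user sets below are illustrative assumptions:

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of rated items."""
    return len(a & b) / len(a | b)

# Hypothetical sets of items rated by three users
u1 = {"A", "B", "C", "D"}
u2 = {"A", "B", "C"}
u3 = {"A"}

# u2 shares 3 of 4 items with u1; u3 shares only 1 of 4
print(jaccard(u1, u2))  # 0.75
print(jaccard(u1, u3))  # 0.25
```

A correlation coefficient computed over the one item u1 and u3 share could be high or undefined, whereas the Jaccard measure makes the thinness of that overlap explicit.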
Conclusion
■ Covered the content-based and collaborative approaches
■ Enhanced matrix factorization techniques offer the flexibility of incorporating various factors: biases, temporal effects, confidence levels, implicit rating factors
■ Collaborative models are the preferred choice over content-based models
■ Industry is progressing towards more intuitive online recommender systems that consider both proximity and similarity measures, along with demographics, temporal and other factors
References
Freitag, M., & Schwarz, J.-F. (2011, April). Matrix Factorization Techniques for Recommender Systems. Retrieved October 8, 2017, from https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/lehre/SS2011/Collaborative_Filtering/pres1-matrixfactorization.pdf
Isinkaye, F. O., Folajimi, Y. O., & Ojokoh, B. A. (2015). Recommendation systems: Principles, methods and evaluation. Egyptian Informatics Journal, 16(3), 261–273. https://doi.org/10.1016/j.eij.2015.06.005
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 30–37. Retrieved October 5, 2017, from https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf
Kovac, B. (2017, April). Hybrid Content-Based and Collaborative Filtering Recommendations: Part I - DZone Big Data. Retrieved October 8, 2017, from https://dzone.com/articles/hybrid-content-based-and-collaborative-filtering-r
Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets – Chapter 9: Recommendation Systems. Retrieved October 4, 2017, from http://infolab.stanford.edu/~ullman/mmds/ch9.pdf
Thank you