1
Collaborative Filtering via Euclidean Embedding
M. Khoshneshin and W. Street
Proc. of ACM RecSys, pp. 87-94, 2010
2
Introduction
Recommendation systems suggest items based on user preferences.
Recommendation approaches:
- Content-based: items are recommended based on a user profile and product information.
- Collaborative filtering: uses similarity to recommend items that were liked by similar users, i.e., recommendation is based on the rating history of the system.
The goal is to predict unknown ratings so that users can be given suggestions based on items with a high expected rating.
3
Challenges
Existing approaches are better suited to static settings: incorporating new data into these models is not a trivial task.
Recommendations are based on the best predicted ratings; however, predicting ratings is computationally expensive in large datasets.
Solution
Euclidean embedding (EE) method for collaborative filtering:
Users and items are embedded in a unified Euclidean space.
The smaller the distance between a user and an item, the higher the predicted rating.
4
Euclidean Embedding (EE)
Advantages of EE:
It is more intuitively understandable for humans, allowing useful visualizations.
It allows a very efficient implementation of recommendation queries.
It facilitates online implementation requirements, e.g., mapping new users/items.
5
Related Work
Neighborhood/memory-based CF algorithms:
Item-based or user-based KNN associates to each user/item its set of nearest neighbors, and predicts a user's rating on an item from the ratings of its nearest neighbors.
These utilize the entire database of user preferences when computing recommendations.
Model-based CF algorithms, e.g., matrix factorization and non-negative matrix factorization:
Compute a model of the preference data and use it to produce recommendations.
Find patterns based on training on a subset of the database.
6
Collaborative Filtering (CF)
Given N users and M items, a model-based approach to CF trains the model on the known ratings (the training set) so that the prediction error is minimized. Root mean squared error (RMSE) is a popular error function.
The objective function of a model-based CF approach, e.g., matrix factorization, is defined as

$\min \sum_{u,i} w_{ui} (r_{ui} - \hat{r}_{ui})^2$

where $r_{ui}$ is the rating of user u for item i, $\hat{r}_{ui}$ is the model's prediction of the rating of u for i, and $w_{ui}$ is 1 if $r_{ui}$ is known, and 0 otherwise.
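A minimal sketch of the masked error computation, assuming numpy and small illustrative rating matrices (all names are mine, not the paper's):

```python
import numpy as np

def masked_rmse(R, R_hat, W):
    """RMSE over the known ratings only; W[u, i] is 1 if r_ui is known, else 0."""
    sq_err = W * (R - R_hat) ** 2           # unknown ratings contribute nothing
    return np.sqrt(sq_err.sum() / W.sum())  # average over the known ratings

# Toy example: 2 users x 3 items, with one unknown rating per user.
R     = np.array([[5.0, 3.0, 0.0],
                  [4.0, 0.0, 1.0]])
W     = np.array([[1, 1, 0],
                  [1, 0, 1]])
R_hat = np.array([[4.5, 3.5, 2.0],
                  [4.0, 2.0, 2.0]])
print(masked_rmse(R, R_hat, W))  # ~0.61
```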
7
CF via Matrix Factorization
CF via EE is similar to CF via matrix factorization (MF). The predicted rating via MF is computed as

$\hat{r}_{ui} = \mu + b_u + b_i + p_u q_i'$

where μ is the overall average of all ratings, $b_u$ is the deviation of user u from the average, $b_i$ is the deviation of item i from the average, and $p_u$ and $q_i$ are the user-factor and item-factor vectors in a D-dimensional space, respectively; $p_u q_i'$ is the dot product of $p_u$ and $q_i$. A higher $p_u q_i'$ means u likes i more than average.
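A direct translation of this prediction rule into numpy, with hypothetical values for one user-item pair (the numbers are illustrative only):

```python
import numpy as np

def predict_mf(mu, b_u, b_i, p_u, q_i):
    """MF prediction: global mean + user bias + item bias + factor dot product."""
    return mu + b_u + b_i + p_u @ q_i

# Hypothetical values for one user-item pair in a D = 3 factor space.
mu, b_u, b_i = 3.6, 0.2, -0.4
p_u = np.array([0.5, -0.1, 0.3])
q_i = np.array([0.4,  0.2, -0.1])
print(predict_mf(mu, b_u, b_i, p_u, q_i))  # 3.4 + 0.15 = 3.55
```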
8
CF via Matrix Factorization
A gradient descent approach is used to solve CF problems with a highly sparse data matrix. The goal is to minimize the objective function

$\min \sum_{u,i} w_{ui} (r_{ui} - \hat{r}_{ui})^2 + \lambda (b_u^2 + b_i^2 + \|p_u\|^2 + \|q_i\|^2)$

where the second term avoids overfitting the magnitude of the parameters and λ is an algorithmic parameter. With $e_{ui} = r_{ui} - \hat{r}_{ui}$ the current error for rating $r_{ui}$ and γ the step size of the algorithm, the gradient descent updates for each known rating $r_{ui}$ are

$b_u \leftarrow b_u + \gamma (e_{ui} - \lambda b_u)$
$b_i \leftarrow b_i + \gamma (e_{ui} - \lambda b_i)$
$p_u \leftarrow p_u + \gamma (e_{ui} q_i - \lambda p_u)$
$q_i \leftarrow q_i + \gamma (e_{ui} p_u - \lambda q_i)$

One pass over the training set takes T steps, where T is the number of known ratings.
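A minimal sketch of one such per-rating update step in numpy (the in-place style and the names gamma, lam, b, c, P, Q are mine, not the paper's):

```python
import numpy as np

def sgd_step_mf(r_ui, mu, b, c, P, Q, u, i, gamma=0.005, lam=0.02):
    """One stochastic gradient step for a single known rating r_ui.

    b and c hold the user/item biases; P and Q hold the user/item factors.
    """
    e = r_ui - (mu + b[u] + c[i] + P[u] @ Q[i])  # current error for r_ui
    b[u] += gamma * (e - lam * b[u])
    c[i] += gamma * (e - lam * c[i])
    p_old = P[u].copy()                # keep the pre-update p_u for the q_i update
    P[u] += gamma * (e * Q[i] - lam * P[u])
    Q[i] += gamma * (e * p_old - lam * Q[i])
    return e
```

One training epoch visits each of the T known ratings once, calling this step for every (u, i, r_ui) triple.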
9
CF via Euclidean Embedding
All items and users are embedded in a unified Euclidean space, and the characteristics of each person/item are defined by its location. If an item is close to a user in the unified space, its characteristics are attractive for that user: a user is expected to like an item which is close in the space.
10
CF via Euclidean Embedding
The predicted rating via EE is computed as

$\hat{r}_{ui} = \mu + b_u + b_i - (x_u - y_i)(x_u - y_i)'$

where $x_u$ and $y_i$ are the point vectors of user u and item i in a D-dimensional Euclidean space, and $(x_u - y_i)(x_u - y_i)'$ is the squared Euclidean distance between them. The squared Euclidean distance is computationally cheaper than the Euclidean distance (no square root), while the accuracy remains the same.
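The EE prediction rule in numpy, with illustrative 2-dimensional points (values are hypothetical):

```python
import numpy as np

def predict_ee(mu, b_u, b_i, x_u, y_i):
    """EE prediction: biases minus the squared user-item Euclidean distance."""
    d = x_u - y_i
    return mu + b_u + b_i - d @ d  # squared distance: no square root needed

x_u = np.array([0.1, 0.3])
y_i = np.array([0.2, 0.1])
print(predict_ee(3.6, 0.2, -0.4, x_u, y_i))  # 3.4 - 0.05 = 3.35
```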
11
CF via Euclidean Embedding
EE is a supervised learning approach: the training phase finds the location of each item and user so as to minimize a loss function. EE modifies the previous objective function (slide 8) by replacing the MF prediction with the EE prediction. Using gradient descent to minimize the EE objective function, the updates in each step (with step size γ and $e_{ui} = r_{ui} - \hat{r}_{ui}$) are

$b_u \leftarrow b_u + \gamma (e_{ui} - \lambda b_u)$
$b_i \leftarrow b_i + \gamma (e_{ui} - \lambda b_i)$
$x_u \leftarrow x_u - \gamma (e_{ui} (x_u - y_i) + \lambda x_u)$
$y_i \leftarrow y_i + \gamma (e_{ui} (x_u - y_i) - \lambda y_i)$
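A sketch of one EE gradient step, derived from the squared-distance prediction above (the constant factor from differentiating the squared distance is folded into the step size, and the names are mine; the paper's exact update form may differ):

```python
import numpy as np

def sgd_step_ee(r_ui, mu, b, c, X, Y, u, i, gamma=0.005, lam=0.02):
    """One gradient step for EE on a single known rating r_ui.

    X and Y hold the user and item points in the unified space.
    """
    d = X[u] - Y[i]
    e = r_ui - (mu + b[u] + c[i] - d @ d)  # current error for r_ui
    b[u] += gamma * (e - lam * b[u])
    c[i] += gamma * (e - lam * c[i])
    X[u] -= gamma * (e * d + lam * X[u])   # e > 0 pulls the user toward the item
    Y[i] += gamma * (e * d - lam * Y[i])   # ... and the item toward the user
    return e
```

Note the geometric reading of the updates: underpredicting a rating (e > 0) moves the user and item points toward each other, shrinking their distance and raising the predicted rating.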
12
CF via Euclidean Embedding
Time complexity:
Training: gradient descent over the known ratings (slide 11).
Prediction: O(D), where D is the dimension of the space.
Recommendation: a K-nearest-neighbor search, which costs O(N^2) if done exhaustively.
Visualization:
1. Implement CF via EE in a high-dimensional space.
2. Select the top K items for an active user.
3. Embed the user, the selected items, and some favorite items in a 2-dimensional space via multi-dimensional scaling (MDS), using the distances from the high-dimensional space of step 1 (a sketch follows).
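A sketch of this three-step pipeline, assuming scipy and scikit-learn are available and using random stand-in points (in practice x_u and Y would come from a trained high-dimensional EE model):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
x_u = rng.normal(size=50)         # active user's point from a D = 50 EE fit
Y = rng.normal(size=(200, 50))    # item points from the same fit

# Step 2: the top K items are the K nearest items to the user in EE space.
K = 10
top_k = np.argsort(((Y - x_u) ** 2).sum(axis=1))[:K]

# Step 3: embed user + selected items in 2-D via metric MDS, preserving the
# pairwise distances measured in the high-dimensional space.
points = np.vstack([x_u, Y[top_k]])
D_hi = cdist(points, points)      # high-dimensional distance matrix
xy = MDS(n_components=2, dissimilarity="precomputed",
         random_state=0).fit_transform(D_hi)
print(xy.shape)                   # (K + 1, 2): the user plus K items
```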
13
CF via Euclidean Embedding
Example: using a low-dimensional unified user-item space, it is possible to present items to users via a graphical interface. Items close to a user can be displayed alongside the movies the user has already liked, to assist in selection.
14
CF via Euclidean Embedding
Fast recommendation generation: the mapped space allows candidate retrieval via neighborhood search; the smaller the distance, the more desirable an item is. In the search space for a query user, EE searches only among the K nearest neighbors, while MF must explore a much larger space.
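One concrete way to implement the neighborhood search (the paper does not mandate a particular index; a KD-tree over the item points is just one option, shown here with random stand-in data):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
Y = rng.normal(size=(10_000, 50))   # item points from an EE fit
x_u = rng.normal(size=50)           # query user's point

tree = cKDTree(Y)                   # built once, reused for every query user
dists, idx = tree.query(x_u, k=10)  # the 10 closest items = top candidates
print(idx)
```

In 50 dimensions an exact tree index gives limited speedup over a linear scan, but the point stands: candidate retrieval becomes a standard nearest-neighbor query rather than a scoring pass over every item.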
15
CF via Euclidean Embedding
Incorporating new users and items: for a new user or item, there are D + 1 unknown values, D for the vector p or q and 1 for the scalar b. Active learning may be used by a recommender, asking new users to provide their favorite items. Since the point vectors of the items in the space are known, and a new user is probably very close to his favorite items in the EE space, the user vector $x_u$ can be estimated as

$x_u = \frac{1}{|I_u|} \sum_{i \in I_u} y_i$

where $I_u$ is the set of items that new user u has selected as his favorites and $|I_u|$ is the number of selected items.
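A minimal sketch of this estimate (function and variable names are mine):

```python
import numpy as np

def embed_new_user(Y, favorite_items):
    """Place a new user at the mean of the points of their favorite items."""
    return Y[favorite_items].mean(axis=0)

Y = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [4.0, 0.0]])
print(embed_new_user(Y, [0, 1]))  # -> [0.5 1. ], between the two favorites
```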
16
Experimental Results
Datasets used:
Netflix: 17,770 movies, ~480,000 users, and ~100,000,000 ratings; dimension D = 50, regularization parameter = 0.005, and step size = 0.005.
MovieLens: 1,682 movies, 943 users, and 100,000 ratings; dimension D = 50, regularization parameter = 0.03, and step size = 0.005.
17
Experimental Results
Learning curve: test RMSE of EE and MF at each iteration of the gradient descent algorithm, over five different folds. MF is more prone to overfitting, since its error increases faster after it passes the optimal point.
18
Experimental Results
Dimension, accuracy, and time: EE and MF give similar results in 5, 25, and 50 dimensions. For precision and recall, where ratings of 4 and 5 are considered desirable, EE performs better than MF.
19
Experimental Results
Visualization: for a typical user, the top n movies are selected based on EE with D = n dimensions. In the EE picture, items are embedded based on the "taste" of the active user, while in the MDS picture the embedding is based on the "tastes" of all users.
20
Experimental Results
Generating fast recommendations: generating new recommendations for a user with EE can be treated as a kNN search problem in Euclidean space. The table shows the time to produce the top-10 recommendations for all users with dimension D = 50. For MF and EE an exhaustive search was applied, whereas for EE-KNN the 100 nearest movies for each user were first selected as candidates; the search time decreases significantly.
21
Experimental Results
New users: new users can be quickly mapped into the existing space. MFa and EEa use averaging for new users, whereas EEp shows the precision/recall values for the regular setting in which the users are not new.