1
Yue Xu, Shu Zhang
2
A person has already rated some movies; which other movies might he or she be interested in? Given a large amount of user and movie data, this problem can be tackled with several high-dimensional data algorithms.
3
Dataset
MovieLens (a movie recommendation website): 100,000 ratings, 943 users, 1682 movies, collected between September 1997 and April 1998.
Information
- Ratings are integers from 1 (most disliked) to 5 (most liked).
- The data has been cleaned up: each user has rated at least 20 movies.
- User information is given (age, gender, occupation).
- Simple movie genres are given.
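As a rough sketch of how these ratings can be loaded into a matrix (assuming the standard MovieLens 100K u.data layout of tab-separated user id, item id, rating, timestamp; the file path and variable names are illustrative):

```python
import numpy as np

# Load the MovieLens 100K ratings file: tab-separated (user id, item id, rating, timestamp).
raw = np.loadtxt("ml-100k/u.data", dtype=int)

n_users = raw[:, 0].max()    # 943
n_movies = raw[:, 1].max()   # 1682

# Build a (movies x users) rating matrix; unrated cells stay 0 (the missing ratings).
R = np.zeros((n_movies, n_users))
for user, movie, rating, _ in raw:
    R[movie - 1, user - 1] = rating
```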
4
Missing Data
The data can be transformed into a 1682*943 matrix, but not all users have rated all of the movies.
Objectives
- Use K-means clustering and a sparse-SVD algorithm to find the missing ratings.
- Use the Root Mean Square Error (RMSE) to evaluate the results.
Key ideas: sparse rating matrix, stochastic gradient descent.
5
Data Preprocessing
Randomly choose 90% of the data as training data and the remaining 10% as test data.
Evaluation
Compute the full rating matrix from the training data, then calculate the RMSE on the test data to evaluate the results.
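A minimal sketch of this split and the RMSE evaluation (the 90/10 split over rating rows follows the slide; the function names are illustrative, and `raw` is the rating array loaded above):

```python
import numpy as np

def split_ratings(raw, test_frac=0.1, seed=0):
    """Randomly hold out a fraction of the (user, movie, rating, timestamp) rows as a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(raw))
    n_test = int(test_frac * len(raw))
    return raw[idx[n_test:]], raw[idx[:n_test]]   # training rows, test rows

def rmse(R_hat, rows):
    """Root mean square error of the predicted (movies x users) matrix R_hat on held-out rows."""
    err = [(rating - R_hat[movie - 1, user - 1]) ** 2
           for user, movie, rating, *_ in rows]
    return float(np.sqrt(np.mean(err)))

train, test = split_ratings(raw)
```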
6
Movie Data
Movies have some simple genres in the raw data: 19 genres, coded 1 = yes, 0 = no, giving a 1682*19 matrix (movie id | unknown | Action | Adventure | Animation | ...).
Clustering
K-means with k = 4 clusters.
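A sketch of this clustering step with scikit-learn's KMeans on the 1682*19 genre matrix (using scikit-learn rather than a hand-written K-means is an assumption, as is the u.item parsing, which relies on the standard layout where the last 19 pipe-separated fields are the genre flags):

```python
import numpy as np
from sklearn.cluster import KMeans

# Genre flags: one row per movie, one 0/1 column per genre (the last 19 fields of u.item).
genres = np.genfromtxt("ml-100k/u.item", delimiter="|",
                       usecols=range(5, 24), encoding="latin-1")

# Cluster the 1682 movies into k = 4 groups based on their genre vectors.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(genres)
movie_cluster = kmeans.labels_   # cluster id (0..3) for each movie
```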
7
Get missing ratings by clustering
1. Add a column with the movie's cluster number to the (user, movie, rating) table; by looking up the movie ID, this 4th column shows which cluster each movie is in.
2. Form a 943*1682 matrix indexed by user ID and movie ID; zeros are the missing ratings.
8
Get missing ratings by clustering
3. Wherever there is a missing rating (zero) in the matrix, look up the user ID and the movie's cluster number.
4. Calculate that user's average rating over the movies in that cluster, and use this value in place of the zero.
RMSE = 1.119
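A sketch of this cluster-average imputation (it assumes a movies-as-rows rating matrix like `R` above and the `movie_cluster` labels from the K-means step; the slides also use the transposed user-by-movie orientation, which works the same way):

```python
import numpy as np

def fill_by_cluster(R, movie_cluster, n_clusters=4):
    """Replace each missing rating (zero) with the user's average rating
    over the movies he or she has rated in the same cluster."""
    R_filled = R.copy()
    n_movies, n_users = R.shape
    for j in range(n_users):
        for c in range(n_clusters):
            in_cluster = (movie_cluster == c)
            rated = in_cluster & (R[:, j] > 0)
            if rated.any():
                avg = R[rated, j].mean()
                missing = in_cluster & (R[:, j] == 0)
                R_filled[missing, j] = avg
            # If the user rated nothing in this cluster, those cells stay 0 --
            # exactly the weakness noted on the evaluation slide.
    return R_filled
```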
9
This approach is an extension of the SVD method. It models both users and movies by giving them coordinates in a low-dimensional feature space. All missing values of the rating matrix (943*1682) are set to zero for modeling.
R = U Σ V^T = MovieFactor * UserFactor^T, with MovieFactor = U and UserFactor^T = Σ V^T.
So we obtain a low-rank movie factor and user factor from the huge sparse rating matrix. Each rating is an inner product of a movie vector and a user vector; therefore, to find the missing values of the rating matrix, we need to compute the movie-factor and user-factor matrices.
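Written out with a rank-k movie factor M (one row per movie) and user factor U (one row per user), the model is the following; the explicit rank-k notation is an assumption, since the slides only vary k later:

```latex
R \;\approx\; M U^{\top},
\qquad
\hat r_{ij} \;=\; m_i \cdot u_j \;=\; \sum_{f=1}^{k} M_{if}\, U_{jf}
```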
10
Calculate MovieFactor (m) & UserFactor (u)
- Minimize the sum of squared errors, with regularization to prevent overfitting.
- Overfitting: a statistical model describes random error or noise instead of the underlying relationship.
- The learning rate μ was searched from 0.1 down to 0.01 to find the minimum SSE; μ = 0.03 is the optimum.
(Plot: error vs. step length.)
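As a sketch, the regularized objective implied by this slide and the update rules on the next one is roughly the following (the sum runs over the observed training ratings, and constant factors may be absorbed into μ and λ):

```latex
\min_{M,\,U}\;
\sum_{(i,j)\ \mathrm{observed}} \bigl(r_{ij} - m_i \cdot u_j\bigr)^2
\;+\;
\lambda \Bigl(\sum_i \lVert m_i \rVert^2 + \sum_j \lVert u_j \rVert^2\Bigr)
```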
11
Choosing the regularization parameter
- lambda = 0: no regularization. The training-set error decreases a lot as k increases, but the test-set error increases.
- lambda = 0.2: both the training-set and test-set errors decrease a lot as k increases.
So we apply lambda = 0.2!
12
Calculate MovieFactor (m) & UserFactor (u)
1. Initialize from the SVD of r: m = U (the left singular vectors), u^T = Σ V^T (the singular values times the right singular vectors).
2. Stochastic gradient descent:
e_ij = 2 * (r_ij - m_i * u_j)
m_i = m_i + μ * (e_ij * u_j - lambda * m_i), where m_i is the i-th row of m
u_j = u_j + μ * (e_ij * m_i - lambda * u_j), where u_j is the j-th column of u^T
with μ = 0.03 and lambda = 0.2.
3. Iterate to get the final values.
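A runnable NumPy sketch of this procedure (movies as rows, users as columns; the rank k, the per-epoch shuffling, and the unoptimized Python loop are assumptions or simplifications not fixed by the slides):

```python
import numpy as np

def sgd_factorize(R, mask, k=20, mu=0.03, lam=0.2, n_iter=40, seed=0):
    """Factor the (movies x users) rating matrix R as m @ u.T by stochastic
    gradient descent; mask marks the cells that hold training ratings."""
    # 1. Initialize from a truncated SVD of R (missing entries already set to 0).
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    m = U[:, :k].copy()                          # movie factors: one row per movie
    u = (np.diag(s[:k]) @ Vt[:k, :]).T.copy()    # user factors: one row per user
    rows, cols = np.nonzero(mask)
    rng = np.random.default_rng(seed)
    # 2. Stochastic gradient descent over the observed training ratings.
    for _ in range(n_iter):
        for idx in rng.permutation(len(rows)):
            i, j = rows[idx], cols[idx]
            mi = m[i].copy()
            e = 2.0 * (R[i, j] - mi @ u[j])        # e_ij = 2 (r_ij - m_i . u_j)
            m[i] += mu * (e * u[j] - lam * mi)     # update the i-th movie vector
            u[j] += mu * (e * mi - lam * u[j])     # update the j-th user vector
    return m, u
```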
13
Get Missing Values
r = m * u^T
Evaluation
RMSE = 0.95 after 40 iterations.
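As a usage example tying the sketches above together (it relies on the hypothetical `train`, `test`, `rmse`, and `sgd_factorize` helpers defined earlier):

```python
import numpy as np

# Build the training rating matrix (movies x users) from the training rows only.
R_train = np.zeros((n_movies, n_users))
for user, movie, rating, _ in train:
    R_train[movie - 1, user - 1] = rating

m, u = sgd_factorize(R_train, R_train > 0, k=20, mu=0.03, lam=0.2, n_iter=40)
R_hat = m @ u.T                       # fills every cell, including the missing ones
print("test RMSE:", rmse(R_hat, test))
```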
14
Evaluation by RMSE
- K-means clustering = 1.15. There are only 4 clusters, so a user gets the same predicted rating for every movie in a cluster, and ratings are still missing when a user has not seen any movie in a cluster.
- Stochastic gradient descent = 0.95, much better! It considers both user and movie factors.
15
We computed the missing values of a very sparse matrix with two dimensionality-reduction algorithms. Stochastic gradient descent performs much better than K-means clustering; an RMSE of 0.95 is achieved. Other methods may also work, such as Bayesian networks. This result can be used to recommend movies to a user who has already rated some other movies.
16
Thank You For Listening! Any Questions?