1
EigenTaste: A Constant Time Collaborative Filtering Algorithm
Ken Goldberg
Students: Theresa Roeder, Dhruv Gupta, Chris Perkins
Industrial Engineering and Operations Research; Electrical Engineering and Computer Science
UC Berkeley
2
CF Problem Definition
A set of objects (movies, books, jokes)
A user rates a subset of objects
Based on the ratings, retrieve objects from the complement of this subset.
Criteria:
– Effective: recommended objects should receive high ratings
– Efficient: the online recommendation process should run quickly and be scalable
3
Some Previous Work
D. Goldberg et al. - Tapestry (1992)
Riedl, Resnick, Konstan et al. - GroupLens (1994- )
Shardanand and Maes - Ringo (1995)
Resnick and Varian (1997)
Breese et al. at Microsoft Research (1998)
Pazzani (1999)
Herlocker et al. - GroupLens (1999)
4
WWW-based Recommender Systems
Firefly
MovieCritic
MovieLens
5
EigenTaste Algorithm
1) Principal Component Analysis
2) Universal Queries (dense ratings matrix)
3) Fine-grained ratings bar (captures nuances)
4) Offline and Online Processing
5) Online: Constant time recommendations
6
Universal Queries
Most CF systems require users to select which items they want to rate: sparse ratings matrix
Eigentaste allows users to rate all items based on short unbiased descriptions (e.g., film synopsis)
Eigentaste uses a subset of highly discriminatory items for the gauge set
7
Continuous Rating Scale
[Figure: rating bar running from Disapprove to Approve]
8
EigenTaste Algorithm
$A$ is the $n \times m$ normalized rating matrix
– $n$ users
– $m$ objects
$C$ is the $k \times k$ reduced correlation matrix
– $k$ objects in the gauge set
– $C = \frac{1}{n} A^T A$, computed over the $k$ gauge-set columns of $A$
– assumes ratings are continuous with a linear relationship
$E$ is the orthogonal matrix of eigenvectors of $C$
$\Lambda$ is the diagonal matrix of eigenvalues
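The definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the synthetic ratings, the per-item z-score normalization, and the helper names `gauge_correlation` and `eigen_decompose` are assumptions made for the example. Eigenvectors are stored as the rows of `E`, so that $E C E^T$ is diagonal.

```python
import numpy as np

def gauge_correlation(A_gauge):
    """Return C = (1/n) A^T A for an n x k matrix of normalized gauge-set ratings."""
    n = A_gauge.shape[0]
    return (A_gauge.T @ A_gauge) / n

def eigen_decompose(C):
    """Return (eigenvalues, E): eigenvalues in descending order, eigenvectors as rows of E."""
    vals, vecs = np.linalg.eigh(C)      # eigh handles symmetric matrices, ascending order
    order = np.argsort(vals)[::-1]      # reorder so the largest eigenvalue comes first
    return vals[order], vecs[:, order].T

# Synthetic stand-in data (assumption): 200 users rating k = 5 gauge items.
rng = np.random.default_rng(0)
raw = rng.uniform(-10, 10, size=(200, 5))

# Assumed normalization: subtract each item's mean and divide by its standard deviation.
A_gauge = (raw - raw.mean(axis=0)) / raw.std(axis=0)

C = gauge_correlation(A_gauge)
eigvals, E = eigen_decompose(C)
```

With this normalization the diagonal of `C` is 1, so `C` is a correlation matrix in the usual sense.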
9
Correlation Matrix
10
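EigenTaste
$E C E^T = \Lambda$, equivalently $C = E^T \Lambda E$
Let $B = A E^T$
$R_B = \frac{1}{n} B^T B = E C E^T = \Lambda$
– transformed points are uncorrelated and each column of $B$ has variance $\lambda_i$
Principal Components (Pearson 1901)
– consider the eigenvectors with the $m$ largest eigenvalues, $E_m$ (here $m$ counts retained components, not objects)
– $B_m = A E_m^T$
– choose $m$ based on the "knee" in the eigenvalues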
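A quick numerical check of the decorrelation claim, as a sketch under stated assumptions: the correlated synthetic ratings and the z-score normalization are made up for illustration, and `lam` holds the eigenvalues of $C$ in descending order.

```python
import numpy as np

# Synthetic correlated ratings (assumption): 500 users, 4 gauge items.
rng = np.random.default_rng(1)
raw = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
A = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # assumed per-item normalization
n = A.shape[0]

C = (A.T @ A) / n
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]
lam = vals[order]
E = vecs[:, order].T          # rows are eigenvectors, so E C E^T is diagonal

B = A @ E.T                   # transformed ratings
R_B = (B.T @ B) / n           # should equal diag(lam) up to rounding
```

The off-diagonal entries of `R_B` vanish up to floating-point error, and the variance of column `i` of `B` matches `lam[i]`.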
11
Dimensionality Reduction
First two principal components (eigenvectors) account for nearly 50% of the variation in user ratings
Project user ratings along first two principal components: $x = A E_2^T$
Facilitates visualization...
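The projection and the share of variation captured by the first two components can be sketched as follows. The function name `project_to_eigenplane`, the synthetic data, and the normalization are assumptions for illustration; the fraction computed here will not match the figure quoted for the real ratings.

```python
import numpy as np

def project_to_eigenplane(A, dims=2):
    """Project normalized ratings A (n x k) onto the top `dims` principal components.

    Returns the projected points and the fraction of total variance they capture.
    """
    n = A.shape[0]
    C = (A.T @ A) / n
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    lam = vals[order]
    E_top = vecs[:, order[:dims]].T            # dims x k, rows are eigenvectors
    x = A @ E_top.T                            # n x dims, i.e. x = A E_2^T for dims = 2
    explained = lam[:dims].sum() / lam.sum()
    return x, explained

# Synthetic correlated ratings (assumption): 400 users, 6 gauge items.
rng = np.random.default_rng(2)
raw = rng.normal(size=(400, 6)) @ rng.normal(size=(6, 6))
A = (raw - raw.mean(axis=0)) / raw.std(axis=0)

x, explained = project_to_eigenplane(A)
```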
12
Eigen Plane Recursive Clustering
13
The EigenTaste Algorithm
Offline:
– Compute eigenvectors and project users onto eigen plane.
– Cluster and compute average ratings for each cluster.
Online:
– Collect ratings for objects in gauge set
– Project onto the eigen plane
– Find representative cluster
– Recommend objects based on average ratings within that cluster
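The offline and online steps might be sketched as below. This is an illustration under loud assumptions: a uniform grid over the eigen plane stands in for the recursive clustering named in the slides, the data are synthetic, and the names `fit_offline`, `cell_index`, and `recommend_online` are hypothetical.

```python
import numpy as np

def cell_index(X, lo, hi, cells_per_axis):
    """Map 2-D points to a flat grid-cell id (uniform grid: a stand-in for recursive clustering)."""
    scaled = (X - lo) / (hi - lo) * cells_per_axis
    ij = np.clip(scaled.astype(int), 0, cells_per_axis - 1)   # clamp points outside the grid
    return ij[:, 0] * cells_per_axis + ij[:, 1]

def fit_offline(A_gauge, A_rest, cells_per_axis=4):
    """Offline: eigenvectors, projection, clustering, and per-cluster mean ratings."""
    n = A_gauge.shape[0]
    C = (A_gauge.T @ A_gauge) / n
    vals, vecs = np.linalg.eigh(C)
    E2 = vecs[:, np.argsort(vals)[::-1][:2]].T     # 2 x k, top two eigenvectors as rows
    X = A_gauge @ E2.T                             # users on the eigen plane
    lo, hi = X.min(axis=0), X.max(axis=0)
    cells = cell_index(X, lo, hi, cells_per_axis)
    means = np.zeros((cells_per_axis ** 2, A_rest.shape[1]))
    for c in range(cells_per_axis ** 2):
        members = A_rest[cells == c]
        if len(members):                           # empty cells keep all-zero means
            means[c] = members.mean(axis=0)
    return {"E2": E2, "lo": lo, "hi": hi, "cells_per_axis": cells_per_axis, "means": means}

def recommend_online(model, gauge_ratings, top=3):
    """Online: project one user's gauge ratings, look up the cell, rank non-gauge items."""
    x = np.atleast_2d(gauge_ratings) @ model["E2"].T          # cost depends on k, not n
    c = cell_index(x, model["lo"], model["hi"], model["cells_per_axis"])[0]
    return np.argsort(model["means"][c])[::-1][:top]

# Synthetic data (assumption): 300 users, 5 gauge items, 10 further items.
rng = np.random.default_rng(3)
A_gauge = rng.normal(size=(300, 5))
A_rest = rng.normal(size=(300, 10))
model = fit_offline(A_gauge, A_rest)
recs = recommend_online(model, A_gauge[0])
```

The online path never touches the stored user matrix: it multiplies a length-`k` vector by a `2 x k` matrix and reads one row of precomputed means, which is where the constant-time claim comes from.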
14
First Application (1999)
Jester: Recommending Jokes
Sense of humor is difficult to specify
Advantages:
– Rating process is not altogether unpleasant
– Can evaluate jokes quickly
– Dense ratings matrix (large sample size)
Disadvantages:
– Offensive/Shaggy Dog jokes
– Temporal Effects, Portfolio Effects
– Priming/Masking
15
Jester: User Interface
16
System Architecture
[Diagram components: Client, Internet, Web Server, CGI, Login Interface, CGI, Recommendation Engine, User Rating Profiles, Content Database]
17
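Measure of Effectiveness
Metric: Normalized Mean Absolute Error (NMAE): average absolute deviation of predicted ratings from actual ratings, normalized over the rating range.
$\mathrm{MAE} = \frac{1}{c} \sum_{i=1}^{c} |r_i - p_i|$, where $r_i$ and $p_i$ are the actual and predicted ratings of item $i$ over the $c$ rated items
$\mathrm{NMAE} = \mathrm{MAE} / (r_{\max} - r_{\min})$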
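A small worked example of the metric, as a sketch: the function name `nmae`, the three sample ratings, and the rating range of -10 to 10 are assumptions chosen for illustration.

```python
def nmae(actual, predicted, r_min, r_max):
    """Normalized mean absolute error over paired actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    mae = sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)
    return mae / (r_max - r_min)

# Absolute errors are 1, 2 and 0, so MAE = 1.0; the assumed range is 20, so NMAE = 0.05.
score = nmae([4.0, -2.0, 7.5], [3.0, 0.0, 7.5], r_min=-10.0, r_max=10.0)
```

Dividing by the rating range makes scores comparable across systems that use different rating scales.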
18
Effectiveness Based on 18,000 users
19
Computational Complexity
$n$ - number of users
$k$ - number of objects in gauge set
Nearest Neighbor algorithm:
– Online processing: $O(kn)$
EigenTaste algorithm:
– Offline processing: $O(k^2 n)$
– Online processing: $O(k)$
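To see where the $O(kn)$ online cost of a nearest-neighbor approach comes from, consider this brute-force sketch. It is an assumed baseline for illustration only: the slides do not specify the distance measure or neighborhood size, so Euclidean distance, the `neighbors` parameter, and the name `nearest_neighbor_predict` are hypothetical.

```python
import numpy as np

def nearest_neighbor_predict(gauge_all, rest_all, gauge_new, neighbors=5):
    """Predict non-gauge ratings by averaging the closest stored users.

    Every request computes one distance per stored user over k gauge items: O(kn).
    """
    dists = np.linalg.norm(gauge_all - gauge_new, axis=1)   # n distances, each over k items
    nearest = np.argsort(dists)[:neighbors]
    return rest_all[nearest].mean(axis=0)

# Synthetic data (assumption): 1000 stored users, 5 gauge items, 8 further items.
rng = np.random.default_rng(4)
gauge_all = rng.normal(size=(1000, 5))
rest_all = rng.normal(size=(1000, 8))

predicted = nearest_neighbor_predict(gauge_all, rest_all, gauge_all[7], neighbors=1)
```

The scan over `gauge_all` grows with the number of stored users, whereas a projection followed by a cluster lookup depends only on `k`.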
20
Effectiveness and Efficiency
21
Prediction Speed
Algorithm | Time to process 9000 users
Nearest Neighbor | 28 hours
EigenTaste | 3 minutes
22
Current Jester Dataset
62,000 registered users
approx. 3,000,000 ratings
23
Second Application (2000) Sleeper: Recommending Books
28
EigenTaste Algorithm
1) Principal Component Analysis
2) Universal Queries (dense ratings matrix)
3) Fine-grained ratings bar (captures nuances)
4) Offline and Online Processing
5) Online: Constant time recommendations
Patent application 21 December 1999 by UC Regents
29
www.cs.berkeley.edu/~goldberg
goldberg@cs.berkeley.edu
Eigentaste: A Constant Time Collaborative Filtering Algorithm (to appear: Information Retrieval Journal, 2001)