
1 EigenTaste: A Constant Time Collaborative Filtering Algorithm
Ken Goldberg
Students: Theresa Roeder, Dhruv Gupta, Chris Perkins
Industrial Engineering and Operations Research
Electrical Engineering and Computer Science
UC Berkeley

2 CF (Collaborative Filtering) Problem Definition
A set of objects (movies, books, jokes)
A user rates a subset of the objects
Based on these ratings, retrieve objects from the complement of this subset
Criteria:
–Effective: recommended objects should receive high ratings
–Efficient: the online recommendation process should run quickly and be scalable

3 Some Previous Work
D. Goldberg et al. - Tapestry (1992)
Riedl, Resnick, Konstan et al. - GroupLens (1994-)
Shardanand and Maes - Ringo (1995)
Resnick and Varian (1997)
Breese et al. at Microsoft Research (1998)
Pazzani (1999)
Herlocker et al. - GroupLens (1999)

4 WWW-based Recommender Systems
Firefly
MovieCritic
MovieLens

5 EigenTaste Algorithm
1) Principal Component Analysis
2) Universal Queries (dense ratings matrix)
3) Fine-grained ratings bar (captures nuances)
4) Offline and Online Processing
5) Online: Constant time recommendations

6 Universal Queries
Most CF systems require users to select which items they want to rate, which produces a sparse ratings matrix
Eigentaste allows users to rate all items based on short, unbiased descriptions (e.g., a film synopsis)
Eigentaste uses a subset of highly discriminating items as the gauge set

7 Continuous Rating Scale
[Figure: continuous rating bar running from "Disapprove" to "Approve"]

8 EigenTaste Algorithm
A is the n × m normalized rating matrix
–n users
–m objects
C is the k × k reduced correlation matrix
–k objects in the gauge set
–$C = \frac{1}{n} A^T A$, with A restricted to its k gauge-set columns
–assumes ratings are continuous and linearly related
E is the orthogonal matrix of eigenvectors of C
Λ is the diagonal matrix of eigenvalues
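
A minimal NumPy sketch of the normalization and the reduced correlation matrix. It assumes that "normalized" means z-scoring each object's ratings using the population standard deviation, which makes $\frac{1}{n} A^T A$ coincide with the Pearson correlation matrix. The helpers `normalize_ratings` and `gauge_correlation`, the gauge columns, and the toy data are hypothetical.

```python
import numpy as np


def normalize_ratings(R):
    """Z-score each column (object) of an n x m rating matrix.

    Assumption: subtract the column mean and divide by the population
    standard deviation (ddof=0), so (1/n) A^T A is a correlation matrix.
    """
    R = np.asarray(R, dtype=float)
    return (R - R.mean(axis=0)) / R.std(axis=0)


def gauge_correlation(A, gauge_cols):
    """Reduced k x k correlation matrix C = (1/n) A_g^T A_g over the gauge set."""
    A_g = A[:, gauge_cols]          # n x k submatrix of gauge-set columns
    n = A_g.shape[0]
    return (A_g.T @ A_g) / n


# Toy data: 6 users x 4 objects; objects 0-2 form a hypothetical gauge set.
rng = np.random.default_rng(0)
R = rng.uniform(-10, 10, size=(6, 4))
A = normalize_ratings(R)
C = gauge_correlation(A, [0, 1, 2])
print(np.allclose(C, np.corrcoef(R[:, [0, 1, 2]], rowvar=False)))  # True
```

The agreement with `np.corrcoef` holds because the scaling factor cancels in the Pearson ratio; an object whose ratings are all identical would have zero variance and would need to be excluded before dividing.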

9 Correlation Matrix

10 EigenTaste
$E C E^T = \Lambda$, equivalently $C = E^T \Lambda E$
Let $B = A E^T$; then $R_B = \frac{1}{n} B^T B = \frac{1}{n} E A^T A E^T = E C E^T = \Lambda$
–transformed points are uncorrelated, and each column of B has variance $\lambda_i$
Principal Components (Pearson 1901)
–keep the eigenvectors of the m largest eigenvalues, $E_m$, so that $B_m = A E_m^T$
–choose m based on the “knee” in the eigenvalues
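
The eigendecomposition and the check that $R_B = \Lambda$ can be sketched in NumPy on toy data (`principal_components` is a hypothetical helper). `np.linalg.eigh` returns eigenvectors as columns in ascending eigenvalue order, so the sketch re-sorts them and transposes so that the rows of E are eigenvectors, matching the convention $C = E^T \Lambda E$.

```python
import numpy as np


def principal_components(C):
    """Return (E, lam) with E C E^T = diag(lam), eigenvalues largest first.

    Rows of E are the eigenvectors of the symmetric matrix C.
    """
    lam, V = np.linalg.eigh(C)        # ascending eigenvalues, eigenvectors in columns
    order = np.argsort(lam)[::-1]     # re-sort largest first
    return V[:, order].T, lam[order]  # eigenvectors as rows


# Toy normalized gauge-set ratings: 200 users x 5 correlated gauge objects.
rng = np.random.default_rng(1)
R = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
A = (R - R.mean(axis=0)) / R.std(axis=0)
n = A.shape[0]
C = A.T @ A / n

E, lam = principal_components(C)
B = A @ E.T                           # transformed points, B = A E^T
R_B = B.T @ B / n                     # should equal Lambda
print(np.allclose(R_B, np.diag(lam)))  # True

m = 2                                 # components kept, e.g. chosen at the "knee"
B_m = A @ E[:m].T                     # B_m = A E_m^T, an n x m matrix
```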

11 Dimensionality Reduction
The first two principal components (eigenvectors) account for nearly 50% of the variation in user ratings
Project user ratings onto the first two principal components: $x = A E_2^T$
Facilitates visualization...
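
A sketch of the projection onto the eigen plane that also reports the share of variance captured by the first two components, $(\lambda_1 + \lambda_2) / \sum_i \lambda_i$. The helper `eigen_plane` and the random data are hypothetical, so the printed share will not reproduce the figure observed on real ratings.

```python
import numpy as np


def eigen_plane(A):
    """Project normalized ratings A (n x k) onto the first two principal components.

    Returns x = A E_2^T (n x 2), the two eigenvectors E_2 as rows, and the
    fraction of total variance they explain.
    """
    n = A.shape[0]
    C = A.T @ A / n
    lam, V = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    lam, E = lam[order], V[:, order].T
    E2 = E[:2]                        # first two eigenvectors as rows
    x = A @ E2.T                      # coordinates in the eigen plane
    explained = lam[:2].sum() / lam.sum()
    return x, E2, explained


rng = np.random.default_rng(2)
R = rng.normal(size=(500, 10))        # toy ratings: 500 users x 10 gauge objects
A = (R - R.mean(axis=0)) / R.std(axis=0)
x, E2, explained = eigen_plane(A)
print(x.shape)  # (500, 2)
```

Each column of `x` has variance equal to the corresponding eigenvalue, so a scatter plot of `x` shows users spread most widely along the first component.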

12 Eigen Plane Recursive Clustering

13 The EigenTaste Algorithm
Offline:
–Compute eigenvectors and project users onto eigen plane.
–Cluster and compute average ratings for each cluster.
Online:
–Collect ratings for objects in gauge set
–Project onto the eigen plane
–Find representative cluster
–Recommend objects based on average ratings within that cluster
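
The offline/online split can be sketched as below. This is a simplified illustration, not the published method: it assumes a dense n × m rating array and replaces the recursive clustering of the eigen plane with a uniform square grid (the class `EigentasteSketch` and its `cell_size` parameter are hypothetical). Ranking each cluster's objects offline leaves the online step with an O(k) normalization and projection plus a dictionary lookup, so its cost does not grow with the number of users.

```python
import numpy as np
from collections import defaultdict


class EigentasteSketch:
    """Simplified offline/online pipeline (illustrative sketch).

    Assumptions not taken from the slides: ratings form a dense n x m array,
    and clusters are cells of a uniform square grid on the eigen plane
    (a stand-in for the recursive clustering of the eigen plane).
    """

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size  # hypothetical grid resolution

    def _cells(self, x):
        # Map eigen-plane points (n x 2, or a single 2-vector) to integer grid cells.
        grid = np.floor(np.atleast_2d(x) / self.cell_size).astype(int)
        return [tuple(row) for row in grid]

    def fit(self, R, gauge):
        """Offline: normalize, find eigenvectors, project, cluster, rank objects."""
        R = np.asarray(R, dtype=float)
        self.gauge = np.asarray(gauge)
        G = R[:, self.gauge]                                 # n x k gauge-set ratings
        self.mean, self.std = G.mean(axis=0), G.std(axis=0)
        A = (G - self.mean) / self.std                       # normalized ratings
        C = A.T @ A / A.shape[0]                             # k x k correlation matrix
        lam, V = np.linalg.eigh(C)
        self.E2 = V[:, np.argsort(lam)[::-1][:2]].T          # top two eigenvectors as rows
        members = defaultdict(list)
        for i, cell in enumerate(self._cells(A @ self.E2.T)):
            members[cell].append(i)
        others = np.setdiff1d(np.arange(R.shape[1]), self.gauge)  # non-gauge objects
        self.ranked = {}
        for cell, rows in members.items():
            avg = R[rows][:, others].mean(axis=0)            # cluster-average ratings
            self.ranked[cell] = others[np.argsort(avg)[::-1]]  # best first
        # Fallback for cells with no training users: rank by the overall average.
        self.fallback = others[np.argsort(R[:, others].mean(axis=0))[::-1]]
        return self

    def recommend(self, gauge_ratings, top=3):
        """Online: O(k) normalization and projection, then a dictionary lookup."""
        a = (np.asarray(gauge_ratings, dtype=float) - self.mean) / self.std
        cell = self._cells(a @ self.E2.T)[0]
        return self.ranked.get(cell, self.fallback)[:top]


rng = np.random.default_rng(3)
R = rng.uniform(-10, 10, size=(300, 12))  # toy data: 300 users x 12 objects
model = EigentasteSketch(cell_size=1.0).fit(R, gauge=[0, 1, 2, 3, 4])
print(model.recommend([5.0, -2.0, 7.5, 0.0, 3.0]))  # three non-gauge object indices
```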

14 First Application (1999)
Jester: Recommending Jokes
Sense of humor is difficult to specify
Advantages:
–Rating process is not altogether unpleasant
–Jokes can be evaluated quickly, yielding a dense ratings matrix (large sample size)
Disadvantages:
–Offensive/Shaggy Dog jokes
–Temporal Effects, Portfolio Effects
–Priming/Masking

15 Jester: User Interface

16 System Architecture
[Figure: architecture diagram with components Client, Internet, Web Server, CGI, Login Interface, Recommendation Engine, User Rating Profiles, Content Database]

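17 Measure of Effectiveness
Metric: Normalized Mean Absolute Error (NMAE): the average absolute deviation of actual ratings from predicted ratings, normalized by the rating range
$\mathrm{MAE} = \frac{1}{c} \sum |r - p|$, summed over the c predicted ratings, where r is an actual rating and p its prediction
$\mathrm{NMAE} = \mathrm{MAE} / (r_{\max} - r_{\min})$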
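
NMAE as a small Python function (the name `nmae` is hypothetical). The -10 to +10 range in the example is an assumed rating scale chosen for illustration, not a value given on the slide.

```python
def nmae(actual, predicted, r_min, r_max):
    """Normalized mean absolute error over c paired ratings.

    MAE = (1/c) * sum |r - p|;  NMAE = MAE / (r_max - r_min).
    """
    c = len(actual)
    if c == 0 or c != len(predicted):
        raise ValueError("need equal-length, non-empty rating lists")
    mae = sum(abs(r - p) for r, p in zip(actual, predicted)) / c
    return mae / (r_max - r_min)


# Hypothetical -10..+10 rating range: MAE = (1 + 0 + 2) / 3 = 1.0, NMAE = 1.0 / 20.
print(nmae([4.0, -2.0, 8.0], [3.0, -2.0, 6.0], r_min=-10, r_max=10))  # 0.05
```

Dividing by the range makes errors comparable across systems that use different rating scales.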

18 Effectiveness Based on 18,000 users

19 Computational Complexity
n - number of users
k - number of objects in gauge set
Nearest Neighbor algorithm:
–Online processing - $O(kn)$
EigenTaste algorithm:
–Offline processing - $O(k^2 n)$
–Online processing - $O(k)$
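
For contrast, a sketch of the two online steps in NumPy (both helpers are hypothetical). The nearest-neighbor query assumes Euclidean distance over gauge-set ratings, since the slides do not say which variant was benchmarked; it touches all n × k stored ratings. The EigenTaste step only normalizes and projects the new user's k ratings, using the mean, standard deviation, and eigenvectors computed offline.

```python
import numpy as np


def nearest_neighbor(new_ratings, stored):
    """Index of the stored user closest to a new user (hypothetical baseline).

    Compares the new user's k gauge-set ratings against all n stored users,
    so each query costs O(kn). Euclidean distance is an assumption.
    """
    diffs = np.asarray(stored, dtype=float) - np.asarray(new_ratings, dtype=float)  # n x k
    return int(np.argmin((diffs ** 2).sum(axis=1)))


def eigen_plane_point(new_ratings, mean, std, E2):
    """EigenTaste online step: normalize and project k ratings, O(k) work for any n."""
    a = (np.asarray(new_ratings, dtype=float) - mean) / std
    return a @ E2.T


stored = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [-5.0, 2.0, 1.0]])
print(nearest_neighbor([4.0, 6.0, 5.0], stored))  # 1

E2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # hypothetical eigenvectors (rows)
print(eigen_plane_point([4.0, 6.0, 5.0], np.zeros(3), np.ones(3), E2))  # [4. 6.]
```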

20 Effectiveness and Efficiency

21 Prediction Speed
Algorithm          Time to process 9,000 users
Nearest Neighbor   28 hours
EigenTaste         3 minutes
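
Converting the reported timings to common units makes the gap concrete (plain arithmetic on the 28 hours and 3 minutes reported for 9,000 users):

```latex
\begin{aligned}
28\,\text{h} &= 1680\,\text{min} = 100{,}800\,\text{s}, & \frac{100{,}800\,\text{s}}{9{,}000} &= 11.2\,\text{s per user},\\
3\,\text{min} &= 180\,\text{s}, & \frac{180\,\text{s}}{9{,}000} &= 0.02\,\text{s per user},\\
\frac{1680\,\text{min}}{3\,\text{min}} &= 560.
\end{aligned}
```

On this benchmark, EigenTaste was therefore roughly 560 times faster than nearest neighbor.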

22 Current Jester Dataset
62,000 registered users
approx. 3,000,000 ratings

23 Second Application (2000)
Sleeper: Recommending Books

28 EigenTaste Algorithm
1) Principal Component Analysis
2) Universal Queries (dense ratings matrix)
3) Fine-grained ratings bar (captures nuances)
4) Offline and Online Processing
5) Online: Constant time recommendations
Patent application filed 21 December 1999 by the UC Regents

29 www.cs.berkeley.edu/~goldberg
goldberg@cs.berkeley.edu
Eigentaste: A Constant Time Collaborative Filtering Algorithm (to appear: Information Retrieval Journal, 2001)

