SWAMI Shared Wisdom Through Amalgamation of Many Interpretations.

Presentation transcript:

1 SWAMI Shared Wisdom Through Amalgamation of Many Interpretations

2 Introduction
- What is collaborative filtering? Prediction and recommendation.
- Goal: end-to-end prediction for users; a drop-in framework for evaluating multiple algorithms.

3 Agenda: Prediction (Different Predictors); Demo; Evaluation; Visualization; Conclusion

4 Prediction
Task: predict how a user will rate an item based on ratings by other users, accurately and efficiently.
We examined three techniques:
- Pearson-correlation predictors (popular)
- Support Vector Method (sound theory)
- Pearson + clustering (better scalability)

5 Pearson Predictor
Users U, movies M, votes V = {v_{u,m}}.
Prediction for user u (kappa is a normalizing constant):
    pred_{u,m} = mean_u + kappa * sum_{i != u} w(u,i) * (v_{i,m} - mean_i)
Weights are given by the Pearson correlation coefficient (Resnick, 1994):
    w(u,i) = sum_c (v_{u,c} - mean_u) * (v_{i,c} - mean_i) / (std_u * std_i)
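The formulas above can be sketched in a few lines. This is a toy illustration, not SWAMI's implementation: the vote table, user names, and the choice of kappa = 1 / sum of |w(u,i)| are all made up for the example.

```python
# Toy Pearson predictor: votes[user][movie] = rating on the 0-5 scale.
import math

votes = {
    "u": {"m1": 4, "m2": 2, "m3": 5},
    "a": {"m1": 5, "m2": 1, "m3": 4, "m4": 5},   # agrees with u
    "b": {"m1": 2, "m2": 4, "m3": 1, "m4": 1},   # disagrees with u
}

def mean(u):
    vs = list(votes[u].values())
    return sum(vs) / len(vs)

def std(u):
    m = mean(u)
    return math.sqrt(sum((v - m) ** 2 for v in votes[u].values()) / len(votes[u]))

def weight(u, i):
    # The slide's weight: sum over movies c rated by both users.
    common = set(votes[u]) & set(votes[i])
    return sum((votes[u][c] - mean(u)) * (votes[i][c] - mean(i))
               for c in common) / (std(u) * std(i))

def predict(u, movie):
    others = [i for i in votes if i != u and movie in votes[i]]
    ws = {i: weight(u, i) for i in others}
    kappa = 1.0 / sum(abs(w) for w in ws.values())  # one common normalizer
    return mean(u) + kappa * sum(w * (votes[i][movie] - mean(i))
                                 for i, w in ws.items())

print(round(predict("u", "m4"), 2))  # 4.78
```

Note that user "b" gets a negative weight, so b's low vote on m4 still pushes the prediction up, which is exactly the behavior the correlation weighting is meant to capture.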

6 Pearson Predictor
Simple to understand; lots of opportunities for hacking:
- Overlap penalty
- Neighborhood sizes
- Variance weighting and boosting unpopular sentiment

7

8 Support Vector Method
Optimal margin classification: given a set of points {x_i} with binary labels y_i in {1, -1}, find a hyperplane that maximizes the minimum gap between the classes.
Solution: a classifier of the form
    C(x) = sum_i alpha_i * y_i * (x_i . x) + b,    sgn C(x) => class
Solve for the weights alpha_i from the "examples".
(Slide diagram: two linearly separable clusters of *'s and o's.)
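Evaluating such a classifier is straightforward once the weights exist. In this minimal sketch the support vectors, multipliers alpha_i, and bias b are invented for illustration; a real solver would derive them from training examples.

```python
# Toy margin classifier C(x) = sum_i alpha_i * y_i * dot(x_i, x) + b.
# support holds (x_i, y_i, alpha_i) triples; all values here are invented.
support = [((1.0, 1.0), +1, 0.5),
           ((-1.0, -1.0), -1, 0.5)]
b = 0.0

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def C(x):
    return sum(a * y * dot(xi, x) for xi, y, a in support) + b

def classify(x):
    # sgn C(x) => class
    return 1 if C(x) > 0 else -1

print(classify((2.0, 0.5)))    # 1
print(classify((-0.5, -2.0)))  # -1
```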

9 Scalability Issues
Pearson predictor:
- Sample the underlying data set
- Cluster users to create neighborhoods or profiles
SVMs:
- Train off-line, O(n^2)
- Embarrassingly parallel
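The clustering idea can be sketched as a toy k-means over dense rating vectors, so that Pearson weights are only computed within a user's cluster rather than over all users. The real data is sparse, so an actual implementation would also need to handle missing votes; everything here is illustrative.

```python
# Toy k-means: cluster users by their (dense) rating vectors to form
# neighborhoods. points and initial centers are tuples of equal length.
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each user to the nearest center (squared distance).
            idx = min(range(len(centers)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(p, centers[i])))
            clusters[idx].append(p)
        # Recompute each center as the mean of its cluster.
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl
                   else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

users = [(5, 5, 1), (4, 5, 2), (1, 2, 5), (2, 1, 4)]
centers, clusters = kmeans(users, centers=[users[0], users[2]])
print([len(cl) for cl in clusters])  # [2, 2]
```

The two like-minded pairs of users end up in separate neighborhoods, each a quarter of the pairwise-correlation work of the full set.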

10 DEMO http://bahama.cs.berkeley.edu/swami/index.html

11 Evaluation
- Baseline predictors
- Evaluation strategy
(Agenda: Prediction, Demo, Evaluation, Visualization, Conclusion)

12 Goal and Framework
Goal: a framework for comparing prediction algorithms.
Test: vary how many ratings the algorithm has from the user.
Framework: baseline predictors, data set selection, metrics.

13 Baseline Predictors
Why? They are simple and fast; if an algorithm can't beat them, it needs work.
- User Average: returns the average of the user's votes.
- Movie Average: returns the average of the movie's votes.
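Both baselines are one-liners over a {user: {movie: rating}} table. The table layout and function names are illustrative, not the SWAMI API.

```python
# The two baseline predictors from the slide, on an invented vote table.
votes = {"a": {"m1": 5, "m2": 1}, "b": {"m1": 3, "m2": 3, "m3": 4}}

def user_average(u):
    vs = list(votes[u].values())
    return sum(vs) / len(vs)

def movie_average(m):
    vs = [r[m] for r in votes.values() if m in r]
    return sum(vs) / len(vs)

print(user_average("a"))    # 3.0
print(movie_average("m1"))  # 4.0
```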

14 Data Set Selection
Vary the amount of data that the predictor sees.
Test users are picked randomly from the remainder.
Movies are selected at random, to test a variety of both high- and low-variance movies.
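One plausible reading of this split can be sketched as follows: show the predictor only k of a test user's votes and hold out the rest as ground truth. The function name, table shape, and seed are all illustrative.

```python
# Split one test user's ratings into a "given" set the predictor may see
# and a held-out set used as ground truth for scoring.
import random

def split_user(ratings, k, seed=0):
    movies = sorted(ratings)
    random.Random(seed).shuffle(movies)  # deterministic for the example
    given = {m: ratings[m] for m in movies[:k]}
    held_out = {m: ratings[m] for m in movies[k:]}
    return given, held_out

given, held = split_user({"m1": 4, "m2": 2, "m3": 5, "m4": 1}, k=2)
print(len(given), len(held))  # 2 2
```

Varying k is exactly the "vary how many ratings the algorithm has from the user" test described on slide 12.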

15 Metrics
- Average Absolute Error (AAE): average of the absolute differences between predictions and the "true" votes -> accuracy of predictions.
- Variance of AAE -> reliability of predictions.
- Weighted Mean: a measure of error that weights errors made on high-variance movies more strongly:
    |(true - user_average) * (true - prediction)|
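The three metrics, sketched over parallel lists of true votes and predictions. The function names are ours, and the weighted mean follows the slide's formula literally.

```python
# Evaluation metrics over parallel lists of true votes and predictions.
def aae(true, pred):
    # Average absolute error: accuracy of predictions.
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def aae_variance(true, pred):
    # Variance of the absolute errors: reliability of predictions.
    m = aae(true, pred)
    return sum((abs(t - p) - m) ** 2 for t, p in zip(true, pred)) / len(true)

def weighted_error(true, pred, user_avg):
    # The slide's weighted mean: errors count more when the true vote is
    # far from the user's average, i.e. where opinion is strong.
    return sum(abs((t - user_avg) * (t - p))
               for t, p in zip(true, pred)) / len(true)

true, pred = [5, 1, 3], [4, 2, 3]
print(round(aae(true, pred), 3))                 # 0.667
print(round(weighted_error(true, pred, 3), 3))   # 1.333
```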

16 Sample Results http://www.cs.berkeley.edu/~mct//f99/sampleevalchart.html

17 Visualization
- How to view the data as a programmer?
- How to view the data as a user?
(Agenda: Prediction, Demo, Evaluation, Visualization, Conclusion)

18 The Visualization Task
Provide tools to the programmer to:
- Understand the data set
- Tune code to the data set
- Evaluate code
Provide tools to the user to:
- Understand output results

19 EachMovie Dataset
Source: DEC Systems Research Center. Collected over 18 months, 1995-1997.
74,427 users; 1,649 movies; 2.8 million votes. Voting scale: 0 to 5 stars (integer).
Facts: median votes, 26 per user and 379 per movie; most votes, 1,455 by a user and 32,864 for a film.

20 The Raw Data

21 The Raw Data (II)

22

23

24 Means and Popularity

25 Variance and Mean

26 The Gray Blobs
High-dimensional visualization is hard. Several attempts:
- PCA: eigenvectors (& results!)
- Multidimensional scaling
- Gauge set for low dimensions
Fuzzy blob? Or fuzzy glasses?
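The PCA attempt can be reproduced in miniature: power iteration on the movie-movie covariance of a tiny dense vote matrix. The 4x3 matrix is invented for the example; the real 1,649-movie data would call for a sparse eigensolver rather than this dense sketch.

```python
# Toy PCA: find the top eigenvector of the movie-movie covariance matrix
# of a small dense vote matrix (rows = users, columns = movies).
def top_eigenvector(A, iters=200):
    # Power iteration: repeatedly apply A and renormalize.
    v = [1.0] * len(A)
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(len(A))) for i in range(len(A))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

votes = [[5, 4, 1], [4, 5, 2], [1, 2, 5], [2, 1, 4]]
means = [sum(col) / len(col) for col in zip(*votes)]
centered = [[x - m for x, m in zip(row, means)] for row in votes]
n = len(centered)
cov = [[sum(centered[k][i] * centered[k][j] for k in range(n)) / n
        for j in range(len(means))] for i in range(len(means))]
pc1 = top_eigenvector(cov)
print([round(x, 2) for x in pc1])  # roughly [0.59, 0.55, -0.59]
```

Even on this toy matrix the first component separates the two "taste camps" (movies 1 and 2 load together, movie 3 loads opposite), which is the kind of structure the eigenvector slides below are showing.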

27 PCA for Gray Blobs

28 First Coordinate is Centrality

29 Second Eigenvector Il Postino Richard III Eat Drink Man Woman Jury Duty Bio-Dome Richie Rich

30 Third Eigenvector Bridge on the River Kwai (1957) 20,000 Leagues Under the Sea (1954) Ben-Hur (1959) I Shot Andy Warhol Mallrats Things to Do In Denver When You’re Dead

31 Fourth Eigenvector Black Beauty A Little Princess Little Women Beavis and Butthead Do America Private Parts Jackie Chan’s First Strike

32 Fifth Eigenvector First Wives Club Jane Eyre Emma Evil Dead II Heavy Metal Body Snatchers

33 Viz Scalability
Good sampling is important, but it tends to be time-consuming, even for small sets.

34 Conclusion (Agenda: Prediction, Demo, Evaluation, Visualization, Conclusion)

35 Summary
- Different predictors
- Consistent evaluation framework
- Data analysis informs prediction and evaluation

36 Possible Extensions
- User splits
- Gauge sets
- More ways to select test sets (e.g. chronological)
- Is effectiveness biased by user attributes?
- Fake user sets of known character

37 References & Related Work
- Tapestry
- MIT
- GroupLens (UMN)
- MSR
- Jester: reduces sparse many-D -> 10 d -> 2 d

38 SVM (cont'd)
For each movie, we trained 6 binary classifiers to distinguish among the six votes (0 to 5).
Mercer's Theorem: the dot product can be replaced with any kernel function, yielding dot products in higher-dimensional feature spaces.
Training: "sequential minimal optimization" (Platt, 1998).
Issue: what about missing votes?
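The per-movie scheme can be sketched as six one-vs-rest decision functions whose dot product has been swapped for a kernel, as Mercer's theorem allows. The RBF kernel choice, support sets, and multipliers below are invented for illustration; in practice SMO would fit them from the training votes.

```python
# Six one-vs-rest machines per movie, one per vote value, with an RBF
# kernel standing in for the dot product.
import math

def rbf(p, q, gamma=1.0):
    # A Mercer kernel: implicit dot product in a higher-dimensional space.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))

def decision(x, support):
    # support: list of (x_i, y_i, alpha_i) for one "vote v vs. rest" machine.
    return sum(a * y * rbf(xi, x) for xi, y, a in support)

def predict_vote(x, machines):
    # machines[v] holds the support set of the classifier for vote value v;
    # the predicted vote is the machine with the largest decision value.
    return max(range(len(machines)), key=lambda v: decision(x, machines[v]))

# Invented machines: one support vector per vote value, placed at (v,).
machines = [[((float(v),), +1, 1.0)] for v in range(6)]
print(predict_vote((3.2,), machines))  # 3
```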

