1
SWAMI Shared Wisdom Through Amalgamation of Many Interpretations
2
Introduction
What is collaborative filtering?
- Prediction and recommendation
Goal:
- End-to-end prediction for users
- Drop-in framework for evaluating multiple algorithms
3
Prediction
- Different Predictors
Demo
Evaluation
Visualization
Conclusion
4
Prediction
Task: predict how a user will rate an item, based on ratings by other users
- Accurately
- Efficiently
Examined three techniques:
- Pearson-correlation predictors (popular)
- Support Vector Method (sound theory)
- Pearson + clustering (better scalability)
5
Pearson Predictor
Users U, movies M, votes V = {v_{u,m}}
Prediction for user u:
  pred_{u,m} = mean_u + \sum_{i \neq u} w(u,i) (v_{i,m} - mean_i)
Weights are given by the Pearson correlation coefficient (Resnick, 1994):
  w(u,i) = \sum_c (v_{u,c} - mean_u)(v_{i,c} - mean_i) / (std_u std_i)
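A minimal sketch of the prediction above in Python/NumPy. The dictionary layout, the `min_overlap` cutoff, and the normalization by the sum of |w(u,i)| (standard in Resnick 1994 but not shown on the slide) are illustrative assumptions, not the SWAMI code.

```python
import numpy as np

def pearson_predict(votes, u, m, min_overlap=2):
    """Predict user u's vote on movie m as mean_u plus Pearson-weighted,
    mean-offset neighbor votes. `votes` maps (user, movie) -> vote."""
    by_user = {}
    for (i, c), v in votes.items():
        by_user.setdefault(i, {})[c] = v

    mean = {i: np.mean(list(vs.values())) for i, vs in by_user.items()}
    std = {i: np.std(list(vs.values())) or 1.0 for i, vs in by_user.items()}

    num = den = 0.0
    for i, vs in by_user.items():
        if i == u or m not in vs:
            continue
        common = set(vs) & set(by_user[u])    # movies both users rated
        if len(common) < min_overlap:         # crude overlap penalty
            continue
        w = sum((by_user[u][c] - mean[u]) * (vs[c] - mean[i]) for c in common)
        w /= std[u] * std[i]
        num += w * (vs[m] - mean[i])
        den += abs(w)
    return mean[u] if den == 0 else mean[u] + num / den
```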
6
Pearson Predictor
Simple to understand
Lots of opportunities for hacking:
- Overlap penalty
- Neighborhood sizes
- Variance weighting and boosting unpopular sentiment
8
Support Vector Method
Optimal Margin Classification
Given a set of points {x_i} with binary labels y_i \in {1, -1}
Find a hyperplane that maximizes the minimum gap between the classes
Solution: a classifier of the form
  C(x) = \sum_i \alpha_i y_i \langle x_i, x \rangle + b,  with sgn(C(x)) giving the class
Solve for the weights \alpha_i from the "examples"
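A toy illustration of the classifier form on the slide, using scikit-learn's SVC rather than the talk's own solver; the data points and parameters are made up.

```python
import numpy as np
from sklearn.svm import SVC  # off-the-shelf max-margin classifier, not the talk's code

# Toy two-class problem: points in the plane with labels +1 / -1.
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)

# The decision function has exactly the form on the slide:
#   C(x) = sum_i alpha_i y_i <x_i, x> + b, and sign(C(x)) gives the class.
print(clf.support_vectors_)              # the x_i with nonzero alpha_i
print(clf.decision_function([[2, 2]]))   # C(x) for a new point
print(clf.predict([[2, 2]]))             # sgn(C(x)) => class
```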
9
Scalability Issues
Pearson predictor
- Sample the underlying data set
- Cluster users to create neighborhoods or profiles (see the sketch below)
SVMs
- Train off-line, O(n^2)
- Embarrassingly parallel
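The slides do not say which clustering algorithm SWAMI used to build neighborhoods; this is a hedged k-means sketch of the idea, with an illustrative cluster count and fill value.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_users(ratings, n_clusters=20, fill=3.0, seed=0):
    """One way to form Pearson neighborhoods: k-means over mean-filled
    rating rows. `ratings` is a users x movies array with NaN for
    missing votes; names and constants are illustrative."""
    filled = np.where(np.isnan(ratings), fill, ratings)
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(filled)
    # Restrict each user's candidate neighbors to their own cluster,
    # then run the Pearson predictor only within that (much smaller) set.
    return {c: np.flatnonzero(labels == c) for c in range(n_clusters)}
```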
10
DEMO
http://bahama.cs.berkeley.edu/swami/index.html
11
Evaluation
Prediction
Demo
Evaluation
- Baseline predictors
- Evaluation strategy
Visualization
Conclusion
12
Goal and Framework
Goal: a framework for comparing prediction algorithms
Test: vary how many ratings the algorithm has from the user
Framework:
- Baseline predictors
- Data set selection
- Metrics
13
Baseline Predictors
Why? Simple, fast, and if you can't beat them, your algorithms need work.
User Average: returns the average of the user's votes
Movie Average: returns the average of the movie's votes
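The two baselines are simple enough to state in a few lines; the NaN-padded users x movies array layout is an assumption for illustration.

```python
import numpy as np

def user_average(ratings, u, m):
    """Baseline 1: ignore the movie, return the mean of user u's votes."""
    return np.nanmean(ratings[u])

def movie_average(ratings, u, m):
    """Baseline 2: ignore the user, return the mean vote for movie m."""
    return np.nanmean(ratings[:, m])
```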
14
Data Set Selection
Vary the amount of data that the predictor sees
Test users picked randomly from the remainder
Random movie selection, to test a variety of both high- and low-variance movies
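A hedged sketch of one way to realize this split: reveal only `n_given` votes per test user and hold the rest out as ground truth. The exact SWAMI protocol (for instance, how training users are set aside) is not specified on the slide.

```python
import numpy as np

def split_given_n(ratings, n_given, n_test_users, seed=0):
    """Pick test users at random, show the predictor only `n_given` of each
    one's votes, and hold the rest out. `ratings` is a users x movies
    float array with NaN for missing votes; names are illustrative."""
    rng = np.random.default_rng(seed)
    test_users = rng.choice(ratings.shape[0], size=n_test_users, replace=False)
    given = np.full_like(ratings, np.nan)
    held_out = np.full_like(ratings, np.nan)
    for u in test_users:
        voted = np.flatnonzero(~np.isnan(ratings[u]))
        shown = rng.choice(voted, size=min(n_given, len(voted)), replace=False)
        given[u, shown] = ratings[u, shown]
        hidden = np.setdiff1d(voted, shown)
        held_out[u, hidden] = ratings[u, hidden]
    return test_users, given, held_out
```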
15
Metrics
Average Absolute Error (AAE): the average of absolute differences from the "true" votes; measures accuracy of predictions
Variance of AAE: measures reliability of predictions
Weighted Mean: a measure of error that weights errors made on high-variance movies more strongly:
  |(true - user_average) * (true - prediction)|
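A sketch of the three metrics for one batch of held-out votes. Array names are illustrative, and "Variance of AAE" is read here as the variance of the absolute errors (variance across evaluation runs would be another reasonable reading).

```python
import numpy as np

def swami_metrics(true, pred, user_avg):
    """`true`, `pred`, and `user_avg` are equal-length arrays of the
    held-out vote, the predicted vote, and the test user's average vote."""
    abs_err = np.abs(true - pred)
    aae = abs_err.mean()        # Average Absolute Error: accuracy
    aae_var = abs_err.var()     # Variance of AAE: reliability
    # Weighted mean: errors on votes far from the user's average count more.
    weighted = np.mean(np.abs((true - user_avg) * (true - pred)))
    return aae, aae_var, weighted
```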
16
Sample Results
http://www.cs.berkeley.edu/~mct//f99/sampleevalchart.html
17
Visualization
Prediction
Demo
Evaluation
Visualization
- How to view the data as a programmer?
- How to view the data as a user?
Conclusion
18
The Visualization Task
Provide tools to the programmer to:
- Understand the data set
- Tune code to the data set
- Evaluate code
Provide tools to the user to:
- Understand output results
19
EachMovie Dataset
Source: DEC Systems Research Center
Collected over 18 months, 1995-1997
74,427 users, 1,649 movies, 2.8 million votes
Voting scale: 0 to 5 stars (integer)
Facts:
- Median votes: 26 per user; 379 per movie
- Most votes: 1,455 by a user; 32,864 for a film
20
The Raw Data
21
The Raw Data (II)
24
Means and Popularity
25
Variance and Mean
26
The Gray Blobs
High-dimensional visualization is hard
Several attempts at PCA: eigenvectors (& results!)
Multidimensional scaling
Gauge set for low dimensions
Fuzzy blob? Or fuzzy glasses?
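A rough sketch of the PCA step, assuming votes are centered per movie and missing votes are filled with the movie mean; the talk's exact preprocessing is not stated.

```python
import numpy as np

def top_eigenvectors(ratings, k=5):
    """`ratings` is a users x movies array with NaN for missing votes.
    Returns the top-k principal directions over movies."""
    movie_mean = np.nanmean(ratings, axis=0)
    centered = np.where(np.isnan(ratings), 0.0, ratings - movie_mean)
    # SVD of the users x movies matrix; rows of vt are eigenvectors of the
    # movie-movie covariance, i.e. the axes behind the eigenvector slides.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]   # k x movies; sort movies by each row to label an axis
```

Sorting movies by their weight in each returned row is one way to recover lists like those on the eigenvector slides that follow.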
27
PCA for Gray Blobs
28
First Coordinate is Centrality
29
Second Eigenvector
Il Postino
Richard III
Eat Drink Man Woman
Jury Duty
Bio-Dome
Richie Rich
30
Third Eigenvector
Bridge on the River Kwai (1957)
20,000 Leagues Under the Sea (1954)
Ben-Hur (1959)
I Shot Andy Warhol
Mallrats
Things to Do in Denver When You're Dead
31
Fourth Eigenvector
Black Beauty
A Little Princess
Little Women
Beavis and Butthead Do America
Private Parts
Jackie Chan's First Strike
32
Fifth Eigenvector
First Wives Club
Jane Eyre
Emma
Evil Dead II
Heavy Metal
Body Snatchers
33
Viz Scalability
Good sampling is important
Tends to be time-consuming, even for small sets
34
Conclusion
Prediction
Demo
Evaluation
Visualization
Conclusion
35
Summary
Different predictors
Consistent evaluation framework
Data analysis informs prediction and evaluation
36
Possible Extensions
User splits
Gauge sets
More ways to select test sets:
- Chronological
- Effectiveness is biased by attributes?
- Fake user sets of known character
37
References & Related Work
Tapestry
MIT
GroupLens (UMN)
MSR
Jester: reduces sparse, many-D data -> 10 d -> 2 d
38
SVM (cont'd)
For each movie, we trained 6 binary classifiers to distinguish among the six votes (0 to 5)
Mercer's Theorem: the dot product can be replaced with any kernel function to get dot products in a higher-dimensional feature space
Training: use "sequential minimal optimization" (Platt, 1998)
Issue: what about missing votes?
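A hedged sketch of the per-movie scheme, using scikit-learn's SVC for the six binary classifiers. Filling missing feature votes with the user's mean is just one answer to the open issue on the slide, and how SWAMI combined the six outputs into a prediction is not stated; taking the vote whose classifier reports the largest decision value is one plausible choice.

```python
import numpy as np
from sklearn.svm import SVC

def train_movie_classifiers(given, m, kernel="rbf"):
    """Six one-vs-rest SVMs for movie m, one per vote value 0-5, trained on
    the other movies' votes as features. `given` is a users x movies float
    array with NaN for missing votes; all names are illustrative."""
    rows = np.flatnonzero(~np.isnan(given[:, m]))   # users who voted on m
    X = np.delete(given[rows], m, axis=1)           # features: their other votes
    user_mean = np.nanmean(X, axis=1, keepdims=True)
    X = np.where(np.isnan(X), user_mean, X)         # one way to fill missing votes
    y = given[rows, m]
    clfs = {}
    for vote in range(6):                           # one binary SVM per vote value
        labels = np.where(y == vote, 1, -1)
        if len(np.unique(labels)) == 2:             # skip votes nobody (or everybody) gave
            clfs[vote] = SVC(kernel=kernel).fit(X, labels)
    return clfs
```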