1
SWAMI Shared Wisdom Through Amalgamation of Many Interpretations
2
Introduction
What is collaborative filtering?
- Prediction and recommendation
Goal:
- End-to-end prediction for users
- Drop-in framework for evaluating multiple algorithms
3
Prediction
- Different Predictors
Demo
Evaluation
Visualization
Conclusion
4
Prediction
Task: predict how a user will rate an item, based on ratings by other users
- Accurately
- Efficiently
Examined three techniques:
- Pearson-correlation predictors (popular)
- Support Vector Method (sound theory)
- Pearson + clustering (better scalability)
5
Pearson Predictor
Users U, movies M, votes V = {v_{u,m}}
Prediction for user u:
  pred_{u,m} = mean_u + \sum_{i \neq u} w(u,i) (v_{i,m} - mean_i)
Weights are given by the Pearson correlation coefficient (Resnick, 1994):
  w(u,i) = \sum_c (v_{u,c} - mean_u)(v_{i,c} - mean_i) / (std_u std_i)
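A minimal sketch of the prediction above in Python/NumPy. The dictionary layout, the `min_overlap` cutoff, and the normalization by the sum of |w(u,i)| (standard in Resnick 1994 but not shown on the slide) are illustrative assumptions, not the SWAMI code.

```python
import numpy as np

def pearson_predict(votes, u, m, min_overlap=2):
    """Predict user u's vote on movie m as mean_u plus Pearson-weighted,
    mean-offset neighbor votes. `votes` maps (user, movie) -> vote."""
    by_user = {}
    for (i, c), v in votes.items():
        by_user.setdefault(i, {})[c] = v

    mean = {i: np.mean(list(vs.values())) for i, vs in by_user.items()}
    std = {i: np.std(list(vs.values())) or 1.0 for i, vs in by_user.items()}

    num = den = 0.0
    for i, vs in by_user.items():
        if i == u or m not in vs:
            continue
        common = set(vs) & set(by_user[u])    # movies both users rated
        if len(common) < min_overlap:         # crude overlap penalty
            continue
        w = sum((by_user[u][c] - mean[u]) * (vs[c] - mean[i]) for c in common)
        w /= std[u] * std[i]
        num += w * (vs[m] - mean[i])
        den += abs(w)
    return mean[u] if den == 0 else mean[u] + num / den
```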
6
Pearson Predictor
Simple to understand
Lots of opportunities for hacking:
- Overlap penalty
- Neighborhood sizes
- Variance weighting and boosting unpopular sentiment
8
Support Vector Method
Optimal Margin Classification
Given a set of points {x_i} with binary labels y_i \in {1, -1}
Find a hyperplane that maximizes the minimum gap between the classes
Solution: a classifier of the form
  C(x) = \sum_i \alpha_i y_i \langle x_i, x \rangle + b,  with sgn(C(x)) giving the class
Solve for the weights \alpha_i from the "examples"
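A toy illustration of the classifier form on the slide, using scikit-learn's SVC rather than the talk's own solver; the data points and parameters are made up.

```python
import numpy as np
from sklearn.svm import SVC  # off-the-shelf max-margin classifier, not the talk's code

# Toy two-class problem: points in the plane with labels +1 / -1.
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)

# The decision function has exactly the form on the slide:
#   C(x) = sum_i alpha_i y_i <x_i, x> + b, and sign(C(x)) gives the class.
print(clf.support_vectors_)              # the x_i with nonzero alpha_i
print(clf.decision_function([[2, 2]]))   # C(x) for a new point
print(clf.predict([[2, 2]]))             # sgn(C(x)) => class
```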
9
Scalability Issues
Pearson predictor
- Sample the underlying data set
- Cluster users to create neighborhoods or profiles (see the sketch below)
SVMs
- Train off-line, O(n^2)
- Embarrassingly parallel
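The slides do not say which clustering algorithm SWAMI used to build neighborhoods; this is a hedged k-means sketch of the idea, with an illustrative cluster count and fill value.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_users(ratings, n_clusters=20, fill=3.0, seed=0):
    """One way to form Pearson neighborhoods: k-means over mean-filled
    rating rows. `ratings` is a users x movies array with NaN for
    missing votes; names and constants are illustrative."""
    filled = np.where(np.isnan(ratings), fill, ratings)
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(filled)
    # Restrict each user's candidate neighbors to their own cluster,
    # then run the Pearson predictor only within that (much smaller) set.
    return {c: np.flatnonzero(labels == c) for c in range(n_clusters)}
```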
10
DEMO
http://bahama.cs.berkeley.edu/swami/index.html
11
Evaluation
Prediction
Demo
Evaluation
- Baseline predictors
- Evaluation strategy
Visualization
Conclusion
12
Goal and Framework
Goal: a framework for comparing prediction algorithms
Test: vary how many ratings the algorithm has from the user
Framework:
- Baseline predictors
- Data set selection
- Metrics
13
Baseline Predictors
Why? Simple, fast, and if you can't beat them, your algorithms need work.
User Average: returns the average of the user's votes
Movie Average: returns the average of the movie's votes
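The two baselines are simple enough to state in a few lines; the NaN-padded users x movies array layout is an assumption for illustration.

```python
import numpy as np

def user_average(ratings, u, m):
    """Baseline 1: ignore the movie, return the mean of user u's votes."""
    return np.nanmean(ratings[u])

def movie_average(ratings, u, m):
    """Baseline 2: ignore the user, return the mean vote for movie m."""
    return np.nanmean(ratings[:, m])
```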
14
Data Set Selection
Vary the amount of data that the predictor sees
Test users picked randomly from the remainder
Random movie selection, to test a variety of both high- and low-variance movies
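A hedged sketch of one way to realize this split: reveal only `n_given` votes per test user and hold the rest out as ground truth. The exact SWAMI protocol (for instance, how training users are set aside) is not specified on the slide.

```python
import numpy as np

def split_given_n(ratings, n_given, n_test_users, seed=0):
    """Pick test users at random, show the predictor only `n_given` of each
    one's votes, and hold the rest out. `ratings` is a users x movies
    float array with NaN for missing votes; names are illustrative."""
    rng = np.random.default_rng(seed)
    test_users = rng.choice(ratings.shape[0], size=n_test_users, replace=False)
    given = np.full_like(ratings, np.nan)
    held_out = np.full_like(ratings, np.nan)
    for u in test_users:
        voted = np.flatnonzero(~np.isnan(ratings[u]))
        shown = rng.choice(voted, size=min(n_given, len(voted)), replace=False)
        given[u, shown] = ratings[u, shown]
        hidden = np.setdiff1d(voted, shown)
        held_out[u, hidden] = ratings[u, hidden]
    return test_users, given, held_out
```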
15
Metrics
Average Absolute Error (AAE): the average of absolute differences from the "true" votes; measures accuracy of predictions
Variance of AAE: measures reliability of predictions
Weighted Mean: a measure of error that weights errors made on high-variance movies more strongly:
  |(true - user_average) * (true - prediction)|
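A sketch of the three metrics for one batch of held-out votes. Array names are illustrative, and "Variance of AAE" is read here as the variance of the absolute errors (variance across evaluation runs would be another reasonable reading).

```python
import numpy as np

def swami_metrics(true, pred, user_avg):
    """`true`, `pred`, and `user_avg` are equal-length arrays of the
    held-out vote, the predicted vote, and the test user's average vote."""
    abs_err = np.abs(true - pred)
    aae = abs_err.mean()        # Average Absolute Error: accuracy
    aae_var = abs_err.var()     # Variance of AAE: reliability
    # Weighted mean: errors on votes far from the user's average count more.
    weighted = np.mean(np.abs((true - user_avg) * (true - pred)))
    return aae, aae_var, weighted
```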
16
Sample Results
http://www.cs.berkeley.edu/~mct//f99/sampleevalchart.html
17
Visualization
Prediction
Demo
Evaluation
Visualization
- How to view the data as a programmer?
- How to view the data as a user?
Conclusion
18
The Visualization Task
Provide tools to the programmer to:
- Understand the data set
- Tune code to the data set
- Evaluate code
Provide tools to the user to:
- Understand output results
19
EachMovie Dataset
Source: DEC Systems Research Center
Collected over 18 months, 1995-1997
74,427 users, 1,649 movies, 2.8 million votes
Voting scale: 0 to 5 stars (integer)
Facts:
- Median votes: 26 per user; 379 per movie
- Most votes: 1,455 by a user; 32,864 for a film
20
The Raw Data
21
The Raw Data (II)
24
Means and Popularity
25
Variance and Mean
26
The Gray Blobs
High-dimensional visualization is hard
Several attempts at PCA: eigenvectors (& results!)
Multidimensional scaling
Gauge set for low dimensions
Fuzzy blob? Or fuzzy glasses?
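A rough sketch of the PCA step, assuming votes are centered per movie and missing votes are filled with the movie mean; the talk's exact preprocessing is not stated.

```python
import numpy as np

def top_eigenvectors(ratings, k=5):
    """`ratings` is a users x movies array with NaN for missing votes.
    Returns the top-k principal directions over movies."""
    movie_mean = np.nanmean(ratings, axis=0)
    centered = np.where(np.isnan(ratings), 0.0, ratings - movie_mean)
    # SVD of the users x movies matrix; rows of vt are eigenvectors of the
    # movie-movie covariance, i.e. the axes behind the eigenvector slides.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]   # k x movies; sort movies by each row to label an axis
```

Sorting movies by their weight in each returned row is one way to recover lists like those on the eigenvector slides that follow.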
27
PCA for Gray Blobs
28
First Coordinate is Centrality
29
Second Eigenvector
Il Postino
Richard III
Eat Drink Man Woman
Jury Duty
Bio-Dome
Richie Rich
30
Third Eigenvector
Bridge on the River Kwai (1957)
20,000 Leagues Under the Sea (1954)
Ben-Hur (1959)
I Shot Andy Warhol
Mallrats
Things to Do in Denver When You're Dead
31
Fourth Eigenvector
Black Beauty
A Little Princess
Little Women
Beavis and Butthead Do America
Private Parts
Jackie Chan's First Strike
32
Fifth Eigenvector
First Wives Club
Jane Eyre
Emma
Evil Dead II
Heavy Metal
Body Snatchers
33
Viz Scalability
Good sampling is important
Tends to be time-consuming, even for small sets
34
Conclusion
Prediction
Demo
Evaluation
Visualization
Conclusion
35
Summary
Different predictors
Consistent evaluation framework
Data analysis informs prediction and evaluation
36
Possible Extensions
User splits
Gauge sets
More ways to select test sets:
- Chronological
- Effectiveness is biased by attributes?
- Fake user sets of known character
37
References & Related Work
Tapestry
MIT
GroupLens (UMN)
MSR
Jester: reduces sparse, many-D data -> 10 d -> 2 d
38
SVM (cont'd)
For each movie, we trained 6 binary classifiers to distinguish among the six votes (0 to 5)
Mercer's Theorem: the dot product can be replaced with any kernel function to get dot products in a higher-dimensional feature space
Training: use "sequential minimal optimization" (Platt, 1998)
Issue: what about missing votes?
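A hedged sketch of the per-movie scheme, using scikit-learn's SVC for the six binary classifiers. Filling missing feature votes with the user's mean is just one answer to the open issue on the slide, and how SWAMI combined the six outputs into a prediction is not stated; taking the vote whose classifier reports the largest decision value is one plausible choice.

```python
import numpy as np
from sklearn.svm import SVC

def train_movie_classifiers(given, m, kernel="rbf"):
    """Six one-vs-rest SVMs for movie m, one per vote value 0-5, trained on
    the other movies' votes as features. `given` is a users x movies float
    array with NaN for missing votes; all names are illustrative."""
    rows = np.flatnonzero(~np.isnan(given[:, m]))   # users who voted on m
    X = np.delete(given[rows], m, axis=1)           # features: their other votes
    user_mean = np.nanmean(X, axis=1, keepdims=True)
    X = np.where(np.isnan(X), user_mean, X)         # one way to fill missing votes
    y = given[rows, m]
    clfs = {}
    for vote in range(6):                           # one binary SVM per vote value
        labels = np.where(y == vote, 1, -1)
        if len(np.unique(labels)) == 2:             # skip votes nobody (or everybody) gave
            clfs[vote] = SVC(kernel=kernel).fit(X, labels)
    return clfs
```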