Published by Jeffery Goodman. Modified over 8 years ago.
1
MovieMiner: A collaborative filtering system for predicting Netflix users' movie ratings [ECS289G Data Mining] Team Spelunker: Justin Becker, Philip Fisher-Ogden
2
The Problem
Given a set of entries, predict the rating values for the unknown entries. Example:
–X-Men, Philip, 5, 05-02-2007
–Spiderman 3, Philip, 4, 05-10-2007
–X-Men, Justin, 4, 04-05-2006
–Spiderman 3, Justin, ?, 02-28-2008
What rating do you predict Justin would give Spiderman 3?
3
Our Approach - Motivation
Motivating factors:
–Review the current approaches taken by the Netflix prize top leaders
–Leverage and extend existing libraries to minimize the ramp-up time required to implement a working system
–Utilize the UC Davis elvis cluster to alleviate any scale problems
4
What - Our Approach
Collaborative Filtering (CF)
–Weighted average of predictions from the following recommenders:
  Slope One recommender
  Item-based recommender
  User-based recommender
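The weighted-average combination of recommenders described above can be sketched as follows. This is a minimal illustration; the function name and the sample predictions/weights are hypothetical, not taken from MovieMiner's code.

```python
# Blend per-recommender predictions into one rating via a weighted average.
# The numbers below are illustrative only.

def blend(predictions, weights):
    """Weighted average of predictions from several recommenders."""
    assert len(predictions) == len(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

# e.g. slope one predicts 4.2, item-based 4.7, user-based 3.7
rating = blend([4.2, 4.7, 3.7], weights=[0.70, 0.05, 0.25])  # ≈ 4.1
```

Because the weights are normalized by their sum, they need not add up to exactly 1.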
5
What - Our Approach
Leveraging three CF recommenders
–Similarities: each uses prior preference information to predict values for unrated entries
–Differences:
  How is the similarity between two entries computed?
  How are the neighbors selected?
  How are the interpolation weights determined?
Image source: http://taste.sourceforge.net/
6
Why - Our Approach
Why Collaborative Filtering?
–"Those who agreed in the past tend to agree again in the future"
–Requires no external data sources
–Uses k-nearest-neighbor approaches to predict the class (rating) of an unknown entry
–A full-featured CF Java library exists: Taste
–CF is one of the two main approaches used by the Netflix prize top leaders (the other being SVD)
7
How – Slope One Recommender
–Introduced by Daniel Lemire and Anna Maclachlan
–A simple and accurate predictor based on the average rating difference between two items
–A weighted variant produces better results, weighting each item pair by the number of users who rated both items
8
Ex: Slope One Recommender

          X-Men   Spiderman 3   Batman Begins   Nacho Libre
Justin      4          ?              5              4
Philip      5          3              4              2
Dan         4          4              5              5
Ian         3          4              3              3
Michael     2          3              1              5

The average difference between Spiderman 3 and X-Men, over the four users who rated both, is (-2 + 0 + 1 + 1) / 4 = 0. Justin's predicted rating for Spiderman 3 is then 4 + 0 = 4.
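The basic (unweighted) Slope One computation for this example can be sketched as follows. The ratings come straight from the table above; the function names are illustrative.

```python
# Basic Slope One for a single item pair, using the slide's ratings table.
ratings = {
    "Justin":  {"X-Men": 4,                   "Batman Begins": 5, "Nacho Libre": 4},
    "Philip":  {"X-Men": 5, "Spiderman 3": 3, "Batman Begins": 4, "Nacho Libre": 2},
    "Dan":     {"X-Men": 4, "Spiderman 3": 4, "Batman Begins": 5, "Nacho Libre": 5},
    "Ian":     {"X-Men": 3, "Spiderman 3": 4, "Batman Begins": 3, "Nacho Libre": 3},
    "Michael": {"X-Men": 2, "Spiderman 3": 3, "Batman Begins": 1, "Nacho Libre": 5},
}

def deviation(target, source):
    """Average rating difference (target - source) over users who rated both."""
    diffs = [r[target] - r[source] for r in ratings.values()
             if target in r and source in r]
    return sum(diffs) / len(diffs)

# Predict Justin's Spiderman 3 rating from his X-Men rating.
dev = deviation("Spiderman 3", "X-Men")        # (-2 + 0 + 1 + 1) / 4 = 0.0
prediction = ratings["Justin"]["X-Men"] + dev  # 4 + 0 = 4.0
```

The weighted variant mentioned on the previous slide would additionally average `deviation` over every item Justin has rated, weighting each pair by its co-rater count.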
9
How – User-based Recommender
Predicts a user u's rating for an item i:
–Find the k nearest neighbors to user u
  Similarity measure = Pearson correlation
  Missing preferences are inferred using the user's average rating
–Interpolate between those in-common neighbors' ratings for item i
  Interpolation weights = Pearson correlation
  Neighbors are ignored if they did not rate i
10
Ex: User-based Recommender

          X-Men   Spiderman 3   Batman Begins   Nacho Libre     avg
Justin      4          ?              5              4         4.3333
Philip      5          3              4              2         3.5
Dan         4          4              5              5         4.5
Ian         3          4              3              3         3.25
Michael     2          3              1              5         2.75

Centered data (user average subtracted):

          X-Men   Spiderman 3   Batman Begins   Nacho Libre   Eucl. norm
Justin   -0.3333       ?            0.6667         -0.3333      0.8165
Philip    1.5        -0.5           0.5            -1.5         2.2361
Dan      -0.5        -0.5           0.5             0.5         1
Ian      -0.25        0.75         -0.25           -0.25        0.8660
Michael  -0.75        0.25         -1.75            2.25        2.9580
11
Ex: User-based Recommender
Similarities are calculated using the Pearson correlation coefficient (on the centered data):

User-user similarities:
  Justin-Philip    0.182574186
  Justin-Dan       0.40824829
  Justin-Ian      -3.14018E-16
  Justin-Michael  -0.690065559

Interpolation between the 2 nearest neighbors (Philip, Dan) produces the prediction:
  prediction         3.690983006
  round(prediction)  4
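The user-based computation above can be sketched as follows. All numbers come from the slides; the helper names are illustrative, and only Justin and his two selected neighbors are included for brevity (this does not change the per-user centering or the result).

```python
# User-based prediction of Justin's Spiderman 3 rating:
# Pearson correlation on mean-centered rows, then a weighted average
# over the 2 nearest neighbors (Philip and Dan).
from math import sqrt

ratings = {
    "Justin": {"X-Men": 4,                   "Batman Begins": 5, "Nacho Libre": 4},
    "Philip": {"X-Men": 5, "Spiderman 3": 3, "Batman Begins": 4, "Nacho Libre": 2},
    "Dan":    {"X-Men": 4, "Spiderman 3": 4, "Batman Begins": 5, "Nacho Libre": 5},
}

def centered(user):
    """Subtract the user's average rating from each of their ratings."""
    avg = sum(ratings[user].values()) / len(ratings[user])
    return {m: r - avg for m, r in ratings[user].items()}

def pearson(u, v):
    cu, cv = centered(u), centered(v)
    dot = sum(cu[m] * cv[m] for m in cu if m in cv)  # items rated by both
    # Norms over each user's full centered row, as in the slides' table.
    nu = sqrt(sum(x * x for x in cu.values()))
    nv = sqrt(sum(x * x for x in cv.values()))
    return dot / (nu * nv)

neighbors = ["Philip", "Dan"]
w = {n: pearson("Justin", n) for n in neighbors}
pred = (sum(w[n] * ratings[n]["Spiderman 3"] for n in neighbors)
        / sum(w.values()))  # ≈ 3.691, rounds to 4
```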
12
How – Item-based Recommender
Predicts a user u's rating for an item i:
–Find the k most similar items to i
  Similarity measure = Pearson correlation
–Keep only the similar items also rated by u
–Interpolate between the remaining items' ratings
  Interpolation weights = Pearson correlation
–Note: item-item similarities allow for more efficient computation, since cnt(items) << cnt(users); the similarity matrix can thus be pre-computed and leveraged as needed.
13
Ex: Item-based Recommender

          X-Men   Spiderman 3   Batman Begins   Nacho Libre
Justin      4          ?              5              4
Philip      5          3              4              2
Dan         4          4              5              5
Ian         3          4              3              3
Michael     2          3              1              5
avg        3.6        3.5            3.6            3.8

Centered data (item average subtracted):

            X-Men   Spiderman 3   Batman Begins   Nacho Libre
Justin       0.4         ?             1.4            0.2
Philip       1.4       -0.5            0.4           -1.8
Dan          0.4        0.5            1.4            1.2
Ian         -0.6        0.5           -0.6           -0.8
Michael     -1.6       -0.5           -2.6            1.2
Eucl. norm  2.2450      1             3.0397          2.6
(norms taken over the four users who rated Spiderman 3)
14
Ex: Item-based Recommender
Similarities are calculated using the Pearson correlation coefficient (on the centered data):

Item-item similarities (with Spiderman 3):
  S-XMen  0
  S-BB    0.493463771
  S-NL    0.192307692

Interpolation between the 2 nearest neighbors (Batman Begins, Nacho Libre) produces the prediction:
  prediction         4.719574665
  round(prediction)  5
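The item-based computation above can be sketched as follows. All values come from the slides' tables; the helper names are illustrative, and only Spiderman 3 plus its two nearest items are included for brevity.

```python
# Item-based prediction of Justin's Spiderman 3 rating:
# center each item on its column average, correlate Spiderman 3 with
# the items Justin rated, then interpolate with the 2 nearest items.
from math import sqrt

# item -> {user: rating}, from the slide's table
cols = {
    "Spiderman 3":   {"Philip": 3, "Dan": 4, "Ian": 4, "Michael": 3},
    "Batman Begins": {"Justin": 5, "Philip": 4, "Dan": 5, "Ian": 3, "Michael": 1},
    "Nacho Libre":   {"Justin": 4, "Philip": 2, "Dan": 5, "Ian": 3, "Michael": 5},
}

def centered(item):
    """Subtract the item's average rating from each user's rating of it."""
    avg = sum(cols[item].values()) / len(cols[item])
    return {u: r - avg for u, r in cols[item].items()}

def pearson(i, j):
    ci, cj = centered(i), centered(j)
    common = [u for u in ci if u in cj]  # users who rated both items
    dot = sum(ci[u] * cj[u] for u in common)
    ni = sqrt(sum(ci[u] ** 2 for u in common))
    nj = sqrt(sum(cj[u] ** 2 for u in common))
    return dot / (ni * nj)

sims = {j: pearson("Spiderman 3", j) for j in ("Batman Begins", "Nacho Libre")}
justin = {"Batman Begins": 5, "Nacho Libre": 4}
pred = sum(sims[j] * justin[j] for j in sims) / sum(sims.values())
# pred ≈ 4.7196, which rounds to 5
```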
15
Initial Results
Bottom line: correct=91,934; loss=319,710
Parameters used: 40% user, 60% item, 20 nearest neighbors
–~97% scored with the composite recommender (user, item)
–~3% scored with a random recommender
RMSE: 1.4445
16
Final Results
Bottom line: correct=106,253; loss=236,523
Parameters used: 25% user, 5% item, 70% slope one, 20 nearest neighbors
–~97% scored with the composite recommender (user, item, slope one)
–~3% scored with a weighted average
RMSE: 1.0871
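RMSE, the metric reported on both results slides, is the square root of the mean squared prediction error. A minimal sketch (the sample predictions are illustrative, not the project's data):

```python
# Root-mean-square error between predicted and actual ratings.
from math import sqrt

def rmse(predicted, actual):
    assert len(predicted) == len(actual)
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                / len(predicted))

err = rmse([4, 3, 5], [5, 3, 4])  # sqrt(2/3) ≈ 0.8165
```

Lower is better: a drop from 1.4445 to 1.0871 means the final blend's predictions sit noticeably closer to the true ratings.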
17
Questions?
18
Conclusion