Lessons from the Netflix Prize

Slides:



Advertisements
Similar presentations
Google News Personalization: Scalable Online Collaborative Filtering
Advertisements

By Klejdi Muca & Stephen Quinn. A method used by companies like IMDB or Netlfix to turn raw data into useful information, for example It helps companies.
Memory vs. Model-based Approaches SVD & MF Based on the Rajaraman and Ullman book and the RS Handbook. See the Adomavicius and Tuzhilin, TKDE 2005 paper.
Linear Regression.
Differentially Private Recommendation Systems Jeremiah Blocki Fall A: Foundations of Security and Privacy.
Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007.
Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest Neighbor Approach.
Lessons from the Netflix Prize Robert Bell AT&T Labs-Research In collaboration with Chris Volinsky, AT&T Labs-Research & Yehuda Koren, Yahoo! Research.
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Note to other teachers and users of these.
Sean Blong Presents: 1. What are they…?  “[…] specific type of information filtering (IF) technique that attempts to present information items (movies,
G54DMT – Data Mining Techniques and Applications Dr. Jaume Bacardit
Rubi’s Motivation for CF  Find a PhD problem  Find “real life” PhD problem  Find an interesting PhD problem  Make Money!
Probability based Recommendation System Course : ECE541 Chetan Tonde Vrajesh Vyas Ashwin Revo Under the guidance of Prof. R. D. Yates.
Customizable Bayesian Collaborative Filtering Denver Dash Big Data Reading Group 11/19/2007.
Quest for $1,000,000: The Netflix Prize Bob Bell AT&T Labs-Research July 15, 2009 Joint work with Chris Volinsky, AT&T Labs-Research and Yehuda Koren,
CS 277: Data Mining Recommender Systems
Chapter 12 (Section 12.4) : Recommender Systems Second edition of the book, coming soon.
Performance of Recommender Algorithms on Top-N Recommendation Tasks
Cao et al. ICML 2010 Presented by Danushka Bollegala.
Matrix Factorization Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Principled Regularization for Probabilistic Matrix Factorization Robert Bell, Suhrid Balakrishnan AT&T Labs-Research Duke Workshop on Sensing and Analysis.
Machine Learning CUNY Graduate Center Lecture 3: Linear Regression.
Performance of Recommender Algorithms on Top-N Recommendation Tasks RecSys 2010 Intelligent Database Systems Lab. School of Computer Science & Engineering.
1 Information Filtering & Recommender Systems (Lecture for CS410 Text Info Systems) ChengXiang Zhai Department of Computer Science University of Illinois,
By Rachsuda Jiamthapthaksin 10/09/ Edited by Christoph F. Eick.
Data Mining - Volinsky Columbia University 1 Topic 12 – Recommender Systems and the Netflix Prize.
Training and Testing of Recommender Systems on Data Missing Not at Random Harald Steck at KDD, July 2010 Bell Labs, Murray Hill.
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
Report #1 By Team: Green Ensemble AusDM 2009 ENSEMBLE Analytical Challenge: Rules, Objectives, and Our Approach.
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
Online Learning for Collaborative Filtering
Netflix Netflix is a subscription-based movie and television show rental service that offers media to subscribers: Physically by mail Over the internet.
Ensemble Learning Spring 2009 Ben-Gurion University of the Negev.
EigenRank: A Ranking-Oriented Approach to Collaborative Filtering IDS Lab. Seminar Spring 2009 강 민 석강 민 석 May 21 st, 2009 Nathan.
Temporal Diversity in Recommender Systems Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain SIGIR 2010 April 6, 2011 Hyunwoo Kim.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Evaluation of Recommender Systems Joonseok Lee Georgia Institute of Technology 2011/04/12 1.
Recommender Systems Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Credits to Bing Liu (UIC) and Angshul Majumdar.
Collaborative Filtering with Temporal Dynamics Yehuda Koren Yahoo Research Israel KDD’09.
Recommender Systems. Recommender Systems (RSs) n RSs are software tools providing suggestions for items to be of use to users, such as what items to buy,
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
Yue Xu Shu Zhang.  A person has already rated some movies, which movies he/she may be interested, too?  If we have huge data of user and movies, this.
Ensemble Methods Construct a set of classifiers from the training data Predict class label of previously unseen records by aggregating predictions made.
Online Social Networks and Media Recommender Systems Collaborative Filtering Social recommendations Thanks to: Jure Leskovec, Anand Rajaraman, Jeff Ullman.
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
Matrix Factorization & Singular Value Decomposition Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Recommendation Systems By: Bryan Powell, Neil Kumar, Manjap Singh.
Innovation Team of Recommender System(ITRS) Collaborative Competitive Filtering : Learning Recommender Using Context of User Choice Keynote: Zhi-qiang.
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Matrix Factorization and Collaborative Filtering
High dim. data Graph data Infinite data Machine learning Apps
Statistics 202: Statistical Aspects of Data Mining
Chapter 7. Classification and Prediction
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
CSE 4705 Artificial Intelligence
MATRIX FACTORIZATION TECHNIQUES FOR RECOMMENDER SYSTEMS
North Dakota State University Fargo, ND USA
Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007
Adopted from Bin UIC Recommender Systems Adopted from Bin UIC.
DATA MINING LECTURE 6 Dimensionality Reduction PCA – SVD
Collaborative Filtering Nearest Neighbor Approach
Step-By-Step Instructions for Miniproject 2
Q4 : How does Netflix recommend movies?
Link Prediction and Collaborative Filtering
North Dakota State University Fargo, ND USA
Ensembles.
Recommender Systems: Latent Factor Models
Matrix Factorization & Singular Value Decomposition
North Dakota State University Fargo, ND USA
Recommender Systems Group 6 Javier Velasco Anusha Sama
Presentation transcript:

Lessons from the Netflix Prize Yehuda Koren The BellKor team (with Robert Bell & Chris Volinsky) movie #15868 movie #7614 movie #3250

We Know What You Ought To Be Watching This Summer Recommender systems We Know What You Ought To Be Watching This Summer

Collaborative filtering Recommend items based on past transactions of users Analyze relations between users and/or items Specific data characteristics are irrelevant Domain-free: user/item attributes are not necessary Can identify elusive aspects

Movie rating data Training data Test data score movie user 1 21 5 213 4 345 2 123 3 768 76 45 568 342 234 6 56 score movie user ? 62 1 96 7 2 3 47 15 41 4 28 93 5 74 69 6 83

Netflix Prize Competition Training data 100 million ratings 480,000 users 17,770 movies 6 years of data: 2000-2005 Test data Last few ratings of each user (2.8 million) Evaluation criterion: root mean squared error (RMSE) Netflix Cinematch RMSE: 0.9514 Competition 2700+ teams $1 million grand prize for 10% improvement on Cinematch result $50,000 2007 progress prize for 8.43% improvement

Overall rating distribution Third of ratings are 4s Average rating is 3.68 From TimelyDevelopment.com

#ratings per movie Avg #ratings/movie: 5627

#ratings per user Avg #ratings/user: 208

Average movie rating by movie count More ratings to better movies From TimelyDevelopment.com

Most loved movies Count Avg rating Title 137812 4.593 The Shawshank Redemption 133597 4.545 Lord of the Rings: The Return of the King 180883 4.306 The Green Mile 150676 4.460 Lord of the Rings: The Two Towers 139050 4.415 Finding Nemo 117456 4.504 Raiders of the Lost Ark 180736 4.299 Forrest Gump 147932 4.433 Lord of the Rings: The Fellowship of the ring 149199 4.325 The Sixth Sense 144027 4.333 Indiana Jones and the Last Crusade

Important RMSEs erroneous Global average: 1.1296 User average: 1.0651 accurate User average: 1.0651 Movie average: 1.0533 Personalization Cinematch: 0.9514; baseline BellKor: 0.8693; 8.63% improvement Grand Prize: 0.8563; 10% improvement Inherent noise: ????

Challenges Size of data Missing data Avoiding overfitting Scalability Keeping data in memory Missing data 99 percent missing Very imbalanced Avoiding overfitting Test and training data differ significantly movie #16322

The BellKor recommender system Use an ensemble of complementing predictors Two, half tuned models worth more than a single, fully tuned model

The BellKor recommender system Use an ensemble of complementing predictors Two, half tuned models worth more than a single, fully tuned model But: Many seemingly different models expose similar characteristics of the data, and won’t mix well Concentrate efforts along three axes...

The three dimensions of the BellKor system Global effects The first axis: Multi-scale modeling of the data Combine top level, regional modeling of the data, with a refined, local view: k-NN: Extracting local patterns Factorization: Addressing regional effects Factorization k-NN

Multi-scale modeling – 1st tier Global effects: Mean rating: 3.7 stars The Sixth Sense is 0.5 stars above avg Joe rates 0.2 stars below avg Baseline estimation: Joe will rate The Sixth Sense 4 stars

Multi-scale modeling – 2nd tier Factors model: Both The Sixth Sense and Joe are placed high on the “Supernatural Thrillers” scale Adjusted estimate: Joe will rate The Sixth Sense 4.5 stars

Multi-scale modeling – 3rd tier Neighborhood model: Joe didn’t like related movie Signs Final estimate: Joe will rate The Sixth Sense 4.2 stars

The three dimensions of the BellKor system The second axis: Quality of modeling Make the best out of a model Strive for: Fundamental derivation Simplicity Avoid overfitting Robustness against #iterations, parameter setting, etc. Optimizing is good, but don’t overdo it! global local quality

The three dimensions of the BellKor system The third dimension will be discussed later... Next: Moving the multi-scale view along the quality axis global local ??? quality

Local modeling through k-NN Earliest and most popular collaborative filtering method Derive unknown ratings from those of “similar” items (movie-movie variant) A parallel user-user flavor: rely on ratings of like-minded users (not in this talk)

k-NN users 12 11 10 9 8 7 6 5 4 3 2 1 movies - unknown rating - rating between 1 to 5

k-NN users 12 11 10 9 8 7 6 5 4 3 2 1 ? movies - estimate rating of movie 1 by user 5

Neighbor selection: Identify movies similar to 1, rated by user 5 k-NN users 12 11 10 9 8 7 6 5 4 3 2 1 ? movies Neighbor selection: Identify movies similar to 1, rated by user 5

Compute similarity weights: s13=0.2, s16=0.3 k-NN users 12 11 10 9 8 7 6 5 4 3 2 1 ? movies Compute similarity weights: s13=0.2, s16=0.3

Predict by taking weighted average: (0.2*2+0.3*3)/(0.2+0.3)=2.6 k-NN users 12 11 10 9 8 7 6 5 4 3 2 1 2.6 movies Predict by taking weighted average: (0.2*2+0.3*3)/(0.2+0.3)=2.6

Properties of k-NN Intuitive No substantial preprocessing is required Easy to explain reasoning behind a recommendation Accurate?

k-NN on the RMSE scale erroneous Global average: 1.1296 accurate Global average: 1.1296 User average: 1.0651 Movie average: 1.0533 0.96 Cinematch: 0.9514 k-NN 0.91 BellKor: 0.8693 Grand Prize: 0.8563 Inherent noise: ????

baseline estimate for rui k-NN - Common practice Define a similarity measure between items: sij Select neighbors -- N(i;u): items most similar to i, that were rated by u Estimate unknown rating, rui, as the weighted average: baseline estimate for rui

k-NN - Common practice Define a similarity measure between items: sij Select neighbors -- N(i;u): items similar to i, rated by u Estimate unknown rating, rui, as the weighted average: Problems: Similarity measures are arbitrary; no fundamental justification Pairwise similarities isolate each neighbor; neglect interdependencies among neighbors Taking a weighted average is restricting; e.g., when neighborhood information is limited

Interpolation weights Use a weighted sum rather than a weighted average: (We allow ) Model relationships between item i and its neighbors Can be learnt through a least squares problem from all other users that rated i:

Interpolation weights Mostly unknown Interpolation weights derived based on their role; no use of an arbitrary similarity measure Explicitly account for interrelationships among the neighbors Challenges: Deal with missing values Avoid overfitting Efficient implementation Estimate inner-products among movie ratings

Latent factor models serious Braveheart The Color Purple Amadeus Lethal Weapon Sense and Sensibility Geared towards females Ocean’s 11 Geared towards males Dave The Lion King Dumb and Dumber The Princess Diaries Independence Day Gus escapist

Latent factor models ~ ~ users items users items 4 5 3 1 2 items ~ users .2 -.4 .1 .5 .6 -.5 .3 -.2 2.1 1.1 -2 -.7 .7 -1 -.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 -.2 1.1 1.3 -.1 1.2 -.7 2.9 -1 .7 -.8 .1 -.6 .4 -.3 .9 1.7 .6 2.1 ~ items A rank-3 SVD approximation

Estimate unknown ratings as inner-products of factors: users 4 5 3 1 2 ? items ~ users .2 -.4 .1 .5 .6 -.5 .3 -.2 2.1 1.1 -2 -.7 .7 -1 -.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 -.2 1.1 1.3 -.1 1.2 -.7 2.9 -1 .7 -.8 .1 -.6 .4 -.3 .9 1.7 .6 2.1 ~ items A rank-3 SVD approximation

Estimate unknown ratings as inner-products of factors: users 4 5 3 1 2 ? items ~ users .2 -.4 .1 .5 .6 -.5 .3 -.2 2.1 1.1 -2 -.7 .7 -1 -.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 -.2 1.1 1.3 -.1 1.2 -.7 2.9 -1 .7 -.8 .1 -.6 .4 -.3 .9 1.7 .6 2.1 ~ items A rank-3 SVD approximation

Estimate unknown ratings as inner-products of factors: users 4 5 3 1 2 2.4 items ~ users .2 -.4 .1 .5 .6 -.5 .3 -.2 2.1 1.1 -2 -.7 .7 -1 -.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 -.2 1.1 1.3 -.1 1.2 -.7 2.9 -1 .7 -.8 .1 -.6 .4 -.3 .9 1.7 .6 2.1 ~ items A rank-3 SVD approximation

Latent factor models ~ Properties: .2 -.4 .1 .5 .6 -.5 .3 -.2 2.1 1.1 -2 -.7 .7 -1 4 5 3 1 2 ~ -.9 2.4 1.4 .3 -.4 .8 -.5 -2 .5 -.2 1.1 1.3 -.1 1.2 -.7 2.9 -1 .7 -.8 .1 -.6 .4 -.3 .9 1.7 .6 2.1 Properties: SVD isn’t defined when entries are unknown  use specialized methods Very powerful model  can easily overfit, sensitive to regularization Probably most popular model among contestants 12/11/2006: Simon Funk describes an SVD based method 12/29/2006: Free implementation at timelydevelopment.com

Factorization on the RMSE scale Global average: 1.1296 User average: 1.0651 Movie average: 1.0533 Cinematch: 0.9514 0.93 factorization BellKor: 0.8693 0.89 Grand Prize: 0.8563 Inherent noise: ????

Our approach User factors: Model a user u as a vector pu ~ Nk(, ) Movie factors: Model a movie i as a vector qi ~ Nk(γ, Λ) Ratings: Measure “agreement” between u and i: rui ~ N(puTqi, ε2) Maximize model’s likelihood: Alternate between recomputing user-factors, movie-factors and model parameters Special cases: Alternating Ridge regression Nonnegative matrix factorization

Combining multi-scale views Residual fitting Weighted average factorization global effects regional effects k-NN local effects A unified model factorization k-NN

Localized factorization model Standard factorization: User u is a linear function parameterized by pu rui = puTqi Allow user factors – pu – to depend on the item being predicted rui = pu(i)Tqi Vector pu(i) models behavior of u on items like i

Results on Netflix Probe set More accurate

multi-scale modeling

Seek alternative perspectives of the data Can exploit movie titles and release year But movies side is pretty much covered anyway... It’s about the users! Turning to the third dimension... global local ??? quality

The third dimension of the BellKor system A powerful source of information: Characterize users by which movies they rated, rather than how they rated  A binary representation of the data users users 4 5 3 1 2 1 movies movies

The third dimension of the BellKor system Great news to recommender systems: Works even better on real life datasets (?) Improve accuracy by exploiting implicit feedback Implicit behavior is abundant and easy to collect: Rental history Search patterns Browsing history ... Implicit feedback allows predicting personalized ratings for users that never rated! movie #17270

The three dimensions of the BellKor system Where do you want to be? All over the global-local axis Relatively high on the quality axis Can we go in between the explicit-implicit axis? Yes! Relevant methods: Conditional RBMs [ML@UToronto] NSVD [Arek Paterek] Asymmetric factor models global local ratings explicit binary implicit quality

Lessons What it takes to win: Think deeper – design better algorithms Think broader – use an ensemble of multiple predictors Think different – model the data from different perspectives At the personal level: Have fun with the data Work hard, long breath Good teammates Rapid progress of science: Availability of large, real life data Challenge, competition Effective collaboration movie #13043

BellKor homepage: www.research.att.com/~volinsky/netflix/ 4 5 3 1 2 3 4 2 1 5 3 2 5 4 Yehuda Koren AT&T Labs – Research yehuda@att.com BellKor homepage: www.research.att.com/~volinsky/netflix/