Q4: How does Netflix recommend movies? Networked Life: 20 Questions and Answers (M. Chiang, Princeton University). Prof. Hongseok Kim
Netflix
Business 1: DVD rental. A rental business started in 1997. Customers just wait for DVDs to arrive by mail, and cannot receive a new DVD without returning the old one (a sliding window).
Business 2: Online streaming. Streaming movies and TV programs, with up to 23 million subscribers by April 2011.
Examples
Amazon: Content-based filtering
YouTube: Co-visitation counts
Pandora: Experts + thumbs up or down
Netflix: Collaborative filtering
Input
User ID: $u$
Movie ID: $i$
Rating: $r_{ui} \in \{1, 2, 3, 4, 5\}$
Timing (date of rating): $t_{ui}$
Output: Predicted rating. Example: a predicted rating of 4.2 can mean that the user will rate 4 stars with 80% probability and 5 stars with 20% probability (0.8 x 4 + 0.2 x 5 = 4.2).
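Metric
Customer satisfaction and prediction effectiveness: hard to gather data.
Prediction error: RMSE or Hamming distance.
$\text{RMSE} = \sqrt{\frac{1}{C} \sum_{(u,i)} (r_{ui} - \hat{r}_{ui})^2}$, where $\hat{r}_{ui}$ is the predicted rating and $C$ is the number of $(u,i)$ pairs.
The smaller the RMSE, the better the recommendation system.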
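The RMSE metric can be sketched in a few lines of Python. This is a minimal illustration only: the function name `rmse`, the dictionary layout keyed by (user, movie), and the ratings themselves are assumptions made for the example, not Netflix data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error over the (u, i) pairs present in `actual`."""
    pairs = list(actual)
    squared_error = sum((actual[p] - predicted[p]) ** 2 for p in pairs)
    return math.sqrt(squared_error / len(pairs))

# Hypothetical ratings keyed by (user, movie); invented for illustration.
actual = {(1, "A"): 4, (1, "B"): 2, (2, "A"): 5}
predicted = {(1, "A"): 3.0, (1, "B"): 2.0, (2, "A"): 3.0}

# Errors are 1, 0 and 2, so RMSE = sqrt((1 + 0 + 4) / 3).
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, one large miss costs more than several small ones.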
The Netflix Prize
Objective: Could recommendation accuracy be improved by 10% in RMSE over Cinematch, the system Netflix was using?
October 2006: An open, online, international competition.
$1M prize and 100 million data points: ratings from 1999 to 2006 (7 years), 480,000 users, 17,770 movies.
The data is skewed and sparse.
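Data Sets
The held-out sets (probe, quiz, and test) have similar statistical properties.
Probe set: can be used by each competing team as often as they want.
Quiz set: RMSE feedback at most once a day.
Test set: the final decision is based on comparison of RMSE on the test set.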
Timeline: 5,000 teams, 44,000 submissions
The problem Unknown ratings to be predicted (Only Netflix knows)
Challenges and solutions
Challenge: large and sparse data.
Two main types of techniques for recommendation:
Content-based filter (Amazon): Only looks at each row in isolation and attaches labels to the columns. If you like a comedy with X, you will probably like another comedy with X.
Collaborative filter (Netflix): Exploits all the data in the entire table.
  Neighborhood method: compute a similarity score to find similar movies and similar users.
  Latent factor method: look for hidden, low-dimensional structures.
A few detours
Least squares: Linear regressions.
Convex optimization: Generalizes linear programs.
Implicit feedback: Which movies she browsed, which ones she watched, and which ones she bothered to rate at all are all helpful hints.
Temporal dynamics: Time-dependent parameters allow the model to capture changes in a person's taste and in trends of the movie market, as well as the mood of the day.
Parameterized models
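Baseline predictor
Average predictor: $\bar{r} = \frac{1}{C} \sum_{(u,i)} r_{ui}$, where $C$ is the number of $(u,i)$ pairs.
Baseline predictor: $\hat{r}_{ui} = \bar{r} + b_u + b_i$, where $b_u$ is a per-user bias and $b_i$ is a per-movie bias.
$\text{RMSE} = \sqrt{\frac{1}{C} \sum_{(u,i)} (r_{ui} - \hat{r}_{ui})^2}$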
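As a small sketch of the two predictors on this slide: the average predictor returns the mean of all known ratings, and the baseline predictor adds a per-user and a per-movie bias to it. The function names, the ratings, and the bias values below are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
def average_predictor(ratings):
    """Mean of all known ratings (the same prediction for every pair)."""
    return sum(ratings.values()) / len(ratings)

def baseline_predictor(r_bar, user_bias, movie_bias, u, i):
    """Average rating shifted by a user bias and a movie bias."""
    return r_bar + user_bias[u] + movie_bias[i]

# Hypothetical known ratings keyed by (user, movie).
ratings = {(1, "A"): 5, (1, "B"): 3, (2, "A"): 4}
r_bar = average_predictor(ratings)  # (5 + 3 + 4) / 3

# Hypothetical biases; in practice they are fitted to the training data.
user_bias = {1: 0.2, 2: -0.3}
movie_bias = {"A": 0.5, "B": -0.5}

# Prediction for the unknown pair (user 2, movie B).
prediction = baseline_predictor(r_bar, user_bias, movie_bias, 2, "B")
print(r_bar, prediction)
```

Here user 2 rates slightly below average and movie B is rated below average, so the prediction for that pair falls below the overall mean.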
RMSE minimization. Setting: one user (user 1) and two movies (A and B)
Least squares: differentiate the objective with respect to b and set the derivative to zero
Solution
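A hedged worked example of the one-user, two-movie case follows. It assumes the matrix form of least squares, minimizing ||Ab - c||^2 with solution given by the normal equations A^T A b = A^T c. The names A, b, c and the two ratings are assumptions introduced for this sketch. With three unknowns and only two ratings the system is underdetermined, so `numpy.linalg.lstsq` returns the minimum-norm solution.

```python
import numpy as np

# Hypothetical ratings by user 1 for movies A and B.
r_1A, r_1B = 5.0, 3.0
r_bar = (r_1A + r_1B) / 2  # average predictor over the known ratings

# Unknowns b = [b_1, b_A, b_B]; each row marks which biases a rating uses.
A = np.array([[1.0, 1.0, 0.0],   # rating of movie A: b_1 + b_A
              [1.0, 0.0, 1.0]])  # rating of movie B: b_1 + b_B
c = np.array([r_1A - r_bar, r_1B - r_bar])  # ratings minus the average

# Least squares solution of min ||A b - c||^2 (minimum norm here).
b, *_ = np.linalg.lstsq(A, c, rcond=None)
b_1, b_A, b_B = b

print("biases:", b_1, b_A, b_B)
print("normal equations hold:", np.allclose(A.T @ A @ b, A.T @ c))
print("fitted ratings:", r_bar + A @ b)
```

The fit reproduces both known ratings exactly, which is a small-scale view of why a model with more parameters than data can overfit.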
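Regularization
Overfitting: Least squares solutions often suffer from the overfitting problem. The model fits the known data in the training set so well that it loses the flexibility to adjust to a new data set.
Regularization: A standard technique to avoid overfitting. It also minimizes the weight (size) of the parameters:
minimize $\sum_{(u,i)} (r_{ui} - \hat{r}_{ui})^2 + \lambda \left( \sum_u b_u^2 + \sum_i b_i^2 \right)$
The first term is the original least squares objective, $\lambda$ is the trade-off parameter, and the second term is the penalty.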
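Regularized least squares can be sketched as follows, assuming the matrix form min ||Ab - c||^2 + lam * ||b||^2, whose solution satisfies (A^T A + lam * I) b = A^T c. The helper name `ridge_solve`, the matrix A, the vector c, and the value of lam are assumptions for illustration.

```python
import numpy as np

def ridge_solve(A, c, lam):
    """Solve (A^T A + lam * I) b = A^T c, the regularized normal equations."""
    n_params = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n_params), A.T @ c)

# Hypothetical setup: one user, two movies, unknowns [b_1, b_A, b_B].
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
c = np.array([1.0, -1.0])  # ratings minus the average rating

b_plain, *_ = np.linalg.lstsq(A, c, rcond=None)  # no penalty
b_ridge = ridge_solve(A, c, lam=1.0)             # with penalty

print("unregularized:", b_plain, "norm", np.linalg.norm(b_plain))
print("regularized:  ", b_ridge, "norm", np.linalg.norm(b_ridge))
```

A larger trade-off parameter shrinks the biases further toward zero, giving up some fit on the training data in exchange for smaller parameters.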
After the baseline predictor: Error matrix = Actual rating matrix - Prediction matrix
Convex optimization
Minimize a convex objective function, subject to a convex constraint set.
Least squares is a special case of convex optimization.
Easy in theory and in practice.
Convex set: Which is a convex set? [Figure: candidate sets, including panels (c), (d), (e)]
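Convex set
Definition: A set is convex if, for any two points in the set, the entire line segment between them also lies in the set.
Most important property: Two convex sets that do not overlap can be separated by a line (a hyperplane in higher dimensions).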
Convex function Which is a convex function?
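Convex function: second derivative test
For a function $f(x_1, x_2, \ldots, x_n)$, the Hessian matrix is $(\nabla^2 f)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$.
$f$ is convex if all eigenvalues of the Hessian matrix are non-negative at every point, that is, if the Hessian is positive semidefinite (PSD).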
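For a quadratic function the Hessian is constant, so the second derivative test reduces to a single eigenvalue check. The sketch below assumes the least squares objective f(b) = ||Ab - c||^2, whose Hessian is 2 A^T A, and contrasts it with the non-convex function x1^2 - x2^2. The helper name `is_psd`, the tolerance, and the matrix A are assumptions for illustration.

```python
import numpy as np

def is_psd(hessian, tol=1e-10):
    """True if every eigenvalue of the symmetric matrix is >= 0 (up to tol)."""
    eigenvalues = np.linalg.eigvalsh(hessian)
    return bool(np.all(eigenvalues >= -tol))

# Hessian of the least squares objective ||A b - c||^2 is 2 * A^T A.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
hessian_least_squares = 2 * A.T @ A

# Hessian of f(x1, x2) = x1^2 - x2^2, a saddle-shaped function.
hessian_saddle = np.array([[2.0, 0.0],
                           [0.0, -2.0]])

print(is_psd(hessian_least_squares))  # least squares objective is convex
print(is_psd(hessian_saddle))         # saddle function is not convex
```

The small tolerance guards against eigenvalues that are zero in exact arithmetic but come out slightly negative in floating point.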
Neighborhood method
From local to global structure, using pairwise statistical correlation.
User-user: two similar people.
Movie-movie: two similar movies.
Similarity metric Cosine coefficient
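The cosine coefficient between two movies can be sketched as the cosine of the angle between their rating columns. This version assumes missing ratings are stored as NaN and that only users who rated both movies are counted; the function name and the sample columns are hypothetical.

```python
import numpy as np

def cosine_coefficient(x, y):
    """Cosine similarity over the entries where both columns are known."""
    both_known = ~np.isnan(x) & ~np.isnan(y)
    xs, ys = x[both_known], y[both_known]
    denominator = np.linalg.norm(xs) * np.linalg.norm(ys)
    if denominator == 0:
        return 0.0  # no overlap, or an all-zero column: treat as unrelated
    return float(xs @ ys / denominator)

# Hypothetical columns, one entry per user, NaN where the user has no rating.
movie_a = np.array([1.0, -1.0, np.nan, 0.5])
movie_b = np.array([2.0, -2.0, 1.0, np.nan])
movie_c = np.array([-1.0, 1.0, 0.3, np.nan])

print(cosine_coefficient(movie_a, movie_b))  # close to +1: same direction
print(cosine_coefficient(movie_a, movie_c))  # close to -1: opposite direction
```

The coefficient lies between -1 and 1 and ignores the overall scale of each column, so it compares the pattern of ratings rather than their size.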
Neighborhood
Neighborhood predictor: Baseline predictor + weighted sum of ratings from neighbor movies. Each weight is the similarity to a neighboring (similar) movie, and the weighted sum is normalized.
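One way to sketch the neighborhood predictor: start from the baseline prediction for (u, i), then add a similarity-weighted sum over the neighbor movies of i, normalized by the sum of absolute weights. This sketch assumes the quantities being summed are the user's errors after the baseline predictor (actual rating minus baseline prediction); the function name and all numbers are hypothetical.

```python
def neighborhood_predict(baseline_ui, residuals_u, similarity_i, neighbors):
    """Baseline prediction plus a normalized, similarity-weighted correction.

    residuals_u[j]:  user u's rating of movie j minus its baseline prediction
    similarity_i[j]: similarity between the target movie i and movie j
    neighbors:       the neighbor movies of i that user u has rated
    """
    weighted_sum = sum(similarity_i[j] * residuals_u[j] for j in neighbors)
    normalizer = sum(abs(similarity_i[j]) for j in neighbors)
    if normalizer == 0:
        return baseline_ui  # no usable neighbors: fall back to the baseline
    return baseline_ui + weighted_sum / normalizer

# Hypothetical values for one user and a target movie "A".
baseline_ui = 3.5
residuals_u = {"B": 0.5, "C": -1.0}
similarity_i = {"B": 0.8, "C": -0.2}

prediction = neighborhood_predict(baseline_ui, residuals_u, similarity_i, ["B", "C"])
print(prediction)
```

A negative similarity flips the sign of the neighbor's contribution, so a below-baseline rating on a dissimilar movie still pushes the prediction up.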
Summary
Example [Figure: rating table split into training data and test data]
Baseline predictor: solve the least squares minimization over 30 training data points and 15 variables b, so the coefficient matrix is 30x15
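To make the 30x15 dimensions concrete, the sketch below builds such a matrix and solves the least squares problem. It assumes 10 users and 5 movies (which would give 15 bias variables) and uses synthetic ratings generated by a fixed rule; none of these values come from the slide's actual example table, and `build_design_matrix` is a hypothetical helper.

```python
import numpy as np

N_USERS, N_MOVIES = 10, 5  # assumption: 10 + 5 = 15 bias variables

# Synthetic training set: 6 of the 10 users rate each movie -> 30 pairs.
pairs = [(u, i) for u in range(N_USERS) for i in range(N_MOVIES)
         if (u + 2 * i) % 5 < 3]
ratings = np.array([1 + (3 * u + i) % 5 for u, i in pairs], dtype=float)

def build_design_matrix(pairs, n_users, n_movies):
    """One row per rating, with a 1 in the user column and the movie column."""
    A = np.zeros((len(pairs), n_users + n_movies))
    for row, (u, i) in enumerate(pairs):
        A[row, u] = 1.0
        A[row, n_users + i] = 1.0
    return A

A = build_design_matrix(pairs, N_USERS, N_MOVIES)
r_bar = ratings.mean()
c = ratings - r_bar                        # ratings minus the average
b, *_ = np.linalg.lstsq(A, c, rcond=None)  # 15 fitted biases
predicted = r_bar + A @ b

rmse_average = np.sqrt(np.mean((ratings - r_bar) ** 2))
rmse_baseline = np.sqrt(np.mean((ratings - predicted) ** 2))
print(A.shape, rmse_average, rmse_baseline)
```

On the training data the baseline predictor can never do worse than the average predictor, because setting all biases to zero recovers the average predictor.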
Prediction User Movie
Rating matrix (estimated by the baseline predictor)
Prediction
Similarity: Use the cosine coefficient to measure the similarity between movies. [Figure: the entire similarity matrix]
Neighborhood predictor
Prediction
Summary
The Netflix Prize is a special case of a recommendation system.
A collaborative filter leverages similarities among users or among movies to make predictions.
Minimizing RMSE may lead to least squares, a special case of convex optimization.