Regression Based Latent Factor Models
Deepak Agarwal, Bee-Chung Chen
Yahoo! Research
KDD 2009, Paris, 6/29/2009

OUTLINE
Problem Definition
–Predicting dyadic response exploiting covariate information
Factorization models – Brief Overview
Incorporating covariate information through regressions
–Cold-start and warm-start through a single model
Closer look at induced correlations
Fitting algorithms: Monte Carlo EM and Iterated CM
Experiments
–MovieLens
–Yahoo! Front Page
Summary

DYADIC DATA
DYAD (i, j)
–i: queries, webpages, users
–j: ads, articles, movies
RESPONSE: y_ij (click rates, ratings)
COVARIATES: X_ij = (w_i, x_ij, z_j)
Rest of the talk: i = user; j = movie; y_ij = rating

PROBLEM DEFINITION
Models to predict ratings for new dyads
–Warm-start: both user and movie present in the training data
–Cold-start: at least one of (user, movie) new
Challenges
–Highly incomplete (user, movie) matrix
–Heavy-tailed degree distributions for users/movies: a large fraction of ratings comes from a small fraction of users/movies
–Handling both warm-start and cold-start effectively

Possible approaches
Large scale regression based on covariates
–Does not provide good estimates for heavy users/movies
–Large number of predictors to estimate interactions
Collaborative filtering
–Neighborhood based
–Factorization (our approach in this paper)
–Good for warm-start; cold-start dealt with separately
Single model that handles cold-start and warm-start
–Heavy users/movies → User/movie specific model
–Light users/movies → fallback on regression model
–Smooth fallback mechanism for good performance

Factorization – Brief Overview
Latent user factors: (α_i, u_i = (u_i1, …, u_ir))
Latent movie factors: (β_j, v_j = (v_j1, …, v_jr))
Interaction: u_i'v_j
Key technical issue: (N + M)(r + 1) parameters will overfit for moderate values of r
Usual approach: regularization through a zero-mean Gaussian prior (ZeroMean)

Existing Zero-Mean Factorization Model
Observation Equation
State Equation
Predict for new dyad:

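The equations on this slide are not in the extracted text. As a minimal sketch, assume the common form y_ij ≈ α_i + β_j + u_i'v_j with zero-mean Gaussian priors on all factors. Under that assumption, a user or movie absent from training sits at the prior mean, which is zero. The function name and the use of `None` for an unseen index are hypothetical.

```python
import numpy as np

def predict_zero_mean(alpha, beta, U, V, i=None, j=None):
    """Predict y_ij = alpha_i + beta_j + u_i . v_j (assumed model form).

    alpha, beta: 1-D arrays of fitted user / movie biases.
    U, V: (N, r) and (M, r) arrays of fitted latent factors.
    i or j set to None marks an unseen user / movie, whose factors
    are replaced by the zero prior mean.
    """
    r = V.shape[1]
    a = alpha[i] if i is not None else 0.0
    u = U[i] if i is not None else np.zeros(r)
    b = beta[j] if j is not None else 0.0
    v = V[j] if j is not None else np.zeros(r)
    return float(a + b + u @ v)
```

With a new user, the prediction collapses to the movie bias alone, which is why cold-start has to be handled separately in this model.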
Regression-based Factorization Model (RLFM)
Main idea: Flexible prior, predict factors through regressions
Seamlessly handles cold-start and warm-start
Modified state equation to incorporate covariates

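A minimal sketch of "predict factors through regressions", assuming linear regressions of each factor on the covariates: α_i has prior mean g0'w_i, β_j has prior mean d0'z_j, u_i has prior mean G w_i and v_j has prior mean D z_j. Only G appears elsewhere in the deck; g0, d0 and D are names assumed here, and any term in the dyad covariates x_ij is omitted for brevity.

```python
import numpy as np

def prior_means(w_i, z_j, g0, d0, G, D):
    """Regression-based prior means of (alpha_i, beta_j, u_i, v_j).

    w_i: user covariates (p,), z_j: movie covariates (q,).
    g0 (p,), d0 (q,), G (r, p), D (r, q): regression coefficients
    (names are assumptions of this sketch).
    """
    return g0 @ w_i, d0 @ z_j, G @ w_i, D @ z_j

def predict_from_covariates(w_i, z_j, g0, d0, G, D):
    """Cold-start prediction: plug the prior means into
    alpha_i + beta_j + u_i . v_j when no ratings are available."""
    alpha, beta, u, v = prior_means(w_i, z_j, g0, d0, G, D)
    return float(alpha + beta + u @ v)
```

For a user or movie with many ratings, the posterior moves away from these prior means toward the data, which gives the warm-start behaviour within the same model.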
Advantages of RLFM
Better regularization of factors
–Covariates “shrink” factors towards a better centroid
Cold-start: Fallback regression model (FeatureOnly)

Graphical representation of the model
Advantages of RLFM illustrated on Yahoo! FP data
Only the first user factor plotted in the comparisons

Induced correlations among observations
Hierarchical random-effects model
Marginal distribution obtained by integrating out random effects

Closer look at induced marginal correlations
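As an illustration of how integrating out random effects induces correlations, consider one user's ratings conditional on the movie factors. Assume y_i = (mean) + α_i 1 + V_i u_i + ε with Var(α_i) = a_alpha, Cov(u_i) = a_u I and noise variance sigma2. The marginal covariance of that user's ratings is then a_alpha 11' + a_u V_i V_i' + sigma2 I. This is a sketch under those assumptions, not the slide's own derivation; `a_alpha` and `sigma2` are names chosen here.

```python
import numpy as np

def marginal_cov_user(V_i, a_alpha, a_u, sigma2):
    """Covariance of one user's ratings after integrating out
    that user's bias alpha_i and factor vector u_i.

    V_i: (n_i, r) factors of the movies rated by the user,
    treated as fixed.
    """
    n = V_i.shape[0]
    return (a_alpha * np.ones((n, n))      # shared user bias
            + a_u * (V_i @ V_i.T)          # shared user factor vector
            + sigma2 * np.eye(n))          # independent noise
```

Two ratings by the same user are more strongly correlated when the movies have similar factor vectors, since the off-diagonal entry is a_alpha + a_u v_j'v_k.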
Model Fitting
Challenging, multi-modal posterior
Monte-Carlo EM (MCEM)
–E-step: Sample factors through Gibbs sampling
–M-step: Estimate regressions through off-the-shelf linear regression routines using sampled factors as response
–We used t-regression; others like LASSO could be used
Iterated Conditional Mode (ICM)
–Replace E-step by CG (conjugate gradient): conditional modes of factors
–M-step: Estimate regressions using the modes as response
Incorporating uncertainty in factor estimates in MCEM helps

Monte Carlo E-step
Through a vanilla Gibbs sampler (conditionals closed form)
Other conditionals also Gaussian and closed form
Conditionals of users (movies) sampled simultaneously
Small number of samples in early iterations, large numbers in later iterations

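A minimal sketch of one closed-form Gaussian conditional, assuming the prior u_i ~ N(prior_mean, a_u I) and Gaussian noise with variance sigma2 on the residual o_ij = y_ij − α_i − β_j. This is the standard Bayesian linear-regression update with the movie factors as the design matrix. The function names and `sigma2` are assumptions of this sketch.

```python
import numpy as np

def user_factor_conditional(V_i, resid_i, prior_mean, a_u, sigma2):
    """Mean and covariance of u_i given everything else.

    V_i: (n_i, r) factors of movies rated by user i.
    resid_i: (n_i,) residuals y_ij - alpha_i - beta_j.
    prior_mean: (r,) prior mean of u_i (zero for ZeroMean,
    a regression on covariates for RLFM).
    """
    r = V_i.shape[1]
    precision = np.eye(r) / a_u + V_i.T @ V_i / sigma2
    cov = np.linalg.inv(precision)
    mean = cov @ (prior_mean / a_u + V_i.T @ resid_i / sigma2)
    return mean, cov

def sample_user_factor(rng, V_i, resid_i, prior_mean, a_u, sigma2):
    """Draw one Gibbs sample of u_i from its Gaussian conditional."""
    mean, cov = user_factor_conditional(V_i, resid_i, prior_mean, a_u, sigma2)
    return rng.multivariate_normal(mean, cov)
```

Given the movie factors, the users' conditionals do not depend on each other, which is what allows all users (and, symmetrically, all movies) to be sampled simultaneously.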
M-step (Why MCEM is better than ICM)
Update G, optimize
Update A_u = a_u I
–Ignored by ICM, underestimates factor variability
–Factors over-shrunk, posterior not explored well

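A minimal sketch of this M-step for the user factors, assuming the prior u_i ~ N(G w_i, a_u I). Ordinary least squares stands in for the t-regression mentioned in the talk. The `include_mc_variance` flag is hypothetical: it toggles the Monte Carlo variance of the sampled factors, which is our reading of the term that ICM ignores.

```python
import numpy as np

def m_step_user_factors(samples, W, include_mc_variance=True):
    """Update G and a_u from Gibbs draws of the user factors.

    samples: (S, N, r) array of S draws for N users.
    W: (N, p) user covariates.
    Returns G with shape (r, p) and the scalar a_u.
    """
    u_bar = samples.mean(axis=0)                       # (N, r) Monte Carlo means
    coef, *_ = np.linalg.lstsq(W, u_bar, rcond=None)   # (p, r) least squares fit
    G = coef.T                                         # prior mean is G @ w_i
    resid = u_bar - W @ coef
    total = np.sum(resid ** 2)
    if include_mc_variance:
        # Spread of the draws around their mean; a point estimate
        # (as in ICM) has no such spread, so this term vanishes there.
        total += samples.var(axis=0).sum()
    n_users, r = u_bar.shape
    a_u = total / (n_users * r)
    return G, a_u
```

Dropping the variance term makes a_u smaller, which tightens the prior in the next iteration and pulls the factors further toward the regression mean. That matches the over-shrinkage described on the slide.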
Experiment 1: Better regularization
MovieLens-100K, avg RMSE using pre-specified splits
ZeroMean, RLFM and FeatureOnly (no cold-start issues)
Covariates:
–Users: age, gender, zipcode (1st digit only)
–Movies: genres

Experiment 2: Better handling of Cold-start
MovieLens-1M; EachMovie
Training-test split based on timestamp
Same covariates as in Experiment 1

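A minimal sketch of a timestamp-based split that also flags which test dyads are cold-start, using the definition given earlier in the talk (at least one of user or movie absent from training). The `train_frac` parameter and its default are hypothetical; the slide does not state the proportion used.

```python
import numpy as np

def time_split(users, movies, timestamps, train_frac=0.75):
    """Split ratings by time and flag cold-start test dyads.

    users, movies, timestamps: 1-D NumPy arrays of equal length.
    Returns (train_idx, test_idx, cold), where cold[k] is True when
    the user or the movie of test rating k never appears in training.
    """
    order = np.argsort(timestamps, kind="stable")
    n_train = int(len(order) * train_frac)
    train_idx, test_idx = order[:n_train], order[n_train:]
    seen_users = set(users[train_idx].tolist())
    seen_movies = set(movies[train_idx].tolist())
    cold = np.array(
        [u not in seen_users or m not in seen_movies
         for u, m in zip(users[test_idx].tolist(), movies[test_idx].tolist())],
        dtype=bool,
    )
    return train_idx, test_idx, cold
```

Reporting error separately on the cold and warm subsets of the test set makes the cold-start comparison between models explicit.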
Experiment 3: Online updates help
Covariates provide good initialization for new user/movie factors, but updating factor estimates frequently (e.g. every hour) helps
Dyn-RLFM
–Estimate posterior mean and covariance at the end of MCEM by running a large number of Gibbs iterations
–For online updates, we do not change the posterior covariance but only adapt the posterior means through EWMA (exponentially weighted moving average)
–This is done by running a small number of Gibbs iterations

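A minimal sketch of the EWMA step for the posterior means, assuming a fresh estimate of a factor's mean is available from a short Gibbs run on the latest batch of data. The `weight` parameter and its default are hypothetical; the slide does not give the smoothing constant. The posterior covariance is left untouched, as the slide describes.

```python
import numpy as np

def ewma_update(prev_mean, new_estimate, weight=0.2):
    """Exponentially weighted moving average of a posterior mean.

    prev_mean: current posterior mean of a factor vector.
    new_estimate: mean estimated from the most recent batch.
    weight: share given to the new estimate, in [0, 1].
    """
    prev_mean = np.asarray(prev_mean, dtype=float)
    new_estimate = np.asarray(new_estimate, dtype=float)
    return (1.0 - weight) * prev_mean + weight * new_estimate
```

A larger weight tracks recent behaviour more closely at the cost of noisier estimates.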
Experiment 3: Continued
New Application: Today Module
Today Module is the top-center part
Four tabs: Featured, Entertainment, Sports, and Video
Featured: displays content from all categories
Today Module: Routes traffic to other Y! pages, increases user engagement
Defaults to the Featured Tab

Some More Background… Featured Tab in Detail
Four articles on F1, F2, F3, F4
F1 article as story by default
Footer click → corresponding article as story
Click rates (CTR): Story clicks per display (maximize this)
F1 → max exposure, large fraction of story clicks
[Figure: story position and footer positions F1, F2, F3, F4]

Experiment 4: Predicting click-rate on articles
Goal: Predict click-rate on articles for a user on F1 position
Article lifetimes short, dynamic updates important
User covariates:
–Age, Gender, Geo, Browse behavior
Article covariates:
–Content Category, keywords
2M ratings, 30K users, 4.5K articles

Results on Y! FP data
Related Work
Little work in a model based framework in the past
–PDLF, KDD 07 (does not predict factors using covariates)
Recent work at WWW 09 published in parallel
–Matchbox: Bayesian online recommendation algorithm
–The two models are the same (motivations differ); the estimation methods differ
–Matchbox is based on variational Bayes; we conjecture the performance would be similar to the ICM method
Some papers at ICML this year are also related
–(not done with my reading yet)

Summary
Regularizing factors through covariates is effective
We presented a regression based factor model that regularizes better and deals with both cold-start and warm-start seamlessly in a single framework
Fitting method scalable; Gibbs sampling for users and movies can be done in parallel
Regressions in M-step can be done with any off-the-shelf scalable linear regression routine
Good results on benchmark data and on a new Y! FP dataset

Ongoing Work
Investigating various non-linear regressions in M-step
Better MCMC sampling schemes for faster convergence
Addressing model choice issues (through Bayes factors)
Tensor factorization