Regression-Based Latent Factor Models
Deepak Agarwal, Bee-Chung Chen
Yahoo! Research
KDD 2009, Paris, 6/29/2009

OUTLINE
- Problem definition
  – Predicting dyadic response by exploiting covariate information
- Factorization models: brief overview
- Incorporating covariate information through regressions
  – Cold-start and warm-start handled through a single model
- Closer look at induced correlations
- Fitting algorithms: Monte Carlo EM and Iterated Conditional Modes
- Experiments: MovieLens, Yahoo! Front Page
- Summary

DYADIC DATA
- Dyad (i, j): i indexes users (or queries, webpages); j indexes movies (or ads, articles)
- Response y_ij: ratings, click rates
- Covariates X_ij = (w_i, x_ij, z_j): user covariates w_i, dyad-specific covariates x_ij, item covariates z_j
- For the rest of the talk: i = user, j = movie, y_ij = rating

PROBLEM DEFINITION
Models to predict ratings for new dyads
- Warm-start: (user, movie) present in the training data
- Cold-start: at least one of (user, movie) is new
Challenges
- Highly incomplete (user, movie) matrix
- Heavy-tailed degree distributions for users/movies: a large fraction of the ratings comes from a small fraction of users/movies
- Handling both warm-start and cold-start effectively

Possible approaches
- Large-scale regression based on covariates
  – Does not provide good estimates for heavy users/movies
  – Large number of predictors needed to estimate interactions
- Collaborative filtering
  – Neighborhood-based
  – Factorization (our approach in this paper)
  – Good for warm-start; cold-start dealt with separately
- Single model that handles cold-start and warm-start
  – Heavy users/movies → user/movie-specific model
  – Light users/movies → fall back on a regression model
  – Smooth fallback mechanism needed for good performance

Factorization – Brief Overview
- Latent user factors: (α_i, u_i = (u_i1, …, u_ir))
- Latent movie factors: (β_j, v_j = (v_j1, …, v_jr))
- Interaction: the rating is modeled through α_i + β_j + u_i'v_j
- (N + M)(r + 1) parameters for N users and M movies
- Key technical issue: will overfit for moderate values of r
- Usual approach: regularization via a zero-mean Gaussian prior on the factors

Existing Zero-Mean Factorization Model
- Observation equation: y_ij ~ N(m_ij, σ²), where m_ij = x_ij'b + α_i + β_j + u_i'v_j
- State equation (zero-mean priors): α_i ~ N(0, a_α), β_j ~ N(0, a_β), u_i ~ N(0, A_u), v_j ~ N(0, A_v)
- Predict for a new dyad (i, j): the posterior mean of m_ij
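To make the notation concrete, a minimal NumPy sketch of the prediction step; all names are hypothetical, and alpha, beta, U, V are assumed to hold fitted (posterior-mean) estimates.

import numpy as np

def predict(b, alpha, beta, U, V, x_ij, i, j):
    # m_ij = x_ij'b + alpha_i + beta_j + u_i'v_j
    return x_ij @ b + alpha[i] + beta[j] + U[i] @ V[j]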

Regression-based Factorization Model (RLFM)
- Main idea: flexible prior; predict the factors through regressions on the covariates
- Modified state equation to incorporate covariates: α_i ~ N(g_0'w_i, a_α), β_j ~ N(d_0'z_j, a_β), u_i ~ N(G w_i, A_u), v_j ~ N(D z_j, A_v)
- Seamlessly handles cold-start and warm-start
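A small generative sketch of the difference between the two priors (dimensions and names are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
r, p = 5, 20                      # factor dimension, user-covariate dimension
G = rng.normal(size=(r, p))       # factor-regression coefficients
A_u = 0.1 * np.eye(r)             # prior covariance of the user factors
w_i = rng.normal(size=p)          # covariates of user i

u_zero_mean = rng.multivariate_normal(np.zeros(r), A_u)  # ZeroMean: centered at 0
u_rlfm = rng.multivariate_normal(G @ w_i, A_u)           # RLFM: centered at G w_i

For a brand-new user, the zero-mean prior can only offer u_i = 0, while RLFM falls back on G w_i, which is exactly the cold-start behavior described above.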

Advantages of RLFM
- Better regularization of the factors: they "shrink" towards a covariate-based centroid instead of towards zero
- Cold-start: falls back on the regression model (FeatureOnly)

Graphical representation of the model
[Figure: graphical model relating covariates, latent factors, and observed ratings]

Advantages of RLFM illustrated on Yahoo! FP data
[Figure: comparison plots; only the first user factor is plotted]

Induced correlations among observations
- RLFM is a hierarchical random-effects model
- The marginal distribution, obtained by integrating out the random effects (the factors), induces correlations among observations that share a user or a movie
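As one concrete case (a sketch that conditions on the movie factors and uses the model above): integrating out α_i and u_i makes two ratings by the same user i on movies j and j' correlated,

Cov(y_ij, y_ij' | v_j, v_j') = Var(α_i) + v_j^T Cov(u_i) v_j' = a_α + v_j^T A_u v_j',  j ≠ j'.

An analogous expression holds for two ratings of the same movie by different users.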

Closer look at induced marginal correlations

Model Fitting
- Challenging, multi-modal posterior
- Monte Carlo EM (MCEM)
  – E-step: sample the factors through Gibbs sampling
  – M-step: estimate the regressions with off-the-shelf linear-regression routines, using the sampled factors as the response (we used t-regression; others, like LASSO, could be used)
- Iterated Conditional Modes (ICM)
  – Replaces the E-step by CG: conditional modes of the factors
  – M-step: estimate the regressions using the modes as the response
- Incorporating uncertainty in the factor estimates, as MCEM does, helps
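A schematic of the MCEM loop just described; the helper names (init_params, gibbs_sample_factors, fit_regressions) are hypothetical, not the authors' code.

def mcem(data, covariates, n_iters, sample_schedule):
    params = init_params(covariates)  # b, g_0, d_0, G, D, and the variance parameters
    for t in range(n_iters):
        # E-step: Gibbs-sample the factors given the current parameters;
        # sample_schedule[t] grows with t (few samples early, many later).
        factor_samples = gibbs_sample_factors(data, params,
                                              n_samples=sample_schedule[t])
        # M-step: regress the sampled factors on the covariates
        # (t-regression in this work; LASSO etc. would also fit here).
        params = fit_regressions(factor_samples, covariates)
    return params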

Monte Carlo E-step
- Through a vanilla Gibbs sampler (the conditionals are in closed form)
- The conditional of each user factor u_i is Gaussian; the other conditionals are also Gaussian and in closed form
- The conditionals of all users (movies) are sampled simultaneously
- Small number of samples in early iterations, large number in later iterations
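For illustration, a sketch of the closed-form Gaussian conditional for one user factor u_i, following standard conjugate algebra for this model (argument names are hypothetical):

import numpy as np

def sample_u_i(rng, V_i, resid_i, G, w_i, A_u_inv, sigma2):
    # V_i: (n_i, r) factors of the movies user i rated
    # resid_i: y_ij - x_ij'b - alpha_i - beta_j for those movies
    prec = A_u_inv + V_i.T @ V_i / sigma2      # posterior precision
    cov = np.linalg.inv(prec)                  # posterior covariance
    mean = cov @ (A_u_inv @ (G @ w_i) + V_i.T @ resid_i / sigma2)
    return rng.multivariate_normal(mean, cov)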

M-step (Why MCEM is better than ICM)
- Update G (and the other regression parameters) by optimizing the regression fit to the factors
- Update A_u = a_u I: the estimate involves both the regression residuals and the posterior variance of the factors
- The posterior-variance term is ignored by ICM, which underestimates factor variability
- As a result, under ICM the factors are over-shrunk and the posterior is not explored well
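A sketch of the a_u update that makes the difference explicit. It is written with explicit posterior covariances for clarity (in MCEM the same term arrives through the spread of the Gibbs samples); shapes are hypothetical: U_mean is (N, r), U_cov is (N, r, r), W is (N, p).

import numpy as np

def update_a_u(U_mean, U_cov, G, W, mcem=True):
    resid = U_mean - W @ G.T                    # residuals of the factor regression
    rss = np.sum(resid ** 2)
    # Posterior-variance term: MCEM keeps it; ICM effectively sets it to zero.
    var_term = np.trace(U_cov.sum(axis=0)) if mcem else 0.0
    N, r = U_mean.shape
    return (rss + var_term) / (N * r)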

Experiment 1: Better regularization
- MovieLens-100K, average RMSE using pre-specified splits
- ZeroMean, RLFM and FeatureOnly (no cold-start issues)
- Covariates
  – Users: age, gender, zipcode (1st digit only)
  – Movies: genres

Experiment 2: Better handling of cold-start
- MovieLens-1M; EachMovie
- Training/test split based on timestamps
- Same covariates as in Experiment 1

Experiment 3: Online updates help
- Covariates provide a good initialization for new user/movie factors, but updating the factor estimates frequently (e.g., every hour) helps
- Dyn-RLFM
  – Estimate the posterior mean and covariance at the end of MCEM by running a large number of Gibbs iterations
  – For online updates, we do not change the posterior covariance; we only adapt the posterior means through an EWMA, done by running a small number of Gibbs iterations
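A minimal sketch of that online step (gamma and short_gibbs_mean are hypothetical names; the stored posterior covariance is reused, not updated):

def online_update(post_mean, new_data, params, gamma=0.1, n_gibbs=5):
    # Short Gibbs run on the newly arrived data, starting from the stored means.
    fresh_mean = short_gibbs_mean(new_data, params, init=post_mean, n_iters=n_gibbs)
    # EWMA: blend the stored posterior mean with the freshly estimated one.
    return (1 - gamma) * post_mean + gamma * fresh_mean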

Experiment 3: Continued

New Application: Today Module on the Yahoo! Front Page
- The Today Module is the top-center part of the Yahoo! Front Page
- Four tabs: Featured, Entertainment, Sports, and Video; defaults to the Featured tab
- Featured tab: displays content from all categories
- The Today Module routes traffic to other Y! pages and increases user engagement

Some More Background: the Featured Tab in Detail
- Four articles in footer positions F1, F2, F3, F4
- The F1 article is shown in the story position by default; clicking a footer position shows the corresponding article as the story
- Click rate (CTR): story clicks per display; this is the quantity to maximize
- F1 gets maximum exposure and hence a large fraction of the story clicks
[Figure: layout of the story position and the footer positions F1–F4]

Experiment 4: Predicting click rates on articles
- Goal: predict the click rate on articles shown to a user in the F1 position
- Article lifetimes are short; dynamic updates are important
- User covariates: age, gender, geo, browse behavior
- Article covariates: content category, keywords
- Data: 2M ratings, 30K users, 4.5K articles

Results on Y! FP data

Related Work
- Little past work in a model-based framework
  – PDLF (KDD 2007): does not predict the factors using covariates
- Recent work at WWW 2009, published in parallel
  – Matchbox: a Bayesian online recommendation algorithm
  – Both models are the same (the motivations differ); the estimation methods differ
  – Matchbox is based on variational Bayes; we conjecture its performance would be similar to that of the ICM method
- Some papers at ICML this year are also related (reading still in progress)

Summary
- Regularizing the factors through covariates is effective
- We presented a regression-based factor model that regularizes better and handles both cold-start and warm-start seamlessly in a single framework
- The fitting method is scalable: Gibbs sampling for users and movies can be done in parallel, and the M-step regressions can use any off-the-shelf scalable linear-regression routine
- Good results on benchmark data and on a new Yahoo! Front Page dataset

Ongoing Work
- Investigating various non-linear regressions in the M-step
- Better MCMC sampling schemes for faster convergence
- Addressing model-choice issues (through Bayes factors)
- Tensor factorization