Training and Testing of Recommender Systems on Data Missing Not at Random. Harald Steck, KDD, July 2010. Bell Labs, Murray Hill.

Similar presentations
Trustworthy Service Selection and Composition. Chung-Wei Hang, Munindar P. Singh, A. Moini.

Evaluation. Rong Jin. Evaluation is key to building effective and efficient search engines, usually carried out in controlled experiments.
Active Learning for Streaming Networked Data Zhilin Yang, Jie Tang, Yutao Zhang Computer Science Department, Tsinghua University.
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
Yehuda Koren , Joe Sill Recsys’11 best paper award
Active Learning and Collaborative Filtering
Bayesian Robust Principal Component Analysis Presenter: Raghu Ranganathan ECE / CMR Tennessee Technological University January 21, 2011 Reading Group (Xinghao.
Machine Learning & Data Mining CS/CNS/EE 155. Lecture 14: Embeddings.
Intro to RecSys and CCF. Brian Ackerman. Roadmap: Introduction to Recommender Systems & Collaborative Filtering; Collaborative Competitive Filtering.
Visual Recognition Tutorial
Multiple Multiplicative Factor Model For Collaborative Filtering. Benjamin Marlin, University of Toronto, Department of Computer Science.
Evaluating Search Engine
Unsupervised Learning With Non-ignorable Missing Data. Machine Learning Group Talk, University of Toronto, Monday Oct 4, 2004. Ben Marlin, Sam Roweis, Rich.
Collaborative Ordinal Regression Shipeng Yu Joint work with Kai Yu, Volker Tresp and Hans-Peter Kriegel University of Munich, Germany Siemens Corporate.
BCOR 1020 Business Statistics Lecture 21 – April 8, 2008.
Collaborative Filtering Matrix Factorization Approach
CS 277: Data Mining Recommender Systems
CHAPTER 15: SIMULATION-BASED OPTIMIZATION II: STOCHASTIC GRADIENT AND SAMPLE PATH METHODS. Organization of chapter in ISSO: Introduction to gradient.
Performance of Recommender Algorithms on Top-N Recommendation Tasks
Principled Regularization for Probabilistic Matrix Factorization Robert Bell, Suhrid Balakrishnan AT&T Labs-Research Duke Workshop on Sensing and Analysis.
Performance of Recommender Algorithms on Top-N Recommendation Tasks RecSys 2010 Intelligent Database Systems Lab. School of Computer Science & Engineering.
Yan Yan, Mingkui Tan, Ivor W. Tsang, Yi Yang,
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 6 Sampling Distributions.
1 A Bayesian Method for Guessing the Extreme Values in a Data Set Mingxi Wu, Chris Jermaine University of Florida September 2007.
GAUSSIAN PROCESS FACTORIZATION MACHINES FOR CONTEXT-AWARE RECOMMENDATIONS Trung V. Nguyen, Alexandros Karatzoglou, Linas Baltrunas SIGIR 2014 Presentation:
Chengjie Sun, Lei Lin, Yuan Chen, Bingquan Liu. Harbin Institute of Technology, School of Computer Science and Technology.
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 3: LINEAR MODELS FOR REGRESSION.
Learning Geographical Preferences for Point-of-Interest Recommendation. Author(s): Bin Liu, Yanjie Fu, Zijun Yao, Hui Xiong [KDD-2013].
CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.
June 5, 2006, University of Trento. Latent Semantic Indexing for the Routing Problem. Doctorate course "Web Information Retrieval". PhD Student Irina Veredina.
Ensemble Learning Spring 2009 Ben-Gurion University of the Negev.
EigenRank: A Ranking-Oriented Approach to Collaborative Filtering. IDS Lab. Seminar, Spring 2009, 강민석, May 21st, 2009. Nathan.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Efficient computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data using the L1 Norm. Anders Eriksson and Anton van den Hengel.
Colorado Center for Astrodynamics Research The University of Colorado 1 STATISTICAL ORBIT DETERMINATION ASEN 5070 LECTURE 11 9/16,18/09.
Investigation of Various Factorization Methods for Large Recommender Systems G. Takács, I. Pilászy, B. Németh and D. Tikk 10th International.
Computational Intelligence: Methods and Applications Lecture 16 Model evaluation and ROC Włodzisław Duch Dept. of Informatics, UMK Google: W Duch.
Evaluation of Recommender Systems Joonseok Lee Georgia Institute of Technology 2011/04/12 1.
EigenRank: A ranking oriented approach to collaborative filtering By Nathan N. Liu and Qiang Yang Presented by Zachary 1.
Similarity & Recommendation Arjen P. de Vries CWI Scientific Meeting September 27th 2013.
Xutao Li, Gao Cong, Xiao-Li Li.
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
GeoMF: Joint Geographical Modeling and Matrix Factorization for Point-of-Interest Recommendation. Defu Lian, Cong Zhao, Xing Xie, Guangzhong Sun, Enhong Chen,
Machine Learning 5. Parametric Methods.
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
Collaborative Competitive Filtering: Learning recommender using context of user choices Shuang Hong Yang Bo Long, Alex Smola, Hongyuan Zha Zhaohui Zheng.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Matrix Factorization & Singular Value Decomposition. Bamshad Mobasher, DePaul University.
Innovation Team of Recommender System(ITRS) Collaborative Competitive Filtering : Learning Recommender Using Context of User Choice Keynote: Zhi-qiang.
Chapter 6 Sampling and Sampling Distributions
Collaborative Deep Learning for Recommender Systems
Hao Ma, Dengyong Zhou, Chao Liu Microsoft Research Michael R. Lyu
Dongheng Sun, 04/26/2011. Learning with Matrix Factorizations by Nathan Srebro.
Author(s): Rahul Sami, 2009 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Noncommercial.
Ranking: Compare, Don't Score. Ammar Ammar, Devavrat Shah (LIDS, MIT). Poster (no preprint), WIDS 2011.
Matrix Factorization and Collaborative Filtering
Learning Recommender Systems with Adaptive Regularization
Ch3: Model Building through Regression
CH 5: Multivariate Methods
Methods and Metrics for Cold-Start Recommendations
MATRIX FACTORIZATION TECHNIQUES FOR RECOMMENDER SYSTEMS
Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Qi Xie1, Shenglin Zhao2, Zibin Zheng3, Jieming Zhu2 and Michael.
Step-By-Step Instructions for Miniproject 2
Advanced Artificial Intelligence
Q4 : How does Netflix recommend movies?
Probabilistic Latent Preference Analysis
Recommender Systems Problem formulation Machine Learning.
Presentation transcript:

Training and Testing of Recommender Systems on Data Missing Not at Random Harald Steck at KDD, July 2010 Bell Labs, Murray Hill

Overview.
Real-World Problem: make personalized recommendations to users that they find "relevant":
1. from all items (in store)
2. pick a few for each user
3. with the goal that each user finds the recommended items "relevant"; e.g., "relevant" = 5-star rating in the Netflix data.
Define the Data Mining Goal (how to test):
- off-line test with historical rating data
- high accuracy: RMSE on observed ratings (popular), or nDCG on observed ratings [Weimer et al. '08]
Find an (approximate) solution to the Goal defined above:
- choose model(s)
- an appropriate training-objective function
- an efficient optimization method

Overview (cont'd). The same pipeline as above; the "this talk" marker highlights the focus of the talk: how the Data Mining Goal (i.e., the test procedure) is defined, and how it is then approximated during training.

Data. The (unknown) complete rating matrix, with users u as rows and items i as columns.

Data (cont'd). Of the complete rating matrix, only the observed ratings are available (e.g., about 1% of entries in the Netflix data).

Data (cont'd): the missing-data mechanism.
- A (general) missing-data mechanism cannot be ignored [Rubin '76; Marlin et al. '07, '08, '09].
- Missing at random (MAR) [Rubin '76; Marlin et al. '07, '08, '09]: the rating value has no effect on the probability that the rating is missing; in this case, correct results are obtained by ignoring the missing ratings.
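To see why this distinction matters, here is a minimal simulation (my illustration, not from the talk; the observation probabilities are made-up numbers): under MAR the observed mean matches the true mean, while a value-dependent (MNAR) mechanism biases it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complete ratings on a 1..5 star scale
# (made-up distribution, not the Netflix data).
complete = rng.integers(1, 6, size=100_000)

# MAR: every rating is observed with the same probability,
# independent of its value.
mar_mask = rng.random(complete.size) < 0.01

# MNAR: higher ratings are more likely to be observed, mimicking
# users who simply do not rate items they dislike (made-up numbers).
p_obs = np.array([0.0, 0.002, 0.004, 0.008, 0.016, 0.030])  # index = stars
mnar_mask = rng.random(complete.size) < p_obs[complete]

print("true mean:       ", complete.mean())
print("mean under MAR:  ", complete[mar_mask].mean())   # close to true mean
print("mean under MNAR: ", complete[mnar_mask].mean())  # biased upward
```

The same bias affects any accuracy measure computed only on the observed ratings, which motivates the MNAR-aware test measures below.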

Ratings are missing not at random (MNAR): empirical evidence. Graphs from [Marlin & Zemel '09]: in a survey, users were asked to rate a random list of items, which approximates the complete data; in typical data, users are free to choose which items to rate, so the available data are MNAR: instead of giving low ratings, users tend to not give a rating at all.

Overview (revised).
Real-World Problem: as above.
Define the Data Mining Goal (how to test):
- off-line test with historical rating data
- high accuracy: RMSE, nDCG, ... on observed ratings, or Top-k Hit-Rate, ... on all items
Find an (approximate) solution to the Goal defined above:
- choose model(s)
- an appropriate training-objective function
- an efficient optimization method

Test Performance Measures on MNAR Data.
- Many popular performance measures cannot readily deal with missing ratings.
- Only a few from among all items can be recommended.
- This motivates the Top-k Hit Rate w.r.t. all items, defined on the next slide.

Test Performance Measures on MNAR Data (cont'd). Top-k Hit Rate w.r.t. all items:
- When comparing different recommender systems on fixed data and a fixed k, recall and precision are equivalent: precision equals recall times a constant factor (see the derivation below).
- Under a mild assumption, recall on MNAR data is an unbiased estimate of recall on the (unknown) complete data. Assumption: the relevant ratings are missing at random.
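The equivalence of recall and precision follows directly from their definitions; a short derivation (notation mine: hits_u(k) denotes user u's relevant items ranked in the top k, and N^+_u the number of relevant items of user u):

```latex
\mathrm{precision}_u(k) = \frac{|\mathrm{hits}_u(k)|}{k}, \qquad
\mathrm{recall}_u(k) = \frac{|\mathrm{hits}_u(k)|}{N^{+}_u}
\quad\Longrightarrow\quad
\mathrm{precision}_u(k) = \mathrm{recall}_u(k) \cdot \frac{N^{+}_u}{k} .
```

For fixed data (fixed N^+_u) and fixed k, the two measures differ only by a constant factor and hence rank recommender systems identically.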

Test Performance Measures on MNAR Data (cont'd). Top-k Hit-Rate:
- depends on k
- ignores the ranking
(Figure: TOPK as a function of k, with k normalized w.r.t. the number of items.)

Test Performance Measures on MNAR Data (cont'd). Area under the TOPK curve (ATOP):
- independent of k
- in [0,1], larger is better
- captures the ranking of all items
- agrees with the area under the ROC curve in leading order if #relevant items << #items
- unbiased estimate from MNAR data for the (unknown) complete data under the above assumption
(Figure: TOPK curve vs. normalized k, with the ATOP area under the curve shaded.)
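A minimal sketch of both measures for a single user (my implementation; it assumes ATOP equals the average normalized rank of the relevant items, which matches the area under the TOPK curve up to discretization):

```python
import numpy as np

def topk_hit_rate(scores, relevant, k):
    # TOPK: fraction of this user's relevant test items that are
    # ranked in the top k of ALL items by predicted score.
    top_k = np.argsort(-scores)[:k]
    return np.isin(relevant, top_k).mean()

def atop(scores, relevant):
    # ATOP: average normalized rank (counted from the bottom) of the
    # relevant items; lies in [0, 1], larger is better.
    n_items = len(scores)
    rank = np.argsort(np.argsort(-scores))  # rank 0 = highest score
    return 1.0 - (rank[relevant] / (n_items - 1)).mean()
```

Here scores holds one user's predicted ratings for all items, and relevant the indices of, e.g., that user's 5-star test items; the per-user values are then averaged over all users.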

Overview (final).
Real-World Problem: as above.
Define the Data Mining Goal (how to test):
- off-line test with historical rating data
- high accuracy: TOPK, ATOP, ... on all items
Find an (approximate) solution to the Goal defined above:
- choose model(s)
- an appropriate training objective function
- efficient optimization

Low-rank Matrix Factorization Model. Matrix of predicted ratings (users u by items i): R_hat = r_m + P Q^T, with
- rating offset r_m
- rank of the matrices P and Q: the dimension d_0 of the low-dimensional latent space, e.g., d_0 = 50.

Training Objective Function: AllRank. A minimal modification of the usual least-squares problem:
- account for all items per user: observed and missing ratings
- imputed value for missing ratings: r_m
- create a balanced training set via weights: 1 if observed, w_m if missing
- (usual) regularization of the matrix elements: lambda
Efficient optimization:
- alternating least squares
- the tuning parameters r_m, w_m, and lambda have to be optimized as well (e.g., w.r.t. ATOP); see the tuning sketch below.
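A dense toy sketch of this objective, fit by alternating least squares (my reading of the slide, not the author's code; a real implementation would exploit sparsity instead of materializing the full matrices):

```python
import numpy as np

def allrank_als(R_obs, mask, d0=50, r_m=2.0, w_m=0.005, lam=0.1,
                n_iters=20, seed=0):
    # AllRank objective, per the slide:
    #   sum_{u,i} W_ui * (Y_ui - p_u . q_i)^2 + lam * (||P||^2 + ||Q||^2)
    # with Y_ui = R_ui - r_m where observed and Y_ui = 0 where missing
    # (i.e., missing ratings imputed with r_m), and W_ui = 1 if
    # observed, w_m if missing.
    n_users, n_items = R_obs.shape
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, d0))
    Q = 0.1 * rng.standard_normal((n_items, d0))

    Y = np.where(mask == 1, R_obs - r_m, 0.0)   # imputed targets
    W = np.where(mask == 1, 1.0, w_m)           # balanced weights
    reg = lam * np.eye(d0)

    for _ in range(n_iters):
        for u in range(n_users):                # solve each user factor
            WQ = Q * W[u][:, None]
            P[u] = np.linalg.solve(WQ.T @ Q + reg, WQ.T @ Y[u])
        for i in range(n_items):                # solve each item factor
            WP = P * W[:, i][:, None]
            Q[i] = np.linalg.solve(WP.T @ P + reg, WP.T @ Y[:, i])
    return P, Q, r_m                            # predict with r_m + P @ Q.T
```

For a missing entry the target is the imputed r_m and the prediction is r_m + p_u . q_i, so its residual reduces to w_m (p_u . q_i)^2: the missing ratings gently pull predictions toward r_m, which is exactly the balancing effect described above.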

Experimental Results on Netflix Data: Imputed Rating Value r_m.
- An optimum for the imputed value exists; the optimal r_m is about 2 (ratings are 1...5 stars).
- The optimal r_m may be interpreted as the mean of the missing ratings.
- The exact imputation value is not critical; what matters is that the imputed value lies below the observed mean rating.

Experimental Results on Netflix Data: Weight of Missing Ratings w_m.
- w_m = 1 corresponds to standard SVD (plus penalty term), as in Latent Semantic Analysis.
- A small w_m is optimal; compare to the fraction of observed ratings, about 0.01.
- w_m = 0 ignores the missing ratings and is worst w.r.t. ATOP.
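To make "tuning parameters ... optimized as well (e.g., w.r.t. ATOP)" concrete, here is a hypothetical grid search; R_train, train_mask, and val_relevant (a dict mapping a user index to that user's relevant validation items) are placeholders for an actual train/validation split, and allrank_als and atop are the sketches above:

```python
import numpy as np

# Hypothetical tuning loop; the grid roughly brackets the optima
# reported on these slides (r_m near 2, w_m near the observed fraction).
best = None
for r_m in (1.0, 1.5, 2.0, 2.5):
    for w_m in (0.002, 0.005, 0.01, 0.02):
        P, Q, offset = allrank_als(R_train, train_mask, d0=50,
                                   r_m=r_m, w_m=w_m, lam=0.1)
        scores = offset + P @ Q.T
        mean_atop = np.mean([atop(scores[u], rel)
                             for u, rel in val_relevant.items()])
        if best is None or mean_atop > best[0]:
            best = (mean_atop, r_m, w_m)
print("best ATOP %.4f at r_m=%.1f, w_m=%.3f" % best)
```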

Experimental Results on Netflix Data: Top-k Hit-Rate. Comparison of approaches:
- AllRank (RMSE = 1.106)
- ignoring missing ratings (RMSE = 0.921)
(Figure: Top-k Hit-Rate curves for both approaches.)

(Same comparison as above, zoomed into the top 2% of items.)

Adding a third approach, the integrated model [Koren '08] (RMSE = 0.887, trained to minimize RMSE): zoomed into the top 2%, AllRank achieves a 39% to 50% larger Top-k Hit-Rate than the integrated model.

Takeaway: a large increase in Top-k Hit-Rate results from also accounting for the missing ratings when training on MNAR data.

Related Work.
Explicit feedback data (ratings):
- Improved RMSE on observed data also increases the Top-k Hit-Rate on all items [Koren '08].
- Ratings are missing not at random:
  - improved models: conditional RBM, NSVD1/2, SVD++ [Salakhutdinov '07; Paterek '07; Koren '08]
  - test on "complete" data, train a multinomial mixture model on MNAR data [Marlin et al. '07, '09]
Implicit feedback data (clickstream data, TV consumption, tags, bookmarks, purchases, ...):
- [Hu et al. '07; Pan et al. '07]: binary data where only positives are observed, so missing entries are assumed negative; trained a matrix-factorization model with a weighted least-squares objective function.
- Claimed difference to explicit feedback data: the latter provides both positive and negative observations.

Conclusions and Future Work.
- Considered explicit feedback data missing not at random (MNAR).
- Test performance measures: close to the real-world problem; unbiased on MNAR data (under a mild assumption); (area under the) Top-k Hit Rate, ...
- Efficient surrogate objective function for training: AllRank; accounting for missing ratings leads to large improvements in Top-k Hit-Rate.
Future work:
- better test performance measures, training objective functions, and models
- results obtained w.r.t. RMSE need not hold w.r.t. Top-k Hit-Rate on MNAR data, e.g., collaborative filtering vs. content-based methods.
