Neural Collaborative Filtering

Similar presentations
Recommender System A Brief Survey.

Neural networks Introduction Fitting neural networks
CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.
Intro to RecSys and CCF Brian Ackerman 1. Roadmap Introduction to Recommender Systems & Collaborative Filtering Collaborative Competitive Filtering 2.
x – independent variable (input)
Arizona State University DMML Kernel Methods – Gaussian Processes Presented by Shankar Bhargav.
Adapting Deep RankNet for Personalized Search
Longbiao Kang, Baotian Hu, Xiangping Wu, Qingcai Chen, and Yan He Intelligent Computing Research Center, School of Computer Science and Technology, Harbin.
Cao et al. ICML 2010 Presented by Danushka Bollegala.
Machine Learning CUNY Graduate Center Lecture 3: Linear Regression.
Yan Yan, Mingkui Tan, Ivor W. Tsang, Yi Yang,
User Profiling based on Folksonomy Information in Web 2.0 for Personalized Recommender Systems Huizhi (Elly) Liang Supervisors: Yue Xu, Yuefeng Li, Richi.
Computational Perception (School of Computer Science): Support Vector Machines. University of Texas at Austin Machine Learning Group. Perceptron Revisited: Linear Separators, Binary Classification.
GAUSSIAN PROCESS FACTORIZATION MACHINES FOR CONTEXT-AWARE RECOMMENDATIONS Trung V. Nguyen, Alexandros Karatzoglou, Linas Baltrunas SIGIR 2014 Presentation:
Machine Learning Using Support Vector Machines (Paper Review) Presented to: Prof. Dr. Mohamed Batouche Prepared By: Asma B. Al-Saleh Amani A. Al-Ajlan.
Indirect Supervision Protocols for Learning in Natural Language Processing II. Learning by Inventing Binary Labels This work is supported by DARPA funding.
Xutao Li1, Gao Cong1, Xiao-Li Li2
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
NTU & MSRA Ming-Feng Tsai
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
Unsupervised Streaming Feature Selection in Social Media
Dependency Networks for Inference, Collaborative filtering, and Data Visualization Heckerman et al. Microsoft Research J. of Machine Learning Research.
Deep Residual Learning for Image Recognition
Deep Learning Overview Sources: workshop-tutorial-final.pdf
Collaborative Deep Learning for Recommender Systems
Tagommenders: Connecting Users to Items through Tags Written by Shilad Sen, Jesse Vig, and John Riedl (2009) Presented by Ken Hu and Hassan Hattab.
Sparse Coding: A Deep Learning using Unlabeled Data for High - Level Representation Dr.G.M.Nasira R. Vidya R. P. Jaia Priyankka.
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Big data classification using neural network
He Xiangnan Research Fellow National University of Singapore
Convolutional Sequence to Sequence Learning
Dimensionality Reduction and Principle Components Analysis
TJTS505: Master's Thesis Seminar
Collaborative Filtering for Implicit Feedback
Deep Feedforward Networks
Fall 2004 Perceptron CS478 - Machine Learning.
MIRA, SVM, k-NN Lirong Xia.
Learning Recommender Systems with Adaptive Regularization
Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek
Deep Predictive Model for Autonomous Driving
Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment Xinyang Jiang, Fei Wu, Xi Li, Zhou Zhao, Weiming Lu, Siliang Tang, Yueting.
Nikolay Karpov Pavel Shashkin National Research University Higher School of Economics 5th Int. Workshop on News Recommendation and Analytics (INRA.
Regularizing Face Verification Nets To Discrete-Valued Pain Regression
Multimodal Learning with Deep Boltzmann Machines
Compositional Human Pose Regression
Announcements HW4 due today (11:59pm) HW5 out today (due 11/17 11:59pm)
Intelligent Information System Lab
Collective Network Linkage across Heterogeneous Social Platforms
Dynamic Routing Using Inter Capsule Routing Protocol Between Capsules
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
Learning with information of features
Two-Stream Convolutional Networks for Action Recognition in Videos
Towards Understanding the Invertibility of Convolutional Neural Networks Anna C. Gilbert1, Yi Zhang1, Kibok Lee1, Yuting Zhang1, Honglak Lee1,2 1University.
Deep Learning Hierarchical Representations for Image Steganalysis
A Proposal Defense On Deep Residual Network For Face Recognition Presented By SAGAR MISHRA MECE
Designing Neural Network Architectures Using Reinforcement Learning
GANG: Detecting Fraudulent Users in OSNs
Neural networks (1) Traditional multi-layer perceptrons
Autoencoders Supervised learning uses explicit labels/correct output in order to train a network. E.g., classification of images. Unsupervised learning.
Human-object interaction
Aesthetic-based Clothing Recommendation
Modeling IDS using hybrid intelligent systems
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Adversarial Personalized Ranking for Recommendation
Learning and Memorization
Relational Collaborative Filtering:
Shengcong Chen, Changxing Ding, Minfeng Liu 2018
Presentation transcript:

Neural Collaborative Filtering
He Xiangnan (1), Liao Lizi (1), Zhang Hanwang (1), Nie Liqiang (2), Hu Xia (3), Tat-Seng Chua (1)
April 05, 2017 @ WWW 2017. Presented by Xiangnan He.
(1) National University of Singapore  (2) Shandong University, China  (3) Texas A&M University

Matrix Factorization (MF)

MF is a linear latent factor model: from a 0/1 users-by-items interaction matrix (entry 1 means user u interacted with item i), it learns a latent vector for each user and each item. Matrix factorization is known as the simplest yet most effective model for the collaborative filtering task. Basically, it models each user and item as a low-dimensional latent vector, and estimates the affinity between user u and item i as the inner product of their latent vectors:

$\hat{y}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i = \sum_{k=1}^{K} p_{uk} q_{ik}$
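
A minimal numpy sketch of the MF predictor (sizes and the random initialization are illustrative; in practice P and Q are learned from the observed interactions):

```python
import numpy as np

n_users, n_items, n_factors = 100, 50, 8  # illustrative sizes

# Latent factor matrices: one row per user (P) and per item (Q).
rng = np.random.default_rng(0)
P = rng.normal(size=(n_users, n_factors))
Q = rng.normal(size=(n_items, n_factors))

def mf_score(u, i):
    """Affinity of user u for item i: inner product of their latent vectors."""
    return P[u] @ Q[i]

print(mf_score(0, 3))
```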

Limitation of Matrix Factorization

The simple choice of the inner product as the interaction function can limit the expressiveness of an MF model. Example (latent vectors assumed to have unit length, so the inner product reduces to the cosine): take the Jaccard similarity of users' interacted item sets, $s_{ij} = |\mathcal{R}_i \cap \mathcal{R}_j| / |\mathcal{R}_i \cup \mathcal{R}_j|$, as the ground-truth similarity. From the interaction matrix: sim(u1, u2) = 0.5, sim(u3, u1) = 0.4, sim(u3, u2) = 0.66; and for a new user u4: sim(u4, u1) = 0.6, sim(u4, u2) = 0.2, sim(u4, u3) = 0.4. Laying out p1, p2, p3 in the latent space to respect their similarity ranking and then placing p4 closest to p1 forces p4 closer to p2 than to p3, wrongly implying s42 > s43 (✗) against the ground truth s43 > s42.

Limitation of Matrix Factorization (cont.)

The fixed inner product can therefore incur a large ranking loss for MF. How to address this? One option is to use a larger number of latent factors; however, that may hurt the generalization of the model (e.g., overfitting). Our solution: learn the interaction function from data, rather than fixing it to the simple inner product. A small numerical check of the example above is sketched below.
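
A minimal numpy check of the u1-u4 example, under the assumption of 2-D unit-length latent vectors and non-negative similarities; the angles below are one illustrative layout (not from the slides) that preserves the ranking s23 > s12 > s13:

```python
import numpy as np

# Unit vectors for u1, u2, u3 at angles (degrees) chosen so that cosine
# similarity preserves the ground-truth ranking s23 > s12 > s13
# (the ranking only, not the exact Jaccard values).
angles = np.array([0.0, 60.0, 66.4])  # p1, p2, p3

found = False
for theta in np.arange(0.0, 360.0, 0.1):  # every candidate placement of p4
    c1, c2, c3 = np.cos(np.radians(theta - angles))
    # Ground truth for u4 is s41 > s43 > s42; similarities (like Jaccard)
    # are non-negative, so require the predicted ones to be as well.
    if c1 > c3 > c2 >= 0.0:
        found = True

print("valid placement for p4 exists:", found)  # prints: False
```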

Related Work

[Venn diagram: Deep Learning ∩ Recommender Systems = our work]

This work tackles the recommendation problem with deep learning techniques; as such, we position it at the intersection of the two areas. Next, we review some recent work that applies deep learning to recommender systems.

Related Work

- Zhang et al. KDD 2016. Collaborative Knowledge Base Embedding for Recommender Systems
- Y. Song et al. SIGIR 2016. Multi-Rate Deep Learning for Temporal Recommendation
- Li et al. CIKM 2015. Deep Collaborative Filtering via Marginalized Denoising Auto-encoder
- H. Wang et al. KDD 2015. Collaborative Deep Learning for Recommender Systems
- A. Elkahky et al. WWW 2015. A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems
- X. Wang et al. MM 2014. Improving Content-based and Hybrid Music Recommendation Using Deep Learning
- Oord et al. NIPS 2013. Deep Content-based Music Recommendation

In these works, deep learning (e.g., SDAE, CNN, SCAE) is only used for modelling SIDE INFORMATION of users and items. For modelling the interaction between users and items, existing work still uses the simple inner product.

Proposed Methods

Our proposals:
- A Neural Collaborative Filtering (NCF) framework that learns the interaction function with a deep neural network.
- An NCF instance that generalizes the MF model (GMF).
- An NCF instance that models nonlinearities with a multi-layer perceptron (MLP).
- An NCF instance, NeuMF, that fuses GMF and MLP.

NCF Framework

NCF uses a multi-layer model to learn the user-item interaction function.
- Input: sparse feature vectors for user u (v_u) and item i (v_i)
- Output: predicted score ŷ_ui

Note: the input feature vectors can include any categorical variables beyond the user/item ID, such as attributes, contexts, and content. NCF adopts two pathways to model users and items.
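
To make the two pathways concrete, a brief PyTorch sketch of the embedding inputs (the authors' official code is in Keras; sizes and names here are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Two pathways: the sparse one-hot user/item inputs are mapped to dense
# latent vectors by embedding layers; the neural layers stacked on top
# then learn the interaction function f(v_u, v_i) -> ŷ_ui.
n_users, n_items, dim = 1000, 2000, 16   # illustrative sizes
user_emb = nn.Embedding(n_users, dim)    # user pathway
item_emb = nn.Embedding(n_items, dim)    # item pathway

u, i = torch.tensor([3]), torch.tensor([42])
p_u, q_i = user_emb(u), item_emb(i)      # fed into the interaction layers
```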

Generalized Matrix Factorization (GMF)

NCF can express and generalize MF. If we define Layer 1 as an element-wise product and the output layer as a fully connected layer without bias, we have:

$\hat{y}_{ui} = a_{\text{out}}\left(\mathbf{h}^\top (\mathbf{p}_u \odot \mathbf{q}_i)\right)$

With an identity $a_{\text{out}}$ and an all-ones $\mathbf{h}$, this recovers the MF model exactly. As MF is the most popular model for recommendation and has been investigated extensively in the literature, being able to recover it allows NCF to simulate a large family of factorization models, such as SVD++, timeSVD, and Factorization Machines.
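
A minimal PyTorch sketch of GMF, where the bias-free linear layer plays the role of h (an illustrative rendering, not the authors' Keras code):

```python
import torch
import torch.nn as nn

class GMF(nn.Module):
    """NCF instance generalizing MF: element-wise product of the user and
    item embeddings, followed by a fully connected layer without bias."""
    def __init__(self, n_users, n_items, dim=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.h = nn.Linear(dim, 1, bias=False)  # learned weight vector h

    def forward(self, u, i):
        elementwise = self.user_emb(u) * self.item_emb(i)  # p_u ⊙ q_i
        # With h fixed to all-ones and an identity output activation,
        # this reduces to the plain inner product, i.e., vanilla MF.
        return torch.sigmoid(self.h(elementwise)).squeeze(-1)

model = GMF(n_users=1000, n_items=2000)
print(model(torch.tensor([0]), torch.tensor([42])))
```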

Multi-Layer Perceptron (MLP)

NCF can endow the interaction function with more nonlinearities:

Layer 1 (concatenation): $\mathbf{z}_1 = \phi_1(\mathbf{p}_u, \mathbf{q}_i) = [\mathbf{p}_u; \mathbf{q}_i]$

Remaining layers: $\phi_l(\mathbf{z}_{l-1}) = a\left(\mathbf{W}_l^\top \mathbf{z}_{l-1} + \mathbf{b}_l\right)$

Activation function a (empirically): ReLU > tanh > sigmoid.
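
A corresponding PyTorch sketch of the MLP instance (hidden-layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """NCF instance: concatenate the embeddings (Layer 1), then stack
    fully connected ReLU layers to learn a nonlinear interaction."""
    def __init__(self, n_users, n_items, dim=16, hidden=(32, 16, 8)):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        layers, in_dim = [], 2 * dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        self.tower = nn.Sequential(*layers)
        self.out = nn.Linear(in_dim, 1)

    def forward(self, u, i):
        z1 = torch.cat([self.user_emb(u), self.item_emb(i)], dim=-1)
        return torch.sigmoid(self.out(self.tower(z1))).squeeze(-1)

model = MLP(n_users=1000, n_items=2000)
print(model(torch.tensor([0]), torch.tensor([42])))
```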

MF vs. MLP

MF uses the inner product as the interaction function:
- Latent factors are independent of each other;
- It empirically has good generalization ability for CF modelling.

MLP uses nonlinear functions to learn the interaction function:
- Latent factors are not independent of each other;
- The interaction function is learnt from data, which conceptually gives it better representation ability. However, its generalization ability is unknown, as it has seldom been explored in the recommendation literature and challenges.

Can we fuse the two models to get a more powerful one? (By generalization ability, we mean a model's prediction performance on unknown test data; by representation ability, we mean a model's capacity to fit the training data.)

An Intuitive Solution – Neural Tensor Network

MF model: $\mathbf{p}_u \odot \mathbf{q}_i$. MLP model (one linear layer): $a\left(\mathbf{W}[\mathbf{p}_u; \mathbf{q}_i] + \mathbf{b}\right)$.

The Neural Tensor Network* naturally assumes MF and MLP share the same embeddings, and combines their latent spaces by addition:

$\hat{y}_{ui} = \mathbf{h}^\top a\left(\mathbf{p}_u \odot \mathbf{q}_i + \mathbf{W}[\mathbf{p}_u; \mathbf{q}_i] + \mathbf{b}\right)$

However, we find that NTN does not significantly improve over MF; a possible reason is the limitation of the shared embeddings.

* Richard Socher et al. NIPS 2013. "Reasoning with neural tensor networks for knowledge base completion"

Our Fusion of GMF and MLP

We propose a new Neural Matrix Factorization (NeuMF) model, which fuses GMF and MLP by allowing them to learn different sets of embeddings: the GMF pathway computes $\phi^{GMF} = \mathbf{p}_u^G \odot \mathbf{q}_i^G$ from one set, the MLP pathway computes $\phi^{MLP}$ from its own set, and the two are concatenated before the output layer:

$\hat{y}_{ui} = \sigma\left(\mathbf{h}^\top [\phi^{GMF}; \phi^{MLP}]\right)$
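
A PyTorch sketch of the fused model with separate embedding sets for the two pathways (layer sizes are illustrative; the official Keras implementation is in the linked repository):

```python
import torch
import torch.nn as nn

class NeuMF(nn.Module):
    """Fuses GMF and MLP, each with its own embeddings, before the output."""
    def __init__(self, n_users, n_items, gmf_dim=16, mlp_dim=16, hidden=(32, 16)):
        super().__init__()
        self.gmf_user = nn.Embedding(n_users, gmf_dim)
        self.gmf_item = nn.Embedding(n_items, gmf_dim)
        self.mlp_user = nn.Embedding(n_users, mlp_dim)
        self.mlp_item = nn.Embedding(n_items, mlp_dim)
        layers, in_dim = [], 2 * mlp_dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(gmf_dim + in_dim, 1, bias=False)  # weights h

    def forward(self, u, i):
        phi_gmf = self.gmf_user(u) * self.gmf_item(i)          # GMF pathway
        phi_mlp = self.mlp(torch.cat([self.mlp_user(u), self.mlp_item(i)], -1))
        fused = torch.cat([phi_gmf, phi_mlp], dim=-1)          # concatenation
        return torch.sigmoid(self.out(fused)).squeeze(-1)

model = NeuMF(n_users=1000, n_items=2000)
print(model(torch.tensor([0]), torch.tensor([42])))
```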

Learning NCF Models

For explicit feedback (e.g., ratings 1-5), a regression (squared) loss:

$L = \sum_{(u,i)} \left(y_{ui} - \hat{y}_{ui}\right)^2$

For implicit feedback (e.g., watches, 0/1), a classification (log) loss over observed interactions $\mathcal{Y}$ and sampled negatives $\mathcal{Y}^-$:

$L = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} \left[ y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log\left(1 - \hat{y}_{ui}\right) \right]$

Optimization is done by SGD (or adaptive learning-rate variants: Adagrad, Adam, RMSprop, ...).
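
A minimal training-step sketch for the implicit-feedback case (the tiny scoring model, the negative-sampling ratio, and the optimizer settings are illustrative assumptions, not the authors' exact setup):

```python
import torch
import torch.nn as nn

# Any NCF scoring model works here; a tiny GMF-style model for illustration.
class Scorer(nn.Module):
    def __init__(self, n_users, n_items, dim=8):
        super().__init__()
        self.u = nn.Embedding(n_users, dim)
        self.i = nn.Embedding(n_items, dim)
        self.h = nn.Linear(dim, 1, bias=False)
    def forward(self, users, items):
        return torch.sigmoid(self.h(self.u(users) * self.i(items))).squeeze(-1)

n_users, n_items = 1000, 2000
model = Scorer(n_users, n_items)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCELoss()

def train_step(users, pos_items, num_neg=4):
    """One step on the log loss: observed pairs get label 1; randomly
    sampled items (a common simplification) get label 0."""
    u = torch.cat([users, users.repeat(num_neg)])
    i = torch.cat([pos_items, torch.randint(0, n_items, (len(users) * num_neg,))])
    y = torch.cat([torch.ones(len(users)), torch.zeros(len(users) * num_neg)])
    loss = bce(model(u, i), y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(train_step(torch.tensor([0, 1]), torch.tensor([5, 7])))
```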

Experimental Setup

Two public datasets, from MovieLens and Pinterest; the MovieLens ratings are transformed to the 0/1 implicit case. Evaluation protocol: leave-one-out, holding out the latest rating of each user as the test item, followed by top-K evaluation; the ranked lists are evaluated by Hit Ratio and NDCG (both @10).
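
For reference, a small Python sketch of the two metrics for one test user, given the (0-indexed) rank of the held-out item in the model's ranked list (a standard formulation; the authors' exact evaluation code is in their repository):

```python
import math

def hit_ratio(rank, k=10):
    """1 if the held-out item appears in the top-k list, else 0."""
    return 1.0 if rank < k else 0.0

def ndcg(rank, k=10):
    """Position-discounted gain of the single held-out item."""
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0

# Example: the test item ranked 3rd (rank index 2) among the candidates.
print(hit_ratio(2), ndcg(2))  # 1.0 0.5
```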

Baselines

- ItemPop: items are ranked by their popularity.
- ItemKNN [Sarwar et al., WWW'01]: the standard item-based CF method.
- BPR [Rendle et al., UAI'09]: Bayesian Personalized Ranking optimizes the MF model with a pairwise ranking loss, which is tailored for implicit feedback and item recommendation.
- eALS [He et al., SIGIR'16]: the state-of-the-art CF method for implicit data; it optimizes the MF model with a varying-weighted regression loss.

Performance vs. Embedding Size

(Results shown for a three-layer MLP and NeuMF.)
1. NeuMF outperforms eALS and BPR with about a 5% relative improvement.
2. Among the three NCF methods: NeuMF > GMF > MLP (MLP attains a lower training loss but a higher test loss).
3. Among three MF methods with different objective functions: GMF (log loss) >= eALS (weighted regression loss) > BPR (pairwise ranking loss).

Convergence Behavior

The most effective updates occur in the first 10 iterations; more iterations may make NeuMF overfit the data. This reflects the trade-off between the representation ability and the generalization ability of a model.

Is Deeper Helpful?

Even for models with the same capability (i.e., the same number of predictive factors), stacking more nonlinear layers improves performance (note: stacking linear layers degrades it). But the improvement gradually diminishes with more layers, likely due to optimization difficulties: the same observation as Kaiming He et al. CVPR 2016, "Deep residual learning for image recognition".

Conclusion

We explored neural architectures for collaborative filtering:
- Devised a general framework, NCF;
- Presented three instantiations: GMF, MLP, and NeuMF.

Experiments show promising results:
- Deeper models are helpful.
- Combining deep models with MF in the latent space leads to better results.

Future work:
- Tackle the optimization difficulties of deeper NCF models (e.g., via residual learning and highway networks).
- Extend NCF to model richer features, e.g., user attributes, contexts, and multimedia items.

Since most existing recommenders use shallow models, we believe this work opens up a new avenue of research for recommendation based on deep learning.

Thanks! Code: https://github.com/hexiangnan/neural_collaborative_filtering