He Xiangnan Research Fellow National University of Singapore

Similar presentations
Prediction Modeling for Personalization & Recommender Systems. Bamshad Mobasher, DePaul University.

A Graph-based Recommender System Zan Huang, Wingyan Chung, Thian-Huat Ong, Hsinchun Chen Artificial Intelligence Lab The University of Arizona 07/15/2002.
Data Visualization STAT 890, STAT 442, CM 462
Rubi’s Motivation for CF  Find a PhD problem  Find “real life” PhD problem  Find an interesting PhD problem  Make Money!
Customizable Bayesian Collaborative Filtering Denver Dash Big Data Reading Group 11/19/2007.
Distributed Representations of Sentences and Documents
1 Collaborative Filtering: Latent Variable Model LIU Tengfei Computer Science and Engineering Department April 13, 2011.
Chapter 12 (Section 12.4) : Recommender Systems Second edition of the book, coming soon.
Cao et al. ICML 2010 Presented by Danushka Bollegala.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
Google News Personalization: Scalable Online Collaborative Filtering
Xutao Li1, Gao Cong1, Xiao-Li Li2
Data Mining and Decision Support
NTU & MSRA Ming-Feng Tsai
Dependency Networks for Inference, Collaborative filtering, and Data Visualization Heckerman et al. Microsoft Research J. of Machine Learning Research.
Efficient Estimation of Word Representations in Vector Space By Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Google Inc., Mountain View, CA. Published.
Collaborative Deep Learning for Recommender Systems
Data Resource Management – MGMT An overview of where we are right now SQL Developer OLAP CUBE 1 Sales Cube Data Warehouse Denormalized Historical.
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Matrix Factorization and Collaborative Filtering
Neural Collaborative Filtering
Recommender Systems 11/04/2017
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Recommendation in Scholarly Big Data
Data Mining: Concepts and Techniques
Collaborative Filtering for Implicit Feedback
Applying Deep Neural Network to Enhance EMPI Searching
Kuifei Yu, Baoxian Zhang, Hengshu Zhu,Huanhuan Cao, and Jilei Tian
Sentiment analysis algorithms and applications: A survey
An Artificial Intelligence Approach to Precision Oncology
AI Powered ADS A STEP BY STEP GUIDE TO EXTREME PERSONALIZATION
Intro to Machine Learning
It’s All About Me From Big Data Models to Personalized Experience
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
CS101 Introduction to Computing Lecture 19 Programming Languages
Application of Sparsity Preserving Projections in Face Recognition
Personalized Social Image Recommendation
Multimodal Learning with Deep Boltzmann Machines
E-Commerce Theories & Practices
Are End-to-end Systems the Ultimate Solutions for NLP?
Machine Learning Ali Ghodsi Department of Statistics
Adopted from Bin UIC Recommender Systems Adopted from Bin UIC.
Learning to Rank Shubhra kanti karmaker (Santu)
NBA Draft Prediction BIT 5534 May 2nd 2018
Efficient Estimation of Word Representation in Vector Space
Fenglong Ma1, Jing Gao1, Qiuling Suo1
Agenda Motivation. Components. Deep Learning Approach.
CMPT 733, SPRING 2016 Jiannan Wang
Q4 : How does Netflix recommend movies?
Overview of Machine Learning
Word Embedding Word2Vec.
Machine Learning Interpretability
Recommender Systems Copyright: Dietmar Jannach, Markus Zanker and Gerhard Friedrich (slides based on their IJCAI tutorial "Recommender Systems").
Graph Neural Networks Amog Kamsetty January 30, 2019.
Introduction to Object Tracking
Word embeddings (continued)
Recommendation Systems
Attention for translation
Human-object interaction
Modeling IDS using hybrid intelligent systems
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Relational Collaborative Filtering:
Peng Cui Tsinghua University
GhostLink: Latent Network Inference for Influence-aware Recommendation
Credit Card Fraudulent Transaction Detection
Presentation transcript:

Cross-Modal Recommendation: Moving from Shallow Learning to Deep Learning
He Xiangnan, Research Fellow, National University of Singapore

Motivation: Recommender Systems
Netflix: 60+% of the movies watched are recommended.
Google News: RS generates 38+% of click-throughs.
Amazon: 35% of sales come from recommendations.
In the current age of information overload, recommender systems play an important role in helping users find the information they want. Many online systems that interact with users are, in effect, recommender systems. Together with online advertising, they are a main source of revenue for many websites: on Netflix, over 60% of movie traffic comes from recommendations, and on Amazon, over 35% of sales come from recommendations. (Statistics from Xavier Amatriain.)

Motivation: Cross-Modal Recommendation
Rich cross-modal information:
- User-item interactions (ratings, likes, clicks, purchases ...)
- User profiles (age, gender ...)
- Item profiles (descriptions, images ...)
- Textual reviews
- Contexts (location, time ...)
The "recommender problem": estimate a scoring function that predicts how much a user will like an item, based on the available information.
Besides transaction records, there is a lot of rich side information: user-item interactions such as ratings and clicks, user demographics, item attributes, textual reviews, and various contexts. These data come in multiple modalities, such as categorical variables, text, images, and video. [click] The recommender problem is to estimate a scoring function that predicts how much a user will like an item, so the key research question for cross-modal recommendation is how to effectively fuse all the available information to better estimate this scoring function. [Zhang et al. KDD 2016. CKE.]

Collaborative Filtering
The "traditional" view of collaborative filtering (CF): "CF makes predictions (filtering) about a user's interests by collecting preference information from many users (collaborating)."
1. Memory-based: predict by memorizing similar users' ratings.
2. Model-based: predict by inferring from an underlying model.
Collaborative filtering is the default technique for modern recommender systems. The basic idea is that to predict a user's interest, not only his own history is considered, but also the histories of other similar users. Typically the data for CF is the user-item interaction history, for example a table of user ID, item ID, and rating score, and the CF task can be formulated as estimating the missing entries of the user-item rating matrix. E.g., matrix factorization (MF) learns a latent vector for each user and item, and the score between user u and item i is the inner product of their latent vectors: ŷ_ui = p_uᵀ q_i.
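As a concrete illustration of the model-based view, here is a minimal MF scoring sketch; the latent dimension and vector values are illustrative, not from the talk:

```python
import numpy as np

# Matrix factorization (MF) sketch: each user u and item i is represented
# by a latent vector, and the predicted score is their inner product.
def mf_score(p_u, q_i):
    """Predicted preference of user u for item i: p_u^T q_i."""
    return float(np.dot(p_u, q_i))

rng = np.random.default_rng(0)
k = 8                      # latent dimension (illustrative)
p_u = rng.normal(size=k)   # user latent vector
q_i = rng.normal(size=k)   # item latent vector
score = mf_score(p_u, q_i)
```

Training would fit all p_u and q_i so these scores reconstruct the observed ratings; the sketch only shows the scoring function itself.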

Recommendation as a Learning Problem
The "standard" supervised-learning view of CF: matrix/tensor data can be represented by a design matrix of feature vectors, via one-hot encoding.
ML methods: logistic regression, SVMs, decision trees, Bayesian networks, neural networks ...
[Rendle, ICDM 2010]
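A minimal sketch of the one-hot encoding step that builds such a design matrix; the field and value names are made up for illustration:

```python
# Turn categorical (field, value) records into rows of a one-hot design
# matrix, as in the supervised-learning view of CF: each distinct
# (field, value) pair gets its own column.
def one_hot_row(record, feature_index):
    row = [0.0] * len(feature_index)
    for field_value in record.items():
        row[feature_index[field_value]] = 1.0
    return row

records = [{"user": "u1", "item": "i3"}, {"user": "u2", "item": "i3"}]
feature_index = {}  # maps each (field, value) pair to a column index
for r in records:
    for fv in r.items():
        feature_index.setdefault(fv, len(feature_index))
X = [one_hot_row(r, feature_index) for r in records]
```

Note how the two rows share the column for item i3: this shared sparsity is what lets the generic models below capture the CF effect.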

A Generic Solution for Cross-Modal Data
One-hot encode all available data into a sparse design matrix and feed it to a predictive sparse ML model (the recommender):
- Rating data
- User data (e.g., gender, age, occupation, personality ...)
- Item data (e.g., item category, description, image ...)
- Context data (e.g., location, time, weather, mood ...)
Input features:
1. Categorical features: user/item IDs, bag-of-words, historical features ...
2. Real-valued features: textual/visual embeddings, converted features (e.g., TF-IDF, GBDT) ...

Advantages of such a Generic Solution
One model for all: regardless of the application, all practitioners need to do is feature engineering and model hyper-parameter tuning.
Controllable complexity: only the non-zero features in the design matrix matter, which is more efficient than tensor methods.
What models work?

Requirements for a Good Model
Key properties to capture:
1. Collaborative filtering effect: user ID + item ID.
2. Cross-feature effect: e.g., females aged 20 like pink (gender × age × visual).
3. Strong generalization ability: feature combinations at test time may never have been seen in training.
In the next part:
Shallow methods: Logistic Regression, Factorization Machines.
Deep methods: Wide&Deep, Neural Factorization Machines (our recent work).

Shallow Methods – Logistic Regression
LR is a single-layer neural network.
Pros: simple and easy to interpret.
Cons: features are treated as mutually independent, so cross features must be designed manually.
GBDT can extract non-linear feature interactions, and the CF effect can be captured by feeding in embedding features from MF. Feeding GBDT-derived features into LR was Facebook's CTR solution in 2014. [He et al. ADKDD 2014]
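Since LR is just a single-layer network, it can be sketched in a few lines; the feature and weight values are illustrative:

```python
import math

# Logistic regression: a linear score over the (possibly hand-crafted
# cross) features, squashed by a sigmoid. Features never interact unless
# a cross feature is added manually -- the limitation noted above.
def lr_predict(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Two active one-hot features plus one inactive one.
prob = lr_predict(x=[1.0, 0.0, 1.0], w=[0.5, -0.2, 0.3], b=-0.1)
```

The GBDT trick on this slide amounts to replacing `x` with indicator features of the leaves each tree routes the example to, which injects non-linear interactions while keeping the LR layer unchanged.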

Shallow Methods – Factorization Machines
Model of FM: ŷ(x) = w₀ + Σᵢ wᵢ xᵢ + Σᵢ<ⱼ <vᵢ, vⱼ> xᵢ xⱼ
Example: S = w_ESPN + w_Nike + <v_ESPN, v_Nike>
Another example: S = w_ESPN + w_Nike + w_Male + <v_ESPN, v_Nike> + <v_ESPN, v_Male> + <v_Nike, v_Male>
Pros: feature embeddings allow strong generalization; feature interactions are learned automatically.
Cons: only second-order feature interactions (inefficient for higher-order interactions); only linear interactions.
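A sketch of FM scoring, using Rendle's O(kn) reformulation of the pairwise term; all weights and sizes are illustrative:

```python
import numpy as np

# Factorization Machine: y(x) = w0 + sum_i w_i x_i
#                               + sum_{i<j} <v_i, v_j> x_i x_j
# The pairwise term is computed via the identity
#   sum_{i<j} <v_i,v_j> x_i x_j
#     = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
def fm_score(x, w0, w, V):
    linear = w0 + float(np.dot(w, x))
    xv = V.T @ x                    # (k,) summed, feature-weighted embeddings
    xv_sq = (V ** 2).T @ (x ** 2)   # (k,) correction for self-interactions
    pairwise = 0.5 * float(np.sum(xv ** 2 - xv_sq))
    return linear + pairwise
```

Because every feature i owns an embedding vᵢ, an interaction weight <vᵢ, vⱼ> exists even for feature pairs never observed together, which is where the strong generalization comes from.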

Deep Methods – Wide&Deep
Google's app-recommendation solution in 2016: a wide (linear) part and a deep part (an MLP over the concatenated feature embeddings) are trained jointly.
Pros: feature embeddings allow strong generalization; the deep part can learn feature interactions of any order (implicitly).
Cons: feature interactions learned by hidden layers are a "black box"; the deep part easily over-generalizes. [Cheng et al. DLRS 2016]
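A heavily simplified sketch of the idea, with a single hidden layer and illustrative weights (the real model uses deeper MLPs and hand-crafted cross features on the wide side):

```python
import numpy as np

# Wide&Deep sketch: the "wide" part is a linear model over raw features;
# the "deep" part is an MLP over concatenated embeddings. Their logits
# are summed before the sigmoid, and both parts are trained jointly.
def relu(z):
    return np.maximum(z, 0.0)

def wide_deep(x_wide, emb_concat, w_wide, W1, W2):
    wide_logit = float(np.dot(w_wide, x_wide))
    h = relu(W1 @ emb_concat)      # one hidden layer for brevity
    deep_logit = float(W2 @ h)
    return 1.0 / (1.0 + np.exp(-(wide_logit + deep_logit)))
```

The wide part memorizes (helping against the deep part's over-generalization), while the deep part generalizes through the shared embeddings.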

Deep Methods – Neural FM
Our work in SIGIR 2017 and IJCAI 2017: learn second-order interactions with FM and explain them with attention; learn high-order interactions with a deep neural network.
Explain a recommendation by identifying the most predictive interactions: <Female, Age 20>, <Age 20, iPhone>, <Female, Color Pink> ...
Outperforms FM by 7% and Google's Wide&Deep by 3%.
Our deep recommendation solution performs representation learning on features and, most importantly, is self-explainable. [click] The core design of our solution is the attention-augmented pairwise pooling. It allows explaining a recommendation by identifying the most predictive interactions; for example, we recommend the iPhone Rose Gold to a user because she is a female aged 20, and people with similar profiles tend to buy the iPhone Rose Gold. Our solution outperforms the factorization machine by 7%, and is better than Google's Wide&Deep solution by 3%, on recommendation and CTR evaluation. [He and Chua, SIGIR 2017; Xiao et al., IJCAI 2017]
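The attention-augmented pairwise pooling can be sketched as follows. This is a simplification: the attention MLP of the actual model is replaced here by given per-pair scores, and all values are illustrative:

```python
import numpy as np

# Attention-augmented pairwise pooling sketch: each pairwise interaction
# v_i * v_j (element-wise) gets a softmax-normalized attention weight, so
# the most predictive interactions can be read off for explanation.
def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attentive_pooling(embeddings, attn_scores):
    pairs, scores = [], []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append(embeddings[i] * embeddings[j])
            scores.append(attn_scores[(i, j)])
    a = softmax(np.array(scores))
    pooled = sum(ai * p for ai, p in zip(a, pairs))
    return pooled, a   # `a` reveals which interaction mattered most
```

Reading out `a` is what makes the model self-explainable: the pair with the largest attention weight is the interaction driving the recommendation.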

Personal Thoughts on Deep Recommendation
Generic models that allow easy feature engineering are preferable in industry; however, most research papers only propose a specific model for a specific domain with certain inputs.
Shallow models are still dominant, e.g., linear, factorization, and tree models.
Directly applying existing DL methods may not work; the key reason is that strong representation power leads to over-generalization.
Future research should focus on designing better, explainable neural components that meet the properties of a specific task.

Thanks!