He Xiangnan Research Fellow National University of Singapore

Similar presentations
Prediction Modeling for Personalization & Recommender Systems. Bamshad Mobasher, DePaul University.

A Graph-based Recommender System Zan Huang, Wingyan Chung, Thian-Huat Ong, Hsinchun Chen Artificial Intelligence Lab The University of Arizona 07/15/2002.
Data Visualization STAT 890, STAT 442, CM 462
Rubi’s Motivation for CF  Find a PhD problem  Find “real life” PhD problem  Find an interesting PhD problem  Make Money!
Customizable Bayesian Collaborative Filtering Denver Dash Big Data Reading Group 11/19/2007.
Distributed Representations of Sentences and Documents
1 Collaborative Filtering: Latent Variable Model LIU Tengfei Computer Science and Engineering Department April 13, 2011.
Chapter 12 (Section 12.4) : Recommender Systems Second edition of the book, coming soon.
Cao et al. ICML 2010 Presented by Danushka Bollegala.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
Google News Personalization: Scalable Online Collaborative Filtering
Xutao Li1, Gao Cong1, Xiao-Li Li2
Data Mining and Decision Support
NTU & MSRA Ming-Feng Tsai
Dependency Networks for Inference, Collaborative filtering, and Data Visualization Heckerman et al. Microsoft Research J. of Machine Learning Research.
Efficient Estimation of Word Representations in Vector Space By Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Google Inc., Mountain View, CA. Published.
Collaborative Deep Learning for Recommender Systems
Data Resource Management – MGMT An overview of where we are right now SQL Developer OLAP CUBE 1 Sales Cube Data Warehouse Denormalized Historical.
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Matrix Factorization and Collaborative Filtering
Neural Collaborative Filtering
Recommender Systems 11/04/2017
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Recommendation in Scholarly Big Data
Data Mining: Concepts and Techniques
Collaborative Filtering for Implicit Feedback
Applying Deep Neural Network to Enhance EMPI Searching
Kuifei Yu, Baoxian Zhang, Hengshu Zhu,Huanhuan Cao, and Jilei Tian
Sentiment analysis algorithms and applications: A survey
An Artificial Intelligence Approach to Precision Oncology
AI Powered ADS A STEP BY STEP GUIDE TO EXTREME PERSONALIZATION
Intro to Machine Learning
It’s All About Me From Big Data Models to Personalized Experience
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
CS101 Introduction to Computing Lecture 19 Programming Languages
Application of Sparsity Preserving Projections in Face Recognition
Personalized Social Image Recommendation
Multimodal Learning with Deep Boltzmann Machines
E-Commerce Theories & Practices
Are End-to-end Systems the Ultimate Solutions for NLP?
Machine Learning Ali Ghodsi Department of Statistics
Adopted from Bin UIC Recommender Systems Adopted from Bin UIC.
Learning to Rank Shubhra kanti karmaker (Santu)
NBA Draft Prediction BIT 5534 May 2nd 2018
Efficient Estimation of Word Representation in Vector Space
Fenglong Ma1, Jing Gao1, Qiuling Suo1
Agenda Motivation. Components. Deep Learning Approach.
CMPT 733, SPRING 2016 Jiannan Wang
Q4 : How does Netflix recommend movies?
Overview of Machine Learning
Word Embedding Word2Vec.
Machine Learning Interpretability
Recommender Systems Copyright: Dietmar Jannach, Markus Zanker and Gerhard Friedrich (slides based on their IJCAI tutorial "Recommender Systems").
Graph Neural Networks Amog Kamsetty January 30, 2019.
Introduction to Object Tracking
Word embeddings (continued)
Recommendation Systems
Attention for translation
Human-object interaction
Modeling IDS using hybrid intelligent systems
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Relational Collaborative Filtering:
Peng Cui Tsinghua University
GhostLink: Latent Network Inference for Influence-aware Recommendation
Credit Card Fraudulent Transaction Detection
Presentation transcript:

Cross-Modal Recommendation: Moving from Shallow Learning to Deep Learning
He Xiangnan, Research Fellow, National University of Singapore

Motivation: Recommender Systems
Netflix: 60+% of the movies watched are recommended.
Google News: RS generates 38+% of click-throughs.
Amazon: 35% of sales come from recommendations.
In the current age of information overload, recommender systems play an important role in helping users find the information they want. Many online systems that interact with users are, in effect, recommender systems. Together with online advertising, they are a main source of revenue for many websites: on Netflix, over 60% of movie traffic comes from recommendations, and on Amazon, over 35% of sales come from recommendations. (Statistics from Xavier Amatriain.)

Motivation: Cross-Modal Recommendation
Rich cross-modal information:
- User-item interactions (ratings, likes, clicks, purchases ...)
- User profiles (age, gender ...)
- Item profiles (descriptions, images ...)
- Textual reviews
- Contexts (location, time ...)
The "recommender problem": estimate a scoring function that predicts how much a user will like an item, based on the available information.
Besides transaction records, there is a lot of rich side information: user-item interactions such as ratings and clicks, user demographics, item attributes, textual reviews, and various contexts. These data come in multiple modalities, such as categorical variables, text, images, and video. [click] The recommender problem is to estimate a scoring function that predicts how much a user will like an item, so the key research question for cross-modal recommendation is how to effectively fuse all the available information to better estimate this scoring function. [Zhang et al. KDD 2016. CKE.]

Collaborative Filtering
The "traditional" view of collaborative filtering (CF): "CF makes predictions (filtering) about a user's interests by collecting preference information from many users (collaborating)."
1. Memory-based: predict by memorizing similar users' ratings.
2. Model-based: predict by inferring from an underlying model.
Collaborative filtering is the default technique for modern recommender systems. The basic idea is that to predict a user's interest, not only his own history is considered, but also the histories of other similar users. Typically the data for CF is the user-item interaction history, for example a table of user ID, item ID, and rating score, and the CF task can be formulated as estimating the missing entries of the user-item rating matrix. E.g., matrix factorization (MF) learns a latent vector for each user and item, and the score between user u and item i is the inner product of their latent vectors: ŷ_ui = p_uᵀ q_i.
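As a concrete illustration of the model-based view, here is a minimal MF scoring sketch; the latent dimension and vector values are illustrative, not from the talk:

```python
import numpy as np

# Matrix factorization (MF) sketch: each user u and item i is represented
# by a latent vector, and the predicted score is their inner product.
def mf_score(p_u, q_i):
    """Predicted preference of user u for item i: p_u^T q_i."""
    return float(np.dot(p_u, q_i))

rng = np.random.default_rng(0)
k = 8                      # latent dimension (illustrative)
p_u = rng.normal(size=k)   # user latent vector
q_i = rng.normal(size=k)   # item latent vector
score = mf_score(p_u, q_i)
```

Training would fit all p_u and q_i so these scores reconstruct the observed ratings; the sketch only shows the scoring function itself.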

Recommendation as a Learning Problem
The "standard" supervised-learning view of CF: matrix/tensor data can be represented by a design matrix of feature vectors, via one-hot encoding.
ML methods: logistic regression, SVMs, decision trees, Bayesian networks, neural networks ...
[Rendle, ICDM 2010]
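A minimal sketch of the one-hot encoding step that builds such a design matrix; the field and value names are made up for illustration:

```python
# Turn categorical (field, value) records into rows of a one-hot design
# matrix, as in the supervised-learning view of CF: each distinct
# (field, value) pair gets its own column.
def one_hot_row(record, feature_index):
    row = [0.0] * len(feature_index)
    for field_value in record.items():
        row[feature_index[field_value]] = 1.0
    return row

records = [{"user": "u1", "item": "i3"}, {"user": "u2", "item": "i3"}]
feature_index = {}  # maps each (field, value) pair to a column index
for r in records:
    for fv in r.items():
        feature_index.setdefault(fv, len(feature_index))
X = [one_hot_row(r, feature_index) for r in records]
```

Note how the two rows share the column for item i3: this shared sparsity is what lets the generic models below capture the CF effect.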

A Generic Solution for Cross-Modal Data
One-hot encode all available data into a sparse design matrix and feed it to a predictive sparse ML model (the recommender):
- Rating data
- User data (e.g., gender, age, occupation, personality ...)
- Item data (e.g., item category, description, image ...)
- Context data (e.g., location, time, weather, mood ...)
Input features:
1. Categorical features: user/item IDs, bag-of-words, historical features ...
2. Real-valued features: textual/visual embeddings, converted features (e.g., TF-IDF, GBDT) ...

Advantages of such a Generic Solution
One model for all: regardless of the application, all practitioners need to do is feature engineering and model hyper-parameter tuning.
Controllable complexity: only the non-zero features in the design matrix matter, which is more efficient than tensor methods.
What models work?

Requirements for a Good Model
Key properties to capture:
1. Collaborative filtering effect: user ID + item ID.
2. Cross-feature effect: e.g., females aged 20 like pink (gender × age × visual).
3. Strong generalization ability: feature combinations at test time may never have been seen in training.
In the next part:
Shallow methods: Logistic Regression, Factorization Machines.
Deep methods: Wide&Deep, Neural Factorization Machines (our recent work).

Shallow Methods – Logistic Regression
LR is a single-layer neural network.
Pros: simple and easy to interpret.
Cons: features are treated as mutually independent, so cross features must be designed manually.
GBDT can extract non-linear feature interactions, and the CF effect can be captured by feeding in embedding features from MF. Feeding GBDT-derived features into LR was Facebook's CTR solution in 2014. [He et al. ADKDD 2014]
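Since LR is just a single-layer network, it can be sketched in a few lines; the feature and weight values are illustrative:

```python
import math

# Logistic regression: a linear score over the (possibly hand-crafted
# cross) features, squashed by a sigmoid. Features never interact unless
# a cross feature is added manually -- the limitation noted above.
def lr_predict(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Two active one-hot features plus one inactive one.
prob = lr_predict(x=[1.0, 0.0, 1.0], w=[0.5, -0.2, 0.3], b=-0.1)
```

The GBDT trick on this slide amounts to replacing `x` with indicator features of the leaves each tree routes the example to, which injects non-linear interactions while keeping the LR layer unchanged.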

Shallow Methods – Factorization Machines
Model of FM: ŷ(x) = w₀ + Σᵢ wᵢ xᵢ + Σᵢ<ⱼ <vᵢ, vⱼ> xᵢ xⱼ
Example: S = w_ESPN + w_Nike + <v_ESPN, v_Nike>
Another example: S = w_ESPN + w_Nike + w_Male + <v_ESPN, v_Nike> + <v_ESPN, v_Male> + <v_Nike, v_Male>
Pros: feature embeddings allow strong generalization; feature interactions are learned automatically.
Cons: only second-order feature interactions (inefficient for higher-order interactions); only linear interactions.
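A sketch of FM scoring, using Rendle's O(kn) reformulation of the pairwise term; all weights and sizes are illustrative:

```python
import numpy as np

# Factorization Machine: y(x) = w0 + sum_i w_i x_i
#                               + sum_{i<j} <v_i, v_j> x_i x_j
# The pairwise term is computed via the identity
#   sum_{i<j} <v_i,v_j> x_i x_j
#     = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
def fm_score(x, w0, w, V):
    linear = w0 + float(np.dot(w, x))
    xv = V.T @ x                    # (k,) summed, feature-weighted embeddings
    xv_sq = (V ** 2).T @ (x ** 2)   # (k,) correction for self-interactions
    pairwise = 0.5 * float(np.sum(xv ** 2 - xv_sq))
    return linear + pairwise
```

Because every feature i owns an embedding vᵢ, an interaction weight <vᵢ, vⱼ> exists even for feature pairs never observed together, which is where the strong generalization comes from.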

Deep Methods – Wide&Deep
Google's app-recommendation solution in 2016: a wide (linear) part and a deep part (an MLP over the concatenated feature embeddings) are trained jointly.
Pros: feature embeddings allow strong generalization; the deep part can learn feature interactions of any order (implicitly).
Cons: feature interactions learned by hidden layers are a "black box"; the deep part easily over-generalizes. [Cheng et al. DLRS 2016]
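A heavily simplified sketch of the idea, with a single hidden layer and illustrative weights (the real model uses deeper MLPs and hand-crafted cross features on the wide side):

```python
import numpy as np

# Wide&Deep sketch: the "wide" part is a linear model over raw features;
# the "deep" part is an MLP over concatenated embeddings. Their logits
# are summed before the sigmoid, and both parts are trained jointly.
def relu(z):
    return np.maximum(z, 0.0)

def wide_deep(x_wide, emb_concat, w_wide, W1, W2):
    wide_logit = float(np.dot(w_wide, x_wide))
    h = relu(W1 @ emb_concat)      # one hidden layer for brevity
    deep_logit = float(W2 @ h)
    return 1.0 / (1.0 + np.exp(-(wide_logit + deep_logit)))
```

The wide part memorizes (helping against the deep part's over-generalization), while the deep part generalizes through the shared embeddings.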

Deep Methods – Neural FM
Our work in SIGIR 2017 and IJCAI 2017: learn second-order interactions with FM and explain them with attention; learn high-order interactions with a deep neural network.
Explain a recommendation by identifying the most predictive interactions: <Female, Age 20>, <Age 20, iPhone>, <Female, Color Pink> ...
Outperforms FM by 7% and Google's Wide&Deep by 3%.
Our deep recommendation solution performs representation learning on features and, most importantly, is self-explainable. [click] The core design of our solution is the attention-augmented pairwise pooling. It allows explaining a recommendation by identifying the most predictive interactions; for example, we recommend the iPhone Rose Gold to a user because she is a female aged 20, and people with similar profiles tend to buy the iPhone Rose Gold. Our solution outperforms the factorization machine by 7%, and is better than Google's Wide&Deep solution by 3%, on recommendation and CTR evaluation. [He and Chua, SIGIR 2017; Xiao et al., IJCAI 2017]
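The attention-augmented pairwise pooling can be sketched as follows. This is a simplification: the attention MLP of the actual model is replaced here by given per-pair scores, and all values are illustrative:

```python
import numpy as np

# Attention-augmented pairwise pooling sketch: each pairwise interaction
# v_i * v_j (element-wise) gets a softmax-normalized attention weight, so
# the most predictive interactions can be read off for explanation.
def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attentive_pooling(embeddings, attn_scores):
    pairs, scores = [], []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append(embeddings[i] * embeddings[j])
            scores.append(attn_scores[(i, j)])
    a = softmax(np.array(scores))
    pooled = sum(ai * p for ai, p in zip(a, pairs))
    return pooled, a   # `a` reveals which interaction mattered most
```

Reading out `a` is what makes the model self-explainable: the pair with the largest attention weight is the interaction driving the recommendation.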

Personal Thoughts on Deep Recommendation
Generic models that allow easy feature engineering are preferable in industry; however, most research papers only propose a specific model for a specific domain with certain inputs.
Shallow models are still dominant, e.g., linear, factorization, and tree models.
Directly applying existing DL methods may not work; the key reason is that strong representation power leads to over-generalization.
Future research should focus on designing better, explainable neural components that meet the properties of a specific task.

Thanks!