Transfer Learning for Link Prediction

Presentation transcript:

Transfer Learning for Link Prediction Qiang Yang, 楊強，香港科大 Hong Kong University of Science and Technology Hong Kong, China http://www.cse.ust.hk/~qyang

We often find it easier … Bicycle, tricycle, motorcycle, airplane.

Transfer Learning? People often transfer knowledge to novel situations: Chess → Checkers, C++ → Java, Physics → Computer Science. Transfer learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains). The change involved in moving to new tasks seems more radical than the change involved in moving to new domains.

But direct application will not work. Machine learning assumes that training and future (test) data follow the same distribution and are in the same feature space.

When distributions are different. Classification accuracy (+, -): training 90%, testing 60%.

When features and distributions are different. Classification accuracy (+, -): training 90%, testing 10%.

Cross-Domain Sentiment Classification. Training on DVD, testing on DVD: classifier accuracy 82.55%. Training on Electronics, testing on DVD: classifier accuracy 71.85%.

When distributions are different: wireless sensor networks. Distributions differ across time periods, devices, or space (for example, day time vs. night time, or Device 1 vs. Device 2).

When Features are different Heterogeneous: different feature spaces Training: Text Future: Images The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ... Apples Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ... Bananas

Transfer Learning: Source Domains Input Output Source Domains Source Domain Target Domain Training Data Labeled/Unlabeled Test Data Unlabeled

An overview of various settings of transfer learning (diagram labels: source data, target data). Inductive transfer learning: labeled data are available in the target domain. Case 1, no labeled data in the source domain: self-taught learning. Case 2, labeled data are available in the source domain and the source and target tasks are learnt simultaneously: multi-task learning. Transductive transfer learning: labeled data are available only in the source domain. Under the assumption of different domains but a single task: domain adaptation. Under the assumption of a single domain and a single task: sample selection bias / covariate shift. Unsupervised transfer learning: no labeled data in either the source or the target domain.

Feature-based Transfer Learning (Dai, Yang et al. ACM KDD 2007) Source: Many labeled instances Target: All unlabeled instances Distributions Feature spaces can be different, but have overlap Same classes P(X,Y): different! CoCC Algorithm (Co-clustering based) bridge

Document-word co-occurrence Di Source Knowledge transfer Target Do

Co-Clustering based Classification (on 20 News Group dataset) Using Transfer Learning

Talk Outline. Transfer Learning: A quick introduction; Link prediction and collaborative filtering problems; Transfer Learning for the Sparsity Problem (Codebook Method, CST Method, Collective Link Prediction Method); Conclusions and Future Work.

Network Data. Social networks: Facebook, Twitter, Live Messenger, etc. Information networks: the Web (1.0 with hyperlinks and 2.0 with tags), citation networks, Wikipedia, etc. Biological networks: gene networks, etc. Other networks.

A Real World Study [Leskovec-Horvitz WWW ‘08]. Who talks to whom on MSN Messenger. Network: 240M nodes, 1.3 billion edges. Conclusions: the average path length is 6.6; 90% of nodes are reachable in fewer than 8 steps.

Global Network Structures. Small world: six degrees of separation [Milgram 60s]. Clustering and communities. Evolutionary patterns. Long-tail distributions. However, understanding global network structures does help when working with local network structures.

Local Network Structures Link Prediction A form of Statistical Relational Learning (Taskar and Getoor) Object classification: predict category of an object based on its attributes and links Is this person a student? Link classification: predict type of a link Are they co-authors? Link existence: predict whether a link exists or not (credit: Jure Leskovec, ICML ‘09)

Link Prediction Task: predict missing links in a network Main challenge: Network Data Sparsity

Long Tail in the Era of Recommendation. Help users discover novel/rare items. The long tail → recommendation systems.

Product Recommendation (Amazon.com)

Collaborative Filtering (CF). Wisdom of the crowd, word of mouth, … Your taste in music/products/movies/… is similar to that of your friends (Maltz & Ehrlich, 1995). Key issue: who are your friends? There are many popular . People help people. Reference other people. There may be tons of web pages a particular user hasn’t read, or tons of products not purchased.

Taobao.com

CF: Neighborhood Based Models User-based Model Item-based Model
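As an illustration of the user-based model, here is a minimal sketch that answers "who are your friends?" with Pearson correlation over co-rated items and then predicts a rating as a mean-centred, similarity-weighted average. The toy data, the function names, and the choice of Pearson correlation are assumptions made for this example, not details taken from the slides.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. All names and values are illustrative.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def mean(user):
    """Average of all ratings given by one user."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def pearson(u, v):
    """Pearson correlation over items both users rated (0.0 if undefined)."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0

def predict_user_based(user, item):
    """Mean-centred, similarity-weighted average over neighbours who rated item."""
    num = den = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        s = pearson(user, other)
        num += s * (ratings[other][item] - mean(other))
        den += abs(s)
    return mean(user) + (num / den if den else 0.0)

prediction = predict_user_based("alice", "D")
```

An item-based model follows the same pattern with the roles of users and items swapped: similarities are computed between item columns instead of user rows.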

Matrix Factorization Model for Link Prediction. We seek a low-rank approximation of the m × n target matrix as the product of an m × k matrix and a k × n matrix, so that an unknown value (marked ?) can be predicted from the corresponding entry of the product.

Example: Collaborative Filtering, Ideal Setting. Training data: dense rating matrix (density >= 2.0%). Test data: rating matrix. CF model RMSE: 0.8721.

Data Sparsity in Collaborative Filtering, Realistic Setting. Training data: sparse rating matrix (density <= 0.6%). Test data: rating matrix. CF model RMSE: 0.9748. Figure label: 10%.

Neighborhood Based Models User-based Model Item-based Model Unified User-Item Model

CF as Matrix Factorization. Approximate each rating by the inner product of the k-dimensional latent feature vectors of the user and the item. The user and item latent features are learned by minimizing the regularized approximation error.
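A minimal sketch of this idea, assuming a common form of the objective: squared error on the observed ratings plus an L2 penalty on the latent vectors, minimized by stochastic gradient descent. The function names, hyperparameter values, and toy data are illustrative assumptions; the slide's own formula may differ in detail.

```python
import random

def train_mf(observed, n_users, n_items, k=2, lam=0.05, lr=0.02, epochs=300, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples.

    Assumed objective: sum over observed (r - p_u . q_i)^2 + lam * (|p_u|^2 + |q_i|^2).
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            for f in range(k):
                pf, qf = P[u][f], Q[i][f]          # use pre-update values for both steps
                P[u][f] += lr * (err * qf - lam * pf)
                Q[i][f] += lr * (err * pf - lam * qf)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: inner product of the user and item latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def train_rmse(P, Q, observed):
    """Root mean square error on the observed triples."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed)
    return (se / len(observed)) ** 0.5

# Toy data: 3 users, 3 items, 7 observed ratings; (2, 2) is left unobserved.
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 1, 2), (1, 2, 5), (2, 0, 1), (2, 1, 5)]
P0, Q0 = train_mf(observed, 3, 3, epochs=0)   # untrained factors, for comparison
P, Q = train_mf(observed, 3, 3)
missing = predict(P, Q, 2, 2)                 # estimate for the unobserved entry
```

With very few observed ratings per user, the factors are poorly constrained, which is the sparsity problem the talk turns to next.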

Sparsity Problem in Collaborative Filtering. However, in real-world recommender systems, users can rate only a limited number of items, so the rating matrix is too sparse to cluster well.

Transfer Learning for Collaborative Filtering? IMDB Database, Amazon.com

Codebook Transfer Bin Li, Qiang Yang, Xiangyang Xue. Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI '09), Pasadena, CA, USA, July 11-17, 2009.

Main Idea. Observations (what is related): taking movies and books as an example, users are related in interest and items are related in genre. Approach (how to relate): match user/item groups across domains.

Transfer CF: Problem Setting. Goal: alleviate the sparsity problem in the target domain by transferring rating knowledge from a related auxiliary (dense) domain. Target task: a sparse p × q rating matrix Xtgt. Auxiliary task: a dense n × m rating matrix Xaux. Xtgt and Xaux are related in user/item categories. Learn informative and compact rating patterns (a codebook) from Xaux, then transfer the learned rating patterns to Xtgt.

Codebook Construction Definition 2.1 (Codebook). A k × l matrix which compresses the cluster-level rating patterns of k user clusters and l item clusters. Codebook: User prototypes rate on item prototypes Encoding: Find prototypes for users and items and get indices Decoding: Recover rating matrix based on codebook and indices

Knowledge Sharing via Cluster-Level Rating Matrix Source (Dense): Encode cluster-level rating patterns Target (Sparse): Map users/items to the encoded prototypes

Step 1: Codebook Construction. Co-cluster the rows (users) and columns (items) of Xaux to get user/item cluster indicators Uaux ∈ {0, 1}^(n×k) and Vaux ∈ {0, 1}^(m×l).
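Once cluster indicators are available, a codebook can be formed by averaging the ratings inside each user-cluster/item-cluster block. The sketch below assumes the co-clustering has already been done (the labels are simply given), that Xaux is fully observed, and that every cluster is non-empty; the function name and the toy matrix are illustrative.

```python
import numpy as np

def build_codebook(X_aux, user_labels, item_labels, k, l):
    """Average rating of each user cluster on each item cluster (a k x l codebook)."""
    U = np.eye(k)[user_labels]                 # n x k one-hot user indicators
    V = np.eye(l)[item_labels]                 # m x l one-hot item indicators
    block_sums = U.T @ X_aux @ V               # total rating per cluster block
    block_sizes = U.T @ np.ones_like(X_aux) @ V  # number of entries per block
    return block_sums / block_sizes

# Toy dense auxiliary matrix: 4 users x 4 items, two user and two item clusters.
X_aux = np.array([
    [5.0, 5.0, 1.0, 1.0],
    [5.0, 5.0, 1.0, 1.0],
    [1.0, 1.0, 4.0, 4.0],
    [2.0, 2.0, 4.0, 4.0],
])
B = build_codebook(X_aux, user_labels=[0, 0, 1, 1], item_labels=[0, 0, 1, 1], k=2, l=2)
```

Here B comes out as [[5, 1], [1.5, 4]]: each entry is how a user prototype rates an item prototype.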

Step 2: Codebook Transfer Objective Expand target matrix, while minimizing the difference between Xtgt and the reconstructed one User/item cluster indicators Utgt and Vtgt for Xtgt Binary weighting matrix W for observed ratings in Xtgt Alternate greedy searches for Utgt and Vtgt to a local minimum

Codebook Transfer Each user/item in Xtgt matches to a prototype in B Duplicate certain rows & columns in B to reconstruct Xtgt Codebook is indeed a two-sided data representation
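The alternating greedy search can be sketched as follows: with item assignments fixed, each target user is assigned the codebook row that best explains that user's observed ratings, and then the same is done for items. This is a simplified illustration under assumed conventions (missing entries stored as 0 with W marking observed cells, hard assignments, a random start, a fixed number of sweeps); the names and toy data are hypothetical.

```python
import numpy as np

def transfer_codebook(X_tgt, W, B, n_iters=10, seed=0):
    """Match target users/items to codebook prototypes and fill missing ratings.

    X_tgt: p x q ratings (0 where missing); W: p x q mask (1 = observed);
    B: k x l codebook. Returns user labels, item labels, and the filled matrix.
    """
    rng = np.random.default_rng(seed)
    p, q = X_tgt.shape
    k, l = B.shape
    v = rng.integers(0, l, size=q)             # random initial item-cluster labels
    u = np.zeros(p, dtype=int)
    for _ in range(n_iters):
        # Fix items: pick for each user the codebook row with least masked error.
        cand = B[:, v]                                              # k x q
        err = (((X_tgt[:, None, :] - cand[None, :, :]) ** 2) * W[:, None, :]).sum(axis=2)
        u = err.argmin(axis=1)                                      # length p
        # Fix users: pick for each item the codebook column with least masked error.
        cand = B[u, :]                                              # p x l
        err = (((X_tgt[:, :, None] - cand[:, None, :]) ** 2) * W[:, :, None]).sum(axis=0)
        v = err.argmin(axis=1)                                      # length q
    recon = B[u][:, v]                          # duplicate rows/columns of B
    filled = W * X_tgt + (1 - W) * recon        # keep observed, fill the rest
    return u, v, filled

B = np.array([[5.0, 1.0], [1.0, 4.0]])
X_tgt = np.array([[5.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 4.0],
                  [4.0, 5.0, 0.0, 0.0]])
W = (X_tgt > 0).astype(float)
u, v, filled = transfer_codebook(X_tgt, W, B)
```

Observed ratings are left untouched; only the missing cells take values from the codebook, and the search may stop at a local minimum that depends on the random start.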

Experimental Setup. Data sets: EachMovie (auxiliary): 500 users × 500 movies; MovieLens (target): 500 users × 1000 movies; Book-Crossing (target): 500 users × 1000 books. Compared methods: Pearson Correlation Coefficients (PCC), Scalable Cluster-based Smoothing (CBS), Weighted Low-rank Approximation (WLR), Codebook Transfer (CBT). Evaluation protocol: the first 100/200/300 users are used for training and the last 200 users for testing; 5/10/15 observable ratings are given for each test user.

Experimental Results (1): Books  Movies MAE Comparison on MovieLens average over 10 sampled test sets Lower is better

Experimental Results (2) MAE Comparison on Book-Crossing average over 10 sampled test sets Lower is better

Limitations of Codebook Transfer. Same rating range: source and target data must have the same range of ratings, [1, 5]. Homogeneous dimensions: user and item dimensions must be similar. In reality, the range of ratings can be 0/1 or [1, 5], and user and item dimensions may be very different.

Coordinate System Transfer Weike Pan, Evan Xiang, Nathan Liu and Qiang Yang. Transfer Learning in Collaborative Filtering for Sparsity Reduction. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-10). Atlanta, Georgia, USA. July 11-15, 2010.

Types of User Feedback Explicit feedback: express preferences explicitly 1-5: Rating ?: Missing Implicit feedback: express preferences implicitly 1: Observed browsing/purchasing 0: Unobserved browsing/purchasing

Target domain R: explicit ratings. One auxiliary domain (two or more data sources): an observed entry means that user u has implicitly browsed/purchased item j.

Problem Formulation. One target domain R with n users and m items. One auxiliary domain (two data sources): the user side shares n common users with R; the item side shares m common items with R. Goal: predict the missing values in R.

Our Solution: Coordinate System Transfer. Step 1: Coordinate System Construction. Step 2: Coordinate System Adaptation.

Step 1: Coordinate System Construction. Apply sparse SVD to the auxiliary data; the resulting matrices give the top d principal coordinates. Definition (Coordinate System): a coordinate system is a matrix whose columns are orthonormal bases (principal coordinates), arranged in descending order of their corresponding eigenvalues.
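A minimal sketch of this construction step, using NumPy's dense SVD as a stand-in for a sparse solver. It assumes the user-side coordinate system is taken from the left singular vectors of the user-side auxiliary matrix and the item-side one from the right singular vectors of the item-side auxiliary matrix; the matrix names, sizes, and random 0/1 data are illustrative.

```python
import numpy as np

def top_d_coordinates(R, d):
    """Return the top-d left vectors, singular values, and right vectors of R.

    np.linalg.svd returns singular values in descending order, so the first d
    columns are the leading principal coordinates.
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :d], s[:d], Vt[:d, :].T

rng = np.random.default_rng(0)
n, m, d = 6, 5, 2
R_user_side = (rng.random((n, 8)) < 0.4).astype(float)   # n common users x other items
R_item_side = (rng.random((7, m)) < 0.4).astype(float)   # other users x m common items

U0, s_user, _ = top_d_coordinates(R_user_side, d)   # n x d user coordinate system
_, s_item, V0 = top_d_coordinates(R_item_side, d)   # m x d item coordinate system
```

For large sparse matrices, a truncated sparse solver would be the natural replacement for the dense call; the ordering of its returned singular values should be checked before taking the top d.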

Step 2: Coordinate System Adaptation. Adapt the coordinate systems discovered in the auxiliary domain to the target domain. The auxiliary domain takes effect in two ways. Initialization: take the auxiliary coordinate systems as the seed model in the target domain. Regularization: penalize deviation of the target coordinate systems from the auxiliary ones.

Coordinate System Transfer Algorithm

Data Sets and Evaluation Metrics. Data sets: extracted from MovieLens and Netflix. Metrics: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), $\mathrm{MAE} = \sum_{(u,i) \in T_E} |r_{ui} - \hat{r}_{ui}| / |T_E|$ and $\mathrm{RMSE} = \sqrt{\sum_{(u,i) \in T_E} (r_{ui} - \hat{r}_{ui})^2 / |T_E|}$, where $r_{ui}$ and $\hat{r}_{ui}$ are the true and predicted ratings, respectively, and $|T_E|$ is the number of test ratings.
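For concreteness, the two metrics can be computed as in the short sketch below, which uses the standard definitions (mean of absolute errors, and square root of the mean squared error). The function names and the four example ratings are illustrative.

```python
from math import sqrt

def mae(true, pred):
    """Mean Absolute Error over paired true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def rmse(true, pred):
    """Root Mean Square Error over paired true and predicted ratings."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))

true_ratings = [4, 3, 5, 1]
predicted = [3.5, 3, 4, 2]
example_mae = mae(true_ratings, predicted)    # (0.5 + 0 + 1 + 1) / 4 = 0.625
example_rmse = rmse(true_ratings, predicted)  # sqrt((0.25 + 0 + 1 + 1) / 4) = 0.75
```

RMSE penalizes large errors more heavily than MAE, which is why it is never smaller than MAE on the same predictions.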

Baselines and Parameter Settings. Baselines: Average Filling; Latent Matrix Factorization (Bell and Koren, ICDM07); Collective Matrix Factorization (Singh and Gordon, KDD08); OptSpace (Keshavan, Montanari and Oh, NIPS10). Average results over 10 random trials are reported.

Results(1/2) Observations: CST performs significantly better (t-test) than all baselines at all sparsity levels, Transfer learning methods (CST, CMF) beat two non-transfer learning methods (AF, LFM).

Results (2/2). Observations: CST consistently improves as d increases, demonstrating the usefulness of auxiliary knowledge in avoiding overfitting. The improvement (CST vs. OptSpace) is more significant with fewer observed ratings, demonstrating the effectiveness of CST in alleviating data sparsity.

Related Work (Transfer Learning) Cross-domain collaborative filtering Domain adaptation methods One-sided: Two-sided:

Limitation of CST and CBT Different source domains are related to the target domain differently Book to Movies Food to Movies Rating bias Users tend to rate items that they like Thus there are more rating = 5 than rating = 2

Our Solution: Collective Link Prediction (CLP). Jointly learn multiple domains. Learn the similarity of different domains: consistency between domains indicates similarity. Introduce a link function to correct the rating bias.

Bin Cao, Nathan Liu and Qiang Yang. Transfer Learning for Collective Link Prediction in Multiple Heterogeneous Domains. In Proceedings of 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel. June 2010.

Recommendation with Heterogeneous Domains Items span multiple, heterogeneous domains for many recommendation systems. ebay, GoogleReader, douban, etc.

Example: Collective Link Prediction. Training: a sparse movie rating matrix, together with a dense auxiliary book rating matrix (different rating domains) and a dense auxiliary movie tagging matrix (different user behaviors). Test data: rating matrix. Prediction with transfer learning: RMSE 0.8855.

Inter-task Similarity. Based on Gaussian process models. The key part is the kernel, which models user relations as well as task relations. In the figure, darkness indicates similarity.

Making Prediction Similar to memory-based approach Similarity between items Mean of prediction Similarity between tasks

Experimental Settings Datasets Movielens, Book-crossing: 5 largest categories Douban: 3 domains (movie+book+music) Evaluation metric Mean Average Precision Baselines I-GP: treating domains independently M-GP: treating domains as one CMF: Collective Matrix Factorization [Singh & Gordon’08]

Experimental Results (I)

Experimental Results. The left figure is the case where all tasks are sparse and evaluation is on all tasks; the right figure is the case where only the target task is sparse and evaluation is on the target task.

Conclusions and Future Work. Transfer learning (舉一反三: learn one thing, infer three). Link prediction is an important task in graph/network data mining. Key challenge: sparsity. Transfer learning from other domains helps improve performance.

Source Information Network Target Information Network Future Work Scenario 1: Our objective is to use the dense source network to help reduce the sparsity of the entire target network Sparse Dense Source Information Network Target Information Network

Source Information Network Target Information Network Future Work Scenario 2: To use a dense source network as a bridge to help connect parts of the target network. Dense Dense Source Information Network Target Information Network Sparse

Future: Negative Transfer. Helpful: positive transfer. Harmful: negative transfer. Neutral: zero transfer. (Bicycle, tricycle, motorcycle, airplane.)

Future: Negative Transfer “To Transfer or Not to Transfer” Rosenstein, Marx, Kaelbling and Dietterich Inductive Transfer Workshop, NIPS 2005. (Task: meeting invitation and acceptance)

Acknowledgement. HKUST: Sinno J. Pan, Huayan Wang, Bin Cao, Evan Wei Xiang, Derek Hao Hu, Nathan Nan Liu, Vincent Wenchen Zheng. Shanghai Jiaotong University: Wenyuan Dai, Guirong Xue, Yuqiang Chen, Prof. Yong Yu, Xiao Ling, Ou Jin. Visiting students: Bin Li (Fudan U.), Xiaoxiao Shi (Zhong Shan U.).