Recommend User to Group in Flickr Zhe Zhao 4-29 2010.

Recommend User to Group in Flickr Zhe Zhao

What I am going to present
– A problem seldom studied in social media recommendation: recommending Flickr groups to users
– Why this problem matters
– How to make use of the available information
  – A matrix factorization perspective on the problem
  – A topic model based solution
– Finally, something about implementation

Recommend User to Group
Background:
– User activity: users upload and favorite photos, add contacts, and join groups, based on their interests and everyday lives.

Recommend User to Group
Our problem: recommend relevant groups to users.
– A user is relevant to a group when the topics and interests the group focuses on are similar to the user's interests, as shown by the similarity between the content of the user's photos and the photos in the group pool.

Recommend User to Group
Related work
– Problem: this is among the first works to recommend Flickr groups to users using content, social relations, and collaborative information.
– Approaches: recommender systems; expert finding.

Our Proposed Solution
Intuition:
– Find users' interests and groups' topics/interests; similar interests indicate that a user is relevant to a group.
Solution:
– Latent interest dimensions can be found by matrix factorization and a graphical model.
Information considered (interests are reflected in):
– Users upload and favorite photos.
– Groups collect photos in their pools.
– Users join groups.
– Users add contacts.

Our Proposed Solution
Modeling interests via matrix factorization
– Mining latent interests from the original feature space
– Used information: users upload and favorite photos; groups collect photos in their pools; users join groups; users add contacts.
A probabilistic solution on an equivalent graphical model.
Learning the model & implementation.

Modeling Interests via Matrix Factorization
Each photo is a point in a d-dimensional feature space (f1, f2, …, fd). Stacking the feature vectors of user u's photos (Photo1, Photo2, …, Photot) row by row gives a photo-by-feature matrix C_u.
We factorize C_u ≈ F × I_u' = M_Cu, where each row of I_u represents the latent interests of the user expressed in one photo. The same factorization is applied to all n users: C_1 ≈ F × I_1', C_2 ≈ F × I_2', …, C_n ≈ F × I_n'.
Likewise, group g's photo pool gives a matrix P_g, factorized as P_g ≈ F × T_g' = M_Pg, where each row of T_g represents the latent topics of the group in one photo; this is applied to all m groups: P_1 ≈ F × T_1', …, P_m ≈ F × T_m'.
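The per-user factorization C_u ≈ F × I_u' can be sketched as follows; this is a minimal sketch assuming nonnegative features and standard multiplicative NMF updates, since the slides do not specify the solver:

```python
import numpy as np

def factorize_photos(C, k, iters=300, eps=1e-9, seed=0):
    """Approximate C (t photos x d features) as I @ F.T, so that row i of I
    holds the k latent interests expressed in photo i (C_u ~= F x I_u' in
    the slides' notation). Uses multiplicative NMF updates."""
    t, d = C.shape
    rng = np.random.default_rng(seed)
    F = rng.random((d, k)) + 0.1   # shared feature-to-interest basis
    I = rng.random((t, k)) + 0.1   # per-photo latent interests
    for _ in range(iters):
        I *= (C @ F) / (I @ (F.T @ F) + eps)
        F *= (C.T @ I) / (F @ (I.T @ I) + eps)
    return F, I
```

Applying this per user with a shared F would require alternating over users; the sketch shows only the single-matrix step.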

Modeling Interests via Matrix Factorization
All m groups (Group1, …, Groupm) and n users (User1, …, Usern) form a group-by-user matrix R, with entries
R_gu = |C_u ∩ P_g| / |C_u|,
the fraction of user u's photos that appear in group g's pool.
We factorize R ≈ f(LT × LI') = M_TI, where each row of LT represents the latent topics of a group and each row of LI represents the latent interests of a user.
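The entries R_gu = |C_u ∩ P_g| / |C_u| can be computed directly from sets of photo ids; a small sketch (function and variable names are illustrative, not from the paper):

```python
def relevance_entry(user_photos, group_pool):
    """R_gu = |C_u ∩ P_g| / |C_u|: the fraction of user u's photos
    that also appear in group g's pool."""
    return len(user_photos & group_pool) / len(user_photos)

def relevance_matrix(users, groups):
    """Build R as a nested dict; users/groups map ids to sets of photo ids."""
    return {g: {u: relevance_entry(users[u], groups[g]) for u in users}
            for g in groups}
```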

Modeling Interests via Matrix Factorization
So far, our model can be written as:
R ≈ f(LT × LI') = M_TI
C_u ≈ F × I_u' = M_Cu  (for n users)
P_g ≈ F × T_g' = M_Pg  (for m groups)

Modeling Interests via Matrix Factorization
So far, our model can be written as:
R ≈ f(LT × LI') = M_TI; C_u ≈ F × I_u' = M_Cu (n users); P_g ≈ F × T_g' = M_Pg (m groups).
– Constraint from user contacts: minimize the sum of Dis(I_u1, I_u2) = ||I_u1 − I_u2|| (Euclidean distance) over all pairs where user u1 lists user u2 as a contact.
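The contact constraint above would enter the objective as a regularization term; a minimal sketch, assuming each user's latent-interest vector is a NumPy array keyed by user id:

```python
import numpy as np

def contact_penalty(LI, contacts):
    """Sum of Euclidean distances Dis(I_u1, I_u2) between the
    latent-interest vectors of users who are contacts."""
    return sum(float(np.linalg.norm(LI[u1] - LI[u2])) for u1, u2 in contacts)
```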

Modeling Interests via Matrix Factorization
Used information:
– Users upload and favorite photos.
– Groups collect photos in their pools.
– Users join groups.
– Users add contacts.

Our Proposed Solution
Modeling interests via matrix factorization: a probabilistic solution on an equivalent graphical model.
– Several assumptions
– Equivalent graphical model
– Calculating the joint probability
Learning the model & implementation.

A probabilistic solution on the equivalent graphical model
Starting from the matrix-factorization model
R ≈ f(LT × LI') = M_TI; C_u ≈ F × I_u' = M_Cu (n users); P_g ≈ F × T_g' = M_Pg (m groups),
rewrite it in row and entry form:
r_gu ≈ f(lt_g × li_u')  (m·n entries)
c_u^i ≈ F × i_u^i'  (Σ_u |C_u| rows)
p_g^j ≈ F × t_g^j'  (Σ_g |P_g| rows)
Several assumptions:
– i_u^i, t_g^j, lt_g, and li_u are hidden random variables.
– Add Gaussian noise to the right-hand sides:
r_gu = f(lt_g × li_u') + ε
c_u^i = F × i_u^i' + ε_c
p_g^j = F × t_g^j' + ε_p
– r_gu is a random variable depending on lt_g and li_u; c_u^i and p_g^j are random variables depending on (i_u^i, F) and (t_g^j, F) respectively; i_u^i and t_g^j depend on li_u and lt_g.
– Distributional assumptions:
r_gu | lt_g, li_u ~ N(f(lt_g × li_u'), δI)
c_u^i | i_u^i, F ~ N(F × i_u^i', δ_c I)
p_g^j | t_g^j, F ~ N(F × t_g^j', δ_p I)
i_u^i | li_u ~ Bernoulli (or Multinomial / Exponential), and likewise t_g^j | lt_g
li_u and lt_g take the conjugate priors of i_u^i | li_u and t_g^j | lt_g.

A probabilistic solution on the equivalent graphical model
[Worked example from the slide: four latent interest dimensions (good color, cute animal, Sony camera, politics). Each photo of user u gets a binary interest vector i_u^i (e.g. Photo1 = (1,1,0,1)) and each photo of group g a binary topic vector t_g^j. Aggregating over photos yields the user's interest distribution li_u (0.4, 0.2, 0.1, 0.25) and the group's topic distribution lt_g (0.1, 0.2, 0.7, 0.0), whose product gives r_gu, shown as 0.16 on the slide.]

A probabilistic solution on the equivalent graphical model
Equivalent graphical model: Topic Model based Recommendation (TMR)

A probabilistic solution on the equivalent graphical model
Equivalent graphical model (plate notation):
c_u^i = F × i_u^i' + ε_c  (plate over Σ_u |C_u| photos)
p_g^j = F × t_g^j' + ε_p  (plate over Σ_g |P_g| photos)
r_gu = f(lt_g × li_u') + ε  (plate over m·n group-user pairs)

Our Proposed Solution
Modeling interests via matrix factorization: a probabilistic solution on an equivalent graphical model.
Learning the model & implementation
– Gibbs sampling based
– User recommendation for groups

Learning the model & Implementation
Our task:
– Predict r_gu for user u and group g.
Our method:
– Gibbs sampling for the model: sample each i_u^i and t_g^j in the model, then choose r_gu based on the pdf conditioned on the sampled i_u^i and t_g^j.
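A simplified sketch of one such Gibbs update: resampling a single photo's binary interest vector i_u^i given its features c and the user's interest probabilities li_u, under the Gaussian likelihood c = F × i' + ε_c assumed earlier. Variable names and the single-sweep structure are illustrative, not the paper's exact sampler:

```python
import numpy as np

def sample_photo_interests(c, F, li, delta=1.0, rng=None):
    """Resample a binary interest vector i for one photo with features c,
    given the basis F (d x k) and the user's interest probabilities li (k,),
    under c ~ N(F @ i, delta * I) with Bernoulli(li[j]) priors per bit."""
    rng = rng or np.random.default_rng(0)
    li = np.clip(li, 1e-6, 1 - 1e-6)         # avoid log(0)
    k = F.shape[1]
    i = (rng.random(k) < li).astype(float)    # initialize from the prior
    for j in range(k):                        # one Gibbs sweep over bits
        logp = np.empty(2)
        for b in (0, 1):
            i[j] = b
            resid = c - F @ i
            prior = np.log(li[j]) if b else np.log(1 - li[j])
            logp[b] = -resid @ resid / (2 * delta) + prior
        p1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
        i[j] = float(rng.random() < p1)
    return i
```

In the full sampler this update would alternate with updates for t_g^j, li_u, and lt_g over many iterations.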

Learning the model & Implementation
Gibbs sampling in our model
– The joint probability of the model (derivation shown on slides)
– Sampling based on the resulting conditional equations (shown on slides)

Learning the model & Implementation
Implementation
– Data structure and preprocessing
Visual word extraction: hierarchical clustering on a 100k photo subset yields 1019 cluster centers.
Filter out high- and low-frequency tags: drop tags that appear in more than 90% of photos or fewer than 2 times.
Build hash tables for users and photos, and an inverted index for tags, on a 30-group subset.
Use a DBMS to store the 200-group dataset.
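The tag-filtering step above can be sketched as follows, counting frequency at the photo level (one reading of the slide's "appear in 90% photos or less than 2 times"):

```python
from collections import Counter

def filter_tags(photo_tags, max_df=0.9, min_count=2):
    """Keep only tags that occur in at most max_df of all photos
    and at least min_count photos; photo_tags is a list of tag lists."""
    n = len(photo_tags)
    df = Counter(t for tags in photo_tags for t in set(tags))
    keep = {t for t, c in df.items() if c >= min_count and c / n <= max_df}
    return [[t for t in tags if t in keep] for tags in photo_tags]
```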

Learning the model & Implementation
Implementation
– Sampling:
0. Randomly select 20% of the r_gu matrix as the test set; use the rest as the training set.
1. Take a 5000-photo sample and perform SVD to reduce the tag dimensionality (> 1000 dimensions).
2. Take a 5000-photo sample after SVD to obtain the prior μ in the model (2019 -> 10; the latent dimension is set to 10).
3. Initialize i_u^i for each photo of each user and t_g^j for each photo of each group.
4. Perform sampling for 1000 iterations (currently, one iteration costs 22 s).
5. Select the sampling result with the maximum joint probability.
6. Predict r_gu based on the result and the relational function.
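Steps 1-2 above reduce dimensionality with SVD fit on a sampled subset; a minimal NumPy sketch (the 5000-photo sampling and prior estimation are elided):

```python
import numpy as np

def svd_project(X, k):
    """Fit a truncated SVD on a sample matrix X (photos x features) and
    return the projection onto the top-k right singular vectors, plus
    the basis so that other photos can be projected the same way."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k].T          # features x k
    return X @ basis, basis
```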

Recent Works
Problem in the graphical model
– A photo's feature vector is modeled as the sum of latent interest features, which is not a good fit for the features.
Note that TMR differs from LDA:
– LDA is a document-word model; TMR is a document-feature model.
– They use different fitting schemas.
– TMR is not a linking of two LDAs.

Recent Works Problem in the Graphical Model Revised Model – Weighted TMR – Multiple(l)-interest TMR – Hierarchical LDA

Recent Works Problem in the Graphical Model Revised Model – Weighted TMR Weighting Parameters on User/Group Level

Recent Works Problem in the Graphical Model Revised Model – Multiple(l)-interest TMR Photo Interest formed by multiple basic interests

Recent Works Problem in the Graphical Model Revised Model – Hierarchical LDA Related Work: Blei NIPS04 hierarchical LDA

Recent Works
Problem in the graphical model
– Other problems:
Multiple sources of features (tags & visual).
User contacts are currently not considered; a solution is to refer to Blei's 2010 study on link prediction.
Implementation problems:
– Slow sampling speed
– Noise & data-structure building

To sum up
Recommend User to Group
– First work on this problem for social media sharing websites, using content, social relations, and collaborative information.
– Proposed solution:
Modeling interests by matrix factorization
A probabilistic approach on the equivalent graphical model
Gibbs-sampling-based parameter tuning
– Future work:
Efficient implementation & experiments
Thank you!