Inferring User Interest Using Familiarity and Topic Similarity with Social Neighbors in Facebook
INSTRUCTOR: DONGCHUL KIM
ANUSHA BOOTHPUR
INTRODUCTION
Active users converse with their social neighbors through social activities such as posting comments one after another. Building on social correlation, researchers have proposed methods for inferring not only user attributes such as geographic location and schools attended but also a user’s interests in social networks. We explore how to formulate a method of inferring user interests that combines both familiarity and topic similarity with social neighbors.
System Workflow
Inferring User Interest Using Topic Structure
We formally define Interest-Score for interest i_k of user u_i as:
Correlation_{i,j,k}, the strength of correlation between u_i and u_j for i_k, is defined as:
Correlation-Weight
We compute Correlation-Weight w_{i,j,k} by estimating the similarity between two topic distribution vectors for each social activity and averaging the results over h = 1 to H, where H is the total number of social activities between u_i and u_j. One vector is the topic distribution of a social content; the other is the topic distribution of an interest content.
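The averaging step can be sketched in Python as follows. The slide does not name the similarity measure, so cosine similarity is an assumption here, as are the function names and the convention of returning 0.0 when there are no social activities.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two topic distribution vectors (assumed measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def correlation_weight(social_topic_vectors, interest_topic_vector):
    """Average similarity over the H social activities between u_i and u_j.

    social_topic_vectors: one topic distribution per social content (h = 1..H).
    interest_topic_vector: topic distribution of the interest content for i_k.
    Returns 0.0 when there are no social activities (an assumed convention).
    """
    if len(social_topic_vectors) == 0:
        return 0.0
    sims = [cosine_similarity(v, interest_topic_vector) for v in social_topic_vectors]
    return sum(sims) / len(sims)


# Hypothetical example: two social contents over three topics.
social = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
interest = [0.7, 0.2, 0.1]
w = correlation_weight(social, interest)
```

In this made-up example the first social content matches the interest exactly and the second does not, so the average lands between the two individual similarities.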
Latent Dirichlet allocation
Latent Dirichlet allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups, which account for why some parts of the data are similar. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. To do this, we want to put a distribution on multinomials, that is, on k-tuples of non-negative numbers that sum to one. The space of all such multinomials has a nice geometric interpretation as a (k-1)-simplex, which is a generalization of a triangle to (k-1) dimensions.
Latent Dirichlet Allocation (LDA)
The parameters α and β are corpus-level parameters. The variables θ_d are document-level variables, sampled once per document. The variables z_dn and w_dn are word-level variables and are sampled once for each word in each document.
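The three sampling levels can be illustrated with a small simulation of the LDA generative process, using the standard notation α and β (corpus level), θ_d (document level), and z_dn, w_dn (word level). This is a sketch only: the sizes, the prior values, and the random choice of β are all hypothetical, and the code generates synthetic data rather than fitting a model.

```python
import numpy as np

rng = np.random.default_rng(0)

num_topics = 3      # hypothetical number of topics
vocab_size = 8      # hypothetical vocabulary size
num_docs = 4        # hypothetical corpus size
words_per_doc = 10  # hypothetical fixed document length

# Corpus level: alpha is the Dirichlet prior on topic mixtures and beta holds
# one word distribution per topic. beta is drawn at random here only so that
# there is something to sample from.
alpha = np.full(num_topics, 0.5)
beta = rng.dirichlet(np.full(vocab_size, 0.5), size=num_topics)

corpus = []
for d in range(num_docs):
    # Document level: theta_d is sampled once per document.
    theta_d = rng.dirichlet(alpha)
    doc = []
    for n in range(words_per_doc):
        # Word level: a topic z_dn, then a word w_dn from that topic.
        z_dn = rng.choice(num_topics, p=theta_d)
        w_dn = rng.choice(vocab_size, p=beta[z_dn])
        doc.append(int(w_dn))
    corpus.append(doc)
```

Each document is a list of word indices; the nesting of the loops mirrors which variables are shared across the corpus, across a document, or redrawn for every word.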
Dirichlet Distributions
Useful Facts:
This distribution is defined over a (k-1)-simplex. That is, it takes k non-negative arguments which sum to one. Consequently it is a natural distribution to use over multinomial distributions.
In fact, the Dirichlet distribution is the conjugate prior to the multinomial distribution. (This means that if our likelihood is multinomial with a Dirichlet prior, then the posterior is also Dirichlet!)
The Dirichlet parameter α_i can be thought of as a prior count of the i-th class.
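The "prior count" reading follows directly from conjugacy and can be checked with a few lines of Python. The prior parameters and observed counts below are made-up values for illustration.

```python
import numpy as np

alpha_prior = np.array([1.0, 1.0, 1.0])  # hypothetical prior counts, k = 3 classes
observed_counts = np.array([5, 0, 2])    # hypothetical multinomial observations

# Conjugacy: a Dirichlet prior with multinomial counts gives a Dirichlet
# posterior whose parameters are the prior counts plus the observed counts.
alpha_posterior = alpha_prior + observed_counts

# The mean of a Dirichlet is its parameter vector normalized to sum to one,
# so the posterior mean is itself a point on the (k-1)-simplex.
posterior_mean = alpha_posterior / alpha_posterior.sum()
print(posterior_mean)  # [0.6 0.1 0.3]
```

The second class was never observed, yet its posterior mean is 0.1 rather than 0, because its prior count of 1 still contributes.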
Online Familiarity
We formulate Online Familiarity f_{i,j} as:
Freq(i,j) is defined as:
P_{i,j} = u_i writes a posting on u_j’s wall
C_{i,j} = u_i writes comment(s) on u_j’s posting
L_{i,j} = u_i likes u_j’s posting
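The three activity types can be tallied from an activity log as sketched below. The log format, the names, and the function names are hypothetical. The slide text does not state how the three counts are combined into Freq(i,j), so the unweighted sum used here is only a placeholder assumption, not the formula from the slides.

```python
from collections import Counter

# Hypothetical activity log: (actor, target, kind), where kind is
# "post" (P), "comment" (C) or "like" (L).
activity_log = [
    ("alice", "bob", "post"),
    ("alice", "bob", "comment"),
    ("alice", "bob", "comment"),
    ("bob", "alice", "like"),
    ("alice", "carol", "like"),
]


def activity_counts(log):
    """Count P, C and L events for each ordered pair (u_i, u_j)."""
    counts = {}
    for actor, target, kind in log:
        counts.setdefault((actor, target), Counter())[kind] += 1
    return counts


def freq(counts, i, j):
    """Placeholder Freq(i, j): unweighted sum of the three counts (an assumption)."""
    c = counts.get((i, j), Counter())
    return c["post"] + c["comment"] + c["like"]


counts = activity_counts(activity_log)
alice_bob = freq(counts, "alice", "bob")  # 1 post + 2 comments + 0 likes = 3
```

Counts are kept per ordered pair, so activity from u_i toward u_j is stored separately from activity in the opposite direction.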
Dataset
A user writes a posting on his social neighbor’s wall, or a posting is written on his wall by the social neighbor.
A user writes comment(s) on his social neighbor’s posting, or comment(s) are written on his posting by the social neighbor.
A user likes his social neighbor’s posting, or his posting is liked by the social neighbor. (In Facebook, a user expresses his/her preference about a post by pressing the “Like” button.)
Based on Questionnaire
Online Familiarity
Evaluation
Based on User Explicit Interest
EXP is the set of the user’s explicit interests
INF_N is the set of top-N inferred interests ordered by Interest-Score
Based on Questionnaire
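Building INF_N and comparing it with EXP can be sketched as follows. The scores and interest names are hypothetical, and the slide text does not give the evaluation formula, so the hit ratio below (the share of the top-N inferred interests that also appear in EXP) is an assumed metric chosen only for illustration.

```python
def top_n_interests(interest_scores, n):
    """Return INF_N: the n interests with the highest Interest-Score."""
    ranked = sorted(interest_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [interest for interest, _ in ranked[:n]]


def hit_ratio(explicit, inferred):
    """Fraction of inferred interests that appear in EXP (an assumed metric)."""
    if not inferred:
        return 0.0
    return len(set(explicit) & set(inferred)) / len(inferred)


# Hypothetical Interest-Scores and explicit interests for one user.
scores = {"soccer": 0.9, "jazz": 0.4, "cooking": 0.7, "chess": 0.1}
exp = {"soccer", "jazz"}
inf_2 = top_n_interests(scores, 2)  # ["soccer", "cooking"]
ratio = hit_ratio(exp, inf_2)       # 0.5
```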
Result
Conclusion
We consider topic similarity between communication contents and interest descriptions as well as the degree of familiarity.
We plan to extend the proposed scheme by using not only spatial aspects, such as a user’s location, trace history, or the characteristics of a place, but also temporal context, such as time slots.
THANK YOU