Inferring User Interest Familiarity and Topic Similarity with Social Neighbors in Facebook INSTRUCTOR: DONGCHUL KIM ANUSHA BOOTHPUR 20303325.



INTRODUCTION  Active users converse with their social neighbors via social activities such as posting comments one after another.  Social correlation, researchers have proposed solutions to inferring not only user attributes like geographic location and schools attended but also a user’s interests in social networks.  we explore how we can formulate a method of inferring user interests by combining both familiarity and topic similarity with social neighbors.

System Workflow

Inferring User Interest Using Topic Structure
- We formally define Interest-Score for interest i_k of user u_i as:
- Correlation_{i,j,k}, the strength of correlation between u_i and u_j for i_k, is defined as:
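The slide's formulas did not survive extraction. The sketch below shows one plausible way the two definitions fit together, but both parts are assumptions: Interest-Score for i_k of u_i is taken to be the sum of Correlation_{i,j,k} over u_i's social neighbors u_j, and Correlation_{i,j,k} is taken to be the product of online familiarity f_{i,j} and correlation-weight w_{i,j,k}. The product reflects the introduction's stated goal of combining familiarity and topic similarity. The function name and dictionary layouts are hypothetical.

```python
from typing import Dict, Tuple


def interest_score(
    user: str,
    interest: str,
    familiarity: Dict[Tuple[str, str], float],
    weight: Dict[Tuple[str, str, str], float],
) -> float:
    """Hypothetical Interest-Score for (user, interest).

    Assumes Correlation(i, j, k) = f(i, j) * w(i, j, k), summed over
    every neighbor u_j that has a familiarity value with u_i.
    """
    score = 0.0
    for (u_i, u_j), f_ij in familiarity.items():
        if u_i != user:
            continue  # only pairs that start at the target user
        # A neighbor with no recorded weight for this interest contributes 0.
        w_ijk = weight.get((u_i, u_j, interest), 0.0)
        score += f_ij * w_ijk
    return score
```

For example, with f(alice, bob) = 0.6, f(alice, carol) = 0.4, w(alice, bob, jazz) = 0.5 and w(alice, carol, jazz) = 0.25, this sketch gives alice a jazz score of 0.6 × 0.5 + 0.4 × 0.25 = 0.4.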

Correlation-Weight  We compute Correlation-Weight w i,j,k by estimating similarity between the two topic distribution vectors and averaging them  h = 1 to H(the number of total social activities between u i and u j )  = a topic distribution vector of each social content  = a topic distribution vector of each interest content

Latent Dirichlet allocation  Latent Dirichlet allocation ( LDA ) is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.  The basic idea is that the documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words.  we want to put a distribution on multinomials. That is, k-tuples of non-negative numbers that sum to one.  The space is of all of these multinomials has a nice geometric interpretation as a (k-1)-simplex, which is just a generalization of a triangle to (k-1) dimensions

Latent Dirichlet Allocation (LDA)
- The parameters α and β are corpus-level parameters, sampled once in the process of generating a corpus.
- The variables θ_d are document-level variables, sampled once per document.
- The variables z_dn and w_dn are word-level variables and are sampled once for each word in each document.
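The three levels can be read directly off the standard LDA factorization (Blei, Ng, and Jordan). α and β appear once for the whole corpus, θ_d once per document, and z_dn and w_dn once per word:

```latex
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)

p(D \mid \alpha, \beta)
  = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha)
    \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d
```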

Dirichlet Distributions
Useful facts:
- This distribution is defined over a (k-1)-simplex; that is, it takes k non-negative arguments that sum to one. Consequently, it is a natural distribution to use over multinomial distributions.
- In fact, the Dirichlet distribution is the conjugate prior to the multinomial distribution. (This means that if the likelihood is multinomial with a Dirichlet prior, then the posterior is also Dirichlet.)
- The Dirichlet parameter α_i can be thought of as a prior count of the i-th class.
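Conjugacy and the "prior count" reading of α_i can both be seen in a few lines. With a Dir(α) prior and observed multinomial counts n, the posterior is Dir(α + n), so each α_i simply adds to the observed count of class i. The helper names below are illustrative.

```python
import numpy as np


def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters: prior Dir(alpha) plus multinomial counts."""
    alpha = np.asarray(alpha, dtype=float)
    counts = np.asarray(counts, dtype=float)
    if alpha.shape != counts.shape:
        raise ValueError("alpha and counts must have the same length")
    return alpha + counts


def dirichlet_mean(alpha):
    """Mean of Dir(alpha): each alpha_i divided by the sum of all alpha."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()
```

A uniform prior [1, 1, 1] updated with counts [3, 0, 1] gives Dir(4, 1, 2), whose mean is [4/7, 1/7, 2/7]. The unseen second class still keeps some probability because of its prior count.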

Online Familiarity  We formulate Online Familiarity f i,j as:  Freq(i,j) is defined as:  P i,j = u i writes a posting into u j ’s wall  C i,j = u i writes comment(s) in u j ’s posting  L i,j = u i likes u j ’s posting.

Dataset  A user writes a posting on his social neighbor’s wall or a posting is written in his wall by the social neighbor.  A user writes comment(s) in his social neighbor’s posting or comment(s) is written in his posting by the social neighbor.  A user likes his social neighbor’s posting or his posting is liked by the social neighbor.(In Facebook, a user expresses his/her preference about a post by pressing the “Like” button.)

Based on Questionnaire

Online Familiarity

Evaluation  Based on User Explicit Interest  EXP is set of the user’s explicit interests  INF N is a set of top-N inferred interests ordered by Interest-Score  Based on Questionnaire

Result

Conclusion  We consider topic similarity between communication contents and interest descriptions as well as the degree of familiarity  We plan to extend the proposed scheme by using not only spatial aspects such as a user’s location, trace history or characteristics of a place, but also temporal context like time slots

THANK YOU