Popularity-Aware Topic Model for Social Graphs Junghoo “John” Cho UCLA.


Grouping Users Facebook friend recommendation

Grouping Music YouTube "similar to" recommendation for 이 밤을 다시 한번

Grouping Words Topic-based word clustering: results from 37,000 passages of the TASA corpus

Core Issue How can we group "objects" that are similar to each other? Probabilistic topic models have been very effective for this task on textual data – particularly Latent Dirichlet Allocation (LDA)

Topic Models for Graphs Can we use LDA for data from other domains? – Graph representation of data – "Cluster" nodes in a graph by their topics Any problem? [Figure: three example bipartite graphs: Docs and Words ("contains": doc 1, doc 2, doc 3; money, bank, river), Users and Movies ("watches": alice, bob, eve; Love Actually, Twilight, Batman), and Users and Users ("follows": barack obama, hugh grant, robert pattinson)]

Curse of "Popularity Noise" Example result when LDA is applied to the Twitter follow graph

Curse of "Popularity Noise" LDA assumes that all words appear at roughly the same frequency – "Solution": remove words that are too frequent or too infrequent – This "hack" works fine for textual data, because overly frequent words are function words without much meaning But in other domains, frequent items are often exactly the items of interest – We cannot simply remove frequent items from the data
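The frequency-filtering "hack" described above is straightforward to implement; a minimal sketch (the thresholds are illustrative, not values from the talk):

```python
from collections import Counter

def prune_vocabulary(docs, min_count=2, max_fraction=0.5):
    """Drop words that occur in too few or too many documents.

    docs: list of documents, each a list of word tokens.
    min_count: keep words appearing in at least this many documents.
    max_fraction: drop words appearing in more than this fraction of documents.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))          # count document frequency, not raw counts
    keep = {w for w, df in doc_freq.items()
            if df >= min_count and df / n_docs <= max_fraction}
    return [[w for w in doc if w in keep] for doc in docs]
```

This is exactly the step that fails for graph data: in a follow graph the "too frequent" nodes are the celebrities, which carry most of the signal.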

Overview Introduction to LDA – Document generation model – LDA inference Introduction to popularity-aware topic model – Popularity path – Inference – Experimental results

Document Generation Model How do we write a document? 1. Pick a topic 2. Write words related to the topic

Probabilistic Topic Model There are T topics For each topic, decide which words are more likely to be used given that topic – topic-to-word vector P(w_j|z_i) Then, for every document d – The author decides the topics to write about: document-to-topic probability vector P(z_i|d) – For each word in d: the author selects a topic z_i with probability P(z_i|d), then selects a word w_j with probability P(w_j|z_i)
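The generative story above can be sketched in a few lines of Python. The distributions passed in are toy stand-ins for P(z|d) and P(w|z); in full LDA they would themselves be drawn from Dirichlet priors:

```python
import random

def generate_document(n_words, topic_probs, word_probs, rng=None):
    """Sketch of the topic-model generative story for one document.

    topic_probs: P(z|d), a list of topic probabilities for this document.
    word_probs:  P(w|z), one {word: probability} dict per topic.
    """
    rng = rng or random.Random(0)
    topics = list(range(len(topic_probs)))
    doc = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=topic_probs)[0]        # pick a topic from P(z|d)
        words = list(word_probs[z])
        w = rng.choices(words, weights=word_probs[z].values())[0]  # pick a word from P(w|z)
        doc.append(w)
    return doc
```

For example, a document with topic_probs = [1.0] over a single "finance" topic would contain only words from that topic's vocabulary.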

Probabilistic Document Model [Figure: two topics and three documents. Topic 1 is about money (money, loan, bank) and Topic 2 about rivers (river, stream, bank); each document mixes the topics via P(z|d), and every word is labeled with the topic that generated it, e.g. "money 1 bank 1 loan 1 ..." vs. "river 2 stream 2 bank 2 ..."]

Plate Notation of LDA [Figure: plates for T topics, M documents, and N words per document; latent topic assignment z and observed word w, with parameters P(z|d) and P(w|z) and their Dirichlet priors α and β] Often, α = 50/T, β = 200/W

How Is the Model Used for the Task? Given the document corpus, identify the hidden parameters of the document-generation model that best fit the corpus – model-based inference

Generative Model vs Inference (1) [Figure: the generative direction: given P(w|z) and P(z|d) for Topic 1 and Topic 2, the model produces DOC 1, DOC 2, DOC 3, each word labeled with the topic that generated it, e.g. "money 1 bank 1 loan 1 ...", "river 2 stream 2 river 2 bank 2 ...", "money 1 river 2 bank 1 stream 2 ..."]

Generative Model vs Inference (2) [Figure: the inference direction: only the documents are observed; the per-word topic labels and the parameters P(w|z) and P(z|d) are all unknown ("?") and must be recovered from the corpus]

Addressing Popularity Noise How do we eliminate noise from popular nodes? – Many models were tried: multiplication model, Pólya urn model, two-path model, … Why does a Twitter user follow Justin Bieber? – Because the user is interested in pop music – Because Justin Bieber is a celebrity This suggests a "two-path" model for following other users – Popularity path (because the followee is popular) – Topic path (because of interest in the followee's topic)
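The two-path idea can be sketched as a generative step for a single "follow" edge. The names and distributions here are illustrative, not the paper's notation: path_prob stands for the follower's P(popularity path), popularity_probs for a global popularity distribution, and topic_user_probs for P(u|z):

```python
import random

def generate_followee(user_topic_probs, topic_user_probs, popularity_probs,
                      path_prob, rng):
    """Sketch of the two-path story for one follow edge.

    With probability path_prob the follower takes the popularity path and
    picks a followee by overall popularity; otherwise the follower takes a
    topic path: pick a topic z from P(z|d), then a followee from P(u|z).
    """
    if rng.random() < path_prob:
        users = list(popularity_probs)                      # popularity path
        u = rng.choices(users, weights=popularity_probs.values())[0]
        return u, "popularity"
    topics = list(range(len(user_topic_probs)))             # topic path
    z = rng.choices(topics, weights=user_topic_probs)[0]
    users = list(topic_user_probs[z])
    u = rng.choices(users, weights=topic_user_probs[z].values())[0]
    return u, "topic"
```

The point of the extra path is that follows of celebrities can be absorbed by the popularity path instead of contaminating every topic's P(u|z).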

Plate Notation [Figure: the LDA plate diagram extended with a per-edge path variable p, drawn from a per-user path distribution P(p|d) with its own Dirichlet prior, alongside z, w, P(z|d), P(w|z), and the priors α and β]

Model Inferencing by Gibbs Sampling
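The talk's sampler targets the popularity-aware model; as a reference point, a minimal collapsed Gibbs sampler for plain LDA (symmetric priors α and β; a sketch, not the paper's exact algorithm) looks like this:

```python
import random

def lda_gibbs(docs, n_topics, vocab, n_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA.

    docs: list of documents, each a list of word ids in range(len(vocab)).
    Returns the sampled topic assignment for every token.
    """
    rng = random.Random(seed)
    V = len(vocab)
    ndz = [[0] * n_topics for _ in docs]      # document-topic counts
    nzw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nz = [0] * n_topics                       # topic totals
    z_assign = []
    for d, doc in enumerate(docs):            # random initialization
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            ndz[d][z] += 1; nzw[z][w] += 1; nz[z] += 1
            zs.append(z)
        z_assign.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = z_assign[d][i]            # remove current assignment
                ndz[d][z] -= 1; nzw[z][w] -= 1; nz[z] -= 1
                # full conditional: P(z=k | rest) ∝ (ndz+α)(nzw+β)/(nz+Vβ)
                weights = [(ndz[d][k] + alpha) * (nzw[k][w] + beta) / (nz[k] + V * beta)
                           for k in range(n_topics)]
                z = rng.choices(range(n_topics), weights=weights)[0]
                ndz[d][z] += 1; nzw[z][w] += 1; nz[z] += 1
                z_assign[d][i] = z
    return z_assign
```

The popularity-aware sampler would additionally resample each edge's path variable p from its own full conditional.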

Twitter Dataset 10 million edges from the Twitter user follow graph (crawled in 2010), split into a non-popular writer group (edges to non-popular writers) and a popular writer group (edges to popular writers)

Perplexity How well does “new” data fit with the model? – Lower is better
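Perplexity is the exponential of the negative average log-likelihood of held-out tokens; a minimal sketch, assuming the fitted per-document topic mixtures P(z|d) and topic-word distributions P(w|z) are given:

```python
import math

def perplexity(docs, doc_topic, topic_word):
    """Perplexity of held-out docs: exp(-(1/N) * sum log P(w|d)),
    where P(w|d) = sum_z P(z|d) * P(w|z). Lower is better."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            pw = sum(doc_topic[d][z] * topic_word[z][w]
                     for z in range(len(topic_word)))
            log_lik += math.log(pw)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

For instance, a single-topic model that assigns probability 0.5 to each of two words has perplexity 2: it is as uncertain as a fair two-sided choice.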

Survey The "coherence" of 23 random topic groups was evaluated by 14 participants [Figure: for each group, followed users are plotted by # of followers and judged relevant vs. irrelevant, e.g. 8 true positives and 2 false positives]

Quality Human-perceived quality of each topic group, computed from the survey results by weighting true and false positives

Example Topic Groups Popular and related users in each group

Conclusion Popularity-bias problem in graphs Popularity-aware topic models – 2-path model Experiments on Twitter dataset – Low perplexity – High quality

Thank You Any questions?