Discovering Overlapping Groups in Social Media
Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu, Arizona State University

Contact Information
Xufei Wang, Huiji Gao, and Huan Liu, Arizona State University
Lei Tang, Yahoo! Labs
/xufei/

Social Media
Facebook
– 500 million active users
– 50% of users log on to Facebook every day
Twitter
– 100 million users
– 300,000 new users every day
– 55 million tweets every day
Flickr
– 12 million members
– 5 billion photos

Social Media
online-social-media-and-games-dominate-activity/

Activities in Social Media
– Connect with others to form “friends”
– Interact with others (comments, discussions, messaging)
– Bookmark websites/URLs (StumbleUpon, Delicious)
– Join groups where they explicitly exist (Flickr, YouTube)
– Write blogs (WordPress, Myspace)
– Update status (Twitter, Facebook)
– Share content (Flickr, YouTube, Delicious)

Community Structure
At what level should behavior be studied?
– Individual level? Too many users
– Site level? Loses too much detail
– Community level? Yes: provides information at varying granularity

Overlapping Communities
(figure)

Overlapping Communities
(figure: overlapping groups labeled Colleagues, Family, and Neighbors)

Related Work
Disjoint community detection
– Modularity maximization
– Based on link structure only, so the resulting groups are hard to interpret
Overlapping community detection
– Soft clustering (the resulting membership matrix is dense)
– CFinder (limited efficiency and scalability)
Co-clustering
– Produces disjoint groups
– Groups can be understood through words (tags)
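
Modularity maximization, the first disjoint baseline above, scores a partition by comparing the edges that fall inside communities with the number expected under a degree-preserving random graph. A minimal sketch of the standard Newman-Girvan formula, assuming an undirected graph given as an edge list; the function name and the two-triangle toy graph are illustrative only:

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a disjoint partition.

    edges: list of undirected (u, v) pairs, each listed once.
    community: dict mapping node -> community id.
    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where L_c is
    the number of edges inside c and d_c is the total degree of c's nodes.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in c
    degree_sum = defaultdict(int)  # total degree of the nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(toy_edges, toy_partition))  # 5/14, about 0.357
```

Every node gets exactly one community id here, which is why modularity-based methods cannot express the overlaps this work targets.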

Problem Statement
Given a user-tag subscription matrix M and the number of clusters k, find k overlapping communities, each consisting of both users and tags.
(figure: example user-tag graph with users u1–u5 and tags t1–t4)
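
The input M can be built from raw (user, tag) subscription pairs. A minimal sketch, assuming a binary matrix (1 if the user used the tag, 0 otherwise) with users as rows; the helper name and the sample pairs are hypothetical:

```python
def user_tag_matrix(subscriptions):
    """Build a binary user-tag matrix M from (user, tag) subscription pairs.

    Returns (users, tags, M) where M[i][p] = 1 if users[i] used tags[p].
    Rows and columns are sorted so the layout is deterministic.
    """
    users = sorted({u for u, _ in subscriptions})
    tags = sorted({t for _, t in subscriptions})
    u_index = {u: i for i, u in enumerate(users)}
    t_index = {t: p for p, t in enumerate(tags)}
    M = [[0] * len(tags) for _ in users]
    for u, t in subscriptions:
        M[u_index[u]][t_index[t]] = 1
    return users, tags, M


pairs = [("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u3", "t2")]
users, tags, M = user_tag_matrix(pairs)
print(users, tags, M)
```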

Our Contributions
– Extracting overlapping communities that better reflect reality
– Clustering on a user-tag graph: tags are informative in identifying user interests
– Understanding groups by looking at the tags within each group

Edge-centric View
(figure: example user-tag graph with users u1–u5 and tags t1–t4)
Cluster edges instead of nodes into disjoint groups
– One node can belong to multiple groups
– One edge belongs to exactly one group
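
The step from disjoint edge clusters to overlapping node communities can be sketched as follows: each node joins every group that contains at least one of its edges. This assumes edges are (user, tag) pairs with a cluster id already assigned; the function name and toy labels are illustrative:

```python
from collections import defaultdict


def node_memberships(edge_labels):
    """Turn a disjoint edge clustering into overlapping node communities.

    edge_labels: dict mapping an edge (user, tag) -> cluster id.
    Each edge sits in exactly one cluster, but a node joins every cluster
    that at least one of its edges belongs to.
    """
    groups = defaultdict(set)
    for (user, tag), label in edge_labels.items():
        groups[label].update((user, tag))  # add both endpoints
    return dict(groups)


labels = {("u1", "t1"): 0, ("u2", "t1"): 0, ("u2", "t2"): 1, ("u3", "t2"): 1}
print(node_memberships(labels))  # u2 appears in both groups
```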

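Edge-centric View
In an edge-centric view, each edge becomes a data instance: a table with one row per edge (e1 to e10) and one column per node (u1–u5, t1–t4) marks the two endpoints of each edge (table values not recoverable).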
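
The edge-by-node table can be produced directly from an edge list. A minimal sketch, assuming a 0/1 encoding with a 1 in the columns of an edge's two endpoints; the toy edges are hypothetical, not the slide's:

```python
def edge_feature_matrix(edges, nodes):
    """Represent each edge as a 0/1 vector over all nodes (users and tags).

    Row e has a 1 in the columns of its two endpoints and 0 elsewhere.
    """
    col = {n: j for j, n in enumerate(nodes)}
    X = []
    for u, t in edges:
        row = [0] * len(nodes)
        row[col[u]] = 1
        row[col[t]] = 1
        X.append(row)
    return X


nodes = ["u1", "u2", "t1", "t2"]
edges = [("u1", "t1"), ("u2", "t1"), ("u2", "t2")]
for row in edge_feature_matrix(edges, nodes):
    print(row)
```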

Clustering Edges
Any clustering algorithm (e.g., k-means) can be used to group similar edges together
– Different similarity schemes can be plugged in
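
As one example of such an algorithm, plain k-means can run on edge feature vectors. A minimal sketch, assuming squared Euclidean distance and caller-supplied initial centroids (so the result is deterministic); the edge vectors below encode two hypothetical star-shaped groups over nodes u1, u2, t1, u3, u4, t2:

```python
def kmeans(X, init_centers, iters=20):
    """Plain k-means (squared Euclidean distance) on the rows of X.

    X: list of feature vectors, one per edge; init_centers: k starting
    centroids. Returns the cluster label of each row.
    """
    centers = [list(c) for c in init_centers]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assignment step: nearest centroid for every edge vector.
        labels = [min(range(len(centers)),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
                  for x in X]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


edges_X = [
    [1, 0, 1, 0, 0, 0],  # (u1, t1)
    [0, 1, 1, 0, 0, 0],  # (u2, t1)
    [0, 0, 0, 1, 0, 1],  # (u3, t2)
    [0, 0, 0, 0, 1, 1],  # (u4, t2)
]
print(kmeans(edges_X, [edges_X[0], edges_X[2]]))  # [0, 0, 1, 1]
```

Swapping the distance inside the assignment step is where a different edge similarity scheme would enter.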

Defining Edge Similarity
The similarity between two edges e = (u_i, t_p) and e′ = (u_j, t_q) can be defined as (but is not limited to)
$$S(e, e') = \alpha\, S(u_i, u_j) + (1 - \alpha)\, S(t_p, t_q)$$
α is set to 0.5, giving users and tags equal importance. It remains to define user-user and tag-tag similarity.
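
The weighted edge similarity S(e, e′) = α S(u_i, u_j) + (1 − α) S(t_p, t_q) can be written as a small function that takes the node-level similarities as parameters. A minimal sketch; the tag-similarity table (cars vs. autos at 0.8) is made up purely for illustration:

```python
def edge_similarity(e1, e2, user_sim, tag_sim, alpha=0.5):
    """S(e, e') = alpha * S(u_i, u_j) + (1 - alpha) * S(t_p, t_q).

    e1 = (u_i, t_p), e2 = (u_j, t_q); user_sim and tag_sim return a
    node-node similarity. alpha = 0.5 weights users and tags equally.
    """
    (ui, tp), (uj, tq) = e1, e2
    return alpha * user_sim(ui, uj) + (1 - alpha) * tag_sim(tp, tq)


# Hypothetical node similarities, for illustration only.
TAG_TABLE = {frozenset(("cars", "autos")): 0.8}


def user_sim(a, b):
    return 1.0 if a == b else 0.0


def tag_sim(a, b):
    return 1.0 if a == b else TAG_TABLE.get(frozenset((a, b)), 0.0)


print(edge_similarity(("u1", "cars"), ("u2", "autos"), user_sim, tag_sim))  # 0.4
```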

Independent Learning
Assume that users are independent of each other and that tags are independent of each other
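
One reading of independence is that a node is similar only to itself: S(a, b) = 1 if a = b, else 0. This is an assumption, not something the slide states. Under that reading, with α = 0.5, two edges score 1.0 (same edge), 0.5 (shared user or shared tag), or 0.0. A minimal sketch with hypothetical edges:

```python
def independent_similarity(edges):
    """Edge-edge similarity matrix under independent learning (alpha = 0.5).

    Assumes S(a, b) = 1 if a == b else 0, separately for users and tags.
    """
    n = len(edges)
    S = [[0.0] * n for _ in range(n)]
    for i, (ui, tp) in enumerate(edges):
        for j, (uj, tq) in enumerate(edges):
            S[i][j] = 0.5 * (ui == uj) + 0.5 * (tp == tq)
    return S


edges = [("u1", "t1"), ("u2", "t1"), ("u2", "t2")]
for row in independent_similarity(edges):
    print(row)
```

Under this reading the matrix equals half the inner product of the 0/1 edge-by-node vectors, so clustering those vectors directly is equivalent.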

Normalized Learning
Differentiate nodes with varying degrees by normalizing each node by its degree
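
The slide does not give the exact normalization. A minimal sketch, assuming each endpoint's entry in an edge vector is scaled by 1/sqrt(degree) (the same scaling that appears in spectral co-clustering), so that high-degree users and tags contribute less per edge. The function name and data are hypothetical:

```python
import math
from collections import Counter


def normalized_edge_vectors(edges, nodes):
    """Edge vectors whose endpoint entries are scaled by 1/sqrt(node degree).

    Assumption: one possible way to 'normalize each node by its degree'.
    """
    degree = Counter()
    for u, t in edges:
        degree[u] += 1
        degree[t] += 1
    col = {n: j for j, n in enumerate(nodes)}
    X = []
    for u, t in edges:
        row = [0.0] * len(nodes)
        row[col[u]] = 1 / math.sqrt(degree[u])
        row[col[t]] = 1 / math.sqrt(degree[t])
        X.append(row)
    return X


nodes = ["u1", "u2", "t1", "t2"]
edges = [("u1", "t1"), ("u2", "t1"), ("u2", "t2")]
print(normalized_edge_vectors(edges, nodes)[0])  # u1 has degree 1, t1 degree 2
```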

Correlational Learning
Tags can be semantically close
– e.g., the tags cars, automobile, autos, and car reviews are all used to describe one blog written by sid0722 on BlogCatalog
Map the u × t user-tag matrix into a k-dimensional latent space (u × k for users), then compute user-user and tag-tag cosine similarity in the latent space
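
The slide does not name the projection. A minimal sketch, assuming a rank-k truncated SVD of M supplies the latent coordinates; tags used by the same users (such as two synonyms) then become close even though they are distinct nodes. The toy matrix is hypothetical, with tags 0 and 1 used by exactly the same users:

```python
import numpy as np


def latent_cosine_similarity(M, k):
    """User-user and tag-tag cosine similarity in a k-dim latent space.

    Assumption: latent coordinates come from a rank-k truncated SVD of the
    u x t matrix M (users as U_k * s_k, tags as V_k * s_k).
    """
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    users = U[:, :k] * s[:k]    # u x k user coordinates
    tags = Vt[:k, :].T * s[:k]  # t x k tag coordinates

    def cosine(Z):
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        Z = Z / np.where(norms == 0, 1, norms)  # guard all-zero rows
        return Z @ Z.T

    return cosine(users), cosine(tags)


M = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
user_sim, tag_sim = latent_cosine_similarity(M, k=2)
print(np.round(tag_sim, 3))
```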

Spectral Clustering Perspective
Graph partitioning can be relaxed into a generalized eigenvalue problem
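
The transcript does not preserve the slide's equations. A hedged reconstruction follows the standard relaxation for a bipartite user-tag graph in Dhillon's co-clustering formulation (listed in the references). W is the bipartite adjacency built from M, D is the diagonal degree matrix with user block D_1 and tag block D_2, and z stacks user coordinates u and tag coordinates v. In Dhillon's derivation this problem reduces to an SVD of the degree-normalized matrix D_1^{-1/2} M D_2^{-1/2}.

```latex
L z = \lambda D z, \qquad L = D - W, \qquad D_{ii} = \sum_j W_{ij}

W = \begin{pmatrix} 0 & M \\ M^{\top} & 0 \end{pmatrix}, \qquad
D = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}, \qquad
z = \begin{pmatrix} u \\ v \end{pmatrix}
```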

Spectral Clustering Perspective
Plugging in L, W, and Z, we find that U and V are the left and right singular vectors corresponding to the k largest singular values of the user-tag matrix M
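
Because users (left singular vectors) and tags (right singular vectors) land in the same k-dimensional space, they can be clustered together. A minimal sketch, assuming the unnormalized M as stated on the slide and a toy block-structured matrix. Each node takes the dimension with the largest absolute coordinate, which is a simplification; running k-means on the stacked rows is the usual choice:

```python
import numpy as np


def spectral_coembedding(M, k):
    """Co-cluster users and tags with the top-k singular vectors of M.

    Rows of U_k (users) and V_k (tags) share one k-dim space. Each node is
    assigned to its largest-|coordinate| dimension (a simplification).
    """
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    U_k, V_k = U[:, :k], Vt[:k, :].T  # left (users) / right (tags) vectors
    user_labels = np.argmax(np.abs(U_k), axis=1)
    tag_labels = np.argmax(np.abs(V_k), axis=1)
    return user_labels, tag_labels


# Two user-tag blocks: users 0-2 with tags 0-1, users 3-4 with tags 2-3.
M = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
print(spectral_coembedding(M, k=2))
```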

Synthetic Data Sets
– Parameters: number of clusters, users, and tags
– Intra-cluster density and inter-cluster density (1% of total user-tag links)
– Evaluation: normalized mutual information (NMI), between 0 and 1; the higher, the better
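
NMI compares a found clustering with the ground truth and is invariant to renaming clusters. A minimal sketch for two disjoint labelings, assuming the geometric-mean normalization I(A; B) / sqrt(H(A) H(B)) (other normalizations exist, and the slides do not say how overlapping output is mapped to labels):

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two flat labelings.

    NMI = I(A; B) / sqrt(H(A) * H(B)); 1 means identical up to renaming.
    """
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a == 0 or h_b == 0:  # a single cluster on at least one side
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, renamed
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```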

Synthetic Performance
We fix the number of users, tags, and the density, and vary the number of clusters

Synthetic Performance
We fix the number of users, tags, and clusters, and vary the intra-cluster density

Social Media Data Sets
BlogCatalog
– Tags describing each blog
– Category predefined by BlogCatalog for each blog
Delicious
– Tags describing each bookmark
– The top 10 most frequently used tags are selected for each person
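
The Delicious preprocessing step (keep each person's 10 most frequent tags) is a per-user frequency count. A minimal sketch, assuming the raw data is a stream of (user, tag) uses; the sample data and n=2 in the example are hypothetical:

```python
from collections import Counter


def top_tags(bookmarks, n=10):
    """Keep each user's n most frequently used tags.

    bookmarks: iterable of (user, tag) pairs, one per tag use.
    Ties keep first-encountered order (Counter.most_common behavior).
    """
    per_user = {}
    for user, tag in bookmarks:
        per_user.setdefault(user, Counter())[tag] += 1
    return {user: [t for t, _ in counts.most_common(n)]
            for user, counts in per_user.items()}


uses = [("alice", "python"), ("alice", "ml"), ("alice", "python"),
        ("alice", "web"), ("bob", "food")]
print(top_tags(uses, n=2))  # {'alice': ['python', 'ml'], 'bob': ['food']}
```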

Inferring Personal Interests
Category information reveals personal interests; group affiliations are used as features to infer personal interests, evaluated via cross-validation

Connectivity Study
The correlation between the number of affiliations two users share and their connectivity in the real network: the more affiliations two users share, the more likely they are to be connected
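
The study can be sketched as: count how many groups each user pair shares, then compute the fraction of pairs with each count that are linked in the real network. A minimal sketch, assuming groups are given as sets of users and links as unordered pairs; the function name and toy data are hypothetical:

```python
from collections import defaultdict
from itertools import combinations


def connectivity_by_cooccurrence(groups, friend_edges):
    """Fraction of user pairs that are connected, per co-occurrence count.

    groups: list of user sets (overlapping communities).
    friend_edges: set of frozenset({a, b}) links in the real network.
    Only pairs that share at least one group are counted.
    """
    cooccur = defaultdict(int)
    for g in groups:
        for a, b in combinations(sorted(g), 2):
            cooccur[frozenset((a, b))] += 1
    total, linked = defaultdict(int), defaultdict(int)
    for pair, c in cooccur.items():
        total[c] += 1
        linked[c] += pair in friend_edges
    return {c: linked[c] / total[c] for c in sorted(total)}


groups = [{"a", "b", "c"}, {"a", "b"}, {"c", "d"}]
links = {frozenset(("a", "b")), frozenset(("c", "d"))}
print(connectivity_by_cooccurrence(groups, links))  # {1: 0.333..., 2: 1.0}
```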

Understanding Groups via Tag Cloud
Tag cloud for Category Health

Understanding Groups via Tag Cloud
Tag cloud for Cluster Health

Understanding Groups via Tag Cloud
Tag cloud for Cluster Nutrition

Conclusions and Future Work
– Overlapping communities on a user-tag graph
– An edge-centric view with defined edge similarity:
  – Independent learning
  – Normalized learning
  – Correlational learning
– Evaluation on synthetic and real data sets
– Future work: applications such as link prediction; scalability

References
I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in KDD ’01, NY, USA.
L. Tang and H. Liu, “Scalable learning of collective behavior based on sparse social dimensions,” in CIKM ’09, NY, USA.
L. Tang and H. Liu, “Community Detection and Mining in Social Media,” Synthesis Lectures on Data Mining and Knowledge Discovery, Morgan & Claypool Publishers.
G. Palla, I. Derényi, I. Farkas, and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society,” Nature ’05, vol. 435, no. 7043, p. 814.
K. Yu, S. Yu, and V. Tresp, “Soft clustering on graphs,” in NIPS ’05.
U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416.
M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, Feb.
S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174.
