Discovering Overlapping Groups in Social Media Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu Arizona State University.

1 Discovering Overlapping Groups in Social Media Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu xufei.wang@asu.edu Arizona State University

2 Contact Information
Xufei Wang, Huiji Gao, and Huan Liu, Arizona State University
Lei Tang, Yahoo! Labs
xufei.wang@asu.edu
http://dmml.asu.edu/users/xufei/

3 Social Media
Facebook
– 500 million active users
– 50% of users log on to Facebook every day
Twitter
– 100 million users
– 300,000 new users every day
– 55 million tweets every day
Flickr
– 12 million members
– 5 billion photos

4 Social Media
http://blog.nielsen.com/nielsenwire/online_mobile/what-americans-do-online-social-media-and-games-dominate-activity/

5 Activities in Social Media
Connect with others to form “Friends”
Interact with others (comments, discussion, messaging)
Bookmark websites/URLs (StumbleUpon, Delicious)
Join groups where they explicitly exist (Flickr, YouTube)
Write blogs (WordPress, Myspace)
Update status (Twitter, Facebook)
Share content (Flickr, YouTube, Delicious)

6 Community Structure
Studying behavior:
– Individual level? Too many users
– Site level? Loses too much detail
– Community level? Yes: provides information at varying granularity

7 Overlapping Communities
[Figure not preserved; the slide carries an unresolved note to cite the figure's source]

8 Overlapping Communities
[Figure not preserved; group labels: Colleagues, Family, Neighbors]

9 Related Work
Disjoint Community Detection
– Modularity maximization
– Based on link structure (how should the resulting groups be understood?)
Overlapping Community Detection
– Soft clustering (the clustering is dense)
– CFinder (efficiency and scalability)
Co-clustering
– Disjoint
– Understanding groups by words (tags)

10 Problem Statement
Given a user-tag subscription matrix M and the number of clusters k, find k overlapping communities, each consisting of both users and tags.
[Figure not preserved; a user-tag graph over users u1–u5 and tags t1–t4]

11 Our Contributions
Extracting overlapping communities that better reflect reality
Clustering on a user-tag graph; tags are informative in identifying user interests
Understanding groups by looking at the tags within each group

12 Edge-centric View
Cluster edges instead of nodes into disjoint groups
– One node can belong to multiple groups
– One edge belongs to exactly one group
[Figure not preserved; the user-tag graph over u1–u5 and t1–t4, drawn twice]

13 Edge-centric View
In an edge-centric view, each edge is a row and each node is a column:

edge  u1 u2 u3 u4 u5 t1 t2 t3 t4
e1     1  0  0  0  0  1  0  0  0
e2     1  0  0  0  0  0  1  0  0
e3     0  1  0  0  0  1  0  0  0
e4     0  1  0  0  0  0  1  0  0
e5     0  0  1  0  0  0  1  0  0
e6     0  0  1  0  0  0  0  1  0
e7     0  0  0  1  0  0  0  1  0
e8     0  0  0  1  0  0  0  0  1
e9     0  0  0  0  1  0  0  1  0
e10    0  0  0  0  1  0  0  0  1
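The edge-centric table can be rebuilt mechanically from an edge list. The following is a minimal sketch, not the authors' code: the edge list is read off the table rows e1 to e10, and the function name `edge_centric_rows` is made up for illustration.

```python
# Toy user-tag edges read off the edge-centric table (e1..e10).
EDGES = [
    ("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
    ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4"),
]


def edge_centric_rows(edges):
    """Return (columns, rows): one 0/1 row per edge, one column per node."""
    users = sorted({u for u, _ in edges})
    tags = sorted({t for _, t in edges})
    columns = users + tags  # users first, then tags, as in the table
    index = {node: i for i, node in enumerate(columns)}
    rows = []
    for user, tag in edges:
        row = [0] * len(columns)
        row[index[user]] = 1  # mark the user endpoint
        row[index[tag]] = 1   # mark the tag endpoint
        rows.append(row)
    return columns, rows


columns, rows = edge_centric_rows(EDGES)
```

Every row has exactly two ones, one for the user and one for the tag, which is why the representation stays sparse as the graph grows.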

14 Clustering Edges
We can use any clustering algorithm (e.g., k-means) to group similar edges together
Different similarity schemes can be used
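As a rough illustration of clustering edges and then reading off overlapping node groups, here is a small pure-Python sketch. It uses plain Lloyd's k-means on 0/1 edge rows; the toy edge list, the helper names, and the choice of starting centroids are all assumptions made for this example, not details from the slides.

```python
# Toy user-tag edges (e1..e10) and a fixed node order for the columns.
EDGES = [
    ("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
    ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4"),
]
NODES = ["u1", "u2", "u3", "u4", "u5", "t1", "t2", "t3", "t4"]


def to_row(edge):
    """0/1 indicator row with ones at the edge's two endpoints."""
    return [1.0 if node in edge else 0.0 for node in NODES]


def kmeans(rows, k, init_indices, max_iter=100):
    """Plain Lloyd's k-means; init_indices picks the starting centroids."""
    centroids = [list(rows[i]) for i in init_indices[:k]]
    labels = [-1] * len(rows)
    for _ in range(max_iter):
        new_labels = []
        for row in rows:
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))  # ties go to the lower index
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def node_communities(edges, labels, k):
    """A node joins every community that contains one of its edges."""
    groups = [set() for _ in range(k)]
    for (user, tag), lab in zip(edges, labels):
        groups[lab].update((user, tag))
    return groups


rows = [to_row(e) for e in EDGES]
# Starting centroids (edges e1 and e10) are an arbitrary, assumed choice.
labels = kmeans(rows, k=2, init_indices=[0, 9])
groups = node_communities(EDGES, labels, k=2)
overlap = groups[0] & groups[1]  # nodes that belong to both communities
```

Each edge gets exactly one label, yet a node whose edges fall in different clusters appears in several groups. With these starting centroids the two groups share u4 and t3; a different initialization can give a different split.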

15 Defining Edge Similarity
Similarity between two edges e and e’ can be defined by, but is not limited to:
[Equation not preserved in this transcript; the surviving labels are ui, uj, tp, tq]
α is set to 0.5, which gives equal importance to users and tags
It remains to define user-user and tag-tag similarity
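The equation on this slide did not survive in the transcript. One reading consistent with the surrounding text is a weighted combination of a user-user term and a tag-tag term, with α as the weight. The sketch below assumes exactly that form, and assumes the two edges are (ui, tp) and (uj, tq). The names `edge_similarity` and `identity_sim` are hypothetical, and `identity_sim` is only a placeholder for whichever node similarity is chosen.

```python
def edge_similarity(e, e_prime, user_sim, tag_sim, alpha=0.5):
    """Assumed form: alpha * S_u(ui, uj) + (1 - alpha) * S_t(tp, tq)."""
    (ui, tp), (uj, tq) = e, e_prime
    return alpha * user_sim(ui, uj) + (1 - alpha) * tag_sim(tp, tq)


def identity_sim(a, b):
    """Placeholder node similarity: 1 for the same node, 0 otherwise."""
    return 1.0 if a == b else 0.0


same_user = edge_similarity(("u1", "t1"), ("u1", "t2"), identity_sim, identity_sim)
same_edge = edge_similarity(("u1", "t1"), ("u1", "t1"), identity_sim, identity_sim)
disjoint = edge_similarity(("u1", "t1"), ("u5", "t4"), identity_sim, identity_sim)
```

Passing the user and tag similarities in as functions keeps the combination rule separate from the choice of similarity scheme.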

16 Independent Learning
Assume users are independent of one another, and tags are independent of one another

17 Normalized Learning
Differentiate nodes with varying degrees by normalizing each node by its nodal degree
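The slide does not preserve the exact normalization formula. One simple interpretation, shown here purely as an assumption, is to replace each 1 in an edge's row with 1/degree of the corresponding node, so that edges touching very popular nodes look less alike. The function name and the sparse dict representation are illustrative choices.

```python
from collections import Counter

# Toy user-tag edges (e1..e10).
EDGES = [
    ("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
    ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4"),
]


def degree_normalized_rows(edges):
    """Sparse edge rows where each endpoint is weighted by 1 / its degree."""
    degree = Counter()
    for user, tag in edges:
        degree[user] += 1
        degree[tag] += 1
    # Assumed scheme: weight = 1 / degree; other normalizations are possible.
    rows = [{user: 1.0 / degree[user], tag: 1.0 / degree[tag]} for user, tag in edges]
    return degree, rows


degree, rows = degree_normalized_rows(EDGES)
```

In the toy graph t3 has degree 3 and u3 has degree 2, so edge e6 = (u3, t3) gets weights 1/2 and 1/3 rather than two ones.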

18 Correlational Learning
Tags can be semantically close
– The tags cars, automobile, autos, and car reviews are used to describe a blog written by sid0722 on BlogCatalog
[Figure not preserved; surviving labels suggest a u × t matrix and a u × k matrix]
Compute user-user and tag-tag cosine similarity in the latent space
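Cosine similarity itself is straightforward to sketch. The latent vectors below are made-up placeholders chosen only to show the call; in practice they would come from a low-rank projection of the user-tag matrix, which is not shown here. The tag "recipes" is a hypothetical contrast example that does not appear in the slides.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical 2-dimensional latent vectors, for illustration only.
latent = {
    "cars": [0.9, 0.1],
    "autos": [0.8, 0.2],
    "recipes": [0.1, 0.9],
}

close = cosine(latent["cars"], latent["autos"])
far = cosine(latent["cars"], latent["recipes"])
```

Because cosine ignores vector length, two tags used at very different frequencies can still score as close when they point in the same latent direction.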

19 Spectral Clustering Perspective
Graph partitioning can be solved as a generalized eigenvalue problem
[Equation not preserved in this transcript]

20 Spectral Clustering Perspective
Plugging in L, W, and Z, we obtain
[Equation not preserved in this transcript]
where U and V are the right and left singular vectors corresponding to the top k singular values of the user-tag matrix M
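For background, the bipartite formulation in Dhillon's KDD '01 co-clustering paper reaches a singular-vector result of the same kind. The derivation below follows that setup and is not a reconstruction of the missing slide equations. The symbols are assumptions: A is the bipartite adjacency matrix (the role played by M), and D_1 and D_2 are the diagonal degree matrices of the two node sets.

```latex
L = \begin{bmatrix} D_1 & -A \\ -A^{\top} & D_2 \end{bmatrix}, \qquad
D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \qquad
L z = \lambda D z, \qquad z = \begin{bmatrix} x \\ y \end{bmatrix}

% Row-wise: D_1 x - A y = \lambda D_1 x  and  -A^{\top} x + D_2 y = \lambda D_2 y.
% Substituting u = D_1^{1/2} x and v = D_2^{1/2} y gives
D_1^{-1/2} A D_2^{-1/2} \, v = (1 - \lambda) \, u, \qquad
D_2^{-1/2} A^{\top} D_1^{-1/2} \, u = (1 - \lambda) \, v
```

So u and v are left and right singular vectors of the degree-normalized matrix D_1^{-1/2} A D_2^{-1/2} with singular value 1 − λ, and small generalized eigenvalues correspond to large singular values. How this maps onto the slide's own L, W, and Z cannot be recovered from the transcript.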

21 Synthetic Data Sets
Synthetic data sets
– Number of clusters, users, and tags
– Inner-cluster density and inter-cluster density (1% of total user-tag links)
– Normalized Mutual Information (NMI): between 0 and 1; the higher, the better
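For reference, a common form of normalized mutual information for two disjoint labelings is sketched below. The slides do not say which normalization was used, and overlapping communities generally call for an extended variant. The square-root normalization here is an assumption, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI = I(a; b) / sqrt(H(a) * H(b)) for two disjoint labelings."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mutual_info = sum(
        (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mutual_info / math.sqrt(h_a * h_b)


same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

The score depends only on how the two labelings group items, not on the label names, which is why swapping 0 and 1 leaves it unchanged.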

22 Synthetic Performance
We fix the number of users, tags, and density, but vary the number of clusters

23 Synthetic Performance
We fix the number of users, tags, and clusters, but vary the inner-cluster density

24 Social Media Data Sets
BlogCatalog
– Tags describing each blog
– Category predefined by BlogCatalog for each blog
Delicious
– Tags describing each bookmark
– Select the top 10 most frequently used tags for each person
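The top-10 selection for Delicious can be sketched with the standard library. The input format, a flat list of the tags one person has used with one entry per use, and the name `top_tags` are assumptions for illustration.

```python
from collections import Counter


def top_tags(tag_uses, n=10):
    """Return the n most frequently used tags, most frequent first."""
    return [tag for tag, _ in Counter(tag_uses).most_common(n)]


# Hypothetical tag history for one person.
uses = ["python", "news", "python", "design", "python", "news"]
top_two = top_tags(uses, n=2)
```

People with fewer than n distinct tags simply keep all of their tags.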

25 Inferring Personal Interests
Category information reveals personal interests. We treat group affiliations as features and infer personal interests, evaluated via cross-validation.

26 Connectivity Study
We examine the correlation between the number of co-occurrences of two users in different affiliations and their connectivity in real networks. The more often two users co-occur, the more likely they are to be connected.
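Counting how often two users share an affiliation is the quantity this slide correlates with connectivity. A minimal sketch follows, assuming each group has already been reduced to its set of users; the group contents and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(groups):
    """Count, for every user pair, the number of groups containing both."""
    counts = Counter()
    for members in groups:
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical overlapping groups (users only).
groups = [{"u1", "u2", "u3"}, {"u3", "u4"}, {"u1", "u3"}]
counts = co_occurrence(groups)
```

The resulting counts can then be compared against whether each pair is linked in the actual network, which is the comparison the slide describes.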

27 Understanding Groups via Tag Cloud
Tag cloud for Category Health

28 Understanding Groups via Tag Cloud
Tag cloud for Cluster Health

29 Understanding Groups via Tag Cloud
Tag cloud for Cluster Nutrition

30 Conclusions and Future Work
Overlapping communities on a user-tag graph
Propose an edge-centric view and define edge similarity
– Independent Learning
– Normalized Learning
– Correlational Learning
Evaluate results on synthetic and real data sets
Many applications (e.g., link prediction); scalability

31 References
I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in KDD ’01, NY, USA.
L. Tang and H. Liu, “Scalable learning of collective behavior based on sparse social dimensions,” in CIKM ’09, NY, USA.
L. Tang and H. Liu, “Community Detection and Mining in Social Media,” Morgan & Claypool Publishers, Synthesis Lectures on Data Mining and Knowledge Discovery, 2010.
G. Palla, I. Derényi, I. Farkas, and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, vol. 435, no. 7043, p. 814, 2005.
K. Yu, S. Yu, and V. Tresp, “Soft clustering on graphs,” in NIPS, 2005.
U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, p. 026113, Feb. 2004.
S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.

32 Contact the Authors
Xufei Wang
– xufei.wang@asu.edu
– Arizona State University
Lei Tang
– ltang@yahoo-inc.com
– Yahoo! Labs

