Affiliation Network Models of Clusters in Networks Jure Leskovec Stanford University Joint work with Jaewon Yang
Networks. Behind each of these complex systems there is a network that defines the interactions between the components. Thus basically everything around us is a network of interconnected parts, and we have been dealing and living with networks all our lives.
Network Clusters. Networks are not uniform or homogeneous: they exhibit clusters! Why clusters? What do they correspond to? [Figure: Blogosphere, Adamic & Glance] 5/27/2019 Jure Leskovec: Affiliation Network Models of Clusters in Networks
From Clusters to Communities. Clusters form communities. Cluster: a set of appropriately connected nodes. Community: nodes with a shared latent property. Communities form for many reasons, e.g., in the World Wide Web, citation networks, social networks, and metabolic networks. [Figure: Blogosphere, Adamic & Glance]
Basis for Community Formation. How and why do communities form? Granovetter’s strength of weak ties and the small-world models suggest: strong ties are well embedded in the network; weak ties span long ranges. Given a network, how do we find communities?
Finding Communities. Q: Given a network, how do we find communities? A: Find the weak ties and use them to identify communities. Girvan-Newman’s betweenness centrality, modularity, and graph partitioning methods are all based on this idea. [Figure: SFI collaboration network, Newman]
Overlapping Communities. Non-overlapping vs. overlapping communities.
Overlaps of Social Circles [Palla et al., ’05]. A node belongs to many social circles.
Clique Percolation Method (CPM) [Palla et al., ’05]. Two nodes belong to the same community if they can be connected through adjacent k-cliques. k-clique: a fully connected graph on k nodes. Adjacent k-cliques: two k-cliques that overlap in k-1 nodes. k-clique community: the set of nodes that can be reached through a sequence of adjacent k-cliques. [Figure: a k-clique and two adjacent cliques on nodes A, B, C, D, for k=3]
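The CPM definitions above can be sketched in a few lines of Python. This is a brute-force illustration for tiny graphs, not the implementation of Palla et al.; the function name `k_clique_communities` and the toy edge list are assumptions made for this example.

```python
from itertools import combinations

def k_clique_communities(edges, k=3):
    """Brute-force clique percolation; only suitable for very small graphs."""
    # Build an undirected adjacency map.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)

    # Enumerate every k-clique: k nodes that are all pairwise connected.
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(b in adj[a] for a, b in combinations(c, 2))]

    # Union-find over cliques: merge two cliques when they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all cliques in one component.
    communities = {}
    for i, c in enumerate(cliques):
        communities.setdefault(find(i), set()).update(c)
    return sorted(sorted(c) for c in communities.values())

# Toy graph (made up): triangles ABC and BCD are adjacent (share B and C),
# triangle DEF shares only D with them, so it forms a second community.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
             ("D", "E"), ("D", "F"), ("E", "F")]
print(k_clique_communities(toy_edges, k=3))
# → [['A', 'B', 'C', 'D'], ['D', 'E', 'F']]
```

Node D ends up in both communities, which is how CPM produces overlaps.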
Mixed-Membership Stochastic Block Models [Airoldi et al.]
Step Back: Community Detection. (1) Take a dataset. (2) Represent it as a graph. (3) Identify communities (really, clusters). (4) Interpret clusters as “real” communities: nodes that have “something” in common. [Figure: example graph on nodes A to H; cluster labels: “work in the same area”, “publish in same journals”]
Ground-Truth. Networks with an explicit notion of ground truth: Collaborations: conferences and journals as proxies for scientific areas. Social networks: people join groups and create lists. Information networks: users create topic-based groups. [Figure: example graph on nodes A to H; cluster labels: “work in the same area”, “publish in same journals”]
Examples of Ground-Truth. LiveJournal social network: users create and join groups formed around culture, entertainment, expression, fandom, life/style, life/support, gaming, sports, student life, and technology. TuDiabetes network: groups form around specific types of diabetes, different age groups, emotional and social support, arts and crafts, and different geographic regions. A node can be a member of 0 or more groups.
Networks with Ground-Truth. N: number of nodes. E: number of edges. C: number of ground-truth communities. S: average community size. A: memberships per node. For example, in the YouTube social network: fans of Real Madrid, subscribers to Lady Gaga videos, followers of the Volvo Ocean Race.
Ground-Truth: Consequences. Ground-truth groups ≈ inferred communities. How do groups map onto the network? Insights for better models. How to evaluate and interpret? “Accuracy” of methods.
Edge Probability. Nodes u and v share k groups. What is the edge probability P(edge | k) as a function of k? Consequence: overlaps are DENSER! [Plot: P(edge | k) versus k]
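A minimal sketch of how P(edge | k) could be estimated from a network with ground-truth group memberships. The function name `edge_prob_by_shared_groups` and the toy data are made up for illustration; on real networks one would sample node pairs rather than enumerate all of them.

```python
from itertools import combinations
from collections import defaultdict

def edge_prob_by_shared_groups(nodes, edges, memberships):
    """Empirical P(edge | k), where k = number of groups a node pair shares.

    memberships maps node -> set of group labels (nodes may have none).
    """
    edge_set = {frozenset(e) for e in edges}
    pairs = defaultdict(int)   # number of node pairs sharing exactly k groups
    linked = defaultdict(int)  # how many of those pairs are connected
    for u, v in combinations(nodes, 2):
        k = len(memberships.get(u, set()) & memberships.get(v, set()))
        pairs[k] += 1
        if frozenset((u, v)) in edge_set:
            linked[k] += 1
    return {k: linked[k] / pairs[k] for k in sorted(pairs)}

# Toy data (made up): nodes 2 and 3 sit in the overlap of groups "a" and "b".
nodes = [1, 2, 3, 4, 5]
memberships = {1: {"a"}, 2: {"a", "b"}, 3: {"a", "b"}, 4: {"b"}}
edges = [(1, 2), (2, 3), (3, 4)]
print(edge_prob_by_shared_groups(nodes, edges, memberships))
# → {0: 0.0, 1: 0.5, 2: 1.0}
```

An increasing curve, as in this toy output, is what "overlaps are denser" means: the more groups two nodes share, the more likely they are to be linked.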
Communities in Networks. Does it matter at all? Granovetter and all non-overlapping methods; Palla et al., MMSB, and overlapping methods as well.
Detecting Dense Overlaps? Can present community detection methods detect dense overlaps? No! (Mixed-Membership) Stochastic block model: $E[P_{i,j}] = \sum_c P_{i,c} \, P_{j,c} \, P_e(c)$. Clique percolation.
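A small worked example may help show why the mixed-membership expectation $E[P_{i,j}] = \sum_c P_{i,c} P_{j,c} P_e(c)$ yields sparse overlaps. The setting is an assumption chosen for illustration: two communities with the same within-community edge probability $p$, and overlap nodes whose membership is split evenly between them.

```latex
% Assumption: two communities, P_e(1) = P_e(2) = p.
\begin{align*}
\text{Both nodes only in community 1 } (P_{i,1} = P_{j,1} = 1):\quad
  & E[P_{i,j}] = 1 \cdot 1 \cdot p = p \\
\text{Both nodes in the overlap } (P_{i,c} = P_{j,c} = \tfrac{1}{2}):\quad
  & E[P_{i,j}] = \tfrac{1}{2}\cdot\tfrac{1}{2}\,p + \tfrac{1}{2}\cdot\tfrac{1}{2}\,p
    = \tfrac{p}{2} < p
\end{align*}
```

Under these assumptions a pair in the overlap is less likely to be linked than a pair inside a single community, the opposite of what the ground-truth data show.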
Dense Overlaps: Natural Model. Community-Affiliation Graph Model B(V, C, M, {p_c}): nodes V, communities C, memberships M. Provably generates power-law degree distributions and other patterns that real-world networks exhibit [Lattanzi, Sivakumar, STOC ’09].
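A minimal sketch of sampling a graph from B(V, C, M, {p_c}), assuming the usual reading of the model in which each community c independently links each pair of its members with probability p_c. The names `sample_agm` and `edge_probability` and the toy memberships are made up for this example.

```python
import random
from itertools import combinations

def edge_probability(shared_communities, p):
    """P(u, v) = 1 - prod over shared communities c of (1 - p[c])."""
    no_edge = 1.0
    for c in shared_communities:
        no_edge *= 1.0 - p[c]
    return 1.0 - no_edge

def sample_agm(memberships, p, seed=0):
    """Sample an edge set; memberships maps community -> set of member nodes."""
    rng = random.Random(seed)
    edges = set()
    for c, members in memberships.items():
        # Every community gets its own coin flip for each pair of members,
        # so pairs that share several communities get several chances.
        for u, v in combinations(sorted(members), 2):
            if rng.random() < p[c]:
                edges.add((u, v))
    return edges

# Toy affiliation graph (made up): node 3 belongs to both communities.
memberships = {"c1": {1, 2, 3}, "c2": {3, 4}}
print(sample_agm(memberships, {"c1": 1.0, "c2": 1.0}))

# Two shared communities with p_c = 0.5 each give a denser pair than one.
print(edge_probability(["c1", "c2"], {"c1": 0.5, "c2": 0.5}))  # → 0.75
```

Because each shared community contributes an independent chance of an edge, the edge probability grows with the number of shared communities, which matches the "overlaps are denser" observation.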
Community-Affiliation Graph Model. AGM is flexible and can express a variety of network structures: non-overlapping, nested, and overlapping.
Model-based Community Detection. Given a graph, find the model: the affiliation graph B, the number of communities, and the parameters {p_c}. Yes, we can!
AGM Model Fitting. Task: given a network G(V, E), find B(V, C, M, {p_c}). Optimization problem (MLE). How to solve? Approach: coordinate ascent. (1) Stochastic search over B, while keeping {p_c} fixed. (2) Optimize {p_c}, while keeping B fixed (convex!). Works well in practice!
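A sketch of the objective that both coordinate-ascent steps would try to increase: the log-likelihood of the observed graph under a candidate affiliation graph. It assumes the edge probability 1 - ∏(1 - p_c) over shared communities, plus a small background probability `eps` for pairs that share no community; `eps`, the function name, and the toy data are assumptions of this sketch, not taken from the slides.

```python
import math
from itertools import combinations

def agm_log_likelihood(nodes, edges, memberships, p, eps=1e-8):
    """Log-likelihood of the edge set; memberships maps community -> members."""
    edge_set = {frozenset(e) for e in edges}
    # Invert the affiliation graph: node -> set of communities it belongs to.
    node_comms = {u: set() for u in nodes}
    for c, members in memberships.items():
        for u in members:
            node_comms[u].add(c)

    ll = 0.0
    for u, v in combinations(nodes, 2):
        no_edge = 1.0
        for c in node_comms[u] & node_comms[v]:
            no_edge *= 1.0 - p[c]
        prob = 1.0 - no_edge
        # Clamp so that log() is defined, including for pairs that share
        # no community (prob = 0) and for communities with p_c = 1.
        prob = min(max(prob, eps), 1.0 - eps)
        ll += math.log(prob) if frozenset((u, v)) in edge_set else math.log(1.0 - prob)
    return ll

# Toy graph (made up): two disjoint triangles.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)]
p = {"a": 0.9, "b": 0.9}
good = {"a": {1, 2, 3}, "b": {4, 5, 6}}
bad = {"a": {1, 2, 4}, "b": {3, 5, 6}}
print(agm_log_likelihood(nodes, edges, good, p) > agm_log_likelihood(nodes, edges, bad, p))
# → True
```

A stochastic search over B would propose changes to `memberships` (for example, adding or removing one node from one community) and keep those that raise this value.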
Experimental Setup. [Figure: fit the model to the graph, obtain communities, evaluate] Evaluation: how well do inferred communities correspond to ground truth? F1 score, Ω-index, NMI, … Algorithms for comparison: MMSB [Airoldi et al., JMLR], Link Clustering [Ahn et al., Nature], Clique Percolation [Palla et al., Nature].
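One way to score inferred communities against ground truth with F1 is to match each community to its best counterpart and average in both directions. The sketch below assumes that symmetric best-match scheme; the slides do not specify the exact matching rule, and the function names are made up here.

```python
def f1(a, b):
    """F1 score between two node sets, treating b as the reference."""
    inter = len(a & b)
    if inter == 0:
        return 0.0
    precision = inter / len(a)
    recall = inter / len(b)
    return 2 * precision * recall / (precision + recall)

def avg_best_f1(detected, truth):
    """Average best-match F1, symmetrized over both directions."""
    def one_way(xs, ys):
        # For each community in xs, take its best-matching community in ys.
        return sum(max(f1(x, y) for y in ys) for x in xs) / len(xs)
    return 0.5 * (one_way(detected, truth) + one_way(truth, detected))

# Toy example (made up): the second detected community misses node 6.
detected = [{1, 2, 3}, {4, 5}]
truth = [{1, 2, 3}, {4, 5, 6}]
print(round(avg_best_f1(detected, truth), 3))  # → 0.9
```

Averaging in both directions penalizes both spurious detected communities and ground-truth communities that were missed.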
Example: Facebook Ego-Net. Detected circles: high school; workplace; Stanford, squash; Stanford, basketball. 89% accuracy.
Experiments: Ground-truth. AGM (F1 ≈ 0.6) improves overall (on overlaps only) by 57% (21%) over Link Clustering, 48% (22%) over CPM, and 10% (26%) over MMSB.
Experiments: Meta-data. Evaluation based on node metadata [Ahn et al. ’10]. Similar level of improvement.
Conclusion & Reflections. Ground-truth communities: overlaps are denser; present methods can’t detect such overlaps. Community-Affiliation Graph Model: model-based community detection; outperforms state-of-the-art.
Connections: Nested Core-Periphery. Denser and denser core of the network. The core contains 60% of the nodes and 80% of the edges. Whiskers are responsible for good communities. Nested core-periphery (jellyfish, octopus).
Communities & Core-Periphery
Explanation
THANKS! http://snap.stanford.edu
References
J. Yang, J. Leskovec. Structure and Overlaps of Communities in Networks. http://arxiv.org/abs/1205.6228
J. Yang, J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. http://arxiv.org/abs/1205.6233
J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics. http://arxiv.org/abs/0810.1355