Jure Leskovec (CMU); Kevin Lang, Anirban Dasgupta and Michael Mahoney (Yahoo! Research)
Communities: sets of nodes with lots of connections inside and few to the outside (the rest of the network). Assumption: networks are (hierarchically) composed of communities (modules). Also called: communities, clusters, groups, modules.
How community-like is a set of nodes? We want a measure that corresponds to intuition. Conductance (normalized cut): Φ(S) = # edges cut / # edges inside. Small Φ(S) corresponds to more community-like sets of nodes.
Score: Φ(S) = # edges cut / # edges inside. Bad community: Φ = 5/7 ≈ 0.7. Better community: Φ = 2/5 = 0.4. Best community: Φ = 2/8 = 0.25.
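The score on these slides can be computed directly from an edge list. The sketch below is illustrative only: the function name `community_score` and the toy graph (two triangles joined by one edge) are assumptions, not taken from the slides. It implements the slides' simplified ratio, not the volume-normalized form of conductance.

```python
def community_score(edges, S):
    """Slide score: (# edges cut) / (# edges inside S). Lower is better."""
    S = set(S)
    # An edge is cut when exactly one endpoint lies in S.
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    # An edge is inside when both endpoints lie in S.
    inside = sum(1 for u, v in edges if u in S and v in S)
    if inside == 0:
        return float("inf")  # no internal edges: not community-like at all
    return cut / inside


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

good = community_score(edges, {0, 1, 2})  # one cut edge, three inside
bad = community_score(edges, {0, 1})      # two cut edges, one inside
```

Here the full triangle scores lower (better) than a pair of its nodes, matching the intuition that a good community has many internal edges and few edges leaving it.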
We define the network community profile (NCP) plot: plot the score of the best community of size k. Search over all subsets of size k and find the best, e.g. Φ(k=5) = 0.25. The NCP plot is intractable to compute exactly, so we use an approximation algorithm.
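To make the definition concrete, here is a minimal sketch of the exact NCP by exhaustive search. It is only feasible on tiny graphs, which is why an approximation is needed in practice, and it is not the approximation algorithm used in the talk. The names `score` and `ncp_brute_force` and the toy graph are assumptions made for illustration.

```python
from itertools import combinations


def score(edges, S):
    """Slide score: (# edges cut) / (# edges inside S)."""
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    return cut / inside if inside else float("inf")


def ncp_brute_force(nodes, edges, max_k):
    """Map each size k to the best (lowest) score over all k-node subsets."""
    profile = {}
    for k in range(1, max_k + 1):
        # The number of subsets grows combinatorially with k and len(nodes).
        profile[k] = min(score(edges, set(c)) for c in combinations(nodes, k))
    return profile


# Hypothetical toy graph: two triangles joined by a single edge.
nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
profile = ncp_brute_force(nodes, edges, max_k=3)
```

Plotting `profile[k]` against `k` gives the NCP plot for this toy graph. On it the profile drops as k reaches the size of a triangle, the natural community.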
Dolphin social network: two communities of dolphins. [Figures: network and its NCP plot]
Zachary’s university karate club social network: during the study the club split into 2. The split (squares vs. circles) corresponds to cut B. [Figures: network and its NCP plot]
Collaborations between scientists working on networks. [Figures: network and its NCP plot]
Small social networks, geometric (grid-like) networks and hierarchical networks all have a downward-sloping NCP plot. [Figures: hierarchical network; geometric (grid-like) network]
Previously, researchers examined the community structure of small networks (~100 nodes). We examined more than 70 different large social and information networks. Large real-world networks look completely different!
Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)
[Figure: NCP plot, community score vs. community size.] Communities get better and better up to a point, then worse and worse. The best community has 100 nodes.
Whiskers are responsible for the downward slope of the NCP plot. A whisker is a set of nodes connected to the rest of the network by a single edge. [Figure: NCP plot with the largest whisker marked]
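The whisker definition above can be sketched as a brute-force search for bridge edges. The function names and the toy graph are illustrative assumptions. The sketch assumes a connected simple graph and treats the smaller side of each bridge as the whisker, with the larger side standing in for the core. A linear-time bridge-finding algorithm would be the natural choice on large networks.

```python
from collections import defaultdict


def component(adj, start, removed_edge):
    """Nodes reachable from start when removed_edge is ignored."""
    a, b = removed_edge
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if (u, v) == (a, b) or (u, v) == (b, a):
                continue  # pretend this edge has been cut
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def find_whiskers(edges):
    """Return maximal node sets attached to the rest by a single edge."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    candidates = []
    for u, v in edges:
        side_u = component(adj, u, (u, v))
        if v in side_u:
            continue  # still connected without (u, v): not a bridge
        side_v = component(adj, v, (u, v))
        # Assumption: the smaller side is the whisker, the larger the core.
        candidates.append(frozenset(min(side_u, side_v, key=len)))
    # Drop whiskers nested inside a larger whisker.
    return [w for w in candidates if not any(w < other for other in candidates)]


# Hypothetical toy graph: a dense core {0..4} with two whiskers attached.
core = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 4)]
whisker_a = [(0, 5), (5, 6)]                  # path hanging off node 0
whisker_b = [(3, 7), (7, 8), (8, 9), (7, 9)]  # triangle hanging off node 3
whiskers = find_whiskers(core + whisker_a + whisker_b)
```

Each returned set touches the rest of the graph through exactly one edge, so its cut size is 1 regardless of how many edges it has inside.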
Each new edge inside the community costs more: Φ = 2/1 = 2, Φ = 8/3 ≈ 2.7, Φ = 64/11 ≈ 5.8. Each node has twice as many children. [Figure: NCP plot]
Network structure: core-periphery (jellyfish, octopus). Whiskers are responsible for the good communities. Denser and denser core of the network. The core contains 60% of the nodes and 80% of the edges.
What if we allow cuts that give disconnected communities? Cut all whiskers and compose communities out of them.
We get better community scores when composing disconnected sets of whiskers. [Figure: NCP plot, community score vs. community size; curves: connected communities, bag of whiskers]
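One way to picture composing whiskers is a greedy "bag": rank whiskers by their own score, take unions of the best ones, and record the score of each union. This is a hedged sketch of that idea, not the exact procedure behind the plot. The names `bag_of_whiskers_profile` and `score`, the toy graph, and the hard-coded whisker sets are all assumptions for illustration.

```python
def score(edges, S):
    """Slide score: (# edges cut) / (# edges inside S)."""
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    return cut / inside if inside else float("inf")


def bag_of_whiskers_profile(edges, whiskers):
    """Greedily union whiskers (best first); return (size, score) pairs."""
    ranked = sorted(whiskers, key=lambda w: score(edges, w))
    bag, profile = set(), []
    for w in ranked:
        bag |= w  # the union is generally a disconnected node set
        profile.append((len(bag), score(edges, bag)))
    return profile


# Hypothetical toy graph: a dense core {0..4} with two whiskers attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 4),
         (0, 5), (5, 6),                    # whisker {5, 6}
         (3, 7), (7, 8), (8, 9), (7, 9)]    # whisker {7, 8, 9}
whiskers = [{5, 6}, {7, 8, 9}]              # assumed to be already identified
profile = bag_of_whiskers_profile(edges, whiskers)
```

Because every whisker contributes exactly one cut edge, a union of m whiskers has cut size m, and its score depends only on how many internal edges the chosen whiskers bring.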
Rewired network: random network with same degree distribution
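A common way to build such a rewired network is repeated double edge swaps, which keep every node's degree fixed. The sketch below is an assumption about how the rewiring could be done, not a description of the procedure used in the talk. The function names, parameters and toy graph are illustrative, and the input is assumed to be a simple graph without duplicate edges.

```python
import random
from collections import Counter


def rewire(edges, n_swaps, seed=0):
    """Randomize a simple graph by double edge swaps, preserving all degrees."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # the swap would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def degrees(edges):
    """Degree of every node, as a Counter."""
    return Counter(node for e in edges for node in e)


# Hypothetical toy graph: two triangles joined by a single edge.
original = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
rewired = rewire(original, n_swaps=20)
```

Comparing the NCP plot of `rewired` with that of `original` separates structure that follows from the degree distribution alone from genuine community structure.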
What is a good model that explains such network structure? None of the existing models work. NCP plot shape by model: preferential attachment: flat; small world: down and flat; geometric preferential attachment: flat and down.
Forest Fire: connections spread like a fire. A new node joins the network, selects a seed node, connects to some of its neighbors, and continues recursively. As a community grows it blends into the core of the network.
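The steps above can be sketched as a small generator. This is a simplified, undirected variant written for illustration: each neighbor of a burned node catches fire independently with probability `p_burn`, which is an assumption and may differ from how the model chooses how many neighbors to burn. The function and parameter names are hypothetical.

```python
import random


def forest_fire_graph(n, p_burn=0.35, seed=0):
    """Grow an undirected graph on n nodes; returns an adjacency dict."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        # The new node selects a seed node uniformly at random.
        seed_node = rng.randrange(new)
        burned, frontier = {seed_node}, [seed_node]
        # The fire spreads recursively through neighbors of burned nodes.
        while frontier:
            u = frontier.pop()
            for v in adj[u]:
                if v not in burned and rng.random() < p_burn:
                    burned.add(v)
                    frontier.append(v)
        # The new node connects to every node the fire reached.
        adj[new] = set()
        for u in burned:
            adj[new].add(u)
            adj[u].add(new)
    return adj


graph = forest_fire_graph(50)
```

Raising `p_burn` lets the fire spread further, so new nodes attach to larger neighborhoods and the graph becomes denser.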
[Figure legend: rewired network, bag of whiskers]
Whiskers: the largest whisker has ~100 nodes, independent of network size. Dunbar number: a person can maintain social relationships with about 150 people. Bond vs. identity communities. Core: Newman et al. analyzed a 400k-node product network ▪ Largest community has 50% of the nodes ▪ Community was labeled “miscellaneous”
The NCP plot is a way to analyze network community structure. Our results agree with previous work on small networks, but large networks are fundamentally different. Large networks have a core-periphery structure. Small, well-isolated communities blend into the core of the network as they grow.