Presentation is loading. Please wait.

Presentation is loading. Please wait.

Statistical properties of network community structure

Similar presentations


Presentation on theme: "Statistical properties of network community structure"— Presentation transcript:

1 Statistical properties of network community structure
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

2 Network communities Communities: Assumption:
Sets of nodes with lots of connections inside and few to outside (the rest of the network) Assumption: Networks are (hierarchically) composed of communities (modules) Our question: Are large networks really like this? Communities, clusters, groups, modules

3 Community score (quality)
How community like is a set of nodes? Need a natural intuitive measure Conductance (normalized cut) S’ Φ(S) = # edges cut / # edges inside Small Φ(S) corresponds to more community-like sets of nodes

4 Community score (quality)
What is “best” community of 5 nodes? Score: Φ(S) = # edges cut / # edges inside

5 Community score (quality)
What is “best” community of 5 nodes? Bad community Φ=5/6 = 0.83 Score: Φ(S) = # edges cut / # edges inside

6 Community score (quality)
What is “best” community of 5 nodes? Bad community Φ=5/7 = 0.7 Better community Φ=2/5 = 0.4 Score: Φ(S) = # edges cut / # edges inside

7 Community score (quality)
What is “best” community of 5 nodes? Bad community Φ=5/7 = 0.7 Best community Φ=2/8 = 0.25 Better community Φ=2/5 = 0.4 Score: Φ(S) = # edges cut / # edges inside

8 Network Community Profile Plot
We define: Network community profile (NCP) plot Plot the score of best community of size k Search over all subsets of size k and find best: Φ(k=5) = 0.25 NCP plot is intractable to compute Use approximation algorithm

9 NCP plot: Small Social Network
Dolphin social network Two communities of dolphins NCP plot Network

10 NCP plot: Zachary’s karate club
Zachary’s university karate club social network During the study club split into 2 The split (squares vs. circles) corresponds to cut B Network NCP plot

11 NCP plot: Network Science
Collaborations between scientists in Networks Network NCP plot

12 Geometric and Hierarchical graphs
Geometric (grid-like) network – Small social networks – Geometric and – Hierarchical network have downward NCP plot Hierarchical network

13 Our work: Large networks
Previously researchers examined community structure of small networks (~100 nodes) We examined more than 70 different large social and information networks Large real-world networks look very different!

14 Example of our findings
Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)

15 NCP: LiveJournal (N=5M, E=42M)
Community score Community size Better and better communities Best communities get worse and worse Best community has 100 nodes

16 Explanation: Downward part
NCP plot Whiskers are responsible for downward slope of NCP plot Largest whisker Whisker is a set of nodes connected to the network by a single edge

17 Explanation: Upward part
NCP plot Each new edge inside the community costs more Φ=1/3 = 0.33 Φ=2/4 = 0.5 Φ=8/6 = 1.3 Φ=64/14 = 4.5 Each node has twice as many children

18 Suggested network structure
Denser and denser core of the network Core contains 60% node and 80% edges Whiskers are responsible for good communities Network structure: Core-periphery (jellyfish, octopus)

19 Caveat: Bag of whiskers
What if we allow cuts that give disconnected communities? Cut all whiskers Compose communities out of whiskers How good “community” do we get?

20 Communities made of whiskers
We get better community scores when composing disconnected sets of whiskers Community score Connected communities Bag of whiskers Community size

21 Comparison to rewire network
Take a real network G Rewire edges for a long time We obtain a random graph with same degree distribution as the real network G

22 Comparison to a rewired network
Rewired network: random network with same degree distribution

23 What is a good model? What is a good model that explains such network structure? None of the existing models work Pref. attachment Flat Flat and Down Down and Flat Small World Geometric Pref. Attachment

24 Forest Fire model works
Forest Fire: connections spread like a fire New node joins the network Selects a seed node Connects to some of its neighbors Continue recursively As community grows it blends into the core of the network

25 Forest Fire NCP plot rewired network Bag of whiskers

26 Conclusion and connections
Whiskers: Largest whisker has ~100 nodes Independent of network size Dunbar number: a person can maintain social relationship to at most 150 people Bond vs. identity communities Core: Core has little structure (hard to cut) Still more structure than the random network

27 Conclusion and connections
NCP plot is a way to analyze network community structure Our results agree with previous work on small networks (that are commonly used for testing community finding algorithms) But large networks are different Large networks Whiskers + Core structure Small well isolated communities blend into the core of the networks as they grow


Download ppt "Statistical properties of network community structure"

Similar presentations


Ads by Google