Published by Silvester Ross. Modified over 9 years ago.
1
Jure Leskovec (CMU); Kevin Lang, Anirban Dasgupta, Michael Mahoney (Yahoo! Research)
2
Big data. Study emerging behaviors. How are small networks different from large ones?
3
Communities (also called groups, clusters, or modules): sets of nodes with many connections inside and few to the outside (the rest of the network).
4
Yeast transcriptional regulatory modules [Bar-Joseph et al., 2003]. Nodes represent proteins; edges represent interactions/associations. Proteins with the same function interact more, so the network can be used to discover functional groups.
5
Zachary’s karate club network. Clusters correspond to social communities and organizational units (e.g., departments). During the study the club split into 2. The split corresponds to the min-cut (● vs. ■).
6
Democrat vs. Republican blogs [Adamic-Glance 2005]
7
Citations and collaborations [Newman 2003]
8
Nested communities: the modular structure of networks is hierarchically organized. Example hierarchy: University > Science (CS, Math) and Arts (Drama, Music).
9
Recursive hierarchical network: (a) N=5, E=8; (b) N=25, E=56; (c) N=125, E=344.
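The three (N, E) pairs are reproduced by a simple recurrence. The sketch below assumes a construction that the slide does not state: each level keeps the previous module, adds four replicas of it, and wires the outer nodes of the replicas (4^(k+1) new edges in total) to the central node. The function name `hierarchical_counts` is hypothetical.

```python
def hierarchical_counts(levels):
    """Return [(N, E), ...] for the first `levels` recursion levels."""
    n, e = 5, 8  # level (a): N=5, E=8, as given on the slide
    counts = [(n, e)]
    for level in range(1, levels):
        # Assumed rule: 5 copies of the previous module
        # plus 4**(level + 1) new edges to the central node.
        e = 5 * e + 4 ** (level + 1)
        n = 5 * n
        counts.append((n, e))
    return counts


print(hierarchical_counts(3))  # [(5, 8), (25, 56), (125, 344)]
```

Under this assumption the recurrence is N_{k+1} = 5 N_k and E_{k+1} = 5 E_k + 4^(k+1), which matches all three panels.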
10
Intuition: find nodes that can be easily separated from the rest of the network. Various objective functions: min-cut, normalized cut, centrality, modularity. Various algorithms: spectral clustering (random walks), Girvan-Newman (centrality), Metis (contraction based). Girvan-Newman: 1) Compute betweenness centrality: the number of shortest paths passing through an edge. 2) Remove edges in order of decreasing centrality.
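The two Girvan-Newman steps can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes an unweighted, undirected, connected graph without self-loops, stored as a dict of neighbor sets, and it recomputes betweenness after every removal. The function names and the toy graph are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style accumulation)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path counts back from the farthest nodes toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered node pair was counted from both of its endpoints.
    return {e: b / 2.0 for e, b in bet.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove the highest-betweenness edge until the graph falls apart."""
    adj = {v: set(ns) for v, ns in adj.items()}
    while len(components(adj)) == 1 and any(adj.values()):
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman_split(toy))
```

On the toy graph the joining edge carries every shortest path between the two triangles, so it is removed first and the triangles come out as the two groups.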
12
Statistical properties of community structure: instead of searching for communities, we measure how well expressed communities are. Questions: What is the community structure of real-world networks? How do we measure and quantify it? What does this tell us about network structure? What is a good model (intuition)? What are the consequences for clustering/partitioning algorithms?
13
How community-like is a set of nodes? We need a natural, intuitive measure. Conductance (normalized cut): Φ(S) = (# edges cut) / (# edges inside S). A small Φ(S) corresponds to a more community-like set of nodes S.
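The score on this slide is easy to compute directly. The sketch below implements the slide's simplified ratio (edges cut over edges inside); note that standard conductance instead divides the cut by the smaller of the two volumes (sums of degrees). The name `community_score` and the toy graph are illustrative assumptions.

```python
def community_score(edges, S):
    """Slide's score: (# edges cut) / (# edges inside S). Lower is better."""
    S = set(S)
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    if inside == 0:
        return float("inf")  # no internal edges: not community-like at all
    return cut / inside


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(community_score(edges, {0, 1, 2}))  # 1 cut edge over 3 inside edges
print(community_score(edges, {1, 2}))     # 3 cut edges over 1 inside edge
```

The triangle scores 1/3 while the pair {1, 2} scores 3, so the triangle is the more community-like set.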
14
Score: Φ(S) = (# edges cut) / (# edges inside). What is the “best” community of 5 nodes?
15
Score: Φ(S) = (# edges cut) / (# edges inside). What is the “best” community of 5 nodes? Bad community: Φ = 5/6 ≈ 0.83.
16
Score: Φ(S) = (# edges cut) / (# edges inside). What is the “best” community of 5 nodes? Better community: Φ = 5/7 ≈ 0.71. Even better community: Φ = 2/5 = 0.4.
17
Score: Φ(S) = (# edges cut) / (# edges inside). What is the “best” community of 5 nodes? Better community: Φ = 5/7 ≈ 0.71. Even better community: Φ = 2/5 = 0.4. Best community: Φ = 2/8 = 0.25.
18
We define the network community profile (NCP) plot: plot the score of the best community of size k. Search over all subsets of size k and find the best, e.g. Φ(k=5) = 0.25. The NCP plot is intractable to compute exactly.
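To see why the exact NCP is intractable, here is a brute-force sketch that enumerates every subset of size k. It is only usable on tiny graphs, since the number of subsets grows combinatorially. The function names and the toy graph are assumptions made for illustration.

```python
from itertools import combinations


def score(edges, S):
    """(# edges cut) / (# edges inside S), or infinity with no inside edge."""
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    return cut / inside if inside else float("inf")


def ncp_brute_force(nodes, edges, max_k):
    """Best (lowest) score over all node subsets of each size 1..max_k."""
    return {
        k: min(score(edges, set(S)) for S in combinations(nodes, k))
        for k in range(1, max_k + 1)
    }


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
profile = ncp_brute_force(range(6), edges, max_k=3)
print(profile)
```

For this toy graph the profile drops from infinity at k=1 to 2.0 at k=2 and 1/3 at k=3, where a whole triangle fits.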
19
We define the network community profile (NCP) plot: plot the score of the best community of size k. [Figure: NCP plot. Axes: community size, log k, vs. log Φ(k). Marked points: k=5, Φ(k)=0.25; k=7, Φ(k)=0.18.]
20
[Figure: NCP plot. Axes: community size, log k, vs. community score, log Φ(k).]
21
Local spectral clustering algorithm: pick a seed node; slowly diffuse mass around it (via a PageRank-like random walk); find the bottleneck. Repeat many times: many seed nodes for very local walks, fewer seed nodes for more global (longer) walks.
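One common way to realize "diffuse mass, then find the bottleneck" is personalized PageRank followed by a sweep cut. The sketch below is an assumption about how this could look, not the authors' implementation: it uses plain power iteration, a teleport probability `alpha` chosen arbitrarily, and standard conductance (cut over the smaller volume) rather than the simplified slide score. It also assumes every node has at least one neighbor.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Random walk that restarts at `seed` with probability alpha."""
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha  # total mass is 1, so alpha of it teleports back
        for v, mass in p.items():
            share = (1 - alpha) * mass / len(adj[v])
            for w in adj[v]:
                nxt[w] += share
        p = nxt
    return p


def sweep_cut(adj, p):
    """Scan prefixes of nodes sorted by p[v]/deg(v); keep the best cut."""
    total_vol = sum(len(ns) for ns in adj.values())
    order = sorted(adj, key=lambda v: p[v] / len(adj[v]), reverse=True)
    S, vol, cut = set(), 0, 0
    best_set, best_phi = set(), float("inf")
    for v in order:
        S.add(v)
        vol += len(adj[v])
        inside = sum(1 for w in adj[v] if w in S)
        cut += len(adj[v]) - 2 * inside  # new cut edges minus healed ones
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break  # the prefix now covers the whole graph
        phi = cut / denom
        if phi < best_phi:
            best_set, best_phi = set(S), phi
    return best_set, best_phi


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
best_set, best_phi = sweep_cut(toy, personalized_pagerank(toy, seed=0))
print(best_set, best_phi)
```

Seeded at node 0, the walk concentrates on the first triangle and the sweep finds the single joining edge as the bottleneck. Varying the seed and `alpha` corresponds to the "many local walks, fewer global walks" idea on the slide.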
23
Dolphin social network: two communities of dolphins. [Figure: network and NCP plot]
24
Zachary’s university karate club social network: during the study the club split into 2. The split (squares vs. circles) corresponds to cut B. [Figure: network and NCP plot]
25
Collaborations between scientists in Networks. [Figure: network and NCP plot]
26
[Figure: network and NCP plot]
27
[Figure: network and NCP plot]
28
Manifold learning dataset (Hands). [Figure: network and NCP plot]
29
Eastern US power grid
30
Small social networks, geometric networks, and hierarchical networks have downward-sloping NCP plots. What about large networks? [Figure: network and NCP plot]
32
Previously, researchers examined the community structure of small networks (~100 nodes). We examined more than 70 different large networks. Large real-world networks look very different!
33
Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)
34
[Figure: NCP plot, community score vs. community size.] At small sizes, communities get better and better. Past the minimum, the best communities get worse and worse. The best community has 100 nodes.
35
Whiskers are responsible for the downward slope of the NCP plot. A whisker is a set of nodes connected to the rest of the network by a single edge. [Figure: NCP plot with the largest whisker marked]
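The whisker definition can be made concrete with a small, deliberately naive sketch. It assumes a connected, undirected graph given as a dict of neighbor sets, tests each edge by removing it and checking reachability, and treats the smaller side of every such single-edge cut as a whisker candidate. The function names and the toy graph are hypothetical.

```python
def reachable(adj, start, banned):
    """Nodes reachable from `start` without crossing the edge `banned`."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if frozenset((v, w)) == banned or w in seen:
                continue
            seen.add(w)
            stack.append(w)
    return seen


def whiskers(adj):
    """Node sets that hang off the rest of the graph by a single edge."""
    nodes, found, done = set(adj), [], set()
    for u in adj:
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in done:
                continue
            done.add(edge)
            side = reachable(adj, u, edge)
            if v in side:
                continue  # removing the edge did not disconnect anything
            other = nodes - side
            found.append(side if len(side) <= len(other) else other)
    return found


# Hypothetical toy graph: a 4-clique core {0,1,2,3} with a small tree
# {4,5,6} attached through the single edge (3, 4).
toy = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6}, 5: {4}, 6: {4},
}
print(max(whiskers(toy), key=len))
```

Checking every edge this way costs one graph traversal per edge, which is fine for a toy example; a linear-time bridge-finding algorithm would be the usual choice on large networks.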
36
Each node has twice as many children. Each new edge inside the community costs more: Φ = 1/3 ≈ 0.33, Φ = 2/4 = 0.5, Φ = 8/6 ≈ 1.3, Φ = 64/14 ≈ 4.6. [Figure: NCP plot]
37
Take a real network G and rewire its edges for a long time. We obtain a random graph with the same degree distribution as the real network G.
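A standard way to rewire while keeping every node's degree is the double edge swap: pick two edges (a, b) and (c, d) and replace them with (a, d) and (c, b). The sketch below assumes that this is the intended kind of rewiring, works on a simple undirected edge list, and rejects swaps that would create self-loops or duplicate edges. The names `rewire` and `degree_sequence` are illustrative.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Map each node to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization by repeated double edge swaps."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Hypothetical toy graph: a 10-node ring with two chords.
ring = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
shuffled = rewire(ring, n_swaps=50)
print(degree_sequence(ring) == degree_sequence(shuffled))  # True
```

Each accepted swap leaves the degrees of a, b, c, and d unchanged, so the degree sequence is preserved however many swaps run.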
38
Rewired network: random network with the same degree distribution
39
Whiskers in real networks are larger than expected
40
Whiskers in real networks are non-trivial (richer than trees). [Figure label: edge to cut]
41
What if we allow cuts that give disconnected communities? Cut all whiskers and compose communities out of whiskers. How good are the “communities” we get?
42
[Figure: community score vs. community size, comparing connected communities with a bag of whiskers.] We get better community scores when composing disconnected sets of whiskers.
43
Nothing happens! Now we have 2-edge-connected whiskers to deal with.
44
[Figure: NCP plots for connected communities, bag of whiskers, and the rewired network]
45
Network structure: core-periphery (jellyfish, octopus). Whiskers are responsible for good communities. The core of the network gets denser and denser. The core contains 60% of the nodes and 80% of the edges.
47
(Sparse) random graph: start with N nodes, then pick pairs of nodes uniformly at random and connect them. NCP plot: flat (long random connections). Theorem (works for any degree distribution). Sparsity does not explain our observation.
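The random-graph recipe translates almost line for line into code. This is a minimal sketch that assumes a simple graph (no self-loops, no repeated edges) and a requested edge count `m` of at most N(N-1)/2. The function name and parameters are illustrative.

```python
import random


def sparse_random_graph(n, m, seed=0):
    """Connect m distinct node pairs chosen uniformly at random."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)  # two distinct nodes
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)


graph = sparse_random_graph(n=100, m=150)
print(len(graph), graph[:3])
```

Because endpoints are chosen without regard to any geometry or existing structure, the edges are the "long random connections" the slide refers to.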
48
Preferential attachment [Price 1965, Albert & Barabasi 1999]: add a new node and create m out-links. The probability of linking to a node i is proportional to its degree k_i. Based on Herbert Simon’s result: power laws arise from “rich get richer” (cumulative advantage). NCP plot: flat (connections to hubs, no locality).
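Degree-proportional linking can be sketched with a pool that holds each node once per incident edge, so a uniform draw from the pool picks a node with probability proportional to its degree. The starting configuration (node m linked to nodes 0..m-1) and the rule that a new node's m targets are distinct are assumptions of this sketch, not details from the slide.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Grow a graph where each new node links to m degree-biased targets."""
    rng = random.Random(seed)
    # Assumed starting point: node m linked to nodes 0..m-1.
    edges = [(m, t) for t in range(m)]
    pool = [x for e in edges for x in e]  # one entry per edge endpoint
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))  # chance proportional to degree
        for t in sorted(chosen):
            edges.append((new, t))
            pool.extend((new, t))
    return edges


pa_edges = preferential_attachment(n=50, m=2)
print(len(pa_edges))  # 2 * (50 - 2) = 96
```

Nodes that already have many links occupy more slots in the pool, which is the "rich get richer" effect in miniature.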
49
Let’s exploit local connections. NCP plot: down (locally the network looks like a mesh), then flat (at large scale the network looks random).
50
Geometric preferential attachment: place nodes at random in 2D; pick a node; pick nodes within a radius; connect preferentially. NCP plot: flat (locally the network is random), then down (globally the network is a mesh, a union of local expanders).
51
Forest Fire: connections spread like a fire. A new node joins the network, selects a seed node, connects to some of its neighbors, and continues recursively. As a community grows, it blends into the core of the network.
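The recursive burning process can be sketched as follows. This is a simplified, undirected variant written for illustration, not the published Forest Fire model: the burning probability `p`, the geometric draw for how many neighbors to burn, and all names are assumptions.

```python
import random


def forest_fire(n, p=0.35, seed=0):
    """Grow an undirected graph by letting links spread like a fire."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        ambassador = rng.randrange(new)  # the seed node for this arrival
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:
            v = frontier.pop()
            candidates = sorted(adj[v] - burned)
            k = 0
            while rng.random() < p:  # geometric number of neighbors to burn
                k += 1
            for w in rng.sample(candidates, min(k, len(candidates))):
                burned.add(w)
                frontier.append(w)  # the fire continues from w
        adj[new] = set()
        for w in burned:  # the new node links to everything that burned
            adj[new].add(w)
            adj[w].add(new)
    return adj


ff = forest_fire(200)
print(sum(len(ns) for ns in ff.values()) // 2, "edges")
```

Every arrival links at least to its seed node, so the graph stays connected. Larger `p` lets fires spread further and produces a denser graph.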
52
[Figure: NCP plots for the rewired network and the bag of whiskers]
53
Whiskers: the largest whisker has ~100 nodes, independent of network size. Dunbar number: a person can maintain social relationships with at most 150 people. Core: the core has little structure (hard to cut), but still more structure than the random network.
54
Other researchers examined small networks, so they did not hit Dunbar’s limit. Some evidence: in the 400k-node Amazon co-purchasing network [Clauset et al. 2004], the largest community has 50% of all nodes and was labeled “Miscellaneous”. The karate club has no significant community structure [Newman et al. 2007].
55
Bond vs. identity communities. Multiple hierarchies that blur the community boundaries.
56
Ground truth: yes, use attributes and better link semantics.
57
The NCP plot is a way to analyze network community structure. Our results agree with previous work on small networks (which are commonly used for testing community-finding algorithms). But large networks are different: they have a whiskers + core structure. Small, well-isolated communities blend into the core of the network as they grow.