Presentation is loading. Please wait.

Presentation is loading. Please wait.

DEMON A Local-first Discovery Method For Overlapping Communities Giulio Rossetti 2,1,Michele Coscia 3, Fosca Giannotti 2, Dino Pedreschi 2,1 1 Computer.

Similar presentations


Presentation on theme: "DEMON A Local-first Discovery Method For Overlapping Communities Giulio Rossetti 2,1,Michele Coscia 3, Fosca Giannotti 2, Dino Pedreschi 2,1 1 Computer."— Presentation transcript:

1 DEMON A Local-first Discovery Method For Overlapping Communities Giulio Rossetti 2,1,Michele Coscia 3, Fosca Giannotti 2, Dino Pedreschi 2,1 1 Computer Science Dep., University of Pisa, Italy {rossetti,pedre}@di.unipi.it 2 ISTI - CNR KDDLab, Pisa, Italy {fosca.giannotti, giulio.rossetti}@isti.cnr.it 3 Harvard Kennedy School, Cambridge, MA, US michele_coscia@hks.harvard.edu April 23th 2013

2 Outline Problem Definition What is a community? Community Discovery Communities and complex (social) networks A matter of perspective DEMON Algorithm(s) Properties Experiments Extension Conclusions

3 What is a community? Unfortunately does not exist a completely shared definition of what a community is. A general idea is that a community represent: “A set of entities where each entity is closer, in the network sense, to the other entities within the community than to the entities outside it.” or “A set of nodes tightly connected within each other than with nodes belonging to other sets.”

4 Community Discovery The aim of CD algorithms is to identify communities hidden into complex network structure Why Community Discovery? “Cluster” homogeneous nodes relying on topological information (Clustering networked entities) Major Problems: Each algorithm models different properties of real world communities Comparison and evaluation of different methodologies is not trivial Found an acceptable compromise between number of communities and their sizes Context Dependency

5 Community Discovery Approaches Given the complexity of the problem a number of different typologies of approaches where proposed, analyzing: Directed\Undirected edges Weighted\Unweighted edges Top-Down\Bottom-Up partitioning Multidimensionality Overlap among Communities Hierarchical Communities … DEMON: Undirected, Bottom-Up, Overlapping (with Directed, Weighted, Hierarchical extensions)

6 Outline Problem Definition What is a community? Community Discovery Communities and complex (social) networks A matter of perspective DEMON Algorithm(s) Properties Experiments Extension Conclusions

7 Communities in (Social) Networks Communities can be seen as the basic bricks of a (social) network In simple, small, networks it is easy identify them by looking at the structure..

8 …but real world networks are not “simple” We can’t identify easily different communities Too many nodes and edges

9 Are they two different phenomena? No!

10 A Matter of Perspective The only difference is in the scale Locally, for each node the structure makes sense Globally, we are tangled in complex overlaps Idea: a bottom-up approach!

11 Outline Problem Definition What is a community? Community Discovery Communities and complex (social) networks A matter of perspective DEMON Algorithm(s) Properties Experiments Extension Conclusions

12 Reducing the complexity Real Networks are Complex Objects Can we make them “simpler”? Ego-Networks (networks builded upon a focal node, the "ego”, and the nodes to whom ego is directly connected to plus the ties, if any, among the alters)

13 DEMON Algorithm For each node n: 1. Extract the Ego Network of n 2. Remove n from the Ego Network 3. Perform a Label Propagation 1 4. Insert n in each community found 5. Update the raw community set C For each raw community c in C 1. Merge with “similar” ones in the set (given a threshold) (i.e. merge iff at most the ε% of the smaller one is not included in the bigger one) 1 Usha N. Raghavan, R ́eka Albert, and Soundar Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E

14 Each node has an unique label (i.e. its id) In the first (setup) iteration each node, with probability α, change its label to one of the labels of its neighbors; At each subsequent iteration each node adopt as label the one shared (at the end of the previous iteration) by the majority of its neighbors; We iterate untill consensus is reached. Label Propagation – The idea

15 Label Propagation – Discussion Why Label Propagation? Quasi-linear algorithm Share our idea of what a community is Problem: Ping-Pong effect (the algorithm is non-deterministic) Solution Multilabel allowed (we need overlapping communities after all…)

16 DEMON - Two nice properties Incrementality: Given a graph G, an initial set of communities C and an incremental update ∆G consisting of new nodes and new edges added to G, where ∆G contains the entire ego networks of all new nodes and of all the preexisting nodes reached by new links, then Those property makes the algorithm highly parallelizable: it can run independently on different fragments of the overall network with a relatively small combination work Compositionality: Consider any partition of a graph G into two subgraphs G1, G2 such that, for any node v of G, the entire ego network of v in G is fully contained either in G1 or G2. Then, given an initial set of communities C: DEMON(G 1 ∪ G 2,C) = Max(DEMON(G 1,C), DEMON(G 2,C)) DEMON(∆G ∪ G,C) = DEMON(∆ G, DEMON(G,C))

17 Experiments  Networks (with metadata) : Congress (nodes US politicians, connected if they co-sponsor the same bills) IMDb (nodes Actors, connected if they play in the same movies) Amazon (nodes Products, connected if they were purchased together)  Compared Algorithms: Infomap, non-overlapping state-of-the-art  Rosvall and Bergstrom “Maps of random walks on complex networks reveal community structure”, PNAS, 2008 HLC, overlapping state-of-the-art  Ahn, Bagrow and Lehmann “Link communities reveal multiscale complexity in networks”, Nature, 2010

18 Quality Evaluation – Community size number of communities average community size Amazon

19 Quality Evaluation - Label Prediction Multilabel Classificator (BRL, Binary Relevance Learner) Community memberships of a node as known attributes, real world labels (qualitative attributes) target to be predicted; IMDbCongress

20 Quality Evaluation - Community Cohesion How good is our community partition in describing real world knowledge about the clustered entities? “Similar nodes share more qualitative attributes than dissimilar nodes” Iff CQ(P)>1 we are grouping together similar nodes

21 HDemon – Hierarchical merge Why Hierarchical merge? 1. Classic DEMON Merge function did not scale well Complexity issue (~O(|C| 2 )) Bottleneck for huge networks (such as social graphs) 2. We need to find the right granularity for the communities Extensions needed for Label Propagation Algorithm: Weighted networks Directed networks (not yet used)

22 HDemon – Hierarchical merge HDemon(Graph G) Cc = connectedComponent(G) C = ExtractCommunities(G) while (|C|>Cc) For c in C: N <- N ∪ make_node(c) For (n,m) in N: If (n share nodes with m): E <- E ∪ (n,m) C <- ExtractCommunities(new Graph(N,E)) ExtractCommunities(Graph G) Egos <- EgoNetworks(G) for e in Egos: C = C ∪ LabelPropagation(e) return C 1 1 1 1 1 1 1 2 1 1 1 1

23 Outline Problem Definition What is a community? Community Discovery Communities and complex (social) networks A matter of perspective DEMON Algorithm(s) Properties Experiments Extension Conclusions

24 Future works – Framework structure HFDemon(Graph G) Cc ← |connectedComponent(G)| C ← ExtractCommunities(G) while (|C|>Cc) For c in C: N ← N ∪ make_node(c) For (n,m) in N: If n share nodes with m E ← E ∪ (n,m) C ← ExtractCommunities(new Graph(N,E)) ExtractCommunities(Graph G) C ← return C Different scenarios may require requires alternative communities “definitions”. Framework for Bottom-up (and overlapping) CD Regular vs. Hierarchical FDemon(Graph G) C ← Forall c in C(v) C ← Merge(C,c,merging_function) return C

25 Future Works – Social Community Evolution Thesis proposal “Evolution in Social Networks” Idea: 1. Social networks are not static objects Nodes, Edges can appear and disappear The same interaction could occur multiple times Communities changes consequently Major Problems 1. Size and granularity of the communities influence hevily evolutive models Hierarchical merging? 2. Which are the nodes prone to leave\join a communities? Role identification 3. How “strong” is a community? Community strength measure Community life-cycle

26 Conclusions DEMON approaches the community discovery problem trough the analysis of simple network sub-structures (ego-networks) Overlapping and Hierarchical algorithms are guided by a social perspective DEMON outperforms state-of-the-art methodologies Possible parallel implementation: high scalability

27 Bibliography

28 Thanks! Questions ? Code available @ http://kdd.isti.cnr.it/~giulio/demon/http://kdd.isti.cnr.it/~giulio/demon/ (extensions coming soon!)


Download ppt "DEMON A Local-first Discovery Method For Overlapping Communities Giulio Rossetti 2,1,Michele Coscia 3, Fosca Giannotti 2, Dino Pedreschi 2,1 1 Computer."

Similar presentations


Ads by Google