Overview of Communities in networks

Overview of Communities in networks
Ralucca Gera, Naval Postgraduate School, Monterey, California
rgera@nps.edu

What is a community?
A community is, roughly, a group of people with a common characteristic or shared interests. What do communities correspond to? Why do they form?

What is a community?
A community in a network is a subset of nodes that share common or similar characteristics, on the basis of which they are grouped. In a social network it might indicate a circle of friends; in the World Wide Web, a group of pages on closely related topics; in an email network, groups of emails with similar patterns or domains, or individuals who correspond on a regular basis.
Community detection: partitioning the nodes into communities.

What might influence a community?
Homophily: similar nodes cluster together, for example based on language, or based on degree (degree homophily).
Source: Virality Prediction and Community Structure in Social Networks, Yong-Yeol “YY” Ahn

Community Detection in Network Science
Communities appear naturally in real networks and are generally captured through the structural properties of the network: nodes tend to cluster based on common interests. The amount of research in this area since 2002 is massive. Because of its usefulness, community detection has become one of the most prominent research directions in network science, and it is one of the common analysis tools for understanding networks.

Overview
Fundamental concepts for clustering, based on density and topological structures.

Adjacency matrices of different types of networks
General way of viewing an adjacency matrix for large networks: dark = 1 (or weights), gray = 0.
Figure: (a) good spectral clustering, (b) core-periphery structure, (c) unstructured, (d) either way. Panel annotations: “Rarely found in real networks”, “Commonly found in real networks”, “Nodes of two types”, “Commonly found in real networks”.
Ref: “Think locally, act locally: Detection of small, medium-sized, and large communities in large networks” by Jeub et al., 2015

Adjacency matrices (some overlapping communities)
From Jure Leskovec: https://www.youtube.com/watch?v=htWQWN1xAZQ

Reality: possibly dense, overlapping communities (2 or 3 communities)
From Jure Leskovec: https://www.youtube.com/watch?v=htWQWN1xAZQ

General methodology (1)
General methodology from Leskovec’s paper (Stanford): (1) Data is modeled by an “interaction graph”. (2) Hypothesis: networks have communities whose members interact more strongly among themselves than with the outside world (more “internal edges” inside each community than “cut edges” connecting the community to the rest of the network). (3) An objective function or metric is chosen to formalize this idea of groups with more intra-group than inter-group connectivity.
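The hypothesis in step (2) can be made concrete by counting, for a candidate node set, the edges that stay inside it and the edges cut between it and the rest of the network. A minimal sketch, assuming an undirected edge list; the helper name `internal_and_cut_edges` and the two-triangle example graph are illustrative, not from the source:

```python
def internal_and_cut_edges(edges, community):
    """Count edges inside `community` and edges crossing its boundary.

    edges: iterable of (u, v) pairs of an undirected graph.
    community: iterable of nodes forming the candidate group.
    """
    members = set(community)
    internal = cut = 0
    for u, v in edges:
        ends_inside = (u in members) + (v in members)  # 0, 1 or 2
        if ends_inside == 2:
            internal += 1
        elif ends_inside == 1:
            cut += 1
    return internal, cut


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(internal_and_cut_edges(edges, {0, 1, 2}))  # → (3, 1)
```

A good community under this hypothesis has many internal edges relative to cut edges; metrics such as modularity and conductance turn this comparison into a single score.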

General methodology (2)
(4) An algorithm is then selected to find communities that optimize the function. (5) The communities are then evaluated in some way. For example, one may map the sets of nodes back to the real world to see whether they make intuitive sense as a plausible social community. Alternatively, one may have labeled data (ground truth) to compare them with.
How can one identify communities?

Clustering methodologies
Non-overlapping: Louvain method, Girvan-Newman algorithm, minimum-cut method, modularity maximization
Overlapping: Clique Percolation

Non-overlapping communities (node partitioning into communities)

Modularity
Define modularity as: Q = (fraction of edges within communities) − (expected fraction of such edges in a null-model network of the same size), where the “expected” value comes from a null model to compare our network against: networks with the same n and m in which edges are placed at random (e.g., ER or configuration model). Modularity is a scalar value between −1 and 1 that compares the density of edges inside communities with the density of edges between communities; larger values of Q indicate stronger community structure.
Goal: assign nodes to communities so as to maximize Q.
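For the common configuration-model null model (the Newman–Girvan form), modularity can be written as $Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j) = \sum_c \left[\frac{L_c}{m} - \left(\frac{d_c}{2m}\right)^2\right]$, where $L_c$ is the number of edges inside community $c$ and $d_c$ is the sum of the degrees of its nodes. A minimal sketch of the second form, assuming an unweighted, undirected edge list; the function name and example graph are illustrative:

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs; partition: dict node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of the nodes in c
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 3))  # → 0.357
```

Putting every node into a single community always gives Q = 0, which is why a positive Q signals more internal edges than the null model expects.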

Louvain method (partition the nodes)
Goal: optimize modularity, which in theory yields the best possible grouping of the nodes of a given network (what counts as “best” depends on the function of the network and the reason behind the clustering).
The Louvain method of community detection: first find small communities by optimizing modularity locally on all nodes; then group each small community into a single node and repeat the first step.
Visualization: https://www.youtube.com/watch?v=dGa-TXpoPz8
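The first step (local modularity optimization) can be sketched as below. This is a minimal sketch of the local-moving phase only, assuming an unweighted, undirected simple graph stored as a dict of neighbour sets; `louvain_local_moving` is a hypothetical name, and the aggregation phase is omitted. Each node is taken out of its community and placed in the neighbouring community with the largest gain $k_{u,C} - \Sigma_{tot}(C)\,k_u/(2m)$, which is the standard Louvain modularity gain up to a constant factor $1/m$.

```python
import random
from collections import defaultdict


def louvain_local_moving(adj, seed=0):
    """Local-moving phase of Louvain: greedily move single nodes.

    adj: dict node -> set of neighbours (undirected, unweighted, no
    self-loops, at least one edge). Returns a list of node sets.
    """
    rng = random.Random(seed)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    comm = {u: u for u in adj}   # every node starts in its own community
    tot = dict(deg)              # sum of degrees inside each community
    nodes = list(adj)
    moved = True
    while moved:                 # sweep until no node changes community
        moved = False
        rng.shuffle(nodes)
        for u in nodes:
            old = comm[u]
            tot[old] -= deg[u]   # take u out of its current community
            links = defaultdict(int)  # edges from u into each community
            for v in adj[u]:
                links[comm[v]] += 1
            # Gain of staying, then of each neighbouring community.
            best = old
            best_gain = links.get(old, 0) - tot[old] * deg[u] / (2 * m)
            for c, k_in in links.items():
                g = k_in - tot[c] * deg[u] / (2 * m)
                if g > best_gain:  # strict: only move on a real improvement
                    best, best_gain = c, g
            tot[best] += deg[u]
            comm[u] = best
            if best != old:
                moved = True
    groups = defaultdict(set)
    for u, c in comm.items():
        groups[c].add(u)
    return list(groups.values())


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(sorted(sorted(c) for c in louvain_local_moving(dict(adj))))
# → [[0, 1, 2], [3, 4, 5]]
```

Because a node only moves on a strict gain, every move increases modularity, so the sweep terminates.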

Louvain method (2)
Simple, efficient and easy to implement (implemented in NetworkX, Matlab, C++, and Gephi). Designed for community detection in large networks, with sizes up to 100 million nodes and billions of links; the analysis of a typical network of 2 million nodes takes about 2 minutes on a standard PC. The method unveils hierarchies of communities and allows one to zoom within communities to discover sub-communities, sub-sub-communities, etc. It is today one of the most widely used methods for detecting communities in large networks.
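The aggregation step of the Louvain method (each small community is grouped into one node before the local optimization is repeated) can be sketched as follows, assuming a weighted, undirected edge list and mutually comparable community ids; `aggregate` is a hypothetical helper name. Edges inside a community become a self-loop carrying the total internal weight, and repeating the local optimization on this smaller graph is what produces the hierarchy of communities mentioned above.

```python
from collections import defaultdict


def aggregate(weighted_edges, community_of):
    """Collapse each community into a single node.

    weighted_edges: iterable of (u, v, w) for an undirected graph.
    community_of: dict node -> community id (ids must be comparable).
    Returns {(c1, c2): total weight}; internal edges become (c, c).
    """
    agg = defaultdict(float)
    for u, v, w in weighted_edges:
        # Sort the pair so (A, B) and (B, A) land on the same key.
        a, b = sorted((community_of[u], community_of[v]))
        agg[(a, b)] += w
    return dict(agg)


edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 1),
         (3, 4, 1), (3, 5, 1), (4, 5, 1)]
part = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(aggregate(edges, part))
# → {('A', 'A'): 3.0, ('A', 'B'): 1.0, ('B', 'B'): 3.0}
```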

Girvan Newman’s method (partition the nodes)
The Girvan–Newman algorithm detects communities by progressively removing edges with high betweenness centrality from the original network; these edges are believed to connect communities. The algorithm stops when there are no edges between the identified communities. Implemented in R and Python.
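A compact sketch of the first split of the Girvan–Newman procedure, assuming an unweighted, undirected graph stored as a dict of neighbour sets. Edge betweenness is computed with Brandes-style breadth-first searches, and the highest-scoring edge is removed (recomputing the scores after every removal) until the graph falls apart into more components. Continuing to remove edges yields the full hierarchy (a dendrogram); in practice the level with the highest modularity is usually reported. All function names are illustrative.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes-style).

    adj: dict node -> set of neighbours. Returns {frozenset({u, v}): score}.
    """
    scores = {}
    for source in adj:
        # BFS from source: count shortest paths and record predecessors.
        dist = {source: 0}
        sigma = dict.fromkeys(adj, 0)
        sigma[source] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to source.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                scores[edge] = scores.get(edge, 0.0) + share
                delta[v] += share
    # Every node pair was counted once from each endpoint.
    return {edge: s / 2 for edge, s in scores.items()}


def connected_components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph splits."""
    graph = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    n_parts = len(connected_components(graph))
    while any(graph.values()):
        scores = edge_betweenness(graph)
        u, v = tuple(max(scores, key=scores.get))
        graph[u].discard(v)
        graph[v].discard(u)
        parts = connected_components(graph)
        if len(parts) > n_parts:
            return parts
    return connected_components(graph)


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(sorted(p) for p in girvan_newman_split(adj)))
# → [[0, 1, 2], [3, 4, 5]]
```

In this example the bridge (2, 3) lies on all 9 shortest paths between the two triangles, so it is the first edge removed.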

Girvan Newman’s method (2)

Overlapping communities

Cliques
Clique: a maximal complete subgraph, i.e., one in which all nodes are adjacent to each other. Finding the maximum clique in a network is NP-hard, and a straightforward implementation for finding cliques is very expensive in time complexity. Example: nodes 5, 6, 7 and 8 form a clique.
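Enumerating all maximal cliques is exponential in the worst case, but the Bron–Kerbosch algorithm with pivoting is a standard exact approach that works well on small or sparse graphs. A minimal sketch follows; the example edge list is an assumption, chosen so that its triangles match the CPM example in this deck, and in it nodes 5, 6, 7 and 8 do form a clique.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already processed.
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


edges = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(sorted(sorted(c) for c in maximal_cliques(adj)))
# → [[1, 2, 3], [1, 3, 4], [4, 5, 6], [5, 6, 7, 8]]
```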

Clique Percolation Method (CPM)
Cliques are normally used as cores or seeds to find larger communities. The Clique Percolation Method finds overlapping communities.
Input: a parameter k and a network.
Procedure: find all cliques of size k in the given network; construct a clique graph, in which two cliques are adjacent if they share k−1 nodes; the nodes of the cliques in each connected component of the clique graph together form a community.

CPM Example
Parameter k = 3. Cliques of size 3: {1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 6, 8}, {5, 7, 8}, {6, 7, 8}.
Communities (connected components of the clique graph): {1, 2, 3, 4} and {4, 5, 6, 7, 8}; node 4 belongs to both.
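The CPM procedure and the example above can be reproduced with a short sketch. It assumes an undirected edge list; the edge list is inferred from the listed 3-cliques, and `k_clique_communities` is an illustrative name. The k-cliques are found by brute force over node combinations (only practical for small graphs), and the clique graph's connected components are tracked with a union-find keyed on shared (k−1)-node faces.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # 1) All k-cliques (brute force: fine only for small graphs).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    # 2) Clique graph via union-find: cliques sharing k-1 nodes are linked.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_with_face = {}
    for i, clique in enumerate(cliques):
        for face in combinations(sorted(clique), k - 1):
            if face in first_with_face:
                parent[find(i)] = find(first_with_face[face])
            else:
                first_with_face[face] = i
    # 3) Union of the nodes in each connected component = one community.
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())


edges = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (6, 7), (5, 8), (6, 8), (7, 8)]
print(sorted(sorted(c) for c in k_clique_communities(edges, 3)))
# → [[1, 2, 3, 4], [4, 5, 6, 7, 8]]
```

Node 4 appears in both output communities, which is exactly the overlap that node-partitioning methods cannot express.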

Community Detection evaluation

Community detection evaluation
Map the sets of nodes back to the real world to see whether they make intuitive sense as a plausible social community.
Acquire some form of ground truth, in which case the set of nodes output by the algorithm can be compared with it (e.g., using Normalized Mutual Information, NMI).
Modularity and conductance are popular theoretical metrics for evaluating the quality of communities; the Network Community Profile identifies the best community among all communities of the same size.
Create an application and use the derived community structure.
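Comparing a detected partition with ground truth via NMI can be sketched as below. NMI has several normalizations; this sketch assumes the common arithmetic-mean form $\mathrm{NMI}(X,Y) = 2I(X;Y)/(H(X)+H(Y))$, with both partitions given as per-node label lists in the same node order. The function name is illustrative.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes.

    labels_a[i] and labels_b[i] are the community labels of node i.
    Uses NMI = 2 I(A; B) / (H(A) + H(B)).
    """
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    if h_a + h_b == 0:  # both partitions put every node in one community
        return 1.0
    return 2 * mutual / (h_a + h_b)


detected = [0, 0, 0, 1, 1, 1]
ground_truth = ["x", "x", "x", "y", "y", "y"]
print(round(nmi(detected, ground_truth), 6))  # → 1.0
```

NMI ignores the label names themselves: identical groupings score 1 and statistically independent groupings score 0.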

Network Community Profile
The network community profile (NCP) was introduced in Ref. [1]. Given a community “quality” score, i.e., a formalization of the idea of a “good” community, the NCP plots the score of the best community of a given size as a function of community size. With conductance as the score:
$$\phi(S) = \frac{s}{e}, \qquad \Phi(k) = \min_{S \subset V,\ |S| = k} \phi(S),$$
where s is the number of edges between the community S and its complement and e is the sum of the degrees of the nodes in S (taking S to be the side with the smaller degree sum); lower conductance means a better-separated community.
Ref: “Think locally, act locally: Detection of small, medium-sized, and large communities in large networks” by Jeub et al., 2015
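Conductance and the NCP can be computed by brute force on a toy graph. This sketch assumes an undirected graph stored as a dict of neighbour sets and uses the usual denominator $\min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))$ from Leskovec et al., which reduces to s/e when S is the smaller side. Enumerating every node set of each size is exponential, so real NCP computations rely on approximation algorithms instead; the function names here are illustrative.

```python
from itertools import combinations


def conductance(adj, community):
    """Conductance s / min(vol(S), vol(V - S)) of node set S.

    adj: dict node -> set of neighbours (undirected graph).
    """
    members = set(community)
    cut = sum(1 for u in members for v in adj[u] if v not in members)  # s
    vol_s = sum(len(adj[u]) for u in members)                          # e
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_s
    return cut / min(vol_s, vol_rest)


def network_community_profile(adj, max_size):
    """Brute-force NCP: lowest conductance over all node sets of each size."""
    nodes = sorted(adj)
    return {k: min(conductance(adj, s) for s in combinations(nodes, k))
            for k in range(1, max_size + 1)}


# Two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(network_community_profile(adj, 3))
# → {1: 1.0, 2: 0.5, 3: 0.14285714285714285}
```

The dip at size 3 corresponds to either triangle, which is cut by only the single bridge edge.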

Network Community Profile

Generative models preserving community structure

ReCoN: Christian L. Staudt, Aleksejs Sazonovs, Henning Meyerhenke: NetworKit: A Tool Suite for Large-scale Complex Network Analysis. Network Science, to appear 2016. https://networkit.iti.kit.edu/

ReCoN Algorithm Example https://networkit.iti.kit.edu/

Main references
Some text and pictures in this presentation were taken from:
[1] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, “Statistical Properties of Community Structure in Large Social and Information Networks”.
[2] Conversations and PPT from Mason Porter, Oxford.
[3] https://networkit.iti.kit.edu/

Main references
[1] M. Kivelä, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, and M. A. Porter, “Multilayer networks”, Journal of Complex Networks 2(3), 203–271 (2014).
[2] L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney, “Think locally, act locally: Detection of small, medium-sized, and large communities in large networks”, Physical Review E 91, 012821 (2015).
[3] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, Internet Math. 6, 29 (2009).
[4] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices”, Physical Review E 74, 036104 (2006).
[5] C. C. Aggarwal and H. Wang, “Graph data management and mining: A survey of algorithms and applications”, in Managing and Mining Graph Data, Springer US, 2010, pp. 13–68.

Surveys
F. D. Malliaros and M. Vazirgiannis, “Clustering and community detection in directed networks: A survey”, Physics Reports 533(4), 95–142 (2013).
Social media: http://link.springer.com/article/10.1007/s10618-011-0224-z#page-1
Graph mining and management (clustering networks): C. C. Aggarwal and H. Wang, “Graph data management and mining: A survey of algorithms and applications”, in Managing and Mining Graph Data, Springer US, 2010, pp. 13–68.
Encyclopedia of Distances

General reference papers
M. A. Porter, J.-P. Onnela, and P. J. Mucha, “Communities in networks”, Notices of the AMS 56(9), 1082–1097 (2009).
S. V. N. Vishwanathan et al., “Graph kernels”, Journal of Machine Learning Research 11, 1201–1242 (2010).
Fast computation of random walk kernels: K. M. Borgwardt, N. N. Schraudolph, and S. V. N. Vishwanathan, “Fast computation of graph kernels”, Advances in Neural Information Processing Systems, 2006.
An alternative to kernels using graphlets: N. Shervashidze et al., “Efficient graphlet kernels for large graph comparison”, International Conference on Artificial Intelligence and Statistics, 2009.
Shortest-path kernels: K. M. Borgwardt and H.-P. Kriegel, IEEE International Conference on Data Mining (ICDM’05), 2005.
Robustness in modular structure
Relative centrality and local community