Community detection in graphs

Slides:



Advertisements
Similar presentations
Class 12: Communities Network Science: Communities Dr. Baruch Barzel.
Advertisements

Fast algorithm for detecting community structure in networks M. E. J. Newman Department of Physics and Center for the Study of Complex Systems, University.
Social network partition Presenter: Xiaofei Cao Partick Berg.
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Note to other teachers and users of these.
Introduction to Graph Cluster Analysis. Outline Introduction to Cluster Analysis Types of Graph Cluster Analysis Algorithms for Graph Clustering  k-Spanning.
CSE 5243 (AU 14) Graph Basics and a Gentle Introduction to PageRank 1.
Community Detection Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69,
Peer-to-Peer and Social Networks Centrality measures.
Graph Partitioning Dr. Frank McCown Intro to Web Science Harding University This work is licensed under Creative Commons Attribution-NonCommercial 3.0Attribution-NonCommercial.
 Graph Graph  Types of Graphs Types of Graphs  Data Structures to Store Graphs Data Structures to Store Graphs  Graph Definitions Graph Definitions.
V4 Matrix algorithms and graph partitioning
CS 599: Social Media Analysis University of Southern California1 The Basics of Network Analysis Kristina Lerman University of Southern California.
Networks. Graphs (undirected, unweighted) has a set of vertices V has a set of undirected, unweighted edges E graph G = (V, E), where.
Fast algorithm for detecting community structure in networks.
Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Network Measures Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Measures Klout.
Community Structure in Social and Biological Network
Class 2: Graph theory and basic terminology Learning the language Network Science: Graph Theory 2012 Prof. Albert-László Barabási Dr. Baruch Barzel, Dr.
7.1 and 7.2: Spanning Trees. A network is a graph that is connected –The network must be a sub-graph of the original graph (its edges must come from the.
A Graph-based Friend Recommendation System Using Genetic Algorithm
Network Community Behavior to Infer Human Activities.
Online Social Networks and Media Community detection 1.
Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.
Walks, Paths and Circuits. A graph is a connected graph if it is possible to travel from one vertex to any other vertex by moving along successive edges.
Graph Theory. undirected graph node: a, b, c, d, e, f edge: (a, b), (a, c), (b, c), (b, e), (c, d), (c, f), (d, e), (d, f), (e, f) subgraph.
Community detection via random walk Draft slides.
Class 2: Graph Theory IST402. Can one walk across the seven bridges and never cross the same bridge twice? Network Science: Graph Theory THE BRIDGES OF.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Network Theory: Community Detection Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale.
Class 2: Graph Theory IST402.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Selected Topics in Data Networking Explore Social Networks:
Importance Measures on Nodes Lecture 2 Srinivasan Parthasarathy 1.
Analysis of Massive Data Sets Prof. dr. sc. Siniša Srbljić Doc. dr. sc. Dejan Škvorc Doc. dr. sc. Ante Đerek Faculty of Electrical Engineering and Computing.
Online Social Networks and Media
Social Networks Strong and Weak Ties
Lecture 20. Graphs and network models 1. Recap Binary search tree is a special binary tree which is designed to make the search of elements or keys in.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Modular organization.
Graph clustering to detect network modules
Social Media Analytics
Computational Molecular Biology
Graphs and Social Networks
Groups of vertices and Core-periphery structure
School of Computing Clemson University Fall, 2012
Minimum Spanning Tree 8/7/2018 4:26 AM
Greedy Algorithm for Community Detection
Network analysis.
Online Social Networks and Media
CS120 Graphs.
Network Science: A Short Introduction i3 Workshop
Peer-to-Peer and Social Networks
Finding modules on graphs
Graphs Representation, BFS, DFS
Michael L. Nelson CS 495/595 Old Dominion University
Why Social Graphs Are Different Communities Finding Triangles
Graph Operations And Representation
Shortest Path Trees Construction
CSE 373: Data Structures and Algorithms
CS 584 Project Write up Poster session for final Due on day of final
Weighted Graphs & Shortest Paths
Social Network Analysis - Lecture 2 - *
Warm Up – Tuesday Find the critical times for each vertex.
Special Graphs: Modeling and Algorithms
Lecture 10 Graph Algorithms
Chapter 9 Graph algorithms
Analysis of Large Graphs: Overlapping Communities
For Friday Read chapter 9, sections 2-3 No homework
More Graphs Lecture 19 CS2110 – Fall 2009.
Presentation transcript:

Community detection in graphs DATA MINING LECTURE 12 Community detection in graphs

Communities Real-life graphs are not random E.g., in a social network people pick their friends based on their common interests and activities We expect that the nodes in a graph will be organized in communities Groups of vertices which probably share common properties and/or play similar roles within the graph How do we find them? Nodes in communities will be densely connected to each other, and sparsely connected with other communities Sounds familiar?

Can we identify node groups? (communities, modules, clusters) NCAA Football network Can we identify node groups? (communities, modules, clusters) Nodes: Football Teams Edges: Games played

NCAA conferences Nodes: Football Teams Edges: Games played

Protein-Protein interaction networks Can we identify functional modules? Nodes: Proteins Edges: Physical interactions

Functional modules Nodes: Proteins Edges: Physical interactions

Stanford Facebook network Can we identify social communities? Nodes: Facebook Users Edges: Friendships

Social communities High school Stanford (Basketball) Stanford (Squash) Summer internship Stanford (Basketball) Stanford (Squash) Nodes: Facebook Users Edges: Friendships

Community types Overlapping communities vs non-overlapping communities

Non-Overlapping communities Dense connectivity within the community, sparse across communities Nodes Nodes Adjacency matrix Network

Overlapping communities

Community detection as clustering In many ways community detection is just clustering on graphs. We can apply clustering algorithms on the adjacency matrix (e.g., k-means) We can define a distance or similarity measure between nodes in the graph and apply other algorithms (e.g., hierarchical clustering) Similarity using jaccard similarity on the neighbors sets Distance using shortest paths or random walks. There are also algorithms that are specific to graphs

The Girvan-Newman method Hierarchical divisive method Start with the whole graph Find edges whose removal “partitions” the graph Repeat with each subgraph until single vertices Which edge to remove?

The Girvan-Newman method Select cut-edges (a.k.a. bridge edges): edges that when removed they disconnect the graph There may be many of those

The Girvan-Newman method Select cut-edges (a.k.a. bridge edges): edges that when removed they disconnect the graph Or, more often, there may be none

The Girvan-Newman method Select cut-edges (a.k.a. bridge edges): edges that when removed they disconnect the graph Or, more often, there may be none

Edge importance We need a measure of how important an edge is in keeping the graph connected Edge betweenness: Number of shortest paths that pass through the edge

Edge Betweeness Betweeness of edge (𝑎,𝑏) (𝐵(𝑎,𝑏)): For each pair of nodes 𝑥,𝑦 compute the number of shortest paths that include (𝑎,𝑏) There may be multiple shortest paths between 𝑥,𝑦 (𝑆𝑃(𝑥,𝑦)). Compute the fraction of those that pass through (𝑎,𝑏) Assumes a unit of traffic flow between (𝑥,𝑦) 𝐵 𝑎,𝑏 = 𝑥,𝑦∈𝑉 |𝑆𝑃 𝑥,𝑦 𝑡ℎ𝑎𝑡 𝑖𝑛𝑐𝑙𝑢𝑑𝑒 𝑎,𝑏 | |𝑆𝑃 𝑥,𝑦 | Betweenness computes the probability of an edge to occur on a randomly chosen shortest path between two randomly chosen nodes.

Examples 3x11 = 33 1x12 = 12 7x7 = 49 1 D A F E H G B C b=16 b=7.5

The Girvan Newman Algorithm Given an undirected unweighted graph: Repeat until no edges are left: Compute the edge betweeness for all edges Remove the edge with the highest betweeness At each step of the algorithm, the connected components are the communities Gives a hierarchical decomposition of the graph into communities

Girvan Newman method: An example Betweenness(7, 8)= 7x7 = 49 Betweenness(1, 3) = 1X12=12 Betweenness(3, 7) = Betweenness(6, 7) = Betweenness(8, 9) = Betweenness(8, 12)= 3X11=33

Girvan-Newman: Example 12 1 33 49 Need to re-compute betweenness at every step

Girvan Newman method: An example Betweenness(1, 3) = 1X5=5 Betweenness(3,7) = Betweenness(6,7) = Betweenness(8,9) = Betweenness(8,12) = 3X4=12

Girvan Newman method: An example Betweenness of every edge = 1

Girvan Newman method: An example

Girvan-Newman: Example Step 1: Step 2: Hierarchical network decomposition: Step 3:

Another example 5X5=25

Another example 5X6=30 5X6=30

Another example

Girvan-Newman: Results Zachary’s Karate club: Hierarchical decomposition

Girvan-Newman: Results Communities in physics collaborations

How to Compute Betweenness? Want to compute betweenness of paths starting from node 𝐴

Computing Betweenness Perform a BFS starting from A Determine the number of shortest path from A to each other node Based on these numbers, determine the amount of flow from A to all other nodes that uses each edge

Computing Betweenness: step 1 Initial network BFS from A

Computing Betweenness: step 2 Count how many shortest paths from A to a specific node Level 1 Level 2 Top-down Level 3 Level 4

Computing Betweeness: Step 3 Compute betweenness by working up the tree: For every node there is a unit of flow destined for that node that it is divided fractionally to the edges that reach that node There is a unit of flow to K that reaches K through edges (I,K) and (J,K) Since there are 3 paths from I to K and 3 from J, each edge gets ½ of the flow: Betweeness ½ Bottom-up

Computing Betweeness: Step 3 Compute betweenness by working up the tree: If the node has descendants in the BFS DAG, we also need to take into account the flow that passes from that node towards the descendants For node I, there is a unit of flow to I from A, but also ½ of flow that passes from I towards K (we have computed that as the betweeness of edge (I,K)): Total flow 3/2 There are 2 paths from F to I and 1 path from G to I edge (F,I) gets 2/3 of the total flow: Betweeness 2/3*3/2 = 1 Edge (G,I) gets 1/3 of the total flow: Betweeness 2/3*3/2 = 1 Bottom-up

Computing Betweeness Repeat the process for all nodes and take the sum

Example

Example

Computing Betweenness Issues Scalability Test for connectivity? Re-compute all paths, or only those affected Parallel computation Sampling