School of Information University of Michigan SI 614 Finding communities in networks Lecture 18.



Outline. Review: identifying motifs; k-cores; max-flow/min-cut. Hierarchical clustering. Block models. Community finding based on removal of high-betweenness edges (slow). Clustering based on modularity; spectral methods. Bridges, brokers, bi-cliques and structural holes. If there’s time: Mark Newman’s spectral clustering methods (extra slides).

Motifs. Given a particular structure, search for it in the network, e.g. complete triads. Advantage: motifs can correspond to particular functions, e.g. in biological networks. Disadvantage: we don’t know whether a motif is part of a larger cohesive community.
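A minimal sketch of the simplest motif search mentioned on this slide, finding complete triads (triangles). The adjacency-dictionary representation, the function name and the toy graph are assumptions made for illustration; they do not come from the lecture.

```python
from itertools import combinations

def find_triangles(adj):
    """Return every complete triad (triangle) as a sorted tuple of nodes."""
    triangles = set()
    for u in adj:
        # Two neighbours of u that are themselves linked close a triangle.
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                triangles.add(tuple(sorted((u, v, w))))
    return sorted(triangles)

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
triangles = find_triangles(toy_graph)
print(triangles)
```

Note that the search reports the triangle but says nothing about whether it sits inside a larger cohesive group, which is the disadvantage the slide points out.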

k-cores. Each node within a group is connected to at least k other nodes in the group. But even this is too stringent a requirement for identifying natural communities. [Figure: example networks with 2-core, 3-core and 4-core marked]
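One common way to extract a k-core is to peel away low-degree nodes until none remain. The sketch below assumes an undirected graph stored as a dictionary of neighbour sets; the function name and the toy graph are illustrative only.

```python
def k_core(adj, k):
    """Return the nodes of the k-core by repeatedly dropping nodes of degree < k."""
    remaining = {node: set(nbrs) for node, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                # Removing a node lowers the degree of each of its neighbours.
                for nbr in remaining.pop(node):
                    remaining[nbr].discard(node)
                changed = True
    return set(remaining)

# Hypothetical toy graph: a 4-clique a-b-c-d, node e tied to a and b, leaf f tied to e.
toy_graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d", "e"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a", "b", "f"},
    "f": {"e"},
}
core3 = k_core(toy_graph, 3)
core2 = k_core(toy_graph, 2)
print(sorted(core3), sorted(core2))
```

In this toy graph, node e has three ties but still drops out of the 3-core once its leaf neighbour f is peeled away, which illustrates how strict the requirement is.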

Min-cut / max-flow. The maximum flow between vertices A and B in a graph is exactly the weight of the smallest set of edges whose removal partitions the graph in two, with A and B in different components. Advantage: works on directed graphs. Disadvantages: need to know how to pick a source and a sink in two different communities, or else reformulate the problem; the number of partitions desired is not known ahead of time.
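The max-flow/min-cut equivalence can be checked on a small example. The sketch below uses the Edmonds-Karp algorithm (breadth-first augmenting paths), which is one standard choice and not necessarily the one used in the course; the capacity dictionary and node names are hypothetical.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp. Returns (flow value, nodes on the source side of a minimum cut)."""
    # Residual capacities, with zero-capacity reverse arcs added.
    residual = {source: {}, sink: {}}
    for (u, v), cap in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v].setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a path that still has spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # Everything still reachable from the source is one side of the cut.
            return total, set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck

# Hypothetical directed toy graph: x -> y is the only link between the two sides.
toy_capacity = {("A", "x"): 3, ("x", "y"): 1, ("y", "B"): 3, ("A", "w"): 2, ("w", "x"): 2}
flow, source_side = max_flow(toy_capacity, "A", "B")
print(flow, sorted(source_side))
```

The source and sink have to be supplied by hand here, which is exactly the practical difficulty the slide lists as a disadvantage.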

Community finding vs. other approaches. Social and other networks have a natural community structure. We want to discover this structure rather than impose a certain size of community or fix the number of communities. Without “looking”, can we discover community structure in an automated way?

Especially where the community structure isn’t apparent or the networks are large: is there community structure?

Football conferences (edges: teams that played each other)

Traditional methods: hierarchical clustering. Compute weights W_ij for each pair of vertices. Choices: (1) the number of node-independent paths between the vertices, which equals the minimum number of vertices that must be removed from the graph to disconnect i and j from one another (figure example: W_ij = 2); (2) the number of all paths between the vertices, weighted by length of path (α^L, α < 1).

Hierarchical clustering. Process: after calculating the weights W_ij for all pairs of vertices, start with all n vertices disconnected and add edges between pairs one by one in order of decreasing weight. Result: nested components, where one can take a ‘slice’ at any level of the tree.
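The process on this slide can be sketched with a union-find structure that records the components each time adding an edge merges two of them. The weights below are made-up numbers; in practice they would come from one of the path-count measures described earlier.

```python
def agglomerate(nodes, weights):
    """Add pairs in order of decreasing weight; return (weight, components) per merge."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the component representative.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    levels = []
    for (i, j), w in sorted(weights.items(), key=lambda item: -item[1]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            groups = {}
            for n in nodes:
                groups.setdefault(find(n), []).append(n)
            levels.append((w, sorted(sorted(g) for g in groups.values())))
    return levels

# Hypothetical pair weights for four vertices.
toy_nodes = ["a", "b", "c", "d"]
toy_weights = {("a", "b"): 5.0, ("c", "d"): 4.0, ("b", "c"): 1.0, ("a", "c"): 0.5}
levels = agglomerate(toy_nodes, toy_weights)
for weight, parts in levels:
    print(weight, parts)
```

Each entry of `levels` is one ‘slice’ of the tree: cutting at a higher weight gives more, smaller components.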

An example we’ve seen already. Ravasz et al.: hierarchical modularity. W_ij = topological overlap: W_ij = J_n(i,j) / min(k_i, k_j), where J_n(i,j) is the number of nodes that both i and j link to (+1 if i and j link to each other) and k_i is the degree of node i. Topological overlap -> regular equivalence (more on this and block modeling in a bit).
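The topological overlap weight can be computed directly from neighbour sets. This is a small sketch under the definition given on the slide; the graph is a hypothetical pair of triangles joined by one edge, and nodes are assumed to have at least one neighbour.

```python
def topological_overlap(adj, i, j):
    """J_n(i, j) / min(k_i, k_j), with +1 in J_n when i and j are linked directly."""
    shared = len(adj[i] & adj[j])
    linked = 1 if j in adj[i] else 0
    return (shared + linked) / min(len(adj[i]), len(adj[j]))

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
within = topological_overlap(toy_graph, "a", "b")
across = topological_overlap(toy_graph, "c", "d")
print(within, across)
```

Pairs inside a triangle get a high overlap, while the pair c, d that connects the two triangles gets a low one, so clustering on this weight tends to keep each triangle together.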

Hierarchical clustering in Pajek. Procedure: (1) generate a complete cluster using Cluster->Create Complete Cluster; (2) compute the dissimilarity matrix: run Operations->Dissimilarity, selecting “d1/All” to consider the network as a binary matrix, or “Corrected Euclidean” or “Corrected Manhattan” distance for valued networks. This uses the dissimilarity matrix to hierarchically cluster the nodes and outputs: a dissimilarity matrix; an EPS picture of the dendrogram; a permutation of vertices according to the dendrogram; a hierarchy representing the hierarchical clustering. To visualize the hierarchy: Edit->Show Subtree; select nodes (Edit->Change Type or Ctrl+T); transform the hierarchy into a partition (Hierarchy->Make Partition).

Blockmodeling. Identify clusters of nodes that share structural characteristics. Partition nodes and their relations into blocks. Goal: reduce a large network to a smaller number of comprehensible units. Disadvantage: need to know the number of classes (which may correspond to core & periphery, age, gender, ethnicity, etc.).

Example of core-periphery structure: metal trade by country

Equivalence. Structural equivalence: equivalent nodes have the same connection pattern to the same neighbors; blocks are completely full or empty. Regular equivalence: equivalent nodes have the same or similar connection patterns to (possibly different) neighbors, e.g. teachers at different universities fulfill the same role. [Figures: ideal core-periphery structure; imperfect core-periphery structure]

Hierarchical clustering: issues. Using path counts as weights tends to separate out peripheral nodes, whose path counts are always low; but leaf nodes should belong to the community of their neighbor.

Example: Zachary Karate Club

Example: Zachary karate club data. Hierarchical clustering tree using edge-independent path counts: the cores of the communities (vertices 1, 2 & 3) and (33 & 34) are correctly identified, but the divisive structure is not captured.

Girvan & Newman: betweenness clustering. Algorithm: compute the betweenness of all edges; while (betweenness of any edge > threshold): remove the edge with the highest betweenness and recalculate betweenness. Betweenness needs to be recalculated at each step, because removal of an edge can impact the betweenness of another edge. Very expensive: all-pairs shortest paths is O(N^3), and this may need to be repeated up to N times. Does not scale to more than a few hundred nodes, even with the fastest algorithms.
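A compact sketch of one round of betweenness clustering: repeatedly remove the edge with the highest shortest-path betweenness, recalculating after every removal, until the graph splits. Edge betweenness is computed with a Brandes-style breadth-first accumulation, which is an implementation choice made here and not something the slide specifies. The graph is assumed to be undirected, unweighted and free of self-loops; all names are illustrative.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every undirected edge (Brandes-style)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[frozenset((u, v))] += share
                delta[u] += share
    # Every pair of endpoints was counted once from each side.
    return {edge: value / 2.0 for edge, value in score.items()}

def components(adj):
    seen, parts = set(), []
    for s in adj:
        if s not in seen:
            part, stack = set(), [s]
            while stack:
                u = stack.pop()
                if u not in part:
                    part.add(u)
                    stack.extend(adj[u] - part)
            seen |= part
            parts.append(part)
    return parts

def split_once(adj):
    """Remove highest-betweenness edges until the number of components grows."""
    graph = {u: set(nbrs) for u, nbrs in adj.items()}
    start, removed = len(components(graph)), []
    while len(components(graph)) == start and any(graph.values()):
        scores = edge_betweenness(graph)  # recalculated after each removal
        u, v = max(scores, key=scores.get)
        graph[u].discard(v)
        graph[v].discard(u)
        removed.append(frozenset((u, v)))
    return components(graph), removed

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
parts, removed = split_once(toy_graph)
print(sorted(sorted(p) for p in parts), removed)
```

Each pass of the loop reruns the full betweenness computation, which is where the cost noted on the slide comes from.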

illustration of the algorithm

deletion of the edge 2-3: separation complete

betweenness clustering algorithm & the karate club data set

betweenness clustering and the karate club data: 8 clusters; 12 clusters. Better partitioning, but also creates some isolates.

Email as Spectroscopy: Automated Discovery of Community Structure within Organizations. Joshua R. Tyler, Dennis M. Wilkinson, Bernardo A. Huberman, Communities and Technologies (2003). Modifications of the Girvan-Newman betweenness clustering algorithm. Stopping criterion: stop removing edges before disconnecting a leaf node. [Figures: the smallest graph with 2 viable communities; a case where the cut is not made] Randomness is introduced by calculating shortest paths from only a subset of nodes and running the entire algorithm several times. Nodes that border several communities fall in different communities on different runs, which distinguishes between brokers and single-community nodes.

inter-community nodes. Example of a network structure where one node, B, could arguably belong to either community. With a “noisy” algorithm, one can keep track of the % of time B ends up in A’s community or C’s community.

Email spectroscopy: results. Data: HP Labs network (~ 400 nodes, 3 months, mass mailings removed, 30 message threshold); giant component of 434 nodes. 66 communities: 49 correspond exactly to organizational units; the other 17 contain individuals from 2 or more organizational units within the company. Field interviews confirmed the accuracy of the algorithm: individuals identified their communities, divisions in formal groups, and overlaps in interest on joint projects.

Finding community structure in very large networks. Authors: Aaron Clauset, M. E. J. Newman, Cristopher Moore (2004). Consider edges that fall within a community or between a community and the rest of the network. Define modularity: Q = (1/2m) Σ_ij [A_ij − k_i k_j/(2m)] δ(c_i, c_j), where A_ij is the adjacency matrix, m is the number of edges, k_i k_j/(2m) is the probability of an edge between two vertices (proportional to their degrees), and δ(c_i, c_j) = 1 if the vertices are in the same community and 0 otherwise. For a random network, Q = 0: the number of edges within a community is no different from what you would expect.
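The slide’s equation did not survive in the transcript, so the sketch below assumes the usual Newman form of modularity, which compares each within-community adjacency entry with the degree-based expectation k_i k_j/(2m). The function name, community labels and toy graph are illustrative.

```python
def modularity(adj, community):
    """Q = (1/2m) * sum over same-community pairs of [A_ij - k_i * k_j / (2m)]."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # each edge is counted twice
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
all_together = {node: 0 for node in toy_graph}
q_split = modularity(toy_graph, by_triangle)
q_single = modularity(toy_graph, all_together)
print(round(q_split, 4), round(q_single, 4))
```

Placing every vertex in a single community gives Q = 0 under this definition, while the two-triangle split scores well above zero.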

Finding community structure in very large networks. Authors: Aaron Clauset, M. E. J. Newman, Cristopher Moore (2004). Algorithm: start with all vertices as isolates; follow a greedy strategy, successively joining the pair of clusters with the greatest increase ΔQ in modularity; stop when the maximum possible ΔQ <= 0 from joining any two. Successfully used to find community structure in a graph with > 400,000 nodes and > 2 million edges (Amazon’s “people who bought this also bought that…”). Alternative for achieving optimum Q: simulated annealing rather than greedy search.
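A deliberately naive sketch of the greedy strategy on this slide. It rescans every pair of communities at each step, so it shows the logic only and has none of the data structures that make the published algorithm fast. The gain formula ΔQ = 2(e_12 − a_1 a_2), with e_12 the fraction of edge ends joining the two communities and a_1, a_2 their shares of total degree, is the standard expression for this kind of merge; all names are illustrative.

```python
def greedy_modularity(adj):
    """Merge the pair of communities with the largest positive gain in Q until none is left."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    members = {node: {node} for node in adj}               # community id -> nodes
    degree_sum = {node: len(adj[node]) for node in adj}    # total degree per community

    def edges_between(c1, c2):
        return sum(1 for u in members[c1] for v in adj[u] if v in members[c2])

    while len(members) > 1:
        best_gain, best_pair = 0.0, None
        ids = sorted(members)
        for pos, c1 in enumerate(ids):
            for c2 in ids[pos + 1:]:
                gain = 2.0 * (edges_between(c1, c2) / two_m
                              - degree_sum[c1] * degree_sum[c2] / two_m ** 2)
                if gain > best_gain:
                    best_gain, best_pair = gain, (c1, c2)
        if best_pair is None:
            break  # no merge increases Q, so stop
        c1, c2 = best_pair
        members[c1] |= members.pop(c2)
        degree_sum[c1] += degree_sum.pop(c2)
    return sorted(sorted(group) for group in members.values())

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = greedy_modularity(toy_graph)
print(communities)
```

On the toy graph the merging stops with the two triangles as separate communities, because joining them would lower Q.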

Extensions to weighted networks. Betweenness clustering? Will not work: strong ties will have a disproportionate number of short paths, and those are the ones we want to keep. Modularity (Analysis of weighted networks, M. E. J. Newman). [Figure: Reuters news articles keywords; weighted edges]

Extensions to weighted networks: voltage clustering. A physics approach to finding communities in linear time (Fang Wu and Bernardo Huberman): apply voltages to different parts of the network; the largest voltage drops occur between communities; related to spectral partitioning.
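A rough sketch of the voltage idea, not the authors’ implementation: fix one node at voltage 1 and another at 0, repeatedly set every other node to the average of its neighbours, then split at the largest gap in the sorted voltages. The choice of source and sink, the number of rounds and the unweighted toy graph are all assumptions made for illustration, and the graph is assumed to be connected.

```python
def voltage_split(adj, source, sink, rounds=200):
    """Relax node voltages, then cut the ranking at the largest voltage drop."""
    volt = {node: 0.0 for node in adj}
    volt[source] = 1.0
    for _ in range(rounds):
        for node in adj:
            if node != source and node != sink:
                # Kirchhoff: a free node sits at the mean voltage of its neighbours.
                volt[node] = sum(volt[nbr] for nbr in adj[node]) / len(adj[node])
    ranked = sorted(adj, key=lambda node: volt[node], reverse=True)
    gaps = [volt[ranked[i]] - volt[ranked[i + 1]] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return volt, set(ranked[:cut]), set(ranked[cut:])

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
volt, high_side, low_side = voltage_split(toy_graph, "a", "f")
print(sorted(high_side), sorted(low_side))
```

The largest drop falls across the c-d edge, the only link between the two triangles.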

Reminder of how modularity can help us visualize large networks

Bridges. Bridge: an edge that, when removed, splits off a community. Bridges can act as bottlenecks for information flow. [Figure: network of striking employees; labels: bridges, younger & Spanish speaking, younger & English speaking, older & English speaking, union negotiators]
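Bridges in the graph-theoretic sense (edges whose removal disconnects the graph) can be found with a single depth-first search. This is a minimal recursive sketch for small, simple undirected graphs; the function name and toy graph are illustrative.

```python
def find_bridges(adj):
    """Return all bridges of a simple undirected graph as sorted node pairs."""
    index, low, bridges = {}, {}, []

    def visit(u, parent):
        index[u] = low[u] = len(index)
        for v in adj[u]:
            if v not in index:
                visit(v, u)
                low[u] = min(low[u], low[v])
                # Nothing below v reaches u or higher except through this edge.
                if low[v] > index[u]:
                    bridges.append(tuple(sorted((u, v))))
            elif v != parent:
                low[u] = min(low[u], index[v])

    for node in adj:
        if node not in index:
            visit(node, None)
    return sorted(bridges)

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
bridges = find_bridges(toy_graph)
print(bridges)
```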

Cut-vertices and bi-components. Removing a cut-vertex creates a separate component. Bi-component: a component of minimum size 3 that does not contain a cut-vertex (a vertex that would split the component). Pajek: Net>Components>Bi-Components (treats the network as undirected; see chapter 7) identifies vertices belonging to exactly one component and isolates; identifies the # of bridges or bi-components to which a vertex belongs; identifies bridges (components of size 2).
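Cut-vertices can be located with a depth-first search that tracks, for each subtree, the earliest-visited vertex it can reach. The sketch below assumes a small, simple undirected graph; it is independent of Pajek and only illustrates the definition.

```python
def find_cut_vertices(adj):
    """Return the set of vertices whose removal disconnects their component."""
    index, low, cut = {}, {}, set()

    def visit(u, parent):
        index[u] = low[u] = len(index)
        children = 0
        for v in adj[u]:
            if v not in index:
                children += 1
                visit(v, u)
                low[u] = min(low[u], low[v])
                # The subtree under v cannot get past u, so u separates it.
                if parent is not None and low[v] >= index[u]:
                    cut.add(u)
            elif v != parent:
                low[u] = min(low[u], index[v])
        # A search root is a cut-vertex only if it has several search children.
        if parent is None and children > 1:
            cut.add(u)

    for node in adj:
        if node not in index:
            visit(node, None)
    return cut

# Hypothetical toy graph: two triangles a-b-c and c-d-e sharing the vertex c.
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d", "e"},
    "d": {"c", "e"}, "e": {"c", "d"},
}
cut_vertices = find_cut_vertices(toy_graph)
print(sorted(cut_vertices))
```

Each triangle in the toy graph is a bi-component, and the shared vertex c is the only cut-vertex.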

Ego-networks and constraint. Ego-network: a vertex, all its neighbors, and the connections among the neighbors. [Figure: Alejandro’s ego-centered network] Alejandro is a broker between contacts who are not directly connected. Constraint: # of complete triads involving two people. Low constraint: many structural holes that may be exploited. High constraint: removing a tie to any one of the vertices means that others will act as brokers for that contact.

Proportional strength of ties. Strength of tie ~ 1/(# connections for the person); asymmetrical. Dyadic constraint: measure of the strength of direct and indirect ties to a person.
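The slide gives no formula for dyadic constraint, so the sketch below assumes Burt’s usual definition: the constraint of j on i is (p_ij + Σ_q p_iq p_qj)^2, where p_ij is the proportional tie strength 1/(# connections of i) for an unweighted network and q ranges over i’s other contacts. The function names and the ego-network are hypothetical.

```python
def proportional_strength(adj, i, j):
    """Share of i's ties invested in j; zero when there is no tie."""
    return 1.0 / len(adj[i]) if j in adj[i] else 0.0

def dyadic_constraint(adj, i, j):
    """Burt-style constraint of contact j on i (direct plus indirect investment, squared)."""
    direct = proportional_strength(adj, i, j)
    indirect = sum(
        proportional_strength(adj, i, q) * proportional_strength(adj, q, j)
        for q in adj[i] if q != j
    )
    return (direct + indirect) ** 2

# Hypothetical ego-network: ego knows x, y and z; only x and y know each other.
toy_graph = {
    "ego": {"x", "y", "z"},
    "x": {"ego", "y"},
    "y": {"ego", "x"},
    "z": {"ego"},
}
c_ego_x = dyadic_constraint(toy_graph, "ego", "x")
c_ego_z = dyadic_constraint(toy_graph, "ego", "z")
c_z_ego = dyadic_constraint(toy_graph, "z", "ego")
print(c_ego_x, c_ego_z, c_z_ego)
```

The tie to z, who has no links to ego’s other contacts, constrains ego least, which corresponds to a structural hole. The measure is also asymmetric: z, whose only tie is to ego, is fully constrained by ego.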

Structural holes with Pajek. Net>Vector>Structural Holes computes the dyadic constraint for all edges and for the network in aggregate. To visualize: Options>Values of Lines>Similarities (in the Draw screen), then use an energy layout; high dyadic constraint vertices will be closer together.

Brokerage roles in and between groups

Available tools. Pajek: hierarchical clustering, bi-components, and block models. Guess: weak component clustering (need to threshold first) and betweenness clustering (slow). Jung: betweenness, voltage, blockmodels, bi-components. Mark Newman’s homepage: fast clustering for very large graphs using modularity.

An aside. Email spectroscopy: network centrality corresponds to position in the organizational hierarchy.