Resolution Limit in Community Detection

Slides:



Advertisements
Similar presentations
CSE 211 Discrete Mathematics
Advertisements

How to Schedule a Cascade in an Arbitrary Graph F. Chierchetti, J. Kleinberg, A. Panconesi February 2012 Presented by Emrah Cem 7301 – Advances in Social.
On the Density of a Graph and its Blowup Raphael Yuster Joint work with Asaf Shapira.
Orthogonal Drawing Kees Visser. Overview  Introduction  Orthogonal representation  Flow network  Bend optimal drawing.
Bayesian Networks, Winter Yoav Haimovitch & Ariel Raviv 1.
1 Maximal Independent Set. 2 Independent Set (IS): In a graph G=(V,E), |V|=n, |E|=m, any set of nodes that are not adjacent.
Modularity and community structure in networks
Community Detection Algorithm and Community Quality Metric Mingming Chen & Boleslaw K. Szymanski Department of Computer Science Rensselaer Polytechnic.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
GOLOMB RULERS AND GRACEFUL GRAPHS
1 Evolution of Networks Notes from Lectures of J.Mendes CNR, Pisa, Italy, December 2007 Eva Jaho Advanced Networking Research Group National and Kapodistrian.
Chapter 8-3 Markov Random Fields 1. Topics 1. Introduction 1. Undirected Graphical Models 2. Terminology 2. Conditional Independence 3. Factorization.
1 Internet Networking Spring 2006 Tutorial 6 Network Cost of Minimum Spanning Tree.
Bayesian Networks Clique tree algorithm Presented by Sergey Vichik.
1 Internet Networking Spring 2004 Tutorial 6 Network Cost of Minimum Spanning Tree.
CTIS 154 Discrete Mathematics II1 8.2 Paths and Cycles Kadir A. Peker.
1 Internet Networking Spring 2002 Tutorial 6 Network Cost of Minimum Spanning Tree.
Packing Element-Disjoint Steiner Trees Mohammad R. Salavatipour Department of Computing Science University of Alberta Joint with Joseph Cheriyan Department.
K-Coloring k-coloring: A k-coloring of a graph G is a labeling f: V(G)  S, where |S|=k. The labels are colors; the vertices of one color form a color.
Network Measures Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Measures Klout.
Introduction to Graph Theory
The Erdös-Rényi models
Developing Analytical Framework to Measure Robustness of Peer-to-Peer Networks Niloy Ganguly.
Finding dense components in weighted graphs Paul Horn
Lecture 22 More NPC problems
Greedy Approximation Algorithms for finding Dense Components in a Graph Paper by Moses Charikar Presentation by Paul Horn.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
Clustering Spatial Data Using Random Walk David Harel and Yehuda Koren KDD 2001.
Edge-disjoint induced subgraphs with given minimum degree Raphael Yuster 2012.
1 Maximal Independent Set. 2 Independent Set (IS): In a graph G=(V,E), |V|=n, |E|=m, any set of nodes that are not adjacent.
Chapter 3. Community Detection and Evaluation May 2013 Youn-Hee Han
1 12/2/2015 MATH 224 – Discrete Mathematics Formally a graph is just a collection of unordered or ordered pairs, where for example, if {a,b} G if a, b.
Concept Switching Azadeh Shakery. Concept Switching: Problem Definition C1C2Ck …
Efficient Computing k-Coverage Paths in Multihop Wireless Sensor Networks XuFei Mao, ShaoJie Tang, and Xiang-Yang Li Dept. of Computer Science, Illinois.
Power Series Section 9.1a.
COMMUNITY DISCOVERY PART 1: A (BRIEF) INTRODUCTION Giulio Rossetti WMA - 4 May 2015.
1 Use graphs and not pure logic Variables represented by nodes and dependencies by edges. Common in our language: “threads of thoughts”, “lines of reasoning”,
Data Structures and Algorithms in Parallel Computing
Graph Data Management Lab, School of Computer Science Personalized Privacy Protection in Social Networks (VLDB2011)
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Selected Topics in Data Networking Explore Social Networks:
James Hipp Senior, Clemson University.  Graph Representation G = (V, E) V = Set of Vertices E = Set of Edges  Adjacency Matrix  No Self-Inclusion (i.
Distributed cooperation and coordination using the Max-Sum algorithm
The normal approximation for probability histograms.
Network Topology Single-level Diversity Coding System (DCS) An information source is encoded by a number of encoders. There are a number of decoders, each.
Topics In Social Computing (67810) Module 1 (Structure) Centrality Measures, Graph Clustering Random Walks on Graphs.
CHAPTER SIX T HE P ROBABILISTIC M ETHOD M1 Zhang Cong 2011/Nov/28.
Dynamic Graph Partitioning Algorithm
Applications of graph theory in complex systems research
Minimum Spanning Tree 8/7/2018 4:26 AM
Random walks on complex networks
Greedy Algorithm for Community Detection
Maximal Independent Set
ECE 358 Examples #1 Xuemin (Sherman) Shen Office: EIT 4155
Five Number Summary and Box Plots
Graph Operations And Representation
Introduction Wireless Ad-Hoc Network
Overcoming Resolution Limits in MDL Community Detection
Degree Distributions.
3.3 Network-Centric Community Detection
Five Number Summary and Box Plots
Log-periodic oscillations due to discrete effects in complex networks
Graph Operations And Representation
Degree Distribution Ralucca Gera,
Affiliation Network Models of Clusters in Networks
Constructing a m-connected k-Dominating Set in Unit Disc Graphs
GRAPHS.
Locality In Distributed Graph Algorithms
Integer and fractional packing of graph families
Presentation transcript:

Resolution Limit in Community Detection Yilin Shen 02/18/2009

Community Structure Definition: Applications: Given a network (Graph G=(V,E)), A COMMUNITY is a subgraph of a network whose nodes are more tightly connected with each other than with nodes outside the subgraph. Applications: Social Networks Biochemical networks Internet Food webs

Modularity Optimization Definition: A quantitative measure to essentially compare the number of links inside a given module with the expected value for a randomized graph of the same size and degree sequence. Objective: Maximize the Modularity: The number of links inside a given module - The expected value for a randomized graph of the same size and degree sequence

Mathematical Equation ls : # links inside module s L : # links in the network ds : The total degree of the nodes in module s : Expected # of links in module s

Mathematical Equation (cont) Probability that a stub, randomly selected, ends in module s =

Mathematical Equation (cont) Probability that the link is internal to module s = Expected number of links in module s =

Subgraph S is a module Since The Relation: Subgraph S is a module Since

The Relation: (cont) Consider “weak” definition for a community Since , Therefore for each , holds.

The Most Modular Network A network made of m identical complete graphs (or ‘cliques’) (actually the m connected components are not necessarily cliques), disjoint from each other. which converges to 1 when the number of cliques goes to infinity.

The Most Modular Network (cont) A connected network with N nodes and L links which maximizes modularity. where

The Most Modular Network (cont) For fixed m, we easily know that Q reaches maximum when For variable m, The corresponding number of links in each module is .

The Limit of Modularity The crucial point here is that modularity seems to have some intrinsic scale of order , which constrains the number and the size of the modules. For a given total number of nodes and links we could build many more than modules, but the corresponding network would be less “modular”, namely with a value of the modularity lower than the maximum

The Resolution Limit Since M1 and M2 are constructed modules, we have

The Resolution Limit (cont) Let’s consider the following case QA : M1 and M2 are separate modules QB : M1 and M2 is a single module Since both M1 and M2 are modules by construction, we need That is,

The Resolution Limit (cont) Now let’s see how it contradicts the constructed modules M1 and M2 We consider the following two scenarios: ( ) The two modules have a perfect balance between internal and external degree (a1+b1=2, a2+b2=2), so they are on the edge between being or not being communities, in the weak sense. The two modules have the smallest possible external degree, which means that there is a single link connecting them to the rest of the network and only one link connecting each other (a1=a2=b1=b2=1/l).

Scenario 1 (cont) When and , the right side of can reach the maximum value In this case, may happen.

Scenario 2 (cont) a1=a2=b1=b2=1/l

Schematic Examples

Schematic Examples (cont) For example, p=5, m=20 The maximal modularity of the network corresponds to the partition in which the two smaller cliques are merged

The Upper Limit of Internal Links Any two interconnected modules, fuzzy or not, are merged if the number of links inside each of them does not exceed . If modularity optimization finds a module S with lS internal links, it may be that the latter is a combination of two or more smaller communities. The upper limit of lS can be much larger than , if the substructures are on average more interconnected with each other.

Experimental Results

Summary Modularity is actually not consistent with its optimization which may favor network partitions with groups of modules combined into larger communities. The resolution limit of modularity does not rely on particular network structures, but only on the comparison between the sizes of interconnected communities and that of the whole network, where the sizes are measured by the number of links. An increase of the number of modules does not necessarily correspond to an increase in modularity because the modules would be smaller and so would be each term of the sum. Quality functions are still helpful, but their role should be probably limited to the comparison of partitions with the same number of modules.