Download presentation
Presentation is loading. Please wait.
1
Computational Molecular Biology
Community Structures
2
What is Community Structure
Definition: A community is a group of nodes in which: There are more edges (interactions) between nodes within the group than to nodes outside of it My T. Thai
3
Why Community Structure (CS)?
Many systems can be expressed by a network, in which nodes represent the objects and edges represent the relations between them: Social networks: collaboration, online social networks Technological networks: IP address networks, WWW, software dependency Biological networks: protein interaction networks, metabolic networks, gene regulatory networks My T. Thai
4
Why CS? Yeast Protein interaction networks My T. Thai
5
Why CS? IP address network My T. Thai
6
Why Community Structure?
Nodes in a community have some common properties Communities represent some properties of a networks Examples: In social networks, represent social groupings based on interest or background In citation networks, represent related papers on one topic In metabolic networks, represent cycles and other functional groupings My T. Thai
7
How to detect a community?
My T. Thai
8
Early Work Using hierarchical clustering Overview of this method:
For each pair (u,v), calculate weight wuv which represents how closely connected u and v are Initialize G = (V, emptyset) At each iteration, add an edge with the strongest weight My T. Thai
9
Early Work My T. Thai
10
Early Work How to define the weight wuv
Many different methods have been proposed: Number disjoint paths between u and v Number of possible paths between u and v Disadvantages: Tendency to separate the boundary vertices from the communities (to which they should belong) My T. Thai
11
An Overview of Recent Work
Disjoint CS Overlapping CS Centralized Approach Define the quantity of modularity and use the greedy algorithms, IP, SDP Spectral clustering Random Walk, Clique Percolation Localized Approach My T. Thai
12
Edge Betweeness Focus on the edges which are least central, i.e.,, the edges which are most “between” communities Instead of adding edge to G = (V, emptyset), progressively removing edges from an original graph G = (V,E) My T. Thai
13
Edge Betweeness Definition:
For each edge (u,v), the edge betweeness of (u,v) is defined as the number of shortest paths between any pair of nodes in a network that run through (u,v) betweeness(u,v) = | { Pxy | x, y in V, Pxy is a shortest path between x and y, and (u,v) in Pxy}| My T. Thai
14
Why Edge Betweeness My T. Thai
15
Algorithm Initialize G = (V,E) representing a network
while E is not empty Calculate the betweeness of all edges in G Remove the edge e with the highest betweeness, G = (V, E – e) Indeed, we just need to recalculate the betweeness of all edges affected by the removal My T. Thai
16
Time Complexity Let |V| = n and |E| = m
Calculate the betweeness of all edges: O(mn) Since we need to recalculate each time we remove an edge: O(m2n) My T. Thai
17
An Example My T. Thai
18
Disadvantages/Improvements
Can we improve the time complexity? The communities are in the hierarchical form, can we find the disjoint communities? My T. Thai
19
Define the quantity (measurement) of modularity Q and find an approximation algorithm to maximize Q
My T. Thai
20
How to define Q Let A be an adjacency matrix of the network
Fraction of edges that fall within communities where My T. Thai
21
How to define Q What is the problem? If we try to maximize the above equation, then we may put all nodes into one single community How to fix it? My T. Thai
22
How to define Q? Let kv be the degree of node v
The term kv kw /2m represents the probability of an edge existing between vertices v and w if connections are made random but respecting vertex degrees My T. Thai
23
Greedy Algorithm Initially, we have n communities (each node is a community) At each step, join two communities whose the hierarchical tree has the largest increase in Q Stop when we left with a single community (run n -1 steps) My T. Thai
24
Disadvantages It is still a hierarchical approach
Cannot escape a suboptimal maximum How to avoid the suboptimal maximum? My T. Thai
25
Local Communities My T. Thai
26
Overview Find communities based on local information (not information of entire network) Two ways: Detect the communities Define local modularity, then greedily optimize this function My T. Thai
27
What We Have Learnt Can we use this Q?
How can we “twist” it to make it work? My T. Thai
28
Consider this Figure Suppose we have perfect knowledge of some subgraphs C Then we should know some neighbors on C, lie in U Visit some neighbors of nodes in U may extend the knowledge of C Now, can we re-use Q (defined before) on C? My T. Thai
29
Some Definitions Again, define an adjacent matrix A (wrt C) as follows: Consider this quantity: where (# of edges in the partial adj matrix) My T. Thai
30
Relationship between B, C, U
Consider nodes in B If C is a sharp community, then Nodes in B have more connections to nodes in C Nodes in B have less (a few) connections to nodes in U My T. Thai
31
Definitions Define a boundary-adjacent matrix B as follows:
Define local community R: where δ(i,j) = 1 iff vi in B and vj in C or vice versa. Otherwise, δ(i,j) = 0. T: #edges with one ore more endpoints in B I: #edges in B such that none of their endpoints in U My T. Thai
32
Properties of R 0 < R < 1
Directly proportional to the sharpness of the boundary given by B When R is undefined? My T. Thai
33
A Greedy Algorithm My T. Thai
34
Overview of Second Method
Start at a vertex, check the degree of each vertex with respect to each one-hop neighbors, two-hop neighbors, …, l-hops neighbors Why? If the community is highly connected, the l-hops neighbors tend to revisit the nodes At the boundary, the number of newly added edges decreases My T. Thai
35
Some Definitions kie(j): Emerging degree of a vertex i which is l-hops away vertex j is defined as the number of edges (u,i) where u is not within l-hops away from j Kjl: Total emerging degree of all nodes that exactly l-hops away from j where Sjl is the set of all vertices exactly l-hops away from j Initially, Kj0= degree of node j = kj My T. Thai
36
Some Definitions The change in total emerging degree My T. Thai
37
Algorithm Randomly choose a starting vertex j
Initially, l = 0, add j to C (C is a community), and K0j = kj l = l++; add all l-hop neighbors of j to C Compute ΔKjl. If ΔKjl < α, then return C. Otherwise, repeat step 3 My T. Thai
38
Any Problem? How to define α? Do we need to define α? If not, what should we change? What if the starting vertex is the “bridge one”? What can we do? My T. Thai
39
Impact of α α = 0, never stop until explore the entire connected subgraph α is large, stop sooner (l is small), resulting in many small communities α is too large, return n singleton communities (α > kmax where kmax is the largest degree) My T. Thai
40
A Small Example Actual CS of the Karate Club Obtained by the Alg
My T. Thai
41
Dynamic Communities My T. Thai
42
Dynamic Networks Event decomposition A dynamic network
A collection of network snapshots at many time points. Changes are frequently introduced Insertions / Removals of nodes Insertions / Removals of edges t = 0 t = 1 t = 2 t = 3 Event decomposition Insertions / Removals of nodes = {Insert/Remove a node} Insertions / Removals of edges = {Insert/Remove an edge}
43
Recall… Modularity function However, max Q is NP-hard
44
Adaptive Solutions An adaptive method: A basic community structure
To maximize the gained modularity with low computational complexity Locally compute a new structure based on local information after each change of the network A basic community structure Only for the first snapshot Adaptively update the network communities based on this basic structure Method: Blondel et al (2008)
45
QCA: An Adaptive Method
Input network Blondel’s method : Network changes Need to handle Node insertion Edge insertion Node removal Edge removal Basic communities Updated communities
46
Membership determination
A node actively determines its Membership u FinS(u) S FoutC(u) C
47
Introducing a new node C1 C2 C3 Possibilities No new edges
New edges linking with one community New edges linking multiple communities u C2 C1 C3
48
Handling node insertion
u Join u to the community C with the highest FoutC(u) C1 C2 FoutC1(u) FoutC2(u) C3 FoutC3(u)
49
Introducing a new edge Possibilities
A new edge is inside a single community A new edge is joining two communities u v u v
50
Handling edge insertion
Keep the current community structure intact a b Find qu,C,D and qv,D,C Join a to C or D according to qu,C,D and qv,C,D If a (or b) changes its membership Check all a’s neighbors a b
51
Time complexity Inserting a new node Inserting a new edge
Visit all neighbors of u at most once O(du) Inserting a new edge Computing qu,C(u),C(v) and in constant time O(1)
52
Removing an edge Resulting community is either: Remains unchanged
Breaks up into smaller communities If it contains substructures that are less attractive to the others u v
53
Handling edge removal Strategy Find maximal ‘quasi’-cliques
Let the other singletons determine their best communities
54
Removing a node All edges connected to u will be removed
Resulting community either Remains unchanged Breaks up into smaller spices and merged to others
55
Handling node removal Strategy 3-Clique percolation
Let the left over nodes determine their best communities
56
Experimental Results Test our algorithms on real-world data traces
Enron , ArXiv citation and Facebook networks In comparison with the Blondel’s method at each snapshot Metrics Modularity Number of communities Normalized Mutual Information (NMI) Running time
57
ArXiv e-print Citation network
Modularity # Communities NMI Running Time
58
Facebook Modularity # Communities NMI Running Time
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.