Computational Molecular Biology

Community Structures

What is Community Structure?
Definition: A community is a group of nodes with more edges (interactions) between nodes within the group than to nodes outside of it.
My T. Thai

Why Community Structure (CS)?
Many systems can be expressed as a network, in which nodes represent the objects and edges represent the relations between them:
- Social networks: collaboration networks, online social networks
- Technological networks: IP address networks, the WWW, software dependency networks
- Biological networks: protein interaction networks, metabolic networks, gene regulatory networks

Why CS? Yeast protein interaction networks

Why CS? IP address network

Why Community Structure?
Nodes in a community share some common properties, so communities reveal properties of the network. Examples:
- In social networks, communities represent social groupings based on interest or background.
- In citation networks, they represent related papers on one topic.
- In metabolic networks, they represent cycles and other functional groupings.

How to detect a community?

Early Work: Using Hierarchical Clustering
Overview of this method:
1. For each pair (u,v), calculate a weight w_uv that represents how closely connected u and v are.
2. Initialize G = (V, ∅).
3. At each iteration, add the edge with the strongest weight.
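The steps above can be sketched in Python as follows. The sketch assumes the weights w_uv have already been computed and are passed in as a dictionary. The function name `agglomerate` and the input format are illustrative, not taken from the original method. A union-find structure tracks which nodes the added edges have connected, and every edge that joins two clusters produces one level of the hierarchy.

```python
from typing import Dict, Hashable, List, Tuple

Node = Hashable


def agglomerate(nodes: List[Node],
                weights: Dict[Tuple[Node, Node], float]) -> List[List[frozenset]]:
    """Start from G = (V, {}) and add pairs (u, v) in decreasing order of w_uv.

    Returns the partition into connected components after every added edge
    that merges two components, i.e. the successive levels of the hierarchy.
    """
    parent = {v: v for v in nodes}

    def find(v: Node) -> Node:
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    levels: List[List[frozenset]] = []
    # sorted() is stable, so equal weights keep their input order.
    for (u, v), _ in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # u and v are already in the same cluster
        parent[ru] = rv
        groups: Dict[Node, set] = {}
        for x in nodes:
            groups.setdefault(find(x), set()).add(x)
        levels.append([frozenset(g) for g in groups.values()])
    return levels
```

For two triangles joined by a single low-weight edge, the fourth level already separates the two triangles, and the last level is one cluster.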

Early Work

Early Work: How to Define the Weight w_uv
Many different methods have been proposed:
- The number of disjoint paths between u and v
- The number of possible paths between u and v
Disadvantage: a tendency to separate the boundary vertices from the communities to which they should belong.
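One of these weights, the number of edge-disjoint paths between u and v, can be computed as a maximum flow. By Menger's theorem, the count equals the max flow when each edge has unit capacity. The sketch below assumes edge-disjoint (rather than vertex-disjoint) paths and an undirected graph stored as a dict of neighbor sets. It uses Edmonds-Karp augmentation, and the function name is illustrative.

```python
from collections import deque
from typing import Dict, Hashable, Set


def edge_disjoint_paths(adj: Dict[Hashable, Set[Hashable]], s: Hashable, t: Hashable) -> int:
    """Count edge-disjoint s-t paths in an undirected, unweighted graph.

    By Menger's theorem this equals the maximum s-t flow when every
    undirected edge gets capacity 1 in each direction.
    """
    if s == t:
        raise ValueError("s and t must differ")
    # Residual capacity of each directed arc.
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path (Edmonds-Karp).
        prev = {s: None}
        queue = deque([s])
        while queue and t not in prev:
            u = queue.popleft()
            for v in adj[u]:
                if v not in prev and cap[(u, v)] > 0:
                    prev[v] = u
                    queue.append(v)
        if t not in prev:
            return flow  # no augmenting path left: the flow is maximum
        # Push one unit of flow along the path, updating residual capacities.
        v = t
        while prev[v] is not None:
            u = prev[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
```

Two nodes inside a triangle get weight 2, while nodes on opposite sides of a bridge get weight 1.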

An Overview of Recent Work
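Approaches can be classified as finding disjoint CS or overlapping CS, and as centralized or localized.
- Centralized approaches: define the quantity of modularity and optimize it using greedy algorithms, integer programming (IP), or semidefinite programming (SDP); spectral clustering; random walks; clique percolation.
- Localized approaches.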

Edge Betweenness
Focus on the edges that are least central, i.e., the edges that are most "between" communities. Instead of adding edges to G = (V, ∅), progressively remove edges from the original graph G = (V, E).

Edge Betweenness: Definition
For each edge (u,v), the edge betweenness of (u,v) is the number of shortest paths between any pair of nodes in the network that run through (u,v):
$\mathrm{betweenness}(u,v) = \left|\{\, P_{xy} : x, y \in V,\ P_{xy} \text{ is a shortest path between } x \text{ and } y,\ (u,v) \in P_{xy} \,\}\right|$
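A direct, unoptimized Python reading of this definition is sketched below. It counts every shortest path over unordered pairs {x, y}. It relies on one fact: the number of shortest x-y paths that traverse u then v equals σ_x(u)·σ_y(v), where σ_x(·) counts shortest paths from x. This holds whenever d(x,u) + 1 + d(v,y) = d(x,y). The function names are illustrative, and the O(n²m) cost is for clarity only.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Return (dist, sigma): hop distance from s and number of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness_count(adj):
    """Number of shortest x-y paths (over unordered pairs {x, y}) using each edge."""
    info = {v: bfs_counts(adj, v) for v in adj}
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    result = {}
    for edge in edges:
        u, v = tuple(edge)
        total = 0
        for x, y in combinations(adj, 2):
            dx, sx = info[x]
            dy, sy = info[y]
            if y not in dx:
                continue  # x and y are in different components
            # Shortest paths shaped x ~> u -> v ~> y, or x ~> v -> u ~> y.
            if dx[u] + 1 + dy[v] == dx[y]:
                total += sx[u] * sy[v]
            elif dx[v] + 1 + dy[u] == dx[y]:
                total += sx[v] * sy[u]
        result[edge] = total
    return result
```

On two triangles joined by a bridge, the bridge lies on all 3 × 3 cross-community shortest paths, so its betweenness is 9, far above any edge inside a triangle.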

Why Edge Betweenness?

Algorithm
1. Initialize G = (V, E) representing the network.
2. While E is not empty:
   a. Calculate the betweenness of all edges in G.
   b. Remove the edge e with the highest betweenness: G = (V, E − e).
In fact, after each removal we only need to recalculate the betweenness of the edges affected by that removal.
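The removal loop (commonly known as the Girvan-Newman algorithm) can be sketched in Python as below. It records a new level each time a removal splits a component. Two assumptions differ from the slide. First, this sketch recomputes all betweenness values after every removal, the simple version whose cost is analyzed next, instead of only the affected edges. Second, it uses Brandes' accumulation, which splits credit equally among tied shortest paths. That is the usual convention in practice, but it differs from the literal path count defined earlier when ties exist.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness of an unweighted graph in O(mn)."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:  # BFS that records shortest-path counts and predecessors
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # accumulate dependencies from the farthest nodes back
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1 + delta[w])
                eb[frozenset((u, w))] += c
                delta[u] += c
    return {e: b / 2 for e, b in eb.items()}  # each pair was counted from both ends


def components(adj):
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        parts.append(frozenset(comp))
    return parts


def girvan_newman(adj):
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy of the graph
    levels, n_parts = [], len(components(g))
    while any(g.values()):  # while E is not empty
        eb = edge_betweenness(g)
        u, v = tuple(max(eb, key=eb.get))  # edge with the highest betweenness
        g[u].discard(v)
        g[v].discard(u)
        parts = components(g)
        if len(parts) > n_parts:  # this removal split a community
            n_parts = len(parts)
            levels.append(parts)
    return levels
```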

Time Complexity
Let |V| = n and |E| = m.
- Calculating the betweenness of all edges: O(mn).
- Since we need to recalculate it each time we remove an edge: O(m^2 n) in total.
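Written out, the bound multiplies the cost of one full betweenness pass (a breadth-first search from each of the n nodes, O(m) each) by the number of edge removals. For a sparse network, where m grows roughly linearly with n, this becomes cubic in n:

```latex
\underbrace{O(mn)}_{\text{one betweenness pass}} \times \underbrace{m}_{\text{edge removals}}
  = O(m^2 n),
\qquad m = O(n) \;\Longrightarrow\; O(n^3).
```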

An Example

Disadvantages/Improvements
- Can we improve the time complexity?
- The communities come in hierarchical form; can we find disjoint communities instead?

Define a quantity (measure) of modularity Q and find an approximation algorithm to maximize Q.

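How to Define Q
Let A be the adjacency matrix of the network, m the number of edges, and c_v the community of node v. The fraction of edges that fall within communities is
$\frac{\sum_{vw} A_{vw}\,\delta(c_v, c_w)}{\sum_{vw} A_{vw}} = \frac{1}{2m} \sum_{vw} A_{vw}\,\delta(c_v, c_w)$,
where $\delta(c_v, c_w) = 1$ if v and w are in the same community and 0 otherwise.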

How to Define Q
What is the problem? If we simply maximize the above fraction, we may put all nodes into one single community, which makes the fraction equal to 1. How can we fix it?

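How to Define Q?
Let k_v be the degree of node v. Subtracting the fraction expected at random gives the modularity
$Q = \frac{1}{2m} \sum_{vw} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \delta(c_v, c_w)$,
where m is the number of edges and $\delta(c_v, c_w) = 1$ iff v and w are in the same community. The term k_v k_w / 2m is the expected number of edges between vertices v and w (roughly, the probability that an edge exists between them) if connections are made at random but respecting vertex degrees.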
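A term-by-term Python transcription of the modularity formula is sketched below. It assumes an undirected graph without self-loops, stored as a dict of neighbor sets, and a dict mapping each node to a community label. The names are illustrative.

```python
def modularity(adj, community):
    """Q = (1/2m) * sum over v, w of [A_vw - k_v * k_w / 2m] * delta(c_v, c_w)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2m: each edge counted twice
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    q = 0.0
    for v in adj:
        for w in adj:
            if community[v] != community[w]:
                continue  # delta(c_v, c_w) = 0
            a_vw = 1 if w in adj[v] else 0
            q += a_vw - degree[v] * degree[w] / two_m
    return q / two_m
```

For two triangles joined by one edge (m = 7), splitting them into their triangles gives Q = 2 × (3/7 − (7/14)²) = 5/14 ≈ 0.357. Putting every node into one community gives Q = 0, which removes the trivial maximum described earlier.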

Greedy Algorithm
1. Initially, we have n communities (each node is its own community).
2. At each step, join the two communities whose merger gives the largest increase in Q; the sequence of joins forms a hierarchical tree.
3. Stop when we are left with a single community (after n − 1 steps).
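A naive Python sketch of this greedy agglomeration is given below. It is not a heap-based, optimized variant. Merging communities i and j changes Q by ΔQ = l_ij/m − d_i d_j/(2m²), where l_ij is the number of edges between them and d_i is the total degree of community i. This follows from the modularity formula. Only adjacent communities are considered, since merging unconnected ones always lowers Q. The sketch assumes integer (mutually comparable) node ids and at least one edge, and it returns the best partition seen along the way.

```python
def greedy_modularity(adj):
    """Greedy agglomerative modularity maximization; returns (best Q, best partition)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    label = {v: v for v in adj}                 # node -> community label
    members = {v: frozenset([v]) for v in adj}  # label -> nodes in the community
    deg = {v: len(adj[v]) for v in adj}         # label -> total degree d_c
    q = -sum((d / (2 * m)) ** 2 for d in deg.values())  # Q when all nodes are singletons
    best_q, best = q, list(members.values())
    while len(members) > 1:
        # l_ij: number of edges between each pair of adjacent communities (i < j).
        between = {}
        for u in adj:
            for v in adj[u]:
                i, j = label[u], label[v]
                if i < j:
                    between[(i, j)] = between.get((i, j), 0) + 1
        if not between:
            break  # the remaining communities are disconnected from each other

        def delta_q(pair):
            i, j = pair
            return between[pair] / m - deg[i] * deg[j] / (2 * m * m)

        i, j = max(between, key=delta_q)  # merge with the largest increase in Q
        q += delta_q((i, j))
        members[i] = members[i] | members.pop(j)
        deg[i] += deg.pop(j)
        for v in members[i]:
            label[v] = i
        if q > best_q:
            best_q, best = q, list(members.values())
    return best_q, best
```

On two triangles joined by a bridge, the best level of the tree is the two triangles, with Q = 5/14.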

Disadvantages
- It is still a hierarchical approach.
- It cannot escape a suboptimal (local) maximum. How can we avoid a suboptimal maximum?

Local Communities

Overview
Find communities based on local information (not information about the entire network). Two ways:
- Detect the communities directly by exploring outward from a starting vertex.
- Define a local modularity, then greedily optimize this function.

What We Have Learnt: Can we use this Q?
How can we "twist" it to make it work?

Consider This Figure
Suppose we have perfect knowledge of some subgraph C. Then we also know some neighbors of C, which lie in U. Visiting some neighbors of nodes in U may extend our knowledge of C. Now, can we reuse Q (defined before) on C?

Some Definitions
Again, define an adjacency matrix A (with respect to C) and consider the corresponding quantity, normalized by the number of edges in the partial adjacency matrix.

Relationship between B, C, and U
Consider the nodes in B, the boundary of C (the nodes of C that have at least one neighbor in U). If C is a sharp community, then:
- Nodes in B have more connections to nodes in C.
- Nodes in B have fewer (only a few) connections to nodes in U.

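Definitions
Define a boundary-adjacency matrix B as follows: $B_{ij} = A_{ij}$ if $v_i \in B$ or $v_j \in B$, and $B_{ij} = 0$ otherwise.
Define the local modularity R:
$R = \frac{\sum_{ij} B_{ij}\,\delta(i,j)}{\sum_{ij} B_{ij}} = \frac{I}{T}$,
where δ(i,j) = 1 iff v_i ∈ B and v_j ∈ C, or vice versa; otherwise δ(i,j) = 0.
- T: the number of edges with one or more endpoints in B.
- I: the number of those edges with neither endpoint in U.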
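The ratio R = I/T can be sketched in Python directly from these definitions. U is taken as the known neighbors of C that lie outside C, and B as the nodes of C adjacent to U. The graph is assumed to be a dict of neighbor sets, and the function name is illustrative. The sketch raises an error when B is empty, because then T = 0 and R is undefined.

```python
def local_modularity(adj, community):
    """R = I / T for a partially known community C."""
    C = set(community)
    U = {v for u in C for v in adj[u]} - C      # known neighbors outside C
    B = {u for u in C if adj[u] & U}            # boundary: nodes of C with a neighbor in U
    touching = {frozenset((u, v)) for u in B for v in adj[u]}  # edges with an endpoint in B
    T = len(touching)
    if T == 0:
        raise ValueError("R is undefined: C has no boundary (B is empty)")
    I = sum(1 for e in touching if not (e & U))  # of those, edges with no endpoint in U
    return I / T
```

For two triangles joined by a bridge with C = {0, 1, 2}, the boundary is B = {2}. Of its three edges, only the bridge reaches U, so R = 2/3.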

Properties of R
- 0 < R < 1.
- R is directly proportional to the sharpness of the boundary given by B.
- When is R undefined?

A Greedy Algorithm
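The slide gives no details of the greedy procedure. One plausible reading of "greedily optimize this function" is sketched in Python below. Starting from one vertex, the sketch repeatedly adds the neighbor in U that gives the largest R. It rests on three assumptions not stated in the source: it stops once no single addition increases R (or at an optional `max_size`), ties go to the larger node id, and a C with no boundary scores R = 1. The function names are hypothetical.

```python
def _r(adj, C):
    """Local modularity R = I / T of community C."""
    U = {v for u in C for v in adj[u]} - C
    B = {u for u in C if adj[u] & U}
    touching = {frozenset((u, v)) for u in B for v in adj[u]}
    if not touching:
        return 1.0  # assumption: a C with no boundary counts as perfectly sharp
    return sum(1 for e in touching if not (e & U)) / len(touching)


def greedy_local_community(adj, start, max_size=None):
    """Grow C from `start`, adding the U-vertex that maximizes R at each step."""
    C = {start}
    best_r = _r(adj, C)
    while max_size is None or len(C) < max_size:
        U = {v for u in C for v in adj[u]} - C
        if not U:
            break  # the whole connected component has been absorbed
        # Evaluate R for every single-vertex extension; ties go to the larger node id.
        r, v = max((_r(adj, C | {v}), v) for v in U)
        if r <= best_r:
            break  # assumption: stop once no single addition increases R
        C.add(v)
        best_r = r
    return C, best_r
```

Starting from vertex 0 of two triangles joined by a bridge, the sketch grows {0} → {0, 1} → {0, 1, 2}. It then stops, because adding the bridge vertex would drop R from 2/3 to 1/3.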

Overview of the Second Method
Start at a vertex and check the degree of each vertex with respect to its one-hop neighbors, two-hop neighbors, …, l-hop neighbors. Why? If the community is highly connected, the edges of the l-hop neighbors tend to lead back to already visited nodes. At the boundary, the number of newly added edges decreases.

Some Definitions
- $k_i^e(j)$: the emerging degree of a vertex i that is l hops away from vertex j, defined as the number of edges (u,i) where u is not within l hops of j.
- $K_j^l$: the total emerging degree of all nodes exactly l hops away from j, $K_j^l = \sum_{i \in S_j^l} k_i^e(j)$, where $S_j^l$ is the set of all vertices exactly l hops away from j.
- Initially, $K_j^0 = k_j$, the degree of node j.

Some Definitions
The change in total emerging degree: $\Delta K_j^l = K_j^l / K_j^{l-1}$

Algorithm
1. Randomly choose a starting vertex j.
2. Initially, l = 0; add j to C (C is the community) and set $K_j^0 = k_j$.
3. Set l = l + 1 and add all l-hop neighbors of j to C.
4. Compute $\Delta K_j^l$. If $\Delta K_j^l < \alpha$, return C. Otherwise, repeat step 3.
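The steps above can be sketched in Python as a breadth-first search that grows C one shell at a time. The sketch assumes the change in total emerging degree is the ratio ΔK = K^l / K^{l−1}. Under this reading, α = 0 never triggers a stop, as noted below. It also assumes a dict-of-sets graph and a hypothetical function name. The emerging degree of a shell node is counted as its edges to nodes not yet in C.

```python
def l_shell(adj, j, alpha):
    """Grow a community from vertex j shell by shell until Delta K < alpha."""
    C = {j}
    shell = {j}
    k_prev = len(adj[j])  # K_j^0 = k_j
    while True:
        # Step 3: the next shell is the set of unvisited neighbors of the current shell.
        shell = {v for u in shell for v in adj[u]} - C
        if not shell:
            return C  # the entire connected component has been explored
        C |= shell
        # K_j^l: edges from shell nodes to nodes outside the explored region.
        k = sum(len(adj[i] - C) for i in shell)
        # k_prev > 0 here: a non-empty shell implies edges left the previous shell.
        if k / k_prev < alpha:  # assumption: Delta K = K^l / K^{l-1}
            return C
        k_prev = k
```

Starting from vertex 0 of two triangles joined by a bridge, the first shell {1, 2} has K¹ = 1 against K⁰ = 2. With α = 1 the sketch returns the triangle {0, 1, 2}. With α = 0 it keeps going and returns the whole component.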

Any Problems?
- How do we define α? Do we need to define α at all? If not, what should we change?
- What if the starting vertex is a "bridge"? What can we do?

Impact of α
- α = 0: the algorithm never stops until it has explored the entire connected subgraph.
- α large: it stops sooner (l is small), resulting in many small communities.
- α too large (α > k_max, where k_max is the largest degree): it returns n singleton communities.

A Small Example
Actual CS of the Karate Club vs. CS obtained by the algorithm

Dynamic Communities

Dynamic Networks and Event Decomposition
A dynamic network is a collection of network snapshots at many time points (t = 0, 1, 2, 3, …). Changes are frequently introduced:
- Insertions/removals of nodes
- Insertions/removals of edges
Event decomposition:
- Insertions/removals of nodes = {insert/remove a node}
- Insertions/removals of edges = {insert/remove an edge}
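Decomposing the change between two consecutive snapshots into these elementary events can be sketched in Python as a set difference. The event names and the ordering are illustrative assumptions. Edge removals come before node removals, and node insertions before edge insertions, so each event applies to a valid graph. Nodes are assumed to be mutually comparable so that events come out in a deterministic order.

```python
def decompose(nodes_old, edges_old, nodes_new, edges_new):
    """List the elementary events that turn snapshot t into snapshot t + 1."""
    def norm(edges):
        # Undirected edges: store each as a sorted tuple so (u, v) == (v, u).
        return {tuple(sorted(e)) for e in edges}

    eo, en = norm(edges_old), norm(edges_new)
    vo, vn = set(nodes_old), set(nodes_new)
    events = [("remove_edge", e) for e in sorted(eo - en)]
    events += [("remove_node", v) for v in sorted(vo - vn)]
    events += [("insert_node", v) for v in sorted(vn - vo)]
    events += [("insert_edge", e) for e in sorted(en - eo)]
    return events
```

Each returned event can then be handled by its own routine (node insertion, edge insertion, node removal, edge removal).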

Recall… the modularity function Q. However, maximizing Q is NP-hard.

Adaptive Solutions
An adaptive method aims to maximize the gained modularity with low computational complexity. It locally computes a new structure based on local information after each change of the network.
A basic community structure is computed only for the first snapshot, using the method of Blondel et al. (2008). The network communities are then adaptively updated based on this basic structure.

QCA: An Adaptive Method
Input network → Blondel's method → basic communities.
Network changes → handle node insertion, edge insertion, node removal, and edge removal → updated communities.
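For reference, the first (local-moving) phase of Blondel et al.'s method, which produces the basic communities, can be sketched in Python as follows. Each node is repeatedly moved to the neighboring community with the largest modularity gain until no move helps. Inserting a node v into community c gains (1/m)·(k_{v,c} − Σ_tot(c)·k_v/2m). Here k_{v,c} counts v's edges into c, and Σ_tot(c) is the total degree of c. The sketch deliberately omits the second phase, which aggregates communities into super-nodes and repeats. It assumes an undirected dict-of-sets graph without self-loops.

```python
def louvain_local_moving(adj):
    """Phase 1 of Blondel et al. (2008): move single nodes while Q increases."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    k = {v: len(adj[v]) for v in adj}
    comm = {v: v for v in adj}  # start with every node in its own community
    tot = dict(k)               # Sigma_tot: total degree of each community
    improved = True
    while improved:
        improved = False
        for v in adj:
            old = comm[v]
            tot[old] -= k[v]  # take v out of its current community
            links = {}        # k_{v,c}: number of edges from v into each community c
            for u in adj[v]:
                links[comm[u]] = links.get(comm[u], 0) + 1

            def gain(c):
                # Modularity gain of inserting v into c, up to the factor 1/m.
                return links.get(c, 0) - tot[c] * k[v] / (2 * m)

            best = old
            for c in links:
                if gain(c) > gain(best):  # strict: only move on a real improvement
                    best = c
            tot[best] += k[v]
            comm[v] = best
            improved |= best != old
    return comm
```

Because every move strictly increases Q, the loop terminates. On two triangles joined by a bridge, it ends with one community per triangle.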

Membership Determination
A node u actively determines its membership by comparing $F_{in}^S(u)$ for its current community S with $F_{out}^C(u)$ for another community C.

Introducing a New Node
Possibilities:
- No new edges
- New edges linking u with one community
- New edges linking u with multiple communities (e.g., C1, C2, C3)

Handling Node Insertion
Join u to the community C with the highest $F_{out}^C(u)$, e.g., comparing $F_{out}^{C1}(u)$, $F_{out}^{C2}(u)$, and $F_{out}^{C3}(u)$.
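This rule can be sketched in Python as below. The slides do not give the formula for F_out^C(u), so the sketch substitutes a hypothetical modularity-style pull: e_{u,C} − d_u·vol(C)/2m. Here e_{u,C} is the number of u's edges into C, d_u is u's degree, and vol(C) is the total degree of C in the updated graph. QCA's actual definition may differ. A node with no edges forms its own singleton community, and each neighbor of u is visited once.

```python
def place_new_node(adj, community, u):
    """Return the community label for a newly inserted node u.

    `adj` already contains u and its new edges; `community` maps existing nodes
    to labels. F_out is a hypothetical stand-in, not QCA's published formula.
    """
    if not adj[u]:
        return u  # no new edges: u forms its own singleton community
    two_m = sum(len(nbrs) for nbrs in adj.values())
    vol = {}  # total degree of each community in the updated graph
    for v, nbrs in adj.items():
        if v != u:
            vol[community[v]] = vol.get(community[v], 0) + len(nbrs)
    links = {}  # e_{u,C}: edges from u into each neighboring community
    for v in adj[u]:
        links[community[v]] = links.get(community[v], 0) + 1
    d_u = len(adj[u])
    f_out = {c: e - d_u * vol[c] / two_m for c, e in links.items()}
    return max(f_out, key=f_out.get)  # community with the highest pull
```

For example, a new node 6 attached to nodes 0 and 1 of triangle A and to node 3 of triangle B joins A. Its pull toward A is 2 − 3·9/20 = 0.65, versus 1 − 3·8/20 = −0.2 toward B.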

Introducing a New Edge
Possibilities:
- The new edge (u,v) is inside a single community.
- The new edge (u,v) joins two communities.

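Handling Edge Insertion
- Edge inside a single community: keep the current community structure intact.
- Edge (a,b) joining two communities, with a ∈ C and b ∈ D: find $q_{a,C,D}$ and $q_{b,D,C}$, and decide whether a stays in C or joins D (and likewise for b) according to these values. If a (or b) changes its membership, check all of its neighbors.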

Time Complexity
- Inserting a new node: visit all neighbors of u at most once, O(d_u).
- Inserting a new edge: compute $q_{u,C(u),C(v)}$ and $q_{v,C(v),C(u)}$ in constant time, O(1).

Removing an Edge (u,v)
The resulting community either:
- remains unchanged, or
- breaks up into smaller communities, if it contains substructures that are less attracted to the rest of the community.

Handling Edge Removal
Strategy: find maximal "quasi-cliques", then let the remaining singletons determine their best communities.

Removing a Node
All edges connected to u are removed. The resulting community either remains unchanged, or breaks up into smaller pieces that are merged into other communities.

Handling Node Removal
Strategy: apply 3-clique percolation, then let the leftover nodes determine their best communities.
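3-clique percolation can be sketched in Python as follows. Triangles are enumerated, and two triangles are treated as adjacent when they share an edge (k − 1 = 2 nodes). Each union of adjacent triangles forms one community. Nodes in no triangle belong to no community; these are the leftover nodes mentioned above. The sketch assumes integer (comparable) node ids, and the function name is illustrative.

```python
from itertools import combinations


def three_clique_communities(adj):
    """Communities formed by chains of triangles that share an edge."""
    # Enumerate each triangle u < v < w exactly once.
    triangles = [frozenset((u, v, w))
                 for u in adj for v in adj[u] if u < v
                 for w in adj[u] & adj[v] if v < w]
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Triangles sharing an edge (two nodes) are adjacent: union them.
    by_edge = {}
    for idx, tri in enumerate(triangles):
        for edge in combinations(sorted(tri), 2):
            by_edge.setdefault(edge, []).append(idx)
    for idxs in by_edge.values():
        for other in idxs[1:]:
            parent[find(other)] = find(idxs[0])
    groups = {}
    for idx, tri in enumerate(triangles):
        groups.setdefault(find(idx), set()).update(tri)
    return [frozenset(g) for g in groups.values()]
```

Communities may overlap. Two triangles sharing only a single node stay separate, while two triangles sharing an edge percolate into one community.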

Experimental Results
We test our algorithms on real-world data traces (the Enron, ArXiv citation, and Facebook networks) and compare them with Blondel's method at each snapshot.
Metrics: modularity, number of communities, Normalized Mutual Information (NMI), and running time.
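NMI compares two partitions of the same nodes. It is 1 when they agree up to relabeling and 0 when they are independent. A small Python sketch is given below. The slides do not say which normalization was used, so this sketch assumes the common arithmetic-mean form NMI = 2·I(X;Y) / (H(X) + H(Y)). Each partition is given as a list of community labels, one per node, in the same node order.

```python
from collections import Counter
from math import log


def nmi(labels_x, labels_y):
    """Normalized mutual information 2 I(X;Y) / (H(X) + H(Y)) of two labelings."""
    n = len(labels_x)
    px, py = Counter(labels_x), Counter(labels_y)
    pxy = Counter(zip(labels_x, labels_y))
    # I(X;Y) = sum p(a,b) log(p(a,b) / (p(a) p(b))), with probabilities as counts / n.
    mi = sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())
    hx = -sum(c / n * log(c / n) for c in px.values())
    hy = -sum(c / n * log(c / n) for c in py.values())
    if hx + hy == 0:
        return 1.0  # both partitions put every node in one community
    return 2 * mi / (hx + hy)
```

Relabeling communities does not change the score. For example, [0, 0, 1, 1] and [5, 5, 7, 7] score 1, while [0, 0, 1, 1] and [0, 1, 0, 1] score 0.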

ArXiv e-print Citation Network
Results: modularity, number of communities, NMI, running time

Facebook
Results: modularity, number of communities, NMI, running time
