The Relative Vertex-to-Vertex Clustering Value
A New Criterion for the Fast Detection of Functional Modules in Protein Interaction Networks
Zina Mohamed Ibrahim (King’s College, London, UK)
Alioune Ngom (University of Windsor, Windsor, Canada)

Protein Complexes and Functional Modules
- Protein complex: proteins interacting with each other at the same time and place [Spirin et al. 2004]
- Functional module: set of proteins involved in a common elementary biological function
  - Proteins bind each other at different times and places
  - May consist of multiple protein complexes [Chen et al ]

Identification of Functional Modules
- Protein Interaction Networks (PINs)
  - Functional modules correspond to highly connected subgraphs in a PIN
- Many graph clustering approaches
  - Clique-based methods: strict and not scalable to large PINs
  - Density-based methods: issues with low-degree nodes and low topological connectivity
  - Hierarchical methods
    - Hierarchical organization of the modules within PINs
    - Global metric: not scalable to large PINs
    - Local metric: common misclassification of low-degree nodes
    - Poor performance on noisy PINs, i.e., PINs with false-positive interactions

Graph Clustering
- Find non-overlapping communities in PINs

Hierarchical Methods -- Related Works
- Divisive approaches: iteratively remove an edge with the
  - Highest edge betweenness score: CNM method [Clauset et al 2004], O(m h log n)
  - Lowest edge clustering coefficient: Radicchi method [Radicchi et al 2004], O(m^2)
- These are global measures

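To make the "lowest edge clustering coefficient" criterion concrete, here is a minimal sketch. The slide gives no formula, so the triangle-based form used here, (shared neighbors + 1) / min(deg(u) - 1, deg(v) - 1), is an assumption based on how the Radicchi coefficient is commonly stated. The function name and the toy graph are made up for illustration.

```python
import math


def edge_clustering_coefficient(adj, u, v):
    """ASSUMED form: (triangles through the edge + 1) / max possible triangles."""
    triangles = len(adj[u] & adj[v])                     # common neighbors of u and v
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)     # upper bound on triangles
    if possible == 0:
        return math.inf                                  # a leaf edge is never removed first
    return (triangles + 1) / possible


# Toy graph (illustrative only): two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(edge_clustering_coefficient(adj, "c", "d"))  # 0.5
print(edge_clustering_coefficient(adj, "a", "b"))  # 2.0
```

In this toy graph the bridge c-d has the lowest value, so a divisive method of this kind would remove it first and separate the two triangles.
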
Hierarchical Methods -- Related Works
- Agglomerative approaches: iteratively merge two clusters C_u and C_v
  - Edge Clustering Value: local similarity metric between nodes
  - HC-PIN algorithm [Wang et al 2011]

Our New Criterion – Un-weighted PINs
- Relative Vertex-to-Vertex Clustering Value: 0 ≤ R(u → v) ≤ 100
- Likelihood of u being in v’s cluster
  - Not the likelihood that both u and v lie in the same cluster
- Local similarity pre-metric
- Principle of preferential attachment in scale-free networks

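The defining formula for R(u → v) is not reproduced in this text, so the sketch below uses an assumed stand-in: the percentage of u's closed neighborhood that is shared with v's. It is chosen only because it satisfies the stated properties (bounded by 0 and 100, asymmetric, and favoring low-degree nodes attaching to high-degree ones). It should not be read as the authors' definition.

```python
def relative_value(adj, u, v):
    """ASSUMED stand-in for R(u -> v), as a percentage in [0, 100]."""
    nu = adj[u] | {u}          # closed neighborhood of u (assumption)
    nv = adj[v] | {v}          # closed neighborhood of v (assumption)
    return 100.0 * len(nu & nv) / len(nu)


# Toy graph (illustrative only): "a" is a hub, "b" hangs off it.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"a", "c"},
}
print(relative_value(adj, "b", "a"))  # 100.0
print(relative_value(adj, "a", "b"))  # 50.0
```

The two values differ because the measure is directional: the low-degree node b is very likely to belong to the hub's cluster, while the hub is less likely to belong to b's.
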
Our New Criterion – Weighted PINs
- w(x, y) = weight on interaction edge (x, y)

FAC-PIN Algorithm – Test for Inclusion
- Insert u into C_v whenever
  1. R(u → v) = …
  2. R(u → v) > R(v → u)
  3. R(u → v) = R(v → u) and
     a. R(u → v) = R(v → u) = 100, or
     b. R(u → v) > 50
- That is, whenever R(u → v) > 50μ and R(u → v) ≥ R(v → u)
- Algorithm: for each v, iteratively insert its neighbors u into C_v whenever the test is true for u

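The combined test can be written as a small predicate over the two precomputed values. This is a minimal sketch: the function name `should_insert` is hypothetical, and the range check on μ uses the bound 0 ≤ μ < 2 stated for the merging parameter.

```python
def should_insert(r_uv, r_vu, mu=1.0):
    """Return True when u passes the inclusion test for v's cluster.

    r_uv = R(u -> v) and r_vu = R(v -> u), both in [0, 100].
    """
    if not 0 <= mu < 2:
        raise ValueError("merging parameter must satisfy 0 <= mu < 2")
    return r_uv > 50 * mu and r_uv >= r_vu


print(should_insert(100, 50))          # True
print(should_insert(50, 50))           # False: 50 is not strictly above the threshold
print(should_insert(60, 80))           # False: v is more strongly drawn to u
print(should_insert(40, 30, mu=0.5))   # True: the threshold drops to 25
```
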
FAC-PIN Algorithm - Clustering
- Initialization phase
  - Form singleton cluster C(v) for each v
- Community detection phase
  - For each v, include each neighbor u into C(v) whenever [ R_w(u → v) > 50μ and R_w(u → v) ≥ R_w(v → u) ] is true, with merging parameter 0 ≤ μ < 2
- Partition computation phase
  - Obtain the induced subgraph of G for each C(v) as a sub-network cluster
- Evaluation phase

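The first three phases can be sketched as follows for an un-weighted PIN. Two parts are assumptions rather than the authors' method: the stand-in `closed_overlap` for R(u → v), whose real formula is not reproduced in this text, and the use of union-find so that inserting u into C(v) merges their clusters and the result stays non-overlapping. The evaluation phase is omitted.

```python
def closed_overlap(adj, u, v):
    """ASSUMED stand-in for R(u -> v), in percent."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return 100.0 * len(nu & nv) / len(nu)


def fac_pin_sketch(adj, mu=0.5, relative_value=closed_overlap):
    # Initialization phase: one singleton cluster per vertex.
    parent = {v: v for v in adj}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Community detection phase: test each neighbor u of each vertex v.
    for v in adj:
        for u in adj[v]:
            r_uv = relative_value(adj, u, v)
            r_vu = relative_value(adj, v, u)
            if r_uv > 50 * mu and r_uv >= r_vu:
                parent[find(u)] = find(v)   # ASSUMPTION: insertion merges clusters

    # Partition computation phase: group vertices by cluster representative.
    clusters = {}
    for v in adj:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Toy graph (illustrative only): two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print([sorted(c) for c in fac_pin_sketch(adj, mu=1.0)])
# [['a', 'b', 'c'], ['d', 'e', 'f']]
print([sorted(c) for c in fac_pin_sketch(adj, mu=0.5)])
# [['a', 'b', 'c', 'd', 'e', 'f']]
```

On this toy graph the larger μ keeps the two triangles apart, while the smaller μ lets the bridge endpoints merge, so the number of clusters grows with μ.
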
Computational Complexities
- Given n nodes and m edges:
  - CNM algorithm: O(m h log n), h = height of the dendrogram
  - Radicchi algorithm: O(m^2)
  - HC-PIN algorithm: O(m δ^2)
  - FAC-PIN algorithm: O(n δ^2) << O(n D^2)
- δ = average degree and D = maximum degree

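A small numeric illustration of these bounds: since m = nδ/2 in any undirected graph, m δ^2 equals n δ^3 / 2, which exceeds n δ^2 whenever δ > 2. The sketch below computes the quantities for a made-up wheel graph. The helper name `degree_stats` and the graph are illustrative only, and the numbers are operation-count proxies, not measured running times.

```python
def degree_stats(adj):
    """Return (n, m, average degree, maximum degree) of an undirected graph."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    n = len(degrees)
    m = sum(degrees) // 2          # every edge is counted at both endpoints
    return n, m, sum(degrees) / n, max(degrees)


# Toy wheel graph (made up for illustration): one hub joined to a 9-node ring.
rim = [f"r{i}" for i in range(9)]
adj = {"hub": set(rim)}
for i, r in enumerate(rim):
    adj[r] = {"hub", rim[i - 1], rim[(i + 1) % 9]}

n, m, delta, big_d = degree_stats(adj)
print(n, m, delta, big_d)              # 10 18 3.6 9
print(n * delta**2 < n * big_d**2)     # True: n δ^2 is below n D^2
print(n * delta**2 < m * delta**2)     # True: n δ^2 is below m δ^2 here
```
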
Computational Experiments
For any given PIN:
1. Apply FAC-PIN with merging parameters μ
2. Evaluate the modularity of the resulting partitions P_{k,μ}
   - Three modularity functions
3. P_k = best P_{k,μ}
4. Record the execution time to obtain P_{k,μ}
5. Functional enrichment validation with SGD GO
   - P-value cutoff = 0.05
   - Retain significant clusters and record the number of significant clusters

Data Sets
- 8 un-weighted PIN data sets from the REACTOME database
  - Including the S. cerevisiae PIN (yeast SC-1): 5697 proteins and interactions
- 1 un-weighted PIN and the corresponding weighted PIN of S. cerevisiae (yeast SC-2) from the DIP database
  - 4726 proteins and interactions
- Protein complexes from the MIPS database

Results – Effect of Merging Parameter μ (SC-2; 4726 proteins and interactions)
- Recall: merging test = [ R_w(u → v) > 50μ and R_w(u → v) ≥ R_w(v → u) ]
- Fewer neighbors are merged with v as μ increases, hence the number of clusters k increases with μ

Results – Execution Times in Seconds (PINs from Reactome database; μ = 0.5)

Results – Modularity Functions
- Function Q:
- Function Ω:
- Function D:
- where w(u, v) = 0 or 1 for un-weighted PINs

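The formulas for the three functions are not reproduced in this text. As a hedged illustration of what a modularity score looks like, the sketch below implements the standard Newman-Girvan modularity with edge weights, on the assumption that "Function Q" refers to it. Ω and D are not sketched because their definitions cannot be recovered here. The function name and toy data are illustrative.

```python
def modularity_q(edges, partition):
    """Weighted Newman-Girvan modularity (ASSUMED to match 'Function Q').

    edges: dict {(u, v): weight}, each undirected edge listed once
           (weight 1 for an un-weighted PIN).
    partition: list of disjoint vertex sets covering all vertices.
    """
    total = sum(edges.values())
    cluster_of = {v: i for i, c in enumerate(partition) for v in c}
    inside = [0.0] * len(partition)    # weight of edges inside each cluster
    degree = [0.0] * len(partition)    # summed weighted degree of each cluster
    for (u, v), w in edges.items():
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            inside[cu] += w
    return sum(inside[i] / total - (degree[i] / (2 * total)) ** 2
               for i in range(len(partition)))


# Toy graph (illustrative only): two triangles joined by the bridge c-d.
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
edges = {pair: 1 for pair in pairs}
q_split = modularity_q(edges, [{"a", "b", "c"}, {"d", "e", "f"}])
q_whole = modularity_q(edges, [{"a", "b", "c", "d", "e", "f"}])
print(round(q_split, 4), round(q_whole, 4))  # 0.3571 0.0
```

Splitting the toy graph at the bridge scores higher than keeping it whole, which is the kind of comparison used to pick the best partition across values of μ.
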
Results – Modularity of FAC-PIN Partitions (PINs from Reactome database; μ = 0.5)
- Reported measures: Q_w, Ω_w, D_w

Functional Module Prediction
- Recall indicates how effectively proteins with the same functional category in the network are extracted
- Precision indicates how consistently proteins in the same module are annotated
- The f-measure evaluates overall performance
- The average f-measure is taken as the accuracy of the algorithms

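A minimal sketch of these three scores for one predicted module against one functional category. The set-overlap definitions are the usual ones and are assumed here, since the slide gives no formulas. The f-measure is the harmonic mean of precision and recall. Names and protein identifiers are made up.

```python
def module_scores(module, category):
    """Return (precision, recall, f_measure) for a module vs. a category.

    module:   set of proteins in the predicted module
    category: set of proteins annotated with the functional category
    """
    overlap = len(module & category)
    precision = overlap / len(module)     # share of the module that is annotated
    recall = overlap / len(category)      # share of the category that is recovered
    if overlap == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


module = {"p1", "p2", "p3", "p4"}
category = {"p3", "p4", "p5", "p6", "p7", "p8"}
precision, recall, f_measure = module_scores(module, category)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))  # 0.5 0.333 0.4
```

Averaging `f_measure` over all predicted modules gives the accuracy figure described above.
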
Functional Enrichment of FAC-PIN Modules
- Hypergeometric distribution… …

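The slide's formula is elided. The sketch below shows the usual upper-tail hypergeometric p-value used for enrichment tests, namely the probability of seeing at least k annotated proteins in a module of the given size by chance. Treat it as an assumption about the intended computation. Parameter names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def enrichment_p_value(n_total, n_annotated, module_size, k_hits):
    """P(X >= k_hits) for X ~ Hypergeometric(n_total, n_annotated, module_size).

    n_total:     proteins in the network
    n_annotated: proteins in the network carrying the annotation
    module_size: proteins in the module
    k_hits:      proteins in the module carrying the annotation
    """
    upper = min(n_annotated, module_size)
    tail = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, module_size - i)
        for i in range(k_hits, upper + 1)
    )
    return tail / comb(n_total, module_size)


p = enrichment_p_value(n_total=10, n_annotated=4, module_size=3, k_hits=2)
print(round(p, 4))   # 0.3333
print(p < 0.05)      # False: not significant at the 0.05 cutoff
```
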
Results – Functional Enrichment Validations (Un-weighted SC-1; 5697 proteins and interactions; μ = 0.5)