KDD 2009 Scalable Graph Clustering using Stochastic Flows Applications to Community Discovery Venu Satuluri and Srinivasan Parthasarathy Data Mining Research.

Slides:



Advertisements
Similar presentations
A Clustering Framework for Unbalanced Partitioning and Outlier Filtering on High Dimensional Datasets 1 Turgay Tugay Bilgin and A.Yilmaz Camurcu 2 1 Department.
Advertisements

Social network partition Presenter: Xiaofei Cao Partick Berg.
METIS Three Phases Coarsening Partitioning Uncoarsening
Modularity and community structure in networks
Ricochet A Family of Unconstrained Algorithms for Graph Clustering.
Clustering Social Networks Isabelle Stanton, University of Virginia Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan.
1 Modularity and Community Structure in Networks* Final project *Based on a paper by M.E.J Newman in PNAS 2006.
Author: Jie chen and Yousef Saad IEEE transactions of knowledge and data engineering.
1 Learning Dynamic Models from Unsequenced Data Jeff Schneider School of Computer Science Carnegie Mellon University joint work with Tzu-Kuo Huang, Le.
1 Local Sparsification for Scalable Module Identification in Networks Srinivasan Parthasarathy Joint work with V. Satuluri, Y. Ruan, D. Fuhry, Y. Zhang.
Biased Normalized Cuts 1 Subhransu Maji and Jithndra Malik University of California, Berkeley IEEE Conference on Computer Vision and Pattern Recognition.
10/11/2001Random walks and spectral segmentation1 CSE 291 Fall 2001 Marina Meila and Jianbo Shi: Learning Segmentation by Random Walks/A Random Walks View.
Structure learning with deep neuronal networks 6 th Network Modeling Workshop, 6/6/2013 Patrick Michl.
CS 376b Introduction to Computer Vision 04 / 08 / 2008 Instructor: Michael Eckmann.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts
Normalized Cuts Demo Original Implementation from: Jianbo Shi Jitendra Malik Presented by: Joseph Djugash.
Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,
Clustering Social Networks Isabelle Stanton, University of Virginia Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan.
A scalable multilevel algorithm for community structure detection
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Markov Cluster Algorithm
Multilevel Graph Partitioning and Fiduccia-Mattheyses
Multilevel Hypergraph Partitioning G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar Computer Science Department, U of MN Applications in VLSI Domain.
Image Segmentation Image segmentation is the operation of partitioning an image into a collection of connected sets of pixels. 1. into regions, which usually.
Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.
Graph clustering Jin Chen CSE Fall 2012 MSU 1.
Department of Engineering Science Department of Zoology Soft partitioning in networks via Bayesian nonnegative matrix factorization Ioannis Psorakis, Steve.
CHAMELEON : A Hierarchical Clustering Algorithm Using Dynamic Modeling
Graph mining in bioinformatics Laur Tooming. Graphs in biology Graphs are often used in bioinformatics for describing processes in the cell Vertices are.
Community Detection by Modularity Optimization Jooyoung Lee
Hierarchical Distributed Genetic Algorithm for Image Segmentation Hanchuan Peng, Fuhui Long*, Zheru Chi, and Wanshi Siu {fhlong, phc,
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning Jinghe Zhang 10/28/2014 CS 6501 Information Retrieval.
1 Frequent Subgraph Mining Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY June 12, 2010.
Markov Cluster (MCL) algorithm Stijn van Dongen.
Andreas Papadopoulos - [DEXA 2015] Clustering Attributed Multi-graphs with Information Ranking 26th International.
Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies Akshay Java Anupam Joshi Tim Finin University of Maryland, Baltimore County.
Sharon Bruckner, Bastian Kayser, Tim Conrad Freie Uni. Berlin Finding Modules in Networks with Non-modular Regions.
Local/Global Term Analysis for Discovering Community Differences in Social Networks David Fuhry, Yiye Ruan, and Srinivasan Parthasarathy Data Mining Research.
Image Segmentation Superpixel methods Speaker: Hsuan-Yi Ko.
University at BuffaloThe State University of New York Detecting Community Structure in Networks.
Data Structures and Algorithms in Parallel Computing Lecture 7.
Melbourne, Australia, Oct., 2015 gSparsify: Graph Motif Based Sparsification for Graph Clustering Peixiang Zhao Department of Computer Science Florida.
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
CS654: Digital Image Analysis Lecture 28: Advanced topics in Image Segmentation Image courtesy: IEEE, IJCV.
 In the previews parts we have seen some kind of segmentation method.  In this lecture we will see graph cut, which is a another segmentation method.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
1 An Efficient Optimal Leaf Ordering for Hierarchical Clustering in Microarray Gene Expression Data Analysis Jianting Zhang Le Gruenwald School of Computer.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Srinivasan Parthasarathy Community Detection in Graphs.
Spectral Clustering Shannon Quinn (with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford.
NN k Networks for browsing and clustering image collections Daniel Heesch Communications and Signal Processing Group Electrical and Electronic Engineering.
 Introduction  Important Concepts in MCL Algorithm  MCL Algorithm  The Features of MCL Algorithm  Summary.
Motoki Shiga, Ichigaku Takigawa, Hiroshi Mamitsuka
High Performance Computing Seminar
Graph clustering to detect network modules
Hierarchical Agglomerative Clustering on graphs
Spectral methods for Global Network Alignment
A Viewpoint-based Approach for Interaction Graph Analysis
Research in Computational Molecular Biology , Vol (2008)
Hansheng Xue School of Computer Science and Technology
Structure learning with deep autoencoders
Using Multilevel Force-Directed Algorithm to Draw Large Clustered Graph 研究生: 何明彥 指導老師:顏嗣鈞 教授 2018/12/4 NTUEE.
Graph Clustering based on Random Walk
Spectral Clustering Eric Xing Lecture 8, August 13, 2010
Spectral methods for Global Network Alignment
SEG5010 Presentation Zhou Lanjun.
Alan Kuhnle*, Victoria G. Crawford, and My T. Thai
“Traditional” image segmentation
Presentation transcript:

KDD 2009 Scalable Graph Clustering using Stochastic Flows Applications to Community Discovery Venu Satuluri and Srinivasan Parthasarathy Data Mining Research Laboratory Dept. of Computer Science and Engineering The Ohio State University

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Outline Introduction -Problem Statement -Markov Clustering (MCL) Proposed Algorithms -Regularized MCL (R-MCL) -Multi-level Regularized MCL (MLR-MCL) Evaluation Conclusions 2

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Problem Statement Graph Clustering: Partition the vertices of a graph into disjoint sets such that each partition is a well-connected/coherent group. Applications: Discovery of protein complexes [Snel ‘02] Community discovery in social networks [Newman ‘06] Image segmentation [Shi ‘00] Existing solutions: Spectral methods [Shi ‘00] Edge-based agglomerative/divisive methods [Newman ‘04] Kernel K-Means [Dhillon ‘07] Metis [Karypis ’98] Markov Clustering (MCL) [van Dongen ’00] 3

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Markov Clustering (MCL) [van Dongen ‘00] The original algorithm for clustering graphs using stochastic flows. Advantages: Simple and elegant. Widely used in Bioinformatics because of its noise tolerance and effectiveness. Disadvantages: Very slow. - Takes 1.2 hours to cluster a 76K node social network. Prone to output too many clusters. - Produces 1416 clusters on a 4741 node PPI network. Can we redress the disadvantages of MCL while retaining its advantages? 4

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Terminology Flow: Transition probability from a node to another node. Flow matrix: Matrix with the flows among all nodes; i th column represents flows out of i th node. Each column sums to Flow Matrix 5

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy The MCL algorithm Expand: M := M*M Inflate: M := M.^r (r usually 2), renormalize columns Converged? Output clusters Input: A, Adjacency matrix Initialize M to M G, the canonical transition matrix M:= M G := (A+I) D -1 Yes Output clusters No Prune Enhances flow to well-connected nodes as well as to new nodes. Increases inequality in each column. “Rich get richer, poor get poorer.” Saves memory by removing entries close to zero. 6

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy The Regularize operator Why does MCL output many clusters? The original matrix is only used at the start, and neighboring information fades as time goes on. Called “overfitting”; it does not penalize divergence of flows between neighbors. Remedy: Let q i, i=1:k, be the flow distributions of the k neighbors of node q in the graph. Let w i, i=1:k, be the respective normalized edge weights, flow of q: Closed solution: This update defines the Regularize operator. In matrix notation, Regularize(M) := M*M G = M*(A+I)D -1 7

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy The Regularized-MCL algorithm Regularize: M := M*M G Inflate: M := M.^r (r usually 2), renormalize columns Converged? Output clusters Yes Output clusters No Prune Takes into account flows of the neighbors. Increases inequality in each column. “Rich get richer, poor get poorer.” Saves memory by removing entries close to zero. Input: A, Adjacency matrix Initialize M to M G, the canonical transition matrix M:= M G := (A+I) D -1 8

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Multi-level Regularized MCL Input Graph Intermediate Graph Intermediate Graph Coarsest Graph... Coarsen Run Curtailed R-MCL,project flow. Input Graph Run R-MCL to convergence, output clusters. Faster to run on smaller graphs first Captures global topology of graph Initializes flow matrix of refined graph 9

Coarsening operation Construct a matching: defined as a set of edges, no vertex is shared among these edges. Each edge is mapped into a super-node in the coarsened graph, and the new edges are the union of the original ones. Two maps used to keep the track of the process matchingmapping ABC Map1: 1, 2, 5 Map1: 4, 3, 6 Map1: A B C

Project flow 11

Evaluation criteria The normalized cut of a cluster C in the graph G is defined as: Average Ncut: 12 /k/k Avg.

Comparison with MCL Why R-MCL is much faster than MCL? – Regularize is more faster that expansion, because M G is sparser than M, and R-MCL can stop earlier It seems MLR-MCL only upgrades the performance very less, especially the AVG. N-cut 13

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Comparison with Graclus and Metis Quality: MLR-MCL improves upon both Graclus and Metis 14

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Comparison with Graclus and Metis Speed: MLR-MCL is faster than Graclus and competitive with Metis 15

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Evaluation on PPI networks Yeast PPI network with 4741 proteins and interactions. Annotations from the Gene Ontology database used as ground truth. MLR-MCL returns clusters of higher biological significance than MCL or Graclus. 16

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Conclusions Regularized MCL overcomes the fragmentation problem of MCL. Multi-level Regularized MCL further improves quality and speed of R-MCL. MLR-MCL often outperforms state-of-the-art algorithms, both quality and speed-wise, on a wide variety of real datasets. Future Directions: Novel coarsening strategies Extensions to directed and bi-partite graphs. Acknowledgements: This work is supported in part by the following grants: NSF CAREER IIS , RI-CNS , CCF and IIS

Scalable Graph Clustering using Stochastic FlowsVenu Satuluri and Srinivasan Parthasarathy Thank You! References: 1.MCL - Graph Clustering by Flow Simulation. S. van Dongen, Ph.D. thesis, University of Utrecht, Graclus - Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. Dhillon et. al., IEEE. Trans. PAMI, Metis - A fast and high quality multilevel scheme for partitioning irregular graphs. Karypis and Kumar, SIAM J. on Scientific Computing, Normalized Cuts and Image Segmentation. Shi and Malik, IEEE. Trans. PAMI, Finding and evaluating community structure in networks. Newman and Girvan, Phys. Rev. E 69, The identification of functional modules from the genomic association of genes. Snel et. al., PNAS