Graph Clustering based on Random Walk

Graph Clustering based on Random Walk
Bing Lidong, 2010-02-10

Outline
- Background: Graph Clustering, Random Walks
- MCL: Basis, Inflation Operator, Algorithm, Convergence
- MCL++: R-MCL, MLR-MCL

Graph Clustering
Clustering groups items naturally. In vector clustering, vectors in the same cluster are more similar to each other than to vectors in other clusters. In graph clustering, there are many links within a cluster and fewer links between clusters.
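
The "many links within, few links between" criterion can be checked directly by counting edges for a given partition. This is a minimal sketch: the edge list is the 6-vertex graph from the example slide later on (0-indexed: two triangles joined by one edge), and `count_links` and the two partitions are hypothetical illustrations.

```python
# Hypothetical example: the 6-vertex graph from the example slide, 0-indexed.
# Vertices 0-2 and 3-5 form two triangles joined by the edge (0, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (0, 3)]


def count_links(edges, cluster_of):
    """Return (links within clusters, links between clusters) for a partition."""
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra, len(edges) - intra


good = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
bad = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}

print(count_links(EDGES, good))  # (6, 1): dense inside, sparse between
print(count_links(EDGES, bad))   # (2, 5): most links cross the partition
```

A good graph clustering maximizes the first number relative to the second.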

Random Walk
Observation: if you start at a node and then randomly travel to a connected node, you are more likely to stay within a cluster than to travel between clusters. This is what MCL is based on. A random walk on a graph is a Markov process: the next state depends only on the current state.
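
The observation can be sketched with a short simulation. This assumes the same 6-vertex example graph (0-indexed) and a hypothetical cluster labelling; `random_walk` is an illustrative helper, not part of MCL. Each step picks the next vertex uniformly among the neighbors of the current vertex only, which is exactly the Markov property.

```python
import random

# Hypothetical example graph (0-indexed): triangles {0,1,2} and {3,4,5} joined by edge 0-3.
NEIGHBORS = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4, 5], 4: [3, 5], 5: [3, 4]}
CLUSTER = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}


def random_walk(neighbors, start, steps, rng):
    """Markov process: the next vertex depends only on the current vertex."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(neighbors[path[-1]]))
    return path


walk = random_walk(NEIGHBORS, start=1, steps=10_000, rng=random.Random(0))
crossings = sum(CLUSTER[u] != CLUSTER[v] for u, v in zip(walk, walk[1:]))
print(f"fraction of steps that cross clusters: {crossings / (len(walk) - 1):.3f}")
```

Only the edge 0-3 crosses clusters, so in the long run a step crosses with probability 2 · 1/(2 · 7) = 1/7 ≈ 0.143, and the printed fraction should be close to that.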

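Example
The graph has six vertices: 1, 2, 3 form a triangle, 4, 5, 6 form another triangle, and the edge between 1 and 4 joins them. Its column-stochastic transition matrix ($P_{ij}$ is the probability of stepping from vertex $j$ to vertex $i$) is

$$P = \begin{bmatrix}
0 & 0.5 & 0.5 & 0.33 & 0 & 0 \\
0.33 & 0 & 0.5 & 0 & 0 & 0 \\
0.33 & 0.5 & 0 & 0 & 0 & 0 \\
0.33 & 0 & 0 & 0 & 0.5 & 0.5 \\
0 & 0 & 0 & 0.33 & 0 & 0.5 \\
0 & 0 & 0 & 0.33 & 0.5 & 0
\end{bmatrix}$$

What's wrong with $P^{1000}$? Every row of $P^{1000}$ is constant (the slide shows rows such as 0.2148, 0.1428 and 0.2141 repeated across all six columns), so all columns are the same vector: after many steps the walk has forgotten where it started, and the cluster structure is no longer visible. For this graph each entry in row $i$ tends to $d_i / 2|E|$, i.e. $3/14 \approx 0.214$ for vertices 1 and 4 and $2/14 \approx 0.143$ for the others.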
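
The long-run behavior can be reproduced with plain Python lists. This sketch assumes the 0-indexed version of the example graph and uses exact 1/deg(j) entries instead of the rounded 0.33 on the slide, so its numbers differ slightly from the slide's; `transition_matrix` and `matrix_power` are hypothetical helper names.

```python
def transition_matrix(n, edges):
    """Column-stochastic P of an undirected graph: P[i][j] = 1/deg(j) if i and j are adjacent."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    degree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / degree[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matrix_power(m, k):
    """m**k by repeated multiplication (fine for a 6x6 example)."""
    result = [[1.0 if i == j else 0.0 for j in range(len(m))] for i in range(len(m))]
    for _ in range(k):
        result = matmul(result, m)
    return result


# 0-indexed example graph: triangles {0,1,2} and {3,4,5} joined by edge (0, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (0, 3)]
P = transition_matrix(6, EDGES)
P1000 = matrix_power(P, 1000)
for row in P1000:
    print(" ".join(f"{x:.4f}" for x in row))
```

Every printed row is constant: 0.2143 for vertices 1 and 4 (rows 0 and 3) and 0.1429 for the other four, whichever column (starting vertex) you look at.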

What happened?
"Flow is easier within dense regions than across sparse boundaries, however, in the long run this effect disappears." Raising P to a high power lets flow leak across the sparse boundary until every column reaches the same stationary distribution.
How to deal with it? During the walk, we should encourage intra-cluster flow and penalize inter-cluster flow.

MCL Inflation
MCL adjusts the transitions column by column. For each vertex, the transition values are changed so that strong neighbors are further strengthened and less popular neighbors are demoted. This is done by raising each entry of the column to a non-negative power and then re-normalizing the column so that it sums to 1. This operation is called “Inflation” (taking powers of the matrix is called “Expansion”).

Inflation operation
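
For reference, the inflation operator with parameter $r$, as it is usually written following van Dongen's thesis [1], acts entry-wise on a column-stochastic $N \times N$ matrix $M$ and then rescales each column; expansion is an ordinary matrix power:

$$(\Gamma_r M)_{pq} = \frac{(M_{pq})^r}{\sum_{i=1}^{N} (M_{iq})^r}, \qquad \text{Expansion: } M \mapsto M^e \ (\text{typically } e = 2).$$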

Inflation example (r = 2: square each entry, then normalize)
Inflation strengthens strong flows and weakens already weak flows. The inflation parameter r controls the extent of this strengthening/weakening, and thereby the granularity of the clusters: a larger r tends to produce more, smaller clusters.
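
A minimal sketch of inflation on a single column, assuming matrices are plain Python lists of rows; `inflate` is a hypothetical helper name and the column values are made up for illustration.

```python
def inflate(matrix, r=2.0):
    """Inflation: raise every entry to the power r, then rescale each column to sum to 1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    powered = [[matrix[i][j] ** r for j in range(n_cols)] for i in range(n_rows)]
    col_sums = [sum(powered[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[powered[i][j] / col_sums[j] for j in range(n_cols)] for i in range(n_rows)]


# One vertex's outgoing flow, written as a 3x1 matrix (a single column).
column = [[0.5], [0.25], [0.25]]
for r in (1.5, 2.0, 4.0):
    print(r, [round(row[0], 3) for row in inflate(column, r)])
# 1.5 [0.586, 0.207, 0.207]
# 2.0 [0.667, 0.167, 0.167]
# 4.0 [0.889, 0.056, 0.056]
```

With r = 2 the strongest flow grows from 0.5 to 2/3 while the weak ones shrink from 0.25 to 1/6; a larger r sharpens the column further.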

MCL Algorithm
Two operations are repeated alternately until the matrix stops changing: Expansion (taking a power of the matrix, typically the square) and Inflation.
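
The loop can be sketched in pure Python as follows. Assumptions not taken from the slides: self-loops are added before normalizing (a common MCL preprocessing step), expansion uses e = 2, inflation uses r = 2, and iteration stops when no entry changes by more than `tol`. All function names are hypothetical; a real implementation would use sparse matrices and pruning.

```python
def normalize(m):
    """Rescale every column of a square matrix so it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inflate(m, r):
    return normalize([[x ** r for x in row] for row in m])


def mcl(adjacency, r=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion (squaring) and inflation until the matrix stops changing."""
    n = len(adjacency)
    # Assumption: add self-loops first, a common MCL preprocessing step.
    m = normalize([[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                   for i in range(n)])
    for _ in range(max_iter):
        new = inflate(matmul(m, m), r)  # expansion with e = 2, then inflation
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    return m


# 0-indexed example graph: triangles {0,1,2} and {3,4,5} joined by edge (0, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (0, 3)]
ADJ = [[0.0] * 6 for _ in range(6)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1.0

result = mcl(ADJ)
for row in result:
    print(" ".join(f"{x:.2f}" for x in row))
```

For the example graph the flow of vertices 0-2 should end up concentrated on vertex 0 and the flow of vertices 3-5 on vertex 3, the higher-degree vertex of each triangle.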

Convergence
Convergence is not proven in the thesis, but experiments show that it usually occurs. In practice, the algorithm nearly always converges to a "doubly idempotent" matrix, i.e. one that is left unchanged by both expansion and inflation: it is at steady state, and all nonzero values in a given column are equal.
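
The steady-state condition can be tested directly: a matrix is doubly idempotent if squaring it and inflating it both leave it unchanged. This is a sketch with hypothetical helper names; the `LIMIT` matrix is a hand-written steady state of the kind MCL reaches on the 0-indexed example graph, not data copied from the slides.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inflate(m, r):
    n = len(m)
    powered = [[x ** r for x in row] for row in m]
    sums = [sum(powered[i][j] for i in range(n)) for j in range(n)]
    return [[powered[i][j] / sums[j] for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def is_doubly_idempotent(m, r=2.0):
    """True if m is unchanged by both expansion (m @ m) and inflation."""
    return close(matmul(m, m), m) and close(inflate(m, r), m)


# Hypothetical steady state: vertices 0-2 send all flow to 0, vertices 3-5 to 3.
LIMIT = [[1.0 if (i == 0 and j < 3) or (i == 3 and j >= 3) else 0.0 for j in range(6)]
         for i in range(6)]
print(is_doubly_idempotent(LIMIT))                     # True
print(is_doubly_idempotent([[0.5, 0.5], [0.5, 0.5]]))  # True: equal nonzeros per column
print(is_doubly_idempotent([[0.9, 0.2], [0.1, 0.8]]))  # False: squaring changes it
```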

Example

Example (cont.)

Example (cont.) How to interpret clusters?

MCL Interpreting Clusters
To interpret clusters, the vertices are split into two types: attractors, which attract other vertices, and vertices that are attracted by the attractors. Attractors have at least one positive flow value in their row of the steady-state matrix. Each attractor attracts the vertices that have positive values in its row. Attractors and the vertices they attract are swept together into the same cluster.
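
A simplified sketch of this interpretation step: every row with a positive entry marks an attractor, and the attractor is grouped with the columns that are positive in its row; identical groups are kept once. The `eps` threshold (to ignore floating-point residue) and the hand-written `LIMIT` steady state for the 0-indexed example graph are assumptions, and `interpret_clusters` is a hypothetical name.

```python
def interpret_clusters(m, eps=1e-6):
    """Group each attractor with the vertices that have positive flow in its row."""
    n = len(m)
    clusters = []
    for i in range(n):
        attracted = {j for j in range(n) if m[i][j] > eps}
        if not attracted:
            continue  # row i is (numerically) zero, so vertex i is not an attractor
        members = frozenset(attracted | {i})
        if members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical steady state: attractor 0 for vertices 0-2, attractor 3 for vertices 3-5.
LIMIT = [[1.0 if (i == 0 and j < 3) or (i == 3 and j >= 3) else 0.0 for j in range(6)]
         for i in range(6)]
clusters = interpret_clusters(LIMIT)
print([sorted(v + 1 for v in c) for c in clusters])  # [[1, 2, 3], [4, 5, 6]] (1-based labels)
```

If a column were split equally between two attractors, that vertex would appear in two groups, which is the overlapping case described next.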

Overlapping clusters
Clusters overlap only when a vertex is attracted exactly equally by more than one cluster, which occurs only when those clusters are isomorphic.

Inflation parameter

MCL Analysis
MCL has problems with clusters of large diameter. Distributing flow across such a cluster needs long expansion and low inflation (otherwise the cluster will split), which takes many iterations and makes MCL sensitive to small perturbations in the graph.

MCL Analysis (cont.)
Each iteration costs O(N^3), where N is the number of vertices: N^3 is the cost of one multiplication of two N×N matrices, while inflation takes only O(N^2) time. The number of iterations needed to converge is not proven, but experiments show it is typically about 10 to 100, and after the first few iterations the matrices are mostly sparse. Speed can be improved through pruning: inspect the matrix and set small values directly to zero. This works well when the diameter of the clusters is small.
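
The pruning step can be sketched as follows. The threshold value and the choice to rescale each column afterwards (so the matrix stays column-stochastic) are assumptions here rather than details from the slides; `prune` is a hypothetical helper name.

```python
def prune(m, threshold=1e-3):
    """Zero out entries below `threshold`, then rescale each column to sum to 1."""
    n_rows, n_cols = len(m), len(m[0])
    kept = [[x if x >= threshold else 0.0 for x in row] for row in m]
    sums = [sum(kept[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # A column can only become all-zero if every entry was below the threshold.
    return [[kept[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n_cols)]
            for i in range(n_rows)]


M = [[0.6, 0.5],
     [0.399, 0.5],
     [0.001, 0.0]]
pruned = prune(M, threshold=0.01)
print(pruned)  # the 0.001 entry becomes 0 and column 0 is rescaled by 1/0.999
```

Zeroing tiny entries keeps the matrices sparse, so each expansion step touches far fewer than N^3 products.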

References
[1] S. van Dongen. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, 2000. http://igitur-archive.library.uu.nl/dissertations/1895620/inhoud.htm
[2] http://www.cs.ucsb.edu/~xyan/classes/CS595D-2009winter/MCL_Presentation2.pdf
[3] V. Satuluri and S. Parthasarathy. Scalable Graph Clustering Using Stochastic Flows: Applications to Community Discovery. KDD '09. http://portal.acm.org/citation.cfm?id=1557101
[4] http://velblod.videolectures.net/2009/contrib/kdd09_paris/satuluri_sgcusfacd/kdd09_satuluri_sgcusfacd_01.ppt