Graph Clustering based on Random Walk. Presenter: Liang Ge.

 Introduction  Important Concepts in MCL Algorithm  MCL Algorithm  The Features of MCL Algorithm  Summary

Graph Clustering  Intuition: ◦ Highly connected nodes are likely to be in the same cluster ◦ Weakly connected nodes are likely to be in different clusters.  Model: ◦ A random walk may start at any node ◦ Starting at node r, if a random walk reaches node t with high probability, then r and t should be clustered together.

Markov Clustering (MCL)  Markov process ◦ The probability that a random walk takes a given edge at node u depends only on u and that edge. ◦ It does not depend on the route taken before reaching u. ◦ This assumption simplifies the computation.

MCL  Flow network is used to approximate the partition  There is an initial amount of flow injected into each node.  At each step, a percentage of flow will goes from a node to its neighbors via the outgoing edges.

MCL  Edge Weight ◦ Similarity between two nodes ◦ Considered as the bandwidth or connectivity. ◦ If an edge has higher weight than the other, then more flow will be flown over the edge. ◦ The amount of flow is proportional to the edge weight. ◦ If there is no edge weight, then we can assign the same weight to all edges.

Intuition of MCL  Two natural clusters, A and B  When the flow reaches the border points, it is more likely to return into its cluster than to cross the border.

MCL  When the flow reaches A, it has four possible outcomes. ◦ Three back into the cluster, one leak out. ◦ ¾ of flow will return, only ¼ leaks.  Flow will accumulate in the center of a cluster (island).  The border nodes will starve.

 Simulation of random flow in a graph  Two operations: Expansion and Inflation  Intrinsic relationship between the MCL process result and the cluster structure

 Popular description: partition the graph so that  Intra-partition similarity is the highest  Inter-partition similarity is the lowest

 Observation 1:  The number of higher-length paths in G is large for pairs of vertices lying in the same dense cluster  Small for pairs of vertices belonging to different clusters

 Observation 2:  A random walk in G that visits a dense cluster will likely not leave the cluster until many of its vertices have been visited

 n×n Adjacency matrix A. ◦ A(i,j) = weight on edge from i to j ◦ If the graph is undirected, A(i,j) = A(j,i), i.e. A is symmetric  n×n Transition matrix P. ◦ P is row stochastic ◦ P(i,j) = probability of stepping to node j from node i = A(i,j)/∑_j A(i,j)  n×n Laplacian matrix L. ◦ L = D − A, where D is the diagonal degree matrix with D(i,i) = ∑_j A(i,j) ◦ Symmetric positive semi-definite for undirected graphs ◦ Singular
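These three matrices are a few lines of NumPy; the 4-node graph below is a hypothetical example chosen only for illustration.

```python
import numpy as np

# Hypothetical 4-node undirected graph: triangle 0-1-2 plus pendant edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0               # undirected: A is symmetric

P = A / A.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
L = np.diag(A.sum(axis=1)) - A             # Laplacian L = D - A

print(P.sum(axis=1))                       # each row of P sums to 1
print(L.sum(axis=1))                       # each row of L sums to 0, so L is singular
```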

[Figure: adjacency matrix A and the corresponding transition matrix P for a small example graph]

[Figure: the walker's probability distribution over the example graph at t = 0, 1, 2, 3]

 x_t(i) = probability that the surfer is at node i at time t  x_{t+1}(i) = ∑_j (probability of being at node j) · Pr(j→i) = ∑_j x_t(j) · P(j,i)  x_{t+1} = x_t P = x_{t−1} P² = x_{t−2} P³ = … = x_0 P^t  What happens when the surfer keeps walking for a long time?
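The iteration x_{t+1} = x_t P can be run directly. A sketch on a hypothetical 4-node graph (triangle 0-1-2 plus edge 2-3): for a connected, aperiodic graph the distribution converges to the stationary distribution, which is proportional to node degree.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # hypothetical example graph
P = A / A.sum(axis=1, keepdims=True)         # row-stochastic transition matrix

x = np.array([1.0, 0.0, 0.0, 0.0])           # x_0: surfer starts at node 0
for _ in range(100):
    x = x @ P                                # x_{t+1} = x_t P

pi = A.sum(axis=1) / A.sum()                 # stationary distribution ~ degree
print(np.allclose(x, pi))                    # → True
```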

Flow Formulation  Flow: transition probability from one node to another.  Flow matrix: matrix of the flows among all nodes; the i-th column represents the flows out of the i-th node.  Each column sums to 1.

 Measure or sample any of these (higher-length paths, random walks) and deduce the cluster structure from the behavior of the sampled quantities.  Cluster structure will show itself as a peaked distribution of the quantities  A lack of cluster structure will result in a flat distribution

 Markov Chain  Random Walk on Graph  Some Definitions in MCL

 A Random Process with Markov Property  Markov Property: given the present state, future states are independent of the past states  At each step the process may change its state from the current state to another state, or remain in the same state, according to a certain probability distribution.

 A walker takes off on some arbitrary vertex  It successively visits new vertices by selecting one of the outgoing edges arbitrarily  There is not much difference between a random walk and a finite Markov chain.
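Such a walk is a few lines of code; the adjacency list below is a made-up example.

```python
import random

# Hypothetical graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(start, steps, seed=42):
    """Repeatedly pick a uniformly random outgoing edge. The Markov property
    holds because each choice depends only on the current vertex."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(adj[path[-1]]))
    return path

path = random_walk(0, 10)
print(path)   # a walk of 11 vertices; every consecutive pair is an edge
```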

 Simple Graph  A simple graph is an undirected graph in which every nonzero weight equals 1.

 Associated Matrix  The associated matrix of G, denoted M_G, is defined by setting the entry (M_G)_pq equal to w(v_p, v_q)

 Markov Matrix  The Markov matrix associated with a graph G is denoted by T_G and is formally defined by letting its q-th column be the q-th column of M_G normalized to sum to 1

 The associated matrix and the Markov matrix are actually computed for the matrix M+I  I denotes the identity matrix (diagonal, with every nonzero element equal to 1)  This adds a loop to every vertex of the graph, because a walker may also stay in the same place at its next step

 Find Higher-Length Paths  Starting point: in the associated matrix, the quantity (M^k)_pq has a straightforward interpretation as the number of paths of length k between v_p and v_q
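This interpretation is easy to check numerically. Below, M is the associated matrix of a hypothetical 4-node graph (triangle 0-1-2 plus edge 2-3).

```python
import numpy as np

M = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])     # associated matrix of the example graph

M2 = np.linalg.matrix_power(M, 2)
print(M2[0, 3])   # 1: the single length-2 path 0-2-3
print(M2[0, 0])   # 2: the two closed walks 0-1-0 and 0-2-0 (= degree of node 0)
```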

(M G +I) 2 MGMG

MGMG

 Flow is easier within dense regions than across sparse boundaries.  However, in the long run, this effect disappears.  Powers of the matrix can be used to find higher-length paths, but the effect diminishes as the flow goes on.

 Idea: how can we change the distribution of transition probabilities such that preferred neighbours are further favoured and less popular neighbours are demoted?  MCL solution: raise all the entries in a given column to a certain power greater than 1 (e.g. squaring) and rescale the column to sum to 1 again.
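A minimal sketch of this inflation step; the column values below are made up for illustration.

```python
import numpy as np

def inflate(M, r=2.0):
    """Raise every entry to the power r, then rescale each column to sum to 1."""
    M = np.power(M, r)
    return M / M.sum(axis=0, keepdims=True)

col = np.array([[0.6], [0.3], [0.1]])   # one column of transition probabilities
print(inflate(col).ravel())
# Squaring gives 0.36 / 0.09 / 0.01; after rescaling, roughly 0.78 / 0.20 / 0.02,
# so the already-preferred neighbour is favoured even more.
```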

 Expansion operation: taking the power of the matrix; expands dense regions  Inflation operation: mentioned above; eliminates unfavoured regions

The MCL algorithm  Input: adjacency matrix A  Initialize M to M_G, the canonical transition matrix: M := M_G := (A+I) D⁻¹  Repeat until converged: ◦ Expand: M := M·M (enhances flow to well-connected nodes as well as to new nodes) ◦ Inflate: M := M.^r (r usually 2) and renormalize columns (increases inequality in each column: "rich get richer, poor get poorer") ◦ Prune: remove entries close to zero (saves memory)  Output clusters
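The loop can be sketched in a few lines of NumPy. This is a minimal illustration, not van Dongen's reference implementation; the two-triangle test graph and all parameter values are assumptions.

```python
import numpy as np

def mcl(A, r=2.0, expansion=2, max_iter=100, prune_tol=1e-5):
    """Minimal MCL sketch: expand, inflate, prune until the flow matrix is stable."""
    A = A + np.eye(len(A))                 # add self-loops: M_G = (A + I) D^{-1}
    M = A / A.sum(axis=0, keepdims=True)   # column-stochastic canonical transition matrix
    for _ in range(max_iter):
        last = M.copy()
        M = np.linalg.matrix_power(M, expansion)   # expansion
        M = np.power(M, r)                         # inflation...
        M /= M.sum(axis=0, keepdims=True)          # ...with column renormalization
        M[M < prune_tol] = 0.0                     # prune entries close to zero
        M /= M.sum(axis=0, keepdims=True)
        if np.allclose(M, last):                   # converged (idempotent matrix)
            break
    # Attractors have a nonzero diagonal entry; the nonzero entries of an
    # attractor's row give the nodes of its cluster.
    clusters = set()
    for i in range(len(M)):
        if M[i, i] > 0:
            clusters.add(frozenset(np.nonzero(M[i])[0]))
    return clusters

# Two triangles joined by a single bridge edge (hypothetical example):
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(mcl(A))   # expected: the two triangles {0,1,2} and {3,4,5}
```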

Multi-level Regularized MCL  Coarsen the input graph repeatedly (input graph → intermediate graphs → coarsest graph)  On each coarse level, run curtailed R-MCL and project the flow to the next finer graph  On the input graph, run R-MCL to convergence and output clusters  Why: it is faster to run on smaller graphs first; it captures the global topology of the graph; it initializes the flow matrix of the refined graph

 animation.html

 Find attractors: node a is an attractor if M_aa is nonzero  Find attractor systems: if a is an attractor, then the set of its neighbours is called an attractor system.  If a node has an arc connected to any node of an attractor system, the node belongs to the same cluster as that attractor system.
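Reading clusters off a converged flow matrix can be sketched as follows. The idempotent matrix below is a hand-made example in which columns 0-2 send all their flow to attractor 0 and columns 3-4 to attractor 3.

```python
import numpy as np

# Hypothetical converged (idempotent, column-stochastic) MCL flow matrix.
M = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
], dtype=float)

attractors = [i for i in range(len(M)) if M[i, i] > 0]        # nonzero diagonal
clusters = [set(np.nonzero(M[i])[0]) for i in attractors]     # nonzero row entries
print(attractors, clusters)   # [0, 3] [{0, 1, 2}, {3, 4}]
```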

Attractor set = {1,2,3,4,5,6,7,8,9,10}  The attractor systems are {1,2,3}, {4,5,6,7}, {8,9}, {10}  The overlapping clusters are {1,2,3,11,12,15}, {4,5,6,7,13}, {8,9,12,13,14,15}, {10,12,13}

 How many steps are required before the algorithm converges to an idempotent matrix?  The number is typically somewhere between 10 and 100  The effect of inflation on cluster granularity

r denotes the inflation constant. a denotes the loop weight.

 MCL simulates a random walk on the graph to find clusters  Expansion promotes dense regions while inflation demotes less favoured regions  There is an intrinsic relationship between the MCL result and the cluster structure