Presentation on theme: " Introduction  Important Concepts in MCL Algorithm  MCL Algorithm  The Features of MCL Algorithm  Summary."— Presentation transcript:

1

2  Introduction  Important Concepts in MCL Algorithm  MCL Algorithm  The Features of MCL Algorithm  Summary

3 Graph Clustering  Intuition: ◦ Highly connected nodes are likely to be in the same cluster. ◦ Weakly connected nodes are likely to be in different clusters.  Model: ◦ A random walk may start at any node. ◦ Starting at node r, if a random walk reaches node t with high probability, then r and t should be clustered together.

4 Markov Clustering (MCL)  Markov process ◦ The probability that a random walk takes a given edge at node u depends only on u and that edge. ◦ It does not depend on the walk's previous route. ◦ This assumption simplifies the computation.

5 MCL  A flow network is used to approximate the partition.  An initial amount of flow is injected into each node.  At each step, a percentage of the flow moves from a node to its neighbors via the outgoing edges.

6 MCL  Edge Weight ◦ Similarity between two nodes. ◦ Can be considered as the bandwidth or connectivity of the edge. ◦ If one edge has a higher weight than another, more flow travels over it: the amount of flow is proportional to the edge weight. ◦ If there are no edge weights, we can assign the same weight to all edges.

7 Intuition of MCL  Two natural clusters, A and B.  When the flow reaches the border points, it is more likely to return into its cluster than to cross the border.

8 MCL  When the flow reaches A, it has four possible outcomes. ◦ Three lead back into the cluster; one leaks out. ◦ ¾ of the flow returns; only ¼ leaks.  Flow accumulates in the center of a cluster (island).  The border nodes will starve.

9  Simulation of random flow in a graph  Two operations: Expansion and Inflation  Intrinsic relationship between the MCL process result and the cluster structure

10  Popular description: partition the graph so that  Intra-partition similarity is the highest  Inter-partition similarity is the lowest

11  Observation 1:  The number of higher-length paths in G is large for pairs of vertices lying in the same dense cluster  and small for pairs of vertices belonging to different clusters

12  Observation 2:  A random walk in G that visits a dense cluster will likely not leave the cluster until many of its vertices have been visited

13  n×n Adjacency matrix A. ◦ A(i,j) = weight on the edge from i to j ◦ If the graph is undirected, A(i,j) = A(j,i), i.e. A is symmetric  n×n Transition matrix P. ◦ P is row-stochastic ◦ P(i,j) = probability of stepping to node j from node i = A(i,j)/∑_k A(i,k)  n×n Laplacian matrix L. ◦ L = D − A, i.e. L(i,j) = δ_ij ∑_k A(i,k) − A(i,j) ◦ Symmetric positive semi-definite for undirected graphs ◦ Singular
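As a concrete illustration, the three matrices above can be built for a small graph. The four-node graph below is a made-up example, not one from the slides:

```python
import numpy as np

# Hypothetical 4-node undirected graph: edges 0-1, 1-2, 2-0, 2-3
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0              # undirected => A is symmetric

# Row-stochastic transition matrix: P(i,j) = A(i,j) / sum_k A(i,k)
P = A / A.sum(axis=1, keepdims=True)

# Laplacian L = D - A, with D the diagonal matrix of (weighted) degrees
D = np.diag(A.sum(axis=1))
L = D - A
```

Each row of P sums to 1, and each row of L sums to 0, which is one way to sanity-check the construction.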

14 [Figure: adjacency matrix A and row-stochastic transition matrix P for a small example graph]

15 [Figure: random-walk probabilities at t=0]

16 [Figure: random-walk probabilities at t=0 and t=1]

17 [Figure: random-walk probabilities at t=0, t=1, t=2]

18 [Figure: random-walk probabilities at t=0 through t=3]

19  x_t(i) = probability that the surfer is at node i at time t  x_{t+1}(i) = ∑_j (probability of being at node j) · Pr(j→i) = ∑_j x_t(j) · P(j,i)  x_{t+1} = x_t·P = x_{t-1}·P·P = x_{t-2}·P·P·P = … = x_0·P^{t+1}  What happens when the surfer keeps walking for a long time?
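The iteration x_{t+1} = x_t·P can be run directly. On an undirected, connected, non-bipartite graph the walk settles to the stationary distribution, which is proportional to node degree; the graph below is a made-up example:

```python
import numpy as np

# Hypothetical 4-node undirected graph: edges 0-1, 1-2, 2-0, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)     # row-stochastic transition matrix

x = np.array([1.0, 0.0, 0.0, 0.0])       # surfer starts at node 0
for _ in range(100):
    x = x @ P                            # x_{t+1} = x_t P

# For an undirected, connected, non-bipartite graph the walk converges
# to the stationary distribution pi(i) = deg(i) / (total degree)
pi = A.sum(axis=1) / A.sum()
```

This is exactly the long-run behavior the slide asks about: the walk forgets its starting node, which is why plain powers of P alone cannot reveal clusters.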

20 Flow Formulation Flow: transition probability from a node to another node. Flow matrix: matrix with the flows among all nodes; the i-th column represents the flows out of the i-th node. Each column sums to 1. [Figure: 3-node example graph and its column-stochastic flow matrix]

21  Measure or sample any of these (higher-length paths, random walks) and deduce the cluster structure from the behavior of the sampled quantities.  Cluster structure will show itself as a peaked distribution of the quantities  A lack of cluster structure will result in a flat distribution

22  Markov Chain  Random Walk on Graph  Some Definitions in MCL

23  A Random Process with Markov Property  Markov Property: given the present state, future states are independent of the past states  At each step the process may change its state from the current state to another state, or remain in the same state, according to a certain probability distribution.

24

25  A walker starts at some arbitrary vertex  He successively visits new vertices by selecting one of the outgoing edges at random  There is not much difference between a random walk and a finite Markov chain.

26  Simple Graph  A simple graph is an undirected graph in which every nonzero weight equals 1.

27  Associated Matrix  The associated matrix of G, denoted M_G, is defined by setting the entry (M_G)_pq equal to w(v_p, v_q)

28  Markov Matrix  The Markov matrix associated with a graph G is denoted by T_G and is formally defined by letting its q-th column be the q-th column of M_G normalized to sum to 1

29

30  In practice the associated matrix and Markov matrix are computed for the matrix M+I  I denotes the identity matrix (the diagonal matrix whose nonzero elements all equal 1)  This adds a loop to every vertex of the graph, because it is possible for a walker to stay in the same place in his next step

31

32  Find Higher-Length Paths  Starting point: in the associated matrix, the quantity (M^k)_pq has a straightforward interpretation as the number of paths of length k between v_p and v_q
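This path-counting interpretation is easy to check numerically, again on a made-up 4-node simple graph:

```python
import numpy as np

# Associated matrix of a hypothetical simple graph (all edge weights 1)
# Edges: 0-1, 1-2, 2-0, 2-3
M = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

M2 = np.linalg.matrix_power(M, 2)
# (M^2)[p, q] counts the length-2 walks from v_p to v_q:
# e.g. (M^2)[0, 3] = 1, the single walk 0 -> 2 -> 3,
# and (M^2)[p, p] = deg(v_p), one closed walk per incident edge.
```

Higher powers M^3, M^4, … count longer walks the same way, which is what Observation 1 exploits.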

33 [Figure: M_G and (M_G+I)^2]

34 [Figure: M_G]

35

36  Flow spreads more easily within dense regions than across sparse boundaries.  However, in the long run, this effect disappears.  Powers of the matrix can be used to find higher-length paths, but the effect diminishes as the flow goes on.

37  Idea: how can we change the distribution of transition probabilities so that preferred neighbours are further favoured and less popular neighbours are demoted?  MCL solution: raise all the entries in a given column to a certain power greater than 1 (e.g. squaring), and rescale the column to sum to 1 again.
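The inflation step described above can be written as a small helper. This is a sketch; the function name is mine, not from the MCL reference code:

```python
import numpy as np

def inflate(M, r=2.0):
    """Raise every entry to the power r, then rescale each column to sum to 1."""
    M = np.power(M, r)
    return M / M.sum(axis=0, keepdims=True)

# A mild preference in a column becomes a sharp one after inflation:
col = np.array([[0.6], [0.3], [0.1]])
sharp = inflate(col)
# squaring gives 0.36, 0.09, 0.01; rescaling yields about 0.78, 0.20, 0.02
```

Larger entries grow at the expense of smaller ones, which is exactly the "favour preferred neighbours" behavior the slide asks for.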

38

39

40

41

42  Expansion operation: power of the matrix; expands the dense regions  Inflation operation: mentioned above; eliminates the unfavoured regions

43 The MCL algorithm
Input: A, the adjacency matrix.
Initialize M to M_G, the canonical transition matrix: M := M_G := (A+I)D^{-1}
Repeat until converged:
 Expand: M := M*M. Enhances flow to well-connected nodes as well as to new nodes.
 Inflate: M := M.^r (r usually 2), renormalize columns. Increases inequality in each column: "rich get richer, poor get poorer."
 Prune: remove entries close to zero. Saves memory.
Output clusters.
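The loop above can be sketched in a few lines. This is a minimal illustration assuming a symmetric adjacency matrix; `mcl` and its defaults are my own names and choices, not the reference implementation from micans.org:

```python
import numpy as np

def mcl(A, r=2.0, max_iter=100, prune_tol=1e-6):
    """Minimal MCL sketch: expand, inflate, prune until the matrix is stable."""
    n = A.shape[0]
    M = A + np.eye(n)                         # add self-loops
    M = M / M.sum(axis=0, keepdims=True)      # canonical M_G = (A + I) D^-1
    for _ in range(max_iter):
        prev = M.copy()
        M = M @ M                             # expansion
        M = np.power(M, r)                    # inflation ...
        M[M < prune_tol] = 0.0                # prune near-zero entries
        M /= M.sum(axis=0, keepdims=True)     # ... and renormalize columns
        if np.allclose(M, prev, atol=1e-9):
            break
    return M

# Two triangles joined by a single bridge edge (a made-up test graph)
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
M = mcl(A)
# Columns of the converged matrix reveal two clusters: {0,1,2} and {3,4,5}
```

Columns belonging to the same cluster end up with the same support (the same set of nonzero rows), which is how clusters are read off the limit matrix.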

44 Multi-level Regularized MCL
 Coarsen the input graph through a series of intermediate graphs down to the coarsest graph.
 Run curtailed R-MCL on the coarser graphs and project the flow to the next refined graph. It is faster to run on smaller graphs first; this captures the global topology of the graph and initializes the flow matrix of the refined graph.
 Run R-MCL on the input graph to convergence, and output the clusters.

45

46

47

48  http://www.micans.org/mcl/ani/mcl-animation.html

49  Find attractors: node a is an attractor if M_aa is nonzero  Find attractor systems: if a is an attractor, then the set of its neighbours is called an attractor system.  If a node has an arc connecting it to any node of an attractor system, that node belongs to the same cluster as the attractor system.

50 Attractor set = {1,2,3,4,5,6,7,8,9,10}
The attractor systems are {1,2,3}, {4,5,6,7}, {8,9}, {10}
The overlapping clusters are {1,2,3,11,12,15}, {4,5,6,7,13}, {8,9,12,13,14,15}, {10,12,13}
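The attractor rules on the last two slides can be read off a converged flow matrix mechanically. This is a sketch; the function name and tolerance are my own choices:

```python
import numpy as np

def clusters_from_limit(M, tol=1e-6):
    """Interpret an idempotent MCL limit matrix as (possibly overlapping) clusters."""
    n = M.shape[0]
    # A node a is an attractor if the diagonal entry M[a, a] is nonzero
    attractors = [a for a in range(n) if M[a, a] > tol]
    # Attractors that attract one another belong to the same attractor system
    systems = []
    for a in attractors:
        linked = [s for s in systems
                  if any(M[a, b] > tol or M[b, a] > tol for b in s)]
        systems = [s for s in systems if s not in linked]
        systems.append({a}.union(*linked) if linked else {a})
    # A node joins the cluster of every system it sends flow to (overlap allowed)
    clusters = []
    for s in systems:
        members = set(s)
        for j in range(n):
            if any(M[b, j] > tol for b in s):
                members.add(j)
        clusters.append(sorted(members))
    return clusters

# Hand-made idempotent limit: attractors 0 and 2; node 1 -> 0, node 3 -> 2
M = np.array([[1, 1, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 0]], dtype=float)
# clusters_from_limit(M) -> [[0, 1], [2, 3]]
```

Because a node may send flow to more than one attractor system, the same node can appear in several clusters, matching the overlapping clusters in the slide's example.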

51  How many steps are required before the algorithm converges to an idempotent matrix?  The number is typically somewhere between 10 and 100  The effect of inflation on cluster granularity

52 r denotes the inflation constant. a denotes the loop weight.

53  MCL simulates a random walk on a graph to find clusters  Expansion promotes the dense regions while inflation demotes the less favoured regions  There is an intrinsic relationship between the MCL result and the cluster structure

