Markov Cluster (MCL) algorithm Stijn van Dongen.

Slides:



Advertisements
Similar presentations
Ricochet A Family of Unconstrained Algorithms for Graph Clustering.
Advertisements

Structural Inference of Hierarchies in Networks BY Yu Shuzhi 27, Mar 2014.
10/11/2001Random walks and spectral segmentation1 CSE 291 Fall 2001 Marina Meila and Jianbo Shi: Learning Segmentation by Random Walks/A Random Walks View.
1. Markov Process 2. States 3. Transition Matrix 4. Stochastic Matrix 5. Distribution Matrix 6. Distribution Matrix for n 7. Interpretation of the Entries.
Lecture 21: Spectral Clustering
Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.
Spectral Clustering Course: Cluster Analysis and Other Unsupervised Learning Methods (Stat 593 E) Speakers: Rebecca Nugent1, Larissa Stanberry2 Department.
Convergent and Correct Message Passing Algorithms Nicholas Ruozzi and Sekhar Tatikonda Yale University TexPoint fonts used in EMF. Read the TexPoint manual.
Hidden Markov Models I Biology 162 Computational Genetics Todd Vision 14 Sep 2004.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Networks, Lie Monoids, & Generalized Entropy Metrics Networks, Lie Monoids, & Generalized Entropy Metrics St. Petersburg Russia September 25, 2005 Joseph.
Author: Jason Weston et., al PANS Presented by Tie Wang Protein Ranking: From Local to global structure in protein similarity network.
Efficient and Robust Computation of Resource Clusters in the Internet Efficient and Robust Computation of Resource Clusters in the Internet Chuang Liu,
Lecture 9 Hidden Markov Models BioE 480 Sept 21, 2004.
Protein Domain Finding Problem Olga Russakovsky, Eugene Fratkin, Phuong Minh Tu, Serafim Batzoglou Algorithm Step 1: Creating a graph of k-mers First,
Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.
Efficient Estimation of Emission Probabilities in profile HMM By Virpi Ahola et al Reviewed By Alok Datar.
1 Markov Chains Algorithms in Computational Biology Spring 2006 Slides were edited by Itai Sharon from Dan Geiger and Ydo Wexler.
Similar Sequence Similar Function Charles Yan Spring 2006.
Probabilistic Model of Sequences Bob Durrant School of Computer Science University of Birmingham (Slides: Dr Ata Kabán)
Semi-Supervised Learning D. Zhou, O Bousquet, T. Navin Lan, J. Weston, B. Schokopf J. Weston, B. Schokopf Presents: Tal Babaioff.
Markov Cluster Algorithm
Application of Graph Theory to OO Software Engineering Alexander Chatzigeorgiou, Nikolaos Tsantalis, George Stephanides Department of Applied Informatics.
HCC class lecture 14 comments John Canny 3/9/05. Administrivia.
BCB 570 Spring Protein-Protein Interaction Networks & methods Julie Dickerson Electrical and Computer Engineering.
CS6800 Advanced Theory of Computation Fall 2012 Vinay B Gavirangaswamy
DynaTraffic – Models and mathematical prognosis
1 Chapter 2 Matrices Matrices provide an orderly way of arranging values or functions to enhance the analysis of systems in a systematic manner. Their.
Lecture 11 – Stochastic Processes
Bump Hunting The objective PRIM algorithm Beam search References: Feelders, A.J. (2002). Rule induction by bump hunting. In J. Meij (Ed.), Dealing with.
Graph clustering Jin Chen CSE Fall 2012 MSU 1.
Photo-realistic Rendering and Global Illumination in Computer Graphics Spring 2012 Stochastic Radiosity K. H. Ko School of Mechatronics Gwangju Institute.
Information Networks Power Laws and Network Models Lecture 3.
Graph mining in bioinformatics Laur Tooming. Graphs in biology Graphs are often used in bioinformatics for describing processes in the cell Vertices are.
Entropy Rate of a Markov Chain
Piyush Kumar (Lecture 2: PageRank) Welcome to COT5405.
Radial Basis Function Networks
PDBe-fold (SSM) A web-based service for protein structure comparison and structure searches Gaurav Sahni, Ph.D.
Liang Ge.  Introduction  Important Concepts in MCL Algorithm  MCL Algorithm  The Features of MCL Algorithm  Summary.
Edge Linking & Boundary Detection
Hierarchical Distributed Genetic Algorithm for Image Segmentation Hanchuan Peng, Fuhui Long*, Zheru Chi, and Wanshi Siu {fhlong, phc,
WALKING IN FACEBOOK: A CASE STUDY OF UNBIASED SAMPLING OF OSNS junction.
Gapped BLAST and PSI- BLAST: a new generation of protein database search programs By Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui.
International Workshop on Complex Networks, Seoul (23-24 June 2005) Vertex Correlations, Self-Avoiding Walks and Critical Phenomena on the Static Model.
The Influence Mobility Model: A Novel Hierarchical Mobility Modeling Framework Muhammad U. Ilyas and Hayder Radha Michigan State University.
Semantic Wordfication of Document Collections Presenter: Yingyu Wu.
Identification of Ortholog Groups by OrthoMCL Protein sequences from organisms of interest All-against-all BLASTP Between Species: Reciprocal best similarity.
CS774. Markov Random Field : Theory and Application Lecture 02
Courtesy of J. Akinpelu, Anis Koubâa, Y. Wexler, & D. Geiger
Relevant Subgraph Extraction Longin Jan Latecki Based on : P. Dupont, J. Callut, G. Dooms, J.-N. Monette and Y. Deville. Relevant subgraph extraction from.
KPS 2007 (April 19, 2007) On spectral density of scale-free networks Doochul Kim (Department of Physics and Astronomy, Seoul National University) Collaborators:
1 CS 552/652 Speech Recognition with Hidden Markov Models Winter 2011 Oregon Health & Science University Center for Spoken Language Understanding John-Paul.
Bioinformatics: Practical Application of Simulation and Data Mining Markov Modeling I Prof. Corey O’Hern Department of Mechanical Engineering Department.
341- INTRODUCTION TO BIOINFORMATICS Overview of the Course Material 1.
Lecture 14, CS5671 Clustering Algorithms Density based clustering Self organizing feature maps Grid based clustering Markov clustering.
Step 3: Tools Database Searching
10.1 Properties of Markov Chains In this section, we will study a concept that utilizes a mathematical model that combines probability and matrices to.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Motif identification with Gibbs Sampler Xuhua Xia
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
NN k Networks for browsing and clustering image collections Daniel Heesch Communications and Signal Processing Group Electrical and Electronic Engineering.
Deep Belief Network Training Same greedy layer-wise approach First train lowest RBM (h 0 – h 1 ) using RBM update algorithm (note h 0 is x) Freeze weights.
 Introduction  Important Concepts in MCL Algorithm  MCL Algorithm  The Features of MCL Algorithm  Summary.
From DeGroot & Schervish. Example Occupied Telephone Lines Suppose that a certain business office has five telephone lines and that any number of these.
Random Walks on Graphs.
Minimum Spanning Tree 8/7/2018 4:26 AM
by Hyunwoo Park and Kichun Lee Knowledge-Based Systems 60 (2014) 58–72
Intelligent Information System Lab
Graph Clustering based on Random Walk
Protein Structural Classification
Presentation transcript:

Markov Cluster (MCL) algorithm Stijn van Dongen

Markov Cluster (MCL) algorithm Traditionally, most methods deal with similarity relationships in a pairwise manner, while graph theory allows classification of proteins into families based on a global treatment of all relationships in similarity space simultaneously. Similarity between proteins are arranged in a matrix that represents a connection graph. Nodes of the graph represent proteins, and edges represent sequence similarity that connects such proteins. A weight is assigned to each edge by taking -log 10 (E-value) obtained by a BLAST comparison.

These weights are transformed into probabilities associated with a transition from one protein to another within this graph. This matrix is passed through iterative rounds of matrix multiplication and matrix inflation until there is little or no net change in the matrix. The final matrix is then interpreted as a protein family clustering. The inflation value parameter of the MCL algorithm is used to control the granularity of these clusters.

The MCL algorithm simulates random walks within a graph by alternation of two operators called expansion and inflation. Expansion coincides with taking the power of a stochastic matrix using the normal matrix product (i.e. matrix squaring). Inflation corresponds with taking the Hadamard power of a matrix, followed by a scaling step, such that the resulting matrix is stochastic again, i.e. the matrix elements (on each column) correspond to probability values. Expansion and inflation represent different tidal forces which are alternated until an equilibrium state is reached. An equilibrium state takes the form of a so-called doubly idempotent matrix, i.e. a matrix that does not change with further expansion of inflation steps.

Inflation operator: A column stochastic matrix is a non-negative matrix with the property that each of its columns sums to 1. M  R kxk, M >= 0, and real number r >1.  r M is the column stochastic matrix resulting from inflating each of the columns of M with power coefficient r.  r is called the inflation operator with power coefficient r. Formally, the action of  r : R kxk -> R kxk is defined by: (  r M) pq = (M pq ) r / {∑(M iq ) r ; i=1,k}. Each column j of a stochastic matrix M corresponds with node j of the stochastic graph associated with M. M corresponds with the probability of going from node j to node i. It is observed that for values of r >1, inflation changes the probabilities associated with the collection of random walks departing from one particular node by favouring more probable walks over less probable walks.

blastp proteome specific comparisons all protein significant hits Adapted from Enright et al. NAR 2002.