DISCOVERING LARGER NETWORK MOTIFS Li Chen 4/16/2009 CSC 8910 Analysis of Biological Network, Spring 2009 Dr. Yi Pan.



A REVIEW ON MODELS AND ALGORITHMS FOR MOTIF DISCOVERY IN PROTEIN-PROTEIN INTERACTION NETWORKS

Two distinct definitions of a motif, one based on frequency and one on statistical significance:
Definition 1: a motif is a sub-graph that appears more than a threshold number of times.
Definition 2: a motif is a sub-graph that appears more often than expected by chance (an over-represented motif).

Two characteristics are used to evaluate a motif: frequency and statistical significance.
Frequency can be counted under three overlap rules:
1. Arbitrary overlaps of nodes and edges (non-identical case)
2. Only overlaps of nodes (edge-disjoint case)
3. No overlaps (edge- and vertex-disjoint case)
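
The three frequency concepts can give different counts for the same motif. The sketch below is only an illustration: it uses the triangle as the motif, a hypothetical toy network, and a brute-force search that is practical only for a handful of occurrences. The function names are assumptions, not part of the reviewed paper.

```python
from itertools import combinations

def triangles(edges):
    """Return every triangle in an undirected graph as a sorted node tuple."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        for w in adj[u] & adj[v]:          # common neighbours close a triangle
            found.add(tuple(sorted((u, v, w))))
    return sorted(found)

def max_disjoint(occurrences, key):
    """Size of the largest set of occurrences whose key sets are pairwise disjoint."""
    keyed = [key(o) for o in occurrences]
    for r in range(len(keyed), 0, -1):     # try the largest subsets first
        for combo in combinations(keyed, r):
            if all(a.isdisjoint(b) for a, b in combinations(combo, 2)):
                return r
    return 0

def triangle_edges(t):
    """Edges of a triangle, as unordered pairs."""
    return {frozenset(p) for p in combinations(t, 2)}

# Hypothetical toy network with triangles (1,2,3), (2,3,6) and (3,4,5).
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (2, 6), (3, 6)]
occ = triangles(toy_edges)
f1 = len(occ)                              # arbitrary overlaps allowed
f2 = max_disjoint(occ, triangle_edges)     # occurrences may share nodes, not edges
f3 = max_disjoint(occ, set)                # occurrences share neither nodes nor edges
print(f1, f2, f3)
```

In this toy network every triangle contains node 3, so the three counts differ.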

Statistical significance: compares the frequency of a sub-graph in the observed network with its frequency in random networks. Two measures:
1. Z-score
2. Abundance
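
As a rough sketch of the two measures: the Z-score divides the difference between the observed frequency and the mean random frequency by the standard deviation over the random networks. The abundance form used here, with a small constant eps in the denominator, is one commonly used definition and is an assumption, since the slides do not spell it out. The function names and the default eps are likewise assumptions.

```python
from statistics import mean, pstdev

def motif_z_score(observed, random_counts):
    """Z-score of an observed motif count against counts from random networks."""
    mu = mean(random_counts)
    sigma = pstdev(random_counts)          # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("random counts have zero variance; Z-score is undefined")
    return (observed - mu) / sigma

def motif_abundance(observed, random_counts, eps=4.0):
    """Normalised difference between observed and mean random count (assumed form)."""
    mu = mean(random_counts)
    return (observed - mu) / (observed + mu + eps)
```

A large positive Z-score suggests that the sub-graph is over-represented relative to the chosen random model.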

Models of Random Graphs
Preserve the degree distribution of the biological network
Preserve the degree sequence (used in the search for n-node motifs)
Based on geometric random networks and a Poisson distribution of the degree
Incorporate node clustering into the model
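
One common way to obtain a random network with the same degree sequence is repeated double-edge swapping. The sketch below assumes an undirected simple graph given as an edge list. The function names, the attempt limit and the seed parameter are illustrative choices, not taken from the reviewed paper.

```python
import random

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomise a simple undirected graph by double-edge swaps; degrees are unchanged."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:             # pick one of the two possible rewirings
            c, d = d, c
        # Proposed rewiring: (a, b), (c, d) -> (a, d), (c, b)
        if len({a, b, c, d}) < 4:
            continue                       # would create a self-loop
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue                       # would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list
```

Each accepted swap removes one edge and adds one edge at every node involved, so the degree sequence stays fixed while the wiring changes.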

3. Compact Topological Motifs: introduces a compact graph representation obtained by grouping together maximal sets of nodes that are 'indistinguishable'. The graph on the left shows the sets U1 and U2 as compact nodes and U1U2 as a compact edge.

Motif Discovery Algorithms
Exact algorithms for motifs with a small number of nodes:
1. Exhaustive Recursive Search (ERS): the input network is represented by an adjacency matrix M (motif size <= 4).
2. ESU: starts with individual nodes and adds one node at a time until the required size k is reached (motif size <= 14).
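
The ESU idea of growing a node set one node at a time can be sketched as follows. This is a minimal illustration that assumes an undirected graph stored as a dictionary of neighbour sets with comparable (for example integer) node labels. It returns connected node sets only and leaves out the grouping of those sets into isomorphism classes that a full motif finder needs.

```python
def esu(adj, k):
    """Enumerate every connected k-node set of an undirected graph exactly once."""
    found = []

    def extend(sub, ext, root):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)                     # local copy of the extension set
        while ext:
            w = ext.pop()
            # Nodes already in the sub-graph or adjacent to it.
            closed = sub | {x for s in sub for x in adj[s]}
            # Exclusive neighbours of w: larger than the root and new to the sub-graph.
            exclusive = {u for u in adj[w] if u > root and u not in closed}
            extend(sub | {w}, ext | exclusive, root)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found

# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(sorted(s) for s in esu(toy, 3)))
```

Restricting extensions to labels larger than the root, and to exclusive neighbours, is what prevents the same node set from being reported twice.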

Approximate Algorithms
1. Search Algorithm Based on Sampling (MFINDER): picks edges of the input graph at random until a set of k nodes is obtained, which gives a sample sub-graph, and assigns weights to the samples to correct for the non-uniform sampling. It scales well with large networks, but does not scale well with large motifs.

2. Rand-ESU: unlike MFINDER, it does not need to compute weights for the samples. ESU builds a tree whose leaves correspond to sub-graphs of size k, while internal nodes correspond to sub-graphs of size 1 up to k-1, depending on the tree level. Rand-ESU assigns to each level of the tree a probability with which nodes at that level are explored further, so that all leaves are visited with uniform probability.
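
A minimal sketch of the sampling idea, under the same assumptions as a plain ESU enumeration (undirected graph as a dictionary of neighbour sets, comparable node labels). The parameter probs is a hypothetical list holding one exploration probability per tree level. Every leaf is then reached with probability equal to the product of the entries, so dividing the number of sampled leaves by that product gives an estimate of the total count.

```python
import random

def rand_esu(adj, k, probs, seed=0):
    """Sample connected k-node sets; probs[d] is the chance of exploring level d + 1."""
    if len(probs) != k or any(not 0 < p <= 1 for p in probs):
        raise ValueError("probs must hold k probabilities in (0, 1]")
    rng = random.Random(seed)
    sampled = []

    def extend(sub, ext, root):
        if len(sub) == k:
            sampled.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            closed = sub | {x for s in sub for x in adj[s]}
            exclusive = {u for u in adj[w] if u > root and u not in closed}
            if rng.random() < probs[len(sub)]:   # descend only with the level's probability
                extend(sub | {w}, ext | exclusive, root)

    for v in adj:
        if rng.random() < probs[0]:
            extend({v}, {u for u in adj[v] if u > v}, v)

    scale = 1.0
    for p in probs:
        scale *= p
    return sampled, len(sampled) / scale         # samples and estimated total count
```

With every probability set to 1 the procedure visits the whole tree and reduces to exact enumeration.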

3. NeMoFINDER: combines approaches from the data mining and computational biology communities. It searches for repeated trees and extends them to sub-graphs. This reduces the computation time for the discovery of larger motifs, but at the cost of missing some potentially interesting sub-graphs.

4. Sub-graph Counting by Scalar Computation: characterizes a biological network by a set of measures based on scalars and functionals of the adjacency matrix associated with the network. Its advantages are mathematical elegance and computational efficiency.
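
A well-known example of a scalar computed from the adjacency matrix A is the triangle count, which equals trace(A^3) / 6 for an undirected simple graph. The sketch below shows only this one measure, as an illustration of the general idea. Whether the reviewed method uses exactly this quantity is not stated in the slides.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def count_triangles(adj_matrix):
    """Number of triangles in an undirected simple graph: trace(A^3) / 6."""
    a3 = mat_mul(mat_mul(adj_matrix, adj_matrix), adj_matrix)
    # Each triangle contributes 6 closed walks of length 3 (3 start nodes, 2 directions).
    return sum(a3[i][i] for i in range(len(a3))) // 6

# Complete graph on four nodes (hypothetical example).
k4 = [[0, 1, 1, 1],
      [1, 0, 1, 1],
      [1, 1, 0, 1],
      [1, 1, 1, 0]]
print(count_triangles(k4))
```

No sub-graph is enumerated explicitly; the count falls out of matrix arithmetic alone.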

5. A-priori-based Motif Detection: the basic idea is that if a sub-graph is frequent, so are all its sub-graphs. It builds candidate motifs of size k by joining motifs of size k-1 and then evaluates their frequency.

A ROADMAP OF CLUSTERING ALGORITHMS IN BIOINFORMATICS APPLICATIONS

Desirable features for evaluating clustering algorithms:
Scalability
Robustness
Order insensitivity
Minimum user-specified input
Mixed data types
Arbitrary-shaped clusters
Point proportion admissibility: duplicating data and re-clustering should not alter the results.

Six categories of clustering algorithms:
Partitioning Clustering Algorithm
Hierarchical Clustering Algorithm
Grid-based Clustering Algorithm
Density-based Clustering Algorithm
Model-based Clustering Algorithm
Graph-based Clustering Algorithm

Partitioning Clustering Algorithm
Numerical Methods
1. K-means algorithm and Farthest First Traversal k-center (FFT) algorithm
2. K-medoids or PAM (Partitioning Around Medoids)
3. CLARA (Clustering Large Applications)
4. CLARANS (Clustering Large Applications Based upon Randomized Search) and Fuzzy K-means
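
As an illustration of the partitioning approach, here is a minimal K-means sketch in plain Python. Using the first k points as initial centroids is an assumption made to keep the example deterministic; practical implementations usually pick random or spread-out starting points.

```python
def k_means(points, k, iterations=100):
    """Cluster numeric tuples into k groups; returns (centroids, clusters)."""
    centroids = [tuple(p) for p in points[:k]]   # deterministic start (assumption)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:           # no change means convergence
            break
        centroids = new_centroids
    return centroids, clusters
```

The user must supply k in advance, which relates to the "minimum user-specified input" criterion listed among the desirable features.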

Discrete Methods
1. K-modes
2. Fuzzy K-modes
3. Squeezer and COOLCAT
Mixed Discrete and Numerical Clustering Methods
1. K-prototypes

Hierarchical Clustering Algorithm
Divides the data into a tree of nodes, where each node represents a cluster.
Two ways to categorize, based on methods or purposes:
1. Agglomerative vs. Divisive
2. Single vs. Complete vs. Average linkage
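
The agglomerative variant and the three linkage rules can be sketched as below. This is a naive illustration with Euclidean distance. The function name and the stopping rule (a target number of clusters) are assumptions; the repeated full search for the closest pair also shows why such methods are slow on large data.

```python
def agglomerative(points, n_clusters, linkage="single"):
    """Merge the two closest clusters until n_clusters remain."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def cluster_distance(c1, c2):
        d = [dist(p, q) for p in c1 for q in c2]
        if linkage == "single":
            return min(d)                  # closest pair of members
        if linkage == "complete":
            return max(d)                  # farthest pair of members
        if linkage == "average":
            return sum(d) / len(d)         # mean over all pairs
        raise ValueError("linkage must be 'single', 'complete' or 'average'")

    clusters = [[p] for p in points]       # start with one cluster per point
    while len(clusters) > n_clusters:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Recording the order of the merges, instead of stopping at a fixed number of clusters, would give the full tree of nested clusters.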

Popular: nature can have various levels of subsets.
Drawbacks:
1. Slow
2. Errors are not tolerated
3. Information is lost when moving between levels
Two kinds of methods:
1. Numerical Methods: BIRCH, CURE, Spectral clustering
2. Discrete Methods: ROCK, Chameleon, LIMBO

Grid-based Clustering Algorithm
Forms a grid structure of cells from the input data. Each data point is then assigned to a cell of the grid.
STING combines a numerical grid-based clustering method with a hierarchical method.

Density-based Clustering Algorithm
Uses a local density criterion.
Clusters are dense subspaces separated by low-density spaces.
Example of a bioinformatics application: finding the densest subspaces in interactome (protein-protein interaction) networks.

DBSCAN, OPTICS, DENCLUE, WaveCluster and CLIQUE use numerical values for clustering.
SEQOPTICS is used for sequence clustering.
HIERDENC (Hierarchical Density-based Clustering), MULIC (Multiple Layer Incremental Clustering), Projected (subspace) clustering, CACTUS, STIRR, CLICK and CLOPE use discrete values for clustering.
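
To make the density idea concrete, here is a minimal DBSCAN sketch for numeric points. It uses a naive neighbour search and Euclidean distance; the parameter names eps and min_pts follow common usage and are not taken from the slides.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # noise for now; may become a border point
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # former noise becomes a border point
            if labels[j] is not None:
                continue                   # already assigned; do not expand again
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:         # only core points extend the cluster
                queue.extend(nb)
    return labels
```

The number of clusters is not given in advance, and isolated points are reported as noise rather than forced into a cluster.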

Model-based Clustering Algorithm
Uses a model, often derived from a statistical distribution.
Bioinformatics applications:
1. Gene expression
2. Interactomes
3. Sequences

Numerical model-based methods
1. Self-Organizing Maps
Discrete model-based clustering algorithms
1. COBWEB
Numerical and discrete model-based clustering methods
1. BILCOM (Bi-level clustering of Mixed Discrete and Numerical Biomedical Data), using an empirical Bayesian approach

Examples
1. Gene expression clustering
2. Protein sequence clustering
3. AutoClass
4. SVM clustering methods
Graph-based Clustering Algorithm
Applied to interactomes for complex prediction and to sequence networks.

Examples:
1. MCODE (Molecular Complex Detection)
2. SPC (Super Paramagnetic Clustering)
3. RNSC (Restricted Neighborhood Search Clustering)
4. MCL (Markov Clustering)
5. TribeMCL
6. SPC
7. CD-HIT
8. ProClust
9. BAG algorithms
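
Of the graph-based methods listed, MCL is simple to sketch: it alternates expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and re-normalising columns). The version below is a simplified illustration in plain Python. The added self-loops, the default inflation of 2, the tolerance and the rule for reading clusters off the final matrix are common conventions used here as assumptions.

```python
def mcl(adj_matrix, inflation=2.0, iterations=50, tol=1e-9):
    """Simplified Markov Clustering on an undirected graph; returns a set of clusters."""
    n = len(adj_matrix)

    def normalise(mat):
        # Make every column sum to 1.
        for j in range(n):
            s = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= s
        return mat

    # Add self-loops so that no column is empty, then normalise.
    m = normalise([[float(adj_matrix[i][j]) + (1.0 if i == j else 0.0)
                    for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        expanded = [[sum(m[i][t] * m[t][j] for t in range(n)) for j in range(n)]
                    for i in range(n)]
        inflated = normalise([[x ** inflation for x in row] for row in expanded])
        converged = all(abs(inflated[i][j] - m[i][j]) < tol
                        for i in range(n) for j in range(n))
        m = inflated
        if converged:
            break
    # Rows with a non-zero diagonal act as attractors; their non-zero columns form a cluster.
    return {frozenset(j for j in range(n) if m[i][j] > 1e-6)
            for i in range(n) if m[i][i] > 1e-6}
```

On a graph made of two disconnected triangles, the procedure returns the two triangles as separate clusters.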

Usage in Bioinformatics Applications
Gene expression clustering
1. K-means algorithm
2. Hierarchical algorithm
3. SOMs
Interactomes
1. AutoClass
2. SVM clustering
3. COBWEB
4. MULIC
Sequence clustering
1. Hierarchical clustering algorithm

REFERENCES
[1] Bill Andreopoulos, Aijun An, Xiaogang Wang, and Michael Schroeder. A roadmap of clustering algorithms: finding a match for a biomedical application. Brief Bioinform, pages bbn058+, February.
[2] Alberto Apostolico, Matteo Comin, and Laxmi Parida. Bridging Lossy and Lossless Compression by Motif Pattern Discovery. Electronic Notes in Discrete Mathematics, 21. General Theory of Information Transfer and Combinatorics.
[3] Giovanni Ciriello and Concettina Guerra. A review on models and algorithms for motif discovery in protein-protein interaction networks. Brief Funct Genomic Proteomic, 7(2).
[4] Jun Huan, Wei Wang, and Jan Prins. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. Data Mining, IEEE International Conference on, 0:549.
[5] Michihiro Kuramochi and George Karypis. Finding Frequent Patterns in a Large Sparse Graph. Data Mining and Knowledge Discovery, 11(3), November.
[6] Laxmi Parida. Discovering Topological Motifs Using a Compact Notation. Journal of Computational Biology, 14(3).

Thank you so much !