Discovering Larger Network Motifs


Discovering Larger Network Motifs
Wooyoung Kim and Li Chen
4/24/2009
CSC 8910 Analysis of Biological Network, Spring 2009
Dr. Yi Pan

Outline
- Project Topic
- Related Works
- Proposed Ideas
- Unsolved Problems

Project Topic
- Discovering Larger Network Motifs
- Given a biological network (PPI, transcriptional regulatory network, gene network, etc.), find network motifs whose size is large (> 15 nodes).

Related Works (1)
Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking
- Motif size <= 15.
- Given a candidate subgraph, find all of its instances in the graph, using symmetry breaking so that each instance is counted only once, then evaluate the candidate by its frequency.
- Problem: how do we find candidate subgraphs?
- Proposed solution: cluster the whole network and take a representative subgraph from each cluster as a candidate.
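To make the role of symmetry concrete, here is a minimal brute-force sketch in Python. It is not the Grochow and Kellis algorithm: it simply tries every injective mapping of the query onto the graph and then divides by the number of automorphisms of the query, which is the over-counting that symmetry breaking is designed to avoid in the first place. The function names, the undirected edge lists, and the non-induced matching are assumptions made for this illustration.

```python
from itertools import permutations

def count_mappings(query_edges, query_nodes, graph_edges, graph_nodes):
    """Count injective maps query -> graph that preserve every query edge (undirected)."""
    graph = {frozenset(e) for e in graph_edges}
    total = 0
    for image in permutations(graph_nodes, len(query_nodes)):
        m = dict(zip(query_nodes, image))
        if all(frozenset((m[u], m[v])) in graph for u, v in query_edges):
            total += 1
    return total

def count_instances(query_edges, query_nodes, graph_edges, graph_nodes):
    """Number of distinct copies of the query in the graph."""
    # Mapping the query onto itself counts its automorphisms (its symmetries).
    automorphisms = count_mappings(query_edges, query_nodes, query_edges, query_nodes)
    # Every copy of the query is found once per automorphism, so divide them out.
    mappings = count_mappings(query_edges, query_nodes, graph_edges, graph_nodes)
    return mappings // automorphisms

# Hypothetical example: a triangle query in the complete graph on 4 nodes.
triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_instances(triangle, [0, 1, 2], k4, [0, 1, 2, 3]))
```

In this example the triangle has 6 automorphisms, so a naive search reports each of its copies 6 times. The brute-force search is exponential in the query size and is only meant to show the counting argument.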

Related Works (2)
Motif Discovery Algorithms
Exact algorithms for motifs with a small number of nodes:
1. Exhaustive Recursive Search (ERS): motif size <= 4.
2. ESU: starts from individual nodes and adds one node at a time until the required size k is reached; motif size <= 14.
3. Compact Topological Motifs
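The ESU idea of growing a subgraph one node at a time can be sketched as follows. This is a minimal Python sketch, assuming an undirected graph stored as a dictionary of neighbor sets with comparable node labels; the function name esu and the data layout are choices made for this illustration. A node is added to the extension set only if its label is larger than the root and it is not already in, or adjacent to, the current subgraph, which is what keeps each connected node set from being reported twice.

```python
def esu(adj, k):
    """Enumerate the node sets of all connected induced subgraphs with k nodes.

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    found = []

    def extend(sub, ext, root):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)  # work on a copy so the caller's set is untouched
        # Nodes that are in the current subgraph or adjacent to it.
        closed = sub | set().union(*(adj[u] for u in sub))
        while ext:
            w = ext.pop()
            # Exclusive neighbors of w: larger than the root, new to the subgraph's neighborhood.
            exclusive = {u for u in adj[w] if u > root and u not in closed}
            extend(sub | {w}, ext | exclusive, root)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found

# Hypothetical example: the path 0 - 1 - 2 - 3 has two connected 3-node subgraphs.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(sorted(sorted(s) for s in esu(path, 3)))
```

The sketch only enumerates node sets. Grouping them into isomorphism classes and comparing frequencies against randomized networks are separate steps that are not shown here.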

Related Works (3)
Approximate algorithms:
- Search algorithm based on sampling (MFINDER)
- Rand-ESU
- NeMoFINDER
- Subgraph counting by scalar computation
- A-priori-based motif detection

Related Works (4)
Network clustering
- Compact representation of a network. Type I: minimum number of clusters. Type II: maximum cohesiveness.
- Aggregation of topological motifs (combining smaller network motifs to observe the whole structure).
- In our proposed solution, however, clustering groups similar network patterns together, not similar nodes (sequences). Nor is it used for aggregating motifs.

Proposed Ideas
Given a graph G = (V, E), t (the desired motif size), and k (the number of motifs), find network motifs of size t.
1. List all graph patterns with t (or more than t) nodes.
   - Represent the network as an adjacency matrix A with entries 1, -1, 0.
   - Scan A for all t x t sub-matrices.
2. Cluster the subgraphs into k clusters.
   - Use any numerical clustering algorithm, such as K-means or NMF.
3. Find a representative subgraph for each cluster.
   - Use the symmetry-breaking technique to find the representative.
   - Each representative is a candidate network motif.
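The first two steps of the proposal can be sketched in plain Python. Several things here are assumptions, since the slides leave them open: A[i][j] = 1 is read as an edge from i to j and A[i][j] = -1 as an edge from j to i; each sub-matrix is described by its sorted (out-degree, in-degree) pairs so that relabeling the nodes does not change the feature vector; and the small k-means routine, seeded with the first k distinct points, is a stand-in for whichever numerical clustering algorithm is eventually chosen. All function names are hypothetical.

```python
from itertools import combinations

def submatrices(A, t):
    """Yield (nodes, t x t sub-matrix) for every choice of t node indices."""
    for nodes in combinations(range(len(A)), t):
        yield nodes, [[A[i][j] for j in nodes] for i in nodes]

def is_connected(S):
    """Weak connectivity: any nonzero entry links two nodes."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(S)):
            if j not in seen and (S[i][j] != 0 or S[j][i] != 0):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(S)

def features(S):
    """Sorted (out-degree, in-degree) pairs, flattened into one vector."""
    pairs = sorted((row.count(1), row.count(-1)) for row in S)
    return [float(d) for pair in pairs for d in pair]

def kmeans(points, k, iters=20):
    """Tiny Lloyd-style k-means; centers start at the first k distinct points."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def candidate_clusters(A, t, k):
    """Scan A for connected t-node subgraphs and assign each to one of k clusters."""
    kept = [(nodes, S) for nodes, S in submatrices(A, t) if is_connected(S)]
    labels = kmeans([features(S) for _, S in kept], k)
    return [(nodes, lab) for (nodes, _), lab in zip(kept, labels)]

# Hypothetical example: edges 0->1, 0->2, 1->2 (a feed-forward loop) and 2->3.
A = [[0, 1, 1, 0],
     [-1, 0, 1, 0],
     [-1, -1, 0, 1],
     [0, 0, -1, 0]]
print(candidate_clusters(A, 3, 2))
```

Scanning every t x t sub-matrix means visiting all "n choose t" node subsets, which grows very quickly with t. For the large motifs targeted here (t > 15) the scan would need to be replaced by sampling or by an enumeration of connected subgraphs only.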

Unsolved Problems
- How to cluster the graphs? The choice of clustering algorithm depends on which features we use to describe the data.
- What type of clustering algorithm: Type I or Type II?
- How to find the representative subgraph of each cluster?
- Should we consider network alignment first?
- Should we consider sequence similarities as well? Is there any relationship between sequence motifs and network motifs? Could sequence motifs be applied as a vertex-attribute matrix?
- Compact topological motifs.
- Large network motifs vs. small network motifs.

Discovering Topological Motifs Using a Compact Notation

Compact Notation Main Idea
- A topological motif can be represented either as a motif or as a collection of location lists of the vertices of the motif.
- The method works in the space of location lists to discover motifs.

Compact Notation Method
- Step 1: compute an exhaustive list of potential vertex lists of motifs, represented as compact location lists.
- Step 2: enlarge the collection of compact location lists computed in Step 1 by including all non-empty intersections, along with the differences.
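The set operations in Step 2 can be illustrated with a short Python sketch. It treats each location list as a plain set of vertex ids and keeps adding non-empty pairwise intersections and differences until nothing new appears. Whether the original method applies the operations once or repeatedly, and how it handles conjugate lists, is not stated on the slides, so the fixed-point loop and the function name enlarge are assumptions made for this illustration.

```python
def enlarge(location_lists):
    """Close a collection of vertex sets under non-empty intersection and difference."""
    collection = {frozenset(l) for l in location_lists if l}
    changed = True
    while changed:
        changed = False
        current = list(collection)
        for a in current:
            for b in current:
                for s in (a & b, a - b):
                    # Keep only non-empty sets that are not in the collection yet.
                    if s and s not in collection:
                        collection.add(s)
                        changed = True
    return collection

# Hypothetical example with two overlapping location lists.
result = enlarge([{1, 2, 3}, {2, 3, 4}])
print(sorted(sorted(s) for s in result))
```

The loop terminates because every new set is a subset of an existing one and there are only finitely many subsets of the vertices involved.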

Compact Notation An Example
Different colors indicate different attributes.

Compact Notation G1’s adjacency matrices

Compact Notation Adjacency Matrix B1 (the conjugacy relationship of two lists is shown by “”) L = {ℓ1, ℓ2, ℓ3, ℓ4}

Compact Notation Initialization Step

Compact Notation Iterative Step

References
[1] Bill Andreopoulos, Aijun An, Xiaogang Wang, and Michael Schroeder. A roadmap of clustering algorithms: finding a match for a biomedical application. Brief Bioinform, pages bbn058+, February 2009.
[2] Alberto Apostolico, Matteo Comin, and Laxmi Parida. Bridging Lossy and Lossless Compression by Motif Pattern Discovery. Electronic Notes in Discrete Mathematics, 21:219-225, 2005. General Theory of Information Transfer and Combinatorics.
[3] Giovanni Ciriello and Concettina Guerra. A review on models and algorithms for motif discovery in protein-protein interaction networks. Brief Funct Genomic Proteomic, 7(2):147-156, 2008.
[4] Jun Huan, Wei Wang, and Jan Prins. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. Data Mining, IEEE International Conference on, 0:549, 2003.
[5] Michihiro Kuramochi and George Karypis. Finding Frequent Patterns in a Large Sparse Graph. Data Mining and Knowledge Discovery, 11(3):243-271, November 2005.
[6] Laxmi Parida. Discovering Topological Motifs Using a Compact Notation. Journal of Computational Biology, 14(3):300-323, 2007.

References
[7] Radu Dobrin, Qasim K. Beg, Albert-Laszlo Barabasi, and Zoltan N. Oltvai. Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network. BMC Bioinformatics, 5:10, 2004.
[8] McKay, B.D. Isomorph-free exhaustive generation. J. Algorithms, 26:306-324, 1998.
[9] Middendorf, M., Ziv, E., and Wiggins, C.H. Inferring network mechanisms: the Drosophila melanogaster protein interaction network. PNAS, 102(9):3192-3197, Mar 2005.
[10] Grochow, J. A. and Kellis, M. Network motif discovery using subgraph enumeration and symmetry-breaking. In RECOMB 2007, Lecture Notes in Computer Science 4453, pp. 92-106. Springer-Verlag, 2007.

Thank you so much!