Topological Analysis in PPI Networks & Network Motif Discovery. Jin Chen, MSU CSE891-001, Fall 2012.

Presentation transcript:

Layout
Topological properties of real networks
– Degree distribution (power-law & exponential)
– Path distance (small-world, non-small-world)
Network motif
– Definitions
– Algorithms

WWW has a power-law degree distribution
Distribution of links on the WWW:
a) Outgoing links. The tail of the distribution follows $P(k) \approx k^{-\gamma}$, with $\gamma_{out} = 2.45$.
b) Incoming links, with $\gamma_{in} = 2.1$.
c) Average of the shortest path between two documents as a function of system size.
The degree distribution scales as a power law.
R. Albert, H. Jeong, A.-L. Barabási, Nature 401, 130 (1999)
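As a rough illustration of how a power-law exponent can be read off a degree distribution, the sketch below computes P(k) from an edge list and fits a straight line to log P(k) against log k. The function names are hypothetical, and the least-squares fit on log-log axes is a simple stand-in chosen for brevity; it is not necessarily the estimation procedure used in the cited paper.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {k: c / n for k, c in sorted(counts.items())}

def loglog_slope(dist):
    """Least-squares slope of log P(k) versus log k (an estimate of minus the exponent)."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic (unnormalized) distribution that follows k^-2.45 exactly.
synthetic = {k: k ** -2.45 for k in range(1, 50)}
gamma = -loglog_slope(synthetic)

# Tiny star graph: one hub of degree 3 and three leaves of degree 1.
star_distribution = degree_distribution([(1, 2), (1, 3), (1, 4)])
```

On real data the tail is noisy, so a straight-line fit like this should be treated as a first look rather than a reliable estimate.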

Power grid has an exponential degree distribution
R. Albert et al., Phys. Rev. E 69, (R) (2004)

Metabolic networks have a power-law degree distribution
Figure panels: Archaeoglobus fulgidus; E. coli; Caenorhabditis elegans; all organisms.
H. Jeong et al., Nature 407, 651 (2000)

Regulatory network of E. coli has a power-law out-degree distribution & an exponential in-degree distribution
– The distribution of the number of transcription factors controlling a gene is exponential
– The distribution of the number of genes regulated by a transcription factor is power-law, with an average of ~5
Shen-Orr et al., Nature Genetics 31, (2002); data from RegulonDB (Salgado et al. 2006)

Small-world networks
A small-world network is a network in which most nodes are not neighbors of one another, but most nodes can be reached from every other node in a small number of hops or steps.
A small-world network is defined as: $L \propto \log N$, where L is the distance between two randomly chosen nodes and N is the number of nodes in the network.
Small-world properties are found in many real-world phenomena.
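To make the distance L between randomly chosen nodes concrete, here is a minimal sketch that averages breadth-first-search distances over all reachable node pairs. The adjacency-dictionary input format and the helper names are assumptions made for this example.

```python
from collections import deque

def average_path_length(adj):
    """Mean BFS distance over all ordered pairs of distinct, reachable nodes."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def ring(n):
    """Ring lattice: every node is linked to its two nearest neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def complete(n):
    """Clique: every node is linked to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

ring_length = average_path_length(ring(6))
clique_length = average_path_length(complete(4))
```

Computing this value for networks of increasing size N and plotting it against log N is one simple way to check whether L grows logarithmically.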

Six degrees of separation
Six degrees of separation = everyone is, on average, approximately six steps away from any other person on Earth.
But if two people count as linked only if they knew each other, then the number of degrees of separation between Albert Einstein and Alexander the Great is almost certainly greater than

Relationship between power-law & small-world
If a network's degree distribution can be fit with a power-law distribution, this is taken as a sign that the network is small-world.
But a small-world network does not necessarily have a power-law degree distribution (e.g. a clique).

Robustness
A.-L. Barabási hypothesized that the prevalence of small-world networks in biological systems may reflect an evolutionary advantage of such an architecture.
One possibility is that small-world networks are more robust to perturbations than other network architectures.
This would provide an advantage to biological systems that are subject to damage by mutation or viral infection.

True PPIs fit small-world, false PPIs distributed randomly
Hypothesis: true PPIs fit the pattern of a small-world network; false PPIs are distributed randomly in the network.
By studying the local cohesiveness for each PPI, true and false PPIs can be separated
– Incorporate a set of clustering coefficient measures of neighborhood cohesiveness
– Look for “network motifs” as an index of how well the PPIs are locally connected
Debra S. Goldberg, Frederick P. Roth (2003). PNAS, 100(8) 4372–4376.
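One simple way to score the local cohesiveness of an individual interaction is the Jaccard overlap of the two proteins' neighborhoods. The slides do not say which measures the cited paper uses, so the score below, the function names, and the toy network are assumptions chosen only to illustrate the idea that poorly supported edges rank lowest.

```python
def edge_jaccard(adj, u, v):
    """Shared neighbors of u and v divided by all nodes adjacent to either."""
    shared = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return len(shared) / len(union) if union else 0.0

def rank_edges(adj):
    """Edges sorted from least to most cohesive (ties broken by name)."""
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    return sorted(edges, key=lambda e: (edge_jaccard(adj, *e), e))

# Toy network: A, B, C form a triangle; D hangs off A with no shared neighbors.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

ranking = rank_edges(adj)
```

Here the edge A-D has no common neighbors and lands at the bottom of the ranking, which is the kind of interaction such a filter would flag for closer inspection.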

Concept of Network Motif
“Network Motifs: Simple Building Blocks of Complex Networks”
– Focused on directed, cyclic subgraphs of 3 or 4 nodes in yeast (no self-loops)
– Used exhaustive enumeration, with random networks as a comparison
Milo et al. Science (2002) Vol. 298 no pp

Concept of Network Motif
Of the 13 possible 3-node networks, one predominates in gene expression networks (the feed-forward loop).
Of the 199 possible 4-node networks, one predominates (the bi-fan).
Figure: feed-forward loop (nodes X, Y, Z) and bi-fan (nodes X, Y, Z, W).
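The two motifs named here can be counted directly in a small directed network. The sketch below assumes the usual definitions: a feed-forward loop has edges X→Y, Y→Z and X→Z, and a bi-fan has two source nodes that both point to the same two target nodes. The function names are hypothetical, and the counts are of pattern matches rather than induced subgraphs, so extra edges among the same nodes are not excluded.

```python
from itertools import combinations

def successors(edges):
    """Map each node to the set of nodes it points to (self-loops ignored)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set())
        succ.setdefault(v, set())
        if u != v:
            succ[u].add(v)
    return succ

def count_feed_forward_loops(edges):
    """Count triples (x, y, z) with x->y, y->z and x->z."""
    succ = successors(edges)
    count = 0
    for x in succ:
        for y in succ[x]:
            for z in succ[y]:
                if z != x and z in succ[x]:
                    count += 1
    return count

def count_bi_fans(edges):
    """Count pairs of sources that share a pair of targets."""
    succ = successors(edges)
    count = 0
    for x, y in combinations(succ, 2):
        shared = (succ[x] & succ[y]) - {x, y}
        count += len(shared) * (len(shared) - 1) // 2
    return count

ffl_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z")]
cycle_edges = [("X", "Y"), ("Y", "Z"), ("Z", "X")]
bifan_edges = [("X", "Z"), ("X", "W"), ("Y", "Z"), ("Y", "W")]
```

Exhaustive counting like this is only practical for very small motifs; the cost grows quickly with motif size, which is what motivates the sampling and tree-based methods discussed later in the deck.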

Concept of Network Motif
Efficient sampling algorithm for detecting network motifs
– Focused on directed, cyclic graphs
– Used a sampling approach to estimate motif frequency
– Found motifs of size 6 & 7
Kashtan et al. Bioinformatics (2004) Volume 20, Issue 11 Pp

Problem Definition
Given a PPI network
– Unlabelled & undirected subgraphs
– Find repeated and unique motifs of size 2 to K (5 to 25)
Mining Maximal Frequent Subgraphs from Graph Databases (SPIN, FSSM), Huan et al. SIGKDD (2004)
– Looks for frequent labelled subgraphs from a database of graphs
– Counts whether a subgraph occurs at least once in a graph

Tough problem
1. The number of motifs increases exponentially with size
2. Motif frequency is not a priori
3. Graph isomorphism has no known polynomial-time solution
Concepts of frequency
– f1: allow arbitrary overlaps of nodes & edges (NOT DOWNWARD CLOSED!)
– f2: allow overlaps of nodes, but edges must be disjoint
– f3: no overlap allowed (edge- and node-disjoint)
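The three frequency concepts can give different counts for the same pattern. The sketch below takes a list of pattern occurrences, each given as a list of edges, and computes f1 as the raw count, f2 as the largest set of pairwise edge-disjoint occurrences, and f3 as the largest set of pairwise node-disjoint occurrences. The brute-force search and the function names are assumptions for illustration and only suit a handful of occurrences.

```python
from itertools import combinations

def nodes_of(occurrence):
    """All nodes touched by the edges of one occurrence."""
    return {node for edge in occurrence for node in edge}

def max_compatible(occurrences, compatible):
    """Size of the largest subset whose members are pairwise compatible."""
    for size in range(len(occurrences), 0, -1):
        for subset in combinations(occurrences, size):
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                return size
    return 0

def frequencies(occurrences):
    """Return (f1, f2, f3) for a list of occurrences."""
    f1 = len(occurrences)
    f2 = max_compatible(occurrences, lambda a, b: not (set(a) & set(b)))
    f3 = max_compatible(
        occurrences, lambda a, b: not (nodes_of(a) & nodes_of(b))
    )
    return f1, f2, f3

# Occurrences of a 2-edge path inside the path graph 1-2-3-4-5.
path_occurrences = [
    [(1, 2), (2, 3)],
    [(2, 3), (3, 4)],
    [(3, 4), (4, 5)],
]

result = frequencies(path_occurrences)
```

In this example the first and third occurrences share node 3 but no edge, so they count together under f2 and not under f3.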

Algorithm parameters
Input: a Protein-Protein Interaction (PPI) network G
– K : maximal motif size
– F : frequency threshold
– S : uniqueness threshold
Output: set U of frequent and unique motifs of size 3 to K
Since motifs are small (2 to 25 nodes), use adjacency matrices. Further, represent motifs as Canonical Adjacency Matrices (CAM).
Chen et al. SIGKDD 2006
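The point of a canonical matrix form is that two isomorphic motifs map to the same code, so they can be compared by simple equality. The sketch below uses one simple canonical code, the lexicographically largest upper-triangle bit string over all node orderings. This is an assumption standing in for the CAM construction, whose exact definition is not given in the slides, and the brute-force search over orderings is only feasible for very small motifs.

```python
from itertools import permutations

def canonical_code(n, edges):
    """Largest upper-triangle adjacency bit string over all orderings of n nodes.

    Nodes are assumed to be labelled 0 .. n-1 and the graph is undirected.
    """
    edge_set = {frozenset(e) for e in edges}
    best = ""
    for order in permutations(range(n)):
        code = "".join(
            "1" if frozenset((order[i], order[j])) in edge_set else "0"
            for i in range(n)
            for j in range(i + 1, n)
        )
        if code > best:
            best = code
    return best

# The same 3-node path written with two different labellings.
path_a = canonical_code(3, [(0, 1), (1, 2)])
path_b = canonical_code(3, [(0, 1), (0, 2)])

# A 4-node star and a 4-node path are not isomorphic.
star = canonical_code(4, [(0, 1), (0, 2), (0, 3)])
path4 = canonical_code(4, [(0, 1), (1, 2), (2, 3)])
```

Trying all n! orderings is far too slow near the upper motif sizes mentioned in the slide, so a practical method would need a more careful construction.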

Find Repeated size-k Trees
Given a graph G
– Let K = 5 (max motif size)
– Let F = 2 (min frequency)
– Let S = 0.95 (uniqueness threshold)
Figure: example graph G.

Find Repeated size-k Trees
Find all subgraphs of size 2 to 5.
Fig 2. Size 2 to 5 trees: t2, t3, t4_1, t4_2, t5_1, t5_2, t5_3.

Find Repeated size-k Trees
Occurrences of t4_1 in G

Find Repeated size-k Trees
Table: frequency of each tree (t2, t3, t4_1, t4_2, t5_1, t5_2, t5_3), compared against the threshold F = 2.

Find Repeated size-k Trees
Remaining frequent trees: t2, t3, t4_1, t4_2, t5_2, t5_3
T_2 = {t2}, T_3 = {t3}, T_4 = {t4_1, t4_2}, T_5 = {t5_2, t5_3}

Use Repeated Size-k Trees to Partition Graph
Take each tree in T_k and use it to partition G (e.g. T_4 gives GD_4).

Perform graph join operation to find repeated size-k graphs
Figure: trees t4_1 and t4_2.

Perform graph join operation to find repeated size-k graphs
Generate all k-node, k-1 edge graphs from each graph in T_k (e.g. 4-node, 3-edge subgraphs from T_4).
Figure: trees t4_1 and t4_2 with subgraphs h1, h2, h3, h4, h5.

Perform graph join operation to find repeated size-k graphs
Join each tree with its cousins to produce frequent motif candidates C_k.
Figure: trees t4_1 and t4_2 with subgraphs h1, h2, h3, h4, h5, joined into the candidate set C_4.

Perform graph join operation to find repeated size-k graphs
Count the frequency of each graph of C_k in GD_k.
Figure labels: GD, g1_2, g1_1, F = 4, F = 2.

Perform graph join operation to find repeated size-k graphs
Generate k-node, k+1 edge graphs from k-node, k-edge graphs.
Figure: g1_2 and h6 combined by the "move edge" and "merge" steps into g2, with F = 2 in GD_4.

Graph Cousins
Figure note: not allowed to join in this state.
– Only consider the spanning trees and the subgraphs created from them
– Use the properties of the cousins to trim the number of graph isomorphism tests that need to be done
– Since the spanning trees partition the graph space, using these cousins saves some time

Graph Cousins
Type I : Direct Cousin
h is isomorphic to a subgraph which has the same number of nodes & edges as g, and g != h.
Figure: example graphs g and h, with g' shown as a Type I cousin.

Graph Cousins
Figure: graphs g and h and their occurrences in GD_4 (G4_1, G4_2, G4_3, G4_4, G4_5).

Graph Cousins
Type II : Twin Cousin
h is isomorphic to a subgraph g.
Figure: example graphs h and g.

Graph Cousins
Type III : Distant Cousin
h is a disconnected subgraph of g.
Figure: example graphs h and g.

Graph Cousins
Saves time when counting graph frequency
– GD_k partitions the network into several subgraphs
– If they can limit the isomorphism search to a subset of those graphs, they can save time

Determine subgraph frequency in random networks
A frequent subgraph may appear frequently by chance.
To determine the significance of a subgraph, generate random networks with the same number of nodes and the same number of edges.
Also impose the constraint that each node must have the same number of neighbors as its counterpart in the real network.
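A common way to build such random networks is repeated edge swapping: pick two edges (a, b) and (c, d) and rewire them to (a, d) and (c, b) whenever that creates no self-loop or duplicate edge. Each swap leaves every node's degree unchanged. The slides do not state which randomization procedure was used, so this is a sketch under that assumption, with hypothetical function names and node labels assumed to be sortable.

```python
import random
from collections import Counter

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def randomize_preserving_degrees(edges, n_swaps, seed=0):
    """Return a rewired copy of the edge list with identical node degrees."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: a swap could create a self-loop
        new_1 = tuple(sorted((a, d)))
        new_2 = tuple(sorted((c, b)))
        if new_1 in edge_set or new_2 in edge_set:
            continue  # swap would create a duplicate edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new_1, new_2}
        edge_list[i], edge_list[j] = new_1, new_2
        done += 1
    return edge_list

# Ring of 10 nodes: every node has degree 2 before and after rewiring.
original = [(i, (i + 1) % 10) for i in range(10)]
shuffled = randomize_preserving_degrees(original, n_swaps=20, seed=1)
```

Counting a subgraph in many such randomized networks gives a baseline frequency against which its frequency in the real network can be compared.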

Performance Test
Uetz dataset : 957 PPIs, 104 proteins
– In budding yeast
MIPS CYGD dataset : PPIs, 4341 proteins
– Also in budding yeast
Compared with
– Exhaustive enumeration
– Sampling
– FPF

Performance : runtime ~2.8 hrs F = 50 U =

Performance : runtime ~2.8 hrs

Performance : max. motif size