Understanding Network Concepts in Modules
Dong J, Horvath S (2007) BMC Systems Biology 1:24.

Content
Here we study network concepts in special types of networks, which we refer to as approximately factorizable networks. In these networks, the pairwise connection strength (adjacency) between two network nodes can be factored into node-specific contributions, named node 'conformity'.
Scope: our results apply to modules in gene co-expression networks and to special types of modules in protein-protein interaction networks.
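A minimal sketch of what "factorizable" means, assuming conformity values in [0, 1] and the usual convention that an adjacency matrix has ones on its diagonal: every off-diagonal adjacency is the product of the two node conformities. The helper name `factorizable_adjacency` and the conformity values are hypothetical, chosen only for illustration.

```python
def factorizable_adjacency(cf):
    """Return the n x n matrix with a_ij = cf[i] * cf[j] for i != j and a_ii = 1."""
    n = len(cf)
    return [[1.0 if i == j else cf[i] * cf[j] for j in range(n)] for i in range(n)]


cf = [0.9, 0.8, 0.5]  # hypothetical conformity values, not data from the paper
A = factorizable_adjacency(cf)
for row in A:
    print([round(x, 2) for x in row])
```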

Background
Network concepts are also known as network statistics or network indices.
– Examples: connectivity (degree), clustering coefficient, topological overlap, etc.
Network concepts underlie network language and systems-biological modeling. Dozens of potentially useful network concepts are known from graph theory.
Question: how are seemingly disparate network concepts related to each other?

Review of some fundamental network concepts

Connectivity
Gene connectivity = row sum of the adjacency matrix, excluding the diagonal element
– For unweighted networks: number of direct neighbors
– For weighted networks: sum of connection strengths to other nodes
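The connectivity definition can be sketched as follows, assuming the adjacency matrix is a plain list of lists whose diagonal is ignored; `connectivity` is a hypothetical helper name and the matrices are toy examples.

```python
def connectivity(A):
    """k_i = sum over j != i of a_ij (row sum of A with the diagonal excluded)."""
    n = len(A)
    return [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]


# Unweighted path 0 - 1 - 2; the diagonal is 1 by convention and is ignored.
A_path = [[1, 1, 0],
          [1, 1, 1],
          [0, 1, 1]]
print(connectivity(A_path))  # [1, 2, 1]: number of direct neighbors

# Weighted network: connectivity is the sum of connection strengths.
A_weighted = [[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.25],
              [0.25, 0.25, 1.0]]
print(connectivity(A_weighted))  # [0.75, 0.75, 0.5]
```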

Density
Density = mean adjacency, i.e., the mean of the off-diagonal entries a(i,j)
Highly related to mean connectivity: Density = mean(k)/(n-1)
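A small sketch of the density and its link to the mean connectivity, assuming the density averages all n(n-1) off-diagonal entries; the helper names and the matrix are hypothetical.

```python
def connectivity(A):
    n = len(A)
    return [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]


def density(A):
    """Mean of the n*(n-1) off-diagonal adjacencies."""
    n = len(A)
    total = sum(A[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


A = [[1.0, 0.5, 0.25],
     [0.5, 1.0, 0.25],
     [0.25, 0.25, 1.0]]
n = len(A)
k = connectivity(A)
print(density(A))            # mean off-diagonal adjacency
print(sum(k) / n / (n - 1))  # same value: Density = mean(k) / (n - 1)
```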

Centralization
Centralization = 1 if the network has a star topology
Centralization = 0 if all nodes have the same connectivity
(Figure: a star network, whose centralization is 1, and a network in which every node has connectivity 2, whose centralization is 0.)
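The slide states only the two extreme cases. The sketch below assumes the formula Centralization = n/(n-2) · (max(k)/(n-1) - Density), which we believe is the definition used in the paper and which reproduces both cases. The helpers `star` and `ring` are hypothetical test networks.

```python
def connectivity(A):
    n = len(A)
    return [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]


def centralization(A):
    """n/(n-2) * (max(k)/(n-1) - Density); assumes n > 2."""
    n = len(A)
    k = connectivity(A)
    dens = sum(k) / (n * (n - 1))  # Density = mean(k) / (n - 1)
    return n / (n - 2) * (max(k) / (n - 1) - dens)


def star(n):
    """Node 0 is linked to every other node; there are no other edges."""
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for j in range(1, n):
        A[0][j] = A[j][0] = 1
    return A


def ring(n):
    """Cycle 0-1-...-(n-1)-0: every node has connectivity 2."""
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i in range(n):
        A[i][(i + 1) % n] = A[(i + 1) % n][i] = 1
    return A


print(centralization(star(5)))  # star topology: approximately 1
print(centralization(ring(5)))  # equal connectivities: 0
```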

Heterogeneity
Heterogeneity = coefficient of variation of the connectivity
Highly heterogeneous networks exhibit hubs
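A sketch of the heterogeneity as sd(k)/mean(k), assuming the population standard deviation (dividing by n rather than n-1); the star and ring networks are hypothetical examples.

```python
import statistics


def connectivity(A):
    n = len(A)
    return [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]


def heterogeneity(A):
    """Coefficient of variation of the connectivity, sd(k) / mean(k).
    Uses the population standard deviation (an assumption)."""
    k = connectivity(A)
    return statistics.pstdev(k) / statistics.mean(k)


n = 5
star = [[1 if i == j or i == 0 or j == 0 else 0 for j in range(n)] for i in range(n)]
ring = [[1 if (i - j) % n in (0, 1, n - 1) else 0 for j in range(n)] for i in range(n)]
print(heterogeneity(star))  # hub-dominated network: large value
print(heterogeneity(ring))  # equal connectivities: 0.0
```

With five nodes the star has connectivities [4, 1, 1, 1, 1] and heterogeneity 1.2/1.6 = 0.75, while the ring, whose connectivities are all equal, has heterogeneity 0.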

Clustering Coefficient
Measures the cliquishness of a particular node: « A node is cliquish if its neighbors know each other »
(Figure: a black node whose neighbors are not linked to each other has clustering coefficient 0; a node whose neighbors are all linked has clustering coefficient 1.)
This generalizes directly to weighted networks (Zhang and Horvath 2005).
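One common form of the weighted clustering coefficient, which we believe corresponds to the generalization of Zhang and Horvath (2005), is C_i = Σ_{j≠i} Σ_{l≠i,j} a_ij a_jl a_li / ((Σ_{j≠i} a_ij)² - Σ_{j≠i} a_ij²). For 0/1 adjacencies it reduces to the usual fraction of linked neighbor pairs. The sketch below illustrates that assumed formula; returning `None` for a node with fewer than two nonzero connections is our own convention.

```python
def clustering_coefficient(A, i):
    """Weighted clustering coefficient of node i (assumed formula, see text)."""
    n = len(A)
    others = [j for j in range(n) if j != i]
    num = sum(A[i][j] * A[j][l] * A[l][i]
              for j in others for l in others if l != j)
    k = sum(A[i][j] for j in others)
    denom = k * k - sum(A[i][j] ** 2 for j in others)
    return num / denom if denom > 0 else None  # undefined for < 2 neighbors


# Triangle 0-1-2 plus a pendant node 3 attached to node 0 (unweighted).
A = [[1, 1, 1, 1],
     [1, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 0, 0, 1]]
print(clustering_coefficient(A, 0))  # 1 of 3 neighbor pairs linked: 1/3
print(clustering_coefficient(A, 1))  # both neighbors linked: 1.0

# Weighted triangle with every weight 0.5: C_i equals the common weight.
W = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
print(clustering_coefficient(W, 0))  # 0.5
```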

Topological Overlap
The topological overlap dissimilarity is used as input of hierarchical clustering.
– Generalized in Zhang and Horvath (2005) to the case of weighted networks
– Generalized in Yip and Horvath (2007) to higher-order interactions
– Generalized in Li and Horvath (2006) to multiple nodes
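A sketch of the weighted topological overlap, assuming the form TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij) with l_ij = Σ_{u≠i,j} a_iu a_uj, as we understand it from Zhang and Horvath (2005). The dissimilarity used for hierarchical clustering is 1 - TOM. The function name `tom` is hypothetical.

```python
def tom(A):
    """Topological overlap matrix (assumed weighted form); diagonal set to 1."""
    n = len(A)
    k = [sum(A[i][u] for u in range(n) if u != i) for i in range(n)]
    T = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # l_ij: shared-neighbor term over all third nodes u
                l_ij = sum(A[i][u] * A[u][j] for u in range(n) if u not in (i, j))
                T[i][j] = (l_ij + A[i][j]) / (min(k[i], k[j]) + 1 - A[i][j])
    return T


# Path 0 - 1 - 2: nodes 0 and 2 are not linked but share neighbor 1.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
T = tom(A)
diss = [[1 - t for t in row] for row in T]  # input for hierarchical clustering
print(T[0][2], diss[0][2])  # 0.5 0.5
```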

Question: what do all of these fundamental network concepts have in common?
Answer: they are tensor-valued functions of the off-diagonal elements of the adjacency matrix A.

CHALLENGE
Find relationships between these and other seemingly disparate network concepts.
For general networks, this is a difficult problem. But a solution exists for a special subclass of networks: approximately factorizable networks.
Motivation: modules in larger networks are often approximately factorizable.

Approximately factorizable networks and conformity

The conformity vector reduces the dimensionality of the adjacency matrix
Note that the (symmetric) adjacency matrix contains n*(n-1)/2 parameters a(i,j), whereas the conformity vector contains only n parameters CF(i). Thus, by focusing on the conformity-based adjacency matrix A_CF, whose off-diagonal entries are CF(i)*CF(j), we effectively reduce the dimensionality of the adjacency matrix. This approximation is only valid if the network has high factorizability, as defined on the next slide.

The higher F(A), the better A_CF approximates A
The factorizability F(A) is normalized to take on values in the unit interval [0, 1].
Empirical observation: subnetworks comprised of module genes tend to have high factorizability, F(A) > 0.8.
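The factorizability formula itself did not survive in this transcript. A minimal sketch, assuming the definition F(A) = 1 - Σ_{i≠j}(a_ij - CF_i·CF_j)² / Σ_{i≠j} a_ij² (our reading of the paper). When CF is the least-squares conformity, F(A) cannot be negative, because CF = 0 already gives F(A) = 0; for an arbitrary CF it can be. The function name and matrices are hypothetical.

```python
def factorizability(A, cf):
    """Assumed F(A): 1 - off-diagonal residual energy / off-diagonal energy."""
    n = len(A)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    resid = sum((A[i][j] - cf[i] * cf[j]) ** 2 for i, j in pairs)
    total = sum(A[i][j] ** 2 for i, j in pairs)
    return 1 - resid / total


cf = [0.9, 0.8, 0.5]
n = len(cf)
A = [[1.0 if i == j else cf[i] * cf[j] for j in range(n)] for i in range(n)]
print(factorizability(A, cf))  # 1.0: A is exactly factorizable

B = [row[:] for row in A]
B[0][1] = B[1][0] = 0.2        # break factorizability for one pair
print(factorizability(B, cf))  # below 1
```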

Applications: modules in a) protein-protein networks b) gene co-expression networks

The Topological Overlap Matrix Can Be Considered as an Adjacency Matrix
Important insight for protein-protein interaction (PPI) networks: since the matrix TopOverlap[i,j] is symmetric and its entries lie in [0, 1], it satisfies our assumptions on an adjacency matrix.
Since the adjacency matrices of our PPI networks are very sparse, we replaced them by the corresponding topological overlap matrices. Roughly speaking, the topological overlap matrix can be considered a 'smoothed out' version of the adjacency matrix.

Hierarchical clustering dendrogram and module definition. Drosophila PPI network
The color band below the dendrogram denotes the modules, which are defined as branches in the dendrogram. Of the 1371 proteins, 862 were clustered into 28 proper modules; the remaining proteins are colored grey. Recall that we used the TOM instead of the original adjacency matrix as the weighted network between the proteins.

Hierarchical clustering dendrogram and module definition. Yeast PPI network

Observation 1
Subnetworks comprised of module nodes tend to be approximately factorizable. Specifically, they have high factorizability F(A).

We use both PPI and gene co-expression network data to show empirically that subnetworks comprised of module nodes are often approximately factorizable.
CAVEATS
Approximate factorizability is a very stringent structural assumption that is not satisfied in general networks.
Modules in gene co-expression networks tend to be approximately factorizable if the corresponding expression profiles are highly correlated. The situation is more complicated for modules in PPI networks: only after replacing the original adjacency matrix by a 'smoothed out' version (the topological overlap matrix) do we find that the resulting modules are approximately factorizable.

To reveal relationships between network concepts, we use a trick: the approximate conformity-based adjacency matrix A_CF,app = CF CF^T, whose diagonal elements are CF(i)^2. Strictly speaking, it violates our assumption on an adjacency matrix, since its diagonal elements are not 1. But it is very useful for defining approximate conformity-based network concepts, which have several theoretical advantages, as we detail below.

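Network Concept Functions
Abstract definition: a network concept function is a tensor-valued function of a general n × n matrix M = [m_ij].
Examples: the connectivity, density, centralization, heterogeneity, clustering coefficient, and topological overlap reviewed above, each viewed as a function of M.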
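To illustrate the abstract definition, the sketch below writes two concept functions that accept any n × n matrix M and read only its off-diagonal entries. Applying them to A gives fundamental concepts; applying them to the conformity-based matrix A_CF (off-diagonal CF_i·CF_j, unit diagonal) gives CF-based concepts. Function names and conformity values are hypothetical.

```python
def connectivity_fn(M):
    """Connectivity as a concept function of a general matrix M (off-diagonal only)."""
    n = len(M)
    return [sum(M[i][j] for j in range(n) if j != i) for i in range(n)]


def density_fn(M):
    n = len(M)
    return sum(connectivity_fn(M)) / (n * (n - 1))


cf = [0.9, 0.8, 0.5, 0.4]  # hypothetical conformity vector
n = len(cf)
A_cf = [[1.0 if i == j else cf[i] * cf[j] for j in range(n)] for i in range(n)]
outer = [[cf[i] * cf[j] for j in range(n)] for i in range(n)]  # diagonal cf_i**2

k_cf = connectivity_fn(A_cf)           # CF-based connectivity
print(k_cf == connectivity_fn(outer))  # True: the diagonal is never read
print(density_fn(A_cf))                # CF-based density
```

For A_CF the CF-based connectivity reduces to CF_i·(ΣCF - CF_i).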

Question: find simple relationships between approximate CF-based network concepts

Major advantage of approximate CF-based network concepts: they exhibit simple relationships
Example: relationship between heterogeneity, density, and clustering coefficient
Observation 1
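A numerical check, assuming the following closed forms for the approximate CF-based concepts (our reading of the paper, not verbatim from these slides): connectivity k_i = CF_i·ΣCF, Density = mean(CF)², Heterogeneity = sd(CF)/mean(CF) with the population standard deviation, and ClusterCoef = (ΣCF²/ΣCF)² for every node. Under these assumptions ClusterCoef = (1 + Heterogeneity²)² × Density holds exactly, because 1 + Heterogeneity² = mean(CF²)/mean(CF)². The conformity values are hypothetical.

```python
import statistics


def approx_cf_concepts(cf):
    """Assumed closed forms of the approximate CF-based network concepts."""
    s1 = sum(cf)
    s2 = sum(c * c for c in cf)
    k_app = [c * s1 for c in cf]                            # connectivity
    density_app = statistics.mean(cf) ** 2                  # = mean(k_app) / n
    het_app = statistics.pstdev(cf) / statistics.mean(cf)   # = heterogeneity of k_app
    cc_app = (s2 / s1) ** 2                                 # same for every node
    return k_app, density_app, het_app, cc_app


cf = [0.95, 0.9, 0.7, 0.6, 0.3]
k_app, dens, het, cc = approx_cf_concepts(cf)
# Both numbers agree up to rounding: ClusterCoef = (1 + Het^2)^2 * Density
print(cc, (1 + het ** 2) ** 2 * dens)
```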

Observation 2
Fundamental network concepts are approximately equal to their approximate CF-based analogs in approximately factorizable networks.
Recall that fundamental network concepts are defined with respect to the adjacency matrix, whereas approximate CF-based network concepts are defined with respect to the conformity vector.
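A toy illustration of Observation 2 for the density, assuming an exactly factorizable A (off-diagonal a_ij = CF_i·CF_j) and the approximate CF-based density mean(CF)². The evenly spaced conformities are hypothetical. In this setting the gap works out to -var(CF)/(n - 1), with the population variance, so it shrinks as the module grows.

```python
import statistics


def density(A):
    """Fundamental density: mean of the off-diagonal adjacencies."""
    n = len(A)
    total = sum(A[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def density_gap(n):
    """Density(A) - mean(CF)**2 for an exactly factorizable A whose
    hypothetical conformities are evenly spaced between 0.5 and 0.9."""
    cf = [0.5 + 0.4 * i / (n - 1) for i in range(n)]
    A = [[1.0 if i == j else cf[i] * cf[j] for j in range(n)] for i in range(n)]
    return density(A) - statistics.mean(cf) ** 2, cf


for n in (10, 100):
    gap, cf = density_gap(n)
    # The last two numbers agree up to rounding: gap = -var(CF) / (n - 1).
    print(n, gap, -statistics.pvariance(cf) / (n - 1))
```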

Drosophila PPI module networks: the relationship between fundamental network concepts NetworkConcept (y-axis) and their approximate CF-based analogs NetworkConcept_CF,app (x-axis).

Yeast PPI module networks: the relationship between fundamental network concepts NetworkConcept (y-axis) and their approximate CF-based analogs NetworkConcept_CF,app (x-axis).

Yeast gene co-expression module networks: the relationship between fundamental network concepts NetworkConcept(A - I) (y-axis) and their approximate CF-based analogs NetworkConcept_CF,app (x-axis).

Observation 3: approximate relationships between network concepts in modules
The topological overlap between two nodes is determined by the maximum of their respective connectivities and the heterogeneity.

Observation 3 (cont’d)
In approximately factorizable networks, the mean clustering coefficient is determined by the density and the network heterogeneity. Other examples involve the topological overlap.
Thus, seemingly disparate network concepts satisfy simple and intuitive relationships in these special but biologically important types of networks.

Drosophila PPI module networks: the relationship between fundamental network concepts.

Yeast PPI module networks: the relationship between fundamental network concepts.

Yeast gene co-expression module networks: the relationship between fundamental network concepts.

Observation 4: network concepts are simple functions of the connectivity in approximately factorizable networks, where the last approximation assumes
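The slide's formula is truncated here. As one illustration of the idea, assuming the approximate CF-based connectivity k_i = CF_i·ΣCF and clustering coefficient (ΣCF²/ΣCF)², the clustering coefficient can be rewritten purely in terms of connectivities as (Σk²)²/(Σk)³, since Σk = (ΣCF)² and Σk² = ΣCF²·(ΣCF)². This is our own algebra under those assumptions, not the slide's expression; function names and values are hypothetical.

```python
def approx_cluster_coef_from_cf(cf):
    """Assumed approximate CF-based clustering coefficient, (sum cf^2 / sum cf)^2."""
    s1 = sum(cf)
    s2 = sum(c * c for c in cf)
    return (s2 / s1) ** 2


def approx_cluster_coef_from_k(k):
    """The same quantity written with connectivities only: (sum k^2)^2 / (sum k)^3."""
    return sum(x * x for x in k) ** 2 / sum(k) ** 3


cf = [0.95, 0.9, 0.7, 0.6, 0.3]
k_app = [c * sum(cf) for c in cf]  # assumed approximate CF-based connectivity
print(approx_cluster_coef_from_cf(cf), approx_cluster_coef_from_k(k_app))
```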

Robustness to module definition
In our applications, we define modules as branches of an average-linkage hierarchical clustering tree that uses the topological overlap measure as input. But our theoretical results are applicable to any approximately factorizable network. We find that the theoretical results are quite robust with respect to the underlying assumptions and highly robust with respect to the module definition.

Summary
We study network concepts in special types of networks, which we refer to as approximately factorizable networks.
To provide a formalism for relating network concepts to each other, we define three types of network concepts: fundamental, conformity-based, and approximate conformity-based concepts.
The approximate conformity-based analogs of fundamental network concepts have several theoretical advantages:
1. They allow one to derive simple relationships between seemingly disparate network concepts. For example, we derive simple relationships between the clustering coefficient, the heterogeneity, the density, the centralization, and the topological overlap.
2. They allow one to show that fundamental network concepts can be approximated by simple functions of the connectivity in module networks.

Appendix

What is the conformity?
This insight leads to an iterative algorithm for computing CF; see the next slide.

Monotonic algorithm for computing the conformity
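The slide's update rule is not preserved here. As a stand-in, the sketch below assumes that CF minimizes Σ_{i≠j}(a_ij - CF_i·CF_j)² and uses coordinate descent: each CF_i is replaced by its exact least-squares value Σ_{j≠i} a_ij·CF_j / Σ_{j≠i} CF_j², which can never increase the objective, so the iteration is monotonic. This is our own illustration and may differ from the paper's algorithm; the function names, starting value, and stopping rule are arbitrary choices.

```python
def objective(A, cf):
    """Least-squares criterion: sum over i != j of (a_ij - cf_i * cf_j)^2."""
    n = len(A)
    return sum((A[i][j] - cf[i] * cf[j]) ** 2
               for i in range(n) for j in range(n) if i != j)


def conformity(A, max_sweeps=5000, tol=1e-15):
    """Coordinate-descent estimate of the conformity vector (hypothetical helper)."""
    n = len(A)
    cf = [1.0] * n  # positive start; with nonnegative A every update stays positive
    history = [objective(A, cf)]
    for _ in range(max_sweeps):
        for i in range(n):
            num = sum(A[i][j] * cf[j] for j in range(n) if j != i)
            den = sum(cf[j] ** 2 for j in range(n) if j != i)
            if den > 0:
                cf[i] = num / den  # exact minimizer of the objective in cf_i
        history.append(objective(A, cf))
        if history[-2] - history[-1] < tol:
            break  # negligible improvement: stop
    return cf, history


true_cf = [0.9, 0.8, 0.7, 0.6, 0.5]  # hypothetical conformities
n = len(true_cf)
A = [[1.0 if i == j else true_cf[i] * true_cf[j] for j in range(n)] for i in range(n)]
cf, history = conformity(A)
print([round(c, 4) for c in cf])  # close to true_cf for this exactly factorizable A
```

The conformity is determined only up to sign; starting from a positive vector keeps the estimate positive.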