Steve Horvath, Andy Yip Depts Human Genetics and Biostatistics, University of California, Los Angeles The Generalized Topological.

Slides:

Advertisements

Similar presentations

Gene Correlation Networks

Advertisements

Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"

Molecular Systems Biology 3; Article number 140; doi: /msb

Introduction to Bioinformatics

V4 Matrix algorithms and graph partitioning

Andy Yip, Steve Horvath Depts Human Genetics and Biostatistics, University of California, Los Angeles The Generalized Topological.

C++ Programming: Program Design Including Data Structures, Third Edition Chapter 21: Graphs.

Seeing the forest for the trees : using the Gene Ontology to restructure hierarchical clustering Dikla Dotan-Cohen, Simon Kasif and Avraham A. Melkman.

Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.

Mutual Information Mathematical Biology Seminar

Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,

Steve Horvath University of California, Los Angeles

Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.

Is Forkhead Box N1 (FOXN1) significant in both men and women diagnosed with Chronic Fatigue Syndrome? Charlyn Suarez.

Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.

Ulf Schmitz, Pattern recognition - Clustering1 Bioinformatics Pattern recognition - Clustering Ulf Schmitz

Is my network module preserved and reproducible? PloS Comp Biol. 7(1): e Steve Horvath Peter Langfelder University of California, Los Angeles.

Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.

Consensus eigengene networks: Studying relationships between gene co-expression modules across networks Peter Langfelder Dept. of Human Genetics, UC Los.

Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.

Network Measures Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Measures Klout.

Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07.

Ai Li and Steve Horvath Depts Human Genetics and Biostatistics, University of California, Los Angeles Generalizations of.

An Overview of Weighted Gene Co-Expression Network Analysis

MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts Igor Ulitsky and Ron Shamir Identification.

Network Analysis and Application Yao Fu

“An Extension of Weighted Gene Co-Expression Network Analysis to Include Signed Interactions” Michael Mason Department of Statistics, UCLA.

A Geometric Interpretation of Gene Co-Expression Network Analysis Steve Horvath, Jun Dong.

Bioinformatics Dealing with expression data Kristel Van Steen, PhD, ScD Université de Liege - Institut Montefiore

CZ5225: Modeling and Simulation in Biology Lecture 5: Clustering Analysis for Microarray Data III Prof. Chen Yu Zong Tel:

Slides are based on Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems.

Soft Computing Lecture 18 Foundations of genetic algorithms (GA). Using of GA.

Hierarchical Clustering and Dynamic Branch Cutting

Networks and Interactions Boo Virk v1.0.

Proliferation cluster (G12) Figure S1 A The proliferation cluster is a stable one. A dendrogram depicting results of cluster analysis of all varying genes.

Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.

Expression Modules Brian S. Yandell (with slides from Steve Horvath, UCLA, and Mark Keller, UW-Madison)

Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae Speaker: Chunhui Cai.

CLUSTERING. Overview Definition of Clustering Existing clustering methods Clustering examples.

Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman modified by Hanne Jarmer.

Networks Igor Segota Statistical physics presentation.

Top X interactions of PIN Network A interactions Coverage of Network A Figure S1 - Network A interactions are distributed evenly across the top 60,000.

Quantitative analysis of 2D gels Generalities. Applications Mutant / wild type Physiological conditions Tissue specific expression Disease / normal state.

More About Clustering Naomi Altman Nov '06. Assessing Clusters Some things we might like to do: 1.Understand the within cluster similarity and between.

Differential analysis of Eigengene Networks: Finding And Analyzing Shared Modules Across Multiple Microarray Datasets Peter Langfelder and Steve Horvath.

Understanding Network Concepts in Modules Dong J, Horvath S (2007) BMC Systems Biology 2007, 1:24.

Lecture 3 1.Different centrality measures of nodes 2.Hierarchical Clustering 3.Line graphs.

Gene expression & Clustering. Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species –Dynamic.

Genome Biology and Biotechnology The next frontier: Systems biology Prof. M. Zabeau Department of Plant Systems Biology Flanders Interuniversity Institute.

Consensus modules: modules present across multiple data sets Peter Langfelder and Steve Horvath Eigengene networks for studying the relationships between.

Computational Biology Group. Class prediction of tumor samples Supervised Clustering Detection of Subgroups in a Class.

Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”

Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.

Hierarchical clustering approaches for high-throughput data Colin Dewey BMI/CS 576 Fall 2015.

 Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems n Introduction.

Graph clustering to detect network modules

Unsupervised Learning

Fig. 1 Computing the four node TOMs for nodes A,B,C,D in two simple networks 1) tA,B,C,D=0+40+6=0.667 and 2) tA,B,C,D=1+41+6= From: Network neighborhood.

Spectral methods for Global Network Alignment

Microarray Clustering

Hierarchical clustering approaches for high-throughput data

Topological overlap matrix (TOM) plots of weighted, gene coexpression networks constructed from one mouse studies (A–F) and four human studies including.

Spectral methods for Global Network Alignment

Anastasia Baryshnikova Cell Systems

Volume 3, Issue 1, Pages (July 2016)

Self-organizing map numeric vectors and sequence motifs

Text Categorization Berlin Chen 2003 Reference:

Characteristics of tissue‐specific co‐expression networks (CNs)‏

Brandon Ho, Anastasia Baryshnikova, Grant W. Brown Cell Systems

Unsupervised Learning

Presentation transcript:

Steve Horvath, Andy Yip Depts Human Genetics and Biostatistics, University of California, Los Angeles The Generalized Topological Overlap Matrix in Biological Network Analysis

Contents Dissimilarity measures in undirected networks Dissimilarities based on shared neighbors Generalized topological overlap matrix Applications Simulation

Network Terminology Unweighted Network=adjacency matrix A=[a ij ], that encodes whether a pair of nodes is connected. –A is a symmetric matrix with entries in [0,1] –a ij =1 nodes i and j are connected else 0 HERE WE CONSIDER AN UNWEIGHTED NETWORKS Gene connectivity K= row sum of the adjacency matrix=number of direct neighbors Network Module=Subset of highly interconnected nodes

Define an Adjacency Matrix Measure of Node Dissimilarity Identify Network Modules (Clustering) Understand the biological meaning of modules and network concepts Basic Steps in Many Biological Network Analyses

What is a dissimilarity? And why do we need it? Mathematical Definition of a Dissimilarity measure 1) Symmetry: D(i,j)=D(j,i) 2) Non-negative D(i,j)>=0 3) Zero diagonal: D(i,i)=0 Major application: module detection Module=cluster of “similar” nodes Implementation: use the dissimilarity measure as input of a clustering procedure, e.g. average linkage hierarchical clustering, or partitioning around medoid clustering Aside: node dissimilarities have many other uses, e.g. to study how a node dissimilarity between 2 interacting genes changes across conditions…

Possible measures of node dissimilarity 1.Simply use 1 minus the adjacency matrix 2.Length of shortest path connecting 2 nodes 3.Our focus: measures based on number of shared neighbors –Intuition: if 2 people share the same friends they are close in a social network

Similarity based on number of shared neighbors

Standard Topological Overlap measure (Ravasz et al 2002) Generalization to unweighted networks discussed in Zhang and Horvath (2005). Generalization to multiple nodes defined in Ai Li, S Horvath (2006) Multinode topological overlap matrix.

The topological overlap measures interconnectedness for an unweighted network, one can show that the topological overlap=1 only if the node with fewer links satisfies two conditions: –(a) all of its neighbors are also neighbors of the other node, i.e. it is connected to all of the neighbors of the other node and –(b) it is linked to the other node. In contrast, top. overlap=0 if i and j are unlinked and the two nodes don't have common neighbors.

Our set theory interpretation of the topological overlap matrix

Generalizing the topological overlap matrix to 2 step neighborhoods etc Simply replace the neighborhoods by 2 step neighborhoods in the following formula Reference: Andy M. Yip and SH (2006) The Generalized Topological Overlap Matrix For Detecting Modules in Gene Networks.

Computationally simple calculation of GTOMm GTOMm can be directly calculated from A+A*A+A*A*A+A…..A where * denotes matrix mutiplication Computation time driven by m matrix multiplications of A

Summary: dissimilarity measures based on an adjacency matrix A

Defining Gene Modules =sets of tightly co-regulated genes

Module Identification based on the notion of topological overlap An important aim of metabolic network analysis is to detect subsets (modules) of nodes that are tightly connected to each other. We adopt the definition of Ravasz et al (2002): modules are groups of nodes that have high topological overlap.

Using the TOM matrix to cluster genes To group nodes with high topological overlap into modules (clusters), we typically use average linkage hierarchical clustering coupled with the TOM distance measure. Once a dendrogram is obtained from a hierarchical clustering method, we choose a height cutoff to arrive at a clustering. –Here modules correspond to branches of the dendrogram TOM plot Hierarchical clustering dendrogram TOM matrix Module: Correspond to branches Genes correspond to rows and columns

Comparison of 3 different similarities in capturing the functional class `protein biosynthesis'. (a) ADJ=GTOM0 (b) GTOM1 (c) GTOM2 The middle row shows the color bar ordered by the corresponding dendrogram but colored by the module assignment with respect to the TOM measure in (b), the bottom shows the color bar ordered by the corresponding dendrogram where genes belong to the class `protein biosynthesis' are colored in dark red. Almost all protein biosynthesis genes are grouped together by the GTOM2 measure whereas the other two measures tend to distribute the class over two modules.

Overall, modules are quite robust with respect to the GTOM measure. Smaller modules are more visible in GTOM0 and GTOM1 plots Larger modules are more pronounced in GTOM2 and GTOM3 plots ADJ=GTOM0 GTOM1 GTOM2 GTOM3 Topological Overlap Matrix Plots for different GTOM measures, yeast

Multidimensional Scaling Plots involving different GTOMs ADJ=GTOM0 GTOM1 GTOM2 GTOM3

Simple simulated example where GTOM2 is better than GTOM1 and GTOM0

Example, when GTOM2 is superior to GTOM1 or GTOM0 Top 8 GTOM2 neighbors of Node 1 are exactly Node 1 – Node 8. TOM1 neighbors of Node 1 are Node 1,4,5,8,9, Black circles: 8 closest GTOM1 neighbors of node 1

Predicting essential proteins in a fly network Idea: start with a single highly connected essential protein and consider its closest neighbors based on a dissimilarity measure One would hope that the most similar (closest) neighbors are also essential since they may be part of the same pathway Data protein-protein interaction data from Biogrid Essentiality: determined by knock-out experiments

GTOM2 outperforms GTOM1 and GTOM0 in the fly protein-protein network Y-axis proportion of essential genes among the closest neighbors X-axis size of closest neighborhood

Since the topological overlap matrix considers shared neighbors, it tends to be more robust to spurious connections. Limitation of GTOM: it requires an unweighted network (binary adjacencies) GTOM is based on pairwise overlap. In contrast, MTOM is based on multi-node overlap. Overall, GTOM0, GTOM1 and GTOM2 lead to similar clusters (modules). Our experience In most applications, we find that GTOM1 is better than GTOM0 Often GTOM1 performs better than GTOM2 But in the fly network GTOM2 is better than GTOM1 GTOMm with m>2 tends to lump nodes together  loss of resolution Discussion

What is neighborhood analysis? MTOM software Ai Li, Horvath S (2006) Network Neighborhood Analysis with the multinode topological overlap measure. Bioinformatics. doi: /bioinformatics/btl581

Many biological questions can be interpreted as neighborhood analysis Abstract definition: Find the network neighborhood of an initial (seed) set of highly interconnected nodes. Examples –A) Drosophila protein-protein interaction network: Find the neighborhood of a set of essential proteins. Hypothesis: it should be enriched with essential proteins as well  predicting knock out effect –B) Consider survival time as an idealized gene expression profile. Find the neighborhood of genes in the corresponding gene co-expression network. Hypothesis: it should be enriched with prognostic genes that are associated with survival time  variable selection –C) Yeast protein network: Find the neighborhood of a set of cell- cycle related genes. Hypothesis: it should be enriched with other cell-cycle related genes  useful for annotation

Recursive approaches for defining an MTOM neighborhood of size S Recursive approach Input a seed of starting nodes and the neighborhood size S For each node outside of the current neighborhood compute its topological overlap value with the current version of the neighborhood. Add the node with highest MTOM value to the neighborhood. Repeat b) and c) until the neighborhood size is reached.

Fly protein-protein network analysis (BioGrid Data) Goal: study the neighborhood of highly connected essential genes Record the proportion of genes that remain essential as a function of different seed genes

Drosophila, protein-protein network Using 3 genes as seed of recursive MTOM analysis leads to neighborhoods with the highest percentage of essential genes Screening: Naïve, non-rec, recurs, recurs, recurs Initial:

Brain Cancer Network Application I: Finding the neighborhood of 5 cancer genes in a brain cancer gene co-expression network A major advantage of the MTOM approach is that it allows one to input more than 1 probe set as initial neighborhood. In this application, we were interested in finding the neighborhood of five highly correlated cell mitosis related cancer genes: TOP2A, Rac1, TPX2, EZH2 and KIF14. Neighborhood size = 20 Out of 20 probes 13 are cancer related.

Brain Cancer Network Application II: Finding the Neighborhood of a Clinical Outcome (Survival Time) Recursive MTOM based neighborhood of size S=20 of the patient survival time (TTS) Result: highly enriched in cancer- and neuron related genes: 11 probe sets are related to neuron cells 10 probe sets are related to cancers. A standard approach which simply selects a neighborhood on the basis of the absolute values of the correlations between gene expression profile and survival time, leads to a neighborhood with fewer cancer- and neuron related genes. only 4 probe sets are related to neuron cells and 6 probe sets are related to cancer.

Acknowledgement Andy Yip (GTOM) Ai Li (MTOM)

Webpages and References This talk and relevant R code Yip A, Horvath S (2006) The Generalized Topological Overlap Matrix For Detecting Modules in Gene Networks Proceedings Volume Gene Networks: Theory and Application Workshop at BIOCOMP'06, Las Vegas Ai Li, Steve Horvath (2006) The Multi-Point Topological Overlap Matrix for Gene Neighborhood Analysis. Proceedings Volume Gene Networks: Theory and Application Workshop at BIOCOMP'06, Las Vegas Bin Zhang and Steve Horvath (2005) "A General Framework for Weighted Gene Co-Expression Network Analysis", Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article Yeast Co-Expression Network MRJ Carlson, B Zhang, Z Fang, PS Mischel, S Horvath, SF Nelson, Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression networks", BMC Genomics 2006, 7:40 (3 March 2006).