Presented by Yuhua Jiao 2012-12-4. Outline Limitation of some network clustering methods Hierarchical Agglomerative Clustering – Method – Performance.

Presentation transcript:

Presented by Yuhua Jiao

Outline
Limitation of some network clustering methods
Hierarchical Agglomerative Clustering
– Method
– Performance evaluation
Results and Discussion
– Data preparation
– Empirical evaluation
– Multi-resolution view of a physical interaction network

Background of network clustering
Challenges in biological network analysis
– Inferring the structure of subgroups of related vertices
– Predicting possible links not represented in the data
Network clustering is a valuable approach for
– summarizing the structure of large networks
– predicting unobserved interactions
– predicting functional annotations

Common limitations of some network clustering algorithms
Poor resolution of top-level clusters
– Stochastic block models
Over-splitting of bottom-level clusters
– Hierarchical network model
Requirement to pre-define the number of clusters prior to analysis
– Stochastic block models
An inability to jointly cluster over multiple interaction types

Hierarchical network model by Clauset, Moore, and Newman (CMN)

Hierarchical Agglomerative Clustering
– An approximation for optimizing a network probability, motivated by CMN.
– Interactions with vertices outside a group often provide more information than within-group interactions.
– Power Graph Analysis is a lossless transformation of biological networks into a compact, less redundant representation, exploiting the abundance of cliques and bicliques as elementary topological motifs.

HAC-Method
Notation
– Graph
– Groups $i$ and $j$, with $n_i$ and $n_j$ vertices
– Edges between groups: $e_{ij}$
– Total possible connections: $t_{ij}$
– Number of holes: $h_{ij} = t_{ij} - e_{ij}$

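HAC-Method
For a given pair of groups $i$ and $j$, the edges between the groups are the result of $t_{ij}$ independent Bernoulli trials.
The probability of the observed edges, conditioned on the parameter $\theta_{ij}$, is $P_{ij}(\theta_{ij}) = \theta_{ij}^{e_{ij}} (1 - \theta_{ij})^{h_{ij}}$, where $e_{ij}$ is the number of observed edges and $h_{ij} = t_{ij} - e_{ij}$ is the number of holes.
The maximum likelihood estimate of $\theta_{ij}$ is $\hat{\theta}_{ij} = e_{ij} / t_{ij}$.
The maximum likelihood value of $P_{ij}(\theta_{ij})$ is $(e_{ij}/t_{ij})^{e_{ij}} (h_{ij}/t_{ij})^{h_{ij}}$.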
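The Bernoulli model for one pair of groups can be sketched numerically. The following is a minimal illustration, not the presenter's code: the function names `theta_hat` and `max_log_likelihood` are hypothetical, `e_ij` denotes the observed edges, `t_ij` the possible connections, and holes are taken as `t_ij - e_ij`. Working in log space avoids underflow for large groups.

```python
import math


def theta_hat(e_ij: int, t_ij: int) -> float:
    """Maximum likelihood estimate of the edge probability for one group pair."""
    if t_ij <= 0 or not 0 <= e_ij <= t_ij:
        raise ValueError("need t_ij > 0 and 0 <= e_ij <= t_ij")
    return e_ij / t_ij


def max_log_likelihood(e_ij: int, t_ij: int) -> float:
    """Log of the maximized Bernoulli likelihood: e*log(e/t) + h*log(h/t)."""
    if t_ij <= 0 or not 0 <= e_ij <= t_ij:
        raise ValueError("need t_ij > 0 and 0 <= e_ij <= t_ij")
    h_ij = t_ij - e_ij  # holes: possible connections with no observed edge
    ll = 0.0
    # Terms with a zero count contribute nothing (0 * log 0 is taken as 0).
    if e_ij > 0:
        ll += e_ij * math.log(e_ij / t_ij)
    if h_ij > 0:
        ll += h_ij * math.log(h_ij / t_ij)
    return ll
```

A pair of groups that is either fully connected or fully disconnected has log-likelihood 0, the largest possible value; the value is most negative when edges and holes are equally frequent.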

An instance of the flat model
Given two groups with $n_i = 5$ and $n_j = 4$ vertices:
– Probability density
– The likelihood of the flat model

HAC-Method
Generalization to hierarchical model
– Binary dendrogram $T$
– Each node $r$ in the dendrogram represents the joining of the vertices in the left sub-tree $L(r)$ and the vertices in the right sub-tree $R(r)$.
– $E_r$ and $h_r$ are the numbers of edges and holes crossing between the left and right sub-trees.
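The hierarchical generalization can be illustrated by scoring a whole dendrogram: each internal node contributes one maximized Bernoulli term for the edges and holes crossing between its left and right sub-trees. The sketch below assumes a dendrogram written as nested 2-tuples with vertex labels at the leaves and an edge set of `frozenset` pairs; these representations and the function names are choices made for this example only.

```python
import math


def leaves(tree):
    """Set of vertex labels under a (sub-)tree; a leaf is any non-tuple label."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) | leaves(tree[1])
    return {tree}


def bernoulli_max_ll(e, t):
    """Maximized Bernoulli log-likelihood for e edges out of t possible pairs."""
    h = t - e
    ll = 0.0
    if e > 0:
        ll += e * math.log(e / t)
    if h > 0:
        ll += h * math.log(h / t)
    return ll


def dendrogram_log_likelihood(tree, edges):
    """Sum over internal nodes r of the term for edges crossing L(r) and R(r)."""
    if not isinstance(tree, tuple):
        return 0.0  # a single vertex has no crossing pairs
    left, right = tree
    left_vertices, right_vertices = leaves(left), leaves(right)
    crossing = sum(
        1
        for u in left_vertices
        for v in right_vertices
        if frozenset((u, v)) in edges
    )
    possible = len(left_vertices) * len(right_vertices)
    return (
        bernoulli_max_ll(crossing, possible)
        + dendrogram_log_likelihood(left, edges)
        + dendrogram_log_likelihood(right, edges)
    )
```

For a graph with edges a-b and c-d, the tree `(("a", "b"), ("c", "d"))` scores 0, while `(("a", "c"), ("b", "d"))` scores lower, because its root mixes edges and holes in equal numbers.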

HAC-Agglomerative clustering
Maximum likelihood guide tree
– K top-level clusters
– R total tree nodes
– Merge clusters 1 and 2 into cluster 1’, defining a new model M’
Figure: clusters 1 and 2 at the current top level are merged into cluster 1’.
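A greedy version of this merging loop can be sketched as follows. The merge score used here is a simplifying assumption for illustration and may differ from the score on the original slides: merging clusters 1 and 2 is scored by how much the maximized log-likelihood changes when their separate connections to every other top-level cluster are pooled into one term. The function names and the nested-tuple tree are hypothetical.

```python
import math
from itertools import combinations


def max_ll(e, t):
    """Maximized Bernoulli log-likelihood for e edges out of t possible pairs."""
    h = t - e
    ll = 0.0
    if e > 0:
        ll += e * math.log(e / t)
    if h > 0:
        ll += h * math.log(h / t)
    return ll


def crossing_edges(a, b, edges):
    """Number of edges with one end in vertex set a and the other in b."""
    return sum(1 for u in a for v in b if frozenset((u, v)) in edges)


def merge_score(c1, c2, others, edges):
    """Log-likelihood change from pooling c1 and c2 against every other cluster."""
    merged = c1 | c2
    score = 0.0
    for k in others:
        score += max_ll(crossing_edges(merged, k, edges), len(merged) * len(k))
        score -= max_ll(crossing_edges(c1, k, edges), len(c1) * len(k))
        score -= max_ll(crossing_edges(c2, k, edges), len(c2) * len(k))
    return score


def greedy_hac(vertices, edges):
    """Repeatedly merge the best-scoring pair; returns a nested-tuple dendrogram."""
    # Each top-level cluster (a frozenset of vertices) maps to its sub-tree.
    top = {frozenset([v]): v for v in vertices}
    while len(top) > 1:
        def score(pair):
            others = [k for k in top if k not in pair]
            return merge_score(pair[0], pair[1], others, edges)

        c1, c2 = max(combinations(list(top), 2), key=score)
        subtree = (top.pop(c1), top.pop(c2))
        top[c1 | c2] = subtree
    return next(iter(top.values()))
```

Under this score, vertices with identical connection patterns to the rest of the graph merge at no cost, so two disconnected triangles end up on opposite sides of the root.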

HAC-Agglomerative clustering

HAC-Bayesian model selection for terminal clusters
During the merging process, if clusters 1 and 2 are selected for merging and both are collapsed, the probability ratio $\lambda_c$ is calculated, where the subscripts indicate edges and holes within and between groups. The merged cluster is collapsed if $\lambda_c \ge 1$. Clusters of two vertices are always merged because $\lambda_c = 1$.
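The exact expression for the ratio did not survive in this transcript. As an assumption for illustration only, the sketch below takes the ratio to be a comparison of marginal likelihoods under a uniform prior on each edge probability, so each block contributes a Beta integral B(e + 1, h + 1). This choice reproduces the stated property that two single vertices always give a ratio of 1, but it should not be read as the presenter's formula. All function and argument names are hypothetical.

```python
import math


def log_beta_evidence(e, h):
    """log of the integral of theta^e (1 - theta)^h over [0, 1] = log B(e+1, h+1)."""
    return math.lgamma(e + 1) + math.lgamma(h + 1) - math.lgamma(e + h + 2)


def log_collapse_ratio(e11, h11, e22, h22, e12, h12):
    """log lambda_c: one collapsed block versus blocks 1-1, 2-2 and 1-2 kept apart.

    e11/h11 and e22/h22 are edges/holes within clusters 1 and 2;
    e12/h12 are edges/holes between them.
    """
    collapsed = log_beta_evidence(e11 + e22 + e12, h11 + h22 + h12)
    separate = (
        log_beta_evidence(e11, h11)
        + log_beta_evidence(e22, h22)
        + log_beta_evidence(e12, h12)
    )
    return collapsed - separate


def should_collapse(e11, h11, e22, h22, e12, h12, tol=1e-12):
    """Collapse when lambda_c >= 1, i.e. log lambda_c >= 0 (up to rounding)."""
    return log_collapse_ratio(e11, h11, e22, h22, e12, h12) >= -tol
```

With these assumptions, two triangles joined by all nine possible edges collapse into a single clique-like cluster, whereas two triangles with no edges between them stay separate.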

Performance Evaluation
Data preparation
– BioGRID database
– The graph is undirected and unweighted, with no self-edges.
Other methods
– Fast Modularity (CNM)
– Variational Bayes modularity (VBM)
– Graph Diffusion Kernel (GDK)
– Heuristic merging scores
  Edge density (HAC-E)
  Combined edge density and shared neighbor density (HAC-ES)
  Decomposed Newman modularity Q from CNM (HAC-Q)

Link Prediction
Starting with a real-world network, training networks are generated by deleting a specified fraction of edges. A test set is defined by the held-out edges and an equal number of randomly chosen holes. The trained group structure provides maximum likelihood estimates for edges within and between clusters (Eq. 9). For VBM and CNM, we estimated edge densities between all pairs of clusters and within all clusters. For hierarchical models, we estimated densities between all left and right clusters at all tree levels. For GDK, each pair’s diffusion was used directly to rank pairs. Finally, we assessed precision and recall of the pairs in the test set, ranked by link probability or GDK score.
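The evaluation protocol above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the code used for the reported results: the helper names `split_edges` and `precision_recall` are hypothetical, edges are undirected pairs of sortable vertex labels, and the scores passed to `precision_recall` would come from whichever clustering method is being evaluated.

```python
import itertools
import random


def split_edges(vertices, edges, fraction, seed=0):
    """Hold out a fraction of edges plus an equal number of random holes.

    Returns (training_edges, held_out_edges, test_holes), each a set of
    (u, v) tuples with u < v. Raises ValueError if there are too few holes.
    """
    rng = random.Random(seed)
    edge_list = sorted(tuple(sorted(e)) for e in edges)
    edge_set = set(edge_list)
    n_test = int(round(fraction * len(edge_list)))
    held_out = set(rng.sample(edge_list, n_test))
    training = edge_set - held_out
    # Holes are vertex pairs with no edge in the original network.
    holes = [
        pair
        for pair in itertools.combinations(sorted(vertices), 2)
        if pair not in edge_set
    ]
    test_holes = set(rng.sample(holes, n_test))
    return training, held_out, test_holes


def precision_recall(scores, positives):
    """Precision and recall at each rank, with test pairs sorted by score.

    scores maps every test pair to its predicted link score; positives is
    the set of held-out true edges among those pairs.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = 0
    curve = []
    for rank, pair in enumerate(ranked, start=1):
        hits += pair in positives
        curve.append((hits / rank, hits / len(positives)))
    return curve
```

A typical use would train a model on `training`, score every pair in `held_out | test_holes`, and pass those scores with `held_out` as the positives.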

Results and Discussion Data Preparation

Further Discussion
Extending HAC to dynamic networks is limited:
– A solution is required to the identifiability problem: how complexes inferred at one time point correspond to complexes inferred at other time points.
– Transitions of a protein from one complex to another must be permitted by the model, requiring dynamical coupling between network snapshots.
Dynamical Hierarchical Agglomerative Clustering (DHAC)
– Maximum likelihood is converted to fully Bayesian statistics.
– The likelihood modularity is ‘kernelized’ with an adaptive bandwidth to couple network clusters at nearby time points.
– Matching clusters across time points is solved with a new belief propagation method that extends Expectation-Maximization and belief propagation for bipartite matching to consistently match multiple time-evolving clusters.

ki/Interactome
Interactome is defined as the whole set of molecular interactions in cells. It is usually displayed as a directed graph. Molecular interactions can occur between molecules belonging to different biochemical families (proteins, nucleic acids, lipids, carbohydrates, ...) and also within a given family. In proteomics, interactome refers to the protein-protein interaction network (PPI), or protein interaction network (PIN). Another extensively studied type of interactome is the protein-DNA interactome: the network formed by transcription factors (and DNA or chromatin regulatory proteins) and their target genes. The word "interactome" was originally coined in 1999 by a group of French scientists headed by Bernard Jacq (see Nucleic Acids Research 27(1):89-94). There are now nearly 300 research articles indexed with the word "interactome". It has been suggested that the size of an organism's interactome correlates better than genome size with the biological complexity of the organism (Stumpf et al., 2008). Although protein-protein interaction maps containing several thousand binary interactions are now available for several organisms, none of them is presently complete, and the size of interactomes is still a matter of debate.
Methods of mapping the interactome
The study of the interactome is called interactomics. The basic unit of a protein network is the protein-protein interaction (PPI). Because the interactome considers the whole organism, a massive amount of information must be collected. Experimental methods have been devised to determine PPIs, such as 1) affinity purification and 2) yeast two-hybrid (Y2H). The former is suited to identifying protein complexes and is considered a low-throughput (LTP) method; the latter is suited to exploring binary interactions in bulk and is considered a high-throughput (HTP) method.
There have been several efforts to map the eukaryotic interactome through HTP methods. Yeast, fly, worm, and human HTP maps have been created so far (2006).

Assortative networks? Disassortative networks? Power graph [16]? Graph/structure degree?