A hub-attachment based method to detect functional modules from confidence-scored protein interactions and expression profiles


A hub-attachment based method to detect functional modules from confidence-scored protein interactions and expression profiles
Authors: Chia-Hao Chin 1,4, Shu-Hwa Chen 1, Chin-Wen Ho 4, Ming-Tat Ko 1,5, Chung-Yen Lin 1,2,3,5
1. Institute of Information Science, Academia Sinica, Taiwan
2. Division of Biostatistics and Bioinformatics, National Health Research Institutes, Taiwan
3. Institute of Fishery Science, College of Life Science, National Taiwan University, Taiwan
4. Department of Computer Science and Information Engineering, National Central University, Taiwan
5. Research Center of Information Technology Innovation, Academia Sinica, Taiwan

Outline Goal Method Experiment results

Detecting functional modules
Identify functional modules by parsing Protein-Protein Interaction (PPI) networks into densely connected regions.

A more reliable PPI
[Figure: gene expression data (conditions C1 to C4 for proteins V1, V2, V3) combined with a PPI network; Pearson correlation threshold = 0.6]

The overview of HUNTER (an example)
[Figure: module seed generation → module seeds → module seed growth → grown modules → modules amalgamation → final modules]

Module seed generation
Four cases for this stage, depending on the input:

Contains expression data?   Unweighted graph   Weighted graph
No                          Case 1             Case 2
Yes                         Case 3             Case 4

Module seed generation (1/4)
Case 1:
– Input data is an unweighted graph.
– Find a maximum connected component of the subgraph induced by v's neighbors.
– The union of the vertex set of that maximum connected component and vertex v is a module seed.
[Figure: the subgraph induced by v's neighbors is composed of three connected components; the vertices of the maximum one are united with vertex v.]
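The Case 1 rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the adjacency-dict representation, the function name module_seed_unweighted, and the toy graph are all hypothetical and do not come from the slides.

```python
from collections import deque

def module_seed_unweighted(adj, v):
    """Return v plus the largest connected component among v's neighbors.

    adj is a hypothetical {vertex: set_of_neighbors} dict (undirected graph).
    """
    neighbors = set(adj[v])
    best, seen = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                # Stay inside the subgraph induced by v's neighbors.
                if w in neighbors and w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        if len(component) > len(best):
            best = component
    return best | {v}

# Toy graph (made up): v's neighbors form three components {a,b,c}, {d,e}, {f}.
adj = {
    "v": {"a", "b", "c", "d", "e", "f"},
    "a": {"v", "b", "c"}, "b": {"v", "a", "c"}, "c": {"v", "a", "b"},
    "d": {"v", "e"}, "e": {"v", "d"},
    "f": {"v"},
}
seed = module_seed_unweighted(adj, "v")
```

Ties between equally large components are broken arbitrarily here; the slides do not say how ties are handled.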

A q-connected module
A vertex set U ⊆ V is q-connected if, for all W ⊂ U, the probability that at least one edge connects W with U \ W is at least q. [Ulitsky et al. 2009]
Example on vertices a, b, c:
p( {a}, {b, c} ) = 1 - (1-0.8)*(1-0.6) = 0.92
p( {a, b}, {c} ) = 1 - (1-0.8)*(1-0.7) = 0.94
p( {a, c}, {b} ) = 1 - (1-0.6)*(1-0.7) = 0.88
If q = 0.9, then this graph is not q-connected, because 0.88 < 0.9.
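The three probabilities in the example can be checked by brute force over all bipartitions. The sketch below assumes the edge confidences a–c = 0.8, a–b = 0.6 and b–c = 0.7, which is the only assignment consistent with the three products shown (the figure itself is not in the transcript); the function names are hypothetical.

```python
from itertools import combinations

def cut_probability(edges, side):
    """Probability that at least one edge crosses between `side` and the rest,
    assuming edges appear independently."""
    p_none = 1.0
    for (u, w), p in edges.items():
        if (u in side) != (w in side):
            p_none *= 1.0 - p
    return 1.0 - p_none

def is_q_connected(vertices, edges, q):
    """Brute-force check of every bipartition; only suitable for tiny graphs."""
    ordered = sorted(vertices)
    if not ordered:
        return True
    anchor, rest = ordered[0], ordered[1:]
    # Keeping `anchor` on one side enumerates each bipartition exactly once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            if cut_probability(edges, {anchor, *extra}) < q:
                return False
    return True

# Edge confidences inferred from the slide's arithmetic (an assumption).
edges = {("a", "c"): 0.8, ("a", "b"): 0.6, ("b", "c"): 0.7}
connected_at_090 = is_q_connected({"a", "b", "c"}, edges, 0.9)
connected_at_085 = is_q_connected({"a", "b", "c"}, edges, 0.85)
```

The number of bipartitions grows exponentially with the module size, which is why the algorithm on the later slides uses minimum cuts instead.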

q-connected
We call a set of vertices U ⊆ V q-connected if, for all W ⊂ U, the probability that at least one edge connects W with U\W is at least q. Let E(U, W) denote the event that at least one edge connects a node from W ⊂ U with a node from U\W. Then U is q-connected if and only if P( E(U, W) ) ≥ q for every W ⊂ U.
[Figure: vertex set V containing U, with U split into W and U\W]

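U is q-connected if and only if $P(\overline{E(U, W)}) \le 1 - q$ for every W ⊂ U. Assuming edge appearances are independent, we get
$$\prod_{\substack{(u,v) \in E \\ u \in W,\ v \in U \setminus W}} \bigl(1 - p(u,v)\bigr) \le 1 - q \quad\Longleftrightarrow\quad \sum_{\substack{(u,v) \in E \\ u \in W,\ v \in U \setminus W}} -\log\bigl(1 - p(u,v)\bigr) \ge -\log(1 - q).$$
With edge weights w(e) = -log(1 - p(e)) and t = -log(1 - q), U is therefore q-connected if and only if the minimum cut of the subgraph induced by U has weight at least t.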

Find a maximum q-connected component algorithm
Input data
– Graph G = ( V(G), E(G), p ), where p is an edge-weight function.
– Threshold q
Initial value
– G' = ( V(G), E(G), w ), where w(e) = -log( 1 - p(e) )
– t = -log( 1 - q )
– max-q-connected component ← ∅
Max-q-connected(G')
– If |V(G')| > |V(max-q-connected component)| then
    If min-cut-value(G') ≥ t then
    – max-q-connected component ← G'
    else
    – (G'1, G'2) = min-cut-partition(G')
    – Max-q-connected(G'1)
    – Max-q-connected(G'2)

Find a maximum q-connected component algorithm
Input data
– Graph G = ( V(G), E(G), p ), where p is an edge-weight function.
– Threshold q
Initial value
– G' = ( V(G), E(G), w ), where w(e) = -log( 1 - p(e) )
– t = -log( 1 - q )
– max-q-connected component ← ∅
– candidate max-q-connected component C ← ∅
Max-q-connected(G')
– Push G' into C.
– If |V(G')| > |V(max-q-connected component)| then
    If min-cut-value(G') ≥ t then
    – max-q-connected component ← G'
    else
    – (G'1, G'2) = min-cut-partition(G')
    – Max-q-connected(G'1)
    – Max-q-connected(G'2)
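The recursion above can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: the slides do not name a min-cut routine, so a textbook Stoer-Wagner global minimum cut is swapped in, and the names min_cut and max_q_connected, the dict-of-dicts input, and the toy confidences are all assumptions. It also assumes 0 ≤ p(e) < 1 and treats a single vertex as trivially q-connected.

```python
import math

def min_cut(adj):
    """Stoer-Wagner global minimum cut on {u: {v: weight}} (symmetric).
    Returns (cut value, vertices on one side of the cut)."""
    adj = {u: dict(nb) for u, nb in adj.items()}  # working copy, merged in place
    groups = {u: {u} for u in adj}                # original vertices behind each node
    best_value, best_side = math.inf, set()
    while len(adj) > 1:
        start = next(iter(adj))
        weights = {v: adj[start].get(v, 0.0) for v in adj if v != start}
        prev, last, cut_of_phase = None, start, 0.0
        while weights:  # maximum-adjacency ordering
            nxt = max(weights, key=weights.get)
            cut_of_phase = weights.pop(nxt)
            prev, last = last, nxt
            for v, x in adj[nxt].items():
                if v in weights:
                    weights[v] += x
        if cut_of_phase < best_value:
            best_value, best_side = cut_of_phase, set(groups[last])
        # Merge the last-added node into the one added just before it.
        groups[prev] |= groups.pop(last)
        for v, x in adj.pop(last).items():
            del adj[v][last]
            if v != prev:
                adj[prev][v] = adj[prev].get(v, 0.0) + x
                adj[v][prev] = adj[prev][v]
    return best_value, best_side

def max_q_connected(prob, q):
    """prob: {u: {v: confidence}}; returns the largest q-connected vertex set found."""
    t = -math.log(1.0 - q)
    w = {u: {v: -math.log(1.0 - p) for v, p in nb.items()} for u, nb in prob.items()}
    best = set()

    def recurse(vertices):
        nonlocal best
        if len(vertices) <= len(best):
            return  # cannot beat the current best
        if len(vertices) == 1:
            best = set(vertices)  # assumption: a single vertex counts as q-connected
            return
        sub = {u: {v: x for v, x in w[u].items() if v in vertices} for u in vertices}
        value, side = min_cut(sub)
        if value >= t:
            best = set(vertices)
        else:
            recurse(side)
            recurse(vertices - side)

    recurse(set(prob))
    return best

# Toy input (made up): a confident triangle x, y, z plus a weakly attached w.
prob = {
    "x": {"y": 0.9, "z": 0.9, "w": 0.3},
    "y": {"x": 0.9, "z": 0.9},
    "z": {"x": 0.9, "y": 0.9},
    "w": {"x": 0.3},
}
component = max_q_connected(prob, q=0.95)
```

With q = 0.95 the weak edge w–x (confidence 0.3) is cut first, and the remaining triangle passes the threshold because each of its cuts has probability 1 - 0.1 * 0.1 = 0.99.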

Proof
Proof by contradiction
– Let the candidate max-q-connected components be C = { C_1, C_2, …, C_k }, and assume that S is a max-q-connected component with S ∉ C.
– ∵ S is a max-q-connected component, and S ∉ C ∴ ∃ C_i ∈ C such that C_i contains S and, ∀ C_j ∈ C that contains S, |V(C_i)| ≤ |V(C_j)|
– ∵ S is q-connected ∴ min-cut-value(S) ≥ t …(1)
– ∵ C_i is not q-connected ∴ min-cut-value(C_i) < t …(2)
– ∵ C_i is the minimum graph in C that contains S (so the min cut of C_i must separate S; otherwise one of its two parts, which is also in C, would contain S) ∴ min-cut-value(S) ≤ min-cut-value(C_i) …(3)
– ∵ (1), (2) and (3) ∴ t ≤ min-cut-value(S) ≤ min-cut-value(C_i) < t (contradiction)
Q.E.D.

Module seed generation (2/4)
Case 2:
– Input data is a weighted graph.
– Find a maximum q-connected component of the subgraph induced by v's neighbors.
– The vertex set of that q-connected subgraph is a module seed.
[Figure: each candidate subgraph induced by v's neighbors is tested with threshold q = 0.9; two of the candidates shown are not q-connected, and one is q-connected.]

Module seed generation (3/4)
Case 3:
– Input data is composed of an unweighted graph and gene expression data.
– Find a maximum connected component of the subgraph induced by v's neighbors in which the Pearson correlation of every pair of vertices is greater than a threshold.
– Check each subgraph by using the gene expression data; the vertex set of the subgraph that passes is a module seed.
[Figure: a blue dashed line means the Pearson correlation is less than the threshold t = 0.6; a green dashed line means it is larger than t = 0.6.]
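A rough sketch of the Case 3 check follows. It takes one possible reading of the slide, namely that the correlation test is applied to interacting pairs among v's neighbors and the largest surviving connected component is kept; the slide could also be read as requiring every pair in the component to pass. The function names, the expression vectors, and the toy graph are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def largest_component(graph):
    """Largest connected component of a {vertex: set_of_neighbors} graph."""
    best, seen = set(), set()
    for s in graph:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in graph[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def module_seed_with_expression(adj, expr, v, threshold=0.6):
    """Keep only co-expressed interactions among v's neighbors (assumed reading)."""
    nbrs = set(adj[v])
    filtered = {
        u: {w for w in adj[u]
            if w in nbrs and pearson(expr[u], expr[w]) > threshold}
        for u in nbrs
    }
    return largest_component(filtered)

# Toy data (made up): a and b are co-expressed, c is anti-correlated with b.
adj = {"v": {"a", "b", "c"}, "a": {"v", "b"}, "b": {"v", "a", "c"}, "c": {"v", "b"}}
expr = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1], "v": [1, 3, 2, 4]}
seed = module_seed_with_expression(adj, expr, "v", threshold=0.6)
```

Whether v itself is added to the seed in this case is not stated on the slide, so the sketch returns only the component.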

Module seed generation (4/4)
Case 4:
– Input data is composed of a weighted graph and gene expression data.
– Find a maximum q-connected component of the subgraph induced by v's neighbors in which the Pearson correlation of every pair of vertices is greater than a threshold.
– We check each subgraph by using the gene expression data, and then check whether it is q-connected. The vertex set of the subgraph that passes both checks is a module seed.
[Figure: a blue dashed line means the Pearson correlation is less than the threshold t = 0.6; a green dashed line means it is larger than t = 0.6. One induced subgraph is not q-connected; another is q-connected.]

Module growth
After creating a module seed, we add a neighbor of the module seed to the module if most of that neighbor's adjacent nodes also belong to the module seed.
[Figure: a module seed with hub v and neighbor w; after growth, w belongs to the grown module.]
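The growth rule can be illustrated as follows. The slide only says "most", so the cutoff of more than half (min_fraction = 0.5), the single pass over the seed's neighbors, and the function name grow_module are assumptions made for this sketch.

```python
def grow_module(adj, seed, min_fraction=0.5):
    """Add each neighbor of the seed whose share of neighbors inside the seed
    exceeds min_fraction (hypothetical cutoff; the slides only say 'most')."""
    seed = set(seed)
    frontier = {w for u in seed for w in adj[u]} - seed
    joined = {
        w for w in frontier
        if len(adj[w] & seed) / len(adj[w]) > min_fraction
    }
    return seed | joined

# Toy graph (made up): w has 2 of 3 neighbors in the seed, x only 1 of 3.
adj = {
    "v": {"a", "b"},
    "a": {"v", "b", "w", "x"},
    "b": {"v", "a", "w"},
    "w": {"a", "b", "y"},
    "x": {"a", "y", "z"},
    "y": {"w", "x"},
    "z": {"x"},
}
grown = grow_module(adj, {"v", "a", "b"})
```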

Module amalgamation
We merge any two grown modules if they have too many proteins in common.
[Figure: grown module 1 and grown module 2 are merged into a final module.]
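A small sketch of the merging step is given below. The slide does not define "too many", so the overlap measure (shared proteins divided by the size of the smaller module) and the cutoff min_overlap = 0.5 are placeholders chosen for illustration, as is the function name amalgamate. Modules are assumed to be non-empty.

```python
def amalgamate(modules, min_overlap=0.5):
    """Repeatedly merge pairs of modules whose overlap exceeds min_overlap.

    Overlap is |A ∩ B| / min(|A|, |B|), an assumed measure.
    """
    modules = [set(m) for m in modules]
    merged = True
    while merged:
        merged = False
        for i in range(len(modules)):
            for j in range(i + 1, len(modules)):
                common = len(modules[i] & modules[j])
                smaller = min(len(modules[i]), len(modules[j]))
                if common / smaller > min_overlap:
                    modules[i] |= modules.pop(j)
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the module list changed
    return modules

# Toy modules (made up): the first two share 2 of 3 proteins.
final_modules = amalgamate([{1, 2, 3, 4}, {3, 4, 5}, {8, 9}])
```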

Stage 1: Module seed generation

Find q-connected component

Stage 2: Module seed growth

Stage 3: Modules amalgamation

Functional Group Verification Using Gene Ontology
Gene Ontology
– Three separate ontologies: Biological Process, Molecular Function, Cellular Component
– Organized as a DAG describing gene products (proteins and functional RNA)
GO Annotation
– A GO term is associated with a gene or gene product to form a GO annotation.

p-value
Given a gene ontology and a term t, the p-value is the probability of observing x or more proteins annotated to t in the cluster c.
– N: the number of proteins annotated to a term of the GO ontology.
– M: the number of proteins annotated to the GO term t.
– n: the number of proteins of the cluster c.
– x: the number of proteins of the cluster c which are annotated to the GO term t.
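The formula itself is not in the transcript. With N, M, n and x defined as above, "the probability of observing x or more" is the upper tail of a hypergeometric distribution, which is the usual choice for GO enrichment; the sketch below assumes that reading, and the function name is hypothetical.

```python
from math import comb

def enrichment_p_value(N, M, n, x):
    """P(X >= x) when n proteins are drawn without replacement from N,
    of which M carry the annotation (hypergeometric upper tail)."""
    upper = min(n, M)
    favorable = sum(comb(M, k) * comb(N - M, n - k) for k in range(x, upper + 1))
    return favorable / comb(N, n)

# Toy numbers (made up): 10 annotated proteins, 4 carry term t,
# and a cluster of 3 proteins contains 2 of them.
p = enrichment_p_value(N=10, M=4, n=3, x=2)
```

For these toy numbers the tail is (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120.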

F-measure
For each method, we measured
– Sensitivity: the fraction of annotations that are enriched in at least one module at p-value < [Ulitsky et al. 2009].
– Specificity: the fraction of modules enriched with at least one annotation at p-value < [Ulitsky et al. 2009].

We compare our method with three newly developed methods
– CEZANNA [Ulitsky et al. 2009]
– CMC [Liu et al. 2009]
– Core [Leung et al. 2009]

Check experiment results by GO

Check experiment results by gold-standard databases
p-value: Given a gold-standard database and a complex g, the p-value is the probability of observing x or more proteins of g in the cluster c.
– N: the number of proteins in the gold-standard database.
– M: the number of proteins in the complex g of the gold-standard database.
– n: the number of proteins of the cluster c.
– x: the number of proteins of the cluster c which also belong to the complex g.

Check experiment results by gold-standard databases

A cluster of our prediction on yeast PPI
[Figure labels: RNA Polymerase I; RNA Polymerase II; RNA Polymerase III; common module for RNA polymerase I, II, III; common module for RNA polymerase I, III; common regulatory unit for RNA polymerase I, II; TFIIF for RNA polymerase II]

Threshold
q-connected
– We set q to 0.95, which corresponds to an "error probability" of 1 - q = 0.05.
Correlation threshold t
– Initiation: a complete graph, given a cutoff threshold
– Remove the edges whose Pearson correlation is less than or equal to the threshold
– Cutoff threshold = 0.6

Clustering coefficient
– k_i: degree of node i
– E_i: number of edges between the neighbors of node i
– C_i = 2 E_i / ( k_i (k_i - 1) ): the density of the network surrounding node i, characterized as the number of triangles through i.
Example: the center node i has 8 (grey) neighbors, and there are 4 edges between the neighbors, so C = 2*4 / (8*(8-1)) = 8/56 = 1/7.
K is the number of nodes whose degree is larger than 1.
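The worked example can be reproduced with a short sketch. The averaging over the K nodes of degree larger than 1 is an assumption about the formula that is missing from the transcript, and the function names and toy graph are hypothetical.

```python
def local_clustering(adj, i):
    """C_i = 2 * E_i / (k_i * (k_i - 1)); defined as 0 when k_i < 2."""
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    edges_between = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nbrs[b] in adj[nbrs[a]]
    )
    return 2 * edges_between / (k * (k - 1))

def average_clustering(adj):
    """Mean of C_i over the K nodes with degree larger than 1 (assumed reading)."""
    eligible = [i for i in adj if len(adj[i]) > 1]
    if not eligible:
        return 0.0
    return sum(local_clustering(adj, i) for i in eligible) / len(eligible)

# Toy graph matching the slide's example: a center with 8 neighbors
# and 4 edges between those neighbors (n1-n2, n3-n4, n5-n6, n7-n8).
adj = {"c": {f"n{j}" for j in range(1, 9)}}
for j in range(1, 9, 2):
    a, b = f"n{j}", f"n{j + 1}"
    adj[a] = {"c", b}
    adj[b] = {"c", a}
center_c = local_clustering(adj, "c")
```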

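A threshold for Pearson correlation
The authors conjectured that the removed links are likely to be noise as long as the difference between the observed clustering coefficient and its randomized counterpart increases monotonically [Elo et al. 2007].
– Candidate thresholds: r_0 = 0, r_1 = 0.01, …, r_100 = 1
– For each threshold r_i, compute C( r_i ) – C_0( r_i ), the difference between the observed clustering coefficient and its randomized counterpart.
– The selected threshold is the one at which this difference reaches its first local maximum, C*.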
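The selection step can be sketched as a scan for the first local maximum. The difference values C( r_i ) – C_0( r_i ) are assumed to be computed elsewhere; the numbers below are made up, and the function name is hypothetical.

```python
def first_local_maximum(thresholds, differences):
    """Return the first threshold whose difference value rises from the
    previous point and does not rise again at the next point, or None."""
    for i in range(1, len(differences) - 1):
        if differences[i - 1] < differences[i] >= differences[i + 1]:
            return thresholds[i]
    return None

# Made-up difference curve on a coarse grid, for illustration only.
thresholds = [0.0, 0.1, 0.2, 0.3, 0.4]
differences = [0.0, 0.2, 0.5, 0.4, 0.6]
chosen = first_local_maximum(thresholds, differences)
```

In the slides' setting the grid would be r_0 = 0, r_1 = 0.01, …, r_100 = 1 rather than the five points used here.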

References
Elo LL, Jarvenpaa H, Oresic M, Lahesmaa R, Aittokallio T: Systematic construction of gene coexpression networks with applications to human T helper cell differentiation process. Bioinformatics 2007, 23(16):
Liu G, Wong L, Chua HN: Complex discovery from weighted PPI networks. Bioinformatics 2009, 25(15):
Leung HC, Xiang Q, Yiu SM, Chin FY: Predicting protein complexes from PPI data: a core-attachment approach. J Comput Biol 2009, 16(2):
Ulitsky I, Shamir R: Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics 2009, 25(9):

Thank you for your attention!