Fault Tolerance in Protein Interaction Networks: Stable Bipartite Subgraphs and Redundant Pathways Lenore Cowen Tufts University.



Protein-protein interaction

PPI: A simple graph model. Vertices ↔ genes/proteins; edges ↔ physical interactions. Simplifications: undirected; loses temporal information; difficult to decompose into separate processes; conflates different PPI types into one class of "physical interactions".

Current data High-throughput methods are allowing us to fill in many edges in our simple model, often between unannotated proteins.

[Figure: what we want vs. what we have] Question: Can we infer anything about "real" pathways from the low-resolution graph model of pairwise interactions?

Interaction types We distinguish here between two types of interaction: – physical interactions – genetic interactions

Genetic interactions (epistasis) Only 18% of yeast genes are essential (the yeast dies when they’re removed). [Figure: yeast with an essential gene; the gene is deleted; the yeast dies.]

Genetic interaction: synthetic lethality [Figure: with either nonessential gene deleted alone, the yeast lives; with both genes deleted at once, the yeast dies.] Some pairs of nonessential genes exhibit interesting correlative relationships.

Nonessential Genes – Some genes are nonessential because they are only required under certain conditions (e.g. an enzyme to metabolize a particular nutrient). – Other genes are nonessential because the network has some built-in redundancy. One gene (completely or partially) compensates for the loss of another. One functional pathway (completely or partially) compensates for the loss of another.

Redundant pathways and synthetic lethality

Kelley and Ideker (2005): Between-Pathway Model (BPM)

In reality, the data are very incomplete: Between-Pathway Model (BPM)

Kelley and Ideker (2005) and Ulitsky and Shamir (2007) Goal: detect putative BPMs in the yeast interactome. Method: 1) find densely-connected subsets of the physical protein-protein interaction (PI) network (putative pathways); 2) check the genetic interaction (GI) network to see if patterns in density of genetic interactions correlate with these putative pathways; 3) check resulting structures for overrepresentation of biological function (gene set enrichment).

Kelley and Ideker (2005) and Ulitsky and Shamir (2007) [Figure: steps (1), (2), (3); one pathway enriched for function X, the other enriched for function Y]

Kelley and Ideker (2005) and Ulitsky and Shamir (2007) Problems: – Sparse data limit the potential scope of discovery – Independent validation is difficult

Our method: We show how to systematically search for stable bipartite subgraphs (putative BPMs). We use only synthetic lethality interactions to search for BPMs: – this allows the use of PIs for independent statistical validation of putative BPMs – the scope of potential discovery is greater than when using PIs as seed structures

How should we look for bipartite subgraphs?

Maximum bipartition Definition: Given any graph G, a maximum bipartition of G is an assignment of each node of G to one of two sets, A and B, in such a way that the number of edges that CROSS the partition is maximized.

Maximum bipartition Fact: Maximum bipartition is NP-hard.
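To make the definition concrete, here is a minimal sketch that counts crossing edges and finds a maximum bipartition by exhaustive search. The toy graph and the function names `cut_size` and `brute_force_max_bipartition` are illustrative assumptions, not taken from the talk. The search tries all 2^n assignments, so it is only usable on tiny graphs, which is consistent with the NP-hardness of the problem.

```python
from itertools import product

def cut_size(edges, side):
    """Count the edges whose two endpoints lie in different sets."""
    return sum(1 for u, v in edges if side[u] != side[v])

def brute_force_max_bipartition(nodes, edges):
    """Try every assignment of nodes to sets A/B (2^n of them) and keep the best."""
    best_side, best_cut = None, -1
    for labels in product("AB", repeat=len(nodes)):
        side = dict(zip(nodes, labels))
        c = cut_size(edges, side)
        if c > best_cut:
            best_side, best_cut = side, c
    return best_side, best_cut

# Hypothetical toy graph: a 4-cycle a-b-c-d plus the chord a-c.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
best_side, best_cut = brute_force_max_bipartition(nodes, edges)
print(best_side, best_cut)
```

Because the toy graph contains the triangle a-b-c, no bipartition can cut all five edges; the best possible cut separates {a, c} from {b, d} and crosses the four cycle edges.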

We don’t want a maximum bipartition anyway! We don’t want to force a choice of sides!

Maximal bipartition Definition: Given any graph G, a maximal bipartition of G is an assignment of each node of G to one of two sets, in such a way that moving any single node from one set to the other does not increase the number of edges of G which cross between the two sets.
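The difference between maximal and maximum can be checked directly on a small case. The sketch below, with the hypothetical helper `is_maximal` and a toy 4-cycle that is not from the talk, tests whether any single-node move would increase the number of crossing edges.

```python
def cut_size(edges, side):
    """Count the edges whose two endpoints lie in different sets."""
    return sum(1 for u, v in edges if side[u] != side[v])

def is_maximal(nodes, edges, side):
    """True if moving any single node to the other set does not increase the cut."""
    base = cut_size(edges, side)
    for v in nodes:
        flipped = dict(side)
        flipped[v] = "B" if side[v] == "A" else "A"
        if cut_size(edges, flipped) > base:
            return False
    return True

# Hypothetical toy graph: the 4-cycle a-b-c-d-a.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]

local_opt = {"a": "A", "b": "A", "c": "B", "d": "B"}   # cuts 2 edges
global_opt = {"a": "A", "b": "B", "c": "A", "d": "B"}  # cuts all 4 edges
all_in_a = {v: "A" for v in nodes}                     # cuts 0 edges

print(is_maximal(nodes, edges, local_opt), cut_size(edges, local_opt))
print(is_maximal(nodes, edges, global_opt), cut_size(edges, global_opt))
print(is_maximal(nodes, edges, all_in_a))
```

On the 4-cycle, the partition {a, b} versus {c, d} is maximal even though it cuts only two of the four edges: every node already has exactly half of its neighbors on the other side, so no single move helps.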

Maximal bipartition

Algorithm Randomly assign a set-label to each node in G. Call a node v “happy” if at least half of its neighbors are in the opposite set from v, and “unhappy” otherwise. While there exists an unhappy node: – Pick one such node at random. – Flip its set label.
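The procedure above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the graph is a simple undirected graph without self-loops, stored as a dictionary `adj` mapping each node to its set of neighbors, and the names `maximal_bipartition`, `adj`, and `toy_edges` are hypothetical.

```python
import random

def maximal_bipartition(adj, rng=None):
    """Local search: flip random unhappy nodes until every node is happy.

    adj maps each node to the set of its neighbors (simple graph, no self-loops).
    Returns (side, flips), where side[v] is 0 or 1.
    """
    rng = rng or random.Random()
    side = {v: rng.choice((0, 1)) for v in adj}  # random initial labels

    def unhappy(v):
        opposite = sum(1 for u in adj[v] if side[u] != side[v])
        return 2 * opposite < len(adj[v])  # fewer than half across the cut

    flips = 0
    while True:
        bad = [v for v in adj if unhappy(v)]
        if not bad:
            return side, flips
        v = rng.choice(bad)      # pick one unhappy node at random
        side[v] = 1 - side[v]    # flip its set label
        flips += 1

# Hypothetical toy graph built from an edge list.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c"), ("d", "e")]
adj = {}
for u, v in toy_edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

side, flips = maximal_bipartition(adj, random.Random(1))
print(side, flips)
```

Recomputing the list of unhappy nodes on every iteration keeps the sketch short at the cost of speed; a production version would update only the flipped node and its neighbors.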

Algorithm (an “unhappy” node flips to “happy.”)

Algorithm Claim: This procedure terminates in at most |E| steps, where |E| is the number of edges in G. Proof: While a particular node may switch its affiliation many times over the course of the algorithm, notice that each time a flip is performed, the number of edges crossing between the two partitions increases by at least one. So there can be at most |E| steps.
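The step "increases by at least one" can be written out explicitly. The notation below is introduced here for illustration: s(v) and o(v) are the numbers of neighbors of v on the same side and on the opposite side, and the graph is assumed to be simple, so that deg(v) = s(v) + o(v).

```latex
\[
s(v) = \bigl|\{u \in N(v) : \mathrm{side}(u) = \mathrm{side}(v)\}\bigr|, \qquad
o(v) = \bigl|\{u \in N(v) : \mathrm{side}(u) \neq \mathrm{side}(v)\}\bigr|
\]
\[
v \text{ unhappy} \iff o(v) < \tfrac{1}{2}\deg(v) \iff o(v) < s(v)
\]
\[
\text{flipping } v:\quad \Delta_{\mathrm{cut}} = s(v) - o(v) \ge 1
\]
\[
0 \le \mathrm{cut} \le |E| \;\Longrightarrow\; \#\text{flips} \le |E|
\]
```

Flipping v turns its s(v) same-side edges into crossing edges and its o(v) crossing edges into same-side edges, and touches no other edge, which gives the change in the third line.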

Algorithm Claim: On termination, every node is “happy.” Proof: [This is just the termination condition of the while-loop.] Observe that the partition generated in this way is maximal: flipping any single node cannot increase the number of edges crossing between partitions, because all nodes are happy.

Stable Bipartite Subgraph: Motivation If a gene exists within a BPM, then we expect the two pathways of the BPM to fall into opposite sets within most maximal partitions (because the partitioning algorithm is looking to maximize the number of edges crossing between sets). So in a maximal partition, genes in the same pathway as a BPM gene g should tend to be assigned to the same set as g; those in the opposite pathway should wind up in the opposite set; and those in neither pathway should bounce around with little or no correlation to g’s set-assignment.

Stable Bipartite Subgraph Definition: For a node m, repeat this procedure k times to find maximal bipartite subgraphs. Let A be the set of nodes that occur in the same partition as m at least r percent of the time. Let B be the set of nodes that occur in the opposite partition of m at least r percent of the time. Return A and B as m’s stable bipartite subgraph.

Stable Bipartite Subgraph: The stable bipartite subgraphs are our BPMs! (k = 250; r = 70 percent)
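A minimal sketch of the definition, assuming the local-search partitioning procedure described earlier and a dictionary-of-neighbor-sets graph representation. The function names, the toy graph, and the isolated node `z` are hypothetical; the defaults k = 250 and r = 0.70 are the values given on the slide.

```python
import random
from collections import Counter

def maximal_bipartition(adj, rng):
    """Flip random unhappy nodes until none remain (simple graph, no self-loops)."""
    side = {v: rng.choice((0, 1)) for v in adj}
    while True:
        bad = [v for v in adj
               if 2 * sum(1 for u in adj[v] if side[u] != side[v]) < len(adj[v])]
        if not bad:
            return side
        v = rng.choice(bad)
        side[v] = 1 - side[v]

def stable_bipartite_subgraph(adj, m, k=250, r=0.70, seed=0):
    """Return (A, B) for node m over k runs with stability threshold r."""
    rng = random.Random(seed)
    same = Counter()  # same[v] = number of runs in which v shares m's set
    for _ in range(k):
        side = maximal_bipartition(adj, rng)
        for v in adj:
            if side[v] == side[m]:
                same[v] += 1
    A = {v for v in adj if same[v] >= r * k}
    B = {v for v in adj if k - same[v] >= r * k}
    return A, B

# Hypothetical toy graph: complete bipartite K_{3,3} plus an isolated node z.
X, Y = {"x1", "x2", "x3"}, {"y1", "y2", "y3"}
adj = {x: set(Y) for x in X}
adj.update({y: set(X) for y in Y})
adj["z"] = set()

A, B = stable_bipartite_subgraph(adj, "x1")
print(sorted(A), sorted(B))
```

On K_{3,3} the only maximal bipartition is the true one, so every run separates X from Y and the node m always lands in A. The isolated node z is assigned at random in each run and never flipped, so it is expected to fall in neither set.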

Test Datasets: (1) original physical + genetic interaction data used in Kelley + Ideker (2005); (2) up-to-date set of physical + genetic interactions taken from the BioGRID database (October 2007). Sizes, in the order shown on the slide: 1,678 genes (nodes), 6,818 edges (SL interactions); 682 genes (nodes), 1,858 edges (SL interactions).

Return Stable BPMs?

Example BPM

How do we know it is meaningful? Biological validation: Enrichment results. We find things that are known to be functionally related in our putative pathways. [GO Enrichment] Statistical validation: - Location of known PI edges - Prediction of new SL edges

Results
Network | BPMs | SL edges covered | % Enrich. pathways
Kelley & Ideker | G / %
Our Results | G6021, / %
Ulitsky & Shamir A | G’140<3, / %
Ulitsky & Shamir B | G’270<3, / %
Our Results | G*1,5104, / %

Results: SGD GO-SLIM coverage. Ulitsky + Shamir: 46.3%; Us: 79.8%

Results: Dually-enriched BPMs

Results: Differentially-enriched BPMs

Example BPM

Website


Results: BPM Validation In addition to validation based on coherence of biological function, we can also statistically validate our methods directly from the structure of the network! Method 1: Examine the distribution of known PIs within each BPM.

Results: BPM Validation Goal: estimate the probability of seeing as many or fewer physical interactions between the two sets as were actually observed.
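One way such a probability could be estimated is with a hypergeometric lower tail. This is an assumed null model chosen for illustration, not necessarily the one used in the talk: it supposes that the physical interactions observed among the BPM's genes are placed uniformly at random over all gene pairs in A ∪ B. The function name and parameters are hypothetical.

```python
from math import comb

def prob_at_most_between(size_a, size_b, n_edges, observed_between):
    """P(at most observed_between of n_edges fall between A and B).

    Assumed null model: the n_edges physical interactions are a uniformly random
    subset of all node pairs inside A union B (hypergeometric distribution).
    """
    between = size_a * size_b                    # pairs with one node in each set
    within = comb(size_a, 2) + comb(size_b, 2)   # pairs inside A or inside B
    total = between + within
    favorable = sum(comb(between, x) * comb(within, n_edges - x)
                    for x in range(observed_between + 1))
    return favorable / comb(total, n_edges)

# Hypothetical example: |A| = |B| = 2, two known PIs, none of them between the sets.
p = prob_at_most_between(2, 2, 2, 0)
print(p)
```

A small value means that physical interactions avoid the A-to-B pairs more than chance would predict, which is what the between-pathway model expects, since physical interactions should fall mostly within each pathway.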

Results: BPM Validation

Method 2: Examine the distribution of new SL interactions appearing within each BPM in the Kelley/Ideker network.

Results: BPM Validation Goal: estimate the probability of seeing as many or more new synthetic-lethality interactions appearing between the two sets as were actually observed.

Results: BPM Validation Results: Across the set of 175 candidate BPMs from G which contained at least 20 new SL edges in G+, the average probability that the observed between-pathway bias would occur by chance was [...]. Since these new edges were not used to construct candidate BPMs in G, their distribution bias provides independent support for the hypothesis that stable subgraphs do indeed correspond to biologically meaningful structures.

Validation: Microarray Data. The Rosetta compendium (Hughes et al., 2000) contains yeast expression profiles of 276 deletion mutants: i.e., for each gene in the yeast genome, it measures how that gene's expression level changes when a particular gene g is deleted, as compared to wild-type yeast.

Delete a gene in pathway 1; see if changes in pathway 2 coherent

[Figure: log10 ratio; BPM; deleted gene; pathway restriction; sort]

At step i (from N to 1): calculate the weighted percent of genes in the pathway seen so far and the percent of genes not in the pathway seen so far. The score is the maximum difference.
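The slide leaves the weighting unspecified, so the following is only a sketch of one plausible reading, modeled on a running-sum enrichment statistic. All of these are assumptions: the function name `cluster_rank_score`, the `weight` mapping, walking the list in the given order, and normalizing pathway hits by total pathway weight and misses by count.

```python
def cluster_rank_score(ranked, pathway, weight):
    """Maximum of (weighted fraction of pathway genes seen) minus
    (fraction of non-pathway genes seen) while walking down a ranked gene list.

    ranked:  genes in sorted order (e.g. by expression change); assumed ordering
    pathway: set of genes in the pathway (assumed non-empty, not all genes)
    weight:  dict gene -> non-negative weight for pathway genes (assumed scheme)
    """
    total_weight = sum(weight[g] for g in ranked if g in pathway)
    n_other = sum(1 for g in ranked if g not in pathway)
    hit_weight, misses, best = 0.0, 0, float("-inf")
    for g in ranked:
        if g in pathway:
            hit_weight += weight[g]
        else:
            misses += 1
        best = max(best, hit_weight / total_weight - misses / n_other)
    return best

# Hypothetical example with unit weights.
ranked = ["g1", "g2", "g3", "g4"]
unit = {g: 1.0 for g in ranked}
print(cluster_rank_score(ranked, {"g1", "g2"}, unit))  # pathway genes ranked first
print(cluster_rank_score(ranked, {"g3", "g4"}, unit))  # pathway genes ranked last
```

Under this reading, a pathway whose genes cluster at the top of the ranking scores close to 1, and one whose genes sit at the bottom scores 0.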

How to validate a pathway: Using a permutation test, we sample 99 random subsets of genes the same size as the pathway. We calculate the cluster rank score for each of these 99 sets. We sort the test scores plus the pathway score. The p-value is the percentile. A pathway is validated if its p-value is <= 0.1.
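The permutation test described above can be sketched as follows. The scoring function is passed in as a parameter because the cluster rank score is not fully specified here; the example uses a hypothetical stand-in (`mean_value`) and made-up gene values. It is also an assumption of this sketch that a larger score is the more extreme one.

```python
import random

def permutation_p_value(all_genes, pathway, score, n_perm=99, seed=0):
    """Rank the pathway's score among n_perm random gene sets of the same size.

    Assumes larger scores are more extreme. With n_perm = 99 the result is a
    multiple of 0.01 between 0.01 and 1.0.
    """
    rng = random.Random(seed)
    observed = score(pathway)
    null_scores = [score(rng.sample(all_genes, len(pathway))) for _ in range(n_perm)]
    at_least = sum(1 for s in null_scores if s >= observed)
    return (at_least + 1) / (n_perm + 1)  # pathway counts as one of the 100 sets

# Hypothetical data: 50 genes with made-up values; the pathway holds the top five.
values = {f"g{i}": float(i) for i in range(50)}
genes = list(values)
pathway = ["g45", "g46", "g47", "g48", "g49"]

def mean_value(gene_set):
    """Stand-in score (not the cluster rank score): mean of the made-up values."""
    return sum(values[g] for g in gene_set) / len(gene_set)

p = permutation_p_value(genes, pathway, mean_value)
print(p, "validated" if p <= 0.1 else "not validated")
```

Counting the pathway itself among the 100 ranked sets means the p-value can never be zero, which matches the description of sorting the 99 test scores together with the pathway score.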

Delete a gene in pathway 1; see if changes in pathway 2 are coherent. We call a pathway “Validated” if its Cluster Rank Score has p-value <= 0.1.

Kelley-Ideker Histogram of the Lowest CRS per Pathway per BPM This histogram displays all the CRS scores from all of the results from Kelley and Ideker’s BPMs bucketed according to their lowest p value score. The p value scores <= 0.10 indicate a validated BPM.

Ulitsky Histogram of the Lowest CRS per Pathway per BPM This histogram displays all the CRS scores from all of the results from Ulitsky’s BPMs bucketed according to their lowest p value score. The p value scores <= 0.10 indicate a validated BPM.

Ma Histogram of the Lowest CRS per Pathway per BPM This histogram displays all the CRS scores from all of the results from Ma’s BPMs bucketed according to their lowest p value score. The p value scores <= 0.10 indicate a validated BPM.

Brady Histogram of the Lowest CRS per BPM This histogram displays all the CRS scores from all of the results from Brady’s BPMs bucketed according to their lowest p value score. The p value scores <= 0.10 indicate a validated BPM. Clearly, Brady’s BPMs are disproportionately represented in the lower p value range.

Results
BPM dataset | # paths hit knockouts | # validated pathways | % validated pathways
Kelley-Ideker (05) | | | %
Ulitsky-Shamir (07) | 36 | 5 | 14%
Ma et al. (08) | 54 | 6 | 11%
Our results | | | %

A Tantalizing Peek of What We can Do With More Data! A heat map of the differential expression of yeast genes in pathway 2 in response to the deletion of two different genes (SHE4 and GAS1) from pathway 1 in a validated BPM of Ma et al.

A random-gene validation test couples the two pathways together

Co-authors and collaborators: Arthur Brady, Noah Daniels, Ben Hescott, Max Leiserson, Kyle Maxwell, Donna Slonim

thanks.

A Graph Theory Problem Our algorithm samples from the maximal bipartite subgraphs. With what distribution? Is it uniform? Proportional to the number of edges that cross the cut? What are the properties of the stable bipartite subgraphs of the synthetic lethal network? Are they conserved across species?

Approach Run the partitioning algorithm 250 times on the yeast SL network (G). For each gene g in G, – Construct a set A consisting of g and all nodes in G which wind up in the same set as g at least 70% of the time. – Construct another set B consisting of all nodes in G which wind up in the opposite set from g at least 70% of the time. We call the subgraph of G defined by A and B the “stable bipartite subgraph of g”, and designate it as a candidate BPM.