Lecture 3 Protein binding networks. C. elegans PPI from Li et al. (Vidal’s lab), Science (2004) Genome-wide protein binding networks Nodes - proteins.

Slides:



Advertisements
Similar presentations
Unravelling the biochemical reaction kinetics from time-series data Santiago Schnell Indiana University School of Informatics and Biocomplexity Institute.
Advertisements

The Architecture of Complexity: Structure and Modularity in Cellular Networks Albert-László Barabási University of Notre Dame title.
Traffic-driven model of the World-Wide-Web Graph A. Barrat, LPT, Orsay, France M. Barthélemy, CEA, France A. Vespignani, LPT, Orsay, France.
The multi-layered organization of information in living systems
Global Mapping of the Yeast Genetic Interaction Network Tong et. al, Science, Feb 2004 Presented by Bowen Cui.
VL Netzwerke, WS 2007/08 Edda Klipp 1 Max Planck Institute Molecular Genetics Humboldt University Berlin Theoretical Biophysics Networks in Metabolism.
Biological networks: Types and sources Protein-protein interactions, Protein complexes, and network properties.
Biological networks: Types and sources Protein-protein interactions, Protein complexes, and network properties.
Weighted networks: analysis, modeling A. Barrat, LPT, Université Paris-Sud, France M. Barthélemy (CEA, France) R. Pastor-Satorras (Barcelona, Spain) A.
UC Davis, May 18 th 2006 Introduction to Biological Networks Eivind Almaas Microbial Systems Division.
Complex Networks Third Lecture TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA TexPoint fonts used in EMF. Read the.
Emergence of Scaling in Random Networks Barabasi & Albert Science, 1999 Routing map of the internet
Detecting topological patterns in complex networks Sergei Maslov Brookhaven National Laboratory.
Detecting topological patterns in protein networks Sergei Maslov Brookhaven National Laboratory.
Extracting hidden information from knowledge networks Sergei Maslov Brookhaven National Laboratory, New York, USA.
Scale Free Networks Robin Coope April Abert-László Barabási, Linked (Perseus, Cambridge, 2002). Réka Albert and AL Barabási,Statistical Mechanics.
A Real-life Application of Barabasi’s Scale-Free Power-Law Presentation for ENGS 112 Doug Madory Wed, 1 JUN 05 Fri, 27 MAY 05.
Mass-action equilibrium and non-specific interactions in protein interaction networks Sergei Maslov Brookhaven National Laboratory.
Proteome Network Evolution by Gene Duplication S. Cenk Şahinalp Simon Fraser University.
Regulatory networks 10/29/07. Definition of a module Module here has broader meanings than before. A functional module is a discrete entity whose function.
Protein Binding Networks: from Topology to Kinetics Sergei Maslov Brookhaven National Laboratory.
Biological networks: Types and origin Protein-protein interactions, complexes, and network properties Thomas Skøt Jensen Center for Biological Sequence.
Propagation of noise and perturbations in protein binding networks Sergei Maslov Brookhaven National Laboratory.
Global topological properties of biological networks.
Indiana University Bloomington, IN Junguk Hur Computational Omics Lab School of Informatics Differential location analysis A novel approach to detecting.
1 Protein-Protein Interaction Networks MSC Seminar in Computational Biology
BACKGROUND E. coli is a free living, gram negative bacterium which colonizes the lower gut of animals. Since it is a model organism, a lot of experimental.
Statistical Physics of Complex Networks Shai Carmi Thesis defense June 2006 Protein Interaction Networks.
Biological networks: Types and origin
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. (1999). Detecting protein function and protein-protein interactions from genome sequences.
Protein Classification A comparison of function inference techniques.
Interaction Networks in Biology: Interface between Physics and Biology, Shekhar C. Mande, August 24, 2009 Interaction Networks in Biology: Interface between.
Large-scale organization of metabolic networks Jeong et al. CS 466 Saurabh Sinha.
Optimization Based Modeling of Social Network Yong-Yeol Ahn, Hawoong Jeong.
(Social) Networks Analysis III Prof. Dr. Daning Hu Department of Informatics University of Zurich Oct 16th, 2012.
九大数理集中講義 Comparison, Analysis, and Control of Biological Networks (3) Domain-Based Mathematical Models for Protein Evolution Tatsuya Akutsu Bioinformatics.
Nothing in (computational) biology makes sense except in the light of evolution after Theodosius Dobzhansky (1970) Power laws, scalefree networks, the.
Network Analysis and Application Yao Fu
Interactions and more interactions
Improving PPI Networks with Correlated Gene Expression Data Jesse Walsh.
Network Biology Presentation by: Ansuman sahoo 10th semester
School of Information University of Michigan SI 614 Network subgraphs (motifs) Biological networks Lecture 11 Instructor: Lada Adamic.
Finish up array applications Move on to proteomics Protein microarrays.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
Weighted networks: analysis, modeling A. Barrat, LPT, Université Paris-Sud, France M. Barthélemy (CEA, France) R. Pastor-Satorras (Barcelona, Spain) A.
Complex Networks First Lecture TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA TexPoint fonts used in EMF. Read the.
1 Having genome data allows collection of other ‘omic’ datasets Systems biology takes a different perspective on the entire dataset, often from a Network.
Unravelling the biochemical reaction kinetics from time-series data Santiago Schnell Indiana University School of Informatics and Biocomplexity Institute.
Genome Biology and Biotechnology The next frontier: Systems biology Prof. M. Zabeau Department of Plant Systems Biology Flanders Interuniversity Institute.
Percolation and diffusion in network models Shai Carmi, Department of Physics, Bar-Ilan University Networks Percolation Diffusion Background picture: The.
LECTURE 2 1.Complex Network Models 2.Properties of Protein-Protein Interaction Networks.
Biol 729 – Proteome Bioinformatics Dr M. J. Fisher - Protein: Protein Interactions.
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metabolic Network Science, Vol 292, Issue 5518, , 4 May 2001.
3-D Structural Analysis of Protein Interaction Networks Gives New Insight Into Protein Function, Network Topology and Evolution CSB Seminar Philip M. Kim,
Biological Networks. Can a biologist fix a radio? Lazebnik, Cancer Cell, 2002.
1 Having genome data allows collection of other ‘omic’ datasets Systems biology takes a different perspective on the entire dataset, often from a Network.
1 Lesson 12 Networks / Systems Biology. 2 Systems biology  Not only understanding components! 1.System structures: the network of gene interactions and.
Network Analysis Goal: to turn a list of genes/proteins/metabolites into a network to capture insights about the biological system 1.Types of high-throughput.
Response network emerging from simple perturbation Seung-Woo Son Complex System and Statistical Physics Lab., Dept. Physics, KAIST, Daejeon , Korea.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.
Lecture II Introduction to complex networks Santo Fortunato.
Comparative Network Analysis BMI/CS 776 Spring 2013 Colin Dewey
Cmpe 588- Modeling of Internet Emergence of Scale-Free Network with Chaotic Units Pulin Gong, Cees van Leeuwen by Oya Ünlü Instructor: Haluk Bingöl.
Structures of Networks
Bioinformatics 3 V6 – Biological Networks are Scale- free, aren't they? Fri, Nov 2, 2012.
Biological networks CS 5263 Bioinformatics.
Network Analysis of Protein-Protein Interactions
Biological Networks Analysis Degree Distribution and Network Motifs
Presentation transcript:

Lecture 3 Protein binding networks

C. elegans PPI from Li et al. (Vidal’s lab), Science (2004) Genome-wide protein binding networks Nodes - proteins Edges - protein-protein binding interactions Functions Structural (keratin) Functional (ribosome) regulation/signaling (kinases) etc

How much data is out there? Species Set nodes edges # of sources S.cerevisiae HTP-PI 4,500 13,000 5 LC-PI 3,100 20,000 3,100 D.melanogaster HTP-PI 6,800 22,000 2 C.elegans HTP-PI 2,800 4,500 1 H.sapiens LC-PI 6,400 31,000 12,000 HTP-PI 1,800 3,500 2 H. pylori HTP-PI 700 1,500 1 P. falciparum HTP-PI 1,300 2,800 1

Yeast two-hybrid technique uses two “hybrid proteins”: bait X* (X fused with Gal4p DNA-binding domain) and prey Y* (Y fused with Gal4p activation domain) Cons: wrong (very high) concentrations, localization (unless both proteins are nuclear), and even host organism (unless done in yeast) Pros: direct binding events Main source of noise: self-activating baits

Affinity capture + Mass Spectrometry Ionizer (multi-)protein complex pulled out by affinity-tagged protein (bait) + _ Mass Filter Detector Pros: in vivo concentrations and localizations Cons: binding interactions are often indirect Main source of noise: highly abundant and sticky proteins

Breakup by experimental technique in yeast BIOGRID database S. cerevisiae Affinity Capture-Mass Spec Affinity Capture-RNA 55 Affinity Capture-Western 5710 Co-crystal Structure 107 FRET 43 Far Western 41 Two-hybrid Total 46063

What are the common topological features? 1. Broad distribution of the number of interaction partners of individual proteins

What’s behind this broad distribution ? Three explanations were proposed: EVOLUTIONARY (duplication-divergence models) BIOPHYSICAL (stickiness due to surface hydrophobicity) FUNCTIONAL (tasks of vastly different complexity) From YY. Shi, GA. Miller., H. Qian., and K. Bomsztyk, PNAS 103, (2006)

Evolutionary explanation: duplication-divergence models A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani. Modelling of protein interaction networks. cond-mat/ , (2001) published in ComPlexUs 1, 38 (2003) Followed by R. V. Sole, R. Pastor-Satorras, E. Smith, T. B. Kepler, A model of large-scale proteome evolution, cond-mat/ (2002) published in Advances in Complex Systems 5, 43 (2002) Then many others including I.Ispolatov, I., Krapivsky, P.L., Yuryev, A., Duplication-divergence model of protein interaction network, Physical Review, E 71, , Network has to grow Preferential attachment in disguise: as ki grows so is the probability to duplicate one of the neighbors

Vazquez-Flammini-Maritan- Vespignani model Start with two interacting proteins At each step randomly pick a protein i and duplicate it to i’ With probability p make the interaction i–i’ (duplicated homodimer) For every node j — i and i’ select one of the two links and remove it with probability q Preferential attachment in disguise: as k i grows so is the probability to duplicate one of the neighbors

Tell-tale signs of gene duplication in PPI networks Pair of duplicated proteins Shared interactions Pair of duplicated proteins Shared interactions Right after duplicationAfter some time

Yeast regulatory network SM, K. Sneppen, K. Eriksen, K-K. Yan, BMC Evolutionary Biology (2003)

100 million years ago

Traces of duplication in PPI networks SM, K. Sneppen, K. Eriksen, and K-K. Yan, BMC Evol. Biol. 4, 9 (2003) (a similar but smaller scale-plot vs K s in A. Wagner MBE 18, 1283 (2001)

But: how important are duplications for shaping hubs? J. Berg, M. Lässig, and A. Wagner, BMC Evol. Biol. (2004) Duplication- divergence models could still be OK if sequences diverge relatively fast

Biophysical explanation: “stickiness” models G. Caldarelli, A. Capocci, P. De Los Rios, M.A. Munoz, Scale-free Networks without Growth or Preferential Attachment: Good get Richer, cond-mat/ , (2002) published in PRL (2002) Followed by Deeds, E.J. and Ashenberg, O. and Shakhnovich, E.I., A simple physical model for scaling in protein-protein interaction networks, PNAS (2006) Then others including Yi Y. Shi, G.A. Miller, H. Qian, and K. Bomsztyk, Free-energy distribution of binary protein–protein binding suggests cross-species interactome differences, PNAS (2006).

Original Candarelli et al. model Stickiness (they called it fitness) is exponentially distributed P(S i >S)=exp(-AS) Nodes i—j interact if S i +S i >T (hard threshold) =Nexp(-A(T-S))=C exp(AS)  P(S i >S)=C/ P(K i >K)=P(S i >S(K))~1/K  P(K)~1/K 2 No preferential attachment: network does not have to grow!

Recent modifications Deeds et al.: Biophysically stickiness should have Gaussian PDF It spoils powerlaw somewhat Shi et al.: Soft threshold Binding i - j is detected with probability p ij = F (S i +S j ). For Yeast-2-hybrid F (S i +S j ) is given by exp(S i +S j -T)/(1+ exp(S i +S j -T)) Removes unrealistic properties of the hard threshold model: neighbors(i) are all neighbors(j) if S i <S j

There are just TOO MANY homodimers Null-model: P self ~ /N N (r) dimer =N  P self = Not surprising as homodimers have many functional roles I. Ispolatov, A. Yuryev, I. Mazo, and SM, 33, 3629 NAR (2005) N (r) dimer

Network properties around homodimers

Human: literature data P self ~0.05, P others ~ Fly: two-hybrid data P self ~0.003, P others ~ I. Ispolatov, A. Yuryev, I. Mazo, and SM, 33, 3629 NAR (2005)

Our interpretation Both the number of interaction partners K i and the likelihood to self-interact are proportional to the same “stickiness” of the protein S i which could depend on the number of hydrophobic residues on the surface protein abundance its’ popularity (in networks taken from many small-scale experiments) etc. In random networks p dimer (K)~K 2 not ~K like we observe empirically I. Ispolatov, A. Yuryev, I. Mazo, and SM, 33, 3629 NAR (2005)

Functional explanation: there are as many binding partners as needed for function Not an explanation: why difficulty of functions is so heterogeneous? Difficult to check: the function of many binding interactions is poorly understood (quite clear in transcriptional regulatory networks e.g. in E. coli ) The 3rd explanation does not exclude the previous two: Evolution by duplications combined with pure Biophysics (stickiness) provide raw materials from which functional interactions are selected

What are the common topological features? 1. Broad distribution of the number of interaction partners (degree K) of individual proteins 2. Anti-hierarchical (disassortative) architecture. Hubs avoid other hubs and thus are on a periphery.

Central vs peripheral network architecture central (hierarchical) peripheral (anti-hierarchical) A. Trusina, P. Minnhagen, SM, K. Sneppen, Phys. Rev. Lett. 92, 17870, (2004) random

What is the case for protein interaction network SM, K. Sneppen, Science 296, 910 (2002)

What are the common topological features? 1. Broad distribution of the number of interaction partners (degree K) of individual proteins 2. Anti-hierarchical (disassortative) architecture. 3. Small-world-property (follows from 1. for / >2 )

Protein binding networks have small-world property Curated dataset from our study 83% in this plot Large-scale Y2H experiment S. cerevisiae 86% of proteins could be connected

Why small-world matters? Claims of “robustness” of this network architecture come from studies of the Internet where breaking up the network is a disaster For PPI networks it is the OPPOSITE: interconnected networks present a problem In a small-world network equilibrium concentrations of all proteins are coupled to each other Danger of undesirable cross-talk

Going beyond topology and modeling the equilibrium and kinetics SM, K. Sneppen, I. Ispolatov, q-bio/ ; SM, I. Ispolatov, PNAS in press (2007)

Law of Mass Action equilibrium dD AB /dt = r (on) AB F A F B – r (off) AB D AB In equilibrium D AB =F A F B /K AB where the dissociation constant K AB = r (off) AB / r (on) AB has units of concentration Total concentration = free concentration + bound concentration  C A = F A +F A F B /K AB  F A =C A /(1+F B /K AB ) In a network F i =C i /(1+  neighbors j F j /K ij ) Can be numerically solved by iterations

What is needed to model? A reliable network of reversible (non-catalytic) protein- protein binding interactions  CHECK! e.g. physical interactions between yeast proteins in the BIOGRID database with 2 or more citations. Most are reversible: e.g. only 5% involve a kinase Total concentrations C i and sub-cellular localizations of all proteins  CHECK! genome-wide data for yeast in 3 Nature papers (2003, 2003, 2006) by the group of J. UCSF. VERY BROAD distribution: C i ranges between 50 and 10 6 molecules/cell Left us with 1700 yeast proteins and ~5000 interactions in vivo dissociation constants K ij OOPS! . High throughput experimental techniques are not there yet

Let’s hope it doesn’t matter The overall binding strength from the PINT database: =1/(5nM). In yeast: 1nM ~ 34 molecules/cell Simple-minded assignment K ij =const=10nM (also tried 1nM, 100nM and 1000nM) Evolutionary-motivated assignment: K ij =max(C i,C j )/20: K ij is only as small as needed to ensure binding given C i and C j All assignments of a given average strength give ROUGHLY THE SAME RESULTS

Robustness with respect to assignment of K ij Spearman rank correlation: 0.89 Pearson linear correlation: 0.98 Bound concentrations: D ij Spearman rank correlation: 0.89 Pearson linear correlation: Free concentrations: F i

Numerical study of propagation of perturbations We simulate a twofold increase of the abundance C 0 of just one protein Proteins with equilibrium free concentrations F i changing by >20% are significantly perturbed We refer to such proteins i as concentration-coupled to the protein 0 Look for cascading perturbations

Resistor network analogy Conductivities  ij – dimer (bound) concentrations D ij Losses to the ground  iG – free (unbound) concentrations F i Electric potentials – relative changes in free concentrations (-1) L  F i /F i Injected current – initial perturbation  C 0 SM, K. Sneppen, I. Ispolatov, arxiv.org/abs/q-bio.MN/ ;

What did we learn from this mapping? The magnitude of perturbations` exponentially decay with the network distance (current is divided over exponentially many links) Perturbations tend to propagate along highly abundant heterodimers (large  ij ) F i /C i has to be low to avoid “losses to the ground” Perturbations flow down the gradient of C i Odd-length loops dampen the perturbations by confusing (-1) L  F i /F i

Exponential decay of perturbations O – real S - reshuffled D – best propagation

SM, I. Ispolatov, PNAS in press (2007) HHT1

What conditions make some long chains good conduits for propagation of concentration perturbations while suppressing it along side branches?

Perturbations propagate along dimers with large concentrations They cascade down the concentration gradient and thus directional Free concentrations of intermediate proteins are low SM, I. Ispolatov, PNAS in press (2007)

Implications of our results

Cross-talk via small-world topology is suppressed, but… Good news: on average perturbations via reversible binding rapidly decay Still, the absolute number of concentration- coupled proteins is large In response to external stimuli levels of several proteins could be shifted. Cascading changes from these perturbations could either cancel or magnify each other. Our results could be used to extend the list of perturbed proteins measured e.g. in microarray experiments

Genetic interactions Propagation of concentration perturbations is behind many genetic interactions e.g. of the “dosage rescue” type We found putative “rescued” proteins for 136 out of 772 such pairs (18% of the total, P-value ) SM, I. Ispolatov, PNAS in press (2007)

Intra-cellular noise Noise is measured for total concentrations C i (Newman et al. Nature (2006)) Needs to be converted in biologically relevant bound (D ij ) or free (F i ) concentrations Different results for intrinsic and extrinsic noise Intrinsic noise could be amplified (sometimes as much as 30 times!)

Could it be used for regulation and signaling? 3-step chains exist in bacteria: anti-anti- sigma-factors  anti-sigma-factors  sigma- factors  RNA polymerase Many proteins we find at the receiving end of our long chains are global regulators (protein degradation by ubiquitination, global transcriptional control, RNA degradation, etc.) Other (catalytic) mechanisms spread perturbations even further Feedback control of the overall protein abundance?

NOW BACK TO TOPOLOGY

Summary There are many kinds of protein networks Networks in more complex organisms are more interconnected Most have hubs – highly connected proteins and a broad (~scale-free) distribution of degrees Hubs often avoid each other (networks are anti- hierarchical or disassortative) Networks evolve by gene duplications There are many self-interacting proteins. Likelihood to self-interact and the degree K both scale with protein’s “stickiness”

THE END

Why so many genes could be deleted without any consequences?

Possible sources of robustness to gene deletions Backup via the network (e.g. metabolic network could have several pathways for the production of the necessary metabolite) Not all genes are needed under a given condition (say rich growth medium) Affects fitness but not enough to kill Protection by closely related homologs in the genome

Protective effect of duplicates Gu, et al 2003 Maslov, Sneppen, Eriksen, Yan 2003 YeastWorm Maslov, Sneppen, Eriksen, Yan 2003 Z. Gu, … W.-H. Li, Nature (2003) Maslov, Sneppen, Eriksen, Yan BMC Evol. Biol. (2003)