1
Bioinformatics: Applications
ZOO 4903, Fall 2006, MW 10:30-11:45, Sutton Hall, Room 312
Jonathan Wren
Protein-Protein Interaction Networks
2
Lecture overview
What we’ve talked about so far:
  Proteins & their domains
  Protein 3D structure
Overview:
  Proteins do not function in a vacuum
  Methods of detecting protein-protein interactions (PPI)
  Structure and types of networks
  Behavior of networks
PPIs are equally or more important than structure
3
Cells are crowded places!
Hopper & Mayer, 1999, Prokaryotes. Am.Sci. 87:518
4
Importance of protein-protein interactions
Many cellular processes are regulated by multiprotein complexes
Distortions of protein interactions can cause diseases
Protein function can be predicted by knowing functions of interacting partners (“guilt by association”)
[Figure: a comparison of sequence (GenBank) and protein-protein interaction data (DIP database)]
Adapted from S. Fields, FEBS, 2005
5
Types of protein-protein interactions (PPI)
Obligate PPI
  Usually permanent; the protomers are not found as stable structures on their own in vivo
  Example: obligate heterodimer, human cathepsin D
Non-obligate PPI
  Permanent
    Stable (many enzyme-inhibitor complexes); dissociation constant Kd = [A][B] / [AB], 10^-7 ÷ … M
    Example: non-obligate permanent heterodimer, thrombin and rhodniin inhibitor
  Transient
    Weak (electron transport complexes), Kd mM-μM
      Example: non-obligate transient homodimer, sperm lysin (interaction is broken and formed continuously)
    Intermediate (antibody-antigen, TCR-MHC-peptide, signal transduction PPI), Kd μM-nM
    Strong (require a molecular trigger to shift the oligomeric equilibrium), Kd nM-fM
      Example: bovine G protein dissociates into Gα and Gβγ subunits upon GTP binding, but forms a stable trimer upon GDP binding
6
Multiple interactions: Guanine-nucleotide binding protein
Guanine nucleotide-binding proteins regulate a variety of processes, including sensory perception, protein synthesis, various transport processes, and cell growth and differentiation. They act as molecular switches and timers that cycle between inactive guanosine diphosphate (GDP)-bound and active guanosine triphosphate (GTP)-bound states. Recent structural studies show that the switch apparatus itself is a conserved fundamental module but that its regulators and effectors are quite diverse in their structures and modes of interaction.
Adapted from Vetter & Wittinghofer, Science 2001
7
Multiple interactions: Guanine-nucleotide binding protein
Question: How conserved are the interactive vs non-interactive portions of this protein?
A: Small changes in the interaction site could be very disruptive to many proteins
Adapted from Vetter & Wittinghofer, Science 2001
8
Protein evolution - gene duplication
Duplication provides redundancy & room for divergence
[Figure: a pair of duplicated proteins and their shared interactions, shown right after duplication and over time]
9
Methods of identifying PPIs
Experimental:
  Protein-protein arrays
  Y2H assay
  TAP assay
Computational/Inferential:
  Interolog analysis
  Co-localization, co-expression
  Correlated mutations
  Text-mining
10
Interologs
Homolog: common ancestors, common 3D structure, common active sites
  Ortholog: derived from speciation
  Paralog: derived from duplication
Interolog: conserved protein-protein interaction
If A and B interact in organism X, and organism Y has a homolog of A (A’) and a homolog of B (B’), then A’ and B’ should interact too.
Requires a list of known interacting partners
Thus, finding one PPI may yield dividends!
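The interolog rule above can be sketched in a few lines of Python. This is a minimal illustration, not a published method: the function name `predict_interologs`, the protein labels, and the homolog table are all hypothetical, and a real analysis would also need a homology threshold.

```python
from itertools import product

def predict_interologs(known_ppis, homologs):
    """Transfer known interactions from organism X to organism Y via homology.

    known_ppis: iterable of (A, B) pairs known to interact in organism X.
    homologs:   dict mapping a protein in X to its homologs in Y (hypothetical data).
    """
    predicted = set()
    for a, b in known_ppis:
        # Every homolog of A paired with every homolog of B is a candidate.
        for a2, b2 in product(homologs.get(a, ()), homologs.get(b, ())):
            predicted.add(frozenset((a2, b2)))
    return predicted

# Toy input: A-B and A-C interact in X; only A and B have homologs in Y.
known_ppis = [("A", "B"), ("A", "C")]
homologs = {"A": ["A'"], "B": ["B'"]}
print(predict_interologs(known_ppis, homologs))
```

The A-C interaction yields no prediction because C has no homolog in Y, which is why the approach depends on a list of known interacting partners and on homolog coverage.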
11
Protein Arrays
Protein chips are disposable arrays of microwells in silicone elastomer sheets placed on top of microscope slides.
H Zhu et al (2000) “Analysis of yeast protein kinases using protein chips” Nature Genetics 26:
12
The Two-Hybrid System
Two hybrid proteins are generated with transcription factor domains
Both fusions are expressed in a yeast cell that carries a reporter gene whose expression is under the control of binding sites for the DNA-binding domain
The key to the two-hybrid screen is that in most eukaryotic transcription factors, the activating and binding domains are modular and can function in close proximity to each other without direct binding
[Figure labels: bait protein, binding domain, prey protein, activation domain, reporter gene]
13
The Two-Hybrid System
Interaction of bait and prey proteins localizes the activation domain to the reporter gene, thus activating transcription. Since the reporter gene typically codes for a survival factor, yeast colonies will grow only when an interaction occurs.
Survival gene: for example, one needed to make lysine (remember Jurassic Park?)
[Figure labels: bait protein, binding domain, prey protein, activation domain, reporter gene, reporter mRNA]
14
Genome-wide analysis by Y2H
Matrix approach: a matrix of prey clones is added to the matrix of bait clones. Diploids where X and Y interact are selected based on the expression of a reporter gene.
Library approach: one bait X is screened against an entire library. Positives are selected based on their ability to grow on specific substrates.
Uetz et al, Nature 2000: 957 putative interactions in yeast
Rain et al, Nature 2001: 1,200 putative interactions in H. pylori
Ho et al, Nature 2002: 3,617 putative interactions in yeast (mass spec)
Adapted from B. Causier, Mass Spectrometry Reviews, 2004
15
Advantages of Y2H
In vivo technique, good approximation of processes which occur in higher eukaryotes.
Transient interactions can be determined; can predict the affinity of an interaction.
Can be used to detect potential interactions of genes not yet observed to be translated into proteins (e.g. rarely expressed) or novel constructs (e.g. therapeutics).
Relatively fast and efficient.
16
Disadvantages of Y2H
Fusion of a protein into chimeras can change the structure of a target
Protein interactions can be different in yeast and the organisms where the genes came from
It is difficult to target extracellular proteins
It is hard to detect interactions between proteins active only in a complex
Proteins which can interact in two-hybrid experiments may never interact in vivo
17
Tandem affinity purification method (TAP)
Target protein ORF is fused with the DNA sequences encoding the TAP tag;
Tagged ORFs are expressed in yeast cells and form native complexes;
The complexes are purified by the TAP method;
Components of each complex are found by gel electrophoresis or MS.
18
Tandem affinity purification method (TAP)
The TAP tag consists of two IgG binding domains of Staphylococcus protein A and a calmodulin binding peptide
7123 interactions can be clustered into 547 complexes (Krogan et al, 2006)
O. Puig et al, Methods, 2001
19
Differences and similarities between Y2H and MS-TAP
TAP permits protein complexes to be isolated, but cannot detect weak/transient PPIs
Both methods generate a lot of false positives; only ~50% of interactions are biologically significant
Y2H is an in vivo technique
MS can detect large stable complexes and networks of interactions
20
Text Mining
Searching Medline or PubMed for words or word combinations
Co-occurrence of terms is the simplest metric, yet leads to a higher false-positive (FP) rate
NLP methods are more specific (e.g., “X binds to Y”; “X interacts with Y”; “X associates with Y” etc.), yet such statements are difficult to detect in free text, so these methods have a higher false-negative (FN) rate
Normally requires a list of known gene names or protein names for a given organism
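The trade-off between the two approaches can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the protein list, the example sentences, and the three verb phrases (taken from the slide) stand in for a real name dictionary and a real NLP pipeline.

```python
import re
from itertools import combinations

PROTEINS = ["RAD51", "BRCA2", "TP53"]  # hypothetical name list for one organism
VERBS = r"(?:binds to|interacts with|associates with)"

def cooccurrence_pairs(sentence):
    """Every pair of known names mentioned in the same sentence."""
    found = [p for p in PROTEINS if re.search(rf"\b{re.escape(p)}\b", sentence)]
    return {frozenset(pair) for pair in combinations(found, 2)}

def pattern_pairs(sentence):
    """Only pairs joined by an explicit interaction phrase ("X binds to Y")."""
    names = "|".join(map(re.escape, PROTEINS))
    pattern = rf"\b({names})\b\s+{VERBS}\s+\b({names})\b"
    return {frozenset(m.groups()) for m in re.finditer(pattern, sentence)}

# Invented sentences, one per failure mode.
explicit = "RAD51 interacts with BRCA2."
unrelated = "RAD51 and TP53 were both measured in the same samples."
indirect = "BRCA2 was found in a complex together with RAD51."

for s in (explicit, unrelated, indirect):
    print(len(cooccurrence_pairs(s)), len(pattern_pairs(s)), s)
```

Co-occurrence reports a pair for all three sentences, including the unrelated one (a false positive), while the pattern matcher misses the indirectly worded third sentence (a false negative).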
21
Pre-BIND
Used a Support Vector Machine (SVM) to scan literature for PPIs
Precision, accuracy and recall of 92% for correctly classifying PPI abstracts
Estimated to capture 60% of all abstracted protein interactions for a given organism
Donaldson et al. BMC Bioinformatics :11
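For readers unfamiliar with the three metrics, here is how they are computed from a confusion matrix. The counts below are invented so that each metric comes out at 0.92; they are not taken from the Pre-BIND paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # of abstracts flagged as PPI, how many are
    recall = tp / (tp + fn)                     # of true PPI abstracts, how many were flagged
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # all correct calls over all calls
    return precision, recall, accuracy

# Hypothetical counts for 100 abstracts, half of which describe a PPI.
precision, recall, accuracy = classification_metrics(tp=46, fp=4, fn=4, tn=46)
print(precision, recall, accuracy)
```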
22
Drosophila interaction map
Interaction map plus subcellular localization (from GO)
From: A Protein Interaction Map of Drosophila, Giot et al. Science 302, (2003)
23
Comparing large scale data of protein-protein interactions
All methods except for Y2H and the synthetic lethality technique are biased toward abundant proteins.
PPI are biased toward certain cellular localizations.
Evolutionarily conserved proteins have much better coverage in Y2H than the proteins restricted to a certain organism.
Von Mering et al, Nature, 2002
24
Functional organization of yeast proteome: network of protein complexes
Essential gene products are more likely to interact with essential rather than nonessential proteins
Orthologous proteins interact with complexes enriched with orthologs
Gavin et al, Nature, 2002
25
PPI Databases online
DIP
MIPS (small scale)
BIND (PPI, Prot-DNA, Prot-SM) (now owned by Unleashed)
OPHID (predicted interactions)
MINT - Molecular Interactions Database
IntAct (EBI)
InterDom (domain interactions)
STRING (EMBL)
26
Interaction databases
Types:
  Experiment (E)
  Structure detail (S)
  Predicted: Physical (P), Functional (F)
  Curated (C)
  Homology modeling (H)
*International Molecular Exchange (IMEx) consortium
The IMEx consortium is a group of major public interaction data providers intending to share curation effort and exchange completed records on molecular interaction data, similar to successful global collaborations for protein and DNA sequences and for macromolecular structures.
27
Comparing the DBs
High FP rate in high-throughput experiments
Disagreement between benchmark sets
Experimental PPI data is sparse relative to all PPIs, so dataset overlap is small and hard to confirm with multiple sources
28
PPI network properties
Nodes & connections
29
Characteristics of networks
n = nodes, k = connections or “edges”
In biology, n refers to genes/proteins (and/or metabolites) while k refers to interactions
[Figure: example network with nodes labeled by their number of connections: K=2, K=2, K=3, K=1]
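Counting connections per node is straightforward once a network is stored as an edge list. The sketch below uses a hypothetical four-node network (a triangle with one extra node attached), chosen so that the degrees come out as 2, 2, 3 and 1 like the labels on the slide; the actual drawing on the slide is not recoverable, so the wiring is an assumption.

```python
from collections import Counter

# Hypothetical edge list: triangle A-B-C plus a pendant node D attached to C.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

def node_degrees(edges):
    """Number of edges touching each node in an undirected network."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)

degrees = node_degrees(edges)
print(degrees)  # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
```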
30
Examples of networks: Proximity-based interactions
31
Examples of networks: Distant interactions
32
Elementary features: node (n) diversity and dynamics
33
Elementary features: edge (k) diversity and dynamics
34
Elementary features: Network Evolution
35
Network properties
Network Structure Metrics:
  Average path length
  Degree distribution (connectivity)
  Clustering coefficient
Network Structure Types:
  Regular
  Random
  Small-world
  Scale-free
36
Structural metrics: Path length & network diameter
37
Structural Metrics: Degree distribution (connectivity)
38
Structural Metrics: Clustering coefficient
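The path-length and clustering metrics named on the last few slides can be computed with a short breadth-first search. This is a minimal sketch on a hypothetical four-node network (a triangle A-B-C with a pendant node D); the function names are illustrative, and the graph is assumed to be connected and undirected.

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def path_metrics(adj):
    """Average shortest path length and diameter (longest shortest path)."""
    lengths = [d for s in adj for t, d in bfs_distances(adj, s).items() if t != s]
    return sum(lengths) / len(lengths), max(lengths)

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbs = adj[node]
    k = len(nbs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

adj = build_adjacency([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
avg_path, diameter = path_metrics(adj)
print(avg_path, diameter)
print({n: clustering_coefficient(adj, n) for n in sorted(adj)})
```

In this toy network A and B sit in a closed triangle (coefficient 1), C has one linked neighbor pair out of three, and D has a single neighbor, so its coefficient is taken as 0.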
39
Network properties
Network Metrics:
  Average path length
  Degree distribution (connectivity)
  Clustering coefficient
Network Structures:
  Regular
  Random
  Small-world
  Scale-free
40
Regular networks – fully connected
41
Regular networks – Lattice
42
Regular networks – Lattice: ring world
43
Random networks
44
Random Networks
45
Small-world networks
46
Exponential network degree distribution
47
Scale-free networks
New nodes preferentially attach to highly connected ones
Coined by A.L. Barabasi in 1998
48
Different network models: Barabasi-Albert
Model of preferential attachment. At each step, a new node is added to the graph. The new node is attached to one of the old nodes with probability proportional to that node's degree.
Degree distribution: power-law distribution.
Evolution: old proteins have more connections than new proteins
[Figure: degree distribution on log-log axes, ln(P(k)) vs ln(k)]
Barabasi & Albert, Science, 1999
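The growth rule described above can be simulated directly. This is a minimal sketch, assuming one new edge per added node and a two-node starting graph; the function name and parameters are illustrative and not taken from the original paper.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=0):
    """Grow a graph one node at a time; each new node links to one old node
    chosen with probability proportional to that old node's degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in this list once per edge end, so a uniform draw
    # from it picks a node with probability proportional to its degree.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges

edges = preferential_attachment(2000)
degree = Counter(node for edge in edges for node in edge)
histogram = Counter(degree.values())  # degree k -> number of nodes with that degree

for k in sorted(histogram)[:5]:
    print(k, histogram[k])
print("largest degree:", max(degree.values()))
```

Most nodes end up with a single link while a few early nodes become hubs, which matches the slide's point that older nodes tend to have more connections. Plotting `histogram` on log-log axes would give the roughly straight line expected of a power law.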
49
Properties of scale-free networks.
Multiplying k by a constant does not change the shape of the distribution: a scale-free distribution.
Small diameter
Tolerance to errors and attacks
But: sub-networks can be scale-free while the underlying degree distribution is not.
From T. Przytycka
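The scale-invariance claim follows in one line, assuming the power-law form with a normalizing constant C and exponent γ (both symbols are assumptions here; the slides only name the power law):

```latex
P(k) = C\,k^{-\gamma}
\quad\Longrightarrow\quad
P(ak) = C\,(ak)^{-\gamma} = a^{-\gamma}\,P(k),
\qquad
\ln P(k) = \ln C - \gamma \ln k .
```

Rescaling k by a only multiplies P by the constant factor a^{-γ}, so on log-log axes the curve stays a straight line with the same slope and is merely shifted.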
50
Difference between scale-free and random graph models.
Random networks are homogeneous: most nodes have roughly the same number of links.
Scale-free networks have a number of highly connected vertices.
Adapted from Jeong et al, Nature, 2000
51
The Topology of PPI Networks
Small-world
Scale-free
Recurring motifs
(Barabasi et al. Nature Genetics 2003)
52
Highway connections – random. Airports – scale-free.
Reprinted from Linked: The New Science of Networks by Albert-Laszlo Barabasi
53
The Internet
54
Category: Internet Topology Description: Internet connectivity snapshot using Skitter data and Walrus visualization
55
Category: Natural Networks Description: Yeast protein network map, inside cell
56
Category: Natural Networks Description: Portion of food web in North Atlantic Ocean
57
Social networks relevant to spread of diseases, rumors, fads, etc.
Category: Social Networks Description: Individuals and their professions in the network of activities during the German Revolution of
58
Category: Economic Networks Description: World Trade network, 1992
59
Metabolic Networks
The metabolic networks of all organisms in all three domains of life (bacteria, archaea, eukaryotes) appear to be scale-free (43 examined)
The network diameter of all 43 metabolic networks is the same, irrespective of the number of proteins involved.
Does this seem counter-intuitive?
60
Implications – Attack Tolerance
Robust. For γ<3, removing nodes does not break the network into islands.
Very resistant to random attacks, but attacks targeting key nodes are more dangerous.
[Figure axes: max cluster size, path length]
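The contrast between random and targeted attacks can be seen even on a toy example. The sketch below is an illustration only: the hub-and-leaf network is a hypothetical stand-in, not a true scale-free graph, and the size of the largest connected cluster is used as the damage measure.

```python
import random

def largest_component(adj, removed):
    """Size of the largest connected cluster after deleting the removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

def hub_network(n_hubs=3, leaves_per_hub=10):
    """Toy network: a few mutually linked hubs, each with many degree-1 leaves."""
    adj = {}
    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    hubs = [f"hub{i}" for i in range(n_hubs)]
    for i, hub in enumerate(hubs):
        for other in hubs[i + 1:]:
            link(hub, other)
        for j in range(leaves_per_hub):
            link(hub, f"leaf{i}_{j}")
    return adj

adj = hub_network()
by_degree = sorted(adj, key=lambda n: len(adj[n]), reverse=True)
targeted = largest_component(adj, by_degree[:3])            # remove the three hubs
random_hit = largest_component(adj, random.Random(1).sample(sorted(adj), 3))
print("intact:", largest_component(adj, []))
print("after targeted attack:", targeted)
print("after random removal:", random_hit)
```

Because 30 of the 33 nodes are leaves, a random draw usually removes leaves and leaves the cluster largely intact, whereas removing the three hubs isolates every remaining node.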
61
Order, complexity and chaos
Principles of evolution
62
Evolutionary puzzle
The cell is constructed from unreliable parts and is subjected to mutations, yet it behaves in a robust and reliable manner.
Can natural selection alone explain this? Or might some additional principles also come into play, such as self-organization?
Kauffman’s hypothesis: Natural selection acts on self-organizing systems, rather than creating them. Without an innate tendency toward order, most mutations would be fatal.
63
Boolean networks
Given a set of nodes, assign to each a set of rules that governs their state (1 or 0, on or off)
Ruleset (switch): input 0 -> output 1; input 1 -> output 0
Ruleset (XOR): 00 -> 0; 01 -> 1; 10 -> 1; 11 -> 0
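A Boolean network built from these two rulesets can be stepped by hand or in a few lines of Python. The three-node wiring below is hypothetical (the slide's diagram is not recoverable), and all nodes are assumed to update synchronously.

```python
# Truth tables from the slide: input tuple -> output.
NOT = {(0,): 1, (1,): 0}                              # "switch" ruleset
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Hypothetical wiring: each node lists its input nodes and its ruleset.
network = {
    "a": (("c",), NOT),
    "b": (("a", "c"), XOR),
    "c": (("a", "b"), XOR),
}

def step(state):
    """Update every node at once from the current state (synchronous update)."""
    return {node: rule[tuple(state[i] for i in inputs)]
            for node, (inputs, rule) in network.items()}

state = {"a": 0, "b": 0, "c": 0}
trajectory = [state]
for _ in range(4):
    state = step(state)
    trajectory.append(state)

for s in trajectory:
    print(s["a"], s["b"], s["c"])
```

Starting from 000 this particular wiring visits 100, then 111, then returns to 000, so the network settles into a cycle of three states.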
66
Network behavior
Given this type of setup for network structure, what does the network behave like?
Does behavior change as n increases?
Does behavior change as k increases?
What do sparsely connected networks behave like?
What do highly connected networks behave like?
We can answer this with a simulation…
Run network dynamics program after this slide
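The network dynamics program used in the lecture is not included in these slides. As a stand-in, here is a minimal sketch of a random Boolean network in which each of n nodes reads k randomly chosen inputs through a random truth table; the function names, seeds, and the choice of synchronous updating are all assumptions.

```python
import random
from itertools import product

def random_boolean_network(n, k, seed=0):
    """Give each of n nodes k random input nodes and a random truth table."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [{bits: rng.randint(0, 1) for bits in product((0, 1), repeat=k)}
              for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronously update all nodes from the current state tuple."""
    return tuple(tables[i][tuple(state[j] for j in inputs[i])]
                 for i in range(len(state)))

def attractor(n, k, seed=0):
    """Run from a random start until a state repeats.
    Returns (transient length, cycle length)."""
    inputs, tables = random_boolean_network(n, k, seed)
    rng = random.Random(seed + 1)
    state = tuple(rng.randint(0, 1) for _ in range(n))
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return seen[state], t - seen[state]

# Compare sparse and dense wiring for a small network (k must not exceed n).
for k in (1, 2, 5):
    print("k =", k, "(transient, cycle) =", attractor(12, k))
```

Rerunning with different seeds and larger n gives a feel for the questions on the slide: sparsely wired networks tend to fall quickly into short cycles, while densely wired ones tend to wander through long transients and long cycles, though any single run can deviate from that trend.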
67
Complexity
Ordered behavior is characteristic of genomic and metabolic networks: they quickly settle down into periodic patterns of activity that resist disturbance or cycle through states.
Chaotic behavior is characteristic of many non-biological complex systems: sensitivity to initial conditions, long transients, and very large limit cycles (strange attractors).
BUT… Life exists and evolves in a region between order and chaos, termed “complexity” because it exhibits enough order to be stable & reproducible, but enough variation & instability to adapt to change.
Think about the simulation in terms of evolution & the environment present in the cell, with activities for each protein being governed by a set of rules. Whether early (few genes) or late in evolution, there is selection to control interactions.
68
Summary
PPI networks are one of the most active areas of bioinformatics research interest
PPI networks are constructed by empirical & computational methods
Networks have a structure that dictates their behavior
Biological networks are scale-free
Essential proteins have high connectivity
Life evolves on the edge between order & chaos
69
For next time
Supplementary reading S4