Network Analysis of Protein-Protein Interactions


1 Network Analysis of Protein-Protein Interactions
Xiaohua Tony Hu College of Information Science & Technology Drexel University Philadelphia, PA 19104 (O) (F)

2 Outline
Introduction; Research Goals; Topological Analysis; The Algorithm: CommBuilder; Experimental Results; Conclusions & Discussion; Q&A. September 22, 2018

3 Introduction
Biological Networks; Protein-Protein Interaction (PPI) Networks; Network Community Structure; Community Detection; Network Growing

4 Biological Networks
[Figure: genome, proteome, and metabolism; labels: protein-protein interactions, bio-chemical reactions, citrate cycle]

5 Biological Networks
Modeling biological systems: genetic networks (gene association and expression), protein networks (protein structure and interactions), and metabolic pathways. Challenges: the networks are non-trivial and irregular, and the data are incomplete and noisy.

6 Related Work The “small world” model was first proposed by Watts and Strogatz; the term refers to the small average distance between any two vertices in a network. Barabasi and Albert discovered a highly heterogeneous PPI network with scale-free connectivity properties in yeast, and scale-free properties have also been reported for the PPI networks of S. cerevisiae, H. pylori, C. elegans, and D. melanogaster [11-13]. Thus, the scale-free network model has been widely accepted. In PPI networks, not only does the degree distribution exhibit power-law dependence; other topological properties, such as the clustering coefficient, have also been shown to be scale-free. Yook and colleagues observe that the clustering coefficient of S. cerevisiae follows a power law. However, not all research agrees on power-law behavior in every PPI network. Thomas and colleagues find that the connectivity distribution of a human PPI network does not follow a power law; they argue that the current belief in a power-law distribution may reflect the behavior of a sampled subgraph. From a slightly different angle, Colizza and colleagues also evaluate three PPI networks constructed from yeast data sets. Although they observe that the connectivity distribution follows a power law, only one of the three networks exhibits approximately power-law behavior for the clustering coefficient. Soffer and Vazquez find that the power-law dependence of the clustering coefficient is to some extent caused by degree correlations in the networks, with high-degree vertices preferentially connecting to low-degree vertices.

7 PPI Networks – Why? Proteins are the executors of the genetic program and rarely act alone. PPI networks support the functional assignment of uncharacterized proteins and the identification of targets for new drugs.

8 PPI Networks - Properties
PPI network models: scale-free (Barabasi & Albert, 1999) and geometric random (Przulj et al., 2004). Tolerant to random errors but fragile against the removal of the most connected nodes (hubs). Modularity and community structure.
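The fragility claim can be illustrated with a small simulation: delete a fraction of proteins either uniformly at random (errors) or in decreasing order of degree (hub attacks), then measure the largest surviving connected component. This is a minimal plain-Python sketch; the toy network and the function names are hypothetical, not taken from the presentation.

```python
import random
from collections import deque

def giant_component_size(adj, removed):
    """Size of the largest connected component once `removed` vertices are deleted."""
    seen = set(removed)  # deleted vertices are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def remove_and_measure(adj, fraction, targeted, seed=0):
    """Delete a fraction of vertices, hubs first (targeted=True) or uniformly at
    random (targeted=False), and return the size of the surviving giant component."""
    n_remove = int(fraction * len(adj))
    if targeted:
        order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    else:
        order = list(adj)
        random.Random(seed).shuffle(order)
    return giant_component_size(adj, order[:n_remove])

# Hypothetical toy network: two linked hubs (0 and 1), each with 10 leaf partners.
adj = {0: {1}, 1: {0}}
for hub, first in ((0, 2), (1, 12)):
    for leaf in range(first, first + 10):
        adj[hub].add(leaf)
        adj[leaf] = {hub}

print("random failure:", remove_and_measure(adj, 0.05, targeted=False))
print("hub attack:    ", remove_and_measure(adj, 0.05, targeted=True))  # 11
```

Removing one of the 22 vertices at random usually hits a leaf and leaves a 21-vertex giant component, whereas removing a hub cuts off all of its leaves at once.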

9 PPI Networks - Models
A.-L. Barabasi and Z. N. Oltvai, Nat. Rev. Genet. (2004)

10 Yeast PPI Network
Nodes: proteins; edges: physical interactions. H. Jeong et al., Nature 411 (2001).

11 PPI Networks - Topology
H. Jeong et al., Nature 411 (2001).

12 PPI Networks - Topology
Origin of the scale-free topology: gene duplication and preferential attachment (Vazquez et al., 2003; Sole et al., 2001; Bhan et al.).
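Gene duplication can be sketched as a duplication-divergence growth process: a randomly chosen protein is copied, the copy inherits each of its parent's interactions only with some probability, and it may also interact with its parent. This is a simplified illustration under assumed parameters (`keep_prob` and `parent_link_prob` are hypothetical), not the exact model of any of the cited papers.

```python
import random

def duplication_divergence(n, keep_prob=0.6, parent_link_prob=0.3, seed=0):
    """Grow an undirected network to n vertices by repeated gene duplication.

    Each step copies a random existing protein; every interaction of the
    original is inherited with probability keep_prob (divergence), and the
    duplicate interacts with its original with probability parent_link_prob.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed network: a single interaction
    while len(adj) < n:
        new = len(adj)
        parent = rng.randrange(new)
        adj[new] = set()
        for partner in list(adj[parent]):
            if rng.random() < keep_prob:
                adj[new].add(partner)
                adj[partner].add(new)
        if rng.random() < parent_link_prob:
            adj[new].add(parent)
            adj[parent].add(new)
    return adj

net = duplication_divergence(2000)
top_degrees = sorted((len(nbrs) for nbrs in net.values()), reverse=True)[:5]
print(len(net), top_degrees)
```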

13 Network Community Structure
The gathering of vertices into groups such that connections within groups are denser than connections between groups (Girvan & Newman, 2002). Community structure is an important property of PPI networks: it delineates functional groups and processes and relates to the transfer of information.
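One common way to quantify "denser within than between" is Newman and Girvan's modularity Q, which compares the fraction of edges inside each group with the fraction expected from the groups' total degrees. Q is not itself mentioned on the slide; the sketch below assumes a partition is given as a hypothetical vertex-to-label mapping.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q of a vertex partition.

    Q = sum over communities c of [ e_c / m - (d_c / 2m)^2 ], where e_c is the
    number of edges inside c, d_c the total degree of c, and m the edge count.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 3))  # 0.357
```

Putting every vertex into a single group gives Q = 0, so positive values indicate more internal edges than the degree-based expectation.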

14 Community Detection
The GN algorithm (Girvan & Newman, 2002): based on edge betweenness; high computational cost; widely adopted, e.g., for metabolic networks (Holme et al., 2003) to find functional units, and for gene networks (Wilkinson & Huberman, 2004) to find related genes.
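The GN idea can be sketched in plain Python: compute edge betweenness (here with Brandes' breadth-first-search algorithm for unweighted graphs), delete the edge with the highest value, recompute, and repeat until the network splits. Recomputing betweenness after every deletion, at O(|V||E|) per pass, is what makes the method expensive. This is a minimal sketch assuming an undirected graph with at least one edge; the function names are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes' algorithm)."""
    bet = {}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Back-propagate pair dependencies from the farthest vertices inward.
        delta = {u: 0.0 for u in order}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                edge = frozenset((u, w))
                bet[edge] = bet.get(edge, 0.0) + share
                delta[u] += share
    # Every unordered vertex pair was counted once from each end.
    return {edge: b / 2 for edge, b in bet.items()}

def connected_components(adj):
    """Vertex sets of the connected components."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        part, stack = {s}, [s]
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    part.add(v)
                    stack.append(v)
        parts.append(part)
    return parts

def girvan_newman_split(adj):
    """One GN step: remove the highest-betweenness edge, recomputing betweenness
    after every removal, until the number of components increases."""
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    target = len(connected_components(adj)) + 1
    while len(connected_components(adj)) < target:
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return connected_components(adj)

# Two triangles joined by the bridge (2, 3): the bridge carries the most shortest paths.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman_split(graph))
```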

15 Network Growing
Genetic regulatory networks (Hashimoto et al., 2004): based on probabilistic Boolean networks. Web (Flake et al., 2002): based on the self-organization of the network structure and a maximum flow method.

16 Graph Theory: a mathematical formalism for modeling real-world phenomena, e.g., the World Wide Web, electronic circuits, collaborations between scientists, co-citations, and biological networks. Global properties: e.g., diameter, clustering, degree distribution. Local properties: vertex density, motifs, and graphlets.

17 Graph Theory - Models: Random Model and Small-world Model
Random Model (Erdos & Renyi): each pair of nodes is connected by an edge independently, with the same probability p. Small-world Model (Watts & Strogatz): small diameters and large clustering coefficients.
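Both models can be generated in a few lines. This sketch uses the standard G(n, p) construction for the random model and the standard ring-lattice-with-rewiring construction for the small-world model; the function names, parameter values, and the convention of counting vertices with fewer than two neighbors as C = 0 are assumptions for illustration.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=0):
    """G(n, p): each vertex pair is linked independently with probability p."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def watts_strogatz(n, k, beta, seed=0):
    """Ring lattice in which every vertex links to its k nearest neighbors
    (k even, k < n); each lattice edge is then rewired with probability beta."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            if v in adj[u] and rng.random() < beta:
                # Move the far end of edge (u, v) to a vertex not yet linked to u.
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj

def average_clustering(adj):
    """Mean C_i over all vertices; vertices with fewer than two neighbors count as 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)

n, k = 500, 10
print("small world:", average_clustering(watts_strogatz(n, k, 0.1)))
print("random:     ", average_clustering(erdos_renyi(n, k / (n - 1))))
```

With the same average degree, the lightly rewired lattice typically keeps a much larger average clustering coefficient than the random graph.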

18 Graph Theory - Models: Scale-free Model and Random Geometric Model
Scale-free Model (A.-L. Barabasi): the degree distribution follows a power law of the form $P(k) \sim k^{-\gamma}$; robustness and fragility; preferential attachment (graph evolution). Random Geometric Model (Przulj et al.): nodes randomly distributed in a geometric space.
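Preferential attachment can be sketched as follows: each new vertex links to m distinct existing vertices chosen with probability proportional to their current degree. The sketch keeps a list in which every vertex appears once per incident edge, so a uniform draw from it is a degree-proportional draw; the seed graph (a complete graph on m + 1 vertices) and the parameter values are assumptions.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=0):
    """Grow a network by preferential attachment: each new vertex links to m
    distinct existing vertices chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 vertices.
    adj = {u: set() for u in range(m + 1)}
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each vertex appears in `stubs` once per incident edge.
    stubs = [u for u in adj for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj

net = barabasi_albert(5000, 2)
degree_counts = Counter(len(nbrs) for nbrs in net.values())
for k in sorted(degree_counts)[:6]:
    print(k, degree_counts[k] / len(net))  # empirical P(k) for the smallest degrees
```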

19 Research Goals: To analyze the topological properties of protein-protein interaction networks across different organisms, different experimental systems, and different confidence levels.

20 Research Goals: What is the community to which a given protein belongs?
It is desirable, and computationally more feasible, to study a community containing a few proteins of interest rather than the entire network.

21 Topological Analysis: Method
Protein-protein interaction networks constructed from different data sets; statistical analysis of their topological properties (SPSS).

22 Topological Analysis: Data Sets
DIP – Database of Interacting Proteins: species-specific PPI data sets for D. melanogaster (fly), S. cerevisiae (yeast), E. coli, C. elegans (worm), H. pylori, H. sapiens (human), and M. musculus (mouse). BIND – Biomolecular Interaction Network Database: fly PPI with assigned confidence scores (Giot, 2003). GRID – General Repository for Interaction Datasets: yeast and fly PPI, including the experimental systems used to obtain the data.

23 Topological Analysis: The experimental-systems-specific set includes:
(1) Fly and yeast PPI networks downloaded from the General Repository for Interaction Datasets (GRID). From the fly data set, we constructed three PPI networks, each representing interactions detected by one of the following experimental systems: Enhancement (Fly-E), Suppression (Fly-S), and Two Hybrid (Fly-TH). From the yeast data set, we constructed PPI networks representing three experimental systems: Affinity Precipitation (Yeast-AP), Synthetic Lethality (Yeast-SL), and Two Hybrid (Yeast-TH). (2) We also constructed a network representing the entire set of protein interactions for each species (Fly and Yeast). The confidence-level-specific set contains the fly data set downloaded from the Biomolecular Interaction Network Database (BIND). We constructed three networks: one with confidence >= 0.5 (Fly50), one with confidence >= 0.3 (Fly30), and a third containing all interactions (Fly00).
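The confidence-level networks amount to thresholding scored interaction records. A minimal sketch, assuming hypothetical (protein_a, protein_b, confidence) records; the BIND file format itself is not modeled here, `None` stands for the Fly00 case with no cutoff, and ignoring self-interactions is an assumption.

```python
def build_network(records, min_confidence=None):
    """Undirected PPI network from (protein_a, protein_b, confidence) records.

    Interactions scoring below min_confidence are dropped, so proteins whose
    interactions are all filtered out disappear from the network as well.
    """
    adj = {}
    for a, b, confidence in records:
        if a == b:
            continue  # assumption: self-interactions are ignored
        if min_confidence is not None and confidence < min_confidence:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def size(adj):
    """(number of proteins, number of interactions)."""
    return len(adj), sum(len(nbrs) for nbrs in adj.values()) // 2

# Hypothetical scored interactions, for illustration only.
records = [("P1", "P2", 0.9), ("P2", "P3", 0.4), ("P3", "P4", 0.1), ("P1", "P3", 0.6)]
for name, cutoff in (("Fly50-like", 0.5), ("Fly30-like", 0.3), ("Fly00-like", None)):
    print(name, size(build_network(records, cutoff)))
```

Because a protein survives only if at least one of its interactions passes the cutoff, raising the threshold removes proteins as well as interactions, as in Table 2.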

24 Topological Analysis: Definitions
Graph $G(V, E)$: $V$ is the vertex set and $E$ the edge set; $|V|$ and $|E|$ are their sizes (in the example figure, $|V| = 4$ and $|E| = 6$). Degree of a vertex (or node): the number of edges connected to the vertex.
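A simple graph with |V| = 4 and |E| = 6 is necessarily the complete graph on four vertices, so every vertex has degree 3. The sketch below, with illustrative vertex names, counts degrees from an edge list.

```python
from collections import Counter
from itertools import combinations

def degrees(edges):
    """Degree of each vertex: the number of edges incident to it."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

V = ["V1", "V2", "V3", "V4"]
E = list(combinations(V, 2))        # every pair linked: 6 edges
deg = degrees(E)
print(len(V), len(E), dict(deg))    # 4 6 {'V1': 3, 'V2': 3, 'V3': 3, 'V4': 3}
print(sum(deg.values()) / len(V))   # average degree <k> = 2|E|/|V| = 3.0
```

Since every edge adds 1 to the degree of each of its two endpoints, the average degree is <k> = 2|E|/|V|; applied to Table 1 this gives, e.g., 2 × 22636 / 7441 ≈ 6.08 for D. melanogaster, matching Table 6.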

25 Topological Analysis: Degree Distribution and Diameter
Degree distribution $P(k)$: the probability that a vertex has degree $k$; a power law has the form $P(k) \sim k^{-\gamma}$. Diameter (length): the length of the shortest path from one vertex to another.
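Both quantities can be computed with breadth-first search. The sketch below estimates P(k) as the fraction of vertices with degree k and averages shortest-path lengths over connected vertex pairs only; how the presentation handles disconnected pairs is not stated, so that choice is an assumption. The maximum over pairs is the diameter in the strict graph-theoretic sense.

```python
from collections import Counter, deque

def degree_distribution(adj):
    """P(k): fraction of vertices whose degree is k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}

def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_lengths(adj):
    """Average and maximum shortest-path length over all connected vertex pairs."""
    total = pairs = longest = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                longest = max(longest, d)
    return total / pairs, longest

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}  # the chain 0-1-2-3
print(degree_distribution(path))  # {1: 0.5, 2: 0.5}
print(path_lengths(path))         # (1.6666666666666667, 3)
```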

26 Topological Analysis: Clustering Coefficient (C) and Vertex Density (D)
Clustering coefficient: $C_i = \frac{2e_i}{k_i(k_i - 1)}$, where $e_i$ is the number of edges between the neighbors of vertex $i$ and $k_i$ is the number of neighbors of $i$; vertex $i$ itself is not included in either count. Vertex density: defined like $C_i$, but with vertex $i$ included.
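The two definitions differ only in whether vertex i is counted. Reading "includes i" as the edge density of the subgraph formed by i and its k_i neighbors (which contains e_i + k_i edges among k_i + 1 vertices) gives $D_i = \frac{2(e_i + k_i)}{(k_i + 1)k_i}$; that formula is an interpretation, not one stated on the slide. A sketch:

```python
from itertools import combinations

def neighbor_links(adj, i):
    """e_i: number of edges among the neighbors of vertex i."""
    return sum(1 for a, b in combinations(adj[i], 2) if b in adj[a])

def clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); None when k_i < 2 (undefined)."""
    k = len(adj[i])
    if k < 2:
        return None
    return 2 * neighbor_links(adj, i) / (k * (k - 1))

def vertex_density(adj, i):
    """Interpretation: edge density of the subgraph formed by i and its k_i
    neighbors, which holds e_i + k_i edges among k_i + 1 vertices."""
    k = len(adj[i])
    if k < 1:
        return None
    return 2 * (neighbor_links(adj, i) + k) / ((k + 1) * k)

# Vertex 0 has three neighbors, two of which (1 and 2) are linked.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
for v in toy:
    print(v, clustering(toy, v), vertex_density(toy, v))
```

Under this reading, D_i is defined for degree-1 vertices (D_i = 1), whereas C_i is not; such vertices can be numerous in sparse networks.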

27 Table 1 Protein interaction networks of different organisms.
| ORGANISM | PROTEINS | INTERACTIONS | COMPONENTS | GIANT COMPONENT |
|---|---|---|---|---|
| D. melanogaster | 7441 | 22636 | 52 | 7330 (98.5%) |
| S. cerevisiae (Core) | 2614 | 6379 | 66 | 2445 (93.5%) |
| E. coli | 1640 | 6658 | 200 | 1396 (85.1%) |
| C. elegans | 2629 | 3970 | 99 | 2386 (90.8%) |
| H. pylori | 702 | 1359 | 9 | 686 (97.7%) |
| H. sapiens | 1059 | 1318 | 119 | 563 (53.2%) |
| M. musculus | 327 | 274 | 79 | 49 (15.0%) |
Table 1 shows the relatively small giant components of H. sapiens and especially M. musculus, meaning that these two networks contain a fairly large number of small, unconnected subgraphs. As expected, in higher-confidence networks the giant component shrinks while the number of unconnected subgraphs increases (Table 2).
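The component counts and giant-component percentages can be computed with a depth-first traversal. A minimal sketch on a hypothetical five-protein network:

```python
def component_sizes(adj):
    """Sizes of the connected components, largest first (iterative DFS)."""
    seen, sizes = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack, size = [s], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def giant_component_summary(adj):
    """(number of components, giant component size, fraction of all proteins)."""
    sizes = component_sizes(adj)
    return len(sizes), sizes[0], sizes[0] / len(adj)

# Hypothetical network: a triangle A-B-C plus a separate pair D-E.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"E"}, "E": {"D"}}
print(giant_component_summary(toy))  # (2, 3, 0.6)
```

For example, M. musculus in Table 1 has a giant component of 49 of 327 proteins, i.e., 15.0%.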

28 Table 2 Protein interaction networks of D. melanogaster – different confidence.
| NETWORK | CONFIDENCE | PROTEINS | INTERACTIONS | COMPONENTS | GIANT COMPONENT |
|---|---|---|---|---|---|
| Fly00 | > 0 | 7064 | 21111 | 68 | 6929 (98.1%) |
| Fly30 | >= 0.3 | 6382 | 9157 | 213 | 5881 (92.1%) |
| Fly50 | >= 0.5 | 4689 | 4877 | 590 | 3068 (65.4%) |

29 Table 3 Protein interaction networks of S. cerevisiae: with confidence.
| NETWORK | PROTEINS | INTERACTIONS | COMPONENTS | GIANT COMPONENT |
|---|---|---|---|---|
| YeastCore | 2614 | 6379 | 66 | 2445 (93.5%) |
| Yeast00 | 4770 | 15199 | 41 | 4687 (98.3%) |

30 Table 4 Protein interaction networks of D. melanogaster: different experimental systems.
| NETWORK | EXP SYSTEMS | PROTEINS | INTERACTIONS | COMPONENTS | GIANT COMPONENT |
|---|---|---|---|---|---|
| Fly | Combined | 7938 | 25827 | 72 | 7793 (98.2%) |
| Fly-E | Enhancement | 1054 | 1819 | 56 | 902 (85.6%) |
| Fly-S | Suppression | 1121 | 2247 | 44 | 1020 (91.0%) |
| Fly-TH | Two Hybrid | 5614 | 17544 | 12 | 5591 (99.6%) |

31 Table 5 Protein interaction networks of S. cerevisiae: different experimental systems.
| NETWORK | EXP SYSTEMS | PROTEINS | INTERACTIONS | COMPONENTS | GIANT COMPONENT |
|---|---|---|---|---|---|
| Yeast | Combined | 4918 | 18119 | 48 | 4824 (98.1%) |
| Yeast-AP | Affinity Precipitation | 2388 | 7405 | 39 | 2292 (96.0%) |
| Yeast-SL | Synthetic Lethality | 1468 | 4773 | 44 | 1343 (91.5%) |
| Yeast-TH | Two Hybrid | 3937 | 6358 | 138 | 3632 (92.3%) |

32 Table 6 Topology of protein interaction networks.
| NETWORK | Kmax | <k> | <l> | <D> | <C> | <Crand> |
|---|---|---|---|---|---|---|
| D. melanogaster | 178 | 6.08 | 4.39 | 0.5840 | 0.0159 | 0.0097 |
| S. cerevisiae (Core) | 111 | 4.88 | 5.00 | 0.6949 | 0.2990 | 0.0103 |
| E. coli | 152 | 8.12 | 3.73 | 0.8092 | 0.5889 | 0.1168 |
| C. elegans | 187 | 3.02 | 4.81 | 0.7889 | 0.0490 | 0.0462 |
| H. pylori | 54 | 3.87 | 4.14 | 0.6624 | 0.0255 | 0.0403 |
| H. sapiens | 33 | 2.49 | 6.80 | 0.7983 | 0.1658 | 0.0098 |
| M. musculus | 12 | 1.68 | 3.57 | 0.8670 | 0.1011 | 0.0062 |

33 Table 6 (cont.) Topology of protein interaction networks.
| NETWORK | Kmax | <k> | <l> | <D> | <C> | <Crand> |
|---|---|---|---|---|---|---|
| S. cerevisiae | 283 | 6.37 | 4.18 | 0.6001 | 0.0928 | 0.0196 |
| S. cerevisiae (Core) | 111 | 4.88 | 5.00 | 0.6949 | 0.2990 | 0.0103 |
| Fly00 | 178 | 5.98 | 4.45 | 0.5939 | 0.0281 | 0.0095 |
| Fly30 | 59 | 2.87 | 7.06 | 0.7227 | 0.0518 | 0.0015 |
| Fly50 | 42 | 2.08 | 9.42 | 0.8010 | 0.0793 | 0.0008 |

34 Table 6 (cont.) Topology of protein interaction networks.
| NETWORK | Kmax | <k> | <l> | <D> | <C> | <Crand> |
|---|---|---|---|---|---|---|
| Fly | 178 | 6.51 | 4.39 | 0.6005 | 0.0675 | 0.0104 |
| Fly-E | 110 | 3.45 | 4.44 | 0.8085 | 0.3441 | 0.0725 |
| Fly-S | 124 | 4.01 | 4.30 | 0.7875 | 0.3459 | 0.0735 |
| Fly-TH | 144 | 6.25 | 4.23 | 0.5870 | 0.0093 | 0.0123 |
| Yeast | 288 | 7.37 | 4.12 | 0.6000 | 0.1538 | 0.0240 |
| Yeast-AP | 69 | 6.20 | 4.43 | 0.6638 | 0.2646 | 0.0163 |
| Yeast-SL | 157 | 6.50 | 3.84 | 0.7150 | 0.2324 | 0.1600 |
| Yeast-TH |  | 3.23 | 4.96 | 0.7362 | 0.0869 | 0.0368 |

35 Observations: Average Topological Properties of the PPI Networks
Across species, all networks exhibit small average degrees and diameters, even though the absolute values differ significantly. Except for C. elegans, whose <C> (0.0490) barely exceeds <Crand> (0.0462), and H. pylori, whose <C> (0.0255) falls below <Crand> (0.0403), the PPI networks of all species have a larger average clustering coefficient than the corresponding random networks, indicating a non-random and hierarchical structure. Networks with higher confidence have larger diameters, larger average clustering coefficients, and smaller average degrees; they shift further away from random structure.

36 Figure 1 Degree Distribution P(k): PPI of Different Organisms

37 Figure 2 Degree Distribution P(k) of yeast: with Confidence.

38 Figure 3 Degree Distribution P(k) of Fly: with Confidence.

39 Figure 4 Degree Distribution P(k): Methodology Difference.

40 Observations: Degree Distribution P(k)
The log-log plot clearly demonstrates the power-law dependence of P(k) on degree k. For our analysis, we use the raw data directly rather than following [4] and applying an exponential cutoff. Without the exponential cutoff, our regression analysis yields power-law exponents γ between 1.31 and 2.76, in fairly good agreement with previously reported results. Using the SPSS software package, we create a scatter plot of residuals versus fitted values for the power-law model (slide 53). The plot clearly shows a pattern in the data that the model does not capture. In other words, the power law has excellent fit statistics (R² between 0.83 and 0.94 in Table 7) but poor residuals, an indication of its inadequacy.
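A minimal sketch of such a fit, assuming ordinary least squares on log-transformed P(k) computed from the raw, unbinned degree counts. The presentation used SPSS and does not give these details, so the log base (which affects neither γ nor R²) and the absence of weighting are assumptions. The function also returns (fitted value, residual) pairs of the kind plotted on slide 53.

```python
import math
from collections import Counter

def fit_power_law(degrees):
    """Least squares on log10 P(k) = log10(a) - gamma * log10(k), using the raw
    (unbinned) degree distribution without an exponential cutoff.

    Returns gamma, R^2, and (fitted value, residual) pairs for a residual plot.
    """
    n = len(degrees)
    counts = Counter(k for k in degrees if k > 0)  # log k is undefined for k = 0
    ks = sorted(counts)
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(counts[k] / n) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return -slope, 1 - ss_res / ss_tot, list(zip(fitted, residuals))

# Synthetic degrees whose counts follow k^-2 exactly (64, 16, 4, 1 vertices).
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
gamma, r2, resid = fit_power_law(degrees)
print(round(gamma, 3), round(r2, 3))  # 2.0 1.0
```

On real degree data, a systematic trend in the residuals (rather than random scatter around zero) is the kind of pattern described above, even when R² is high.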

41 Figure 5 Average clustering coefficient C(k): Different Organisms.

42 Figure 6 Average clustering coefficient C(k): with Confidence.

43 Figure 7 Average clustering coefficient C(k): Methodology Difference.

44 Observations: The Average Clustering Coefficient
Figures 5-7 indicate that while the E. coli and S. cerevisiae PPI networks show a somewhat weak power-law distribution of C(k), the networks of other species do not follow a power law. Different experimental systems and different confidence levels do not seem to change this non-scale-free behavior.

45 Figure 8 Average vertex density D(k): Different Organisms

46 Figure 9 Average vertex density D(k): with Confidence.

47 Figure 10 Average vertex density D(k): Methodology Difference.

48 Observations: The Average Vertex Density
All networks display consistent power-law behavior in the vertex density spectrum D(k).

49 Topological Analysis: Exponents
Degree distribution: $P(k) \sim k^{-\gamma}$; clustering coefficient: $C(k) \sim k^{-\alpha}$; vertex density: $D(k) \sim k^{-\beta}$.
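The exponents α and β are estimated by regressing log C(k) and log D(k) on log k, as for γ. The sketch below only computes the two spectra, assuming C(k) and D(k) are plain averages over all vertices of degree k (the slides do not spell out the averaging) and reading vertex density as $D_i = \frac{2(e_i + k_i)}{(k_i + 1)k_i}$, which is an interpretation of "same as C but includes i".

```python
from collections import defaultdict
from itertools import combinations

def spectra(adj):
    """C(k) and D(k): clustering coefficient and vertex density averaged over
    all vertices of degree k (C needs k >= 2, D needs k >= 1)."""
    c_vals, d_vals = defaultdict(list), defaultdict(list)
    for i, nbrs in adj.items():
        k = len(nbrs)
        e = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])  # e_i
        if k >= 2:
            c_vals[k].append(2 * e / (k * (k - 1)))
        if k >= 1:
            d_vals[k].append(2 * (e + k) / ((k + 1) * k))
    C = {k: sum(v) / len(v) for k, v in sorted(c_vals.items())}
    D = {k: sum(v) / len(v) for k, v in sorted(d_vals.items())}
    return C, D

toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
C, D = spectra(toy)
print(C)
print(D)
```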

50 Table 7 Statistical analysis of the protein interaction networks.
| NETWORK | γ (R²) | α (R²) | β (R²) |
|---|---|---|---|
| D. melanogaster | 1.945 (0.923) | 3.050 (0.311) | 0.836 (0.989) |
| E. coli | 1.355 (0.882) | 0.562 (0.656) | 0.536 (0.756) |
| C. elegans | 1.599 (0.839) | 0.625 (0.362) | 0.833 (0.976) |
| H. pylori | 1.651 (0.899) | 0.495 (0.373) | 0.826 (0.985) |
| H. sapiens | 2.025 (0.931) | 0.657 (0.190)* | 0.626 (0.699) |
| M. musculus | 2.360 (0.931) | 0.598 (0.431)* | 0.689 (0.965) |
| S. cerevisiae (Core) | 1.977 (0.911) | 0.893 (0.721) | 0.759 (0.867) |
* P > 0.05

51 Table 7 (cont.) Statistical analysis of the protein interaction networks.
| NETWORK | γ (R²) | α (R²) | β (R²) |
|---|---|---|---|
| Fly00 | 1.980 (0.930) | 0.382 (0.194) | 0.789 (0.913) |
| Fly30 | 2.540 (0.931) | 0.698 (0.265) | 0.780 (0.918) |
| Fly50 | 2.763 (0.915) | 0.791 (0.375)* | 0.783 (0.920) |
| S. cerevisiae (Core) | 1.977 (0.911) | 0.893 (0.721) | 0.759 (0.867) |
| S. cerevisiae | 1.792 (0.883) | 1.106 (0.525) | 0.894 (0.872) |
* P > 0.05

52 Table 7 (cont.) Statistical analysis of the protein interaction networks.
| NETWORK | γ (R²) | α (R²) | β (R²) |
|---|---|---|---|
| Fly | 1.947 (0.934) | 0.555 (0.334) | 0.758 (0.865) |
| Fly-E | 1.518 (0.858) | 1.020 (0.539) | 0.769 (0.886) |
| Fly-S | 1.527 (0.936) | 0.879 (0.513) | 0.747 (0.893) |
| Fly-TH | 1.912 (0.923) | ND | 0.783 (0.867) |
| Yeast | 1.761 (0.919) | 0.752 (0.326) | 0.728 (0.698) |
| Yeast-AP | 1.819 (0.904) | 0.635 (0.301) | 0.619 (0.664) |
| Yeast-SL | 1.311 (0.830) | 0.650 (0.342) | 0.674 (0.734) |
| Yeast-TH | 1.614 (0.843) | 1.453 (0.664) | 0.947 (0.918) |
* P > 0.05; ND: Not Determined

53 Residuals vs. fitted values for $P(k) \sim k^{-\gamma}$.

54 Discussion Our results confirm that PPI networks have small diameters and small average degrees. All networks we evaluated display a power-law degree distribution; however, further statistical analysis indicates that this model fails to capture certain features of the data. Most of the networks we evaluated also have a larger clustering coefficient than the corresponding random networks, indicating a non-random structure. However, the values of the clustering coefficient vary significantly across species. This may result from the incompleteness and noise of the data, as indicated by the significant differences in the clustering coefficient between networks with different confidence levels. In addition, networks from different experimental systems differ significantly in the clustering coefficient. The average clustering coefficient spectrum C(k) fails to exhibit scale-free behavior in most of the networks evaluated, but power-law behavior appears when C(k) is replaced by the vertex density D(k). We expect this finding to be useful, since vertex density has already been applied in [23].

55 Conclusions & Discussion
PPI networks have small diameters and small average degrees. Most of the PPI networks display a power-law degree distribution; however, further statistical analysis indicates the inadequacy of the power-law model. Most networks have a larger clustering coefficient than the corresponding random networks, indicating a non-random structure.

56 Conclusions & Discussion
Networks from different experimental systems differ significantly in the values of the clustering coefficient. The average clustering coefficient C(k) fails to exhibit power-law behavior in most of the networks. The vertex density distribution D(k) follows a power law in all the networks studied.

