Network Analysis of Protein-Protein Interactions


Network Analysis of Protein-Protein Interactions Xiaohua Tony Hu College of Information Science & Technology Drexel University Philadelphia, PA 19104 http://www.cis.drexel.edu/faculty/thu 215-8950551(O) 215-8952494(F)

Outline: Introduction; Research Goals; Topological Analysis; The algorithm - CommBuilder; Experimental Results; Conclusions & Discussion; Q&A. September 22, 2018

Introduction: Biological Networks; Protein-Protein Interaction (PPI) Networks; Network Community Structure; Community Detection; Network Growing.

Biological Networks [Figure: levels of biological networks. Labels: GENOME; PROTEOME, protein-protein interactions; METABOLISM, bio-chemical reactions, Citrate Cycle]

Biological Networks. Modeling biological systems: genetic networks (gene association and expression); protein networks (protein structure and interactions); metabolic pathways. Challenging: non-trivial and irregular; incomplete and noisy.

Related Work. The “small world” model was first proposed by Watts and Strogatz; it refers to the small average distance between any two vertices in the network. Barabasi and Albert discovered a highly heterogeneous PPI network with scale-free connectivity properties in yeast, and scale-free connectivity has been reported for the PPI networks of S. cerevisiae, H. pylori, C. elegans, and D. melanogaster [11-13]. Thus, the scale-free network model has been well accepted. In PPI networks, not only does the degree distribution exhibit power-law dependence; other topological properties, such as the clustering coefficient, have also been shown to be scale-free. Yook and colleagues observe that the clustering coefficient of S. cerevisiae follows a power law. However, not all research agrees on power-law behavior in all PPI networks. Thomas and colleagues find that the connectivity distribution in a human PPI network does not follow a power law. They argue that the current belief in a power-law distribution may reflect the behavior of a sampled subgraph. From a slightly different angle, Colizza and colleagues evaluate three PPI networks constructed from yeast data sets. Although they observe that the connectivity distribution follows a power law, only one of the three networks exhibits approximate power-law behavior for the clustering coefficient. Soffer and Vazquez find that the power-law dependence of the clustering coefficient is to some extent caused by the degree correlations of the networks, with high-degree vertices preferentially connecting to low-degree vertices.

PPI Networks – Why? Proteins are the executors of the genetic program and rarely act alone. Functional assignment of uncharacterized proteins. Targets of new drugs.

PPI Networks - Properties. PPI network models: scale-free (Barabasi & Albert, 1999); geometric random (Przulj et al, 2004). Tolerant to random errors but fragile against the removal of the most connected nodes (hubs). Modularity and community structure.

PPI Networks - Models [Figure from A.-L. Barabási and Z. N. Oltvai, Nat. Rev. Genet. (2004)]

Yeast PPI Network. Nodes: proteins. Edges: physical interactions. [Figure from H. Jeong et al., Nature 411, 41-42 (2001)]

PPI Networks - Topology [Figure from H. Jeong et al., Nature 411, 41-42 (2001)]

PPI Networks - Topology. Origin of the scale-free topology: gene duplication; preferential attachment (Vazquez et al. 2003; Sole et al. 2001; Bhan et al. 2002).

Network Community Structure. The gathering of vertices into groups such that the connections within groups are denser than those between groups (Girvan & Newman, 2002). An important property of PPI networks: delineation of functional groups/processes; transfer of information.

Community Detection. The GN algorithm (Girvan & Newman, 2002): based on betweenness; high computational cost; well adopted. Metabolic networks (Holme et al, 2003): functional units. Gene networks (Wilkinson & Huberman, 2004): related genes.
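The betweenness-based idea behind the GN algorithm can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: it assumes an unweighted, undirected graph without self-loops stored as a dictionary of neighbor sets, computes edge betweenness with Brandes-style accumulation, and removes the highest-betweenness edge until the graph splits once. The function names and the toy graph are hypothetical. Recomputing betweenness after every removal is what makes the method computationally costly.

```python
from collections import deque

def components(adj):
    """Connected components of an undirected graph {vertex: set(neighbors)}."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted graph."""
    bet = {}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest vertices
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                bet[edge] = bet.get(edge, 0.0) + c
                delta[v] += c
    # Every vertex pair was visited from both endpoints, so halve the totals.
    return {e: b / 2 for e, b in bet.items()}

def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the component count grows."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    n_start = len(components(adj))
    while len(components(adj)) == n_start:
        bet = edge_betweenness(adj)
        if not bet:                      # no edges left to remove
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical toy graph: two triangles joined by the bridge 3-4.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
communities = girvan_newman_split(toy)
print(sorted(sorted(c) for c in communities))  # [[1, 2, 3], [4, 5, 6]]
```

In the toy graph the bridge carries every shortest path between the two triangles, so it is removed first and the two triangles emerge as communities.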

Network Growing. Genetic regulatory networks (Hashimoto et al, 2004): based on probabilistic Boolean networks. Web (Flake et al, 2002): based on the self-organization of the network structure and a maximum flow method.

Graph Theory. A mathematical formalism for modeling real-world phenomena, e.g. the World Wide Web, electronic circuits, collaborations between scientists, co-citations, biological networks, etc. Global properties: e.g. diameter, clustering, degree distribution. Local properties: vertex density, motifs and graphlets.

Graph Theory - Models. Random model (Erdos & Renyi): edges are placed between pairs of nodes at random, each pair with the same probability. Small-world model (Watts & Strogatz): small diameters and large clustering coefficients.

Graph Theory - Models. Scale-free model (A.-L. Barabasi): the degree distribution follows a power law of the form $P(k) \sim k^{-\gamma}$; robustness and fragility; preferential attachment (graph evolution). Random geometric model (Przulj et al): nodes randomly distributed in a geometric space.
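Preferential attachment can be illustrated with a simplified growth sketch. It assumes a Barabasi-Albert-style rule in which each new node attaches to m distinct existing nodes chosen with probability proportional to their current degree. The function name, parameters, and seed are hypothetical choices made for the example, not taken from the talk.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow a graph to n nodes; each new node links to m existing nodes,
    picked with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes so every node has degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in the pool once per incident edge, so a uniform
    # draw from the pool is a degree-proportional draw over nodes.
    pool = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for target in chosen:
            edges.append((new, target))
            pool.extend((new, target))
    return edges

edges = preferential_attachment(200, 2, seed=1)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print(len(edges), max(degree.values()))
```

Early nodes tend to accumulate links and become hubs, which is the "rich get richer" mechanism usually invoked to explain a heavy-tailed degree distribution.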

Research Goals. To analyze the topological properties of protein-protein interaction networks across different organisms, different experimental systems, and different confidence levels.

Research Goals. What is the community to which a given protein belongs? It is desirable, and computationally more feasible, to study a community containing a few proteins of interest.

Topological Analysis: Method. Protein-protein interaction networks constructed from different data sets. Statistical analysis of topological properties (SPSS).

Topological Analysis: Data Sets. DIP (Database of Interacting Proteins): species-specific PPI data sets for D. melanogaster (fly), S. cerevisiae (yeast), E. coli, C. elegans (worm), H. pylori, H. sapiens (human), and M. musculus (mouse). BIND (Biomolecular Interaction Network Database): fly PPI with assigned confidence scores (Giot, 2003). GRID (General Repository for Interaction Datasets): yeast and fly PPI, including the experimental systems used to obtain the data.

Topological Analysis. The experimental-systems-specific set includes fly and yeast PPI networks downloaded from the General Repository for Interaction Datasets (GRID). From the fly data set, we constructed three PPI networks, each representing the interactions detected by one of the following experimental systems: Enhancement (Fly-E), Suppression (Fly-S), and Two Hybrid (Fly-TH). From the yeast data set, we likewise constructed PPI networks representing three experimental systems: Affinity Precipitation (Yeast-AP), Synthetic Lethality (Yeast-SL), and Two Hybrid (Yeast-TH). For each organism we also constructed a network representing the entire set of protein interactions (Fly and Yeast). The confidence-levels-specific set contains the fly data set downloaded from the Biomolecular Interaction Network Database (BIND). From it we constructed three networks: one with confidence >= 0.5 (Fly50), one with confidence >= 0.3 (Fly30), and one containing all interactions (Fly00).

Topological Analysis: Definitions. Graph G(V, E): V is the vertex (node) set and E is the edge set; |V| and |E| are their sizes (in the example figure, |V| = 4 and |E| = 6). Degree: the number of edges connected to a vertex.

Topological Analysis. Degree distribution P(k): the probability that a vertex has degree k. Power law: $P(k) \sim k^{-\gamma}$. Diameter (path length l): the length of the shortest path from one vertex to another; the tables report its average, <l>.
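The two quantities defined above can be computed directly from an adjacency structure. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets and averages the shortest-path length over connected vertex pairs only, which is one common convention and not necessarily the one used for the tables. The helper names and the four-vertex path graph are hypothetical.

```python
from collections import Counter, deque

def degree_distribution(adj):
    """Return {k: P(k)}, the fraction of vertices that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}

def average_path_length(adj):
    """Mean shortest-path length over connected pairs, via BFS from each vertex."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1           # reachable vertices other than s
    return total / pairs if pairs else 0.0

# Hypothetical toy graph: the path 1 - 2 - 3 - 4.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
pk = degree_distribution(path)
avg_l = average_path_length(path)
print(pk, round(avg_l, 4))
```

For the path graph, half of the vertices have degree 1 and half have degree 2, and the six vertex pairs are at distances 1, 2, 3, 1, 2, 1, giving an average of 10/6.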

Topological Analysis. Clustering coefficient (C): $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between the neighbors of vertex i and $k_i$ is the number of neighboring vertices of i; vertex i itself is not counted in either quantity. Vertex density (D): defined in the same way as C, but with vertex i included.
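A small sketch makes the difference between the two measures concrete. The clustering coefficient follows the formula on the slide. For the vertex density, the code assumes one reading of "same as C but includes i": the density of the subgraph induced by i together with its neighbors, i.e. $D_i = 2(e_i + k_i) / ((k_i + 1) k_i)$. That formula, the zero value returned for low-degree vertices, and the toy graph are assumptions of this example.

```python
def edges_among_neighbors(adj, i):
    """Count edges whose two endpoints are both neighbors of vertex i."""
    nbrs = adj[i]
    # Each such edge is seen from both of its endpoints, so halve the count.
    return sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2

def clustering_coefficient(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); taken as 0 when k_i < 2 (a convention)."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    return 2 * edges_among_neighbors(adj, i) / (k * (k - 1))

def vertex_density(adj, i):
    """Assumed form: density of the subgraph on i plus its k_i neighbors,
    which has e_i + k_i edges among k_i + 1 vertices."""
    k = len(adj[i])
    if k == 0:
        return 0.0
    e = edges_among_neighbors(adj, i)
    return 2 * (e + k) / ((k + 1) * k)

# Hypothetical toy graph: triangle a-b-c with a pendant vertex d attached to a.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
for v in sorted(toy):
    print(v, round(clustering_coefficient(toy, v), 3), round(vertex_density(toy, v), 3))
```

Under this reading a degree-1 vertex has C = 0 but D = 1, so D stays well defined for the many low-degree proteins for which C carries no information.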

Table 1. Protein interaction networks of different organisms.

NETWORK               PROTEINS  INTERACTIONS  COMPONENTS  GIANT COMPONENT
D. melanogaster           7441         22636          52  7330 (98.5%)
S. cerevisiae (Core)      2614          6379          66  2445 (93.5%)
E. coli                   1640          6658         200  1396 (85.1%)
C. elegans                2629          3970          99  2386 (90.8%)
H. pylori                  702          1359           9   686 (97.7%)
H. sapiens                1059          1318         119   563 (53.2%)
M. musculus                327           274          79    49 (15.0%)

Table 1 shows the small sizes of the giant components for H. sapiens and especially for M. musculus, meaning that we have a fairly large number of unconnected small subgraphs in these two networks. As one can expect, the size of the giant component decreases in higher confidence networks while the number of unconnected subgraphs increases.
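The component statistics reported in these tables can be reproduced for any edge list with a breadth-first search. The sketch below also applies a confidence threshold before building the network, to show how a stricter cutoff fragments the graph. The scored edge list is made-up toy data, the function names are hypothetical, and counting only proteins that retain at least one interaction is an assumption of this example.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected adjacency sets from (a, b) pairs; self-interactions are skipped."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def connected_components(adj):
    """List of components, each a list of vertices, found by BFS."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def summarize(scored_edges, threshold):
    """Network statistics after keeping interactions with score >= threshold."""
    adj = build_graph((a, b) for a, b, score in scored_edges if score >= threshold)
    comps = connected_components(adj)
    n = len(adj)
    giant = max((len(c) for c in comps), default=0)
    return {
        "proteins": n,
        "interactions": sum(len(nbrs) for nbrs in adj.values()) // 2,
        "components": len(comps),
        "giant": giant,
        "giant_fraction": giant / n if n else 0.0,
    }

# Hypothetical scored interactions forming a chain P1 - P2 - ... - P7.
scored = [("P1", "P2", 0.9), ("P2", "P3", 0.8), ("P3", "P4", 0.35),
          ("P4", "P5", 0.7), ("P5", "P6", 0.1), ("P6", "P7", 0.6)]
for cutoff in (0.0, 0.3, 0.5):
    print(cutoff, summarize(scored, cutoff))
```

On the toy chain, raising the cutoff from 0.0 to 0.3 to 0.5 increases the number of components from 1 to 2 to 3 and shrinks the giant component from 7 to 5 to 3 proteins, the same qualitative trend described for the higher-confidence networks.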

Table 2. Protein interaction networks of D. melanogaster: different confidence levels.

NETWORK  CONFIDENCE  PROTEINS  INTERACTIONS  COMPONENTS  GIANT COMPONENT
Fly00    > 0             7064         21111          68  6929 (98.1%)
Fly30    >= 0.3          6382          9157         213  5881 (92.1%)
Fly50    >= 0.5          4689          4877         590  3068 (65.4%)

Table 3. Protein interaction networks of S. cerevisiae: with confidence.

NETWORK    PROTEINS  INTERACTIONS  COMPONENTS  GIANT COMPONENT
YeastCore      2614          6379          66  2445 (93.5%)
Yeast00        4770         15199          41  4687 (98.3%)

Table 4. Protein interaction networks of D. melanogaster: different experimental systems.

NETWORK  EXP SYSTEMS  PROTEINS  INTERACTIONS  COMPONENTS  GIANT COMPONENT
Fly      Combined         7938         25827          72  7793 (98.2%)
Fly-E    Enhancement      1054          1819          56   902 (85.6%)
Fly-S    Suppression      1121          2247          44  1020 (91.0%)
Fly-TH   Two Hybrid       5614         17544          12  5591 (99.6%)

Table 5. Protein interaction networks of S. cerevisiae: different experimental systems.

NETWORK   EXP SYSTEMS             PROTEINS  INTERACTIONS  COMPONENTS  GIANT COMPONENT
Yeast     Combined                    4918         18119          48  4824 (98.1%)
Yeast-AP  Affinity Precipitation      2388          7405          39  2292 (96.0%)
Yeast-SL  Synthetic Lethality         1468          4773          44  1343 (91.5%)
Yeast-TH  Two Hybrid                  3937          6358         138  3632 (92.3%)

Table 6. Topology of protein interaction networks.

NETWORK               Kmax   <k>   <l>     <D>     <C>  <Crand>
D. melanogaster        178  6.08  4.39  0.5840  0.0159   0.0097
S. cerevisiae (Core)   111  4.88  5.00  0.6949  0.2990   0.0103
E. coli                152  8.12  3.73  0.8092  0.5889   0.1168
C. elegans             187  3.02  4.81  0.7889  0.0490   0.0462
H. pylori               54  3.87  4.14  0.6624  0.0255   0.0403
H. sapiens              33  2.49  6.80  0.7983  0.1658   0.0098
M. musculus             12  1.68  3.57  0.8670  0.1011   0.0062

Table 6 (cont.). Topology of protein interaction networks.

NETWORK               Kmax   <k>   <l>     <D>     <C>  <Crand>
S. cerevisiae          283  6.37  4.18  0.6001  0.0928   0.0196
S. cerevisiae (Core)   111  4.88  5.00  0.6949  0.2990   0.0103
Fly00                  178  5.98  4.45  0.5939  0.0281   0.0095
Fly30                   59  2.87  7.06  0.7227  0.0518   0.0015
Fly50                   42  2.08  9.42  0.8010  0.0793   0.0008

Table 6 (cont.). Topology of protein interaction networks.

NETWORK               Kmax   <k>   <l>     <D>     <C>  <Crand>
Fly                    178  6.51  4.39  0.6005  0.0675   0.0104
Fly-E                  110  3.45  4.44  0.8085  0.3441   0.0725
Fly-S                  124  4.01  4.30  0.7875  0.3459   0.0735
Fly-TH                 144  6.25  4.23  0.5870  0.0093   0.0123
Yeast                  288  7.37  4.12  0.6000  0.1538   0.0240
Yeast-AP                69  6.20  4.43  0.6638  0.2646   0.0163
Yeast-SL               157  6.50  3.84  0.7150  0.2324   0.1600
Yeast-TH                    3.23  4.96  0.7362  0.0869   0.0368

Observations: Average Topological Properties of the PPI Networks. Across species, all networks exhibit small average degrees and small diameters, even though the absolute values differ significantly. Except for C. elegans, the PPI networks of all species have a larger average clustering coefficient than the corresponding random clustering coefficient, indicating a non-random and hierarchical structure within these networks. Networks with higher confidence have larger diameters, a larger average clustering coefficient, and a smaller average degree. They shift further away from a random structure.

Figure 1 Degree Distribution P(k): PPI of Different Organisms

Figure 2 Degree Distribution P(k) of yeast: with Confidence.

Figure 3 Degree Distribution P(k) of Fly: with Confidence.

Figure 4 Degree Distribution P(k): Methodology Difference.

Observations: Degree Distribution P(k). The log-log plots clearly demonstrate the power-law dependence of P(k) on degree k. For our analysis, we chose to use the raw data directly, instead of following [4] in applying an exponential cutoff. Without the exponential cutoff, our regression analysis yields power-law exponents γ between 1.31 and 2.76, in fairly good agreement with previously reported results. Using the SPSS software package, we created a scatter plot of residuals against fitted values for the power-law model. The result clearly indicates a pattern in the data that is not captured by the model. This means that the power-law model has excellent fit statistics but poorly behaved residuals, an indication of its inadequacy.
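The fit-then-check-residuals procedure described above can be sketched without any statistics package. The example assumes ordinary least squares on log10-transformed values, which is a guess at how the regression was set up rather than a record of the SPSS analysis. Both data sets are synthetic: one follows an exact power law, and the other adds an exponential cutoff so that the log-log points are curved.

```python
import math

def fit_power_law(ks, pks):
    """Least-squares line through (log10 k, log10 P(k)).
    Returns (gamma, r_squared, fitted, residuals) for P(k) ~ k^(-gamma)."""
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(p) for p in pks]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return -slope, 1.0 - ss_res / ss_tot, fitted, residuals

ks = list(range(1, 21))

# Synthetic exact power law with exponent 2: the fit recovers it.
exact = [k ** -2.0 for k in ks]
gamma, r2, fitted, residuals = fit_power_law(ks, exact)

# Synthetic power law with an exponential cutoff: curved on log-log axes.
curved = [k ** -1.5 * math.exp(-k / 5.0) for k in ks]
gamma_c, r2_c, fitted_c, residuals_c = fit_power_law(ks, curved)

print(round(gamma, 3), round(r2, 3))
print(round(gamma_c, 3), round(r2_c, 3))
# For the curved data the residuals are not scattered around zero: they are
# negative at both ends of the k range and positive in between.
print([round(r, 2) for r in residuals_c])
```

A straight line can still give a high R2 on curved data, which is why plotting residuals against fitted values is a useful complement to the fit statistics.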

Figure 5 Average clustering coefficient C(k): Different Organisms.

Figure 6 Average clustering coefficient C(k): with Confidence.

Figure 7 Average clustering coefficient C(k): Methodology Difference.

Observations: The Average Clustering Coefficient. The C(k) plots indicate that while the E. coli and S. cerevisiae PPI networks show a somewhat weak power-law distribution, the networks of the other species do not follow a power law. Different experimental systems and different confidence levels do not seem to change this non-scale-free behavior.

Figure 8 Average vertex density D(k): Different Organisms

Figure 9 Average vertex density D(k): with Confidence.

Figure 10 Average vertex density D(k): Methodology Difference.

Observations: The Average Vertex Density. All networks display consistent power-law behavior for the vertex density spectrum.

Topological Analysis: Exponents. Degree distribution: $P(k) \sim k^{-\gamma}$. Clustering coefficient: $C(k) \sim k^{-\alpha}$. Vertex density: $D(k) \sim k^{-\beta}$.

Table 7. Statistical analysis of the protein interaction networks.

NETWORKS              γ (R2)         α (R2)          β (R2)
D. melanogaster       1.945 (0.923)  3.050 (0.311)   0.836 (0.989)
E. coli               1.355 (0.882)  0.562 (0.656)   0.536 (0.756)
C. elegans            1.599 (0.839)  0.625 (0.362)   0.833 (0.976)
H. pylori             1.651 (0.899)  0.495 (0.373)   0.826 (0.985)
H. sapiens            2.025 (0.931)  0.657 (0.190)*  0.626 (0.699)
M. musculus           2.360 (0.931)  0.598 (0.431)*  0.689 (0.965)
S. cerevisiae (Core)  1.977 (0.911)  0.893 (0.721)   0.759 (0.867)

* P > 0.05

Table 7 (cont.). Statistical analysis of the protein interaction networks.

NETWORKS              γ (R2)         α (R2)          β (R2)
Fly00                 1.980 (0.930)  0.382 (0.194)   0.789 (0.913)
Fly30                 2.540 (0.931)  0.698 (0.265)   0.780 (0.918)
Fly50                 2.763 (0.915)  0.791 (0.375)*  0.783 (0.920)
S. cerevisiae (Core)  1.977 (0.911)  0.893 (0.721)   0.759 (0.867)
S. cerevisiae         1.792 (0.883)  1.106 (0.525)   0.894 (0.872)

* P > 0.05

Table 7 (cont.). Statistical analysis of the protein interaction networks.

NETWORKS              γ (R2)         α (R2)          β (R2)
Fly                   1.947 (0.934)  0.555 (0.334)   0.758 (0.865)
Fly-E                 1.518 (0.858)  1.020 (0.539)   0.769 (0.886)
Fly-S                 1.527 (0.936)  0.879 (0.513)   0.747 (0.893)
Fly-TH                1.912 (0.923)  ND              0.783 (0.867)
Yeast                 1.761 (0.919)  0.752 (0.326)   0.728 (0.698)
Yeast-AP              1.819 (0.904)  0.635 (0.301)   0.619 (0.664)
Yeast-SL              1.311 (0.830)  0.650 (0.342)   0.674 (0.734)
Yeast-TH              1.614 (0.843)  1.453 (0.664)   0.947 (0.918)

* P > 0.05
ND: Not Determined

[Figure: Residuals vs. fitted values for $P(k) \sim k^{-\gamma}$.]

Discussion. Our results confirm that PPI networks have small diameters and small average degrees. All networks we evaluated display a power-law degree distribution. However, further statistical analysis indicates that this model is inadequate for capturing certain features of the data. Most of the networks we evaluated also have a larger clustering coefficient than the corresponding random networks, indicating a non-random structure. However, the values of the clustering coefficient vary significantly across species. This may result from the incompleteness and noise of the data, as suggested by the significant differences in the clustering coefficient between networks with different confidence levels. In addition, networks from different experimental systems differ significantly in the clustering coefficient. The spectrum of the average clustering coefficient over degree k fails to exhibit scale-free behavior in most of the networks evaluated. We did not observe a power-law distribution of C(k) over degree k, but power-law behavior appears when we modify C(k) to D(k). We expect this information to be helpful, because vertex density has already been applied in [23].

Conclusions & Discussion. PPI networks have small diameters and small average degrees. Most of the PPI networks display a power-law degree distribution. However, further statistical analysis indicates the inadequacy of the power-law model. Most networks have a larger clustering coefficient than the corresponding random networks, indicating the non-random structure of the networks.

Conclusions & Discussion. Networks from different experimental systems differ significantly in the values of the clustering coefficient. The average clustering coefficient over degree k fails to exhibit power-law behavior in most of the networks. The vertex density distribution follows a power law for all the networks studied.