Download presentation
Presentation is loading. Please wait.
Published byNelson Harrison Modified over 9 years ago
1
CSCE555 Bioinformatics Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: http://www.scigen.org/csce555 University of South Carolina Department of Computer Science and Engineering 2008 www.cse.sc.edu. www.cse.sc.edu
2
In the beginning there was DNA… Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides, NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. NAR 34, D332-334NAR 34, D332-334
3
…then came protein interactions Arabidopsis PPI network E. Coli PPI network Yeast PPI network
4
Comparative Genomics to Comparative Interactomics Evolutionary conservation implies functional relevance ◦ Sequence conservation implies functional conservation ◦ Network conservation implies functional conservation too! What new insights might we gain from network comparisons? (Why should we care?)
5
Network comparisons allow us to: Identify conserved functional modules Query for a module, ala BLAST Predict functions of a module Predict protein functions Validate protein interactions Predict protein interactions Only possible with network comparisons Possible with existing techniques, but improved with network comparisons
6
What is a Protein Interaction Network? Proteins are nodes Interactions are edges Edges may have weights Yeast PPI network H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)
7
The Network Alignment Problem Given k different protein interaction networks belonging to different species, we wish to find conserved sub- networks within these networks Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity)
8
Example Network Alignment Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
9
General Framework For Network Alignment Algorithms Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006 Network construction Scoring function Alignment algorithm
10
Building Co-expression Networks 1.81 1 -.6 -.7 Gene A Gene B Gene C Gene B Gene A Gene C Pearson Correlation =.8 -.7 -.6 Genes Arrays Microarray data Expression Balaji S. Srinivasan
11
Two Algorithms NetworkBLAST (covered today) Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005. Græmlin Flannick et al. Græmlin: General and robust alignment of multiple large interaction networks. Genome Res 16: 1169-1181, 2006.
12
Overview of Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005.
13
Estimation of Interaction Probabilities In the preprocessing step, edges in the network are given a reliability score using a logistic regression model based on three features: 1.Number of times an interaction was observed 2.Pearson correlation coefficient between expression profiles 3.Proteins’ small world clustering coefficient
14
Network Alignment Graphs Construct a Network Alignment Graph to represent the alignment Nodes contain groups of sequence similar proteins from the k organisms Edges represent conserved interactions. An edge between two nodes is present if: 1.One pair of proteins directly interacts, the rest are distance at most 2 away 2.All protein pairs are of distance exactly 2 3.At least max(2, k – 1) protein pairs directly interact Tries to account for interaction deletions
15
Example Network Alignment Graph Nodes a b c a’ b’ c’ a’’ b’’ c’’ a b c a’ b’ c’ a’’ b’’ c’’ Network alignment graph Individual species’ PPI network Species XSpecies YSpecies Z
16
Scoring Function Sharan et al. devise a scoring scheme based on a likelihood model for the fit of a single sub-network to the given structure High scoring subgraphs correspond to structured sub-networks (cliques or pathways) Only network topology is scored, node similarity is not
17
Log Likelihood Ratio Model Measures the likelihood that a subgraph occurs if it is a conserved network vs. that if it were a randomly constructed network Randomly constructed network preserves degree distribution for nodes log Pr(Subgraph occurs | Conserved Network) Pr(Subgraph occurs | Random Network)
18
Log Likelihood Ratio Model (i) in a real subnetwork, each interaction should be present independently with high probability, and (ii) in a random subnetwork, the probability of an interaction between any two proteins depends on their total number of connections in the network.
19
Likelihood Ratio Scoring of a Protein Complex in a Single Species U : a subset of vertices (proteins) in the PPI graph O U : collection of all observations on vertex pairs in U O uv : interaction between proteins u, v observed M s : conserved network model M n : random network (null) model T uv : proteins u, v interact F uv : proteins u, v do not interact β : probability that proteins u, v interact in conserved model p uv : probability that edge u, v exists in a random model Probability of complex being observed in a conserved network model Probability of subgraph being observed in a random network model
20
Likelihood Ratio Scoring of a Protein Complex in a Single Species Hence, log likelihood for a complex occurring in a single species is given by For multiple complexes across different species, it is the sum of the log likelihoods L(A, B, C) = L(A) + L(B) + L(C)
21
Example of Complex Scoring Nodes a b c a’ b’ c’ a’’ b’’ c’’ a b c a’ b’ c’ a’’ b’’ c’’ Conserved complex A in the Network alignment graph Individual species’ PPI network L(A) = L(X1) + L(Y1) + L (Z1) Complex X1 in Species X Complex Y1 in Species Y Complex Z1 in Species Z
22
Alignment algorithm Problem of identifying conserved sub- networks reduces to finding high scoring subgraphs NP-complete problem Heuristic solution: ◦ Greedy extension of high scoring seeds ◦ (Does this sound familiar? BLAST?) ◦ Common to both papers discussed
23
Alignment algorithm 1. Find seeds for each node v in the alignment graph a.Find high scoring paths of 4 nodes by exhaustive search b.Greedily add 3 other nodes one by one, that maximally increase the score of the seed
24
Alignment algorithm 2. Iteratively add or remove nodes to increase the overall score of the node Original seeds are preserved Limit size of discovered subgraphs to 15 nodes Record up to 4 highest scoring subgraphs discovered around each node
25
Alignment algorithm 3. Filter subgraphs with a high degree of overlap Iteratively find high scoring subgraph and remove all highly overlapping ones remaining
26
Results Conserved network regions within yeast (orange ovals), fly (green rectangles) and worm (blue hexagons) PPI networks.
27
Results Prediction of protein function ‘Guilt by association’ If a conserved cluster or path is significantly enriched in a functional annotation Prediction of protein interactions Predictions based on 2 strategies: Evidence that proteins with similar sequences interact Co-occurrence of proteins in the same conserved cluster or path Experimental verification of Yeast interactions using Y2H yielded 40-62% success rate
28
Overview of Fast, scalable, network alignment ◦ Scales linearly in number of networks compared ◦ NetworkBLAST scales exponentially Supports efficient querying of modules Speed-sensitivity control via user defined parameter ◦ Not supported in NetworkBLAST
29
Input to the Algorithm Weighted protein interaction graphs ◦ Weights represent probability that proteins interact ◦ Constructed via network integration algorithm A phylogenetic tree relating the species in the desired alignment ◦ Used for progressive alignment
30
Key Ideas of Graimin Generating An Initial Alignment From The Seed Greedy Seed Extension Phase Progressive alignment technique using the phylogenetic tree
31
Results Functional module identification using network alignment
32
Results Multiple alignment of 10 networks showing possible cell division module Functional annotation using network alignment
33
The Future of Network Comparison Græmlin Græmlin? Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
34
Summary The problem: Network comparison/comparative interactomes NetworkBlast algorithm Brief introduction fo The analogy between sequence comparison and network comparison
35
Reference & Acknowledgements Chuan Sheng Foo Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS. February 8, 2005 | vol. 102 | no. 6 | 1974-1979
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.