Published by Karin Gray. Modified over 9 years ago.
1
Optimal Network Alignment with Graphlet Degree Vectors
Tijana Milenković (Department of Computing, Imperial College London; Department of Computer Science, University of California)
Weng Leong Ng (Department of Computer Science, University of California)
Wayne Hayes (Department of Computer Science, University of California; Department of Mathematics, Imperial College London)
Nataša Pržulj (Department of Computing, Imperial College London)
Cancer Informatics, 2010
Presented by: Lila Shnaiderman
2
Motivation
Recent advances in experimental techniques (yeast two-hybrid assays, mass spectrometry of purified complexes, genome-wide chromatin immunoprecipitation, etc.) are making increasing amounts of biological network data available.
Comparative analyses of biological networks can have as large an impact as comparative genomics on our understanding of biology, evolution, and disease.
Meaningful network comparison across species is therefore one of the foremost problems in evolutionary and systems biology.
3
Background
Subgraph isomorphism problem: does one graph exist as an exact subgraph of another graph?
The problem is NP-complete, so exact network comparison is computationally infeasible for large networks.
Network alignment: the most common network comparison method.
It is a more general problem: find the best way to "fit" one graph into another, even when it is not an exact subgraph.
It is unclear how to guide the alignment process and how to measure the "goodness" of an inexact fit.
Heuristic strategies must therefore be sought.
4
Background – alignment types
Local alignment: used by the majority of existing methods. Matches a small subnetwork of one network to one or more subnetworks of another network. The resulting mappings can be ambiguous.
Global alignment: measures the overall similarity between two networks. Aligns every node in the smaller network to exactly one node in the larger network.
Most existing global methods incorporate some a priori information external to network topology, such as protein sequence similarity in PPI networks.
Best known global network alignment algorithm based solely on network topology: GRAph ALigner (GRAAL), which uses a heuristic search strategy to quickly find approximate alignments.
5
Current solution: H-GRAAL
Hungarian-algorithm-based GRAAL.
More computationally expensive than GRAAL.
Guaranteed to find optimal alignments relative to any fixed, deterministic cost function.
Relies solely and explicitly on a strong and direct measure of network topological similarity.
Applicable to any type of network.
Allows knowledge to be transferred between aligned networks.
6
Graphlet degree vectors (1)
Graphlet: a small connected induced subgraph of a larger network.
[Figure: graphlets G0 to G8 with their node orbits numbered, up to orbit 14]
7
Graphlet degree vectors (2)
Graphlet degree vector (GDV) of node v: for each orbit, counts the number of graphlets that touch v at that orbit (over all graphlets on 2 to 5 nodes).
[Figure: node v in an example network; table of GDV(v) by orbit]
8
Graphlet degree vectors (3)
[Figure: orbits 1 and 2 highlighted; table of GDV(v) by orbit, values not preserved]
9
Graphlet degree vectors (4)
[Figure: node v at orbits 1 and 2; table of GDV(v) by orbit, values not preserved]
10
Graphlet degree vectors (4)
[Figure: node v at orbits 3 and 5, with one count posed as a question ("?"); table of GDV(v) by orbit, values not preserved]
11
Graphlet degree vectors (5)
[Figure: node v at orbits 4 and 5; table of GDV(v) by orbit, values not preserved]
12
Graphlet degree vectors (6)
[Figure: node v at orbits 7 and 8; table of GDV(v) by orbit, values not preserved]
13
Graphlet degree vectors (7)
[Figure: node v at orbits 9, 10 and 11; table of GDV(v) by orbit, values not preserved]
Question: what is the degree of node v (according to the vector)?
GDV(v) is the signature of node v.
There are 73 different orbits across all 2- to 5-node graphlets.
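The counting idea behind a signature can be sketched for the smallest graphlets only (the single edge, the 3-node path and the triangle). This is a minimal illustration, not the 73-orbit counter used for H-GRAAL; the function name `small_graphlet_degrees` and the descriptive orbit names are hypothetical and do not follow the orbit numbering on the slides.

```python
from itertools import combinations


def small_graphlet_degrees(adj, v):
    """Count how often node v sits at each orbit of the 2- and 3-node graphlets.

    adj maps every node to the set of its neighbours (undirected, no self-loops).
    The orbit names are illustrative labels, not the slide's orbit numbers.
    """
    nbrs = adj[v]
    edge = len(nbrs)  # v as an endpoint of an edge: its ordinary degree
    # Triangles through v: pairs of neighbours that are themselves adjacent.
    triangle = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # v in the middle of an induced 3-node path: neighbour pairs that are not adjacent.
    path_middle = edge * (edge - 1) // 2 - triangle
    # v at the end of an induced 3-node path v - a - w, where w is not adjacent to v.
    path_end = sum(1 for a in nbrs for w in adj[a] if w != v and w not in nbrs)
    return {"edge": edge, "path_end": path_end,
            "path_middle": path_middle, "triangle": triangle}


# Hypothetical example network: a triangle 0-1-2 with a pendant node 3 attached to 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for node in example:
    print(node, small_graphlet_degrees(example, node))
```

The "edge" entry is the ordinary node degree, which is how the degree can be read off a signature.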
14
Degree Vector - Signature
Many real-world networks have a small-world nature.
The graphlet degree vector is therefore an effective measure: it looks at a network distance of up to 4 around a node and so captures a large portion of the network topology.
Comparing two signatures thus gives a highly constraining measure of local topological similarity between nodes.
15
Signature similarity
For a node u ∈ G, u_i is the ith coordinate of its signature vector.
Distance D_i(u,v) between the ith orbits of nodes u and v: [formula not preserved in the transcript]
w_i is the weight of orbit i. It accounts for dependencies between orbits: higher weights are assigned to orbits that are not affected by many other orbits.
Questions: Why log? Why "+1"?
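The slide's formula did not survive extraction. As a hedged reconstruction, assuming the slide used the per-orbit distance from the graphlet-signature literature that GRAAL and H-GRAAL build on, it would read:

```latex
D_i(u,v) \;=\; w_i \times
  \frac{\left|\log(u_i + 1) - \log(v_i + 1)\right|}
       {\log\!\left(\max\{u_i, v_i\} + 2\right)}
```

Under this form, the logarithm damps orbit counts that can differ by orders of magnitude, and the "+1" keeps the logarithm finite when a count is zero. The "+2" in the denominator keeps each term strictly below w_i.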
16
Distance and Similarity
Total distance: D(u,v) lies in [0, 1); 0 means that the signatures of u and v are identical.
Similarity: S(u,v) = 1 − D(u,v)
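A small sketch of how distance and similarity could be computed. It assumes the per-orbit term w_i × |log(u_i + 1) − log(v_i + 1)| / log(max{u_i, v_i} + 2) and a total distance equal to the sum of these terms divided by the sum of the weights; neither formula is preserved in the transcript, so both are assumptions. The uniform weights and the two short signatures are hypothetical placeholders.

```python
import math


def signature_distance(u, v, weights):
    """Weighted, normalised distance between two signatures (assumed formula)."""
    total = 0.0
    for ui, vi, wi in zip(u, v, weights):
        # +1 avoids log(0); the +2 keeps every term strictly below its weight.
        total += wi * abs(math.log(ui + 1) - math.log(vi + 1)) / math.log(max(ui, vi) + 2)
    return total / sum(weights)


def signature_similarity(u, v, weights):
    """Similarity as defined on the slide: S(u,v) = 1 - D(u,v)."""
    return 1.0 - signature_distance(u, v, weights)


# Hypothetical 4-orbit signatures with uniform weights (placeholders only).
sig_u = [3, 0, 2, 1]
sig_v = [1, 2, 0, 0]
w = [1.0, 1.0, 1.0, 1.0]
print(signature_distance(sig_u, sig_u, w))   # identical signatures give distance 0
print(signature_similarity(sig_u, sig_v, w))
```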
17
H-GRAAL algorithm: definitions
G1 and G2 are networks with |V(G1)| < |V(G2)|.
Alignment of G1 to G2: a set of ordered pairs (u,v), with u ∈ V(G1) and v ∈ V(G2), such that no two ordered pairs share the same G1-node or the same G2-node.
Each pair is called an aligned pair.
Maximum alignment: every G1-node is in some aligned pair.
From now on: alignment = maximum alignment.
18
H-GRAAL algorithm
H-GRAAL: Hungarian-algorithm-based GRAph ALigner.
Produces an alignment of minimum total cost between two networks.
Total cost: summed over all aligned pairs.
Aligned pair cost: based on signature similarity.
The cost of aligning u and v favors alignment of the densest parts of the networks: it is reduced as the degrees of both nodes increase, because higher-degree nodes with similar signatures provide a tighter constraint.
α ∈ [0, 1] weighs the contribution of the signature similarity between u and v to the cost function.
1 − α weighs the contribution of the node degrees.
19
Alignment Cost
Cost = 0: a pair of topologically identical nodes u and v.
Cost close to 2: a pair of topologically very different nodes.
Any problem with this formula?
T(u,v), the degree-based term, is very low for most node pairs: there are only a few hubs (highly linked nodes), so max_deg(G1) and max_deg(G2) are much larger than deg(u) and deg(v) for most nodes.
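The remark about T(u,v) can be illustrated numerically. The transcript does not preserve the definition of T; the sketch below assumes the degree term used in the GRAAL family of cost functions, T(u,v) = (deg(u) + deg(v)) / (max_deg(G1) + max_deg(G2)). The function name and all degree values are hypothetical.

```python
def degree_term(deg_u, deg_v, max_deg_g1, max_deg_g2):
    """Assumed degree term: sum of the two degrees over the sum of the maximum degrees."""
    return (deg_u + deg_v) / (max_deg_g1 + max_deg_g2)


# Hypothetical numbers: two typical low-degree proteins in networks whose hubs
# have degrees 200 and 300.
typical = degree_term(3, 4, 200, 300)
hubs = degree_term(200, 300, 200, 300)
print(typical)  # small, because the hubs dominate the denominator
print(hubs)     # only a pair of top hubs reaches the maximum value
```

Under this assumption, only pairs that involve hubs get a noticeable contribution from the degree term, which is the issue the slide points at.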
20
Hungarian Algorithm
Solves the assignment problem in polynomial time.
Build a bipartite graph whose two sides are V(G1) and V(G2).
Each edge (u,v) from V(G1) to V(G2) is labeled with the node alignment cost.
Find a minimum-cost matching that covers every node of V(G1).
More than one optimal alignment is possible: the particular alignment found depends heavily on implementation details of the underlying Hungarian algorithm, for example the order in which nodes are presented to the algorithm.
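The matching step can be sketched with a compact textbook version of the Hungarian algorithm (the potentials-based O(n²m) formulation) for a cost matrix with at most as many rows as columns. This is an illustrative implementation, not H-GRAAL's own code; the function name `hungarian` and the example costs are hypothetical, and finite costs are assumed.

```python
def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column.

    cost is an n x m list of lists with n <= m and finite entries.
    Returns (match, total) where match[i] is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j, 0 if free
    way = [0] * (m + 1)   # back-pointers along the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free column: augment
                break
        while j0 != 0:          # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    return match, sum(cost[i][match[i]] for i in range(n))


# Hypothetical alignment costs: 2 nodes of G1 (rows) against 3 nodes of G2 (columns).
match, total = hungarian([[4, 1, 3], [2, 3, 5]])
print(match, total)
```

When several assignments share the minimum cost, which one is returned depends on details such as row order, matching the slide's remark about implementation dependence.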
21
Finding Few Optimal Alignments
Goal: learn about all possible optimal matchings.
Make H-GRAAL produce more alignments, starting from an optimal alignment A0:
"Remove" (u,v): raise the alignment cost of a node pair (u,v) in A0 to +∞.
Run H-GRAAL again.
If the alignment found has a higher cost than A0, "remove" a different pair instead.
If no alignment of optimal cost has been found after trying to "remove" every pair, no more optimal alignments exist.
This process is too expensive: O(|V(G1)|^3 × |E(G1)|).
A faster variant based on the dynamic Hungarian algorithm exists: O(|V(G1)|^2 × |E(G1)|).
My remark: still very slow (can take months…).
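The "remove and rerun" loop can be sketched on a toy instance. To stay short, the sketch replaces the Hungarian solver with a brute-force search over all assignments, which is only feasible for a handful of nodes; the names `best_alignment` and `alternative_optima` are hypothetical, and integer costs are assumed so that costs can be compared exactly.

```python
import math
from itertools import permutations


def best_alignment(cost):
    """Brute-force stand-in for the solver: try every injective row-to-column map."""
    n, m = len(cost), len(cost[0])

    def total(cols):
        return sum(cost[i][c] for i, c in enumerate(cols))

    best = min(permutations(range(m), n), key=total)
    return best, total(best)


def alternative_optima(cost):
    """Block each pair of A0 in turn and keep reruns that still reach the optimum."""
    a0, opt = best_alignment(cost)
    found = {a0}
    for i, j in enumerate(a0):
        blocked = [row[:] for row in cost]
        blocked[i][j] = math.inf          # "remove" the pair (i, j)
        alt, alt_cost = best_alignment(blocked)
        if alt_cost == opt:               # another optimal alignment avoids (i, j)
            found.add(alt)
    return opt, found


# Hypothetical cost matrix in which every assignment is optimal.
print(alternative_optima([[1, 1], [1, 1]]))
# Hypothetical cost matrix with a unique optimal alignment.
print(alternative_optima([[4, 1, 3], [2, 3, 5]]))
```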
22
Few Optimal Alignments algorithm
Optimizing aligned pair: a pair that appears in at least one optimal alignment.
The set of optimizing pairs can be computed in at worst O(n^4) time, and the computation is easily parallelized.
My remark: too slow…
23
Few Optimal Alignments - Analysis
Significance of an aligned pair (u,v): judged by the number of optimizing pairs that u takes part in.
If (u,v) is the only optimizing pair for u, then every optimal alignment contains (u,v), i.e., (u,v) is highly significant.
Core alignment: the set of all such pairs.
A large core alignment means a stable alignment.
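Optimizing pairs and the core alignment can be illustrated on a tiny cost matrix. The sketch enumerates every assignment by brute force, which is not the O(n^4) method mentioned in the talk and only works for very small inputs; the function names and the cost matrix are hypothetical, and integer costs are assumed.

```python
from collections import defaultdict
from itertools import permutations


def optimizing_pairs(cost):
    """Map each G1-node u to the set of G2-nodes v such that (u, v) is an optimizing pair."""
    n, m = len(cost), len(cost[0])
    totals = {cols: sum(cost[i][c] for i, c in enumerate(cols))
              for cols in permutations(range(m), n)}
    opt = min(totals.values())
    partners = defaultdict(set)
    for cols, t in totals.items():
        if t == opt:                       # this assignment is an optimal alignment
            for u, v in enumerate(cols):
                partners[u].add(v)
    return dict(partners)


def core_alignment(partners):
    """Pairs (u, v) where v is the only optimizing partner of u."""
    return {(u, next(iter(vs))) for u, vs in partners.items() if len(vs) == 1}


# Hypothetical costs: node 0 has one clear partner, nodes 1 and 2 are interchangeable.
cost = [[0, 5, 5],
        [5, 1, 1],
        [5, 1, 1]]
partners = optimizing_pairs(cost)
print(partners)
print(core_alignment(partners))
```

In this toy case only node 0 contributes to the core, because nodes 1 and 2 each have two optimizing pairs.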
24
Measures of alignment quality (1)
Edge correctness (EC): percentage of edges in one graph that are aligned to edges in the other graph.
The following two measures require the "true alignment" to be known:
Node correctness (NC): percentage of nodes in one network that are correctly aligned to nodes in the other network.
Interaction correctness (IC): percentage of interactions that are aligned correctly.
IC is stricter than EC: EC does not require that the alignment partners are the correct ones.
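The three measures can be sketched as follows. The functions return fractions rather than percentages, and IC is interpreted here as "an edge is aligned to an edge and both endpoints have their true partners", which is an assumption about the slide's wording. All names and the toy networks are hypothetical.

```python
def edge_correctness(edges1, edges2, f):
    """Fraction of G1 edges whose aligned endpoints form an edge in G2."""
    e2 = {frozenset(e) for e in edges2}
    hits = sum(1 for a, b in edges1 if frozenset((f[a], f[b])) in e2)
    return hits / len(edges1)


def node_correctness(f, truth):
    """Fraction of G1 nodes aligned to their true partner."""
    return sum(1 for u in f if f[u] == truth[u]) / len(f)


def interaction_correctness(edges1, edges2, f, truth):
    """Fraction of G1 edges aligned to a G2 edge with both endpoints correct (assumed reading)."""
    e2 = {frozenset(e) for e in edges2}
    hits = sum(1 for a, b in edges1
               if f[a] == truth[a] and f[b] == truth[b]
               and frozenset((f[a], f[b])) in e2)
    return hits / len(edges1)


# Hypothetical example: a path a-b-c aligned "backwards" onto the path x-y-z-w.
edges1 = [("a", "b"), ("b", "c")]
edges2 = [("x", "y"), ("y", "z"), ("z", "w")]
truth = {"a": "x", "b": "y", "c": "z"}
f = {"a": "z", "b": "y", "c": "x"}
print(edge_correctness(edges1, edges2, f))                # every edge lands on an edge
print(node_correctness(f, truth))                         # only b is placed correctly
print(interaction_correctness(edges1, edges2, f, truth))  # no edge is fully correct
```

The example shows why IC is stricter than EC: the reversed alignment preserves every edge but puts almost every node on the wrong partner.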
25
Measures of alignment quality (2)
Usually the "true alignment" is not known, so only EC can be measured.
EC alone is not enough: two alignments can have similar ECs while one is "good" and the other is "bad".
To uncover regions of similar topology, the aligned edges must cluster together and form large, dense connected subgraphs.
Common connected subgraph (CCS): a connected subgraph that appears in both networks.
A good alignment has large and dense CCSs as well as a large EC.
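One way to extract the largest CCS from an alignment is to keep only the G1 edges that are aligned to G2 edges and then take the largest connected component of what remains. The sketch below follows that idea under the assumption that "largest" means most nodes; the function name `largest_ccs` and the toy networks are hypothetical.

```python
from collections import defaultdict, deque


def largest_ccs(edges1, edges2, f):
    """Return (nodes, edges) of the largest connected component of aligned G1 edges."""
    e2 = {frozenset(e) for e in edges2}
    adj = defaultdict(set)
    for a, b in edges1:
        if frozenset((f[a], f[b])) in e2:   # edge (a, b) is conserved by the alignment
            adj[a].add(b)
            adj[b].add(a)
    best, seen = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:                         # breadth-first search over conserved edges
            x = queue.popleft()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    queue.append(y)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    edges = {frozenset((a, b)) for a in best for b in adj[a]}
    return best, edges


# Hypothetical example: the edge (4, 5) is not conserved, so only the path 1-2-3 survives.
edges1 = [(1, 2), (2, 3), (4, 5)]
edges2 = [("a", "b"), ("b", "c"), ("d", "e")]
f = {1: "a", 2: "b", 3: "c", 4: "d", 5: "x"}
nodes, edges = largest_ccs(edges1, edges2, f)
print(sorted(nodes), len(edges))
```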
26
Statistical Significance
Random alignment of real-world networks: the probability of obtaining a given or better EC at random.
Null model of random alignment: a random mapping g: E1 → V2 × V2.
n1 = |V1|, n2 = |V2|, m1 = |E1|, and m2 = |E2|.
p = n2 (n2 − 1)/2: the number of node pairs in G2.
EC = x%: the edge correctness of the given alignment.
k = [m1 × x]: the number of edges of G1 aligned to edges of G2.
P: the probability of successfully aligning k or more edges by chance (the tail of the hypergeometric distribution). [Formula not preserved in the transcript]
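The tail probability can be sketched from the definitions above, assuming the standard hypergeometric form: m1 edges of G1 are placed on distinct node pairs of G2, m2 of the p node pairs are edges, and P sums the probabilities of i = k, …, min(m1, m2) hits. Since the slide's formula is not preserved, this form is an assumption; the function name `ec_p_value` and the example sizes are hypothetical.

```python
from math import comb


def ec_p_value(n2, m1, m2, k):
    """P(at least k of the m1 G1-edges land on one of the m2 G2-edges) under the null model.

    Assumes m1 <= p and m2 <= p, where p = n2 * (n2 - 1) / 2 is the number of node pairs in G2.
    """
    p = n2 * (n2 - 1) // 2
    upper = min(m1, m2)
    # Hypergeometric tail: choose i of the m2 edges and m1 - i of the p - m2 non-edges.
    favourable = sum(comb(m2, i) * comb(p - m2, m1 - i) for i in range(k, upper + 1))
    return favourable / comb(p, m1)


# Hypothetical toy sizes: G2 has 4 nodes (6 node pairs) and 3 edges, G1 has 2 edges.
print(ec_p_value(n2=4, m1=2, m2=3, k=1))  # chance that at least one edge is conserved
print(ec_p_value(n2=4, m1=2, m2=3, k=2))  # chance that both edges are conserved
```

Exact integer arithmetic via `math.comb` avoids rounding in the binomial coefficients; for real PPI networks the sums become large, so a log-space or library implementation may be preferable.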
27
More Statistical Significance Metrics
H-GRAAL's alignment of random model networks: checks the significance of an alignment compared with alignments of random networks. Align two PPI networks with each other, align them with random networks, and compare the results.
Biological validation: find the number of aligned protein pairs sharing a Gene Ontology (GO) term and compute its statistical significance.
Significance of functional enrichments: align metabolic networks of different species and generate phylogenetic trees based on H-GRAAL's ECs.
28
Results (1)
H-GRAAL produces better alignments than GRAAL for all values of α.
Using only degrees (α = 0) gives bad results, so graphlet-based signatures are far more valuable than a measure based on degree alone.
29
Results (2)
The largest common connected subgraph in the alignment of the yeast and human PPI networks consists of 1,290 interactions amongst 317 proteins.
This subgraph appears, in its entirety, in the PPI networks of both species.
30
Results (3)
Statistics of H-GRAAL's core yeast-human alignment for α = 0.5: the percentage of yeast proteins, out of 2,390, that participate in n "optimizing pairs".
Shows the quality of H-GRAAL!
31
Results (4)
Comparison of the phylogenetic trees for protists and fungi.
H-GRAAL's and GRAAL's trees are slightly different from the sequence-based one.
Sequence-based trees are built from multiple alignments of gene sequences or from whole-genome alignments.
32
Results (5)
Multiple alignments have a few problems:
They can be misleading due to gene rearrangements, inversions, transpositions, and translocations (at the substring level).
Different species might have an unequal number of genes, or genomes of vastly different lengths.
Whole-genome alignments can be misleading because of noncontiguous copies of a gene or non-decisive gene order.
The trees are built incrementally from smaller pieces that are "patched" together probabilistically, so probabilistic errors are expected.
H-GRAAL's and GRAAL's trees have none of these problems, but they are affected by noise in PPI networks and by the incompleteness of those networks.
There is no reason to believe that the sequence-based tree or GRAAL's one should a priori be considered the correct one.
33
Conclusions
Presented the H-GRAAL algorithm for global alignment between networks.
Presented different statistics to evaluate the quality of an alignment.
Experimented with different PPI networks, and with networks other than PPI.
Showed that H-GRAAL produces better alignments than GRAAL, the best previously known global alignment algorithm based solely on network topology.
H-GRAAL can have a large influence on research into biological networks.
34
Thank you for your attention!