Published by Malcolm Peake. Modified over 9 years ago.
1
School of Computer Science, Carnegie Mellon University; Duke University. DeltaCon: A Principled Massive-Graph Similarity Function. Danai Koutra, Joshua T. Vogelstein, Christos Faloutsos. SDM, 2-5 May 2013, Austin, Texas, USA.
2
CMU Duke. Problem Definition: Graph Similarity. Given: (i) two graphs, G_A and G_B, with the same nodes and different edge sets; (ii) the node correspondence. Find: a similarity score s ∈ [0, 1]. © Danai Koutra (CMU) - SDM'13
4
Motivation (1). 1. Classification: different brain wiring? 2. Discontinuity detection: Day 1, Day 2, Day 3, Day 4, Day 5.
5
Motivation (2). 3. Behavioral patterns: FB message graph vs. wall-to-wall network. 4. Intrusion detection.
6
Problem: Graph Similarity. Is there any obvious solution?
7
One Solution: Edge Overlap (EO), the number of edges common to G_A and G_B (normalized or not).
8
… but consider the "barbell" graph: EO(B10, mB10) == EO(B10, mmB10). [Figure: G_A compared with G_B, and G_A compared with G_B'.]
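The weakness of edge overlap can be sketched in a few lines. The toy graph below (two triangles joined by one bridge edge) and the helper name `edge_overlap` are illustrative assumptions, not the B10 graphs from the slide: removing the bridge disconnects the graph, removing a triangle edge does not, yet the unnormalized overlap is the same in both cases.

```python
def edge_overlap(edges_a, edges_b):
    """Count undirected edges present in both graphs (unnormalized EO)."""
    as_sets = lambda edges: {frozenset(e) for e in edges}
    return len(as_sets(edges_a) & as_sets(edges_b))

# Hypothetical toy "barbell": triangles {0,1,2} and {3,4,5} joined by the bridge (2, 3).
g_a = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

# G_B: the bridge is removed, so the graph falls apart into two components.
g_b = [e for e in g_a if e != (2, 3)]

# G_B': an edge inside a triangle is removed, so the graph stays connected.
g_b_prime = [e for e in g_a if e != (0, 1)]

eo_bridge = edge_overlap(g_a, g_b)
eo_inner = edge_overlap(g_a, g_b_prime)
print(eo_bridge, eo_inner)  # both overlaps are equal
```

Edge overlap only counts edges, so it cannot tell a change that breaks connectivity from one that barely affects it.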
9
Contributions. Theory: axioms; desired properties. Practice: the DeltaCon (Delta Connectivity) algorithm; real-world applications; experiments on synthetic & real graphs.
10
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments; Related Work; Conclusions.
11
Intuition (1). STEP 1: Compute the pairwise node influence matrices, S_A and S_B, for G_A and G_B. [Figure: G_A, G_B and the matrices S_A, S_B.]
13
Intuition (2). STEP 2: Find the similarity between S_A and S_B, e.g. sim(S_A, S_B) = 0.3. [Figure: the matrices S_A and S_B.]
15
… many similarity functions can be defined … but what properties should a good similarity function have?
16
Axioms. A1. Identity property: sim(G_A, G_A) = 1. A2. Symmetric property: sim(G_A, G_B) = sim(G_B, G_A). A3. Zero property: sim( , ) = 0 [the two graphs were shown as pictures on the slide].
18
Desired Properties (1). Intuitiveness: P1. Edge Importance; P2. Weight Awareness; P3. Edge-"Submodularity"; P4. Focus Awareness. Scalability.
19
Desired Properties (2). P1. Edge Importance: the creation of disconnected components matters more than small connectivity changes.
20
Desired Properties (3). P2. Weight Awareness: the bigger the edge weight, the more the edge change matters. [Figure: removing an edge with w = 5 vs. an edge with w = 1.]
21
Desired Properties (4). P3. Edge-"Submodularity" ("diminishing returns"): the sparser the graphs, the more important a "fixed" change is. [Figure: two pairs of graphs G_A, G_B with n = 5.]
22
Desired Properties (5). P4. Focus Awareness: targeted changes are more important than random changes of the same extent. [Figure: G_A; targeted G_B'; random G_B.]
23
How do state-of-the-art methods fare?
Metric                            P1 (edge)  P2 (weight)  P3 (returns)  P4 (focus)
Vertex/Edge Overlap               ✗          ✗            ✗             ?
Graph Edit Distance (XOR)         ✗          ✗            ✗             ?
Signature Similarity              ✗          ✔            ✗             ?
λ-distance (adjacency matrix)     ✗          ✔            ✗             ?
λ-distance (graph Laplacian)      ✗          ✔            ✗             ?
λ-distance (normalized Laplacian) ✗          ✔            ✗             ?
DeltaCon0                         ✔          ✔            ✔             ✔
DeltaCon                          ✔          ✔            ✔             ✔
24
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Experiments; Applications; Related Work; Conclusions.
25
Proposed algorithm: DeltaCon0 (base algorithm). ① Find the pairwise node influence matrices, S_A and S_B.
26
STEP 1: How to compute node influence? A1: PageRank. A2: Personalized Random Walk with Restart (RWR). A3: Lazy RWR. A4: "Electrical network analogy" (resistances). A5: Belief Propagation (FaBP). …
27
STEP 1: Intuition of BP (background). An iterative message-based method. [Figure: iteration 1 and iteration 2 of message passing; e.g., CS person.]
30
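STEP 1: Fast BP (1) (background). For a 3-node example: [I + ε²D − εA] s_i = e_i, where I is the identity matrix, D = diag(d1, d2, d3) is the degree matrix, A is the adjacency matrix, ε is the strength of influence between neighbors, s_i (the unknown) is the final influence from node i, and e_i has a 1 in the i-th row and 0 elsewhere. The system is similar to RWR.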
31
STEP 1: Fast BP (2). Either solve [I + ε²D − εA] s_i = e_i for the i-th row, OR compute the whole pairwise influence matrix at once, S = [I + ε²D − εA]^(-1). Illustrative pairwise influence matrix: S = [1 0.2 0.1; 0.3 1 0.2; 0 0.5 1].
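As a minimal sketch of this step, the pairwise influence matrix can be computed with NumPy by inverting the FaBP system matrix. The function name `fabp_influence` and the default ε = 1 / (1 + max degree) are assumptions made for illustration; the dense inverse is only sensible for small graphs.

```python
import numpy as np

def fabp_influence(adj, eps=None):
    """Pairwise influence S = [I + eps^2 D - eps A]^(-1) for a small dense graph."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    degrees = adj.sum(axis=1)
    if eps is None:
        # Assumed default: a small eps tied to the largest degree.
        eps = 1.0 / (1.0 + degrees.max())
    system = np.eye(n) + eps**2 * np.diag(degrees) - eps * adj
    return np.linalg.inv(system)

# Path graph 0 - 1 - 2: node 0 should influence its neighbor 1 more than node 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
S = fabp_influence(path)
print(np.round(S, 3))
```

For an undirected graph the result is symmetric, and the influence of node 0 on its direct neighbor exceeds its influence on the node two hops away.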
33
STEP 1: Why FaBP? 1) Sound theoretical background (MLE on marginals). 2) Fast: linear on the edges. 3) Attenuating neighboring influence (intuition): for small ε, with 0 < ε < 1, 1-hop influence outweighs 2-hop influence, and so on: ε > ε² > …
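The attenuation argument can be written out as a short derivation. This is a sketch that assumes the FaBP matrix form S = [I + ε²D − εA]^(-1); the slide's own formula was an image and is not preserved.

```latex
S = \left[ I + \epsilon^2 D - \epsilon A \right]^{-1}
  \approx \left[ I - \epsilon A \right]^{-1}
  = I + \epsilon A + \epsilon^2 A^2 + \epsilon^3 A^3 + \dots
```

The first step drops the ε²D term, which is of second order for small ε; the second is the Neumann series, valid when the spectral radius of εA is below 1. Since the (i, j) entry of A^k counts the k-hop walks from i to j, each extra hop is discounted by another factor of ε.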
35
Proposed algorithm: DeltaCon0 (base algorithm). ① Apply FaBP to find the pairwise influence matrices, S_A and S_B. ② Find the distance d(S_A, S_B), the Matusita distance. ③ Find the similarity, sim(S_A, S_B).
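The three steps can be sketched end to end as follows. The slide's formulas were images, so this sketch assumes the root Euclidean (Matusita) distance d = sqrt(Σ_ij (√s_A,ij − √s_B,ij)²) and the mapping sim = 1 / (1 + d); the function names and the shared ε = 1 / (1 + max degree) are illustrative assumptions as well.

```python
import numpy as np

def influence(adj, eps):
    """Step 1: FaBP pairwise influence, S = [I + eps^2 D - eps A]^(-1)."""
    n = adj.shape[0]
    degrees = adj.sum(axis=1)
    return np.linalg.inv(np.eye(n) + eps**2 * np.diag(degrees) - eps * adj)

def deltacon0(adj_a, adj_b):
    adj_a = np.asarray(adj_a, dtype=float)
    adj_b = np.asarray(adj_b, dtype=float)
    # Assumed choice: one eps for both graphs, based on the largest degree.
    eps = 1.0 / (1.0 + max(adj_a.sum(axis=1).max(), adj_b.sum(axis=1).max()))
    s_a, s_b = influence(adj_a, eps), influence(adj_b, eps)
    # Step 2: Matusita distance; clip guards against tiny negative round-off.
    diff = np.sqrt(np.clip(s_a, 0, None)) - np.sqrt(np.clip(s_b, 0, None))
    distance = np.sqrt(np.sum(diff**2))
    # Step 3: map the distance in [0, inf) to a similarity in (0, 1].
    return 1.0 / (1.0 + distance)

triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
wedge = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # triangle minus edge (0, 2)

sim_same = deltacon0(triangle, triangle)
sim_ab = deltacon0(triangle, wedge)
sim_ba = deltacon0(wedge, triangle)
print(sim_same, sim_ab)
```

Under these assumptions the identity and symmetry axioms hold by construction: identical graphs give distance 0, and the distance does not depend on the argument order.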
36
… but this is O(n²) … can it be faster?
37
Proposed Algorithm: DeltaCon, STEP 1 (1) (faster algorithm). 1a. Create g disjoint and covering node groups. [Figure: adjacency matrix A with its nodes assigned to groups 1-4.]
38
Proposed Algorithm: DeltaCon, STEP 1 (2) (faster algorithm). 1a. Create g disjoint and covering node groups. 1b. For group i, find the node-group influence (FaBP).
39
Proposed Algorithm: DeltaCon, STEP 1 (3) (intuition). 1b. E.g., for group 1, find the node-group influence (FaBP). [Figure: S_A with rows 1-4, combined row-wise into S'_A with one column per group.]
40
Proposed Algorithm: DeltaCon, STEP 1 (4) (faster algorithm). 1a. Create g disjoint and covering node groups. 1b. For group i, find the node-group influence (FaBP). 1c. Create the node-group influence matrices, S'_A and S'_B. [Figure: S'_A and S'_B, with rows for nodes 1-4 and one column per group.]
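Steps 1a-1c can be sketched as below. Instead of inverting the n × n matrix, the code solves the FaBP system once per group, with the group's indicator vector on the right-hand side, which yields an n × g node-group influence matrix. The random grouping, the helper names, the ε choice, and the Matusita-style distance with sim = 1 / (1 + d) are assumptions for illustration; a scalable version would use a sparse solver rather than the dense one used here.

```python
import numpy as np

def adjacency(n, edges):
    """Dense symmetric adjacency matrix from an undirected edge list."""
    adj = np.zeros((n, n))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    return adj

def node_group_influence(adj, groups, eps):
    """Step 1b/1c: column k holds each node's influence on group k."""
    n = adj.shape[0]
    degrees = adj.sum(axis=1)
    system = np.eye(n) + eps**2 * np.diag(degrees) - eps * adj
    seeds = np.zeros((n, len(groups)))
    for k, members in enumerate(groups):
        seeds[members, k] = 1.0  # indicator vector of group k
    return np.linalg.solve(system, seeds)

def deltacon(adj_a, adj_b, g, seed=0):
    n = adj_a.shape[0]
    rng = np.random.default_rng(seed)
    # Step 1a: g disjoint groups that cover all nodes (random assignment assumed).
    groups = np.array_split(rng.permutation(n), g)
    eps = 1.0 / (1.0 + max(adj_a.sum(axis=1).max(), adj_b.sum(axis=1).max()))
    s_a = node_group_influence(adj_a, groups, eps)
    s_b = node_group_influence(adj_b, groups, eps)
    diff = np.sqrt(np.clip(s_a, 0, None)) - np.sqrt(np.clip(s_b, 0, None))
    return 1.0 / (1.0 + np.sqrt(np.sum(diff**2)))

edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
g_a = adjacency(6, edges)
g_b = adjacency(6, [e for e in edges if e != (2, 3)])  # bridge removed

sim_groups = deltacon(g_a, g_b, g=2)
sim_exact = deltacon(g_a, g_b, g=6)  # one node per group: same as the base algorithm
print(sim_groups, sim_exact)
```

With g = n every group is a single node, so the result matches the base algorithm; with fewer groups the distance can only shrink, which is why the grouped similarity is never below the exact one in this sketch.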
42
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications (ENRON: anomaly detection; Brain Graphs: clustering); Experiments; Conclusions.
43
Temporal Anomaly Detection in ENRON (1). Nodes: employees. Edges: email exchange. Compute the DeltaCon similarities of consecutive timestamps: sim_1 (Day 1, Day 2), sim_2 (Day 2, Day 3), sim_3 (Day 3, Day 4), sim_4 (Day 4, Day 5).
44
Temporal Anomaly Detection in ENRON (2). [Plot: similarity vs. consecutive days; IMR.]
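One way to turn the similarity series into anomaly flags is a moving-range control chart. The sketch below reads "IMR" as the textbook individuals/moving-range chart, which is an assumption about the slide; the rule shown (lower control limit = mean − 2.66 × mean moving range) is the standard one for that chart and is not necessarily the authors' exact procedure. The similarity values are made up for illustration.

```python
from statistics import mean

def flag_anomalies(similarities):
    """Indices of consecutive-day similarities below the lower control limit."""
    # Moving range: absolute change between successive similarity values.
    moving_ranges = [abs(b - a) for a, b in zip(similarities, similarities[1:])]
    # 2.66 = 3 / 1.128, the usual constant for moving ranges of size 2.
    lower_limit = mean(similarities) - 2.66 * mean(moving_ranges)
    return [i for i, s in enumerate(similarities) if s < lower_limit]

# Hypothetical series: entry i is the similarity of day i and day i + 1.
sims = [0.90, 0.92, 0.91, 0.89, 0.90, 0.91, 0.40, 0.90, 0.92, 0.91, 0.90, 0.89]
anomalies = flag_anomalies(sims)
print(anomalies)
```

Only unusually low similarities are flagged, since a drop in similarity between consecutive days is what signals an abrupt change in the graph.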
46
Brain Connectivity Graph Clustering (1). 114 aligned connectomes (fMRI). Nodes: 70 cortical regions. Edges: connections. Attributes: gender, IQ, age, …
47
Brain Connectivity Graph Clustering (2). ① Pairwise DeltaCon similarities. ② Hierarchical clustering (Ward's linkage). ③ t-test / ANOVA for given attributes.
48
Brain Connectivity Graph Clustering (3). t-test / ANOVA for given attributes: high-CCI vs. low-CCI clusters, p-value = 0.0057.
49
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments (Scalability); Conclusions.
50
Scalability. Dataset: Kronecker graphs. DeltaCon is linear on the edges + groups: O(g×n + g×(m_1 + m_2)). [Plot: runtime (min) vs. # of edges in G_A & G_B, where # of edges = max{m_1, m_2}, for different # of nodes; slope = 1.]
52
State-of-the-art Approaches. Vertex/Edge Overlap [Papadimitriou, Dasdan, Garcia-Molina. JISA'10]. Graph Edit Distance [Papadimitriou, Dasdan, Garcia-Molina. JISA'10]. Signature Similarity (SimHash algorithm) [Papadimitriou, Dasdan, Garcia-Molina. JISA'10]. λ-distance [Peabody '03; Bunke, Dickinson, Kraetzl, Wallis '06]. …
54
Conclusions. Theory: axioms; desired properties. Practice: the DeltaCon algorithm is principled (axioms), intuitive (properties) and scalable (linear on input). Real-world applications: temporal anomaly detection + brain scans classification. Experiments on synthetic & real graphs.
55
Thank you!
56
Backup slide (1): What if the node correspondence is unknown? Graph matching, then DeltaCon (work in progress). Global feature extraction + comparison, e.g., λ-distance [Peabody '03], [Macindoe & Richards '10]. Local feature extraction + aggregation + comparison [Berlingerio et al. '12]. …
57
Backup slide (2): Bounds. Lemma (lower bound): sim_DC0(G1, G2) ≤ sim_DC(G1, G2). Conjecture (upper bound): Johnson-Lindenstrauss lemma.
58
Backup slide (3): # of groups - sensitivity.
59
Backup slide (5): Datasets
Dataset            # nodes      # edges
Synthetic graphs   5-10         4-90
Kronecker graphs   6K-1.6M      66K-67.1M
Brain Graphs       70           800-1208
Enron              36,692       367,662
Epinions           131,828      841,372
Email EU           265,214      420,045
Web Google         875,714      5,105,039
AS Skitter         1,696,415    11,095,298