DeltaCon: A Principled Massive-Graph Similarity Function
  Danai Koutra, Joshua T. Vogelstein, Christos Faloutsos
  School of Computer Science, Carnegie Mellon University; Duke University
  SDM, 2-5 May 2013, Austin, Texas, USA
Problem Definition: Graph Similarity
  Given: (i) two graphs, G_A and G_B, with the same nodes and different edge sets; (ii) the node correspondence.
  Find: a similarity score s ∈ [0,1].
  © Danai Koutra (CMU) - SDM'13
Motivation (1)
  1. Classification: different brain wiring?
  2. Discontinuity detection (figure: graphs for Day 1, Day 2, Day 3, Day 4, ...)
Motivation (2)
  3. Behavioral patterns: FB message graph vs. wall-to-wall network
  4. Intrusion detection
Problem: Graph Similarity
  Is there any obvious solution?
One Solution
  Edge Overlap (EO): the number of edges common to G_A and G_B (normalized or not).
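
Edge overlap can be sketched in a few lines. The slide does not say which normalization is meant, so the Jaccard-style ratio below (common edges over the union of edges) is an assumption, as are the function names; graphs are taken to be undirected edge lists over the same node ids.

```python
def _undirected(edges):
    """Store each edge as a frozenset so that (u, v) and (v, u) coincide."""
    return {frozenset(edge) for edge in edges}


def edge_overlap(edges_a, edges_b):
    """Unnormalized EO: number of edges present in both graphs."""
    return len(_undirected(edges_a) & _undirected(edges_b))


def normalized_edge_overlap(edges_a, edges_b):
    """Assumed normalization: common edges divided by the union of edges."""
    a, b = _undirected(edges_a), _undirected(edges_b)
    union = a | b
    # Two empty graphs are treated as identical.
    return len(a & b) / len(union) if union else 1.0


g_a = [(0, 1), (1, 2), (2, 3)]
g_b = [(1, 0), (1, 2), (0, 3)]
common = edge_overlap(g_a, g_b)
ratio = normalized_edge_overlap(g_a, g_b)
```

Because the score only counts edges, it cannot tell which edge was removed, which is the weakness the barbell example points at.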
... but "barbell" ...
  Edge overlap gives the same score to both changes: EO(B10, mB10) == EO(B10, mmB10)
  (figure: G_A compared with G_B and with G_B')
Contributions
  Theory: axioms; desired properties
  Practice: DeltaCon (Delta Connectivity) algorithm; real-world applications; experiments on synthetic & real graphs
Roadmap
  Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments; Related Work; Conclusions
Intuition (1)
  STEP 1: Compute the pairwise node influence matrices, S_A and S_B, for G_A and G_B.
Intuition (2)
  STEP 2: Find the similarity between S_A and S_B. In the example, sim(S_A, S_B) = 0.3.
... many similarity functions can be defined ...
  But what properties should a good similarity function have?
Axioms
  A1. Identity property: sim(G_A, G_A) = 1
  A2. Symmetric property: sim(G_A, G_B) = sim(G_B, G_A)
  A3. Zero property: sim(complete graph, empty graph) = 0
Desired Properties
  Intuitiveness
    P1. Edge Importance: creation of disconnected components matters more than small connectivity changes.
    P2. Weight Awareness: the bigger the edge weight, the more the edge change matters (e.g., removing an edge with w=5 vs. an edge with w=1).
    P3. Edge-"Submodularity" ("diminishing returns"): the sparser the graphs, the more important a "fixed" change is (example with n=5).
    P4. Focus Awareness: targeted changes are more important than random changes of the same extent (G_A vs. targeted G_B' and random G_B).
  Scalability
How do state-of-the-art methods fare?
  Metric                             P1 (edge)  P2 (weight)  P3 (returns)  P4 (focus)
  Vertex/Edge Overlap                ✗          ✗            ✗             ?
  Graph Edit Distance (XOR)          ✗          ✗            ✗             ?
  Signature Similarity               ✗          ✔            ✗             ?
  λ-distance (adjacency matrix)      ✗          ✔            ✗             ?
  λ-distance (graph Laplacian)       ✗          ✔            ✗             ?
  λ-distance (normalized Laplacian)  ✗          ✔            ✗             ?
  DeltaCon0                          ✔          ✔            ✔             ✔
  DeltaCon                           ✔          ✔            ✔             ✔
Proposed algorithm: DeltaCon0 (base algorithm)
  ① Find the pairwise node influence matrices, S_A & S_B.
STEP 1: How to compute node influence?
  A1: PageRank
  A2: Personalized Random Walk with Restart (RWR)
  A3: Lazy RWR
  A4: "Electrical network analogy" (resistances)
  A5: Belief Propagation (FaBP)
  ...
STEP 1: Intuition of BP (background)
  An iterative message-based method (figure: successive iterations; example label: CS person).
STEP 1: Fast BP (1) (background)
  (equation shown as a figure: a linear system built from the identity matrix and the degree matrix diag(d1, d2, d3, ...), solved for the unknown i-th row of the influence matrix)
  The system is similar to RWR.
  Annotations: strength of influence between neighbors; final influence from node i.
STEP 1: Fast BP (2)
  Solve for the i-th row, OR compute the full pairwise influence matrix (equation shown as a figure).
STEP 1: Why FaBP?
  1) Sound theoretical background (MLE on marginals)
  2) Fast: linear on the edges
  3) Attenuating neighboring influence: for small ε (0 < ε < 1), influence weakens with distance, 1-hop > 2-hops > ..., since ε > ε² > ...
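
A minimal pure-Python sketch of the FaBP influence matrix, assuming the closed form S = [I + ε²D − εA]^(-1) given in the DeltaCon paper (on the slides the equation appears only as a figure). Here D is the diagonal degree matrix and A the adjacency matrix. The function names and the dense Gauss-Jordan inverse are illustrative choices for tiny graphs, not the authors' implementation, which is linear on the edges.

```python
def invert(matrix):
    """Invert a small dense square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fabp_influence(n, edges, eps=0.1):
    """Assumed FaBP form: S = (I + eps^2 * D - eps * A)^(-1)."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    degree = [sum(row) for row in adj]
    system = [[(1.0 + eps * eps * degree[i] if i == j else 0.0) - eps * adj[i][j]
               for j in range(n)] for i in range(n)]
    return invert(system)


# Path graph 0 - 1 - 2 - 3: influence of node 0 on nodes 1, 2 and 3 hops away.
path = [(0, 1), (1, 2), (2, 3)]
S = fabp_influence(4, path)
one_hop, two_hops, three_hops = S[0][1], S[0][2], S[0][3]
```

On the path graph the three values fall off roughly like ε, ε², ε³, which is the attenuation the slide describes.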
Proposed algorithm: DeltaCon0 (base algorithm)
  ① Apply FaBP to find the pairwise influence matrices, S_A & S_B.
  ② Find the distance d(S_A, S_B) = Matusita distance.
  ③ Find the similarity from the distance (formula shown as a figure).
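
Steps ② and ③ can be sketched as follows, assuming the forms used in the DeltaCon paper: the Matusita (root Euclidean) distance d = sqrt(Σ_ij (√s_A,ij − √s_B,ij)²) and the similarity sim = 1 / (1 + d). The function names are illustrative, and the clamp to zero is an added guard against tiny negative values from floating-point round-off.

```python
import math


def matusita_distance(S_a, S_b):
    """Root Euclidean distance between two equally shaped nonnegative matrices."""
    total = 0.0
    for row_a, row_b in zip(S_a, S_b):
        for x, y in zip(row_a, row_b):
            # Clamp at zero so round-off noise cannot break the square root.
            total += (math.sqrt(max(x, 0.0)) - math.sqrt(max(y, 0.0))) ** 2
    return math.sqrt(total)


def similarity(S_a, S_b):
    """Map the distance to (0, 1]: identical matrices give exactly 1."""
    return 1.0 / (1.0 + matusita_distance(S_a, S_b))


S_a = [[1.0, 0.0], [0.0, 1.0]]
S_b = [[1.0, 0.25], [0.25, 1.0]]
sim = similarity(S_a, S_b)
```

The 1 / (1 + d) mapping makes the identity and symmetry axioms hold by construction, since d is zero for identical inputs and symmetric in its arguments.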
... but O(n²) ... faster?
Proposed Algorithm: DeltaCon, STEP 1 (faster algorithm)
  1a. Create g disjoint & covering node groups (A = adjacency matrix).
  1b. For each group i, find the node-group influence (FaBP).
      Intuition: e.g., for group 1, the node-group influence is the row-wise sum of the columns of S_A that belong to group 1.
  1c. Create the node-group influence matrices, S'_A & S'_B, with one column per group.
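
Steps 1a to 1c can be sketched as below. This is a hedged illustration, not the authors' code: it assumes the FaBP system (I + ε²D − εA) s = s0 from the DeltaCon paper with s0 the indicator vector of a group, solves it by a simple fixed-point iteration (which converges only when ε is small relative to the maximum degree), and compares the resulting n × g matrices with the Matusita distance and sim = 1 / (1 + d). All names and default parameters are hypothetical.

```python
import math
import random


def random_groups(n, g, seed=0):
    """1a: split nodes 0..n-1 into g disjoint groups that cover all nodes."""
    nodes = list(range(n))
    random.Random(seed).shuffle(nodes)
    return [nodes[k::g] for k in range(g)]


def node_group_influence(n, edges, groups, eps=0.1, iters=100):
    """1b + 1c: return the n x g matrix S' with one column per group."""
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    columns = []
    for group in groups:
        start = [0.0] * n
        for node in group:
            start[node] = 1.0
        s = start[:]
        # Fixed-point iteration for (I + eps^2 D - eps A) s = start.
        for _ in range(iters):
            s = [start[i] + eps * sum(s[j] for j in neighbors[i])
                 - eps * eps * len(neighbors[i]) * s[i] for i in range(n)]
        columns.append(s)
    return [[col[i] for col in columns] for i in range(n)]


def deltacon_sketch(n, edges_a, edges_b, g=2, eps=0.1, seed=0):
    """Similarity of two graphs on the same nodes, using the SAME groups."""
    groups = random_groups(n, g, seed)
    S_a = node_group_influence(n, edges_a, groups, eps)
    S_b = node_group_influence(n, edges_b, groups, eps)
    d = math.sqrt(sum((math.sqrt(max(x, 0.0)) - math.sqrt(max(y, 0.0))) ** 2
                      for row_a, row_b in zip(S_a, S_b)
                      for x, y in zip(row_a, row_b)))
    return 1.0 / (1.0 + d)


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
broken_ring = ring[:-1]  # same nodes, one edge removed
sim = deltacon_sketch(6, ring, broken_ring)
```

Each group costs one sparse solve per graph, so the work grows with g times the number of edges rather than with n², and both graphs must share the same grouping for the columns to be comparable.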
Roadmap
  Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications (ENRON: anomaly detection; Brain Graphs: clustering); Experiments; Conclusions
Temporal Anomaly Detection in ENRON (1)
  Nodes: employees. Edges: email exchange.
  Compute DeltaCon similarities of consecutive timestamps: Day 1, Day 2, Day 3, Day 4, Day 5 give sim 1, sim 2, sim 3, sim 4.
Temporal Anomaly Detection in ENRON (2)
  (plot: similarity vs. consecutive days; IMR)
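
One way to turn the sequence of consecutive-day similarities into anomaly flags, assuming "IMR" on the plot refers to an individuals and moving range control chart. The sketch uses the textbook lower control limit, mean − 2.66 × (average moving range); the exact threshold used in the talk is not stated, and the similarity values below are made up for illustration.

```python
def imr_lower_limit(sims):
    """Lower control limit of an individuals chart built from moving ranges."""
    if len(sims) < 2:
        raise ValueError("need at least two similarity values")
    moving_ranges = [abs(b - a) for a, b in zip(sims, sims[1:])]
    mean_range = sum(moving_ranges) / len(moving_ranges)
    center = sum(sims) / len(sims)
    # 2.66 = 3 / 1.128, the standard constant for moving ranges of size 2.
    return center - 2.66 * mean_range


def flag_anomalies(sims):
    """Indices t where sim_t (day t vs. day t+1) falls below the limit."""
    limit = imr_lower_limit(sims)
    return [t for t, s in enumerate(sims) if s < limit]


# Hypothetical similarities between consecutive days.
sims = [0.90, 0.91, 0.89, 0.90, 0.92, 0.90, 0.40, 0.91, 0.90, 0.89]
flagged = flag_anomalies(sims)
```

A flagged index marks a transition between two consecutive days, so it points at the pair of graphs whose connectivity changed abruptly rather than at a single day.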
Brain Connectivity Graph Clustering (1)
  Aligned connectomes (fMRI)
  Nodes: 70 cortical regions. Edges: connections. Attributes: gender, IQ, age, ...
Brain Connectivity Graph Clustering (2)
  ① Pairwise DeltaCon similarities
  ② Hierarchical clustering (Ward's linkage)
  ③ t-test / ANOVA for given attributes
Brain Connectivity Graph Clustering (3)
  Clusters: high CCI vs. low CCI, compared with a t-test / ANOVA for given attributes (p-value reported on the slide).
Roadmap
  Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments (Scalability); Conclusions
Scalability
  Dataset: Kronecker graphs.
  DeltaCon is linear on the edges + groups: O(g×n + g×(m1 + m2)), where m1 and m2 are the numbers of edges in G_A & G_B.
  (plot: runtime (min) vs. # of edges = max{m1, m2}, annotated with # of nodes; slope = 1)
State-of-the-art Approaches
  Vertex/Edge Overlap [Papadimitriou, Dasdan, Garcia-Molina. JISA'10]
  Graph Edit Distance [Papadimitriou, Dasdan, Garcia-Molina. JISA'10]
  Signature Similarity (SimHash algorithm) [Papadimitriou, Dasdan, Garcia-Molina. JISA'10]
  λ-distance [Peabody '03; Bunke, Dickinson, Kraetzl, Wallis '06]
  ...
Conclusions
  Theory: axioms; desired properties
  Practice: DeltaCon algorithm, which is principled (axioms), intuitive (properties), and scalable (linear on input)
  Real-world applications: temporal anomaly detection + brain scans classification
  Experiments on synthetic & real graphs
Thank you!
Backup slide (1): What if unknown correspondence?
  Graph matching, then DeltaCon (work in progress)
  Global feature extraction + comparison, e.g., λ-distance [Peabody '03], [Macindoe & Richards '10]
  Local feature extraction + aggregation + comparison [Berlingerio et al. '12]
  ...
Backup slide (2): Bounds
  Lemma (lower bound): sim_DC0(G1, G2) ≤ sim_DC(G1, G2).
  Conjecture (upper bound): Johnson-Lindenstrauss lemma.
Backup slide (3): # of groups - sensitivity
Backup slide (5): Datasets
  Dataset            # nodes      # edges
  Synthetic graphs
  Kronecker graphs   6K - 1.6M    66K - 67.1M
  Brain Graphs
  Enron              36,692       367,662
  Epinions           131,828      841,372
  EU                 265,214      420,045
  Web Google         875,714      5,105,039
  AS Skitter         1,696,415    11,095,298