DeltaCon: A Principled Massive-Graph Similarity Function. Danai Koutra, Joshua T. Vogelstein, Christos Faloutsos. School of Computer Science, Carnegie Mellon University; Duke University.

Presentation transcript:

DeltaCon: A Principled Massive-Graph Similarity Function. Danai Koutra, Joshua T. Vogelstein, Christos Faloutsos. School of Computer Science, Carnegie Mellon University; Duke University. SDM, 2-5 May 2013, Texas-Austin, USA

CMU Duke. Problem Definition: Graph Similarity. Given: (i) 2 graphs, G_A and G_B, with the same nodes and different edge sets; (ii) node correspondence. Find: similarity score s ∈ [0, 1]. © Danai Koutra (CMU) - SDM'13

Motivation (1). 1. Classification: different brain wiring? 2. Discontinuity Detection: Day 1, Day 2, Day 3, Day 4, …

Motivation (2). 3. Behavioral Patterns: FB message graph vs. wall-to-wall network. 4. Intrusion detection.

Problem: Graph Similarity. Is there any obvious solution?

One Solution: Edge Overlap (EO), the # of common edges of G_A and G_B (normalized or not).

… but “barbell”… EO(B10, mB10) == EO(B10, mmB10). (Figure: G_A vs. G_B, and G_A vs. G_B’.)
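The barbell counterexample can be sketched in a few lines. The sketch assumes that B10 is two 5-node cliques joined by one bridge edge and that each modified graph drops a single edge (the bridge in one case, an edge inside a clique in the other); the function and variable names are hypothetical, not taken from the slides.

```python
from itertools import combinations


def barbell_edges(k):
    """Undirected barbell graph: two k-cliques joined by one bridge edge."""
    left = set(combinations(range(k), 2))
    right = set(combinations(range(k, 2 * k), 2))
    bridge = (k - 1, k)
    return left | right | {bridge}, bridge


def edge_overlap(edges_a, edges_b):
    """Number of edges the two graphs have in common (unnormalized)."""
    return len(edges_a & edges_b)


b10, bridge = barbell_edges(5)        # 10 nodes, 21 edges
no_bridge = b10 - {bridge}            # the two cliques become disconnected
no_clique_edge = b10 - {(0, 1)}       # the graph stays connected

# Both modified graphs share exactly 20 edges with the original.
print(edge_overlap(b10, no_bridge), edge_overlap(b10, no_clique_edge))
```

Edge overlap gives both changes the same score, although only the first one disconnects the graph.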

Contributions. Theory: Axioms; Desired Properties. Practice: DeltaCon (Delta Connectivity) algorithm; Real-world applications; Experiments on synthetic & real graphs.

Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments; Related Work; Conclusions.

Intuition (1). STEP 1: Compute the pairwise node influence, S_A & S_B. (Figure: graphs G_A and G_B with their influence matrices S_A and S_B.)

Intuition (2). STEP 2: Find the similarity between S_A & S_B, e.g., sim(S_A, S_B) = 0.3.

… many similarity functions can be defined… But … what properties should a good similarity function have?

Axioms. A1. Identity property: sim(G_A, G_A) = 1. A2. Symmetric property: sim(G_A, G_B) = sim(G_B, G_A). A3. Zero property: sim(·, ·) = 0 for the pair of graphs drawn on the slide.

Desired Properties (1). Intuitiveness: P1. Edge Importance; P2. Weight Awareness; P3. Edge-“Submodularity”; P4. Focus Awareness. Scalability.

Desired Properties (2). P1. Edge Importance: Creation of disconnected components matters more than small connectivity changes.

Desired Properties (3). P2. Weight Awareness: The bigger the edge weight, the more the edge change matters. (Figure: removing an edge with w=5 vs. an edge with w=1.)

Desired Properties (4). P3. Edge-“Submodularity” (“Diminishing Returns”): The sparser the graphs, the more important a “fixed” change is. (Figure: two G_A / G_B pairs with n=5.)

Desired Properties (5). P4. Focus Awareness: Targeted changes are more important than random changes of the same extent. (Figure: G_A compared with G_B’ and G_B, one changed in a targeted way and one at random.)

How do state-of-the-art methods fare?
Metric                            P1 (edge)  P2 (weight)  P3 (returns)  P4 (focus)
Vertex/Edge Overlap               ✗          ✗            ✗             ?
Graph Edit Distance (XOR)         ✗          ✗            ✗             ?
Signature Similarity              ✗          ✔            ✗             ?
λ-distance (adjacency matrix)     ✗          ✔            ✗             ?
λ-distance (graph laplacian)      ✗          ✔            ✗             ?
λ-distance (normalized lapl.)     ✗          ✔            ✗             ?
DeltaCon0                         ✔          ✔            ✔             ✔
DeltaCon                          ✔          ✔            ✔             ✔

Proposed algorithm: DeltaCon0 (base algorithm). ① Find the pairwise node influence, S_A & S_B.

STEP 1: How to compute node influence? A1: Pagerank. A2: Personalized Random Walk with Restart (RWR). A3: Lazy RWR. A4: “Electrical network analogy” - resistances. A5: Belief Propagation (FaBP). …

STEP 1: Intuition of BP (background). Iterative message-based method. (Figure: Iteration 1 and a later iteration; node label, e.g., CS person.)

STEP 1: Fast BP (1) (background). The slide's equation is an image; in the DeltaCon paper it reads [I + ε²D − εA] s_i = e_i, where D = diag(d1, d2, d3, …) holds the node degrees, ε is the strength of influence between neighbors, and s_i is the final influence from node i (the i-th row of the influence matrix). Similar to RWR.

STEP 1: Fast BP (2). OR, as a pairwise influence matrix: S = [I + ε²D − εA]^{-1}.
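A minimal sketch of the pairwise influence computation follows. It assumes the linear system [I + ε²D − εA] s_i = e_i from the DeltaCon paper, an unweighted undirected graph stored as neighbor lists, and plain Jacobi iterations as the solver; the function name, the solver choice and the default iteration count are illustrative, not the authors' implementation.

```python
def influence_matrix(adj, eps, iters=200):
    """Approximate S = [I + eps^2 D - eps A]^{-1} for an unweighted graph.

    adj is a list of neighbor lists. Column i of S solves
    [I + eps^2 D - eps A] s_i = e_i. Each system is solved with Jacobi
    iterations, which converge when eps * max_degree < 1 because the
    matrix is then strictly diagonally dominant.
    """
    n = len(adj)
    diag = [1.0 + eps * eps * len(adj[v]) for v in range(n)]
    columns = []
    for i in range(n):
        s = [0.0] * n
        for _ in range(iters):
            # One sweep costs O(n + m): every edge is touched twice.
            s = [((1.0 if v == i else 0.0) + eps * sum(s[u] for u in adj[v])) / diag[v]
                 for v in range(n)]
        columns.append(s)
    # S is symmetric, so the list of columns can also be read as rows.
    return columns


# Path graph 0 - 1 - 2 with a small eps (both values are illustrative).
path = [[1], [0, 2], [1]]
S = influence_matrix(path, eps=0.1)
print([round(x, 4) for x in S[0]])
```

Each Jacobi sweep touches every edge a constant number of times, which is one way to see why a single column costs time linear in the edges.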

STEP 1: Why FaBP? 1) Sound theoretical background (MLE on marginals). 2) Fast: linear on the edges. 3) Attenuating neighboring influence: for small ε (0 < ε < 1), the weights of 1-hop, 2-hop, … neighbors satisfy ε > ε² > …
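The attenuation in point 3 can be made explicit with a power-series expansion. This is a sketch that assumes the pairwise influence matrix has the form S = [I + ε²D − εA]^{-1} given in the DeltaCon paper, and that ε is small enough for the series to converge:

```latex
S = \left[ I + \epsilon^{2} D - \epsilon A \right]^{-1}
  = \sum_{k=0}^{\infty} \left( \epsilon A - \epsilon^{2} D \right)^{k}
  = I + \epsilon A + \epsilon^{2} \left( A^{2} - D \right) + O(\epsilon^{3})
```

The entry (i, j) of A^k counts the walks of length k from i to j, so a node that is first reached after k hops enters S with a leading weight of order ε^k.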

Proposed algorithm: DeltaCon0 (base algorithm). ① Apply FaBP to find the pairwise influence matrices, S_A & S_B. ② Find the distance, d(S_A, S_B) = Matusita distance. ③ Find the similarity, sim(S_A, S_B).
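The three steps can be sketched end to end as below. The formulas that the transcript lost are taken from the DeltaCon paper and should be read as assumptions here: S = [I + ε²D − εA]^{-1}, the Matusita (root Euclidean) distance d = sqrt(Σ_ij (√s_A,ij − √s_B,ij)²), and sim = 1 / (1 + d). The Jacobi solver, the shared ε = 1 / (1 + max degree) and all names are illustrative choices.

```python
import math


def influence_matrix(adj, eps, iters=500):
    """Jacobi approximation of S = [I + eps^2 D - eps A]^{-1} (list of columns)."""
    n = len(adj)
    diag = [1.0 + eps * eps * len(adj[v]) for v in range(n)]
    columns = []
    for i in range(n):
        s = [0.0] * n
        for _ in range(iters):
            s = [((1.0 if v == i else 0.0) + eps * sum(s[u] for u in adj[v])) / diag[v]
                 for v in range(n)]
        columns.append(s)
    return columns


def deltacon0(adj_a, adj_b):
    """Similarity in (0, 1] between two graphs on the same node set."""
    max_deg = max(len(nbrs) for adj in (adj_a, adj_b) for nbrs in adj)
    eps = 1.0 / (1.0 + max_deg)  # assumption: keeps eps * max_degree < 1
    s_a = influence_matrix(adj_a, eps)   # step 1
    s_b = influence_matrix(adj_b, eps)
    # Step 2: Matusita distance. All entries are non-negative because the
    # Jacobi iterates start at zero and only add non-negative terms.
    d = math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2
                      for col_a, col_b in zip(s_a, s_b)
                      for x, y in zip(col_a, col_b)))
    return 1.0 / (1.0 + d)               # step 3


# Triangle 0-1-2 with a pendant node 3, before and after dropping edge 2-3.
g_a = [[1, 2], [0, 2], [0, 1, 3], [2]]
g_b = [[1, 2], [0, 2], [0, 1], []]
print(round(deltacon0(g_a, g_a), 3), round(deltacon0(g_a, g_b), 3))
```

Storing all of S takes O(n²) memory, which is the cost that the group-based variant of the algorithm avoids.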

… but O(n²) … faster?

Proposed Algorithm: DeltaCon, STEP 1 (faster algorithm).
1a. Create g disjoint & covering node groups (A = adjacency matrix).
1b. For group i, find the node-group influence (FaBP). Intuition: e.g., for group 1, the node-group influence is derived row-wise from S_A.
1c. Create the node-group influence matrices, S’_A & S’_B, with one column per group.
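Steps 1a to 1c, followed by the same distance and similarity steps as in the base algorithm, can be sketched as below. The sketch assumes, following the DeltaCon paper, that the influence of group k solves [I + ε²D − εA] s’_k = Σ_{i ∈ group k} e_i, that groups are drawn at random, and that the same groups are used for both graphs; the Jacobi solver, the seed handling and all names are hypothetical.

```python
import math
import random


def group_influence(adj, eps, groups, iters=500):
    """Node-group influence matrix S' as a list of g columns of length n.

    Column k solves [I + eps^2 D - eps A] s'_k = sum of e_i over group k,
    so it equals the sum of the columns of S that belong to group k.
    """
    n = len(adj)
    diag = [1.0 + eps * eps * len(adj[v]) for v in range(n)]
    columns = []
    for members in groups:
        rhs = [0.0] * n
        for v in members:
            rhs[v] = 1.0
        s = [0.0] * n
        for _ in range(iters):
            s = [(rhs[v] + eps * sum(s[u] for u in adj[v])) / diag[v]
                 for v in range(n)]
        columns.append(s)
    return columns


def deltacon(adj_a, adj_b, g, seed=0):
    """Group-based similarity: g linear solves per graph instead of n."""
    nodes = list(range(len(adj_a)))
    random.Random(seed).shuffle(nodes)
    groups = [nodes[k::g] for k in range(g)]  # 1a: disjoint, covering groups
    max_deg = max(len(nbrs) for adj in (adj_a, adj_b) for nbrs in adj)
    eps = 1.0 / (1.0 + max_deg)               # assumption, as in the base sketch
    sp_a = group_influence(adj_a, eps, groups)  # 1b and 1c for graph A
    sp_b = group_influence(adj_b, eps, groups)  # same groups for graph B
    d = math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2
                      for col_a, col_b in zip(sp_a, sp_b)
                      for x, y in zip(col_a, col_b)))
    return 1.0 / (1.0 + d)


g_a = [[1, 2], [0, 2], [0, 1, 3], [2]]
g_b = [[1, 2], [0, 2], [0, 1], []]
print(round(deltacon(g_a, g_b, g=2), 3))
```

With g equal to the number of nodes every group is a single node, and the sketch reduces to the base algorithm.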

Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications (ENRON: anomaly detection; Brain Graphs: clustering); Experiments; Conclusions.

Temporal Anomaly Detection in ENRON (1). Nodes: employees. Edges: exchange. DeltaCon similarities of consecutive timestamps: the graphs of Day 1, Day 2, Day 3, Day 4, Day 5 give sim 1, sim 2, sim 3, sim 4.

Temporal Anomaly Detection in ENRON (2). (Plot: similarity of consecutive days; IMR.)
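One way to turn the series of consecutive-day similarities into anomaly flags is sketched below. It assumes that “IMR” on the slide refers to an individuals and moving-range control chart and uses the textbook lower control limit, mean − 2.66 × average moving range; the exact limit used by the authors is not stated in the transcript, and the sample scores are made up for illustration.

```python
def flag_anomalies(similarities):
    """Indices of scores below the lower control limit of an I-MR chart."""
    mean = sum(similarities) / len(similarities)
    moving_ranges = [abs(b - a) for a, b in zip(similarities, similarities[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two observations.
    lower_limit = mean - 2.66 * mr_bar
    return [i for i, s in enumerate(similarities) if s < lower_limit]


# Hypothetical similarity scores; entry i compares day i+1 with day i+2.
sims = [0.90, 0.91, 0.89, 0.90, 0.92, 0.40, 0.91, 0.90, 0.89, 0.91, 0.90, 0.92]
print(flag_anomalies(sims))  # [5]
```

Only the lower limit is checked, because an unusually low similarity is what signals a change between consecutive graphs.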

Brain Connectivity Graph Clustering (1). Aligned connectomes (FMRI). Nodes: 70 cortical regions. Edges: connections. Attributes: gender, IQ, age…

Brain Connectivity Graph Clustering (2). ① Pairwise DeltaCon similarities. ② Hierarchical clustering (Ward’s linkage). ③ t-test / ANOVA for given attributes.

Brain Connectivity Graph Clustering (3). (Figure: High CCI vs. Low CCI clusters.) t-test / ANOVA for given attributes: p-value =

Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments (Scalability); Conclusions.

Scalability. Dataset: Kronecker graphs. DeltaCon is linear on the edges + groups: O(g×n + g×(m1 + m2)), where g = # of groups, n = # of nodes, and m1, m2 = # of edges in G_A & G_B. (Plot: runtime (min) vs. # of edges = max{m1, m2}; slope = 1.)

State-of-the-art Approaches. Vertex/Edge Overlap [Papadimitriou, Dasdan, Garcia-Molina. JISA’10]. Graph Edit Distance [Papadimitriou, Dasdan, Garcia-Molina. JISA’10]. Signature Similarity (SimHash algorithm) [Papadimitriou, Dasdan, Garcia-Molina. JISA’10]. λ-distance [Peabody ’03; Bunke, Dickinson, Kraetzl, Wallis ’06]. …

Conclusions. Theory: Axioms; Desired Properties. Practice: DeltaCon algorithm: principled (axioms), intuitive (properties) and scalable (linear on input); Real-world applications (temporal anomaly detection + brain scans classification); Experiments on synthetic & real graphs.

Thank you!

Backup slide (1): What if unknown correspondence? Graph matching, then DeltaCon (…work in progress…). Global feature extraction + comparison, e.g., λ-distance [Peabody ’03], [Macindoe & Richards ’10]. Local feature extraction + aggregation + comparison [Berlingerio et al. ’12]. …

Backup slide (2): Bounds. Lemma (lower bound): sim_DC0(G1, G2) ≤ sim_DC(G1, G2). Conjecture (upper bound): Johnson-Lindenstrauss lemma.

Backup slide (3): # of groups - sensitivity.

Backup slide (5): Datasets
Dataset             # nodes       # edges
Synthetic graphs
Kronecker graphs    6K - 1.6M     66K – 67.1M
Brain Graphs
Enron               36,692        367,662
Epinions            131,828       841,372
EU                  265,214       420,045
Web Google          875,714       5,105,039
AS Skitter          1,696,415     11,095,298