Published by William Griffin Simon. Modified over 9 years ago.
1
VoG: Summarizing and Understanding Large Graphs. Danai Koutra, Jilles Vreeken, U Kang, Christos Faloutsos. Carnegie Mellon University; Korea Advanced Institute of Science and Technology. SDM, 24-26 April 2014, Philadelphia, USA. © 2014, Danai Koutra
2
Problem Definition: Graph Summarization. Given: a graph. Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures.
3
Why graph summarization? Visualization. Guiding attention.
4
Why graph summarization? Graph understanding.
5
Application: Wikipedia controversy. Nodes: wiki editors. Edges: co-edited. I don’t see anything!
6
Application: Wikipedia controversy. Nodes: wiki editors. Edges: co-edited. Stars: admins, bots, heavy users. Bipartite cores: edit wars (Kiev vs. Kyiv; vandals).
7
Roadmap: Main Idea; Proposed Algorithm: VoG; VoG: Step-by-Step; Experiments; Conclusions
8
Main Idea. 1) Use a graph vocabulary. 2) Best graph summary = optimal compression (MDL): the shortest lossless description.
9
Minimum Description Length Principle (background). Given a set of models $\mathcal{M}$, the best model $M \in \mathcal{M}$ is $\arg\min_{M \in \mathcal{M}} \; L(M) + L(D \mid M)$, where $L(M)$ is the # bits for $M$ and $L(D \mid M)$ is the # bits for the data using $M$.
10
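Minimum Description Length Principle (example). Two candidate models for the same points: a line $a_1 x + a_0$ and a degree-10 polynomial $a_{10} x^{10} + a_9 x^9 + \dots + a_0$. Each is scored by $L(M) + L(D \mid M)$, where $L(D \mid M)$ pays for the errors. [Example adapted from Akoglu’12]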
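The two-part score can be sketched in a few lines of Python. This is only an illustration of the selection rule: the bit counts below are hypothetical numbers chosen for the line versus degree-10 polynomial example, and `mdl_best` is a name introduced here, not something from the talk.

```python
def total_bits(model_bits, data_bits):
    """Two-part MDL score: L(M) + L(D|M)."""
    return model_bits + data_bits


def mdl_best(candidates):
    """Return the candidate with the smallest total description length.

    candidates: iterable of (name, model_bits, data_bits) tuples.
    """
    return min(candidates, key=lambda c: total_bits(c[1], c[2]))


# Hypothetical costs: 32 bits per coefficient, plus bits spent on errors.
candidates = [
    ("line", 2 * 32, 400.0),                   # cheap model, larger errors
    ("degree-10 polynomial", 11 * 32, 150.0),  # costly model, smaller errors
]
best = mdl_best(candidates)
```

With these made-up numbers the line wins, because the polynomial's extra coefficients cost more bits than they save on the errors.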
11
Formally: Minimum Graph Description. Given: a graph G with adjacency matrix A, and a vocabulary Ω. Find: a model M such that $\min L(G, M) = \min L(M) + L(E)$. [Figure: model M, adjacency A, error E]
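One way to read the model/adjacency/error picture is that E marks every cell where the edges implied by M disagree with A. The sketch below assumes exactly that (E as the symmetric difference of the two edge sets), which the slide text does not spell out. The structure tuples and the names `model_edges` and `error_edges` are hypothetical, and only stars and chains are handled.

```python
def model_edges(structures):
    """Edges implied by a list of (kind, nodes) structures."""
    edges = set()
    for kind, nodes in structures:
        if kind == "star":
            # Convention used here: the first node is the hub.
            hub, *spokes = nodes
            edges |= {frozenset((hub, s)) for s in spokes}
        elif kind == "chain":
            # Consecutive nodes in the list are linked.
            edges |= {frozenset(p) for p in zip(nodes, nodes[1:])}
        else:
            raise ValueError(f"unsupported structure type: {kind}")
    return edges


def error_edges(structures, graph_edges):
    """Assumed error set E: edges on which the model and the graph disagree."""
    actual = {frozenset(e) for e in graph_edges}
    return model_edges(structures) ^ actual


graph = [(1, 2), (1, 3), (1, 4), (3, 4), (4, 5)]
summary = [("star", [1, 2, 3, 4])]
errors = error_edges(summary, graph)
```

Here the star explains three of the five edges, so the error set holds the two edges it does not cover.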
12
Roadmap: Main Idea; Proposed Algorithm: VoG; VoG: Step-by-Step; Experiments; Conclusions
13
VoG: Overview. [Figure: ≈? / argmin / ≈]
14
VoG: Overview. [Figure: some criterion → Summary]
15
Roadmap: Main Idea; Proposed Algorithm: VoG; VoG: Step-by-Step; Experiments; Conclusions
16
We need candidate structures… How can we get them?
17
Step 1: Graph Decomposition. Could use: ANY graph decomposition method. We adapted a node reordering method: SlashBurn.
18
SlashBurn-based Graph Decomposition. Slash top-k hubs, burn edges. Repeat on the remaining GCC. [Figure: before and after; GCC and candidate structures] Notice that the structures can overlap! [SlashBurn: U Kang and Christos Faloutsos. ICDM’11]
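The slash-and-burn loop can be sketched as follows. This is a simplified illustration, not the SlashBurn or VoG code: it assumes that the candidates are each slashed hub's neighbourhood plus the small components that fall off, and the names `slashburn_candidates`, `k` and `min_size` are introduced here for the example.

```python
from collections import defaultdict, deque


def components(adj, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            comp.add(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def slashburn_candidates(edges, k=1, min_size=2):
    """Collect candidate node sets by repeatedly slashing the top-k hubs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive, candidates = set(adj), []
    while len(alive) >= min_size:
        deg = {u: len(adj[u] & alive) for u in alive}
        hubs = sorted(alive, key=lambda u: (-deg[u], u))[:k]
        for h in hubs:  # a hub plus its remaining neighbours is a candidate
            ego = {h} | (adj[h] & alive)
            if len(ego) >= min_size:
                candidates.append(ego)
        alive -= set(hubs)  # slash the hubs (their edges burn with them)
        comps = components(adj, alive)
        if not comps:
            break
        gcc = max(comps, key=len)
        candidates += [c for c in comps if c is not gcc and len(c) >= min_size]
        alive = gcc  # repeat on the remaining giant connected component
    return candidates


edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6)]
found = slashburn_candidates(edges)
```

On this toy graph the two candidates share node 4, which matches the remark that structures can overlap.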
19
We got candidate structures. Now, how can we ‘label’ them?
20
Step 2: Graph Labeling. [Figure: ≈? / argmin / ≈]
21
Graph Representation (details). Questions to settle per structure: hub? “best” node split? “best” node ordering? missing edges? [Figure: adjacency matrices of the candidate structures]
22
Graph Representation (details): star structure. Hub: top-degree node. Spokes: the rest. Example: n = 7 (6 spokes).
Encoding cost: $L_{\mathbb{N}}(|st|-1) + \log n + \log\binom{n-1}{|st|-1} + L(E^+) + L(E^-)$
Terms, in order: # of spokes; hub ID; spoke IDs; errors (extra edges $E^+$, missing edges $E^-$).
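The first three terms of the star cost can be computed directly. In the sketch below, L_N is taken to be the standard MDL universal code for positive integers (log-star plus a constant), which is an assumption because the slide does not define it. The error terms L(E+) and L(E−) are left out, and the function names are hypothetical.

```python
import math


def universal_int_bits(n):
    """Universal code length for an integer n >= 1 (assumed form of L_N)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    bits = math.log2(2.865064)  # normalising constant of the universal code
    term = math.log2(n)
    while term > 0:  # log*(n): add log n, log log n, ... while positive
        bits += term
        term = math.log2(term)
    return bits


def star_bits(n, star_size):
    """Bits for a star of `star_size` nodes in an n-node graph, errors excluded."""
    spokes = star_size - 1
    return (
        universal_int_bits(spokes)              # number of spokes
        + math.log2(n)                          # hub ID
        + math.log2(math.comb(n - 1, spokes))   # which nodes are the spokes
    )


# The slide's example size: n = 7 nodes, one hub and six spokes.
cost = star_bits(7, 7)
```

When the star spans the whole graph, the spoke-ID term vanishes because there is only one way to choose all n−1 remaining nodes.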
23
Graph Representation (details): bipartite graph structure. Max bipartite graph: NP-hard. Heuristic: Belief Propagation with heterophily for node classification (blue/red).
Encoding cost terms: # of blue nodes; # of red nodes; their IDs; errors $L(E^+)$ (extra edges) and $L(E^-)$ (missing edges).
24
Graph Representation (details): chain structure. Longest path: NP-hard. Heuristic: BFS + local search.
Encoding cost: … + $L(E^+)$ (extra edges) + $L(E^-)$ (missing edges). [Figure: adjacency matrix under the chain node ordering]
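The BFS part of the chain heuristic might look like the double sweep below: run BFS from any node, then again from the farthest node found, and keep the second BFS path as the chain candidate. This is a sketch under that assumption. The local-search step that would try to lengthen the path is omitted, and `bfs_farthest` and `chain_candidate` are hypothetical names.

```python
from collections import deque


def bfs_farthest(adj, source):
    """BFS from `source`; return the path from source to a farthest node."""
    parent = {source: None}
    queue = deque([source])
    last = source
    while queue:
        last = queue.popleft()  # the last node dequeued is at maximum depth
        for v in sorted(adj[last]):
            if v not in parent:
                parent[v] = last
                queue.append(v)
    path, node = [], last
    while node is not None:  # walk parent pointers back to the source
        path.append(node)
        node = parent[node]
    return path[::-1]


def chain_candidate(adj, start):
    """Double-sweep BFS: a long shortest path to use as the initial chain."""
    first_sweep = bfs_farthest(adj, start)
    return bfs_farthest(adj, first_sweep[-1])


# Path 1-2-3-4-5 with an extra leaf 6 hanging off node 3.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4, 6}, 4: {3, 5}, 5: {4}, 6: {3}}
chain = chain_candidate(adj, 3)
```

Nodes left off the chain, such as the leaf 6 here, would be paid for through the error terms.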
25
Step 2: Graph Labeling. [Figure: ≈? / argmin / ≈]
26
Step 3: Summary Assembly. [Figure: Summary]
27
Concepts (details). Savings (compression gain) = # bits as structure − # bits as noise.
28
Step 3: Summary Assembly. [Figure: Summary]
29
Concepts (details): summary encoding cost.
$L(M) = L_{\mathbb{N}}(|M|+1) + \log\binom{|M|+1}{|\Omega|+1} + \sum_{s} \big( -\log P(x(s) \mid M) + L(s) \big)$
Terms: # of structures per type; then, for each structure $s$: its type, $-\log P(x(s) \mid M)$, and its encoding length (its connectivity), $L(s)$.
30
Step 3: Summary Assembly (details). [Figure: L(M) as structures … are added]
31
Roadmap: Main Idea; Encoding Schema; Proposed Algorithm: VoG; Experiments; Conclusions
32
Application: Enron. Top-3 stars: klay, kenneth.lay @enron.com. Top-1 NBC: ski excursion.
33
Runtime. VoG is near-linear in the number of edges of the input graph.
34
Roadmap: Main Idea; Encoding Schema; Proposed Algorithm: VoG; Experiments; Conclusions
35
Conclusions. Formulation: info-theoretic graph summarization approach. Algorithm: VoG is near-linear in the number of edges. Experiments on real graphs.
36
Code: www.cs.cmu.edu/~dkoutra/SRC/vog.tar
37
Thank you! Questions? www.cs.cmu.edu/~dkoutra/pub.htm danai@cs.cmu.edu