Published by Πτολεμαῖος Σκλαβούνος. Modified over 5 years ago.
1
Efficient Subgraph Similarity All-Matching
Gaoping Zhu, Ke Zhu, Wenjie Zhang, Xuemin Lin, Chuan Xiao
Computer Science and Engineering, The University of New South Wales
2
Outline
Introduction
Preliminaries
Framework
Algorithms
Experiments
Conclusions
3
Introduction – Graph Data
Chem-informatics: chemical compounds (small size)
Bio-informatics: protein interaction networks (medium size)
Internet: World Wide Web (large size)
4
Introduction – Subgraph All-Matching
Problem
Subgraph exact all-matching enumerates all exact matches of a query graph q in a large data graph G.
Subgraph similarity all-matching enumerates all similarity matches of a query graph q in a large data graph G.
Motivations
Noisy query graphs due to erroneous user input.
Noisy data graphs due to imprecise collection.
5
Preliminaries Edge Edit Distance
The edge edit distance from a graph g1 to another graph g2 is the minimum number of edge insertions required to transform g1 into g2.
Example: GED(p1, q) = 0, GED(p2, q) = 1.
[Figure: query graph q and patterns p1 and p2, with vertices labeled A, B, C]
6
Preliminaries Feasible Pattern
Given a distance threshold δ, p is called a feasible pattern of q if p is a connected subgraph of q with no missing vertex and GED(p, q) ≤ δ.
Example: for δ = 1, the feasible patterns of q are p1, p2, p3, p4.
[Figure: query graph q (δ = 1) and its feasible patterns p1, p2, p3, p4, with vertices labeled A, B, C]
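The definition above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes q is a simple undirected graph given as a vertex list and an edge list, and it enumerates patterns as distinct edge subsets (it does not merge isomorphic patterns). Because a feasible pattern keeps every vertex of q, its edge edit distance to q is simply the number of removed edges. The function names are hypothetical.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == vertices

def feasible_patterns(vertices, edges, delta):
    """Yield (ged, kept_edges) for each connected spanning subgraph of q
    obtained by removing at most delta edges (assumes no duplicate edges)."""
    edges = list(edges)
    for k in range(delta + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            if is_connected(vertices, kept):
                yield k, kept
```

For a triangle query with δ = 1, this yields the triangle itself plus the three paths obtained by dropping one edge.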
7
Preliminaries Similarity Matches
A similarity match of q in G is a subgraph isomorphic mapping from any feasible pattern p of q to G.
Every feasible pattern must be considered!
Exact matches of q in G are also similarity matches!
[Figure: data graph G, query graph q and feasible patterns p1, p2 (vertices labeled A, B, C); one exact match and two similarity matches are highlighted in G]
8
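SAPPER [VLDB’10]
Enumerate Phase, Search Phase, Results
[Figure: for δ = 1, the query graph q (vertices labeled A, B, C, D) is enumerated into feasible patterns p1, p2, …, p5; each pattern is searched in G, producing the match sets Mp1, Mp2, …, Mp5]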
9
Motivation I : Effective Search Order
Search Order One {v1, v2, v3, v4, v5, v6, v7, v8}: 47 intermediate matches
Search Order Two {v4, v3, v5, v6, v2, v1, v7, v8}: 350 intermediate matches
[Figure: query graph q with vertices v1 to v8 (labeled A, B, C, D), data graph G, and the number of intermediate matches produced at each step of the two search orders]
10
Motivation II : Sharing Computation
Query Execution Plan One: search p and p’ separately. Shares no computation.
Query Execution Plan Two: search f1, f2 and f’2, then join. Shares the computation on f1.
[Figure: patterns p and p’ over vertices v1 to v8 (labeled A, B, C, D), with fragments f1, f2 and f’2 marked]
11
Framework - DecQ Query Decomposition (Phase One)
Decompose the query graph q into a set of selective edge-disjoint sub-queries Q = { f1, …, fn }, called fragments.
[Figure: query graph q is decomposed into fragments f1, f2, f3, f4]
12
Framework - DecQ Local Matching (Phase Two)
Enumerate all local (feasible) patterns f’ of each fragment f and apply depth-first search on each pattern f’ to obtain the local matches (exact matches of f’ in G).
[Figure: a fragment f is enumerated into local patterns f’a, f’b, f’c, f’d; depth-first search on each yields the local matches Mf’a, Mf’b, Mf’c, Mf’d]
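The depth-first search for the exact matches of one pattern can be sketched as follows. This is an illustrative Python version under stated assumptions, not the DecQ implementation: the data graph is given as a label dictionary and an adjacency dictionary of sets, matches are non-induced subgraph isomorphisms, and the search order is passed in explicitly. The function also counts the intermediate (partial or complete) mappings it creates, which makes the effect of the search order visible. All names are hypothetical.

```python
def dfs_matches(p_labels, p_edges, g_labels, g_adj, order):
    """Enumerate exact (non-induced) matches of a pattern in G by DFS.

    Returns (matches, intermediate): matches is a list of dicts mapping
    pattern vertices to data vertices; intermediate counts every partial
    or complete mapping created during the search.
    """
    p_adj = {v: set() for v in p_labels}
    for u, v in p_edges:
        p_adj[u].add(v)
        p_adj[v].add(u)
    matches = []
    intermediate = 0

    def extend(pos, mapping):
        nonlocal intermediate
        if pos == len(order):
            matches.append(dict(mapping))
            return
        v = order[pos]
        used = set(mapping.values())
        for cand in g_labels:
            if cand in used or g_labels[cand] != p_labels[v]:
                continue
            # Every already-mapped pattern neighbour must be adjacent in G.
            if all(mapping[w] in g_adj[cand] for w in p_adj[v] if w in mapping):
                mapping[v] = cand
                intermediate += 1
                extend(pos + 1, mapping)
                del mapping[v]

    extend(0, {})
    return matches, intermediate
```

Running the same pattern with two different orders returns the same matches but, in general, different intermediate counts.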
13
Framework - DecQ Global Matching (Phase Three)
Enumerate all global (feasible) patterns p and merge the local matches of the decomposed local patterns of p to obtain the global matches (exact matches of p in G).
[Figure: a global pattern p is decomposed into local patterns f’1, f’2, f’3, f’4; their local matches Mf’1, Mf’2, Mf’3, Mf’4 are retrieved and merged into the global matches Mp]
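The merge step can be read as a join over partial mappings. The sketch below is a plain nested-loop join in Python, offered only as an illustration; the slides do not specify the join algorithm, and the function name is hypothetical. It assumes each local match is a dictionary from pattern vertices to data vertices and that the local patterns together cover every edge of p, so a consistent, injective combination is a match of p.

```python
def merge_matches(left, right):
    """Join two lists of partial matches (dicts: pattern vertex -> data vertex).

    Two partial matches are compatible when they agree on the pattern
    vertices they share and the combined mapping stays injective.
    """
    merged = []
    for a in left:
        for b in right:
            if any(a[v] != b[v] for v in a.keys() & b.keys()):
                continue
            combined = {**a, **b}
            if len(set(combined.values())) == len(combined):
                merged.append(combined)
    return merged
```

Merging more than two fragments amounts to folding this function over the list of local match sets.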
14
Algorithms Local Matching
Enumerate all local patterns f’ of each fragment f.
Search all exact matches of each f’ by depth-first search with an effective search order.
Effective Search Order
It is NP-complete to find a search order that minimizes the number of intermediate matches produced in the depth-first search.
15
Algorithms Estimating Exact Matches of a Graph
Given a graph f’, let M(v) and M(e) denote the sets of all mappings in G of a vertex v and an edge e of f’, respectively.
For each edge e = (u, v) in f’, given any u’ in M(u) and v’ in M(v), the probability that there is an edge (u’, v’) in G is:
P(u, v) = |M(e)| / (|M(u)| × |M(v)|)
The estimated number of exact matches of f’ in G can be represented by:
|M(f’)| ≈ ∏_{v ∈ V(f’)} |M(v)| × ∏_{(u, v) ∈ E(f’)} P(u, v)
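A minimal Python sketch of this estimate follows. It assumes the probability for an edge e = (u, v) is taken as |M(e)| / (|M(u)| × |M(v)|) and that edges are treated as independent, which is a natural reading of the definitions above rather than a confirmed detail of the authors' cost model. The function name and the input dictionaries are illustrative.

```python
from math import prod

def estimate_matches(num_vertex_maps, num_edge_maps):
    """Estimate the number of exact matches of f' in G.

    num_vertex_maps: {v: |M(v)|} for each vertex v of f'
    num_edge_maps:   {(u, v): |M(e)|} for each edge e = (u, v) of f'
    Assumption: edge probabilities are independent.
    """
    estimate = float(prod(num_vertex_maps.values()))
    for (u, v), m_e in num_edge_maps.items():
        denom = num_vertex_maps[u] * num_vertex_maps[v]
        if denom == 0:
            return 0.0  # some vertex has no candidate, so no match exists
        estimate *= m_e / denom
    return estimate
```

For example, with |M(a)| = 10, |M(b)| = 20, |M(c)| = 5, |M((a, b))| = 40 and |M((b, c))| = 10, the estimate is 1000 × 0.2 × 0.1, that is, about 20 matches.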
16
Algorithms Approximating Optimal Search Order
A search order grows a local pattern f’ vertex by vertex.
Greedy heuristic: at each step, select the vertex v such that the estimated number of exact matches of the current subgraph s of f’ is minimized.
[Figure: a local pattern f’ grown through the subgraphs s1, s2, s3]
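The greedy heuristic can be sketched in Python as below. This is a hedged illustration: it assumes the estimate for a subgraph is the product of the vertex candidate counts times |M(e)| / (|M(u)| × |M(v)|) for each edge inside the subgraph, that every |M(v)| is positive, and that the next vertex is chosen among those adjacent to the current subgraph when possible. The tie-breaking rule and all names are assumptions made for the example.

```python
def greedy_search_order(vertex_maps, edge_maps):
    """Return a vertex order for f' chosen greedily by estimated match count.

    vertex_maps: {v: |M(v)|} (assumed positive); edge_maps: {(u, v): |M(e)|}.
    """
    adj = {v: set() for v in vertex_maps}
    for u, v in edge_maps:
        adj[u].add(v)
        adj[v].add(u)

    def estimate(chosen):
        # Estimated matches of the subgraph of f' induced by `chosen`.
        est = 1.0
        for v in chosen:
            est *= vertex_maps[v]
        for (u, v), m_e in edge_maps.items():
            if u in chosen and v in chosen:
                est *= m_e / (vertex_maps[u] * vertex_maps[v])
        return est

    order = []
    remaining = set(vertex_maps)
    while remaining:
        # Prefer vertices adjacent to the current subgraph so it stays connected.
        frontier = {v for v in remaining if adj[v] & set(order)} or remaining
        # sorted() makes tie-breaking deterministic.
        best = min(sorted(frontier), key=lambda v: estimate(set(order) | {v}))
        order.append(best)
        remaining.remove(best)
    return order
```

The resulting order can be passed to a depth-first matcher as its search order.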
17
Algorithms Global Matching
A global pattern p is either a minimal or a non-minimal pattern.
A minimal pattern p has no subgraph p’ that is also a global pattern and differs from p by one missing edge.
A non-minimal pattern p has at least one subgraph p’ that is also a global pattern and differs from p by one missing edge.
18
Algorithms Processing Minimal Patterns
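For a minimal pattern p, we decompose p into a set of local patterns and merge the local matches to obtain the global matches Mp.
When two minimal patterns p and p’ share local patterns (here f’3 and f’4), the matches of (f’3 ∪ f’4) are stored while processing one pattern and reused for the other.
[Figure: one pattern decomposed into f’1, f’2, f’3, f’4 with local matches M’1, M’2, M’3, M’4; the other into f’’1, f’’2, f’3, f’4 with local matches M’’1, M’’2, M’3, M’4]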
19
Algorithms Processing Non-minimal Patterns
For a non-minimal pattern p, we pick the child pattern p’ of p with the smallest match set Mp’.
For each exact match of p’ in G, we check whether the missing edge exists in G. If so, the match is validated as an exact match of p in G.
[Figure: a non-minimal pattern p and its child pattern p’ (vertices labeled A, B, C), with matches in G]
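The validation step is a simple filter, sketched here in Python under the assumption that matches are dictionaries from pattern vertices to data vertices and that G is an adjacency dictionary of sets. The function name is hypothetical.

```python
def validate_matches(child_matches, missing_edge, g_adj):
    """Keep the matches of the child pattern p' in which the edge that p'
    lacks relative to p is present in the data graph G."""
    u, v = missing_edge
    return [m for m in child_matches if m[v] in g_adj[m[u]]]
```

Each surviving mapping is an exact match of p, because it already matched every edge of p’ and now also matches the one additional edge.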
20
Algorithms Decomposition & Query Execution Plan
Each decomposition of a global pattern p corresponds to a query execution plan of p (as in an RDBMS).
It is costly to generate a good query execution plan for each global pattern p of q.
Recursive Bisection
We use a heuristic to recursively bisect q into a set Q of edge-disjoint fragments.
Each step bisects a graph into two subgraphs whose sizes are balanced.
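The slides do not give the bisection heuristic itself, so the Python sketch below uses a stand-in: it orders the edges by breadth-first traversal and splits the list in half, recursing until every fragment has at most max_edges edges. This produces balanced, edge-disjoint fragments, but it does not guarantee that the second half of a split is connected, and it ignores selectivity. The function names and the max_edges parameter are assumptions.

```python
from collections import deque

def bfs_edge_order(edges):
    """Return the edges of a simple undirected graph in BFS discovery order."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    ordered, seen_edges, seen_vertices = [], set(), set()
    for start in adj:
        if start in seen_vertices:
            continue
        seen_vertices.add(start)
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                e = frozenset((x, y))
                if e not in seen_edges:
                    seen_edges.add(e)
                    ordered.append((x, y))
                if y not in seen_vertices:
                    seen_vertices.add(y)
                    queue.append(y)
    return ordered

def recursive_bisect(edges, max_edges):
    """Split an edge list into edge-disjoint fragments of at most max_edges edges."""
    if max_edges < 1:
        raise ValueError("max_edges must be at least 1")
    edges = list(edges)
    if len(edges) <= max_edges:
        return [edges]
    ordered = bfs_edge_order(edges)
    half = len(ordered) // 2
    return (recursive_bisect(ordered[:half], max_edges)
            + recursive_bisect(ordered[half:], max_edges))
```

On a 4-cycle with max_edges = 2, this returns two fragments of two edges each that together cover the cycle.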
21
Experiments
Real Data
Data Graph: HPRD (Human Protein Interaction Network), |V(G)| = 9,460, |E(G)| = 37,081, with vertices labeled by GO term.
Query Graphs: selected subgraphs from the HPRD network with 1-3 inserted “noisy” edges.
Synthetic Data
Data Graphs: obtained from a synthetic graph generator.
Query Graphs: selected subgraphs from the data graphs with 1-3 inserted “noisy” edges.
22
Experiments
Evaluated Algorithms
SAPPER
ROND (Random search Order, No Decomposition)
EOND (Effective search Order, No Decomposition)
DecQ (Effective Search Order and Decomposition)
Default Settings
|E(q)| = 40, avg. deg(q) = 4
|E(G)| = 5k, avg. deg(G) = 12, |ΣL| = 100
δ = 2
23
Experiments Varying Error Threshold
24
Experiments Varying Query Settings
25
Experiments Varying Data Graph Settings
26
Experiments Comparing with SAPPER
27
Conclusions
A novel framework, DecQ, for subgraph similarity all-matching.
An effective search order for local matching with depth-first search.
An effective query decomposition plan for global matching with computation sharing.
28
Thank You! Any Questions?