1
On Efficient Graph Substructure Selection
School of Computer Science and Engineering
Xiang Zhao†, H. Shang§, W. Zhang†, X. Lin†, W. Xiao‡
† The University of New South Wales, Australia
§ The University of Tokyo, Japan
‡ National University of Defense Technology, China
2
Outline
Motivation
Preliminaries
Proposed Method
Experiments
Conclusion
3
Motivation
Single-labeled Graphs
Multi-attributed Graphs
4
Preliminaries
A data graph is a multi-attributed graph r = (Vr, Er, lr), where Vr is the vertex set, Er ⊆ Vr×Vr is the edge set, and lr is an attribute function.
Each vertex v ∈ Vr has an attribute set Av, and each attribute a ∈ Av is assigned a value lr(a).
Note that v may lack a particular attribute a, in which case lr(a) = nil.
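The definition above can be held in a small data structure. The following is only a sketch: the class name `DataGraph`, its methods, the sample vertices, and the choice to store edges as undirected pairs are all assumptions for illustration, not details from the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Set


@dataclass
class DataGraph:
    """Hypothetical container for a multi-attributed graph r = (Vr, Er, lr)."""
    # vertex id -> {attribute name -> value}; this dict plays the role of lr
    attrs: Dict[Hashable, Dict[str, Any]] = field(default_factory=dict)
    # Assumption: edges are undirected, so each is stored as a frozenset
    edges: Set[frozenset] = field(default_factory=set)

    def add_vertex(self, v, **attributes):
        self.attrs[v] = dict(attributes)

    def add_edge(self, u, v):
        self.edges.add(frozenset((u, v)))

    def value(self, v, a):
        """Return lr(a) for vertex v; None stands in for nil."""
        return self.attrs[v].get(a)

    def has_edge(self, u, v):
        return frozenset((u, v)) in self.edges


r = DataGraph()
r.add_vertex("u1", name="Alice", age=30)
r.add_vertex("u2", name="Bob")  # u2 has no "age" attribute
r.add_edge("u1", "u2")
```

Here `r.value("u2", "age")` yields `None`, mirroring the nil case for a vertex that lacks an attribute.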
5
Example
6
Preliminaries (Cont.)
A query graph is denoted by s = (Vs, Es, φs), where Vs is the vertex set, Es ⊆ Vs×Vs is the edge set, and φs is an attribute selection function.
Es enforces the connection constraints, and φs enforces the attribute constraints.
Each vertex v ∈ Vs has an attribute set Av, and each attribute a ∈ Av is assigned a selection condition φs(a).
7
Example
8
Preliminaries (Cont.)
Given u ∈ Vr and v ∈ Vs, u satisfies v on attribute a, provided
lr(a) = φs(a), if φs(a) defines an equality condition; or
lr(a) ∈ φs(a), if φs(a) defines a range condition; or
lr(a) ⊆ φs(a), if φs(a) defines a set containment condition; or
lr(a) is an arbitrary value in the domain, if φs(a) is a wildcard.
u matches v if u’s attribute values satisfy v’s corresponding conditions conjunctively, for all of v’s attribute constraints.
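The four condition types can be sketched as a small predicate. The tuple encoding of conditions (`"eq"`, `"range"`, `"subset"`, `"any"`) and the function names are hypothetical. Two further assumptions: ranges are treated as closed intervals, and a missing attribute (nil) satisfies no condition, not even a wildcard.

```python
def satisfies(value, cond):
    """Check one data-vertex attribute value against one selection condition.

    cond is a tuple in a hypothetical encoding:
      ("eq", x), ("range", lo, hi), ("subset", allowed), ("any",)
    """
    if value is None:
        # Assumption: nil is outside the domain, so it satisfies nothing
        return False
    kind = cond[0]
    if kind == "eq":
        return value == cond[1]
    if kind == "range":
        # Assumption: closed interval [lo, hi]
        return cond[1] <= value <= cond[2]
    if kind == "subset":
        return set(value) <= set(cond[1])
    if kind == "any":
        return True
    raise ValueError(f"unknown condition kind: {kind!r}")


def matches(u_attrs, v_conds):
    """u matches v if every condition of v holds for u (conjunction)."""
    return all(satisfies(u_attrs.get(a), c) for a, c in v_conds.items())


u = {"age": 30, "tags": {"db"}}
v = {"age": ("range", 18, 40), "tags": ("subset", {"db", "graph"})}
print(matches(u, v))
```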
9
Problem Statement
Given a data graph r and a query graph s, a substructure mapping is an injection f : Vs → Vr such that
∀v ∈ Vs, f(v) ∈ Vr and f(v) matches v; and
∀(u, v) ∈ Es, (f(u), f(v)) ∈ Er.
Given a database R of data graphs and a query graph s, the problem of substructure selection is to find all substructure mappings from s to each data graph r in R.
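The two conditions on f, plus injectivity, translate directly into a checker. This is a minimal sketch: the function name, the representation of edges as sets of ordered pairs, and the label-equality `matches` predicate in the example are assumptions for illustration.

```python
def is_substructure_mapping(f, query_vertices, query_edges, data_edges, matches):
    """Check that f (dict: query vertex -> data vertex) is a substructure mapping.

    matches(data_vertex, query_vertex) is supplied by the caller.
    Edges are sets of ordered pairs, following Er ⊆ Vr×Vr.
    """
    # f must be defined on exactly the query vertices
    if set(f) != set(query_vertices):
        return False
    # f must be an injection: no two query vertices share an image
    if len(set(f.values())) != len(f):
        return False
    # every image must match its query vertex
    if not all(matches(f[v], v) for v in query_vertices):
        return False
    # every query edge must be preserved in the data graph
    return all((f[u], f[v]) in data_edges for (u, v) in query_edges)


# Toy example with a single label per vertex standing in for attribute checks
data_label = {1: "A", 2: "B", 3: "B"}
query_label = {"x": "A", "y": "B"}
data_edges = {(1, 2), (1, 3)}
query_edges = {("x", "y")}


def same_label(u, v):
    return data_label[u] == query_label[v]


ok = is_substructure_mapping({"x": 1, "y": 2}, ["x", "y"], query_edges, data_edges, same_label)
```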
10
Example
11
Basic Algorithmic Framework
Backtracking algorithm in DFS fashion
Order the query vertices into a sequence
Try to match query vertices iteratively
Test attribute and connection constraints
On success, match the next vertex in the sequence
On failure, backtrack and map the previous query vertex to another data vertex
Terminate when all full mappings are found
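The steps above can be sketched as a recursive search. This is a baseline illustration only, not the authors' implementation: the function name, the edge representation (sets of ordered pairs), and the label-based `matches` predicate are assumptions, and the sketch assumes the query has no self-loop edges.

```python
def find_mappings(query_order, query_edges, data_vertices, data_edges, matches):
    """Enumerate all full mappings by DFS backtracking over query_order."""
    results = []
    mapping = {}  # partial mapping: query vertex -> data vertex

    def consistent(v, u):
        if u in mapping.values():      # keep the mapping injective
            return False
        if not matches(u, v):          # attribute constraints
            return False
        for a, b in query_edges:       # connection constraints to mapped vertices
            if a == v and b in mapping and (u, mapping[b]) not in data_edges:
                return False
            if b == v and a in mapping and (mapping[a], u) not in data_edges:
                return False
        return True

    def extend(i):
        if i == len(query_order):      # every query vertex is mapped
            results.append(dict(mapping))
            return
        v = query_order[i]
        for u in data_vertices:
            if consistent(v, u):
                mapping[v] = u
                extend(i + 1)          # success: go on to the next query vertex
                del mapping[v]         # backtrack: try another data vertex

    extend(0)
    return results


data_label = {1: "A", 2: "B", 3: "B"}
query_label = {"x": "A", "y": "B"}
found = find_mappings(
    ["x", "y"], {("x", "y")}, [1, 2, 3], {(1, 2), (1, 3)},
    lambda u, v: data_label[u] == query_label[v],
)
```

In this toy graph, `found` contains the two mappings that send `x` to vertex 1 and `y` to vertex 2 or 3.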
12
Example
13
Advanced Techniques
Performance issues of the basic framework:
No index is leveraged
Naïve vertex-at-a-time extension
Proposed optimizations:
Two-tier index and construction algorithm
Cost-based query processing algorithm
14
Substructure Selection Index
[Figure: template graphs, SS-index, and mappings]
15
Primitive Operations
Index retrieval
Graph scan
Attribute validation
Connection validation
Mapping extension
Mapping join
Given a substructure selection problem, finding the query execution plan with minimum cost is NP-hard.
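The slides name mapping join without defining it. One plausible reading, sketched here purely as an assumption, is that two partial mappings combine when they agree on every shared query vertex and the union remains injective. The function names are hypothetical.

```python
def join_mappings(m1, m2):
    """Merge two partial mappings, or return None if they are incompatible."""
    merged = dict(m1)
    for v, u in m2.items():
        if v in merged:
            if merged[v] != u:         # disagree on a shared query vertex
                return None
        else:
            merged[v] = u
    # the union must still be an injection
    if len(set(merged.values())) != len(merged):
        return None
    return merged


def join_all(ms1, ms2):
    """Nested-loop join of two lists of partial mappings."""
    joined = []
    for a in ms1:
        for b in ms2:
            m = join_mappings(a, b)
            if m is not None:
                joined.append(m)
    return joined


left = [{"x": 1, "y": 2}]
right = [{"y": 2, "z": 3}, {"y": 3, "z": 4}, {"y": 2, "z": 1}]
joined = join_all(left, right)
```

Only the first mapping in `right` joins with `left`: the second disagrees on `y`, and the third would map both `x` and `z` to data vertex 1.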
16
Practical Plan Generation
Guidelines:
A proper execution order reduces the overall processing cost
Joining overlapping mappings reduces intermediate results
Eager constraint validation reduces intermediate results
Early validation of selective constraints reduces intermediate results
A cost-based heuristic algorithm iteratively selects the template graph with minimum cost until all vertices are covered
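The greedy selection in the last guideline can be sketched as a covering loop. The cost values, the candidate names, and the rule of considering only candidates that cover at least one new vertex are assumptions; the slides do not give the cost model.

```python
def greedy_plan(query_vertices, candidates):
    """Pick candidates by minimum cost until every query vertex is covered.

    candidates: list of (name, covered_query_vertices, cost) tuples,
    a hypothetical stand-in for template graphs with estimated costs.
    """
    uncovered = set(query_vertices)
    plan = []
    while uncovered:
        # Assumption: only candidates covering something new are considered
        useful = [c for c in candidates if uncovered & set(c[1])]
        if not useful:
            raise ValueError("candidates cannot cover all query vertices")
        name, covered, _cost = min(useful, key=lambda c: c[2])
        plan.append(name)
        uncovered -= set(covered)
    return plan


candidates = [
    ("t1", {"a", "b"}, 2.0),
    ("t2", {"b", "c"}, 1.0),
    ("t3", {"c", "d"}, 3.0),
    ("scan_d", {"d"}, 5.0),
]
plan = greedy_plan({"a", "b", "c", "d"}, candidates)
```

With these made-up costs the loop picks `t2`, then `t1`, then `t3`. Because the underlying optimization problem is NP-hard, a heuristic of this kind trades optimality for a fast planning step.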
17
Experimental Results on AIDS
Tree patterns provide the best balance between index size and runtime performance
As expected, SS-index construction is faster than GADDI's, but slower than QuickSI's
18
Experimental Results (cont.)
SS-search effectively reduces the query response time as queries become larger, owing to SS-index
19
Experimental Results (cont.)
SS-search performs the best when varying
the percentage of vertices with attribute constraints
the average range of selection conditions
20
Experimental Results (cont.)
SS-search scales well with large databases, better than the alternatives
SS-index can be as scalable as QuickSI if only a portion of the database is used for mining frequent substructures
21
Conclusion
A new type of query, substructure selection, that handles general structure search on multi-attributed graphs
A novel structure, SS-index, that speeds up online computation by judiciously materializing partial embeddings
The SS-search algorithm, which dynamically composes an effective query execution plan for efficient mapping discovery
22
Thank you! Questions?
23
Related Work
Many indices for subgraph search, e.g., gIndex [SIGMOD 2004], FG-index [SIGMOD 2007], CT-index [ICDE 2011], CP-index [CIKM 2011]
Containment search or all-matching computation
Single-labeled graphs
DELTA [CIKM 2012] is the only work that deals with multi-labeled graphs
Transforms the problem into a multi-dimensional indexing problem
Fixed attribute set