Published by Rodney Bradley. Modified over 9 years ago.
1
gStore: Answering SPARQL Queries Via Subgraph Matching
Lei Zou (1), Jinghui Mo (1), Lei Chen (2), M. Tamer Özsu (3), Dongyan Zhao (1)
(1) Peking University, (2) Hong Kong University of Science and Technology, (3) University of Waterloo
2
Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions
4
Semantic Web
"Semantic Web Technologies" is a collection of standard technologies to realize a Web of Data.
5
RDF Data Model (figure labels: URI, Literals)
6
RDF Graph (figure labels: Entity Vertex, Literal Vertex)
7
SPARQL Queries
SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
(figure: Query Graph)
8
Subgraph Match vs. SPARQL Queries
9
Naïve Triple Store
SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
SQL: Select T3.Object From T as T1, T as T2, T as T3 Where T1.Predicate = 'BornOnDate' and T1.Object = '1809-02-12' and T2.Predicate = 'DiedOnDate' and T2.Object = '1865-04-15' and T3.Predicate = 'hasName' and T1.Subject = T2.Subject and T2.Subject = T3.Subject
Too many self-joins.
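The self-join pattern on this slide can be tried out with a small sketch. It assumes a single triple table T(Subject, Predicate, Object) as in the slide's SQL; the subject identifiers (`y:Abraham_Lincoln`, `y:Charles_Darwin`), the toy rows, and the helper `run_query` are hypothetical and only serve the illustration.

```python
import sqlite3

# Toy rows for a single triple table T(Subject, Predicate, Object).
# Subject identifiers and rows are illustrative assumptions, not slide data.
ROWS = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Charles_Darwin", "hasName", "Charles Darwin"),
    ("y:Charles_Darwin", "BornOnDate", "1809-02-12"),
    ("y:Charles_Darwin", "DiedOnDate", "1882-04-19"),
]

# One copy of T per triple pattern: three patterns mean a three-way self-join.
QUERY = """
SELECT T3.Object
FROM T AS T1, T AS T2, T AS T3
WHERE T1.Predicate = 'BornOnDate' AND T1.Object = '1809-02-12'
  AND T2.Predicate = 'DiedOnDate' AND T2.Object = '1865-04-15'
  AND T3.Predicate = 'hasName'
  AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
"""


def run_query(rows):
    """Load the rows into an in-memory table and run the self-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (Subject TEXT, Predicate TEXT, Object TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


names = run_query(ROWS)
print(names)
```

Each additional triple pattern in the SPARQL query adds another copy of T to the FROM clause, which is the cost the slide points at.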
10
Existing Solutions
Three categories of solutions have been proposed to speed up query processing:
1. Property Table: Jena [K. Wilkinson et al. SWDB 03], …
2. Vertically Partitioned Solution: SW-store [D. J. Abadi et al. VLDB 07], …
3. Exhaustive Indexing: RDF-3X [T. Neumann et al. VLDB 08], Hexastore [C. Weiss et al. VLDB 08], …
11
Existing Solutions: Property Table
SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
SQL: Select People.hasName From People Where People.BornOnDate = '1809-02-12' and People.DiedOnDate = '1865-04-15'
Reduces the number of join steps.
12
Existing Solutions: Vertically Partitioned Solution
Fast merge join.
13
Existing Solutions: Exhaustive Indexing
Each triple pattern of a SPARQL query can be translated into one "range query".
SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
Range query & merge join.
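A minimal sketch of the "range query & merge join" idea, assuming one sorted permutation index over (predicate, object, subject). The toy triples and the helper names `range_scan` and `merge_join` are hypothetical; real systems such as RDF-3X use compressed B+-trees rather than Python lists.

```python
from bisect import bisect_left, bisect_right

# Toy triples as (subject, predicate, object); illustrative data only.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Charles_Darwin", "hasName", "Charles Darwin"),
    ("y:Charles_Darwin", "BornOnDate", "1809-02-12"),
]

# One of the possible permutations: sorted by (predicate, object, subject).
POS_INDEX = sorted((p, o, s) for s, p, o in TRIPLES)


def range_scan(index, predicate, obj):
    """Return subjects for a fixed (predicate, object) prefix, in sorted order."""
    lo = bisect_left(index, (predicate, obj, ""))
    # "\uffff" is assumed to sort after every subject string in the toy data.
    hi = bisect_right(index, (predicate, obj, "\uffff"))
    return [s for _, _, s in index[lo:hi]]


def merge_join(left, right):
    """Intersect two sorted subject lists with a single linear pass."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


born = range_scan(POS_INDEX, "BornOnDate", "1809-02-12")
died = range_scan(POS_INDEX, "DiedOnDate", "1865-04-15")
matches = merge_join(born, died)
print(matches)
```

Because each range scan returns subjects already in sorted order, the join needs no extra sorting step.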
14
Some Limitations
1. Difficult to handle "wildcard queries".
2. Difficult to handle updates.
15
Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions
16
Intuition of gStore
Finding matches over a large graph is not a trivial task.
17
Preliminaries (figure labels: Entity Vertex, Literal Vertex)
18
Preliminaries: RDF graph
19
Preliminaries: Query graph
20
Preliminaries: Match
21
Preliminaries: Problem definition
22
Storage Schema in gStore
Encode all neighbors of a vertex into a "bit-string", called a signature.
23
Encoding Technique (1)
eSig(e).e (from the edge label eLabel): |eSig(e).e| = M. We employ m different string hash functions H_i (i = 1, ..., m). For each hash function H_i, we set the (H_i(eLabel) MOD M)-th bit in eSig(e).e to '1'.
eSig(e).n (from the neighbor label): encoded in the same way, with |eSig(e).n| = N and n different hash functions.
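The encoding rule above can be sketched as follows. The concrete hash family is an assumption: the slides do not name the functions H_i, so a seeded MD5 digest stands in for them here. The default sizes (M = 12, N = 16, m = 1, n = 2) and the helper names are also hypothetical.

```python
import hashlib


def _hash(label: str, seed: int) -> int:
    """Stand-in for the i-th string hash function H_i (assumed, not from the slides)."""
    digest = hashlib.md5(f"{seed}:{label}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def encode_label(label: str, length: int, num_hashes: int) -> int:
    """Set the (H_i(label) MOD length)-th bit for each of the hash functions."""
    signature = 0
    for i in range(num_hashes):
        signature |= 1 << (_hash(label, i) % length)
    return signature


def edge_signature(edge_label, neighbor_label, M=12, m=1, N=16, n=2):
    """Return the pair (eSig(e).e, eSig(e).n) as integers used as bit strings."""
    return encode_label(edge_label, M, m), encode_label(neighbor_label, N, n)


e_part, n_part = edge_signature("hasName", "Abraham Lincoln")
print(format(e_part, "012b"), format(n_part, "016b"))
```

Two hash functions may land on the same bit, so a part has at most, not exactly, as many set bits as hash functions.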
24
Encoding Technique (2)
The neighbor label "Abraham Lincoln" is split into n-grams "Abr", "bra", "rah", "aha", …; the hash results of the n-grams are ORed into 1000 0010 0100 0001.
(edge label, neighbor label) | eSig(e).e | eSig(e).n
(hasName, "Abraham Lincoln") | 0010 0000 0000 | 1000 0010 0100 0001
(BornOnDate, "1809-02-12") | 0100 0000 0000 | 0100 0010 0100 1000
(DiedOnDate, "1865-04-15") | 0000 1000 0000 | 0000 0010 0100 0000
(DiedIn, "y:Washington_D.c") | 0000 0010 0000 | 1000 0010 0100 0001
OR of all rows | 0110 1010 0000 | 1100 0010 0100 1001
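The OR step on this slide can be checked with a few lines of arithmetic. The bit strings below are the per-edge signatures as read from the slide (edge part of 12 bits, neighbor part of 16 bits); the helper name `bitwise_or` is hypothetical.

```python
# (edge label, neighbor label) -> (edge part, neighbor part), as read from the slide.
EDGE_SIGNATURES = {
    ("hasName", "Abraham Lincoln"): ("001000000000", "1000001001000001"),
    ("BornOnDate", "1809-02-12"): ("010000000000", "0100001001001000"),
    ("DiedOnDate", "1865-04-15"): ("000010000000", "0000001001000000"),
    ("DiedIn", "y:Washington_D.c"): ("000000100000", "1000001001000001"),
}


def bitwise_or(bit_strings):
    """OR equally long bit strings and return the result as a bit string."""
    width = len(bit_strings[0])
    value = 0
    for bits in bit_strings:
        value |= int(bits, 2)
    return format(value, f"0{width}b")


edge_part = bitwise_or([e for e, _ in EDGE_SIGNATURES.values()])
neighbor_part = bitwise_or([n for _, n in EDGE_SIGNATURES.values()])
print(edge_part, neighbor_part)
```

The two results agree with the final OR row of the slide, 0110 1010 0000 and 1100 0010 0100 1001.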
25
Encoding Technique (3)
26
Encoding Technique (4)
Find matches over the signature graph G*, then verify each match in the RDF graph G.
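A sketch of this filter-and-verify idea, under the assumption that the signature filter is the usual bitwise containment test (every bit set in the query signature must also be set in the data signature). The function names, the toy signatures, and the `verify` callback are hypothetical; the slides do not show the exact test.

```python
def may_match(query_sig: int, data_sig: int) -> bool:
    """Signature filter: all bits of the query signature appear in the data signature."""
    return query_sig & data_sig == query_sig


def filter_then_verify(query_sig, data_sigs, verify):
    """Filter over signatures first, then run the exact check on the survivors only."""
    candidates = [v for v, sig in data_sigs.items() if may_match(query_sig, sig)]
    return [v for v in candidates if verify(v)]


# Toy signature graph vertices; identifiers and bit patterns are illustrative.
DATA_SIGS = {"001": 0b0110, "002": 0b0100, "003": 0b1110}
QUERY_SIG = 0b0110

# The exact check against the RDF graph G is stubbed out: pretend that vertex
# "003" is a false positive caused by a hash collision.
result = filter_then_verify(QUERY_SIG, DATA_SIGS, verify=lambda v: v != "003")
print(result)
```

The filter can admit false positives but never drops a true match, which is why the verification pass over G is still required.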
27
Encoding Technique (5)
28
Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS-tree & Query Algorithm; Experiments; Conclusions
29
A Straightforward Solution (1)
(figure: candidate list L1 = {001, 004, 006} for query vertex u1; candidate list L2 = {002, 003, 006} for query vertex u2)
30
A Straightforward Solution (2)
(figure: joining candidate lists L1 = {001, 004, 006} and L2 = {002, 003, 006})
Large join space!
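To make the "large join space" concrete, here is a small sketch that joins the two candidate lists as read from the slide figure. The edge set of the signature graph is not recoverable from the transcript, so `EDGES` below is a hypothetical placeholder, as is the helper `join_candidates`.

```python
from itertools import product

# Candidate lists as read from the slide figure.
L1 = ["001", "004", "006"]  # candidates for query vertex u1
L2 = ["002", "003", "006"]  # candidates for query vertex u2

# Hypothetical edges of the signature graph G* (not taken from the slides).
EDGES = {("001", "002"), ("006", "003")}


def join_candidates(left, right, edges):
    """Check every pair in the cross product against the edge set."""
    return [(a, b) for a, b in product(left, right) if a != b and (a, b) in edges]


join_space = len(L1) * len(L2)
matches = join_candidates(L1, L2, EDGES)
print(join_space, matches)
```

Even in this toy case nine pairs are examined to find two matches, and the cross product grows multiplicatively with every further query vertex.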
31
VS-tree
32
VS-tree query definition
33
Pruning Technique
(figure: query vertices u1 and u2; signature 10010; candidate lists {001, 004, 006} and {002, 003, 006})
Reduced join space!
34
Query Algorithm: Top-Down
35
Optimized Method
Too many super edges.
Which level to start the search from?
No brute-force enumeration.
36
VS*-Tree Insert
The insertion criterion in the VS-tree depends only on the Hamming distance between the signature of u and the signature of the node in the VS-tree.
The criterion in the VS*-tree depends on both the node signatures and the structure of G*.
37
Updates: Insertion in G*
38
Updates: Insertion in VS*-tree
39
VS*-Tree Split
The B+1 entries of the node are partitioned into two new nodes, where B is the maximal fanout of a node in the VS*-tree.
1. We find the two entries that have the maximal Hamming distance between them and use them as the two seed nodes.
2. We associate each remaining entry with the nearest seed node, according to Equation 1.
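The two split steps can be sketched as below. Equation 1 is not reproduced in this transcript, so plain Hamming distance is used as a stand-in for the "nearest seed" criterion; that substitution, the function names, and the toy signatures are assumptions made for illustration.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def split_node(signatures):
    """Partition the entries of an overflowing node into two groups.

    Step 1: pick the two entries with maximal Hamming distance as seeds.
    Step 2: assign every remaining entry to the nearer seed (ties go left).
    Plain Hamming distance replaces the slide's Equation 1 here.
    """
    index_pairs = combinations(range(len(signatures)), 2)
    i, j = max(index_pairs, key=lambda p: hamming(signatures[p[0]], signatures[p[1]]))
    left, right = [signatures[i]], [signatures[j]]
    for k, sig in enumerate(signatures):
        if k in (i, j):
            continue
        if hamming(sig, signatures[i]) <= hamming(sig, signatures[j]):
            left.append(sig)
        else:
            right.append(sig)
    return left, right


# Five entries in a node with maximal fanout B = 4, i.e. B + 1 entries.
ENTRIES = [0b0000, 0b0001, 0b1111, 0b1110, 0b0111]
left, right = split_node(ENTRIES)
print(left, right)
```

Finding the seeds by exhaustive pairwise comparison is quadratic in the number of entries, which stays cheap because a node holds at most B + 1 of them.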
40
VS*-Tree Deletion
Similar to split: if some node d has fewer than b entries, where b is the minimal fanout of a node in the VS*-tree, then d is deleted and its entries are reinserted into the VS*-tree.
41
Updates: Deletion in VS*-tree (figure label: To be deleted)
42
Which Level To Begin
We introduce the concept of the "pruning power" of G^I with regard to Q*, denoted P(Q*, G^I).
43
Estimate P(Q*, G^I)
44
Finding Valid Child States
We propose a DFS strategy to find all valid child states of J: start a DFS over G* beginning from some vertex v_i.
46
Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions
47
Datasets
Dataset | Triple # | Size
Yago | 20 million | 3.1 GB
DBLP | 8 million | 0.8 GB
48
Offline Performance
49
Exact Queries
50
Wildcard Queries
51
Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions
52
Conclusions
A vertex encoding technique; an efficient index structure: VS-tree; a novel filtering technique.