
GStore: Answering SPARQL Queries Via Subgraph Matching Lei Zou 1, Jinghui Mo 1, Lei Chen 2, M. Tamer Özsu 3, Dongyan Zhao 1 1 1 Peking University, 2 Hong.


1 gStore: Answering SPARQL Queries Via Subgraph Matching. Lei Zou¹, Jinghui Mo¹, Lei Chen², M. Tamer Özsu³, Dongyan Zhao¹. ¹Peking University, ²Hong Kong University of Science and Technology, ³University of Waterloo

2 Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions

3 Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions

4 Semantic Web: “Semantic Web Technologies” is a collection of standard technologies to realize a Web of Data.

5 RDF Data Model (figure labels: URI, Literals)

6 RDF Graph (figure labels: Entity Vertex, Literal Vertex)

7 SPARQL Queries. SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". } (figure: Query Graph)
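As a rough illustration of how a query like the one on this slide corresponds to matching a small query graph against the RDF graph, the sketch below evaluates three triple patterns by backtracking over a list of triples. The predicate names are taken from the SQL and encoding slides; the subject identifiers, the second person, and the helper names `unify` and `match` are assumptions made for the example. This is a naive matcher, not gStore's algorithm.

```python
# Toy RDF graph as (subject, predicate, object) triples.
# Subject identifiers are hypothetical; predicates and literals follow the slides.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Someone_Else", "hasName", "Someone Else"),
    ("y:Someone_Else", "BornOnDate", "1809-02-12"),
]

# The query graph: three edges that share the variable vertex ?m.
QUERY = [
    ("?m", "hasName", "?name"),
    ("?m", "BornOnDate", "1809-02-12"),
    ("?m", "DiedOnDate", "1865-04-15"),
]


def unify(term, value, binding):
    """Bind a variable (a term starting with '?') or compare a constant."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = dict(binding or {})
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        if all(unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(rest, triples, candidate)


names = [b["?name"] for b in match(QUERY, TRIPLES)]
print(names)
```

Each solution is one match of the query graph in the data graph; the scan over all triples for every pattern is what an index is meant to avoid.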

8 Subgraph Match vs. SPARQL Queries

9 Naïve Triple Store. SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". } SQL: Select T3.Object From T as T1, T as T2, T as T3 Where T1.Predicate = "BornOnDate" and T1.Object = "1809-02-12" and T2.Predicate = "DiedOnDate" and T2.Object = "1865-04-15" and T3.Predicate = "hasName" and T1.Subject = T2.Subject and T2.Subject = T3.Subject. Problem: too many self-joins.
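The self-join pattern can be tried out with a minimal sketch using Python's built-in `sqlite3` module. The column name `Predicate`, the choice to return the object of the `hasName` triple (the value bound to ?name), the subject identifiers, and the function name are assumptions for illustration; the predicates and literals come from the slide.

```python
import sqlite3

# Hypothetical subjects; predicates and literal values follow the slide.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Abraham_Lincoln", "DiedIn", "y:Washington_D.c"),
    ("y:Someone_Else", "hasName", "Someone Else"),
    ("y:Someone_Else", "BornOnDate", "1809-02-12"),
]


def names_born_and_died(born, died):
    """Answer the query with one self-join of T per triple pattern."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (Subject TEXT, Predicate TEXT, Object TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?, ?)", TRIPLES)
    rows = conn.execute(
        """
        SELECT T3.Object
        FROM T AS T1, T AS T2, T AS T3
        WHERE T1.Predicate = 'BornOnDate' AND T1.Object = ?
          AND T2.Predicate = 'DiedOnDate' AND T2.Object = ?
          AND T3.Predicate = 'hasName'
          AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
        """,
        (born, died),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(names_born_and_died("1809-02-12", "1865-04-15"))
```

Every additional triple pattern in the query adds another alias of `T` to the join, which is the cost the slide points at.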

10 Existing Solutions. Three categories of solutions have been proposed to speed up query processing: 1. Property table: Jena [K. Wilkinson et al., SWDB 03], … 2. Vertically partitioned solution: SW-Store [D. J. Abadi et al., VLDB 07], … 3. Exhaustive indexing: RDF-3X [T. Neumann et al., VLDB 08], Hexastore [C. Weiss et al., VLDB 08], …

11 Existing Solutions - Property Table. SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". } SQL: Select People.hasName From People Where People.BornOnDate = "1809-02-12" and People.DiedOnDate = "1865-04-15". Benefit: fewer join steps.
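For comparison with the triple-table approach, here is a minimal `sqlite3` sketch of the property-table layout. The `People` table and its three property columns are taken from the slide's SQL; the `Subject` key column, the sample rows, and the function name are assumptions for illustration.

```python
import sqlite3

# One row per subject, one column per property (hypothetical sample data).
ROWS = [
    ("y:Abraham_Lincoln", "Abraham Lincoln", "1809-02-12", "1865-04-15"),
    ("y:Someone_Else", "Someone Else", "1809-02-12", None),  # missing property -> NULL
]


def names_from_property_table(born, died):
    """Answer the query with a single scan of People and no joins."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE People ("
        "Subject TEXT PRIMARY KEY, hasName TEXT, BornOnDate TEXT, DiedOnDate TEXT)"
    )
    conn.executemany("INSERT INTO People VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        "SELECT hasName FROM People WHERE BornOnDate = ? AND DiedOnDate = ?",
        (born, died),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(names_from_property_table("1809-02-12", "1865-04-15"))
```

The `NULL` in the second row hints at the trade-off: subjects that lack a property still occupy a column for it.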

12 Existing Solutions - Vertically Partitioned Solution. Fast merge join.

13 Existing Solutions - Exhaustive Indexing. Each SPARQL query statement can be translated into one “range query”; the results are then combined by merge join. SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> "1809-02-12". ?m <DiedOnDate> "1865-04-15". }
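The "range query plus merge join" idea can be sketched with sorted lists standing in for the indexes. This is a simplified illustration, not the actual layout of RDF-3X or Hexastore: the two permutations kept here (`SPO`, `POS`), the subject identifiers, and the helper names `range_scan` and `merge_join` are assumptions.

```python
from bisect import bisect_left

# Hypothetical subjects; predicates and literal values follow the slides.
TRIPLES = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("y:Someone_Else", "hasName", "Someone Else"),
    ("y:Someone_Else", "BornOnDate", "1809-02-12"),
]

# Two of the sorted permutations an exhaustive index would keep.
SPO = sorted(TRIPLES)
POS = sorted((p, o, s) for s, p, o in TRIPLES)


def range_scan(index, first, second):
    """Third components of all entries that start with (first, second)."""
    i = bisect_left(index, (first, second))
    out = []
    while i < len(index) and index[i][:2] == (first, second):
        out.append(index[i][2])
        i += 1
    return out


def merge_join(left, right):
    """Intersect two sorted, duplicate-free lists in one linear pass."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


born = range_scan(POS, "BornOnDate", "1809-02-12")   # subjects, already sorted
died = range_scan(POS, "DiedOnDate", "1865-04-15")
subjects = merge_join(born, died)
names = [name for s in subjects for name in range_scan(SPO, s, "hasName")]
print(names)
```

Because each range scan returns subjects in sorted order, the join needs no extra sorting step.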

14 Some Limitations: 1. Difficult to handle “wildcard queries”. 2. Difficult to handle updates.

15 Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions

16 Intuition of gStore: Finding matches over a large graph is not a trivial task.

17 Preliminaries (figure labels: Entity Vertex, Literal Vertex)

18 Preliminaries: RDF graph

19 Preliminaries: Query graph

20 Preliminaries: Match

21 Preliminaries: Problem definition

22 Storage Schema in gStore: Encode all neighbors of a vertex into a “bit-string”, called its signature.

23 Encoding Technique (1): |eSig(e).e| = M. We employ m different string hash functions H_i (i = 1, ..., m). For each hash function H_i, we set the (H_i(eLabel) MOD M)-th bit in eSig(e).e to ‘1’. Encoding eSig(e).n works the same way: |eSig(e).n| = N, with n different hash functions.
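A minimal sketch of this hashing scheme follows. The slide does not say which hash functions are used, so building H_i by salting SHA-256 with the index i is an assumption, as are the concrete values of M, m, N and n, the function names, and the convention of counting bit positions from the least significant bit.

```python
import hashlib


def string_hash(i, label):
    """H_i(label): the i-th string hash, here SHA-256 salted with i (an assumption)."""
    digest = hashlib.sha256(f"{i}:{label}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def encode(label, length, num_hashes):
    """Set the (H_i(label) mod length)-th bit for i = 1..num_hashes."""
    sig = 0
    for i in range(1, num_hashes + 1):
        sig |= 1 << (string_hash(i, label) % length)
    return sig


M, m = 12, 2  # length and hash count for eSig(e).e (illustrative values)
N, n = 16, 3  # length and hash count for eSig(e).n (illustrative values)

e_part = encode("hasName", M, m)
n_part = encode("Abraham Lincoln", N, n)
print(format(e_part, f"0{M}b"), format(n_part, f"0{N}b"))
```

At most m bits are set in the edge-label part and at most n in the neighbor part; two hash functions may land on the same bit, so fewer can be set.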

24 Encoding Technique (2)
Literal encoding: the 3-grams of “Abraham Lincoln” (“Abr”, “bra”, “rah”, “aha”, …) are hashed to the bit strings 0000 0010 0000 0000, 1000 0000 0000 0000, 0000 0000 0100 0000 and 0000 0000 0000 0001, which are ORed into 1000 0010 0100 0001.
Edge signatures (eSig(e).e followed by eSig(e).n):
(hasName, “Abraham Lincoln”)    0010 0000 0000   1000 0010 0100 0001
(BornOnDate, “1809-02-12”)      0100 0000 0000   0100 0010 0100 1000
(DiedOnDate, “1865-04-15”)      0000 1000 0000   0000 0010 0100 0000
(DiedIn, “y:Washington_D.c”)    0000 0010 0000   1000 0010 0100 0001
OR of the four rows             0110 1010 0000   1100 0010 0100 1001
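The OR steps on this slide can be checked mechanically. The sketch below uses the bit strings from the slide, read as a 12-bit edge-label part followed by a 16-bit neighbor part; that split is an interpretation of the flattened figure, and the helper names `bits`, `fmt` and `or_all` are made up for the example.

```python
def bits(text):
    """Parse a spaced bit string such as '0010 0000 0000' into an int."""
    return int(text.replace(" ", ""), 2)


def fmt(value, width):
    """Format an int as a bit string in groups of four."""
    s = format(value, f"0{width}b")
    return " ".join(s[i:i + 4] for i in range(0, width, 4))


def or_all(values):
    """Bitwise OR of all values."""
    result = 0
    for v in values:
        result |= v
    return result


# Hashed 3-grams of "Abraham Lincoln" as listed on the slide.
GRAM_BITS = [
    "0000 0010 0000 0000",
    "1000 0000 0000 0000",
    "0000 0000 0100 0000",
    "0000 0000 0000 0001",
]

# (edge-label part, neighbor part) per adjacent edge, as listed on the slide.
EDGE_SIGS = {
    ("hasName", "Abraham Lincoln"): ("0010 0000 0000", "1000 0010 0100 0001"),
    ("BornOnDate", "1809-02-12"): ("0100 0000 0000", "0100 0010 0100 1000"),
    ("DiedOnDate", "1865-04-15"): ("0000 1000 0000", "0000 0010 0100 0000"),
    ("DiedIn", "y:Washington_D.c"): ("0000 0010 0000", "1000 0010 0100 0001"),
}

literal_sig = fmt(or_all(bits(g) for g in GRAM_BITS), 16)
e_part = fmt(or_all(bits(e) for e, _ in EDGE_SIGS.values()), 12)
n_part = fmt(or_all(bits(n) for _, n in EDGE_SIGS.values()), 16)
print(literal_sig)
print(e_part, "|", n_part)
```

Under this reading the computed values agree with the slide: the 3-gram OR gives the neighbor part of the hasName edge, and the OR over all four edges gives the final bit string.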

25 Encoding Technique (3)

26 Encoding Technique (4): 1. Find matches over the signature graph G*. 2. Verify each match in the RDF graph G.
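The two steps can be sketched for a single query vertex as follows. The filter keeps a vertex when every bit set in the query signature is also set in the vertex signature; the verification step then checks the real edges, because different labels can hash to the same bits. The signature of `v1` is the OR shown on the earlier encoding slide; the vertex ids, the vertices `v2` and `v3`, the invented collision, and the function names are assumptions for illustration, and matching whole query graphs is not covered here.

```python
def bits(text):
    """Parse a spaced bit string into an int."""
    return int(text.replace(" ", ""), 2)


# Vertex signatures. v1 has the four edges from the encoding slide, v2 has
# only the BornOnDate edge, and v3 is an invented hash collision: its bits
# cover the query signature although its edges are different.
VERTEX_SIGS = {
    "v1": bits("0110 1010 0000 1100 0010 0100 1001"),
    "v2": bits("0100 0000 0000 0100 0010 0100 1000"),
    "v3": bits("0100 1000 0000 0100 0010 0100 1000"),
}

# The actual adjacent edges of each vertex in the RDF graph G.
EDGES = {
    "v1": {
        ("hasName", "Abraham Lincoln"),
        ("BornOnDate", "1809-02-12"),
        ("DiedOnDate", "1865-04-15"),
        ("DiedIn", "y:Washington_D.c"),
    },
    "v2": {("BornOnDate", "1809-02-12")},
    "v3": {("p1", "x"), ("p2", "y")},  # hypothetical colliding labels
}

QUERY_EDGES = {("BornOnDate", "1809-02-12"), ("DiedOnDate", "1865-04-15")}
QUERY_SIG = (bits("0100 0000 0000 0100 0010 0100 1000")
             | bits("0000 1000 0000 0000 0010 0100 0000"))


def filter_candidates(query_sig, vertex_sigs):
    """Step 1: keep vertices whose signature covers every query bit."""
    return [v for v, sig in vertex_sigs.items() if (query_sig & sig) == query_sig]


def verify(candidates, query_edges, edges):
    """Step 2: keep candidates that really have all required edges."""
    return [v for v in candidates if query_edges <= edges[v]]


candidates = filter_candidates(QUERY_SIG, VERTEX_SIGS)
answers = verify(candidates, QUERY_EDGES, EDGES)
print(candidates, answers)
```

The filter never discards a true match, since a vertex with all required edges has all their bits set; it can only let false positives such as `v3` through, which the second step removes.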

27 Encoding Technique (5)

28 Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions

29 A Straightforward Solution (1) [Figure: query vertices u1 and u2 with lists L1 and L2; entries shown: 001, 004, 006 and 002, 003, 006]

30 A Straightforward Solution (2) [Figure: lists L1 and L2; entries shown: 001, 004, 006 and 002, 003, 006] Large join space!

31 VS-tree

32 VS-tree query definition

33 Pruning Technique [Figure: query vertices u1 and u2; bit string 10010; entries shown: 001, 004, 006 and 002, 003, 006] Reduced join space!

34 Query Algorithm - Top-Down

35 Optimized Method: too many super edges; which level to start the search from; no brute-force enumeration

36 VS*-tree Insert: The insertion criterion in the VS-tree depends only on the Hamming distance between the signature of u and the signatures of the nodes in the VS-tree. The criterion in the VS*-tree depends on both the node signatures and the structure of G*.
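The Hamming-distance criterion described for the plain VS-tree can be sketched as below. The VS*-tree criterion, which also takes the structure of G* into account, is not spelled out on the slide and is not modeled here; the function names are made up for the example.

```python
def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def choose_child(vertex_sig, child_sigs):
    """Index of the child node whose signature is closest to the new vertex."""
    return min(range(len(child_sigs)),
               key=lambda i: hamming(vertex_sig, child_sigs[i]))


children = [0b0101, 0b1000, 0b1111]
chosen = choose_child(0b1010, children)
print(chosen)
```

Applied repeatedly from the root downward, this choice picks a leaf for the new vertex u.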

37 Updates - Insertion in G*

38 Updates - Insertion in VS*-tree

39 VS*-tree Split: The B+1 entries of the overflowing node are partitioned into two new nodes, where B is the maximal fanout of a node in the VS*-tree. 1. Find the two entries with the maximal Hamming distance between them and use them as the two seed nodes. 2. Associate each remaining entry with the nearest seed node, according to Equation 1.
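The two split steps can be sketched as follows. Equation 1 is not reproduced in this transcript, so the sketch substitutes plain Hamming distance for the "nearest seed" test; that substitution, the tie-breaking rule, and the function names are assumptions.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def split(entries):
    """Partition the signatures of an overflowing node into two groups."""
    # Step 1: the two entries that are farthest apart become the seeds.
    seed_a, seed_b = max(
        combinations(range(len(entries)), 2),
        key=lambda pair: hamming(entries[pair[0]], entries[pair[1]]),
    )
    group_a, group_b = [entries[seed_a]], [entries[seed_b]]
    # Step 2: every remaining entry joins the nearer seed (ties go to seed A).
    for i, sig in enumerate(entries):
        if i in (seed_a, seed_b):
            continue
        if hamming(sig, entries[seed_a]) <= hamming(sig, entries[seed_b]):
            group_a.append(sig)
        else:
            group_b.append(sig)
    return group_a, group_b


# B = 4, so an overflowing node holds B + 1 = 5 entries.
left, right = split([0b0000, 0b0001, 0b1111, 0b1110, 0b0111])
print(left, right)
```

This sketch does not enforce a minimal fanout on the two resulting groups; a full implementation would have to handle that case.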

40 VS*-tree Deletion: Similar to split. If some node d has fewer than b entries, where b is the minimal fanout of a node in the VS*-tree, then d is deleted and its entries are reinserted into the VS*-tree.
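A simplified sketch of the underflow rule, using a single level of leaf nodes instead of a full tree. The assumptions here are that a node's signature is the OR of its entries, that reinsertion picks the node with the smallest Hamming distance, and that a resulting overflow (which would trigger a split) is ignored; the function names are made up for the example.

```python
def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def node_signature(entries):
    """Signature of a node, assumed here to be the OR of its entries."""
    sig = 0
    for entry in entries:
        sig |= entry
    return sig


def delete(nodes, entry, min_fanout):
    """Remove `entry`; dissolve its node and reinsert the rest on underflow."""
    for idx, node in enumerate(nodes):
        if entry not in node:
            continue
        node.remove(entry)
        if len(node) < min_fanout and len(nodes) > 1:
            orphans = nodes.pop(idx)  # node d is deleted
            for sig in orphans:       # its entries are reinserted
                target = min(nodes, key=lambda n: hamming(sig, node_signature(n)))
                target.append(sig)
        return True
    return False


nodes = [[0b0001, 0b0011], [0b0101, 0b0100], [0b1100, 0b1000]]
removed = delete(nodes, 0b0011, min_fanout=2)
print(removed, nodes)
```

In the example the first node drops to one entry, below b = 2, so it is removed and its remaining entry moves to the node with the closest signature.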

41 Updates - Deletion in VS*-tree (figure label: “To be deleted”)

42 Which Level To Begin: We introduce the concept of the “pruning power” of G^I with regard to Q*, denoted P(Q*, G^I).

43 Estimate P(Q*, G^I)

44 Finding Valid Child States: We propose a DFS strategy to find all valid child states of J: start a DFS over G* beginning from some vertex v_i.


46 Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions

47 Datasets
Dataset   # Triples    Size
Yago      20 million   3.1 GB
DBLP      8 million    0.8 GB

48 Offline Performance

49 Exact Queries

50 Wildcard Queries

51 Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions

52 Conclusions: Vertex Encoding Technique; An Efficient Index Structure: VS-tree; A Novel Filtering Technique.


