GStore: Answering SPARQL Queries Via Subgraph Matching Lei Zou 1, Jinghui Mo 1, Lei Chen 2, M. Tamer Özsu 3, Dongyan Zhao 1 1 1 Peking University, 2 Hong.

Presentation transcript:

gStore: Answering SPARQL Queries Via Subgraph Matching. Lei Zou 1, Jinghui Mo 1, Lei Chen 2, M. Tamer Özsu 3, Dongyan Zhao 1. 1 Peking University, 2 Hong Kong University of Science and Technology, 3 University of Waterloo

Outline: Background & Related Work; Overview of gStore; Encoding Technique; VS*-tree & Query Algorithm; Experiments; Conclusions

Semantic Web: “Semantic Web Technologies” is a collection of standard technologies to realize a Web of Data.

RDF Data Model: URIs and literals

RDF Graph: entity vertices and literal vertices

SPARQL Queries: SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> “ ”. ?m <DiedOnDate> “ ”. } Query Graph

Subgraph Match vs. SPARQL Queries

Naïve Triple Store: SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> “ ”. ?m <DiedOnDate> “ ”. } SQL: Select T3.Object From T as T1, T as T2, T as T3 Where T1.Predict=“BornOnDate” and T1.Object=“ ” and T2.Predict=“DiedOnDate” and T2.Object=“ ” and T3.Predict=“hasName” and T1.Subject = T2.Subject and T2.Subject = T3.Subject. Too many self-joins.
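The self-join pattern above can be tried on a toy triple table. The following is a minimal sketch using Python's built-in sqlite3 module; the subjects `p1`, `p2`, the names, and the literals `d1` to `d3` are hypothetical placeholders, not values from the slides. It selects the object of the hasName triple, since that is what ?name binds to.

```python
import sqlite3

# Toy triple table T(Subject, Predict, Object). All rows are hypothetical
# placeholders used only to exercise the query shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Subject TEXT, Predict TEXT, Object TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("p1", "hasName", "name-1"),
        ("p1", "BornOnDate", "d1"),
        ("p1", "DiedOnDate", "d2"),
        ("p2", "hasName", "name-2"),
        ("p2", "BornOnDate", "d1"),
        ("p2", "DiedOnDate", "d3"),
    ],
)

# One alias of T per triple pattern: three patterns need a three-way self-join.
NAIVE_SQL = """
SELECT T3.Object
FROM T AS T1, T AS T2, T AS T3
WHERE T1.Predict = 'BornOnDate' AND T1.Object = ?
  AND T2.Predict = 'DiedOnDate' AND T2.Object = ?
  AND T3.Predict = 'hasName'
  AND T1.Subject = T2.Subject AND T2.Subject = T3.Subject
"""

names = [row[0] for row in conn.execute(NAIVE_SQL, ("d1", "d2"))]
print(names)
```

Each additional triple pattern in the query adds one more alias of T to the join, which is the cost the slide points at.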

Existing Solutions: Three categories of solutions have been proposed to speed up query processing:
1. Property Table: Jena [K. Wilkinson et al. SWDB 03], …
2. Vertically Partitioned Solution: SW-store [D. J. Abadi et al. VLDB 07], …
3. Exhaustive Indexing: RDF-3X [T. Neumann et al. VLDB 08], Hexastore [C. Weiss et al. VLDB 08], …

Existing Solutions - Property Table: SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> “ ”. ?m <DiedOnDate> “ ”. } SQL: Select People.hasName from People where People.BornOnDate = “ ” and People.DiedOnDate = “ ”. Reducing the number of join steps.

Existing Solutions - Vertically Partitioned Solution: fast merge join

Existing Solutions - Exhaustive Indexing: Each statement (triple pattern) in a SPARQL query can be translated into one “range query”. SPARQL Query: Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> “ ”. ?m <DiedOnDate> “ ”. } Range query & merge join.
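As a rough illustration of "range query & merge join", the sketch below keeps one sorted ordering of the triples, (predicate, object, subject), answers each triple pattern with a binary-search range scan, and intersects the sorted subject lists with a merge join. This is a simplified stand-in, not the RDF-3X or Hexastore implementation; the triples and the helper names `subjects_with` and `merge_join` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical toy triples (subject, predicate, object).
triples = [
    ("p1", "BornOnDate", "d1"),
    ("p1", "DiedOnDate", "d2"),
    ("p2", "BornOnDate", "d1"),
    ("p2", "DiedOnDate", "d3"),
]

# One sorted permutation; exhaustive-indexing systems keep several orderings.
pos_index = sorted((p, o, s) for s, p, o in triples)

def subjects_with(predicate, obj):
    """Answer the pattern (?s, predicate, obj) with a single range scan."""
    lo = bisect_left(pos_index, (predicate, obj))
    # obj + "\0" is the smallest string strictly greater than obj.
    hi = bisect_left(pos_index, (predicate, obj + "\0"))
    return [s for _, _, s in pos_index[lo:hi]]

def merge_join(a, b):
    """Intersect two sorted, duplicate-free lists in one linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

born = subjects_with("BornOnDate", "d1")
died = subjects_with("DiedOnDate", "d2")
print(merge_join(born, died))
```

Because each range scan returns subjects already in sorted order, the join needs no extra sorting step.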

Some Limitations: 1. Difficult to handle “wildcard queries”. 2. Difficult to handle updates.

Intuition of gStore: Finding matches over a large graph is not a trivial task.

Preliminaries: entity vertex, literal vertex

Preliminaries: RDF graph

Preliminaries: query graph

Preliminaries: match

Preliminaries: problem definition

Storage Schema in gStore: Encoding all neighbors of a vertex into a “bit-string”, called a signature.

Encoding Technique (1): The part eSig(e).e of an edge signature encodes the edge label eLabel and has length M, i.e. |eSig(e).e| = M. We employ m different string hash functions Hi (i = 1,...,m). For each hash function Hi, we set the (Hi(eLabel) MOD M)-th bit in eSig(e).e to ‘1’. Encoding eSig(e).n works the same way, with |eSig(e).n| = N and n different hash functions.
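The bit-setting rule for eSig(e).e can be sketched as follows. The slides do not say which hash functions are used, so this sketch simulates the Hi by salting SHA-256 with the index i; that choice, the sizes M = 16 and m = 2, and the function names are assumptions made for illustration.

```python
import hashlib

M = 16  # length of eSig(e).e in bits (illustrative value)
m = 2   # number of string hash functions (illustrative value)

def string_hash(i, text):
    """Stand-in for the i-th hash function H_i: SHA-256 salted with i."""
    digest = hashlib.sha256(f"{i}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def encode_label(label, num_bits=M, num_hashes=m):
    """Set the (H_i(label) MOD num_bits)-th bit for each i = 1..num_hashes."""
    sig = 0
    for i in range(1, num_hashes + 1):
        sig |= 1 << (string_hash(i, label) % num_bits)
    return sig

sig = encode_label("hasName")
print(format(sig, f"0{M}b"))
```

Two hash functions may land on the same bit, so a signature has between 1 and m bits set.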

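Encoding Technique (2): The n-grams of a literal, e.g. “Abr”, “bra”, “rah”, “aha”, … for “Abraham Lincoln”, are hashed and combined with OR. The signatures of the adjacent edges (hasName, “Abraham Lincoln”), (BornOnDate, “ ”), (DiedOnDate, “ ”), (DiedIn, “y:Washington_D.c”) are then combined with OR.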
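A minimal sketch of the n-gram and OR steps, under several assumptions: 3-grams, one salted SHA-256 hash per part, widths M = 16 and N = 32, and an edge signature laid out as the label part followed by the neighbor part. Only the two edges whose values are legible on the slide are used; all function names are hypothetical.

```python
import hashlib
from functools import reduce

M, N = 16, 32  # illustrative widths of eSig(e).e and eSig(e).n

def _hash(seed, text):
    """Stand-in string hash: SHA-256 salted with a seed."""
    digest = hashlib.sha256(f"{seed}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def ngrams(text, n=3):
    """All contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def edge_signature(edge_label, neighbor_label):
    """Concatenate the label part (M bits) and the neighbor part (N bits)."""
    e_part = 1 << (_hash("e", edge_label) % M)
    n_part = 0
    for gram in ngrams(neighbor_label):
        n_part |= 1 << (_hash("n", gram) % N)  # OR over all n-grams
    return (e_part << N) | n_part

def vertex_signature(edges):
    """Bitwise OR of the signatures of all adjacent edges."""
    return reduce(lambda acc, e: acc | edge_signature(*e), edges, 0)

lincoln_edges = [
    ("hasName", "Abraham Lincoln"),
    ("DiedIn", "y:Washington_D.c"),
]
v_sig = vertex_signature(lincoln_edges)
print(ngrams("Abraham Lincoln")[:4])
```

By construction, every bit of each edge signature is also set in the vertex signature, which is what makes a bitwise containment test usable as a filter.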

Encoding Technique (3)

Encoding Technique (4): 1. Find matches over the signature graph G*. 2. Verify each match in the RDF graph G.
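The two steps can be sketched for a single query vertex. The filter assumes the usual signature containment test (every bit set in the query signature must also be set in the data signature); it can produce false positives, so candidates are re-checked against the actual edges. The vertex names, signatures, and edge sets below are hypothetical toy values, and a full implementation would also match the edges between query vertices.

```python
def may_match(query_sig, data_sig):
    """Filter step: the data signature must cover all bits of the query signature."""
    return (query_sig & data_sig) == query_sig

# Hypothetical signatures in G* and the real adjacent edges in G.
data_sigs = {"v1": 0b1011, "v2": 0b0011, "v3": 0b1110}
data_edges = {
    "v1": {("p", "a"), ("q", "b")},
    "v2": {("p", "a")},
    "v3": {("p", "c"), ("q", "b")},
}

query_sig = 0b1010
query_edges = {("p", "a"), ("q", "b")}

# Step 1: find candidates over the signatures.
candidates = [v for v, sig in data_sigs.items() if may_match(query_sig, sig)]

# Step 2: verify each candidate against the actual graph data.
verified = [v for v in candidates if query_edges <= data_edges[v]]

print(candidates, verified)
```

Here `v3` passes the filter but fails verification, which is the kind of false positive the second step removes.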

Encoding Technique (5)

A Straightforward Solution (1) (figure labels: u1, u2, L1, L2)

A Straightforward Solution (2): Large join space! (figure labels: L1, L2)

VS-tree

VS-tree: query definition

Pruning Technique: Reduced join space! (figure labels: u1, u2)

Query Algorithm: Top-Down

Optimized method: Too many super edges. Which level to start the search from? No brute-force enumeration.

VS*-tree Insert: The insertion criterion in the VS-tree depends only on the Hamming distance between the signature of u and that of the node in the VS-tree. The criterion in the VS*-tree depends on both the node signatures and the structure of G*.

Updates - Insertion in G*

Updates - Insertion in VS*-tree

VS*-tree split: The B+1 entries of the overflowing node are partitioned into two new nodes, where B is the maximal fanout of a node in the VS*-tree. 1. We find the two entries that have the maximal Hamming distance between them and use them as the two seed nodes. 2. We associate each remaining entry with the nearest seed node, according to Equation 1.
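The two split steps can be sketched as below. Equation 1 is not reproduced in this transcript, so the sketch assumes plain Hamming distance between signatures as the notion of "nearest"; the entry values and the function names are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")

def split_node(signatures):
    """Partition the B+1 entries of an overflowing node into two groups."""
    # Step 1: the pair with the maximal Hamming distance becomes the seeds.
    index_pairs = combinations(range(len(signatures)), 2)
    a, b = max(
        index_pairs,
        key=lambda p: hamming(signatures[p[0]], signatures[p[1]]),
    )
    group_a, group_b = [signatures[a]], [signatures[b]]
    # Step 2: every remaining entry joins its nearest seed
    # (assumption: plain Hamming distance stands in for Equation 1).
    for i, sig in enumerate(signatures):
        if i in (a, b):
            continue
        if hamming(sig, signatures[a]) <= hamming(sig, signatures[b]):
            group_a.append(sig)
        else:
            group_b.append(sig)
    return group_a, group_b

# B = 4, so an overflowing node holds B + 1 = 5 entries.
entries = [0b00000, 0b00001, 0b11111, 0b11100, 0b00111]
left, right = split_node(entries)
print(left, right)
```

Finding the seed pair this way takes quadratic time in the node size, which stays small because the fanout B is bounded.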

VS*-tree deletion: Similar to split. If some node d has fewer than b entries, where b is the minimal fanout of a node in the VS*-tree, then d is deleted and its entries are reinserted into the VS*-tree.

Updates - Deletion in VS*-tree (figure label: to be deleted)

Which Level To Begin: We introduce the concept of the “pruning power” of G^I with regard to Q*, denoted as P(Q*, G^I).

Estimate P(Q*, G^I)

Finding Valid Child States: We propose a DFS strategy to find all valid child states of J. It starts a DFS over G* beginning from some vertex vi.

Datasets

Dataset    Triple #      Size
Yago       20 million    3.1 GB
DBLP       8 million     0.8 GB

Offline Performance

Exact Queries

Wildcard Queries

Conclusions: a vertex encoding technique; an efficient index structure, the VS-tree; a novel filtering technique.