gStore: Answering SPARQL Queries via Subgraph Matching
Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, Dongyan Zhao

Presentation transcript:

gStore: Answering SPARQL Queries via Subgraph Matching
Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, Dongyan Zhao

Agenda
– Introduction
– Preliminaries
– Overview of gStore
– Storage Scheme and Encoding Technique
– Indexing Structure and Query Algorithm
– Optimized methods
– Experiments and their results
– Conclusions

Introduction - 1/4
What is RDF?
– Building block of the semantic web
– Represented as a collection of triples: (Subject, Property, Object)

Prefix: y=

Subject                    Property     Object
y:Abraham Lincoln          hasName      “Abraham Lincoln”
y:Abraham Lincoln          BornOnDate
y:Abraham Lincoln          DiedOnDate
y:Abraham Lincoln          DiedIn       y:Washington_D.C
y:Washington_D.C           hasName      “Washington D.C”
y:Washington_D.C           FoundYear    1790
y:Washington_D.C           rdf:type     y:city
y:United_States            hasName      “United States”
y:United_States            hasCapital   y:Washington_D.C
y:United_States            rdf:type     Country
y:Reese_Witherspoon        rdf:type     y:Actor
y:Reese_Witherspoon        BornOnDate   “ ”
y:Reese_Witherspoon        BornIn       y:New_Orleans_Louisiana
y:Reese_Witherspoon        hasName      “Reese Witherspoon”
y:New_Orleans_Louisiana    FoundYear    1718
y:New_Orleans_Louisiana    rdf:type     y:city
y:New_Orleans_Louisiana    locatedIn    y:United_States
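To make the "collection of triples" view concrete, here is a minimal Python sketch that loads a few of the triples listed above into an adjacency map, which is one simple way to see the data as a graph. The function name build_graph and the dictionary layout are illustrative assumptions, not part of gStore; rows whose object is blank in the transcript are left out rather than guessed.

```python
from collections import defaultdict

# A few (subject, property, object) triples copied from the slide.
# Rows with a blank object in the transcript are deliberately omitted.
triples = [
    ("y:Abraham Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham Lincoln", "DiedIn", "y:Washington_D.C"),
    ("y:Washington_D.C", "FoundYear", "1790"),
    ("y:United_States", "hasCapital", "y:Washington_D.C"),
]


def build_graph(triples):
    """Group triples by subject: subject -> list of (property, object) edges.

    Hypothetical helper for illustration only.
    """
    out_edges = defaultdict(list)
    for subject, prop, obj in triples:
        out_edges[subject].append((prop, obj))
    return dict(out_edges)


graph = build_graph(triples)
```

Each key is a vertex and each (property, object) pair is a labeled outgoing edge, so graph["y:Abraham Lincoln"] lists the edges leaving that entity.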

Introduction - 2/4: RDF Graph

Introduction - 3/4
What is SPARQL?
Sample query:
Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> “ ”. ?m <DiedOnDate> “ ”. }
Query with wildcards:
Select ?name Where { ?m <hasName> ?name. ?m <BornOnDate> ?bd. ?m <DiedOnDate> ?dd. FILTER (regex(str(?bd), “02-12”) && regex(str(?dd), “04-15”)) }

Introduction - 4/4
Problems with existing solutions:
– They cannot answer SPARQL queries with wildcards in a scalable manner
– They cannot handle frequent updates in RDF repositories
Answering with subgraph matching:
– Model the RDF data and the query as two graphs
– Regular graph pattern matching cannot be used
– Answering a SPARQL query ≈ subgraph matching

Preliminaries
– RDF graph G is denoted as G = (V, L_V, E, L_E)
– Query graph Q is denoted as Q = (V, L_V, E, L_E)

Preliminaries Cont’d
G(u_1, u_2, …, u_n) is a match of Q(v_1, v_2, …, v_n) if:
– when v_i is a literal vertex, v_i and u_i have the same literal value
– when v_i is a class/entity vertex, v_i and u_i have the same URI
– when v_i is a parameter vertex, there is no constraint over u_i
– when v_i is a wildcard vertex, v_i is a substring of u_i and u_i is a literal value
– when there is an edge from v_i to v_j in Q with the property p, there is also an edge from u_i to u_j in G with the same property p
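The five match conditions above can be checked mechanically. The following Python sketch is one possible reading of them, under assumed representations: a query vertex is a (kind, value) pair, data vertices are plain strings, literals is the set of data vertices that are literal values, and edges are (source, property, target) tuples. None of these names come from the slides.

```python
def vertex_ok(query_vertex, u, literals):
    """Check the per-vertex conditions for query vertex v_i and data vertex u_i."""
    kind, value = query_vertex
    if kind == "literal":      # same literal value
        return u in literals and u == value
    if kind == "entity":       # class/entity vertex: same URI
        return u not in literals and u == value
    if kind == "parameter":    # no constraint over u_i
        return True
    if kind == "wildcard":     # v_i is a substring of the literal u_i
        return u in literals and value in u
    raise ValueError("unknown vertex kind: %r" % (kind,))


def is_match(query_vertices, query_edges, assignment, data_edges, literals):
    """Return True if assignment[i] = u_i is a match of the query.

    query_edges: list of (i, property, j) with indices into query_vertices.
    data_edges:  set of (u, property, u') tuples.
    """
    if len(assignment) != len(query_vertices):
        return False
    for qv, u in zip(query_vertices, assignment):
        if not vertex_ok(qv, u, literals):
            return False
    # Every query edge needs a data edge with the same property.
    return all(
        (assignment[i], prop, assignment[j]) in data_edges
        for i, prop, j in query_edges
    )
```

For example, a query with a parameter vertex ?m, a wildcard vertex "Lincoln", and a hasName edge between them accepts the assignment ["y:Abraham Lincoln", "Abraham Lincoln"].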

Overview of gStore
– Work directly on the RDF graph and the SPARQL query graph
– Use a signature-based encoding of each entity and class vertex to speed up matching
– Filter and evaluate: use a filter that may return false positives (but no false negatives) to prune nodes and obtain a set of candidates; then verify each candidate
– Use an index (VS*-tree) over the data signature graph for efficient pruning; the index has a light maintenance load

Storage Scheme & Encoding Technique Storage Scheme

Storage Scheme & Encoding Technique
Encoding technique
– Example pair: (hasName, “Abraham Lincoln”)
– The literal is split into 3-grams: “Abr”, “bra”, “rah”, …
– The codes of the 3-grams are combined with OR
– The codes of all (property, value) pairs of a vertex are then combined with OR: (hasName, “Abraham Lincoln”), (BornOnDate, " "), (DiedOnDate, " "), (DiedIn, y:Washington DC)
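The encoding slides can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the bit widths PROP_BITS and VALUE_BITS, the use of zlib.crc32 as the hash, and the uniform 3-gram treatment of every value (including URIs) are choices made here for illustration and are not taken from the slides.

```python
import zlib

PROP_BITS = 16   # assumed width of the property part of a signature
VALUE_BITS = 32  # assumed width of the value part of a signature


def _bit(text, width):
    """Map a string to a single set bit inside a field of the given width."""
    return 1 << (zlib.crc32(text.encode("utf-8")) % width)


def ngrams(text, n=3):
    """Return all n-grams of text, e.g. "Abr", "bra", "rah", ... for n = 3."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    return grams or [text]  # strings shorter than n are kept whole


def encode_edge(prop, value):
    """Signature of one (property, value) pair: property bits, then OR-ed n-gram bits."""
    value_sig = 0
    for gram in ngrams(value):
        value_sig |= _bit(gram, VALUE_BITS)
    return (_bit(prop, PROP_BITS) << VALUE_BITS) | value_sig


def encode_vertex(pairs):
    """Signature of a vertex: OR of the signatures of all its (property, value) pairs."""
    sig = 0
    for prop, value in pairs:
        sig |= encode_edge(prop, value)
    return sig
```

Because every 3-gram of "Lincoln" is also a 3-gram of "Abraham Lincoln", the signature of the query pair (hasName, "Lincoln") has all of its bits set in the vertex signature as well. That bit containment is what makes signatures usable as a filter for wildcard queries.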

Indexing Structure and Query Algorithm

Data Signature Graph G*

Converting Q to Q*

Filter and Evaluate
– Find matches of Q* over G* (CL)
– Verify each match against the RDF graph G (RS)

Generating Candidate List (CL)
Two-step process:
– For each vertex v_i ∈ V(Q*), find a list R_i = {u_i1, u_i2, ..., u_in}, where each u_ij ∈ V(G*) satisfies v_i & u_ij = v_i
– Do a multi-way join over the lists R_i to get the candidate list
Use S-trees
– Height-balanced tree over signatures
– Does not support the second step, which makes the join expensive
VS-tree and VS*-tree
– Multi-resolution summary graph based on the S-tree
– Supports both steps efficiently
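The two-step process above can be sketched in Python as follows. This is a naive illustration, not the indexed algorithm from the talk: signatures are plain integers, the edges of G* are simplified to unlabeled (u, u') pairs, and the join enumerates the full cross product. All names are assumptions.

```python
from itertools import product


def candidates(query_sigs, data_sigs):
    """Step 1: for each query signature v_i, list the data vertices u with v_i & u == v_i."""
    return [
        [u for u, sig in data_sigs.items() if q & sig == q]
        for q in query_sigs
    ]


def join(cands, query_edges, data_edges):
    """Step 2: naive multi-way join keeping combinations that respect every query edge.

    query_edges: list of (i, j) index pairs; data_edges: set of (u, u') pairs.
    """
    result = []
    for combo in product(*cands):
        if all((combo[i], combo[j]) in data_edges for i, j in query_edges):
            result.append(combo)
    return result
```

The cross product grows quickly with the sizes of the lists R_i, which is the cost that the summary-graph indexes are meant to avoid.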

S-tree Solution
[figure: S-tree search example]
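As a rough illustration of how an S-tree answers the first step, the Python sketch below assumes the usual S-tree property that an inner node's signature is the OR of its children's signatures, so a subtree can be skipped as soon as the query signature is not contained in the node signature. The Node class and function names are hypothetical, and balancing and insertion are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    sig: int
    children: List["Node"] = field(default_factory=list)
    vertex: Optional[str] = None  # set only on leaves


def make_parent(children):
    """Build an inner node whose signature is the OR of its children's signatures."""
    sig = 0
    for child in children:
        sig |= child.sig
    return Node(sig, list(children))


def search(node, q):
    """Return the leaf vertices whose signature contains every bit of q."""
    if q & node.sig != q:
        return []              # prune: nothing below can contain q
    if not node.children:
        return [node.vertex]   # leaf that passes the filter
    hits = []
    for child in node.children:
        hits.extend(search(child, q))
    return hits
```

The search only produces the per-vertex candidate lists; the join between lists still has to be done separately.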

VS-tree Solution
[figure: VS-tree search example]

VS-tree Solution - limitations
– Each level is processed step by step
– If a level is dense, it yields many summary matches, which enlarges the search space

Possible Optimization Methods
– Ideally, know which level to begin with so as to minimize the number of summary matches
– Use DFS (depth-first search) to find the valid child nodes
– While inserting vertices, consider not only the Hamming distance but also the number of super edges introduced

Optimization example

Experimental results - Exact queries
Dataset: Yago network (20 million triples, 3.1 GB)
Systems compared: gStore, RDF-3X, SW-Store, x-RDF-3X, BigOWLIM, GRIN

Experimental results - Wildcard queries
Systems compared: gStore, RDF-3X, SW-Store, x-RDF-3X, BigOWLIM, GRIN

Conclusion
This approach:
– Uses two novel indexes, VS-tree and VS*-tree, to speed up query processing
– Was able to solve the two problems with existing solutions:
  – answers SPARQL queries with wildcards in a scalable manner
  – handles frequent and online updates in RDF repositories

Questions?