GRIN: A Graph Based RDF Index
Octavian Udrea (1), Andrea Pugliese (2), V. S. Subrahmanian (1)
(1) University of Maryland, College Park; (2) Università di Calabria

Motivation
Plenty of large RDF datasets:
- TAP, GovTrack, ChefMoz, CIA World Factbook
- Many more (see rdfdata.org)
Query languages: RDQL, RQL, SPARQL
DB systems: Jena, Sesame, RDFBroker
Indexing?
- Currently based on relational database indexes
- Has to be rooted in the characteristics of the query language

Contributions
- A lightweight mechanism for indexing large RDF datasets: GRIN (Graph-based RDF INdex)
- Query-answering algorithms for SPARQL-like queries
- Evaluation on two real-world datasets: TAP (Stanford) and ChefMoz (chefmoz.org)

Outline
- RDF data and queries
- The GRIN index structure
- Answering queries
- Experimental evaluation

[Figure: RDF graph example (ChefMoz)]

[Figure: RDF query example]

Query example (SPARQL-like syntax)
  SELECT ?v1 ?v2 ?v3
  WHERE {
    {(?v1 attire ?v3). (?v1 cuisine Italian)}
    {(?v2 attire ?v3). (?v2 cuisine Italian). (?v2 location Norfolk)}
    {(Norfolk locatedIn NE/USA)}
  }
  FROM ChefMoz

Native RDF systems: Jena2
- Stores RDF as (subject, property, value) triples in a relational table
- Indexes on each of the three attributes
- Translates SPARQL/RDQL into SQL
- The example query above becomes 6 self-joins of the triple table
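A minimal sketch of the triple-table layout described above, using Python's built-in `sqlite3` as a stand-in backend. This is not Jena2's real schema or SQL generator: the table and index names are hypothetical, and the toy triples (including the resource `Trattoria` and the attire value `Casual`) are invented for illustration. It shows how each triple pattern of the example query becomes one more alias of the same table.

```python
import sqlite3

# Toy triples loosely modeled on the ChefMoz example (invented data).
TRIPLES = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "Casual"),
    ("Grivanti", "location", "Norfolk"),
    ("Trattoria", "cuisine", "Italian"),
    ("Trattoria", "attire", "Casual"),
    ("Norfolk", "locatedIn", "NE/USA"),
]


def build_store(triples):
    """One (subject, property, value) table with an index on each column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, value TEXT)")
    for col in ("subject", "property", "value"):
        conn.execute(f"CREATE INDEX idx_{col} ON triples ({col})")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


# Each of the six triple patterns is one alias (t1..t6) of the same table;
# shared variables become equality conditions between aliases.
QUERY = """
SELECT DISTINCT t1.subject AS v1, t3.subject AS v2, t1.value AS v3
FROM triples t1, triples t2, triples t3, triples t4, triples t5, triples t6
WHERE t1.property = 'attire'
  AND t2.subject = t1.subject AND t2.property = 'cuisine' AND t2.value = 'Italian'
  AND t3.property = 'attire' AND t3.value = t1.value
  AND t4.subject = t3.subject AND t4.property = 'cuisine' AND t4.value = 'Italian'
  AND t5.subject = t3.subject AND t5.property = 'location' AND t5.value = 'Norfolk'
  AND t6.subject = 'Norfolk' AND t6.property = 'locatedIn' AND t6.value = 'NE/USA'
ORDER BY v1, v2
"""

conn = build_store(TRIPLES)
rows = conn.execute(QUERY).fetchall()
```

Even on six rows, the optimizer has to order a join over six copies of the table; on a large dataset each copy is the full triple table.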

Native RDF systems: Sesame
Broekstra et al., ISWC 2002
The Sesame SAIL API improves on Jena:
- Supports RDF Schema inference
- Separates RDFS from the triple table
- Supports database schema generation based on the underlying RDF schema of a dataset
The problem of too many joins remains

Native RDF systems: RDFBroker
Sintek et al., ESWC 2006
- The database schema is built from signatures: the signature of a resource is the set of properties used on it
- Reduces the number of joins between tables
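A minimal sketch of signature-based grouping, assuming a simplified mapping in which each distinct signature gets one table with one column per property and one row per resource. The function names and toy data are hypothetical, and multi-valued properties are ignored, so this is not RDFBroker's actual implementation.

```python
from collections import defaultdict

# Toy triples (invented data).
TRIPLES = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "Casual"),
    ("Grivanti", "location", "Norfolk"),
    ("Trattoria", "cuisine", "Italian"),
    ("Trattoria", "attire", "Casual"),
    ("Norfolk", "locatedIn", "NE/USA"),
]


def signatures(triples):
    """Map each subject to its signature: the set of properties used on it."""
    sig = defaultdict(set)
    for s, p, _ in triples:
        sig[s].add(p)
    return {s: frozenset(ps) for s, ps in sig.items()}


def signature_tables(triples):
    """Group subjects by signature: {signature: {subject: {property: value}}}.
    Simplification: a repeated property keeps only its last value."""
    sig = signatures(triples)
    tables = defaultdict(dict)
    for s, p, o in triples:
        tables[sig[s]].setdefault(s, {})[p] = o
    return dict(tables)


tables = signature_tables(TRIPLES)
```

With this layout, the three patterns on ?v2 (attire, cuisine, location) are answered from one row of one table instead of from three copies of a triple table.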

[Figure: The human perspective]

GRIN intuition
- Resources that are “closer” in the RDF graph are more likely to be part of the same answer
  - Hence they should appear on the same page
- GRIN groups resources into circles around selected center resources
- Query evaluation:
  - Find the smallest circle that contains the answer
  - Evaluate the query only on the resources in that circle
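A sketch of what a circle contains, assuming GRIN's distance metric: shortest-path distance with edge directions and property labels ignored. The helper names and toy triples are hypothetical. A breadth-first search that stops expanding at the radius returns exactly the resources in the circle (center, radius).

```python
from collections import defaultdict, deque

# Toy triples (invented data).
TRIPLES = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "Casual"),
    ("Grivanti", "location", "Norfolk"),
    ("Trattoria", "cuisine", "Italian"),
    ("Trattoria", "attire", "Casual"),
    ("Norfolk", "locatedIn", "NE/USA"),
]


def undirected_adjacency(triples):
    """Neighbors of each resource, ignoring edge direction and labels."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj


def circle(adj, center, radius):
    """All resources within `radius` hops of `center` (BFS cut off at radius)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand past the radius
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


adj = undirected_adjacency(TRIPLES)
```

For instance, `circle(adj, "Grivanti", 1)` holds Grivanti and its direct neighbors, while radius 2 also reaches `Trattoria` (through `Italian`) and `NE/USA` (through `Norfolk`).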

The GRIN index structure
GRIN is a binary tree in which:
- Leaf nodes are sets of resources (and their associated triples)
- Inner nodes are circles, each consisting of a center resource and a radius
- Each node is fully contained in its parent
Distance metric: shortest-path distance in the undirected version of the RDF graph
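A sketch of the node types as Python dataclasses. The class and function names are hypothetical, and we read "fully contained in its parent" as: every resource stored under an inner node lies within that node's circle. `check_containment` verifies this invariant for any distance function; a toy one-dimensional distance stands in for shortest-path distance.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """Leaf node: a set of resources whose triples would share one page."""
    resources: frozenset


@dataclass
class Inner:
    """Inner node: the circle of resources within `radius` of `center`."""
    center: str
    radius: int
    left: "Node"
    right: "Node"


Node = Union[Leaf, Inner]


def all_resources(node: Node) -> frozenset:
    """Every resource stored in the leaves below `node`."""
    if isinstance(node, Leaf):
        return node.resources
    return all_resources(node.left) | all_resources(node.right)


def check_containment(node: Node, dist: Callable[[str, str], int]) -> bool:
    """True if, at every inner node, all resources below it are within
    `radius` of its `center`, so each child lies inside its parent."""
    if isinstance(node, Leaf):
        return True
    inside = all(dist(node.center, r) <= node.radius for r in all_resources(node))
    return inside and check_containment(node.left, dist) and check_containment(node.right, dist)


# Toy example: resources on a path a-b-c-d, so distance = index difference.
POS = {"a": 0, "b": 1, "c": 2, "d": 3}


def line_dist(x: str, y: str) -> int:
    return abs(POS[x] - POS[y])


TREE = Inner("b", 2,
             Inner("a", 1, Leaf(frozenset({"a"})), Leaf(frozenset({"b"}))),
             Inner("c", 1, Leaf(frozenset({"c"})), Leaf(frozenset({"d"}))))
```

Shrinking the root's radius to 1 breaks the invariant, because `d` is 2 hops from `b`.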

Building the index: clustering
- Standard k-medoids clustering (Kaufman & Rousseeuw, 1987)
- How many clusters? The number is derived from:
  - R, the set of resources
  - M, the maximum number of resources per page
- For the inter-cluster distance, average link gives the best performance
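A sketch of the clustering step under two labeled assumptions. First, the number of clusters is taken as k = ⌈|R| / M⌉, which is our guess at the rule and may differ from GRIN's. Second, the clustering uses the simple alternating (Voronoi-iteration) form of k-medoids rather than Kaufman & Rousseeuw's PAM swap search. Function names are hypothetical, and integer points with absolute difference stand in for resources and shortest-path distance.

```python
import math


def number_of_clusters(num_resources, max_per_page):
    """Assumed rule k = ceil(|R| / M); GRIN's exact formula may differ."""
    return math.ceil(num_resources / max_per_page)


def k_medoids(points, dist, k, max_iter=100):
    """Alternating k-medoids (simpler than PAM's swap search).

    Assign each point to its nearest medoid, then move each medoid to the
    member with the smallest total distance to its cluster; stop when the
    medoids no longer change. Returns {medoid: [members]}.
    """
    points = sorted(points)
    medoids = points[:k]  # deterministic start, for reproducibility
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for p in points:
            # ties broken by medoid value so the result is deterministic
            nearest = min(medoids, key=lambda m: (dist(p, m), m))
            clusters[nearest].append(p)
        new_medoids = sorted(
            min(members, key=lambda c: (sum(dist(c, q) for q in members), c))
            for members in clusters.values()
        )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters
```

On the points 0, 1, 2, 10, 11, 12 with M = 3, this gives two clusters, {0, 1, 2} around medoid 1 and {10, 11, 12} around medoid 11.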

[Figure: Building the index: the tree]
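The slides show the tree construction only as figures, so the following is a hypothetical sketch rather than GRIN's procedure. It builds the binary tree bottom-up from the clusters by repeatedly merging the two nodes with the smallest average-link distance, the inter-cluster distance the clustering slide recommends. How each merged node picks its center and radius is our assumption: the member minimizing the maximum distance to the others, with that maximum as the radius. Trees are plain tuples, and integers with absolute difference stand in for resources and graph distance.

```python
from itertools import combinations


def average_link(a, b, dist):
    """Mean distance over all pairs drawn from clusters a and b."""
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))


def build_tree(clusters, dist):
    """Merge the closest pair of nodes (average link) until one root remains.

    Trees are tuples: ("leaf", resources) or ("inner", center, radius, l, r).
    Each inner node's circle covers every resource below it.
    """
    nodes = [(frozenset(c), ("leaf", frozenset(c))) for c in clusters]
    while len(nodes) > 1:
        i, j = min(
            combinations(range(len(nodes)), 2),
            key=lambda ij: average_link(nodes[ij[0]][0], nodes[ij[1]][0], dist),
        )
        (res_a, tree_a), (res_b, tree_b) = nodes[i], nodes[j]
        members = res_a | res_b
        # Assumed choice: the member with the smallest maximum distance.
        center = min(sorted(members), key=lambda c: max(dist(c, r) for r in members))
        radius = max(dist(center, r) for r in members)
        merged = (members, ("inner", center, radius, tree_a, tree_b))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][1]


TREE = build_tree([[0, 1, 2], [10, 11, 12], [14, 15]], lambda a, b: abs(a - b))
```

Here {10, 11, 12} and {14, 15} merge first (average link 3.5) into the circle (12, 3), and the root is the circle (10, 10).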

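Queries to constraints
Extract distance constraints from the query. If a variable and a constant are joined by a path of k edges in the query graph, any answer maps that path onto a path of at most k edges in the data graph, so their distance is at most k:
- d(?v1, Italian) ≤ 1
- d(?v2, Norfolk) ≤ 1
- d(?v3, Italian) ≤ 2 (?v3 reaches Italian through ?v1 in two edges)
- …and so on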
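A sketch of how such constraints could be derived, assuming the path-length reasoning above; the slides only show the resulting constraints, and the function names here are hypothetical. A breadth-first search from each constant over the undirected query graph gives, for every variable, the shortest query path and hence an upper bound on its distance to that constant.

```python
from collections import defaultdict, deque

# The example query's triple patterns as (subject, property, value).
QUERY = [
    ("?v1", "attire", "?v3"), ("?v1", "cuisine", "Italian"),
    ("?v2", "attire", "?v3"), ("?v2", "cuisine", "Italian"),
    ("?v2", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]


def is_var(term):
    return term.startswith("?")


def extract_constraints(patterns):
    """Return {(variable, constant): k}, meaning d(variable, constant) <= k."""
    adj = defaultdict(set)
    for s, _, o in patterns:
        adj[s].add(o)
        adj[o].add(s)
    constraints = {}
    for const in [t for t in adj if not is_var(t)]:
        dist = {const: 0}
        queue = deque([const])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for term, d in dist.items():
            if is_var(term):
                constraints[(term, const)] = d
    return constraints


CONSTRAINTS = extract_constraints(QUERY)
```

This reproduces the slide's three constraints and also yields the ones left under "…and so on", such as d(?v2, Italian) ≤ 1.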

Query evaluation
Goal: identify the smallest circle that is guaranteed to contain an answer to the query
1. Perform a depth-first traversal
2. For each index node, evaluate the constraints
3. If the constraints guarantee an answer, perform subgraph matching
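A sketch of the traversal, assuming a dictionary-based tree and a caller-supplied predicate `guarantees(node)` that stands in for the constraint check (in GRIN, the triangle-inequality test on the next slides). All names are hypothetical. The search descends as long as a child still guarantees the answer and returns the deepest such circle, on which subgraph matching would then run.

```python
def leaf(name):
    return {"name": name, "kind": "leaf", "children": []}


def inner(name, *children):
    # A real GRIN inner node would also carry a center resource and a radius.
    return {"name": name, "kind": "inner", "children": list(children)}


def smallest_circle(node, guarantees):
    """Depth-first search for the deepest inner node whose circle must
    contain every answer; None means no circle qualifies, and the query
    would have to be matched against the whole dataset."""
    if node["kind"] != "inner" or not guarantees(node):
        return None
    for child in node["children"]:
        found = smallest_circle(child, guarantees)
        if found is not None:
            return found  # every answer lies in this child; skip the sibling
    return node


TREE = inner("root",
             inner("A", leaf("a1"), leaf("a2")),
             inner("B", leaf("b1"), leaf("b2")))
```

For example, if only `root` and `B` pass the check, the traversal rejects `A`, descends into `B`, finds no qualifying child there, and returns `B`.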

[Figure: Query evaluation]

Evaluating constraints
Constraints: d(?v1, Italian) ≤ 1, d(?v2, Norfolk) ≤ 1, d(?v3, Italian) ≤ 2
Question: is ?v1 in the circle (Grivanti, 3)?
- d(Grivanti, ?v1) ≤ d(Grivanti, Italian) + d(?v1, Italian) ≤ 1 + 1 = 2
- Since 2 ≤ 3, ?v1 must be in the circle (Grivanti, 3)

Evaluating constraints
Question: is ?v3 in (Grivanti, 3)?
- d(Grivanti, ?v3) ≤ d(Grivanti, Italian) + d(Italian, ?v3) ≤ 1 + 2 = 3
- ?v3 must be in (Grivanti, 3)
- Similarly, ?v2 is in the same circle: the pattern (?v2 cuisine Italian) gives d(?v2, Italian) ≤ 1, so d(Grivanti, ?v2) ≤ 2
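The check on these two slides can be sketched as a function that takes, for one variable, the best triangle-inequality bound over all constants and compares it with the radius. Function names are hypothetical. The distance d(Grivanti, Italian) = 1 is the value implied by the slides' arithmetic and is assumed to be precomputed. Constants whose distance from the center is unknown give no bound.

```python
import math

# Constraints d(var, const) <= bound; ("?v2", "Italian") comes from the
# query pattern (?v2 cuisine Italian).
CONSTRAINTS = {
    ("?v1", "Italian"): 1,
    ("?v2", "Norfolk"): 1,
    ("?v2", "Italian"): 1,
    ("?v3", "Italian"): 2,
}
# Assumed precomputed distances from circle centers to query constants.
CENTER_DIST = {("Grivanti", "Italian"): 1}


def upper_bound(center, var, constraints, center_dist):
    """Best bound from d(center, var) <= d(center, c) + d(c, var)."""
    return min(
        (center_dist.get((center, c), math.inf) + bound
         for (v, c), bound in constraints.items() if v == var),
        default=math.inf,
    )


def must_be_in_circle(center, radius, var, constraints, center_dist):
    return upper_bound(center, var, constraints, center_dist) <= radius
```

This reproduces the slides: the bounds are 2 for ?v1 and ?v2 and 3 for ?v3, so all three variables must lie in (Grivanti, 3), while a radius of 2 would not be enough for ?v3.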

Subgraph matching
- Perform subgraph matching on the resources in the circles guaranteed to contain an answer
- Algorithm by Cordella et al., IEEE PAMI 26(10), 2004
- Worst-case time complexity of O(N!), where N is the maximum number of nodes in either graph
- In practice, GRIN keeps N very small, because matching runs only on the resources of the selected circle
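For illustration, here is a plain backtracking matcher over triple patterns. It is deliberately simpler than the Cordella et al. algorithm cited above and shares only its exponential worst case. Function names and the toy data are hypothetical. Passing in only the triples of the selected circle is what keeps the search small.

```python
# Toy triples (invented data) and the example query's patterns.
TRIPLES = [
    ("Grivanti", "cuisine", "Italian"),
    ("Grivanti", "attire", "Casual"),
    ("Grivanti", "location", "Norfolk"),
    ("Trattoria", "cuisine", "Italian"),
    ("Trattoria", "attire", "Casual"),
    ("Norfolk", "locatedIn", "NE/USA"),
]
QUERY = [
    ("?v1", "attire", "?v3"), ("?v1", "cuisine", "Italian"),
    ("?v2", "attire", "?v3"), ("?v2", "cuisine", "Italian"),
    ("?v2", "location", "Norfolk"),
    ("Norfolk", "locatedIn", "NE/USA"),
]


def is_var(term):
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding under which all patterns occur in triples."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    (s, p, o), rest = patterns[0], patterns[1:]
    for ts, tp, to in triples:
        if tp != p:
            continue
        new = dict(binding)
        ok = True
        for q, t in ((s, ts), (o, to)):
            if is_var(q):
                # bind the variable, or check it against an earlier binding
                if new.setdefault(q, t) != t:
                    ok = False
            elif q != t:
                ok = False
        if ok:
            yield from match(rest, triples, new)


ANSWERS = sorted((b["?v1"], b["?v2"], b["?v3"]) for b in match(QUERY, TRIPLES))
```

On the toy data, ?v2 must be Grivanti (the only resource located in Norfolk), and ?v1 can be either restaurant with the same attire and cuisine.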

Experimental framework
Comparison between GRIN, Sesame, Jena2 and RDFBroker (in-memory):
- Index build time
- Memory consumption at query time
- Query time
Two real-world datasets:
- TAP (Stanford): datasets between 1.5 MB and 300 MB
- ChefMoz (chefmoz.org): 220 MB

[Figure: Index build time]

[Figure: Memory consumption]

[Figure: Query time]

[Figure: Average degree of a query node]

Conclusions
- A method for indexing large RDF graphs, adapted to the characteristics of RDF queries
- Avoids expensive join operations
- Gives better query times than Jena2, Sesame and RDFBroker
Current and future work:
- Disk-based index
- Analysis of overlap and coverage