VLDB 2005: An Efficient SQL-based RDF Querying Scheme. Eugene Inseok Chong, Souripriya Das, George Eadon, Jagannathan Srinivasan. New England Development Center.


VLDB 2005 An Efficient SQL-based RDF Querying Scheme
Eugene Inseok Chong, Souripriya Das, George Eadon, Jagannathan Srinivasan
New England Development Center, Oracle

VLDB 2005 Talk Outline
- Introduction
- Functionality
- Design and Implementation
- Performance
- Conclusions and Future Work

VLDB 2005 Introduction

VLDB 2005 RDF (Resource Description Framework)
- RDF is a W3C standard for describing resources on the web.
- Uniform Resource Identifiers (URIs) are used to identify resources.
- RDF triples are used to make statements about a resource.
  Format: (subject predicate object)
  Example: (:John :brotherOf :Mary)
- A triple represents a directed, labeled edge in an RDF graph: :John --:brotherOf--> :Mary

VLDB 2005 RDF Data and Graph Example
Family data:
  (:John :brotherOf :Mary)
  (:Mary :parentOf :Matt)
  (:John :name "John")
  (:Mary :name "Mary")
  (:Matt :name "Matt")
[Figure: the family data drawn as an RDF graph, with :brotherOf, :parentOf, and :name edges]

VLDB 2005 RDF Querying Problem
Given:
- RDF graphs: the data set to be searched
- Graph pattern: containing a set of variables
Find:
- Matching subgraphs
Return:
- Sets of variable bindings, where each set corresponds to a matching subgraph

VLDB 2005 RDF Query Example
Family data:
  (:John :brotherOf :Mary)
  (:Mary :parentOf :Matt)
  (:John :name "John")
  (:Mary :name "Mary")
  (:Matt :name "Matt")
Graph pattern (names of Mary's brothers):
  (?x :brotherOf ?y) (?y :name "Mary") (?x :name ?n)
Variable bindings:
  x = :John, y = :Mary, n = "John"
Matching subgraph:
  (:John :brotherOf :Mary) (:Mary :name "Mary") (:John :name "John")
[Figure: the family data graph with the matching subgraph]
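The query example above can be reproduced with a small in-memory matcher. This is only a sketch of graph-pattern matching in general, not the talk's implementation; the names `unify` and `match_pattern` are hypothetical, and literals are written as plain strings for brevity.

```python
# Illustrative sketch only: a naive graph-pattern matcher over a list of triples.
# The helper names below are hypothetical and do not come from the talk.
TRIPLES = [
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":John", ":name", "John"),
    (":Mary", ":name", "Mary"),
    (":Matt", ":name", "Matt"),
]

# Names of Mary's brothers; terms starting with "?" are variables.
PATTERN = [
    ("?x", ":brotherOf", "?y"),
    ("?y", ":name", "Mary"),
    ("?x", ":name", "?n"),
]

def unify(pattern_triple, data_triple, binding):
    """Return binding extended so pattern_triple matches data_triple, else None."""
    extended = dict(binding)
    for term, value in zip(pattern_triple, data_triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def match_pattern(pattern, triples):
    """Return every set of variable bindings that satisfies all pattern triples."""
    bindings = [{}]
    for pattern_triple in pattern:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern_triple, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

RESULT = match_pattern(PATTERN, TRIPLES)
print(RESULT)
```

Each pattern triple narrows the candidate bindings, which is the same effect a chain of joins has in a relational engine.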

VLDB 2005 RDF Storage Issues
- Need to store RDF triples where the individual components can be URIs, blank nodes, or literals.
- Namespaces used in URIs could be long.
- Multiple triples describe a resource, resulting in repetition of (possibly long) URIs.
- Different representations are possible for a literal occurring in multiple triples (e.g. the same number written with and without exponent notation).
- An RDF graph may include schema triples, e.g. (:brotherOf rdfs:domain :Male)

VLDB 2005 RDF Querying Issues in SQL
- Support specification of a graph pattern-based SQL query.
- Occurrence of the same variable in multiple triples of a graph pattern: processing requires a self-join.
  e.g. (?x :brotherOf ?y) (?y :name "Mary") (?x :name ?n)
- Query processing (e.g. for filter conditions, ORDER BY) requires datatype-specific comparison semantics.
  Schema triple: (:age rdfs:range xsd:int)
  Graph pattern: (?x :age ?a)
  Filter condition: a > 60
  ORDER BY: a DESCENDING
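To make the self-join point concrete, the sketch below stores the family data in a single three-column table and answers the same graph pattern with a hand-written three-way self-join. The table name `triples` and its columns are assumptions for illustration, and SQLite stands in for the database used in the talk.

```python
import sqlite3

# Illustrative sketch: one (s, p, o) table, queried with a self-join.
# Table and column names are assumptions, not the talk's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        (":John", ":brotherOf", ":Mary"),
        (":Mary", ":parentOf", ":Matt"),
        (":John", ":name", "John"),
        (":Mary", ":name", "Mary"),
        (":Matt", ":name", "Matt"),
    ],
)

# Pattern: (?x :brotherOf ?y) (?y :name "Mary") (?x :name ?n)
# Each pattern triple becomes one alias of the table; each shared
# variable becomes an equality condition between aliases.
rows = conn.execute(
    """
    SELECT t1.s AS x, t1.o AS y, t3.o AS n
    FROM triples t1, triples t2, triples t3
    WHERE t1.p = ':brotherOf'
      AND t2.p = ':name' AND t2.o = 'Mary'
      AND t3.p = ':name'
      AND t2.s = t1.o   -- ?y appears in triples 1 and 2
      AND t3.s = t1.s   -- ?x appears in triples 1 and 3
    """
).fetchall()
print(rows)
```

A pattern with k triples therefore needs k aliases of the same table, which is why the number of triples in a pattern drives query cost.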

VLDB 2005 RDF Querying Issues: Inference
Query processing may involve inferencing. Example:
  Data: (:Jim :brotherOf :John) (:John :fatherOf :Mary)
  Graph pattern: (?x :uncleOf ?y)
  Result (before applying any rule): empty
  Rule: (?x :brotherOf ?y) (?y :fatherOf ?z) ⇒ (?x :uncleOf ?z)
  Inferred data: (:Jim :uncleOf :Mary)
  Result (with the inferred data): x = :Jim, y = :Mary
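The inference example can be traced with a few lines of code. This sketch applies the single uncle rule once over a set of triples; it is not a general rule engine, and the function names are hypothetical.

```python
# Illustrative sketch: apply one rule to derive new triples, then query.
# Function names are hypothetical; this is not a general rule engine.
DATA = {
    (":Jim", ":brotherOf", ":John"),
    (":John", ":fatherOf", ":Mary"),
}

def apply_uncle_rule(triples):
    """(?x :brotherOf ?y) (?y :fatherOf ?z) => (?x :uncleOf ?z)"""
    return {
        (x, ":uncleOf", z)
        for (x, p1, y1) in triples
        if p1 == ":brotherOf"
        for (y2, p2, z) in triples
        if p2 == ":fatherOf" and y1 == y2
    }

def query_uncle_of(triples):
    """Bindings (x, y) for the pattern (?x :uncleOf ?y)."""
    return sorted((s, o) for (s, p, o) in triples if p == ":uncleOf")

without_rule = query_uncle_of(DATA)
with_rule = query_uncle_of(DATA | apply_uncle_rule(DATA))
print(without_rule, with_rule)
```

The query itself is unchanged in both runs; only the set of triples it sees differs.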

VLDB 2005 RDF Querying Approach
General approach:
- Create a new (declarative, SQL-like) query language, e.g. RQL, SeRQL, TRIPLE, N3, Versa, SPARQL, RDQL, RDFQL, SquishQL, RSQL, etc.
SQL-based approach:
- Introduces a SQL table function, RDF_MATCH, that uses a SPARQL-like graph pattern to express RDF queries.
Benefits of the SQL-based approach:
- Leverages all the powerful constructs in SQL (e.g. SELECT / FROM / WHERE, ORDER BY, GROUP BY, aggregates, join) to process graph query results.
- RDF queries can easily be combined with conventional queries on database tables, thereby avoiding staging.

VLDB 2005 Embedding RDF Query in SQL
Use of the RDF_MATCH table function allows embedding a graph query in a SQL query:
    SELECT … FROM …, TABLE ( <RDF query> ) t, … WHERE …;
Here <RDF query> is expressed as an RDF_MATCH table function invocation.

VLDB 2005 Functionality

VLDB 2005 RDF_MATCH Table Function
Input parameters:
    RDF_MATCH (
        Pattern,     ← graph pattern
        Models,      ← data (set of RDF graphs)
        RuleBases,   ← rules (0 or more rulebases)
        Aliases      ← list of prefixes for namespaces
    )
Returns a set of columns containing variable bindings:
- A variable matching a URI is returned as a single VARCHAR2 column with the same name (e.g. x for ?x).
- A variable matching a literal is returned as a pair of VARCHAR2 columns: one with the name (e.g. x for ?x) and one with the type (x$type for ?x).

VLDB 2005 RDF_MATCH Example
Example: student reviewers less than 25 years old
    SELECT t.r reviewer, t.c conf, t.a age
    FROM TABLE (
        RDF_MATCH (
            '(?r rdf:type :Student) (?r :reviewerOf ?c) (?r :age ?a)',
            RDFModels('reviewers'),
            NULL,
            RDFAliases(…))
    ) t
    WHERE t.a < 25;

VLDB 2005 Specifying Rules
- RDFS rulebase: pre-loaded
- User-defined rules can be added
Rule: "Chairperson of Conference is also a reviewer"
    ('rb',                       ← rulebase name
     'ChairpersonRule',          ← rule name
     '(?r :ChairpersonOf ?c)',   ← antecedents
     NULL,                       ← filter condition
     NULL,                       ← aliases
     '(?r :ReviewerOf ?c)')      ← consequents

VLDB 2005 RDF_MATCH Example with rulebase
Query: find reviewers of conferences
    SELECT t.r reviewer
    FROM TABLE (
        RDF_MATCH (
            '(?r :ReviewerOf ?c)',
            RDFModels('reviewers'),
            RDFRules('rb'),
            NULL)) t;
Data ⇒ (:Mary :ChairpersonOf :IDBC2005)
Inferred data ⇒ (:Mary :ReviewerOf :IDBC2005)

VLDB 2005 Design & Implementation

VLDB 2005 RDF Data Storage
Triple data is stored after normalization in two tables:
- UriMap (UriID, UriValue, …) contains the mapping of URIs, blank nodes, and literals to internal identifiers.
- IdTriples (ModelID, SubjectID, PropertyID, ObjectID, …) contains the triple information encoded as three identifiers.
Multiple representations of literals: the first occurrence is treated as canonical, and the rest are mapped to the canonical representation (e.g. exponent-notation variants of the same number).
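A rough sketch of this two-table normalization follows, using SQLite. Only the table and column names that appear on the slide are taken from the talk; the column types, the `value_id` and `add_triple` helpers, and the omission of the elided columns and of literal canonicalization are assumptions made to keep the example short.

```python
import sqlite3

# Illustrative sketch of the normalized storage. Column types and helper
# names are assumptions; the elided columns ("…") are not modeled.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE UriMap (UriID INTEGER PRIMARY KEY, UriValue TEXT UNIQUE);
    CREATE TABLE IdTriples (
        ModelID INTEGER, SubjectID INTEGER, PropertyID INTEGER, ObjectID INTEGER
    );
    """
)

def value_id(value):
    """Return the internal ID for a URI, blank node, or literal, creating it once."""
    conn.execute("INSERT OR IGNORE INTO UriMap (UriValue) VALUES (?)", (value,))
    row = conn.execute(
        "SELECT UriID FROM UriMap WHERE UriValue = ?", (value,)
    ).fetchone()
    return row[0]

def add_triple(model_id, s, p, o):
    """Store one triple as three internal identifiers."""
    conn.execute(
        "INSERT INTO IdTriples VALUES (?, ?, ?, ?)",
        (model_id, value_id(s), value_id(p), value_id(o)),
    )

for triple in [
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":John", ":name", "John"),
    (":Mary", ":name", "Mary"),
    (":Matt", ":name", "Matt"),
]:
    add_triple(1, *triple)

uri_count = conn.execute("SELECT COUNT(*) FROM UriMap").fetchone()[0]
triple_count = conn.execute("SELECT COUNT(*) FROM IdTriples").fetchone()[0]
print(uri_count, triple_count)
```

Repeated values such as :name are stored once in UriMap and referenced by ID from every triple that uses them, which addresses the repetition of long URIs noted earlier.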

VLDB 2005 RDF_MATCH Query Processing
- Substitute aliases with namespaces in the search pattern.
- Convert URIs and literals to internal IDs.
- Generate the query:
  - Generate a self-join query based on matching variables.
  - Generate SQL subqueries for the rulebases component (if any).
  - Generate the join result by joining internal IDs with the UriMap table.
  - Use model IDs to restrict the IdTriples table.
- Compile and execute the generated query.
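The self-join generation step can be sketched as a small translator from a graph pattern to a SQL string. This is a simplified illustration, not the RDF_MATCH implementation: `pattern_to_sql` is a hypothetical name, the target is an assumed single `triples (s, p, o)` table, and alias substitution, ID conversion, rulebases, and model restriction are all left out.

```python
import sqlite3

def pattern_to_sql(pattern, table="triples"):
    """Translate (s, p, o) pattern triples into one self-join query.

    Hypothetical helper: one table alias per pattern triple, an equality
    condition for each repeated variable, a bound parameter per constant.
    """
    select, where, params, first_seen = [], [], [], {}
    for i, triple in enumerate(pattern):
        for column, term in zip(("s", "p", "o"), triple):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # Same variable seen before: join the two occurrences.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
                    select.append(f"{ref} AS {term[1:]}")
            else:
                where.append(f"{ref} = ?")
                params.append(term)
    from_clause = ", ".join(f"{table} t{i}" for i in range(len(pattern)))
    sql = f"SELECT {', '.join(select) or '1'} FROM {from_clause}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        (":John", ":brotherOf", ":Mary"),
        (":Mary", ":parentOf", ":Matt"),
        (":John", ":name", "John"),
        (":Mary", ":name", "Mary"),
        (":Matt", ":name", "Matt"),
    ],
)

sql, params = pattern_to_sql(
    [("?x", ":brotherOf", "?y"), ("?y", ":name", "Mary"), ("?x", ":name", "?n")]
)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Because the output is ordinary SQL text, the database optimizer is free to pick join order and access paths for it.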

VLDB 2005 Optimization: Table Function Rewrite
TableRewriteSQL( ):
- Takes the RDF query (specified via arguments) as input and generates a SQL string.
- The table function call is substituted with the generated SQL string.
- The resulting query is reparsed and executed.
Advantages:
- Avoids execution-time overhead (linear in the number of result rows) associated with the table function infrastructure.
- Leverages SQL optimizer capabilities to optimize the resulting query (including filter condition pushdown).

VLDB 2005 Optimization: Materialized Join Views
- Generic materialized join views (MJVs): Subject-Subject, Object-Subject, …
- Subject-property matrix MJVs (SPMJVs): custom, workload based (e.g. frequent search patterns)
Example: select student name, university, and age
    SELECT r, u, a …… '(?r rdf:type :Student) (?r :enrolledAt ?u) (?r :age ?a)' ……
[Figure: the corresponding SPMJV]
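One way to picture a subject-property matrix view is as a precomputed table with one row per subject and one column per property of interest. The sketch below emulates that in SQLite, which has no materialized views, by using CREATE TABLE AS SELECT; the table name `student_spm`, the sample data, and the integer cast on age are all assumptions for illustration.

```python
import sqlite3

# Illustrative sketch: emulate a subject-property matrix view with a
# precomputed table. Names and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        (":Ann", "rdf:type", ":Student"),
        (":Ann", ":enrolledAt", ":UnivA"),
        (":Ann", ":age", "22"),
        (":Bob", "rdf:type", ":Student"),
        (":Bob", ":enrolledAt", ":UnivB"),
        (":Bob", ":age", "27"),
        (":Cy", "rdf:type", ":Professor"),
        (":Cy", ":age", "50"),
    ],
)

# Pay for the three-way self-join once, when the table is built.
conn.execute(
    """
    CREATE TABLE student_spm AS
    SELECT t1.s AS r, t2.o AS u, CAST(t3.o AS INTEGER) AS a
    FROM triples t1
    JOIN triples t2 ON t2.s = t1.s
    JOIN triples t3 ON t3.s = t1.s
    WHERE t1.p = 'rdf:type' AND t1.o = ':Student'
      AND t2.p = ':enrolledAt'
      AND t3.p = ':age'
    """
)

# The frequent pattern is now answered by a plain scan with no joins.
rows = conn.execute("SELECT r, u, a FROM student_spm ORDER BY r").fetchall()
print(rows)
```

A real materialized view would also be kept in sync with the base tables; this emulation is a one-time snapshot.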

VLDB 2005 Performance

VLDB 2005 Dataset
- WordNet: lexical database for the English language
- UniProt: large-scale (80 million triples) protein and annotation data

VLDB 2005 Experiments
- Varying number of triples in search pattern
- Varying filter conditions
- Varying projection list
- Large-scale RDF data
- Subject-property MJVs

VLDB 2005 Varying Number of Triples
    '(?a wn:hyponymOf ?b) (?b wn:hyponymOf ?c) …..
Increasing number of self-joins

VLDB 2005 Varying Number of Triples

VLDB 2005 Varying Projection List
    '(?c0 wn:wordForm ?word) (?c0 wn:wordForm ?syn1) (?c1 wn:wordForm ?syn1) …. (5 triples)
Benefit of the projection list optimization:
- Eliminates joins with the UriMap table for variables not referenced outside of RDF_MATCH

VLDB 2005 Varying Projection List

VLDB 2005 Large-Scale RDF Data
- UniProt: 10M, 20M, 40M, 80M triples
- 6 example queries given with UniProt
- The number of matches remains constant as the dataset size changes (ROWNUM)

VLDB 2005 UniProt Sample Queries
Description | Pattern | Projection | Result limit
Q1: Display the ranges of transmembrane regions | 6 triples, 5 vars | 3 vars | 15000 rows
Q2: List proteins with publications by authors with matching names | 5 triples, 5 vars, 1 LIKE pred. | 3 vars | 10 rows
Q3: Count the number of times a publication by a specific author is cited | 3 triples, 2 vars | 0 vars | 32 rows
Q4: List resources that are related to proteins annotated with a specific keyword | 3 triples, 2 vars | 1 var | 3000 rows
Q5: List genes associated with human diseases | 7 triples, 5 vars | 3 vars | 750 rows
Q6: List recently modified entries | 2 triples, 2 vars, 1 range pred. | 2 vars | 8000 rows

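VLDB 2005 Query Response Times
RDF_MATCH performance scalability
Dataset size | Q1   | Q2 | Q3 | Q4 | Q5 | Q6
10M triples  | 0.86 | <… | …  | …  | …  | …
20M triples  | 0.95 | <… | …  | …  | …  | …
40M triples  | 0.96 | <… | …  | …  | …  | …
80M triples  | 1.03 | <… | …  | …  | …  | …
Maximum      | …    | …  | …  | …  | …  | …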

VLDB 2005 Conclusions

VLDB 2005 Conclusions and Future Work
SQL-based RDF querying scheme:
- RDF_MATCH table function
- Supports graph-pattern based queries on RDF data with RDFS and user-defined rules
Efficient execution:
- Table function rewrite
- Materialized join views: generic and subject-property
- Rule indexes
Future work:
- OPTIONAL support (outer join)
- Provenance support