Ultrawrap: SPARQL Execution on Relational Data Juan F. Sequeda, Daniel P. Miranker University of Texas - Austin ISWC 2009 Seoul National University Internet.


Ultrawrap: SPARQL Execution on Relational Data Juan F. Sequeda, Daniel P. Miranker University of Texas - Austin ISWC 2009 Seoul National University Internet Database Lab. Kyung-Bin Lim

2/34 Ultrawrap
• Automatically creates a SPARQL endpoint for legacy relational databases
• Real-time consistency between the relational and RDF data
• Makes maximal use of existing SQL infrastructure
• Research question: Do existing commercial SQL query engines already subsume all the algorithms needed to support effective SPARQL execution on relational data?
Source: Sequeda, Juan F., and Daniel P. Miranker. "Ultrawrap: Sparql execution on relational data." Web Semantics: Science, Services and Agents on the World Wide Web 22 (2013):

3/34 RDF Triples
• Semantic breakdown
– "Rick Hull wrote Foundations of Databases."
• Representation
– Graph
– Statement
– XML format
[Figure: graph with the nodes "Foundations of Databases" and "Rick Hull" connected by an edge labeled hasAuthor]
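
A minimal Python sketch of the statement on this slide as a subject-predicate-object triple. The Triple class and its field names are assumptions made for illustration; real RDF would use IRIs rather than plain strings.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# "Rick Hull wrote Foundations of Databases." expressed with the hasAuthor edge.
t = Triple("Foundations of Databases", "hasAuthor", "Rick Hull")

# Statement form: the three parts written one after another, ended by a dot.
statement = f"{t.subject} {t.predicate} {t.obj} ."
print(statement)
```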

4/34 Example RDF Graph
[Figure: example RDF graph. Six resources (ID1 to ID6) are typed as BookType, CDType, or DVDType and are linked by the properties title, author, artist, copyright, language, and type to literal values such as XYZ, ABC, MNO, DEF, GHI, "Fox, Joe", "Orr, Tim", 2001, 1985, 2004, French, and English.]

5/34 Taxonomy of RDF Data Management
• Ultrawrap is a "wrapper" system
RDF Data Management
– Relational Database to RDF (RDB2RDF)
    – Wrapper Systems
    – Extract-Transform-Load (ETL)
– Triplestores
    – RDBMS-backed Triplestores
    – Native Triplestores
    – NoSQL Triplestores

6/34 RDB2RDF: Two Ways
• Wrapper Systems
– Present a logical ("virtual") RDF representation of relational data
– Real-time consistency between the relational and RDF data
• Extract-Transform-Load
– Relational data is extracted from the RDB, translated to RDF, and loaded into a triplestore
– A batch process

7/34 Ultrawrap: Overview
• RDB2RDF Mapping
– Creates a virtual RDF representation of relational data
– A SPARQL query is translated to SQL to query the physical RDB
[Figure: a SPARQL query over the RDF view passes through the RDB2RDF mapping and runs as SQL against the relational database; the SQL results are returned as SPARQL/RDF results]

8/34 Ultrawrap: Process
• Compile time
– Create Putative Ontology
– Create Virtual Triple Store
    • Use SQL View
• Run time
– Naïve SPARQL to SQL translation
– SQL optimizer is the rewriter

9/34 Wrapper System: Ultrawrap
• Ultrawrap Architecture

10/34 Step 1: Creating a Putative Ontology

11/34 Step 1: Creating a Putative Ontology
• Putative Ontology
– Putative: "commonly regarded as such"
– Automatic syntactic transformation from a data source schema to an ontology
• Ultrawrap creates the PO automatically
• Example:

Product table:
ptID   label
1      ACME Inc
2      Foo Bars

Putative ontology:
S               P        O
Product         type     Class
Product#ptID    type     Datatype-Property
Product#ptID    domain   Product
Product#label   type     Datatype-Property
Product#label   domain   Product
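
A minimal Python sketch of how the example triples could be derived from a table name and its column names. The function putative_ontology is hypothetical and covers only the plain-column case shown on this slide; it is not Ultrawrap's rule set, which also handles keys and foreign keys.

```python
def putative_ontology(table, columns):
    """Return (S, P, O) triples describing a table and its plain columns.

    Assumption: every column becomes a Datatype-Property whose domain is
    the table's class, as in the slide's Product example.
    """
    triples = [(table, "type", "Class")]
    for col in columns:
        prop = f"{table}#{col}"
        triples.append((prop, "type", "Datatype-Property"))
        triples.append((prop, "domain", table))
    return triples


po = putative_ontology("Product", ["ptID", "label"])
for triple in po:
    print(triple)
```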

12/34 Step 1: Creating a Putative Ontology
• First-order logic (FOL) rules transform SQL DDL to OWL
– Full mapping in Datalog
    • Stratified and safe
– Proof of total coverage of all key combinations

13/34 Step 2: Create Virtual Triple Store

14/34 Step 2: Create Virtual Triple Store
• Represent all relational data as triples using a view
– Promise of avoiding self joins (the optimizer will do this)
– Triple table: one view table with three columns
– Actually, the view is (s, spk, p, o, opk)
    • spk and opk are the index values (the optimizer needs to know the index values)
• Example:

Product table:
ptID   label
1      ACME Inc
2      Foo Bars

Triples:
S                P       O
Product#ptID=1   type    Product
Product#ptID=2   type    Product
Product#ptID=1   label   ACME Inc
Product#ptID=2   label   Foo Bars

15/34 Step 2: Create Virtual Triple Store
• Create SELECT statements that output triples
• Use the PO as the basis for creating all the SELECT statements

16/34 Step 2: Create Virtual Triple Store
• The Triple View is a union of all the SELECT statements
• For BSBM, about 80 SELECT statements are generated to represent all relational data as triples
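
A minimal sketch of Step 2 using Python's sqlite3 module. It assumes the Product table from the earlier example; the helper triple_selects and the view name tripleview are hypothetical, and the opk column is left out because this toy schema has no foreign keys. The slides do not show how Ultrawrap itself generates the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ptID INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [(1, "ACME Inc"), (2, "Foo Bars")])


def triple_selects(table, key, columns):
    """Build one SELECT per kind of triple: a type triple plus one per column."""
    subject = f"'{table}#{key}=' || {key}"
    selects = [
        f"SELECT {subject} AS s, {key} AS spk, 'type' AS p, '{table}' AS o "
        f"FROM {table}"
    ]
    for col in columns:
        selects.append(
            f"SELECT {subject} AS s, {key} AS spk, '{col}' AS p, {col} AS o "
            f"FROM {table} WHERE {col} IS NOT NULL"
        )
    return selects


# The triple view is the union of all generated SELECT statements.
view_sql = ("CREATE VIEW tripleview AS "
            + " UNION ALL ".join(triple_selects("Product", "ptID", ["label"])))
conn.execute(view_sql)

rows = conn.execute("SELECT s, p, o FROM tripleview ORDER BY p, s").fetchall()
for row in rows:
    print(row)
```

No triples are stored: the view is evaluated against the Product table at query time, which is what keeps the RDF representation consistent with the relational data.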

17/34 Step 3: Naïve SPARQL to SQL Translation

18/34 Step 3: Naïve SPARQL to SQL Translation
• Syntactic transformation from a SPARQL query to an equivalent SQL query on the Triple View

SPARQL:
    SELECT ?label ?pnum1
    WHERE {
      ?x label ?label .
      ?x pnum1 ?pnum1 .
    }

SQL on the Triple View:
    SELECT t1.o AS label, t2.o AS pnum1
    FROM tripleview_varchar t1, tripleview_int t2
    WHERE t1.p = 'label' AND t2.p = 'pnum1' AND t1.s_id = t2.s_id
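
A rough Python sketch of such a syntactic translation. It assumes a single triple view named tripleview with columns s, p, o (the slide uses per-datatype views and an s_id column), and it only handles patterns with a variable subject, a bound predicate, and a variable object. The function translate is hypothetical, not Ultrawrap's translator.

```python
def translate(patterns, select_vars, view="tripleview"):
    """Turn triple patterns (?s, predicate, ?o) into a self-join on the view.

    Sketch only: predicates are pasted into the SQL text, so this is not
    safe for untrusted input.
    """
    aliases, where, var_cols = [], [], {}
    for i, (s, p, o) in enumerate(patterns, start=1):
        alias = f"t{i}"
        aliases.append(f"{view} {alias}")
        where.append(f"{alias}.p = '{p}'")
        for var, col in ((s, f"{alias}.s"), (o, f"{alias}.o")):
            if var in var_cols:
                # A repeated variable becomes a join condition.
                where.append(f"{var_cols[var]} = {col}")
            else:
                var_cols[var] = col
    select = ", ".join(f"{var_cols[v]} AS {v.lstrip('?')}" for v in select_vars)
    return (f"SELECT {select} FROM {', '.join(aliases)} "
            f"WHERE {' AND '.join(where)}")


sql = translate(
    [("?x", "label", "?label"), ("?x", "pnum1", "?pnum1")],
    ["?label", "?pnum1"],
)
print(sql)
```

Each triple pattern contributes one alias of the view, so a query with n patterns becomes an n-way self-join; removing those joins is left to the SQL optimizer.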

19/34 Step 4: SQL Query Optimizer is the Rewrite system

20/34 Step 4: SQL Query Optimizer is the Rewrite system
• Rewrite the translated SQL query into an optimal execution plan

21/34 Ultrawrap: SPARQL and SQL
• Translating a SPARQL query to a semantically equivalent SQL query

SPARQL:
    SELECT ?label ?pnum1
    WHERE {
      ?x label ?label .
      ?x pnum1 ?pnum1 .
    }

Semantically equivalent SQL:
    SELECT label, pnum1 FROM product

SQL on Tripleview:
    SELECT t1.o AS label, t2.o AS pnum1
    FROM tripleview_varchar t1, tripleview_int t2
    WHERE t1.p = 'label' AND t2.p = 'pnum1' AND t1.s_id = t2.s_id

What is the Query Plan?

22/34 Ultrawrap: Architecture

23/34 Detection of Unsatisfiable Conditions
• Determine that the query result will be empty if the existence of an answer would violate some integrity constraint in the database
• This implies that the answer to the query is empty, so the database does not need to be accessed

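24/34 Detection of Unsatisfiable Conditions
Tripleview_varchar t1, under σ p = 'label', is the union (U) of:
– π Product+'id' AS s, 'label' AS p, label AS o (σ label ≠ NULL (Product))
– π Producer+'id' AS s, 'title' AS p, title AS o (σ title ≠ NULL (Producer))
Tripleview_int t2, under σ p = 'pnum1', is the union (U) of:
– π Product+'id' AS s, 'pnum1' AS p, pnum1 AS o (σ pnum1 ≠ NULL (Product))
– π Product+'id' AS s, 'pnum2' AS p, pnum2 AS o (σ pnum2 ≠ NULL (Product))
CONTRADICTION: a branch whose constant p differs from the selected predicate ('title' under p = 'label', 'pnum2' under p = 'pnum1') can never produce a row.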
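
The pruning on this slide can be imitated outside the database with a small Python sketch. The dictionaries below copy the four union branches from the slide; the function prune_branches is an assumption made for illustration, since a real optimizer performs this check inside the query planner.

```python
# Each UNION branch of a triple view emits a constant predicate p.
# The dict layout is an assumption made for illustration.
TRIPLEVIEW_VARCHAR = [
    {"table": "Product", "p": "label", "o": "label"},
    {"table": "Producer", "p": "title", "o": "title"},
]
TRIPLEVIEW_INT = [
    {"table": "Product", "p": "pnum1", "o": "pnum1"},
    {"table": "Product", "p": "pnum2", "o": "pnum2"},
]


def prune_branches(branches, predicate):
    """Drop branches whose constant p contradicts the filter p = predicate."""
    return [b for b in branches if b["p"] == predicate]


t1 = prune_branches(TRIPLEVIEW_VARCHAR, "label")
t2 = prune_branches(TRIPLEVIEW_INT, "pnum1")
print(t1)
print(t2)
```

After pruning, both sides of the join read only the Product table, which is the situation the next optimization addresses.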

25/34 Self Join Elimination
• If attributes from the same table are projected separately and then joined, then the join can be dropped

Self Join Elimination of Projection:
    SELECT p1.label, p2.pnum1
    FROM product p1, product p2
    WHERE p1.id = 1 AND p1.id = p2.id
becomes
    SELECT label, pnum1 FROM product WHERE id = 1

Self Join Elimination of Selection:
    SELECT p1.id
    FROM product p1, product p2
    WHERE p1.pnum1 > 100 AND p2.pnum2 < 500 AND p1.id = p2.id
becomes
    SELECT id FROM product WHERE pnum1 > 100 AND pnum2 < 500

26/34 Self Join Elimination
π t1.o AS label, t2.o AS pnum1 over the join of:
– π Product+'id' AS s, 'label' AS p, label AS o (σ label ≠ NULL (Product))
– π Product+'id' AS s, 'pnum1' AS p, pnum1 AS o (σ pnum1 ≠ NULL (Product))
Join on the same table? → REDUNDANT

27/34 Self Join Elimination
π label, pnum1 (σ label ≠ NULL AND pnum1 ≠ NULL (Product))
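
A small check, using Python's sqlite3 module, that the self-join form and the single-scan form of the selection example return the same rows. The table contents are invented for illustration. The sketch only confirms that the two queries are equivalent when the join is on the primary key; it does not show whether a given optimizer performs the rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, label TEXT, pnum1 INTEGER, pnum2 INTEGER)"
)
# Invented sample rows.
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [(1, "ACME Inc", 150, 400), (2, "Foo Bars", 50, 900), (3, "Baz", 300, 100)],
)

self_join = (
    "SELECT p1.id FROM product p1, product p2 "
    "WHERE p1.pnum1 > 100 AND p2.pnum2 < 500 AND p1.id = p2.id"
)
single_scan = "SELECT id FROM product WHERE pnum1 > 100 AND pnum2 < 500"

joined = sorted(conn.execute(self_join).fetchall())
scanned = sorted(conn.execute(single_scan).fetchall())
print(joined, scanned)
```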

28/34 Evaluation
• Use benchmarks that store data in relational databases and provide SPARQL queries together with their semantically equivalent SQL queries
• BSBM Million Triples
– Imitates the query load of an e-commerce website
• Barton
– 45 million triples
– Replicates search of bibliographic data (used the relational form of DBLP)

29/34 Evaluation
[Table: support for Detection of Unsatisfiable Conditions and Self Join Elimination in MySQL, SQL Server, Oracle, and DB2. Marks as extracted, cell positions not recoverable: ✖ ✔ ✖ ✖ ✖ ✔ ✔ ✔]

30/34 Ultrawrap Experiment

31/34 Ultrawrap Experiment

32/34 Augmented Ultrawrap Experiment
• Implemented DoUC (Detection of Unsatisfiable Conditions)
– Hash predicate to SQL query
– Few LOC

33/34 SPARQL as Fast as SQL
Berlin Benchmark on 100 Million Triples on Oracle 11g using Ultrawrap

34/34 Conclusion
• Running on Microsoft SQL Server
• Initial test on BSBM on 1 million triples
– Execution time is close to the running time of native SQL queries on the RDB
• Does not replicate relational database content
• To date, wrapper systems have suffered from problems in performance and scalability
– Two optimizations may yield a query plan typical of a relational query plan, but starting from a logical plan representation of a SPARQL query
• SPARQL queries with bound predicates on Ultrawrap execute at nearly the same speed as semantically equivalent benchmark-provided SQL queries