On Provenance of Queries on Linked Web Data


On Provenance of Queries on Linked Web Data
Yannis Theoharis (1,2), Irini Fundulaki (2), Grigoris Karvounarakis (3,2) and Vassilis Christophides (1,2)
(1) Institute of Computer Science, FORTH
(2) Computer Science Department, University of Crete
(3) LogicBox, USA

What is "Linked Data"?
W3C Linking Open Data:
  publish various open datasets as RDF on the Web
  set typed RDF links between data items from different data sources

Motivation: Linked Data Processing
Data is:
  fetched from heterogeneous sources
  integrated
  materialized in RDF
  made available via SPARQL
Range of computations:
  SPARQL queries
  complex programs (logic or procedural)

Provenance Aware Applications
  Application          What provenance is used to derive
  Trust assessment     trustworthiness
  Access control       confidentiality level
  Data cleaning        validity
  Curated databases    source data origin
All these applications need to represent and store the relation between the input and the output of data processes.
With provenance these applications gain efficiency; some are impossible without it.

Data Provenance Models
Annotation models: the annotation computation is coupled with a particular application and a particular assignment of source data annotations.
  R1(X, Y, Annot.) holds (a, b) and (c, d); R2(Y, Z, Annot.) holds (b, e); R1 ⋈ R2 holds (a, b, e).
  Each tuple carries a concrete annotation (t: trusted, f: untrusted).
  Changing a source annotation requires query recomputation!
Abstract provenance models: abstract provenance tokens and operators are substituted by appropriate concrete tokens for a particular application and assignment.
  R1:  X Y  Annot.      R2:  Y Z  Annot.      R1 ⋈ R2:  X Y Z  Annot.
       a b  c1               b e  c3                    a b e  c1 * c3
       c d  c2
  Substituting t for c1 and f for c3 gives t ∧ f; substituting t for both gives t ∧ t. No recomputation is needed.

This Talk
"Can previous work on abstract provenance models be leveraged for SPARQL?"
  NO: due to the OPTIONAL operator (similar to the SQL left outer join)
  YES: for the positive fragment of SPARQL (without OPTIONAL)
We present our ongoing work on a SPARQL abstract provenance model.
Challenge: to capture the form of negation that OPTIONAL introduces.

Outline
  SPARQL algebra
  Abstract Provenance Models for Positive SPARQL
  Limitations of Previous Models
  Towards a SPARQL Provenance Model

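SPARQL (1/2)
SPARQL is the W3C Recommendation language for querying RDF data.
A triple pattern such as (?x, ?y, e) combines variables (?x, ?y) and constants (e).
Matching a triple pattern against a triple set (columns S, P, O) yields mappings, e.g.
  Ω1:  ?x ?y
   μ1  d  b      i.e. μ1 = {(?x,d),(?y,b)}
   μ2  f  g      i.e. μ2 = {(?x,f),(?y,g)}
Processing pipeline: triple patterns -> mappings -> Compose -> Filter { … } -> Select / Construct / Describe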
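As a rough illustration of how a triple pattern produces mappings, here is a minimal Python sketch. The triple set is hypothetical (the slide's own triple set is not fully recoverable), and `match` and `is_variable` are names invented for this example; a real SPARQL engine does far more.

```python
# Hypothetical triple set (S, P, O); chosen so that the pattern (?x, ?y, e)
# yields the two mappings shown on the slide.
TRIPLES = [("a", "b", "c"), ("d", "b", "e"), ("f", "g", "e")]


def is_variable(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triples):
    """Return one mapping (dict: variable -> value) per matching triple."""
    mappings = []
    for triple in triples:
        mapping = {}
        for p_term, value in zip(pattern, triple):
            if is_variable(p_term):
                # A variable repeated in the pattern must bind consistently.
                if mapping.setdefault(p_term, value) != value:
                    break
            elif p_term != value:
                break  # a constant in the pattern does not match
        else:
            mappings.append(mapping)
    return mappings


omega1 = match(("?x", "?y", "e"), TRIPLES)
print(omega1)
```

Each dictionary plays the role of one mapping μ; a variable that is absent from the dictionary is unbound.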

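SPARQL (2/2)
The SPARQL algebra defines 5 operators on bags of mappings.
  Unary operators: π (projection), σ (selection, also called filtering)
  Binary operators: ∪ (union), ⋈ (join), ⟕ (optional)
Positive SPARQL (SPARQL+) is the fragment without ⟕.
Two mappings μ and μ' are compatible (μ ~ μ') if they agree on their common variables.

Join, difference and optional:
  Ω1:  ?x ?y        Ω2:  ?y ?z
   μ1  a  b          μ4  b  f
   μ2  c  d
   μ3  e  -
  μ1 ~ μ4 and μ3 ~ μ4, but μ2 ≁ μ4.
  Ω1 ⋈ Ω2 = {μ5, μ6}, with μ5 = μ1 ∪ μ4 = (a, b, f) and μ6 = μ3 ∪ μ4 = (e, b, f)
  Ω1 \ Ω2 = {μ2} = {(c, d)}
  Ω1 ⟕ Ω2 = {(a, b, f), (e, b, f), (c, d, -)}

Union:
  Ω1 = {μ1} with μ1 = (?x = a, ?y = b); Ω2 = {μ2} with μ2 = (?x = c, ?z = d)
  Ω1 ∪ Ω2 over (?x, ?y, ?z) = {(a, b, -), (c, -, d)}; ?z is unbound in μ1.

Selection and projection:
  σ?x=a(Ω) keeps the mappings of Ω that bind ?x to a.
  π?x(Ω) keeps only the binding of ?x. Because mappings form bags, each result mapping has a cardinality; in the slide's example, card(μ1) = 2 and card(μ2) = 1.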
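The binary operators can be sketched in a few lines of Python. This is an illustrative sketch only: mappings are modelled as dictionaries, bags as lists, unbound variables as absent keys, and the sample data are assumptions chosen for the example. Optional is built as the join followed by the difference, which is the decomposition the talk itself uses later.

```python
def compatible(m1, m2):
    """mu ~ mu': the two mappings agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(o1, o2):
    """Merge every compatible pair of mappings."""
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]


def difference(o1, o2):
    """Keep the mappings of o1 that are compatible with no mapping of o2."""
    return [m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)]


def optional(o1, o2):
    """Left outer join: the join plus the difference."""
    return join(o1, o2) + difference(o1, o2)


# Illustrative data; ?y is unbound in the third mapping of omega1.
omega1 = [{"?x": "a", "?y": "b"}, {"?x": "c", "?y": "d"}, {"?x": "e"}]
omega2 = [{"?y": "b", "?z": "f"}]

print(join(omega1, omega2))
print(difference(omega1, omega2))
print(optional(omega1, omega2))
```

Note that a mapping with an unbound variable is compatible with any binding of that variable, which is why the third mapping of `omega1` joins with `omega2`.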

Abstract Provenance Models
Abstract provenance models encode the query operators at different levels of detail.
From most informative to less informative: How, Trio, Why, Lineage.
Trade-off: expressiveness vs. efficiency (annotation storage and computation time).

Abstract Provenance Models for SPARQL+
Previous models are defined for positive relational algebra.
Positive relational operators are monotonic: the addition (removal) of a tuple can only result in additional (removed) tuples in the output.
This also holds for SPARQL+ (projection, union, join).
Previous models therefore suffice for SPARQL+.

Boolean trust assessment (SPARQL)
Boolean trust semantics = set semantics on trusted mappings.

  Ω1:  ?x ?y        Ω2:  ?y ?z
   μ1  d  b          μ3  b  c
   μ2  f  g          μ4  e  h

  Trusted mappings    Ω1 \ Ω2                     Ω1 ⟕ Ω2
  μ1, μ2, μ3, μ4      μ2 = (f, g)                 μ5 = (d, b, c), μ2 = (f, g, -)
  μ1, μ2, μ4          μ1 = (d, b), μ2 = (f, g)    μ1 = (d, b, -), μ2 = (f, g, -)

⟕ and \ are not monotonic: when μ3 becomes untrusted, μ5 becomes untrusted and μ1 becomes trusted in Ω1 ⟕ Ω2.
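The non-monotonic behaviour can be reproduced with a short Python sketch, using the slide's mappings μ1 to μ4. The function names are assumptions made for this example; "untrusted" is modelled simply by leaving a mapping out of the input, following the set semantics on trusted mappings described above.

```python
def compatible(m1, m2):
    """The two mappings agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def optional(o1, o2):
    """Left outer join of two collections of mappings."""
    joined = [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]
    unmatched = [m1 for m1 in o1
                 if not any(compatible(m1, m2) for m2 in o2)]
    return joined + unmatched


mu1, mu2 = {"?x": "d", "?y": "b"}, {"?x": "f", "?y": "g"}
mu3, mu4 = {"?y": "b", "?z": "c"}, {"?y": "e", "?z": "h"}

# Every mapping trusted.
all_trusted = optional([mu1, mu2], [mu3, mu4])
# mu3 untrusted: it is dropped from the input.
mu3_untrusted = optional([mu1, mu2], [mu4])

print(all_trusted)
print(mu3_untrusted)
```

Removing μ3 from the input removes (d, b, c) from the output but also adds (d, b, -), which a monotonic operator could never do.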

Perm
  Ω1:  ?x ?y        Ω2:  ?y ?z
   μ1  d  b          μ3  b  c
   μ2  f  g          μ4  e  h
Perm extends each result with the input mappings it derives from (columns ?y2, ?z2 for Ω1 \ Ω2; columns ?x1, ?y1, ?y2, ?z2 for Ω1 ⟕ Ω2).
Intuitively, (f, g) is in Ω1 \ Ω2 because it is compatible with neither μ3 nor μ4.
(d, b, c) is in Ω1 ⟕ Ω2 due to the join between μ1 and μ3.
If μ3 becomes untrusted, Perm infers that (d, b, c) becomes untrusted, but cannot infer that (d, b, -) should become trusted.

RDF Meta Knowledge & M-semirings
  Ω1:  ?x ?y  Annot.      Ω2:  ?y ?z  Annot.
   μ1  d  b   c1           μ3  b  c   c3
   μ2  f  g   c2           μ4  e  h   c4

  Ω1 \ Ω2          RDF MK               M-semirings
   μ2 (f, g)       c2 ∧ ¬(c3 ∨ c4)      c2 ⊖ 0 = c2

  Ω1 ⟕ Ω2          RDF MK               M-semirings
   μ5 (d, b, c)    c1 ∧ c3              c1 * c3
   μ2 (f, g, -)    c2 ∧ ¬(c3 ∨ c4)      c2

Like Perm, RDF Meta Knowledge and M-semirings infer that μ5 is untrusted when μ3 becomes untrusted, but cannot infer that μ1: (d, b, -) is trusted.

A Third Operation for Compatibility (1/2)
Take care of compatible mappings: only one of μ1 and μ5 can appear in the result, so keep provenance information for both of them!
Ω1 ⟕ Ω2 = (Ω1 ⋈ Ω2) ∪ (Ω1 \ Ω2)

  Ω1:  ?x ?y  Annot.      Ω2:  ?y ?z  Annot.
   μ1  d  b   c1           μ3  b  c   c3
   μ2  f  g   c2           μ4  e  h   c4

  Ω1 ⟕ Ω2          How        SPARQL Prov.
   μ5 (d, b, c)    c1*c3      c1*c3
   μ1 (d, b, -)    No Info    c1*A(μ1, μ3)
   μ2 (f, g, -)    c2         c2

For Boolean trust:
  A(μ1, μ3) = f, if μ1 ~ μ3 and c3 = t
              t, otherwise
With c3 = t, μ5 evaluates to t ∧ t = t and μ1 to t ∧ f = f. With c3 = f, μ5 evaluates to f and μ1 to t.

A Third Operation for Compatibility (2/2)
Ω1 ⟕ Ω2 = (Ω1 ⋈ Ω2) ∪ (Ω1 \ Ω2)
A is a binary operator on mappings.
It determines whether the mapping exists in the result or not.
If it does, its provenance equals the positive provenance part, e.g. c1 for c1*A(μ1, μ3).
In general:
  A(μ1, μ3) = 0, if μ1 ~ μ3 and c3 ≠ 0
              1, otherwise
where 0 is the neutral element for + and 1 is the neutral element for *.
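A minimal Python sketch of the compatibility operator under Boolean trust, where 0 is read as False, 1 as True and * as AND. On the slide, A takes two mappings and consults the annotation of the second one; passing that annotation as an explicit third argument, and the helper name `provenance`, are assumptions made here to keep the example self-contained.

```python
def compatible(m1, m2):
    """The two mappings agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def A(m1, m2, c2):
    """0 (False) if m1 ~ m2 and the annotation c2 of m2 is non-zero, else 1."""
    return not (compatible(m1, m2) and c2)


mu1 = {"?x": "d", "?y": "b"}
mu3 = {"?y": "b", "?z": "c"}


def provenance(c1, c3):
    """Boolean values of c1*c3 (for mu5) and c1*A(mu1, mu3) (for mu1)."""
    prov_mu5 = c1 and c3
    prov_mu1 = c1 and A(mu1, mu3, c3)
    return prov_mu5, prov_mu1


print(provenance(True, True))   # mu3 trusted
print(provenance(True, False))  # mu3 untrusted
```

Exactly one of the two candidate results is trusted in each case, which matches the requirement that only one of μ1 and μ5 appears in the result.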

SPARQL Provenance Operators
Two types of operators:
  on provenance tokens, i.e. + and * (for SPARQL+)
  on mappings, i.e. A (for ⟕ and \)
Good news: every triple of the dataset is uniquely annotated.
Why not use annotations as mapping identifiers in A? Because of the projection operator…

Enrich Tokens with Schema Information
Use tokens (c1, c2, …) as mapping ids in A expressions:
  A(c1, c2) = 0, if μ1 ~ μ2 and c2 ≠ 0
              1, otherwise
But μ1 ≁ μ2 might hold while π?y,?z(μ1) ~ π?y,?z(μ2).
Tokens do not suffice; keep token-schema pairs:
  A((c1, S1), (c2, S2)) = 0, if πS1(μ1) ~ πS2(μ2) and c2 ≠ 0
                          1, otherwise

  Ω:   ?x ?y ?z  Prov.
   μ1  a  b  c   (c1, {?x, ?y, ?z})
   μ2  d  -  -   (c2, {?x, ?y, ?z})

  π?y,?z(Ω):  ?y ?z  Prov.
              b  c   (c1, {?y, ?z})
              -  -   (c2, {?y, ?z})
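The effect of projection on compatibility can be sketched in Python. The lookup table `MAPPINGS` from tokens to source mappings, the `annotation` argument and the function names are assumptions introduced for this illustration; the slide only fixes the shape of the definition of A on token-schema pairs.

```python
def compatible(m1, m2):
    """The two mappings agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def project(mapping, schema):
    """Keep only the bindings of the variables listed in schema."""
    return {var: val for var, val in mapping.items() if var in schema}


# Hypothetical lookup: token -> source mapping (?y and ?z unbound for c2).
MAPPINGS = {
    "c1": {"?x": "a", "?y": "b", "?z": "c"},
    "c2": {"?x": "d"},
}


def A(pair1, pair2, annotation):
    """A on token-schema pairs: compatibility is tested after projection."""
    (tok1, schema1), (tok2, schema2) = pair1, pair2
    m1 = project(MAPPINGS[tok1], schema1)
    m2 = project(MAPPINGS[tok2], schema2)
    return 0 if compatible(m1, m2) and annotation[tok2] != 0 else 1


annotation = {"c1": 1, "c2": 1}
full = {"?x", "?y", "?z"}
projected = {"?y", "?z"}

print(A(("c1", full), ("c2", full), annotation))
print(A(("c1", projected), ("c2", projected), annotation))
```

With the full schema the two mappings disagree on ?x, so A yields 1; after projecting onto {?y, ?z} they share no bound variable, become compatible, and A yields 0. The token alone cannot distinguish the two cases, which is why the schema is stored next to it.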

Towards a SPARQL Provenance Model
Define an algebra on token-schema pairs with 3 operations:
  2 for the SPARQL+ operators (+ and *)
  1 for compatibility (A)
What if there is no projection (or projection is not allowed to be pushed down)?
  Annotations suffice (no need for schema information), but the compatibility operator is still needed.
What if there is no OPTIONAL?
  Previous models suffice, e.g. How.

Future Work
SPARQL Provenance Model:
  Extend model expressiveness to capture other computations on Linked Data
  Logic explanations
  Implementation

Questions?