TAPP-09, 23/02/2009, Giorgos Flouris

On Explicit Provenance Management in RDF/S Graphs

Panagiotis Pediaditis, Giorgos Flouris, Irini Fundulaki, Vassilis Christophides
Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
{pped, fgeo, fundul,

Provenance Management in RDF/S
- Provenance management problem
  - Mostly addressed in the database context
  - We are dealing with why provenance in RDF/S graphs
    - Why provenance: identifying the source data that had some influence on the existence of the target data
- Three main characteristics (peculiarities of RDF/S)
  - Triple-based representation
    - Use quadruples to talk about triples’ provenance
  - Inference
    - Assign provenance information to implicit data
  - Coherence semantics (in updates)
    - Implicit data is a first-class citizen and should be retained during change, along with its provenance information

Characteristic #1: Triple-based Representation

RDF Graphs
[Figure: example RDF graph with classes Paper, PaperTAPP, Person, Author; individuals Paper10, Giorgos; property writes. Legend: instance (rdf:type), subclass (rdfs:subClassOf)]
- RDF graph = set of RDF triples
- Define classes
  [Paper rdf:type rdfs:Class]
  [PaperTAPP rdf:type rdfs:Class]
  [Person rdf:type rdfs:Class]
  [Author rdf:type rdfs:Class]
- Define properties
  [writes rdf:type rdf:Property]
  [writes rdfs:domain Author]
  [writes rdfs:range Paper]
- Instantiate (and define) individuals
  [Paper10 rdf:type PaperTAPP]
  [Giorgos rdf:type Author]
  [Giorgos writes Paper10]
- Define hierarchies
  [PaperTAPP rdfs:subClassOf Paper]
  [Author rdfs:subClassOf Person]
- And other stuff…

Provenance in RDF Graphs
[Figure: the example RDF graph, partitioned into the Publications Graph (PUB) and the TAPP Graph (TAPP)]
  PUB:[Paper rdf:type rdfs:Class]
  TAPP:[PaperTAPP rdf:type rdfs:Class]
  PUB:[Person rdf:type rdfs:Class]
  PUB:[Author rdf:type rdfs:Class]
  PUB:[writes rdf:type rdf:Property]
  PUB:[writes rdfs:domain Author]
  PUB:[writes rdfs:range Paper]
  TAPP:[Paper10 rdf:type PaperTAPP]
  TAPP:[Giorgos rdf:type Author]
  TAPP:[Giorgos writes Paper10]
  TAPP:[PaperTAPP rdfs:subClassOf Paper]
  PUB:[Author rdfs:subClassOf Person]

Named Graphs and Provenance
- Create two named graphs and assign an ID (URI) to each
  - Publications graph (URI: PUB)
  - TAPP graph (URI: TAPP)
  - Each named graph corresponds to a different source
- Need some method to associate named graphs with triples
  - Triples become quadruples
  - Fourth element is the URI of the named graph (origin)

Quadruples for Provenance
  [Paper rdf:type rdfs:Class PUB]
  [PaperTAPP rdf:type rdfs:Class TAPP]
  [Person rdf:type rdfs:Class PUB]
  [Author rdf:type rdfs:Class PUB]
  [writes rdf:type rdf:Property PUB]
  [writes rdfs:domain Author PUB]
  [writes rdfs:range Paper PUB]
  [Paper10 rdf:type PaperTAPP TAPP]
  [Giorgos rdf:type Author TAPP]
  [Giorgos writes Paper10 TAPP]
  [PaperTAPP rdfs:subClassOf Paper TAPP]
  [Author rdfs:subClassOf Person PUB]
- All quadruples of the form [s p o PUB] originate from named graph PUB (Publications graph)
- All quadruples of the form [s p o TAPP] originate from named graph TAPP (TAPP graph)

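The quadruple listing above can be mirrored in a few lines of Python. This is only an illustrative sketch: the tuple layout and the helper names `triples_by_graph` and `origins` are assumptions made for the example, not part of the presented system.

```python
from collections import defaultdict

# Quadruples (subject, predicate, object, named graph), copied from the slide.
QUADS = {
    ("Paper", "rdf:type", "rdfs:Class", "PUB"),
    ("PaperTAPP", "rdf:type", "rdfs:Class", "TAPP"),
    ("Person", "rdf:type", "rdfs:Class", "PUB"),
    ("Author", "rdf:type", "rdfs:Class", "PUB"),
    ("writes", "rdf:type", "rdf:Property", "PUB"),
    ("writes", "rdfs:domain", "Author", "PUB"),
    ("writes", "rdfs:range", "Paper", "PUB"),
    ("Paper10", "rdf:type", "PaperTAPP", "TAPP"),
    ("Giorgos", "rdf:type", "Author", "TAPP"),
    ("Giorgos", "writes", "Paper10", "TAPP"),
    ("PaperTAPP", "rdfs:subClassOf", "Paper", "TAPP"),
    ("Author", "rdfs:subClassOf", "Person", "PUB"),
}


def triples_by_graph(quads):
    """Group triples under the URI of the named graph they originate from."""
    graphs = defaultdict(set)
    for s, p, o, g in quads:
        graphs[g].add((s, p, o))
    return dict(graphs)


def origins(quads, triple):
    """Return the URIs of all named graphs that contain the given triple."""
    return {g for s, p, o, g in quads if (s, p, o) == triple}


GRAPHS = triples_by_graph(QUADS)
```

Storing the origin as a fourth tuple element keeps provenance at the triple level: the same triple may appear with several graph URIs, and a graph URI may be attached to any number of triples.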
Properties of Named Graphs
- The named graph URI can be used to refer to the named graph
  - Can be used for assignment of metadata: [TAPP hasAuthor JamesCheney G]
- Granularity of provenance
  - A triple is the smallest bit of information
  - The granularity of provenance achieved by named graphs is at the triple level
  - Flexible
    - A named graph can contain 0, 1, or many triples
    - A triple can belong to 0, 1, or many named graphs

Characteristic #2: Inference

RDF/S Graphs
- RDF Schema: add-on to RDF
- RDFS adds inference semantics
  - Transitivity of subclass/subproperty
  - Implicit instantiations
- Example
  [Giorgos rdf:type Author]
  [Author rdfs:subClassOf Person]
  Inference: [Giorgos rdf:type Person]
- Inferred knowledge is implicit

Provenance and Inference
- Quadruples:
  [Giorgos rdf:type Author TAPP]
  [Author rdfs:subClassOf Person PUB]
  [Giorgos rdf:type Person ???]
- Needs:
  - Shared ownership
  - A more sophisticated, compound structure
  - Keeping the connection with the components
  - Composition operator (PT=PUB●TAPP)
    - [Giorgos rdf:type Person PT]
    - Ok, but see characteristic #3

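The shared-ownership idea can be sketched as follows, assuming the origin of each quadruple is modelled as a `frozenset` of named-graph URIs and that composition is set union. Only two RDFS rules are covered (subclass transitivity and type propagation); the function name and data layout are illustrative assumptions, and the origin assignment follows the quadruple listing earlier in the deck (Giorgos as Author from TAPP, the subclass link from PUB).

```python
def infer_with_provenance(quads):
    """Close a set of quads under two RDFS rules, composing their origins.

    Each quad is (subject, predicate, object, frozenset of named-graph URIs).
    A derived quad is owned jointly by the origins of both premises.
    """
    closure = set(quads)
    while True:
        derived = {
            (s1, p1, o2, g1 | g2)
            for (s1, p1, o1, g1) in closure
            for (s2, p2, o2, g2) in closure
            # premise 2 must be a subclass link starting where premise 1 ends
            if p2 == "rdfs:subClassOf" and o1 == s2
            # premise 1 is either an instantiation or another subclass link
            and p1 in ("rdf:type", "rdfs:subClassOf")
        }
        if derived <= closure:  # fixpoint reached, nothing new
            return closure
        closure |= derived


EXPLICIT = {
    ("Giorgos", "rdf:type", "Author", frozenset({"TAPP"})),
    ("Author", "rdfs:subClassOf", "Person", frozenset({"PUB"})),
}
CLOSURE = infer_with_provenance(EXPLICIT)
```

Here the inferred quad for [Giorgos rdf:type Person] carries the origin {PUB, TAPP}, which plays the role of PT on the slide.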
Characteristic #3: Coherence Semantics (in Updates)

Foundational Semantics
[Figure: pyramid; labels: Basic Knowledge, Supported Knowledge, Explicit Knowledge, Implicit Knowledge]
- Foundational viewpoint (pyramid):
  - Knowledge consists of the explicitly represented knowledge
  - Only explicit knowledge can be changed
  - Implicit knowledge is affected indirectly, through the changes in the explicit knowledge (so that the resulting “pyramid” is “stable”)
  - Explicit knowledge is more important than implicit knowledge

Coherence Semantics
[Figure: raft; label: Knowledge (includes both implicit and explicit knowledge)]
- Coherence viewpoint (raft):
  - No discrimination between explicit and implicit knowledge
  - Both explicit and implicit knowledge can be changed
  - Changes should be made coherently in order for the resulting knowledge to make sense (so that the “raft” is “stable”)
  - Explicit and implicit knowledge are of the same value

Deletes
- Under coherence semantics
  - Inferred knowledge needs to be made explicit (when in danger of being lost)
  - Explicit assignment of shared origin to triples
- Explicit shared origin assignment
  - Cannot use any composition operator
  - Must be a first-class construct (autonomous)
  - Retain the connection with its constituents
  - A need, but also a useful feature

RDF/S Graphsets
[Figure: the example RDF graph, with graphset PT]
- Graphsets are like named graphs
  - Have IDs (URIs)
  - Used in quadruples
    - Association of triples with graphsets: [Giorgos rdf:type Person PT]
    - Can be referred to (metadata): [PT rdf:type Confidential G]
- Encode origin or shared origin: [Giorgos rdf:type Person PT]
- URI association (via Skolem function)
  - PT is the URI of {PUB, TAPP}
  - PUB is the URI of {PUB}
- A named graph is a graphset
  - PUB corresponds to {PUB}

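One way to picture the URI association is a deterministic function from a set of named-graph URIs to a single identifier. The sketch below is hypothetical: the slides name the URI of {PUB, TAPP} simply PT and do not specify a naming scheme, so the `gs:` prefix and the `+` separator are assumptions made for illustration.

```python
def graphset_uri(named_graphs):
    """Hypothetical Skolem-style function: equal graphsets get equal URIs.

    A singleton graphset reuses the URI of its only named graph, mirroring
    the slide's statement that PUB is the URI of {PUB}.
    """
    members = sorted(set(named_graphs))  # order and duplicates are irrelevant
    if not members:
        raise ValueError("a graphset needs at least one named graph")
    if len(members) == 1:
        return members[0]
    return "gs:" + "+".join(members)  # assumed naming scheme


PT = graphset_uri({"PUB", "TAPP"})
```

Because the URI depends only on the member set, the graphset stays a first-class identifier that can appear in quadruples and metadata, while its constituents remain recoverable from wherever the mapping is stored.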
Querying With RDF/S Graphsets
- Standard queries (original RQL)
  - Give me the Persons: [Giorgos]
- Provenance queries (extended RQL)
  - Give me the Persons per {PUB}: [ ]
  - Give me the Persons per {TAPP, PUB}: [Giorgos]
  - Give me the sources per which Author is a subclass of Person: [{PUB}]
  - Give me all the individual sources: [{TAPP}, {PUB}]

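The intent of these queries can be imitated over an in-memory set of quads. This is not RQL syntax, which the slides do not show; the helper names are assumptions, and the data holds only the three quads needed for the examples, with the inferred quad already materialised.

```python
QUADS = {
    ("Giorgos", "rdf:type", "Author", frozenset({"TAPP"})),
    ("Author", "rdfs:subClassOf", "Person", frozenset({"PUB"})),
    # implicit quad, owned jointly by both sources
    ("Giorgos", "rdf:type", "Person", frozenset({"PUB", "TAPP"})),
}


def instances_of(quads, cls, graphset=None):
    """Instances of cls; if graphset is given, only those owned by exactly it."""
    wanted = None if graphset is None else frozenset(graphset)
    return sorted(
        s for s, p, o, g in quads
        if p == "rdf:type" and o == cls and (wanted is None or g == wanted)
    )


def sources_of(quads, triple):
    """Graphsets per which the given triple holds."""
    return {g for s, p, o, g in quads if (s, p, o) == triple}


def individual_sources(quads):
    """All singleton graphsets mentioned in any quad."""
    return {frozenset({name}) for _s, _p, _o, g in quads for name in g}
```

With this data, `instances_of(QUADS, "Person")` yields Giorgos, restricting to `{"PUB"}` yields nothing, and restricting to `{"TAPP", "PUB"}` yields Giorgos again, matching the answers listed on the slide.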
Validity and Redundancy Elimination
- Two invariants for RDF/S graphs
  - Valid (per some validity rules)
  - Redundancy-free (space considerations)
- The invariants allow optimized execution of queries
- These invariants are imposed during change
  - Improve query speed, but make updates more difficult
  - Trade-off between having query overhead or update overhead

Updating With RDF/S Graphsets
- Updates supported through an extended version of RUL
  - INSERT and DELETE
  - Only for data (class and property instances)
  - Implicit or explicit knowledge
  - Take into account and update graphset (provenance) information
- Main considerations
  - Apply the change (INSERT or DELETE)
  - Respect invariants
    - Non-redundancy (INSERT) and validity (DELETE)
  - Make minimal changes (under coherence viewpoint)
    - No unnecessary loss of information
  - Take into account and preserve graphset (provenance) information
    - Applicable upon quadruples

Conclusion
- Objective: assign provenance information to RDF/S graphs to capture why provenance
  - Triple-based representation
    - Turned triples into quadruples and used named graphs to record the origin
  - Inference (per RDFS)
    - Composed named graphs
  - Coherence semantics in updates (deletes)
    - Used graphsets for composed named graphs (cannot use an operator)
- Proposed query and update languages for graphsets
  - Based on RQL, RUL
  - Can be used to query/update provenance information
  - Provided syntax and semantics, as well as an implementation
    - Demo at:

EXTRA SLIDES

RDF/S Graphset Properties
- Three types of triples in a graphset:
  - Explicitly assigned triples
  - Implicitly assigned triples (from the constituent named graphs)
  - Implications of the above (per RDFS)

Inserts and Deletes: General Process
- INSERT
  - Validity respected
  - Must verify non-redundancy
  - Process
    - If INSERT is redundant, ignore it
    - Remove all redundant information (after insert)
- DELETE
  - Must verify validity
  - Non-redundancy respected
  - Issues with inference and the coherence viewpoint
  - Process
    - If DELETE is void, ignore it
    - Make explicit all originally redundant information that will be lost otherwise
    - Restore validity by removing property instances if necessary

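The two processes above can be sketched over quads whose origin is a `frozenset` of named-graph URIs. This is a simplified illustration, not the extended RUL semantics: it assumes only subclass transitivity and type propagation as inference rules, handles DELETE only for explicitly stored quads in an acyclic class hierarchy, and omits the validity-restoration step. All function names are hypothetical.

```python
def closure(quads):
    """Explicit quads plus everything the two assumed RDFS rules derive."""
    result = set(quads)
    while True:
        derived = {
            (s1, p1, o2, g1 | g2)  # derived quad is owned by both origins
            for (s1, p1, o1, g1) in result
            for (s2, p2, o2, g2) in result
            if p2 == "rdfs:subClassOf" and o1 == s2
            and p1 in ("rdf:type", "rdfs:subClassOf")
        }
        if derived <= result:
            return result
        result |= derived


def remove_redundant(quads):
    """Drop every quad that the remaining quads already imply."""
    kept = set(quads)
    for quad in sorted(quads, key=lambda q: (q[0], q[1], q[2], sorted(q[3]))):
        if quad in closure(kept - {quad}):
            kept.discard(quad)
    return kept


def insert(quads, quad):
    """INSERT: ignore a redundant request, otherwise add and re-normalise."""
    if quad in closure(quads):
        return set(quads)
    return remove_redundant(set(quads) | {quad})


def delete(quads, quad):
    """DELETE: ignore a void request, otherwise keep what would be lost."""
    before = closure(quads)
    if quad not in before:
        return set(quads)  # void delete
    if quad not in quads:
        raise NotImplementedError("sketch only deletes explicit quads")
    remaining = set(quads) - {quad}
    # implicit quads that depended on the deleted one are made explicit
    lost = before - closure(remaining) - {quad}
    return remove_redundant(remaining | lost)


TAPP, PUB = frozenset({"TAPP"}), frozenset({"PUB"})
BASE = {
    ("Giorgos", "rdf:type", "Author", TAPP),
    ("Author", "rdfs:subClassOf", "Person", PUB),
}
AFTER_DELETE = delete(BASE, ("Giorgos", "rdf:type", "Author", TAPP))
```

Deleting [Giorgos rdf:type Author] from `BASE` leaves [Giorgos rdf:type Person] stored explicitly with the shared origin {PUB, TAPP}, which is the coherence-viewpoint behaviour described above; re-inserting the Author quad afterwards makes that Person quad redundant again, so it is removed.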