WP3: Provenance and Access Control Irini Fundulaki Giorgos Flouris Institute of Computer Science-FORTH 1st year review Luxembourg, December 2011
WP3: Work Plan View
Tasks:
– Task 3.1 Provenance Management
– Task 3.2 Privacy, DRM and Access Control
– Task 3.3 Trust Management
Deliverables:
– D 3.1 Access control specification language, reasoning and enforcement mechanisms
– D 3.2 Provenance management and propagation through SPARQL query and update languages
– D 3.3 Access control system and privacy-aware language
– D 3.4 Trust management and inference system
Partners and timeline markers as labelled on the chart: FORTH, EPFL; 42, 48
Research Topics, Tasks and Partners Objective: manage annotations of different forms and semantics over data, related to data access Research Topics: Provenance, Access Control, Privacy, Digital Rights Management (DRM), Trust Management Partners: FORTH, EPFL, KIT
Provenance Wikipedia: “… the origin or source of something or the history of the ownership or location of an object” W3C Incubator Group: “… is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. […] Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.”
Provenance W3C Incubator Group: “With the arrival of massive amounts of Semantic Web Data […], provenance becomes an important factor in developing new Semantic Web applications.” Applications Data Trustworthiness, Reputation and Reliability Information Quality Data Integration and Exchange Reproducibility Argumentation (Decision Justification) Access Control Accountability Reasoning
Types of Provenance
– Coarse-grained provenance (workflow or dataflow provenance) is used to reproduce a digital object or repeat an experiment (complex programs)
– Fine-grained provenance (data provenance) refers to the transport of annotations between input and output data (query languages)
[Figure: a program P maps input I to output O (coarse grained); P′ maps I′ to O′ (fine grained)]
Workflow Provenance: Sensor Scenario
Sensors S1 and S2 each provide readings of sea temperature and wind; a complex computation predicts the height of waves
Provenance: the complex program executed on the input data
Data Provenance: Sensor Scenario
Provenance: annotations of the input tuples that contributed to the query results
R1 (sensor readings, DB server):
Sensor | Time  | Readgs | Annot.
S1     | 00:19 | 8B     | t1
S2     | 01:50 | 2B     | t2
R2 (sensor database):
Sensor | Latitude     | Annot.
S1     | 23° 26’ 21”N | t3
S2     | 23° 26’ 21”N | t4
R1 ⋈ R2:
Sensor | Time  | Readgs | Latitude     | Annot.
S1     | 00:19 | 8B     | 23° 26’ 21”N | {t1,t3}
S2     | 01:50 | 2B     | 23° 26’ 21”N | {t2,t4}
Data Provenance Models
Annotation Models: provenance computation is coupled with a particular application and a particular assignment of the provenance of source data
When the annotation of an input tuple changes, we must re-execute the query to obtain the annotation of the result tuples
The annotation of a join tuple is computed using the operator x: 0 x 0 = 0, 1 x 0 = 0, 1 x 1 = 1
R1:
Sensor | Time  | Readgs | Annot.
S1     | 00:19 | 8B     | 1
S2     | 01:50 | 2B     | 0
R2:
Sensor | Latitude     | Annot.
S1     | 23° 26’ 21”N | 1
S2     | 23° 26’ 21”N | 1
R1 ⋈ R2:
Sensor | Time  | Readgs | Latitude     | Annot.
S1     | 00:19 | 8B     | 23° 26’ 21”N | 1
S2     | 01:50 | 2B     | 23° 26’ 21”N | 0
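A minimal sketch of the annotation model, assuming the 0/1 annotations of the sensor example and a join on the sensor key. The tuple layout and the name join_annotated are illustrative assumptions, not part of the slides; the latitude column is left out for brevity.

```python
# Hypothetical encoding of the slide's relations with concrete 0/1 annotations.
R1 = [("S1", "00:19", "8B", 1), ("S2", "01:50", "2B", 0)]  # (sensor, time, reading, annot)
R2 = [("S1", 1), ("S2", 1)]                                # (sensor, annot)


def join_annotated(r1, r2):
    """Join on the sensor key; the result annotation is a1 x a2 (here: product)."""
    out = []
    for sensor1, time, reading, a1 in r1:
        for sensor2, a2 in r2:
            if sensor1 == sensor2:
                out.append((sensor1, time, reading, a1 * a2))
    return out


result = join_annotated(R1, R2)
```

Because the annotations are baked into the computation, a change to any input annotation requires running join_annotated again.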
Data Provenance Models
Abstract Models: provenance annotations (referred to as tokens) and operators are abstract
When the annotation of an input tuple changes, the annotation of the result tuple is re-computed by evaluating the annotation expression only
The annotation of a join tuple is modeled by the “x” operator
R1:
Sensor | Time  | Readgs | Annot.
S1     | 00:19 | 8B     | T1
S2     | 01:50 | 2B     | T2
R2:
Sensor | Latitude     | Annot.
S1     | 23° 26’ 21”N | T3
S2     | 23° 26’ 21”N | T4
R1 ⋈ R2:
Sensor | Time  | Readgs | Latitude     | Annot.
S1     | 00:19 | 8B     | 23° 26’ 21”N | T1 x T3
S2     | 01:50 | 2B     | 23° 26’ 21”N | T2 x T4
Data Provenance Models
Abstract Models: abstract tokens and operators are assigned concrete values only when the concrete value of an annotation must be computed
R1 carries tokens T1 (S1) and T2 (S2); R2 carries tokens T3 (S1) and T4 (S2)
R1 ⋈ R2:
Sensor | Time  | Readgs | Latitude     | Annot.
S1     | 00:19 | 8B     | 23° 26’ 21”N | T1 x T3
S2     | 01:50 | 2B     | 23° 26’ 21”N | T2 x T4
Data Quality Application:
– abstract tokens T1, T2, T3, T4 take values 1 and 0
– abstract operator “x” is replaced by logical AND
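A small sketch of the abstract model under the data quality application, assuming each join result stores the tuple of tokens combined by “x”. The concrete token values below are hypothetical; the slides only state that tokens take values 1 and 0.

```python
from functools import reduce

# Abstract provenance of the two join results: tokens combined by the abstract "x".
abstract = {"S1": ("T1", "T3"), "S2": ("T2", "T4")}


def evaluate(expr, tokens, times):
    """Replace each token by its concrete value and fold with the concrete operator."""
    return reduce(times, (tokens[t] for t in expr))


def logical_and(a, b):
    return a and b


quality = {"T1": 1, "T2": 0, "T3": 1, "T4": 1}  # hypothetical assignment
concrete = {k: evaluate(e, quality, logical_and) for k, e in abstract.items()}

# A provenance update on the input: only the stored expression is re-evaluated,
# the join itself is not re-executed.
quality["T2"] = 1
updated_s2 = evaluate(abstract["S2"], quality, logical_and)
```

A different application could pass another token assignment and another operator to evaluate without touching the stored expressions.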
Abstract Data Provenance Models
Benefits:
– in the presence of provenance updates in the input, we only need to re-evaluate the provenance of the affected tuples
– different applications can assign different concrete values to abstract tokens and operators, for the same data
Challenges: trade-off between provenance storage and computation efficiency
– storage of large provenance expressions
– efficient computation of provenance for dynamic data
Data Provenance
RDFS reasoning
– Given a set of RDF triples whose explicit provenance is known, and RDFS reasoning rules, what is the provenance of the implicit RDF triples?
SPARQL
– Given a set of RDF triples whose explicit provenance is known, and a SPARQL query, what is the provenance of the query result?
RDFS Reasoning
Given a set of RDF triples (RDF graph) whose explicit provenance is known, and RDFS entailment rules, what is the provenance of the implicit RDF triples?
RDFS entailment rules (sc: subclassOf):
– (A1, sc, A2), (A2, sc, A3) ⇒ (A1, sc, A3)
– (&r, type, A1), (A1, sc, A2) ⇒ (&r, type, A2)
[Figure: SSN Ontology with classes Sensing Device, Device, Sensor and System and instance &s1; explicit triples carry colors C1 to C4, and the colors of the implicit triples are marked “?”]
RDFS Reasoning colors to capture the provenance of explicit and implicit data and schema RDF triples quadruples to represent provenance information Provenance model: commutative semi-group structure (C, +) – C: set of colors, – binary operation “+” to compose colors of the input triples
RDFS Reasoning Pediaditis P., Flouris G., Fundulaki I., Christophides V. On Explicit Provenance Management in RDF/S Graphs. In Theory and Practice of Provenance (TaPP-2009) Flouris G., Fundulaki I., Pediaditis P., Theoharis Y., Christophides V. Coloring RDF Triples to capture Provenance. In ISWC 2009.
Provenance for SPARQL
Given a set of RDF triples (RDF graph) whose explicit provenance is known, and a SPARQL query, what is the provenance of the result?
– We showed that existing provenance models for positive relational algebra can capture the provenance of SPARQL (without OPTIONAL)
– We follow the approach by Karvounarakis et al. in Provenance Semirings, PODS 2007, to develop a model for full SPARQL that records the input tuples and the operators used to compute the query results
Provenance Model for SPARQL +
– K: set of provenance tokens
– an abstract operator for SPARQL join
– an abstract operator for SPARQL union
SPARQL Query: return the sensor and its latitude
select ?s, ?l where { ?s type Sensor. ?s Latitude ?l }
Set of triples T:
subject | predicate | object       | prov
S1      | type      | Sensor       | t1
S1      | Readgs    | &r1          | t5
S1      | Latitude  | 23° 26’ 21”N | t2
S2      | type      | Sensor       | t3
S2      | Readgs    | &r2          | t6
S2      | Latitude  | 23° 26’ 21”N | t4
&r1     | value     | 8B           | t7
&r1     | time      | 00:19        | t8
&r2     | value     | 2B           | t9
&r2     | time      | 01:50        | t10
Provenance Model for SPARQL +
Q = ?s type Sensor. ?s Latitude ?l
The evaluation of a triple pattern over T is a set of mappings (?variable, ?value), where T is the set of triples annotated with provenance tokens t1 to t10
Mappings for “?s type Sensor”:
?s | prov
S1 | t1
S2 | t3
Mappings for “?s Latitude ?l”:
?s | ?l           | prov
S1 | 23° 26’ 21”N | t2
S2 | 23° 26’ 23”N | t4
Provenance Model for SPARQL +
Q = ?s type Sensor. ?s Latitude ?l
The result of a join between two triple patterns contains all mappings that have the same value for their common variable(s)
Mappings for “?s type Sensor”: S1 (t1), S2 (t3)
Mappings for “?s Latitude ?l”: S1, 23° 26’ 21”N (t2); S2, 23° 26’ 23”N (t4)
Join result:
?s | ?l           | prov
S1 | 23° 26’ 21”N | t1 joined with t2
S2 | 23° 26’ 23”N | t3 joined with t4
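A sketch of the join step, assuming mappings are Python dicts paired with a provenance token and that the provenance of a joined mapping is recorded symbolically as a ("join", p1, p2) tuple. This encoding and the function names are illustrative assumptions, not the notation of the model.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def join(omega1, omega2):
    """Join two sets of (mapping, provenance) pairs, recording the operator used."""
    out = []
    for m1, p1 in omega1:
        for m2, p2 in omega2:
            if compatible(m1, m2):
                out.append(({**m1, **m2}, ("join", p1, p2)))
    return out


# Mappings of the two triple patterns of Q, with the tokens shown on the slide.
omega1 = [({"?s": "S1"}, "t1"), ({"?s": "S2"}, "t3")]
omega2 = [
    ({"?s": "S1", "?l": "23° 26' 21\"N"}, "t2"),
    ({"?s": "S2", "?l": "23° 26' 23\"N"}, "t4"),
]
result = join(omega1, omega2)
```

Each result mapping keeps both the input tokens and the operator that combined them, which is the information the model is meant to record.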
Provenance for SPARQL Theoharis Y., Fundulaki I., Karvounarakis G., Christophides V. On Provenance of Queries on Linked Web Data. In IEEE Internet Computing:Provenance in Web Applications, 2011.
Access Control Refers to the ability to permit or deny the use of a particular resource by a particular entity Crucial for sensitive content since it ensures the selective exposure of information to different classes of users
RDF Access Control In general, an access control model specifies – the access annotations – conflict resolution policy to resolve ambiguous access annotations – default semantics used to annotate data that are not in the scope of any authorization Access Authorizations specify (by a query) the access annotations for data
Access Control
Access Annotations can be
– boolean values: true/false (grant/deny access permission)
– confidentiality levels: low, medium, high
Conflict Resolution Policy depends on the type of access annotations
– boolean values: a deny annotation overrides a grant annotation
– confidentiality levels: high confidentiality overrides medium, medium overrides low
Default Semantics depend on the type of access annotations
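The two conflict resolution policies above can be sketched as follows. The function names and the LEVELS ranking table are illustrative assumptions.

```python
def resolve_boolean(annotations):
    """Deny overrides grant: access is granted only if no annotation denies it."""
    return all(annotations)


# Hypothetical ranking of the confidentiality levels named on the slide.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def resolve_confidentiality(annotations):
    """The highest confidentiality level among the conflicting annotations wins."""
    return max(annotations, key=LEVELS.__getitem__)
```

For example, resolve_boolean([True, False]) denies access and resolve_confidentiality(["low", "high", "medium"]) yields "high".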
Fine-grained Access Control Framework for RDF Data We encode access annotations of RDF triples using quadruples We propose an abstract access control model defined by a set of abstract tokens and abstract operators to model – the computation of access annotations of RDF triples considering RDFS inference – the propagation of access annotations – conflicting and missing access annotations
Abstract Tokens
– L: set of abstract access control tokens
– L contains a default access token, assigned to triples that have no explicitly assigned access token
Abstract Operators Entailment Operator ⊙ to compute the access annotations of implied quadruples Propagation Operator to model the propagation of access annotations Conflict Resolution Operator to resolve ambiguous access annotations
Entailment Operator ⊙
Binary operator that models the computation of the annotation of an implicit RDF quadruple for the subclass, subproperty and type hierarchies in an RDF graph
Example: (A1, sc, A2, l1), (A2, sc, A3, l2) ⇒ (A1, sc, A3, l1 ⊙ l2)
Properties:
– Associativity: (l1 ⊙ l2) ⊙ l4 = l1 ⊙ (l2 ⊙ l4)
– Commutativity: l1 ⊙ l4 = l4 ⊙ l1
The order of the application of inference rules is not important
Entailment Operator ⊙
– (A1, sc, A2, l1), (A2, sc, A3, l2) ⇒ (A1, sc, A3, l1 ⊙ l2)
– (&r, type, A2, l1), (A2, sc, A3, l2) ⇒ (&r, type, A3, l1 ⊙ l2)
[Figure: RDF graph with classes Sensing Device, Device, Sensor, System and rdfs:Class and instance &s1 (type and sc: subclassOf edges), annotated with tokens l0 to l4; the implied quadruples carry the annotations l1 ⊙ l4, l1 ⊙ l2 and (l1 ⊙ l2) ⊙ l4]
Propagation Operator
Unary operator (written propagate(·) here) that models the propagation of access annotations along the subclass/subproperty and type hierarchies in an RDF graph
– a class inherits the annotation of its superclass, an instance of a class inherits the annotation of its class, etc.
Example: (A1, type, class, l1), (&r1, type, A1, l2) ⇒ (&r1, type, A1, propagate(l1))
Properties:
– Idempotence: propagate(propagate(l0)) = propagate(l0)
We do not care how many times an annotation is propagated
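Propagation Operator
Rule, with the propagation operator written propagate(·): (&r, type, A1, l1), (A1, type, rdfs:Class, l2) ⇒ (&r, type, A1, propagate(l2))
[Figure: RDF graph with classes Sensing Device, Device, Sensor, System and rdfs:Class and instance &s1 (type and sc: subclassOf edges), annotated with tokens l0 to l4; the token l0 is propagated along the hierarchy next to the entailed annotation (l1 ⊙ l2) ⊙ l4]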
Conflict Resolution Operator
Binary operator (written res(·, ·) here) that resolves ambiguous access labels
Example: (A1, sc, A2, l1), (A1, sc, A2, l2) ⇒ (A1, sc, A2, res(l1, l2))
Properties:
– Associativity: res(res(l0, l1), l2) = res(l0, res(l1, l2))
– Commutativity: res(l0, l1) = res(l1, l0)
– Idempotence: res(l1, l1) = l1
Computing Abstract Access Control Annotations
– assign access annotations to the triples of the RDF graph to obtain quadruples
– apply RDFS inference rules on quadruples to obtain the implicit annotated quadruples
– apply propagation rules on quadruples to compute their propagated annotations
– apply the conflict resolution operator to resolve ambiguities
Computing Abstract Access Control Annotations (example)
[Figure: RDF graph with classes Sensing Device, Device, Sensor, System and rdfs:Class, annotated with tokens l0, l1, l2, l3 and l5; inference, propagation and conflict resolution yield the quadruple (SensingDevice, type, rdfs:Class, …), whose annotation is an expression over l0, l1 ⊙ l2 and l3 ⊙ l5]
Concrete Policies
A concrete policy assigns concrete values to the abstract tokens and operators
Example
– Boolean values assigned to abstract tokens: false denies access, true grants access
– Conjunction assigned to the entailment operator: an implied triple is accessible iff all its implying triples have been granted access
– Disjunction assigned to the conflict resolution operator: a grant annotation overrides a deny annotation
– Identity assigned to the propagation operator
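A sketch of how a concrete policy could be applied to an abstract annotation, assuming expressions are nested tuples tagged "entail", "conflict" and "propagate". The tags, the token values and the sample expression are hypothetical; only the operator assignments of the first policy come from the example above.

```python
def evaluate(expr, tokens, ops):
    """Evaluate an abstract expression under concrete token values and operators."""
    if isinstance(expr, str):  # an abstract token
        return tokens[expr]
    name, *args = expr
    return ops[name](*(evaluate(arg, tokens, ops) for arg in args))


# The policy of the example: conjunction, disjunction and identity.
grant_overrides = {
    "entail": lambda a, b: a and b,
    "conflict": lambda a, b: a or b,
    "propagate": lambda a: a,
}
# A second, hypothetical policy that only swaps the conflict resolution operator.
deny_overrides = {**grant_overrides, "conflict": lambda a, b: a and b}

tokens = {"l1": True, "l2": False, "l3": True}  # hypothetical assignment
expr = ("conflict", ("entail", "l1", "l2"), ("propagate", "l3"))

granted = evaluate(expr, tokens, grant_overrides)
denied = evaluate(expr, tokens, deny_overrides)
```

The stored expression stays the same; switching policy only changes the dictionaries passed to evaluate.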
Concrete Policy (example)
Assignment of abstract tokens to values: each of l0, l1, l2, l3 and l5 is assigned false (F) or true (T)
Assignment of abstract operators to concrete ones:
– propagation: negation (¬)
– entailment: conjunction
– conflict resolution: disjunction
[Figure: the abstract annotation of the quadruple (SensingDevice, type, rdfs:Class, …), an expression over l0, l1 ⊙ l2 and l3 ⊙ l5, is rewritten with the F/T values and the concrete operators and evaluates to T]
References Flouris G., Fundulaki I., Michou M., Papakonstantinou V., Antoniou G. Access Control for RDFS Graphs Using Abstract Models. Ongoing work.
RDFS Reasoning
RDFS inference rules for triples
– (SensingDevice, sc, Device), (Device, sc, System) ⇒ (SensingDevice, sc, System)
– (&s1, type, SensingDevice), (SensingDevice, sc, Device) ⇒ (&s1, type, Device)
RDFS inference rules for quadruples
– (SensingDevice, sc, Device, C1), (Device, sc, System, C2) ⇒ (SensingDevice, sc, System, C1 + C2)
– (&s1, type, SensingDevice, C4), (SensingDevice, sc, Device, C1) ⇒ (&s1, type, Device, C1 + C4)
RDFS Reasoning
Commutative semi-group structure (C, “+”)
– Commutativity: C1 + C2 = C2 + C1
– Associativity: (C1 + C2) + C3 = C1 + (C2 + C3). The order of the application of inference rules is not important
– Idempotence: C1 + C1 = C1. Multiple appearances of the same provenance do not affect the provenance of the resulting triple
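One structure that satisfies the three properties is set union over sets of color names. The sketch below uses it as an assumption (the slides do not fix a concrete “+”) and applies the two quadruple rules of the sensor example once; the names plus and infer are illustrative.

```python
from itertools import product


def plus(c1, c2):
    """Compose colors; frozenset union is commutative, associative and idempotent."""
    return c1 | c2


def infer(quads):
    """One pass of the sc-transitivity and type-inheritance rules over quadruples."""
    derived = set()
    for (s1, p1, o1, c1), (s2, p2, o2, c2) in product(quads, repeat=2):
        if o1 != s2 or p2 != "sc":
            continue
        if p1 == "sc":      # (A1, sc, A2, c1), (A2, sc, A3, c2)
            derived.add((s1, "sc", o2, plus(c1, c2)))
        elif p1 == "type":  # (&r, type, A1, c1), (A1, sc, A2, c2)
            derived.add((s1, "type", o2, plus(c1, c2)))
    return derived


C1, C2, C4 = frozenset({"C1"}), frozenset({"C2"}), frozenset({"C4"})
quads = [
    ("SensingDevice", "sc", "Device", C1),
    ("Device", "sc", "System", C2),
    ("&s1", "type", "SensingDevice", C4),
]
derived = infer(quads)
```

A full closure would repeat infer until no new quadruples appear; the properties guarantee that the order of rule applications does not change the resulting colors.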
Provenance for SPARQL Data is: fetched from heterogeneous sources, integrated, materialized in RDF and made available via SPARQL queries Range of computations: SPARQL queries, complex programs
Provenance Model for SPARQL +
SPARQL Query: return the sensor, its latitude, the time and value of its readings
select ?s, ?v, ?l, ?t where { ?s type Sensor. ?s Latitude ?l. ?s Readgs ?r. ?r value ?v. ?r time ?t }
Set of triples:
subject | predicate | object       | prov
S1      | type      | Sensor       | p1
S1      | Readgs    | &r1          | p2
S1      | Latitude  | 23° 26’ 21”N | p1
S2      | type      | Sensor       | p3
S2      | Readgs    | &r2          | p6
S2      | Latitude  | 23° 26’ 21”N | p3
&r1     | value     | 8B           | p2
&r1     | time      | 00:19        | p2
&r2     | value     | 2B           | p6
&r2     | time      | 01:50        | p7
Result:
?s | ?l           | ?t    | ?v | provenance
S1 | 23° 26’ 21”N | 00:19 | 8B | p1² x p2³
S2 | 23° 26’ 21”N | 01:50 | 2B | p3² x p6² x p7
Encodes the join, union and projection operators using the abstract operator x. The exponents capture the number of times a provenance token is used.
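A sketch that recomputes how often each token is used per result, assuming a naive pattern matcher and a Counter as the multiset of tokens combined by x. The matcher is an illustrative assumption, not the evaluation strategy of the model, and the token of the (&r1, time, 00:19) triple is taken to be p2.

```python
from collections import Counter

TRIPLES = [
    ("S1", "type", "Sensor", "p1"), ("S1", "Readgs", "&r1", "p2"),
    ("S1", "Latitude", "23° 26' 21\"N", "p1"),
    ("S2", "type", "Sensor", "p3"), ("S2", "Readgs", "&r2", "p6"),
    ("S2", "Latitude", "23° 26' 21\"N", "p3"),
    ("&r1", "value", "8B", "p2"), ("&r1", "time", "00:19", "p2"),
    ("&r2", "value", "2B", "p6"), ("&r2", "time", "01:50", "p7"),
]
PATTERNS = [
    ("?s", "type", "Sensor"), ("?s", "Latitude", "?l"), ("?s", "Readgs", "?r"),
    ("?r", "value", "?v"), ("?r", "time", "?t"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(patterns, triples):
    """Join the patterns left to right, multiplying provenance tokens."""
    results = [({}, Counter())]
    for pattern in patterns:
        extended = []
        for binding, prov in results:
            for *triple, token in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append((new, prov + Counter([token])))
        results = extended
    return results


provenance = {b["?s"]: dict(p) for b, p in evaluate(PATTERNS, TRIPLES)}
```

Here provenance["S1"] counts p1 twice and p2 three times, and provenance["S2"] counts p3 twice, p6 twice and p7 once.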
Provenance Model for SPARQL +
Q = ?s type Sensor. ?s Latitude ?l, evaluated over the set of triples T (the sensor triples annotated with provenance tokens)
Mappings for “?s type Sensor”:
?s | prov
S1 | p1
S2 | p3
Mappings for “?s Latitude ?l”:
?s | ?l           | prov
S1 | 23° 26’ 21”N | p1
S2 | 23° 26’ 23”N | p6
The result of a join between two triple patterns contains all mappings that have the same value for their common variable(s)
Join result:
?s | ?l           | prov
S1 | 23° 26’ 21”N | p1 joined with p1
S2 | 23° 26’ 23”N | p3 joined with p6
Provenance for SPARQL
– The SPARQL OPTIONAL operator is non-monotonic: the result contains all the mappings that match the optional part, and also those that do not
– The provenance of the resulting tuples should keep the provenance of the triples that matched and contributed to the result, and of those that did not
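A sketch of why OPTIONAL is non-monotonic, assuming mappings are plain dicts and OPTIONAL is evaluated as a left join without filters. The sample mappings are hypothetical.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def left_join(omega1, omega2):
    """Keep every left mapping: extended when a match exists, unchanged otherwise."""
    out = []
    for m1 in omega1:
        matches = [{**m1, **m2} for m2 in omega2 if compatible(m1, m2)]
        out.extend(matches if matches else [m1])
    return out


sensors = [{"?s": "S1"}, {"?s": "S2"}]
readings_before = [{"?s": "S1", "?r": "&r1"}]
readings_after = readings_before + [{"?s": "S2", "?r": "&r2"}]

before = left_join(sensors, readings_before)
after = left_join(sensors, readings_after)
```

The mapping {"?s": "S2"} is in before but not in after: adding data removed a result, so the provenance of that mapping has to account for the triples that did not match.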
Access Control for RDF Fine-grained, repository independent, portable across platforms, access control framework for RDF data – Abstract Access Control model for RDF data focusing on read-only operations – Annotation-based Access Control Enforcement Mechanism
Computing Abstract Access Control Expressions (example)
Entailment rules:
– (A1, sc, A2, l1), (A2, sc, A3, l2) ⇒ (A1, sc, A3, l1 ⊙ l2)
– (&r, type, A2, l1), (A2, sc, A3, l2) ⇒ (&r, type, A3, l1 ⊙ l2)
Propagation rule, with the propagation operator written propagate(·):
– (&r, type, A1, l1), (A1, type, rdfs:Class, l2) ⇒ (&r, type, A1, propagate(l2))
[Figure: RDF graph with classes Sensing Device, Device, Sensor, System and rdfs:Class, annotated with tokens l0, l1, l2, l3 and l5; the derived annotations include l1 ⊙ l2 and l3 ⊙ l5, and the resulting quadruple is (SensingDevice, type, rdfs:Class, …), whose annotation is an expression over l0, l1 ⊙ l2 and l3 ⊙ l5]
Entailment Operator ⊙
RDFS Inference: quadruple generating rules
– (A1, sc, A2, l1), (A2, sc, A3, l2) ⇒ (A1, sc, A3, ⊙(l1, l2))
– (P1, sp, P2, l1), (P2, sp, P3, l2) ⇒ (P1, sp, P3, ⊙(l1, l2))
– (&r1, type, A1, l1), (A1, sc, A2, l2) ⇒ (&r1, type, A2, ⊙(l1, l2))
– (&r1, P1, &r2, l1), (P1, sp, P2, l2) ⇒ (&r1, P2, &r2, ⊙(l1, l2))
Un-nesting: ⊙(⊙(…), …) = ⊙(…)
– (A1, sc, A3, ⊙(l1, l2)), (A3, sc, A4, l3) ⇒ (A1, sc, A4, ⊙(⊙(l1, l2), l3)) = (A1, sc, A4, ⊙(l1, l2, l3))
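The un-nesting property can be sketched as a flattening of nested entailment expressions, assuming an expression is either a token string or a tuple ("ent", arg, ...). This encoding is an illustrative assumption.

```python
def unnest(expr):
    """Flatten nested entailment applications into a single n-ary application."""
    if isinstance(expr, str):  # a plain token
        return expr
    flat = []
    for arg in expr[1:]:
        arg = unnest(arg)
        if isinstance(arg, tuple):  # a nested entailment: splice its arguments in
            flat.extend(arg[1:])
        else:
            flat.append(arg)
    return ("ent", *flat)


nested = ("ent", ("ent", "l1", "l2"), "l3")
flattened = unnest(nested)
```

Here flattened is ("ent", "l1", "l2", "l3"), matching the annotation of (A1, sc, A4, …) after un-nesting.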
Propagation Operator
Propagating labels: no new quadruples are created. The propagation operator is written propagate(·) below.
– (A1, type, class, l1), (A2, sc, A1, l2), (A2, type, class, l3) ⇒ (A2, type, class, propagate(l1))
– (P1, type, prop, l1), (P2, sp, P1, l2), (P2, type, prop, l3) ⇒ (P2, type, prop, propagate(l1))
– (A1, type, class, l1), (&r1, type, A1, l2) ⇒ (&r1, type, A1, propagate(l1))
– (P1, type, prop, l1), (&r1, P1, &r2, l2) ⇒ (&r1, P1, &r2, propagate(l1))
Idempotence: propagate(propagate(l1)) = propagate(l1)
– (A2, type, class, propagate(l1)), (A3, sc, A2, l4) ⇒ (A3, type, class, propagate(propagate(l1))) = (A3, type, class, propagate(l1))