An OWL-based schema for personal data protection policies Giles Hogben Joint Research Centre, European Commission
Overview Introduction – what is P3P and the Base Data Schema Why do we need a generic data schema for personal data (outside of P3P)? Other schemas available Modelling the schema in OWL –Model –Reasoning –Validation Further work
Intro P3P – Platform for Privacy Preferences W3C XML standard for expressing web site privacy policies (2001) Statements about data practices by data type Example of use of data schema
Requirements The P3P data schema works OK within P3P 1.0 and 1.1, but there are many uses outside the scope of P3P: EPAL (Enterprise Privacy Authorization Language) CC/PP PRIME –Obligations –Credential metadata –Data-handling
Requirements –Reasoning about credential types (e.g. Driver’s licence valid => Over 18) –Reasoning about data handling: e.g. purpose marketing, opt-out -> Risk of spam. –Obligation management – attach obligations to triples without revealing content. –Automatic form-filling – implies reasoning about data type equivalences between data store, data request and client preferences –Identity management and privacy enhancing access control rules – reasoning about pseudonyms and linkability related to classes of data revealed.
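The first two reasoning requirements can be pictured as simple if-then rules. The following is a minimal sketch in plain Python, not the OWL/Jena machinery the talk refers to; the fact names (`DriversLicenceValid`, `Over18`, `PurposeMarketing`, `OptOut`, `RiskOfSpam`) and the `infer` helper are hypothetical labels chosen for illustration.

```python
# Hypothetical rules taken from the examples on the slide:
# each rule is (set of premises, conclusion).
RULES = [
    (frozenset({"DriversLicenceValid"}), "Over18"),
    (frozenset({"PurposeMarketing", "OptOut"}), "RiskOfSpam"),
]


def infer(facts):
    """Forward-chain over RULES until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            # Fire the rule only if all premises hold and it adds something new.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    print(sorted(infer({"DriversLicenceValid"})))
    print(sorted(infer({"PurposeMarketing", "OptOut"})))
```

A real deployment would express such rules in a rule language over RDF data; the loop above only illustrates the kind of conclusion the requirements ask for.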
Requirements Reusable data structures Type validation Efficient and extensible definition format Metadata on types Abstraction layer between privacy rules and enterprise data structures
Existing Schema Formats P3P1.0 Schema –Quirky syntax only understood by 3 people worldwide –Semantics understood by 2 people worldwide –Customization format understood by 0 people worldwide –But all other versions share the same semantics, as they are required by the use cases (reusable, extensible, non-subclassed data structures) E.g.
Existing Schema Formats P3P1.1 Schema Uses XML syntax + informal semantics: E.g.
Existing Schema Formats RDFS Schema for P3P Models every single class in the class hierarchy Models classes of data as properties. –Difficult to describe instance data –Metadata for properties less natural can be seen as a property, but what is the Dynamic/Cookies property?
OWL Schema Models semantics of P3P 1.0 data schema Allows reference from RDF -> reasoning Allows type validation Simplifies syntax, especially extensibility syntax BUT modelling P3P semantics exactly => modal logic, which makes some reasoning nasty
Structure of Existing Schema A hierarchy of sorts but NOT a subclass hierarchy. Essentially a semantic and syntactic validation scheme. [Figure: data hierarchy diagram with nodes Personname, Bdate, User, Gender, Thirdparty, Cert, Employer, Address, Name, Prefix, Given; a statement pattern Entity / May Collect / DataClass X; edge labels "Some Values From Only" and "subClass"]
How to model the existing structure Formal set theory definition: for A (User), SVFO (Some Values From Only) L (Cert, Personname…) [Figure: nodes Personname, Bdate, User, Gender, Thirdparty, Cert]
Shortcut
Data handling statements and reasoning use case A service states that it may collect any values from the class User data. A user agent rule says to block transfer to any service which might collect Given name data. Note the modal predicate "May collect", which changes the expected logic. [Figure: statement pattern Entity / May Collect / DataClassX; hierarchy User, Name, Given, Prefix; edge label "subClass"]
Data handling statements and reasoning use case The agent needs to deduce: if a service may collect values from User data, it may also collect values from Name. Applying the same rule again, if a service may collect values from Name, it may also collect values from GivenName. -> If a service may collect values from User, it may collect them from GivenName. For discussion of how this was achieved using Jena and OWL, see the paper. [Figure: same diagram as on the previous slide]
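The deduction above amounts to walking down the data hierarchy from the class a service declares. The sketch below is plain Python rather than the Jena/OWL implementation described in the paper; the `CHILDREN` table is a hypothetical fragment built from the node names on the slides, and `may_collect` and `should_block` are illustrative helper names.

```python
# Hypothetical fragment of the hierarchy: parent -> direct child data classes.
CHILDREN = {
    "User": ["Name", "Bdate", "Gender", "Cert"],
    "Name": ["Prefix", "Given"],
}


def may_collect(declared):
    """Return every data class a service may collect, given the class it declared."""
    reachable = {declared}
    stack = [declared]
    while stack:
        current = stack.pop()
        # "May collect X" propagates to every class nested under X.
        for child in CHILDREN.get(current, []):
            if child not in reachable:
                reachable.add(child)
                stack.append(child)
    return reachable


def should_block(declared, blocked_class):
    """User agent rule: block if the service might collect the blocked class."""
    return blocked_class in may_collect(declared)


if __name__ == "__main__":
    print(should_block("User", "Given"))   # True
    print(should_block("Bdate", "Given"))  # False
```

Because the predicate is modal ("may collect"), the inference runs from the general class down to its parts, which is the opposite direction from ordinary subclass inheritance.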
Quickfix: Using shortcut classes Use of shortcut/convenience classes:
Advantage: More compact RDF Bob Instead of Bob (Important for adoption and acceptance by policy authors)
Advantage 2. Makes reasoning use case trivial Practical use cases only require matching concrete classes (described by the shortcut classes) with their ancestors in the hierarchy. By using shortcut classes in OWL, this is simply achieved, since a standard OWL reasoner concludes: -> User.Name.Given rdfs:subClassOf User
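The matching step can be illustrated without a reasoner. This is a minimal sketch that assumes the dotted shortcut name mirrors the asserted subclass chain (in the OWL schema the `rdfs:subClassOf` links are stated explicitly, not derived from names); `ancestors` and `is_subclass_of` are hypothetical helpers.

```python
def ancestors(shortcut_name):
    """List the ancestor shortcut classes of a dotted name, nearest first."""
    parts = shortcut_name.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def is_subclass_of(concrete, candidate):
    """True if `candidate` is `concrete` itself or one of its ancestors."""
    return concrete == candidate or candidate in ancestors(concrete)


if __name__ == "__main__":
    print(ancestors("User.Name.Given"))               # ['User.Name', 'User']
    print(is_subclass_of("User.Name.Given", "User"))  # True
```

A user agent rule written against `User` then matches any concrete class such as `User.Name.Given` with a single ancestor lookup.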
Validation Structure provides some semantic validation through disjoint classes (e.g. City disjoint from Gender – so if something is typed as both city and gender data, it flags an error) OWL supports XSD datatyping for syntactic validation (e.g. string, numeric and allows customized types through Regex such as addresses)
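Both kinds of validation can be sketched in a few lines of plain Python. The disjointness pair comes from the slide; the regular expressions stand in for XSD pattern facets and are purely hypothetical, as are the names `DISJOINT_PAIRS`, `VALUE_PATTERNS` and `validate`.

```python
import re
from itertools import combinations

# Semantic validation: pairs of classes declared disjoint (example from the slide).
DISJOINT_PAIRS = {frozenset(("City", "Gender"))}

# Syntactic validation: hypothetical value patterns, standing in for XSD facets.
VALUE_PATTERNS = {
    "Gender": re.compile(r"[MF]"),
    "City": re.compile(r"[A-Za-z .'-]+"),
}


def validate(types, value):
    """Return a list of error messages for a value typed with the given classes."""
    errors = []
    # Flag any two asserted types that are declared disjoint.
    for a, b in combinations(sorted(types), 2):
        if frozenset((a, b)) in DISJOINT_PAIRS:
            errors.append(f"{a} and {b} are disjoint")
    # Check the lexical form against each type's pattern, if one is defined.
    for t in sorted(types):
        pattern = VALUE_PATTERNS.get(t)
        if pattern is not None and not pattern.fullmatch(value):
            errors.append(f"{value!r} is not a valid {t} value")
    return errors


if __name__ == "__main__":
    print(validate({"City", "Gender"}, "F"))
    print(validate({"Gender"}, "female"))
    print(validate({"City"}, "Ispra"))
```

In the OWL schema itself, the first check would follow from `owl:disjointWith` axioms and the second from XSD datatype restrictions on the property ranges.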
Summary We need an ontological model which satisfies the requirements of the P3P 1.0 data schema. We can use OWL for this. OWL satisfies (with difficulty) the reasoning requirements and provides validation features not provided by P3P syntax.
Further work Rethink structure without trying to be backward compatible? Multi language HR strings Support for numerical reasoning – e.g. not just Drivers’ Licence -> Majority age, but ?x has Drivers’ Licence -> [?a >= 18 age > 16. Other more complex reasoning –e.g. ?x collects User.Name.Prefix -> [?x collects User.CivilStatus <- User.Name.Gender = ‘female’]
That’s all folks