Security Issues in a SOA-based Provenance System
Victor Tan, Paul Groth, Simon Miles, Sheng Jiang, Steve Munroe, Sofia Tsasakou and Luc Moreau
PASOA/EU Provenance, University of Southampton
IPAW, May 2006, Chicago
Provenance in a SOA context
- Services (actors) interact through message exchange.
- The execution of a workflow is a process.
- The provenance of a piece of data is the process that led to that piece of data.
- P-assertion: a specific piece of information documenting some step of a process.
- P-assertions are stored in a provenance store, where actors in the system can query them.
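A minimal sketch of the vocabulary above, assuming a hypothetical p-assertion schema (asserter, interaction identifier, free-text content) and an in-memory stand-in for a provenance store. Actors record p-assertions about the interactions they take part in, and other actors query the store for them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PAssertion:
    """One piece of process documentation asserted by a single actor (hypothetical schema)."""
    asserter: str        # actor that created the p-assertion
    interaction_id: str  # message exchange being documented
    content: str         # what the actor asserts about that step


@dataclass
class ProvenanceStore:
    """In-memory stand-in for a provenance store that actors record to and query."""
    assertions: List[PAssertion] = field(default_factory=list)

    def record(self, pa: PAssertion) -> None:
        self.assertions.append(pa)

    def query(self, interaction_id: str) -> List[PAssertion]:
        # Return every p-assertion that documents the given interaction.
        return [pa for pa in self.assertions if pa.interaction_id == interaction_id]


store = ProvenanceStore()
store.record(PAssertion("ServiceA", "msg-1", "sent input.csv to ServiceB"))
store.record(PAssertion("ServiceB", "msg-1", "received input.csv from ServiceA"))
store.record(PAssertion("ServiceB", "msg-2", "produced result.csv from input.csv"))
print(len(store.query("msg-1")))  # 2: both sides documented the same interaction
```

Both actors in an interaction can document it, so one exchange may yield several p-assertions, each reflecting its asserter's view.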
Access control on process documentation
- Useful provenance information is obtained by aggregating p-assertions.
- Granularity of access control: groups of p-assertions.
- Problem: a combination of certain p-assertions may unintentionally give access to provenance information.
Access control on process documentation (example)
Figure: p-assertions PA1 to PA6, grouped according to the provenance query (X, Y or Z) that each group is needed to answer.
- User A has access to answer provenance query X.
- User A is then given access to answer provenance query Y.
- Unintentionally, user A is now also able to answer provenance query Z, because the p-assertions released for X and Y together cover those needed for Z.
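The inference problem in the example can be checked mechanically. This sketch assumes a hypothetical mapping from each query to the p-assertions it needs (the real groupings are only in the original figure). It reports which queries a user can answer with everything released so far, but was never explicitly granted.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical mapping: the p-assertions each provenance query needs in order to be answered.
QUERY_REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "X": frozenset({"PA1", "PA2", "PA3"}),
    "Y": frozenset({"PA4", "PA5", "PA6"}),
    "Z": frozenset({"PA2", "PA5"}),
}


def granted_assertions(granted_queries: Set[str]) -> Set[str]:
    """Union of the p-assertions released to a user who may answer the given queries."""
    released: Set[str] = set()
    for q in granted_queries:
        released |= QUERY_REQUIREMENTS[q]
    return released


def answerable_queries(granted_queries: Set[str]) -> Set[str]:
    """Every query whose required p-assertions are all covered by what was released."""
    released = granted_assertions(granted_queries)
    return {q for q, needed in QUERY_REQUIREMENTS.items() if needed <= released}


def unintended_queries(granted_queries: Set[str]) -> Set[str]:
    """Queries the user can now answer without having been granted them explicitly."""
    return answerable_queries(granted_queries) - granted_queries


print(sorted(unintended_queries({"X"})))       # []
print(sorted(unintended_queries({"X", "Y"})))  # ['Z']
```

Running such a check before each new grant is one way an administrator could detect that a grant also leaks answers to other queries.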
Access control on process documentation: possible approaches
- Expose access only at the level of provenance queries:
  - tools/services aggregate p-assertions and process them;
  - potential provenance queriers access only these tools/services.
- Use cryptographic protocols:
  - encrypt p-assertions with appropriate algorithms;
  - assign keys corresponding to different groups of p-assertions;
  - information is obtainable only if a user has access both to the p-assertions and to the keys that decrypt the relevant groups.
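The first approach (query-level access) can be sketched as a service that keeps p-assertions private and authorizes callers per named query. The query name `lineage`, the p-assertion fields and the permission table are hypothetical; the cryptographic approach is not shown here.

```python
from typing import Callable, Dict, List, Set

# Hypothetical p-assertions, held privately by the query service.
_ASSERTIONS: List[Dict[str, str]] = [
    {"id": "PA1", "data": "input.csv", "step": "ServiceA uploaded input.csv"},
    {"id": "PA2", "data": "result.csv", "step": "ServiceB derived result.csv from input.csv"},
]


def lineage_of(data_item: str) -> List[str]:
    """Hypothetical named query: the recorded steps that produced a data item."""
    return [pa["step"] for pa in _ASSERTIONS if pa["data"] == data_item]


class ProvenanceQueryService:
    """Answers named provenance queries; callers never receive raw p-assertions."""

    def __init__(self) -> None:
        self._queries: Dict[str, Callable[[str], List[str]]] = {"lineage": lineage_of}
        self._permissions: Dict[str, Set[str]] = {}  # user -> names of queries allowed

    def grant(self, user: str, query_name: str) -> None:
        self._permissions.setdefault(user, set()).add(query_name)

    def run(self, user: str, query_name: str, arg: str) -> List[str]:
        # Authorization is checked per query, never per individual p-assertion.
        if query_name not in self._permissions.get(user, set()):
            raise PermissionError(f"{user} may not run query {query_name!r}")
        return self._queries[query_name](arg)


svc = ProvenanceQueryService()
svc.grant("alice", "lineage")
print(svc.run("alice", "lineage", "result.csv"))
```

Because users only ever hold permissions on queries, the unintended combination of p-assertion groups shown in the previous example cannot arise from grants alone.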
Accountability for p-assertions
- A p-assertion is the subjective view of the actor that created it.
- Accountability must be established for the creation of an assertion (non-repudiation).
- P-assertions must not be altered after they are created (integrity).
- Both can be implemented directly by signing p-assertions.
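A minimal sketch of signing p-assertions, assuming the third-party `cryptography` package and Ed25519 keys (the choice of algorithm and the JSON canonicalization are assumptions). For non-repudiation, the public key must additionally be bound to the actor's identity, for example by a certificate; that step is not shown.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonical_bytes(p_assertion: dict) -> bytes:
    # Serialize deterministically so signer and verifier process identical bytes.
    return json.dumps(p_assertion, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_p_assertion(p_assertion: dict, key: Ed25519PrivateKey) -> bytes:
    """The asserting actor signs its p-assertion with its private key."""
    return key.sign(canonical_bytes(p_assertion))


def verify_p_assertion(p_assertion: dict, signature: bytes, public_key: Ed25519PublicKey) -> bool:
    """Anyone holding the actor's public key can check integrity and origin."""
    try:
        public_key.verify(signature, canonical_bytes(p_assertion))
        return True
    except InvalidSignature:
        return False


actor_key = Ed25519PrivateKey.generate()
pa = {"asserter": "ServiceB", "interaction": "msg-2", "content": "produced result.csv"}
sig = sign_p_assertion(pa, actor_key)

print(verify_p_assertion(pa, sig, actor_key.public_key()))        # True
tampered = dict(pa, content="produced other.csv")
print(verify_p_assertion(tampered, sig, actor_key.public_key()))  # False: altered after signing
```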
Trust framework for actors and provenance stores
- In distributed systems, we cannot ensure that every actor creating p-assertions is doing so correctly.
- A trust model should reflect the relationships:
  - between actors creating p-assertions and actors using them;
  - between actors and provenance stores.
- e.g. rating systems, as in eBay or MySpace.
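A rating system of the kind mentioned above could be sketched as follows. The scoring rule (mean of ratings in [0, 1], with a minimum number of ratings and a trust threshold) and its default values are hypothetical choices for illustration, not a model proposed by the slides.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class RatingTrustModel:
    """Hypothetical reputation model: actors rate the asserting actors and stores they rely on."""

    def __init__(self, min_ratings: int = 3, threshold: float = 0.7) -> None:
        self.min_ratings = min_ratings  # too few ratings means "not yet trusted"
        self.threshold = threshold      # mean score (0..1) needed to be trusted
        self._ratings: Dict[str, List[float]] = defaultdict(list)

    def rate(self, subject: str, score: float) -> None:
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be in [0, 1]")
        self._ratings[subject].append(score)

    def reputation(self, subject: str) -> float:
        scores = self._ratings.get(subject, [])
        return mean(scores) if scores else 0.0

    def is_trusted(self, subject: str) -> bool:
        scores = self._ratings.get(subject, [])
        return len(scores) >= self.min_ratings and mean(scores) >= self.threshold


model = RatingTrustModel()
for score in (0.9, 0.8, 1.0):
    model.rate("ServiceB", score)
model.rate("StoreX", 0.2)
print(model.is_trusted("ServiceB"))  # True
print(model.is_trusted("StoreX"))    # False: a single low rating
```

The same model can hold ratings for both kinds of subject, actors and provenance stores, matching the two relationships listed above.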
Information sensitivity in p-assertions
- Relevant with regard to legal requirements, e.g. patient records.
- Information recorded in p-assertions may be obscured by:
  - one-way anonymization;
  - encryption with a shared key.
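One-way anonymization could be sketched with a keyed hash (HMAC-SHA256 from the standard library). This is strictly pseudonymization: the same identifier always maps to the same value, so related p-assertions remain linkable, and the key must stay secret to resist guessing of known identifiers. The field names are hypothetical; shared-key encryption is not shown.

```python
import hashlib
import hmac
import secrets

# Secret held by the recording actor; without it, pseudonyms cannot be recomputed from guessed IDs.
ANON_KEY = secrets.token_bytes(32)


def anonymize(value: str, key: bytes = ANON_KEY) -> str:
    """One-way mapping of an identifier to a stable pseudonym (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def obscure_p_assertion(pa: dict, sensitive_fields: set) -> dict:
    """Copy of a p-assertion with the named fields replaced by pseudonyms."""
    return {k: anonymize(v) if k in sensitive_fields else v for k, v in pa.items()}


record = {"patient_id": "patient-0042", "step": "blood test analysed", "asserter": "LabService"}
safe = obscure_p_assertion(record, {"patient_id"})
print(safe["step"])                                      # blood test analysed
print(safe["patient_id"] == anonymize("patient-0042"))   # True: same input, same pseudonym
```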
Long-term storage
- P-assertions may be archived.
- If they are signed and/or encrypted, appropriate certificate/key archival facilities are also required.
- It may be necessary to ensure that the algorithms used remain up to date.
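A periodic audit of an archive might look like the sketch below. The list of approved algorithms is a hypothetical organisational policy, and the remedial action (re-signing, re-encrypting or timestamping flagged records) is left out.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical organisational policy: algorithms still acceptable for archived signatures.
APPROVED_ALGORITHMS = {"Ed25519", "RSA-PSS-SHA256", "ECDSA-P256-SHA256"}


@dataclass
class ArchivedRecord:
    record_id: str
    signature_algorithm: str
    signer_cert_archived: bool  # is the signer's certificate kept alongside the record?


def needs_attention(records: List[ArchivedRecord]) -> List[str]:
    """IDs of archived p-assertions that may not be verifiable later without action."""
    flagged = []
    for r in records:
        if r.signature_algorithm not in APPROVED_ALGORITHMS or not r.signer_cert_archived:
            flagged.append(r.record_id)
    return flagged


archive = [
    ArchivedRecord("pa-001", "Ed25519", True),
    ArchivedRecord("pa-002", "RSA-SHA1", True),   # algorithm no longer on the approved list
    ArchivedRecord("pa-003", "Ed25519", False),   # signer certificate missing from the archive
]
print(needs_attention(archive))  # ['pa-002', 'pa-003']
```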
Relating access control for data and p-assertions
- P-assertions may describe or relate to data that already has access control restrictions (authorizations).
- How do we relate the authorizations for data to those for the p-assertions derived from that data? Options:
  - no relation;
  - allow the actor creating a p-assertion to specify its authorization;
  - generate authorizations automatically from the existing authorizations.
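The third option, automatic generation, needs a derivation rule. This sketch assumes one possible rule: a principal may read a p-assertion only if it may already read every data item the p-assertion describes, so the derived access list is the intersection of the source access lists. The data items and principals are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical existing authorizations on data items: item -> principals allowed to read it.
DATA_ACL: Dict[str, Set[str]] = {
    "scan.dcm": {"alice", "bob", "carol"},
    "notes.txt": {"alice", "carol"},
}


def derive_p_assertion_acl(source_items: Iterable[str], data_acl: Dict[str, Set[str]]) -> Set[str]:
    """Principals allowed to read a p-assertion that describes all the given data items.

    Assumed policy: the most restrictive combination, i.e. the intersection of the
    access lists of every source item.
    """
    acls = [data_acl[item] for item in source_items]
    if not acls:
        return set()  # no source data: grant nothing by default
    return set.intersection(*acls)


print(sorted(derive_p_assertion_acl(["scan.dcm", "notes.txt"], DATA_ACL)))  # ['alice', 'carol']
```

A union rule would instead be the most permissive choice; which rule is appropriate depends on the data owners' requirements.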
Distributed provenance stores
Figure: multiple provenance stores (PS).
- Bandwidth
- Access control
- Storage
Federated identity – approach 1
Figure: an actor, a security token service, provenance stores in security domain 1 and security domain 2, and a security token.
Federated identity – approach 2
Figure: an actor, a security token service, provenance stores in security domain 1 and security domain 2, and a security token.
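The common element of both approaches is a security token service (STS) whose tokens are accepted by provenance stores in different security domains. This sketch does not try to distinguish approach 1 from approach 2, whose differences appear only in the original diagrams. For brevity it assumes the STS and the stores share an HMAC key; real deployments would typically use public-key-signed tokens in a standard format, such as SAML assertions issued via WS-Trust. All names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Simplifying assumption: the STS and every store that trusts it share this key.
STS_KEY = b"hypothetical-shared-secret"


def issue_token(actor: str, ttl_seconds: int = 300, key: bytes = STS_KEY) -> str:
    """Security token service: vouch for the actor's identity for a limited time."""
    claims = {"sub": actor, "exp": int(time.time()) + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    mac = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{mac}"


class ProvenanceStore:
    """A store in some security domain that accepts tokens from the trusted STS."""

    def __init__(self, domain: str, trusted_key: bytes = STS_KEY) -> None:
        self.domain = domain
        self._key = trusted_key

    def authenticate(self, token: str) -> Optional[str]:
        body, _, mac = token.rpartition(".")
        expected = hmac.new(self._key, body.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(mac, expected):
            return None  # not issued by the trusted STS, or altered in transit
        claims = json.loads(base64.urlsafe_b64decode(body.encode()))
        if claims["exp"] < time.time():
            return None  # token expired
        return claims["sub"]


token = issue_token("ServiceB")
stores = [ProvenanceStore("domain-1"), ProvenanceStore("domain-2")]
print([s.authenticate(token) for s in stores])  # ['ServiceB', 'ServiceB']
```

The actor authenticates once to the STS and presents the same token in both domains, instead of holding separate credentials for each provenance store.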
Conclusion
- There are many security issues: most are analogous to standard access control issues, some are possibly new.
- They are important to consider if provenance systems are to become industrial strength.
- EU Provenance project: security features in GT4 (Globus Toolkit 4), WS-Security for authentication, proxy certificates for delegating access control, CAS for role-based authorization and federated identity.