Removing digital certificates from the end-user’s experience of grid environments Bruce Beckles University of Cambridge Computing Service
Security in contemporary grid environments (1) Extremely complex: –Security in distributed computing environments –Network security –Security of multi-user operating systems –Security of grid ‘middleware’ layer –Often users and resources geographically dispersed –Resource administrators often have no contact with resource users –Complex multi-layered trust relationships Such environments are relatively new, so unfamiliar to most resource administrators
Security in contemporary grid environments (2) Unfortunately, security in these environments is usually presented as principally a matter of authentication (sometimes in conjunction with authorisation considerations). Authentication is usually addressed by making use of X.509 digital certificates. X.509 certificates are often presented as a “magic bullet” for the problem of security, particularly in the context of grid environments: –One of the leading architects of the computational grid technologies described them as “a world-class security solution”
X.509 digital certificates (1) …however, X.509 certificates have been heavily critiqued by a number of leading security experts: Gutmann, Schneier, etc. Usability considerations alone lead one to the conclusion they are unsatisfactory: –Extremely difficult to use, particularly as implemented in current grid environments –Security solutions which are difficult to use are inherently insecure because users will either inadvertently use them in an insecure way or deliberately subvert them in an attempt to “just get my work done without all this stuff getting in the way”
X.509 digital certificates (2) So, for instance, I and colleagues have found: –Users sharing certificates because “it’s too hard to get my own”, “it’s too hard to get my certificate authorised for that site, but my colleague managed to get his done”, “my certificate doesn’t work properly”, etc. –Users storing private keys in multiple locations –Users protecting private keys with no passphrase or with trivial passphrases –Users re-using certificates obtained for one specific purpose for another (“because it is too difficult to get another one”) –Certificates not being revoked when a user is no longer entitled to them, which highlights the following: –Confusion between certificates as proof of identity (“I am Bruce Beckles”) and proof of status or membership of an organisation (“I work for the University of Cambridge”).
Objectives of the solution presented here –Remove digital certificates from the end-user’s experience of grid environments as far as practical (whilst preserving a level of security consonant with the use of digital certificates). –As far as possible, remove manual interactions with digital certificates from the system administrator’s domain and replace such interactions with automatic, scalable processes. [Diagram labels: “User logs in using normal login method”; “Job submitted and results returned or retrieved without additional manual authentication”; “underlying grid”]
Trust Relationships (1) Consider job submission in the ‘traditional’ job-based grid architecture (à la Globus Toolkit 2). User’s view: –User obtains/retrieves proxy certificate –Proxy certificate stored on submit host –‘User’ and remote resource mutually authenticate using user’s proxy certificate and remote resource’s server certificate. Remote resource’s view: –Remote resource receives a request from a host elsewhere on the network (submit host) –Remote resource and submit host mutually authenticate using remote resource’s server certificate and a proxy certificate presented by the submit host
Trust Relationships (2) Thus we see that the remote resource implicitly assumes that the submit host is entitled to make use of the proxy certificate it presents – this is a trust relationship. And there is another implicit trust relationship here: suppose I am the administrator of a remote resource and I need to know whether it was the certificate owner who misused my machine or someone using their certificate (or proxy certificate): –At a minimum, I contact the administrator of the submit host and ask them for details of who was using their host (or of any scheduled jobs running on it, etc.) –I must therefore trust the administrator and integrity of the submit host or accept that I can only link misuse to a particular proxy certificate and not to an individual.
Trust Relationships (3) Why are we interested in these trust relationships? –By examining these trust relationships we can see that if a solution is proposed which requires remote resources to explicitly trust the submit host, we will be in no worse a position than we are already in. …and it is worth noting that most traditional grid environments make no attempt to verify the identity or integrity of the submit host, only the identity of the user through their proxy certificate.
Principal trust relationship of the solution presented here “A remote resource trusts the submit host to be ‘well managed’, to the extent that when the submit host asserts that a job is being submitted by a particular user, the remote resource accepts that this is true (unless the submit host or the user has been compromised) and processes the job accordingly.”
Solution Details (1) There are two modes of operation: –Mode 1: Using certificates dynamically generated by the submit host(s) –Mode 2: Using certificates issued by a Certificate Authority (CA) It is Mode 1 that requires the explicit trust relationship described on the previous slide.
Mode 1 – Details (1) All machine-machine communication first involves mutual authentication using the machines’ server certificates (which may be self-generated or issued by a CA). The user logs into the submit host as normal, using the normal authentication method provided for this purpose. When the user submits a job, the submit host generates a limited lifetime digital certificate, signed by itself, asserting that the user is whoever it believes the user to be based on the user’s login credentials. The submit host then sends the job, along with the public certificate it has generated, to the remote resource or to some intermediate machine (e.g. for pre-staging executables, for transparent job redirection, for ‘peer-to-peer’ grid environments, etc.).
Mode 1 – Details (2) If the job is passed to intermediate machines before its final destination, at each stage the attached certificate may optionally be digitally countersigned by the intermediate machine. When the job reaches its destination, that remote resource validates the attached certificate and decides whether it trusts the submit machine (and optionally any intermediate machines); if so, it accepts the job. If not, it notifies its administrator, the administrator of any relevant intermediate machines, the submit host and the user who submitted the job. The remote resource may need to ‘unpack’ the digitally signed job before presenting it to the underlying grid environment.
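Illustrative sketch of the Mode 1 flow (not part of the design itself): the submit host issues a short-lived assertion about the logged-in user, and the remote resource accepts the job only if it trusts the issuing host, the signature checks out and the assertion has not expired. This is a minimal sketch under loud assumptions: an HMAC tag stands in for the digital signature on a real certificate (so, unlike a real public-key signature, the verifier here shares the host's key), and the names `issue_assertion`, `validate_assertion` and `SUBMIT_HOST_KEY` are hypothetical.

```python
import hashlib
import hmac
import json
import time

# ASSUMPTION: hypothetical stand-in for the submit host's signing key.
# A real deployment would sign with the private key of a certificate.
SUBMIT_HOST_KEY = b"demo-key-for-submit.example.org"


def _sign(key, claims):
    """Compute a tag over a canonical JSON encoding of the claims."""
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def issue_assertion(host, key, user, lifetime_s, now=None):
    """Submit host: assert who the logged-in user is, for a limited lifetime."""
    now = time.time() if now is None else now
    claims = {"issuer": host, "user": user, "not_after": now + lifetime_s}
    return {"claims": claims, "signature": _sign(key, claims)}


def validate_assertion(assertion, trusted_hosts, now=None):
    """Remote resource: return (accepted, reason) for an attached assertion."""
    now = time.time() if now is None else now
    claims = assertion["claims"]
    key = trusted_hosts.get(claims["issuer"])
    if key is None:
        return False, "untrusted submit host"
    if not hmac.compare_digest(_sign(key, claims), assertion["signature"]):
        return False, "bad signature"
    if now > claims["not_after"]:
        return False, "assertion expired"
    return True, "accepted"


# Example: a ten-minute assertion checked by a resource that trusts the host.
trusted = {"submit.example.org": SUBMIT_HOST_KEY}
assertion = issue_assertion("submit.example.org", SUBMIT_HOST_KEY, "alice", 600)
print(validate_assertion(assertion, trusted))
```

On rejection, the reason string is what the remote resource would pass on when notifying the administrators, the submit host and the user.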
Mode 2 – Details Very similar to Mode 1, with the following differences: –At job submission, the user presents the submit host with their certificate (or proxy certificate), or the submit host retrieves their certificate from a secure remote location (e.g. MyProxy). –The submit host (and any intermediate machine) digitally countersigns the user’s certificate before sending the job onward. It is presumed, but not mandated, that the server certificates of all the machines involved will have been issued by a CA.
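Illustrative sketch of countersigning (not part of the design itself): each machine that handles the job appends its own countersignature, which covers the user's certificate plus all earlier countersignatures, and the remote resource checks every link against the machines it trusts. Assumptions: HMAC tags again stand in for real digital signatures, the certificate is treated as opaque bytes (its validation against the issuing CA is a separate step not shown), and `countersign` and `verify_chain` are hypothetical names.

```python
import hashlib
import hmac


def _covered_bytes(certificate, countersignatures):
    """Bytes covered by the next countersignature: cert plus earlier links."""
    parts = [certificate] + [f"{h}:{t}".encode() for h, t in countersignatures]
    return b"\n".join(parts)


def countersign(envelope, host, key):
    """Return a new envelope with `host`'s countersignature appended."""
    covered = _covered_bytes(envelope["certificate"], envelope["countersignatures"])
    tag = hmac.new(key, covered, hashlib.sha256).hexdigest()
    return {
        "certificate": envelope["certificate"],
        "countersignatures": envelope["countersignatures"] + [(host, tag)],
    }


def verify_chain(envelope, trusted_hosts):
    """Return the hosts whose countersignatures verify, in order.

    Raises ValueError if any countersigner is untrusted or any tag is wrong.
    """
    verified = []
    links = envelope["countersignatures"]
    for i, (host, tag) in enumerate(links):
        key = trusted_hosts.get(host)
        if key is None:
            raise ValueError(f"untrusted machine: {host}")
        covered = _covered_bytes(envelope["certificate"], links[:i])
        expected = hmac.new(key, covered, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, tag):
            raise ValueError(f"bad countersignature from: {host}")
        verified.append(host)
    return verified
```

Because each countersignature covers the earlier ones, removing or reordering an intermediate machine's link invalidates every later link.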
Certificate Distribution and Management – PKIBoot PKIBoot is: –A “plug-and-play certificate mechanism”… –…which behaves like “DHCP for obtaining certificates” –Developed by Peter Gutmann –Implemented in his open source cryptographic toolkit, cryptlib (
Features of PKIBoot Allows the secure retrieval from a remote location of: –The user’s certificate (or a machine’s server certificate), including the certificate’s private key –The public signing certificate of a CA (for verifying certificates issued by that CA) –The public signing certificate of an arbitrary entity (for the purpose of validating that entity’s identity or for validating that entity’s digital signature) Can also be used to issue certificates.
Features of a ‘PKIBoot environment’ So, in a PKIBoot environment: –After initial setup, machines would be able to retrieve the public signing certificates of CAs (or other entities) as necessary, without manual intervention –If desired, machines could retrieve their own certificates (and private keys) using PKIBoot, thus automating the cryptographic setup of machines –If desired, PKIBoot could be used to automatically issue machines (or users) with their certificates (if one was happy with the security implications of this) –Users need never manually transport their certificates as they could always retrieve them from some remote location using PKIBoot
Use of PKIBoot in this solution (1) PKIBoot servers are deployed throughout the environment – depending on the size of the environment, one PKIBoot server, or one per geographical site may be sufficient. Machines communicate with their ‘nearest’ PKIBoot server to retrieve public signing certificates of CAs (or other entities) as necessary. If there is more than one PKIBoot server, these servers will periodically use the PKIBoot mechanism to communicate with each other to retrieve public signing certificates. Resource administrators merely have to configure their resources to use PKIBoot, and supply their ‘nearest’ PKIBoot server with the public signing certificate of their resource.
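Illustrative sketch of the data flow only (this is not the PKIBoot protocol or the cryptlib API): an in-memory model of per-site servers that hold public signing certificates, answer lookups from nearby machines and periodically pull missing entries from their peers. The class `CertDirectory` and its methods are hypothetical, and the secure channel that PKIBoot provides is deliberately not modelled.

```python
class CertDirectory:
    """Hypothetical stand-in for one site's PKIBoot server."""

    def __init__(self, name):
        self.name = name
        self.certs = {}  # entity name -> public signing certificate (bytes)

    def register(self, entity, public_cert):
        """A resource administrator supplies their resource's public certificate."""
        self.certs[entity] = public_cert

    def lookup(self, entity):
        """A machine asks its 'nearest' server for an entity's certificate."""
        return self.certs.get(entity)

    def sync_from(self, peer):
        """Periodic server-to-server step: pull entries this server lacks.

        Returns the number of certificates added.
        """
        added = 0
        for entity, cert in peer.certs.items():
            if entity not in self.certs:
                self.certs[entity] = cert
                added += 1
        return added


# Example: two sites, each knowing only its local resource until they sync.
site_a = CertDirectory("site-a")
site_b = CertDirectory("site-b")
site_a.register("resource-a", b"public-cert-of-resource-a")
site_b.register("resource-b", b"public-cert-of-resource-b")
site_a.sync_from(site_b)
site_b.sync_from(site_a)
```

After the two sync calls, a machine at either site can look up the public signing certificate of a resource at the other site without any manual step by its administrator.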
Use of PKIBoot in this solution (2) If desired, PKIBoot can be used to issue resources with their certificates (i.e. the PKIBoot server acts as a CA for resources). If users are using certificates issued by a CA, such certificates could be securely stored on the PKIBoot server (provided one is happy with the security implications of this). If desired, PKIBoot could be used to issue user certificates, thus making the use of digital certificates entirely transparent to the end-user (again, provided one is happy with the security implications).
Simple example implementation of solution [Diagram. Components: Submit host; PKIBoot server; Execute host; Intermediate machine (e.g. job ‘router’, resource broker). Annotations: –Retrieve public signing certificates for verification –Optionally retrieve user certificate or ask for user certificate to be issued or… –Submit host can (optionally) act as CA and issue user certificate itself –Either send job directly to execute host, or send it via some intermediate machine(s) –Optionally countersign certificate attached to job when sending job onward]
Progress to date As this is a security solution, and one whose principal aim is to increase the usability of its target environments, it is essential that security and usability are addressed from the earliest design phases through to completion. Thus the following design strategy is being pursued: –Consultation with HCI experts concerning usability –Assessment by security experts –Usability testing of current design, using low fidelity techniques such as paper prototyping –Iterate this process until a stable design is reached Only once the design is stable will the development phase commence. The current design is believed to be largely stable: in the current iteration, there are a few minor issues to be addressed before usability testing is undertaken.