Stateful Services and Identified Usage: Fallout from AstroGrid’s Architecture
Guy Rixon
Institute of Astronomy and AstroGrid
Is AstroGrid a compute grid (e.g. Globus)?
Data-sets are too large to be easily portable.
Programmes are not easily portable.
Archive data are not stored on the desktop.
Is AstroGrid a distributed file system?
Data-sets are still too large to be portable.
Some data-sets are stored in a DBMS, not in files.
How does it find data-sets? (It needs an index.)
It would be nice to be able to abstract away the storage location, e.g. “StarGrid” at RAL. Cf. the Storage Resource Broker (SRB).
Is AstroGrid a list of web sites, i.e. simply an index of existing sites and web apps?
Astronomers are not passive consumers of pre-defined reports.
Data presented as web pages can’t easily be combined.
Results presented other than as reports can be tricky to handle: the file format is not known; the best application is not available; context is lost (bad metadata).
Is AstroGrid a web(-service) portal?
Pro: the portal abstracts and registers services; the portal can translate data to a standard form.
Con: the portal is a bottleneck for data, so data must be sent separately from metadata; a monolithic portal needs central management; processing at the portal may not scale.
(E.g. VizieR; NED; the AstroGrid 2001 prototype.)
“Move the results, not the data”
Detachable workflow
Service(s) coordinated as a workflow.
The client/user can detach from the workflow.
It can reattach later, possibly from a different location.
It can receive notifications from the workflow.
It can steer the workflow at pause points.
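A minimal sketch of the detachable-workflow idea, assuming an in-process `WorkflowService` class (hypothetical, not an AstroGrid API) standing in for the real remote service. The client keeps only the workflow id, so it can drop its connection and reattach later from anywhere; listeners model notification; `steer` is accepted only while the flow is paused.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Step = Callable[[dict], None]

@dataclass
class Workflow:
    steps: List[Step]
    pause_after: Set[int]          # step indices after which the flow waits to be steered
    context: dict = field(default_factory=dict)
    position: int = 0              # index of the next step to run
    status: str = "running"        # "running", "paused" or "finished"
    listeners: List[Callable[[str, str], None]] = field(default_factory=list)

class WorkflowService:
    """Holds workflows server-side so clients can detach and reattach by id."""

    def __init__(self) -> None:
        self._flows: Dict[str, Workflow] = {}
        self._ids = itertools.count(1)

    def submit(self, steps: List[Step], pause_after=()) -> str:
        wf_id = f"wf-{next(self._ids)}"
        self._flows[wf_id] = Workflow(list(steps), set(pause_after))
        return wf_id  # the only thing a client must keep in order to reattach

    def subscribe(self, wf_id: str, listener: Callable[[str, str], None]) -> None:
        self._flows[wf_id].listeners.append(listener)

    def reattach(self, wf_id: str) -> Workflow:
        # Any client that knows the id sees the same state, wherever it connects from.
        return self._flows[wf_id]

    def steer(self, wf_id: str, **changes) -> None:
        wf = self._flows[wf_id]
        if wf.status != "paused":
            raise RuntimeError("a workflow can only be steered at a pause point")
        wf.context.update(changes)
        wf.status = "running"

    def run_pending(self) -> None:
        # Runs unattended: no client needs to be connected while this executes.
        for wf_id, wf in self._flows.items():
            while wf.status == "running":
                if wf.position == len(wf.steps):
                    self._set_status(wf_id, wf, "finished")
                    break
                done = wf.position
                wf.steps[done](wf.context)
                wf.position += 1
                if done in wf.pause_after and wf.position < len(wf.steps):
                    self._set_status(wf_id, wf, "paused")

    @staticmethod
    def _set_status(wf_id: str, wf: Workflow, status: str) -> None:
        wf.status = status
        for notify in wf.listeners:  # Observer-style notification
            notify(wf_id, status)
```

In use, a client submits a two-step flow with a pause after the first step, goes away, and later calls `steer(wf_id, model="gaussian")` before the service carries on.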
MySpace
Workflows are not entirely pre-planned:
– feed one flow into another;
– ad-hoc workflows;
– re-run some parts of a workflow.
Cache results in “MySpace”.
Clients can get results from MySpace.
Need a dictionary to unify MySpace servers.
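One way to picture the MySpace dictionary is a map from (owner, logical name) to (server, storage key). The sketch below assumes that design; `MySpaceServer`, `MySpaceDictionary` and the server names are hypothetical. Because clients only use logical names, data can be moved between servers without breaking them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class MySpaceServer:
    """One storage node holding cached results (hypothetical)."""
    name: str
    blobs: Dict[str, bytes] = field(default_factory=dict)

class MySpaceDictionary:
    """Maps (owner, logical name) to (server, storage key) across many servers."""

    def __init__(self, servers: List[MySpaceServer]) -> None:
        self._servers = {s.name: s for s in servers}
        self._entries: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def store(self, owner: str, name: str, data: bytes, server: str) -> None:
        key = f"{owner}/{name}"
        self._servers[server].blobs[key] = data
        self._entries[(owner, name)] = (server, key)

    def fetch(self, owner: str, name: str) -> bytes:
        server, key = self._entries[(owner, name)]
        return self._servers[server].blobs[key]

    def locate(self, owner: str, name: str) -> str:
        return self._entries[(owner, name)][0]

    def migrate(self, owner: str, name: str, target: str) -> None:
        # Move the bytes; the logical name seen by clients does not change.
        server, key = self._entries[(owner, name)]
        self._servers[target].blobs[key] = self._servers[server].blobs.pop(key)
        self._entries[(owner, name)] = (target, key)

    def listing(self, owner: str) -> List[str]:
        return sorted(n for (o, n) in self._entries if o == owner)
```

Feeding one flow into another then amounts to handing `fetch("alice", "flow1/output.vot")` to the second workflow as its input, whichever server currently holds the bytes.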
Stateful services
Detachable workflows and data caches imply stateful services…
…where the state is determined by the client…
…and is inherent in the service semantics, not just in the back-end storage/process.
Identified usage; access restrictions
Stateful services imply identified usage:
– whose state applies to a transaction?
Identity on the public Internet implies authentication. (Cf. intranet, VPN.)
Data caches imply private data, even if the original archives are public.
Private data imply authorization.
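The chain of implications (stateful cache, then identified caller, then private data, then authorization) can be sketched as a small cache class. This is an illustration under stated assumptions: `PrivateResultCache` is hypothetical, and a plain credential-to-identity dictionary stands in for real authentication such as GSI.

```python
from typing import Dict, Optional, Set

class AuthenticationError(Exception):
    pass

class AuthorizationError(Exception):
    pass

class PrivateResultCache:
    """Stateful cache whose entries belong to identified users (sketch)."""

    def __init__(self, credentials: Dict[str, str]) -> None:
        # Hypothetical stand-in for real authentication: credential -> identity.
        self._credentials = credentials
        self._data: Dict[str, bytes] = {}
        self._owner: Dict[str, str] = {}
        self._readers: Dict[str, Set[str]] = {}

    def _identify(self, credential: Optional[str]) -> str:
        # Every contact must say whose state it touches; anonymous calls are refused.
        if credential is None or credential not in self._credentials:
            raise AuthenticationError("anonymous or unknown caller")
        return self._credentials[credential]

    def save(self, credential: Optional[str], name: str, data: bytes) -> None:
        who = self._identify(credential)
        if self._owner.get(name, who) != who:
            raise AuthorizationError(f"{name!r} belongs to another user")
        self._data[name] = data
        self._owner[name] = who
        self._readers.setdefault(name, set())

    def share(self, credential: Optional[str], name: str, reader: str) -> None:
        who = self._identify(credential)
        if self._owner.get(name) != who:
            raise AuthorizationError("only the owner may share an entry")
        self._readers[name].add(reader)

    def load(self, credential: Optional[str], name: str) -> bytes:
        who = self._identify(credential)
        if who != self._owner.get(name) and who not in self._readers.get(name, set()):
            raise AuthorizationError(f"{who} may not read {name!r}")
        return self._data[name]
```

Even if the archive that produced `result` is public, the cached copy is private to its owner until shared.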
Technology shopping
Candidates: Globus GRAM, SRB, CORBA, Jini, GridFTP, MDS, Mocha, Spitfire, .NET, servlets, EJBs.
Web services using SOAP over HTTP for basic structure.
WSDL (but probably not UDDI) for the registry.
XML for all metadata.
Open Grid Services Architecture (OGSA) for statefulness.
Grid Security Infrastructure (GSI) for authentication.
GridFTP for bulk data-transfer.
Managing state in services
Two alternatives:
– hide statefulness behind a service façade;
– expose statefulness in the service interface and system structure.
Need identification/authentication at each contact with the service.
Need to tidy up abandoned state: e.g. leased storage must be recycled eventually.
Long-lived state must persist across changes in service implementation.
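Leased storage can be sketched as entries that carry an owner and an expiry time: the owner renews (identifying itself each time) and a reaper recycles whatever has lapsed. `LeasedStore` and its injectable clock are assumptions made for illustration and testability, not part of any named product.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Lease:
    owner: str
    expires_at: float

class LeasedStore:
    """Storage whose entries are recycled unless their owner renews the lease."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._data: Dict[str, bytes] = {}
        self._leases: Dict[str, Lease] = {}

    def put(self, owner: str, key: str, data: bytes) -> None:
        self._data[key] = data
        self._leases[key] = Lease(owner, self._clock() + self._ttl)

    def renew(self, owner: str, key: str) -> None:
        # Identification at each contact: only the lease holder may extend it.
        lease = self._leases[key]
        if lease.owner != owner:
            raise PermissionError(f"{owner} does not hold the lease on {key!r}")
        lease.expires_at = self._clock() + self._ttl

    def reap(self) -> List[str]:
        # Tidy up abandoned state: drop every entry whose lease has lapsed.
        now = self._clock()
        expired = [k for k, lease in self._leases.items() if lease.expires_at <= now]
        for k in expired:
            del self._leases[k]
            del self._data[k]
        return expired

    def __contains__(self, key: str) -> bool:
        return key in self._data
```

Injecting the clock keeps the sketch deterministic; a deployed service would run `reap` periodically against real time.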
OGSA (1)
Exposes state in service interfaces:
– implements Gang-of-Four patterns as web services: Abstract Factory, Observer, State, Template Method;
– factory services create dynamic “service instances” to hold state for a particular job;
– instances are recorded in service registries;
– two kinds of identifier for instances:
  – Grid Service Handle (GSH): abstract; does not state location.
  – Grid Service Reference (GSR): concrete; states location.
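The factory/registry/handle structure can be illustrated in a few lines. This covers only the Abstract Factory and registry parts, not Observer, State or Template Method. The class names, the `gsh:` handle format and the `host-1.example` endpoint are hypothetical; real OGSA handles and references have their own formats.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict

@dataclass(frozen=True)
class GSR:
    """Grid Service Reference: concrete, says where the instance is reachable now."""
    endpoint: str
    instance_key: str

@dataclass
class ServiceInstance:
    """Dynamic instance holding the state of one job."""
    job: str
    state: Dict[str, object] = field(default_factory=dict)

class Registry:
    """Records instances: maps abstract GSHs to their current GSRs."""

    def __init__(self) -> None:
        self._refs: Dict[str, GSR] = {}

    def register(self, gsh: str, gsr: GSR) -> None:
        self._refs[gsh] = gsr

    def resolve(self, gsh: str) -> GSR:
        return self._refs[gsh]

class Factory:
    """Abstract-Factory-style service that creates per-job instances."""

    def __init__(self, endpoint: str, registry: Registry) -> None:
        self.endpoint = endpoint
        self.registry = registry
        self.instances: Dict[str, ServiceInstance] = {}
        self._ids = itertools.count(1)

    def create_instance(self, job: str) -> str:
        key = f"instance-{next(self._ids)}"
        self.instances[key] = ServiceInstance(job)
        gsh = f"gsh:{job}/{key}"  # hypothetical handle format; carries no location
        self.registry.register(gsh, GSR(self.endpoint, key))
        return gsh

def contact(gsh: str, registry: Registry, hosts: Dict[str, Factory]) -> ServiceInstance:
    """Client side: resolve the abstract handle, then talk to the concrete location."""
    gsr = registry.resolve(gsh)
    return hosts[gsr.endpoint].instances[gsr.instance_key]
```

The client stores only the GSH; the GSR is looked up when needed, which is what lets the location change later.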
OGSA (2)
Identified usage: built in.
– OGSA assumes access control.
– Initial implementations use GSI.
Tidying up abandoned state: built in.
– Instances self-destruct on a timer if not “refreshed” by their owner.
State persistence across implementation change: feasible.
– An implementation change invalidates the GSR…
– …but the GSH is still valid.
– A client/agent can get a new GSR from the GSH.
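Both built-in behaviours can be sketched together: instances carry a termination time that only their owner can push back, and a redeployment kills the old GSR while the GSH still resolves. The `Grid` class, the string endpoints used as GSRs and `call_with_rebind` are hypothetical simplifications, not OGSA operation names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

class StaleReference(Exception):
    """Raised when a GSR no longer points at a live instance."""

@dataclass
class Instance:
    owner: str
    terminate_at: float
    state: dict = field(default_factory=dict)

class Grid:
    """Toy hosting environment: endpoints (used as GSRs) plus a GSH -> GSR map."""

    def __init__(self, lifetime: float, clock: Callable[[], float]) -> None:
        self.lifetime = lifetime
        self.clock = clock
        self.hosts: Dict[str, Dict[str, Instance]] = {}
        self.handles: Dict[str, str] = {}

    def create(self, gsh: str, endpoint: str, owner: str) -> str:
        self.hosts.setdefault(endpoint, {})[gsh] = Instance(owner, self.clock() + self.lifetime)
        self.handles[gsh] = endpoint
        return endpoint

    def resolve(self, gsh: str) -> str:
        return self.handles[gsh]

    def invoke(self, gsr: str, gsh: str) -> Instance:
        instance = self.hosts.get(gsr, {}).get(gsh)
        if instance is None:
            raise StaleReference(gsr)
        return instance

    def refresh(self, gsr: str, gsh: str, owner: str) -> None:
        instance = self.invoke(gsr, gsh)
        if instance.owner != owner:
            raise PermissionError("only the owner may extend the lifetime")
        instance.terminate_at = self.clock() + self.lifetime

    def sweep(self) -> List[str]:
        # Self-destruct: anything not refreshed in time is removed with its handle.
        now, destroyed = self.clock(), []
        for instances in self.hosts.values():
            for gsh, instance in list(instances.items()):
                if instance.terminate_at <= now:
                    del instances[gsh]
                    del self.handles[gsh]
                    destroyed.append(gsh)
        return destroyed

    def redeploy(self, gsh: str, new_endpoint: str) -> None:
        # Implementation change: state moves, the old GSR dies, the GSH stays valid.
        old = self.handles[gsh]
        self.hosts.setdefault(new_endpoint, {})[gsh] = self.hosts[old].pop(gsh)
        self.handles[gsh] = new_endpoint

def call_with_rebind(grid: Grid, gsh: str, gsr: str) -> Tuple[Instance, str]:
    """Client/agent: try the cached GSR; on failure get a fresh one from the GSH."""
    try:
        return grid.invoke(gsr, gsh), gsr
    except StaleReference:
        gsr = grid.resolve(gsh)
        return grid.invoke(gsr, gsh), gsr
```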
OGSA (3)
Implications:
– even simple workflows have to use a factory and a registry;
– need OGSA-compliant web-service hosting;
– how to allow anonymous usage for trivial cases?
– too hard for user-programmers?
GSI
GSI comes from Globus.
Authentication by Public Key Infrastructure (PKI).
Identity is carried in X.509 certificates.
Non-standard usage of X.509 to allow “delegation by impersonation” (set the “sign-certificate” flag but not the “is CA” flag).
Works with OGSA, GridFTP and the UK e-Science CA.
In the medium term, it may be replaced by industry standards.
AstroGrid uses it for short-term gain.
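The flag logic behind delegation by impersonation can be modelled without any cryptography. In this sketch, which is an assumption-laden simplification, `is_ca` stands for the X.509 basicConstraints cA flag and `can_sign_certs` for the keyUsage keyCertSign bit. A non-CA certificate with the signing flag may issue only certificates whose subject extends its own (an assumed naming rule), and the caller is then treated as the end entity it impersonates. Signatures, validity dates and revocation are deliberately not checked.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    is_ca: bool = False           # models basicConstraints cA
    can_sign_certs: bool = False  # models keyUsage keyCertSign

def may_issue(issuer: Cert, child: Cert) -> bool:
    if child.issuer != issuer.subject:
        return False
    if issuer.is_ca:
        return True  # ordinary CA behaviour
    # Delegation by impersonation: a non-CA cert with the signing flag may issue
    # only certificates named after itself (assumed naming rule for this sketch).
    return issuer.can_sign_certs and child.subject.startswith(issuer.subject + "/CN=")

def authenticated_identity(chain: List[Cert], trusted: Dict[str, Cert]) -> Optional[str]:
    """chain[0] is presented by the caller; chain[-1] must be issued by a trusted CA.

    Returns the end-entity subject being impersonated, or None for a bad chain.
    Only flags and names are checked here, not signatures.
    """
    anchor = trusted.get(chain[-1].issuer)
    if anchor is None or not may_issue(anchor, chain[-1]):
        return None
    for child, parent in zip(chain, chain[1:]):
        if not may_issue(parent, child):
            return None
    return chain[-1].subject
```

A validator that insists on `is_ca` for every signer would reject these chains, which is why the slide calls the usage non-standard.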
Authorization (1)
Need to check authorization on access to controlled data, metadata and resources.
Authority derives from role, not from individual identity.
Role derives from position in virtual organizations (VOrgs).
There are multiple overlapping VOrgs in our virtual observatories (VObs).
Collaboration must be supported:
– informal grouping inside a VOrg;
– this affects authorization;
– some authority is controlled by rank-and-file end-users.
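A compact way to express “authority from role, role from VOrg position, plus user-run groups” is to translate each user into a set of principals and grant resources to principals only. The `Community` class, the `role:`/`group:` principal strings and the example VOrg names are hypothetical choices for this sketch.

```python
from typing import Dict, Set, Tuple

class Community:
    """Role- and group-based authorization across overlapping VOrgs (sketch)."""

    def __init__(self) -> None:
        self._roles: Dict[str, Set[Tuple[str, str]]] = {}  # user -> {(vorg, role)}
        self._groups: Dict[str, Tuple[str, Set[str]]] = {}  # group -> (owner, members)
        self._grants: Dict[str, Set[str]] = {}              # resource -> principals

    def assign_role(self, user: str, vorg: str, role: str) -> None:
        # Done by a VOrg's managers: authority comes from position in the VOrg.
        self._roles.setdefault(user, set()).add((vorg, role))

    def create_group(self, owner: str, group: str) -> None:
        # Informal collaboration: any rank-and-file user may start a group.
        if group in self._groups:
            raise ValueError(f"group {group!r} already exists")
        self._groups[group] = (owner, {owner})

    def add_member(self, actor: str, group: str, user: str) -> None:
        owner, members = self._groups[group]
        if actor != owner:
            raise PermissionError("only the group's owner may change its membership")
        members.add(user)

    def grant(self, resource: str, principal: str) -> None:
        # For brevity, who may grant on a resource is not checked here.
        self._grants.setdefault(resource, set()).add(principal)

    def principals(self, user: str) -> Set[str]:
        found = {f"role:{vorg}/{role}" for vorg, role in self._roles.get(user, set())}
        found |= {f"group:{g}" for g, (_, members) in self._groups.items() if user in members}
        return found

    def allowed(self, user: str, resource: str) -> bool:
        # Decisions use roles and groups, never the individual identity directly.
        return bool(self.principals(user) & self._grants.get(resource, set()))
```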
Authorization (2)
Roles in the Grid span services and data-centres.
– Roles map to user groups.
– Manage groups centrally.
– Get existing HR people to do it?
Role knowledge is persistent state.
– Very valuable!
– Must survive implementation changes.
Authentication systems may change.
– Therefore authorization should be orthogonal to authentication.
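Keeping authorization orthogonal to authentication can be sketched as two independent components joined only by a stable community user id. Every name here (`Authenticator`, `CertificateAuthenticator`, `PasswordAuthenticator`, `Authorizer`, `Gateway`) is hypothetical. The unsalted SHA-256 password check only keeps the example short; a real system would use a salted, deliberately slow key-derivation function.

```python
import hashlib
import hmac
from typing import Dict, Optional, Protocol, Set

class Authenticator(Protocol):
    def authenticate(self, credential: str) -> Optional[str]:
        """Return a community user id, or None if the credential is not accepted."""
        ...

class CertificateAuthenticator:
    """Maps an already-verified certificate DN to a community user id."""

    def __init__(self, dn_to_user: Dict[str, str]) -> None:
        self._dn_to_user = dn_to_user

    def authenticate(self, credential: str) -> Optional[str]:
        return self._dn_to_user.get(credential)

class PasswordAuthenticator:
    """Alternative mechanism: 'user:password' checked against stored SHA-256 digests."""

    def __init__(self, digests: Dict[str, str]) -> None:
        self._digests = digests

    def authenticate(self, credential: str) -> Optional[str]:
        user, _, password = credential.partition(":")
        expected = self._digests.get(user)
        actual = hashlib.sha256(password.encode()).hexdigest()
        return user if expected and hmac.compare_digest(expected, actual) else None

class Authorizer:
    """Persistent role knowledge, keyed only by community user id."""

    def __init__(self, roles: Dict[str, Set[str]], needs: Dict[str, str]) -> None:
        self._roles = roles  # user -> roles held
        self._needs = needs  # resource -> role required

    def allowed(self, user: str, resource: str) -> bool:
        return self._needs.get(resource) in self._roles.get(user, set())

class Gateway:
    """Combines any authenticator with the same authorizer."""

    def __init__(self, authn: Authenticator, authz: Authorizer) -> None:
        self.authn, self.authz = authn, authz

    def check(self, credential: str, resource: str) -> bool:
        user = self.authn.authenticate(credential)
        return user is not None and self.authz.allowed(user, resource)
```

Swapping the authenticator leaves the `Authorizer`, and therefore the valuable role knowledge, untouched.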
Authorization (3): CAS and CoPS
CAS = Community Authorization Service (from Globus).
CoPS = Community Privilege Service.
Summary
AstroGrid will be a data-grid.
SOAP web services plus non-SOAP data-transfer services.
Exploit caching and unattended workflows.
Needs stateful services and identified usage.
Expect to use OGSA and associated patterns/technology to do this.