The OGSA Data Architecture
Dave Berry, Allen Luniewski
OGSA F2F, 19th January 2006
Current Scope
  Files and databases (& storage)
    Not streams, sessions, provenance, …
  Services and interfaces
    Storage, Access, Transfer, Metadata catalogues
    Replication, Caching, Federation
  Cross-cutting themes
    Security, Policies, …
  Scenarios
    Including Grid file system, Data staging, Database federation, Replication, …
  Part of the bigger OGSA picture
    E.g. Naming, Workflow, Transactions, Scheduling, …
Some of the WG members
  Dave Berry, NeSC - Author: Overview, Cache
  Allen Luniewski, IBM USA - Author: Replication, Federation
  Stephen Davey, NeSC - Author: Services document
  Mark Morgan, U. Virginia - Author: Access (ByteIO)
  Mario Antonioletti, EPCC - Author: Access (WS-DAI)
  Peter Kunszt, ex-CERN - Author: Storage Management
  Simon Laws, IBM UK - Author: Data Description
  Ann Chervenak, ISI - Author: Replication
  Susan Malaika, IBM USA
  Fred Maciel, Hitachi
  Neil Chue Hong, EPCC
  Chris Jordan, SDSC
Architecture document
  Overview
  Architectural Context
  Requirements on OGSA
  Security
  Data Description
  Data Transfer
  Data Access
  Storage Resource Management
  Cache Services
  Data Replication
  Data Federation
  Catalogues
  Appendices:
    Specifications referenced
    Glossary
Ch 3: Architectural Context
  Naming (WS-A, OGSA-Naming, RNS?)
  Management
    GAP! – Need info model for data services
    WS-Management, WSDM – how do we use these?
  Security (see Chapter 4)
  State, Lifetime & Notification
    OGSA Base Profile
  Resource Discovery
    Fuzzy boundary with Information Services design team
  Policies and agreements
    Site & VO management
  Reservation, Scheduling & Provisioning
    At some point we will need to integrate with EMS
  Transactions (WS-Coordination, WS-CAF)
  GAP! - Sessions
Things to name…
  activities, caches, catalogues, content identifiers, data bytes, data formats, data streams, database tables, databases, file directories, file locations, files, identities, languages, locales,
  metadata, namespaces, naming schemes, networks, people, policies, queries, query result row sets, references, registries, replicas, repositories, resolvers, resource locations, resources,
  roles, schemas, schema mappings, security contexts, security tokens, service level agreements, service types, services, storage (space), times, transactions, transformations, transport protocols, user defined entities, vocabularies
Ch 4: Security
  Discusses requirements beyond simple AAA
  Legal requirements for security and privacy
  GAP! – Security policies
  GAP! – Attaching security policies to data in motion
  GAP? – Geographical location of requester and resource
  GAP? – Reason for access
  GAP? – Authorisation of sequences of access requests
  GAP? – Authorisation based on previous requests
  WGs: AuthZ, OGSA
Ch 5: Data Description
  Format description
    Static formats (e.g. XML)
    Dynamic formats (e.g. an XML schema)
    GAP! - URIs to name data formats
  Resource description
    “Data resource” = source/sink of data
    Set of properties
    May be managed by a service
    May be stored in a catalogue
    GAP? – Set of generic properties for data resources
    QoS policies
  Service description
    WSDL
    List of data resources?
    Management & control information?
    GAP! – Information model
  WGs: DFDL? CIM?
Ch 6: GAP! - Data Transfer
  Sources & sinks that are data-type agnostic
  Must allow low-level optimisations, e.g. at the storage level
  Must allow different protocols
    URIs to name protocols
    Separation of access protocols and transfer protocols?
  Basic level
    Single point -> single point, bytes only
    Policies: performance, scheduling, robustness, …
  Higher levels (for future work)
    Broadcast
    Encryption
    Format translation
  WGs: DMIS (BOF)?
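The "basic level" on this slide (single point to single point, bytes only, with sources and sinks that do not care about the data type) can be illustrated with a minimal sketch. The `transfer` function and its `chunk_size` parameter are hypothetical names chosen for illustration; they are not part of any OGSA or GGF interface, and the sketch assumes the source and sink are ordinary Python binary file-like objects.

```python
import io


def transfer(source, sink, chunk_size=64 * 1024):
    """Copy bytes from a readable source to a writable sink.

    Hypothetical helper: it treats the payload as opaque bytes, so it is
    agnostic to the data type being moved. Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of data
            break
        sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"example payload")
    dst = io.BytesIO()
    copied = transfer(src, dst, chunk_size=4)
    print(copied, dst.getvalue())
```

Anything above this level (broadcast, encryption, format translation) would wrap or replace the copy loop, which matches the slide's split between the basic level and future work.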
Ch 7: Data Access
  Selecting and consuming data
  WS-DAI (WS-DAIR, WS-DAIX, WS-DAIRDF)
  ByteIO (RandomByteIO, StreamableByteIO)
  Direct access (Request/Response)
  Indirect access (Request -> new EPR)
    WS-DAI only
  GAP! - Third-party delivery
    Integration with transfer
  GAP! - URIs for query languages and access mechanisms
  WS-Enumeration?
  WGs: DAIS, ByteIO
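The difference between direct and indirect access can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: `DataService`, `direct_access`, `indirect_access`, `resolve`, and the `urn:example:` reference strings are all hypothetical and do not reproduce the WS-DAI or ByteIO interfaces. A plain string stands in for the endpoint reference (EPR).

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DataService:
    """Hypothetical in-memory data service (not the WS-DAI interface)."""
    rows: list
    _derived: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def direct_access(self, predicate):
        # Direct access: the selected data comes back in the response itself.
        return [row for row in self.rows if predicate(row)]

    def indirect_access(self, predicate):
        # Indirect access: the result is kept as a new resource on the
        # service side and only a reference to it is returned.
        ref = f"urn:example:resultset:{next(self._ids)}"
        self._derived[ref] = self.direct_access(predicate)
        return ref

    def resolve(self, ref):
        # A later request (possibly from another consumer) fetches the data.
        return self._derived[ref]


if __name__ == "__main__":
    service = DataService(rows=[1, 2, 3, 4])
    print(service.direct_access(lambda r: r % 2 == 0))
    ref = service.indirect_access(lambda r: r % 2 == 0)
    print(ref, service.resolve(ref))
```

Because the reference can be handed to another party before the data is fetched, the indirect pattern is the natural starting point for the third-party delivery gap listed on the slide.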
Ch 8: Storage Resource Management
  Storage Properties
    Space types (file space, raw, streaming)
    Data retention (volatile, permanent, durable)
    Quotas
  Interaction of storage and transfer
  Policies: performance, availability, resilience, …
  Use access & transfer interfaces to use the storage allocated
  GAP! – interfaces for the different space types
  WGs: GSM, SNIA?, IETF?
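The storage properties named on this slide (space type, data retention, quota) could be modelled as a small record. The sketch below is an assumption made for illustration: the class names, field names, and the `allocate` method are hypothetical and are not taken from any storage management specification.

```python
from dataclasses import dataclass
from enum import Enum


class SpaceType(Enum):
    FILE_SPACE = "file space"
    RAW = "raw"
    STREAMING = "streaming"


class Retention(Enum):
    VOLATILE = "volatile"
    PERMANENT = "permanent"
    DURABLE = "durable"


@dataclass
class StorageSpace:
    """Hypothetical description of an allocated storage space."""
    space_type: SpaceType
    retention: Retention
    quota_bytes: int
    used_bytes: int = 0

    def allocate(self, nbytes):
        """Reserve nbytes if the quota allows it; return True on success."""
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        if self.used_bytes + nbytes > self.quota_bytes:
            return False  # request would exceed the quota
        self.used_bytes += nbytes
        return True


if __name__ == "__main__":
    space = StorageSpace(SpaceType.FILE_SPACE, Retention.DURABLE, quota_bytes=100)
    print(space.allocate(60), space.allocate(60), space.used_bytes)
```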
Ch 9: Cache Services
  Option 1: Managed by client
    Remote resource doesn’t know about it
  Option 2: Composed with resource
    Transparent to client
  GAP! – Management Interface
  Policies: coherence
  WGs: None
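The two cache placements can be contrasted with a short sketch. All class and method names here (`RemoteResource`, `CachingClient`, `CachedResource`, `read`, `write`, `get`) are hypothetical, and the write-through behaviour in Option 2 is one possible coherence policy assumed for illustration, not one the slide prescribes.

```python
class RemoteResource:
    """Hypothetical remote data resource that counts how often it is read."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self._data[key]

    def write(self, key, value):
        self._data[key] = value


class CachingClient:
    """Option 1: the client manages the cache; the resource is unaware of it."""

    def __init__(self, resource):
        self._resource = resource
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._resource.read(key)
        return self._cache[key]


class CachedResource:
    """Option 2: the cache is composed with the resource behind the same
    read/write interface, so it is transparent to the client."""

    def __init__(self, resource):
        self._resource = resource
        self._cache = {}

    def read(self, key):
        if key not in self._cache:
            self._cache[key] = self._resource.read(key)
        return self._cache[key]

    def write(self, key, value):
        self._resource.write(key, value)
        self._cache[key] = value  # write-through keeps the cached copy current


if __name__ == "__main__":
    backend = RemoteResource({"a": 1})
    client = CachingClient(backend)
    client.get("a")
    backend.write("a", 2)
    print(client.get("a"))  # the client-side copy is not refreshed
```

In this sketch the client-managed cache keeps serving its old copy after the resource changes, while the composed cache sees every write that passes through it. That difference is why coherence appears on the slide as the policy concern.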
Ch 10: Data Replication
  Creating replicas
  Discovering replicas
  Validation of registered replicas
  Consistency
    GAP! – consistency policies
  Managing replicas
  Supports files and databases
  GAP! – more detail about these interfaces
  WGs: OREP?
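Registering, discovering, and validating replicas can be sketched with a small in-memory catalogue. This is a hypothetical illustration: the `ReplicaCatalogue` class, its method names, and the choice of a SHA-256 checksum as the validation test are assumptions, since the slide itself notes that the interfaces still need more detail.

```python
import hashlib


class ReplicaCatalogue:
    """Hypothetical catalogue mapping a logical name to its replica locations."""

    def __init__(self):
        # logical name -> list of (location, checksum) pairs
        self._replicas = {}

    def register(self, logical_name, location, content):
        """Record a replica together with a checksum of its content."""
        digest = hashlib.sha256(content).hexdigest()
        self._replicas.setdefault(logical_name, []).append((location, digest))

    def discover(self, logical_name):
        """Return every registered location for a logical name."""
        return [loc for loc, _ in self._replicas.get(logical_name, [])]

    def validate(self, logical_name, location, content):
        """Check that the content at a location matches what was registered."""
        digest = hashlib.sha256(content).hexdigest()
        return (location, digest) in self._replicas.get(logical_name, [])


if __name__ == "__main__":
    catalogue = ReplicaCatalogue()
    catalogue.register("dataset-1", "site-a", b"payload")
    catalogue.register("dataset-1", "site-b", b"payload")
    print(catalogue.discover("dataset-1"))
    print(catalogue.validate("dataset-1", "site-b", b"changed"))
```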
Ch 11: Data Federation
  General description
    Hard to standardise detailed capability
  Creation of federations
  Expansion / Contraction
    Add / Remove input sources
    Add / Remove access mechanisms
  GAP! - Federation policies
  Security considerations
  GAP! – properties that describe a source’s access capabilities
  WGs: None
Ch 12: “Metadata” Catalogues
  Publish, Update, Classify, Augment, Delete
  Find, Subscribe
  XML format, XPath/XQuery
  Consistency management
    GAP! - Consistency policies
  Duplicating Information Services DT work?
  WGs: ?
Scenarios document
  Introduction
  Data Replication
  Data Transfer
  Data Integration
  Data Staging
  Personal Data Profile
  Data Discovery
  Data Storage
  Data Federation
GAP! - Grid file system scenario
  Where to fit into the architecture?
    Probably a combination of human-readable names & data transfer
    Left-field possibility: a query language for selecting files from a file system
  GAP! – File description metadata
    Also useful for file replication
  WGs: GFS?
General issues
  Language for information documents?
    Description vs. MUST/SHOULD
    Present vs. future tense
  Format for interface descriptions
  Policy languages, ontologies
  Working groups needed
    Should OGSA-D address any gaps ourselves?
    Reaffirm policy that recommendation documents are written by dedicated WGs
Next steps
  Currently
    Revising both documents for consistency
    Gap analysis
  GGF16, Athens
    Present documents to wider community
    Discussions with GFS
    Aim for submission soon after
  Meanwhile…
    DAIS, ByteIO already submitted