Data Management Standards for Interoperable Grids: Experience from NextGRID and OMII-Europe
Clive Davenhall
National e-Science Centre, University of Edinburgh
17 March 2008, Standards for Interoperable Grids

Data Management: Overview
Manipulation and management of data, typically including:
- Processing: job execution, BES, JSDL
- Transfer
- Storage
- Access

OGSA Standards
There are several OGSA data management standards:
- DMI: data transfer.
- ByteIO: data access (file-like), data transfer.
- WS-DAI: data access (database-like).
They can be used individually or in concert with other OGSA standards.

OGSA-DMI
- DMI: Data Movement Interface.
- Not yet a specification; still a draft: currently receiving public comments, completion is imminent.
- A standard mechanism for moving data between locations: from a source of data to a sink (or destination) of data.

OGSA-DMI Architecture
- A standard interface that various resources can use to interoperate.
- Supports a variety of protocols for the actual data transfer: GridFTP, file access, OGSA-ByteIO, SRB.
- Supports ‘third party’ transfers, in which a superintending process initiates a transfer from a remote source to a remote sink.
- Only concerned with moving bytes from the source to the sink: not concerned with the semantics or structure of the data, though future versions might be.

Port Types
- DMI: a mechanism for scheduling and managing data transfers.
- Provides two port types and uses the factory pattern.
- DTF (Data Transfer Factory): a client invokes a DTF to create a DTI.
- DTI (Data Transfer Instance): a service created to perform a specific transfer.

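The factory relationship between the two port types can be sketched in a few lines of Python. This is only an illustration of the pattern: the class names, the `create_instance` method, the `Created` state and the URI strings used with it are hypothetical and are not taken from the DMI draft.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DataTransferInstance:
    """Hypothetical stand-in for a DTI: one object per requested transfer."""
    transfer_id: int
    source: str
    sink: str
    state: str = "Created"  # illustrative state name, not from the draft


@dataclass
class DataTransferFactory:
    """Hypothetical stand-in for a DTF: its only job is to create DTIs."""
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def create_instance(self, source: str, sink: str) -> DataTransferInstance:
        # Each call yields a fresh, independently managed transfer instance.
        return DataTransferInstance(next(self._ids), source, sink)
```

A client would hold one factory and ask it for a new instance per transfer, then manage each instance separately.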
DTI Operations
A DTI (Data Transfer Instance) supports the following operations:
- Start
- Activate
- Stop
- Resume
- Suspend
- GetState
- GetInstanceAttributeDocument

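One way to picture how these operations might fit together is as a small state machine. The slide gives only the operation names, so the state names and the permitted transitions below are assumptions made purely for illustration; the DMI draft itself is the authority on the real lifecycle.

```python
# Hypothetical transition table: (current state, operation) -> next state.
# Neither the state names nor the allowed transitions come from the draft.
TRANSITIONS = {
    ("Created", "Activate"): "Scheduled",
    ("Scheduled", "Start"): "Transferring",
    ("Transferring", "Suspend"): "Suspended",
    ("Suspended", "Resume"): "Transferring",
    ("Scheduled", "Stop"): "Stopped",
    ("Transferring", "Stop"): "Stopped",
    ("Suspended", "Stop"): "Stopped",
}


class TransferInstanceSketch:
    """Illustrative lifecycle holder for a single transfer."""

    def __init__(self) -> None:
        self._state = "Created"

    def get_state(self) -> str:
        # Plays the role of GetState: report the current lifecycle state.
        return self._state

    def apply(self, operation: str) -> str:
        # Reject operations that make no sense in the current state.
        key = (self._state, operation)
        if key not in TRANSITIONS:
            raise ValueError(f"{operation} is not allowed in state {self._state}")
        self._state = TRANSITIONS[key]
        return self._state
```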
Sources and Sinks
- Source: emits an ordered sequence of bytes.
- Sink: receives an ordered sequence of bytes.
For a resource to act as a source or sink in a DMI transfer it must:
- Provide suitable services to send or receive data.
- Furnish a list of protocols that it can use.
Information about how data are to be sent or received is encapsulated in a DEPR (Data Endpoint Reference).

DEPR
DEPR: Data Endpoint Reference. Encapsulates all the information about:
- How data in a source are to be accessed.
- How data sent to a sink are to be received.
It includes all the transport protocols supported by a source or sink, and contains endpoint references to access the data. In future versions these endpoint references will use WS-Addressing.

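Because each endpoint advertises the transport protocols it supports, a transfer needs at least one protocol common to source and sink. The sketch below shows one simple selection rule, taking the first protocol in the source's list that the sink also supports. The function name and the "source order wins" rule are assumptions for illustration; the slides do not say how DMI chooses among several common protocols.

```python
from typing import Optional, Sequence


def choose_protocol(source_protocols: Sequence[str],
                    sink_protocols: Sequence[str]) -> Optional[str]:
    """Return the first source protocol the sink also supports, else None."""
    sink_set = set(sink_protocols)
    for protocol in source_protocols:
        if protocol in sink_set:
            return protocol
    return None
```

For example, a source offering GridFTP and OGSA-ByteIO and a sink offering SRB and OGSA-ByteIO would settle on OGSA-ByteIO, while endpoints with no protocol in common yield `None` and no transfer is possible.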
NextGRID Recommendations
- Resources should be modelled as WS-Resources.
- Transfers must be implemented as ‘Logical Data Transfers’ (the most flexible of several options available).
- A mechanism is prescribed for querying the protocols available to a source or sink.
- OGSA-ByteIO must be one of the protocols available to both the source and the sink.

OGSA Data Management Standards
- DMI: data transfer.
- ByteIO: data access (file-like), data transfer.
- WS-DAI: data access (database-like).

OGSA ByteIO
- POSIX-like access to remote resources.
- The remote resource can be any source of data: files, sensors, live-data streams, etc.
- Aims to provide access transparency.

Mapping to Web Services
- Core OGSA ByteIO Specification: independent of any basic profile.
- ByteIO OGSA WSRF Basic Profile Rendering: mapping to the WSRF Basic Profile.
- Currently WSRF is the only mapping; others are anticipated.

ByteIO Access Methods
Two access methods, implemented as port types. Each is optional.
- RandomByteIO: direct random access to a portion of a data resource. The portion to access is specified as an offset from the start of the resource.
- StreamableByteIO: streamed access to a data resource. Each access is relative to the previous access.

RandomByteIO
    read(startOffset: unsignedLong, bytesPerBlock: unsignedInt, numBlocks: unsignedInt, stride: long): byte[]
    write(startOffset: unsignedLong, bytesPerBlock: unsignedInt, stride: long, data: byte[]): void
    append(data: byte[]): void
    truncAppend(offset: unsignedLong, data: byte[]): void

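The block-and-stride parameters of `read` are easiest to see on an in-memory buffer. The sketch below assumes that `stride` is the distance in bytes between the starts of successive blocks, which is the usual meaning but should be checked against the ByteIO specification. The Python function and its handling of reads past the end of the data are illustrative only.

```python
def strided_read(data: bytes, start_offset: int, bytes_per_block: int,
                 num_blocks: int, stride: int) -> bytes:
    """Gather num_blocks blocks of bytes_per_block bytes from data.

    Block i is assumed to start at start_offset + i * stride.
    Blocks that start outside the data end the read early (an assumption).
    """
    out = bytearray()
    for i in range(num_blocks):
        begin = start_offset + i * stride
        if begin < 0 or begin >= len(data):
            break
        # Slicing silently truncates a block that runs past the end.
        out += data[begin:begin + bytes_per_block]
    return bytes(out)
```

With `stride` equal to `bytes_per_block` this reduces to an ordinary contiguous read; a larger stride skips bytes between blocks, which suits access to one column of fixed-width records.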
RandomByteIO read as XML
[XML listing not recoverable from the slide. The surviving type names are xsd:unsignedLong, xsd:unsignedInt, xsd:long and byteio:transfer-information-type.]

StreamableByteIO
    seekRead(offset: long, seekOrigin: URI, bytesToRead: unsignedInt): byte[]
    seekWrite(offset: long, seekOrigin: URI, data: byte[]): void

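The two streamable operations combine a seek with a read or write, so that each access can be made relative to the previous one. The following sketch models this on an in-memory buffer. The three seek-origin URIs are placeholders invented for this example (the real URIs are defined by the ByteIO specification), and the zero-padding on writes past the end is likewise an assumption.

```python
# Placeholder seek-origin URIs, invented for this sketch.
BEGINNING = "urn:example:seek-origin:beginning"
CURRENT = "urn:example:seek-origin:current"
END = "urn:example:seek-origin:end"


class StreamSketch:
    """In-memory model of a streamable resource with a current position."""

    def __init__(self, data: bytes) -> None:
        self._data = bytearray(data)
        self._pos = 0

    def _seek(self, offset: int, seek_origin: str) -> None:
        base = {BEGINNING: 0, CURRENT: self._pos, END: len(self._data)}[seek_origin]
        target = base + offset
        if target < 0:
            raise ValueError("seek before start of resource")
        self._pos = target

    def seek_read(self, offset: int, seek_origin: str, bytes_to_read: int) -> bytes:
        self._seek(offset, seek_origin)
        chunk = bytes(self._data[self._pos:self._pos + bytes_to_read])
        self._pos += len(chunk)  # the next access continues from here
        return chunk

    def seek_write(self, offset: int, seek_origin: str, data: bytes) -> None:
        self._seek(offset, seek_origin)
        if self._pos > len(self._data):
            # Assumed behaviour: pad with zero bytes up to the write position.
            self._data.extend(b"\0" * (self._pos - len(self._data)))
        end = self._pos + len(data)
        self._data[self._pos:end] = data
        self._pos = end
```

Reading sequentially is then a matter of repeated `seek_read(0, CURRENT, n)` calls.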
NextGRID Recommendations
- Must conform to the WSRF rendering.
- Must support RandomByteIO.
- Restrictions on naming.

OGSA Data Management Standards
- DMI: data transfer.
- ByteIO: data access (file-like), data transfer.
- WS-DAI: data access (database-like).

OGSA WS-DAI
- WS-DAI: Web Service Data Access and Integration.
- Access to remote data resources.
- Modelled on access to databases of various sorts.

WS-DAI Data Resource Models
- The core WS-DAI specification: independent of data model; implemented as a model-dependent realisation.
- WS-DAIR: modelled on access to relational databases. Queries in SQL.
- WS-DAIX: modelled on access to XML databases. Queries in XPath, XQuery and XUpdate.
- It is anticipated that additional realisations will be developed, e.g. for RDF and object databases.

Properties
A WS-DAI resource has a number of properties which a client can interrogate to determine the resource’s characteristics:
- DataResourceAbstractName
- ParentDataResource
- DataResourceManagement
- DatasetMap
- ConfigurationMap
- LanguageMap
- DataResourceDescription
- Readable
- Writeable
- ConcurrentAccess
- TransactionInitiation
- TransactionIsolation
- ChildSensitiveToParent

Data Resources
- Externally managed resources: data stored using a pre-existing DBMS which has its own existence apart from WS-DAI. WS-DAI gives access to this resource.
- Service-managed resources: no independent existence. WS-DAI exists to manage the resource. For example, the results of a previous query could be made available as a service-managed resource.

Direct and Indirect Access
Patterns for obtaining the results of queries to a resource.
- Direct access: the results are simply returned in response to the query.
- Indirect access: effectively implements the ‘factory pattern’. The results are not returned in the response to the query. Rather, they are made available as a data resource in their own right.

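The difference between the two patterns can be sketched against a local SQLite database standing in for a relational data resource. Everything here is illustrative: the class, its method names and the `result-N` naming scheme are hypothetical and do not reproduce the WS-DAIR interface. A real indirect access would return a reference to a new service-managed resource rather than a plain string.

```python
import sqlite3


class RelationalResourceSketch:
    """Toy data resource offering both access patterns over one connection."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection
        self._derived = {}  # derived (service-managed) result sets by name

    def query_direct(self, sql: str) -> list:
        # Direct access: the rows come back in the response itself.
        return self._connection.execute(sql).fetchall()

    def query_indirect(self, sql: str) -> str:
        # Indirect access: keep the rows as a new resource, return its name.
        name = f"result-{len(self._derived) + 1}"
        self._derived[name] = self._connection.execute(sql).fetchall()
        return name

    def fetch(self, name: str) -> list:
        # Later access to a derived resource, possibly by another client.
        return self._derived[name]
```

The indirect form lets the client pass the result's name to a third party, or retrieve it later, without moving the data through the original caller.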
NextGRID Recommendations
- WS-DAI access is optional for NextGRID.
- Resources should be modelled as WS-Resources.
- Restrictions on naming.