DAIS Grid
Database Access and Integration Services on the Grid
Authors: N. Paton, M. Atkinson, V. Dialani, D. Pearson, T. Storey, P. Watson
Florida International University
School of Computing and Information Sciences
Summer 2006
Presented by: Ariel Cary
Agenda
– Introduction
– Scope and Context of Proposal
– Proposed Database Services
– Database Services (DS) in OGSA
– Current DAIS Standards and Systems
– Conclusion
Introduction
Grid research generally focuses on applications where data is stored in files.
Database management systems (DBMSs) play a central role in organizing data for numerous applications, including e-Science areas such as particle physics, earth sciences, and bioinformatics.
There is a need to interconnect pre-existing, independently operated databases.
Introduction (cont.)
This work seeks to encourage the development of standards that can meet those needs.
A (preliminary) proposal is made for the staged development of a collection of Grid Database Services that allow access to existing, autonomous databases within a Grid.
The proposal follows a service-based approach to DBMS integration within the OGSA framework.
Introduction (cont.)
Service definitions essentially state what functionality is to be supported, not how.
The same functionality may therefore be implemented in different ways, for example with different performance characteristics.
Scope and Context of Proposal
Scope
The proposal has several characteristics:
– It is independent of any specific Grid toolkit (tying it to one could skew and restrict it)
– It does not propose the development of a new DBMS for the Grid, but rather wrapping existing systems in a consistent interface and developing distributed managers
– It is independent of any specific data model or access language
Context
Relevant terms related to databases:
– A Database Service is any service that supports a database interface (described in WSDL)
– Service interfaces are abstract: they prescribe neither how they are supported nor the data model that underpins a DBMS
– Specific DBMS services could provide access to relational or object DBMSs, XML repositories, specialist storage systems, …
Context (cont.)
– A Grid Database Service (GDS) provides capabilities for querying, updating, and evolving a database
– The interface also describes:
  – Data delivery: transmitting structured data
  – Transactions: coordinating collections of operations
  – Database metadata: accessing information about the data a database service provides
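To make the shape of such an interface concrete, here is a minimal sketch of a GDS interface as a Python `Protocol`. The operation names (`perform`, `deliver`, `start_transaction`, `get_metadata`) and their signatures are hypothetical; the proposal itself is defined in WSDL and does not prescribe this form. The sketch only shows that one abstract interface can cover statements, delivery, transactions, and metadata without committing to a data model.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class GridDatabaseService(Protocol):
    """Hypothetical GDS interface; all operation names are illustrative."""

    def perform(self, query_notation: str, query: str,
                tx_handle: Optional[str] = None) -> str:
        """Query, update or evolve the database; return a result handle."""
        ...

    def deliver(self, source_uri: str,
                destinations: list[tuple[str, str]]) -> None:
        """Move structured data from one source URI to several destinations."""
        ...

    def start_transaction(self, expires: float) -> str:
        """Return a guaranteed-unique transaction handle."""
        ...

    def get_metadata(self) -> str:
        """Return a content and capability description (e.g., as XML)."""
        ...
```

Any wrapper around a relational, object, or XML store that provides these operations satisfies the protocol. `runtime_checkable` only checks that the methods exist. It does not check their signatures.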
Proposed Database Services
Database Discovery
It is assumed that a registry lookup returns a Grid Service Handle (GSH), a globally unique name for a service instance.
A service provider publishes a description (WSDL) of its service to a service registry.
The registry is later consulted by a requestor, and a binding is created that allows calls to the service.
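The publish/lookup cycle can be sketched with a toy in-memory registry. `ServiceRegistry`, the `gsh:` prefix, and the property-matching rule are all hypothetical. A real Grid registry and real GSHs follow OGSA conventions not shown here.

```python
import uuid


class ServiceRegistry:
    """Hypothetical in-memory registry mapping GSHs to service descriptions."""

    def __init__(self) -> None:
        self._entries: dict[str, dict[str, str]] = {}

    def publish(self, wsdl: str, **properties: str) -> str:
        # The provider publishes its WSDL; the registry assigns a globally
        # unique handle (a UUID-based string here, purely an assumption).
        gsh = f"gsh:{uuid.uuid4()}"
        self._entries[gsh] = {"wsdl": wsdl, **properties}
        return gsh

    def lookup(self, **criteria: str) -> list[str]:
        # The requestor consults the registry and gets back the GSH of every
        # service whose published properties match all the criteria.
        return [gsh for gsh, entry in self._entries.items()
                if all(entry.get(k) == v for k, v in criteria.items())]

    def describe(self, gsh: str) -> str:
        # The WSDL is what the requestor would use to create a binding.
        return self._entries[gsh]["wsdl"]


registry = ServiceRegistry()
emp_db = registry.publish("<definitions name='EmpDB'/>", data_model="relational")
registry.publish("<definitions name='GeneDB'/>", data_model="xml")
print(registry.lookup(data_model="relational") == [emp_db])  # True
```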
Database Statements
Statements allow queries or change operations to be sent to a DBMS.
This implies that the underlying DBMS supports a query or command language, which differs from one database model to another.
This is therefore a point of tension with the proposal's aim of being independent of the data model.
Database Statements (cont.)
Pairs such as (queryNotation, query) are introduced to allow flexibility, much like MIME types for attachments. For example:
– queryNotation = "SQL'92"
– query = "Select * from EMP Where Salary>1000"
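The role of `queryNotation` can be illustrated by a service that dispatches each statement on its notation, much as a mail client dispatches on a MIME type. In this sketch `perform` and the executor table are hypothetical. Python's built-in `sqlite3` stands in for the DBMS, even though SQLite is not a complete SQL-92 implementation.

```python
import sqlite3
from typing import Callable

# Hypothetical executor signature: run a statement, return the result rows.
Executor = Callable[[str], list[tuple]]


def make_sql_executor(conn: sqlite3.Connection) -> Executor:
    def run(query: str) -> list[tuple]:
        return conn.execute(query).fetchall()
    return run


def perform(executors: dict[str, Executor], query_notation: str,
            query: str) -> list[tuple]:
    # Reject statements written in a notation the wrapped DBMS cannot process.
    if query_notation not in executors:
        raise ValueError(f"unsupported queryNotation: {query_notation!r}")
    return executors[query_notation](query)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (Name TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO EMP VALUES (?, ?)", [("Ann", 1500), ("Bob", 900)])
executors = {"SQL'92": make_sql_executor(conn)}
rows = perform(executors, "SQL'92", "Select * from EMP Where Salary>1000")
print(rows)  # [('Ann', 1500)]
```

A service wrapping an XML repository would register, for example, an XPath executor under its own notation. The dispatch code stays the same, which is where the notation/query split earns its keep.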
Database Statements (cont.)
The optional txHandle indicates that the operation is part of a transaction, provided the DBMS supports transactions.
The final results of an operation are managed via:
– resultHandle: generated dynamically
– expires: an expiry time by which the result must be claimed
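A result handle with an expiry time behaves like a claim ticket. The sketch below is a hypothetical `ResultStore` that issues a fresh `resultHandle` per result and discards results that are not claimed before `expires`. The injectable clock is an assumption added so the behavior can be tested without waiting.

```python
import time
import uuid
from typing import Any, Callable, Optional


class ResultStore:
    """Hypothetical holder for results awaiting collection by the caller."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._results: dict[str, tuple[Any, float]] = {}

    def put(self, result: Any, ttl_seconds: float) -> tuple[str, float]:
        # resultHandle is generated dynamically; expires is an absolute
        # deadline expressed in the clock's own units.
        handle = str(uuid.uuid4())
        expires = self._clock() + ttl_seconds
        self._results[handle] = (result, expires)
        return handle, expires

    def claim(self, handle: str) -> Optional[Any]:
        # A result can be claimed once; after its expiry time it is gone.
        entry = self._results.pop(handle, None)
        if entry is None:
            return None
        result, expires = entry
        return result if self._clock() <= expires else None


store = ResultStore()
handle, expires = store.put([("Ann", 1500)], ttl_seconds=60)
print(store.claim(handle))  # [('Ann', 1500)]
```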
Database Statements (cont.)
Operations on a GDS will be atomic and proceed in three phases:
– Preparation and validation: consistency checks
– Application: the operation is performed
– Result delivery: results are made available to the caller
Operations usually involve the transfer of large amounts of data, which may take a long time (and is prone to interruptions!).
The implementation of the DBMS service should handle such failures to achieve atomicity.
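The three phases can be sketched against a single local database. A failure during delivery rolls back the application step, so the caller never sees a half-done operation. The function `atomic_operation` is hypothetical and uses `sqlite3` as a stand-in DBMS. It handles only failures that occur before commit and ignores the harder distributed cases the text alludes to.

```python
import sqlite3
from typing import Callable


def atomic_operation(conn: sqlite3.Connection, statement: str,
                     deliver: Callable[[list[tuple]], None]) -> None:
    # 1. Preparation and validation: reject incomplete statements up front
    #    (complete_statement expects a terminating semicolon).
    if not sqlite3.complete_statement(statement):
        raise ValueError("statement failed validation")
    try:
        # 2. Application: sqlite3 implicitly opens a transaction for DML.
        rows = conn.execute(statement).fetchall()
        # 3. Result delivery: may fail part-way, e.g. a dropped connection.
        deliver(rows)
    except Exception:
        conn.rollback()  # undo the application step so nothing is half-done
        raise
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (Name TEXT, Salary INTEGER)")
conn.execute("INSERT INTO EMP VALUES ('Ann', 1000)")
conn.commit()


def interrupted(rows: list[tuple]) -> None:
    raise ConnectionError("transfer interrupted")


try:
    atomic_operation(conn, "UPDATE EMP SET Salary = Salary + 100;", interrupted)
except ConnectionError:
    pass
print(conn.execute("SELECT Salary FROM EMP").fetchone())  # (1000,)
```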
Delivery System
The means by which (potentially large amounts of) structured data are moved from one location to one or more others.
It should be considered complementary to protocols such as GridFTP, which could be used as a delivery mechanism.
Delivery System (cont.)
A single data source to be delivered, represented as a URI
Several destinations, each represented by a URI with an associated delivery mechanism
The deliver operation initiates delivery of the data from the single source to the multiple destinations
A more elaborate delivery system would include encryption, progress monitoring, etc.
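The one-source, many-destinations shape of `deliver` can be sketched as a fan-out over pluggable mechanisms. The `fetch` callable, the mechanism signature, and the URIs in the usage lines are hypothetical. In a real deployment a mechanism might wrap GridFTP, but no such binding is shown here.

```python
from typing import Callable

# A mechanism sends a payload to one destination URI (hypothetical signature).
Mechanism = Callable[[str, bytes], None]


def deliver(fetch: Callable[[str], bytes], source: str,
            destinations: list[tuple[str, str]],
            mechanisms: dict[str, Mechanism]) -> None:
    # Check every destination's mechanism before moving any data.
    for uri, name in destinations:
        if name not in mechanisms:
            raise ValueError(f"no delivery mechanism {name!r} for {uri}")
    # Read the single source once, then fan it out to each destination
    # using the mechanism associated with that destination.
    payload = fetch(source)
    for uri, name in destinations:
        mechanisms[name](uri, payload)


inbox: dict[str, bytes] = {}
mechanisms: dict[str, Mechanism] = {
    "memory": lambda uri, data: inbox.__setitem__(uri, data),
}
deliver(lambda uri: b"Ann,1500\n", "gds://emp-db/results/42",
        [("mem://site-a", "memory"), ("mem://site-b", "memory")], mechanisms)
```

Checking all mechanisms before fetching avoids starting a large transfer that is bound to fail partway through. Encryption or progress monitoring would fit naturally as wrappers around individual mechanisms.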
Distributed Transactions
A minimal transaction interface performs the role of conferring a guaranteed unique identity on the transaction.
Given a transaction handle, other operations over a database service can be placed explicitly within the context of that transaction, using the txHandle parameter.
Distributed Transactions (cont.)
For a transaction to span multiple DBMS services, they must provide operations for use by the transaction manager that oversees the distributed transaction.
startTransaction includes an expires parameter to limit the consumption of resources.
The prepareCommit operation can be used by a two-phase commit protocol to ensure that either all participating database services commit or none do.
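The following is a minimal sketch of a transaction manager that issues unique handles with an expiry and runs two-phase commit over participants exposing a `prepare_commit` vote. The `Participant` and `TransactionManager` classes are hypothetical. Logging, timeouts, and crash recovery, which a real two-phase commit needs, are deliberately omitted.

```python
import time
import uuid


class Participant:
    """Hypothetical database service taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare_commit(self, tx: str) -> bool:
        # Phase 1 vote: promise to commit if asked, or refuse.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, tx: str) -> None:
        self.state = "committed"

    def abort(self, tx: str) -> None:
        self.state = "aborted"


class TransactionManager:
    def start_transaction(self, ttl_seconds: float) -> tuple[str, float]:
        # Confers a unique identity (txHandle) plus an expiry time that
        # bounds how long participants hold resources for it.
        return str(uuid.uuid4()), time.time() + ttl_seconds

    def two_phase_commit(self, tx: str, participants: list[Participant]) -> bool:
        # Phase 1: collect a vote from every participant.
        votes = [p.prepare_commit(tx) for p in participants]
        # Phase 2: commit everywhere only if all voted yes; else abort all.
        if all(votes):
            for p in participants:
                p.commit(tx)
            return True
        for p in participants:
            p.abort(tx)
        return False


tm = TransactionManager()
tx, expires = tm.start_transaction(ttl_seconds=60)
a, b = Participant("A"), Participant("B", can_commit=False)
print(tm.two_phase_commit(tx, [a, b]), a.state, b.state)  # False aborted aborted
```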
Database Metadata
Metadata that would be useful to have access to includes:
– Content description: the database schema, i.e. data model, logical and physical structures, and statistics (which could be obtained from the data dictionary)
– Capability description: languages (query/update operations supported), transactional capabilities, and protocols supported
The metadata should be described in a standard representation, e.g. an XML document provided by the data service provider.
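As an illustration of metadata in a standard representation, here is a small XML document covering both content and capability descriptions, along with requester-side checks using Python's `xml.etree.ElementTree`. The element and attribute names (`databaseMetadata`, `capabilities`, `language`, and so on) are hypothetical. The proposal asks for a standard representation but does not define this schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document; element and attribute names are illustrative.
METADATA = """\
<databaseMetadata>
  <content dataModel="relational">
    <table name="EMP" rows="2">
      <column name="Name" type="VARCHAR"/>
      <column name="Salary" type="INTEGER"/>
    </table>
  </content>
  <capabilities transactions="true">
    <language notation="SQL'92" operations="query update"/>
    <protocol name="GridFTP"/>
  </capabilities>
</databaseMetadata>
"""


def supported_notations(xml_text: str) -> list[str]:
    # A requester can check the capability description before sending
    # any statement to the service.
    root = ET.fromstring(xml_text)
    return [lang.get("notation") for lang in root.iter("language")]


def supports_transactions(xml_text: str) -> bool:
    caps = ET.fromstring(xml_text).find("capabilities")
    return caps is not None and caps.get("transactions") == "true"


print(supported_notations(METADATA), supports_transactions(METADATA))
```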
Distributed Query Service (DQS)
– The query is submitted to the DQS, where it is parsed and optimized
– Sub-queries are sent to the relevant databases (e.g., DS1)
– Results are collected and joined by the DQS
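The DQS pattern can be sketched with two independent in-memory databases standing in for autonomous database services. The "plan" is hard-coded here, whereas a real DQS would derive the sub-queries by parsing and optimizing the global query. The table names and the choice of a simple hash join are assumptions made for illustration.

```python
import sqlite3

# Two independent in-memory databases stand in for autonomous services.
hr = sqlite3.connect(":memory:")
hr.execute("CREATE TABLE EMP (Id INTEGER, Name TEXT, DeptId INTEGER)")
hr.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
               [(1, "Ann", 10), (2, "Bob", 20)])

org = sqlite3.connect(":memory:")
org.execute("CREATE TABLE DEPT (Id INTEGER, Title TEXT)")
org.executemany("INSERT INTO DEPT VALUES (?, ?)",
                [(10, "Physics"), (20, "Biology")])


def distributed_query() -> list[tuple[str, str]]:
    # Sub-queries sent to the relevant databases (hard-coded plan).
    emps = hr.execute("SELECT Name, DeptId FROM EMP ORDER BY Id").fetchall()
    depts = org.execute("SELECT Id, Title FROM DEPT").fetchall()
    # Results collected and joined at the DQS with a simple hash join.
    titles = {dept_id: title for dept_id, title in depts}
    return [(name, titles[dept_id]) for name, dept_id in emps
            if dept_id in titles]


print(distributed_query())  # [('Ann', 'Physics'), ('Bob', 'Biology')]
```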
Database Services in OGSA
DS in OGSA
The Open Grid Services Architecture (OGSA) represents an evolution towards a Grid system architecture based on Web services concepts and technologies.
The interfaces described can serve as the basis of database services that participate in OGSA.
Many features of this architectural framework can thus be obtained for service creation, authorization, notification, etc.
Requirements from OGSA
The secure connection and authentication mechanism underpins all GDS security and authentication.
The lifetime management model carries over unchanged as the lifetime management model for GDS.
The notification mechanism specified in OGSA appears to satisfy GDS needs.
Requirements from OGSA (cont.)
Information about user authorization is required, potentially passed through many intermediate Grid services
– User identification services, referenced from a certificate
Certification of the services themselves may be necessary: a discovery service could be tricked into returning an impostor that mimics the intended GDS, so that data is sent to the impostor.
Some databases charge for their use, so a digital payment process must be supported.
Current DAIS Standards and Systems
DAIS Standards
Global Grid Forum
– “The Global Grid Forum (GGF) is the community of users, developers, and vendors leading the global standardization effort for grid computing.”
Part of the GGF: DAIS-WG
– “The group seeks to promote standards for the development of grid database services, focusing principally on providing consistent access to existing, autonomously managed databases.”
OGSA-DAI System
– OGSA-DAI Overview
– Architecture + Extensibility
– Supported Data Resources
“The aim of the OGSA-DAI project is to develop middleware to assist with access and integration of data from separate sources via the grid…and is working closely with the Global Grid Forum DAIS-WG...”
Conclusion
Conclusion
This document has made a preliminary, service-oriented proposal for integrating database functionality into a Grid setting.
It is hoped that the document will provoke discussion on how databases can best be integrated with Grid middleware.
There is an established community dedicated to defining DBMS service standards, and emerging systems are adopting them.