“Workflow” in Data Access and Integration An OGSA-DAI/DAIS Perspective Mario Antonioletti EPCC
e-Science Workflow Services - Talk Overview
- Background: OGSA-DAI and DAIS
- Motivation and Definitions
- Hierarchies of Service Coordination
- Conclusions
OGSA-DAI and DAIS
- GGF DAIS WG
  - Database Access and Integration Services
  - Attempting to standardise interfaces based on OGSI
- OGSA-DAI
  - Aims to provide an implementation of DAIS
  - Serves the UK e-Science community
- OGSA-DAI and DAIS
  - Currently not aligned
    - Data service interface in OGSA-DAI is coarse grained
      - Based on an earlier version of DAIS
    - Data service interface in DAIS is currently fine grained
      - Scope for more coarse grained interfaces
  - OGSA-DAI will realign with DAIS once the latter stabilizes
OGSA-DAI Project Partners
Powered by ….
Simple Data Service Scenario
[Diagram: Client, Data Service, Data Resource, Data Resource]
1. Provides access to a data resource.
2. May provide integration of several data resources.
Some Definitions
- Data Resource
  - An object that can source/sink data
  - Currently databases in scope
    - Files and file systems may come in scope
- Data Services
  - Grid services
  - Provide a common interface to data resources
  - Expose some capabilities of a data resource
    - SQL queries, XPath, BinX, …
  - Can also provide additional capabilities
    - Transformations, third party data delivery, etc.
Motivation
- Want common interfaces for:
  - Data access
  - Data integration
- As requests to a data service may produce lots of data
  - Want to minimise data movement
- Hence encapsulate interactions with the service
  - Serialise multiple interactions into one interaction
  - Abstract each interaction into an “activity”
  - Data flows between activities
  - Use a document mechanism to describe this
- DAIS and OGSA-DAI
  - Concerned with data flow
  - Currently do not have control constructs
    - No looping, conditionals, splits, joins, …
Service Coordination Patterns
[Diagram: Client, Data Service, Data Service, Service]
1. Coordination of activities performed at one Data Service.
2. Client choreographs a set of services to work together … or a service may orchestrate on behalf of the client.
3. Orchestration of services using a document directed to one service.
4. Possibly interface with standard workflow languages, e.g. BPEL4WS, WSCI, …
Coordination Hierarchies
- Service coordination may take place:
  - Intra service
    - Document based
  - Inter service – application driven
    - Choreographed/orchestrated by a client or service
  - Inter service – document driven
    - Orchestration
    - Ideally would look the same as the intra service document based interface
  - Combined with other workflow languages
Intra Service Processing
- Service processing described by a document
- Possible activities (OGSA-DAI perspective):
  - Statement
    - SQL query, XPath query
  - Delivery
    - Input data from a third party
    - Output data to a third party
    - Deliver data in the response
  - Transformations
    - XSL transformations, compression
- OGSA-DAI has produced a framework for this
Simple Example: no data flow
[Diagram: sqlQueryStatement (select * from myTable where id=10); DeliverToURL]
Simple Example: with data flow
[Diagram: sqlQueryStatement (select * from myTable where id=10) → DeliverToURL]
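The data-flow case can be sketched in Python. This is only an illustration under assumptions: the functions are hypothetical stand-ins named after the sqlQueryStatement and DeliverToURL activities, the table contents are canned, and delivery writes into an in-memory dictionary rather than to a real URL.

```python
from typing import Dict, List

Row = Dict[str, object]

# Canned rows standing in for myTable (assumption: no real database is used).
MY_TABLE: List[Row] = [{"id": 10, "name": "first"}, {"id": 11, "name": "second"}]


def sql_query_statement(expression: str) -> List[Row]:
    """Hypothetical stand-in for the sqlQueryStatement activity.

    Only understands a trailing "id=<number>" filter, which is enough for
    the example query on the slide.
    """
    wanted = int(expression.rsplit("id=", 1)[1])
    return [row for row in MY_TABLE if row["id"] == wanted]


def deliver_to_url(rows: List[Row], url: str, sink: Dict[str, List[Row]]) -> int:
    """Hypothetical stand-in for DeliverToURL: stores the rows under the URL key."""
    sink[url] = list(rows)
    return len(rows)


def perform_with_data_flow(expression: str, url: str, sink: Dict[str, List[Row]]) -> int:
    """Run both activities in one request; the query output feeds the delivery."""
    rows = sql_query_statement(expression)
    return deliver_to_url(rows, url, sink)
```

In the no-data-flow case the two activities would simply be run independently within the same request; here the output of the first is the input of the second, so the result never has to travel back to the client in between.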
The Perform Document
<gridDataServicePerform xmlns=" xmlns:xsi=" xsi:schemaLocation="
This example performs a simple select statement to retrieve one row from the test database. The results are delivered within the response document.
select * from littleblackbook where id=10
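As a rough sketch of how a client might assemble such a document, the Python below builds one with the standard library. The root element name gridDataServicePerform and the activity name sqlQueryStatement come from the slides; the child element names (documentation, expression), the name attribute, and the omission of namespaces are assumptions made for illustration, not the OGSA-DAI schema.

```python
import xml.etree.ElementTree as ET


def build_perform_document(expression: str) -> str:
    """Build a minimal perform-style document and return it as an XML string."""
    # Root element name taken from the slide; namespace declarations are omitted.
    root = ET.Element("gridDataServicePerform")

    # Hypothetical child element carrying the human-readable description.
    ET.SubElement(root, "documentation").text = (
        "Retrieve one row; results are delivered within the response document."
    )

    # One activity; the "name" attribute and "expression" child are assumptions.
    statement = ET.SubElement(root, "sqlQueryStatement", {"name": "statement"})
    ET.SubElement(statement, "expression").text = expression

    return ET.tostring(root, encoding="unicode")
```

A call such as build_perform_document("select * from littleblackbook where id=10") yields a single document that a data service could, in principle, enact in one interaction.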
Predefined Building Blocks
- sqlQueryStatement, sqlStoredProcedure, sqlUpdateStatement, sqlBulkLoadRowset
- xPathStatement, xUpdateStatement, xQueryStatement, xmlResourceManagement, xmlCollectionManagement
- relationalResourceManager
- gzipCompression, zipArchive, xslTransform
- inputStream, outputStream, DeliverFromURL, DeliverToURL, DeliverToGFTP, DeliverFromGFTP, DeliverToStream, DeliverFromGDT, DeliverToGDT
Activities: positives
- Simple sequence pattern
  - Data-flow
- Avoid multiple message exchanges
- Minimise data movement
- Extensible
  - XML Schema excerpt gives syntax
  - Associate an implementation with activity
  - Done at configuration
- Allows optimisation
  - Enactment engine can optimise interaction
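The extensibility point (an implementation is associated with an activity at configuration time) can be sketched as a small registry. The class and method names below are hypothetical and are not the OGSA-DAI configuration mechanism; only the activity name gzipCompression is taken from the slides.

```python
import gzip
from typing import Callable, Dict

ActivityImpl = Callable[[bytes], bytes]


class ActivityRegistry:
    """Hypothetical registry mapping activity names to implementations."""

    def __init__(self) -> None:
        self._impls: Dict[str, ActivityImpl] = {}

    def configure(self, name: str, impl: ActivityImpl) -> None:
        """Associate an implementation with an activity name (done once, up front)."""
        if name in self._impls:
            raise ValueError(f"activity {name!r} is already configured")
        self._impls[name] = impl

    def run(self, name: str, data: bytes) -> bytes:
        """Run the implementation configured for the named activity."""
        if name not in self._impls:
            raise KeyError(f"no implementation configured for activity {name!r}")
        return self._impls[name](data)


# Configuration step: bind the activity name to a concrete implementation.
registry = ActivityRegistry()
registry.configure("gzipCompression", gzip.compress)
```

Adding a new activity then means adding one configuration entry; the code that enacts a document does not change.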
Activities: negatives
- Incomplete syntax
  - Activity inputs and outputs are not typed
  - No typing of data streams
  - Possible issue in coming up with a sensible document
- Activity implementation & XML schema loosely coupled
  - Keeping activity and implementation in sync
- Semantics are not specified
- Puts workload on the server
  - Workloads on the server may need to be managed
- Activities not exposed at the interface level
  - This may change in line with DAIS
- Perform document factored out from DAIS base specs
  - Standardisation to become a DAIS informational document
  - Scope may be bigger than DAIS
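One way to reduce the risk of an insensible document, short of typing the streams, is to check the wiring before enactment. The sketch below assumes a hypothetical model in which each activity lists named inputs and outputs; it does not check types, only that every input refers to an output produced by an earlier activity.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Activity:
    """Hypothetical description of one activity in a document."""
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def find_dangling_inputs(activities: List[Activity]) -> List[Tuple[str, str]]:
    """Return (activity name, input name) pairs with no earlier producer."""
    produced: Set[str] = set()
    problems: List[Tuple[str, str]] = []
    for activity in activities:
        for stream in activity.inputs:
            if stream not in produced:
                problems.append((activity.name, stream))
        # Outputs become available only to activities later in the sequence.
        produced.update(activity.outputs)
    return problems
```

An empty result means the document is at least consistently wired; it says nothing about whether the data on a stream is of a kind the consuming activity understands.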
Inter Service Application Defined "Workflow"
- Services stitched together by an application
  - Could be a client
    - Use the OGSA-DAI GridDataTransport (GDT) portType
  - Could be another service
    - Distributed Query Processing (DQP)
- Services configured separately
  - Each performs its part in the workflow
Client Driven Scenario (aka poor man's data integration)
[Diagram: Client, Data Service … …, GDT]
Client creates Data Services.
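A toy version of the client driven scenario is sketched below. Everything here is an assumption for illustration: ToyDataService is an in-memory class, not the OGSA-DAI API, and its receive method merely stands in for a GDT-style transfer between services.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class ToyDataService:
    """Hypothetical in-memory data service; not the OGSA-DAI API."""

    def __init__(self, rows: Optional[List[Row]] = None) -> None:
        self.rows: List[Row] = list(rows or [])

    def receive(self, rows: List[Row]) -> None:
        """Accept rows pushed by another service (stand-in for a GDT-style transfer)."""
        self.rows.extend(rows)

    def query_and_deliver(self, keep: Callable[[Row], bool], target: "ToyDataService") -> int:
        """Select rows locally and push them straight to the target service."""
        selected = [row for row in self.rows if keep(row)]
        target.receive(selected)
        return len(selected)


def client_driven_integration(
    sources: List[ToyDataService], keep: Callable[[Row], bool]
) -> ToyDataService:
    """The client creates the target service and drives every source in turn."""
    target = ToyDataService()
    for source in sources:
        source.query_and_deliver(keep, target)
    return target
```

The client issues one request per source, which is why this style only stays manageable for small numbers of services.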
Service Driven Scenario
[Diagram: Distributed Query Processing. Client; GDQS: query planning, compilation, scheduling, evaluation, partitioning; GQES: evaluate sub-queries]
More Complex DQP Scenario
Application Driven "Workflow"
- Labour intensive
  - Client driven (service choreography)
    - Restricted to small numbers of services
      - Need tooling
      - Even then this is best done through other means
  - Service driven (service orchestration)
    - DQP hides details
    - There may be other examples …
- Need to explore this space further
  - Can probably accommodate these patterns in an existing workflow language
- For more general data integration need:
  - Describe more sophisticated behaviour
Inter Service Document Coordination
- Currently evolving
- Document describes:
  - Sequence of operations that may span multiple services
- Single document includes enough information to:
  - Run an expression on a source data service
  - Deliver the results to a target data service
  - Run an expression on the target data service
- Informational document to be presented at GGF10
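The three pieces of information such a document carries can be sketched as a small data structure. The class, field, and step names are hypothetical; the evolving DAIS document format is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InterServiceDocument:
    """Hypothetical single document spanning a source and a target data service."""
    source_service: str
    source_expression: str
    target_service: str
    target_expression: str


def plan_steps(doc: InterServiceDocument) -> List[Tuple[str, str, str]]:
    """Expand the document into the ordered steps an orchestrator would enact."""
    return [
        ("run", doc.source_service, doc.source_expression),
        ("deliver", doc.source_service, doc.target_service),
        ("run", doc.target_service, doc.target_expression),
    ]
```

Because the whole sequence is in one document, the client sends a single request and the receiving service can orchestrate the remaining steps on its behalf.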
A Dataset Example
[Diagram: Client, Data Service, Data Service; Request: DataRequest.xsd …; RemoteRequiredTable: DataAccessRecipe.xsd … …]
Document Driven "Workflow"
- Work in this area is tentative
  - No implementations as yet
- OGSA-DAI needs to see how it matures
  - Shows versatility
- Carries over some of the OGSA-DAI activity framework
  - Focused on data
- Can track provenance in the dataSet
- Needs to be positioned against general workflow languages
Traditional Workflow
- OGSA-DAI has not explored this space … yet
  - May need such a framework to facilitate data integration
- Traditionally workflow:
  - Revolves around the execution of atomic activities
  - Uses a processing model, e.g. WfMC based
    - Akin to how people talk about service orchestration
- Want to use existing frameworks as far as possible
  - OGSA-DAI does not want to define its own workflow
  - DAIS may come up with something
- Clearly:
  - Activity model can be used to implement a workflow
  - Collecting use cases
Workflow Issues
- OGSA-DAI needs to play to see what works
- Standards still evolving
  - IP rights:
    - BPEL4WS
      - Royalty-free … ?
    - WSCI
      - Royalty-free
- Need workflow engines
- Tooling to construct workflow
  - Ptolemy II … Triana … ?
Summary & Conclusions
- Base standards in a state of flux
  - DAIS not settled down yet
    - If you don't like what you see, get involved and change it
  - Document based interface needs to be re-worked
- OGSA-DAI implemented simple "workflow" patterns
  - Successful for data access
  - Shied away from real workflow
  - Should try to use emerging standards if possible
- Data integration will require workflow patterns
  - Need to examine use cases
- Positioning of OGSA-DAI
  - Want it to be the leaves of your complex workflow graphs
  - Wrap your data sources and sinks
- Try OGSA-DAI and give feedback!
Further information
- The OGSA-DAI Project Site
- The DAIS-WG site
- OGSA-DAI Users Mailing list
  - General discussion on grid DAI matters
- Formal support for OGSA-DAI releases
- OGSA-DAI training courses