Optimising the OGSA-DAI Enactment Model


1 Optimising the OGSA-DAI Enactment Model
eSI, Workflow Optimisation in Distributed Environments (WODE ’06)
Konstantinos Karasavvas, Research Associate, NeSC, University of Edinburgh

2 Overview
- Background: OGSA-DAI
- Workflow distinctive features
- Enactment Model optimisation
- Reconstruction
  - Scenarios
  - And future work
- Summary

Notes: We will give a brief overview of OGSA-DAI, its goals and its operation, as well as its workflow functionality. Initially we resisted the idea that we actually had a workflow language plus an enactment engine. We now accept it and have identified certain distinctive characteristics that differentiate OGSA-DAI (OD) workflows from others.

eSI - WODE, 19 Oct. 06

3 OGSA-DAI in a nutshell
- An extensible framework for data access and integration
- Expose heterogeneous data resources to a grid through web services
- Interact with data resources
  - Queries and updates
  - Data transformation / compression
  - Data delivery
  - Application specific functionality
- A base for higher-level services
  - Federation, data mining, visualisation
- Reduce development cost of data centric grid applications by providing useful functionality readily packaged

Notes: OGSA-DAI aims to provide consistent interfaces to heterogeneous data resources. It allows interaction with these data resources and provides additional processing functionality:
- transformations on the data (e.g. projection) or on the streams themselves (e.g. tee a stream)
- several data delivery methods: FTP, gftp, URL, SMTP, service to service
- an extensible framework: users can add their own functionality (activities), which can be application specific, e.g. OGSA-DQP, data mining (DataMiner group in Vienna)
- minimising data movement by bringing computation close to the data

4 OGSA-DAI Request/Response
[Diagram labels: Request, Data Service, DB, Activity A, Activity B, Activity C, Response, 3rd Party.]

Notes:
- Request: the perform document (XML) describes the workflow (here, a simple flow of work). It contains activities that are chained together to form a workflow; activities are the OGSA-DAI workflow's units of work.
- The workflow is processed and executed inside the OGSA-DAI service (the enactment model).
- Results are produced: they can be returned in the response document or delivered asynchronously, depending on the activities used.
- Streamed model: the focus is on data, so data is streamed through the system whenever possible (almost always). A1 produces output; A2 receives and processes it; A1 might send another block, which is buffered. When A2 finishes, it sends its output to A3 to process and starts processing the next input block.

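5 Example OGSA-DAI Request
[Diagram of a request as chained activities:]
- sqlQueryStatement (SELECT * FROM Bands WHERE name = 'Bangles';) outputs a ResultSet
- sqlResultsToXML converts the ResultSet to a WebRowSet
- deliverFromURL supplies the XSL
- xslTransform applies the XSL to the WebRowSet and outputs HTML
- deliverToURL delivers the HTML to ftp://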
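
As a rough illustration of how a request like this chains activities, the sketch below records which activity feeds which and derives one order that respects those connections. The activity names come from the slide; the dictionary layout and the `execution_order` helper are assumptions made for this example and are not the perform document format or any OGSA-DAI API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each activity maps to the set of activities whose output it consumes.
# Names are from the slide; this data structure is hypothetical.
workflow = {
    "sqlQueryStatement": set(),
    "deliverFromURL": set(),
    "sqlResultsToXML": {"sqlQueryStatement"},
    "xslTransform": {"sqlResultsToXML", "deliverFromURL"},
    "deliverToURL": {"xslTransform"},
}


def execution_order(graph):
    """Return one order in which every activity follows its producers."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)
```

In a streamed system the activities overlap in time, so the order here only shows that the connections form a valid chain, not a strict schedule.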

6 Workflow Distinctive Features (1)
- Keep everything local
  - We have homogeneity
  - Avoid data movement
    - Call with whole result
    - Separate calls
  - E.g. we can pass object references
- Activities are configured at service deployment
  - Cannot use any “service” you want in your workflow

[Diagram: sqlQuery passing a result set (RS) to transform as activities, compared with a sqlQuery Service and a transform Service communicating over SOAP/XML.]

Notes: “You could give a SQL query followed by some column transform as your example. Doing this with web services rather than activities would require either batching the query results and sending them to the transform service in a batch, or making an expensive web service call for each transform, which would be crazy.”

7 Workflow Distinctive Features (2)
- Simple, efficient workflow language
  - Sequence, flow
  - Fits our data processing needs
- Other constructs can be used at the activity level (e.g. exclusive choice, a.k.a. if-split)

[Diagram: a contained workflow, within a service. Labels: A, if-split begin, x, y, if-split end, B, pipe closes (twice).]

Notes: We try to keep the language as simple as possible for efficiency, as long as it fits our data processing needs. We might add more workflow language constructs if the need arises, e.g. exclusive choice.

8 Workflow Distinctive Features (3)
- Streaming model
  - Large quantities of data
  - Parallel processing (pipelining)
  - Implicit iteration via streaming

Notes:
- Large quantities of data: data are separated into chunks for more efficient processing and memory usage.
- Parallel processing: while activity two processes one block, activity one is already processing another.
- Iteration via streaming: we introduced control blocks to group entities together, which effectively allows us to iterate in the workflow.
- We typically need small amounts of computation per byte of input/output.
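
To make the chunking idea concrete, here is a minimal sketch in which rows are grouped into fixed-size blocks and a transformation is applied block by block, so that only one block needs to be in memory at a time. The function names `blocks` and `project` and the block size are assumptions for illustration; they are not OGSA-DAI identifiers.

```python
from itertools import islice


def blocks(rows, size):
    """Lazily group an iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def project(block_stream, columns):
    """Keep only the given column indexes, one block at a time."""
    for block in block_stream:
        yield [tuple(row[c] for c in columns) for row in block]


rows = [(i, f"name{i}", i * 10) for i in range(5)]
result = list(project(blocks(rows, 2), columns=(0, 2)))
print(result)
```

Iterating over the stream of blocks is what gives the implicit iteration: the transformation is written once and runs for however many blocks arrive.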

9 Enactment Model
- A.k.a. Activity Framework
- Core component of OGSA-DAI
- It is responsible for performing tasks (activities) and streaming data

[Diagram: DB, Query (produces data in blocks), Pipe (stores and provides access to data blocks), Delivery (consumes data blocks).]

Notes on activities:
- they are specific processing tasks
- they often interact with a data resource
- they usually consume and produce data
- they can be chained by connecting outputs and inputs

10 Activity Processing (Current Release)
- Processing of blocks (and therefore of activities) is controlled by the pipe, from outside the activity
- processBlock() is called many times until processing is complete
- Each call usually consumes and produces a single block
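
The pull-driven behaviour described on this slide can be sketched as follows. This is only an illustration under assumptions: `process_block` and `get_block` are Python stand-ins for the processBlock() and getBlock labels on the slides, and the `Source`, `Upper`, `Pipe` classes and the `run` loop are hypothetical, not the OGSA-DAI implementation.

```python
DONE = object()  # sentinel meaning "no more blocks"


class Source:
    """First activity in the chain: hands out one block per call."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)

    def process_block(self):
        return next(self._blocks, DONE)


class Pipe:
    """Unbuffered pipe: a request for a block is delegated upstream."""

    def __init__(self, upstream):
        self._upstream = upstream

    def get_block(self):
        return self._upstream.process_block()


class Upper:
    """Downstream activity: consumes one block and produces one per call."""

    def __init__(self, pipe):
        self._pipe = pipe

    def process_block(self):
        block = self._pipe.get_block()
        return DONE if block is DONE else block.upper()


def run(last_activity):
    """Outer loop: call the last activity repeatedly until it is done."""
    results = []
    while True:
        block = last_activity.process_block()
        if block is DONE:
            return results
        results.append(block)


output = run(Upper(Pipe(Source(["a", "b", "c"]))))
print(output)
```

Everything runs on the caller's thread, and each activity is entered once per block.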

11 Activity Processing (New)
- Processing of blocks will be controlled by the activity
- process() is called exactly once
- Consumes and produces blocks as necessary
- Each activity in a pipeline processes within its own thread
- Pipes receive and may buffer blocks until they are requested
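
A minimal sketch of this activity-driven model, assuming a bounded queue as the buffered pipe: each activity function is called once, runs in its own thread, and reads or writes blocks whenever it is ready. The names `Pipe`, `put_block`, `get_block`, `producer`, `doubler` and `collector` are hypothetical and chosen for this example only.

```python
import queue
import threading

END = object()  # sentinel marking the end of the stream


class Pipe:
    """Buffered pipe: holds blocks until the consumer asks for them."""

    def __init__(self, capacity=4):
        self._q = queue.Queue(maxsize=capacity)

    def put_block(self, block):
        self._q.put(block)  # blocks the producer while the buffer is full

    def get_block(self):
        return self._q.get()  # blocks the consumer while the buffer is empty


def producer(out_pipe, blocks):
    for block in blocks:
        out_pipe.put_block(block)
    out_pipe.put_block(END)


def doubler(in_pipe, out_pipe):
    while (block := in_pipe.get_block()) is not END:
        out_pipe.put_block([x * 2 for x in block])
    out_pipe.put_block(END)


def collector(in_pipe, results):
    while (block := in_pipe.get_block()) is not END:
        results.append(block)


def run_pipeline(blocks):
    p1, p2, results = Pipe(), Pipe(), []
    threads = [
        threading.Thread(target=producer, args=(p1, blocks)),
        threading.Thread(target=doubler, args=(p1, p2)),
        threading.Thread(target=collector, args=(p2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


output = run_pipeline([[1, 2], [3], [4, 5, 6]])
print(output)
```

The bounded queue is one design choice among several: it lets a fast producer run ahead by a few blocks without letting memory use grow without limit.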

12 Current Model
[Diagram: Request Processor, Activity A, Pipe, Activity B. Labels: processBlock() (called repeatedly, on both activities), getBlock, block.]

13 New Model
[Diagram: Processing Service, Activity A, Pipe, Activity B. Labels: initialise, process() (called once), putBlock, getBlock, block as result.]

Notes: Chains are synchronised in the buffered pipes.

14 Improvements to the Enactment Model
- Each activity is run in its own thread, rather than one thread per chain
- Each activity is called to process only once, rather than once per block
- Each activity sends its result to the pipe when ready, rather than waiting to be called
- The pipe now buffers, rather than delegating the request for a new block
- Performance improvements

Notes: We expect significant performance improvements and will run fuller benchmarks once the implementation is complete.

15 Mobile Code
- Allows you to ‘embed’ functionality in your workflow
- Allows dynamic inclusion of new code
- Extends the workflow functionality without calling out to a remote service
- Simple but ‘prepared’ code

[Diagram labels: Activity A, Activity B, Mobile Code.]

Notes:
- We typically need small amounts of computation per byte of input/output.
- Mobile code addresses the issue that activities are pre-configured on the server side.
- “Mobile code can be seen as another optimisation of the workflow, as you are allowing users to 'embed' their service functionality within OGSA-DAI at runtime to minimise the expensive data movement that would result if we called out to a remote service.”
- Code has to be wrapped with simple interfaces to comply with the OD framework (much simpler than building web services, but not standardised).
- A prototype exists.

16 Workflow Reconstruction
- Be able to identify patterns
- Describe workflow nodes/edges
- Reconstruct part of the workflow graph
- Performance
- Some example scenarios

Notes: This is an area in which we haven’t done much work yet, but we are interested in investigating it. We want to be able to identify patterns in the workflow graph based on some predefined rules/criteria. This requires (semantically) describing the graph nodes/edges (activities and their inputs and outputs). We can then substitute part of that graph with another that is formally equivalent (semantically identical) and gives some performance benefit. Let’s look at some example scenarios.

17 Scenario (1)
[Diagram, original workflow: sqlQuery → (RS) → RSToWebRowSet → (WRS) → WebRowSetProjection → (WRS)]
[Diagram, reconstructed workflow: sqlQuery → (RS) → resultsetProjection → RSToWebRowSet → (WRS)]

Notes: It is more efficient to project on the result set (RS) first and then convert to WebRowSet (WRS), since the conversion is the more computationally expensive step.
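
One way to picture this reconstruction is as a rewrite rule over a linear chain of activity names. The sketch below is only that: it treats the workflow as a plain list, assumes the projection can be expressed equivalently on the result set, and the function name `push_projection_before_conversion` is made up for this example.

```python
def push_projection_before_conversion(chain):
    """Rewrite RSToWebRowSet -> WebRowSetProjection into
    resultsetProjection -> RSToWebRowSet wherever the pattern occurs.

    Assumes the two forms are semantically equivalent, which is the
    open question the slides raise.
    """
    rewritten = list(chain)
    i = 0
    while i < len(rewritten) - 1:
        if (rewritten[i] == "RSToWebRowSet"
                and rewritten[i + 1] == "WebRowSetProjection"):
            rewritten[i:i + 2] = ["resultsetProjection", "RSToWebRowSet"]
        i += 1
    return rewritten


original = ["sqlQuery", "RSToWebRowSet", "WebRowSetProjection"]
reconstructed = push_projection_before_conversion(original)
print(reconstructed)
```

A real rule would also have to carry the projection's parameters across and check the input/output types of the activities involved.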

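18 Scenario (2)
[Diagram: original workflow with A feeding a single FTP activity; reconstructed workflow with A feeding a Split that sends to multiple FTP activities. Other labels: filename, file, URL.]

Notes: Explain the split activity (it effectively parallelises). The workflow is reconstructed with a split that sends to multiple FTP activities, taking advantage of the available bandwidth.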
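
The split idea can be sketched as dividing a list of files among several delivery workers that run concurrently. No network is involved here: `fake_ftp_deliver` is a placeholder for a delivery activity, and `split` and `deliver_in_parallel` are hypothetical names, not OGSA-DAI activities.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, ways):
    """Distribute items round-robin into `ways` lists."""
    return [items[i::ways] for i in range(ways)]


def fake_ftp_deliver(filenames):
    """Placeholder for an FTP delivery; reports what it would have sent."""
    return [f"sent:{name}" for name in filenames]


def deliver_in_parallel(filenames, ways=2):
    """Run one delivery worker per part and merge the reports."""
    parts = split(filenames, ways)
    with ThreadPoolExecutor(max_workers=ways) as pool:
        reports = list(pool.map(fake_ftp_deliver, parts))
    return sorted(entry for report in reports for entry in report)


delivered = deliver_in_parallel(["a.dat", "b.dat", "c.dat"], ways=2)
print(delivered)
```

Whether this helps in practice depends on the available bandwidth and on how many parallel transfers the receiving end accepts.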

19 Thoughts on Reconstruction
- Currently recommendations
  - Not automated
- How and what do we describe?
  - Scenario 1
    - Semantics of projection
    - Activities’ I/O typing
  - Scenario 2
    - Can be applied in many scenarios
    - Rules needed
- When are two workflows equivalent?
  - Models of enactment and activities
- How do we estimate/predict the cost of a workflow?
- Future work
  - Need to investigate the above

Notes: We identified these scenarios, but currently we can only recommend them to users as good perform document practices. We would like to be able to perform similar optimisations automatically in the background. This is an open question.

20 Thoughts on Parallelisation
- Parallelising the execution
  - OGSA-DAI already exploits thread parallelism
  - Multiple nodes/JVMs running activities
  - Requires serialisation of objects passing between activities
  - When and where do we serialise?
- Future work
  - Need to investigate the above

Notes: “You could also talk about the theoretical issue that OGSA-DAI could optimise the workflow. The simplest idea here would be to parallelise execution across multiple JVMs. This requires serialisation of objects passing from activity to activity. How would we know the best place to do this serialisation?” This is a difficult problem. We believe that the serialisation overhead might outweigh the performance gain from parallelisation in some scenarios. This is an open question.
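
The trade-off between serialisation overhead and parallel speed-up can be illustrated with a deliberately crude cost model. It is not taken from the talk: it assumes an ideal speed-up across workers and a single fixed serialisation cost, and the function name and parameters are invented for this example.

```python
def worth_distributing(compute_s, serialise_s, workers):
    """Toy model: distribute only if ideal parallel time plus the
    serialisation cost is lower than the single-JVM compute time."""
    distributed_s = compute_s / workers + serialise_s
    return distributed_s < compute_s


# Heavy computation, cheap serialisation: distribution pays off.
heavy = worth_distributing(compute_s=10.0, serialise_s=1.0, workers=4)

# Light computation per byte, costly serialisation: it does not.
light = worth_distributing(compute_s=1.0, serialise_s=2.0, workers=4)

print(heavy, light)
```

Since these workflows typically need little computation per byte, the second case may be the common one, which is why where to serialise matters.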

21 Summary
- Overview of OGSA-DAI workflow
  - Motivation
  - Distinctions
- Enactment optimisations
  - Improved pipeline processing
- Workflow reconstruction and parallelisation

Notes:
- Motivations (e.g.): encapsulates multiple interactions with a WS in one; moves computation closer to the data; extensible activity framework.
- Distinctions (e.g.): simple language; streaming model.
- We saw in some detail the OGSA-DAI pipeline processing and the changes we made to improve its performance: namely, activities taking control of processing (to avoid calling them once per block) and buffering in the pipes between them.
- Finally, we briefly discussed workflow reconstruction and parallelisation, areas we are interested in exploring further; most significantly, how and what to describe.

22 Questions?
www.ogsadai.org.uk
kostas@nesc.ac.uk

