Composing workflows in the environmental sciences using Web Services and Inferno
Jon Blower, Adit Santokhee, Keith Haines (Reading e-Science Centre)
Roger Peppé, Charles Forsyth (Vita Nuova Holdings Ltd)

Summary
- We have devised a system that allows data to be streamed directly from one Web Service to another
- It also allows monitoring of progress and other state information
- It uses the Inferno operating system
- Workflows can be executed in any WS-based workflow engine, e.g. Triana

Motivation
- Environmental scientists work with large datasets (~TB)
- Datasets are often remote
- Scientists would like to perform compute-intensive tasks and sophisticated 3-D visualization
- They would like to be able to assemble workflows from remote services
- BUT large datasets raise issues:
  - We need an efficient way of moving these datasets between services
  - Services are long-running, so we would like to monitor progress
- Web Services get us only part of the way there

Workflows
- A typical workflow has the form “extract -> process -> visualize”
  - i.e. a small number of services in a linear workflow or pipeline
- Each service is likely to be data-intensive and long-running
- We would like to use Web Service-based systems; however:
  - The data should not have to pass through the workflow engine
  - Large datasets should not be placed in SOAP messages, either in the XML or as attachments
  - We could pass around pointers to the data (e.g. URLs), but this would involve writing output data to disk in some temporary cache
- Ideally we want to stream data between services
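Python generators can illustrate the difference between streaming and writing to a temporary cache within a single process. This is only an illustration of the idea, not the IGS mechanism. The stage names extract, process and visualize are hypothetical stand-ins for the services, and the data is synthetic. Each stage handles chunks as they arrive, so no stage waits for the whole dataset and nothing is written to disk.

```python
from typing import Iterable, Iterator

# Hypothetical stand-ins for the "extract -> process -> visualize" services.
# Each stage works on one chunk at a time, so no stage needs the whole
# dataset in memory and no intermediate file is written.

def extract(n_chunks: int, chunk_size: int) -> Iterator[list[float]]:
    """Yield a synthetic dataset chunk by chunk."""
    for i in range(n_chunks):
        start = i * chunk_size
        yield [float(v) for v in range(start, start + chunk_size)]


def process(chunks: Iterable[list[float]]) -> Iterator[list[float]]:
    """Transform each chunk as soon as it arrives."""
    for chunk in chunks:
        yield [2.0 * v for v in chunk]


def visualize(chunks: Iterable[list[float]]) -> float:
    """Consume the stream; a running sum stands in for rendering."""
    total = 0.0
    for chunk in chunks:
        total += sum(chunk)
    return total


# Stages are chained like a pipeline; data flows as the final stage pulls it.
result = visualize(process(extract(n_chunks=4, chunk_size=5)))
print(result)  # 380.0
```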

Data movement
[Figure: a Client (or WF engine) and Services A, B and C. Legend: path of SOAP messages (between the client and each service); desired path of data (directly between the services).]

Data streaming and Inferno
- If all the services in the workflow were running as Unix filters on the same machine, we could write something like:
    extract | process | render
- But we want to pipe the data across a network
- Plus we'd like to be able to monitor progress
- Inferno is an operating system designed for distributed computing
  - It provides an easy way of doing this streaming in a distributed system
  - The alphabet shell
  - But it has a fairly steep learning curve for new users
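In the local case, Python's standard subprocess module can reproduce the `extract | process | render` pipeline by connecting each process's stdout to the next one's stdin. The three stage scripts are hypothetical placeholders. This sketch covers only the single-machine case. It does not attempt the network streaming or progress monitoring that motivate the use of Inferno.

```python
import subprocess
import sys

# Hypothetical stage programs, each run as a separate Python process.
EXTRACT = "for i in range(5): print(i)"
PROCESS = "import sys\nfor line in sys.stdin: print(int(line) * 10)"
RENDER = "import sys\nprint(sum(int(line) for line in sys.stdin))"


def run_pipeline(scripts: list[str]) -> str:
    """Run scripts like `a | b | c`, wiring each stdout to the next stdin."""
    procs: list[subprocess.Popen] = []
    upstream = None
    for script in scripts:
        proc = subprocess.Popen(
            [sys.executable, "-c", script],
            stdin=upstream if upstream is not None else subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            text=True,
        )
        if upstream is not None:
            upstream.close()  # the child now holds its own copy of the pipe
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()  # read the final stage's output
    for proc in procs[:-1]:
        proc.wait()
    return output


print(run_pipeline([EXTRACT, PROCESS, RENDER]))  # 100
```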

Brief intro to Inferno
- Inferno is built from the ground up for distributed computing
- It is extremely lightweight (~1 MB RAM), so it can run as an emulated application on multiple platforms (Linux, Windows, Solaris...)
  - Hence it is a powerful base for Grid middleware
- Everything in Inferno is represented as a file or set of files
  - cf. /dev/mouse in Unix
  - So to create a distributed system, we just have to know how to share “files”
- The Styx protocol is used for all file manipulations
  - Very lightweight (only 13 commands)
  - Versions exist for C, Java and Python (and it is easy to implement for more)
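A minimal Python encoder for two Styx messages gives a sense of how small the protocol is. It assumes that Styx in Inferno's Fourth Edition uses the same wire format as Plan 9's 9P2000. Under that format, integers are little-endian and each message is framed as size[4] type[1] tag[2] followed by type-specific fields. Strings are a 2-byte length followed by UTF-8 bytes. The 13 T-message types below match the slide's count of 13 commands. The function names are our own and do not come from any Styx library.

```python
import struct

# T-message type codes from the 9P2000 wire format (assumed identical to Styx).
T_MESSAGES = {
    "Tversion": 100, "Tauth": 102, "Tattach": 104, "Tflush": 108,
    "Twalk": 110, "Topen": 112, "Tcreate": 114, "Tread": 116,
    "Twrite": 118, "Tclunk": 120, "Tremove": 122, "Tstat": 124,
    "Twstat": 126,
}
NOTAG = 0xFFFF  # tag used for Tversion


def _frame(msg_type: int, tag: int, body: bytes) -> bytes:
    """Prepend type[1] tag[2], then the 4-byte total size (which counts itself)."""
    payload = struct.pack("<BH", msg_type, tag) + body
    return struct.pack("<I", 4 + len(payload)) + payload


def _string(s: str) -> bytes:
    """Encode a string as a 2-byte length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def tversion(msize: int, version: str = "9P2000") -> bytes:
    """Tversion: negotiate maximum message size and protocol version."""
    return _frame(T_MESSAGES["Tversion"], NOTAG,
                  struct.pack("<I", msize) + _string(version))


def tread(tag: int, fid: int, offset: int, count: int) -> bytes:
    """Tread: read `count` bytes at `offset` from the file open on `fid`."""
    return _frame(T_MESSAGES["Tread"], tag, struct.pack("<IQI", fid, offset, count))


print(len(tversion(8192)), len(tread(tag=1, fid=2, offset=0, count=8192)))  # 19 23
```

A client reading a file such as status would typically send Tversion, Tattach, Twalk (to the file), Topen and then Tread. Each request is answered by the corresponding R-message.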

Our solution
- We wanted a system that uses the attractive features of Inferno in a Web Services environment
  - It hides complexity and reduces the learning curve
- We wanted to implement direct streaming of data and progress monitoring
- We wanted to make it easy to wrap binary executables with little or no modification
- Our solution is to create an “Inferno Grid Service” (IGS) that wraps the binary
  - It shares many features with an OGSI-type service
- We then create a thin Web Service wrapper around the IGS

An Inferno Grid Service
/                 root of the service, associated with a given URL (e.g. styx://myserver:9875/)
    clone         read the clone file to create a new service instance
    0/
        ctl
        params
        progress
        status
        endpoint/
            in
            out   e.g. styx://myserver:9875/0/endpoint/out
        block/
            progress
            status
    1/
    2/
The files in each instance directory maintain state data.
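A few lines of Python can model the IGS file tree. The model shows how reading clone allocates a new instance directory and how instance files map onto URLs. It is a hypothetical in-memory stand-in, and the class and method names are ours. The real IGS serves these files over Styx. The slides do not say how instance numbers are allocated, so sequential numbering is an assumption.

```python
from dataclasses import dataclass, field

# Files present in every instance directory, as listed on the slide.
INSTANCE_PATHS = (
    "ctl", "params", "progress", "status",
    "endpoint/in", "endpoint/out",
    "block/progress", "block/status",
)


@dataclass
class Instance:
    number: int
    # Hypothetical: each file's contents held as a string in memory.
    files: dict[str, str] = field(
        default_factory=lambda: {p: "" for p in INSTANCE_PATHS})


class IGSNamespace:
    """Hypothetical in-memory stand-in for the file tree an IGS serves."""

    def __init__(self, root_url: str) -> None:
        self.root_url = root_url.rstrip("/") + "/"
        self.instances: dict[int, Instance] = {}

    def read_clone(self) -> str:
        """Reading 'clone' allocates the next instance number and returns it."""
        n = len(self.instances)
        self.instances[n] = Instance(n)
        return str(n)

    def url(self, n: int, path: str = "") -> str:
        """URL of an instance directory, or of a file inside it."""
        return f"{self.root_url}{n}/{path}"

    def paths(self, n: int) -> list[str]:
        return sorted(self.instances[n].files)


igs = IGSNamespace("styx://myserver:9875/")
for _ in range(3):
    igs.read_clone()                 # creates 0/, 1/ and 2/
print(igs.url(0, "endpoint/out"))    # styx://myserver:9875/0/endpoint/out
```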

Anatomy of a service
[Figure: three layers: Web Service interface, Inferno Grid Service, binary executable.]
1. A SOAP message arrives containing the URL of the IGS that is providing the input data, e.g. styx://myserver:9875/1
2. A new instance of the Inferno Grid Service is created
3. The executable is started; its stdin and stdout are redirected to the endpoint
4. A SOAP message is sent back to the client, containing the URL of the new IGS, e.g. styx://server2:9875/4
5. As the executable progresses, it writes state data (e.g. progress) to the IGS instance
6. Clients read state data from the IGS instance
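Steps 3, 5 and 6 can be sketched in Python with subprocess and threads. Everything here is hypothetical: the toy executable, the ServiceInstance class, and the convention that the executable reports progress as "progress <n>" lines on stderr. That convention is our assumption; the slides do not specify it. The sketch does not model the SOAP steps (1, 2 and 4) or the Styx serving of the files, and a list stands in for endpoint/out.

```python
import subprocess
import sys
import threading

# Hypothetical executable: squares numbers read on stdin and (by an assumed
# convention) reports progress on stderr as "progress <n>" lines.
EXE = (
    "import sys\n"
    "for i, line in enumerate(sys.stdin, 1):\n"
    "    print(int(line) ** 2, flush=True)\n"
    "    print(f'progress {i}', file=sys.stderr, flush=True)\n"
)


class ServiceInstance:
    """Hypothetical stand-in for one IGS instance wrapping an executable."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.state = {"status": "created", "progress": "0"}
        self.out: list[str] = []          # stands in for endpoint/out
        self._lock = threading.Lock()

    def _set(self, key: str, value: str) -> None:
        with self._lock:
            self.state[key] = value

    def read_state(self, key: str) -> str:
        """Step 6: a client reads state data from the instance."""
        with self._lock:
            return self.state[key]

    def run(self, input_lines: list[str]) -> None:
        # Step 3: start the executable with stdin/stdout redirected.
        proc = subprocess.Popen(
            [sys.executable, "-c", EXE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            stderr=subprocess.PIPE, text=True,
        )
        self._set("status", "running")

        def feed() -> None:               # stands in for endpoint/in
            for line in input_lines:
                proc.stdin.write(f"{line}\n")
                proc.stdin.flush()
            proc.stdin.close()

        def watch_progress() -> None:     # Step 5: executable reports progress
            for line in proc.stderr:
                if line.startswith("progress "):
                    self._set("progress", line.split()[1])

        threads = [threading.Thread(target=feed),
                   threading.Thread(target=watch_progress)]
        for t in threads:
            t.start()
        for line in proc.stdout:          # collect the executable's output
            self.out.append(line.strip())
        for t in threads:
            t.join()
        proc.wait()
        self._set("status", "finished" if proc.returncode == 0 else "failed")


inst = ServiceInstance("styx://server2:9875/4")
inst.run(["1", "2", "3"])
print(inst.out, inst.read_state("progress"), inst.read_state("status"))
```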

Messaging in Inferno
- Our Inferno Grid Service is also a messaging system
  - Similar to a publish/subscribe system
  - Very easy to implement and use
- It requires no firewall holes to be opened on the client!
  - Clients (message recipients) run no server processes
- Clients read from blocking files on the IGS
  - The first read from a client yields the data immediately
  - Subsequent reads block indefinitely until the data change
  - Then the client gets the reply
- Inferno keeps track of multiple clients effortlessly
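The blocking-read behaviour can be sketched with a threading.Condition and a version counter. The BlockingFile and Reader names are hypothetical. Each Reader plays the role of one client's open file and remembers which version it has already seen, which is how one file can serve many clients. The optional timeout exists only to make this sketch testable; the slide describes reads that block indefinitely.

```python
import threading
from typing import Optional, Tuple


class BlockingFile:
    """Hypothetical model of a blocking state file such as block/progress."""

    def __init__(self, initial: str = "") -> None:
        self._cond = threading.Condition()
        self._data = initial
        self._version = 0

    def write(self, data: str) -> None:
        """Publish new data; only an actual change wakes blocked readers."""
        with self._cond:
            if data != self._data:
                self._data = data
                self._version += 1
                self._cond.notify_all()

    def open(self) -> "Reader":
        return Reader(self)

    def _read(self, seen: Optional[int],
              timeout: Optional[float]) -> Tuple[Optional[int], Optional[str]]:
        with self._cond:
            if seen is not None:
                # Not the first read: wait until the version moves past `seen`.
                if not self._cond.wait_for(lambda: self._version > seen, timeout):
                    return seen, None     # timed out (sketch convenience)
            return self._version, self._data


class Reader:
    """One client's handle; remembers the last version it was given."""

    def __init__(self, f: BlockingFile) -> None:
        self._file = f
        self._seen: Optional[int] = None

    def read(self, timeout: Optional[float] = None) -> Optional[str]:
        self._seen, data = self._file._read(self._seen, timeout)
        return data


progress = BlockingFile("0%")
client = progress.open()
print(client.read())    # first read returns at once: 0%
threading.Timer(0.1, progress.write, args=("50%",)).start()
print(client.read())    # blocks until the write, then: 50%
```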

Application: correlation analysis
- Correlation analysis is an important tool for understanding the circulation of the oceans and atmosphere
  - It is particularly important for the science of data assimilation
- Our application calculates the correlation between timeseries of given quantities
  - These could be salinity at a certain depth, temperature on a density surface, etc.
- It reveals information about the ocean circulation, the characteristic scales of physical processes, etc.
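The slides do not say which correlation estimator the application uses. The pure-Python sketch below assumes the ordinary Pearson product-moment coefficient. correlation_map is a hypothetical helper that correlates a reference timeseries with the timeseries at each grid point, one common way of producing a correlation map. A series with zero variance would raise ZeroDivisionError.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length timeseries."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two timeseries of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_map(reference: Sequence[float],
                    field: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Correlate a reference timeseries with the series at each grid point."""
    return {point: pearson(reference, series) for point, series in field.items()}


ref = [1.0, 2.0, 3.0, 4.0]
grid = {"A": [2.0, 4.0, 6.0, 8.0], "B": [4.0, 3.0, 2.0, 1.0]}
print(correlation_map(ref, grid))  # {'A': 1.0, 'B': -1.0}
```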

Demonstration

Future work
- Create a toolkit for environmental scientists, based on this architecture
- Integrate with CDAT (Climate Data Analysis Tools)
  - A well-known toolkit, but not good at handling large datasets
- Integrate with Inferno Grid
  - Condor-like technology for batch processing
  - Demo in the ReSC booth!
- Try out computational steering
  - Should be relatively easy: clients write new parameters to the IGS and the executable picks them up
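Computational steering along these lines can be sketched as a versioned parameter store that the job re-reads before each step. SteerableParams and simulate are hypothetical names. In the proposed system, the client would write to the IGS params file and the executable would read it, and the slides list this only as future work. The job is written as a generator so the example can interleave a steering write deterministically.

```python
import threading
from typing import Dict, Iterator, Tuple


class SteerableParams:
    """Hypothetical stand-in for an IGS instance's params file."""

    def __init__(self, **initial: float) -> None:
        self._lock = threading.Lock()     # writer and job may run concurrently
        self._values: Dict[str, float] = dict(initial)
        self._version = 0

    def write(self, **updates: float) -> None:
        """Called by a steering client to change parameters."""
        with self._lock:
            self._values.update(updates)
            self._version += 1

    def snapshot(self) -> Tuple[int, Dict[str, float]]:
        with self._lock:
            return self._version, dict(self._values)


def simulate(params: SteerableParams, steps: int) -> Iterator[Tuple[int, float]]:
    """Hypothetical long-running job: checks for new parameters every step."""
    version, current = params.snapshot()
    for step in range(steps):
        v, values = params.snapshot()
        if v != version:                  # a client has written new parameters
            version, current = v, values
        yield step, current["rate"]       # stands in for one unit of real work


params = SteerableParams(rate=0.1)
seen = []
for step, rate in simulate(params, 5):
    seen.append(rate)
    if step == 1:
        params.write(rate=0.5)            # steering client changes a parameter
print(seen)  # [0.1, 0.1, 0.5, 0.5, 0.5]
```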
