Published by Jason Singleton. Modified over 8 years ago.
1
Composing workflows in the environmental sciences using Web Services and Inferno
  Jon Blower, Adit Santokhee, Keith Haines
    Reading e-Science Centre
  Roger Peppé, Charles Forsyth
    Vita Nuova Holdings Ltd
2
Summary
  We have devised a system that allows data to be streamed directly from Web Service to Web Service
  Also allows monitoring of progress and other state information
  Uses the Inferno operating system
  Workflows can be executed in any WS-based workflow engine, e.g. Triana
3
Motivation
  Environmental scientists work with large datasets (~TB)
  Datasets are often remote
  Would like to perform compute-intensive tasks and sophisticated 3-D visualization
  Would like to be able to assemble workflows based on remote services
  BUT large datasets raise issues
    Need to find an efficient way of moving these datasets between services
    Services are long-running, hence would like to monitor progress
  Web Services get us only part of the way there
4
Workflows
  Typical workflow will be of the form “extract -> process -> visualize”
    i.e. a small number of services in a linear workflow or pipeline
  Each service is likely to be data-intensive and long-running
  Would like to use Web Service based systems, however:
    The data should not have to pass through the workflow engine
    Large datasets should not be placed in SOAP messages, either in the XML or as attachments
    Could pass around pointers to the data (e.g. URLs) but this would involve writing output data to disk in some temporary cache
  Ideally we want to stream data between services
5
Data movement
  [Diagram: Client (or WF engine), Service A, Service B, Service C. Arrows mark the path of SOAP messages and the desired path of data.]
6
Data streaming and Inferno
  If all services in the workflow were running as Unix filters on the same machine, we could write something like:
    extract | process | render
  But we want to pipe the data across a network
  Plus we’d like to be able to monitor progress
  Inferno is an operating system that’s designed for distributed computing
  It provides an easy way of doing this streaming in a distributed system
    The alphabet shell
  But fairly steep learning curve for new users
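As a rough single-machine analogue of extract | process | render, the Python sketch below chains three generator stages so that each chunk flows through the pipeline without the whole dataset being held in memory at once. The stage bodies, the chunk size and the name run_pipeline are hypothetical stand-ins, not part of the system described in the talk, and nothing here crosses a network or uses Inferno.

```python
from typing import Iterable, Iterator, List


def extract(n_chunks: int, chunk_size: int = 4) -> Iterator[List[float]]:
    """Hypothetical data source: yields the dataset one chunk at a time."""
    for i in range(n_chunks):
        start = i * chunk_size
        yield [float(v) for v in range(start, start + chunk_size)]


def process(chunks: Iterable[List[float]]) -> Iterator[float]:
    """Hypothetical processing stage: reduces each chunk to its mean."""
    for chunk in chunks:
        yield sum(chunk) / len(chunk)


def render(values: Iterable[float]) -> Iterator[str]:
    """Hypothetical rendering stage: formats each value as text."""
    for value in values:
        yield f"{value:.1f}"


def run_pipeline(n_chunks: int) -> List[str]:
    """Chain the stages; a chunk is only produced when the next stage asks."""
    return list(render(process(extract(n_chunks))))
```

Calling run_pipeline(3) pulls three chunks through all three stages in turn. Like a Unix pipe, no stage needs to see the complete dataset before the next one starts.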
7
Brief intro to Inferno
  Inferno is built from the ground up for distributed computing
  Extremely lightweight (~1 MB RAM), so can run as an emulated application on multiple platforms (Linux, Windows, Solaris...)
    Hence it is a powerful base for Grid middleware
  Everything in Inferno is represented as a file or set of files
    cf. /dev/mouse in Unix
  So to create a distributed system, just have to know how to share “files”
  The Styx protocol is used for all file manipulations
    Very lightweight (only 13 commands)
    Versions exist for C, Java, Python (easy to implement for more)
8
Our solution
  We wanted a system that uses the attractive features of Inferno in a Web Services environment
    Hides complexity, reduces learning curve
  We want to implement direct streaming of data and progress monitoring
  We'd like to make it easy to wrap binary executables with little or no modification
  Our solution is to create an “Inferno Grid Service” (IGS) that wraps the binary
    Shares many features with an OGSI-type service
  We then create a thin Web Service wrapper around the IGS
9
An Inferno Grid Service
  /                 Root of the service, associated with a given URL (e.g. styx://myserver:9875/)
    clone           Read the clone file to create a new service instance
    0/, 1/, 2/      Service instances
      ctl
      params
      progress
      status
      endpoint/
        in
        out
      block/
        progress
        status
  Example file URL: styx://myserver:9875/0/endpoint/out
  These files maintain state data
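To make the addressing scheme on this slide concrete, here is a minimal Python sketch that builds the URL of a file inside a service instance from the service root, the instance number and the path within the instance. The helper name igs_file_url is hypothetical. It only does string handling and does not speak the Styx protocol.

```python
def igs_file_url(root, instance, *path):
    """Join a service root URL, an instance number and a file path.

    Hypothetical helper for illustration; it performs no network access.
    """
    parts = [root.rstrip("/"), str(instance)] + [p.strip("/") for p in path]
    return "/".join(parts)


# The example URL from the slide: the output endpoint of instance 0.
OUT_URL = igs_file_url("styx://myserver:9875/", 0, "endpoint", "out")
```

The same helper would address the other files in the layout, for example igs_file_url(root, 0, "block", "progress").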
10
Anatomy of a service
  [Diagram components: Web Service interface, Inferno Grid Service, binary executable]
  1. SOAP message arrives containing URL of IGS that's providing the input data, e.g. styx://myserver:9875/1
  2. New instance of Inferno Grid Service is created
  3. Executable is started. stdin and stdout are redirected to the endpoint
  4. SOAP message is sent back to client, containing URL of new IGS, e.g. styx://server2:9875/4
  5. As executable progresses, it writes state data (e.g. progress) to the IGS instance
  6. Clients read state data from the IGS instance
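Step 3 (starting the executable with stdin and stdout redirected to the endpoint) can be sketched in Python as follows. This is an illustration under loud assumptions: a local directory stands in for one IGS instance, whereas the real service exposes these files over Styx; the wrapper, not the executable, writes the status file; and the function names and the stand-in "executable" are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_wrapped(cmd, instance_dir):
    """Run cmd with stdin/stdout redirected to endpoint/in and endpoint/out.

    instance_dir is a local directory standing in for one IGS instance
    (an assumption: the real service exposes these files over Styx).
    """
    endpoint = instance_dir / "endpoint"
    status = instance_dir / "status"
    status.write_text("running")
    with open(endpoint / "in", "rb") as src, open(endpoint / "out", "wb") as dst:
        result = subprocess.run(cmd, stdin=src, stdout=dst)
    status.write_text("finished" if result.returncode == 0 else "failed")
    return result.returncode


def demo():
    """Wrap a trivial stand-in 'executable' that upper-cases its input."""
    with tempfile.TemporaryDirectory() as tmp:
        instance = Path(tmp) / "0"
        (instance / "endpoint").mkdir(parents=True)
        (instance / "endpoint" / "in").write_text("salinity")
        cmd = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read().upper())"]
        run_wrapped(cmd, instance)
        return ((instance / "endpoint" / "out").read_text(),
                (instance / "status").read_text())
```

The wrapped program needs no modification: it reads stdin and writes stdout as usual, which is the property the talk relies on for wrapping existing binaries.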
11
Messaging in Inferno
  Our Inferno Grid Service is also a messaging system
    Similar to publish/subscribe system
    Very easy to implement and use
  Requires no firewall holes to be open on the client!
    Clients (message recipients) run no server processes
  Clients read from blocking files on the IGS
    The first read from a client yields the data immediately
    Subsequent reads block indefinitely until the data change
    Then the client gets the reply
  Inferno keeps track of multiple clients effortlessly
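The blocking-file behaviour (first read returns immediately, later reads wait until the data change) can be imitated in-process with a condition variable. The Python sketch below is only an analogy: the class BlockingFile, its version counter and the demo function are hypothetical, and the real mechanism is a file served over Styx rather than a Python object.

```python
import threading


class BlockingFile:
    """In-process stand-in for a blocking state file (hypothetical)."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._cond = threading.Condition()

    def write(self, value):
        """Update the value and wake waiting readers if it changed."""
        with self._cond:
            if value != self._value:
                self._value = value
                self._version += 1
                self._cond.notify_all()

    def read(self, seen=None, timeout=None):
        """Return (value, version).

        With seen=None (a client's first read) this returns at once.
        Otherwise it waits until the version differs from `seen`; if the
        timeout expires first, the unchanged version is returned.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: seen is None or self._version != seen, timeout)
            return self._value, self._version


def demo():
    progress = BlockingFile("0%")
    first, seen = progress.read()            # first read: immediate
    updates = []

    def client():
        # Subsequent read: waits until the value differs from what was seen.
        updates.append(progress.read(seen)[0])

    reader = threading.Thread(target=client)
    reader.start()
    progress.write("50%")                    # wakes the waiting client
    reader.join(timeout=5)
    return first, updates
```

Each client only tracks the last version it saw, so any number of clients can wait on the same file without the writer knowing about them.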
12
Application: correlation analysis
  Correlation analysis is an important tool in understanding the circulation of the oceans and atmosphere
    Particularly important for the science of data assimilation
  Our app calculates the correlation between timeseries of given quantities
    Could be salinity at a certain depth, temperature on a density surface, etc
  Reveals information about the ocean circulation, characteristic scales of physical processes, etc
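The talk does not say which correlation measure the application uses. As an assumption for illustration only, the Python sketch below computes the Pearson correlation coefficient between two equally sampled timeseries; the function name pearson and the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 or -1 indicates that the two quantities vary together or in opposition over the sampled period; a value near 0 indicates no linear relationship.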
13
Demonstration
14
Future work
  Create a toolkit for environmental scientists, based on this architecture
  Integrate with CDAT (Climate Data Analysis Tools)
    Well-known toolkit, but not good at handling large datasets
  Integrate with Inferno Grid
    Condor-like technology for batch processing
    Demo in the ReSC booth!
  Try out computational steering
    Should be relatively easy: clients write new parameters to the IGS and the executable picks them up