Composing workflows in the environmental sciences using Web Services and Inferno Jon Blower, Adit Santokhee, Keith Haines Reading e-Science Centre Roger.

Presentation transcript:

Composing workflows in the environmental sciences using Web Services and Inferno
Jon Blower, Adit Santokhee, Keith Haines (Reading e-Science Centre)
Roger Peppé, Charles Forsyth (Vita Nuova Holdings Ltd)

Summary
• We have devised a system that allows data to be streamed directly from Web Service to Web Service
• Also allows monitoring of progress and other state information
• Uses the Inferno operating system
• Workflows can be executed in any WS-based workflow engine, e.g. Triana

Motivation
• Environmental scientists work with large datasets (~TB)
• Datasets are often remote
• Would like to perform compute-intensive tasks and sophisticated 3-D visualization
• Would like to be able to assemble workflows based on remote services
• BUT large datasets raise issues
• Need to find an efficient way of moving these datasets between services
• Services are long-running, hence would like to monitor progress
• Web Services get us only part of the way there

Workflows
• A typical workflow will be of the form “extract -> process -> visualize”
• i.e. a small number of services in a linear workflow or pipeline
• Each service is likely to be data-intensive and long-running
• Would like to use Web Service based systems, however:
• The data should not have to pass through the workflow engine
• Large datasets should not be placed in SOAP messages, either in the XML or as attachments
• Could pass around pointers to the data (e.g. URLs), but this would involve writing output data to disk in some temporary cache
• Ideally we want to stream data between services

Data movement
[Diagram: a client (or workflow engine) and Services A, B and C, with the path of SOAP messages and the desired path of data marked.]

Data streaming and Inferno
• If all services in the workflow were running as Unix filters on the same machine, we could write something like:
  extract | process | render
• But we want to pipe the data across a network
• Plus we’d like to be able to monitor progress
• Inferno is an operating system that’s designed for distributed computing
• It provides an easy way of doing this streaming in a distributed system
• The alphabet shell
• But it has a fairly steep learning curve for new users
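The pipeline idea on this slide can be illustrated in-process. The following is a minimal Python sketch, not Inferno or the alphabet shell: the three stage functions, the chunk sizes and the `progress` dictionary are all hypothetical names chosen for illustration. It shows only that each stage can consume data chunk by chunk as the previous stage produces it, so the full dataset never has to be held in one place, and that a progress counter can be updated along the way.

```python
from typing import Dict, Iterable, Iterator, List

def extract(n_chunks: int, chunk_size: int = 4) -> Iterator[List[float]]:
    """Hypothetical source stage: yields the dataset one chunk at a time."""
    for i in range(n_chunks):
        start = i * chunk_size
        yield [float(v) for v in range(start, start + chunk_size)]

def process(chunks: Iterable[List[float]]) -> Iterator[float]:
    """Hypothetical middle stage: reduces each chunk to its mean as it arrives."""
    for chunk in chunks:
        yield sum(chunk) / len(chunk)

def render(values: Iterable[float], progress: Dict[str, int]) -> List[str]:
    """Hypothetical sink stage: formats each value and records progress."""
    lines = []
    for value in values:
        lines.append(f"{value:.1f}")
        progress["done"] += 1  # state that a monitoring client could poll
    return lines

# Analogue of "extract | process | render": stages are chained lazily,
# so a chunk flows through all three stages before the next is produced.
progress = {"done": 0}
output = render(process(extract(n_chunks=3)), progress)
```

Generators stand in for pipes here because both pull data through the chain on demand; moving the same pattern across a network is the part the slides delegate to Inferno.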

Brief intro to Inferno
• Inferno is built from the ground up for distributed computing
• Extremely lightweight (~1 MB RAM), so it can run as an emulated application on multiple platforms (Linux, Windows, Solaris...)
• Hence it is a powerful base for Grid middleware
• Everything in Inferno is represented as a file or set of files
• cf. /dev/mouse in Unix
• So to create a distributed system, you just have to know how to share “files”
• The Styx protocol is used for all file manipulations
• Very lightweight (only 13 commands)
• Implementations exist for C, Java and Python (and it is easy to implement in other languages)

Our solution
• We wanted a system that uses the attractive features of Inferno in a Web Services environment
• Hides complexity, reduces learning curve
• We want to implement direct streaming of data and progress monitoring
• We'd like to make it easy to wrap binary executables with little or no modification
• Our solution is to create an “Inferno Grid Service” (IGS) that wraps the binary
• Shares many features with an OGSI-type service
• We then create a thin Web Service wrapper around the IGS

An Inferno Grid Service
[Diagram: file tree of the service]
/            Root of the service, associated with a given URL (e.g. styx://myserver:9875/)
  clone      Read the clone file to create a new service instance
  0/
  1/
  2/
    ctl
    params
    progress
    status
    endpoint/
      in
      out
    block/
      progress
      status
Example file URL: styx://myserver:9875/0/endpoint/out
Diagram label: “These files maintain state data”
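To make the namespace on this slide concrete, here is a small Python sketch that models it in memory. It is an illustration only: the class and method names are hypothetical, the per-instance file list is one reading of the flattened diagram, and the assumption that reading `clone` yields the identifier of the new instance is borrowed from the usual clone-file convention rather than stated on the slide. No Styx traffic is involved.

```python
# Files assumed to exist in every service instance (a reading of the diagram).
INSTANCE_FILES = (
    "ctl", "params", "progress", "status",
    "endpoint/in", "endpoint/out",
    "block/progress", "block/status",
)

class ServiceNamespace:
    """In-memory stand-in for the file tree served by one Inferno Grid Service."""

    def __init__(self, root_url: str) -> None:
        self.root_url = root_url.rstrip("/")
        self.instances = {}  # instance id -> {file name: contents}

    def read_clone(self) -> str:
        """Create a new instance directory and return its id (assumed behaviour)."""
        instance_id = str(len(self.instances))
        self.instances[instance_id] = {name: b"" for name in INSTANCE_FILES}
        return instance_id

    def url(self, instance_id: str, name: str) -> str:
        """Build the URL of one file inside an existing instance."""
        if name not in self.instances[instance_id]:
            raise KeyError(f"no such file in instance {instance_id}: {name}")
        return f"{self.root_url}/{instance_id}/{name}"

service = ServiceNamespace("styx://myserver:9875/")
first = service.read_clone()
output_url = service.url(first, "endpoint/out")
```

With the root URL from the slide, the first instance's output endpoint resolves to the same example URL the diagram shows.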

Anatomy of a service
[Diagram: a Web Service interface around an Inferno Grid Service, which wraps a binary executable]
1. A SOAP message arrives containing the URL of the IGS that's providing the input data, e.g. styx://myserver:9875/1
2. A new instance of the Inferno Grid Service is created
3. The executable is started; stdin and stdout are redirected to the endpoint
4. A SOAP message is sent back to the client, containing the URL of the new IGS, e.g. styx://server2:9875/4
5. As the executable progresses, it writes state data (e.g. progress) to the IGS instance
6. Clients read state data from the IGS instance
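The idea of wrapping an unmodified executable by redirecting its standard input and output (step 3), while recording state that clients can later read (steps 5 and 6), can be sketched on a single machine in Python. Everything here is an assumption made for illustration: `run_wrapped` and the `state` dictionary are hypothetical, the "binary" is just a child Python process, and the data is buffered rather than streamed. No SOAP or Styx is involved.

```python
import subprocess
import sys

def run_wrapped(command, input_bytes, state):
    """Run an unmodified executable, feeding its stdin and capturing its stdout.

    `state` is a plain dict standing in for the state files of a service
    instance (an assumption made for this sketch).
    """
    state["status"] = "running"
    try:
        result = subprocess.run(
            command, input=input_bytes, stdout=subprocess.PIPE, check=True
        )
    except subprocess.CalledProcessError:
        state["status"] = "failed"
        raise
    state["status"] = "finished"
    return result.stdout

# Hypothetical "binary": a child Python process that upper-cases its input.
command = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
state = {"status": "created"}
output = run_wrapped(command, b"salinity", state)
```

The executable itself needs no changes: it only ever sees ordinary standard input and output, which is what makes this style of wrapping attractive for legacy codes.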

Messaging in Inferno
• Our Inferno Grid Service is also a messaging system
• Similar to a publish/subscribe system
• Very easy to implement and use
• Requires no firewall holes to be open on the client!
• Clients (message recipients) run no server processes
• Clients read from blocking files on the IGS
• The first read from a client yields the data immediately
• Subsequent reads block indefinitely until the data change
• Then the client gets the reply
• Inferno keeps track of multiple clients effortlessly
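The blocking-read behaviour described on this slide can be imitated with a condition variable. The Python sketch below is an in-memory analogy, not the IGS implementation: `BlockingFile` and its methods are hypothetical, and the client passes in the last version it saw, whereas the slide implies the server tracks each client itself. The `timeout` argument is added only so that the example cannot hang.

```python
import threading

class BlockingFile:
    """In-memory stand-in for one blocking state file."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._cond = threading.Condition()

    def write(self, value):
        """Update the value and wake all waiting readers if it changed."""
        with self._cond:
            if value != self._value:
                self._value = value
                self._version += 1
                self._cond.notify_all()

    def read(self, last_seen=None, timeout=None):
        """Return (version, value).

        A first read (last_seen=None) returns immediately. A later read
        blocks until the version differs from last_seen or the timeout expires.
        """
        with self._cond:
            if last_seen is not None:
                self._cond.wait_for(lambda: self._version != last_seen, timeout)
            return self._version, self._value

progress = BlockingFile("0%")
version, value = progress.read()  # first read: returns at once

# Simulate the executable reporting progress shortly afterwards.
threading.Timer(0.05, progress.write, args=("50%",)).start()
version2, value2 = progress.read(last_seen=version, timeout=5)  # blocks until the write
```

Because the client only ever issues reads, it needs no listening socket of its own, which matches the slide's point about not opening firewall holes on the client.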

Application: correlation analysis
• Correlation analysis is an important tool in understanding the circulation of the oceans and atmosphere
• Particularly important for the science of data assimilation
• Our app calculates the correlation between time series of given quantities
• Could be salinity at a certain depth, temperature on a density surface, etc.
• Reveals information about the ocean circulation, characteristic scales of physical processes, etc.
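The slides do not say which correlation measure the application uses. As a point of reference, the sketch below assumes the standard Pearson coefficient at zero lag between two equal-length time series; the function name and the toy input values are hypothetical and are not data from the talk.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Toy values for illustration only.
series_a = [1.0, 2.0, 3.0, 4.0]
series_b = [2.0, 4.0, 6.0, 8.0]
r_same = pearson(series_a, series_b)            # perfectly correlated
r_opposite = pearson(series_a, series_b[::-1])  # perfectly anti-correlated
```

In a gridded ocean dataset the same calculation would be repeated for the time series at every grid point, which is what makes the task data-intensive.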

Demonstration

Future work
• Create a toolkit for environmental scientists, based on this architecture
• Integrate with CDAT (Climate Data Analysis Tools)
• Well-known toolkit, but not good at handling large datasets
• Integrate with Inferno Grid
• Condor-like technology for batch processing
• Demo in the ReSC booth!
• Try out computational steering
• Should be relatively easy: clients write new parameters to the IGS and the executable picks them up