USC Viterbi School of Engineering Scientific Workflows and Systems Ewa Deelman.



Outline
Scientific workflows
Business workflows
Different workflow systems
–Taverna
–Kepler
–Triana
–Askalon

Applications today
Complex
–Involve many computational steps
–Require many (possibly diverse) resources
Composed of individual application components
–Components written by different individuals
–Components require and generate large amounts of data
–Components written in different languages
Reuse of individual intermediate data products
Need to keep track of how the data was produced

Workflow Instance
[Figure: Collect image; Adjust Color applied to each of Image 1, Image 2, …, Image n; Co-Add image; Visualize]
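The workflow instance in this figure can be sketched as a small dependency graph that is executed in topological order. This is only an illustration: the step functions are hypothetical stand-ins that build strings, and the assumption that each image has its own collect step is ours, not the slide's.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stand-ins for the real image-processing steps.
def collect_image(i):
    return f"image{i}"

def adjust_color(img):
    return f"adjusted({img})"

def co_add(images):
    return "coadd[" + ",".join(images) + "]"

def visualize(img):
    return f"view:{img}"

def run_instance(n):
    """Build and run one workflow instance for n images."""
    # Map each task to the set of tasks it depends on.
    deps = {}
    for i in range(1, n + 1):
        deps[f"collect{i}"] = set()
        deps[f"adjust{i}"] = {f"collect{i}"}
    deps["coadd"] = {f"adjust{i}" for i in range(1, n + 1)}
    deps["visualize"] = {"coadd"}

    results = {}
    for task in TopologicalSorter(deps).static_order():
        if task.startswith("collect"):
            results[task] = collect_image(int(task[len("collect"):]))
        elif task.startswith("adjust"):
            results[task] = adjust_color(results["collect" + task[len("adjust"):]])
        elif task == "coadd":
            results[task] = co_add([results[f"adjust{i}"] for i in range(1, n + 1)])
        else:
            results[task] = visualize(results["coadd"])
    return results["visualize"]
```

Calling `run_instance(2)` runs both colour adjustments before the co-add step, whatever order the sorter happens to emit them in.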

Business Workflows

Business Workflows
Designed to compose applications based on web services
BPEL
–Standard language for service interactions
–Has many constructs for invoking web services, including fault handling and support for conditional logic

BPEL constructs
<receive>: Blocks until a matching message is received. Typically used to receive a message from the client or a callback from a partner web service.
<reply>: Sends a message in response to a message received via a <receive>.
<invoke>: Performs an invocation on a web service (one-way or request-response).
<assign>: Assigns a value to a variable.
<sequence>: Executes a list of activities sequentially in lexical order.
<flow>: Executes the activities in parallel.
<while>: Loops for as long as a condition holds.
<switch>: Selects one branch for execution from a set of branches based on a value.
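To make the control-flow semantics concrete, here is a minimal Python analogue of BPEL's sequence, flow, while, switch and assign activities. It is a sketch for illustration only: it is not BPEL, it does not model messaging (receive, reply, invoke), and every function name in it is a hypothetical helper of ours.

```python
from concurrent.futures import ThreadPoolExecutor

def assign(name, fn):
    """Analogue of assign: store fn(ctx) in variable `name`."""
    def run(ctx):
        ctx[name] = fn(ctx)
    return run

def sequence(*activities):
    """Analogue of sequence: run activities one after another, in order."""
    def run(ctx):
        for activity in activities:
            activity(ctx)
    return run

def flow(*activities):
    """Analogue of flow: run activities in parallel and wait for all of them."""
    def run(ctx):
        with ThreadPoolExecutor() as pool:
            for future in [pool.submit(a, ctx) for a in activities]:
                future.result()  # re-raises any exception from the activity
    return run

def while_(condition, body):
    """Analogue of while: repeat body for as long as the condition holds."""
    def run(ctx):
        while condition(ctx):
            body(ctx)
    return run

def switch(cases, otherwise=None):
    """Analogue of switch: run the first branch whose condition is true."""
    def run(ctx):
        for condition, activity in cases:
            if condition(ctx):
                activity(ctx)
                return
        if otherwise is not None:
            otherwise(ctx)
    return run

process = sequence(
    assign("n", lambda c: 0),
    while_(lambda c: c["n"] < 3, assign("n", lambda c: c["n"] + 1)),
    flow(assign("a", lambda c: c["n"] * 2),
         assign("b", lambda c: c["n"] + 10)),
    switch([(lambda c: c["a"] > 5, assign("branch", lambda c: "big"))],
           otherwise=assign("branch", lambda c: "small")),
)

ctx = {}
process(ctx)
```

The two activities inside `flow` write to different variables, so their relative order does not affect the result.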

Many BPEL engines
ActiveBPEL
IBM BPEL4J
Oracle BPEL Process Manager
Microsoft Windows Workflow Foundation
…

Scientific vs Business Workflows
Scientific workflows
–Large amounts of data
–Varied granularity of computations
–Large number of computations
–Often standalone components
–Non-programmers need to be able to compose them
–Need to provide provenance info
–Performance is important
Business workflows
–Deal with services across domains
–Do not deal with standalone application components
–Usually not very data intensive (data can be easily sent between services)
–Important to agree on standard interfaces so that MS & IBM can work together
–Focus on functionality/interoperability rather than performance

Example of a business workflow

Example of Scientific Workflow
Workflow Specification
Components
–Standalone computations
–Designed by different individuals

Different workflow systems
Taverna, a workbench for bioinformatics workflows
Slides courtesy of Katy Wolstencroft

The Community Problems
Everything is Distributed
–Data, Resources and Scientists
Heterogeneous data
Very few standards
–I/O formats, data representation, annotation
–Everything is a string!
Integration of data and interoperability of resources is difficult

Lots of Resources
NAR 2007 – 968 databases

Traditional Bioinformatics
acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

Cutting and Pasting
Advantages:
–Low technology on both server and client side
–Very robust: hard to break
–Data integration happens along the way
Disadvantages:
–Time consuming (and painful!)
    Can rarely be repeated
    Limited to small data sets
–Error prone: poor repeatability

Pipeline Programming
Advantages
–Repeatable
–Allows automation
–Quick, reliable, efficient
Disadvantages
–Requires programming skills
–Difficult to modify
–Requires local tool and database installation
–Requires tool and database maintenance!!!

What we want as a solution
A system that:
–Allows automation
–Allows easy repetition, verification and sharing of experiments
–Works on distributed resources
–Requires few programming skills
–Runs on a local desktop / laptop

myGrid as a solution
myGrid allows the automated orchestration of in silico experiments over distributed resources from the scientist’s desktop
Built on computer science technologies of:
–Web services
–Workflows
–Semantic web technologies

Workflows
–General technique for describing and enacting a process
–Describes what you want to do, not how you want to do it
–High level description of the experiment
[Figure: RepeatMasker Web Service, GenScan Web Service, Blast Web Service]

Workflows
Workflow language specifies how bioinformatics processes fit together.
High level workflow diagram separated from any lower level coding – you don’t have to be a coder to build workflows.
Workflow is a kind of script or protocol that you configure when you run it.
Easier to explain, share, relocate, reuse and repurpose.
Workflow is the integrator of knowledge
–The METHODS section of a scientific publication
[Figure label: Workflow Model]

Workflow Advantages
Automation
–Capturing processes in an explicit manner
–Tedium! Computers don’t get bored/distracted/hungry/impatient!
–Saves repeated time and effort
Modification, maintenance, substitution and personalisation
Easy to share, explain, relocate, reuse and build
Releases Scientists/Bioinformaticians to do other work
Record
–Provenance: what the data is like, where it came from, its quality

Taverna Workflow Components
Scufl: Simple Conceptual Unified Flow Language
Taverna: writing, running workflows & examining results
SOAPLAB: makes applications available
[Figure labels: SOAPLAB Web Service; Any Application; Web Service, e.g. DDBJ BLAST]

An Open World
Open domain services and resources.
Taverna accesses services
Third party – we don’t own them – we didn’t build them
All the major providers
–NCBI, DDBJ, EBI …
Enforce NO common data model.
Quality Web Services considered desirable

Adding your own web services
SoapLab
Java API Consumer
–import Java API of libSBML as workflow components

Shield the Scientist – Bury the Complexity
[Figure: layers labelled Application (Taverna Workbench), Scufl Model (Simple Conceptual Unified Flow Language) and Workflow Execution (Workflow enactor). The enactor drives processors for: Plain Web Service, Soaplab, Local Java App, Enactor, BioMOBY, WSRF, BioMART, Styx client, R package, ...]

Kepler
Slides courtesy of Bertram Ludaescher

Scientific Workflow
Capture how a scientist works with data and analytical tools
–data access, transformation, analysis, visualization
–possible worldview: dataflow-oriented (cf. signal-processing)
Scientific workflow (wf) benefits (compare w/ script-based approaches):
–wf automation
–wf & component reuse
–wf design, documentation
–wf archival, sharing
–built-in concurrency (task-, pipeline-parallelism)
–built-in provenance support
–distributed execution (Grid) support
–…

Ex: SEEK Ecological Niche Modeling Pipeline
Scientific Workflow paradigm:
–Reusable components (“actors”): a scientist’s verbs/actions
–Top-level workflows ≈ conceptual representation of the science process, sentences in the scientist’s language
–Sub-workflows ≈ increasing levels of detail
Separation of concerns:
–actors: what to do
–parameters: configurable behavior
–channels: dataflow, pipeline composition
–directors: fix execution model, scheduling
–semantic types: smart discovery, linking
D Pennington, D Higgins, AT Peterson, M Jones, B Ludaescher, S Bowers. Ecological Niche Modeling using the Kepler Workflow System. Workflows for e-Science, Springer.

Simple Kepler workflow using R (a statistics package)
Data source from EcoGrid (metadata-driven ingestion)
R processing script:
    res <- lm(BARO ~ T_AIR)
    res
    plot(T_AIR, BARO)
    abline(res)

Plumbing with Style … (Norbert Podhorszki UC Davis, Scott Klasky ORNL)
Plasma physics simulation on 2048 processors (LBL)
–Gyrokinetic Toroidal Code (GTC) to study energy transport in fusion devices (plasma microturbulence)
–Generating 800GB of data (3000 files, 6000 timesteps, 267MB/timestep), 30+ hour simulation run
Under workflow control:
–Monitor (watch) simulation progress (via remote scripts)
–Transfer from NERSC to ORNL concurrently with the simulation run
–Convert each file to an HDF5 file
–Archive files in 4GB chunks into HPSS
[Figure labels: Monitor, Transfer, Convert, Archive]
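The archive step groups files into chunks of at most 4GB. A minimal sketch of such grouping follows, under our own assumptions: every output file is 267 MB (3000 × 267 MB ≈ 801 GB, which agrees with the 800GB total), 4GB is taken as 4000 MB, and files are packed greedily in arrival order. The function name `chunk_files` is hypothetical and is not taken from the workflow described here.

```python
def chunk_files(sizes_mb, limit_mb=4000):
    """Greedily group consecutive files into chunks of at most limit_mb.

    Returns a list of chunks, each a list of file indices. A single file
    larger than the limit ends up alone in its own chunk.
    """
    chunks, current, used = [], [], 0
    for index, size in enumerate(sizes_mb):
        # Close the current chunk if this file would push it past the limit.
        if current and used + size > limit_mb:
            chunks.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        chunks.append(current)
    return chunks

# Assumed workload: 3000 files of 267 MB each.
chunks = chunk_files([267] * 3000)
```

Under these assumptions 14 files fit in one chunk (14 × 267 = 3738 MB), so the run produces 214 full chunks plus a final chunk of 4 files.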

Our Starting Point: Actor-Oriented Modeling
Ports
–each actor has a set of input and output ports
–denote the actor’s signature
–produce/consume data (a.k.a. tokens)
–parameters are special “static” ports

Actor-Oriented Modeling
Dataflow Connections
–unidirectional actor “communication” channels
–connect output ports with input ports
–for composing analysis pipelines

Actor-Oriented Modeling
Sub-workflows / Composite Actors
–composite actors “wrap” sub-workflows
–like actors, have signatures (i/o ports of sub-workflow)
–hierarchical workflows (arbitrary nesting levels)

Actor-Oriented Modeling
Directors
–define the execution semantics of workflow graphs
–execute the workflow graph (according to some schedule)
–sub-workflows may have different directors
–promotes reusability
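The pieces of actor-oriented modeling (actors with ports and parameters, channels, and a director that decides execution order) can be sketched in a few lines of Python. This is an illustration under our own assumptions, not Kepler's or Ptolemy II's API: the class names `Actor`, `Workflow` and `DAGDirector` are hypothetical, and the director handles only acyclic graphs, firing each actor once.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

class Actor:
    """A component with named input/output ports and static parameters."""
    def __init__(self, name, fn, inputs, outputs, **params):
        self.name, self.fn = name, fn
        self.inputs, self.outputs, self.params = inputs, outputs, params

    def fire(self, tokens):
        """Consume one token per input port, produce one per output port."""
        result = self.fn(*(tokens[p] for p in self.inputs), **self.params)
        if not self.outputs:
            return {}
        if len(self.outputs) == 1:
            result = (result,)
        return dict(zip(self.outputs, result))

class Workflow:
    """Actors plus unidirectional channels from output ports to input ports."""
    def __init__(self):
        self.actors = {}
        self.channels = []  # (src_actor, out_port, dst_actor, in_port)

    def add(self, actor):
        self.actors[actor.name] = actor
        return actor

    def connect(self, src, out_port, dst, in_port):
        self.channels.append((src.name, out_port, dst.name, in_port))

class DAGDirector:
    """Execution semantics: fire every actor once, in dependency order."""
    def run(self, wf):
        deps = {name: set() for name in wf.actors}
        for src, _, dst, _ in wf.channels:
            deps[dst].add(src)
        produced = {}  # (actor, out_port) -> token
        for name in TopologicalSorter(deps).static_order():
            inbox = {ip: produced[(s, op)]
                     for s, op, d, ip in wf.channels if d == name}
            for port, token in wf.actors[name].fire(inbox).items():
                produced[(name, port)] = token
        return produced

wf = Workflow()
source = wf.add(Actor("source", lambda: [1, 2, 3], [], ["out"]))
scale = wf.add(Actor("scale", lambda xs, factor: [x * factor for x in xs],
                     ["in"], ["out"], factor=10))
total = wf.add(Actor("total", lambda xs: sum(xs), ["in"], ["out"]))
wf.connect(source, "out", scale, "in")
wf.connect(scale, "out", total, "in")

tokens = DAGDirector().run(wf)
```

Because the schedule lives in the director rather than in the actors, the same three actors could be run under a different execution model without being rewritten.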

Models of Computation (A Wf Engineer’s Issue)
Directors separate the concerns of orchestration and scheduling from conceptual design
–Synchronous Dataflow (SDF)
    Statically analyzable: schedule, no deadlocks, fixed buffer requirements; executable as a single thread by the director.
–Process Networks (PN)
    Generalizes SDF. Actors execute as separate threads/processes, with queues of unbounded size (Kahn/MacQueen networks).
–Directed Acyclic Graph (DAG)
    Special case of SDF. No loops, no pipelining.
–Continuous Time (CT)
    Connections represent the value of a continuous time signal at some point in time... Often used to model physical processes.
–Discrete Event (DE)
    Actors communicate through a queue of events in time. Used for instantaneous reactions in physical systems.
–…
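What "statically analyzable" means for SDF can be shown with the balance equations: for every channel, the number of tokens produced per schedule period must equal the number consumed. The sketch below solves them for the smallest integer firing counts. It is a generic textbook-style illustration, not code from Kepler or Ptolemy II, and the function name and edge format are our own assumptions.

```python
from fractions import Fraction
from math import lcm  # accepts several arguments from Python 3.9

def repetition_vector(actors, edges):
    """Smallest integer firing counts that balance every channel.

    edges: (src, tokens_produced_per_firing, dst, tokens_consumed_per_firing)
    Balance equation per edge: fires[src] * produced == fires[dst] * consumed.
    """
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    progress = True
    while pending and progress:
        progress = False
        for edge in list(pending):
            src, produced, dst, consumed = edge
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
            elif src in rate and dst in rate:
                if rate[src] * produced != rate[dst] * consumed:
                    raise ValueError("inconsistent rates: no fixed-buffer schedule")
            else:
                continue  # neither end known yet; retry on a later pass
            pending.remove(edge)
            progress = True
    if pending:
        raise ValueError("graph is not connected")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {actor: int(r * scale) for actor, r in rate.items()}

# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
fires = repetition_vector(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)])
```

Here A must fire 3 times, B twice and C once per period (3 × 2 = 2 × 3 and 2 × 1 = 1 × 2), which is the information a director needs to build a static schedule with bounded buffers.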

Everything is a service / actor…

Smart Discovery
Find a component (here: an actor) in different locations (“categories”) …
… based on the semantic annotation of the component (or its ports)
[Figure panels: Browse for Components; Search for Component Name; Search for Category / Keyword]

Behold the Beauty of Scientific Workflow Design
Author: Kristian Stevens, UC Davis

… Shimology Part 2: the ugly truth inside
Author: Kristian Stevens, UC Davis

Triana
Slides courtesy of Ian Taylor

Triana Focus
Two core underlying focuses:
–Interactive graphical programming of the distributed tasks - complex editing
    Intuitive drag/drop flexible editing - copy/paste services, wizards for creating tools/toolboxes, user interfaces, adding nodes and multi-level grouping.
    Has been used as a “graphical editor” for other languages, e.g. DAG, VDLx (DAX in progress).
–Heterogeneous workflows - bridge the gap between different distributed environments
    Use of cross-environment interfaces led to integration with GAT (pre-SAGA) and GAP

Types of Uses
–For fine-grained operations, specifying dataflow for local operations
–Or coarse-grained composition of a distributed workflow
–Or both - can connect heterogeneous tools (e.g. Web services, Java units, Jxta services) in one workflow
Has been used as a dataflow system, a distributed-workflow environment, a workflow-management system, an automated scripting tool and a workflow editor.

Current Capabilities
Local Java Units
–600 units in signal, image, audio, text processing, complete math/stats toolboxes etc.
–Common units - flexible importers/exporters, graphing, duplicators
–Data types - strong data types for a number of domains - includes run-time checking
Distributed Integration
–GAT - Java GAT implementation - graphical representation of GAT primitives - supports GRAM, GridFTP, etc.
–GAP - SOA publish, find, bind triad of operations
    Bindings: Jxta, P2PS, Web Services, WS-RF
–Group unit deployment
Legacy Applications
–Can incorporate legacy applications easily (using local GAT adaptor) - standard file in/out interface

Distributed Work-flow
[Figure: the Triana Engine receives commands and a workflow (e.g. BPEL4WS) and connects, through GAP and GAT & GAP layers labelled Upperware and Middleware, to Triana Service & Engine, remote legacy applications and distributed services. Captions: Distributing Triana Units or Groups (Java); Integrating Legacy applications into Workflow; Integrating Web Services or P2P Services]

Triana, the GAT and the GAP
A Graphical Grid Computing Environment or Portal
GAP Interface - Service Based Computing: deployment, discovery and communication with distributed services, e.g. P2P and (GSI) Web services
–Bindings: P2PS (P2PS Discovery, P2PS Pipes), JXTA (JXTA Discovery, JXTA Pipes), Web Services (UDDI, SOAP)
GAT Interface - Grid Computing: job submission, file services
–Adapters: Condor, Globus, RLS, Unicore, PBS, GridLab GRMS, SGE, SSH, WSRF, LDR, .NET, GridFTP, other...

Audio Processing (Groups)

Group Units

GAT Interface
Main deliverable of GridLab
Application-level interface
With a set of adapters
–That adapt the interface to an underlying capability
Versions in C++ and Java
Precursor to SAGA - Simple API for Grid Applications

GAT Adapters: Example
[Figure: the application calls Copy File(Machine A, Machine B) through the GAT API (Resource Management, Streaming/Comms, File Management, Job Management, Monitoring, Collection Management). The GAT Engine passes the call to an adapter: the GridFTP Adapter uses a GridFTP connection in the Grid environment; the Jxta File Adapter uses a Jxta pipe in the P2P environment]
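The adapter idea in this example (one application-level call, several interchangeable back ends) can be sketched as follows. This is not the GAT API: the class names, the `can_handle`/`copy` methods and the `jxta://` scheme are hypothetical, and the adapters only return a description of what they would do instead of moving any data.

```python
from abc import ABC, abstractmethod

class FileAdapter(ABC):
    """Adapts the generic copy call to one underlying capability."""
    @abstractmethod
    def can_handle(self, src, dst): ...

    @abstractmethod
    def copy(self, src, dst): ...

class GridFtpAdapter(FileAdapter):
    def can_handle(self, src, dst):
        return src.startswith("gsiftp://") and dst.startswith("gsiftp://")

    def copy(self, src, dst):
        # A real adapter would open a GridFTP connection here.
        return f"GridFTP connection: {src} -> {dst}"

class JxtaFileAdapter(FileAdapter):
    def can_handle(self, src, dst):
        return src.startswith("jxta://") and dst.startswith("jxta://")

    def copy(self, src, dst):
        # A real adapter would stream the file over a Jxta pipe here.
        return f"Jxta pipe: {src} -> {dst}"

class Engine:
    """Picks the first registered adapter that can serve the request."""
    def __init__(self, adapters):
        self.adapters = list(adapters)

    def copy_file(self, src, dst):
        for adapter in self.adapters:
            if adapter.can_handle(src, dst):
                return adapter.copy(src, dst)
        raise RuntimeError(f"no adapter can copy {src} to {dst}")

engine = Engine([GridFtpAdapter(), JxtaFileAdapter()])
grid_result = engine.copy_file("gsiftp://machineA/data.h5", "gsiftp://machineB/data.h5")
p2p_result = engine.copy_file("jxta://machineA/data.h5", "jxta://machineB/data.h5")
```

The application code calls `copy_file` the same way in both environments; only the adapter list changes.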

GAP Interface
Motivated by GAT
A simple service-based API for
–Service Deployment
–Service Discovery
–Pipe Based Communication
Static application interface with multiple middleware bindings
–P2PS (name…?)
–JXTA
–Web services
[Figure: GAP Interface bindings: P2PS (P2PS Discovery, P2PS Pipes), JXTA (JXTA Discovery, JXTA Pipes), Web Services (UDDI, SOAP)]

Deploying and Connecting To Remote Services
Running services are automatically discovered via the GAP Interface, and appear in the tool tree
User can drag remote services onto the workspace and connect cables to them like standard tools (except the cables represent actual JXTA/P2PS pipes)
[Figure label: Remote Services]

Web Service Discovery
Triana allows users to query UDDI repositories
Alternatively, users can import services directly from WSDL

Complex Data Types
Users can build their own interface for creating/mediating between complex types
Alternatively, Triana can dynamically generate an interface from the WSDL2Java generated bean class

Askalon
Slides courtesy of Thomas Fahringer

ASKALON: Application Development and Runtime Environment for the Grid
Goal: simple, efficient, effective application development for the Grid
Invisible Grid
Application modeling (UML) and programming at a high level of abstraction (AGWL)
Semantics technologies
Semi-automatic deployment
SOA-based runtime environment with stateful services
Analysis and optimization of performance, costs and reliability

ASKALON
[Figure: architecture with labels Workflow Composition (UML-based Workflow Composition, AGWL) and Runtime Environment (WSRF Runtime Middleware Services: Execution Engine, Scheduler, Resource Manager, Data Repository, Job Performance Analysis); activities run on the Grid via the Globus toolkit]

Austrian Grid
517 CPUs distributed across 5 cities and over 20 parallel computers
[Figure: map of Austrian Grid sites (UniVie, Uni-Linz, UIBK, Uni-Sbg, ZID, FHV) with labels CA and RA, per-machine CPU counts, and local resource managers (MAUI, Torque, PBS, SGE)]
[Table: columns Parallel computer, #CPU, Clock, Architecture, Location. Machines: altix1.jku, hydra.gup, schafberg.sbg, grid.fhv.at, gescher.vcpc, karwendel.dps, altix1.uibk, hc-ma.uibk, zid-grid. Processor types: ITA2, Athlon, Xeon, Opteron, P. Architectures: ccNUMA, COW, NOW. Locations: Linz, Salzburg, Vorarlberg, Vienna, Innsbruck]

ASKALON Workflows
Activity = basic or atomic unit of computation
Activity type
–Functional description of the activity
    Signature specified by data input/output ports
–Semantically meaningful name
    E.g. matrix multiplication, Gaussian elimination, povray, png2yuv, ffmpeg, FFT, LAPW, WASIM, …
–Implementation-independent
Workflow = collection of activity types interconnected through control flow and data flow dependencies
–Plus some advanced constructs
Activity deployment
–Binds an activity type to a concrete installed implementation
–Description of how to instantiate the activity
–Registered by the application provider in a special registry of the Resource Management service

ASKALON: Abstract Grid Workflow Language (AGWL)
Atomic activities
–abstract from the real implementation, e.g. Web services, legacy applications
Basic compound activities
–Sequential constructs
–Conditional constructs
–Loop constructs
–Directed Acyclic Graph constructs
Advanced compound activities
–Parallel section constructs
–Parallel loop constructs
Data flow constructs
–dataIn/dataOut ports, collections, data repositories, data set distributions, etc.
Properties
–provide hints about the behavior of activities
–Predicted I/O data size, computational complexity, non-functional parameters
Constraints
–Optimization metric (e.g. performance, cost, fault tolerance)
–Scheduling constraints (e.g. compute architecture, disk, memory)

ASKALON Workflow Development Stack
[Figure: the Grid Application Developer works through the Portal with a UML workflow model (UML) and the Abstract Grid Workflow Language (AGWL, XML); the ASKALON Middleware (Java) concretizes it into the Concrete Grid Workflow Representation (CGWR). Levels shown: Activity Type, ASKALON Activity Deployment, Grid Activity Instance]

Real-world Scientific Workflows with ASKALON
WIEN2k
Material science application
Technical University of Vienna
–Institute of Theoretical Chemistry
Seven activity types
Over 500 activity instances
Statically unknown number of sequential loop iterations

Resource Management
Resource brokerage
–Interface to MDS information service for resource discovery
–Selection based on matchmaking
Advance reservation
–Useful for co-allocation purposes
GLARE
–Registry of activity deployments
Activity deployment
–Binds an abstract activity type to a concrete implementation
–Refers to an installed executable or a deployed Web/Grid service
–Description of how to instantiate the activity
–Registered in GLARE by the application provider
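The split between abstract activity types and concrete activity deployments can be sketched as a small registry lookup. This is a simplified illustration under our own assumptions, not GLARE's interface: the class names, fields, site names and paths are all hypothetical, and the default selection policy (take the first registered deployment) stands in for real matchmaking.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActivityType:
    """Abstract, implementation-independent description of an activity."""
    name: str
    inputs: tuple
    outputs: tuple

@dataclass(frozen=True)
class ActivityDeployment:
    """Concrete implementation of an activity type on some site."""
    type_name: str
    site: str
    kind: str       # "executable" or "service"
    location: str   # path to the executable or service endpoint

class DeploymentRegistry:
    """Registry in which application providers record deployments."""
    def __init__(self):
        self._by_type = {}

    def register(self, deployment):
        self._by_type.setdefault(deployment.type_name, []).append(deployment)

    def lookup(self, type_name):
        return list(self._by_type.get(type_name, []))

def concretize(abstract_workflow, registry, choose=lambda candidates: candidates[0]):
    """Bind every activity type in the workflow to one deployment."""
    concrete = []
    for activity_type in abstract_workflow:
        candidates = registry.lookup(activity_type.name)
        if not candidates:
            raise LookupError(f"no deployment registered for {activity_type.name}")
        concrete.append(choose(candidates))
    return concrete

registry = DeploymentRegistry()
registry.register(ActivityDeployment("fft", "siteA", "executable", "/opt/fft/bin/fft"))
registry.register(ActivityDeployment("fft", "siteB", "service", "https://siteB.example/fft"))
registry.register(ActivityDeployment("png2yuv", "siteA", "executable", "/usr/bin/png2yuv"))

workflow = [ActivityType("png2yuv", ("png",), ("yuv",)),
            ActivityType("fft", ("signal",), ("spectrum",))]
concrete = concretize(workflow, registry)
```

The workflow itself never names a site or a path, so the same abstract description can be bound to different deployments by passing a different `choose` function.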

Askalon Runtime Environment
Dynamic Bindings of Workflow: Abstract - Concrete
[Figure: an abstract workflow of activity types is bound by the Resource Manager to activity deployments (Web services and executables) on Node 1 to Node 4, giving the concrete workflow]

Composite Activities
Composite activity
–Sequence
–Parallel activities
–Conditional activities: if, switch
–Sequential loops: for, while, for each
–Parallel loops: parallel for, parallel for each
–Sub-workflows
[Figure: Sequence of activities A1 and A2, showing data flow and control flow]

If-then-else
[Figure: activities A0, A1, A2 and A3 with then and else branches; steps numbered (1) to (4)]

Execution Engine
Workflow controller
–Converts XML-based specification (AGWL) to internal representation
–Executes the workflow according to control and data flow dependencies
One separate Controller for every workflow instance
Event system
–Other components can subscribe to the internal events
–e.g. logging, controller, tool (WS-Notification), ...
Logging and database
–For post-mortem performance analysis
GT4 WSRF wrapper
–Sends WS-Notifications to the portal
Scheduler
–Receives jobs ready to execute from the task loop
–Retrieves the available resources from GridARM
–Assigns the task to the best machine according to the selection criteria
    o Clock speed × number of free processors
    o Prediction information, memory available, …
[Figure: Core with Task Loop, Fault Handler, Controller, AGWL Interpreter, Event System, GT4 WSRF Service, Logging & Database, Scheduler, Execution / Launching Framework, GridARM; input: AGWL]
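The first selection criterion listed for the scheduler, clock speed multiplied by the number of free processors, can be sketched as a simple greedy assignment. This is an illustration only, not ASKALON code: the `Machine` class, the machine names and the clock speeds are made up, and the prediction and memory criteria are left out.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    clock_ghz: float
    free_processors: int

def score(machine):
    """Selection criterion: clock speed times number of free processors."""
    return machine.clock_ghz * machine.free_processors

def schedule(ready_tasks, machines):
    """Assign each ready task to the currently best-scoring machine."""
    assignments = []
    for task in ready_tasks:
        candidates = [m for m in machines if m.free_processors > 0]
        if not candidates:
            break  # nothing free; remaining tasks stay in the queue
        best = max(candidates, key=score)
        best.free_processors -= 1  # the task occupies one processor
        assignments.append((task, best.name))
    return assignments

# Hypothetical resources, as if just retrieved from the resource manager.
machines = [Machine("A", 2.0, 4), Machine("B", 2.5, 2)]
plan = schedule(["t1", "t2", "t3"], machines)
```

Machine A wins the first two tasks (scores 8.0 and then 6.0 against 5.0 for B), after which its score drops to 4.0 and the third task goes to B.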