Zen and the Art of SWF Maintenance

Presentation transcript:

Zen and the Art of SWF Maintenance
Kinds of Scientific Workflows
Why not just Python scripts?
Business workflows born again?
Zen and the art of workflow design
– … and other research issues

What is a Scientific Workflow (SWF)?
Model the way scientists work with their data and tools
– Mentally coordinate data export, import, analysis via software systems
Scientific workflows emphasize data flow (≠ business workflows)
Metadata (incl. provenance info, semantic types, etc.) is crucial for automated data ingestion, data analysis, …
Goals:
– SWF automation
– SWF and component reuse
– SWF design & documentation
→ making scientists’ data analysis and management easier!

What we use SWF for …
Short answer: Everything
– includes making coffee (tea ceremonies are harder)
Kinds of workflows (not disjoint):
– Plumbing: stage files, submit batch jobs, monitor progress, move files off the XT3 to the analysis and viz cluster, archive, steer computation, …
  Ex: fusion simulation, astrophysics (supernova simulation), … your laptop backup???
– Knowledge discovery workflows: automate repetitive data access, retrieval, custom analysis (e.g. BLAST), and generic steps (PCA, cluster analysis, …), in ways that are meaningful to the scientist
  Ex: PIW, motif analysis, NDDP, …
– Conceptual modeling workflows: what the heck is XYZ doing? Reverse engineering of processes and information flows at all levels; in order to optimize, we need to understand first
  Ex: napkin-drawing workflows to get an overview; refine the design from abstract to executable (top-down), or generalize from the concrete/legacy to the abstract (bottom-up); data-driven, task-driven, …

Why not just a Python script?
Users who might be able to define, reuse, modify, and specialize WFs might not be able to do the same for Python scripts
But wait, there’s more:
– Modular reuse
– Debugging and monitoring of WF execution: easy to “tee” (“man tee” for you Windows guys ;-)
– Automated provenance management
– Semantic types
– From integrated WF modeling (ER + dataflow + co-registrations) to execution, optimization, archival …

Business workflows born-again?
Yes, there are similarities
– And we can learn from BWF! E.g. transactions!
But also big differences:
– SWF: data-flow oriented; streaming/pipelined execution, cf. signal processing (see also COM later); popular MoC: PN
– BWF: task- and control-flow oriented; popular MoC: Petri net? CSP?

Sample BWFs
Focus is on …
– Tasks
– Control-flow
– Work items
Useful stuff:
– Transactions!
– How to handle complex control-flow …

Pop Quiz! BWF? SWF?

And the answer is …

Click here for “Oracle” (or another one)

Dataflow it is!

The Dataflow Difference

Data/Process/Provenance Central

BUY ME!!

A Signal Processing Pipeline

Some Terminology (tentative)
Workflow definition W (the WF graph we see)
– partial specification of a workflow (cf. program)
– parameters P need to be instantiated
– data-bindings D can be viewed as special parameters
Model of Computation (MoC)
– Looking at W, P, D we still do not know how to execute W(P,D) to compute result R
– A MoC is an algorithm telling us how to apply W on P and D to obtain R.
– Examples:
  MoC TM (Turing Machine): given program P and input I, we know what to do
  MoC PN (Process Network): network of independent processes communicating through (infinite) unidirectional buffers (queues), with prefix-monotonic behavior; given a PN, an input stream, and prefix-monotonic, deterministic actors, the output stream is determined! (lots of flexibility for execution!)
  MoC SDF (Synchronous Dataflow): similar to PN, but actors must statically declare their token production/consumption rates; solving for positive integer solutions of the balance equations (“LGS”, i.e. a linear system of equations) yields a static schedule guaranteeing fixed buffer size
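The SDF balance equations mentioned above can be sketched in a few lines. This is a minimal illustration, not how any particular director implements it: the function name `repetition_vector`, the edge format, and the three-actor pipeline are assumptions made for the example.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Smallest positive integer solution of the SDF balance equations.

    edges holds (src, produced, dst, consumed) tuples: each firing of src
    puts `produced` tokens on the channel, each firing of dst takes
    `consumed` tokens off it. Balance requires
    q[src] * produced == q[dst] * consumed on every channel.
    Assumes a connected graph; returns None if the rates are inconsistent.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along the channels
        changed = False
        for src, produced, dst, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    for src, produced, dst, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            return None  # no schedule with bounded buffers exists
    # Scale the rational rates to the smallest integer vector.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# Hypothetical pipeline: A produces 2 per firing, B consumes 3;
# B produces 1 per firing, C consumes 2.
pipeline = [("A", 2, "B", 3), ("B", 1, "C", 2)]
firings = repetition_vector(["A", "B", "C"], pipeline)
```

For this pipeline one schedule iteration fires A three times, B twice, and C once, after which every buffer is back at its starting level. That is why the buffer sizes can be fixed ahead of execution.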

Some Terminology (tentative)
Model of Computation (MoC)
WF Run: completed computation
WF Execution: ongoing computation
Computation graph: graph data structure keeping track of which token has been computed from which other one(s)
– Simple examples: evaluating an arithmetic expression; running a “job DAG”
– But keeping track of “real dependencies” can be tricky
  Ex: output tuples of an SQL query have “witness tuples” in multiple relations; clear for positive existential queries; what are witnesses for universal and negated queries? R = A \ B; witnesses anybody?
  Similar to the notion of “proof tree” in logic (and LP); negation-as-failure rears its ugly (beautiful?) head!
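For the simple arithmetic-expression case, a computation graph can be sketched by having every derived token remember the tokens it was computed from. The `Token` class and the `derive` and `lineage` helpers are hypothetical names for this illustration. The harder witness questions for negated queries are not addressed here.

```python
from dataclasses import dataclass, field
from itertools import count
from operator import add, mul

_next_id = count(1)


@dataclass(frozen=True)
class Token:
    value: float
    op: str = "input"          # operation that produced this token
    parents: tuple = ()        # tokens this one was computed from
    id: int = field(default_factory=lambda: next(_next_id))


def derive(op, fn, *inputs):
    """Apply fn to the input values and record the dependency edges."""
    return Token(fn(*(t.value for t in inputs)), op, inputs)


def lineage(token):
    """All input tokens that the given token ultimately depends on."""
    if not token.parents:
        return {token}
    found = set()
    for parent in token.parents:
        found |= lineage(parent)
    return found


# Evaluate (2 + 3) * 4 while recording which token came from which.
a, b, c = Token(2), Token(3), Token(4)
total = derive("+", add, a, b)
result = derive("*", mul, total, c)
```

Here `lineage(result)` returns the three input tokens, and `result.parents` gives the direct dependencies only.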

Research Area: Provenance
(Abstract) Use Cases
– “Total Recall”: capture everything the MoC can observe … and more: MoC-inherent plus additional observables
– Example: time-stamp token-in, token-out events → benchmark actor exec time, data movement time, …
– The 7 W’s: Who, What, Where, Why, When, Which, (W)how (C. Goble)
– Smart Re-run: after Pause or Stop, followed by parameter changes: rerun relevant parts
– Fault tolerance, crash recovery (cf. checkpointing)
– Result interpretation and post-mortem analysis
Research Question:
– Given a use case (as a query U) and a provenance schema PS, can U be answered using PS? (related to query answering using views – a reasoning problem!)
– Ultimately: design PS with U in mind! Also: optimize/specialize PS if U is known/limited
– Note: the MoC can make a difference! For example, some MoCs have an explicit notion of “firing” or might exploit actor declarations (“I’m a function! I have no state!”). This is relevant e.g. for checkpointing (need to save state or not? when to save state?)
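The “Smart Re-run” use case can be illustrated with a small graph traversal. Assuming the provenance store still holds the outputs of every actor from the previous run, only the actors whose parameters changed and everything downstream of them need to run again. The function name and the example workflow are hypothetical.

```python
from collections import deque


def actors_to_rerun(edges, changed):
    """Return the changed actors plus every actor downstream of them.

    edges: (producer, consumer) pairs of the workflow graph.
    changed: actors whose parameters were edited after Pause/Stop.
    """
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
    stale = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first walk along the data-flow direction
        actor = queue.popleft()
        for consumer in downstream.get(actor, []):
            if consumer not in stale:
                stale.add(consumer)
                queue.append(consumer)
    return stale


# Hypothetical plumbing workflow.
workflow = [("stage", "simulate"), ("simulate", "analyze"),
            ("simulate", "archive"), ("analyze", "plot")]
rerun = actors_to_rerun(workflow, ["analyze"])
```

Changing a parameter of `analyze` marks only `analyze` and `plot` as stale; `stage`, `simulate`, and `archive` can reuse their recorded results.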

Research Area: WF/Dataflow Design
Collection-Oriented Modeling (COM)
– Assembly line metaphor + Signal Processing + XML + …
  Streams are nested collections (cf. XML)
  Stream data schema is “registered” to a WF data model (really need this)
  Actor “picks up” only certain parts of the stream: scope
  Actor declares how the data within the scope is changed: delta
  Gives rise to new notions of type and new problems of type inference (using scope, delta, workflow structure, etc.)
– Advantages:
  Less “messy” WFs (more linear, less branching)
  “Add-only” mode (inject new derived information); augmentation instead of transformation
  Tagging data for downstream processing (instead of “bombing”, pass on “dirty” / faulty / strange data with a relevant tag)
  Pipelined parallelism (can stream an array)
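A rough sketch of the scope and delta idea in “add-only” mode: the actor walks a nested collection, reacts only to collections of the type in its scope, appends derived items there, and passes everything else through unchanged. The dictionary layout (`type`, `items`, `value`) and all names are assumptions made for this example, not the COM data model itself.

```python
def com_actor(node, scope, derive):
    """Return a copy of the nested collection with derived items added.

    node: {"type": ..., "items": [...]} for a collection,
          {"type": ..., "value": ...} for a data item.
    scope: the collection type this actor picks up.
    derive: function from an in-scope collection to a list of new items.
    """
    if "items" not in node:
        return node  # data items outside any scope pass through untouched
    items = [com_actor(child, scope, derive) for child in node["items"]]
    if node["type"] == scope:
        items = items + derive(node)  # add-only delta: nothing is removed
    return {**node, "items": items}


def add_total(sample):
    counts = [i["value"] for i in sample["items"] if i["type"] == "count"]
    return [{"type": "total", "value": sum(counts)}]


stream = {"type": "project", "items": [
    {"type": "sample", "items": [{"type": "count", "value": 3},
                                 {"type": "count", "value": 4}]},
    {"type": "note", "value": "calibrated"},
]}
augmented = com_actor(stream, "sample", add_total)
```

The `note` item reaches the next actor unchanged even though this actor knows nothing about it, which is what keeps the workflow linear.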

Research: WF Design
ER model primitives:
– Entity (-type), attribute, relationship (-type)
SWF model primitives??
– Actors, directors (MoC), …
– Lots of new “types”:
  Conventional data type (Java style)
  Polymorphic types w/ type variables (Haskell style)
  Semantic type (formal annotations in logic relative to a controlled vocabulary or knowledge base)
  Hybrids
A “theory of adapters”!?

[Figure: workflow screenshot with callouts]
– hand-crafted control solution; also: forces sequential execution!
– designed to fit hand-crafted Web-service actor
– Complex backward control-flow
– No data transformations available
[Altintas-et-al-PIW-SSDBM’03]

A Scientific Workflow Problem: More Solved (Computer Scientist’s view)
Solution based on declarative, functional dataflow process network (= also a data streaming model!)
Higher-order constructs: map(f)
→ no control-flow spaghetti
→ data-intensive apps
→ free concurrent execution
→ free type checking
→ automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]
Figure callouts: map(f)-style iterators; powerful type checking; generic, declarative “programming” constructs; generic data transformation actors; forward-only, abstractable sub-workflow piw(GeneId)

A Scientific Workflow Problem: Even More Solved (domain & CS coming together!)
map(GenbankWS)
Input: {“NM_001924”, “NM020375”}
Output: {“CAGT…AATATGAC”, “GGGGA…CAAAGA”}
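The step from a single-item actor to a list-processing one can be sketched with an ordinary higher-order function. Because each application is independent, the lifted actor can run its calls concurrently at no extra design cost. `lift` and `fetch_sequence` are hypothetical names. The lookup is a stand-in that returns a placeholder string instead of calling the real GenBank web service.

```python
from concurrent.futures import ThreadPoolExecutor


def lift(f, max_workers=4):
    """Turn an actor on single items into an actor on lists: map(f)."""
    def mapped(items):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map keeps the results in input order
            return list(pool.map(f, items))
    return mapped


def fetch_sequence(accession):
    # Stand-in for a web-service call; the real sequences are not reproduced.
    return f"<sequence for {accession}>"


fetch_all = lift(fetch_sequence)
sequences = fetch_all(["NM_001924", "NM020375"])
```

No loop or feedback wiring appears in the workflow itself: the iteration lives entirely inside `lift`.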

Research Problem: Optimization by Rewriting
Example: PIW as a declarative, referentially transparent functional process
→ optimization via functional rewriting possible, e.g. map(f o g) = map(f) o map(g)
Technical report & PIW specification in Haskell
Figure callouts: map(f o g) instead of map(f) o map(g); combination of map and zip
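The rewrite rule can be checked on a small example. The sketch below is written in Python rather than the Haskell of the cited specification, and `f` and `g` are arbitrary pure functions chosen for illustration. The rule only holds when the actors have no side effects.

```python
def compose(f, g):
    """Function composition: (f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))


def fmap(f):
    """map(f): apply f to every element of a list."""
    return lambda xs: [f(x) for x in xs]


def g(x):
    return x + 1


def f(x):
    return x * 10


two_passes = compose(fmap(f), fmap(g))   # map(f) o map(g): builds an intermediate list
one_pass = fmap(compose(f, g))           # map(f o g): a single traversal

data = [1, 2, 3]
same = two_passes(data) == one_pass(data)
```

Both pipelines compute the same list; the fused form avoids materializing the intermediate result, which matters when the list is a large data stream.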

Job Management (here: NIMROD)
Job management infrastructure in place
Results database: under development
Goal: 1000’s of GAMESS jobs (quantum mechanics)

Kepler Coupling Components & Codes
Types of Coupling …
– Loosely coupled (“1st Phase”)
  Web Services (SPA, GEON, SEEK, …), ssh actors, …
  Pros: reusability (behavioral polymorphism), scalability (# components)
  Con: efficiency
– Tight(er) coupling (“2nd Phase”)
  Via CCA (SCIRun2, Ccaffeine, …) (Cipres uses CORBA)
  HPC needs: code-coupling as efficient & flexible as possible (e.g. Scott’s challenges …)
  – memory-to-memory (single node or shared memory)
  – MPI (multiple nodes)
  – optimizations for transfer of data & control (streaming, socket-based connections)

Accord-CCA: Ccaffeine w/ Self-Managed Behavior
Source: Hua Liu and Manish Parashar
cf. w/ mobile models, reconfiguration in Ptolemy II … begging for a Kepler design and implementation …

Fault Tolerance & Maintenance Challenges

Workflow Templates and Patterns
New Ingredients
Proposed Layered Architecture
work w/ Anne Ngu, Shawn Bowers, Terence Critchlow

Use Ideas from Fault Tolerant Shell
Source: Douglas Thain, Miron Livny, “The Ethernet Approach to Grid Computing”
Good ideas in ftsh; some might be (semi-)low-hanging fruit for Kepler …
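The slide does not say which ftsh ideas are meant. One idea usually associated with it is retrying a failed operation within a bounded budget while backing off between attempts, as Ethernet stations do after a collision. The sketch below illustrates that pattern only. `try_for` and its parameters are hypothetical names, not ftsh syntax or a Kepler API.

```python
import random
import time


def try_for(action, attempts=5, base_delay=1.0, max_delay=60.0,
            sleep=time.sleep, rng=random.random):
    """Run action(); on failure retry with randomized exponential backoff.

    Gives up and re-raises the last error after `attempts` tries.
    `sleep` and `rng` are injectable so the policy can be tested quickly.
    """
    delay = base_delay
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:
            last_error = err
            if attempt == attempts:
                break
            sleep(delay * (1 + rng()))      # jitter keeps retries from synchronizing
            delay = min(delay * 2, max_delay)
    raise last_error


# Hypothetical flaky transfer: fails twice, then succeeds.
calls = {"n": 0}


def flaky_transfer():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transfer failed")
    return "ok"


waits = []
outcome = try_for(flaky_transfer, sleep=waits.append, rng=lambda: 0.0)
```

In a workflow setting the same wrapper could sit around an actor firing, so that transient failures are absorbed without wiring error-handling branches into the workflow graph.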

Use of Semantics in SWF …
“Smart” Search
– Concept-based, e.g., “find all datasets containing biomass measurements”
Improved Linking, Merging, Integration
– Establishing links between data through semantic annotations & ontologies
– Combining heterogeneous sources based on annotations
– Concatenate, Union (merge), Join, etc.
Transforming
– Construct mappings from schema S1 to S2 based on annotations
Semantic Propagation
– “Pushing” semantic annotations through transformations/queries

Typing Workflow Components
The Semantic Type Editor allows the user to assign one or more semantic types to a component or to the component’s input and output ports. In the simplest case, a semantic type is a class taken from an OWL-DL ontology. Multiple types define a conjoined concept expression. The above-right screenshot shows a user assigning semantic types to the dataset, and the above-left screenshot shows the user assigning an ontology class to the output port (dataset attribute) labeled “Plot.”
Because ontologies can get large and complicated, Kepler provides a simple built-in ontology browser for navigating a classified OWL-DL concept hierarchy and ontology properties and for choosing the concept that fits the port. Classes can be searched for and selected. Selecting a class assigns it as the corresponding semantic type.

More on Semantic Annotation
Initial Version Supports:
– Actor-level and port-level annotations
– Annotations are stored in the actor’s MoML definition (as new “semantic type” properties)
– Creation of composite ports (i.e., “virtual” ports grouping a set of underlying ports)
– Regular and composite ports may have multiple annotations (conjunction)
– Annotations can be drawn from multiple ontologies
Figure: an annotated composite port

More on Semantic Annotation
Currently Adding:
“Semantic Link” annotations for annotation of ports via ontology properties
– E.g., hasLat(point1, lat1)
– Supported in MoML, not yet in tool
Simple condition “filters” in port semantic annotations
– E.g., if attribute height > 0 then biomass is annotated as AboveGroundBiomass
Incorporating instances/values in semantic links
– E.g., hasUnit(biomass, celsius)
Suggesting additional annotations based on given ones
– suggesting/guessing ways to “fill in” given annotations
– E.g., possible semantic links
Templates and ontology “views”
– To help specify common annotation patterns
Figure: semantic links

Checking Type Constraints
Kepler can statically perform semantic and structural type checking of connections. A type checker allows the user to see potentially mismatched port connections as well as known type conflicts before workflow execution. The user can navigate the unsafe and potentially unsafe channels using the Kepler Type Checker dialog. When a channel is selected: (a) it is highlighted on the canvas, (b) the structural type and status are shown (here, the channel is structurally well typed), and (c) the semantic type and status are shown (here, the connection produces a semantic type error).
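A much-simplified picture of the semantic half of this check: a connection is safe when every concept the input port requires is equal to, or a superclass of, some concept the output port provides. The sketch only walks a named-class hierarchy; real OWL-DL subsumption needs a reasoner. The hierarchy below is hypothetical apart from the concept name AboveGroundBiomass, and the function names are not Kepler's.

```python
def ancestors(concept, parents):
    """The concept itself plus all of its (transitive) superclasses."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def semantically_compatible(output_types, input_types, parents):
    """True if the output port's conjoined types entail all input types."""
    provided = set()
    for concept in output_types:
        provided |= ancestors(concept, parents)
    return set(input_types) <= provided


# Hypothetical class hierarchy: child -> list of direct superclasses.
ontology = {
    "AboveGroundBiomass": ["Biomass"],
    "Biomass": ["Measurement"],
    "Temperature": ["Measurement"],
}
safe = semantically_compatible({"AboveGroundBiomass"}, {"Biomass"}, ontology)
unsafe = semantically_compatible({"Temperature"}, {"Biomass"}, ontology)
```

Sending a more specific concept into a port that expects a more general one is accepted, while the reverse direction or an unrelated concept is reported as a semantic type error.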

Kepler Actor-Library
Ontology-based actor organization / browsing
Customizable libraries based on ontologies
Text search with concept-based expansion
Users can discover ImageJ using various search terms. Here, ImageJ shows up in multiple tree locations based on its given annotations. The library search permits text-based matching against the component’s metadata (its given name and certain properties), expanded with concept matches.

Semantic Searching
Kepler provides a more advanced ontology-based search mechanism. Users can start the Semantic Search dialog, where components can be searched for based on their semantic types. The dialog allows a user to search components by any combination of actor, input, and output semantic types.

Structural Type (XML DTD) Annotations
[Figure: services S1 (life stage property) and S2 (mortality rate for period) with ports P1 to P5; labels structType(P2) and structType(P3)]
Structural type with root population:
  root population = (sample)*
  elem sample = (meas, lsp)
  elem meas = (cnt, acc)
  elem cnt = xsd:integer
  elem acc = xsd:double
  elem lsp = xsd:string
  Instance fragment: 44, Eggs …
Structural type with root cohortTable:
  root cohortTable = (measurement)*
  elem measurement = (phase, obs)
  elem phase = xsd:string
  elem obs = xsd:integer
  Instance fragment: Eggs 44,000 …
Source: [Bowers-Ludaescher, DILS’04]

Ontology-Guided Data Transformation
[Figure labels: Source Service with port Ps; Target Service with port Pt; Desired Connection; Structural Type Ps; Structural Type Pt; Semantic Type Ps; Semantic Type Pt; Structural/Semantic Association (one per port); Compatible (⊑); Correspondence; Generate; Transformation; Ontologies (OWL)]
Source: [Bowers-Ludaescher, DILS’04]

WF-Design: Adapters for Semantic & Structural Incompatibility
Adapters may:
– be abstract (no impl.)
– be concrete
– bridge a semantic gap
– fix a structural mismatch
– be generated automatically (e.g., Taverna’s “list mismatch”)
– be reused components (based on signatures)
[Figure: adapter examples; labels include semantic types C1, C2, D1, D2, structural types S, T, [S], [T], [[S]], [[T]], actors f1 and f2, and map]
Source: [Bowers-Ludaescher, ER’05]
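One way to read the “list mismatch” case: an actor that maps S to T is connected to a port carrying [S] or [[S]], and an adapter is generated by wrapping the actor in one map per level of nesting. The sketch assumes the nesting depth is known statically from the port types. `list_adapter` is a hypothetical name, and `len` merely stands in for an actor f1 from strings to integers.

```python
def list_adapter(f, depth):
    """Lift f : S -> T so that it accepts lists nested `depth` levels deep."""
    for _ in range(depth):
        # Bind the current function before wrapping it in another map.
        f = (lambda inner: lambda xs: [inner(x) for x in xs])(f)
    return f


f1 = len                               # stand-in actor: S = str, T = int
on_lists = list_adapter(f1, 1)         # [S] -> [T]
on_nested = list_adapter(f1, 2)        # [[S]] -> [[T]]

flat = on_lists(["ACGT", "AC"])
nested = on_nested([["ACGT"], ["AC", "A"]])
```

With depth 0 the adapter is the identity wrapper, so the same generator covers the well-typed connection as a special case.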

Additional Design Primitives for Semantic Types
[Table with columns: Extended Transformations, Starting Workflow, Resulting Workflow; the workflow diagrams are not recoverable]
t9: Actor Semantic Type Refinement
t10: Port Semantic Type Refinement
t11: Annotation Constraint Refinement
t12: I/O Constraint Strengthening
t13: Data Connection Refinement
t14: Adapter Insertion
t15: Actor Replacement
t16: Workflow Combination (Map)
Source: [Bowers-Ludaescher, ER’05]

Scientific Workflow Design
Support SWF design & reuse, via:
– Structural data types
– Semantic types
– Associations (= constraints) between them
– Type checking, inference, propagation
→ Separation of concerns:
– structure, semantics, WF orchestration, etc.
Source: [Bowers-Ludaescher, ER’05]

Semantic Annotation Propagation

Forward and Backward Propagation Rules

GEON Dataset Generation & Registration (and co-development in KEPLER)
[Figure labels: Xiaowen (SDM); Edward et al. (Ptolemy); Yang (Ptolemy); Efrat (GEON); Ilkay (SDM); SQL database access (JDBC); Matt et al. (SEEK)]
% Makefile
$> ant run

Web Services → Actors (WS Harvester)
→ “Minute-made” (MM) WS-based application integration
Similarly: MM workflow design & sharing w/o implemented components

Some KEPLER Actors (out of 160+ … and counting…)

Different “Directors” for Different Concerns
Example:
– Ptolemy Directors – “factoring out” the concern of workflow “orchestration” (MoC)
– common aspects of overall execution not left to the actors
Similarly:
– “Black Box” (“flight recorder”): a kind of “recording central” to avoid wiring 100’s of components to recording-actor(s)
– “Red Box” (error handling, fault tolerance): use ftsh ideas; templates
– “Yellow Box” (type checking): for workflow design
– “Blue Box” (shipping-and-handling): central handling of data transport (by value, by reference, by scp, SRB, GridFTP, …)
– “CCA++ Boxes”: change behavior (e.g. algorithm) of a component; change behavior (i.e., wiring) of a workflow in-flight
Figure labels: SDF/PN/DE/…; Provenance Recorder; Static Analysis; On Error; Component Mgr; Composition Mgr
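The “Black Box” idea can be sketched as a director-like object that fires the actors itself and logs every token-in/token-out event centrally, so that no actor has to be wired to a recording actor. `RecordingDirector` is a hypothetical class for illustration, not a Ptolemy II or Kepler director, and it only handles a linear pipeline.

```python
import time


class RecordingDirector:
    """Fires actors in sequence and keeps a central provenance log."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock   # injectable so timings can be tested
        self.log = []

    def fire(self, name, actor, token):
        start = self.clock()
        result = actor(token)
        self.log.append({"actor": name, "in": token, "out": result,
                         "seconds": self.clock() - start})
        return result

    def run_pipeline(self, actors, token):
        for name, actor in actors:
            token = self.fire(name, actor, token)
        return token


director = RecordingDirector()
final = director.run_pipeline(
    [("double", lambda x: x * 2), ("increment", lambda x: x + 1)], 5)
fired = [event["actor"] for event in director.log]
```

The actors stay plain functions with no knowledge of recording; swapping the director changes what is observed without touching the workflow.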

Separation of Concerns: Port Types
Token consumption (& production) “type”
– a director’s concern
More generally: resource consumption “type”
– other scheduling problems
Token “transport type”
– by value, by reference (which one), protocol (SOAP, scp, GridFTP, SRB, …)
– a SHA (shipping-and-handling) concern
Structural and semantic types
– SAT (static analysis & typing) concern
– built after static unit type system … static unit type system as a special case!?

Other Research Problems
Making the system more X-aware:
– MoC-aware: ok (directors)
– Provenance-aware: …
– DS (data schema)-aware: …
– Semantics-aware: upcoming (should be hybrid w/ DS)
– Host-aware: allow distributed scheduling of actors
– Data-transport-aware: choose suitable data transport protocol (scp, bbcp, http, (Grid-)ftp, SRB, SRM, …)
– Think of new “folks” on the movie set:
  Actors, director
  Cameraman (provenance recorder?)
  Editor (FF/REW/Play/Pause/Stop provenance re-run)
  Caterer/Stager (feeding actors with yummy tokens!)
  Managers for “Process Central” and “Data Central”
  Semantic/Hybrid Type Manager

More Research Topics
What if we know something about bandwidths, processor loads, data sizes?
→ workflow optimization!
What if we have more semantics for actors?
– Black box: token in/out
– Grey box: data types, semantic types
– White box: exact functional behavior is known!
– Example: actor implements a (stream-?) query! → Query Process Network
– New optimization opportunities!

A User’s Wish List
– Usability
– Closing the “lid” (cf. vnc)
– Dynamic plug-in of actors (cf. actor & data registries/repositories)
– Distributed WF execution
– Collection-based programming
– Grid awareness
– Semantics awareness
– WF deployment (as a web site, as a web service, …)
– “Power apps” (cf. SCIRun)
– …