GGF10 Workflow Workshop Summary
March 9, 2004
Berlin
The Organizers

Topics
- General Issues
- Application Requirements
- Language/User Interface
- Execution Engine (Run-time)
- 3 Grid Workflow Issues

General Issues
- Grain Size for “science” and efficiency of distributed service model
- Hierarchy (workflow of workflows)
- Data versus Control
- Security
- Metadata and Provenance
- Dynamic/Event based or Static
- Component Models/Architecture: CCA, OGSI, WSRF, Web Services
- Error Handling (Detect, Specify action, Take action)
- Ease of Use (for real users not Grid hackers)
- Collaborative use by several users
- Open Source?
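
The three error-handling phases listed above (detect, specify action, take action) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration and not from the workshop: the function `run_with_policy`, the `policy` mapping from exception type to action, and the action names "retry", "skip", and "abort".

```python
def run_with_policy(task, policy, max_retries=2):
    """Run a task; on failure, apply the action the policy specifies.

    Hypothetical helper: `policy` maps an exception type to one of
    "retry", "skip", or "abort" (the default when no entry matches).
    """
    attempts = 0
    while True:
        try:
            return task()
        except Exception as exc:                      # 1. detect
            action = policy.get(type(exc), "abort")   # 2. specify action
            # 3. take action
            if action == "retry" and attempts < max_retries:
                attempts += 1
                continue
            if action == "skip":
                return None
            raise  # "abort", or retries exhausted


# Usage: a hypothetical service call that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("service not responding")
    return "done"

result = run_with_policy(flaky, {TimeoutError: "retry"})
```

Keeping the policy as data separate from the task is one possible design; it lets a workflow author state the recovery action without editing the task itself.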

Application Requirements
- Time of running (seconds to months)
- People in loop
- Interactivity: real-time v batch
- Number of entities (10's to 100000's)
- Stream-based (communicate via pipes) OR Job-based (communicate via files)
- Spatial versus temporal interactions
- Multiple “workflow job” instances handled in or outside workflow
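
The stream-based versus job-based distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Python generator stands in for a pipe, a temporary file stands in for staged job output, and the stage names `produce` and `square` are hypothetical.

```python
import os
import tempfile


def produce():
    """Hypothetical first stage: emits three values, one at a time."""
    for i in range(3):
        yield i


def square(x):
    """Hypothetical second stage."""
    return x * x


def run_stream():
    # Stream-based: stage 2 consumes each item as soon as stage 1 emits it.
    return [square(x) for x in produce()]


def run_jobs(workdir):
    # Job-based: stage 1 runs to completion and writes a file ...
    path = os.path.join(workdir, "stage1.txt")
    with open(path, "w") as f:
        for x in produce():
            f.write(f"{x}\n")
    # ... and only then does stage 2 start, reading that file.
    with open(path) as f:
        return [square(int(line)) for line in f]


with tempfile.TemporaryDirectory() as d:
    stream_result = run_stream()
    job_result = run_jobs(d)
```

Both variants compute the same values; they differ in when the second stage can start and in whether intermediate data is materialized on disk.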

Language/User Interface
- Abstract versus High level (specification) versus low-level (“workflow virtual machine”)
- Virtual Data Abstraction level
- Language: Kepler, Triana, BPEL, WSCL, WSCI, …
- BPEL is inevitable?
- Diversity via Different “towers” in BPEL And/Or another language
- Does “other language” map to BPEL as low level interoperable Workflow VM?
- Scripts: Perl, Python, Ant, Matlab, Specialized
- Petri Nets
- Functional Language specification
- Graphical UI
- Dataflow (stream) versus Control (message) model
- Web Service ports can be data and control?

Execution Engine (Run-time)
- Performance
- Robustness
- Support Streams and Messages
- Discovery of Services and Resources (computers, data repositories, networks)
- Support Scheduling/Planning of tasks and/or streams and/or data resources (“Towers”?)
- Support of Monitoring, Factories, Life-times etc.
- Type checking
- Support Debugging
- Support "Workflow" (Computational) Steering
- Distributed versus centralized implementation

3 Grid Workflow Issues
1) Analyze issues such as dataflow, scheduling, virtual data, “science state”
   - Map into WSRF and BPEL Correlation Identifiers/Extensibility or find “inadequacies”?
2) Look at scale and data size, data locality issues in science workflow
   - What are implications for runtime engine?
3) Examine Semantic Grid (metadata/provenance) issues for workflow
Issues 2) and 3) can be examined for both BPEL and “other approaches”