Workflow Topics for the Next-Generation SDM-Center. Ilkay Altintas, Bertram Ludäscher, San Diego Supercomputer Center.

Presentation transcript:

Workflow Topics for the Next-Generation SDM-Center
Ilkay Altintas (San Diego Supercomputer Center)
Bertram Ludäscher (UC Davis, Department of Computer Science)
SciDAC SDM AHM, Oct 5-6, 2005, NCSU, Raleigh, NC
[Image: Sir Walter Raleigh]

SDM-AHM NCSU / Next SDM-C: Workflows

Overview
- Kepler/SPA:
  - What we have (The GOOD)
  - What we don’t (yet) have (The BAD)
  - What we really need?? (The UGLY)
=> Things we might do; prioritization

Macro Definitions …
#define KEPLER KEPLER/SPA
#define KEPLER KEPLER*SPA
By the end:
#define SPA KEPLER HPC

What we have – The GOOD
- Big Heritage from Ptolemy II
  - Vergil GUI for design and (some) execution monitoring
  - Actor-Oriented Modeling & Design
    - Director / Actor Separation
    - Models of Computation: PN, SDF, DE, …
    - Nested Workflows & Hierarchical Modeling
- Research Results on Modeling Complex Systems
  - modal models, mobile models, reconfigurable models, model lifecycle management, higher-order actors, …
    => head-start for CCA Extensions, e.g.
    - SciRUN-2 Extensions (Steve P. et al.)
    - Self-Managing, Dynamically-Adaptive, Autonomous Components (Manish et al.)
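
The director/actor separation named on this slide can be illustrated with a small Python sketch. This is not Ptolemy II or Kepler code: the classes `Actor`, `SequentialDirector`, and `StagewiseDirector` and the method `fire` are hypothetical stand-ins chosen for illustration. The sketch only shows the idea that actors define what is computed, while an exchangeable director decides in which order they run.

```python
from typing import Callable, List


class Actor:
    """A workflow step that only knows how to transform one token."""

    def __init__(self, name: str, func: Callable[[object], object]):
        self.name = name
        self.func = func

    def fire(self, token: object) -> object:
        return self.func(token)


class SequentialDirector:
    """Hypothetical director: sends each token through the whole pipeline in turn."""

    def run(self, actors: List[Actor], tokens: List[object]) -> List[object]:
        results = []
        for token in tokens:
            for actor in actors:
                token = actor.fire(token)
            results.append(token)
        return results


class StagewiseDirector:
    """Hypothetical director: pushes all tokens through one actor before the next."""

    def run(self, actors: List[Actor], tokens: List[object]) -> List[object]:
        current = list(tokens)
        for actor in actors:
            current = [actor.fire(t) for t in current]
        return current


# The same actors can be executed under either director without modification.
pipeline = [Actor("scale", lambda x: x * 2), Actor("shift", lambda x: x + 1)]
```

Swapping `SequentialDirector` for `StagewiseDirector` changes the firing order but not the actors themselves, which is the property the slide attributes to the Ptolemy II heritage.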

What we have – The GOOD
- Kepler Extensions (to Ptolemy II)
  - Mostly: loosely coupled, e.g. WS (web service) workflows
  - Many generic actors
    - ssh, scp, cmd-line, SRB, Globus, …
    - new R expression actor
  - Many custom actors
    - e.g. in PIW, TSI-1, TSI-2, GEON, SEEK, Resurgence, …
  - Several ad-hoc extensions & (initial) research, e.g.
    - External job scheduling (e.g. NIMROD, …)
    - Director extensions (fault tolerance via WS “retry”)
    - WF-Templates (structured combination of dataflow & control-flow: fault-tolerance, reusability)
    - Higher-order functions (map/3, iterate-over-array, …: simpler control-flow, optimization potential, …)

Some KEPLER Actors (out of 160+ … and counting …)

What we have – The GOOD
- Kepler Extensions (Cont’d)
  - Some generic extensions
    - Metadata-based (EML/ADN) Dataset Search
    - Concept-based Actor Search (OWL)
    - Documentation Framework
    - Authentication & Authorization Framework (GAMA from GEON)
    - Improved component/WF archival & plug-in (KAR, …)
    - Provenance Recorder (“Listener”)
- PS
  - … a growing open-source developers community …
  - … and some scientific users … (TSI-1/2, PIW, GEON, SEEK, …)

Concept-based Actor Search
- Implemented as proof-of-concept
  - Additional operations slated for next Kepler release (data search, port-based actor search, etc.)
- Biggest Challenges
  - Building/searching a repository …
  - Making changes to MoML (see KAR)
  - GUI changes
  - Ontology management
[Diagram labels: Concept-based Actor Search; Workflow Components (MoML/KAR); Ontologies (OWL), Default + Other; Semantic Annotations: urn ids, instance expressions]


The GOOD: Kepler Archives
- Purpose: Encapsulate WF data and actors in an archive file
  - … inlined or by reference
  - … version control
  => More robust workflow exchange
  => Easy management of semantic annotations
  => Plug-in architecture (drop in and use)
  => Easy documentation updates
- A jar-like archive file (.kar) including a manifest
- All entities have unique ids (LSID)
- Custom object manager and class loader
- UI and API to create, define, search and load .kar files
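
To make the "jar-like archive with a manifest and unique ids" idea concrete, here is a minimal Python sketch. It does not reproduce the actual KAR layout, which the slides do not specify: the manifest keys (`Name`, `Entry-Id`), the helper names `build_archive` and `list_ids`, and any id strings passed in are assumptions made for illustration only.

```python
import io
import zipfile
from typing import Dict


def build_archive(entries: Dict[str, bytes], ids: Dict[str, str]) -> bytes:
    """Pack entries into a zip archive plus a manifest listing each entry's id."""
    manifest_lines = ["Manifest-Version: 1.0", ""]
    for name in sorted(entries):
        # Hypothetical manifest attributes; the real KAR manifest may differ.
        manifest_lines += [f"Name: {name}", f"Entry-Id: {ids[name]}", ""]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("META-INF/MANIFEST.MF", "\n".join(manifest_lines))
        for name, payload in entries.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def list_ids(archive_bytes: bytes) -> Dict[str, str]:
    """Read the manifest back and return a mapping from entry name to id."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        text = archive.read("META-INF/MANIFEST.MF").decode("utf-8")
    found: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Name: "):
            current = line[len("Name: "):]
        elif line.startswith("Entry-Id: ") and current is not None:
            found[current] = line[len("Entry-Id: "):]
    return found
```

Because every entry carries its own id in the manifest, a receiving system could search or load individual components without unpacking the whole archive, which is the "drop in and use" property the slide describes.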

KAR File Example

Kepler Object Manager
- Designed to access local and distributed objects
- Objects: data, metadata, annotations, actor classes, supporting libraries, native libraries, etc. archived in kar files
- Advantages:
  - Reduce the size of Kepler distribution
    - Only ship the core set of generic actors and domains
  - Easy exchange of full or partial workflows for collaborations
  - Publish full workflows with their bound data
    - Becomes a provenance system for derived data objects
=> Separate SPA workflow repository and distribution

Provenance Framework
- Provenance
  - Track origin and derivation information about scientific workflows, their runs and derived information (datasets, metadata, …)
- Need for Provenance
  - Association of process and results
  - reproduce results
  - “explain & debug” results (via lineage tracing, parameter settings, …)
  - optimize: “Smart Re-Runs”
- Types of Provenance Information:
  - Data provenance
    - Intermediate and end results including files and db references
  - Process (= workflow instance) provenance
    - Keep the wf definition with data and parameters used in the run
  - Error and execution logs
  - Workflow design provenance (quite different)
    - WF design is a (little supported) process (art, magic, …)
    - for free via cvs: edit history
    - need more “structure” (e.g. templates) for individual & collaborative workflow design

Kepler Provenance Recording Utility
- Parametric and customizable
  - Different report formats
  - Variable levels of detail
    - Verbose-all, verbose-some, medium, on error
  - Multiple cache destinations
- Saves information on
  - User name, Date, Run, etc.
Joint work with Oscar Barney
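
A recorder with configurable levels of detail could look roughly like the following Python sketch. It is not the Kepler provenance recorder: the class `ProvenanceRecorder`, its `notify` method, the record fields, and in particular the ordering of the four levels are assumptions made for illustration. The sketch shows the listener idea, in which events are filtered by the configured level and errors are always kept.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed ordering of the detail levels named on the slide, least to most detailed.
LEVELS = {"on-error": 0, "medium": 1, "verbose-some": 2, "verbose-all": 3}


@dataclass
class ProvenanceRecorder:
    user: str
    run_id: str
    level: str = "medium"
    records: List[dict] = field(default_factory=list)

    def notify(self, actor: str, event: str, detail: str, min_level: str = "medium") -> None:
        """Store the event if it is an error or the configured level is detailed enough."""
        if event == "error" or LEVELS[self.level] >= LEVELS[min_level]:
            self.records.append(
                {
                    "user": self.user,
                    "run": self.run_id,
                    "actor": actor,
                    "event": event,
                    "detail": detail,
                }
            )
```

A recorder created with `level="on-error"` would ignore ordinary firing events and keep only errors, while `level="verbose-all"` would keep everything it is notified about.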

Provenance: Next Steps
- .kar file generation, registration and search for provenance information
- Possible data/metadata formats
- Automatic report generation from accumulated data
- A relational schema for the provenance info in addition to the existing XML
- Smart re-runs

The Future
From GOOD via BAD to UGLY
- The good news (about ‘bad’ and ‘ugly’)
  - Lots of interesting challenges!
  - … so ‘ugly’ is actually good!

What we don’t (yet) have … THE BAD
- Much is still to do (or still ongoing)
  - Detached execution
    - many options; depend on requirements
  - Kepler WF repository w/ dynamic actor plug-in
  - Smart Reruns
    - avoid doing (old) work twice
  - Smarter Reruns (too smart?)
    - reuse previous results for speed-up of (new) work
  - NIMROD Director, CONDOR Director …
  - Task manager / monitor
  - Support for WF design & reuse
    - Semantic extensions
    - “Design Patterns”, Templates
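
The "Smart Reruns" item (avoid doing old work twice) is essentially result caching keyed by what went into a step. The Python sketch below shows one possible reading, under the assumption that a step's result depends only on its actor name, inputs, and parameters; the class `SmartRerunCache` and its methods are hypothetical and not part of Kepler.

```python
import hashlib
import json
from typing import Callable, Dict


class SmartRerunCache:
    """Hypothetical cache keyed by actor name plus a hash of inputs and parameters."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}
        self.executions = 0  # counts how often real work was performed

    @staticmethod
    def _key(actor: str, inputs: dict, params: dict) -> str:
        # Inputs and parameters must be JSON-serializable in this sketch.
        payload = json.dumps(
            {"actor": actor, "inputs": inputs, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, actor: str, func: Callable[..., object], inputs: dict, params: dict) -> object:
        """Execute func only if this exact combination has not been seen before."""
        key = self._key(actor, inputs, params)
        if key not in self._results:
            self.executions += 1
            self._results[key] = func(**inputs, **params)
        return self._results[key]
```

Rerunning a step with unchanged inputs and parameters returns the stored result; changing either one produces a new key and triggers real execution. The "smarter" variant on the slide, reusing old results to speed up new work, would need more than exact-match lookup and is not covered here.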

What we don’t have … THE BAD (cont’d)
- Vertical SDM Integration
  - Workflow layer could be used to embed other SDM components and glue them together
  - Scope & Architecture unclear
    - Data Mining tools => new WF actors
    - Parallel-R => new WF actors !?
    - SEA, Bitmap tools => new !?
    - MPI-IO => alternative to current Kepler data access !?
    - …
  - Not only a technical problem
    - e.g. need for driving use-cases that require combination of several SDM layers together

Challenges
- Easier said …
  - “We’re not going to reinvent the wheel …”
  - “We just use XYZ …”
    - XYZ in {CCA, HDF5, PnetCDF, Ccaffeine, Condor, MPI-IO, parallel-R, …}
- … than done …
  - Incompatible, isolated solutions and frameworks
  - Can’t use workflow/actor/director A with B
- Coming up with a coherent, overall architecture is hard!

HTC Example (using: NIMROD)
- need to make Kepler NIMROD/Condor/… “aware”
- similar need for HPC support

Another Distribution Approach
[Diagram labels: Client; Servers; Computer Network; Service Locator (Peer Discovery)]
- Simulation is orchestrated in a centralized manner
Source: Daniel Lázaro Cuadrado, Aalborg University

What we don’t have … THE UGLY
- Workflow Design & (Re-)Usability
  - Difficult Marriage of Dataflow and Control-flow
    - e.g. PIW, TSI-1/2, GEON-A-type-WF, …
  - WF development, deployment, maintenance, use
    - from (Mess…) to Art to Commodity (=> next presentation)
  - support for WF whole life-cycle
- Fault Tolerance
  - current embedding of control-flow into dataflow leads to non-maintainable workflows!
- Close Coupling of Components for HPC
  - CCA-style
  - MPI-style
  - Memory-to-Memory (on single nodes)
  - large, efficient data transfer
  - …

WF-Design: Adapters for Semantic & Structural Incompatibility
- Adapters may:
  - be abstract (no impl.)
  - be concrete
  - bridge a semantic gap
  - fix a structural mismatch
  - be generated automatically (e.g., Taverna’s “list mismatch”)
  - be reused components (based on signatures)
[Diagram labels: ports annotated C1, C2, D1, D2 with adapters between C and D; actors f1 and f2 joined through a map adapter, with types S, T, [S], [T] and, in the nested case, [[S]], [[T]]]
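
The map adapters in the diagram can be read as lifting a function on single items to a function on lists, so that a producer of `[S]` can feed a consumer written for `S`. The Python sketch below shows that reading; the name `map_adapter` is hypothetical, and the slides do not say how Kepler or Taverna implement such adapters.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def map_adapter(f: Callable[[S], T]) -> Callable[[List[S]], List[T]]:
    """Lift f: S -> T to a function [S] -> [T] by applying it element-wise."""

    def lifted(items: List[S]) -> List[T]:
        return [f(item) for item in items]

    return lifted


# Applying the adapter twice handles one extra level of nesting: [[S]] -> [[T]].
lengths = map_adapter(len)
nested_lengths = map_adapter(map_adapter(len))
```

Here `lengths(["ab", "c"])` gives `[2, 1]` and `nested_lengths([["a"], ["bc", "d"]])` gives `[[1], [2, 1]]`. Because the adapter depends only on the nesting depth of the two port types, it is the kind of shim that can be generated automatically from signatures.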

Additional Design Primitives for Semantic Types
Extended Transformations:
- t9: Actor Semantic Type Refinement
- t10: Port Semantic Type Refinement
- t11: Annotation Constraint Refinement
- t12: I/O Constraint Strengthening
- t13: Data Connection Refinement
- t14: Adapter Insertion
- t15: Actor Replacement
- t16: Workflow Combination (Map)
[Table columns: Extended Transformations; Starting Workflow; Resulting Workflow. The workflow diagrams and the refinement conditions given in parentheses did not survive extraction.]

Workflow Design Primitives
- End-to-End Workflow Design and Implementation
  - Viewed as a series of primitive “transformations”
  - Each takes a WF and produces a new WF
  - Can be combined to form design “strategies”
[Diagram: a chain of workflows W0, W1, W2, …, Wm, …, Wn linked by transformations t, running from Workflow Design to Workflow Implementation. Labels: Top-Down; Bottom-Up; Input Driven; Output Driven; Structure Driven; Semantic Driven; Task Driven; Data Driven]

Fault Tolerance & Maintenance Challenges

Workflow Templates and Patterns
- New Ingredients
- Proposed Layered Architecture
work w/ Anne Ngu, Shawn Bowers, Terence Critchlow

Use Ideas from Fault Tolerant Shell
Source: Douglas Thain, Miron Livny, “The Ethernet Approach to Grid Computing”
Good ideas in ftsh; some might be (semi-)low hanging fruits for Kepler …
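
One ftsh idea that seems easy to carry over is the bounded retry with growing delays between attempts. The Python sketch below is written in that spirit only: it is neither ftsh syntax nor a Kepler API, and the function `try_times`, its parameters, and the doubling backoff policy are assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

R = TypeVar("R")


def try_times(
    action: Callable[[], R],
    attempts: int,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Run action until it succeeds or attempts run out, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure to the caller
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")
```

Wrapping a failure-prone step such as a remote call in `try_times` keeps the retry policy out of the workflow graph itself, which addresses the concern raised earlier in the deck that encoding control-flow in dataflow makes workflows hard to maintain.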

Kepler Coupling Components & Codes
- Types of Coupling …
  - Loosely coupled (“1st Phase”)
    - Web Services (SPA, GEON, SEEK, …), ssh actors, …
    - (+) reusability (behavioral polymorphism)
    - (+) scalability (# components)
    - (-) efficiency
  - Tight(er) coupling (“2nd Phase”)
    - Via CCA (SciRUN-2, Ccaffeine, …) (Cipres uses CORBA)
    - HPC needs: code-coupling as efficient & flexible as possible (e.g. Scott’s challenges …)
      - memory-to-memory (single node or shared memory)
      - MPI (multiple nodes)
      - optimizations for transfer of data & control (streaming, socket-based connections)

Accord-CCA: Ccaffeine w/ Self-Managed Behavior
Source: Hua Liu and Manish Parashar
- cf. w/ mobile models, reconfiguration in Ptolemy II
- … begging for a Kepler design and implementation …

Different “Directors” for Different Concerns
- Example:
  - Ptolemy Directors: “factoring out” the concern of workflow “orchestration” (MoC)
  - common aspects of overall execution not left to the actors
- Similarly:
  - “Black Box” (“flight recorder”)
    - a kind of “recording central” to avoid wiring 100’s of components to recording-actor(s)
  - “Red Box” (error handling, fault tolerance)
    - use ftsh ideas; templates
  - “Yellow Box” (type checking)
    - for workflow design
  - “Blue Box” (shipping-and-handling)
    - central handling of data transport (by value, by reference, by scp, SRB, GridFTP, …)
  - “CCA++ Boxes”
    - Change behavior (e.g. algorithm) of a component
    - Change behavior (i.e., wiring) of a workflow in-flight
[Diagram labels: SDF/PN/DE/…; Provenance Recorder; Static Analysis; On Error; Component Mgr; Composition Mgr]

Summary
- The GOOD:
  - lots to build upon
- The BAD:
  - no common / integrated architecture
    => use Kepler/SPA as a glue
    => this might be harder than it sounds
    => needs a mix of end-to-end, application-driven effort and serious design effort for the integration architecture
- The UGLY:
  - HPC challenges: close coupling, fault tolerance, …
  - The good news: there’s work to be done!

Use of Semantics in SWF …
- “Smart” Search
  - Concept-based, e.g., “find all datasets containing biomass measurements”
- Improved Linking, Merging, Integration
  - Establishing links between data through semantic annotations & ontologies
  - Combining heterogeneous sources based on annotations
  - Concatenate, Union (merge), Join, etc.
- Transforming
  - Construct mappings from schema S1 to S2 based on annotations
- Semantic Propagation
  - “Pushing” semantic annotations through transformations/queries

Helping with “shims” / adapters
- Services can be semantically compatible, but structurally incompatible
[Diagram: a Source Actor with output port Ps and a Target Actor with input port Pt, joined by a desired connection. The semantic types of Ps and Pt are compatible (⊑) with respect to the ontologies (OWL); the structural types of Ps and Pt are incompatible (⋠).]
Source: [Bowers-Ludaescher, DILS’04]
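
The situation on this slide, where two ports agree semantically but not structurally, can be expressed as a two-stage check. The Python sketch below is a toy under strong assumptions: the ontology is a hypothetical child-to-parent dictionary rather than OWL, structural compatibility is reduced to string equality of type names, and the functions `is_subconcept` and `connection_status` are invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical toy ontology: each concept maps to its parent concept (or None).
ONTOLOGY: Dict[str, Optional[str]] = {
    "Biomass": "Measurement",
    "Measurement": None,
    "Location": None,
}


def is_subconcept(concept: str, ancestor: str, ontology: Dict[str, Optional[str]]) -> bool:
    """Return True if concept equals ancestor or lies below it in the ontology."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = ontology.get(current)
    return False


def connection_status(
    src_semantic: str,
    src_structural: str,
    tgt_semantic: str,
    tgt_structural: str,
    ontology: Dict[str, Optional[str]],
) -> str:
    """Classify a desired connection from a source port to a target port."""
    if not is_subconcept(src_semantic, tgt_semantic, ontology):
        return "incompatible"
    # Simplification: structural types match only if their names are identical.
    if src_structural == tgt_structural:
        return "connect directly"
    return "needs adapter"
```

A source port annotated `Biomass` with structural type `list[float]` feeding a target port that expects a `Measurement` as a single `float` would be classified as "needs adapter", which is exactly the case where a shim is worth searching for or generating.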