National Aeronautics and Space Administration
Jet Propulsion Laboratory
Workflow Orchestration: Conducting Science Efficiently on the Grid
March 17, 2009
David Woollard
NASA Jet Propulsion Lab, 4800 Oak Grove Drive, Pasadena, CA
Dept. of Computer Science, University of Southern California, Los Angeles, CA 90089

Validating Computational Science
- Computational science, like all science, requires validation
- Validation comes in two forms:
  – Scaling (in data and computation)
  – Independent replication
- Both forms require significant computational resources
  – Grid is a promising resource

Vision of the Grid
[Figure: institutions connected by the Grid]
- Center for Software and Systems Engineering, University of Southern California
- Science Data Systems Section, NASA Jet Propulsion Laboratory
- Aerospace Corporation
- Northrop Grumman
- Boeing Corporation
- Lawrence Livermore National Lab
- Columbia Supercomputing Center, NASA Ames Research Center
- Supercomputing Center, University of California San Diego

Vision of the Grid
- Like the power grid, the computational Grid should scale to the demands of individual users.

Workflow-Based Specification
- Workflows orchestrate processes on the Grid
- Workflows are a processing model that incorporates tasks, data, and rules.
- Workflow management systems execute tasks on the Grid using data once the task's dependencies are satisfied based on rules.
[Figure: workflow graph of Task 1 through Task 5]

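The tasks, data, and rules model above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow management system discussed in the talk: the `Task` class, the `execute` function, the single rule ("a task is ready once every task it depends on has produced data"), and the dependency shape chosen for Task 1 through Task 5 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    """A hypothetical workflow task: a name, a function, and its dependencies."""
    name: str
    run: Callable[[Dict[str, int]], int]
    depends_on: List[str] = field(default_factory=list)


def execute(tasks: List[Task]) -> Tuple[List[str], Dict[str, int]]:
    """Run each task once all of its dependencies have produced data."""
    data: Dict[str, int] = {}                      # outputs keyed by task name
    pending = {task.name: task for task in tasks}  # tasks not yet executed
    order: List[str] = []                          # execution order, for inspection
    while pending:
        # Rule (assumed): a task is ready when every dependency has output data.
        ready = [t for t in pending.values()
                 if all(dep in data for dep in t.depends_on)]
        if not ready:
            raise ValueError("unsatisfiable dependencies: "
                             + ", ".join(sorted(pending)))
        for task in ready:
            data[task.name] = task.run(data)
            order.append(task.name)
            del pending[task.name]
    return order, data


# Assumed dependency shape: Task 1 feeds Tasks 2 and 3, which feed Task 4, then Task 5.
workflow = [
    Task("Task 1", lambda d: 1),
    Task("Task 2", lambda d: d["Task 1"] + 1, ["Task 1"]),
    Task("Task 3", lambda d: d["Task 1"] * 10, ["Task 1"]),
    Task("Task 4", lambda d: d["Task 2"] + d["Task 3"], ["Task 2", "Task 3"]),
    Task("Task 5", lambda d: d["Task 4"] * 2, ["Task 4"]),
]

order, data = execute(workflow)
print(order)
print(data["Task 5"])
```

In this sketch the engine, not the tasks, decides when each task runs, which is the separation the slide describes; a Grid-backed engine would dispatch the ready tasks to remote resources instead of calling them in-process.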
Scaling the Experiment
[Figure: the Task 1 through Task 5 workflow extended to resources at other institutions]

Independent Replication
[Figure: the Task 1 through Task 5 workflow shared with a collaborator and a 3rd party]

Heterogeneous Environments
[Figure: three copies of the Task 1 through Task 5 workflow, labeled Laboratory, Institution, and Co-laboratory; environment labels: Workflow Engine 1 / Grid Infrastructure 1, Workflow Engine 1 / Grid Infrastructure 2, Workflow Engine 2 / Grid Infrastructure 2; additional labels: Collaborator, 3rd Party]

Research Challenge
- Scientific validation requires:
  – Scaling
  – Replication
- Existing technologies exhibit three challenges:
  – Require scientists to become engineers or vice versa
  – Existing workflow specifications entwine scientific and engineering concerns
  – Existing workflow specifications are not portable

A Model-Driven Approach
[Figure labels: Computation Independent Model, Implementation Independent Model, Implementation; Workflow Model, Domain-Specific Software Architecture, Deployment]

Agenda
In the rest of this talk, we will cover:
- Models versus languages
- The role of software architecture
- Transforming workflows to domain-specific software architectures
- Performance
- Future Work and Conclusions

A Plethora of Workflow Languages
- Yu & Buyya presented a taxonomy [Yu & Buyya 05]
  – Based on workflow properties like model representation and scheduling policy
  – Illustration of divergence in the field
- As of last year, researchers such as Osterweil, et al. [08] still advocated more advanced language features
- Considered a Grand Challenge [Gil, et al. 07]

Making Decisions in Design Space
- Existing workflow languages violate separation of concerns
  – Scientists should work in languages applicable to the design space, not the solution space
  – Engineers should not have to become scientists to be able to scale workflow-based systems
- If workflow languages become the realm of the scientist, how does the software engineer effect change?
  – Manipulation of the system at the architectural level

Orchestration Through Connectors
- Lau, et al., have proposed exogenous connectors [Lau, et al. 06].
  – encapsulate both control and data flow in a software system
  – can be hierarchically composed to simulate control flow
- Control can be managed through several constructs:
  – Sequence
  – Conditional
  – Branch & Bound
[Figure: connector diagrams composing components A, B, and C]

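One way to picture exogenous connectors is as functions that own the control flow while the components they connect stay unaware of one another. The Python sketch below is only an illustration of that idea under our own assumptions; it is not the formal model of [Lau, et al. 06]. The names `component`, `sequence`, and `conditional`, and the `trace` and `use_b` keys, are hypothetical, and Branch & Bound is not covered.

```python
from typing import Callable, Dict

# A unit is either a component or a connector: it takes data and returns data.
Unit = Callable[[Dict], Dict]


def component(name: str) -> Unit:
    """Hypothetical component: records that it ran; it never calls other components."""
    def run(data: Dict) -> Dict:
        return {**data, "trace": data.get("trace", []) + [name]}
    return run


def sequence(*units: Unit) -> Unit:
    """Sequence connector: passes data through each unit in order."""
    def run(data: Dict) -> Dict:
        for unit in units:
            data = unit(data)
        return data
    return run


def conditional(predicate: Callable[[Dict], bool],
                if_true: Unit, if_false: Unit) -> Unit:
    """Conditional connector: picks one unit based on the data."""
    def run(data: Dict) -> Dict:
        return if_true(data) if predicate(data) else if_false(data)
    return run


a, b, c = component("A"), component("B"), component("C")

# Hierarchical composition: a conditional connector nested inside a sequence.
workflow = sequence(a, conditional(lambda d: d.get("use_b", False), b, c))

print(workflow({"use_b": True})["trace"])
print(workflow({"use_b": False})["trace"])
```

Because a connector has the same shape as a component, connectors can be nested to any depth, which is what lets composition stand in for control flow.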
Invoking Connectors
- Different Grid infrastructures interact with tasks in multiple ways [Woollard 08]:
  – Synchronous communication
  – Events
  – Web services

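The interaction styles listed above suggest a common interface with one implementation per style, so that the same task can be invoked however a given Grid infrastructure requires. The following Python sketch is an assumption-laden illustration and not the design from [Woollard 08]: the class names `InvokingConnector`, `SynchronousInvoker`, and `EventInvoker` are hypothetical, the event style is simulated with an in-process queue that is drained immediately, and a web-service invoker is omitted because it would need a real endpoint.

```python
import queue
from abc import ABC, abstractmethod
from typing import Callable, Dict

Task = Callable[[Dict], Dict]


class InvokingConnector(ABC):
    """Hypothetical interface: hides how a task is actually invoked."""

    @abstractmethod
    def invoke(self, task: Task, data: Dict) -> Dict:
        ...


class SynchronousInvoker(InvokingConnector):
    """Calls the task directly and waits for its result."""

    def invoke(self, task: Task, data: Dict) -> Dict:
        return task(data)


class EventInvoker(InvokingConnector):
    """Posts a request event, then handles it (simulated in-process)."""

    def __init__(self) -> None:
        self.events: queue.Queue = queue.Queue()

    def invoke(self, task: Task, data: Dict) -> Dict:
        self.events.put(("task_requested", task, data))
        # A real event bus would dispatch asynchronously; here the event
        # is taken off the queue and handled right away.
        _, queued_task, queued_data = self.events.get()
        return queued_task(queued_data)


def double(data: Dict) -> Dict:
    """Example task: knows nothing about how it is invoked."""
    return {"value": data["value"] * 2}


results = [invoker.invoke(double, {"value": 21})
           for invoker in (SynchronousInvoker(), EventInvoker())]
print(results)
```

The task body is identical in both cases; only the connector changes, which is the property that would let a workflow move between infrastructures.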
Custom Handlers
[Figure labels: Exogenous Connector (Control Flow, Data Flow); Invoking Connector (Control Flow, Data Flow); Component (Control Flow, Data Flow, Internal Logic, Services); Custom Handler]

SWSA: A Domain Architecture

Implementation
- Prism-MW, an architecturally-aware middleware
  – Components, Connectors, Topologies and Architecture are reified as first-class elements
- Exogenous connectors, invoking connectors, and component wrappers around tasks are built with Prism

Performance Studies
- Overhead induced in computation time and memory by connectors [Woollard, et al. 09].
- Impact of architectural deployment on computation [Woollard, et al. 09].
  - Modified an existing time series workflow used at JPL
  - Deployed the system using OpenDAP and grid technology to co-locate data and computation
  - Reduced typical analysis from 9+ hours to under 2 minutes

Deployment & Optimization
- In the future, we plan to utilize advanced architectural modeling and deployment analysis to guide software engineers in deployment strategy

Conclusion
- Computational science requires validation
- Existing grid and workflow technologies are promising, but lack support for scaling and replication across heterogeneous Grid environments
- A model-driven approach allows scientists to manipulate workflow specifications, while software engineers can effect the transformed software architectures

Thank You
[Yu & Buyya 05] Yu, J. and Buyya, R. A Taxonomy of Workflow Management Systems for Grid Computing. Journal of Grid Computing 3(3-4): pp
[Osterweil 08] Osterweil, L., et al. Experience in using a process language to define scientific workflow and generate dataset provenance. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Atlanta, Georgia
[Gil, et al. 07] Gil, Y., et al. Examining the Challenges of Scientific Workflows. IEEE Computer 40(12): pp
[Lau, et al. 06] Lau, K., et al. A Software Component Model and its Preliminary Formalisation. In F.S. de Boer et al., editors, Proceedings of Fourth International Symposium on Formal Methods for Components and Objects, Lecture Notes in Computer Science 4111(1-21)
[Woollard 08] Woollard, D. Supporting the Engineering Aspects of e-Science Through Workflow Services. Proceedings of the First Brazilian e-Science Workshop, Campinas, Brazil,
[Woollard, et al. 09] Woollard, D. et al. Injecting Software Architectural Constraints into Legacy Scientific Applications. To appear in Proceedings of the ICSE 2009 Workshop on Software Engineering for Computational Science and Engineering. Vancouver, Canada,