Accelerating Scientific Exploration Using Workflow Automation Systems Terence Critchlow (LLNL) Ilkay Altintas (SDSC) Scott Klasky(ORNL) Mladen Vouk (NCSU)


Accelerating Scientific Exploration Using Workflow Automation Systems. Terence Critchlow (LLNL), Ilkay Altintas (SDSC), Scott Klasky (ORNL), Mladen Vouk (NCSU), Steve Parker (Univ. of Utah), Bertram Ludaescher (UC Davis). SIAM CSE Conference, February 2007. UCRL-PRES

What is a "scientific workflow"?
- Definition: a workflow is a predefined sequence of actions that performs a specific task.
- A scientific workflow is any workflow performed in order to accomplish a larger scientific goal.
- Scientific workflows can be arbitrarily complex: conditionals, loops/iterations, parallel execution, and human interactions.
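The definition above, a predefined sequence of actions performing a specific task, can be sketched in a few lines. The step names below (fetch, transform, report) are illustrative only and do not come from any real workflow system:

```python
# Minimal sketch: a workflow as a predefined sequence of actions.
# Step names are hypothetical placeholders for real scientific tasks.

def fetch(data):
    """Stand-in for acquiring raw input."""
    return data + ["raw"]

def transform(data):
    """Stand-in for processing the input."""
    return [item.upper() for item in data]

def report(data):
    """Stand-in for summarizing results."""
    return f"{len(data)} items processed"

WORKFLOW = [fetch, transform, report]  # the predefined sequence

def run(workflow, initial):
    result = initial
    for action in workflow:
        result = action(result)
    return result

print(run(WORKFLOW, ["seed"]))  # -> 2 items processed
```

Real workflow systems add exactly what this sketch lacks: conditionals, loops, parallel branches, and points for human interaction.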

Scientific workflows exist in all domains. [Figures: Promoter Identification workflow; ROADNet workflow, courtesy of A. Rajasekar, SDSC]

If we can automate a workflow, application scientists can spend more time doing science.

An executable workflow is defined within a tool in a way that allows the task to be run.
- There are many workflow engines available.
- A "Director" is responsible for task scheduling.
- An "Actor" is a single task that the workflow needs to schedule.
- I/O ports connect actors.
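Kepler itself is a Java application built on the Ptolemy II framework; the sketch below is not Kepler's API, only a minimal Python illustration of the director/actor/port vocabulary from this slide, with hypothetical actor names:

```python
from collections import deque

class Actor:
    """A single task: consumes tokens from an input port, emits to an output port."""
    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.input_port = deque()   # I/O ports connect actors
        self.output_port = None     # wired to a downstream actor's input port

    def ready(self):
        return bool(self.input_port)

    def fire(self):
        token = self.input_port.popleft()
        result = self.func(token)
        if self.output_port is not None:
            self.output_port.append(result)

class Director:
    """Responsible for task scheduling: fires any actor that has input available."""
    def __init__(self, actors):
        self.actors = actors

    def run(self):
        while any(a.ready() for a in self.actors):
            for actor in self.actors:
                if actor.ready():
                    actor.fire()

# Wire two hypothetical actors: "double" feeds "report".
sink = []
double = Actor("double", lambda x: x * 2)
report = Actor("report", lambda x: sink.append(x))
double.output_port = report.input_port

double.input_port.extend([1, 2, 3])
Director([double, report]).run()
print(sink)  # -> [2, 4, 6]
```

The point of the separation is that the same actors can be rescheduled by a different director (sequential, dataflow, parallel) without rewriting the tasks themselves.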

Creating an executable workflow requires precisely defining what needs to be done:
- Submit a batch job to the supercomputer.
- When the job starts running: track the progress of the simulation, move output files to an archive, and move output files to the analysis machine.
- Clean up.
Splitting the output enables parallel processing of the same data; each actor executes in parallel as long as it has the inputs it needs. The workflow also produces an execution log (i.e., data provenance).
Overall architect (and prototypical user): Scott Klasky (ORNL). Workflow design and implementation: Norbert Podhorszki (UC Davis).
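The submit/monitor/clean-up cycle above can be sketched as a single loop. Everything here is a stand-in: a real workflow would invoke the batch scheduler and file-transfer tools, while the hypothetical `job_state` below just replays a fixed queued/running/done sequence:

```python
def submit_batch_job():
    """Stand-in for submitting a batch job (e.g. via a scheduler CLI); returns a job id."""
    return "job-001"

def job_state(job_id, clock):
    """Hypothetical state machine: queued, then running, then done."""
    return ["queued", "running", "running", "done"][min(clock, 3)]

def run_simulation_workflow(archive, analysis):
    """Submit the job, then ship each output file as it appears, then clean up."""
    job_id = submit_batch_job()
    clock = 0
    while True:
        state = job_state(job_id, clock)
        if state == "running":
            # Track progress: each tick of the clock produces one new output file.
            output = f"output-{clock}.dat"
            archive.append(output)    # move a copy to the archive
            analysis.append(output)   # move a copy to the analysis machine
        elif state == "done":
            break
        clock += 1
    return "cleaned up"               # final clean-up step

archive, analysis = [], []
print(run_simulation_workflow(archive, analysis))  # -> cleaned up
print(archive)  # -> ['output-1.dat', 'output-2.dat']
```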

Creating an executable workflow requires precisely defining what needs to be done:
- Configure parameters based on the user and machine.
- Wait for files to appear.
- Convert files to the new data format.
- Send files to the archive.
- Generate an image.
Image generated using SCIRun (Univ. of Utah). Overall architect (and prototypical user): Scott Klasky (ORNL). Workflow design and implementation: Norbert Podhorszki (UC Davis).
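The "wait for files to appear" step is typically a polling loop over a watched directory. The sketch below uses an assumed `.raw` → `.h5` naming convention purely for illustration; the actual workflow's file formats and conversion tools are not specified here:

```python
import os
import time

def watch_and_process(watch_dir, timeout_s=0.5, poll_s=0.05):
    """Poll a directory for new .raw files and 'convert' each one.
    The archive-and-image steps of the slide would hang off the same loop."""
    seen, processed = set(), []
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        for name in sorted(os.listdir(watch_dir)):   # wait for files to appear
            if name.endswith(".raw") and name not in seen:
                seen.add(name)
                processed.append(name[:-4] + ".h5")  # convert to the new format
        time.sleep(poll_s)
    return processed

# Usage: drop two files into a temporary directory and let the watcher find them.
import tempfile, pathlib
with tempfile.TemporaryDirectory() as d:
    pathlib.Path(d, "a.raw").write_text("")
    pathlib.Path(d, "b.raw").write_text("")
    print(watch_and_process(d))  # -> ['a.h5', 'b.h5']
```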

Now that I have an executable workflow, so what?
- Instead of performing the task by hand each time, you can update the workflow parameters, start the workflow executing, and do other things.

Now that I have an executable workflow, so what?
- Instead of performing the task by hand each time, you can update the workflow parameters, start the workflow executing, and do other things.
- Mundane data management tasks are taken care of: monitoring for files, file transfer with automatic restart on failure, and automatic generation of images.
- Actors can be reused across workflows.
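"Automatic restart on failure" is essentially a retry wrapper around the transfer. A minimal sketch, assuming any callable `send` that raises on failure (a real actor would wrap scp, GridFTP, or similar here):

```python
def transfer_with_restart(send, max_attempts=3):
    """File transfer with automatic restart on failure: retry `send` until it
    succeeds or the attempt budget is exhausted, then re-raise."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except OSError as err:
            if attempt == max_attempts:
                raise
            # A production version would log this and likely back off before retrying.
            print(f"transfer failed ({err}); restarting (attempt {attempt})")

# Usage: a hypothetical flaky link that drops twice, then succeeds.
calls = {"n": 0}
def flaky_send():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("link dropped")
    return "ok"

print(transfer_with_restart(flaky_send))  # -> ok
```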

Now that I have an executable workflow, so what?
- Instead of performing the task by hand each time, you can update the workflow parameters, start the workflow executing, and do other things.
- Mundane data management tasks are taken care of.
- The workflow executes in parallel: logging, archiving, and image generation proceed concurrently without additional coding.
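Because the logging, archiving, and image steps are independent actors, a scheduler can fan them out concurrently for each output file. A sketch using Python's thread pool, with the three stage functions as hypothetical stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the three independent downstream actors.
def log_step(f):     return f"logged:{f}"
def archive_step(f): return f"archived:{f}"
def image_step(f):   return f"imaged:{f}"

def fan_out(filename):
    """Run the three independent steps concurrently for one output file."""
    steps = [log_step, archive_step, image_step]
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(step, filename) for step in steps]
        return [fut.result() for fut in futures]

print(fan_out("output-1.dat"))
# -> ['logged:output-1.dat', 'archived:output-1.dat', 'imaged:output-1.dat']
```

In a workflow system this parallelism comes for free from the dataflow graph; no pool management code like the above is written by the scientist.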

Now that I have an executable workflow, so what?
- Instead of performing the task by hand each time, you can update the workflow parameters, start the workflow executing, and do other things.
- Mundane data management tasks are taken care of.
- The workflow executes in parallel.
- Provenance tracking: log files both reflect the current status of the simulation run and provide a permanent record of the execution. Improved provenance tracking is a major focus of ongoing work.
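At its simplest, provenance tracking means appending one structured record per actor firing. The record fields below are an illustrative minimum, not any standard provenance schema:

```python
import json
import time

def record_provenance(log, actor, inputs, outputs):
    """Append one provenance record per actor firing. Reading the log while the
    workflow runs shows current status; the finished log is a permanent record."""
    entry = {
        "timestamp": time.time(),
        "actor": actor,
        "inputs": inputs,
        "outputs": outputs,
    }
    log.append(entry)
    return json.dumps(entry)  # one JSON line per event, suitable for a log file

# Usage: record a hypothetical conversion step.
log = []
line = record_provenance(log, "convert", ["a.raw"], ["a.h5"])
print(line)
```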

Scientific workflow automation has the potential to reduce the data management burden.
- As experimental and simulation data grows, managing that data efficiently becomes increasingly important.
- Scientific workflow technology removes much of the mundane data management burden, freeing scientists to do science.
"The CIPRES project has as a key goal the creation of software infrastructure that allows developers in the community to easily contribute new software tools, ... The modular nature of Kepler met our requirements, as it is a Java platform that allows users to construct linear, looping, and complex workflows from just the kinds of components the CIPRES community is developing. By adopting this tool, we were able to focus on developing appropriate framework and registry tools for our community, and use the friendly Kepler user application interface as an entrée to our services. We are very excited about the progress we have made, and think the tool will be revolutionary for our user base." - Mark A. Miller, PI, NSF CIPRES project, 2006

This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory, under contract No. W ENG-48.