Pegasus For Automated Workflows Derrick Kearney HUBzero® Platform for Scientific Collaboration Purdue University This work is licensed under Creative Commons.

Presentation transcript:

Pegasus For Automated Workflows
Derrick Kearney, HUBzero® Platform for Scientific Collaboration, Purdue University
This work is licensed under Creative Commons (CC BY-NC-SA 3.0).

Building Blocks of Programs

[Diagram: Inputs and Outputs connected through Function1, Function2, and Function3]
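The idea in this diagram can be sketched in plain Python; the function bodies below are illustrative stand-ins, not part of any Pegasus API:

```python
# A program as a chain of functions: each step's output feeds the next.
def function1(x):
    return x + 1

def function2(x):
    return x * 2

def function3(x):
    return x - 3

def pipeline(inputs):
    # Inputs -> Function1 -> Function2 -> Function3 -> Outputs
    return function3(function2(function1(inputs)))

print(pipeline(5))  # -> 9
```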

Building Blocks of Science

Types of Workflows: Sequential Workflows
Execute steps in order until all of the work has been completed
Could include activities that run in parallel
Example: CNTBands. Science Domain: Nanoelectronics. Scientists: Lundstrom et al. (Purdue).

Types of Workflows: Wideband Workflows
Execute the same function many (1000s of) times
Massively parallel
Scatter / Gather
Sweeps
Example: Epigenomics. Science Domain: Bioinformatics. Scientists: Ben Berman et al. (USC).
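The scatter/gather pattern described above can be sketched with Python's standard library. This is only a local illustration of the idea: `simulate` is a hypothetical stand-in, and a real wideband workflow would launch thousands of independent grid jobs rather than local threads:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(param):
    # Stand-in for one independent run of the swept function.
    return param * param

def sweep(params):
    # Scatter: run each parameter independently.
    # Gather: collect all results back, in order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(simulate, params))

print(sweep(range(5)))  # -> [0, 1, 4, 9, 16]
```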

Pegasus
Developed at USC by Ewa Deelman et al. Website: pegasus.isi.edu
Open source, with bindings for your favorite languages
Benefits: Performance, Portability, Provenance, Data Management, Error Recovery

How does Pegasus Work? If you can draw it… they can make it run.
[Diagram: workflow (sayhi, inquire, f.a, f.b, f.c) → DAX → DAG → Grid]
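What turning the drawing into an executable plan means can be sketched with the standard library's topological sort (Python 3.9+). The job names come from the example above; the actual planning and scheduling is done by Pegasus and DAGMan, not this sketch:

```python
from graphlib import TopologicalSorter

# Workflow edges: inquire depends on sayhi,
# because sayhi's output f.b is inquire's input.
deps = {"inquire": {"sayhi"}}

# A valid execution order: every job runs after its parents.
order = list(TopologicalSorter(deps).static_order())
print(order)  # -> ['sayhi', 'inquire']
```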

HUBzero Infrastructure: Example Workflow

$ cat /apps/pegtut/current/bin/sayhi.sh
#!/bin/bash
# output something on stdout
echo "Hello `cat ${1}`!"
# print greeting to a file
echo "Hello `cat ${1}`!" > f.b

[Diagram: Tool Session Containers holding sayhi.sh, inquire.sh, f.a; workflow sayhi, inquire, f.a, f.b, f.c]

HUBzero Infrastructure: Example Workflow

$ cat /apps/pegtut/current/bin/inquire.sh
#!/bin/bash
# output something to stdout
echo "`cat ${1}` How are you?"
# print greeting to a file
echo "`cat ${1}` How are you?" > f.c

HUBzero Infrastructure: Example Workflow

$ cat f.a
pete

How does Pegasus Work? Step 1. Draw the workflow.
[Workflow diagram: f.a → sayhi → f.b → inquire → f.c]

How does Pegasus Work? Step 2. Convert the workflow to a DAX using the Python API.

import os
from Pegasus.DAX3 import *

sayhipath = '/apps/pegtut/current/bin/sayhi.sh'
inquirepath = '/apps/pegtut/current/bin/inquire.sh'

# create an abstract DAX
dax = ADAG("sayhi_inquire")

How does Pegasus Work? Step 2. Convert Workflow to DAX: declare files and executables in the replica catalog.

# Add input file to the DAX-level replica catalog
a = File("f.a")
a.addPFN(PFN("file://" + os.path.join(os.getcwd(), "f.a"), "local"))
dax.addFile(a)

# Add executables to the DAX-level replica catalog
e_sayhi = Executable(namespace="sayhi_inquire", \
                     name="sayhi", version="1.0", \
                     os="linux", arch="x86_64", \
                     installed=False)
e_sayhi.addPFN(PFN("file://" + sayhipath, "condorpool"))
dax.addExecutable(e_sayhi)

e_inquire = Executable(namespace="sayhi_inquire", \
                       name="inquire", version="1.0", \
                       os="linux", arch="x86_64", \
                       installed=False)
e_inquire.addPFN(PFN("file://" + inquirepath, "condorpool"))
dax.addExecutable(e_inquire)

How does Pegasus Work? Step 2. Convert Workflow to DAX: add jobs to the DAX.

# Add the sayhi job
sayhi = Job(namespace="sayhi_inquire", \
            name="sayhi", version="1.0")
sayhi.addArguments('f.a')
b = File("f.b")
sayhi.uses(a, link=Link.INPUT)
sayhi.uses(b, link=Link.OUTPUT)
dax.addJob(sayhi)

# Add the inquire job (depends on the sayhi job)
inquire = Job(namespace="sayhi_inquire", \
              name="inquire", version="1.0")
inquire.addArguments('f.b')
c = File("f.c")
inquire.uses(b, link=Link.INPUT)
inquire.uses(c, link=Link.OUTPUT)
dax.addJob(inquire)

How does Pegasus Work? Step 2. Convert Workflow to DAX: add control-flow dependencies.

# Add control-flow dependencies
dax.addDependency(Dependency(parent=sayhi, child=inquire))

How does Pegasus Work? Step 2. Convert Workflow to DAX: write the DAX to a file.

# Write the DAX to file
with open('sayhiinquire.dax', 'w') as fp:
    dax.writeXML(fp)

Running the DAX. Step 3. Plan and submit the DAX.

$ submit pegasus-plan --dax sayhiinquire.dax

HUBzero Infrastructure: Running the DAX
From the user's workspace terminal, the hub's submit proxy sends the workflow from the tool session container to the grid:

$ submit pegasus-plan --dax sayhiinquire.dax
(989.0) Job Submitted at WF-DiaGrid
(989.0) DAG Running at WF-DiaGrid
…
(989.0) DAG Done at WF-DiaGrid
$ cat f.b
Hello pete!
$ cat f.c
Hello pete! How are you?
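The end-to-end result shown above can be reproduced locally with a small sketch that mimics the two shell scripts. These are pure-Python stand-ins for illustration, not the actual /apps/pegtut scripts:

```python
# Mimic sayhi.sh: read a name from infile, write a greeting to outfile.
def sayhi(infile, outfile):
    name = open(infile).read().strip()
    open(outfile, "w").write(f"Hello {name}!")

# Mimic inquire.sh: read a greeting from infile, extend it in outfile.
def inquire(infile, outfile):
    greeting = open(infile).read().strip()
    open(outfile, "w").write(f"{greeting} How are you?")

open("f.a", "w").write("pete")
sayhi("f.a", "f.b")    # f.b: "Hello pete!"
inquire("f.b", "f.c")  # f.c: "Hello pete! How are you?"
print(open("f.c").read())
```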

HUBzero Infrastructure: Try creating and running a DAX
From the user's workspace terminal, in a tool session container:

$ use pegasus
$ geany f.a
$ cp -r /apps/pegtut/current/examples/sayhi_inquire .
$ cd sayhi_inquire
$ ./createdax.py
$ submit pegasus-plan --dax sayhiinquire.dax