Managing LIGO Workflows on OSG with Pegasus Karan Vahi USC Information Sciences Institute

Pegasus: Planning for Execution in Grids
- Abstract workflows are Pegasus's input: a workflow description in a "high-level language" that
  - identifies only the computations the user wants to perform
  - is devoid of resource descriptions
  - is devoid of data locations
- Pegasus is a workflow "compiler"
  - target language: DAGMan's DAG and Condor submit files
  - transforms the workflow for performance and reliability
  - automatically locates physical locations for both workflow components and data
  - finds appropriate resources to execute the components
  - provides runtime provenance

Typical Pegasus Deployment
- Pegasus and DAGMan are client tools
- No special requirements on the infrastructure

Basic Workflow Mapping
- Select where to run the computations
  - change task nodes into nodes with executable descriptions: execution location chosen, environment variables initialized, appropriate command-line parameters set
- Select which data to access
  - add stage-in nodes to move data to the computations
  - add stage-out nodes to transfer data out of remote sites to storage
  - add data transfer nodes between computation nodes that execute on different resources

Basic Workflow Mapping (continued)
- Add nodes that register the newly created data products
- Add nodes to create an execution directory on a remote site
- Write out the workflow in a form understandable by a workflow engine
- Include provenance capture steps

Pegasus Workflow Mapping: an example
- Original workflow: 15 compute nodes, devoid of resource assignment
- Resulting workflow, mapped onto 3 grid sites:
  - 13 data stage-in nodes
  - 11 compute nodes (4 tasks eliminated because intermediate data was already available)
  - 8 inter-site data transfers
  - 14 data stage-out nodes to long-term storage
  - 14 data registration nodes (data cataloging)

Catalogs Used for Discovery
To execute on the OSG, Pegasus needs to discover:
- Data (the input data required by the LIGO workflows)
- Executables (are any LIGO executables installed beforehand?)
- Site layout (what services are running on an OSG site?)

Discovery of Data
- A replica catalog stores mappings between logical files and their physical locations
- LIGO tracks its data through LDR (built on Globus RLS)
- Pegasus queries LDR to:
  - discover input files for the workflow
  - track data products created
  - enable data reuse
- Pegasus also interfaces with a variety of replica catalogs:
  - file-based replica catalog: useful for small datasets, but cannot be shared across users
  - database-backed replica catalog: useful for medium-sized datasets, and can be shared across users
How to: a single client, rc-client, interfaces with all types of replica catalogs (see the sketch below).
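For illustration, a minimal sketch of a file-based replica catalog entry and the matching rc-client calls. The LFN, host, path, and the -D property names are assumptions modeled on Pegasus conventions, not taken from this talk:

    # rc.data: each line maps a logical file name (LFN) to a physical
    # location (PFN), with a pool attribute naming the site, e.g.:
    #   f.a  gsiftp://dataserver.example.org/ligo/input/f.a  pool="local"

    # Register a mapping (property names are assumed):
    rc-client -Dpegasus.catalog.replica=SimpleFile \
              -Dpegasus.catalog.replica.file=rc.data \
              insert 'f.a gsiftp://dataserver.example.org/ligo/input/f.a pool="local"'

    # Look the mapping back up:
    rc-client -Dpegasus.catalog.replica=SimpleFile \
              -Dpegasus.catalog.replica.file=rc.data \
              lookup 'f.a'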

Discovery of Site Layout
- Pegasus queries a site catalog to discover each site's layout:
  - installed job managers for the different types of schedulers
  - installed GridFTP servers
  - local replica catalogs where data residing at that site is to be cataloged
  - site-wide profiles, such as environment variables
  - work and storage directories
- For the OSG, Pegasus interfaces with VORS (Virtual Organization Resource Selector) to generate a site catalog
How to: a single client, pegasus-get-sites, generates a site catalog for OSG or TeraGrid (see the sketch below).
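As a hedged sketch of how a site catalog might be generated for the OSG; the flags shown are assumptions, not taken from this talk, so consult the tool's help output:

    # Pull site information from VORS and write a site catalog
    # that pegasus-plan can consume (flags are assumed):
    pegasus-get-sites --grid osg --vo ligo --sc sites.xml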

Discovery of Executables
- The transformation catalog maps logical transformations to their physical locations
- Used to:
  - discover application codes installed on the grid sites
  - discover statically compiled codes that can be deployed at grid sites on demand
- LIGO workflows do not rely on preinstalled application executables; the executables are staged by Pegasus, at runtime, to the remote OSG sites
How to: a single client, tc-client, interfaces with all types of transformation catalogs (see the sketch below).
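For illustration, a minimal sketch of a file-based transformation catalog entry, modeled on the old tc.data column format; the site handle, path, and system information are hypothetical:

    # Columns: site  logical-transformation  physical-path  type  sysinfo  profiles
    # STATIC_BINARY marks a code that Pegasus can stage to a remote OSG
    # site at runtime, matching the LIGO deployment style described above.
    cat >> tc.data <<'EOF'
    local ligo::inspiral:1.0 /opt/ligo/bin/inspiral STATIC_BINARY INTEL32::LINUX NULL
    EOF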

Simple Steps to Run on OSG
1. Specify your computation as a DAX
   - Write a simple DAX (abstract workflow) generator; a Java-based API is provided with Pegasus
2. Set up your catalogs
   - Use pegasus-get-sites to generate the site catalog and transformation catalog for OSG
   - Record the locations of your input files in a replica catalog using rc-client
3. Plan your workflow
   - Use pegasus-plan to generate your executable workflow, mapped onto OSG
4. Submit your workflow
   - Use pegasus-run to submit your workflow
5. Monitor your workflow
   - Use pegasus-status to monitor the execution of your workflow
(An end-to-end sketch of these steps follows.)
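Putting the five steps together, a minimal end-to-end sketch; the generator class, file names, site handles, and run directory are hypothetical, and some flag spellings are assumptions:

    # 1. Generate the abstract workflow (DAX), e.g. with a generator
    #    written against the Java DAX API (GenerateLigoDax is hypothetical):
    java GenerateLigoDax > ligo.dax

    # 2. Set up catalogs (flags for pegasus-get-sites are assumed):
    pegasus-get-sites --grid osg --vo ligo --sc sites.xml
    rc-client insert 'f.a gsiftp://dataserver.example.org/f.a pool="local"'

    # 3. Plan: map the DAX onto OSG, producing a DAG and Condor submit files:
    pegasus-plan --dax ligo.dax --sites osg-site --output local --dir dags

    # 4. Submit the planned workflow; pegasus-plan prints the submit
    #    directory to pass here (path below is assumed):
    pegasus-run dags/ligo/run0001

    # 5. Monitor the execution:
    pegasus-status dags/ligo/run0001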

Workflow Reduction (Data Reuse)
- Pegasus can prune jobs whose outputs already exist, reusing previously generated data instead of recomputing it (as in the mapping example above, where 4 compute nodes were eliminated)
How to: files need to be cataloged in the replica catalog at runtime, and the registration flags for these files need to be set in the DAX (see the sketch below).
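A minimal sketch of how those registration flags might look on an output file in a DAX (DAX 2-era XML; the job and file names are hypothetical):

    <job id="ID000002" namespace="ligo" name="inspiral" version="1.0">
      <!-- register="true" asks Pegasus to add a registration node for f.b;
           transfer="true" asks for a stage-out node to permanent storage -->
      <uses file="f.b" link="output" register="true" transfer="true"/>
    </job>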

Pegasus: Efficient Space Usage
- Problem: intermediate and output files are of a similar order of size; when there is not enough space, failures occur
- Solution: determine which data are no longer needed and when, and add nodes to the workflow that clean up data along the way
- Benefits: can significantly decrease the amount of space needed to run a workflow
  - input data is staged dynamically, and new data products are generated during execution
  - relevant for large workflows with 10,000+ input files
- Next steps: work on a storage-aware scheduling algorithm
How to: dynamic cleanup is on by default; to turn it off, pass --nocleanup to pegasus-plan (see below).
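A minimal example of disabling cleanup at planning time (the --nocleanup flag is from the talk; the other arguments follow the earlier walkthrough):

    # Plan without cleanup nodes; by default they are added:
    pegasus-plan --dax ligo.dax --sites osg-site --output local --nocleanup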

Job Clustering
- Useful for small-granularity jobs
- Level-based clustering, vertical clustering, arbitrary clustering
How to: to turn job clustering on, pass --cluster to pegasus-plan (see below).
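A minimal example of enabling clustering at planning time; the --cluster flag is from the talk, but the "horizontal" style argument is an assumption about its value:

    # Cluster small jobs at each level into larger Condor jobs:
    pegasus-plan --dax ligo.dax --sites osg-site --output local \
                 --cluster horizontal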

Montage Application
- ~7,000 compute jobs in the instance
- ~10,000 nodes in the executable workflow
- With the same number of clusters as processors: speedup of ~15 on 32 processors
- (Figure: a small 1,200-node Montage workflow)
- Performance optimization through workflow restructuring

Managing Execution Environment Changes Through Partitioning
- Provides reliability: can replan at the partition level
- Provides scalability: can handle portions of the workflow at a time
How to:
1) Partition the workflow into smaller partitions at runtime using the partitiondax tool.
2) Pass the partitioned DAX to pegasus-plan using the --pdax option (see the sketch below).
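A minimal sketch of the two-step partitioning flow; partitiondax and --pdax are named in the talk, but the exact arguments shown are assumptions:

    # 1. Break the DAX into smaller partitions:
    partitiondax --dax ligo.dax --dir partitions

    # 2. Plan the partitioned workflow:
    pegasus-plan --pdax partitions/ligo.pdax --sites osg-site --output local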

Reliability
- Job-level retry
  - leverages DAGMan's retry capabilities
  - allows us to overcome transient grid failures
- Re-planning
  - partitions can be re-planned in case of errors; this only happens after retries have been exhausted
  - try alternative data sources for staging data
  - provide a rescue DAG when all else fails
- Clustering of data transfers and execution tasks alleviates:
  - GridFTP server overloads
  - high load on the head node
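As a configuration sketch, job-level retry is driven by DAGMan's RETRY feature; the property spelling below is an assumption modeled on Pegasus's dagman profile namespace, not something stated in the talk:

    # pegasus.properties (key is assumed): attach a dagman-namespace
    # profile so each job's DAG entry gets "RETRY 3", letting DAGMan
    # resubmit a failed job up to 3 times before giving up.
    pegasus.dagman.retry = 3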

Helping to Meet LIGO/OSG Milestones
Work done by Kent Blackburn, David Meyers, Michael Samidi (Caltech)
- 1st milestone (month 4): run at UCSD with a DC of 25 slots for one week
- 2nd milestone (month 8): run on OSG with a DC of 100 slots for one week

Future OSG-Focused Developments
Working towards the 3rd milestone (month 12): reach a 1,000-slot peak in MonALISA on OSG
- Limit use of shared file systems, jobmanager fork, and the total number of jobs
  - use of node clustering techniques
- Support for placing data and jobs in temporary OSG directories
  - support for local disk I/O and vertical clustering of jobs
- SRM/dCache support for Pegasus workflows

What Does Pegasus Do for an Application?
- Provides an OSG-aware workflow management tool
  - queries OSG services (like VORS) for information about OSG sites
  - deploys user executables as part of the workflow
- Reduces the storage footprint: data is cleaned up as the workflow progresses
- Manages data within the workflow
  - interfaces with a variety of replica catalogs (including RLS) to discover data
  - performs replica selection
  - manages data transfers by interfacing with transfer services like RFT and Stork, and clients like globus-url-copy
  - no need to stage in data beforehand; it is staged within the workflow as and when required
- Improves application performance and execution
  - job clustering
  - support for Condor glideins
  - techniques to minimize load on remote grid resources during large-scale execution of workflows
- Data reuse
  - avoids duplicate computations
  - can reuse data that has been generated earlier

Pegasus Developments in the Next 2 Years
- New releases will continue to be included in VDT
- Continued improvements in documentation
- Development of accounting capabilities
- Improvements in monitoring capabilities
  - command-line and GUI tools to view the progress of a workflow
- Development of new debugging and diagnostic tools
  - debugging across Pegasus/DAGMan/Grid
- Research areas: management of multiple workflows, automatic resource provisioning
- We welcome community input on future developments

Relevant Links
- Pegasus: distributed as part of VDT (standalone version in VDT 1.7 and later); can also be downloaded directly
- Interested in trying out Pegasus? Do the tutorial, or contact us to arrange a tutorial at ISI
- Quickstart guide available; more detailed documentation appearing soon
- Relevant papers
- Support lists
- We are looking for new applications and developers interested in using our tools