1 USC Information Sciences Institute Yolanda Gil AAAI-08 Tutorial July 13, 2008
Part IV: Workflow Mapping and Execution in Pegasus (Thanks to Ewa Deelman)
AAAI-08 Tutorial on Computational Workflows for Large-Scale Artificial Intelligence Research

2 Pegasus Workflow Management System
Leverages abstraction for workflow description to obtain ease of use, scalability, and portability
Provides a compiler to map from high-level descriptions to executable workflows
– Correct mapping
– Performance-enhanced mapping
Provides a runtime engine to carry out the instructions in a scalable and reliable manner
Ewa Deelman, Gaurang Mehta, Karan Vahi (USC/ISI), in collaboration with Miron Livny (UW Madison)
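The "abstraction" here is a resource-independent DAG: tasks name logical transformations and logical files, with no sites or physical paths mentioned. A minimal sketch of such an abstract workflow in Python (hypothetical data structures for illustration, not Pegasus's actual DAX format), with control edges derived from data dependencies:

```python
# Hypothetical sketch of what the Pegasus compiler consumes: an abstract
# workflow names tasks and logical files, but no resources or paths.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str                                    # logical transformation name
    inputs: list = field(default_factory=list)   # logical file names
    outputs: list = field(default_factory=list)

@dataclass
class AbstractWorkflow:
    tasks: list = field(default_factory=list)

    def edges(self):
        """Derive control-flow edges from data dependencies."""
        producer = {f: t.name for t in self.tasks for f in t.outputs}
        return sorted({(producer[f], t.name)
                       for t in self.tasks for f in t.inputs if f in producer})

wf = AbstractWorkflow([
    Task("preprocess", inputs=["f.in"], outputs=["f.a"]),
    Task("analyze",    inputs=["f.a"],  outputs=["f.b"]),
    Task("summarize",  inputs=["f.b"],  outputs=["f.out"]),
])
print(wf.edges())   # [('analyze', 'summarize'), ('preprocess', 'analyze')]
```

The mapping stages described on the following slides turn a DAG like this into concrete jobs bound to sites and physical files.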

3 Underlying Grid Middleware Services
Pegasus uses Globus grid services:
– GridFTP for efficient and reliable data transfer
– Grid proxies/certificates
Pegasus uses the Condor job management system:
– Condor-G for job submission to shared resources
– DAGMan to manage job interdependencies
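DAGMan consumes a plain-text DAG file that lists jobs and the dependencies between them. A minimal hand-written example for a diamond-shaped workflow (the submit-file names are made up for illustration):

```
# diamond.dag: four Condor jobs with a diamond dependency structure
JOB  A  A.sub
JOB  B  B.sub
JOB  C  C.sub
JOB  D  D.sub
PARENT A CHILD B C
PARENT B C CHILD D
```

Submitted with `condor_submit_dag diamond.dag`; Pegasus generates files of this form automatically from the mapped workflow.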

4 Basic Workflow Mapping
Select where to run the computations
– Apply a scheduling algorithm (HEFT, min-min, round-robin, random)
– The quality of the schedule depends on the quality of the available information
Transform task nodes into nodes with executable descriptions
– Execution location
– Environment variables initialized
– Appropriate command-line parameters set
Select which data to access
– Add stage-in nodes to move data to computations
– Add stage-out nodes to transfer data out of remote sites to storage
– Add data transfer nodes between computation nodes that execute on different resources
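As an illustration of the site-selection step, here is a toy min-min scheduler (a generic sketch with invented task and site names, not Pegasus code): each iteration assigns the task whose best achievable completion time across all sites is smallest, which is also where the quality of the runtime estimates visibly drives the quality of the plan.

```python
# Toy min-min site selection (illustrative sketch, not Pegasus code).
def min_min(tasks, sites, runtime):
    """tasks: list of names; sites: list of names;
    runtime[(task, site)]: estimated execution time."""
    avail = {s: 0.0 for s in sites}        # earliest free time per site
    schedule = {}
    remaining = set(tasks)
    while remaining:
        # completion time if task t ran next on site s
        ect = {(t, s): avail[s] + runtime[(t, s)]
               for t in remaining for s in sites}
        # min-min rule: the task with the smallest best completion time goes first
        t, s = min(ect, key=ect.get)
        schedule[t] = s
        avail[s] = ect[(t, s)]
        remaining.remove(t)
    return schedule

rt = {("a", "X"): 2, ("a", "Y"): 4,
      ("b", "X"): 3, ("b", "Y"): 1,
      ("c", "X"): 5, ("c", "Y"): 8}
print(min_min(["a", "b", "c"], ["X", "Y"], rt))  # b runs on Y; a and c on X
```

HEFT additionally ranks tasks by critical-path length before assigning them; round-robin and random ignore the estimates entirely.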

5 Basic Workflow Mapping (cont.)
Add nodes to create an execution directory on a remote site
Add nodes that register the newly created data products
Add data cleanup nodes to remove data from remote sites when no longer needed (reduces the workflow data footprint)
Provide provenance capture steps: information about the source of data, executables invoked, environment variables, parameters, machines used, and performance
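The node additions above can be sketched as a function that, for one scheduled compute task, emits the surrounding auxiliary jobs in order (illustrative Python; the node-name formats and the single-task scope are invented for this sketch):

```python
# Sketch of the auxiliary nodes the mapper wraps around one scheduled task
# (illustrative only; real mappers share create_dir/cleanup nodes across tasks).
def add_auxiliary_nodes(task, site, inputs, outputs):
    nodes = [f"create_dir@{site}"]                              # execution directory
    nodes += [f"stage_in:{f}->{site}" for f in inputs]          # move data in
    nodes += [f"compute:{task}@{site}"]                         # the task itself
    nodes += [f"stage_out:{f}->storage" for f in outputs]       # persist results
    nodes += [f"register:{f}" for f in outputs]                 # catalog new data
    nodes += [f"cleanup:{f}@{site}" for f in inputs + outputs]  # shrink footprint
    return nodes

for n in add_auxiliary_nodes("analyze", "siteA", ["f.a"], ["f.b"]):
    print(n)
```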

6 Pegasus Workflow Mapping
Original workflow: 15 compute nodes, devoid of resource assignment
Resulting workflow, mapped onto 3 Grid sites:
– 11 compute nodes (4 reduced based on available intermediate data)
– 12 data stage-in nodes
– 8 inter-site data transfers
– 14 data stage-out nodes to long-term storage
– 14 data registration nodes (data cataloging)
jobs to execute

7 Some Challenges in Workflow Mapping
Automated management of data, through workflow modification
Efficient mapping of workflow instances to resources
– Performance
– Data space optimizations
Fault tolerance (involves interfacing with the workflow execution system)
– Recovery by replanning
– Plan "B"
Mapping is not a one-shot activity
Providing feedback to the user
– Feasibility, time estimates

8 Pegasus Deployment

9 Node Clustering
Useful for small-granularity jobs
– Level-based clustering
– Vertical clustering
– Arbitrary clustering
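Level-based clustering bundles jobs at the same DAG depth so that many short tasks run as one scheduled job, amortizing per-job submission overhead. A small sketch (illustrative Python, not Pegasus code; the cluster size is an invented parameter):

```python
# Level-based clustering sketch: bundle same-depth jobs (illustrative only).
from collections import defaultdict

def levels(edges, nodes):
    """Longest-path depth of each node from the roots."""
    parents = defaultdict(list)
    for u, v in edges:
        parents[v].append(u)
    depth = {}
    def d(n):
        if n not in depth:
            depth[n] = 0 if not parents[n] else 1 + max(d(p) for p in parents[n])
        return depth[n]
    for n in nodes:
        d(n)
    return depth

def level_clusters(edges, nodes, cluster_size=2):
    by_level = defaultdict(list)
    for n, lv in sorted(levels(edges, nodes).items()):
        by_level[lv].append(n)
    # split each level into bundles of at most cluster_size jobs
    return [lvl_nodes[i:i + cluster_size]
            for lvl, lvl_nodes in sorted(by_level.items())
            for i in range(0, len(lvl_nodes), cluster_size)]

edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d")]
print(level_clusters(edges, ["a", "b", "c", "d"]))  # [['a', 'b'], ['c', 'd']]
```

Vertical clustering instead bundles a single-parent/single-child chain of jobs into one; arbitrary clustering lets the user group jobs by hand.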

10 Data Reuse
Useful when it is cheaper to access the data than to regenerate it
Keeping track of data as it is generated supports workflow-level checkpointing
Mapping Complex Workflows Onto Grid Environments, E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, K. Blackburn, A. Lazzarini, A. Arbree, R. Cavanaugh, S. Koranda, Journal of Grid Computing, Vol. 1, No. 1, 2003, pp. 25-39.
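A simplified sketch of workflow reduction by data reuse (illustrative Python, not the algorithm from the cited paper): a task is pruned when every file it would produce is already registered in a replica catalog, so downstream tasks can stage the data in instead of regenerating it.

```python
# Data-reuse sketch: prune tasks whose outputs already exist (illustrative).
def reduce_workflow(tasks, available):
    """tasks: {name: (inputs, outputs)}; available: set of cataloged files."""
    keep = {}
    for name, (ins, outs) in tasks.items():
        if outs and all(o in available for o in outs):
            continue                 # outputs exist: reuse them, don't rerun
        keep[name] = (ins, outs)
    return keep

tasks = {
    "gen":  ([], ["raw"]),
    "filt": (["raw"], ["clean"]),
    "plot": (["clean"], ["fig"]),
}
kept = reduce_workflow(tasks, available={"raw", "clean"})
print(sorted(kept))   # ['plot']: only the last step still needs to run
```

This is what makes cataloged intermediate data behave like a workflow-level checkpoint: after a failure, replanning against the catalog reruns only what is missing.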

11 Efficient Data Handling
Input data is staged dynamically; new data products are generated during execution
For large workflows: 10,000+ files, with a similar order of intermediate and output files; the total space occupied is far greater than the available space, so failures occur
Solution:
– Determine which data are no longer needed, and when
– Add nodes to the workflow that clean up data along the way
Issues:
– Minimize the number of nodes and dependencies added, so as not to slow down workflow execution
– Deal with portions of workflows scheduled to multiple sites
– Deal with files on partition boundaries
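The core of "which data, and when" is a last-consumer analysis: a file becomes deletable as soon as the last task that reads it has finished. A single-site sketch (illustrative Python; the real placement problem must also respect multi-site scheduling and the node-count concerns listed above):

```python
# Cleanup-placement sketch: delete each file after its last consumer
# finishes (illustrative, single-site, fixed execution order).
def cleanup_points(order, uses):
    """order: task names in execution order; uses[task]: files the task reads.
    Returns {task: files deletable once that task completes}."""
    last_use = {}
    for t in order:
        for f in uses[t]:
            last_use[f] = t                 # later tasks overwrite earlier ones
    points = {}
    for f, t in last_use.items():
        points.setdefault(t, []).append(f)
    return points

order = ["preprocess", "analyze", "summarize"]
uses = {"preprocess": ["f.in"], "analyze": ["f.a"], "summarize": ["f.a", "f.b"]}
print(cleanup_points(order, uses))
```

Here `f.in` can be removed right after `preprocess`, while `f.a` must survive until `summarize` also reads it.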

12 Workflow Footprint
In order to improve the workflow footprint, we need to determine when data are no longer needed:
– Because the data were consumed by the next component and no other component needs them
– Because the data were staged out to permanent storage
– Because the data are no longer needed on a resource and have been staged out to the resource that needs them

13 Cleanup Disk Space as Workflow Progresses

14 LIGO Workflows
Full workflow: 185,000 nodes, 466,000 edges, 10 TB of input data, 1 TB of output data
(figure: disk-space improvement with cleanup, 56% improvement shown; 166-node workflow)

15 Montage Application
~7,000 compute jobs in the workflow instance
~10,000 nodes in the executable workflow
Same number of clusters as processors
Speedup of ~15 on 32 processors
(figure: small 1,200-node Montage workflow) Montage [Deelman et al 05]

16 Pegasus + Condor + TeraGrid for the Southern California Earthquake Center (SCEC) [Deelman et al 06]
(figure: executable workflow run through Pegasus, Condor DAGMan, Globus, and Condor glide-ins, with a provenance tracking catalog, producing a hazard map)

17 Pegasus: Scale [Deelman et al 06]
SCEC workflows run each week using Pegasus and DAGMan on the TeraGrid and USC resources. Cumulatively, the workflows consisted of over half a million tasks and used over 2.5 CPU-years; the largest workflow had O(100,000) nodes.
Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example, Ewa Deelman, Scott Callaghan, Edward Field, Hunter Francoeur, Robert Graves, Nitin Gupta, Vipin Gupta, Thomas H. Jordan, Carl Kesselman, Philip Maechling, John Mehringer, Gaurang Mehta, David Okaya, Karan Vahi, Li Zhao, e-Science 2006, Amsterdam, December 4-6, 2006 (best paper award).