Pegasus: Mapping Scientific Workflows onto the Grid Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.

Presentation transcript:

Pegasus: Mapping Scientific Workflows onto the Grid Ewa Deelman Center for Grid Technologies USC Information Sciences Institute

Ewa Deelman, Information Sciences Institute
Pegasus Acknowledgements
- Ewa Deelman, Carl Kesselman, Saurabh Khurana, Gaurang Mehta, Sonal Patil, Gurmeet Singh, Mei-Hui Su, Karan Vahi (Center for Grid Computing, ISI)
- James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)
- Collaboration with Miron Livny (UW Madison)
- Research funded as part of the NSF GriPhyN, NVO and SCEC projects and EU-funded GridLab

Outline
- Workflow Management in Grids
- Pegasus, Planning for Execution in Grids
- Applications Using Pegasus
- In-time planning
- Future Research Directions

Grid Applications
- Increasing level of complexity
- Use of individual application components
- Reuse of individual intermediate data products (files)
- Description of data products using metadata attributes
- Execution environment is complex and very dynamic
  - Resources come and go
  - Data is replicated
  - Components can be found at various locations or staged in on demand
- Separation between
  - the application description
  - the actual execution description

[Figure labels: Abstract Workflow Generation; Concrete Workflow Generation]

Why Automate Workflow Generation?
- Usability: limit the Grid knowledge a user needs
  - Monitoring and Discovery Service
  - Replica Location Service
- Complexity:
  - User needs to make choices
    - Alternative application components
    - Alternative files
    - Alternative locations
  - The user may reach a dead end
  - Many different interdependencies may occur among components
- Solution cost:
  - Evaluate the alternative solution costs
    - Performance
    - Reliability
    - Resource usage
- Global cost:
  - Minimizing cost within a community or a virtual organization
  - Requires reasoning about an individual user’s choices in light of other users’ choices

GriPhyN’s Executable Workflow Construction
- Build an abstract workflow based on VDL descriptions (Chimera)
- Build an executable workflow based on the abstract workflow (Pegasus)
- Execute the workflow (Condor’s DAGMan)

VDL and Abstract Workflow
[Figure labels: VDL descriptions; User request data file “c”; Abstract Workflow]
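A minimal sketch of how a request for data file “c” could be expanded into an abstract workflow by walking derivation descriptions backwards. This is not VDL syntax or the Chimera API: the `DERIVATIONS` list, its field names, and `build_abstract_workflow` are hypothetical stand-ins chosen for illustration, and the derivation graph is assumed to be acyclic.

```python
# Hypothetical stand-in for VDL descriptions: each derivation names a logical
# transformation, its logical input files, and its logical output files.
DERIVATIONS = [
    {"name": "d1", "transformation": "t1", "inputs": ["a"], "outputs": ["b"]},
    {"name": "d2", "transformation": "t2", "inputs": ["b"], "outputs": ["c"]},
]


def build_abstract_workflow(requested_file, derivations):
    """Return the derivations needed to produce requested_file, producers first."""
    producer = {out: d for d in derivations for out in d["outputs"]}
    ordered, seen = [], set()

    def visit(logical_file):
        d = producer.get(logical_file)
        if d is None or d["name"] in seen:
            return  # raw input file, or derivation already included
        seen.add(d["name"])
        for parent_file in d["inputs"]:
            visit(parent_file)  # include upstream derivations first
        ordered.append(d["name"])

    visit(requested_file)
    return ordered
```

Calling `build_abstract_workflow("c", DERIVATIONS)` lists the derivation that makes “b” before the one that makes “c”. Everything stays at the logical level: no sites or physical file locations appear yet.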

Condor’s DAGMan
- Developed at UW Madison (Livny)
- Executes a concrete workflow
- Makes sure the dependencies are followed
- Executes the jobs specified in the workflow
  - Execution
  - Data movement
  - Catalog updates
- Provides a “rescue DAG” in case of failure
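The dependency-ordered execution and the rescue idea can be sketched in a few lines of Python. This is an illustration only, not DAGMan's implementation or file format: `run_dag`, the `parents` mapping, and the `run_job` callback are hypothetical names, and the returned `rescue` list merely stands in for the information a rescue DAG would carry (what still has to run after a failure).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(parents, run_job):
    """Run jobs in dependency order.

    parents maps each job to the set of jobs it depends on.
    run_job(job) returns True on success.
    Returns (done, rescue): jobs that succeeded, and jobs that still need to run.
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()
    done = []
    while sorter.is_active():
        ready = sorter.get_ready()
        if not ready:
            break  # only failed jobs are outstanding; nothing else can start
        for job in ready:
            if run_job(job):
                done.append(job)
                sorter.done(job)  # releases children whose parents are all done
            # a failed job is never marked done, so its descendants stay blocked
    order = TopologicalSorter(parents).static_order()
    rescue = [job for job in order if job not in done]
    return done, rescue
```

If a data-movement job in the middle of a chain fails, `done` holds the jobs before it and `rescue` holds the failed job plus everything downstream, so a later run can resume without repeating completed work.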

Pegasus: Planning for Execution in Grids
- Maps from abstract to concrete workflow
  - Algorithmic and AI-based techniques
- Automatically locates physical locations for both components (transformations) and data
- Finds appropriate resources to execute
- Reuses existing data products where applicable
- Publishes newly derived data products
  - Chimera virtual data catalog
  - Provides provenance information

Information Components Used by Pegasus
- Globus Monitoring and Discovery Service (MDS)
  - Locates available resources
  - Finds resource properties
    - Dynamic: load, queue length
    - Static: location of GridFTP server, RLS, etc.
- Globus Replica Location Service (RLS)
  - Locates data that may be replicated
  - Registers new data products
- Transformation Catalog (TC)
  - Locates installed executables

Example Workflow Reduction
- Original abstract workflow
- If “b” already exists (as determined by query to the RLS), the workflow can be reduced
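The reduction step can be sketched as a backward walk from the requested files that stops wherever a file already exists. The code below is a hedged illustration, not Pegasus code: `reduce_workflow` and the dictionary layout are hypothetical, and the `existing` set stands in for the answer an RLS query would give.

```python
def reduce_workflow(jobs, requested, existing):
    """Keep only jobs still needed to produce the requested files.

    jobs: {job_name: {"inputs": [...], "outputs": [...]}}
    requested: logical files the user asked for
    existing: logical files already registered (a stand-in for an RLS query)
    """
    producer = {out: name for name, job in jobs.items() for out in job["outputs"]}
    needed, stack = set(), list(requested)
    while stack:
        logical_file = stack.pop()
        if logical_file in existing:
            continue  # a replica exists, so nothing upstream has to run for it
        job = producer.get(logical_file)
        if job is None or job in needed:
            continue  # raw input with no producer, or job already kept
        needed.add(job)
        stack.extend(jobs[job]["inputs"])
    return needed
```

With a two-job chain a → b → c, requesting “c” keeps both jobs when nothing exists, but keeps only the second job once “b” is already registered.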

Mapping from abstract to concrete
- Query RLS, MDS, and TC; schedule computation and data movement
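A rough sketch of that mapping, with plain dictionaries standing in for the three information services. None of this is the Pegasus, RLS, MDS, or TC API: `map_to_concrete`, the tuple shapes in the plan, and the "least-loaded site" selection rule are assumptions made for illustration. The slides do not specify the site-selection policy.

```python
def map_to_concrete(abstract_jobs, replicas, sites, transformations):
    """Turn abstract jobs into a list of concrete steps.

    abstract_jobs: ordered list of {"name", "transformation", "inputs", "outputs"}
    replicas: {logical_file: set of sites holding a copy}        (RLS stand-in)
    sites: {site: current load}                                  (MDS stand-in)
    transformations: {transformation: set of sites installed at} (TC stand-in)
    """
    replicas = {f: set(s) for f, s in replicas.items()}  # leave caller's catalog untouched
    plan = []
    for job in abstract_jobs:
        candidates = transformations.get(job["transformation"], set()) & set(sites)
        if not candidates:
            raise LookupError(f"no site can run {job['transformation']}")
        # Assumed policy: least-loaded site, with the site name breaking ties.
        site = min(candidates, key=lambda s: (sites[s], s))
        for f in job["inputs"]:
            holders = replicas.get(f, set())
            if not holders:
                raise LookupError(f"no replica of {f}")
            if site not in holders:
                plan.append(("transfer", f, min(holders), site))  # data movement
        plan.append(("run", job["name"], site))                  # computation
        for f in job["outputs"]:
            replicas.setdefault(f, set()).add(site)
            plan.append(("register", f, site))                   # catalog update
    return plan
```

The resulting plan interleaves the three kinds of concrete job named on the DAGMan slide: execution, data movement, and catalog updates.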

Montage
- Montage (NASA and NVO)
  - Deliver science-grade custom mosaics on demand
  - Produce mosaics from a wide range of data sources (possibly in different spectra)
  - User-specified parameters of projection, coordinates, size, rotation and spatial sampling
[Figure caption: Mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid.]

Small Montage Workflow
[Figure: ~1200 nodes]

Montage Acknowledgments
- Bruce Berriman, John Good, Anastasia Laity, Caltech/IPAC
- Joseph C. Jacob, Daniel S. Katz, JPL
- caltech.edu/
- Testbed for Montage: Condor pools at USC/ISI, UW Madison, and TeraGrid resources at NCSA, PSC, and SDSC.
Montage is funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.

Applications Using Chimera, Pegasus and DAGMan
- GriPhyN applications:
  - High-energy physics: ATLAS, CMS (many)
  - Astronomy: SDSS (Fermi Lab, ANL)
  - Gravitational-wave physics: LIGO (Caltech, AEI)
- Astronomy:
  - Galaxy Morphology (NCSA, JHU, Fermi, many others, NVO-funded)
- Biology:
  - BLAST (ANL, PDQ-funded)
- Neuroscience:
  - Tomography for Telescience (SDSC, NIH-funded)

Current System

Workflow Refinement and Execution
[Figure: axes are time and levels of abstraction (application-level knowledge; logical tasks; tasks bound to resources and sent for execution). Labels: User’s Request; Relevant components; Full abstract workflow; Partial execution; Not yet executed; Workflow refinement; Task matchmaker; Workflow repair; Policy info]

Incremental Refinement
- Partition abstract workflow into partial workflows
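One simple way to partition a workflow is by level, so that each partial workflow can be mapped only once the previous one has run. The sketch below assumes that scheme purely for illustration. The slide does not say which partitioning strategy is used, `partition_by_level` is a hypothetical name, and the input graph is assumed to be acyclic.

```python
def partition_by_level(parents):
    """Group jobs into levels.

    parents maps each job to the jobs it depends on.
    Level k holds the jobs whose longest chain of ancestors has k edges.
    """
    level = {}

    def depth(job):
        if job not in level:
            # A job with no parents sits at level 0 (1 + default of -1).
            level[job] = 1 + max((depth(p) for p in parents.get(job, ())), default=-1)
        return level[job]

    for job in parents:
        depth(job)
    partitions = {}
    for job, k in level.items():
        partitions.setdefault(k, []).append(job)
    return [sorted(partitions[k]) for k in sorted(partitions)]
```

For a diamond-shaped workflow (one root, two parallel jobs, one join) this yields three partial workflows. Planning each level just before it runs lets the mapping use resource information that is current at that moment.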

Meta-DAGMan

Conclusions
- Pegasus maps complex workflows onto the Grid
- Uses Grid information services to find resources, data and executables
- Reduces the workflow based on existing intermediate products
- Used in many applications
- Part of GriPhyN’s Virtual Data Toolkit

Future Directions
- Investigate various scheduling techniques
- Investigate fault tolerance issues
- Enable flexible interactions between workflow refiners (GriPhyN-wide scope: Pegasus, DAGMan)
- GGF10 workshop on workflow management
- GGF Workflow management research group

Summary: The Grid Now
- Syntax-based matchmaking of resources to job requirements
  - Condor matchmaker
  - Attribute-based discovery and selection
- Scheduling of jobs based on Grid-able users that specify job execution sequences and computing requirements
  - Scripting languages
  - Workflow languages
  - Task graphs
- Explicit mappings from task to jobs, simple job brokers
- Explicit service negotiation and recovery strategies
The Future Grid
- Knowledge-based reasoning about resources enables
  - Semantic matchmaking
  - Aggregate resource reasoning
- Task-level reasoning to plan and schedule jobs and resources
  - More agility and coordination
- Wide range of users can specify high-level requirements in a mixed-initiative mode
  - Mapping of high-level requirements to details required for execution
- End-to-end resource negotiation and adaptive strategies to accommodate failure
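To make the "Grid Now" notion of syntax-based, attribute-based matchmaking concrete, here is a small sketch in which a job lists attribute constraints and each resource advertises attribute values. It is not Condor's ClassAd language or matchmaker: the `(attribute, operator, value)` tuples, `matches`, and `matchmake` are hypothetical simplifications.

```python
import operator

# Comparison operators a requirement may use (an assumed, minimal set).
OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def matches(requirements, resource):
    """True if the resource satisfies every (attribute, op, value) requirement."""
    return all(
        attr in resource and OPS[op](resource[attr], value)
        for attr, op, value in requirements
    )


def matchmake(requirements, resources):
    """Return the names of resources whose advertised attributes satisfy the job."""
    return sorted(name for name, attrs in resources.items() if matches(requirements, attrs))
```

The match is purely syntactic: a resource that advertises the same capability under a different attribute name is rejected. That is the limitation the "Future Grid" bullet on semantic matchmaking points at.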