Pegasus: Planning for Execution in Grids
Ewa Deelman
Information Sciences Institute, University of Southern California

Pegasus Acknowledgements
- Ewa Deelman, Carl Kesselman, Saurabh Khurana, Gaurang Mehta, Sonal Patil, Gurmeet Singh, Mei-Hui Su, Karan Vahi (ISI)
- James Blythe, Yolanda Gil (ISI)
- Research funded as part of the NSF GriPhyN, NVO, and SCEC projects.

Outline
- General scientific workflow issues on the Grid
  - Mapping complex applications onto the Grid
- Pegasus
- Pegasus application portal
  - LIGO: gravitational-wave physics
  - Montage: astronomy
- Incremental workflow refinement
- Futures

Grid Applications
- Increasing in level of complexity
- Use of individual application components
- Reuse of individual intermediate data products (files)
- Description of data products using metadata attributes
- Execution environment is complex and very dynamic
  - Resources come and go
  - Data is replicated
  - Components can be found at various locations or staged in on demand
- Separation between the application description and the actual execution description (illustrated in the sketch below)
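
This separation is the core idea: the same application-level description can be bound to different resources each time it runs. A minimal sketch of the two levels in Python; the class and field names are illustrative, not Pegasus's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class AbstractTask:
    """Application description: logical names only, no resource bindings."""
    transformation: str            # logical component name, e.g. "mProject"
    inputs: list = field(default_factory=list)    # logical file names
    outputs: list = field(default_factory=list)   # logical file names

@dataclass
class ConcreteTask:
    """Execution description: the same task bound to a site."""
    executable: str                # physical path from the Transformation Catalog
    site: str                      # execution site chosen from resource information
    input_urls: list = field(default_factory=list)   # physical replicas (via RLS)
    output_urls: list = field(default_factory=list)  # where outputs get registered

# The same abstract task can yield different concrete tasks depending on
# where its code and data happen to be available at run time.
task = AbstractTask("mProject", inputs=["image_1.fits"], outputs=["proj_1.fits"])
```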

[Figure: abstract workflow generation, followed by concrete workflow generation]

Why Automate Workflow Generation?
- Usability: limits the Grid knowledge the user needs, e.g. of the Monitoring and Discovery Service and the Replica Location Service
- Complexity:
  - The user needs to make choices among
    - alternative application components
    - alternative files
    - alternative locations
  - The user may reach a dead end
  - Many different interdependencies may occur among components
- Solution cost:
  - Evaluate the costs of alternative solutions in terms of performance, reliability, and resource usage
- Global cost:
  - Minimizing cost within a community or a virtual organization
  - Requires reasoning about individual users' choices in light of other users' choices

Executable Workflow Construction
- Chimera builds an abstract workflow based on VDL descriptions
- Pegasus takes the abstract workflow and produces an executable workflow for the Grid
- Condor's DAGMan executes the workflow (a minimal DAG-file example follows)
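
To make the last step concrete, DAGMan consumes a plain-text DAG file that lists jobs and their dependencies. Below is a sketch, in Python, that writes such a file for the diamond-shaped workflow used in the reduction example later; the submit-file names are made up, but JOB and PARENT/CHILD are standard DAGMan keywords:

```python
# Emit a minimal DAGMan input file for a diamond workflow:
# job "a" feeds "b" and "c", which both feed "d".
dag_text = """\
JOB a a.submit
JOB b b.submit
JOB c c.submit
JOB d d.submit
PARENT a CHILD b c
PARENT b c CHILD d
"""

with open("diamond.dag", "w") as f:
    f.write(dag_text)

# DAGMan would then be started with:  condor_submit_dag diamond.dag
```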

Pegasus: Planning for Execution in Grids
- Maps from an abstract to a concrete workflow (the mapping loop is sketched below)
  - using algorithmic and AI-based techniques
- Automatically locates physical locations for both components (transformations) and data
  - using Globus RLS and the Transformation Catalog
- Finds appropriate resources to execute on
  - via Globus MDS
- Reuses existing data products where applicable
- Publishes newly derived data products
  - to the Chimera virtual data catalog
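
Taken together, the mapping amounts to a loop over the tasks of the abstract workflow. A rough Python sketch; the catalog interfaces (rls_lookup, tc_lookup, mds_sites) are hypothetical stand-ins for the RLS, Transformation Catalog, and MDS queries named above:

```python
def map_workflow(abstract_dag, rls_lookup, tc_lookup, mds_sites):
    """Bind each logical task in an abstract workflow to an execution site.

    Assumed interfaces:
      rls_lookup(lfn) -> list of physical replica URLs for a logical file
      tc_lookup(tr)   -> {site: executable_path} for a transformation
      mds_sites()     -> set of sites currently available
    """
    available = mds_sites()
    plan = []
    for task in abstract_dag:
        sites = tc_lookup(task.transformation)
        # Candidate sites are both up and have the transformation installed.
        candidates = [s for s in sites if s in available]
        if not candidates:
            raise RuntimeError(f"no site can run {task.transformation}")
        site = candidates[0]   # a real planner ranks candidates (cost, load, ...)
        replicas = {f: rls_lookup(f) for f in task.inputs}
        plan.append((task, site, sites[site], replicas))
    return plan
```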

Chimera is developed at ANL by I. Foster, M. Wilde, and J. Voeckler.

Example Workflow Reduction
- Original abstract workflow
- If "b" already exists (as determined by a query to the RLS), the workflow can be reduced: the job that produces "b", along with any ancestors needed only by that job, is pruned, and "b" is staged in instead (see the sketch below)
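
A sketch of this reduction in Python, assuming the workflow is given as each job's parents and output files; exists_in_rls stands in for the replica query:

```python
def reduce_workflow(sinks, parents, outputs, exists_in_rls):
    """Return the set of jobs that still have to execute.

    sinks            -- jobs producing the user-requested final outputs
    parents[j]       -- set of jobs that job j depends on
    outputs[j]       -- files that job j produces
    exists_in_rls(f) -- True if file f is already registered in the RLS
    """
    def is_done(job):
        # A job need not run if every file it produces already exists.
        return all(exists_in_rls(f) for f in outputs[job])

    to_run = set()
    frontier = [j for j in sinks if not is_done(j)]
    while frontier:
        job = frontier.pop()
        if job in to_run:
            continue
        to_run.add(job)
        for p in parents.get(job, ()):
            if not is_done(p):     # existing data cuts the branch here
                frontier.append(p)
    return to_run
```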

Mapping from Abstract to Concrete
- Query the RLS, MDS, and Transformation Catalog; schedule computation and data movement (stage-in/stage-out sketch below)
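
Once a task is bound to a site, data movement becomes explicit: the planner wraps the compute job with transfer jobs. A rough sketch under the same assumptions as before; the job tuples and the registered_site_of helper are hypothetical (Pegasus generated actual transfer and RLS-registration jobs):

```python
def add_data_movement(task, site, replicas, registered_site_of):
    """Surround one mapped task with stage-in and stage-out jobs.

    replicas[f]           -- physical URLs known for input file f
    registered_site_of(u) -- site that hosts a given URL
    Returns (pre_jobs, task, post_jobs) for insertion into the DAG.
    """
    pre, post = [], []
    for f, urls in replicas.items():
        if not urls:
            continue   # produced by an upstream job in this same workflow
        # Transfer only inputs that are not already at the chosen site.
        if not any(registered_site_of(u) == site for u in urls):
            pre.append(("stage_in", f, urls[0], site))
    for f in task.outputs:
        # Ship each result to storage and register the new replica.
        post.append(("stage_out", f, site))
        post.append(("register_in_rls", f))
    return pre, task, post
```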

LIGO Scientific Collaboration
- Continuous gravitational waves are expected to be produced by a variety of celestial objects
- Only a small fraction of potential sources are known
- Blind searches are needed, scanning regions of the sky where we have no a priori information about the presence of a source
  - wide-area, wide-frequency searches
- The search is performed for potential sources of continuous periodic waves near the Galactic Center and the galactic core
- The search is very compute- and data-intensive
- The LSC used the occasion of SC2003 to initiate a month-long production run with science data collected during 8 weeks in the spring of 2003

Additional resources used: Grid3 iVDGL resources

LIGO Acknowledgements
- Bruce Allen, Scott Koranda, Brian Moe, Xavier Siemens, University of Wisconsin-Milwaukee, USA
- Stuart Anderson, Kent Blackburn, Albert Lazzarini, Dan Kozak, Hari Pulapaka, Peter Shawhan, Caltech, USA
- Steffen Grunewald, Yousuke Itoh, Maria Alessandra Papa, Albert Einstein Institute, Germany
- Many others involved in the testbed
- group.phys.uwm.edu/lscdatagrid/
- LIGO Laboratory operates under NSF cooperative agreement PHY

Montage
- Montage (NASA and NVO)
  - Delivers science-grade custom mosaics on demand
  - Produces mosaics from a wide range of data sources (possibly in different spectra)
  - User-specified parameters of projection, coordinates, size, rotation, and spatial sampling
[Figure: mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid]

Small Montage Workflow
[Figure: workflow graph of ~1200 nodes]

Montage Acknowledgments
- Bruce Berriman, John Good, Anastasia Laity, Caltech/IPAC
- Joseph C. Jacob, Daniel S. Katz, JPL
- caltech.edu/
- Testbed for Montage: Condor pools at USC/ISI, UW Madison, and TeraGrid resources at NCSA, PSC, and SDSC
- Montage is funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology

Other Applications Using Chimera and Pegasus
- Other GriPhyN applications:
  - High-energy physics: ATLAS, CMS (many)
  - Astronomy: SDSS (Fermilab, ANL)
- Astronomy:
  - Galaxy morphology (NCSA, JHU, Fermilab, many others; NVO-funded)
- Biology:
  - BLAST (ANL, PDQ-funded)
- Neuroscience:
  - Tomography (SDSC, NIH-funded)

Current System

Workflow Refinement and Execution
[Figure: refinement over time, descending through levels of abstraction — from application-level knowledge (the user's request, relevant components), to logical tasks (the full abstract workflow), to tasks bound to resources and sent for execution (partial execution, not yet executed); refinement is guided by a task matchmaker, workflow repair, and policy information]

Incremental Refinement
- Partition the abstract workflow into partial workflows (one possible partitioning strategy is sketched after the next slide)

Meta-DAGMan
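
One simple partitioning policy is by level: group tasks at the same depth in the DAG, then run the partitions in order, planning each one just before it executes so it sees fresh resource information (the role of Meta-DAGMan above). A level-based sketch in Python; the partitioning strategy here is just one possibility, not necessarily the one the system used:

```python
from collections import defaultdict

def partition_by_level(jobs, parents):
    """Group jobs by depth (longest path from a root) in the DAG."""
    depth = {}

    def d(job):
        if job not in depth:
            ps = parents.get(job, ())
            depth[job] = 1 + max((d(p) for p in ps), default=-1)
        return depth[job]

    levels = defaultdict(list)
    for job in jobs:
        levels[d(job)].append(job)
    # Partition i must finish before partition i+1 starts; planning each
    # partition late lets it react to the current state of the Grid.
    return [levels[i] for i in sorted(levels)]
```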

Future Directions
- Incorporate AI planning technologies into production software (the Virtual Data Toolkit)
- Investigate various scheduling techniques
- Investigate fault-tolerance issues
  - selecting resources based on their reliability
  - responding to failures