Pegasus: Mapping complex applications onto the Grid
Ewa Deelman
Center for Grid Technologies, USC Information Sciences Institute

Pegasus Acknowledgements
- Ewa Deelman, Carl Kesselman, Saurabh Khurana, Gaurang Mehta, Sonal Patil, Gurmeet Singh, Mei-Hui Su, Karan Vahi (Center for Grid Computing, ISI)
- James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)
- Research funded as part of the NSF GriPhyN, NVO, and SCEC projects

Outline
- The GriPhyN project and Grid applications
- Workflow management in Grids
- Pegasus: Planning for Execution in Grids
  - Framework description
  - Generation of executable workflows
- Applications using Pegasus
- Future research directions

GriPhyN Data Grid Challenge
- Provide a framework that enables Virtual Organizations around the world to perform computationally demanding analysis of large, geographically distributed datasets
- The Virtual Organizations are large and highly distributed
- The datasets are large: currently on the order of terabytes, and expected to grow to hundreds of petabytes over the next decade
- Provide seamless access to data, whether experimental raw data or processed data products
- Enable a user or application to ask for any domain-specific data, whether already computed or not: the concept of Virtual Data

Grid Applications
- Increasing in complexity
- Use of individual application components
- Reuse of individual intermediate data products (files)
- Description of data products using metadata attributes
- The execution environment is complex and very dynamic:
  - Resources come and go
  - Data is replicated
  - Components can be found at various locations or staged in on demand
- Separation between:
  - the application description
  - the actual execution description

Generating an Abstract Workflow
- Available information:
  - Specification of component capabilities
  - Ability to generate the desired data products
- Select and configure application components to form an abstract workflow (see the sketch below):
  - Assign input files that exist or that can be generated by other application components
  - Specify the order in which the components must be executed
- Components and files are referred to by their logical names:
  - Logical transformation name
  - Logical file name
  - Both transformations and data can be replicated
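
As a concrete illustration, here is a minimal Python sketch (not actual Pegasus or Chimera code; the transformation and file names are hypothetical) of an abstract workflow: jobs named by logical transformations, linked by the logical files they consume and produce, and ordered by their data dependencies:

    # Minimal sketch (not Pegasus/Chimera code) of an abstract workflow.
    # All transformation and file names here are hypothetical.
    from dataclasses import dataclass, field

    @dataclass
    class AbstractJob:
        transformation: str                           # logical transformation name
        inputs: list = field(default_factory=list)    # logical file names
        outputs: list = field(default_factory=list)   # logical file names

    existing = {"a"}   # files that already exist (e.g., raw data)
    workflow = [
        AbstractJob("preprocess", inputs=["a"], outputs=["b"]),
        AbstractJob("analyze",    inputs=["b"], outputs=["c"]),
    ]

    def execution_order(jobs, available):
        """Order jobs so each runs only after its inputs are available."""
        available, order, pending = set(available), [], list(jobs)
        while pending:
            ready = [j for j in pending if set(j.inputs) <= available]
            if not ready:
                raise ValueError("cycle or missing input in workflow")
            for j in ready:
                order.append(j)
                available |= set(j.outputs)
                pending.remove(j)
        return order

    for job in execution_order(workflow, existing):
        print(job.transformation)   # preprocess, then analyze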

Generating a Concrete Workflow
- Information:
  - Location of files and component instances
  - State of the Grid resources
- Select specific:
  - Resources
  - Files
- Add the jobs required to form a concrete workflow that can be executed in the Grid environment:
  - Data movement
  - Data registration
- Each component in the abstract workflow is turned into an executable job (a toy sketch of this mapping follows)
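
As a hedged illustration (not the real Pegasus planner; the catalog contents, site names, and URL below are made up), concretization can be pictured as binding each abstract job to a site and surrounding it with transfer and registration jobs:

    # Illustrative concretization step (not the actual Pegasus planner).
    # Catalog contents, site names, and the URL are hypothetical.
    abstract = [  # (transformation, inputs, outputs)
        ("preprocess", ["a"], ["b"]),
        ("analyze",    ["b"], ["c"]),
    ]
    replica_catalog = {"a": "gsiftp://host.example.org/data/a"}
    transformation_catalog = {"preprocess": ["siteA"],
                              "analyze":    ["siteA", "siteB"]}

    def concretize(jobs, pick_site=lambda sites: sites[0]):
        concrete = []
        for transformation, inputs, outputs in jobs:
            site = pick_site(transformation_catalog[transformation])
            # Stage in inputs that already exist at a remote location.
            for f in inputs:
                if f in replica_catalog:
                    concrete.append(("transfer", replica_catalog[f], site))
            concrete.append(("execute", transformation, site))
            # Register newly derived outputs so they can be found later.
            for f in outputs:
                concrete.append(("register", f, site))
        return concrete

    for step in concretize(abstract):
        print(step)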

Why Automate Workflow Generation?
- Usability: limit the Grid knowledge the user needs, e.g. of the:
  - Monitoring and Discovery Service
  - Replica Location Service
- Complexity:
  - The user needs to make choices among:
    - Alternative application components
    - Alternative files
    - Alternative locations
  - The user may reach a dead end
  - Many different interdependencies may occur among components
- Solution cost:
  - Evaluate the costs of alternative solutions in terms of:
    - Performance
    - Reliability
    - Resource usage
- Global cost:
  - Minimizing cost within a community or a Virtual Organization
  - Requires reasoning about individual users' choices in light of other users' choices

GriPhyN's Executable Workflow Construction
- Build an abstract workflow based on VDL descriptions (Chimera)
- Build an executable workflow based on the abstract workflow (Pegasus)
- Execute the workflow (Condor's DAGMan)

Chimera: Creating Abstract Workflows
- Developed at ANL (Foster, Voeckler, Wilde)
- Chimera's Virtual Data Language (VDL) allows for the description of an abstract workflow
- Transformations:
  - A general description of the operation applied to the data, referred to by a logical transformation name, e.g.:

  TR galMorph( in redshift, in pixScale, in zeroPoint, in Ho,
               in om, in flat, in image, out galMorph )
  { ... }

Chimera: Creating Abstract Workflows
- Derivations are instantiations of TRs:
  - Identify particular logical input and output file names
  - Identify actual parameters

  DV d1->galMorph( redshift=" ", pixScale=" E-4", zeroPoint="0",
                   Ho="100", om="0.3", flat="1", ... );

Abstract Workflow Generation
- Definitions of transformations and derivations are stored in Chimera's database
- The database can be browsed
- The user queries Chimera with a logical file name, and Chimera returns the abstract workflow that derives it (a toy sketch follows)
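
As a rough illustration of such a query (a sketch only; the derivation table and file names are hypothetical, and Chimera's real database is far richer), a request for a logical file can be expanded backwards through the stored derivations:

    # Sketch of answering "give me file c": walk the stored derivations
    # backwards from the requested logical file. Table contents are
    # hypothetical.
    derivations = {           # derived file -> (transformation, inputs)
        "b": ("preprocess", ["a"]),
        "c": ("analyze",    ["b"]),
    }

    def plan(request, raw_inputs):
        """Return the chain of jobs needed to derive `request`."""
        if request in raw_inputs:
            return []
        transformation, inputs = derivations[request]
        steps = []
        for f in inputs:
            steps += [s for s in plan(f, raw_inputs) if s not in steps]
        return steps + [(transformation, inputs, [request])]

    print(plan("c", raw_inputs={"a"}))
    # -> [('preprocess', ['a'], ['b']), ('analyze', ['b'], ['c'])]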

VDL and Abstract Workflow
(Figure: VDL descriptions, a user request for data file "c", and the resulting abstract workflow.)

Condor's DAGMan
- Developed at UW Madison (Livny)
- Executes a concrete workflow
- Makes sure the dependencies are followed
- Executes the jobs specified in the workflow:
  - Execution
  - Data movement
  - Catalog updates
- Provides a "rescue DAG" in case of failure (a sketch of a DAG input file follows)
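
For concreteness, the concrete workflow handed to DAGMan is described in a DAG input file. A minimal sketch for the two-job example used above might look as follows (the submit-file names are hypothetical; JOB and PARENT/CHILD are standard DAGMan keywords):

    # DAG input file (sketch); submit-file names are hypothetical.
    JOB Preprocess preprocess.sub
    JOB Analyze analyze.sub
    PARENT Preprocess CHILD Analyze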

Pegasus: Planning for Execution in Grids
- Maps from abstract to concrete workflow
  - Using algorithmic and AI-based techniques
- Automatically locates physical locations for both components (transformations) and data
- Finds appropriate resources to execute the jobs
- Reuses existing data products where applicable
- Publishes newly derived data products:
  - Registers them in the Chimera virtual data catalog
  - Provides provenance information

Information Components Used by Pegasus
- Globus Monitoring and Discovery Service (MDS)
  - Locates available resources
  - Finds resource properties:
    - Dynamic: load, queue length
    - Static: location of the GridFTP server, RLS, etc.
- Globus Replica Location Service (RLS)
  - Locates data that may be replicated
  - Registers new data products
- Transformation Catalog
  - Locates installed executables

Example Workflow Reduction
(Figure: the original abstract workflow alongside its reduced form.)
- If "b" already exists (as determined by a query to the RLS), the workflow can be reduced: jobs needed only to produce "b" are dropped (see the sketch below)
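
A hedged Python sketch of the pruning (toy code, not the actual Pegasus planner; the job tuples and file names continue the hypothetical example above):

    # Sketch of the reduction step (illustrative only). Job tuples:
    # (transformation, inputs, outputs); file names are hypothetical.
    abstract = [
        ("preprocess", ["a"], ["b"]),
        ("analyze",    ["b"], ["c"]),
    ]

    def reduce_workflow(jobs, existing, targets):
        """Keep only the jobs needed to produce `targets`, given that
        the files in `existing` were already found via an RLS query."""
        producer = {f: job for job in jobs for f in job[2]}
        needed = [f for f in targets if f not in existing]
        kept = []
        while needed:
            job = producer[needed.pop()]
            if job not in kept:
                kept.append(job)
                # Inputs that don't exist yet must be produced first.
                needed += [f for f in job[1] if f not in existing]
        return [job for job in jobs if job in kept]  # original order

    print(reduce_workflow(abstract, existing={"a"}, targets=["c"]))
    # -> both jobs: "b" must still be computed
    print(reduce_workflow(abstract, existing={"a", "b"}, targets=["c"]))
    # -> only ("analyze", ...): "b" already exists, workflow reduced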

Mapping from abstract to concrete
- Query the RLS, MDS, and Transformation Catalog (TC); schedule computation and data movement

Applications Using Chimera, Pegasus, and DAGMan
- GriPhyN applications:
  - High-energy physics: ATLAS, CMS (many)
  - Astronomy: SDSS (Fermilab, ANL)
  - Gravitational-wave physics: LIGO (Caltech, UWM)
- Astronomy:
  - Galaxy morphology (NCSA, JHU, Fermilab, many others; NVO-funded)
- Biology:
  - BLAST (ANL, PDQ-funded)
- Neuroscience:
  - Tomography for Telescience (SDSC, NIH-funded)

Pegasus interfaces
- Main interface: command line
- Applications can also be integrated with a portal environment
- Demonstrated the portal at SC 2003:
  - LIGO (gravitational-wave physics)
  - Montage (astronomy)
- Much of the portal is application-independent

Montage
- Montage (NASA and NVO):
  - Delivers science-grade custom mosaics on demand
  - Produces mosaics from a wide range of data sources (possibly in different spectra)
  - User-specified parameters of projection, coordinates, size, rotation, and spatial sampling
(Figure: mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid.)

Small Montage Workflow
(Figure: workflow graph with ~1200 nodes.)

Montage Acknowledgments
- Bruce Berriman, John Good, Anastasia Laity (Caltech/IPAC)
- Joseph C. Jacob, Daniel S. Katz (JPL)
- http://montage.ipac.caltech.edu/
- Testbed for Montage: Condor pools at USC/ISI and UW Madison, and TeraGrid resources at NCSA, PSC, and SDSC
- Montage is funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology

Conclusions
- Pegasus maps complex workflows onto the Grid
- Uses Grid information services to find resources, data, and executables
- Reduces the workflow based on existing intermediate data products
- Used in many applications
- Part of GriPhyN's Virtual Data Toolkit

Future Directions
- Incorporate AI-planning technologies into production software (the Virtual Data Toolkit)
- Investigate various scheduling techniques
- Investigate fault-tolerance issues:
  - Selecting resources based on their reliability
  - Responding to failures