Pegasus - a framework for planning for execution in grids
Karan Vahi, USC Information Sciences Institute
May 5th, 2004

Slide 2: People Involved
USC/ISI Advanced Systems: Ewa Deelman, Carl Kesselman, Gaurang Mehta, Mei-Hui Su, Gurmeet Singh, Karan Vahi.

Slide 3: Outline
- Introduction to Planning
- DAX
- Pegasus
- Portal
- Demonstration

Slide 4: Planning in Grids
- One has various alternatives out on the grid in terms of data and compute resources.
- Planning:
  - Select the best available resources and data sets, and schedule them onto the grid to get the best possible execution time.
  - Plan for the data movements between the sites.

Slide 5: Recipe For Planning
- Understand the request
  - Figure out what data product the request refers to, and how to generate it from scratch.
- Locations of data products
  - The final data product
  - Intermediate data products which can be used to generate the final data product
- Location of job executables
- State of the grid
  - Available processors, physical memory available, job queue lengths, etc.

Slide 6: Constituents of Planning
[Block diagram: Domain Knowledge, Resource Information and Location Information feed into the Planner, which produces a plan that is submitted to the grid.]
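To make these constituents concrete, a planner can be thought of as a function that combines domain knowledge (the abstract workflow), location information (replica and transformation catalogs) and resource information (grid state) into an executable plan. The sketch below is purely illustrative; the data structures and names are hypothetical and not Pegasus's actual API.

    # Hypothetical sketch of the planning step; not the actual Pegasus API.
    from dataclasses import dataclass

    @dataclass
    class PlanningInputs:
        abstract_workflow: dict          # domain knowledge: jobs + dependencies (the DAX)
        replica_locations: dict          # location info: logical file -> physical replicas
        transformation_catalog: dict     # location info: logical transformation -> site executables
        site_state: dict                 # resource info: queue lengths, free CPUs, memory, ...

    def plan(inputs: PlanningInputs) -> list:
        """Return a list of concrete (site-bound) jobs."""
        concrete_jobs = []
        for job_id, job in inputs.abstract_workflow["jobs"].items():
            # pick a site that has the executable and currently looks least loaded;
            # replica_locations would additionally drive data reuse and stage-in (omitted here)
            candidate_sites = inputs.transformation_catalog.get(job["transformation"], {})
            site = min(candidate_sites, key=lambda s: inputs.site_state[s]["queue_length"])
            concrete_jobs.append({"id": job_id, "site": site,
                                  "executable": candidate_sites[site]})
        return concrete_jobs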

Slide 7: Terms (1)
- Abstract Workflow (DAX)
  - Expressed in terms of logical entities
  - Specifies all the logical files required to generate the desired data product from scratch
  - Dependencies between the jobs
  - Analogous to a build-style DAG
- Concrete Workflow
  - Expressed in terms of physical entities
  - Specifies the location of the data and executables
  - Analogous to a make-style DAG

Slide 8: Outline
- Introduction to Planning
- DAX
- Pegasus
- Portal
- Demonstration

Slide 9: DAX
- The format for specifying the abstract workflow; it identifies the recipe for creating the final data product at a logical level.
- In the case of Montage, the IPAC web service creates the DAX for the user request.
- Developed at the University of Chicago.

Slide 10: DAX Example
[The XML of the example DAX was lost in transcription; only the job argument strings survive: "-a top -T60 -i ... -o ..." and "-a bottom -T60 -i ... -o ...".]
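For readers unfamiliar with the format, the following is a hedged, illustrative reconstruction of what a single job entry in a DAX of that era might have looked like, assuming the VDS/Chimera DAX schema; element and attribute names should be checked against the actual schema, and the file and namespace names are invented.

    <!-- Illustrative sketch of one DAX job entry (schema details assumed, not verbatim) -->
    <job id="ID000002" namespace="diamond" name="findrange" version="1.0">
      <argument>-a top -T60 -i <filename file="f.b1"/> -o <filename file="f.c1"/></argument>
      <uses file="f.b1" link="input"/>
      <uses file="f.c1" link="output"/>
    </job>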

Slide 11: Outline
- Introduction to Planning
- DAX
- Pegasus
- Demonstration
- Portal

Slide 12: Pegasus
- A configurable system to map and execute complex workflows on the grid.
  - DAX-driven configuration
  - Metadata-driven configuration
- Can do full-ahead planning or deferred planning to map the workflows.

Slide 13: Full-Ahead Planning
- At the time of submission of the workflow, you decide where you want to schedule the jobs in the workflow.
- Allows you to perform certain optimizations by looking ahead for bottleneck jobs and then scheduling around them.
- However, for large workflows the decision you make at submission time may no longer be valid or optimal by the time the job actually runs.

Slide 14: Deferred Planning
- Delay the decision of mapping a job to a site for as long as possible.
- Involves partitioning the original DAX into smaller DAXes, each of which refers to a partition on which Pegasus is run.
- Construct a Mega DAG that runs Pegasus automatically on the partition DAXes as each partition becomes ready to run. (A rough sketch of this loop follows.)
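The following is a rough, hypothetical sketch of the Mega DAG idea, assuming helper objects that do not exist under these names in Pegasus: the workflow is split into partitions, and each partition is planned only once its predecessors have finished, so the mapping can use the grid state at that moment.

    # Hypothetical sketch of deferred planning; object and method names are illustrative only.
    def deferred_planning(dax, partitioner, planner, submitter):
        partitions = partitioner.split(dax)           # smaller DAXes, each with a set of parent ids
        finished = set()
        while len(finished) < len(partitions):
            for part in partitions:
                if part.id in finished or not part.parents <= finished:
                    continue                          # not ready yet
                concrete_dag = planner.plan(part)     # run Pegasus on this partition now,
                                                      # using the current state of the grid
                submitter.run_and_wait(concrete_dag)  # e.g. hand the partition off to DAGMan
                finished.add(part.id)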

Slide 15: High-Level Block Diagram
[Figure: high-level block diagram of Pegasus; no further detail survives in the transcript.]

Slide 16: Replica Discovery
- Pegasus needs to know where the input files for the workflow reside.
- In the Montage case, it should know where the FITS files required for the mProject jobs reside.
- Hence Pegasus needs to discover the files that are required for executing a particular abstract workflow.

Slide 17: RLS
[Figure 1: RLS configuration for Pegasus. An RLI (Replica Location Index) sits above several Local Replica Catalogs (LRC A, LRC B, LRC C); each LRC is responsible for one pool and sends periodic updates to the RLI.]
1) Pegasus queries the RLI with the LFN.
2) The RLI returns the list of LRCs that contain the desired mappings.
3) Pegasus queries each LRC in the list to get the PFNs.
Interfacing to RLS done by Karan Vahi, Shishir.
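A minimal sketch of this two-level lookup, using hypothetical client objects rather than the real Globus RLS API:

    # Hypothetical two-level RLS lookup; client classes and methods are illustrative only.
    def resolve_lfn(rli_client, lrc_clients, lfn):
        """Map a logical file name (LFN) to all known physical file names (PFNs)."""
        pfns = []
        lrc_urls = rli_client.query(lfn)          # steps 1-2: RLI returns the LRCs holding mappings
        for url in lrc_urls:
            lrc = lrc_clients[url]
            pfns.extend(lrc.lookup(lfn))          # step 3: each LRC returns concrete replica URLs
        return pfns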

Slide 18: Alternate Replica Mechanisms
- Replica Catalog
  - Pegasus supports the LDAP-based Replica Catalog.
- User-defined mechanisms
  - Pegasus gives the user the flexibility to specify his own replica mechanism instead of RLS or the Replica Catalog.
  - The user just has to implement the relevant interface. (A sketch of what such an interface could look like follows.)
Design and implementation done by Karan Vahi.
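Pegasus itself is written in Java and defines its own interface; purely as an illustration of the idea, a pluggable replica mechanism boils down to something like the following hypothetical contract (names and signatures are invented, not the actual Pegasus interface):

    # Hypothetical shape of a pluggable replica mechanism.
    from abc import ABC, abstractmethod

    class ReplicaMechanism(ABC):
        @abstractmethod
        def lookup(self, lfn: str) -> list[str]:
            """Return the physical file names (PFNs) registered for a logical file name."""

        @abstractmethod
        def register(self, lfn: str, pfn: str) -> None:
            """Record a new LFN -> PFN mapping after data is materialized."""

    class InMemoryReplicas(ReplicaMechanism):
        """Toy implementation backed by an in-memory dictionary."""
        def __init__(self):
            self._map: dict[str, list[str]] = {}
        def lookup(self, lfn):
            return self._map.get(lfn, [])
        def register(self, lfn, pfn):
            self._map.setdefault(lfn, []).append(pfn)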

Slide 19: Transformation Catalog
- Pegasus needs to access a catalog to determine the pools where it can run a particular piece of code.
- If a site does not have the executable, one should be able to ship the executable to the remote site.
- Generic TC API for users to implement their own transformation catalog.
- Current implementations:
  - File based
  - Database based

Slide 20: File-Based Transformation Catalog
- Consists of a simple text file.
  - Contains mappings of logical transformations to physical transformations.
- Format of the tc.data file:
    #poolname  logical tr   physical tr              env
    isi        preprocess   /usr/vds/bin/preprocess  VDS_HOME=/usr/vds/;
- All the physical transformations are absolute path names.
- The environment string contains all the environment variables required for the transformation to run on the execution pool.

Slide 21: DB-Based Transformation Catalog
- Presently ported to MySQL; Postgres still to be tested.
- Adds support for transformations compiled for different architecture, OS, OS version and glibc combinations, which enables transferring a transformation to a remote site if the executable does not reside there.
- Supports multiple profile namespaces. At present only the env namespace is used.
- Supports multiple physical transformations for the same (logical transformation, pool, type) tuple.

Slide 22: Pool Configuration (1)
- Pool Config is an XML file which contains information about the various pools on which DAGs may execute.
- Some of the information contained in the Pool Config file:
  - The various jobmanagers that are available on the pool for the different types of Condor universes.
  - The GridFTP storage servers associated with each pool.
  - The Local Replica Catalogs where data residing in the pool has to be cataloged.
  - Profiles, like environment hints, which are common site-wide.
  - The working and storage directories to be used on the pool.

Slide 23: Pool Configuration (2)
- Two ways to construct the Pool Config file:
  - Monitoring and Discovery Service (MDS)
  - Local pool config file (text based)
- Client tool to generate the Pool Config file:
  - The tool genpoolconfig is used to query MDS and/or the local pool config file(s) to generate the XML Pool Config file.

Slide 24: Pool Configuration (3)
- This file is read by the information provider and published into MDS.
- Format (the values were stripped in transcription; each key is followed by its value):
    gvds.pool.id         : <pool id>
    gvds.pool.lrc        : <LRC url>
    gvds.pool.gridftp    : <gridftp server url>
    gvds.pool.gridftp    : <additional gridftp server url>
    gvds.pool.universe   : <universe and jobmanager>
    gvds.pool.gridlaunch : <path to gridlaunch executable>
    gvds.pool.workdir    : <work directory>
    gvds.pool.profile    : <profile>
    gvds.pool.profile    : <additional profile>
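To make the format concrete, a hypothetical entry for a single pool might look like the following. All values, the universe separator and the profile syntax are invented for illustration and should be checked against the VDS documentation.

    # Hypothetical text-based pool config entry; every value below is made up.
    gvds.pool.id         : isi
    gvds.pool.lrc        : rls://smarty.isi.edu
    gvds.pool.gridftp    : gsiftp://skynet.isi.edu/storage
    gvds.pool.universe   : vanilla@skynet.isi.edu/jobmanager-condor
    gvds.pool.gridlaunch : /usr/vds/bin/kickstart
    gvds.pool.workdir    : /scratch
    gvds.pool.profile    : env VDS_HOME=/usr/vds
    gvds.pool.profile    : env GLOBUS_LOCATION=/usr/globus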

Slide 25: DAX-Driven Configuration (1)
- Pegasus uses the IPAC/JPL web service as an abstract workflow generator.
- Pegasus takes in this abstract workflow and creates a concrete workflow by consulting the various grid services described before.

Slide 26: DAX-Driven Configuration (2)
[Figure: data flow between the IPAC/JPL service, the Request Manager, the Abstract and Concrete Planner (Abstract DAG Reduction, Concrete Planner, VDL Generator), the catalogs and information services (MCS, RLS, MDS, Transformation Catalog, Current State Generator), the Submit File Generator, and DAGMan/Condor-G for submission and monitoring. The numbered arrows trace: (1) abstract workflow (DAG), (2) abstract DAG, (3) logical file names (LFNs), (4) physical file names (PFNs), (5) full abstract DAG, (6) reduced abstract DAG, (7) logical transformations, (8) physical transformations and execution environment information, (9-10) concrete DAG, (11-12) DAGMan files, (13) DAG, (14) log files, (15) monitoring, (16) results.]

Slide 27: DAG Reduction
- Abstract DAG reduction:
  - Pegasus queries the RLS with the LFNs referred to in the abstract workflow.
  - If data products are found to be already materialized, Pegasus reuses them and thus reduces the complexity of the concrete workflow. (A rough sketch of the reduction idea follows.)
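As a rough, hypothetical sketch of the reduction (not Pegasus's actual code): a job can be dropped if all of its output files already exist in the RLS, and dropping it may in turn make its ancestors droppable because no surviving job needs their outputs any more. This matches the example on the next slide, where d, e and f are deleted because their outputs are materialized, and a, b and c are then deleted by the cascade.

    # Hypothetical sketch of abstract DAG reduction; the data structures are illustrative.
    def reduce_dag(jobs, rls_lookup):
        """jobs: dict job_id -> {"outputs": set of LFNs, "children": set of child job ids}.
        rls_lookup(lfn) -> True if the file is already materialized somewhere on the grid."""
        deleted = set()
        changed = True
        while changed:
            changed = False
            for job_id, job in jobs.items():
                if job_id in deleted:
                    continue
                outputs_exist = all(rls_lookup(f) for f in job["outputs"])
                children_all_deleted = bool(job["children"]) and job["children"] <= deleted
                # a job can go if its outputs are already available, or if every job
                # that consumed its outputs has itself been deleted (cascade upwards)
                if outputs_exist or children_all_deleted:
                    deleted.add(job_id)
                    changed = True
        return {jid: j for jid, j in jobs.items() if jid not in deleted}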

Slide 28: Abstract DAG Reduction (example)
[Figure: a nine-job DAG (jobs a-i), with a key distinguishing original nodes, pull transfer nodes, registration nodes and push transfer nodes. Pegasus queries the RLS and finds the data products of jobs d, e and f already materialized, and hence deletes those jobs; on applying the reduction algorithm, the additional jobs a, b and c are deleted. Implemented by Karan Vahi.]

Slide 29: Concrete Planner (1)
[Figure: the reduced DAG with added nodes. Pegasus adds replica (registration) nodes for each job that materializes data (g, h, i). Three transfer nodes move the output files of the leaf job (f) to the output pool, since job f was deleted by the reduction algorithm. Pegasus schedules jobs g and h on pool X and job i on pool Y, hence adding an inter-pool transfer node, and adds transfer nodes for staging in the input files of the root nodes of the decomposed DAG (job g). Key: original node, pull transfer node, registration node, push transfer node, node deleted by the reduction algorithm, inter-pool transfer node. Implemented by Karan Vahi.]

Slide 30: Transient Files
- Selective transfer of output files:
  - Data sets generated by intermediate nodes in the DAG are huge.
  - However, the user may be interested only in the outputs of selected jobs.
  - Transferring all the files could severely overload the jobmanagers on the compute sites.
- Need for selective transfer of files:
  - For each file, at the virtual data level, the user can specify whether it is transient or not.
  - Pegasus bases its decision on whether to transfer the file or not on this flag. (An illustrative sketch follows.)
Implemented by Karan Vahi.
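As a purely hypothetical illustration of the decision (the "transient" flag name is assumed, not taken from the VDS schema): each output file carries a transient marker, and only non-transient outputs get transfer nodes added to the concrete workflow.

    # Hypothetical sketch of selective output transfer; the "transient" flag name is assumed.
    def outputs_to_transfer(job_outputs):
        """job_outputs: list of dicts like {"lfn": "f.c1", "transient": True}."""
        return [f["lfn"] for f in job_outputs if not f.get("transient", False)]

    # Example: only f.final would get a transfer node to the output pool.
    # outputs_to_transfer([{"lfn": "f.tmp", "transient": True},
    #                      {"lfn": "f.final", "transient": False}])  ->  ["f.final"]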

Slide 31: Outline
- Introduction to Planning
- DAX
- Pegasus
- Portal
- Demonstration

Slide 32: Portal Architecture
[Figure: portal architecture diagram; no further detail survives in the transcript.]

Slide 33: Portal Demonstration

Slide 34: Outline
- Introduction to Planning
- DAX
- Pegasus
- Portal
- Demonstration

Slide 35: Demonstration
- Run a small black-diamond DAG using both full-ahead planning and deferred planning on the ISI Condor pool.
- Show the various configuration files (tc.data and pool.config) and how to generate them (pool.config).
- Generate the Condor submit files.
- Submit the Condor DAG to Condor DAGMan.

Slide 36: Software Required
- Submit host:
  - Condor DAGMan (to submit the workflows on the grid)
  - Java 1.4 (to run Pegasus)
  - Globus 2.4 or higher
  - Globus RLS (the registration jobs run on the local host)
  - Xerces, ant, cog, etc., which come with the VDS distribution
- Compute sites (machines in the pool):
  - Globus 2.4 or higher (GridFTP server, globus-url-copy, MDS)
  - On one machine per pool, an LRC should be running
  - Condor daemons running
  - Various jobmanagers correctly configured

Slide 37: TC File
- Walk through the editing of the TC file.
- A command-line client is also in the works that allows you to update, add and modify the entries in your transformation catalog regardless of the underlying implementation.

Slide 38: GenPoolConfig (Demo)
- genpoolconfig is the client to generate the pool config file required by Pegasus.
- It queries MDS and/or a local pool config file (text based) and generates an XML file.
- In the demo, the pool config file is generated from the text-based configuration.
- Usage (the argument values were stripped in transcription):
    genpoolconfig -Dvds.giis.host <giis host> -Dvds.giis.dn <giis dn> --poolconfig <text pool config file(s)> --output <output XML file>
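A hypothetical invocation might look like the following; the host name, DN and file names are invented for illustration, and whether the -D properties take "=value" or a separate argument is an assumption.

    # Hypothetical genpoolconfig invocation; all values below are made up.
    genpoolconfig -Dvds.giis.host=giis.isi.edu \
                  -Dvds.giis.dn="Mds-Vo-name=local,o=grid" \
                  --poolconfig sites.txt \
                  --output pool.config.xml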

Slide 39: gencdag
- The concrete planner takes the DAX produced by Chimera and converts it into a set of Condor DAG and submit files.
- Usage (the argument values were stripped in transcription):
    gencdag --dax <dax file> | --pdax <pdax file> --p <list of execution pools> [--dir <output directory>] [--o <output pool>] [--force]
- You can specify more than one execution pool. Execution will take place on the pools on which the executable exists. If the executable exists on more than one pool, then the pool on which the executable will run is selected randomly.
- The output pool is the pool to which you want all the output products to be transferred. If not specified, the materialized data stays on the execution pool.
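A hypothetical invocation for the black-diamond demo might look like the following; the file, directory and pool names are invented, while the flag spellings are taken from the usage line above.

    # Hypothetical gencdag invocation; file and pool names are invented for illustration.
    gencdag --dax blackdiamond.dax \
            --p isi \
            --o isi \
            --dir ./dags/blackdiamond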

Slide 40: Mei's Exploits
- Mei has been running the Montage code for the past year, including some huge 6-degree and 10-degree DAGs (for the M16 cluster).
- The 6-degree runs had about 13,000 compute jobs and the 10-degree run had about 40,000 compute jobs!
- The final mosaic files can be downloaded from [URL lost in transcription].

41May 5th, 2004 Karan Vahi, ISI Questions?