INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org DAGs with data placement nodes: the “shish-kebab” jobs Francesco Prelz Enzo Martelli INFN.


1 DAGs with data placement nodes: the “shish-kebab” jobs
Francesco Prelz, Enzo Martelli, INFN Milano
Enabling Grids for E-sciencE, INFSO-RI-508833, www.eu-egee.org

2 JRA1 All Hands meeting, Brno June 20-21-22/06/2005 1 Enabling Grids for E-sciencE INFSO-RI-508833 Summary
Why should we bother to schedule data jobs?
Fundamental ingredients of data jobs:
  – Quoting Ian Bird, the SRM functionality foreseen in LCG is:
      - V1.1 + space management, pin/unpin, etc.
      - Not the full set of V2.1
      - V3 not required
      - CMS still to confirm agreement with this set
  – Should any additional low-level interface be considered?
What interaction with matchmaking?
  – We consider these scenarios:
      - A job needing to reserve space (for output) on a given tactical (or even strategic) SE, and to release it at the end.
      - A job needing to pre-stage a file in from a mass-storage system, and/or to keep the file pinned until the end of execution.
  – Should anything else be considered?

3 The fundamental concept
[Diagram: a plain job (Stage-in, Execute the job, Stage-out) compared with the extended flow: Allocate space for input & output data, Stage-in, Execute the job, Stage-out, Release any temporary space used. The nodes are labelled as either Data Placement Jobs or Computational Jobs.]
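The chain on this slide can be written down as data. The following is only a minimal sketch: the `Node` type, the `shish_kebab` helper and the node names are hypothetical and do not come from the talk. It assumes that everything except the execution step is a data placement node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "data" for a data placement job, "compute" for a computational job


def shish_kebab(job_name):
    """Wrap one computational job in data placement nodes, in execution order."""
    return [
        Node(f"{job_name}.allocate", "data"),
        Node(f"{job_name}.stage_in", "data"),
        Node(f"{job_name}.execute", "compute"),
        Node(f"{job_name}.stage_out", "data"),
        Node(f"{job_name}.release", "data"),
    ]


def edges(chain):
    """Parent/child pairs of the linear chain (each node depends on the previous one)."""
    return [(a.name, b.name) for a, b in zip(chain, chain[1:])]


chain = shish_kebab("job1")
```

Here `edges(chain)` yields the four parent/child pairs that a DAG description of this single job would need.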

4 Just a few more details
[Diagram: the job flow (Match-making, Allocate space for input & output data, File pinning, Stage-in, Execute the job, Stage-out), annotated with the open questions below.]
  – Match-making: should we deal with multiple matches? Where, or when, should it occur?
  – Allocation: for how long? File pinning should probably be renewed.
  – How does the executable find the files? Always via POSIX, relative to the CWD, with a mapping that is known in advance and is applied by the sites? Should the mapping be carried with the job?
  – Files should be secured to 'strategic' storage, but how hard should we try to move them to their final destination?

5 SRM APIs

6 APIs used in each node
  – Allocate space for input & output data: srmReserveSpace (either directly or via the reservation framework).
  – Stage-in: srmPrepareToGet, then wait and poll with srmStatusOfGetRequest. The SRM pins the file if it already has it; otherwise it allocates space, copies the file and pins it. The previous allocation step may therefore be avoided.
  – File pinning: srmExtendFileLifeTime.
  – Release any temporary space used: srmReleaseSpace (either directly or via the reservation framework).
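The "prepare, wait and poll" pattern of the stage-in node might look like the sketch below. `FakeSRM` is a hypothetical in-memory stand-in, not a real SRM client: its method names only mirror the SRM calls named on the slide, and the status strings "PENDING" and "READY" are assumptions made for illustration.

```python
import time


class FakeSRM:
    """Hypothetical in-memory stand-in for an SRM endpoint (not a real client)."""

    def __init__(self, polls_until_ready=2):
        self._polls_left = polls_until_ready

    def prepare_to_get(self, surl):
        # Mirrors srmPrepareToGet: returns a request identifier.
        return f"req:{surl}"

    def status_of_get_request(self, request_id):
        # Mirrors srmStatusOfGetRequest: pending for a few polls, then ready.
        if self._polls_left > 0:
            self._polls_left -= 1
            return "PENDING"
        return "READY"


def stage_in(srm, surl, poll_interval=0.0, max_polls=10):
    """Ask the SRM to stage and pin a file, then poll until it is ready."""
    request_id = srm.prepare_to_get(surl)
    for _ in range(max_polls):
        if srm.status_of_get_request(request_id) == "READY":
            return request_id
        time.sleep(poll_interval)
    raise TimeoutError(f"{surl} not staged after {max_polls} polls")
```

Bounding the number of polls matters here because the pin obtained at the end has a limited lifetime, which is why the renewal question comes up later in the talk.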

7 Data Placement in Condor: the concept
DAG specification:
    DaP A A.submit
    DaP B B.submit
    Job C C.submit
    …..
    Parent A child B
    Parent B child C
    Parent C child D,E
    …..
[Diagram: DAGMan reads the DAG specification (nodes A to F) and feeds two queues, the Condor Job Queue and the Stork Job Queue.]
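To make the routing idea concrete, here is a small sketch that parses the declared part of the specification above and decides which queue each node would go to. It is not DAGMan's parser: `parse_dag` is a hypothetical helper, the elided lines of the slide are left out rather than guessed, and sending `DaP` nodes to Stork and `Job` nodes to Condor is the assumption being illustrated.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

DAG_SPEC = """\
DaP A A.submit
DaP B B.submit
Job C C.submit
Parent A child B
Parent B child C
"""


def parse_dag(text):
    """Return (queue per node, predecessors per node) for a DAG description."""
    queues = {}  # node name -> "stork" or "condor"
    preds = {}   # node name -> set of parent node names
    for line in text.splitlines():
        words = line.replace(",", " ").split()
        if not words:
            continue
        keyword = words[0].lower()
        if keyword in ("dap", "job"):
            name = words[1]
            queues[name] = "stork" if keyword == "dap" else "condor"
            preds.setdefault(name, set())
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            for child in words[split + 1:]:
                preds.setdefault(child, set()).update(words[1:split])
    return queues, preds


queues, preds = parse_dag(DAG_SPEC)
order = list(TopologicalSorter(preds).static_order())
```

`order` is one valid execution order of the nodes, and `queues` says where each node would be submitted.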

8 SRM: Storage Resource Manager
We view an SRM as managing the use of a storage resource on a grid. It could be managing a single disk cache (we refer to this as DRM), or managing the access to a tape archiving system (we call this TRM), or both (we call this combination HRM for Hierarchical Storage system). The SRMs do not perform file transfer, but can invoke middleware components that perform file transfer, such as GridFTP.

9 How should pinning and reservation be renewed in the job flow?
Should we add more ad-hoc machinery, as was done for proxy renewal?
It is probably worth generalising a renewal solution that covers the allocation of various reservable resources.
We are studying how to integrate an architecture for resource reservation (see T. Ferrari/E. Ronchieri's talk).
  – We'll need to resolve the renewal issues in that context.
Should we have a different approach just for data matchmaking jobs? How?
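One way to picture a generalised renewal mechanism is a single "lease" abstraction shared by pins, space reservations and proxies. The sketch below is purely illustrative: `Lease`, `renew_due` and the `margin` parameter are hypothetical names, and a real implementation would call the relevant service (for example srmExtendFileLifeTime for a pin) inside `renew`.

```python
class Lease:
    """A time-limited claim on a resource (a pin, a space reservation, a proxy)."""

    def __init__(self, name, lifetime, now):
        self.name = name
        self.lifetime = lifetime
        self.expires_at = now + lifetime

    def renew(self, now):
        # Placeholder for the actual renewal call against the owning service.
        self.expires_at = now + self.lifetime


def renew_due(leases, now, margin):
    """Renew every lease that expires within `margin` time units; return their names."""
    renewed = []
    for lease in leases:
        if lease.expires_at - now <= margin:
            lease.renew(now)
            renewed.append(lease.name)
    return renewed


pin = Lease("pin", lifetime=100, now=0)
space = Lease("space", lifetime=1000, now=0)
renewed = renew_due([pin, space], now=80, margin=30)
```

With one loop of this kind, the job flow would not need separate ad-hoc renewal code for each resource type.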

10 Agreement Service Architecture
[Diagram labels: Agreement Initiators; Agreement Offer; Storage/Computing/Network Agreement Service.]

11 Just a DAG? Really a DAG?
[Diagram: the job flow (Match-making, File pinning, Stage-in, Execute job, Stage-out, Release any temporary space used), annotated with the failure cases below.]
  – This can also fail: what do we do? First release any temporary space used? Then go back?
  – This should likely be skipped in case of job failure, but we should not forget to release any temporary space used.
  – Oh, this can fail, too!
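The failure handling asked about on this slide can be sketched as a small runner. This is an assumption-laden illustration, not the authors' design: `run_flow`, the node names and the status strings are hypothetical. It skips the remaining nodes (including stage-out) after a failure, still attempts the release if space was allocated, and records a failed release next to the main status.

```python
ORDER = ("match", "allocate", "stage_in", "execute", "stage_out")


def run_flow(steps):
    """Run the flow; `steps` maps node names to zero-argument callables.

    Returns (status, log), where log lists the nodes that completed.
    """
    log, status, allocated = [], "OK", False
    try:
        for name in ORDER:
            steps[name]()
            log.append(name)
            if name == "allocate":
                allocated = True
    except Exception:
        # Everything after the failing node, including stage-out, is skipped.
        status = f"FAILED at {name}"
    if allocated:
        try:
            steps["release"]()
            log.append("release")
        except Exception:
            # The release itself can fail; keep the main status and flag it.
            status += ", release failed"
    return status, log
```

A plain DAG cannot express the "always release" edge or a step back to an earlier node, which is the point the slide is making.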

12 More details about Match-making
What data attributes should contribute to the rank?
  – Currently: the number of close (administratively local) files.
  – Should prefetch time estimates contribute? Is srmGetReqEstTime going to be there?
  – Should the possibility of remote access be taken into account? Estimated size and number of accesses if remote file access is allowed?
What data attributes should contribute to the requirements?
  – This is the same as saying: should we allow a match to occur only after some independent data movement actions are taken?
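As an illustration of how these attributes could enter a rank expression, the sketch below counts close files and optionally subtracts a weighted prefetch-time estimate for the files that are not close. The function `data_rank`, its parameters and the weighting are hypothetical, and the prefetch estimates are simply passed in; nothing here assumes that srmGetReqEstTime is available.

```python
def data_rank(job_files, close_files, prefetch_seconds=None, prefetch_weight=0.0):
    """Rank a site for a job: more close files is better, long prefetches are worse.

    job_files:        files the job needs
    close_files:      set of files administratively local to the site
    prefetch_seconds: optional dict of estimated prefetch time per file
    """
    close = sum(1 for f in job_files if f in close_files)
    penalty = 0.0
    if prefetch_seconds is not None:
        missing = [f for f in job_files if f not in close_files]
        penalty = prefetch_weight * sum(prefetch_seconds.get(f, 0.0) for f in missing)
    return close - penalty


job = ["a", "b", "c"]
sites = {"site1": {"a", "b"}, "site2": {"a"}}
best = max(sites, key=lambda s: data_rank(job, sites[s]))
```

With the default arguments this reduces to the current behaviour described on the slide, the number of close files.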

13 Other details
What should be the status of a job that failed to release space? "OK, but"? And who should be told about this?

14 Non-conclusive questions...
Did we get a reasonable view of the functions beyond SRM v1.1 that are going to be there?
We will be test-driving the generic reservation framework, applied to storage. This will require some renewal/extension semantics: should they be added ad hoc?
Handling job flows with data seems to require capabilities beyond a DAG. Should we be implementing a state machine? A shell? Any other ideas?
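To show what the state machine option could mean in practice, here is a minimal transition-table sketch. The states, the "ok"/"fail" outcomes and every transition are assumptions chosen for illustration. The point is only that a failed allocation can lead back to match-making, which is a cycle that a DAG cannot represent.

```python
TRANSITIONS = {
    ("match", "ok"): "allocate",
    ("match", "fail"): "aborted",
    ("allocate", "ok"): "stage_in",
    ("allocate", "fail"): "match",      # back-edge: try another match
    ("stage_in", "ok"): "execute",
    ("stage_in", "fail"): "release",
    ("execute", "ok"): "stage_out",
    ("execute", "fail"): "release",     # skip stage-out, still release
    ("stage_out", "ok"): "release",
    ("stage_out", "fail"): "release",
    ("release", "ok"): "done",
    ("release", "fail"): "done",
}
TERMINAL = {"done", "aborted"}


def run(outcomes, start="match"):
    """Walk the machine; `outcomes` gives one "ok"/"fail" result per visited state."""
    state, path = start, [start]
    for outcome in outcomes:
        state = TRANSITIONS[(state, outcome)]
        path.append(state)
        if state in TERMINAL:
            break
    return path


path = run(["ok", "fail", "ok", "ok", "ok", "ok", "ok", "ok"])
```

In this run the first allocation fails, so the path visits `match` and `allocate` twice before reaching `done`.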

15 References
SRM V1 API: http://sdm.lbl.gov/srm-wg/doc/SRM.Joint.Functional.Design.Jan2002.pdf
SRM V2 API: http://sdm.lbl.gov/srm-wg/doc/SRM.spec.v2.1.1.html

