Enabling Grids for E-sciencE (INFSO-RI-508833)
DAGs with data placement nodes: the "shish-kebab" jobs
Francesco Prelz, Enzo Martelli (INFN Milano)
JRA1 All Hands meeting, Brno

Summary
Why should we bother to schedule data jobs?
Fundamental ingredients of data jobs:
- Quoting Ian Bird, the SRM functionality foreseen in LCG is:
  - V1.1 plus space management, pin/unpin, etc.
  - not the full set of V2.1
  - V3 not required
  - CMS still has to confirm agreement with this set
- Should any additional low-level interface be considered?
What interaction with matchmaking? We consider these scenarios:
- A job needing to reserve space (for output) on a given tactical (or even strategic) SE, and to release it at the end.
- A job needing to pre-stage a file in from a mass-storage system, and/or to keep the file pinned until the end of execution.
- Should anything else be considered?

The fundamental concept
[Diagram: a linear chain alternating data placement jobs and computational jobs]
Allocate space for input & output data → Stage-in → Execute the job → Stage-out → Release any temporary space used
The allocation, stage-in, stage-out and release steps are data placement jobs; "Execute the job" is the computational job.

Just a few more details
The same chain, annotated with open questions at each step:
- Match-making: should we deal with multiple matches?
- Allocate space for input & output data: for how long?
- File pinning: probably needs to be renewed.
- Execute the job: how does the executable find the files? Always via POSIX, relative to the CWD, with a mapping that is known in advance and is applied by the sites? Should the mapping be carried with the job? Where?
- Stage-out: when should it occur? Files should be secured to 'strategic' storage, but how hard should we try to move them to their final destination?

SRM APIs
[Figure: listing of the SRM API calls; the content did not survive transcription.]

APIs used in each node
- Allocate space for input & output data: srmReserveSpace (either directly or via the reservation framework).
- Stage-in: srmPrepareToGet, then wait and poll with srmStatusOfGetRequest. The SRM pins the file if it already has it; otherwise it allocates space, copies the file in and pins it, so the previous explicit allocation may be avoided.
- File pinning: srmExtendFileLifeTime.
- Release any temporary space used: srmReleaseSpace (either directly or via the reservation framework).
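To make the stage-in node concrete, here is a minimal sketch of the call sequence, assuming a hypothetical Python binding (srm_client) that exposes the SRM operations named above; the wrapper names, request states and polling interval are illustrative assumptions, not an actual gLite or SRM client API.

```python
import time

import srm_client  # hypothetical binding exposing the SRM calls above


def stage_in(endpoint, surl, space_bytes, poll_interval=30):
    """One stage-in data placement node: reserve space, ask the SRM to
    stage and pin the file, then poll until it is ready on disk."""
    # Allocate space for input & output data (srmReserveSpace); an SRM
    # that already holds the file may make this step unnecessary.
    space_token = srm_client.srmReserveSpace(endpoint, space_bytes)

    # Ask the SRM to stage the file in and pin it (srmPrepareToGet).
    request = srm_client.srmPrepareToGet(endpoint, surl, space_token)

    # Wait and poll (srmStatusOfGetRequest) until the pin is in place.
    while True:
        status = srm_client.srmStatusOfGetRequest(endpoint, request)
        if status.state == "READY":
            return status.transfer_url, space_token
        if status.state == "FAILED":
            raise RuntimeError("stage-in failed: " + status.explanation)
        time.sleep(poll_interval)
```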

Data Placement in Condor DAGMan: the concept
[Diagram: DAGMan reads a DAG specification and feeds computational nodes to the Condor job queue and data placement (DaP) nodes to the Stork job queue.]
DAG specification (as shown on the slide):
DaP A A.submit
DaP B B.submit
Job C C.submit
...
Parent A child B
Parent B child C
Parent C child D,E
...
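Following this scheme, the five-step chain from "The fundamental concept" slide could be written as the sketch below, using the DaP/Job/Parent syntax quoted on this slide; the node names and submit-file names are placeholders, so treat it as an illustration rather than verified DAGMan/Stork syntax.

```
# Sketch: the "shish-kebab" chain as a DAG with Stork DaP nodes.
DaP A allocate.submit     # allocate space for input & output data
DaP B stage_in.submit     # stage-in
Job C compute.submit      # execute the job (a Condor job)
DaP D stage_out.submit    # stage-out
DaP E release.submit      # release any temporary space used
Parent A child B
Parent B child C
Parent C child D
Parent D child E
```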

SRM: Storage Resource Manager
We view an SRM as managing the use of a storage resource on a Grid. It could be managing a single disk cache (we refer to this as a DRM), managing the access to a tape archiving system (we call this a TRM), or both (we call this combination an HRM, for hierarchical storage system). SRMs do not perform file transfers themselves, but can invoke middleware components that do, such as GridFTP.

How should pinning and reservation be renewed in the job flow?
Should we add more ad-hoc machinery, as was done for proxy renewal? It is probably worth generalising a renewal solution that covers the allocation of various reservable resources.
We are studying how to integrate an architecture for resource reservation (see T. Ferrari / E. Ronchieri's talk):
- We'll need to resolve the renewal issues in that context.
Should we have a different approach just for data matchmaking jobs? How?
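If ad-hoc machinery were added, it could look much like the existing proxy-renewal thread; the sketch below keeps a pin alive with srmExtendFileLifeTime until the computational node finishes. It reuses the hypothetical srm_client binding from the earlier sketch, and the renewal margin is an invented parameter.

```python
import threading

import srm_client  # hypothetical binding, as in the stage-in sketch


def keep_pinned(endpoint, surl, lifetime, stop_event, margin=0.5):
    """Renew a file pin until stop_event is set, in the spirit of the
    proxy-renewal machinery: renew once a fraction 'margin' of the
    currently granted lifetime has elapsed."""
    while not stop_event.wait(timeout=lifetime * margin):
        # srmExtendFileLifeTime is the call named on the
        # "APIs used in each node" slide.
        lifetime = srm_client.srmExtendFileLifeTime(endpoint, surl)


# Usage: run alongside the computational node, stop before stage-out.
stop = threading.Event()
renewer = threading.Thread(
    target=keep_pinned,
    args=("srm.example.org", "srm://srm.example.org/some/file", 3600, stop))
renewer.start()
# ... the job runs ...
stop.set()
renewer.join()
```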

Agreement Service Architecture
[Diagram: Agreement Initiators submit an Agreement Offer to the Agreement Service, which sits in front of the Storage/Computing/Network resources.]

Just a DAG? Really a DAG?
The chain Match-making → Stage-in → Execute job → Stage-out stops being a plain DAG once failures are considered:
- Stage-in can also fail: what do we do? First release any temporary space used, then go back to match-making?
- File pinning: oh, this can fail, too!
- Stage-out should likely be skipped in case of job failure, but we should not forget to release any temporary space used.
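The closing slide asks whether a state machine is needed; a minimal sketch of one possible shape is below, with re-matching after a stage-in failure, stage-out skipped on job failure, and space always released. All function names and the retry policy are illustrative assumptions, not an existing WMS component.

```python
def run_shish_kebab(job, match_make, allocate_space, stage_in,
                    execute, stage_out, release_space, max_retries=3):
    """Drive the chain as a small state machine rather than a plain DAG:
    stage-in failures send the job back to match-making, a failed job
    skips stage-out, and temporary space is always released."""
    for _ in range(max_retries):
        site = match_make(job)          # pick a (possibly new) match
        space = None
        try:
            space = allocate_space(job, site)
            try:
                stage_in(job, site, space)
            except IOError:
                continue                # stage-in failed: re-match
            if execute(job, site):      # returns True on success
                stage_out(job, site)    # skipped on job failure
            return
        finally:
            if space is not None:
                release_space(site, space)  # never forget this step
    raise RuntimeError("no attempt succeeded after re-matching")
```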

More details about match-making
What data attributes should contribute to the rank?
- Currently: the number of close (administratively local) files.
- Should prefetch-time estimates contribute? Is srmGetReqEstTime going to be there?
- Should the possibility of remote access be taken into account? The estimated size and number of accesses, if remote file access is allowed?
What data attributes should contribute to the requirements? This is the same as asking: should we allow a match to occur only after some independent data movement actions are taken?
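To make the rank question concrete, a toy scoring function over candidate sites is sketched below: it uses only the criterion the slide says is current (the number of close files) plus the two candidate additions the slide asks about. The attribute names and weights are invented for illustration.

```python
def rank_site(site, allow_remote_access=False):
    """Toy rank for a candidate site. 'n_close_files' stands for the
    current criterion; the other terms are the candidate additions
    (prefetch-time estimates, remote-access cost) the slide discusses."""
    score = float(site.n_close_files)                  # current criterion
    score -= site.estimated_prefetch_seconds / 60.0    # srmGetReqEstTime?
    if allow_remote_access:
        # Penalise by the expected remote traffic (size x accesses).
        score -= site.remote_bytes_expected * site.n_accesses * 1e-9
    return score

# The match-maker would then pick max(candidates, key=rank_site).
```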

Other details
What should be the status of a job that failed to release space? "OK, but..."? And who should be told about this?

Non-conclusive questions...
Did we get a reasonable view of the non-SRM-V1.1 functions that are going to be there?
We will be test-driving the generic reservation framework, applied to storage. This will require applying some renewal/extension semantics: should they be added ad hoc?
Handling job flows with data seems to require capabilities beyond a DAG. Should we be implementing a state machine? A shell? Any other ideas?

References
SRM V1 API:
SRM V2 API: