SRM workshop – September’05 1
SRM: Experiment Requirements – Nick Brook
- Revisit of the LCG baseline services working group
- Priorities & timescales
- Use case (from LHCb)


SRM workshop – September’05 2
Baseline Services report
- All experiments require SRM at all sites
- The WG has agreed a common “LCG-SRM” set of functions that the experiments need:
  - LCG Service Challenge 3: v1.1
  - LCG Service Challenge 4: LCG-SRM
- LCG-SRM functionality:
  - v1.1 + space management, pin/unpin, etc.
  - Not the full v2.1 set
  - v3 not required

SRM workshop – September’05 3
Basic SRM functions (see link from the baseline services group web pages)
File types:
- Volatile: temporary & sharable copy of an MSS-resident file; if not pinned, it can be removed by the garbage collector
- Durable: the file cannot be removed automatically; if space is needed, the file may be copied to MSS …
- Permanent: the system cannot remove the file
The experiments only require the volatile & permanent file types
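The three file types and the garbage-collection rule above can be summed up in a small sketch (Python is used purely for illustration; the enum and helper below are invented for this document, not part of any SRM implementation):

```python
from enum import Enum

class FileType(Enum):
    """The three SRM file types described above."""
    VOLATILE = "volatile"    # temporary, sharable copy of an MSS-resident file
    DURABLE = "durable"      # cannot be removed automatically; may be migrated to MSS
    PERMANENT = "permanent"  # the system cannot remove it

def gc_may_remove(ftype: FileType, pinned: bool) -> bool:
    """Only an unpinned volatile file may be removed by the garbage collector."""
    return ftype is FileType.VOLATILE and not pinned

# Per the slide, the experiments only require the volatile and permanent types.
```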

SRM workshop – September’05 4
Basic SRM functions
Space reservation:
- SRM v1.1: space reservation is done on a file-by-file basis
  - The user doesn’t know in advance whether the SE will be able to store all files in a request
- SRM v2.1: allows a user to reserve space
  - The reservation has a lifetime
  - “PrepareToGet(Put)” requests fail if there is not enough space
- SRM v3.0: allows for “streaming”
  - When space is exhausted, new requests don’t fail but wait until space is released
The experiments are happy with the v2.1 space reservation functionality

SRM workshop – September’05 5
Basic SRM functions
Permission functions:
- SRM v2.1: allows for POSIX-like ACLs, which can be associated with each directory or file
- The experiments want the storage system to respect permissions based on VOMS roles & groups
- The experiments have NO wish for file ownership by individual users

SRM workshop – September’05 6
Basic SRM functions
Directory functions:
- Create/remove directories
- Delete files
- Rename directories or files (on a particular SE)
- Directory listing (not necessarily recursive)
- No need for “mv” (between SRM SEs)
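For orientation, the requested directory functions correspond to named operations in the SRM v2 interface; the mapping below is the editor's annotation (the operation names are from the SRM v2 specification, to the best of our knowledge), not content from the slides:

```python
# Requested directory functions mapped onto SRM v2 operation names
# (mapping added for orientation; names per the SRM v2 specification).
DIRECTORY_OPS = {
    "create directory": "srmMkdir",
    "remove directory": "srmRmdir",
    "delete file": "srmRm",
    "rename (within one SE)": "srmMv",   # no cross-SE "mv" is required
    "directory listing": "srmLs",        # recursive listing not necessarily needed
}
```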

SRM workshop – September’05 7
Basic SRM functions
Data transfer functions (a misnomer: not actual data movement, but preparing access to data):
- stageIn/stageOut-type functionality
- Pinning & unpinning
- A request token to monitor the status of a request:
  - How many files are ready
  - How many files are in progress
  - How many files are left to process
- Suspend/re-start/abort a request
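The request-token bookkeeping above amounts to summarising per-file states within one request; a minimal sketch (the state names and the function are invented for illustration):

```python
from collections import Counter

def request_summary(file_states: dict) -> dict:
    """Summarise a staging request, as a client polling its request
    token would see it: files ready, in progress, and left to process."""
    counts = Counter(file_states.values())
    return {
        "ready": counts["ready"],
        "in_progress": counts["in_progress"],
        "left": counts["queued"],
    }
```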

SRM workshop – September’05 8
Basic SRM functions
Relative paths:
Relative paths in SURLs (with respect to a base VO home), e.g.
srm://castorsrm.cern.ch/castor/cern.ch/grid/lhcb/DC04/prod0705/0705_123.dst
- srm://castorsrm.cern.ch defines the SE
- /castor/cern.ch is the site definition for the VO
- /grid/lhcb is the VO definition
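The decomposition of the SURL annotated above can be sketched with a small parser; the helper is hypothetical, and the choice of `/grid/` as the boundary between the site base path and the VO-relative path is an assumption made for this illustration:

```python
from urllib.parse import urlparse

def split_surl(surl: str):
    """Split an SURL into the SE hostname, the site-specific base path,
    and the VO-relative path (assuming the VO area starts at '/grid/')."""
    parsed = urlparse(surl)
    path = parsed.path
    i = path.find("/grid/")
    if i < 0:
        raise ValueError("no VO area found in SURL path")
    return parsed.hostname, path[:i], path[i:]

# The slide's example SURL:
se, site_base, vo_path = split_surl(
    "srm://castorsrm.cern.ch/castor/cern.ch/grid/lhcb/DC04/prod0705/0705_123.dst"
)
```

With relative paths, a job would only carry the VO-relative part and let each site prepend its own base.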

SRM workshop – September’05 9
Basic SRM functions
Query the protocols supported by a site/SE:
- The function already exists in the information system
- The list of protocols supported by the VO can be given by the application to the SRM, which returns a TURL with a protocol applicable to the site
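The negotiation described above reduces to an ordered intersection of two protocol lists; a sketch (the function is hypothetical, and the protocol names are common grid examples used illustratively):

```python
def choose_protocol(vo_preferences, site_supported):
    """Return the first protocol in the VO's preference list that the
    site/SE supports; the SRM would then hand back a TURL using it."""
    for proto in vo_preferences:
        if proto in site_supported:
            return proto
    return None  # no overlap: the request cannot be served

# e.g. the VO prefers rfio, then gsiftp; the site speaks gsiftp and dcap
best = choose_protocol(["rfio", "gsiftp"], {"gsiftp", "dcap"})
```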

SRM workshop – September’05 10
Prioritised list discussed at the LCG SC3 workshop, in descending order:
1. Pin/unpin functionality
2. Relative paths in SURLs
3. Permission functions: all experiments would like permissions to be based on roles & DNs; SRM should be integrated with VOMS
4. Directory functions (with the exception of “mv”)
5. Global space reservation: ReserveSpace, ReleaseSpace and UpdateSpace; CompactSpace is not needed
6. srmGetProtocols: seen as useful but not mandatory
7. AbortRequest, SuspendRequest and ResumeRequest: not seen as essential
The first five are seen as essential

SRM workshop – September’05 11
LCG Project Execution Board, 28th June 2005 meeting (follows on from the SC3 workshop in June):
- Directory functions – OK
- Permission functions – OK
- Pin/unpin – OK
- i.e. assuming adequate resources/priority, these could be implemented in all relevant SRMs on schedule for SC4 (delivery < end January 2006)
- Relative paths in SURLs: the request is for something like $VO_HOME; it too can be provided in time for SC4
- Global space reservation: requires more discussion with the developers; unlikely to be delivered in time for SC4, but could perhaps be available mid-2006(?)

SRM workshop – September’05 12
LCG timescales
- SC4 (Feb 2006): “to have available all missing tools & existing components”
- Necessary to expose the service well before the start of SC4

SRM workshop – September’05 13
Step-by-step through the stripping use case

SRM workshop – September’05 14
Use case: stripping
- Centralised analysis
- Reduces the reconstructed dataset by a factor of ~10
- Performed 4 times a year: twice together with reconstruction & twice at other times
- Need to retrieve RAW & rDST data from MSS
- Output files distributed to all Tier-1 centres

SRM workshop – September’05 15
1. The production manager reserves the needed pool space at the Tier-1 centres to receive output from concurrent jobs
2. The production manager launches production jobs
3. The production job, via SRM, checks the VO-specific namespace (without needing to know the site-specific high-level details) for whether the necessary directory structure for the output files exists at the production Tier-1 and the other Tier-1s; if not, it creates the necessary directory hierarchy everywhere
4. Check whether an output file already exists at any Tier-1; if so, exit with a warning message to the production manager. If the file/directory exists at only one site, the production manager may need to delete the offending file/directory or copy it elsewhere
5. The job issues a stage request via SRM for all needed input files
6. As files become available they are pinned by SRM, with a validity time compatible with the expected duration of the job
7. The job processes files as they become available; once processed, each file is unpinned
8. Once all (available) input files are processed, the output file(s) are made permanent
9. The job checks the permissions on the file, to protect against accidental deletion but to allow read permission for the entire VO
10. Once the production is finished, the production manager releases the reserved space at the Tier-1s
Highlighting step 1: needs the space management & reservation functionality, necessary to ensure the stripped DSTs have storage at the LHCb Tier-1s. The reservation is likely to be a slight overestimate and/or to use “space update”

SRM workshop – September’05 16
(Same 10-step use case; here highlighting steps 3–4.) Needs basic directory functionality: necessary to create the hierarchical structure to receive data, to check that files don’t already exist, etc.

SRM workshop – September’05 17
(Same 10-step use case; here highlighting step 5.) Current SRM v1.1 functionality: the ability to optimise use of the MSS system.

SRM workshop – September’05 18
(Same 10-step use case; here highlighting steps 6–7.) Many jobs will be running in parallel, so it is important that they do not interfere with each other. It is essential to pin a file once staged, to ensure its availability (and, of course, to unpin it afterwards!).
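Choosing a pin validity “compatible with the expected duration of the job” (step 6 of the use case) might look like this; the safety factor and the floor value are illustrative choices by the editor, not values from the slides:

```python
def pin_lifetime(expected_job_s: int, safety_factor: float = 1.5,
                 minimum_s: int = 3600) -> int:
    """Pin validity time for a staged input file: pad the expected job
    duration so the pin does not expire while the job is still running."""
    return max(int(expected_job_s * safety_factor), minimum_s)
```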

SRM workshop – September’05 19
(Same 10-step use case; here highlighting step 8.) Current SRM v1.1 functionality: the ability to store files in the MSS system. Necessary before a reserved space is released!

SRM workshop – September’05 20
(Same 10-step use case; here highlighting step 9.) Permission functions: essential to ensure the file is readable by the whole VO while only the production manager has write access. Specialised stripping would need to set group/sub-group privileges.

SRM workshop – September’05 21
(Same 10-step use case; here highlighting step 10.) Again uses the space management functions: needed to release the space used for the output of stripping.

SRM workshop – September’05 22
Summary
- All experiments see SRM as essential: a fundamental building block of a datagrid
- The LCG set of functionality is agreed by all experiments
- Optimisation of MSS access is essential:
  - Communicate multiple requests
  - Handle priorities
  - Optimise tape access
  (these are storage-system manager requirements, but they are recognised by the experiments)
- Expose to the experiments before SC4
- Timescales are tight