6 Sep Storage Classes implementations Artem Trunov IN2P3, France
Purpose and motivation for the Storage Classes WG
  The WG was created by Kors Bos after the FNAL SRM workshop.
  Discussion between storage admins about the new SRM v2.2 standard, providing feedback to experiments and developers.
  Kors's topic list:
    How do sites implement the disk0tape1, disk1tape0 and disk1tape1 data classes?
    Do these classes fit the way the sites intend to store the experiments' raw, esd, reco, dst, aod, tag and mc datasets? And is this what the experiments had in mind?
    How do sites plan to store these different classes of data?
    Can we achieve any storing and/or naming conventions? Is that useful?
    Will all services defined in the SRM v2.2 interface work in such a scheme? More specifically, srmLs and srmRm on the data classes mentioned before?
    Can we define a process for managing resource allocation using this scheme, define the responsibilities of the experiments and the sites, and identify any common tools that may be required?
  Understanding the experiments' requirements and their usage of storage classes.
  Need to make sure we are all "on the same page": experiments, sites' storage experts, developers...
  Need to "freeze" the terminology and start to speak the same language.
  Need to know the development road map and plan.
  The need for this WG was confirmed by an almost 2h discussion triggered by a presentation to SRM/storage developers during the SRM v2.2 workshop at CERN on Aug 29.
A bit of history
  SRM (Storage Resource Management) emerged at LBNL (NERSC) as a proposal for a common interface for managing storage resources.
  It is also viewed in LCG as a great tool to provide inter-site operability and to give experiments a storage-independent way of managing their data at the numerous institutions participating in LHC and other experiments.
  SRM v1 was specified and implemented in Castor, dCache and DPM.
    Some interoperability issues were uncovered, and experiments raised concerns that the functionality available through the SRM v1 interface doesn't cover their use cases.
  At the Mumbai SC4 workshop it was agreed to adopt a certain convention to carry out SC4, and then come up with a new standard proposal.
  The FNAL workshop in May came up with the SRM v2.2 standard.
SRM version 2
  A lot more complex; file get and put requests use:
    TFileStorageType { VOLATILE, DURABLE, PERMANENT }
    TRetentionPolicy { REPLICA, OUTPUT, CUSTODIAL }
    TAccessLatency { ONLINE, NEARLINE }
    TAccessPattern { TRANSFER_MODE, PROCESSING_MODE }
    TConnectionType { WAN, LAN }
SRM version 2 – cont.
  ...which effectively:
    Requires a lot of new development in storage systems, which is sometimes orthogonal to the storage system architecture.
    Gives the user more freedom than he actually needs.
    Leaves storage administrators wondering how to set up the site's storage.
    And finally, doesn't make anyone's life easier.
  Complexity leads to misunderstanding!
Storage Classes
  A concept external to SRM v2: not mentioned in the standard, but meant to simplify the use of SRM v2.
  An LCG interpretation and guidance.
  Takes into consideration several use cases and storage usage patterns most often used by experiments and in agreement with sites' storage setup practices.
  About the last agreement before LHC start-up, so to be considered very seriously.
Storage Classes – basic use cases
  Three storage classes are really what experiments want, and how they mostly want to do their data management:
  Tape1Disk0, Long Term Inactive
    Use case: transfer of raw data from CERN. It needs to be archived on tape, but not kept on disk after archiving. It will later be staged in for reprocessing.
  Tape1Disk1, Long Term Active
    Use case: transfer of aod data produced elsewhere for custodial storage (first copy at the producer's site). It needs to be archived on tape and kept on disk for user analysis.
  Tape0Disk1, Short Term Active
    Use case: temporary output of production data, which doesn't need to be archived to tape.
    Use case: replica of aod data for which the site is not a custodian (Atlas).
  Plus, experiments will need to request data stage-in, purge from disk, and permanent removal from disk and tape.
Tape0Disk1
  TFileStorageType: PERMANENT
  TRetentionPolicy: REPLICA
  TAccessLatency: ONLINE
  TAccessPattern: TRANSFER or PROCESSING
  TConnectionType: WAN or LAN
  [Diagram: Tape, WAN disk. Legend: file stays on disk.]
  Probably the easiest class.
  Files are on disk, explicitly managed by the VO.
  A tape backend is not required, but if one is used, files are still guaranteed to be on disk.
  When the disk is full, srm put() will return an error.
Tape1Disk0
  TFileStorageType: PERMANENT
  TRetentionPolicy: CUSTODIAL
  TAccessLatency: NEARLINE
  TAccessPattern: TRANSFER or PROCESSING
  TConnectionType: WAN or LAN
  [Diagram: Tape, WAN disk. Legend: file does not stay on disk (doesn't have a pin).]
  Files are transferred to the WAN pool (by definition), but don't stay on disk.
  System managed.
  Space reservation is not applied to the WAN pool.
Tape1Disk0
  [Diagram: Tape, WAN disk, LAN disk. Legend: a file has some pin time on disk.]
  A file can be staged in:
    On a LAN pool for analysis, by srmBringOnline.
    On a WAN pool for transfer, by srmPrepareToGet.
  System managed in both cases, but a user-specified pin time can be set or taken into account.
    The Castor implementation doesn't honor the user pin time but uses it as a weighting factor.
  Sites can choose to separate WAN (transfer) and LAN (processing) pools for a few reasons:
    Security: keep only the minimum on the internet, and move the rest of the storage to IFZ.
    Not mixing I/O patterns: large-block sequential I/O of transfers with small-block random reads of processing.
Tape1Disk1
  TFileStorageType: PERMANENT
  TRetentionPolicy: CUSTODIAL
  TAccessLatency: ONLINE
  TAccessPattern: TRANSFER or PROCESSING
  TConnectionType: WAN or LAN
  [Diagram: Tape, WAN disk, LAN disk. Legend: file does not stay on disk (doesn't have a pin).]
  Files are pinned on disk by the system and guaranteed to be on disk.
  In principle, this class can share the same physical disk space with Tape0Disk1, as long as the system can enforce quotas on disk for Tape1Disk1.
    Castor will need to divide disk pools.
  The WAN pool basically becomes just a technical intermediate pool used for transfers.
    Castor and dCache will internally transfer files to the corresponding pool.
    But when experiments request disk space, they should know they pay for extra space on the transfer pools.
  The special case where a file does need to stay on the WAN pool (e.g. CMS multihop transfers) should be specially arranged, i.e. not by using the Tape1Disk1 class.
Class transitions
  srmChangeSpaceForFiles needs to be used to change one class to another.
  Space transition was not initially considered important: the final document of the FNAL SRM workshop said that class transitions are not required by the start of LHC.
  However, it's clear that use of the Tape1Disk1 class requires this, since no one wants to keep his files on disk forever.
  Support of this operation in our storage systems:
    Castor is going to implement this, but the details of the implementation are not yet known.
    We are in discussion with the dCache team on whether they can change priorities and support class transitions in the initial release of dCache with SRM v2.2.
  Possible implementations:
    Tape1Disk1→Tape1Disk0: the system simply removes the disk replica, or removes the pin to let the garbage collector remove the file later.
    Tape1Disk0→Tape1Disk1: the system stages in the file and pins it on disk.
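The two possible implementations above can be sketched as a small state change on a per-file record. This is only an illustrative model, not how Castor or dCache do it: the FileState record and the change_class function are hypothetical names, and the sketch takes the "remove the pin" variant for Tape1Disk1→Tape1Disk0. Any other transition is rejected here.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Hypothetical per-file record; not part of any SRM implementation."""
    storage_class: str
    on_disk: bool
    pinned: bool


def change_class(f: FileState, target: str) -> FileState:
    """Model of a class transition requested through srmChangeSpaceForFiles."""
    if f.storage_class == target:
        return f
    move = (f.storage_class, target)
    if move == ("Tape1Disk1", "Tape1Disk0"):
        # Drop the pin; the garbage collector may remove the disk copy later.
        f.pinned = False
    elif move == ("Tape1Disk0", "Tape1Disk1"):
        # Stage the file in from tape and pin it on disk.
        f.on_disk = True
        f.pinned = True
    else:
        raise NotImplementedError(f"transition {move[0]} -> {move[1]} not modelled")
    f.storage_class = target
    return f
```

Note that after Tape1Disk1→Tape1Disk0 the file may still be on disk for a while; only the guarantee is gone.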
Class transition Tape0 → Tape1
  I.e. user-induced migration ordered via the SRM interface.
  This is the most controversial transition.
    It requires some non-trivial development, since most storage systems are based on per-directory settings for migration and staging.
    Although a use case was presented, one can easily get away without it.
  LHCb proposed the following use case: a physicist makes a "validation production". By default he doesn't want to store the output on tape, but if he likes the result, he will choose to store it and continue with production.
    However, in this case it may be simpler to physically copy the output using the Tape1Disk0 class rather than to implement Tape0 → Tape1 operations!
    Another alternative, storing all files initially using the Tape1Disk0 class, is also not bad.
  Support of this operation in our storage systems:
    Castor plans to implement it.
    dCache is concerned and still investigating.
Tape1Disk1 – experiments' view
  CMS doesn't yet have all the tools to manage disk space (other than for transfer), so de facto only Tape1Disk0 and Tape0Disk1 are used.
    The simplest DM solution seems to be to use srmBringOnline once on a new dataset and rely on the system garbage collector to push older datasets from disk.
    But CMS may also use Tape1Disk1 if it's all clear how to use it and how it will work.
  Alice is also confused, and prefers to implement its own solution for files that need to be guaranteed on disk.
    It will make logical replicas (in its own file catalog) and will physically copy files to a Tape0Disk1 pool.
  LHCb illustrated the use of this class with the following use case:
    Raw files are transferred to a T1 site for custodial storage, but need to stay on disk for ~48 hours for processing. Then their class is changed to Tape1Disk0.
How?
  How is the desired storage class passed with a transfer request?
    As a string with the command line tools: srmcp, glite-transfer-submit.
    Those tools internally map it to the corresponding TRetentionPolicy and TAccessLatency.
  If the specified storage class doesn't match the target system's settings for the SURL, an error will be returned.
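A minimal sketch of that mapping and of the mismatch check, using the TRetentionPolicy and TAccessLatency values given on the three class slides. The names CLASS_TO_SRM, StorageClassError and resolve_request are hypothetical and do not come from srmcp or glite-transfer-submit; how the target system's configured class for a SURL is looked up is left out and passed in as a plain argument.

```python
# Storage class name -> (TRetentionPolicy, TAccessLatency), per the class slides.
CLASS_TO_SRM = {
    "Tape0Disk1": ("REPLICA", "ONLINE"),
    "Tape1Disk0": ("CUSTODIAL", "NEARLINE"),
    "Tape1Disk1": ("CUSTODIAL", "ONLINE"),
}


class StorageClassError(Exception):
    """Raised when a requested class is unknown or does not fit the target."""


def resolve_request(requested_class: str, target_class: str) -> tuple:
    """Return (retention policy, access latency) for a transfer request.

    target_class is the class the target system is configured with for the
    SURL; in this sketch the caller is assumed to know it already.
    """
    if requested_class not in CLASS_TO_SRM:
        raise StorageClassError(f"unknown storage class: {requested_class!r}")
    if requested_class != target_class:
        raise StorageClassError(
            f"requested {requested_class}, but the target is set up as {target_class}"
        )
    return CLASS_TO_SRM[requested_class]
```

For example, resolve_request("Tape1Disk0", "Tape1Disk0") yields the CUSTODIAL and NEARLINE pair, while asking for Tape1Disk1 on the same target raises the error.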
Recommendations
  To sites:
    Work with experiments on understanding their storage requirements in terms of storage classes.
    Participate in our Storage Classes WG, where experiments will present their storage requirements in detail.
    The default class is determined by the storage setup: if a directory is set migratable, it's Tape1Disk0; if a directory is set non-migratable, it's Tape0Disk1.
  To experiments:
    Work with storage admins on understanding the storage setup at sites.
    Think and talk in terms of the three storage classes and their implementations. There may not be another option available!
    Hold off on using the Tape1Disk1 class until storage systems support class transitions.
    Use Tape1Disk0 and Tape0Disk1 for now. Try to see how the computing model fits into those storage classes.
  To all: let's really try to move to a scheme where resource allocation requests are easy: "I want X TB of class Tape1Disk0 and Y TB of class Tape0Disk1".
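The default-class rule and the "X TB of class ..." style of request can be illustrated with a short sketch. The function names and the directory tuples are made up for illustration; the only rule taken from the slide is migratable → Tape1Disk0, non-migratable → Tape0Disk1.

```python
from collections import defaultdict


def default_class(migratable: bool) -> str:
    """Default storage class implied by a directory's migration setting."""
    return "Tape1Disk0" if migratable else "Tape0Disk1"


def totals_by_class(directories):
    """Sum requested space per storage class.

    directories: iterable of (path, migratable, size_tb) tuples (hypothetical
    layout, chosen only for this sketch).
    """
    totals = defaultdict(float)
    for _path, migratable, size_tb in directories:
        totals[default_class(migratable)] += size_tb
    return dict(totals)


def allocation_request(totals) -> str:
    """Format the totals in the style of the request quoted on the slide."""
    parts = [f"{tb:g} TB of class {cls}" for cls, tb in totals.items()]
    return "I want " + " and ".join(parts)
```

Given one migratable directory of 100 TB and one non-migratable directory of 40 TB, the request would list 100 TB of Tape1Disk0 and 40 TB of Tape0Disk1.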
Other topics of discussion in Storage Classes WG – VO name space
  Storage admins strongly prefer that files with different access patterns are stored in different "service classes" of the MSS (read: on different tapes).
    This is for the mutual benefit of sites and experiments.
  This requires that VO files are stored in well-defined directories under the VO end point.
  Atlas is a good example of storing files in a sensible directory structure:
    atlas/data/raw
              /esd
              /aod
              /tag
  This allows local storage to be set up with different service classes per directory.
  We will continue to work with experiments on this.
  At minimum, we would like to see raw separated from everything else.
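A per-directory setup like this amounts to a lookup from a file's path to the service class of its nearest configured parent directory. The sketch below assumes a plain dictionary as the site configuration; the service class names ("raw-tapes", "default-tapes") and the function name are invented for illustration and do not correspond to any MSS configuration syntax.

```python
# Hypothetical site configuration: directory -> MSS service class.
SERVICE_CLASSES = {
    "/atlas/data/raw": "raw-tapes",   # raw kept apart from everything else
    "/atlas/data": "default-tapes",
}


def service_class_for(path: str, mapping=SERVICE_CLASSES) -> str:
    """Return the service class of the closest configured parent directory."""
    current = path.rstrip("/")
    while current:
        if current in mapping:
            return mapping[current]
        # Step up one directory level; stop when no separator is left.
        head, sep, _tail = current.rpartition("/")
        current = head if sep else ""
    raise KeyError(f"no service class configured for {path!r}")
```

With this configuration a file under /atlas/data/raw resolves to the raw service class, while files under /atlas/data/esd or /atlas/data/aod fall back to the default one.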
Reporting lost, unavailable files
  When a storage admin finds some files lost or unavailable due to e.g. hardware failure, how should the affected VO be notified?
  When a VO (a user) finds some files inaccessible, how should it be reported to the site?
  A simple common mechanism is needed in both directions.
    "Direct" is the key word, to reduce response latency.
  Proposing to use the CIC portal for this task. It needs some modification:
    Add a VO storage admins category. They will receive a list of lost files from site admins.
    Add a site storage admins category. They will receive info about file access errors.
Links
  CERN SRM v2.2 workshop: s/GridStorageInterfacesWSAgenda
  FNAL SRM workshop: s/GridStorageInterfacesWSAgenda
  WLCG SRM v2.2 usage agreement introducing storage classes: torageInterfacesWSAgenda/SRMLCG-MoU-day2.doc
  Storage Classes WG virtual meetings