Toward new HSM solution using GPFS/TSM/StoRM integration

Vladimir Sapunenko (INFN, CNAF), Luca dell'Agnello (INFN, CNAF), Daniele Gregori (INFN, CNAF), Riccardo Zappi (INFN, CNAF), Luca Magnoni (INFN, CNAF), Elisabetta Ronchieri (INFN, CNAF), Vincenzo Vagnoni (INFN, Bologna)

HEPiX 2008, Geneva, 07/05/2008

Storage @ CNAF

Implementation of the 3 storage classes needed for LHC:
- Disk0Tape1 (D0T1) → CASTOR
  - Space managed by the system
  - Data migrated to tape and deleted from disk when the staging area is full
- Disk1Tape0 (D1T0) → GPFS/StoRM (in production)
  - Space managed by the VO
- Disk1Tape1 (D1T1) → CASTOR (production), GPFS/StoRM (production prototype for LHCb only)
  - Space managed by the VO (i.e. if the disk is full, the copy fails)
  - Large permanent disk buffer with tape back-end and no garbage collection

Looking into an HSM solution based on StoRM/GPFS/TSM

- Project developed as a collaboration between:
  - the GPFS development team (US)
  - the TSM HSM development team (Germany)
  - the end users (INFN-CNAF)
- The main idea is to combine new features of GPFS (v.3.2) and TSM (v.5.5) with SRM (StoRM) to provide a transparent, GRID-friendly HSM solution
- Information Lifecycle Management (ILM) is used to drive the movement of data between disks and tapes
- The interface between GPFS and TSM is on our shoulders
- Improvements and development are needed on all sides
- Transparent recall vs. massive (list-ordered, optimized) recalls

What we have now

- GPFS and TSM are widely used as separate products
- Both products have built-in functionality to implement backup and archiving from GPFS
- In GPFS v.3.2 the concept of "external storage pool" extends the use of policy-driven ILM to tape storage
- Some groups in the HEP world are starting to investigate this solution or have expressed interest in starting

GPFS Approach: "External Pools"

- External pools are really interfaces to external storage managers, e.g. HPSS or TSM
- An external pool "rule" defines the script to call to migrate/recall/etc. files:
  RULE EXTERNAL POOL 'PoolName' EXEC 'InterfaceScript' [OPTS 'options']
- The GPFS policy engine builds candidate lists and passes them to the external pool scripts (see the sketch below)
- The external storage manager actually moves the data
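For illustration only, a minimal sketch in Python of what such an interface script might look like: GPFS invokes it with a command, the path of a candidate file list and the OPTS string, and the script hands the listed files to the TSM HSM client (dsmmigrate/dsmrecall). The argument order and list-file record format assumed here follow our reading of the GPFS external-pool convention; this is not the production hsmControl script.

import subprocess
import sys

def parse_file_list(list_path):
    # Each record is assumed to end with " -- /full/path/of/file"
    paths = []
    with open(list_path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if " -- " in line:
                paths.append(line.split(" -- ", 1)[1])
    return paths

def main():
    # GPFS calls: InterfaceScript <command> <file-list> [<opts>]
    command, list_path = sys.argv[1], sys.argv[2]
    opts = sys.argv[3] if len(sys.argv) > 3 else ""   # e.g. the space token name

    if command == "TEST":          # GPFS probes the interface before using it
        sys.exit(0)

    paths = parse_file_list(list_path)
    if command == "MIGRATE":       # hand the candidates to the TSM HSM client
        subprocess.run(["dsmmigrate"] + paths, check=True)
    elif command == "RECALL":      # bring files back from tape
        subprocess.run(["dsmrecall"] + paths, check=True)

if __name__ == "__main__":
    main()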

Storage class Disk1-Tape1

- The D1T1 prototype in GPFS/TSM was tested for about two months
- Quite simple when there is no competition between migration and recall
- D1T1 requires that every file written to disk is copied to tape (and remains resident on disk)
  - recalls are needed only in case of data loss (on disk)
- Although the D1T1 is a living concept…
- Some adjustments were needed in StoRM
  - basically to place a file on hold for migration until the write operation is completed (SRM "putDone" on the file); a sketch of such a filter follows
- Definitely positive results of the test with the current testbed hardware
- More tests are needed at a larger scale
- A production model needs to be established
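The configuration file shown later defines a pin prefix (.STORM_T1D1_), which suggests that files still waiting for putDone are marked by a companion pin file. Under that assumption (the naming convention below is ours, for illustration only), the hold could be implemented as a simple filter on the candidate list:

import os

PIN_PREFIX = ".STORM_T1D1_"   # value taken from the configuration example later on

def is_held_for_putdone(path):
    # A file is held back while its pin marker exists next to it (assumed convention)
    directory, name = os.path.split(path)
    return os.path.exists(os.path.join(directory, PIN_PREFIX + name))

def migration_candidates(paths):
    # Only files whose SRM putDone has completed (no pin marker) are migrated
    return [p for p in paths if not is_held_for_putdone(p)]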

Storage class Disk0-Tape1

- The prototype is ready and is being tested now
- More complicated logic is needed:
  - define the priority between reads and writes
    - for example, in the current version of CASTOR, migration to tape has absolute priority
  - logic for reordering recalls ("list-optimized recall"): by tape, and by file position inside a tape (see the sketch below)
- The logic is realized by means of special scripts
- First tests are encouraging, even considering the complexity of the problem
- Modifications were requested in StoRM to implement the recall logic and file pinning for files in use
- The identified solutions are simple and linear
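A minimal sketch of the list-optimized recall idea: group the requested files by tape volume and sort them by their position on the tape, so that each tape is mounted once and read mostly sequentially. How the (tape, position) metadata is obtained from TSM is not shown here; the request layout below is an assumption for illustration.

from collections import defaultdict

def order_recalls(requests):
    # requests: iterable of (path, tape_volume, position_on_tape)
    by_tape = defaultdict(list)
    for path, tape, pos in requests:
        by_tape[tape].append((pos, path))
    plan = []
    for tape in sorted(by_tape):                      # one batch per tape volume
        batch = [path for _, path in sorted(by_tape[tape])]
        plan.append((tape, batch))
    return plan

# Each (tape, batch) pair would then be handed to a single recall
# invocation, so the drive streams through the tape in order.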

GPFS+TSM tests

- So far we have performed full tests of a D1T1 solution (StoRM+GPFS+TSM); the D0T1 implementation is being developed in close contact with the IBM GPFS and TSM developers
- The D1T1 is now entering its first production phase, being used by LHCb during this month's CCRC08
  - as is the D1T0, which is served by the same GPFS cluster but without migrations
- The GPFS/StoRM-based D1T0 has also been in use by ATLAS since February

D1T0 and D1T1 using StoRM/GPFS/TSM

- 3 StoRM instances
- 3 major HEP experiments
- 2 storage classes
- 12 servers, 200 TB of disk space
- 3 LTO-2 tape drives

Hardware used for the test

- 40 TB GPFS file system (v. 3.2) served by 4 I/O NSD servers (the SAN devices are EMC CX3-80)
- FC (4 Gbit/s) interconnection between servers and disk arrays
- TSM v.5.5
- 2 servers (1 Gb Ethernet) as HSM front-ends, each one acting as:
  - GPFS client (reads and writes on the file system via LAN)
  - TSM client (reads and writes from/to tape via FC)
- 3 LTO-2 tape drives
- The tape library (STK L5500) is shared between CASTOR and TSM
  - i.e. they work together with the same tape library

LHCb D1T0 and D1T1 details

[Architecture diagram: 2 EMC CX3-80 controllers, 4 GPFS (NSD) servers, 2 StoRM servers, 2 GridFTP servers, 2 HSM front-end nodes (GPFS/TSM clients), 1 TSM server with its DB plus a backup TSM server holding a DB mirror, and 3 LTO-2 tape drives; interconnects are 1/10 Gbps Ethernet LAN and 2/4 Gbps FC SAN/TAN.]

How it works

- GPFS performs file system metadata scans according to the ILM policies specified by the administrators
- The metadata scan is very fast (it is not a find…) and is used by GPFS to identify the files which need to be migrated to tape
- Once the list of files is obtained, it is passed to an external process which runs on the HSM nodes and actually performs the migration to TSM
  - this is, in particular, what we implemented (a sketch of the orchestration follows)
- Note:
  - The GPFS file system and the HSM nodes can be kept completely decoupled, in the sense that it is possible to shut down the HSM nodes without interrupting file system availability
  - All components of the system have intrinsic redundancy (GPFS failover mechanisms)
  - No need to put in place any kind of HA features (apart from the unique TSM server)
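A rough sketch of how such an external process might fan the candidate list out over the HSM nodes, using the stream size and session timeout from the configuration file shown later. The run_stream helper is hypothetical (it would start one TSM migrate invocation on the given node); the real interface also runs several streams per node in parallel, up to MIGRATETHREADSMAX.

import time

MIGRATESTREAMNUMFILES = 30     # files per migrate stream (from the config example)
MIGRATESESSIONTIMEOUT = 4800   # seconds before the session stops and GPFS rescans

def chunks(seq, size):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def run_migration_session(candidates, hsm_nodes, run_stream):
    deadline = time.time() + MIGRATESESSIONTIMEOUT
    for i, stream in enumerate(chunks(candidates, MIGRATESTREAMNUMFILES)):
        if time.time() > deadline:
            break                              # leftovers are picked up by the next metadata scan
        node = hsm_nodes[i % len(hsm_nodes)]   # simple round-robin over the HSM nodes
        run_stream(node, stream)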

Example of an ILM policy

/* Policy implementing T1D1 for LHCb:
   -) 1 GPFS storage pool
   -) 1 SRM space token: LHCb_M-DST
   -) 1 TSM management class
   -) 1 TSM storage pool */

/* Placement policy rules */
RULE 'DATA1' SET POOL 'data1' LIMIT (99)
RULE 'DATA2' SET POOL 'data2' LIMIT (99)
RULE 'DEFAULT' SET POOL 'system'

/* We have 1 space token: LHCb_M-DST. Define 1 external pool accordingly. */
RULE EXTERNAL POOL 'TAPE MIGRATION LHCb_M-DST'
  EXEC '/var/mmfs/etc/hsmControl' OPTS 'LHCb_M-DST'

/* Exclude from migration hidden directories (e.g. .SpaceMan), baby files, hidden and weird files. */
RULE 'exclude hidden directories' EXCLUDE WHERE PATH_NAME LIKE '%/.%'
RULE 'exclude hidden file' EXCLUDE WHERE NAME LIKE '.%'
RULE 'exclude empty files' EXCLUDE WHERE FILE_SIZE=0
RULE 'exclude baby files' EXCLUDE
  WHERE (CURRENT_TIMESTAMP-MODIFICATION_TIME)<INTERVAL '3' MINUTE

Example of an ILM policy (cont.)

/* Migrate to the external pool according to space token (i.e. fileset). */
RULE 'migrate from system to tape LHCb_M-DST'
  MIGRATE FROM POOL 'system' THRESHOLD(0,100,0)
  WEIGHT(CURRENT_TIMESTAMP-ACCESS_TIME)
  TO POOL 'TAPE MIGRATION LHCb_M-DST'
  FOR FILESET('LHCb_M-DST')

RULE 'migrate from data1 to tape LHCb_M-DST'
  MIGRATE FROM POOL 'data1' THRESHOLD(0,100,0)
  WEIGHT(CURRENT_TIMESTAMP-ACCESS_TIME)
  TO POOL 'TAPE MIGRATION LHCb_M-DST'
  FOR FILESET('LHCb_M-DST')

RULE 'migrate from data2 to tape LHCb_M-DST'
  MIGRATE FROM POOL 'data2' THRESHOLD(0,100,0)
  WEIGHT(CURRENT_TIMESTAMP-ACCESS_TIME)
  TO POOL 'TAPE MIGRATION LHCb_M-DST'
  FOR FILESET('LHCb_M-DST')

Example of a configuration file

# HSM node list (comma separated)
HSMNODES=diskserv-san-14,diskserv-san-16
# system directory path
SVCFS=/storage/gpfs_lhcb/system
# filesystem scan minimum frequency (in sec)
SCANFREQUENCY=1800
# maximum time allowed for a migrate session (in sec)
MIGRATESESSIONTIMEOUT=4800
# maximum number of migrate threads per node
MIGRATETHREADSMAX=30
# number of files for each migrate stream
MIGRATESTREAMNUMFILES=30
# sleep time for lock file check loop
LOCKSLEEPTIME=2
# pin prefix
PINPREFIX=.STORM_T1D1_
# TSM admin user name
TSMID=xxxxx
# TSM admin user password
TSMPASS=xxxxx
# report period (in sec)
REPORTFREQUENCY=86400
# report addresses (comma separated)
# alarm addresses (comma separated)
# alarm delay (in sec)
ALARMDELAY=7200
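Just for illustration, a few lines of Python that read such a key=value option file into a dictionary; the file path is hypothetical, and keys are converted by the caller (e.g. to int) where needed.

def load_options(path="/var/mmfs/etc/hsmControl.conf"):   # path is illustrative
    # Parse the key=value option file, skipping comments and blank lines
    opts = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            opts[key.strip()] = value.strip()
    return opts

# e.g. int(load_options()["SCANFREQUENCY"])  ->  1800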

Example of a report

- A first automatic reporting system has been implemented

[Sample report: for the period from Sun 04 May to Mon 05 May, the report lists, per tape volume, per drive and per HSM host, the number of files migrated, the number of failures, the per-file throughput and the total throughput (MiB/s), followed by overall totals.]

- The alarm part is being developed
- An email with the reports is sent every day (the period of time is configurable via the option file)

Description of the tests

- Test A
  - Data transfer of LHCb files from CERN CASTOR disk to CNAF StoRM/GPFS using the File Transfer Service (FTS)
  - Automatic migration of the data files from GPFS to TSM while the data was being transferred by FTS
  - This is a realistic scenario
- Test B
  - 1 GiB zeroed files created locally on the GPFS file system with migration turned off, then migrated to tape when the writes were finished
  - The migration of zeroed files to tape is faster due to compression → it measures the physical limits of the system
- Test C
  - Similar to Test B, but with real LHCb data files instead of dummy zeroed files
  - Realistic scenario, e.g. when a long queue of files to be migrated accumulates in the file system during maintenance

Test A: input files

- Most of the files are of 4 and 2 GiB size, with a bit of other sizes in addition
- data files are LHCb stripped DST
- 2477 files
- 8 TiB in total

[Figure: file size distribution]

Test A: results

[Figure: net data throughput vs. time. Black curve: net data throughput from CERN to CNAF; red curve: net data throughput from GPFS to TSM. Annotations mark where FTS transfers were temporarily interrupted, where a third LTO-2 drive was added to the initial two, and where a drive was removed.]

- 8 TiB in total were transferred to tape in 150k seconds (almost 2 days) from CERN
- About 50 MiB/s to tape with two LTO-2 drives and 65 MiB/s with three LTO-2 drives
- Zero tape migration failures
- Zero retrials
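As a quick consistency check of the quoted figures, using only the numbers on this slide:

total_mib = 8 * 1024 * 1024        # 8 TiB expressed in MiB
elapsed_s = 150_000                # ~150k seconds, i.e. almost 2 days
print(total_mib / elapsed_s)       # ~56 MiB/s, between the quoted 50 MiB/s (two drives) and 65 MiB/s (three drives)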

Test A: results (II)

- Most of the files were migrated within less than 3 hours, with a tail up to 8 hours
- The tail comes from the fact that at some point the CERN-to-CNAF throughput rose to 80 MiB/s, exceeding the maximum tape-migration performance at that time; GPFS/TSM therefore accumulated a queue of files with respect to the FTS transfers

[Figure: retention time on disk (time from when a file is written until it is migrated to tape)]

Test A: results (III)

- The distribution peaks at about 33 MiB/s, which is the maximum sustainable by the LTO-2 drives for LHCb data files
  - due to compression, the actual performance depends on the content of the files
- The tail is mostly due to the fact that some of the tapes showed much smaller throughputs
  - for this test we reused old tapes no longer used by CASTOR
- A secondary peak is due to files written at the end of a tape, which TSM splits onto a subsequent tape (i.e. it must dismount and mount a new tape to continue writing the file)

[Figure: distribution of throughput per migration to tape]

Intermezzo

- Between Test A and Test B we realized that the interface logic was not perfectly balancing the load between the two HSM nodes
- The logic of the interface was then slightly changed in order to improve the performance

Test B: results

- File system prefilled with 1000 files of 1 GiB size each, all filled with zeroes
  - migration to tape turned off while writing data to disk
  - migration to tape turned on when prefilling finished
- Hardware compression is very effective for such files
- About 100 MiB/s observed over 10k seconds
- No tape migration failures and no retrials observed

[Figure: net throughput to tape versus time. The small valleys are explained on the next slide, where they are more visible.]

Test C: results

- Similar to Test B, but with real LHCb data files taken from the same sample as Test A instead of zeroed files
- The valleys clearly visible here have a period of exactly 4800 seconds
  - they were also partially present in Test A, but not clearly visible in the plot due to the larger binning
- The valleys are due to a tunable feature of our interface: each migration session is timed out if it has not finished within 4800 seconds
  - after the timeout, GPFS performs a new metadata scan and a new migration session is initiated (see the sketch below)
  - 4800 seconds is not a magic number; it could be larger or even infinite
- About 70 MiB/s on average, with peaks up to 90 MiB/s
- No tape migration failures and no retrials observed

[Figure: net throughput to tape versus time]
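A minimal sketch of the scan-and-migrate cycle that produces this periodicity, assuming the policy is applied with the standard mmapplypolicy command; the file system path and the policy file name are illustrative, and the 4800 s session timeout is enforced inside the external interface script, as sketched earlier.

import subprocess
import time

SCANFREQUENCY = 1800                               # minimum scan interval (from the config)
FILESYSTEM = "/storage/gpfs_lhcb"                  # illustrative mount point
POLICY_FILE = "/var/mmfs/etc/lhcb_t1d1.policy"     # hypothetical policy file name

while True:
    started = time.time()
    # Run the ILM rules; matching files are handed to the external pool
    # script, which gives up after MIGRATESESSIONTIMEOUT (4800 s).
    subprocess.run(["mmapplypolicy", FILESYSTEM, "-P", POLICY_FILE], check=True)
    # Any leftovers from a timed-out session are simply picked up here,
    # at the next scan.
    time.sleep(max(0, SCANFREQUENCY - (time.time() - started)))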

Conclusions and outlook

- The first phase of tests of the StoRM/GPFS/TSM-based T1D1 has concluded
  - LHCb is now starting the first production experience with such a T1D1 system
- Work is ongoing for a T1D0 implementation, in collaboration with the IBM GPFS and TSM HSM development teams
  - T1D0 is more complicated, since it has to include active recall optimization, concurrency between migrations and recalls, etc.
  - IBM will introduce efficient ordered-recall features in the next major release of TSM
  - while waiting for that release, we are implementing it through an intermediate layer of intelligence between GPFS and TSM, driven by StoRM
  - a first proof-of-principle prototype already exists, but this is something to be discussed in a future talk… stay tuned!
- A new library has recently been acquired at CNAF
  - once the new library is online and the old data files have been repacked to it, the old library will be devoted entirely to TSM production systems and testbeds
  - about 15 drives, a much more realistic and interesting scale than 3 drives