AMS02 Data Volume, Staging and Archiving Issues AMS Computing Meeting CERN April 8, 2002 Alexei Klimentov
Outline
- AMS02 data volume
- AMS/CASPUR Technical Meeting, Mar 2002
- Projected characteristics for disks, processors and tapes
- AMS data storage issues
AMS Data Volume (TBytes) (AMS/CASPUR Technical Meeting, Bologna, Mar)
[Table: data volume per year and in total, with entries for Raw, ESD, Tags, Data & ESD, MC and Grand Total (~400); periods covered: STS91, AMS02 on ISS]
AMS/CASPUR Technical Meeting, Bologna, Mar 2002
Participants: V.Bindi, M.Boschini, D.Casadei, A.Contin, V.Choutko, A.Klimentov, A.Maslennikov, F.Palmonari, PG.Rancoita, PP.Ricci, C.Sbara, P.Zuccon
Topic: Archiving and staging strategy, AMS02 data volume
Goal: to propose a coherent scheme for AMS data storage in the SOC and remote center(s).
Possible solutions:
- disk servers
- staging (tapes + disks)
- outsourcing (CASTOR)
Staging
- A staging system is a generic name for a tape-to-disk migration tool. Files are migrated from tape to disk by the user shortly before they are accessed. Migration of disk files to tape may be automatic or manual.
- Older staging implementations (old CERN staging) required users to keep track of their own tape files.
- The CASPUR flavour (in production since 1997) does the tape/file bookkeeping on behalf of the user. It uses NFS and is fairly easy to install and manage.
- CASTOR (CERN, project started in 1999) lets a user migrate files either manually or via specially modified I/O calls from within a program. It uses a fast data transfer protocol (RFIO). Installed and maintained by CERN IT since 2000, it is currently used by COMPASS to store raw data and ESD; ALICE and CMS have also run I/O tests. It is currently the primary option for LHC experiments.
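To make the bookkeeping idea concrete, here is a minimal toy sketch of a staging catalogue that tracks which tape holds each file, so the user does not have to. It is not the CASPUR or CASTOR implementation; the class `StagingCatalogue`, its methods, and the file and tape names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StagingCatalogue:
    """Toy bookkeeping layer (hypothetical): remembers which tape holds each file."""
    tape_of: dict = field(default_factory=dict)  # file name -> tape label
    on_disk: set = field(default_factory=set)    # files currently staged on disk

    def archive(self, name: str, tape: str) -> None:
        """Record a disk-to-tape migration; the disk copy is kept for now."""
        self.tape_of[name] = tape
        self.on_disk.add(name)

    def evict(self, name: str) -> None:
        """Free disk space; only allowed once a tape copy exists."""
        if name not in self.tape_of:
            raise ValueError(f"{name} has no tape copy")
        self.on_disk.discard(name)

    def stage_in(self, name: str) -> str:
        """Make a file available on disk and report what had to be done."""
        if name in self.on_disk:
            return "already on disk"
        tape = self.tape_of[name]  # KeyError if the file was never archived
        self.on_disk.add(name)
        return f"recalled from tape {tape}"


catalogue = StagingCatalogue()
catalogue.archive("run001.raw", "LTO-0001")   # made-up file and tape names
first = catalogue.stage_in("run001.raw")
catalogue.evict("run001.raw")
second = catalogue.stage_in("run001.raw")
print(first)
print(second)
```

With old-style staging the user would have to maintain the `tape_of` mapping by hand; in this sketch the catalogue does it, which is the distinction the slide draws.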
Projected characteristics for disks, processors and tapes
Intel/AMD PC
- Dual-CPU Intel PII, rated at 450 MHz, 512 MB RAM: 7.5 kUS$
- Dual-CPU Intel, rated at 2.2 GHz, 1 GB RAM, SCSI and IDE RAID controllers: 7 kUS$
- Dual-CPU, rated at 8 GHz, 2 GB RAM, IDE RAID controller: 5 kUS$
Magnetic disk
- 18 GByte SCSI: 80 US$/GByte
- SG 180 GByte SCSI: 10 US$/GByte; WD 120 GByte IDE: 2 US$/GByte; IDE-FC: 5.5 US$/GByte
- SCSI 700 GByte: 2 US$/GByte; IDE 800 GByte: 0.6 US$/GByte; IDE-FC: 1.3 US$/GByte
Magnetic tape
- DLT, 40 GB compressed: 3 US$/GByte
- SDLT and LTO, 200 GB compressed: 0.8 US$/GByte
- ?, 600 GB compressed: 0.3 US$/GByte
AMS staging and archiving system: requirements and considerations
- The storage strategy might be different for raw, ESD and MC data.
- All data must be archived.
- At least two copies of raw and ESD data are required.
- I believe that the data must be under the control of the AMS collaboration.
- The archiving system should be scalable and independent of the HW technology.
- Data volume: TB, TB
- Throughput: 2 TB/day, 23 MB/sec
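The two throughput figures are the same requirement in different units. A short check, assuming decimal units (1 TB = 10^6 MB) and a rate averaged evenly over 24 hours; the function name is illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(tb_per_day: float) -> float:
    """Average rate in MB/s for a daily volume in TB (decimal units assumed)."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


rate = sustained_rate_mb_per_s(2.0)
print(f"{rate:.1f} MB/s")  # 23.1 MB/s
```

This is an average; any real system would need headroom above it for bursts and re-staging, which the slide does not quantify.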
Cost estimation (I): Disk servers
- TB/year
- RAID5, 2.1 TB/server
- 3-4 servers/year
- 23.3 kUS$/server/
- % disk price drop/year, migration to IDE disk system
- Total: 197 kUS$, 6.2 US$/GByte
2006 and beyond
- 100 TB/year
- RAID5, 5.6 TB/server
- servers/year
- 8.8 kUS$/server (2006)
- 50% disk price drop/year
- Total: 411 kUS$, 1.4 US$/GB
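The server counts follow from dividing the yearly volume by the capacity per server. The sketch below redoes that arithmetic from the figures on this slide. It assumes whole servers are bought and applies no price drop, so it is not expected to reproduce the slide's totals; the function names are illustrative.

```python
import math


def servers_per_year(tb_per_year: float, tb_per_server: float) -> int:
    """Whole servers needed to hold one year of data."""
    return math.ceil(tb_per_year / tb_per_server)


def yearly_capacity_tb(servers: int, tb_per_server: float) -> float:
    """Capacity added by buying `servers` machines in one year."""
    return servers * tb_per_server


# 2006 and beyond: 100 TB/year on 5.6 TB RAID5 servers at 8.8 kUS$ each.
n_servers = servers_per_year(100.0, 5.6)
hardware_kusd = n_servers * 8.8  # 2006 price, no yearly price drop applied

# Earlier period: 3-4 servers/year at 2.1 TB each.
early_low = yearly_capacity_tb(3, 2.1)
early_high = yearly_capacity_tb(4, 2.1)

print(n_servers, round(hardware_kusd, 1), round(early_low, 1), round(early_high, 1))
```

Under these assumptions, 100 TB/year needs 18 servers (about 158 kUS$ at the 2006 price), and 3-4 servers/year in the earlier period add roughly 6.3 to 8.4 TB/year.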
Cost estimation (II): Staging
- TB/year
- LTO library: 58 kUS$
- 2 servers: 10 kUS$
- Cartridges/year
- FC switch: 15 kUS$
- 0.8 TB disks/year
- 30% IDE-FC disk price drop/year
- Total: 111 kUS$, 3.5 US$/GB
2006 and beyond
- 100 TB/year
- LTO library / biennial
- 2 servers/year
- Cartridges/year
- FC switch: 15 kUS$
- 10 TB disks/year
- 30% IDE-FC disk price drop/year
- Total: 300 kUS$, 1 US$/GB
Cost estimation (III): CASTOR
- TB/year
- 1.8 US$/GByte
- Total: 57.6 kUS$
2006 and beyond
- 100 TB/year
- 0.8 US$/GByte
- Total: 240 kUS$
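Each cost estimate quotes a total and a price per GByte, so the data volume behind it can be recovered by division. The sketch below does this for all three options, using the totals and unit prices from the three cost-estimation slides. It assumes 1 TB = 1000 GByte; the label "first period" is a placeholder, since the slides do not name the earlier period.

```python
# (total cost in kUS$, unit cost in US$/GByte), copied from the cost slides
ESTIMATES = {
    "disk servers": {"first period": (197.0, 6.2), "2006 and beyond": (411.0, 1.4)},
    "staging":      {"first period": (111.0, 3.5), "2006 and beyond": (300.0, 1.0)},
    "CASTOR":       {"first period": (57.6, 1.8),  "2006 and beyond": (240.0, 0.8)},
}


def implied_volume_tb(total_kusd: float, usd_per_gb: float) -> float:
    """Volume in TB implied by a total cost and a per-GByte price."""
    gbytes = total_kusd * 1000.0 / usd_per_gb
    return gbytes / 1000.0


volumes = {
    option: {period: implied_volume_tb(*pair) for period, pair in periods.items()}
    for option, periods in ESTIMATES.items()
}

for option, periods in volumes.items():
    for period, tb in periods.items():
        print(f"{option:12s} {period:16s} {tb:6.1f} TB")
```

All three options come out at roughly 32 TB for the earlier period and roughly 300 TB for 2006 and beyond, which suggests the estimates were made for the same data volume and are directly comparable.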
Storage solution (summary)
Options compared: Disk Servers, Staging, CASTOR
- System complexity: Disk Servers: High (50 servers, 0.5 PB online); Staging: Medium (8 servers, 0.05 PB online); CASTOR: Low
- Cost, kUS$ (2002/2006): Disk Servers: 197/411; Staging: 111/300; CASTOR: 57.6/240
- Data access: Disk Servers: real-time; Staging: 5-10 mins delay; CASTOR: mins delay
- Manpower: 0.5 FTE; 0.1 FTE
- System availability (short/long term): fall 2002 / 2005 (R&D req); May 2002 / ?
- Special issue: AMS controlled; CERN & AMS controlled
Conclusion
- CASTOR might be the best solution for the short term and for MC data storage; CERN central maintenance is one of its advantages. I would not suggest using CASTOR for AMS critical applications. One should also note that, due to the CERN budget cut, the cost/GB may change for non-CERN experiments, and priority will always be given to LHC groups.
- The disk server solution is still too expensive to store ALL data, and it increases the complexity of the system (even if one assumes that the same servers will be used for data processing). For the raw data and selected ESD it might be the way we proceed.
- The staging system represents the most cost-efficient solution for the case when AMS maintains full control of the data. Over the experiment lifetime the overall cost of the staging system will be only 25% higher than that of CASTOR.
- R&D is required to prove the scalability of the "CASPUR system" to hundreds of TBytes of data and to multi-server/data-mover processes.
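The relative cost of staging versus CASTOR can be checked against the totals on the cost-estimation slides (111 and 300 kUS$ for staging, 57.6 and 240 kUS$ for CASTOR). The sketch below computes the ratio both for 2006 and beyond alone and for the two periods summed. Which of the two the quoted figure refers to is not stated in the slides, so both are shown; the helper name is illustrative.

```python
# (earlier period, 2006 and beyond) totals in kUS$, from the cost slides
COST_KUSD = {
    "staging": (111.0, 300.0),
    "CASTOR": (57.6, 240.0),
}


def percent_higher(cost: float, reference: float) -> float:
    """How much `cost` exceeds `reference`, in percent."""
    return (cost / reference - 1.0) * 100.0


later_only = percent_higher(COST_KUSD["staging"][1], COST_KUSD["CASTOR"][1])
both_periods = percent_higher(sum(COST_KUSD["staging"]), sum(COST_KUSD["CASTOR"]))

print(f"2006 and beyond: {later_only:.1f}% higher")   # 25.0% higher
print(f"both periods:    {both_periods:.1f}% higher")  # 38.1% higher
```

The 25% figure matches the 2006-and-beyond totals; summing both periods gives about 38%.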