CASTOR CNAF TIER1 SITE REPORT Geneve CERN June 2005 Ricci Pier Paolo
13-14 June 2005, Geneve CERN
TIER1 CNAF PRESENTATION
Hardware and software status of our CASTOR installation and management tools
Usage of our installation by the LHC experiments
Comments
MANPOWER
At present there are 4 people at TIER1 CNAF involved in administering our CASTOR installations and front-ends:
Ricci Pier Paolo (50%; also active in SAN/NAS HA disk storage management and test, Oracle administration)
Lore Giuseppe (50%; also active in the ALICE experiment as Tier1 reference, SAN HA disk storage management and test, and managing the Grid front-end to our resources)
Vilucchi Elisabetta (50%; involved in Oracle and RLS development and administration, and SAN disk storage management and test)
We also have 1 CNAF FTE working with the development team at CERN (started March 2005): Lopresti
HARDWARE STATUS
At present our CASTOR ( ) system is:
1 STK L5500 silo partitioned into slots of 2 form factors:
  about 2000 slots in LTO-2 form
  about 3500 slots in 9940B form
6 LTO-2 drives with 2 Gb/s FC interface
2 9940B drives with 2 Gb/s FC interface; 2 more have just been acquired and will be installed at the end of June
Sun Blade v100 with 2 internal IDE disks with software raid-0, running ACSLS
LTO-2 Imation tapes
9940B Imation tapes
HARDWARE STATUS (2)
8 tapeservers: 1U Supermicro, 3 GHz, 2 GB, with 1 Qlogic 2300 FC HBA, running the STK CSC Development Toolkit provided by CERN (under licence agreement with STK): ssi, tpdaemon and rtcpd.
The 8 tapeservers are directly connected to the FC drive outputs:
DRIVE LTO-2 0,0,10,0 -> tapesrv-0.cnaf.infn.it
DRIVE LTO-2 0,0,10,1 -> tapesrv-1.cnaf.infn.it
DRIVE LTO-2 0,0,10,2 -> tapesrv-2.cnaf.infn.it
DRIVE LTO-2 0,0,10,3 -> tapesrv-3.cnaf.infn.it
DRIVE LTO-2 0,0,10,4 -> tapesrv-4.cnaf.infn.it
DRIVE LTO-2 0,0,10,5 -> tapesrv-5.cnaf.infn.it
DRIVE 9940B 0,0,10,6 -> tapesrv-6.cnaf.infn.it
DRIVE 9940B 0,0,10,7 -> tapesrv-7.cnaf.infn.it
2 more (tapesrv-8, tapesrv-9) will be installed soon with the 2 new 9940B drives.
Using the 9940B drives has drastically reduced the error rate: in 6 months of activity we report only one 9940 tape marked RDONLY due to a SCSI error, and we have NEVER had "hung" drives.
HARDWARE STATUS (3)
castor.cnaf.infn.it, central machine: 1 IBM x345 2U machine, 2x3 GHz Intel Xeon, raid1, double power supply, O.S. Red Hat A.S. 3.0. Runs all central CASTOR services (Nsdaemon, vmgrdaemon, Cupvdaemon, vdqmdaemon, msgdaemon) and the ORACLE client for the central database.
castor-4.cnaf.infn.it, ORACLE machine: 1 IBM x345, O.S. Red Hat A.S. 3.0. Runs ORACLE DATABASE 9i rel 2.
2 more x345 machines are in stand-by; they store all the backup information of the ORACLE db (.exp.dbf) and can replace the above machines if needed.
castor-1.cnaf.infn.it, monitoring machine: 1 DELL 1650, R.H. 7.2. Runs the CASTOR monitoring service (Cmon daemon) and the NAGIOS central service for monitoring and notification. It also hosts the rtstat and tpstat commands, which are usually run with the -S option against the tapeservers.
HARDWARE STATUS (4)
Stagers with diskserver: 1U Supermicro, 3 GHz, 2 GB, with 1 Qlogic 2300 FC HBA accessing our SAN and running Cdbdaemon, stgdaemon and rfiod. 1 stager for each LHC experiment:
disksrv-1.cnaf.infn.it ATLAS stager with 2 TB locally
disksrv-2.cnaf.infn.it CMS stager with 3.2 TB locally
disksrv-3.cnaf.infn.it LHCB stager with 3.2 TB locally
disksrv-4.cnaf.infn.it ALICE stager with 3.2 TB locally
disksrv-5.cnaf.infn.it TEST and PAMELA stager
disksrv-6.cnaf.infn.it stager with 2 TB locally (archive purposes: LVD, ALICE TOF, CDF, VIRGO, AMS, BABAR, ARGO and other HEP experiments...)
Diskservers: 1U Supermicro, 3 GHz, 2 GB, with 1 Qlogic 2300 FC HBA accessing our SAN and running rfiod. Red Hat 3.0 Cluster has been tested but is not used in production for rfiod.
HARDWARE STATUS (5)
Storage Element front-end for CASTOR: castorgrid.cr.cnaf.infn.it (DNS alias load balanced over 4 machines for WAN gridftp).
SRM v.1 is installed and in production on the above machines.
TIER1 INFN CNAF Storage (diagram)
Clients: Linux SL 3.0 clients ( nodes), via WAN or TIER1 LAN
Access protocols: NFS, RFIO, GridFTP, oth...
Tape: STK180 with 100 LTO-1 (10 Tbyte native); STK L5500 robot (5500 slots) with 6 IBM LTO-2 and 2 (4) STK 9940B drives
Servers: CASTOR HSM servers; H.A. diskservers with Qlogic FC HBA 2340; W2003 Server with LEGATO Networker (backup)
NAS: PROCOM 3600 FC NAS Gbyte; PROCOM 3600 FC NAS Gbyte; NAS1, NAS4 3ware IDE SAS Gbyte
Disk arrays: AXUS BROWIE, about 2200 GByte, 2 FC interfaces; STK BladeStore, about GByte, 4 FC interfaces; Infortrend 4 x 3200 GByte SATA A16F-R1A2-M1; IBM FastT900 (DS 4500) 3/4 x GByte, 4 FC interfaces; Infortrend 5 x 6400 GByte SATA A16F-R1211-M2 + JBOD
FC switches: 2 Gadzoox Slingshot port FC Switch; 2 Brocade Silkworm port FC Switch
Totals: SAN 1 (200 TB) + 200 TB end of June; SAN 2 (40 TB); HSM (400 TB); NAS (20 TB)
CASTOR HSM (diagram)
STK L5500: 6 drives LTO2 (20-30 MB/s), 2 drives 9940B (25-30 MB/s)
Tapes: 1300 LTO2 (200 GB native), 9940B (200 GB native)
TOTAL CAPACITY with 200 GB: 250 TB LTO-2 (400 TB), 130 TB 9940B (700 TB)
Sun Blade v100 with 2 internal IDE disks with software raid-1, running ACSLS 7.0, OS Solaris
CASTOR (CERN) central services server, RH AS 3.0
1 ORACLE 9i rel 2 DB server, RH AS
8 tapeservers, Linux RH AS 3.0, HBA Qlogic
stager with diskserver, RH AS, TB local staging area
or more rfio diskservers, RH AS 3.0, min 20 TB staging area (variable)
Diagram labels: WAN or TIER1 LAN; SAN 1; SAN 2; point-to-point FC 2 Gb/s connections
Legend: indicates full redundancy FC 2 Gb/s connections (dual controller HW and Qlogic SANsurfer Path Failover SW)

EXPERIMENT          Staging area (TB)   Tape pool (TB native)
ALICE               8                   12 (LTO-2)
ATLAS               6                   20 (MIXED)
CMS                 2                   1 (9940B)
LHCb                18                  30 (LTO-2)
BABAR, AMS + oth    2                   4 (9940B)
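The capacity figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes 200 GB native per cartridge and uses the counts quoted in this report (1300 LTO-2 tapes, about 2000 LTO-2 slots, about 3500 9940B slots). Reading the bracketed 400 TB and 700 TB as the capacity with every slot filled is an interpretation, not something the slides state, and the function name is illustrative.

```python
GB_PER_TB = 1000  # decimal units, as normally used for tape capacities


def capacity_tb(cartridges: int, native_gb: int = 200) -> float:
    """Native capacity in TB of a given number of cartridges."""
    if cartridges < 0 or native_gb <= 0:
        raise ValueError("counts and capacities must be positive")
    return cartridges * native_gb / GB_PER_TB


if __name__ == "__main__":
    # LTO-2 tapes on hand (slide quotes about 250 TB).
    print("LTO-2 tapes on hand:", capacity_tb(1300), "TB")
    # Assumption: bracketed values correspond to fully populated slots.
    print("LTO-2 slots full:   ", capacity_tb(2000), "TB")
    print("9940B slots full:   ", capacity_tb(3500), "TB")
```

Under these assumptions the full-slot values come out at 400 TB and 700 TB, matching the bracketed figures, and the 1300 LTO-2 tapes give 260 TB, close to the 250 TB quoted.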
CASTOR Grid Storage Element
GridFTP access goes through the castorgrid SE, a DNS cname pointing to 3 servers, with DNS round-robin for load balancing.
During LCG Service Challenge 2 we also introduced a load-average selection: every M minutes the IP of the most loaded server is replaced in the cname (see graph).
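The load-average selection could look roughly like the following sketch. It is not the script used at CNAF: the host names, the load values and the function `select_published` are hypothetical, and the sketch interprets "the IP of the most loaded server is replaced" as publishing only the least loaded hosts of a slightly larger pool, so that the busiest one drops out of the alias.

```python
from typing import Dict, List


def select_published(loads: Dict[str, float], n_published: int) -> List[str]:
    """Return the n_published least loaded hosts, ordered by load, then name."""
    if not 0 < n_published <= len(loads):
        raise ValueError("n_published must be between 1 and the pool size")
    # Sorting on (load, name) keeps the result deterministic when loads tie.
    ranked = sorted(loads, key=lambda host: (loads[host], host))
    return ranked[:n_published]


if __name__ == "__main__":
    # Hypothetical load averages collected from a pool of 4 front-ends.
    sample = {"se-a": 0.8, "se-b": 5.2, "se-c": 1.1, "se-d": 0.3}
    # With 3 published addresses, the busiest host (se-b) is left out.
    print(select_published(sample, 3))
```

In a real setup a job like this would run every M minutes and push the selected addresses to the DNS server; how that update is done is site-specific and is not shown here.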
NOTIFICATION (Nagios)
MONITORING (Nagios)
Graphs: LHCb CASTOR tape pool; # processes on a CMS disk SE; eth0 traffic through a CASTOR LCG SE
DISK ACCOUNTING
Graphs: pure disk space (TB); CASTOR disk space (TB)
CASTOR USAGE
Access to the CASTOR system is via:
1) Grid, using our SE front-ends (from WAN)
2) Rfio, using the castor rpm and rfio commands installed on our WN and UI (from LAN)
Only 17% (65 TB / 380 TB) of the total HSM space was effectively used by the experiments over a 1.5-year period, because:
1) As TIER1 storage we offer "pure" disk as primary storage over SAN, which the experiments prefer (GSIftp, nfs, xrootd, bbftp, GPFS ...)
2) The lack of optimization in parallel stage-in operations (pre-stage) and the reliability/performance problems that arose with LTO-2 generally give very bad performance when reading from CASTOR, so the experiments generally ask for "pure" disk resources (next year's requests are NOT for tape HW).
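The 17% figure follows directly from the two totals quoted on this slide. A minimal check, with an illustrative helper name:

```python
def utilisation_percent(used_tb: float, total_tb: float) -> float:
    """Share of the total space that is in use, as a percentage."""
    if total_tb <= 0:
        raise ValueError("total_tb must be positive")
    return 100.0 * used_tb / total_tb


if __name__ == "__main__":
    # 65 TB used out of 380 TB of HSM space, as quoted on the slide.
    print(round(utilisation_percent(65, 380)), "%")
```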
COMMENTS
As said, we have a lot of disk space to manage and no definitive solution (xrootd, gpfs, dcache to be tested, etc.).
1) CASTOR already has a working SRM interface. Is CASTOR-2 reliable and scalable enough to manage pure diskpool spaces? We think it should also be conceived for this use (like dcache and xrootd).
2) The limits of the rfio protocol and of the new stager performance could seriously limit the performance scalability of a pure CASTOR diskpool (e.g. a single open() call needs to query many databases). A single diskserver with rfiod can fulfil only a limited number of requests. At our site we have a limited number of diskservers, each with a large amount of space (10-15 TB), and the limits of rfiod caused access failures for jobs (we also use rfiod for DIRECT access to local filesystems outside CASTOR, e.g. for CMS).
SOLUTION TO FAILURES => possibility to use swap memory
SOLUTION TO PERFORMANCE => more RAM? Other? Can rfio be modified for our site-specific use?
COMMENTS (2)
1) We need the authorization method in CASTOR to be compatible with LDAP as well, not only with the password and group files.
2) It would be useful to include rfstage (or something similar) in the official release.
3) HA: we are planning to use the 2 stand-by machines as HA for the CASTOR central services and vdqm replica, and for an Oracle 9i rel 2 stand-by database (dataguard) or RAC.
CONCLUSION
Collaborations with other groups are possible (in order to expand the development team at CERN).
TIER1 and LHC computing with IHEP.
THANK YOU FOR YOUR ATTENTION!