INFN-T1 site report
Andrea Chierici
On behalf of INFN-T1 staff
28th October 2009
Overview: Infrastructure, Network, Farming, Storage
Infrastructure
INFN-T1 infrastructure, 2005 vs. now:
- Racks: 40 → 120
- Power source: university → directly from supplier (15 kV)
- Power transformers: 1 (~1 MVA) → 3 (~2.5 MVA)
- UPS: 1 diesel engine/UPS (~640 kVA) → 2 rotary UPS (~3400 kVA) + 1 diesel engine (~640 kVA)
- Chillers: 1 (~530 kVA) → 7 (~2740 kVA)
[Diagram: site power distribution — UPS capacity up to 3.8 MW; chillers]
Mechanical and electrical surveillance
Network
INFN CNAF Tier1 network
- WAN: Cisco 7600 towards GARR, 2x10 Gb/s general-purpose IP plus a dedicated 10 Gb/s LHC-OPN link
- LHC-OPN 10 Gb/s: T0-T1 (CERN) and T1-T1 (PIC, RAL, TRIUMF); T0-T1 backup via the CNAF-KIT, CNAF-IN2P3 and CNAF-SARA cross links
- Other T1s (BNL, FNAL, TW-ASGC, NDGF) and T1-T2 traffic over the general-purpose network
- Core: Extreme BD8810; Cisco Nexus 7000
- Worker nodes: 2x1 Gb/s to Extreme Summit450 rack switches, uplinked at 4x1 Gb/s; in case of network congestion, uplinks will be upgraded from 4x1 Gb/s to 10 Gb/s or 2x10 Gb/s
- Storage servers, disk servers and CASTOR stagers on Extreme Summit400 switches (2x10 Gb/s uplinks), attached via Fibre Channel to the storage devices on the SAN
Farming
New tender
1U twin solution with these specs:
- 2 Intel Nehalem CPUs
- 24 GB RAM
- 2x 320 GB SATA disks
- 2x 1 Gb/s Ethernet
118 twins, reaching … HEP-SPEC (measured on SLC4)
Delivery and installation foreseen within …
Computing resources
Including machines from the new tender, INFN-T1 computing power will reach … HEP-SPEC within 2009.
A further increase within January 2010 will bring us to … HEP-SPEC.
By May 2010 we will reach … HEP-SPEC (as pledged to WLCG).
This will basically triple the current computing power.
Resource usage per VO
KSI2K pledged vs. used
New accounting system
- Grid, local and overall job visualization
- Tier1/Tier2 separation
- Several parameters monitored; avg and max RSS, avg and max VMem added in the latest release
- KSI2K/HEP-SPEC accounting
- WNoD accounting
Available at: …
Feedback welcome to: …
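As an illustration of the normalisation such an accounting system performs, here is a hypothetical per-job record in Python. The field names are invented for the sketch, not the system's actual schema; the conversion uses the commonly quoted WLCG factor of 1 kSI2k ≈ 4 HEP-SPEC06.

```python
from dataclasses import dataclass

KSI2K_TO_HEPSPEC = 4.0  # commonly quoted WLCG conversion: 1 kSI2k ~ 4 HEP-SPEC06

@dataclass
class JobRecord:
    """Hypothetical per-job accounting record (field names are illustrative)."""
    vo: str
    wall_hours: float
    avg_rss_mb: float     # average resident set size
    max_rss_mb: float
    avg_vmem_mb: float    # average virtual memory
    max_vmem_mb: float
    ksi2k_hours: float    # normalised CPU time in kSI2k units

    @property
    def hepspec_hours(self) -> float:
        # Report the same CPU time in HEP-SPEC06 units
        return self.ksi2k_hours * KSI2K_TO_HEPSPEC

job = JobRecord("cms", 12.0, 900.0, 1400.0, 1800.0, 2300.0, 15.0)
print(job.hepspec_hours)  # 60.0
```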
New accounting: sample picture
GPU Computing (1)
We are investigating GPU computing.
NVIDIA Tesla C1060, used for porting software and performing comparison tests.
Meeting with Bill Dally (chief scientist and vice president of NVIDIA); agenda: …py?confId=266
GPU Computing (2)
Applications currently tested:
- Bioinformatics: CUDA-based paralog filtering in Expressed Sequence Tag clusters
- Physics: implementing a second-order electromagnetic particle-in-cell code on the CUDA architecture
- Physics: spin-glass Monte Carlo simulations
The first two apps showed more than a 10x increase in performance!
GPU Computing (3)
We plan to buy 2 more workstations in 2010, with 2 GPUs each; we are waiting for the Fermi architecture, foreseen for spring 2010.
We will continue the activities currently ongoing and will probably test some Monte Carlo simulations for SuperB.
We plan to test selection and shared usage of GPUs via grid.
Storage
Tenders
Disk tender requested:
- Baseline: 3.3 PB raw (~2.7 PB-N)
- 1st option: 2.35 PB raw (~1.9 PB-N)
- 2nd option: 2 PB raw (~1.6 PB-N)
- Options to be requested during Q2 and Q…
- New disk in production ~ end of Q…
Tapes:
- … tapes (~4 PB) acquired with the library tender
- 4.9 PB needed at the beginning of 2010
- 7.7 PB probably needed by mid-2010
CASTOR
- To be upgraded; SRM v2.2 end-points available
- Supported protocols: rfio, gridftp
- Still cumbersome to manage: requires frequent intervention in the Oracle DB
- Lack of management tools
- CMS migrated to StoRM for D0T1
WLCG Storage Classes at INFN-T1 today
Storage classes offer different levels of storage quality (e.g. copy on disk and/or on tape): DnTm = n copies on disk and m copies on tape.
Implementation of 3 storage classes is needed for WLCG (but they are usable also by non-LHC experiments):
- Disk0-Tape1 (D0T1), "custodial nearline"
  - Data migrated to tape and deleted from disk when the staging area is full
  - Space managed by the system; disk is only a temporary buffer
- Disk1-Tape0 (D1T0), "replica online"
  - Data kept on disk: no tape copy
  - Space managed by the VO
- Disk1-Tape1 (D1T1), "custodial online"
  - Data kept on disk AND one copy kept on tape
  - Space managed by the VO (i.e. if disk is full, the copy fails)
Currently implemented with CASTOR and with GPFS/TSM + StoRM.
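The DnTm naming convention above is mechanical enough to parse programmatically. A small sketch (a hypothetical helper, not part of any WLCG tool):

```python
import re

def parse_storage_class(name: str) -> tuple:
    """Parse a WLCG storage class label like 'D1T0' into
    (disk_copies, tape_copies)."""
    m = re.fullmatch(r"D(\d)T(\d)", name)
    if not m:
        raise ValueError(f"not a DnTm label: {name}")
    return int(m.group(1)), int(m.group(2))

# D0T1 "custodial nearline": tape holds the permanent copy, disk is a buffer
print(parse_storage_class("D0T1"))  # (0, 1)
# D1T0 "replica online": disk only, space managed by the VO
print(parse_storage_class("D1T0"))  # (1, 0)
```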
YAMSS: present status
Yet Another Mass Storage System: a scripting and configuration layer to interface GPFS & TSM.
- Can work driven by StoRM or stand-alone; experiments not using the SRM model can work with it
- GPFS-TSM (no StoRM) interface ready: full support for migrations and tape-ordered recalls
- StoRM in production at INFN-T1 and in other centres around the world for "pure" disk access (i.e. no tape); integration with YAMSS for migrations and tape-ordered recalls ongoing (almost completed)
- Bulk migrations and recalls tested with a typical use case (stand-alone YAMSS, without StoRM): the weekly production workflow of the CMS experiment
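The idea behind the "tape-ordered recalls" mentioned above can be sketched simply: group pending recall requests by cartridge and sort each group by position on tape, so every cartridge is mounted once and read sequentially instead of seeking back and forth. The tuple layout and field names below are assumptions for illustration, not YAMSS's actual interface.

```python
from collections import defaultdict

def order_recalls(requests):
    """Build a tape-ordered recall plan.

    'requests' is a list of (filename, tape_label, block_offset) tuples.
    Returns (tape_label, filename) pairs: one mount per tape, files in
    ascending on-tape position within each mount."""
    by_tape = defaultdict(list)
    for fname, tape, offset in requests:
        by_tape[tape].append((offset, fname))
    plan = []
    for tape in sorted(by_tape):                      # one mount per cartridge
        for offset, fname in sorted(by_tape[tape]):   # sequential read order
            plan.append((tape, fname))
    return plan

reqs = [("a.root", "T10K002", 500), ("b.root", "T10K001", 90),
        ("c.root", "T10K002", 10), ("d.root", "T10K001", 400)]
print(order_recalls(reqs))
# [('T10K001', 'b.root'), ('T10K001', 'd.root'),
#  ('T10K002', 'c.root'), ('T10K002', 'a.root')]
```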
Why GPFS & TSM
- Tivoli Storage Manager (developed by IBM) is a tape-oriented storage manager, widely used (also in the HEP world, e.g. at FZK)
- Built-in functionality is present in both products to implement backup and archiving from GPFS
- The development of an HSM solution is based on combining features of GPFS (since v3.2) and TSM (since v5.5)
- Since GPFS v3.2, the new concept of "external storage pool" extends policy-driven Information Lifecycle Management (ILM) to tape storage
- External pools are real interfaces to external storage managers, e.g. HPSS or TSM; HPSS is very complex (no benefit in this sense compared to CASTOR)
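To make the "external storage pool" concept concrete, here is a minimal sketch of a GPFS ILM policy that hands files to an external HSM layer. The pool name, script path and thresholds are illustrative assumptions, not CNAF's actual configuration.

```
/* Define an external pool backed by an HSM interface script
   (the path /usr/local/yamss/yamssMigrate is hypothetical) */
RULE EXTERNAL POOL 'hsm' EXEC '/usr/local/yamss/yamssMigrate'

/* Migrate data from the internal 'system' pool to tape when the
   file system passes 85% full, down to 75%, least recently
   accessed files first */
RULE 'toTape' MIGRATE FROM POOL 'system'
     THRESHOLD(85,75)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'hsm'
```

The external pool is what makes the scheme tape-capable: GPFS evaluates the policy and invokes the script with the candidate file list, and the script (here, the YAMSS layer) drives TSM.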
YAMSS: hardware set-up
[Diagram: 6 NSD servers (6x2 Gb/s on LAN) serving ~500 TB for GPFS on CX arrays (20x4 Gb/s on the SAN); GridFTP servers (4x2 Gb/s); 3 HSM nodes; TSM server with DB, 4 Gb/s FC; 8 T10KB tape drives on the TAN — 1 TB per tape, ~1 Gb/s per drive]
YAMSS: validation tests
- Concurrent read/write access to the MSS, both for transfers and from the farm (StoRM not used in these tests)
- 3 HSM nodes serving 8 T10KB drives: up to 6 drives used for recalls, up to 2 for migrations
- On the order of 1 GB/s of aggregated traffic:
  - ~550 MB/s from tape to disk
  - ~100 MB/s from disk to tape
  - ~400 MB/s from disk to the computing nodes (not shown in the graph)
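A quick back-of-envelope check that the three streams quoted above indeed sum to the stated aggregate (figures in MB/s, taken straight from the test results):

```python
# Per-stream throughput measured in the YAMSS validation tests (MB/s)
streams = {
    "tape_to_disk (recalls, up to 6 drives)": 550,
    "disk_to_tape (migrations, up to 2 drives)": 100,
    "disk_to_farm (GPFS reads)": 400,
}

total_mb_s = sum(streams.values())
print(total_mb_s)          # 1050
print(total_mb_s / 1000)   # ~1.05, i.e. "order of 1 GB/s" aggregated
```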
Questions?