1 INFN-T1 site report Giuseppe Misurelli On behalf of INFN-T1 staff HEPiX Spring 2015

2 Outline: Common services, Network, Farming, Storage

3 Common services

4 Installation and configuration
CNAF's evaluation of new installation and configuration tools is complete; the decision is to move to Puppet + Foreman.
Quattor still manages the larger part of the infrastructure
– No upgrades lately
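
The migration itself was not detailed in the talk; as a purely illustrative sketch of the kind of glue code such a transition can involve, the Python fragment below counts how many hosts already report to a Foreman instance through Puppet. The Foreman URL, the credentials and the use of the last_report field are assumptions, not part of the original report.

# Hypothetical helper: count hosts known to Foreman and how many have a recent
# Puppet report, as a rough progress metric for a Quattor -> Puppet migration.
# Endpoint, credentials and field names are assumptions, not from the talk.
import requests

FOREMAN_URL = "https://foreman.example.cnaf.infn.it"   # assumed hostname
AUTH = ("api_user", "api_password")                    # assumed credentials

def list_hosts(per_page=1000):
    """Return the list of host records known to Foreman (API v2)."""
    resp = requests.get(f"{FOREMAN_URL}/api/v2/hosts",
                        params={"per_page": per_page},
                        auth=AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json().get("results", [])

if __name__ == "__main__":
    hosts = list_hosts()
    reporting = [h for h in hosts if h.get("last_report")]  # has a Puppet report
    print(f"{len(reporting)}/{len(hosts)} hosts already reporting via Puppet")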

5 Network

6 WAN@CNAF (S. Zani)
[Network diagram: CNAF core switches (Nexus, Cisco 7600) uplinked through GARR Bo1/Mi1 to LHCOPN, LHCONE and general IP, reaching RAL, SARA, PIC, TRIUMF, BNL, FNAL, TW-ASGC, NDGF, IN2P3 and the main Tier-2s. A 40 Gb physical link (4x10 Gb) is shared by LHCOPN and LHCONE; 10 Gb/s (evolving to 20 Gb/s) serve general IP connectivity; a dedicated 10 Gb/s CNAF-FNAL link is used for CDF data preservation.]

7 CNAF WAN Links
4x10 Gb/s LHCOPN+LHCONE (evolving to 6x10 Gb/s)
– One 40 Gb/s link aggregation is used for T0-T1 (LHCOPN) and T1-T2 (LHCONE) traffic
– 20 Gb/s to CERN are dedicated to T0-T1 and T1-T1 traffic (LHCOPN)
– Last year CNAF, KIT and IN2P3 moved the traffic between their Tier-1s from LHCOPN to LHCONE (more bandwidth is available through GEANT)
10 Gb/s General Purpose (evolving to 2x10 Gb/s)
– General Internet access for CNAF users
– LHC sites not connected to LHCOPN/ONE (T3 and minor T2)
– Backup link in case of an LHCOPN outage
– INFN IT national services
10 Gb/s (5 guaranteed) CNAF-FNAL (LTDP)
– Data transfer completed (link being decommissioned)
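
As a purely illustrative aid (not part of the talk), the sketch below turns the link capacities listed above into idealised transfer times for a given data volume, assuming the link is fully and exclusively available.

# Back-of-the-envelope sketch (not from the talk): how long it takes to move a
# given data volume over the WAN links listed above, assuming the link is
# fully and exclusively available (an idealisation).

def transfer_days(volume_tb, link_gbps, efficiency=1.0):
    """Days needed to move volume_tb terabytes over a link of link_gbps Gb/s."""
    volume_bits = volume_tb * 1e12 * 8          # TB -> bits
    rate_bps = link_gbps * 1e9 * efficiency     # usable bits per second
    return volume_bits / rate_bps / 86400.0

if __name__ == "__main__":
    for label, gbps in [("LHCOPN+LHCONE today", 40),
                        ("LHCOPN+LHCONE planned", 60),
                        ("General purpose", 10)]:
        print(f"1 PB over {label} ({gbps} Gb/s): {transfer_days(1000, gbps):.1f} days")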

8 Farming

9 Computing resources
160K HS06
Very stable during the last period, with just a few hardware failures
Had to update the IPMI firmware to get a signed Java console applet: the latest Java update no longer allows unsigned applets
Renewed the LSF contract with Platform/IBM for the whole of INFN for the next 4 years
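
The IPMI firmware campaign was done with the site's own tools; the fragment below is only a hypothetical sketch of how one could survey BMC firmware revisions across worker nodes with ipmitool. The hostnames and credentials are placeholders.

# Hypothetical sketch (not the site's actual tooling): survey BMC firmware
# versions across worker nodes with ipmitool, to spot nodes still running a
# release whose Java console applet is unsigned. Hostnames are placeholders.
import subprocess

BMC_HOSTS = ["wn-001-ipmi", "wn-002-ipmi"]      # placeholder BMC hostnames
USER, PASSWORD = "admin", "secret"              # placeholder credentials

def firmware_revision(bmc):
    """Return the 'Firmware Revision' field reported by `ipmitool mc info`."""
    out = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", bmc, "-U", USER, "-P", PASSWORD,
         "mc", "info"],
        capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.strip().startswith("Firmware Revision"):
            return line.split(":", 1)[1].strip()
    return "unknown"

if __name__ == "__main__":
    for bmc in BMC_HOSTS:
        print(bmc, firmware_revision(bmc))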

10 Security
Since our last workshop we had to reboot the whole farm twice (2 critical kernel upgrades + glibc)
– We use an automatic procedure
– The process is slow (we have to wait for each WN to drain completely) and temporarily reduces the computing power the farm can provide
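
The automatic procedure itself was not shown in the talk; the following is a minimal sketch, under stated assumptions, of a rolling drain-and-reboot loop for an LSF-managed farm. Host names, the polling interval and the use of ssh for the reboot are invented for the example.

# Minimal sketch (an assumption, not the production procedure) of a rolling
# drain-and-reboot for a batch farm managed with LSF: close a node, wait for
# running jobs to finish, reboot it, then reopen it for scheduling.
import subprocess, time

def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True)

def running_jobs(host):
    """Count running LSF jobs on a host."""
    res = run(["bjobs", "-u", "all", "-r", "-m", host])
    text = res.stdout + res.stderr
    if "No unfinished job" in text:
        return 0
    return len([l for l in res.stdout.splitlines()
                if l and not l.startswith("JOBID")])

def drain_and_reboot(host, poll_seconds=600):
    run(["badmin", "hclose", host])                 # stop dispatching new jobs
    while running_jobs(host) > 0:                   # wait for the node to drain
        time.sleep(poll_seconds)
    run(["ssh", host, "reboot"])                    # reboot into the new kernel
    # once the node is back up, reopen it:
    # run(["badmin", "hopen", host])

if __name__ == "__main__":
    for wn in ["wn-001", "wn-002"]:                 # placeholder worker nodes
        drain_and_reboot(wn)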

11 CPU tender
2014 tender still not installed (30K HS06)
– Same machines as the previous tender
– Will be installed shortly
– 2014 pledged resources are still guaranteed
2015 tender focused on blade solutions (HP, Lenovo, Dell)
– Should be a quick procedure, with machines available during the summer
– We will be able to decommission very old computing nodes and hopefully improve our PUE
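
For reference (not from the talk), PUE is the ratio of total facility power to the power drawn by the IT equipment, which is why retiring old, inefficient nodes can help; the tiny sketch below just evaluates the ratio with made-up numbers.

# Illustrative only (all numbers are made up): PUE is total facility power
# divided by the power drawn by the IT equipment itself.

def pue(total_facility_kw, it_kw):
    return total_facility_kw / it_kw

if __name__ == "__main__":
    # hypothetical before/after figures for retiring old, inefficient nodes
    print(f"before: PUE = {pue(2000, 1250):.2f}")
    print(f"after:  PUE = {pue(1700, 1150):.2f}")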

12 Multicore support
Activity carried out within the multicore task force at CERN
INFN-T1 now fully supports MCORE and HIMEM jobs
Dynamic partitioning activated on August 1st
– Enabled on a subset of farm racks (up to ~45 KHS06, 24- and 16-slot nodes)
– Production quality, tunable; used by ATLAS and CMS
– Accounting data properly delivered to APEL
Implemented with 3 Python scripts and 2 C programs
– Advanced LSF configuration and the dynamic partitioning logic
– Details: http://goo.gl/1Hpbt1
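
The real implementation is the set of scripts referenced above; the fragment below is only a schematic sketch of the dynamic partitioning idea: whole nodes are moved into a multicore partition when multicore demand grows and released when it drops. Node names, thresholds and the demand interface are invented for the example.

# Schematic sketch of dynamic partitioning (not the INFN-T1 implementation,
# which is described at the URL above): keep a pool of whole nodes reserved
# for multicore jobs, sized from the current multicore demand.

SLOTS_PER_NODE = 16          # example node size (the farm mixes 16- and 24-slot nodes)

def nodes_needed(pending_mcore_jobs, cores_per_job=8, headroom=1.2):
    """How many whole nodes should sit in the multicore partition."""
    cores = pending_mcore_jobs * cores_per_job * headroom
    return int(-(-cores // SLOTS_PER_NODE))      # ceiling division

def rebalance(mcore_partition, free_nodes, pending_mcore_jobs):
    """Return the new multicore partition and the nodes released back."""
    target = nodes_needed(pending_mcore_jobs)
    part = list(mcore_partition)
    released = []
    while len(part) < target and free_nodes:     # grow: take drained nodes
        part.append(free_nodes.pop())
    while len(part) > target:                    # shrink: give nodes back
        released.append(part.pop())
    return part, released

if __name__ == "__main__":
    part, released = rebalance(["wn-a"], ["wn-b", "wn-c"], pending_mcore_jobs=5)
    print("multicore partition:", part, "released:", released)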

13 Testing low power solutions
HP Moonshot with m350 cards and external storage
– HP probed our WNs in order to determine the best storage solution
– They are providing us a DL380 as an iSCSI server
– According to HP, m300 cards with internal storage are too expensive
Supermicro MicroBlade
– Each blade carries 4 motherboards and 4 disks; less compact, but with built-in storage

14 Storage

15 On-Line Storage
GPFS (v3.5.0-22)
– 15 PB of data in 15 file systems
– Each major experiment has its own cluster
– All worker nodes are in a single "diskless" cluster, accessing the file systems via remote mount
– Observed I/O rate up to 15 Gb/s
– 4.5 MB/s for each TB
2015 tender
– 5 PB disk replacement
– 3 PB new disk
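
The 4.5 MB/s per TB figure is a throughput-to-capacity ratio; the short sketch below (my own arithmetic, not from the talk) shows the aggregate bandwidth it implies for disk volumes of the sizes mentioned above.

# Simple arithmetic (not from the talk) around the 4.5 MB/s per TB figure:
# the aggregate throughput such a ratio implies for a given disk capacity.

MB_PER_S_PER_TB = 4.5

def aggregate_throughput_gbs(capacity_pb):
    """Aggregate throughput in GB/s for capacity_pb petabytes at 4.5 MB/s per TB."""
    capacity_tb = capacity_pb * 1000.0
    return capacity_tb * MB_PER_S_PER_TB / 1000.0   # MB/s -> GB/s

if __name__ == "__main__":
    for pb in (5, 3, 8, 15):   # tender replacement, new disk, their sum, total installed
        print(f"{pb:2d} PB -> {aggregate_throughput_gbs(pb):5.1f} GB/s")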

16 Near-Line Storage
GEMSS: HSM based on TSM (v6.2) and GPFS
– 19 PB of data on tape
– Tape drives of 3 generations:
  T10KB (1 TB): 15 drives, 7200 tapes
  T10KC (5.5 TB): 13 drives, 1400 tapes
  T10KD (8.5 TB): 9 drives, 1500 tapes
– 12 tape servers
– SAN interconnection to the storage
– I/O rate up to 350 MB/s per server
– ~300 tape mounts per day
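
As a quick cross-check (my own arithmetic, not from the talk), the sketch below computes the nominal capacity of the installed cartridges per drive generation and compares it with the 19 PB of data on tape.

# Back-of-the-envelope arithmetic (not from the talk): nominal capacity of the
# installed cartridges per drive generation, versus the 19 PB of data on tape.

GENERATIONS = {                      # cartridge capacity (TB), number of tapes
    "T10KB": (1.0, 7200),
    "T10KC": (5.5, 1400),
    "T10KD": (8.5, 1500),
}

if __name__ == "__main__":
    total_tb = 0.0
    for gen, (tb_per_tape, n_tapes) in GENERATIONS.items():
        cap_tb = tb_per_tape * n_tapes
        total_tb += cap_tb
        print(f"{gen}: {cap_tb / 1000:5.2f} PB nominal")
    print(f"total nominal capacity: {total_tb / 1000:.1f} PB (19 PB currently used)")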

17 DSH: re-pack activities
Data repack (T10KB -> T10KD) campaign started at the end of November 2014
– 3 PB of data have been migrated so far
– Mean migration rate of 330 MB/s, limited by the FC HBA of the TSM server
– A new TSM server (with a 16 Gb/s FC HBA) is being installed
– Plan to remove all T10KB tapes by autumn
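
A rough estimate (my own arithmetic, not from the talk) of what the 330 MB/s mean rate implies, assuming the rate is sustained around the clock:

# Rough estimate (not from the talk): migration time at the quoted mean rate.

RATE_MB_S = 330.0

def days_to_migrate(volume_pb):
    volume_mb = volume_pb * 1e9          # PB -> MB (decimal units)
    return volume_mb / RATE_MB_S / 86400.0

if __name__ == "__main__":
    print(f"3 PB (migrated so far):                {days_to_migrate(3):6.1f} days")
    # 7.2 PB is the nominal capacity of the 7200 T10KB cartridges, an upper bound
    print(f"7.2 PB (all T10KB media, upper bound): {days_to_migrate(7.2):6.1f} days")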

18 Backup slides

19 Data Storage and Handling (DSH)
– On-line (disk) storage systems
– Near-line (tape) storage system
– Data transfer services: StoRM, GridFTP, Xrootd, WebDAV
– Long Term Data Preservation (LTDP)

20 [Storage architecture diagram: computing farm (~1000 nodes), StoRM servers, GridFTP/Xrootd servers, GPFS NSD servers and HSM servers connected via LAN/WAN and a SAN (disk) / TAN (tape) fabric]
Disk storage, total 15 PB: 8 DDN S2A 9900, 1 DDN SFA10K, 1 DDN SFA12K, plus EMC boxes for specific use (databases, tape storage stage area, CDF long-term data preservation, ...)
Tape library, total 19 PB: SL8500 with 8 robots and 10000 tape slots, 13 T10KC drives, 9 T10KD drives (T10KB technology being phased out); cartridge capacity: 1 T10KD tape = 8.5 TB
8 GPFS clusters with 130 NSD servers and 15 PB of GPFS data; every worker node in the ~1000-node computing farm can directly access ~12 PB shared via GPFS in 11 file systems
12 HSM servers providing data migration between GPFS and TSM
18 servers providing GridFTP, Xrootd and WebDAV (HTTPS) data transfer over WAN and LAN with access to GPFS
18 StoRM servers providing the data management interface (SRM)
~700 Fibre Channel ports in a single SAN/TAN fabric
Bandwidth figures from the diagram: 5 GB/s and 80 GB/s

21 DSH: Long Term Data Preservation
3.8 PB of CDF data have been imported from FNAL
– Bandwidth of 5 Gb/s reserved on the transatlantic CNAF ↔ FNAL link
Code preservation: the CDF legacy software release (SL6) is under test
Next step: the analysis framework
– CDF services and analysis computing resources will be instantiated on demand as pre-packaged VMs in a controlled environment
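
As a sanity check (my own arithmetic, not from the talk), the sketch below estimates how long importing 3.8 PB takes over a 5 Gb/s reserved link at different average utilisations.

# Quick sanity check (not from the talk): time to import 3.8 PB over a
# 5 Gb/s reserved link at various average utilisations.

def transfer_days(volume_pb, link_gbps, utilisation):
    bits = volume_pb * 1e15 * 8
    return bits / (link_gbps * 1e9 * utilisation) / 86400.0

if __name__ == "__main__":
    for u in (1.0, 0.7, 0.5):
        print(f"3.8 PB at 5 Gb/s, {u:.0%} average utilisation: "
              f"{transfer_days(3.8, 5, u):6.0f} days")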

