INFN-T1 site report Giuseppe Misurelli On behalf of INFN-T1 staff HEPiX Spring 2015

Outline: Common services, Network, Farming, Storage

Common services

Installation and configuration
– CNAF's evaluation of new installation and configuration tools is complete: the decision was taken to move towards Puppet + Foreman
– Quattor is still managing the larger part of the infrastructure
  – No upgrades lately

Network

WAN connectivity (diagram, S. Zani): the CNAF core (Nexus and Cisco 7600) uplinks to the GARR Bo1 and Mi1 PoPs; a 40 Gb physical link (4x10 Gb) shared by LHCOPN and LHCONE reaches the other Tier-1s (RAL, SARA, PIC, TRIUMF, BNL, FNAL, TW-ASGC, NDGF, IN2P3) and the main Tier-2s; separate links provide general IP connectivity and a dedicated 10 Gb/s CNAF-FNAL path for CDF data preservation.

CNAF WAN links
– 4x10 Gb/s LHCOPN+LHCONE (evolving to 6x10 Gb/s)
  – A single 40 Gb/s link aggregation is used for T0-T1 (LHCOPN) and T1-T2 (LHCONE) traffic
  – 20 Gb/s to CERN are dedicated to T0-T1 and T1-T1 traffic (LHCOPN)
  – Last year CNAF, KIT and IN2P3 moved the traffic between their Tier-1s from LHCOPN to LHCONE (more bandwidth available through GEANT)
– 10 Gb/s general purpose (evolving to 2x10 Gb/s)
  – General internet access for CNAF users
  – LHC sites not connected to LHCOPN/LHCONE (T3s and minor T2s)
  – Backup link in case LHCOPN is down
  – INFN IT national services
– 10 Gb/s (5 guaranteed) CNAF-FNAL (LTDP)
  – Data transfer terminated (decommissioning)

Farming

Computing resources
– 160K HS06
– Very stable during the last period, with just a few hardware failures
– Had to update the IPMI firmware to get a signed Java console applet, since the latest Java update does not allow unsigned applets
– Renewed the LSF contract for the whole of INFN with Platform/IBM for the next 4 years

Security
– Since our last workshop we had to reboot the whole farm twice (2 critical kernel upgrades + glibc)
– We use an automatic procedure (see the sketch below)
– The process is slow (each WN has to drain completely before it can be rebooted) and temporarily reduces the computing power the farm can provide
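
A minimal sketch of the kind of rolling drain-and-reboot cycle described above, written in Python and assuming the standard LSF CLI (badmin, bjobs) plus SSH access to the worker nodes; the host names, polling interval and reboot step are illustrative, not the actual INFN-T1 procedure.

# Illustrative sketch only (not the actual INFN-T1 tooling): rolling
# drain-and-reboot of LSF worker nodes, one node at a time.
import subprocess
import time

WORKER_NODES = ["wn-001", "wn-002"]   # hypothetical host names
POLL_SECONDS = 600                    # how often to check whether a node has drained

def running_jobs(host):
    """Rough count of jobs still running on a host (0 once nothing is reported)."""
    out = subprocess.run(["bjobs", "-r", "-u", "all", "-m", host],
                         capture_output=True, text=True).stdout
    return len([l for l in out.splitlines() if l.strip() and not l.startswith("JOBID")])

def drain_and_reboot(host):
    subprocess.run(["badmin", "hclose", host], check=True)   # stop dispatching new jobs
    while running_jobs(host) > 0:                            # wait for a complete drain
        time.sleep(POLL_SECONDS)
    subprocess.run(["ssh", host, "reboot"])                  # reboot into the new kernel
    # In practice one would wait for the node to come back and pass health
    # checks before reopening it to the batch system.
    subprocess.run(["badmin", "hopen", host], check=True)

for wn in WORKER_NODES:   # serial processing keeps the capacity loss bounded
    drain_and_reboot(wn)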

CPU tender
– The 2014 tender is still not installed (30K HS06)
  – Same machines as the previous tender
  – Will be installed shortly
  – The 2014 pledged resources are still guaranteed
– The 2015 tender is focused on blade solutions (HP, Lenovo, Dell)
  – Should be a quick procedure, with machines available during the summer
  – We will be able to decommission very old computing nodes and hopefully improve our PUE

Multicore support
– The INFN-T1 farm now fully supports MCORE and HIMEM jobs
– Dynamic partitioning activated on August 1st
– Enabled on a subset of farm racks (up to ~45 kHS06, 24- and 16-slot nodes)
– Production quality, tunable; used by ATLAS and CMS
– Accounting data properly delivered to APEL
– Implemented with 3 Python scripts and 2 C programs: advanced LSF configuration and the dynamic partitioning logic (a toy sketch of the idea follows)
– Details:
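
The dynamic partitioning itself is implemented at CNAF with a few Python scripts, C programs and LSF configuration; the snippet below is only a toy sketch of the underlying idea (sizing the multicore partition from the pending multicore demand). The queue name "mcore", the job size and the node cap are assumptions, not the real CNAF parameters.

# Toy sketch of the dynamic-partitioning idea (NOT the actual CNAF scripts).
import subprocess

CORES_PER_NODE = 16      # the enabled racks have 16- and 24-slot nodes (see above)
MCORE_JOB_CORES = 8      # assumed size of an ATLAS/CMS multicore job
MAX_MCORE_NODES = 100    # assumed cap, standing in for the ~45 kHS06 of enabled racks

def pending_mcore_jobs():
    """Count pending jobs in a hypothetical 'mcore' LSF queue."""
    out = subprocess.run(["bjobs", "-p", "-u", "all", "-q", "mcore"],
                         capture_output=True, text=True).stdout
    return len([l for l in out.splitlines() if l.strip() and not l.startswith("JOBID")])

def target_mcore_nodes():
    jobs_per_node = CORES_PER_NODE // MCORE_JOB_CORES
    needed = -(-pending_mcore_jobs() // jobs_per_node)       # ceiling division
    return min(needed, MAX_MCORE_NODES)

if __name__ == "__main__":
    # The real tooling would now move the chosen number of hosts into a
    # multicore LSF host group (e.g. rewrite the batch configuration and run
    # "badmin reconfig"); that part is site specific and omitted here.
    print("nodes to dedicate to MCORE/HIMEM jobs:", target_mcore_nodes())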

Testing low power solutions
– HP Moonshot with m350 cartridges and external storage
  – HP probed our WNs in order to determine the best storage solution
  – HP is providing us a DL380 as an iSCSI server
  – According to HP, m300 cartridges with internal storage are too expensive
– Supermicro MicroBlade
  – Each blade carries 4 motherboards and 4 disks: less compact, but with built-in storage

Storage

On-Line Storage
– GPFS (v ): 15 PB of data in 15 file systems
  – Each major experiment has its own cluster
  – All worker nodes are in a single "diskless" cluster, accessing the file systems via remote mount
  – Observed I/O rate up to 15 Gb/s
  – 4.5 MB/s for each TB
– 2015 tender: 5 PB of disk replacement + 3 PB of new disk

Near-Line Storage
– GEMSS: HSM based on TSM (v6.2) and GPFS
– 19 PB of data on tape
– Tape drives of 3 generations (see the capacity cross-check below):
  – T10KB (1 TB/cartridge): 15 drives, 7200 tapes
  – T10KC (5.5 TB/cartridge): 13 drives, 1400 tapes
  – T10KD (8.5 TB/cartridge): 9 drives, 1500 tapes
– 12 tape servers with SAN interconnection to the storage
– I/O rate up to 350 MB/s per server
– ~300 tape mounts per day
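
As a quick sanity check of the figures above, the installed tape capacity implied by the quoted cartridge counts can be worked out directly (a small Python calculation using only the numbers on this slide):

# Installed tape capacity implied by the cartridge counts quoted above.
generations = {
    "T10KB": (7200, 1.0),   # (cartridges, TB per cartridge)
    "T10KC": (1400, 5.5),
    "T10KD": (1500, 8.5),
}
total_tb = sum(count * tb for count, tb in generations.values())
print(total_tb / 1000.0, "PB of tape capacity")   # ~27.7 PB installed, vs 19 PB used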

DSH: repack activities
– The data repack campaign (T10KB -> T10KD) started at the end of November; … PB of data have been migrated so far
– Mean migration rate of 330 MB/s, limited by the FC HBA of the TSM server
– A new TSM server (with an FC16 HBA) is in the installation phase
– Plan to remove all T10KB tapes by autumn (see the rough estimate below)
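
A back-of-the-envelope estimate of the repack timescale, assuming the full ~7.2 PB held on the 7200 T10KB cartridges has to be rewritten at the quoted 330 MB/s sustained rate:

# Rough duration of the T10KB -> T10KD repack at 330 MB/s sustained.
data_mb = 7200 * 1.0 * 1e6      # 7200 cartridges x 1 TB each, expressed in MB
rate_mb_s = 330.0               # mean migration rate quoted above
days = data_mb / rate_mb_s / 86400.0
print(round(days), "days")      # ~250 days, consistent with finishing by autumn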

Backup slides

Data Storage and Handling (DSH)
– On-line (disk) storage systems
– Near-line (tape) storage system
– Data transfer services: StoRM, GridFTP, XRootD, WebDAV
– Long Term Data Preservation (LTDP)

Storage infrastructure overview (diagram: computing farm of ~1000 nodes, GridFTP/XRootD servers, GPFS NSD servers, HSM servers and StoRM servers, interconnected via LAN/WAN and a SAN for disk plus a TAN for tape)
– Disk storage, 15 PB total: 8 DDN S2A, … DDN SFA10K, 1 DDN SFA12K, plus EMC boxes for specific use (database, tape storage staging area, CDF long term data preservation, ...)
– Tape library, 19 PB total: SL robot with … tape slots, 13 T10KC drives, 9 T10KD drives (T10KB technology phasing out); cartridge capacity: 1 T10KD tape = 8.5 TB
– 8 GPFS clusters with 130 NSD servers and 15 PB of GPFS data, exporting ~12 PB of data to the ~1000 nodes of the computing farm
– Every worker node can directly access ~12 PB of data shared via GPFS in 11 file systems
– 12 HSM servers providing data migration between GPFS and TSM
– 18 servers providing GridFTP, XRootD and WebDAV (HTTPS) data transfer services over WAN and LAN, with access to GPFS
– 18 StoRM servers providing the data management interface (SRM)
– ~700 Fibre Channel ports in a single SAN/TAN fabric

DSH: Long Term Data Preservation
– 3.8 PB of CDF data have been imported from FNAL
– A bandwidth of 5 Gb/s is reserved on the transatlantic link CNAF ↔ FNAL (see the rough check below)
– Code preservation: the CDF legacy software release (SL6) is under test
– Next step: the analysis framework
  – CDF services and analysis computing resources will be instantiated on demand on pre-packaged VMs in a controlled environment
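
For scale, a rough check of how long importing the CDF dataset takes over the reserved slice of the transatlantic link, assuming the 5 Gb/s can be kept close to fully utilised:

# Rough transfer time for 3.8 PB of CDF data over a 5 Gb/s reservation.
data_bits = 3.8e15 * 8      # 3.8 PB expressed in bits
rate_bps = 5e9              # 5 Gb/s reserved on the CNAF <-> FNAL link
days = data_bits / rate_bps / 86400.0
print(round(days), "days of continuous transfer")   # ~70 days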