
INFN-T1 site report Luca dell’Agnello On behalf of INFN-T1 staff HEPiX Spring 2013

Outline
– Introduction
– Network
– Farming
– Storage
– Grid and Middleware
– User experience

Tier1 site at INFN-CNAF
CNAF is the National Center of INFN (National Institute of Nuclear Physics) for research and development in the field of information technologies applied to high-energy physics experiments. Operational since 2005.

INFN-Tier1
CNAF is the Italian Tier-1 computing centre for the LHC experiments ATLAS, CMS, ALICE and LHCb…
… but it is also one of the main Italian processing facilities for several other experiments:
– BaBar and CDF
– Astro and space physics: VIRGO (Italy), ARGO (Tibet), AMS (satellite), PAMELA (satellite) and MAGIC (Canary Islands)
– And others (e.g. Icarus, Borexino, Gerda, etc.)
for a total of more than 20 experiments.

INFN-Tier1: some figures
– 1000 m² room with capacity for more than 120 racks and several tape libraries (only one in production)
  – 5 MVA electrical power
  – Redundant facility providing 24x7 availability
– 1300 servers (~135 KHS06 of computing power available)
– ~11.4 PB of disk space for high-speed access and ~16 PB on tape
  – Tape library capacity currently scalable up to 50 PB
  – Aggregate bandwidth to storage: ~50 GB/s
– 10 Gbit technology on the LAN
  – 3x10 Gbps WAN connections (to be upgraded to 5x10 Gbps)
– ~20 FTEs to manage facilities, network, databases, storage and farm services (including MSS, CEs, SRM end-points, FTS…)

Network

WAN Connections (Q1 2013)
– 20 Gb physical link (2x10 Gb) for LHC-OPN and LHCONE connectivity
– 10 Gigabit link for general IP connectivity
– + 10 Gb/s CNAF–FNAL temporary link dedicated to CDF data preservation
– LHCONE and LHC-OPN currently share the same physical ports but are managed as two completely different links (different VLANs are used for the point-to-point interfaces)
– All the Tier-2s which are not connected to LHCONE are reached only via general IP
[Diagram: WAN connectivity via GARR – LHC-OPN/ONE 20 Gb/s carrying T0-T1 and T1-T1 traffic in sharing (CERN and the other Tier-1s: RAL, PIC, TRIUMF, BNL, FNAL, TW-ASGC, NDGF, KIT, IN2P3, SARA), cross-border fiber from Milan as T0-T1 backup, 10 Gb/s general purpose WAN towards the Tier-2s, and a 10 Gb/s link to FNAL dedicated to CDF data preservation.]

WAN Connections (Q2 2013)
– 20 Gb physical link (2x10 Gb) for LHC-OPN and LHCONE connectivity
– 10 Gigabit link for general IP connectivity
– + 10 Gb/s CNAF–FNAL temporary link dedicated to CDF data preservation
– LHC-OPN/ONE upgrade to 40 Gb/s (April–May 2013)
– General IP upgrade to 20 Gb/s (April–May 2013)
[Diagram: same WAN layout as the previous slide, with the planned upgrades applied.]

New activities
– Almost completed transition to jumbo frames on the LAN
  – 10 Gbit disk and gridftp servers already using JF since 2010
  – Farm just reconfigured
  – Reconfiguration of other services ongoing
– IPv6 activation on general IP (Q2 2013)
  – Testing dual stack on dedicated services (gridftp, xrootd, CEs, StoRM end-points…)
– IPv6 activation on LHC-OPN (Q4 2013)
  – Coordinated effort with the HEPiX IPv6 WG
  – Done if the previous testing activity gives good results
– SDN (first steps)
  – OpenFlow layout test (Q…)
  – Interoperability test on physical switches (on trial)
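As an illustration of the dual-stack testing mentioned above, a minimal Python probe that checks whether a service endpoint resolves and accepts connections over both IPv4 and IPv6 (the hostname below is a placeholder, not a real CNAF endpoint; port 2811 is assumed as a typical GridFTP control port):

```python
# Minimal dual-stack reachability probe (illustrative sketch only).
import socket

HOST, PORT = "gridftp.example.cnaf.infn.it", 2811   # hypothetical endpoint

for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
    try:
        # Resolve the name for this address family only.
        infos = socket.getaddrinfo(HOST, PORT, family, socket.SOCK_STREAM)
        sockaddr = infos[0][4]
        # Try a plain TCP connection to the resolved address.
        with socket.create_connection((sockaddr[0], PORT), timeout=5):
            print(f"{label}: {sockaddr[0]} reachable")
    except OSError as exc:
        print(f"{label}: not reachable ({exc})")
```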

Farming

Computing resources
– Currently 135K HS-06 in the farm
  – job slots
  – HT activated only on a subset of the Intel-based part of the farm (~10% of the WNs)
– New tender will add 45K HS-06
  – ~4000 new job slots
  – 65 enclosures, 172 HS06 per mobo
– Oldest nodes will be switched off this year
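A quick cross-check of the tender figures in Python (a rough estimate; the assumption that each 2U Twin² chassis holds four motherboards is taken from the chassis description on the next slide, and the 16 cores per node follow from the dual 8-core Opteron 6320 without HT):

```python
# Rough sanity check of the 2013 tender figures quoted above.
ENCLOSURES = 65
NODES_PER_ENCLOSURE = 4          # assumed Twin^2 layout: 4 motherboards per 2U chassis
HS06_PER_NODE = 172              # per motherboard, from the tender slide
CORES_PER_NODE = 2 * 8           # dual Opteron 6320, 8 cores each, no HT

nodes = ENCLOSURES * NODES_PER_ENCLOSURE
print(f"new nodes:     {nodes}")                      # 260
print(f"new HS06:      {nodes * HS06_PER_NODE}")      # 44720  ~ 45K HS-06
print(f"new job slots: {nodes * CORES_PER_NODE}")     # 4160   ~ 4000 slots
```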

2013 CPU tender
– 2U Supermicro Twin² (twin square)
  – Chassis: 827HQ-R1620B
    – (1+1) redundant power supply
    – 12x 3.5" hot-swap SAS/SATA drive trays (3 for each node)
    – Hot-swappable motherboard module
  – Mobo: H8DGT-HLF
    – Dual AMD Opteron™ 6000 series processors
    – AMD SR SP5100 chipset
    – Dual-port Gigabit Ethernet
    – 6x SATA2 3.0 Gbps ports via AMD SP5100 controller, RAID 0, 1, 10
    – 1x PCI-e 2.0 x16
– AMD CPUs (Opteron 6320), 8 cores, 2.8 GHz
– 2.2 TB SATA hard disks, 3.5"
– 64 GB RAM DDR ECC REG
– 2x 1620 W 80+ Platinum power supplies
– 172 HS06 (32-bit, SL6)

New activities
– Investigating Grid Engine as an alternative batch system (to LSF)
  – Comparison with SLURM (being tested by INFN Bari)
– Job packing
  – See Stefano’s presentation
– Study of power consumption optimization
  – New CPU architectures (Atom, ARM, etc.)
  – Software control (DCM by Intel)

Storage

Storage resources
– Total of 11.4 PB of on-line (net) disk space
  – 7 EMC² CX + EMC² CX4-960 (~1.9 PB) + servers (2x 1 Gbps connections)
  – 7 DDN S2A + DDN SFA (~9.4 PB) + ~60 servers (10 Gbps)
  – Aggregate bandwidth: 50 GB/s
  – Tender for additional disk (+1.9 PB-N) completed: DDN SFA 12K + 24 servers
– Tape library SL8500: ~16 PB on line with 20 T10KB drives and 10 T10KC drives (to be increased)
  – 8800 x 1 TB tapes, ~100 MB/s of bandwidth for each drive
  – 1200 x 5 TB tapes, ~200 MB/s of bandwidth for each drive
  – Drives interconnected to library and servers via a dedicated SAN (TAN); 13 Tivoli Storage Manager HSM nodes access the shared drives
  – 1 Tivoli Storage Manager (TSM) server common to all GEMSS instances
  – Tender to expand library capacity in Q…
– All storage systems and disk-servers are on SAN (4 Gb/s or 8 Gb/s)
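A back-of-the-envelope estimate of the tape-system throughput ceiling, worked out from the drive counts and per-drive rates above (nominal figures only; real throughput depends on mount scheduling and SAN contention):

```python
# Aggregate nominal tape drive bandwidth from the figures above.
T10KB_DRIVES, T10KB_MBPS = 20, 100   # ~100 MB/s per T10000B drive
T10KC_DRIVES, T10KC_MBPS = 10, 200   # ~200 MB/s per T10000C drive

aggregate_mb_s = T10KB_DRIVES * T10KB_MBPS + T10KC_DRIVES * T10KC_MBPS
print(f"theoretical aggregate tape bandwidth: {aggregate_mb_s / 1000:.1f} GB/s")  # 4.0 GB/s
```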

Storage resources layout
[Diagram: farm worker nodes (LSF batch system, ~120K HS-06, i.e. ~9000 job slots) act as GPFS client nodes and access data over the Tier-1 LAN; ~100 diskservers with 2 FC connections each (GPFS NSD servers) serve the DISK area (~11.3 PB net) over Fibre Channel (4/8 Gb/s); the STK SL8500 robot (10000 slots, 20 T10000B and 10 T10000C drives) provides the TAPE area (16 PB available, 14 PB used) behind the SAN/TAN.]
– Data access from the farm worker nodes uses the Tier-1 LAN; the NSD diskservers use 1 or 10 Gb network connections.
– 13 GEMSS TSM HSM nodes (servers with triple FC connections: 2 FC to the SAN for disk access, 1 FC to the TAN for tape access) provide all the disk ⇄ tape data migration through the SAN/TAN Fibre Channel network.
– 1 TSM server node runs the TSM instance.

Storage configuration
– All disk space is partitioned into several GPFS clusters served by ~150 servers
  – One cluster for each main experiment
  – GPFS deployed on the SAN implements a full HA system
  – System scalable to tens of PB and able to serve thousands of concurrent processes with an aggregate bandwidth of tens of GB/s
– GPFS coupled with TSM offers a complete HSM solution
– Access to storage granted through standard interfaces (POSIX, SRM, xrootd and WebDAV)
  – File systems directly mounted on the WNs
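For illustration, a minimal sketch of what "standard interfaces" means from a worker node's point of view: the same file can be reached as a plain POSIX path on the GPFS mount or addressed through the SRM/xrootd/WebDAV front-ends. Only the protocol names come from the slide; the mount point, hostnames and file name below are hypothetical.

```python
# Hypothetical example: one logical file, several access routes.
import os

# POSIX: GPFS file systems are mounted directly on the worker nodes,
# so batch jobs can simply open() the file (path below is made up).
posix_path = "/storage/gpfs_example/some_vo/data/run001/file.root"
if os.path.exists(posix_path):
    size = os.stat(posix_path).st_size
    print(f"POSIX access, size = {size} bytes")

# The same data can also be addressed through the Grid front-ends
# (hostnames are placeholders, not real CNAF endpoints):
srm_url    = "srm://storm-fe.example.cnaf.infn.it:8444/some_vo/data/run001/file.root"
xrootd_url = "root://xrootd.example.cnaf.infn.it//some_vo/data/run001/file.root"
webdav_url = "https://webdav.example.cnaf.infn.it/some_vo/data/run001/file.root"
print("Grid access URLs:", srm_url, xrootd_url, webdav_url, sep="\n  ")
```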

Storage layout for a large experiment at the INFN Tier-1
– … PB GPFS file-system
– 2 GridFTP servers (2 x 10 Gbps on WAN)
– 8 disk-servers for data (8 x 10 Gbps on LAN)
– 2 disk-servers for metadata (2 x 2 Gbps)
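Converting the server counts above into rough throughput ceilings for this per-experiment layout (nominal link-rate arithmetic only, ignoring protocol overhead):

```python
# Nominal bandwidth ceilings for the example layout above.
GBPS_TO_GB_S = 1 / 8                    # 8 bits per byte, no overhead accounted for

lan_gbps = 8 * 10                       # 8 data disk-servers x 10 Gbps on the LAN
wan_gbps = 2 * 10                       # 2 GridFTP servers x 10 Gbps on the WAN
print(f"LAN ceiling: {lan_gbps * GBPS_TO_GB_S:.1f} GB/s")   # 10.0 GB/s
print(f"WAN ceiling: {wan_gbps * GBPS_TO_GB_S:.1f} GB/s")   # 2.5 GB/s
```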

New activities (LTDP)
– Long Term Data Preservation for CDF
  – ~5 PB of data to be copied from FNAL
  – Copy starting in May
  – 5 Gbps dedicated link CNAF–FNAL
  – A “code preservation” issue also to be addressed
– Participation in the interdisciplinary LTDP project PideS
  – Proposal submitted

Grid and Middleware

Grid middleware status
– EMI-2 update status
  – UIs, Argus, BDII, WNs
  – Many nodes at SL6
  – Some issues with CREAM EMI-2/SL6
– Upgrading to EMI-3/SL6
  – StoRM will be deployed on EMI-3
– EMI-1 phasing out
  – Only the VOBOX left at gLite 3.2
– Separate queue to test WNs SL6/EMI-2
  – One VO left to validate

WNoDeS status
– (Virtual) WNs on demand
– Distributed with the EMI middleware
– Deployed on part of the INFN-T1 farm
  – Mixed mode allows both virtual and real jobs on the same hypervisor
  – The Auger VO is the main user of this solution, requiring direct access to a dedicated MySQL server
– Also investigating cloud computing

User experience

Supported communities
Besides the LHC experiments, CNAF supports other scientific communities:
– Directly, through the CNAF User Support service: CDF, BaBar, (SuperB,) LHCf, Argo, AMS2, Auger, Icarus, Fermi/GLAST, Pamela, Borexino, Xenon, Gerda, Virgo, CTA, and local theoretical physics and astronomy groups
– Indirectly, through the Grid projects
Different levels of computing expertise and heterogeneous computing tools used.
Trying to attract more user communities.

User Support Service
– Team of seven skilled young people plus a coordinator
– Used to assign support people to experiments; now migrating to a model with more work done in common by the team
  – Facilitates the transfer of know-how from big to small user communities
  – Facilitates the integration of new team members
– Typically 2- to 4-year contracts
– Independent of the Tier-1 team but working in close contact

Backup slides

GEMSS in production (1)
– 1 Gbit technology (2009)
  – Using the file protocol (i.e. direct access to the file)
  – Up to 1000 concurrent jobs recalling ~2000 files from tape with a 100% job success rate
  – Up to 1.2 GB/s from the disk pools to the farm nodes
– 10 Gbit technology (since 2010)
  – Using the file protocol
  – Up to … concurrent jobs accessing files on disk
  – Up to 10 GB/s from the same file system to the farm nodes
  – WAN links towards saturation
[Plots: running and pending jobs on the farm; aggregate traffic on the eth0 network cards (x2); LHCb user jobs.]

Building blocks of the GEMSS system
[Diagram: GEMSS data migration and recall processes move data files between GPFS (disk, via SAN), TSM (tape, via TAN), the worker nodes (LAN) and the WAN (StoRM/GridFTP); ILM policies drive the migration.]
Disk-centric system with five building blocks:
1. GPFS: disk-storage software infrastructure
2. TSM: tape management system
3. StoRM: SRM service
4. TSM-GPFS interface
5. Globus GridFTP: WAN data transfers
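As a purely illustrative sketch of how these building blocks cooperate (not GEMSS code; a toy model that assumes migration candidates are selected by a fill-level policy and that recalls are triggered when a job asks for a tape-only file):

```python
# Toy model of the disk <-> tape flow described above. None of these names
# correspond to real GEMSS interfaces: GPFS holds the disk copy, TSM owns the
# tape copy, and the migration/recall processes move data between the two.
from dataclasses import dataclass

@dataclass
class FileState:
    path: str
    size_gb: float
    on_disk: bool = True
    on_tape: bool = False

def migrate_candidates(files, disk_used_tb, disk_quota_tb, threshold=0.85):
    """Pick disk-only files to copy to tape once the fileset is nearly full."""
    if disk_used_tb / disk_quota_tb < threshold:
        return []
    return [f for f in files if f.on_disk and not f.on_tape]

def recall(f: FileState):
    """Bring a tape-only file back to disk before a job opens it."""
    if not f.on_disk and f.on_tape:
        f.on_disk = True        # in reality: TSM retrieve over the TAN, then GPFS
    return f

files = [FileState("/gpfs/vo/run1.root", 120), FileState("/gpfs/vo/run2.root", 80)]
for f in migrate_candidates(files, disk_used_tb=900, disk_quota_tb=1000):
    f.on_tape = True            # in reality: TSM archive driven by the GEMSS migration process
files[0].on_disk = False        # pretend the disk copy was later released
recall(files[0])                # a job asks for it again -> recalled from tape
print(files)
```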