The INFN-Tier1
Luca dell’Agnello, INFN-CNAF
FNAL, May 18 2012

INFN-Tier1
The Italian Tier-1 computing centre for the LHC experiments ATLAS, CMS, ALICE and LHCb…
… but also one of the main Italian processing facilities for several other experiments:
BaBar and CDF
Astro and space physics: VIRGO (Italy), ARGO (Tibet), AMS (satellite), PAMELA (satellite) and MAGIC (Canary Islands)
And more (e.g. ICARUS, Borexino, GERDA, etc.)

INFN-Tier1: numbers
> 20 supported experiments
~ 20 FTEs
1000 m2 room with capacity for more than 120 racks and several tape libraries
5 MVA electrical power
Redundant facility to provide 24x7 availability
Within May 2012 all resources will be ready:
1300 servers with about 10000 cores available
11 PB of disk space for high-speed access and 14 PB on tape (1 tape library)
Aggregate bandwidth to storage: ~ 50 GB/s
WAN link at 30 Gbit/s (2x10 Gbit/s over the OPN)
A bandwidth increase is expected with the forthcoming GARR-X network

[Facility photos: chiller floor, CED (new room), electrical delivery, UPS room]

Current WAN Connections
[Network diagram: GARR 7600 router at CNAF; LHC-OPN (20 Gb/s, 2x10 Gb/s) towards CERN carrying T0-T1 and T1-T1 traffic in sharing; cross-border fiber from Milan (CNAF-KIT, CNAF-IN2P3, CNAF-SARA) as T0-T1 backup; 10 Gb/s general-purpose WAN towards the other T1s (RAL, PIC, TRIUMF, BNL, FNAL, TW-ASGC, NDGF) and the T1-T2 traffic; LHCONE; NEXUS switch towards the T1 resources]
20 Gb physical link (2x10 Gb) for LHC-OPN and LHCONE connectivity
10 Gigabit link for general IP connectivity
LHCONE and LHC-OPN currently share the same physical ports, but they are managed as two completely separate links (different VLANs are used for the point-to-point interfaces).
All the Tier-2s which are not connected to LHCONE are reached only via general IP.
A 10 Gbps Geneva-Chicago link for GARR/Géant is foreseen before the end of Q2 2012.

Future WAN Connection (Q4 2012)
[Network diagram: same layout as the current one, with an additional 10 Gb/s link towards LHCONE]
20 Gb physical link (2x10 Gb) for LHC-OPN and LHCONE connectivity
10 Gigabit physical link to LHCONE (dedicated to the T1-T2 LHCONE traffic)
10 Gigabit link for general IP connectivity
A new 10 Gb/s link dedicated to LHCONE will be added.

Computing resources
Currently ~ 110K HS06 available
We host other sites: T2 LHCb (~5%) with shared resources, T3 UniBO (~2%) with dedicated resources
New tender (June 2012) will add ~ 31K HS06: 45 enclosures, 45x4 motherboards, 192 HS06 per motherboard
Nearly constant number of boxes (~ 1000)
Farm is nearly always 100% used: ~ 8900 job slots (→ 9200 job slots), > 80000 jobs/day
Quite high efficiency (CPT/WCT ~ 83%), even with chaotic analysis
[Plots: CPU installation 2011; farm usage (last 12 months); WCT/CPT (last 12 months)]

Storage resources
Disk: TOTAL of 8.6 PB on-line (net) disk space
7 EMC2 CX3-80 + 1 EMC2 CX4-960 (~1.8 PB) + ~100 servers; phase-out in 2013
7 DDN S2A 9950 (~7 PB) + ~60 servers; phase-out in 2015-2016
… and under installation: 3 Fujitsu Eternus DX400 S2 (3 TB SATA): + 2.8 PB
Tape library SL8500: 9 PB + 5 PB (just installed) on line, with 20 T10KB drives and 10 T10KC drives
9000 x 1 TB tape capacity, ~ 100 MB/s of bandwidth for each drive
1000 x 5 TB tape capacity, ~ 200 MB/s of bandwidth for each drive
Drives interconnected to library and servers via a dedicated SAN (TAN)
13 Tivoli Storage Manager (TSM) HSM nodes access the shared drives
1 TSM server common to all GEMSS instances
All storage systems and disk-servers are on the SAN (4 Gb/s or 8 Gb/s)
Disk space is partitioned into several GPFS clusters served by ~100 disk-servers (NSD + gridFTP)
2 FTEs employed to manage the full system

What is GEMSS?
GEMSS is the integration of StoRM, GPFS and TSM:
GPFS: parallel file-system by IBM; TSM: archival system by IBM
GPFS deployed on the SAN implements a fully HA system
StoRM is an SRM 2.2 implementation developed by INFN
Already in use at the INFN Tier-1 since 2007, and at other centres, for disk-only storage
Designed to leverage the advantages of parallel file systems and common POSIX file systems in a Grid environment
We combined the features of GPFS and TSM with StoRM to provide a transparent grid-enabled HSM solution.
The GPFS Information Lifecycle Management (ILM) engine is used to identify candidate files for migration to tape and to trigger the data movement between the disk and tape pools.
An interface between GPFS and TSM (named YAMSS) was also implemented to enable tape-ordered recalls.
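As a concrete but purely illustrative picture of what such an ILM-style migration policy expresses, the sketch below selects migration candidates by pool occupancy and access time. This is not the actual GPFS policy used at CNAF; thresholds, field names and the selection criteria are assumptions for the example.

```python
# Illustrative sketch of ILM-style migration candidate selection (not the real
# GPFS policy): when the disk pool crosses a high watermark, the least recently
# accessed files are migrated until the pool would fall below a low watermark.
from dataclasses import dataclass

@dataclass
class FileInfo:
    path: str
    size: int      # bytes
    atime: float   # last access time, epoch seconds

def migration_candidates(files, pool_used, pool_capacity,
                         high_watermark=0.90, low_watermark=0.80):
    """Return files to migrate to tape, oldest access first (thresholds are hypothetical)."""
    if pool_used < high_watermark * pool_capacity:
        return []                                    # occupancy still acceptable
    to_free = pool_used - low_watermark * pool_capacity
    candidates, freed = [], 0
    for f in sorted(files, key=lambda x: x.atime):   # least recently accessed first
        candidates.append(f)
        freed += f.size
        if freed >= to_free:
            break
    return candidates
```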

GEMSS resources layout
[Diagram of the GEMSS layout]
DISK: ~8.4 PB net space; ~100 GPFS NSD diskservers, each with 2 FC connections
TAPE: 14 PB available, 8.9 PB used; STK SL8500 robot (10000 slots), 20 T10000B drives, 10 T10000C drives
GPFS client nodes: farm worker nodes (LSF batch system), ~120K HS06, i.e. ~9000 job slots
Data access from the farm worker nodes uses the Tier-1 LAN; the NSD diskservers use 1 or 10 Gb network connections
SAN/TAN: Fibre Channel (4/8 Gb/s) for disk and tape access
13 TSM HSM nodes (GEMSS), each with triple FC connections: 2 FC to the SAN (disk access), 1 FC to the TAN (tape access); they provide all the disk ↔ tape data migration through the SAN/TAN fibre channel network
1 TSM server node runs the TSM instance

CNAF in the grid
CNAF is part of the WLCG/EGI infrastructure, granting access to distributed computing and storage resources
Access to the computing farm via the EMI CREAM Compute Elements
Access to storage resources, on GEMSS, via the SRM end-points
Also “legacy” access (i.e. local access allowed)
Cloud paradigm also supported (see Davide’s slides on WNoDeS)
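For grid users, access to a GEMSS storage area goes through the StoRM SRM endpoint (plus GridFTP for the actual transfers). A minimal sketch using the gfal2 Python bindings is shown below; the endpoint host and paths are hypothetical, and only basic metadata operations are illustrated.

```python
# Minimal sketch of SRM access to a GEMSS storage area via the gfal2 Python
# bindings. The StoRM endpoint and paths below are hypothetical examples.
import gfal2

ctx = gfal2.creat_context()
surl = "srm://storm-fe.example.infn.it:8444/srm/managerv2?SFN=/atlas/data/file.root"

info = ctx.stat(surl)                 # metadata query through the SRM interface
print("size (bytes):", info.st_size)

# Listing a directory exposed by the same endpoint
for entry in ctx.listdir("srm://storm-fe.example.infn.it:8444/srm/managerv2?SFN=/atlas/data/"):
    print(entry)
```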

Building blocks of GEMSS system
Disk-centric system with five building blocks:
GPFS: disk-storage software infrastructure
TSM: tape management system
StoRM: SRM service
TSM-GPFS interface
Globus GridFTP: WAN data transfers
[Diagram: worker nodes, StoRM, GridFTP and GPFS connected over LAN/WAN; the GEMSS data migration and data recall processes (driven by ILM) move data files between GPFS and TSM over the SAN/TAN]

GEMSS data flow (1/2)
[Diagram: SRM requests reach StoRM; WAN I/O goes through GridFTP; migrations and recalls move data between disk and the tape library through the HSM nodes (HSM-1, HSM-2) and the TSM server (with its TSM DB), over the Storage Area Network and the Tape Area Network]

GEMSS data flow (2/2)
[Diagram: the recall path, with files sorted by tape; SRM requests arrive at StoRM, the TSM server (TSM DB) drives HSM-1/HSM-2, data moves between the tape library and disk over the SAN/TAN, and the GPFS servers serve it to the worker nodes over the LAN and to the WAN via GridFTP]
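The “sorting files by tape” step is what makes recalls tape-ordered: pending requests are grouped by the cartridge holding each file and read in media order, so every tape is mounted once and read sequentially. A minimal sketch of that grouping logic follows; the request fields and values are invented for illustration.

```python
# Sketch of tape-ordered recall scheduling: group pending recall requests by
# cartridge and sort by position on tape, so each tape is mounted only once
# and read sequentially. Field names and values are illustrative.
from collections import defaultdict

def order_recalls(requests):
    """requests: iterable of dicts with 'path', 'tape' and 'position' keys."""
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req["tape"]].append(req)

    batches = []
    for tape, reqs in by_tape.items():
        reqs.sort(key=lambda r: r["position"])      # sequential read within the tape
        batches.append((tape, [r["path"] for r in reqs]))
    return batches                                  # one mount per cartridge

if __name__ == "__main__":
    pending = [
        {"path": "/gpfs/cms/f1", "tape": "T10K0007", "position": 312},
        {"path": "/gpfs/cms/f2", "tape": "T10K0003", "position": 45},
        {"path": "/gpfs/cms/f3", "tape": "T10K0007", "position": 18},
    ]
    for tape, files in order_recalls(pending):
        print(tape, files)
```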

GEMSS layout for a typical experiment at INFN Tier-1
2 GridFTP servers (2 x 10 Gbps on the WAN)
8 disk-servers for data (8 x 10 Gbps on the LAN)
2 disk-servers for metadata (2x2x2 Gbps)
2.2 PB GPFS file-system
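As a rough sanity check of such a layout (assuming the nominal link speeds above and ignoring protocol overhead), the aggregate bandwidth towards the farm and towards the WAN works out as follows.

```python
# Back-of-the-envelope aggregate bandwidth for the layout above, assuming
# nominal link speeds and ignoring protocol overhead.
GBIT = 1e9  # bits per second

data_servers    = 8   # disk-servers for data, 10 Gbps each on the LAN
gridftp_servers = 2   # GridFTP servers, 10 Gbps each on the WAN

lan_bps = data_servers * 10 * GBIT
wan_bps = gridftp_servers * 10 * GBIT

print(f"LAN towards the farm: {lan_bps / 8 / 1e9:.0f} GB/s")   # ~10 GB/s
print(f"WAN via GridFTP:      {wan_bps / 8 / 1e9:.1f} GB/s")   # ~2.5 GB/s
```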

GEMSS in production
Gbit technology (2009), using the file protocol (i.e. direct access to the file):
Up to 1000 concurrent jobs recalling ~ 2000 files from tape
100% job success rate
Up to 1.2 GB/s from the disk pools to the farm nodes
10 Gbit technology (since 2010), using the file protocol:
Up to 2500 concurrent jobs accessing files on disk
~98% job success rate
Up to ~ 6 GB/s from the disk pools to the farm nodes
WAN links towards saturation
[Plots: running and pending jobs on the farm (CMS queue, May 15); aggregate traffic on eth0 network cards (x2); farm-CMS storage traffic]

Backup slides

Yearly statistics
[Plots: aggregate GPFS traffic (file protocol); aggregate WAN traffic (GridFTP); tape-disk data movement (over the SAN); tape mounts/hour]

Why GPFS
Original idea since the very beginning: we did not want to rely on a tape-centric system
Think of the disk infrastructure first; the tape part will come later, if still needed
We wanted to follow a model based on well-established industry standards as far as the fabric infrastructure was concerned:
Storage Area Network via FC for disk-server to disk-controller interconnections
This led quite naturally to the adoption of a clustered file-system able to exploit the full SAN connectivity to implement flexible and highly available services
There was a major problem at that time: a specific SRM implementation was missing
OK, we decided to take on this limited piece of work → StoRM

Basics of how GPFS works
The idea behind a parallel file-system is, in general, to stripe files across several servers and several disks
This means that, e.g., replicating the same (hot) file in multiple instances is pointless → you get it “for free”
Any “disk-server” can access every single device directly:
Storage Area Network via FC for disk-server to disk-controller interconnection (usually a device/LUN is some kind of RAID array)
In a few words, all the servers share the same disks, but each server is primarily responsible for serving just some disks to the computing clients via Ethernet
If a server fails, any other server in the SAN can take over the duties of the failed server, since it has direct access to its disks
All filesystem metadata are saved on disk along with the data
Data and metadata are treated symmetrically, striping blocks of metadata across several disks and servers as if they were data blocks
No need for external catalogues/DBs: it is a true filesystem
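The two points above (striping plus shared SAN access) are what make server failover transparent. The toy sketch below is not GPFS code; NSD names, servers and the block size are invented, and it only illustrates round-robin block placement and the take-over of a failed server’s disks.

```python
# Toy illustration (not GPFS code) of striping and failover: file blocks are
# placed round-robin across the NSDs, and since every server sees every disk
# on the SAN, a surviving server can serve the disks of a failed one.
BLOCK_SIZE = 1 << 20        # 1 MiB blocks, purely illustrative

nsds = ["nsd01", "nsd02", "nsd03", "nsd04"]                # disks/LUNs on the SAN
primary_server = {"nsd01": "srv-a", "nsd02": "srv-b",
                  "nsd03": "srv-a", "nsd04": "srv-b"}
alive = {"srv-a": True, "srv-b": True}

def block_location(file_offset):
    """Round-robin placement: which NSD holds the block at this offset."""
    block_index = file_offset // BLOCK_SIZE
    return nsds[block_index % len(nsds)]

def serving_server(nsd):
    """Primary server if alive, otherwise any surviving server (shared SAN)."""
    srv = primary_server[nsd]
    if alive[srv]:
        return srv
    return next(s for s, up in alive.items() if up)

alive["srv-b"] = False                      # simulate a server failure
for offset in range(0, 4 * BLOCK_SIZE, BLOCK_SIZE):
    nsd = block_location(offset)
    print(f"offset {offset}: {nsd} served by {serving_server(nsd)}")
```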

Some GPFS key features
Very powerful interface (command line only, no other way to do it) for configuring, administering and monitoring the system
In our experience this is the key feature which allowed us to keep the manpower needed to administer the system minimal: 1 FTE to control every operation (and scaling with increasing volumes is quite flat)
It needs, however, some training to start up; it is not plug and pray… but the documentation is huge and covers (almost) every relevant detail
100% POSIX compliant by design
Limited amount of HW resources needed (see later for an example)
Support for cNFS filesystem export to clients (a parallel NFS server solution with full HA capabilities developed by IBM)
Stateful connections between “clients” and “servers” are kept alive behind the data access (file) protocol: no need for things like “reconnect” at the application level
Native HSM capabilities (not only for tape, but also for multi-tiered disk storage)

GEMSS in production for CMS
GEMSS went into production for CMS in October 2009 without major changes to the layout; only a StoRM upgrade, with checksum and authz support, being deployed soon
Good performance achieved in transfer throughput: high use of the available bandwidth (up to 8 Gbps)
[Plots: transfer throughput, CNAF ➝ T1_US_FNAL and CNAF ➝ T2_CH_CAF]
Verification with Job Robot jobs in different periods shows that CMS workflow efficiency was not impacted by the change of storage system (“Castor + SL4” vs “TSM + SL4” vs “TSM + SL5”)
As from the current experience, CMS gives very positive feedback on the new system
Very good stability observed so far