INFN-T1 site report Andrea Chierici, Vladimir Sapunenko On behalf of INFN-T1 staff HEPiX Spring 2012

Outline: Facilities, Network, Farming, Grid and Middleware, Storage, User experience

INFN-Tier1 in numbers: a 1000 m² room with capacity for more than 120 racks and several tape libraries; 5 MVA of electrical power; redundant facilities providing 24x7 availability. By May (thanks to the new 2012 tenders) even more resources: 1300 servers with more than 9,000 cores available, 11 PB of disk space and 14 PB on tape. Aggregate bandwidth to storage: ~50 GB/s. WAN connectivity at 30 Gbit/s, of which 2x10 Gbit/s over the OPN; a further bandwidth increase is expected with the forthcoming GARR-X network. More than 20 supported experiments, ~20 FTE (currently evaluating some positions).

Facilities

Facility overview: chiller floor, CED (new room), electrical delivery, UPS room.

Tier1 power (5 days)

APC blocks temp./hum. (5 days)

Network

Current WAN connections (Q1 2012). The diagram shows the CNAF Nexus (T1 resources) connected to the GARR 7600 router: the LHC-OPN (20 Gb/s, carrying T0-T1 plus T1-T1 traffic in sharing) towards CERN and the other Tier-1s (RAL, PIC, TRIUMF, BNL, FNAL, TW-ASGC, NDGF), a cross-border fiber from Milan (CNAF-KIT, CNAF-IN2P3, CNAF-SARA) as T0-T1 backup, and the general-purpose WAN (10 Gb/s) serving the T1-T2s and LHCONE. In summary: a 20 Gb physical link (2x10 Gb) for LHC-OPN and LHCONE connectivity and a 10 Gigabit link for general IP connectivity. LHCONE and LHC-OPN currently share the same physical ports but are managed as two completely different links (different VLANs are used for the point-to-point interfaces). All the Tier-2s which are not connected to LHCONE are reached only via general IP.

Forthcoming WAN connection (Q4 2012): a new 10 Gb/s link dedicated to LHCONE will be added. The planned layout: a 20 Gb physical link (2x10 Gb) for LHC-OPN and LHCONE connectivity, a 10 Gigabit physical link to LHCONE (dedicated to T1-T2 LHCONE traffic), and a 10 Gigabit link for general IP connectivity; the rest of the topology is unchanged with respect to Q1 2012.

Farming

Computing resources: currently 120K HS06, corresponding to ~9000 job slots. The new tender will add 31.5K HS06 (~4000 potential new job slots): 41 enclosures with 192 HS06 per motherboard. We also host other sites: the LHCb T2 and the UniBO T3.
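
To see how these numbers hang together, here is a back-of-the-envelope cross-check in Python (a sketch only: the four-motherboards-per-enclosure figure and the dual 12-core CPU configuration are assumptions taken from the tender-machine slide that follows):

```python
# Back-of-the-envelope cross-check of the tender figures quoted above.
# Assumptions: each 2U "Twin square" enclosure holds 4 motherboards
# (consistent with the 12 drive trays, 3 per node, on the next slide)
# and each motherboard carries 2 x 12-core Opteron 6238 CPUs.

enclosures = 41
mobos_per_enclosure = 4
hs06_per_mobo = 192
cores_per_mobo = 2 * 12

total_hs06 = enclosures * mobos_per_enclosure * hs06_per_mobo
total_cores = enclosures * mobos_per_enclosure * cores_per_mobo

print(f"tender HS06: {total_hs06}")   # 31488, i.e. the ~31.5K HS06 quoted
print(f"job slots  : {total_cores}")  # 3936, i.e. the ~4000 potential new slots
```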

New tender machine: 2U Supermicro Twin square. Chassis 827H-R1400B: (1+1) redundant power supply, 12x 3.5" hot-swap SAS/SATA drive trays (3 for each node), hot-swappable motherboard modules. Motherboard H8DGT-HF: dual AMD Opteron 6000 series processors, AMD SR/SP5100 chipset, dual-port Gigabit Ethernet, 6x SATA2 3.0 Gbps ports via the AMD SP5100 controller (RAID 0, 1, 10), 1x PCI-e 2.0 x16. Per node: AMD Opteron 6238 CPUs (12 cores, 2.6 GHz), 2x 2 TB 3.5" SATA hard disks, 40 GB of RAM, 80 PLUS Gold power supply.


Issues for the future: we would like to discuss the problems raised by many-core architectures: one job per core, RAM per core, a single Gigabit Ethernet link per box, local disk I/O. Also Green Computing: how to evaluate it during tender procedures.
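
To make the many-core concern concrete, here is a minimal sketch of the per-job share of RAM and network bandwidth, assuming the tender node described above is representative and one job runs per core:

```python
# Per-job share of RAM and network on a many-core box, assuming one job
# per core. Node figures are taken from the tender slide above: 2 x 12-core
# Opteron 6238, 40 GB of RAM, a single Gigabit Ethernet link per box.

cores = 2 * 12
ram_gb = 40
nic_mbit = 1000                      # one GbE port shared by the whole node

ram_per_job_gb = ram_gb / cores      # ~1.7 GB per job
net_per_job_mbit = nic_mbit / cores  # ~42 Mbit/s per job if all jobs stream at once

print(f"RAM per job    : {ram_per_job_gb:.2f} GB")
print(f"network per job: {net_per_job_mbit:.1f} Mbit/s")
```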

Grid and Middleware

Middleware status: several EMI nodes deployed (UIs, CreamCEs, Argus, BDII, FTS, StoRM, WNs). Legacy gLite 3.1 has been phased out almost completely; we plan to migrate the remaining gLite 3.2 nodes to EMI by the end of summer. ATLAS and LHCb switched to CVMFS for the software area (facing small problems with both); tests are ongoing on a CVMFS server for SuperB.
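
As an aside, a minimal sketch (not from the slides) of how a worker-node administrator might list the CVMFS software areas currently mounted; the repository names printed (e.g. atlas.cern.ch, lhcb.cern.ch) depend on the node configuration:

```python
# Quick check of which CVMFS software areas are mounted on a worker node.
# CVMFS repositories are mounted under /cvmfs/<repository>.

def mounted_cvmfs_repos(mounts_file="/proc/mounts"):
    repos = []
    with open(mounts_file) as f:
        for line in f:
            mountpoint = line.split()[1]
            if mountpoint.startswith("/cvmfs/"):
                repos.append(mountpoint[len("/cvmfs/"):])
    return repos

if __name__ == "__main__":
    for repo in mounted_cvmfs_repos():
        print(repo)
```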

WNoDeS, current status: WNoDeS is deployed in production for two VOs. ALICE has no need for direct access to local storage; Auger needs a customized environment requiring direct access to a MySQL server. The version of WNoDeS currently deployed in production uses the GPFS/NFS gateway.

WNoDeS, development: WNoDeS will be distributed with EMI 2. A new feature called "mixed mode" avoids statically allocating resources to WNoDeS: a job can be executed on a virtual resource dynamically instantiated by WNoDeS, while the same resources (the same hypervisor) can be used to execute standard jobs. There is no resource overbooking and, for now, no hard resource limit enforcement, which will later be provided using cgroups. General improvements as well.
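
As an illustration of the kind of enforcement cgroups allow (a sketch under stated assumptions, not WNoDeS code):

```python
# Illustrative sketch only, not WNoDeS code: confine a job process under a
# cgroup-v1 memory limit. Assumes the memory controller is mounted at
# /sys/fs/cgroup/memory and that the script runs with root privileges.
import os
import subprocess

def run_with_memory_limit(cmd, job_id, limit_bytes):
    cgroup = f"/sys/fs/cgroup/memory/wnodes-demo/{job_id}"
    os.makedirs(cgroup, exist_ok=True)
    with open(os.path.join(cgroup, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

    def enter_cgroup():
        # Runs in the child just before exec(): join the job's cgroup.
        with open(os.path.join(cgroup, "tasks"), "w") as f:
            f.write(str(os.getpid()))

    return subprocess.call(cmd, preexec_fn=enter_cgroup)

# Hypothetical usage: cap a job wrapper at 2 GB of resident memory.
# run_with_memory_limit(["/bin/sh", "job_wrapper.sh"], "job001", 2 * 1024**3)
```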

Docet (Data Operation Center Tool): a DB-based web tool designed and implemented internally and in use at various INFN sites. Built on PostgreSQL and Tomcat+Java, with a command-line tool that relies on an XML-RPC Python web service. It provides an inventory for hardware, software, actions and docs. It was initially populated by grabbing and cross-relating information from heterogeneous authoritative sources (Quattor, DNS, DHCP, XLS sheets, plain-text files) through a set of custom Python scripts.
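
The slides do not show the Docet API, but a command-line client talking to an XML-RPC web service could look roughly like this (the endpoint URL and the method name are hypothetical):

```python
# Hypothetical sketch of a Docet-style command-line query over XML-RPC.
# The endpoint URL and the "list_hosts" method are invented for
# illustration; the real Docet API is not shown in the slides.
import xmlrpc.client

def list_hosts(status="production"):
    proxy = xmlrpc.client.ServerProxy("https://docet.example.cnaf.infn.it/rpc")
    return proxy.list_hosts({"status": status})

if __name__ == "__main__":
    for host in list_hosts():
        print(host)
```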

Docet (T1 tools), command-line tools: dhcpd.conf handling that preserves consistency (update from/to the Docet DB still to be activated); a command-line query/update tool providing re-configuration data for Nagios, add/list/update of HW failures, and support for batch operations (insert new HW, remove dismissed HW). We plan to deploy more Docet-based configuration tools.
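
A minimal sketch of what DB-driven dhcpd.conf generation might look like (the record fields and output layout are illustrative assumptions, not the actual Docet tool):

```python
# Illustrative sketch of rendering dhcpd.conf host entries from inventory
# records, in the spirit of the dhcpd.conf handling described above.
# The record fields ("name", "mac", "ip") are hypothetical.

HOST_TEMPLATE = """host {name} {{
    hardware ethernet {mac};
    fixed-address {ip};
}}"""

def render_dhcpd_hosts(records):
    # records: iterable of dicts with "name", "mac" and "ip" keys
    return "\n".join(HOST_TEMPLATE.format(**r) for r in records)

if __name__ == "__main__":
    demo = [{"name": "wn-001", "mac": "00:11:22:33:44:55", "ip": "10.10.0.11"}]
    print(render_dhcpd_hosts(demo))
```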

Docet (T1 tools)

Storage

Storage resources: 8.4 PB of on-line disk with GEMSS. Disk systems: 7 DDN S2A arrays (SATA for data, 300 GB SAS disks for metadata); 7 EMC² CX and CX4-960 arrays (1 TB disks); 2012 acquisition: 3 Fujitsu Eternus DX400 S2 (3 TB SATA). Servers: ~32 NSD servers (10 Gbps Ethernet) on the DDNs and ~60 NSD servers (1 Gbps Ethernet) on the EMC². Tape: an SL8500 library (14 PB on line) with 20 T10KB drives and 10 T10KC drives; 9000 x 1 TB tapes with 1 Gbps of bandwidth per drive and 1000 x 5 TB tapes with 2 Gbps of bandwidth per drive; drives interconnected to the library and to the TSM-HSM servers via a dedicated SAN (TAN); the TSM server is common to all GEMSS instances. All storage systems and disk servers are on the SAN (FC4/FC8).
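
A quick arithmetic cross-check of the quoted on-line tape capacity (assuming it is simply the sum of the two cartridge pools):

```python
# Cross-check of the quoted 14 PB on-line tape capacity.
t10kb_tapes, t10kb_tb = 9000, 1   # 9000 x 1 TB cartridges (T10KB drives)
t10kc_tapes, t10kc_tb = 1000, 5   # 1000 x 5 TB cartridges (T10KC drives)

total_tb = t10kb_tapes * t10kb_tb + t10kc_tapes * t10kc_tb
print(f"{total_tb / 1000:.0f} PB on line")   # -> 14 PB, as quoted above
```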

GEMSS (Grid Enabled Mass Storage System): the integration of GPFS, TSM and StoRM. Our choice is driven by the need to minimize management effort: very positive experience with scalability so far; a large GPFS installation in production at CNAF since 2005, with increasing disk space and number of users; over 8 PB of net disk space partitioned into several GPFS clusters served by fewer than 100 disk servers (NSD + GridFTP); 2 FTE employed to manage the full system; all experiments at CNAF (LHC and non-LHC) agreed to use GEMSS as their HSM.

GEMSS evolution: a disk-centric system with five building blocks: 1. GPFS, the disk-storage software infrastructure; 2. TSM, the tape management system; 3. StoRM, the SRM service; 4. the TSM-GPFS interface; 5. Globus GridFTP for WAN data transfers. New component in GEMSS: the DMAPI server, used to intercept READ events via the GPFS DMAPI and re-order recalls according to the position of the files on tape; the "preload library" is not needed anymore; available with GPFS v3.x.
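
The recall re-ordering idea can be sketched as follows (the request structure and the way tape positions are represented are assumptions; the actual DMAPI server implementation is not shown in the slides):

```python
# Illustrative sketch of re-ordering tape recalls, in the spirit of the
# DMAPI server described above: group pending READ requests by tape and
# sort them by position on tape, so each tape is read in a single pass.
# The (tape, position) metadata is assumed to be available per file.
from collections import defaultdict

def order_recalls(requests):
    # requests: list of dicts like {"path": ..., "tape": ..., "position": ...}
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req["tape"]].append(req)

    ordered = []
    for tape in sorted(by_tape):  # one tape at a time
        ordered.extend(sorted(by_tape[tape], key=lambda r: r["position"]))
    return ordered

demo = [
    {"path": "/gpfs/cms/f1", "tape": "T00042", "position": 310},
    {"path": "/gpfs/cms/f2", "tape": "T00007", "position": 12},
    {"path": "/gpfs/cms/f3", "tape": "T00042", "position": 5},
]
for r in order_recalls(demo):
    print(r["tape"], r["position"], r["path"])
```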

GEMSS timeline: D1T0 storage class with StoRM/GPFS since Nov. 2007 for LHCb and ATLAS; D1T1 storage class with StoRM/GPFS/TSM since May 2008 for LHCb; D0T1 storage class with StoRM/GPFS/TSM since Oct. 2009 for CMS. GEMSS is now used in production by all LHC and non-LHC experiments for all storage classes: ATLAS, ALICE, CMS and LHCb, together with other non-LHC experiments (Argo, Pamela, Virgo, AMS), have agreed to use GEMSS. The DMAPI server was introduced to support GPFS 3.3/3.4.

User experience

Resource usage per VO

Jobs

LHCb feedback: more than 1 million jobs executed (analysis: 600k, simulation: 400k). User analysis is not very efficient (about 50%) because it requests too much bandwidth; the bandwidth available to LHCb will be significantly raised with the 2012 pledges. Services were stable during the last year, with a small fraction of failed jobs and good data-access performance from both tape and disk.

CMS feedback: 5K jobs/day on average (50K jobs/day peak); up to 200 TB of data transferred per week, both in and out; recent tests proved that it is possible to work with 10 GB files in a sustained way.

ALICE feedback: 6.93×10⁶ KSI2k hours consumed in one year of very stable running. The cloud infrastructure based on WNoDeS provided excellent flexibility in catering to temporary requirements (e.g. large-memory queues for PbPb event reconstruction). CNAF holds 20% of the ALICE RAW data (530 TB on tape); all data are accessed through an XROOTD interface on top of the underlying GPFS+TSM file system.

Questions?