The Italian Tier-1: INFN-CNAF. Andrea Chierici, on behalf of the INFN Tier1. 3rd April 2006, Spring HEPiX.

Introduction
Location: INFN-CNAF, Bologna (Italy)
- One of the main nodes of the GARR network
- Hall in the basement (floor -2): ~1000 m² of total space
- Easily accessible by lorries from the road
- Not suitable for office use (remote control mandatory)
Computing facility for the INFN HENP community
- Participating in the LCG, EGEE and INFNGRID projects
Multi-experiment Tier-1 (22 VOs, including the LHC experiments, CDF, BaBar and others)
- Resources are assigned to experiments on a yearly basis

Infrastructure (1)
Electric power system (1250 kVA)
- UPS: 800 kVA (~640 kW), needs a separate room
  - Not used for the air-conditioning system
- Electric generator: 1250 kVA (~1000 kW)
  - Theoretically suitable for up to 160 racks (~100 with 3.0 GHz Xeons)
- 220 V single-phase for computers
  - 4 x 16 A PDUs needed for 3.0 GHz Xeon racks
- 380 V three-phase for other devices (tape libraries, air conditioning, etc.)
- Expansion under evaluation
The main challenge is the electrical/cooling power needed in 2010
- Currently mostly Intel at ~110 W/kSpecInt, with a quasi-linear increase in W/SpecInt
- Next-generation chip consumption is ~10% less (e.g. Opteron Dual Core: a factor less?)
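A quick cross-check of these figures (a minimal Python sketch; the 0.8 power factor is inferred from the kVA/kW pairs quoted above, and the per-box wattage is only indicative):

    # Rough power-budget arithmetic implied by the slide's figures.
    POWER_FACTOR = 0.8  # inferred from 800 kVA -> ~640 kW and 1250 kVA -> ~1000 kW

    ups_kw = 800 * POWER_FACTOR          # ~640 kW, as quoted
    generator_kw = 1250 * POWER_FACTOR   # ~1000 kW, as quoted

    # Per-rack budgets implied by "160 racks" vs "~100 racks of 3.0 GHz Xeons".
    generic_rack_kw = generator_kw / 160  # ~6.3 kW per generic rack
    xeon_rack_kw = generator_kw / 100     # ~10 kW per 3.0 GHz Xeon rack

    # A ~36-node Xeon rack at 10 kW means ~280 W per dual-CPU box, consistent
    # with the 4 x 16 A PDU feed at 220 V (~14 kVA) quoted above.
    pdu_kva = 4 * 16 * 220 / 1000
    print(f"{generic_rack_kw:.1f} kW/rack generic, {xeon_rack_kw:.0f} kW/rack Xeon, "
          f"PDU feed {pdu_kva:.1f} kVA")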

Infrastructure (2)
Cooling
- RLS (Airwell) units on the roof
  - ~530 kW cooling power
  - Water cooling; a "booster pump" is needed (20 m from the T1 hall to the roof)
  - Noise insulation needed on the roof
- 1 UTA (air-conditioning unit): 20% of the RLS cooling power, also controls humidity
- 14 UTLs (local cooling systems) in the computing room (~30 kW each)
New control and alarm systems (including cameras to monitor the hall), covering:
- Circuit cold-water temperature
- Hall temperature
- Fire
- Electric power transformer temperature
- UPS, UTL, UTA

Typical WN Rack Composition
- Power controls (3U)
  - Power switches
- 1 network switch (1-2U)
  - 48 FE copper interfaces
  - 2 GE fibre uplinks
- ~36 1U WNs
  - Connected to the network switch via FE
  - Connected to the KVM system

Remote Console Control
Paragon UTM8 (Raritan)
- 8 analog (UTP/fibre) output connections
- Supports up to 32 daisy chains of 40 nodes (UKVMSPD modules needed)
- IP-Reach (expansion to support IP transport) evaluated but not used
- Used to control WNs
Autoview 2000R (Avocent)
- 1 analog + 2 digital (IP transport) output connections
- Supports connections to up to 16 nodes
  - Optional expansion to 16x8 nodes
- Compatible with Paragon ("gateway" to IP)
- Used to control servers
IPMI
- New acquisitions (Sun Fire V20z) have IPMI v2.0 built in; IPMI is expected to take over from the other remote console methods in the medium term (see the sketch below)
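As an illustration of why IPMI can replace dedicated KVM hardware, here is a minimal sketch driving a node's BMC with the standard ipmitool CLI from Python; the host and credentials are placeholders, not the site's actual setup:

    import subprocess

    BMC_HOST = "wn042-sp.example"   # hypothetical service-processor address
    USER, PASSWORD = "admin", "secret"

    def ipmi(*args: str) -> str:
        """Run an ipmitool command against the node's BMC over the LAN."""
        cmd = ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
               "-U", USER, "-P", PASSWORD, *args]
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout

    print(ipmi("chassis", "power", "status"))   # e.g. "Chassis Power is on"
    # ipmi("chassis", "power", "cycle")         # remote power-cycle, no KVM needed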

Power Switches
2 models used:
- "Old": APC MasterSwitch Control Unit AP9224, controlling 3 x 8-outlet AP9222 PDUs from 1 Ethernet port
- "New": APC PDU Control Unit AP7951, controlling 24 outlets from 1 Ethernet port
  - "Zero" rack units (vertical mount)
Access to the configuration/control menu via serial/telnet/web/SNMP
Dedicated machine using the APC Infrastructure Manager software
- Permits remote switching-off of resources in case of serious problems (sketched below)
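A minimal sketch of that remote switch-off over SNMP, shelling out to net-snmp's snmpset; the host and community string are placeholders, and the PowerNet-MIB outlet-control OID is quoted from memory, so verify it against the PDU's MIB before relying on it:

    import subprocess

    PDU_HOST = "pdu-rack01.example"   # hypothetical PDU hostname
    COMMUNITY = "private"             # site-specific write community
    SPDU_OUTLET_CTL = ".1.3.6.1.4.1.318.1.1.4.4.2.1.3"  # assumed APC OID
    OFF = "2"                         # 1 = on, 2 = off, 3 = reboot (PowerNet-MIB)

    def switch_off(outlet: int) -> None:
        """Turn off one outlet; raises CalledProcessError on SNMP failure."""
        oid = f"{SPDU_OUTLET_CTL}.{outlet}"
        subprocess.run(["snmpset", "-v1", "-c", COMMUNITY, PDU_HOST,
                        oid, "i", OFF], check=True)

    switch_off(7)   # e.g. outlet 7 of a 24-outlet AP7951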

Networking (1)
Main network infrastructure based on optical fibres (~20 km)
The LAN has a "classical" star topology with 2 core switch/routers (ER16, BD)
- Migration soon to the Black Diamond, with 120 GE and 12 x 10 GE ports (it can scale up to 480 GE or 48 x 10 GE)
- Each CPU rack equipped with an FE switch with 2 x Gb uplinks to the core switch
- Disk servers connected via GE to the core switch (mainly fibre)
  - Some servers connected with copper cables to a dedicated switch
- VLANs defined across switches (802.1q)

Networking (2)
30 rack switches (14 of them 10 Gb ready): several brands, homogeneous characteristics
- 48 copper Ethernet ports
- Support for the main standards (e.g. 802.1q)
- 2 Gigabit uplinks (optical fibres) to the core switch (oversubscription estimated below)
CNAF is interconnected to the GARR-G backbone at 1 Gbps, plus 10 Gbps for SC4
- GARR Giga-PoP co-located
- SC link at 10 Gbps
- New access router (Cisco 7600 with 4 x 10 GE and 4 x GE interfaces) just installed
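With ~36 WNs on Fast Ethernet behind 2 x 1 GE uplinks, the per-rack oversubscription is easy to bound (simple arithmetic on the figures above):

    wn_per_rack = 36
    fe_mbps = 100
    uplink_mbps = 2 * 1000

    edge_capacity = wn_per_rack * fe_mbps          # 3600 Mb/s of WN NICs
    oversubscription = edge_capacity / uplink_mbps
    print(f"{oversubscription:.1f}:1 oversubscription per rack")   # 1.8:1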

WAN Connectivity
[Diagram: the CNAF LAN, behind the BD core switch and the new Cisco 7600 access router, connects to the GARR Juniper router; a 1 Gbps (10 Gbps soon) default link towards the GARR/GEANT backbone and a dedicated 10 Gbps LHCOPN link.]

Hardware Resources
CPU:
- ~600 Xeon bi-processor boxes, 2.4 - 3 GHz
- 150 Opteron bi-processor boxes, 2.6 GHz
- ~1600 kSI2k total
- Decommissioned ~100 WNs (~150 kSI2k), moved to the test farm
- New tender ongoing (800 kSI2k) - expected delivery Fall 2006
Disk:
- FC, IDE, SCSI, NAS technologies
- 470 TB raw (~430 FC-SATA)
- 2005 tender: 200 TB raw
- Requested approval for a new tender (400 TB) - expected delivery Fall 2006
Tapes (capacity check below):
- STK L180: 10 TB (LTO-1)
- STK L5500: LTO-2 with 2000 tapes -> 400 TB; 9940B with 800 tapes -> 160 TB
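The tape totals follow directly from the 200 GB native capacity per cartridge:

    lto2_tb = 2000 * 200 / 1000    # 2000 LTO-2 tapes -> 400 TB
    tb_9940b = 800 * 200 / 1000    # 800 9940B tapes  -> 160 TB
    print(lto2_tb, tb_9940b)       # 400.0 160.0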

CPU Farm
Farm installation and upgrades centrally managed by Quattor
1 general-purpose farm (~750 WNs, 1600 kSI2k)
- SLC 3.0.x, LCG 2.7
- Batch system: LSF 6.1
- Accessible both from the Grid and locally
- ~2600 CPU slots available
  - 4 CPU slots per Xeon bi-processor box (with HT)
  - 3 CPU slots per Opteron bi-processor box
- 22 experiments currently supported
  - Including special queues like infngrid, dteam, test, guest
- 24 InfiniBand-based WNs for MPI on a special queue
Test farm on phased-out hardware (~100 WNs, 150 kSI2k)

LSF
At least one queue per experiment
- Run and CPU limits configured for each queue
Pre-exec script with report
- Verifies software availability and disk space on the execution host on demand
Scheduling based on fairshare (see the sketch below)
- Cumulative CPU-time history (30 days)
- No resources granted
Inclusion of legacy farms completed
- Maximization of CPU-slot usage
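To make the fairshare idea concrete, here is a minimal sketch: each experiment's dispatch priority falls as its accumulated 30-day CPU time grows, so shares recover when a group goes idle. This illustrates the concept only, not LSF's actual formula; the experiment names and numbers are hypothetical.

    def dynamic_priority(shares: float, cpu_hours_30d: float,
                         cpu_time_factor: float = 0.01) -> float:
        """More accumulated CPU time -> lower priority; idle groups recover."""
        return shares / (1.0 + cpu_time_factor * cpu_hours_30d)

    usage = {"cms": 120_000.0, "atlas": 80_000.0, "babar": 5_000.0}  # CPU hours
    shares = {"cms": 100, "atlas": 100, "babar": 50}

    # Dispatch order: highest current dynamic priority first.
    order = sorted(usage, key=lambda vo: dynamic_priority(shares[vo], usage[vo]),
                   reverse=True)
    print(order)   # ['babar', 'atlas', 'cms']: babar has used least vs its share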

Farm Usage
[Plots: farm usage over the last month and the last day against the ~2600 available CPU slots.]
See the presentation on monitoring and accounting on Wednesday for more details.

User Access
T1 users are managed by a centralized system based on Kerberos (authentication) and LDAP (authorization)
Users are granted access to the batch system if they belong to an authorized Unix group (i.e. experiment/VO); a lookup sketch follows below
- Groups centrally managed with LDAP
- One group for each experiment
Direct user logins are not permitted on the farm
- Access from the outside world via dedicated hosts
A new anti-terrorism law is making access to resources more complicated to manage
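A hedged sketch of that group-based authorization check, using the ldap3 Python library; the server URI, base DN and posixGroup schema are assumptions for illustration, not the actual CNAF directory layout:

    from ldap3 import Server, Connection

    LDAP_URI = "ldap://ldap.cnaf.example"      # hypothetical
    GROUP_BASE = "ou=Group,dc=example,dc=it"   # hypothetical base DN

    def authorized_experiments(username: str) -> list:
        """Return the experiment groups the user belongs to."""
        conn = Connection(Server(LDAP_URI), auto_bind=True)  # anonymous bind
        conn.search(GROUP_BASE,
                    f"(&(objectClass=posixGroup)(memberUid={username}))",
                    attributes=["cn"])
        return [str(entry.cn) for entry in conn.entries]

    # Batch access is granted only if the user belongs to at least one
    # experiment/VO group.
    print(authorized_experiments("jdoe"))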

Grid Access to the INFN Tier-1 Farm
Tier-1 resources can still be accessed both locally and via the Grid
- Actively discouraging local access
The Grid gives the opportunity to transparently access not only the Tier-1 but also other INFN resources
- You only need a valid X.509 certificate
  - INFN-CA (http://security.fi.infn.it/CA/) for INFN people
- Request access on a Tier-1 UI
- More details at http://grid-it.cnaf.infn.it/index.php?jobsubmit&type=1

Storage: Hardware (1)
[Diagram: Linux SL 3.0 client nodes access the storage over the WAN or Tier-1 LAN via NFS, RFIO and GridFTP; H.A. diskservers with Qlogic 2340 FC HBAs and the CASTOR HSM servers sit in front of two SANs (SAN 1: 200 TB, SAN 2: 40 TB), a NAS tier (20 TB) and the HSM (400 TB).]
Main components:
- NAS: 2 Procom 3600 FC NAS units; NAS1/NAS4 3ware IDE SAS units
- SAN fabric: 2 Brocade Silkworm FC switches; 2 Gadzoox Slingshot FC switches
- Arrays: Axus Browie (~2200 GB, 2 FC interfaces); STK BladeStore (4 FC interfaces); Infortrend 4 x 3200 GB SATA A16F-R1A2-M1; IBM FastT900 (DS4500, 4 FC interfaces); Infortrend 5 x 6400 GB SATA A16F-R1211-M2 + JBOD
- Tape: STK180 with 100 LTO-1 (10 TB native); STK L5500 robot (5500 slots) with 6 IBM LTO-2 and 4 STK 9940B drives
- W2003 Server with Legato Networker for backup

Storage: Hardware (2)
All problems now solved (after many attempts!)
- Firmware upgrade
- Aggregate throughput of 300 MB/s for each FlexLine
Current setup:
- 16 diskservers with dual Qlogic 2340 FC HBAs: Sun Fire V20z, dual Opteron 2.6 GHz, 4 x 1 GB DDR 400 MHz RAM, 2 x 73 GB 10k U320 SCSI disks
- Brocade Director FC switch (fully licensed) with 64 ports (out of 128)
- 4 FlexLine 600 arrays with 200 TB raw (150 TB usable, RAID5), redundant 2 Gb connections to the switch

Disk Access
[Diagram: FARM racks reach 50 TB storage units through generic diskservers over the WAN or Tier-1 LAN; GB Ethernet carries the NFS, rfio, xrootd, GPFS and GridFTP traffic.]
- Generic diskserver: Supermicro 1U, 2 Xeon 3.2 GHz, 4 GB RAM, GB Ethernet, 1 or 2 Qlogic 2300 HBAs, Linux AS or CERN SL 3.0
- 2 Brocade Silkworm FC switches, zoned; each 50 TB unit served by 4 diskservers with 1 or 2 x 2 Gb FC connections each, plus 2 x 2 Gb interlink connections
- 50 TB IBM FastT900 (DS4500): dual redundant controllers (A, B) with internal mini-hubs (1, 2) and 2 Gb FC connections; each controller sustains at most ~120 MB/s read/write
- FC path failover HA: Qlogic SANsurfer; IBM or STK RDAC for Linux
- Application HA: NFS and rfio servers with Red Hat Cluster AS 3.0 (tested but not used in production yet); GPFS with primary/secondary NSD configuration, e.g. /dev/sda: primary diskserver 1, secondary diskserver 2; /dev/sdb: primary diskserver 2, secondary diskserver 3 (the pattern is sketched below)
- Storage exported as RAID5 logical-disk LUNs: LUN0 => /dev/sda, LUN1 => /dev/sdb, ...
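The primary/secondary pattern above is regular enough to generate; a minimal sketch (server names hypothetical) of how each LUN maps to a primary diskserver and the next one as backup, wrapping around the 4 diskservers of a 50 TB unit:

    diskservers = ["ds1", "ds2", "ds3", "ds4"]                # 4 per 50 TB unit
    luns = [f"/dev/sd{chr(ord('a') + i)}" for i in range(8)]  # LUN0 -> /dev/sda, ...

    assignment = {
        lun: {"primary": diskservers[i % 4],
              "secondary": diskservers[(i + 1) % 4]}          # next server as backup
        for i, lun in enumerate(luns)
    }
    for lun, servers in assignment.items():
        print(lun, servers)   # /dev/sda: primary ds1, secondary ds2; etc.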

CASTOR HMS System (1)
STK L5500 library
- 6 x LTO-2 drives
- 4 x 9940B drives
- 1300 LTO-2 (200 GB) tapes
- 9940B (200 GB) tapes
Access
- The CASTOR file system hides the tape level
- Native access protocol: rfio (a usage sketch follows)
- SRM interface for the Grid fabric available (rfio/gridftp)
Disk staging area
- Data are migrated to tape and deleted from the staging area when it fills up
Migration to CASTOR-2 ongoing
- CASTOR-1 support ends around September 2006
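For flavour, a hedged sketch of the native rfio access path driven from Python: rfcp recalls the file to the disk staging area if needed, then streams it locally. The CASTOR namespace path and stager host are hypothetical, and the STAGE_HOST convention should be checked against the local CASTOR-1 setup.

    import os
    import subprocess

    env = dict(os.environ, STAGE_HOST="castor-stager.cnaf.example")  # assumed
    castor_path = "/castor/cnaf.infn.it/grid/lhcb/some/file.dat"     # hypothetical

    # rfcp triggers a tape recall to the staging area if the file is not
    # already staged, then copies it to the local destination.
    subprocess.run(["rfcp", castor_path, "/tmp/file.dat"], env=env, check=True)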

CASTOR HMS System (2)
[Diagram: full redundancy on the FC 2 Gb/s paths (dual-controller hardware plus Qlogic SANsurfer path-failover software) between the servers and SAN 1 / SAN 2; clients arrive over the WAN or Tier-1 LAN.]
- 8 or more rfio diskservers (RH AS 3.0) with a staging area of at least 20 TB on SAN 1, point-to-point FC 2 Gb/s connections
- 8 tapeservers (Linux RH AS 3.0, Qlogic HBAs)
- CASTOR (CERN) central services server (RH AS 3.0)
- 1 Oracle 9i rel. 2 DB server (RH AS 3.0)
- 6 stagers with diskserver (RH AS 3.0) and a local staging area on SAN 2
- STK L5500: 5500 mixed slots, 6 LTO-2 drives (20-30 MB/s), 4 9940B drives (25-30 MB/s), 1300 LTO-2 (200 GB native) and 9940B (200 GB native) tapes
- Sun Blade V100 (2 internal IDE disks in software RAID-0) running ACSLS 7.0 on Solaris 9.0

Other Storage Activities
dCache testbed currently deployed
- 4 pool servers with about 50 TB
- 1 admin node
- 34 clients
- 4 Gbit/s uplink
GPFS currently under stress test
- Focusing on [LHCb] analysis jobs, submitted to the production batch system: ca. 500 jobs in simultaneous run state, all jobs completed successfully; 320 MB/s effective I/O throughput
- IBM support options still unclear
See the presentation on GPFS and StoRM in the file system session.
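Dividing those figures out shows how modest the per-job demand was:

    aggregate_mb_s = 320
    concurrent_jobs = 500
    per_job_kb_s = aggregate_mb_s / concurrent_jobs * 1000
    print(f"~{per_job_kb_s:.0f} kB/s sustained per analysis job")   # ~640 kB/s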

DB Service
Active collaboration with the 3D project
One 4-node Oracle RAC (test environment)
- OCFS2 functional tests
- Benchmark tests with Orion and HammerOra
Two 2-node production RACs (LHCb and ATLAS)
- Shared storage accessed via ASM; 2 Dell PowerVault 224F, 2 TB raw
CASTOR-2: 2 single-instance DBs (DLF and CastorStager)
One 2.4 GHz Xeon with a single-instance database for Streams replication tests on the 3D testbed
Starting deployment of LFC, FTS and VOMS read-only replicas