The Italian Tier-1: INFN-CNAF
11 October 2005
Luca dell'Agnello, Davide Salomoni

Introduction
Location: INFN-CNAF, Bologna (Italy)
– One of the main nodes of the GARR network
Computing facility for the INFN HENP community
– Participating in the LCG, EGEE and INFNGRID projects
Multi-experiment Tier-1
– LHC experiments
– VIRGO
– CDF
– BABAR
– AMS, MAGIC, ARGO, PAMELA, …
Resources assigned to experiments on a yearly basis

Infrastructure (1)
Hall in the basement (2nd underground floor): ~1000 m² of total space
– Easily accessible by lorries from the road
– Not suitable for office use (hence remote control)
New control and alarm systems under installation (including cameras to monitor the hall):
– Hall temperature
– Cold-water circuit temperature
– Electric power transformer temperature
Electric power system (1250 kVA; kVA-to-kW conversion sketched below)
– UPS: 800 kVA (~640 kW); needs a separate room (conditioned and ventilated); not used to feed the conditioning system
– Electric generator: 1250 kVA (~1000 kW), enough for up to 160 racks (~100 with 3.0 GHz Xeons)
– 220 V single-phase for computers; 4 x 16 A PDUs needed for the 3.0 GHz Xeon racks
– 380 V three-phase for other devices (tape libraries, air conditioning, etc.)
– Expansion under evaluation
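The kW values quoted next to the kVA ratings correspond to converting apparent power to real power with a power factor of about 0.8; the 0.8 figure is inferred from the slide's own numbers rather than stated explicitly:

```latex
P \approx S \cos\varphi:\qquad
800\ \mathrm{kVA} \times 0.8 \approx 640\ \mathrm{kW},\qquad
1250\ \mathrm{kVA} \times 0.8 \approx 1000\ \mathrm{kW}
```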

Infrastructure (2)
Cooling
– RLS (Airwell) chillers on the roof: ~530 kW cooling power; water cooling; a "booster pump" is needed (20 m from the T1 hall to the roof); noise insulation needed on the roof
– 1 air conditioning unit (uses 20% of the RLS cooling power and controls humidity)
– 14 local cooling systems (Hiross) in the computing room, ~30 kW each
Main challenge is the electric power needed in 2010 (see the estimate sketched below)
– Presently Intel Xeon: 110 W/kSpecInt, with a quasi-linear increase in W/SpecInt
– New Opteron: consumption ~10% less
– Opteron dual core: a further factor (~?) less
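A minimal sketch of the kind of projection behind this concern, using the 110 W/kSI2k figure from this slide and the ~1700 kSI2k installed capacity quoted later; the 2010-scale capacity used in the second call is purely illustrative, not a number from the slides:

```python
# Rough CPU power-draw projection from installed capacity (illustrative only).
WATT_PER_KSI2K = 110          # Intel Xeon figure quoted on this slide

def cpu_power_kw(capacity_ksi2k: float, watt_per_ksi2k: float = WATT_PER_KSI2K) -> float:
    """Return the estimated CPU electrical power in kW for a given capacity."""
    return capacity_ksi2k * watt_per_ksi2k / 1000.0

print(cpu_power_kw(1700))     # ~187 kW for the 2005 farm (~1700 kSI2k)
print(cpu_power_kw(10000))    # hypothetical 2010-scale capacity, ~1.1 MW
```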

Typical WN Rack Composition
Power controls (3U)
1 network switch (1-2U)
– 48 FE copper interfaces
– 2 GE fibre uplinks
~36 1U WNs
– Connected to the network switch via FE
– Connected to the KVM system

Remote console control
Paragon UTM8 (Raritan)
– 8 analog (UTP/fibre) output connections
– Supports up to 32 daisy chains of 40 nodes (UKVMSPD modules needed)
– IP-Reach (expansion to support IP transport) evaluated but not used
– Used to control WNs
Autoview 2000R (Avocent)
– 1 analog + 2 digital (IP transport) output connections
– Supports connections to up to 16 nodes; optional expansion to 16x8 nodes
– Compatible with Paragon ("gateway" to IP)
– Used to control servers

Power Switches
Two models used:
– "Old": APC MasterSwitch Control Unit AP9224, controlling 3 x 8-outlet 9222 PDUs from 1 Ethernet port
– "New": APC PDU Control Unit AP7951, controlling 24 outlets from 1 Ethernet port; "zero" rack units (vertical mount)
Access to the configuration/control menu via serial/telnet/web/SNMP (see the sketch below)
Dedicated machine running the APC Infrastructure Manager software
Permits remote switch-off of resources in case of serious problems
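As an illustration of the SNMP control path, a minimal sketch that drives one outlet through net-snmp's snmpset; the OID shown is the commonly used APC PowerNet-MIB sPDUOutletCtl column and, like the host name and community string, should be treated as an assumption rather than the configuration actually used at CNAF:

```python
import subprocess

# PowerNet-MIB sPDUOutletCtl (assumed OID); 1 = on, 2 = off, 3 = reboot.
SPDU_OUTLET_CTL = "1.3.6.1.4.1.318.1.1.4.4.2.1.3"

def set_outlet(pdu_host: str, outlet: int, state: int, community: str = "private") -> None:
    """Switch one outlet on an APC MasterSwitch/PDU via SNMP v1 (hypothetical host/community)."""
    oid = f"{SPDU_OUTLET_CTL}.{outlet}"
    subprocess.run(
        ["snmpset", "-v1", "-c", community, pdu_host, oid, "i", str(state)],
        check=True,
    )

# Example: power-cycle outlet 5 on a (hypothetical) rack PDU.
set_outlet("pdu-rack01.example.cnaf.infn.it", 5, 3)
```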

Networking
Main network infrastructure based on optical fibre (~20 km)
LAN has a "classical" star topology with 2 core switch/routers (ER16, BD)
– Migration to a Black Diamond with 120 GE and 12x10GE ports (it can scale up to 480 GE or 48x10GE)
– Each CPU rack equipped with an FE switch with 2 Gb uplinks to the core switch
– Disk servers connected via GE to the core switch (mainly fibre); some servers concentrated with copper cables on a dedicated switch
– VLANs defined across switches (802.1q)
30 rack switches (14 of them 10 Gb ready): several brands, homogeneous characteristics
– 48 copper Ethernet ports
– Support for the main standards (e.g. 802.1q)
– 2 Gigabit uplinks (optical fibres) to the core switch
CNAF interconnected to the GARR-G backbone at 1 Gbps + 2 x 1 Gbps for SC
– Giga-PoP co-located
– Upgrading the SC link to 10 Gbps
– New access router (Cisco 7600 with 4x10GE and 4xGE interfaces) under installation

LAN & WAN T1 connectivity (network diagram): CNAF production network and SC layout; labels in the figure include the GARR routers (L0, L1, L2), a 1 Gbps general Internet link, a 2 x 1 Gbps link aggregation for SC, n x 1 Gbps T1 links, a CNAF "backdoor", a /24 subnet and the 10 Gbps link expected in October.

HW Resources…..
CPU:
– 700 dual-processor boxes, 2.4-3 GHz (+70 servers)
– 150 new dual-processor Opteron boxes, 2.6 GHz (~2200 Euro each + VAT)
– 1700 kSI2k total
– Decommissioning: ~100 WNs (~150 kSI2k) moved to the test farm
– Tender for 800 kSI2k (Summer 2006)
Disk:
– FC, IDE, SCSI, NAS technologies
– 470 TB raw (~430 FC-SATA)
– 2005 tender: 200 TB raw (~2260 Euro/TB net + VAT)
– Tender for 400 TB (Summer 2006)
Tapes (capacity check below):
– STK L library: … TB
– STK library with LTO-2 (2000 tapes → 400 TB) and 9940B (800 tapes → 160 TB) media (1.5 kEuro/TB → 0.35 kEuro/TB)
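As a quick consistency check, the quoted tape capacities follow directly from the 200 GB per cartridge figure given on the storage slides for both media types:

```latex
2000 \times 200\ \mathrm{GB} = 400\ \mathrm{TB},\qquad
800 \times 200\ \mathrm{GB} = 160\ \mathrm{TB}
```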

….. Human Resources
~14 FTE available
– Farming service: 4 FTE
– Storage service: 5 FTE
– Logistic service: 2 FTE
– Network & Security service: 3 FTE

Farm(s)
1 general-purpose farm (~750 WNs, 1550 kSI2k)
– SLC 3.0.5, LCG 2.6, LSF
– Accessible both from the Grid and locally (see the submission sketch below)
– Also 16 InfiniBand WNs for MPI on a special queue
1 farm dedicated to CDF (~80 WNs)
– RH 7.3, Condor; migration to SLC and inclusion in the general farm planned
– CDF can also run on the general farm with glide-ins
Test farm (~100 WNs, 150 kSI2k)
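For the local (non-Grid) access path, a minimal sketch of a batch submission to one of the per-experiment LSF queues; the queue name, job script and log file are illustrative, not taken from the slides:

```python
import subprocess

def submit_job(queue: str, command: str, logfile: str) -> None:
    """Submit a job to the local LSF batch system with bsub (illustrative wrapper)."""
    subprocess.run(
        ["bsub", "-q", queue, "-o", logfile, command],
        check=True,
    )

# Example: submit a (hypothetical) CMS analysis script to the "cms" queue.
submit_job("cms", "./run_analysis.sh", "analysis_%J.log")
```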

Access to the batch system (diagram): "legacy" non-Grid access and Grid access paths; components shown: UI, Grid, CE, LSF, WN1…WNn, SE. A Grid-submission sketch follows below.
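For the Grid access path sketched in the diagram, a minimal example of a submission from an LCG 2.x-era UI; the edg-job-submit command and JDL attributes reflect that era's EDG/LCG workload management client, and the executable and file names are hypothetical:

```python
import subprocess

# A tiny JDL (Job Description Language) file for the EDG/LCG workload management system.
JDL = """\
Executable    = "run_analysis.sh";
StdOutput     = "analysis.out";
StdError      = "analysis.err";
InputSandbox  = {"run_analysis.sh"};
OutputSandbox = {"analysis.out", "analysis.err"};
"""

with open("analysis.jdl", "w") as f:
    f.write(JDL)

# Submit from the UI; the job eventually reaches a CE and then LSF, as in the diagram.
subprocess.run(["edg-job-submit", "analysis.jdl"], check=True)
```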

Farming tasks
Installation & management of Tier1 WNs and servers
– Using Quattor for deployment & configuration of the OS & LCG middleware
– HW maintenance management
Management of the batch scheduler (LSF)
– Migration from Torque+Maui to LSF (v6.1) last spring: Torque+Maui apparently not scalable; the LSF farm is running successfully
– Fair-share model for resource access: (at least) 1 queue per experiment; queue policies under evaluation
– Progressive inclusion of the CDF farm into the general one
Access to resources centrally managed with Kerberos (authentication) and LDAP (authorization)
– Group-based authorization (see the lookup sketch below)
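A minimal sketch of a group-based authorization lookup against the LDAP server, using the python-ldap module; the server URI, base DN, group and user names are hypothetical placeholders modelled on the tree shown on the next slide, not the actual CNAF directory layout:

```python
import ldap  # python-ldap

def user_in_group(username: str, group_cn: str) -> bool:
    """Check whether a user belongs to a given POSIX group in LDAP (hypothetical DIT)."""
    conn = ldap.initialize("ldap://ldap.example.cnaf.infn.it")
    conn.simple_bind_s()  # anonymous bind, assuming group entries are world-readable
    results = conn.search_s(
        "ou=group,o=cnaf,c=it",          # assumed branch, loosely based on the next slide
        ldap.SCOPE_SUBTREE,
        f"(&(cn={group_cn})(memberUid={username}))",
        ["cn"],
    )
    return len(results) > 0

# Example: is (hypothetical) user "lsfuser01" allowed to submit to the cms queue?
print(user_in_group("lsfuser01", "cms"))
```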

Authorization with LDAP (diagram of the directory tree): root c=it, with branches o=infn (public view; AFS cell infn.it; generic INFN users) and o=cnaf (private view; CNAF users); organizational units shown include ou=afs, ou=people, ou=group, ou=role, ou=automount and ou=people-nologin.

The queues (LSF bqueues output)
QUEUE_NAME    PRIO STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
dteam          200 Open:Active
babar_test      80 Open:Active
babar_build     80 Open:Active
alice           40 Open:Active
cdf             40 Open:Active
atlas           40 Open:Active
cms             40 Open:Active
cms_align       40 Open:Active
lhcb            40 Open:Active
babar_xxl       40 Open:Active
babar_objy      40 Open:Active
babar           40 Open:Active
virgo           40 Open:Active
argo            40 Open:Active
magic           40 Open:Active
ams             40 Open:Active
infngrid        40 Open:Active
pamela          40 Open:Active
quarto          40 Open:Active
guest           40 Open:Active
test            40 Open:Active
geant4          30 Open:Active
biomed          10 Open:Active
pps             10 Open:Active
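A minimal sketch of how such a listing can be collected programmatically, for example to feed the monitoring/accounting interfaces described later; it simply parses the default bqueues output and assumes the standard column layout shown above:

```python
import subprocess

def queue_status() -> dict:
    """Return {queue_name: (priority, status)} parsed from the default `bqueues` listing."""
    out = subprocess.run(["bqueues"], capture_output=True, text=True, check=True).stdout
    status = {}
    for line in out.splitlines()[1:]:          # skip the header line
        fields = line.split()
        if len(fields) >= 3:
            status[fields[0]] = (int(fields[1]), fields[2])
    return status

# Example: print the priority and state of every experiment queue.
for name, (prio, state) in queue_status().items():
    print(f"{name:<12} {prio:>4} {state}")
```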

Farm usage (plots of CPU total time and total wall-clock time)

Farm usage

Tier1 Database (1)
Resource database and management interface
– Postgres database as back end
– Web interface (Apache + mod_ssl + PHP)
– HW server characteristics
– SW server configuration
– CPU allocation
Interoperability of other applications with the DB
– Monitoring/accounting system
– Nagios
Interface to configure switches and to interoperate with Quattor (a query sketch follows below)
– VLAN tags
– DDNS
– DHCP
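A minimal sketch of the kind of interoperation this enables: generating ISC dhcpd host entries from the resource database. The table and column names, connection parameters and output file are hypothetical, since the slides do not describe the actual schema:

```python
import psycopg2  # PostgreSQL driver, matching the Postgres back end mentioned above

def dhcp_host_entries(dsn: str = "dbname=tier1db host=db.example.cnaf.infn.it") -> str:
    """Render ISC dhcpd 'host' stanzas from a hypothetical 'hosts' table."""
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute("SELECT hostname, mac_address, ip_address FROM hosts ORDER BY hostname")
    stanzas = []
    for hostname, mac, ip in cur.fetchall():
        stanzas.append(
            f"host {hostname} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            "}"
        )
    conn.close()
    return "\n".join(stanzas)

# Example: write the generated stanzas into a file included by dhcpd.conf.
with open("tier1-hosts.conf", "w") as f:
    f.write(dhcp_host_entries())
```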

Tier1 Database (2)

Storage & Database
Tasks
– Disk (SAN, NAS): HW/SW installation and maintenance, remote (grid SE) and local (rfiod/NFS/GPFS) access services, clustered/parallel filesystem tests, participation in SC
  2 SAN systems (~225 TB), 4 NAS systems (~60 TB)
– CASTOR HSM system: HW/SW installation and maintenance, GridFTP and SRM access services
  STK library with 6 LTO-2 and 9940B drives (+4 to install); 2000 LTO-2 (200 GB) tapes and 9940B (200 GB) tapes
– DB (Oracle for CASTOR & RLS tests, Tier1 "global" hardware DB)

Storage status
Physical access to the main storage (FAStT900) via SAN
– Level-1 disk servers connected via FC, usually also in a GPFS cluster
  Ease of administration
  Load balancing and redundancy
  Lustre under evaluation
– There can be level-2 disk servers connected to the storage only via GPFS: LCG and FC dependencies on the OS are decoupled
WNs are not members of the GPFS cluster (no scalability to a large number of WNs)
– Storage available to WNs via rfio, xrootd (BABAR only), GridFTP/SRM or NFS (SW distribution only); a local-access sketch follows below
CASTOR HSM system (SRM interface)
– STK library with 6 LTO-2 and 9940B drives (+4 to install); 1200 LTO-2 (200 GB) tapes and 9940B (200 GB) tapes
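A minimal sketch of the local rfio access path from a WN, wrapping the CASTOR rfcp client; the CASTOR namespace path and local file names are hypothetical examples, not actual CNAF paths:

```python
import subprocess

def rfio_copy(src: str, dst: str) -> None:
    """Copy a file with the CASTOR/rfio rfcp client (works in both directions)."""
    subprocess.run(["rfcp", src, dst], check=True)

# Example: stage a (hypothetical) input file from CASTOR to the WN scratch area,
# and copy a produced output file back.
rfio_copy("/castor/cnaf.infn.it/user/example/input.root", "/tmp/input.root")
rfio_copy("/tmp/output.root", "/castor/cnaf.infn.it/user/example/output.root")
```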

Summary & Conclusions
INFN Tier1 start-up was in …; the start-up phase has ended
During 2004 the INFN Tier1 began to ramp up towards LHC
– Some experiments (e.g. BABAR, CDF) already in the data-taking phase
A lot of work is still needed…
– Infrastructural issues
– Consolidation
– Technological uncertainties
– Management
– Customer requests!