
Stato del Tier1 (Tier1 Status), Luca dell'Agnello, 11 May 2012


1 Stato del Tier1, Luca dell'Agnello, 11 May 2012

2 INFN-Tier1
– Italian Tier-1 computing centre for the LHC experiments ATLAS, CMS, ALICE and LHCb...
– ... but also one of the main Italian processing facilities for several other experiments:
  – BaBar and CDF
  – astro and space physics: VIRGO (Italy), ARGO (Tibet), AMS (satellite), PAMELA (satellite) and MAGIC (Canary Islands)
  – more (e.g. ICARUS, Borexino)

3 INFN-Tier1: numbers
– > 20 supported experiments
– ~ 20 FTEs
– 1000 m² room with capacity for more than 120 racks and several tape libraries
  – 5 MVA electrical power
  – redundant facility to provide 24x7 availability
– Within May (with the new 2012 tenders) even more resources:
  – 1300 servers with about 10000 cores available
  – 11 PB of disk space for high-speed access and 14 PB on tape (1 tape library)
– Aggregate bandwidth to storage: ~ 50 GB/s
– WAN link at 30 Gbit/s
  – 2x10 Gbit/s over the OPN
  – a bandwidth increase is expected with the forthcoming GARR-X network

4 [Photos of the facility: chiller floor, CED (new room), electrical delivery, UPS room]

5 Current WAN Connections
– [Diagram: CNAF border routers (Cisco 7600 and NEXUS) link the T1 resources to the GARR general-purpose WAN (10 Gb/s, T1-T2 traffic), the LHC-OPN (20 Gb/s, T0-T1 and T1-T1 in sharing), LHCONE, and the cross-border fibre from Milan (CNAF-KIT, CNAF-IN2P3, CNAF-SARA, T0-T1 backup); peer sites include CERN, RAL, PIC, TRIUMF, BNL, FNAL, TW-ASGC, NDGF]
– 20 Gb/s physical link (2x10 Gb/s) for LHC-OPN and LHCONE connectivity
– 10 Gigabit link for general IP connectivity
– LHCONE and LHC-OPN currently share the same physical ports, but they are managed as two completely different links (different VLANs are used for the point-to-point interfaces).
– All the Tier-2s which are not connected to LHCONE are reached only via general IP.

6 Future WAN Connection (Q4 2012)
– 20 Gb/s physical link (2x10 Gb/s) for LHC-OPN and LHCONE connectivity
– 10 Gigabit physical link to LHCONE (dedicated to T1-T2 LHCONE traffic)
– 10 Gigabit link for general IP connectivity
– A new 10 Gb/s link dedicated to LHCONE will be added.
– [Diagram: same layout as the previous slide, with the additional 10 Gb/s LHCONE link]

7 CNAF in the grid
– CNAF is part of the WLCG/EGI infrastructure, granting access to distributed computing and storage resources:
  – access to the computing farm via the EMI CREAM Compute Elements;
  – access to storage resources, on GEMSS, via the SRM end-points;
  – "legacy" access (i.e. local access) is also allowed.
– Some typical grid acronyms for storage (see the sketch below):
  – SE (Storage Element): a Grid service that allows Grid users to store and manage files, together with the space assigned to them.
  – SRM (Storage Resource Manager): the middleware component that provides dynamic space allocation and file management for shared storage components on the Grid. Essential for bulk operations on the tape system.
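For illustration only, a minimal sketch of what data movement through an SRM end-point can look like from a user interface, assuming the lcg-utils client and a valid VOMS proxy; the host name and storage path below are hypothetical placeholders, not the real CNAF end-points (which differ per experiment):

```python
# Minimal sketch: copy a local file to a storage element through its SRM front-end.
# Assumptions: lcg-utils installed, a valid VOMS proxy, and a HYPOTHETICAL endpoint/path
# (the host and SFN below are placeholders, not the real CNAF ones).
import subprocess

SRM_ENDPOINT = "srm://storm-fe.example.cr.cnaf.infn.it:8444"          # hypothetical host:port
REMOTE_SFN = "/srm/managerv2?SFN=/atlas/scratch/user/sample.root"     # hypothetical path

def copy_to_cnaf(local_file: str) -> None:
    """Copy local_file to the SRM end-point using the lcg-cp client."""
    cmd = [
        "lcg-cp", "-b", "-D", "srmv2", "--vo", "atlas",   # skip BDII lookup, SRM v2, ATLAS VO
        f"file:{local_file}",
        f"{SRM_ENDPOINT}{REMOTE_SFN}",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    copy_to_cnaf("/tmp/sample.root")
```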

8 Middleware status
– Deployed several EMI nodes: UIs, CREAM CEs, Argus, BDII, FTS, StoRM, WNs
– Legacy gLite 3.x phased out almost completely
– Planning to migrate the remaining gLite 3.2 nodes to EMI completely within the summer
– ATLAS and LHCb switched to CVMFS for their software area (see the sketch below)
  – tests ongoing on a CVMFS server for SuperB
(Andrea Chierici, 24-apr-2012)
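As a hedged illustration of the CVMFS-based software area (not part of the original slide), a minimal check that an experiment repository is visible on a worker node, assuming the standard /cvmfs mount point; the repository name is only an example:

```python
# Minimal sketch: verify that a CernVM-FS software repository is mounted and readable
# on a worker node. Assumes the standard /cvmfs mount point; repo name is an example.
import os

def cvmfs_available(repo: str = "atlas.cern.ch") -> bool:
    """Return True if /cvmfs/<repo> is visible (autofs triggers the mount on access)."""
    path = os.path.join("/cvmfs", repo)
    try:
        return os.path.isdir(path) and bool(os.listdir(path))
    except OSError:
        return False

if __name__ == "__main__":
    print("ATLAS software area visible:", cvmfs_available())
```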

9

10 Computing resources
– Currently ~ 110K HS06 (2011)
  – we also host other sites: the LHCb T2 (~5%) and the UniBO T3 (~2%)
– New tender (June 2012) will add ~15K HS06: 41 enclosures, 41x4 motherboards, 192 HS06/motherboard
– Nearly constant number of boxes (~ 1000)
– Farm constantly used at ~ 100%
  – ~ 8900 job slots (→ 9200 job slots)
  – > 80000 jobs/day
– Good usage efficiency (CPT/WCT): 83% over the last 12 months (see the sketch below)
– [Charts: 2011 CPU installation; farm usage over the last 12 months; WCT/CPT efficiency over the last 12 months]
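To make the quoted CPT/WCT figure concrete, a minimal sketch of how the efficiency can be computed from batch-accounting records; the job records below are made up for illustration and are not the actual CNAF accounting data:

```python
# Minimal sketch: farm efficiency = total CPU time (CPT) / total wall-clock time (WCT)
# summed over the accounted jobs. Sample records are invented for illustration.
def farm_efficiency(jobs):
    """jobs: iterable of (cpu_time_s, wall_time_s) pairs from the batch accounting."""
    cpt = sum(cpu for cpu, _ in jobs)
    wct = sum(wall for _, wall in jobs)
    return cpt / wct

sample_jobs = [(3100.0, 3600.0), (7400.0, 9000.0), (500.0, 600.0)]  # fake records
print(f"CPT/WCT efficiency: {farm_efficiency(sample_jobs):.0%}")
```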

11 Storage resources
– TOTAL of 8.6 PB of on-line (net) disk space (GEMSS)
  – 7 EMC² CX3-80 + 1 EMC² CX4-960 (~1.8 PB) + 100 servers: phase-out in 2013
  – 7 DDN S2A 9950 (~7 PB) + ~60 servers: phase-out in 2015-2016
  – ... and under installation: 3 Fujitsu Eternus DX400 S2 (3 TB SATA): + 2.8 PB
– Tape library SL8500: 9 PB + 5 PB (just installed) on line, with 20 T10KB drives and 10 T10KC drives (nominal capacity sketched below)
  – 9000 x 1 TB tape capacity, ~ 100 MB/s of bandwidth for each T10KB drive
  – 1000 x 5 TB tape capacity, ~ 200 MB/s of bandwidth for each T10KC drive
  – drives interconnected to the library and servers via a dedicated SAN (TAN); 13 Tivoli Storage Manager (TSM) HSM nodes access the shared drives
  – 1 TSM server common to all GEMSS instances
– All storage systems and disk-servers are on SAN (4 Gb/s or 8 Gb/s)
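A quick sketch of the nominal (uncompressed) tape capacity implied by the cartridge counts above; this is pure arithmetic on the slide's own figures:

```python
# Minimal sketch: nominal tape capacity from the cartridge counts quoted above.
t10kb_tb = 9000 * 1   # ~9000 cartridges x 1 TB each -> ~9 PB already on line
t10kc_tb = 1000 * 5   # ~1000 cartridges x 5 TB each -> ~5 PB just installed
total_pb = (t10kb_tb + t10kc_tb) / 1000
print(f"Nominal tape capacity: ~{total_pb:.0f} PB")   # ~14 PB, consistent with slide 3
```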

12 WHAT IS GEMSS?
– A full HSM (Hierarchical Storage Management) integration of GPFS, TSM and StoRM* (*StoRM is the SRM front-end; see the poster session)
– Goal: minimize management effort and increase reliability:
  – very positive experience with scalability so far;
  – large GPFS installation in production at CNAF since 2005, with increasing disk space and number of users: over 8 PB of net disk space partitioned into several GPFS clusters served by ~100 disk-servers (NSD + GridFTP); ~9 PB of tape space;
  – 2 FTEs employed to manage the full system;
  – all experiments at CNAF (LHC and non-LHC) agreed to use GEMSS.

13 GEMSS resources layout
– [Diagram: farm worker nodes (LSF batch system, ~120K HS06, i.e. ~9000 job slots) act as GPFS client nodes and access data over the WAN or the Tier-1 LAN; ~100 GPFS NSD disk-servers, each with 2 FC connections, serve DISK (~8.4 PB net space) over Fibre Channel (4/8 Gb/s); 13 TSM HSM nodes with triple FC connections (2 FC to the SAN for disk access, 1 FC to the TAN for tape access) sit in front of the STK SL8500 robot (10000 slots, 20 T10000B drives, 10 T10000C drives) holding TAPE (14 PB available, 8.9 PB used)]
– Data access from the farm worker nodes uses the Tier-1 LAN; the NSD disk-servers use 1 or 10 Gb network connections.
– The 13 GEMSS TSM HSM nodes provide all the DISK ↔ TAPE data migration through the SAN/TAN Fibre Channel network.
– 1 TSM server node runs the TSM instance.

14 Resources
– Disk decommissioning:
  Year   TB
  2012   0
  2013   1800
  2014   0
  2015   5900
– CPU decommissioning:
  Year   HS06
  2012   15519
  2013   66255
  2014   26884
  2015   31488
– CPU cost (2011): 13 €/HS06, hidden costs excluded (network cost etc. still to be added)
– Maintenance of old CPUs: 98 €/box (2 motherboards); one "old" box is ~96x2 HS06 → ~0.61 €/HS06. Evaluate a blade upgrade?
– Disk cost (2011): 261 €/TB, without servers! Extremely competitive price (repeatable?)
– Tape cost: ~ 40 €/TB (with trade-in); 1 tape drive: ~ 15 k€
– Hardware renewal for services (extra WNs, extra disk-servers) and maintenance contracts (e.g. SAN)
– (A back-of-the-envelope replacement-cost sketch follows below.)
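As an illustration only, a sketch that combines the decommissioning tables with the 2011 unit prices quoted above to estimate what replacing the 2013 retirements would cost at those prices; servers and the "hidden" costs mentioned on the slide are excluded:

```python
# Rough sketch: replacement cost of the capacity decommissioned in 2013, at the quoted
# 2011 unit prices. Illustrative arithmetic only; excludes servers and hidden costs.
CPU_EUR_PER_HS06 = 13     # €/HS06 (2011 tender)
DISK_EUR_PER_TB = 261     # €/TB (2011 tender, without servers)

cpu_out_2013_hs06 = 66255   # HS06 decommissioned in 2013 (table above)
disk_out_2013_tb = 1800     # TB decommissioned in 2013 (table above)

print(f"2013 CPU replacement : ~{cpu_out_2013_hs06 * CPU_EUR_PER_HS06 / 1e3:.0f} k€")
print(f"2013 disk replacement: ~{disk_out_2013_tb * DISK_EUR_PER_TB / 1e3:.0f} k€")
```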

