Portuguese Grid Infrastructure(s)
Gonçalo Borges
Jornadas LIP 2010, Braga, January 2010
Portuguese EGEE Infrastructure
8 sites
– LIP-Lisboa, LIP-Coimbra
– NCG-INGRID-PT
– UPorto (3 clusters)
– UMINHO-CP, DI-UMINHO
– IEETA
– CFP-IST
Resources
– 2500 job slots
– Hundreds of terabytes of storage space
Support for more than 20 VOs
– WLCG, EGEE, EELA
– INGRID, IBERGRID
RCTS
10 Gbps link between Lisbon and Porto
– ~1 Gbps to other important regions
Important improvements are foreseen
– Better international high-speed connectivity
  New links through Spain are (almost) operational
  o Minho and Galicia
  o Spanish Extremadura
  o 5 Gbps
– Better GÉANT connectivity
– Better redundancy
  Ring between both countries, 5 Gbps
INGRID overview
INGRID: Iniciativa Nacional GRID
– Push for resource sharing in Portugal
– Follows the path of EGI
– Helps to fulfil the Portuguese responsibilities in the framework of present European projects (EGEE, WLCG, IBERGRID, ...)
INGRID Management Committee:
– FCT, UMIC and LIP (technical coordination)
Steps towards INGRID deployment...
– Selection of 13 pilot applications (HEP, weather forecast, civil protection, ...)
– Deployment of a dedicated data centre to host resources
  Located at LNEC
  Works as the INGRID seed / core infrastructure
INGRID main node
Computer room area: 370 m²
– 85 cm raised floor
Electrical power:
– 1st step: 1000 kVA
– 2nd step: 2000 kVA
Protected power:
– 5x UPS 200 kVA
– Diesel generator
Chilled water cooling:
– Chillers with free-cooling (2x 375 kW)
– Close-control units (6x 150 kW + 47 kW)
Fire detection:
– Very Early Warning Smoke Detection
– Fire extinction system being installed
INGRID main node
[Network diagram: central 10 Gigabit Ethernet switch interconnecting separate VLANs for the Grid cluster, core services, site services, iSCSI arrays, support services and user services; routers to the Internet (3.5 Gbps) and to LIP-Lisbon + LIP-Coimbra (10 Gbps) sit behind a firewall and two FW/NAT servers]
Different VLANs: scalability, security, easier management
– Inbound/outbound traffic to the local grid farm passes through 2 Linux FW/NAT servers connected to the central switch at 10 Gbps.
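The FW/NAT role described above is a standard Linux gateway setup; a minimal sketch of what such a box runs (interface names and the private subnet below are illustrative assumptions, not taken from the slides):

```shell
#!/bin/sh
# Minimal Linux FW/NAT gateway sketch for a private grid-farm VLAN.
# eth0 = 10 GbE uplink towards the central switch / Internet (public side),
# eth1 = farm VLAN (private side); addresses are placeholders.

# Enable IPv4 forwarding between the two interfaces
sysctl -w net.ipv4.ip_forward=1

# Masquerade outbound farm traffic behind the gateway's public address
iptables -t nat -A POSTROUTING -o eth0 -s 192.168.0.0/16 -j MASQUERADE

# Allow replies back in; drop unsolicited inbound traffic to the farm
iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -P FORWARD DROP
```

Running two such gateways, as on the diagram, spreads the 10 Gbps load and removes a single point of failure for outbound farm traffic.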
INGRID main node
[Diagram: core 10 Gigabit Ethernet switch linking the computing blade centres (SGE cluster), the storage layer (Lustre + StoRM), the support services blade centres and the firewall to the external network]
Core Services Fault Tolerance
Redundant services distributed between 2 blade centres
– Solution based on Xen virtual machines
  Xen images available
  Storage accessible via Internet SCSI (iSCSI)
  Controlled by the OCFS2 shared cluster file system
  o Guarantees full data coherence and data integrity under simultaneous accesses from multiple hosts
[Diagram: Xen image repository on 6 iSCSI arrays (each with 12x 1 TB SATA disks, RAID 10, 6 LUNs, 2x 512 MB iSCSI controllers) attached to the blade centres through the central 10 Gigabit Ethernet switch, with OCFS2 on top]
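The iSCSI-plus-OCFS2 arrangement above is a common way to share a single image store between Xen hosts; a hedged sketch of attaching one blade (the portal address, target IQN, device path and mount point are placeholders, not details from the slides):

```shell
# Discover and log in to one of the iSCSI arrays (open-iscsi tools;
# the portal IP and target IQN are illustrative placeholders)
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node -T iqn.2009-01.example:array1 -p 192.0.2.10 --login

# Format once, from a single node, after the OCFS2 cluster stack
# (/etc/ocfs2/cluster.conf, o2cb service) has been configured
mkfs.ocfs2 -L xen_images /dev/sdb

# Mount the shared volume on every blade; OCFS2's distributed lock
# manager keeps concurrent access from multiple hosts coherent
mount -t ocfs2 /dev/sdb /var/lib/xen/images
```

With the images on a cluster file system, a VM whose blade fails can be restarted on the second blade centre without copying its disk image.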
Computing (INGRID main node)
Core services (failures have a critical impact on infrastructure availability)
– 2 different blade centres with server direct-attached storage
– Redundant power supplies, management and network connectivity
– IBM blades running SL5 x86_64 Xen Dom0 kernels:
  24 LS22 IBM blades (2 quad-core AMD Opteron)
  2 Gigabit Ethernet interfaces; Intelligent Platform Management Interface (version 2.0)
  146 GB of local storage on 2 SAS disks in RAID mirror
  192 cores, 3 GB of RAM per core (24 GB RAM / blade)
High Throughput Computing servers
– Blade centres hosting 12 / 14 blades running SL5 x86_64:
  100 LS22/HS21 IBM blades (2 quad-core AMD Opteron)
  42 ProLiant BL460 G6 HP blades (2 quad-core Intel Xeon 2.67 GHz)
  1136 cores, 3 GB of RAM per core (24 GB RAM / blade)
High Performance Computing servers
– IBM BladeCenter H with InfiniBand switch running SL5 x86_64:
  14 LS22 IBM blades (2 quad-core AMD Opteron)
  20 Gbps Double Data Rate Host Channel Adapters
  112 cores, 4 GB of RAM per core (32 GB RAM / blade)
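Work on these farms is scheduled by the SGE cluster shown on the earlier network diagram; a minimal Sun Grid Engine submission sketch (the resource limit and script name are assumptions for illustration):

```shell
# Write a simple SGE batch job script
cat > hello.sh <<'EOF'
#!/bin/sh
#$ -S /bin/sh          # interpret the job with /bin/sh
#$ -cwd                # run in the submission directory
#$ -l h_rt=01:00:00    # one-hour wall-clock limit
echo "running on $(hostname)"
EOF

qsub hello.sh    # enqueue the job on the SGE cluster
qstat            # watch its state move from qw (queued) to r (running)
```

The same scheduler serves both the high-throughput blades and, via a parallel environment, the InfiniBand-connected HPC blades.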
Storage (INGRID main node)
Storage servers and expansion boxes
– 13 IBM x3650 servers running SL5 x86_64
  2 quad-core Intel Xeon L
  2 SAS disks (74 GB) deployed in RAID mirror
  Each server has 40 TB of effective storage attached
  o 2 LSI MegaRAID controllers (linking to the expansion boxes)
  o Expansion boxes with RAID 5 volumes of 1 TB SATA-II disks
  Total of ~620 TB of online grid storage space
  Network interfaces
  o Two built-in 10/100/1000BASE-T Broadcom Gigabit Ethernet
  o One NetXen 10 Gigabit Ethernet PCIe interface
– 2 HP DL380 G6 running SL5 x86_64
  Expansion boxes with 4x 12 SAS disks (450 GB, 15K RPM)
  2 NetXen 10 Gigabit Ethernet PCIe interfaces
– 2 HP DL360 G5 servers dedicated to GridFTP services
– Grid access enabled via the StoRM SRM interface
– Sun Microsystems' Lustre cluster shared file system
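On the client side, mounting the Lustre file system served by these nodes is a single command; the MGS host name, file-system name and mount point below are placeholder assumptions:

```shell
# Mount the Lustre file system on a worker node (the Lustre client
# modules must be loaded; "mgs01" and "ingridfs" are placeholders)
mount -t lustre mgs01@tcp0:/ingridfs /lustre

# Grid clients do not mount Lustre directly: they speak SRM to StoRM,
# which maps SURLs onto this POSIX namespace, e.g. with lcg-utils:
# lcg-ls -b -D srmv2 srm://se.example.org:8444/srm/managerv2?SFN=/lustre/vo/data
```

StoRM exists precisely for this layout: it implements SRM on top of a POSIX cluster file system, so GridFTP and local jobs see the same files.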
INGRID main node
[Photos: IBM BladeCenter E units (core services), HP BladeCenter C7000 with ProLiant BL460c G6 blades and HP ProLiant BL380 G6 (computing), IBM System x3650 servers with 4x 12 TB expansions (storage), iSCSI SAN]