Stallo: First impressions Roy Dragseth Team Leader, HPC The Computer Center roy.dragseth@cc.uit.no 02.08.2018
University of Tromsø: Northernmost university in the world. Staff: 2000. Students: 6000. Located in Tromsø.
Background: UiT installed a new HPC system in late 2007 on an extremely tight schedule. A new machine room was also established for the system.
Schedule: 30 Oct., machine room ready; 2 Nov., HW installation starts; 10 Nov., HW installation done; 1 Dec., first users; 1 Jan., full production.
System config: 704 HP BL460c blades; 5632 CPU cores; 12 TB memory; 384 nodes with InfiniBand (IB). HP SFS (Lustre) storage: 66 SFS20 arrays; 18 DL380 servers; 128 TB net storage.
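The per-blade figures implied by this configuration can be worked out directly from the slide's numbers. The following is a minimal sketch; the variable names are illustrative, and treating 12 TB as 12 × 1024 GB is an assumption (the slide's figure is presumably rounded).

```python
# Figures quoted on the "System config" slide.
BLADES = 704
CORES = 5632
MEMORY_TB = 12
IB_NODES = 384

# Cores per blade: the totals divide evenly.
cores_per_blade = CORES // BLADES
assert cores_per_blade * BLADES == CORES

# Memory per blade, ASSUMING 1 TB = 1024 GB and an even split across blades.
memory_per_blade_gb = MEMORY_TB * 1024 / BLADES

# Share of the blades that have an InfiniBand interconnect.
ib_fraction = IB_NODES / BLADES

print(f"cores per blade:  {cores_per_blade}")
print(f"memory per blade: {memory_per_blade_gb:.1f} GB (approximate)")
print(f"IB-connected:     {ib_fraction:.1%} of blades")
```

So each blade carries 8 cores, and a little over half of the blades are IB-connected; the rest rely on Ethernet.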
Measured performance: Theoretical peak: 59.9 TF/s. HPL Linpack: 15 TF/s (no. 83 on the Top500), later 17 TF/s with IB. Iozone read/write: 9.5/6.5 GB/s (64 clients, 32 GB files each). MPI latency: 1.3/2.1 μs ping-pong. MPI bandwidth: 1300 MB/s one-way.
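A few derived figures help put these measurements in context: HPL efficiency (measured divided by theoretical peak), the implied per-core peak, and the aggregate Iozone data volume. This is a rough sketch using only the numbers on the slide. The 2.66 GHz clock and 4 flops per cycle are assumptions that are not stated in the talk; they are included only to show that they would be consistent with the quoted peak.

```python
# Figures quoted on the "Measured performance" slide.
PEAK_TFLOPS = 59.9
HPL_INITIAL_TFLOPS = 15.0    # first Top500 run
HPL_WITH_IB_TFLOPS = 17.0    # later run using InfiniBand
CORES = 5632                 # from the system configuration
IOZONE_CLIENTS = 64
IOZONE_FILE_GB = 32
IOZONE_READ_GBS = 9.5
IOZONE_WRITE_GBS = 6.5


def hpl_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of the theoretical peak achieved by an HPL run."""
    return rmax_tflops / rpeak_tflops


# Per-core peak implied by the slide, in GFlop/s.
per_core_peak_gflops = PEAK_TFLOPS * 1000 / CORES

# ASSUMPTION (not in the talk): 2.66 GHz cores doing 4 flops per cycle.
ASSUMED_CLOCK_GHZ = 2.66
ASSUMED_FLOPS_PER_CYCLE = 4
estimated_peak_tflops = CORES * ASSUMED_CLOCK_GHZ * ASSUMED_FLOPS_PER_CYCLE / 1000

# Iozone: total data moved, and the aggregate rate split evenly per client.
iozone_total_gb = IOZONE_CLIENTS * IOZONE_FILE_GB
read_per_client_mbs = IOZONE_READ_GBS * 1000 / IOZONE_CLIENTS
write_per_client_mbs = IOZONE_WRITE_GBS * 1000 / IOZONE_CLIENTS

print(f"HPL efficiency, initial: {hpl_efficiency(HPL_INITIAL_TFLOPS, PEAK_TFLOPS):.1%}")
print(f"HPL efficiency, with IB: {hpl_efficiency(HPL_WITH_IB_TFLOPS, PEAK_TFLOPS):.1%}")
print(f"per-core peak:           {per_core_peak_gflops:.2f} GFlop/s")
print(f"estimated peak:          {estimated_peak_tflops:.1f} TF/s (under the assumptions above)")
print(f"Iozone data set:         {iozone_total_gb} GB")
print(f"Iozone per client:       {read_per_client_mbs:.0f}/{write_per_client_mbs:.0f} MB/s read/write")
```

The per-client I/O rates assume the aggregate bandwidth is shared evenly across the 64 clients, which a real run would only approximate.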
Usage profile so far: The system is designed for throughput, so the typical job uses 32-256 cores per run.
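To relate the typical job size to hardware, the core counts can be translated into node counts. The sketch below assumes 8 cores per node (5632 cores over 704 blades) and whole-node allocation; the helper `torque_resource_request` is hypothetical and not part of any site tooling. It only formats a string in Torque's `nodes=N:ppn=M` resource syntax, and the actual scheduling policy on the system is not described in the talk.

```python
import math

# ASSUMPTION: every node has the same core count, derived from the totals.
CORES_PER_NODE = 5632 // 704


def torque_resource_request(cores: int, cores_per_node: int = CORES_PER_NODE) -> str:
    """Format a Torque-style resource string covering at least `cores` cores.

    Hypothetical helper: requests that do not fill whole nodes are rounded
    up to whole nodes, so the allocation may exceed the requested cores.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    nodes = math.ceil(cores / cores_per_node)
    ppn = min(cores, cores_per_node)
    return f"nodes={nodes}:ppn={ppn}"


# The typical job range mentioned on the slide.
for cores in (32, 256):
    print(cores, "cores ->", torque_resource_request(cores))
```

Under these assumptions, the typical job spans between 4 and 32 nodes, and a string like this would be passed to `qsub -l`.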
Interconnect usage: The InfiniBand nodes (c1-c24) seem to be in higher demand than the Ethernet ones.
System software: OS: Rocks cluster distribution (CentOS). Storage: HP SFS (Lustre). Batch: Torque/Maui. Compilers: Intel and GCC. MPI: OpenMPI/OFED.
Pleasant surprises: Rocks scales to this level rather effortlessly, and so does the batch system. The blade management CLI works really well. The MCS cooling systems make it bearable to work in the machine room. User feedback has been overwhelmingly positive!
Unpleasant surprises: Single disk errors take down the global filesystem (HP SFS/Lustre). OpenMPI/OFED has some quirks and sometimes needs tuning on a per-application basis. If we lose cooling, the machine room will overheat even if we turn off all systems!
Wishes: Fix the SFS20 firmware!!! Publish all SNMP MIBs. Publish all freely available RPMs in a searchable manner (at least, do not hide them in ISOs).
Questions?