Heterogeneous Computation Team HybriLIT
HybriLIT HPC Cluster. The Modern Paradigm
Matveev Mikhail, on behalf of the Heterogeneous Computation Team HybriLIT
Laboratory of Information Technologies, JINR
July 5, Dubna, GRID 2016
Content
Essentials of HybriLIT cluster configuration;
The network boot method;
Upgrades: SLURM, CVMFS, Modules.
Essentials of HybriLIT cluster configuration
Presently the HybriLIT cluster includes 10 distinct physical computation nodes, differing in their gpu, cpu and phi resources:
7 blades include specific GPU accelerator sets, driven by NVIDIA CUDA software;
1 blade includes 2 PHI accelerators, driven by Intel MPSS software;
1 blade includes 1 PHI and 1 GPU accelerator, with mixed NVIDIA CUDA and Intel MPSS software;
1 blade includes 2 multi-core CPU processors.
Large storage area of ~7 TB.
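The blade counts above can be cross-checked with a short tally. This is a minimal sketch: the group names (gpu, phi, mixed, cpu) and the dictionary layout are labels chosen for illustration, not identifiers used on the cluster.

```python
from collections import Counter

# Blade groups as listed on the slide. The keys are illustrative labels
# (an assumption of this sketch), not names used by the cluster itself.
BLADE_GROUPS = {
    "gpu": {"blades": 7, "software": ["NVIDIA CUDA"]},
    "phi": {"blades": 1, "software": ["Intel MPSS"]},
    "mixed": {"blades": 1, "software": ["NVIDIA CUDA", "Intel MPSS"]},
    "cpu": {"blades": 1, "software": []},
}


def total_blades(groups):
    """Return the total number of physical nodes over all groups."""
    return sum(group["blades"] for group in groups.values())


def blades_per_software(groups):
    """Count how many blades need each accelerator software stack."""
    counts = Counter()
    for group in groups.values():
        for stack in group["software"]:
            counts[stack] += group["blades"]
    return dict(counts)


if __name__ == "__main__":
    print(total_blades(BLADE_GROUPS))
    print(blades_per_software(BLADE_GROUPS))
```

The total matches the 10 nodes stated on the slide; the mixed blade is counted under both software stacks.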
The network boot method

Typical diskless boot:
1. PC power ON;
2. Network stack loads to RAM;
3. Request the node IP address and the TFTP server IP address from the DHCP server;
4. Linux kernel loads to RAM;
5. Linux initrd loads to RAM (initrd modified for network boot);
6. Mount the network filesystem;
7. Start the modified init;
8. Linux services start.

Our diskless boot to ramfs:
1. PC power ON;
2. Network stack loads to RAM;
3. Request the node IP address and the TFTP server IP address from the DHCP server;
4. Linux kernel loads to RAM;
5. Linux initrd loads to RAM (initrd modified for network boot);
6. Load the packed filesystem image to RAM, unpacking it on the fly;
7. Start the modified init;
8. Linux services start;
9. Our cluster services start.

nanoramfs
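The step that distinguishes the ramfs boot, loading a packed filesystem image and unpacking it on the fly, can be illustrated with a streaming extraction. This is only a sketch under stated assumptions: the slides do not say which archive format or tool is used, so a gzip-compressed tar archive is assumed here, a temporary directory stands in for the RAM-backed filesystem, and `unpack_stream` and `build_demo_image` are hypothetical helper names.

```python
import io
import os
import tarfile
import tempfile


def unpack_stream(fileobj, target_dir):
    """Unpack a gzip-compressed tar stream into target_dir while reading it.

    Mode "r|gz" makes tarfile treat fileobj as a forward-only stream, so
    members are extracted as they arrive instead of after a full download.
    Assumption of this sketch: the image comes from a trusted server.
    """
    names = []
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            archive.extract(member, path=target_dir)
            names.append(member.name)
    return names


def build_demo_image():
    """Build a tiny in-memory image (hypothetical content) for the demo."""
    payload = b"hostname=blade02\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="etc/node.conf")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ram_root:
        print(unpack_stream(build_demo_image(), ram_root))
        print(os.listdir(ram_root))
```

In a real boot the stream would come from the network rather than from an in-memory buffer; the forward-only read pattern stays the same.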
The network boot method: nanoramfs
We have prepared an image interface that includes the network configuration commands. This small image is called nanoramfs. It is downloaded while the core operating system loads.
"Hi! My name is blade04. Please load me with gpu tasks!"
"Hi! My name is blade03. Do you have phi tasks for me?"
"Hi! My name is blade02. You can load me with any task."
"Hi! How are you doing?"
The network boot method: ramfs
The ramfs image comes next. It includes commands for setting up the specific computation elements (cpu, gpu, phi) as well as the associated software services.
"Thank you! I can use my gpu accelerators!"
"Thank you! I can use my phi accelerators!"
"Just wait a few seconds, please."
After this reboot process the computation node becomes available for users' tasks.
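The idea of a per-node setup that depends on the installed computation elements can be sketched as a lookup from node name to setup actions. Everything below is illustrative: the inventory table, the action strings and the function `setup_plan` are hypothetical, and reading blade02 ("any task") as a mixed gpu and phi node is an assumption. Only the pairing of GPU with NVIDIA CUDA and PHI with Intel MPSS comes from the slides.

```python
# Accelerator kind -> software stack, as stated on the configuration slide.
STACK_FOR_ACCELERATOR = {"gpu": "NVIDIA CUDA", "phi": "Intel MPSS"}

# Hypothetical inventory for this sketch. blade03 (phi) and blade04 (gpu)
# follow the slide dialogue; blade02 as a mixed node is an assumption.
NODE_ACCELERATORS = {
    "blade02": ["gpu", "phi"],
    "blade03": ["phi"],
    "blade04": ["gpu"],
}


def setup_plan(hostname):
    """Return the ordered list of setup actions for one node.

    Every node gets its cpu configured; accelerators are enabled only
    if the inventory lists them for this hostname.
    """
    plan = ["configure cpu"]
    for kind in NODE_ACCELERATORS.get(hostname, []):
        plan.append(f"enable {kind} via {STACK_FOR_ACCELERATOR[kind]}")
    return plan


if __name__ == "__main__":
    for name in sorted(NODE_ACCELERATORS):
        print(name, setup_plan(name))
```

Keeping the inventory in one table means a new node only needs a new entry, which fits the goal of quick cluster extension.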
The network boot method: general
Three basic purposes:
To enable dynamic cluster extension in the future by allowing new computation nodes to be added quickly to the cluster structure;
To make simultaneous software changes or upgrades on all computation nodes;
To get quick setup of nodes, including error recovery after reboot.
SLURM: accommodation of the new blades
Simple Linux Utility for Resource Management
interactive (1 blade = 1 mic, 1 gpu, 2 cpu sockets),
cpu (2 blades = 4 cpu sockets),
gpu (4 blades = 12 gpu sockets),
phi (1 blade = 2 mic sockets),
gpuK80 (3 blades = 8 gpu sockets).

NEW CALCULATION NODE
Blade specifications: 2x Intel Xeon E5-2695 v3 (2x 14 cores); 4x NVIDIA Tesla K80 (4x 4992 CUDA cores); 512 GB RAM.

CPU cores     GPU cores       PHI cores
224+28        57216+19968     182
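The core counts for the new calculation node follow directly from the blade specification. The short check below reproduces that arithmetic; the constant and function names are chosen for this sketch only.

```python
# Figures taken from the blade specification on the slide.
CPUS_PER_BLADE = 2
CORES_PER_CPU = 14
GPUS_PER_BLADE = 4
CUDA_CORES_PER_GPU = 4992

# Core counts of the cluster before the new node, as shown on the slide.
EXISTING_CORES = {"cpu": 224, "gpu": 57216, "phi": 182}


def new_node_cores():
    """Cores contributed by the new blade (it carries no PHI accelerator)."""
    return {
        "cpu": CPUS_PER_BLADE * CORES_PER_CPU,
        "gpu": GPUS_PER_BLADE * CUDA_CORES_PER_GPU,
        "phi": 0,
    }


def totals_after_upgrade():
    """Sum the existing cores and the contribution of the new blade."""
    added = new_node_cores()
    return {kind: EXISTING_CORES[kind] + added[kind] for kind in EXISTING_CORES}


if __name__ == "__main__":
    print(new_node_cores())
    print(totals_after_upgrade())
```

The added terms, 28 CPU cores and 19968 CUDA cores, agree with the "+28" and "+19968" entries on the slide.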
CVMFS: network access to the CERN repository
CERN Virtual Machine File System
The operational CVMFS interface at HybriLIT has two main characteristics:
Reservation of 32 GB of SSD storage at each node for CVMFS packages;
Dedicated extension of the MODULES environment.
hybrilit, cern
Task with ID 2447 runs on blade02 using CUDA 7.5.
Task with ID 2448 runs on blade04 using ROOT.
MODULES: newly added modules dedicated to CVMFS
[user@hydra] module avail
hlit/opencv/2.4.9     hlit/cuda/5.5     hlit/fairsoft/nov15
hlit/openmpi/1.6.5    hlit/cuda/6.0     hlit/gcc/4.8.4
hlit/openmpi/1.8.1    hlit/cuda/6.5     hlit/gcc/4.9.3
hlit/magma/2.0.0      hlit/cuda/6.5     hlit/java/jdk-1.6.0_45
hlit/scotch/6.0.4     hlit/cuda/7.0     hlit/java/jdk-1.7.0_60
hlit/zeromq/4.1.3     hlit/cuda/7.5     hlit/java/jdk-1.8.0_05
hlit/zyre/1.1.0       hlit/czmq/3.0.2
… and so on for HybriLIT own modules …
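To see at a glance which versions of each package the listing offers, the module names can be grouped by package. This is a minimal sketch that works on the text of the listing as shown on the slide; it does not call the `module` command, and `group_modules` is a hypothetical helper. The listing shows hlit/cuda/6.5 twice, so repeated entries are collapsed.

```python
from collections import defaultdict

# Module names copied from the listing on the slide.
LISTING = """
hlit/opencv/2.4.9 hlit/cuda/5.5 hlit/fairsoft/nov15
hlit/openmpi/1.6.5 hlit/cuda/6.0 hlit/gcc/4.8.4
hlit/openmpi/1.8.1 hlit/cuda/6.5 hlit/gcc/4.9.3
hlit/magma/2.0.0 hlit/cuda/6.5 hlit/java/jdk-1.6.0_45
hlit/scotch/6.0.4 hlit/cuda/7.0 hlit/java/jdk-1.7.0_60
hlit/zeromq/4.1.3 hlit/cuda/7.5 hlit/java/jdk-1.8.0_05
hlit/zyre/1.1.0 hlit/czmq/3.0.2
"""


def group_modules(listing, prefix="hlit"):
    """Group 'prefix/package/version' names into {package: [versions]}.

    Tokens that do not have exactly three parts or a different prefix are
    skipped; repeated versions are kept only once, in listing order.
    """
    grouped = defaultdict(list)
    for token in listing.split():
        parts = token.split("/")
        if len(parts) != 3 or parts[0] != prefix:
            continue
        _, package, version = parts
        if version not in grouped[package]:
            grouped[package].append(version)
    return dict(grouped)


if __name__ == "__main__":
    for package, versions in sorted(group_modules(LISTING).items()):
        print(package, versions)
```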
Conclusions
The developed hardware-software environment of the HybriLIT cluster ensures efficient system administration;
Its features meet the requirements of scalability and high fault tolerance;
Network access to remote software resources ensures that users' needs are met efficiently;
The acquired expertise gives hope for connecting the resources of remote heterogeneous clusters.
Thank you for your attention!