ITEP computing center and plans for supercomputing

Plans for a Tier 1 for FAIR (GSI) at ITEP:
- 8000 cores in 3 years, in this year
- Distributed parallel filesystem of 1 PB in 3 years, TB in this year
- kW of power (35-40 kW in this year)
Hardware:
- 7U blade system with 10 twin modules (20 nodes per blade chassis)
- 3 blade chassis enclosures with power supplies per 42U rack, kW per 42U rack
- Two 36-port QSFP InfiniBand switches per blade chassis
- 36-port QSFP InfiniBand switches for the second level of the fat tree
- 2 x AMD 12-core CPUs per node
- 64 GB of RAM per node
- Two channels of 4x QDR InfiniBand per node for interprocess communication
InfiniBand topology: two-level fat tree built from 36-port QSFP InfiniBand switches
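As a rough sanity check of the topology, a non-blocking two-level fat tree built from 36-port switches connects at most 36 x 36 / 2 = 648 nodes: each leaf switch dedicates half its ports to nodes and half to spine uplinks. A minimal sketch of that arithmetic (only the 36-port switch count comes from the slides; the function name is illustrative):

```python
def fat_tree_capacity(ports_per_switch: int) -> dict:
    """Maximum size of a non-blocking two-level fat tree.

    Each leaf switch uses half of its ports for nodes and half for
    uplinks to spine switches; a spine switch with P ports can then
    serve up to P leaf switches of P/2 nodes each.
    """
    nodes_per_leaf = ports_per_switch // 2   # leaf ports facing nodes
    max_leaves = ports_per_switch            # one spine port per leaf switch
    return {
        "nodes_per_leaf": nodes_per_leaf,
        "max_leaf_switches": max_leaves,
        "max_nodes": nodes_per_leaf * max_leaves,
    }

# 36-port QSFP InfiniBand switches, as in the planned topology
print(fat_tree_capacity(36))   # max_nodes: 648
```

With 2 x 12-core CPUs per node, the planned 8000 cores correspond to roughly 334 nodes, which fits comfortably within the 648-node limit of this topology.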
Software:
- RedHat-based distribution (Scientific Linux or CentOS) for the x86_64 architecture
- TORQUE batch system with Maui as the scheduler
- OpenMPI with TORQUE integration (MVAPICH and MVAPICH2 are under consideration)
- OpenMP
- BLAS and LAPACK, including ATLAS builds; ACML
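The BLAS/LAPACK layer is what most applications actually call, and a quick way to exercise whichever build is installed (ATLAS, ACML, or a reference build) is through NumPy, which links against the system BLAS/LAPACK. A minimal smoke-test sketch (NumPy is not mentioned on the slides; it is only used here as a convenient harness):

```python
import numpy as np

rng = np.random.default_rng(0)

# dgemm via BLAS: C = A @ B is dispatched to the linked BLAS library
a = rng.standard_normal((500, 500))
b = rng.standard_normal((500, 500))
c = a @ b

# dgesv via LAPACK: solve the linear system A x = y
y = rng.standard_normal(500)
x = np.linalg.solve(a, y)

# A small residual indicates the linked BLAS/LAPACK works correctly
print(np.allclose(a @ x, y))   # True
```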
Prototype:
- 7U blade system, 10 twin blade modules, 20 nodes
- 36-port QSFP switch module
- 22x1GbE + 3x10GbE ports Ethernet switch module

Node characteristics:
- Dual Xeon X GHz, 6 cores
- 32 GB RAM
- 500 GB disk
- One 4x QDR InfiniBand port
- Dual 1GbE Ethernet (one channel connected)
Prototype software configuration:
- CentOS 5.6 x86_64
- TORQUE batch system, Maui scheduler
- OpenMPI 1.4 integrated with TORQUE
- BLAS and LAPACK, including ATLAS builds
Benchmarking:
- Single node (12 processes): Linpack (N=60000, NB=128, P=4, Q=3): 98 Gflops (77% of theoretical performance)
- One process per node (12 processes): Linpack (N=60000, NB=128, P=4, Q=3): 100 Gflops (78% of theoretical performance)
- Full cluster load (240 processes): Linpack (N=250000, NB=128, P=16, Q=15): Gflops (70% of theoretical performance)
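The efficiency figures above are simply the measured HPL Gflops divided by the theoretical peak (cores x clock x flops per cycle per core). A sketch of that calculation; the 2.66 GHz clock and 4 double-precision flops per cycle are assumptions for illustration, since the exact CPU model is not given on the slide:

```python
def hpl_efficiency(gflops_measured: float, cores: int, ghz: float,
                   flops_per_cycle: int = 4) -> float:
    """Measured HPL performance as a fraction of theoretical peak."""
    peak_gflops = cores * ghz * flops_per_cycle
    return gflops_measured / peak_gflops

# Single prototype node: 12 cores; clock and flops/cycle are assumed
eff = hpl_efficiency(98.0, cores=12, ghz=2.66)
print(f"{eff:.0%}")   # 77%, consistent with the reported figure
```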
Distributed parallel filesystem:
- GlusterFS with RDMA and TCP as transports
- Local disks of the nodes are used for GlusterFS volumes
- Replication (mirroring) of data provides fault tolerance
- Single-stream speed for 1 file: write 46 MB/s, read 84 MB/s
- Aggregate bandwidth for the cluster: write 450 MB/s, read 839 MB/s
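With mirrored (replica-2) volumes built from the nodes' local disks, usable capacity is raw capacity divided by the replication factor, since every file is stored on two bricks. A sketch of the capacity arithmetic, using the node count and disk size from the prototype slide (the replica-2 factor is an assumption consistent with "mirroring"):

```python
def gluster_usable_tb(nodes: int, disk_gb_per_node: int,
                      replica: int = 2) -> float:
    """Usable capacity of a replicated GlusterFS volume, in TB.

    Each file is stored on `replica` bricks, so usable space is the
    raw space across all node disks divided by the replica count.
    """
    raw_gb = nodes * disk_gb_per_node
    return raw_gb / replica / 1000.0

# 20 prototype nodes, 500 GB local disk each, replica-2 mirroring
print(gluster_usable_tb(20, 500))   # 5.0 TB usable out of 10 TB raw
```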
InfiniBand bandwidth measurements for the prototype: approximate aggregate bandwidth for the cluster is 56 GByte/s for 20 nodes (bidirectional)
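For context, a 4x QDR link signals at 40 Gbit/s, which with 8b/10b encoding leaves 32 Gbit/s (4 GB/s) of data per direction. The measured 56 GB/s bidirectional across 20 nodes then works out to 2.8 GB/s bidirectional per node. A sketch of that arithmetic (the lane rate and encoding factor are standard QDR figures, not taken from the slides):

```python
QDR_LANE_GBIT = 10    # QDR signalling rate per lane, Gbit/s
ENCODING = 0.8        # 8b/10b encoding efficiency
LANES = 4             # "4x" link width

# Theoretical data rate of one 4x QDR port, per direction
per_direction_gbs = LANES * QDR_LANE_GBIT * ENCODING / 8   # GB/s
print(per_direction_gbs)   # 4.0

# Measured aggregate on the prototype: 56 GB/s bidirectional, 20 nodes
per_node_bidir_gbs = 56 / 20
print(per_node_bidir_gbs)  # 2.8
```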
Zabbix monitoring system:
- Active client with extensive customization options
- SNMP monitoring and traps
- IPMI monitoring and control
- Triggers and events for groups of hosts, including the use of aggregate functions
- Powerful and flexible tools for describing triggers and actions
- Presentation of data in many ways
Example of presenting data from different sources in Zabbix
Next steps:
- Dedicated storage with the Lustre filesystem
- First stage of the supercomputer with AMD processors
- Two-level fat tree InfiniBand topology
- Moving to a RHEL 6 based operating system
Thank you