Download presentation
Presentation is loading. Please wait.
Published byArlene Privitt Modified over 10 years ago
1
Issues of HPC software From the experience of TH-1A Lu Yutong NUDT
2
TH-1A system Installed in NSCC-TJ, Aug. 2010 Hybrid MPP structure: CPU & GPU Custom software stack Peak performance 4.7Pflop/s, Linpack 2.566PFlop/s, No.1Top500 Power consumption 4.04MW(635.15MF/W), No.11Green500 ItemsConfiguration Processors14336 Intel CPUs + 7168 nVIDIA GPUs + 2048FT CPUs Memory262TB in total InterconnectProprietary high-speed interconnecting network StorageGlobal shared parallel storage system, 2PB Cabinets 120 Compute / service Cabinets 14 Storage Cabinets 6 Communication Cabinets
3
3 kinds of LSI chips CPU: FT-1000(PSoC) High radix router ASIC NRC Network interface ASIC NIC 15 kinds of PCB boards Hardware 4 kinds of nodes, 2 sets of network Custom cabinet Water cooling system
4
TH-1A software Software stack
5
Whats the point Customization Optimization Oriented to the system architecture
6
Operating system Kylin Linux compute node kernel Provide virtual running environment Isolated running environments for different users Custom software package installation QoS support Power aware computing
7
Proprietary Interconnection based on high radix router High bandwidth packet and RDMA communication Zero copy user space RDMA MPI base on GLEX: Bandwidth 6.3GB/s Accelerate collective operation with hardware support in communication interface Fault tolerance Rapid error detection in large scale interconnection Rebuild communication links Glex communicating system
8
Resource management, job scheduler Heterogenous resources management and topology-aware scheduling Large scale parallel job launcher (custom) Improve system structure, optimized protocol, network performance, file system performance logN, klogN System power management Automatic CR supporting Accounting Enhance Resource management system
9
Global parallel file system Object storage architecture (Lustre based) Capacity: 2 PB, Scalability Performance: Collective BW Optimized file system protocol over proprietary interconnection network Confliction release for concurrency accessing Fine-grain distributed file lock mechanism Optimized file cache policy Reliability enhancement Fault tolerance of network protocol Data objects placement Soft-raid
10
Compiler system C, C++, Fortran, Java OpenMP, MPI, OpenMP/MPI CUDA, OpenCL Heterogeneous programming framework Accelerate the large scale, complex applications, Use the computing power of CPUs and GPUs, hide the GPU programming to users Inter-node homogeneous parallel programming (users) Intra-node heterogeneous parallel computing (experts)
11
Accelerating HPL (MPI(custom)+OpenMP+Cuda) Accelerating HPL (MPI(custom)+OpenMP+Cuda) Adaptive partition Asynchronous data transfer Software pipeline Affinity scheduling Zero-copy Compiler Optimized
12
Programming environment Virtual running environments Provide services on demand Parallel toolkits Based on Eclipse To integrate all kinds of tools Editor, debugger, profiler Work flow support Support QoS negotiate Reservation
13
Nowadays Our principle Practicality and Usability HPC system software Mature technology correctness and functionality Optimizing technology increase performance, scalarbility and reliability Needs of new technology and theory
14
Programming Language Productive language X10, Chapel, Fortress Global Arrays, …… MPI is still vitality Compiler independence Distributed features MPI + sth
15
Exascale computing Hardware architecture Many cores system Nano-Photonics System Software-- (Adaptive + Intelligent) Extreme parallelism Heterogenous computing Fault-tolerant computing Power management Application software Legacy codes supporting Creative model
16
Thank you
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.