Issues of HPC software From the experience of TH-1A Lu Yutong NUDT.

Slides:

Advertisements

Similar presentations

Technology Drivers Traditional HPC application drivers – OS noise, resource monitoring and management, memory footprint – Complexity of resources to be.

Advertisements

Providing Fault-tolerance for Parallel Programs on Grid (FT-MPICH) Heon Y. Yeom Distributed Computing Systems Lab. Seoul National University.

The Development of Mellanox - NVIDIA GPUDirect over InfiniBand A New Model for GPU to GPU Communications Gilad Shainer.

Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.

Lecture 6: Multicore Systems

CURRENT AND FUTURE HPC SOLUTIONS. T-PLATFORMS  Russia’s leading developer of turn-key solutions for supercomputing  Privately owned  140+ employees.

GPU System Architecture Alan Gray EPCC The University of Edinburgh.

GPGPU Introduction Alan Gray EPCC The University of Edinburgh.

HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery.

1 Lawrence Livermore National Laboratory By Chunhua (Leo) Liao, Stephen Guzik, Dan Quinlan A node-level programming model framework for exascale computing*

Productive Performance Tools for Heterogeneous Parallel Computing Allen D. Malony Department of Computer and Information Science University of Oregon Shigeo.

PARALLEL PROCESSING COMPARATIVE STUDY 1. CONTEXT How to finish a work in short time???? Solution To use quicker worker. Inconvenient: The speed of worker.

IBM RS6000/SP Overview Advanced IBM Unix computers series Multiple different configurations Available from entry level to high-end machines. POWER (1,2,3,4)

HELICS Petteri Johansson & Ilkka Uuhiniemi. HELICS COW –AMD Athlon MP 1.4Ghz –512 (2 in same computing node) –35 at top500.org –Linpack Benchmark 825.

1 BGL Photo (system) BlueGene/L IBM Journal of Research and Development, Vol. 49, No. 2-3.

Hitachi SR8000 Supercomputer LAPPEENRANTA UNIVERSITY OF TECHNOLOGY Department of Information Technology Introduction to Parallel Computing Group.

Grids and Grid Technologies for Wide-Area Distributed Computing Mark Baker, Rajkumar Buyya and Domenico Laforenza.

Earth Simulator Jari Halla-aho Pekka Keränen. Architecture MIMD type distributed memory 640 Nodes, 8 vector processors each. 16GB shared memory per node.

Legion Worldwide virtual computer. About Legion Made in University of Virginia Object-based metasystems software project middleware that connects computer.

Contemporary Languages in Parallel Computing Raymond Hummel.

1 petaFLOPS+ in 10 racks TB2–TL system announcement Rev 1A.

Panda: MapReduce Framework on GPU’s and CPU’s

HPCC Mid-Morning Break Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery Introduction to the new GPU (GFX) cluster.

ORIGINAL AUTHOR JAMES REINDERS, INTEL PRESENTED BY ADITYA AMBARDEKAR Overview for Intel Xeon Processors and Intel Xeon Phi coprocessors.

Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.

Introduction to Symmetric Multiprocessors Süha TUNA Bilişim Enstitüsü UHeM Yaz Çalıştayı

GPU Programming with CUDA – Accelerated Architectures Mike Griffiths

Computer System Architectures Computer System Software

Chapter 2 Computer Clusters Lecture 2.3 GPU Clusters for Massive Paralelism.

WORK ON CLUSTER HYBRILIT E. Aleksandrov 1, D. Belyakov 1, M. Matveev 1, M. Vala 1,2 1 Joint Institute for nuclear research, LIT, Russia 2 Institute for.

Tools and Utilities for parallel and serial codes in ENEA-GRID environment CRESCO Project: Salvatore Raia SubProject I.2 C.R. ENEA-Portici. 11/12/2007.

Seaborg Cerise Wuthrich CMPS Seaborg  Manufactured by IBM  Distributed Memory Parallel Supercomputer  Based on IBM’s SP RS/6000 Architecture.

The Red Storm High Performance Computer March 19, 2008 Sue Kelly Sandia National Laboratories Abstract: Sandia National.

Debugging and Profiling GMAO Models with Allinea’s DDT/MAP Georgios Britzolakis April 30, 2015.

GPU in HPC Scott A. Friedman ATS Research Computing Technologies.

Taking the Complexity out of Cluster Computing Vendor Update HPC User Forum Arend Dittmer Director Product Management HPC April,

4.2.1 Programming Models Technology drivers – Node count, scale of parallelism within the node – Heterogeneity – Complex memory hierarchies – Failure rates.

Frank Casilio Computer Engineering May 15, 1997 Multithreaded Processors.

MATRIX MULTIPLY WITH DRYAD B649 Course Project Introduction.

GPU Architecture and Programming

Parallel Programming on the SGI Origin2000 With thanks to Igor Zacharov / Benoit Marchand, SGI Taub Computer Center Technion Moshe Goldberg,

Large Scale Parallel File System and Cluster Management ICT, CAS.

Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.

1 Public DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience Arkady Kanevsky & Peter Corbett Network Appliance Vijay Velusamy.

CSC 7600 Lecture 28 : Final Exam Review Spring 2010 HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS FINAL EXAM REVIEW Daniel Kogler, Chirag Dekate.

Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015.

© 2009 IBM Corporation Parallel Programming with X10/APGAS IBM UPC and X10 teams  Through languages –Asynchronous Co-Array Fortran –extension of CAF with.

Intel Research & Development ETA: Experience with an IA processor as a Packet Processing Engine HP Labs Computer Systems Colloquium August 2003 Greg Regnier.

Mellanox Connectivity Solutions for Scalable HPC Highest Performing, Most Efficient End-to-End Connectivity for Servers and Storage April 2010.

Partitioned Multistack Evironments for Exascale Systems Jack Lange Assistant Professor University of Pittsburgh.

International Symposium on Grid Computing (ISGC-07), Taipei - March 26-29, 2007 Of 16 1 A Novel Grid Resource Broker Cum Meta Scheduler - Asvija B System.

Interconnection network network interface and a case study.

Contemporary Languages in Parallel Computing Raymond Hummel.

Tackling I/O Issues 1 David Race 16 March 2010.

The Tianhe-1A Supercomputer System and Some Applications in NSCC-TJ For CAS2K11 Workshop Associate Prof. Zhang Lilun NUDT.

Background Computer System Architectures Computer System Software.

NCSA Strategic Retreat: System Software Trends Bill Gropp.

Introduction to Data Analysis with R on HPC Texas Advanced Computing Center Feb

Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.

Towards a High Performance Extensible Grid Architecture Klaus Krauter Muthucumaru Maheswaran {krauter,

Productive Performance Tools for Heterogeneous Parallel Computing

Enabling Effective Utilization of GPUs for Data Management Systems

Heterogeneous Computation Team HybriLIT

Stallo: First impressions

Structural Simulation Toolkit / Gem5 Integration

CRESCO Project: Salvatore Raia

Cray Announces Cray Inc.

Overview Introduction VPS Understanding VPS Architecture

SiCortex Update IDC HPC User Forum

Wide Area Workload Management Work Package DATAGRID project

Presentation transcript:

Issues of HPC software From the experience of TH-1A Lu Yutong NUDT

TH-1A system Installed in NSCC-TJ, Aug Hybrid MPP structure: CPU & GPU Custom software stack Peak performance 4.7Pflop/s, Linpack 2.566PFlop/s, No.1Top500 Power consumption 4.04MW(635.15MF/W), No.11Green500 ItemsConfiguration Processors14336 Intel CPUs nVIDIA GPUs FT CPUs Memory262TB in total InterconnectProprietary high-speed interconnecting network StorageGlobal shared parallel storage system, 2PB Cabinets 120 Compute / service Cabinets 14 Storage Cabinets 6 Communication Cabinets

3 kinds of LSI chips CPU: FT-1000(PSoC) High radix router ASIC NRC Network interface ASIC NIC 15 kinds of PCB boards Hardware 4 kinds of nodes, 2 sets of network Custom cabinet Water cooling system

TH-1A software Software stack

Whats the point Customization Optimization Oriented to the system architecture

Operating system Kylin Linux compute node kernel Provide virtual running environment Isolated running environments for different users Custom software package installation QoS support Power aware computing

Proprietary Interconnection based on high radix router High bandwidth packet and RDMA communication Zero copy user space RDMA MPI base on GLEX: Bandwidth 6.3GB/s Accelerate collective operation with hardware support in communication interface Fault tolerance Rapid error detection in large scale interconnection Rebuild communication links Glex communicating system

Resource management, job scheduler Heterogenous resources management and topology-aware scheduling Large scale parallel job launcher (custom) Improve system structure, optimized protocol, network performance, file system performance logN, klogN System power management Automatic CR supporting Accounting Enhance Resource management system

Global parallel file system Object storage architecture (Lustre based) Capacity: 2 PB, Scalability Performance: Collective BW Optimized file system protocol over proprietary interconnection network Confliction release for concurrency accessing Fine-grain distributed file lock mechanism Optimized file cache policy Reliability enhancement Fault tolerance of network protocol Data objects placement Soft-raid

Compiler system C, C++, Fortran, Java OpenMP, MPI, OpenMP/MPI CUDA, OpenCL Heterogeneous programming framework Accelerate the large scale, complex applications, Use the computing power of CPUs and GPUs, hide the GPU programming to users Inter-node homogeneous parallel programming (users) Intra-node heterogeneous parallel computing (experts)

Accelerating HPL (MPI(custom)+OpenMP+Cuda) Accelerating HPL (MPI(custom)+OpenMP+Cuda) Adaptive partition Asynchronous data transfer Software pipeline Affinity scheduling Zero-copy Compiler Optimized

Programming environment Virtual running environments Provide services on demand Parallel toolkits Based on Eclipse To integrate all kinds of tools Editor, debugger, profiler Work flow support Support QoS negotiate Reservation

Nowadays Our principle Practicality and Usability HPC system software Mature technology correctness and functionality Optimizing technology increase performance, scalarbility and reliability Needs of new technology and theory

Programming Language Productive language X10, Chapel, Fortress Global Arrays, …… MPI is still vitality Compiler independence Distributed features MPI + sth

Exascale computing Hardware architecture Many cores system Nano-Photonics System Software-- (Adaptive + Intelligent) Extreme parallelism Heterogenous computing Fault-tolerant computing Power management Application software Legacy codes supporting Creative model

Thank you