Published by Louise Griffith. Modified over 8 years ago.
1
Benefits
2
CAAR Project Phases
Each CAAR project will consist of:
1. A three-year Application Readiness phase (2015-17), in which the code refactoring and porting work will take place
2. An Early Science phase (2018) for tuning the code to the Summit architecture and demonstrating the application through a scientific grand-challenge project
3
Partnership Teams
Each partnership team consists of the core developers of the application and the OLCF staff assigned to the project.
4
Partnership team responsibilities
1. Develop a technical application porting and performance improvement plan with reviewable milestones for the Application Readiness phase of the project
2. Develop a management plan that clearly describes the responsibilities of the CAAR team (the core application developers, the Scientific Computing staff member assigned by the OLCF, and the OLCF postdoctoral fellow), which will carry out the code optimization, refactoring, testing and profiling of the application
3. Develop a compelling scientific grand-challenge campaign for the Early Science phase of the project
4. Assign an application scientist who, together with the CAAR team, will carry out the Early Science campaign
5. Prepare the necessary documentation for semi-annual reviews of achieved milestones, and for intermediate and final reports
5
Partnership resources
1. The core development team of the application, with a stated level of effort dedicated to the partnership
2. An ORNL Scientific Computing staff member, who will partner with the core application development team to jointly carry out the code profiling and optimization tasks. The OLCF commits a minimum of a third of an FTE per year to the partnership
3. A full-time postdoctoral fellow, located and mentored at the OLCF, who will engage with the CAAR team for code profiling, optimization and execution of the science challenge
4. Allocation of compute resources on Titan
5. Allocation of compute resources at the ALCF and at NERSC to enable performance portability to multiple architectures
6. Support from the IBM/NVIDIA Center of Excellence staff at ORNL as needed
7. Access to early delivery systems and the Summit system as they become available
8. Allocation of compute resources on the full Summit system for the Early Science campaign
6
Support Provided
CAAR teams will receive support from the IBM/NVIDIA Center of Excellence at Oak Ridge National Laboratory and will have access to computational resources including Titan at OLCF, Mira at ALCF, Edison and Cori at NERSC, and early delivery systems and Summit as they become available.
7
CAAR Partnership Activities
1. Common training of all Application Readiness teams
   a. Architecture and performance portability
   b. Avoidance of duplicate efforts
2. Application Readiness Technical Plan Development and Execution
   a. Code analysis and benchmarking to understand application characteristics: code structure, code suitability for architecture port, algorithm structure, data structures and data movement patterns, code execution characteristics (“hot spots” or “flat” execution profile)
   b. Develop a parallelization and optimization approach to determine the algorithms and code components to port, how to map algorithmic parallelism to architectural features, and how to manage data locality and motion
   c. Decide on a programming model, such as compiler directives, libraries, or explicit coding models
   d. Execute the technical plan: benchmarking, code rewrite or refactor, porting and testing, managing portability, managing inclusion in the main code repository
3. Development and Execution of an Early Science Project, i.e., a challenging science problem that demonstrates the performance and scientific impact of the developed application port
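The code-analysis step in item 2a distinguishes a "hot spot" execution profile from a "flat" one. As a rough illustration only, the sketch below classifies a profile from per-routine timings. The routine names, the timings, and the top-3 / 80% threshold are all hypothetical and do not come from the CAAR material; in practice the timings would come from a profiler such as gprof or nvprof.

```python
def top_share(timings, k=3):
    """Fraction of total runtime spent in the k most expensive routines."""
    total = sum(timings.values())
    if total <= 0:
        raise ValueError("total runtime must be positive")
    top = sorted(timings.values(), reverse=True)[:k]
    return sum(top) / total


def classify_profile(timings, k=3, threshold=0.8):
    """Return 'hot spots' if the top-k routines reach the threshold share, else 'flat'.

    k and threshold are illustrative assumptions, not CAAR criteria.
    """
    return "hot spots" if top_share(timings, k) >= threshold else "flat"


# Hypothetical per-routine wall-clock seconds for two made-up applications.
hot = {"solver": 70.0, "assemble": 15.0, "io": 5.0, "setup": 5.0, "misc": 5.0}
flat = {f"routine_{i}": 10.0 for i in range(10)}

print(classify_profile(hot))
print(classify_profile(flat))
```

Under these assumptions, a hot-spot profile suggests that porting a handful of routines may capture most of the runtime, while a flat profile usually implies a broader refactoring effort.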
8
Selection criteria
CAAR projects will be selected on the basis of:
1. The anticipated impact on the science and engineering fields
2. The importance to the user programs of the OLCF
3. The feasibility of achieving scalable performance on Summit
4. The anticipated opportunity to achieve performance portability for other architectures
5. The algorithmic and scientific diversity of the suite of CAAR applications
Decisions will be made by the OLCF Scientific Computing staff, in consultation with the IBM/NVIDIA Center of Excellence at Oak Ridge National Laboratory and the DOE Office of Advanced Scientific Computing Research.
9
Selection criteria
1. Anticipated impact on the science and engineering fields
2. Importance to the user program of the OLCF
3. Feasibility to achieve scalable performance on Summit
4. Anticipated opportunity to achieve performance portability for other architectures
5. Algorithmic and scientific diversity of the suite of CAAR applications
6. Optimizations incorporated into master repository
7. Size of the application’s user base
10
Portability
11
Performance portability to other architectures is an important consideration, and CAAR is collaborating with the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC) to enhance application portability across their respective architectures.
12
Portability
– Application portability among NERSC, ALCF and OLCF architectures is a critical concern of ASCR
– Application developers target a wide range of architectures
– Maintaining multiple code versions is difficult
– Porting to different architectures is time-consuming
– Many Principal Investigators have allocations on multiple resources
– Applications far outlive any computer system
– Primary task is exposing parallelism and data locality
13
Summit System
14
Summit Architecture
The architecture of Summit will consist of nodes with multiple IBM POWER9 CPUs and NVIDIA Volta GPU accelerators, using a coherent memory space that includes high-bandwidth memory (HBM) on the GPUs and a high-speed NVLink interconnect between the POWER9 CPUs and Volta GPUs. Internode communication will be through a Mellanox InfiniBand EDR interconnect. The peak performance of this system is expected to be five to ten times that of Titan.
15
Summit Architecture
Approximately 3,400 nodes, each with:
– Multiple IBM POWER9 CPUs and multiple NVIDIA Tesla® GPUs using the NVIDIA Volta™ architecture
– CPUs and GPUs completely connected with high-speed NVLink™
– Large coherent memory: over 512 GB (HBM + DDR4), all directly addressable from the CPUs and GPUs
– An additional 800 GB of NVRAM, which can be configured as either a burst buffer or as extended memory
– Over 40 TF peak performance
Dual-rail Mellanox® EDR-IB full, non-blocking fat-tree interconnect
IBM Elastic Storage (GPFS™): 1 TB/s I/O and 120 PB disk capacity
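The per-node figures above can be multiplied out to rough system-wide totals. The sketch below is back-of-the-envelope arithmetic only: the node count is approximate and the per-node peak and memory figures are stated as "over", so the results are lower-bound estimates rather than official specifications. The Titan peak of about 27 PF is an outside figure assumed here for comparison; it does not appear in the slides.

```python
NODES = 3400               # "approximately 3,400 nodes" (from the slide)
PEAK_TF_PER_NODE = 40      # "over 40 TF" per node, so a lower bound
MEM_GB_PER_NODE = 512      # "over 512 GB" coherent memory, a lower bound
NVRAM_GB_PER_NODE = 800    # additional NVRAM per node
TITAN_PEAK_PF = 27         # assumption: Titan's peak, not stated in the slides

system_peak_pf = NODES * PEAK_TF_PER_NODE / 1000     # TF -> PF
coherent_mem_pb = NODES * MEM_GB_PER_NODE / 1e6      # GB -> PB (decimal units)
nvram_pb = NODES * NVRAM_GB_PER_NODE / 1e6           # GB -> PB (decimal units)
ratio_to_titan = system_peak_pf / TITAN_PEAK_PF

print(f"System peak:      at least ~{system_peak_pf:.0f} PF")
print(f"Coherent memory:  at least ~{coherent_mem_pb:.2f} PB")
print(f"NVRAM:            ~{nvram_pb:.2f} PB")
print(f"Ratio to Titan:   ~{ratio_to_titan:.1f}x")
```

With these inputs the estimated system peak lands at the low end of the "five to ten times that of Titan" range quoted for Summit, which is consistent with the per-node figure being a lower bound.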
16
Summit System Software
System
– Linux®
– IBM Elastic Storage (GPFS™)
– IBM Platform Computing™ (LSF)
– IBM Platform Cluster Manager™ (xCAT)
17
Programming Environment
– Compilers supporting OpenMP, OpenACC, CUDA
   IBM XL, PGI, LLVM, GNU, NVIDIA
– Libraries
   IBM Engineering and Scientific Subroutine Library (ESSL)
   FFTW, ScaLAPACK, PETSc, Trilinos, BLAS-1,-2,-3, NVBLAS
   cuFFT, cuSPARSE, cuRAND, NPP, Thrust
– Debugging
   Allinea DDT, IBM Parallel Environment Runtime Edition (pdb)
   Cuda-gdb, Cuda-memcheck, valgrind, memcheck, helgrind, stacktrace
– Profiling
   IBM Parallel Environment Developer Edition (HPC Toolkit)
   VAMPIR, Tau, Open|Speedshop, nvprof, gprof, Rice HPCToolkit