Published by Lauren Wilkinson; modified over 8 years ago
1
Portable Performance Discussion
Brian Van Straalen
FASTMath SciDAC Institute
August 7, 2015
2
First, the portable and performant basics
- C/C++11 will run correctly on all systems.
  - C++ std::thread might not be performant for several more years.
- Fortran 2003 should be working everywhere.
  - Fortran 2008 might be available (CoArrays, DO CONCURRENT).
- MPI will be available on CPUs.
- CPU-based machines will have some form of threading.
  - OpenMP is the current expected flavor.
- GPU accelerator machines will support a kernel-offload style of computing through CUDA or OpenCL.
- C/C++/CUDA/OpenCL/Fortran will link together correctly.
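To make the threading bullets concrete, here is a minimal C++ sketch of the same array sum written two ways: with C++11 std::thread and with an OpenMP directive. The function names sum_std_thread and sum_openmp are hypothetical, chosen only for illustration, and the sketch makes no claim about relative performance. If OpenMP is not enabled at compile time (for example, no -fopenmp with GCC), the pragma is ignored and sum_openmp simply runs serially; some toolchains also need -pthread for std::thread.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// C++11 std::thread version: split [0, n) into nthreads contiguous chunks,
// let each thread sum its chunk into its own slot, then combine the slots.
double sum_std_thread(const std::vector<double>& v, unsigned nthreads) {
    assert(nthreads > 0);
    const std::size_t n = v.size();
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t first = n * t / nthreads;
        const std::size_t last = n * (t + 1) / nthreads;
        workers.emplace_back([&v, &partial, t, first, last] {
            partial[t] = std::accumulate(v.begin() + static_cast<std::ptrdiff_t>(first),
                                         v.begin() + static_cast<std::ptrdiff_t>(last), 0.0);
        });
    }
    for (auto& w : workers) w.join();  // wait for every chunk before combining
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// OpenMP version: the runtime chooses the thread count and schedule, and the
// reduction clause combines the per-thread partial sums.
double sum_openmp(const std::vector<double>& v) {
    const long n = static_cast<long>(v.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) sum += v[i];
    return sum;
}
```

With std::thread the code has to partition the work and join the threads by hand; the directive version leaves scheduling and the reduction to the OpenMP runtime, which is one reason directives are the expected threading flavor.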
3
Current hardware landscape
- Multicore CPU compute nodes (Cray XC30, XC40)
- Multicore CPU host with an NVIDIA accelerator (XE6, XK6, PowerEdge)
  - Host and accelerator connected by a PCIe bus
- Early manycore hosted platforms (BG/Q)
  - Higher count of simpler cores
4
Developers vs Users
- Developers
  - Capture as much of the available performance as possible across the range of target architectures, with the minimum amount of code divergence.
- Users
  - Application parallelism and library parallelism are not mandated.
  - MPI Endpoints are a possible way for MPI+X to work with flat MPI.
  - Users can decide to adopt a library's programming model and build environment.
  - It is hard not to subject users to the library's data structure choices.
5
Chombo is going to drink the Kool-aid
- MPI3+OpenMP4 as a portable programming model
  - Recommended by Department of Energy vendors
  - BoxLib already uses MPI2+OpenMP3
- MPI3
  - More asynchronous styles available
  - One-sided communication
  - Fewer dynamic load-balancing options than threads
  - Private address spaces by default
- OpenMP4
  - Threading, SIMD vector, and offload kernel directives
- Move towards a code generator for OpenMP code
- More I/O abstractions
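The OpenMP4 bullet names three kinds of directives. As a minimal sketch, assuming a simple y = a*x + y kernel chosen only for illustration (axpy_host and axpy_offload are hypothetical names, not Chombo or BoxLib code), the block below shows a host loop using a combined threading-plus-SIMD directive and the same loop written in the OpenMP 4 target-offload style. If no accelerator is available, or OpenMP is not enabled at compile time, both loops run on the host and give the same result.

```cpp
#include <cassert>
#include <vector>

// Host version: `parallel for` splits iterations across threads and `simd`
// asks the compiler to vectorize each thread's share of the loop.
void axpy_host(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    const double* xp = x.data();
    double* yp = y.data();
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i) yp[i] += a * xp[i];
}

// Offload version: `target` requests execution on an accelerator, the
// teams/distribute/parallel-for levels spread iterations over the device,
// and the map clauses state the data movement (x copied in; y copied in
// and back out). Without a device, the region falls back to the host.
void axpy_offload(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    const double* xp = x.data();
    double* yp = y.data();
    #pragma omp target teams distribute parallel for map(to : xp[0:n]) map(tofrom : yp[0:n])
    for (long i = 0; i < n; ++i) yp[i] += a * xp[i];
}
```

The loop bodies are identical; only the directive changes, which is what makes a directive-based model attractive when the same source has to target both the multicore and the accelerator machines listed earlier.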
6
What convergence of hardware can we expect?
- On-core networking and off-core networking will start to merge.
  - System-on-Chip designs are more energy efficient.
  - NIC-on-chip is already on the roadmap.
  - The NIC and on-chip networks are merging.
- Intel (Shekhar Borkar): sending messages using source and destination addresses would be the most efficient and sensible approach, because all of the hardware necessary to accomplish it is already present.
- NVIDIA (Steve Oberlin): we can make hardware to accelerate matching for MPI, but it is redundant hardware given that address-matching logic is already in there.
- MPI tag matching is ulcer-inducing for next-generation hardware designers.
- Heterogeneous compute resources, but unified memory
  - Unified, but certainly not uniform
- Chombo will be minimizing its reliance on MPI semantics.
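To show why tag matching is awkward to put in hardware, here is a simplified, hypothetical model of the receive-side matching an MPI library performs: a message matches a posted receive on communicator, source, and tag, source and tag may be wildcarded, and both queues are searched in order so that messages from one sender are not overtaken. Names such as Matcher, kAnySource, and kAnyTag are illustrative stand-ins, not an MPI API, and real implementations use more elaborate data structures than these linear scans.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG (not a real MPI API).
constexpr int kAnySource = -1;
constexpr int kAnyTag = -1;

// The envelope fields a receive is matched on.
struct Envelope {
    int comm;    // communicator (context) id; never wildcarded
    int source;  // sending rank, or kAnySource in a receive
    int tag;     // user tag, or kAnyTag in a receive
};

struct Message {
    Envelope env;
    std::string payload;
};

struct PostedRecv {
    Envelope want;
    int recv_id;  // assumed non-negative; identifies the completed receive
};

bool matches(const Envelope& want, const Envelope& got) {
    return want.comm == got.comm &&
           (want.source == kAnySource || want.source == got.source) &&
           (want.tag == kAnyTag || want.tag == got.tag);
}

struct Matcher {
    std::deque<PostedRecv> posted;   // receives still waiting for a message
    std::deque<Message> unexpected;  // messages that arrived before a receive

    // Incoming message: scan posted receives in posting order. Returns the
    // recv_id it completes, or -1 if it had to be queued as unexpected.
    int on_arrival(const Message& m) {
        for (auto it = posted.begin(); it != posted.end(); ++it) {
            if (matches(it->want, m.env)) {
                const int id = it->recv_id;
                posted.erase(it);
                return id;
            }
        }
        unexpected.push_back(m);
        return -1;
    }

    // New receive: scan unexpected messages in arrival order, so the oldest
    // matching message from a sender is delivered first (non-overtaking).
    // Returns true and fills *out on a match; otherwise queues the receive.
    bool post_recv(const PostedRecv& r, Message* out) {
        assert(out != nullptr);
        for (auto it = unexpected.begin(); it != unexpected.end(); ++it) {
            if (matches(r.want, it->env)) {
                *out = *it;
                unexpected.erase(it);
                return true;
            }
        }
        posted.push_back(r);
        return false;
    }
};
```

Compare this with the address-based delivery described in the quotes: a put to a known destination address needs no search at all, whereas each matching step here may scan a queue whose length depends on how many messages or receives are outstanding, and the wildcards prevent a simple exact-key lookup.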