Brian Van Straalen. Portable Performance Discussion. August 7, 2015. FASTMath SciDAC Institute.

Slide 2: First, the portable and performant basics

- C/C++11 will run correctly on all systems.
  - C++ std::thread might not be performant for several more years.
- Fortran 2003 should be working everywhere.
  - Fortran 2008 might be available: CoArrays, DO CONCURRENT.
- MPI will be available on CPUs.
- CPU-based machines will have some form of threading.
  - OpenMP is the current expected flavor.
- GPU accelerator machines will support a kernel-offload style of computing through CUDA or OpenCL.
- C/C++/CUDA/OpenCL/Fortran will link together correctly.
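As an illustration of the "runs everywhere" C++11 baseline, here is a minimal sketch of a chunked sum using only std::thread. The function name `threaded_sum` and the block decomposition are assumptions made for this example, not something from the talk, and the sketch makes no performance claim. Some toolchains need `-pthread` at link time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sum `data` with `num_threads` std::thread workers.
// Each worker sums one contiguous chunk into its own slot of `partial`,
// so no locking is needed; the slots are combined after join().
double threaded_sum(const std::vector<double>& data, unsigned num_threads) {
    assert(num_threads > 0);
    const std::size_t n = data.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division

    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        // Clamp the chunk bounds so trailing workers may get an empty range.
        const std::size_t begin = std::min(n, static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0.0);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Calling `threaded_sum(values, 4)` splits `values` into four chunks. Thread creation cost is paid on every call, which is one reason a runtime with a persistent thread pool, such as OpenMP, is usually preferred for loops like this.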

Slide 3: Current hardware landscape

- Multicore CPU compute nodes (Cray XC30, XC40).
- Multicore CPU host with NVIDIA accelerator (XE6, XK6, PowerEdge).
  - Host and accelerator connected by PCIe bus.
- Early manycore hosted platforms (BG/Q).
  - Higher count of simpler cores.

Slide 4: Developers vs Users

- Developers: with the minimum amount of code divergence, capture as much of the available performance as possible across the range of target architectures.
- Users: application parallelism and library parallelism are not mandated.
- MPI Endpoints are a possible way for MPI+X to work with flat MPI.
  - Users can decide to adopt a library's programming model and build environment.
- It is hard not to subject users to the library's data structure choices.

Slide 5: Chombo is going to drink the Kool-aid

- MPI3+OpenMP4 as a portable programming model
  - Recommended by Department of Energy vendors.
  - BoxLib already uses MPI2+OpenMP3.
- MPI3
  - More asynchronous styles available.
  - One-sided communication.
  - Fewer dynamic load-balancing options than threads.
  - Default private address spaces.
- OpenMP4
  - Threading, SIMD vector, and offload kernel directives.
  - Move towards a code generator for OpenMP code.
- More I/O abstractions
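The three OpenMP 4 directive families named above (threading, SIMD, offload) can be sketched on two small kernels. The function names `dot_omp` and `axpy_offload` are hypothetical and chosen for this example; nothing here is taken from Chombo or BoxLib. Without an OpenMP-enabled compiler the pragmas are ignored and the loops run serially, and with OpenMP but no attached device the target region is expected to fall back to the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Threading + SIMD: the combined `parallel for simd` construct (OpenMP 4.0)
// splits iterations across threads and vectorizes each thread's share.
double dot_omp(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    const double* ap = a.data();
    const double* bp = b.data();
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += ap[i] * bp[i];
    }
    return sum;
}

// Offload: `target` moves the loop to an accelerator if one is available.
// The map clauses state which arrays are copied in (to) and in/out (tofrom).
void axpy_offload(double alpha, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const double* xp = x.data();
    double* yp = y.data();
    #pragma omp target map(to: xp[0:n]) map(tofrom: yp[0:n])
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] += alpha * xp[i];
    }
}
```

The same source compiles for a CPU-only node and for a host with an accelerator, which is the portability argument for a directive-based model; whether the offloaded loop performs well on a given device is a separate question that this sketch does not address.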

Slide 6: What convergence of hardware can we expect?

- On-core networking and off-core networking will start to merge.
  - System-on-chip designs are more energy efficient.
  - NIC-on-chip is already on the roadmap.
- NIC and on-chip are merging.
- Intel, Shekhar Borkar: Sending messages using source and destination addresses would be the most efficient and sensible approach because all of the hardware necessary to accomplish it is already present.
- NVIDIA, Steve Oberlin: We can make hardware to accelerate matching for MPI, but it is redundant hardware given address matching logic is already in there.
- MPI tag matching is ulcer-inducing for next-generation hardware designers.
- Heterogeneous compute resources but unified memory.
  - Unified, but certainly not uniform.
- Chombo will be minimizing reliance on MPI semantics.
