1
LLNL-PRES-653431. Salishan, 4/24/2014. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC.
2
Tuning large, complex applications for each hardware generation is impractical. (Diagram: performance vs. productivity vs. code base size.) Solutions must be general, adaptable to the future, and maintainable.
3
What drives these designs? Handheld mobile device weight and battery life, exascale power goals, and power cost. Lower power reduces performance and reliability. (Chart: power vs. frequency for Intel Ivy Bridge.)
4
Chips operating near threshold voltage encounter more transient errors and more hard errors. Checkpoint/restart is our current reliability mechanism.
5
Complex power-saving features: SIMD and SIMT, multi-level memory systems, and heterogeneous systems. (Diagram: a node with a multi-core CPU and a GPU, each with in-package memory and memory-side processing, plus NVRAM.) Exploiting these features is difficult.
6
No production GPU or Xeon Phi code exists, and GPU and Xeon Phi optimizations are different. No production codes explicitly manage on-node data motion. Less than 10% of our FLOPs use SIMD units, even with the best compilers. Architecture-dependent data layouts may hinder the compiler. Mechanisms are needed to isolate architecture-specific code.
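To make the layout point concrete, here is a minimal sketch (with hypothetical field names, not taken from any production code) of why architecture-dependent layouts matter: an array-of-structs forces strided field accesses, while a struct-of-arrays exposes unit-stride loads that compilers vectorize far more reliably.

```cpp
#include <cstddef>

// Array-of-structs: the fields of one node are adjacent, so touching only
// x across all nodes is a stride-3 access pattern that vectorizes poorly.
struct NodeAoS { double x, y, z; };

void scale_x_aos(NodeAoS* nodes, std::size_t n, double s) {
    for (std::size_t i = 0; i < n; ++i)
        nodes[i].x *= s;              // strided loads/stores
}

// Struct-of-arrays: each field is contiguous, giving unit-stride accesses
// the compiler can turn into straightforward SIMD loads and stores.
struct NodesSoA { double* x; double* y; double* z; };

void scale_x_soa(NodesSoA nodes, std::size_t n, double s) {
    for (std::size_t i = 0; i < n; ++i)
        nodes.x[i] *= s;              // unit-stride, SIMD-friendly
}
```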
7
We add directives to existing codes where they are portable. Multi-level memory is handled by the OS or runtime, or is used as a cache. We continue to get little SIMD parallelism and probably somewhat better SIMT parallelism. Overall performance improvement is incremental at best.
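As a hedged illustration of the directive approach (the kernel and names are hypothetical), the existing loop is left intact and a single OpenMP pragma expresses the parallelism; compilers without OpenMP simply ignore it.

```cpp
#include <cstddef>

// Existing serial loop, unchanged except for the directive; compilers
// without OpenMP support ignore the pragma and run the loop serially.
void axpy(double* y, const double* x, double a, std::size_t n) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```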
8
Are our algorithms well suited for future machines? Can we rewrite our data structures to match future machines? We will address these questions in the next few slides.
9
Loop fusion: make each operator a single sweep over a mesh. Data structure reorganization: reduce mallocs or use better libraries. (Chart: LULESH on BG/Q.) However, better implementations only get us 2-3x.
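A minimal sketch of the loop-fusion idea (arrays and kernels are illustrative): two separate sweeps that exchange an intermediate array through memory become one sweep that keeps the intermediate in a register.

```cpp
#include <cstddef>

// Unfused: two full sweeps over the mesh; the intermediate array 'grad'
// makes a round trip through memory between them.
void unfused(double* grad, double* flux, const double* u, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) grad[i] = u[i] - u[i - 1];
    for (std::size_t i = 1; i < n; ++i) flux[i] = 0.5 * grad[i];
}

// Fused: a single sweep; the intermediate stays in a register, roughly
// halving the memory traffic for this pair of operations.
void fused(double* flux, const double* u, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        double g = u[i] - u[i - 1];
        flux[i] = 0.5 * g;
    }
}
```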
10
Throughput-optimized processors execute serial sections slowly. Design codes with limited serial sections. Better runtime support (OpenMP, malloc libraries) is needed to reduce serial overhead. Use a latency-optimized processor for what remains.
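One recurring source of serialized overhead is allocation inside parallel loops; a hedged sketch (illustrative names, assuming OpenMP) of hoisting it into a per-thread scratch buffer:

```cpp
#include <cstddef>
#include <vector>

// Allocating scratch storage inside every iteration serializes threads in
// the allocator; giving each thread one reusable buffer removes that
// bottleneck. The kernel itself is a placeholder.
void apply_operator(double* out, std::size_t n, std::size_t scratch_size) {
    #pragma omp parallel
    {
        std::vector<double> scratch(scratch_size);   // one allocation per thread
        #pragma omp for
        for (long i = 0; i < static_cast<long>(n); ++i) {
            // ... fill 'scratch' for element i (omitted) ...
            out[i] = scratch[0];                     // placeholder use of scratch
        }
    }
}
```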
11
More parallelism exists in current algorithms than we exploit today, but code changes are required to express that parallelism more clearly. SIMT, or SIMD with hardware gather/scatter, is easier to exploit. (Chart: LULESH on Sandy Bridge.) Bandwidth constraints will eventually limit us.
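A minimal sketch (the kernel is illustrative) of expressing the parallelism explicitly: asserting independent iterations lets the compiler generate SIMD code, and the indirect read maps onto hardware gather where it exists.

```cpp
#include <cstddef>

// Gather element values through an index list and accumulate. The
// 'omp simd' directive asserts the iterations are independent, so the
// compiler may vectorize; e[idx[i]] becomes a hardware gather where the
// instruction set provides one.
void gather_accumulate(double* node, const double* e,
                       const int* idx, std::size_t n) {
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        node[i] += 2.0 * e[idx[i]];
}
```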
12
Many of today's apps need 0.5-2 bytes for every FLOP performed.
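For intuition, a back-of-the-envelope count on a simple update kernel (not taken from the slides, assuming 8-byte doubles and no cache reuse):

```cpp
#include <cstddef>

// Byte/FLOP count for a simple update:
//   traffic: load x[i] + load y[i] + store y[i] = 24 bytes
//   work:    1 multiply + 1 add                 = 2 FLOPs
//   => 12 bytes per FLOP for this kernel; whole applications average much
//      lower (the 0.5-2 bytes/FLOP quoted here) because other kernels
//      reuse data already loaded.
void daxpy(double* y, const double* x, double a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```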
13
Excess FLOPs
14
More FLOPs per byte: small dense operations, more accurate, and potentially more robust with better symmetry preservation. (Chart: bytes-to-FLOPs requirement vs. algorithmic order.)
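A hedged illustration (not from the slides) of why small dense operations shift the bytes-to-FLOPs requirement: a p x p element matvec performs on the order of p*p FLOPs on roughly p*p data, and high-order methods reuse the same small operator across many elements, so effective FLOPs per byte grow with the order.

```cpp
#include <cstddef>

// Dense p x p element operator applied as y = A * x:
//   work:    2*p*p FLOPs
//   traffic: ~8*(p*p + 2*p) bytes if nothing is cached
// For a single element this is still bandwidth-hungry, but high-order
// methods reuse the same small dense operator across many elements, so
// the effective FLOPs per byte grow with the order p.
void element_matvec(double* y, const double* A, const double* x, std::size_t p) {
    for (std::size_t i = 0; i < p; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < p; ++j)
            sum += A[i * p + j] * x[j];
        y[i] = sum;
    }
}
```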
15
How do you use the FLOPs efficiently? What does high-order accuracy mean when there is a shock? Can you couple all the physics we need at high order? We are working to answer these questions, but whether we use new algorithms or our current ones, there is a pervasive challenge…
17
Mechanisms are needed to isolate non-portable optimizations.
18
RAJA, Kokkos, and Thrust allow portable abstractions in today's codes; Charm++ and Liszt are related programming-model efforts.
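A minimal sketch of the abstraction style, written against RAJA's forall interface as an assumption (the kernel is illustrative; Kokkos and Thrust offer similar constructs): the loop body is a lambda, and the execution policy chosen in one place selects serial, OpenMP, or GPU execution without touching the body.

```cpp
#include "RAJA/RAJA.hpp"

// The execution policy is chosen in one place; swapping it (e.g. for an
// OpenMP or CUDA policy) retargets the loop without changing its body.
using policy = RAJA::seq_exec;

void axpy(double* y, const double* x, double a, int n) {
    RAJA::forall<policy>(RAJA::RangeSegment(0, n), [=](int i) {
        y[i] += a * x[i];
    });
}
```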
19
(Summary diagram: algorithms — today's and high order; programming models — RAJA; architectures.)