Programmability
Hiroshi Nakashima, Thomas Sterling

Key Challenges (1)
Parallelism
  – Expose sufficient parallelism (multi-billion-way)
  – Manage the massive parallelism in ensemble (hierarchy)
  – Reveal rich form and granularity of parallelism
  – Efficient exploitation of fine-grained parallelism
Distribution and resource assignment
  – Enables exploitation of separate concurrency of action
  – Need for some kind of global name space
Locality management
  – Reduces latency of access and control
  – Exposure of object and control affinity

Key Challenges (2)
Management of memory hierarchy
  – Transparent cache misses
  – Finite cache size and structure
  – Copy semantics and consistency(?)
Latency hiding
  – Locality management (already mentioned)
  – Intrinsic overlap of communication with computation to mitigate impact
Hardware idiosyncrasies
  – E.g., TLB misses
  – Non-deterministic resolution of shared resource contention
  – Branch prediction, register renaming, etc.

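Illustration (not from the slides): the "overlap of communication with computation" bullet can be sketched as double buffering, where the next transfer is issued before work starts on the data already in hand. This is a minimal sketch under stated assumptions: `fetch_chunk` and `compute` are hypothetical stand-ins for a remote transfer and for local work, and a thread pool plays the role of an asynchronous communication layer.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(i):
    """Hypothetical stand-in for a remote data transfer (simulated latency)."""
    time.sleep(0.01)
    return list(range(i * 4, i * 4 + 4))


def compute(chunk):
    """Hypothetical stand-in for local work on data that has already arrived."""
    return sum(x * x for x in chunk)


def pipelined_sum(num_chunks):
    """Prefetch chunk i+1 while computing on chunk i (double buffering)."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_chunk, 0)
        for i in range(num_chunks):
            chunk = pending.result()  # wait only for the chunk needed now
            if i + 1 < num_chunks:
                # Issue the next transfer before starting to compute,
                # so its latency is hidden behind compute(chunk).
                pending = pool.submit(fetch_chunk, i + 1)
            total += compute(chunk)
    return total
```

Calling `pipelined_sum(3)` gives the same result as fetching and computing strictly in sequence; only the timing differs, since each transfer after the first runs concurrently with the preceding computation.
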
Key Challenges (3)
Legacy codes may not meet requirements for future Exascale systems
  – Rewrite only once, please.
What is the paradigm or execution model for the programming model to satisfy and cooperate with remaining system components?
  – Distribution of responsibilities across system components
Libraries
  – Code reuse
  – Decouples performance issues from logical function
  – Can adapt to your program requirements
      The library should learn about your data structure, not you about the library

Key Challenges (4)
Interoperability
  – Between cooperating concurrently executing functionality
  – Exploit existing legacy codes during transitional periods
Minimization of performance sensitivity
Robust guarantees of correctness of result
Elimination of over-constraining synchronization bottlenecks
  – E.g., global barriers
  – Lightweight synchronization
Re-empower strong scaling
Portability
  – Different systems
  – Different scale
  – Different generations

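Illustration (not from the slides): one way to read "lightweight synchronization" in contrast to global barriers is point-to-point dependency signalling, where each consumer waits only on the producer whose data it needs. The sketch below is an assumption-laden toy: the two stages (`+ 1`, then `* 2`) and the function name `run_pointwise` are hypothetical, and one `threading.Event` per data item replaces a barrier between the stages.

```python
import threading


def run_pointwise(inputs):
    """Two-stage computation with per-item events instead of a global barrier."""
    n = len(inputs)
    mid = [None] * n
    out = [None] * n
    ready = [threading.Event() for _ in range(n)]

    def producer(i):
        mid[i] = inputs[i] + 1  # hypothetical stage one
        ready[i].set()          # signal only the consumer of item i

    def consumer(i):
        ready[i].wait()         # wait for item i alone, not for all producers
        out[i] = mid[i] * 2     # hypothetical stage two

    # Consumers are started first to show that ordering is enforced by the
    # events, not by thread start order.
    threads = [threading.Thread(target=consumer, args=(i,)) for i in range(n)]
    threads += [threading.Thread(target=producer, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

With a global barrier, no stage-two work could begin until the slowest stage-one task finished; here a slow producer delays only its own consumer.
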
Potential Impact on Software Component
Need for new model of computation
Programming model reflects user program parallelism
Runtime system makes runtime information available for decision chain
Architecture and runtime minimize overhead to enable useful rich mechanisms for control, cooperation, and sharing
Asynchrony management for out-of-order arrival of data transfers and service completion
Guaranteed compound atomic operations for user-programmed segments with efficient protection
OS protocol to inform runtime system
  – Bidirectional exchange

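Illustration (not from the slides): a "compound atomic operation for a user-programmed segment" can be pictured as a multi-step update that other threads never observe half-done. The sketch below uses a plain mutex as the protection mechanism, which is an assumption; the slides do not name a mechanism, and the class `PairedCounters` and function `run_transfers` are hypothetical.

```python
import threading


class PairedCounters:
    """Two counters whose sum must stay zero (the invariant being protected)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.debits = 0
        self.credits = 0

    def transfer(self, amount):
        # Both updates form one compound operation: no other thread can
        # observe debits changed while credits is still unchanged.
        with self._lock:
            self.debits -= amount
            self.credits += amount

    def snapshot(self):
        with self._lock:
            return self.debits, self.credits


def run_transfers(num_threads=8, per_thread=1000):
    """Run many concurrent one-unit transfers and return the final state."""
    counters = PairedCounters()

    def work():
        for _ in range(per_thread):
            counters.transfer(1)

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters.snapshot()
```

A single coarse lock is the simplest form of protection; the "efficient protection" bullet suggests that at scale something cheaper than this would be needed.
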
Summary of Research Directions
Separation of logical functionality from performance attributes
New model of computation
Diversity of parallelism forms and sizes
Data-directed execution
Dynamic graph-based problems, encoding, and control
New programming models that interoperate with old
Dealing with memory hierarchies
Advanced runtime systems
Requirements for new ultra-massive architecture
Automatic runtime tuning for heterogeneous architectures

Potential Impact on Usability, Capability, & Breadth of Community
Enormous
Essential
Ease of use
Eternal
Everyone

4.x Programmability
Cross-cutting property of concurrency as it relates to programmability
Exposed concurrency:
  – 100 thousand-way parallelism
  – Million-way parallelism
  – 10 million-way parallelism
  – 100 million-way parallelism
  – Billion-way parallelism
  – 10 billion-way parallelism

4.x Programmability
Technology drivers
  – Programming models and languages
  – Compiler analysis, distribution, and allocation
  – Runtime system software
  – OS
  – Architecture structure, semantics, and mechanisms

4.x Programmability
Alternative R&D strategies
  – Models of computation
      Message passing with multithreaded processes
      Message-driven work-queue multithreaded
  – Programming models
      MPI-8
      Event-driven multithreaded with GAS
      DSP and Declarative
  – Runtime system software

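Illustration (not from the slides): the "message-driven work-queue multithreaded" model can be sketched as worker threads that pull messages from a shared queue, where handling one message may generate further messages. Everything named here is hypothetical: the `"split"` message kind, the `run_message_driven` function, and the use of Python's `queue.Queue` as the work queue.

```python
import queue
import threading


def run_message_driven(initial_messages, num_workers=4):
    """Process messages from a shared queue; handlers may enqueue more work."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def handle(msg):
        kind, value = msg
        if kind == "split" and value > 1:
            # Handling a message spawns new messages (data-directed work).
            half = value // 2
            work.put(("split", half))
            work.put(("split", value - half))
        else:
            with results_lock:
                results.append(value)

    def worker():
        while True:
            msg = work.get()
            if msg is None:  # shutdown sentinel
                work.task_done()
                return
            try:
                handle(msg)
            finally:
                work.task_done()

    for m in initial_messages:
        work.put(m)
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    work.join()  # returns once every message, including spawned ones, is handled
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return results
```

For example, `run_message_driven([("split", 5)])` recursively splits the value until five unit-sized pieces have been recorded. No worker is bound to a fixed portion of the problem; work is claimed as messages arrive.
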
4.x Programmability
Recommended research agenda
  – Model of computation
  – Decision chain across system layers
  – Protocols between successive layers

4.x Programmability
Crosscutting considerations
  – It is one
  – Performance: a major hazard for programmability
  – Reliability
      Does the application program play a role in determining the response to faults?