HPX-5 ParalleX in Action


1 HPX-5 ParalleX in Action
Martin Swany
Associate Chair and Professor, Intelligent Systems Engineering
Deputy Director, Center for Research in Extreme Scale Technology (CREST)
Indiana University

2 ParalleX Execution Model
Core tenets:
- Fine-grained parallelism
- Hide latency with concurrency
- Runtime introspection and adaptation
Formal components:
- Global address space (shared-memory programming)
- Processes
- Compute complexes
- Lightweight control objects
- Parcels
Fully flexible, but promotes fine-grained dataflow programs.
HPX-5 is based on ParalleX and is part of the Center for Shock-Wave Processing of Advanced Reactive Materials (C-SWARM) effort in PSAAP-II.

3 Model: Global Address Space
Flat, byte-addressable global addresses:
- Put/get with local and remote completion
- Active message targets
- Array collectives
- Controls thread distribution and load balance
Current implementation:
- Block-based allocation
- Malloc/free with distribution (local, cyclic, user, etc.)
- Traditional PGAS or directory-based AGAS
- High-performance local allocation (for high-frequency LCO allocation)
- Soft core affinity for NUMA
A minimal allocation/copy sketch appears below.
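The sketch below illustrates the block-based allocation and put/get-with-completion interface, in the style of the Fibonacci slide later in this deck. The hpx_gas_alloc_cyclic, hpx_gas_memput, hpx_gas_memget_sync, and hpx_gas_free names come from the HPX-5 C API, but treat the exact signatures here as assumptions rather than a definitive usage guide.

// Sketch: allocate a cyclic global array, put data in, get it back.
// Register with HPX_ACTION(HPX_DEFAULT, 0, gas_demo, gas_demo_handler)
// as on the Fibonacci slide.
int gas_demo_handler(void) {
  int src[16];
  for (int i = 0; i < 16; ++i) { src[i] = i; }
  // One 16-int block per locality, distributed cyclically.
  hpx_addr_t gva = hpx_gas_alloc_cyclic(HPX_LOCALITIES, sizeof(src), 0);
  // Put with an explicit remote-completion LCO (local completion elided).
  hpx_addr_t rsync = hpx_lco_future_new(0);
  hpx_gas_memput(gva, src, sizeof(src), HPX_NULL, rsync);
  hpx_lco_wait(rsync);                         // remote side has the data
  hpx_lco_delete_sync(rsync);
  int dst[16];
  hpx_gas_memget_sync(dst, gva, sizeof(dst));  // synchronous get
  hpx_gas_free(gva, HPX_NULL);
  return HPX_THREAD_CONTINUE(dst[15]);
}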

4 Model: Parcels
Active messages with continuations
Target data:
- action, global address, immediate data
Continuation data:
- action, global address
- lco_set, lco_delete, memput, free, etc.
Execute local to the target address
Unified local and remote execution model: send() is equivalent to thread_create()
A construction sketch appears below.
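To make the parcel structure concrete, here is a minimal send sketch using the hpx_parcel_* accessors and the predefined hpx_lco_set_action continuation from the HPX-5 C API; the exact signatures are assumptions based on that API rather than something shown in this deck.

// Sketch: send an active message whose continuation sets an LCO.
void send_with_continuation(hpx_addr_t target, hpx_action_t act,
                            const int *arg, hpx_addr_t done) {
  hpx_parcel_t *p = hpx_parcel_acquire(arg, sizeof(*arg)); // immediate data
  hpx_parcel_set_target(p, target);       // runs local to this global address
  hpx_parcel_set_action(p, act);
  hpx_parcel_set_cont_target(p, done);    // continuation: lco_set on 'done'
  hpx_parcel_set_cont_action(p, hpx_lco_set_action);
  hpx_parcel_send(p, HPX_NULL);           // send() ~ thread_create() at target
}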

5 Model: User-level threads
Cooperative threads:
- Block on dynamic dependencies (lco_get, memput, etc.)
Continuation-passing style:
- Progenitor parcel specifies the continuation target and action
- Thread “continues” a value
- Call/cc “pushes” a continuation parcel
Isomorphic with parcels
A call/cc sketch appears below.
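The following sketch contrasts continuing a value with pushing a continuation via call/cc. The hpx_call_cc entry point and its calling convention are assumptions about the HPX-5 C API (the deck only names the mechanism), so read this as illustrative only.

// Sketch: 'forward' delegates its continuation to 'square' via call/cc,
// so square's result flows directly to whoever spawned forward.
int square_handler(int n) {
  return HPX_THREAD_CONTINUE(n * n);       // thread "continues" a value
}
HPX_ACTION(HPX_DEFAULT, 0, square, square_handler, HPX_INT);

int forward_handler(int n) {
  return hpx_call_cc(HPX_HERE, square, n); // "pushes" a continuation parcel
}
HPX_ACTION(HPX_DEFAULT, 0, forward, forward_handler, HPX_INT);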

6 Model: Local Control Objects
Abstract synchronization interface
Unified local/remote access:
- Threads: get, set, wait, reset, compound ops
- Parcel sends dependent on an LCO
Built-in classes:
- Futures, reductions, generation counts, semaphores, ...
User-defined classes:
- Initialize, set handler, predicate
Colocates data with control and synchronization
Implements dataflow with parcel continuations
A future sketch appears below.
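A future is the simplest built-in LCO class; the sketch below exercises the set/get pair with the same hpx_lco_* calls as the Fibonacci slide. The set and get would normally occur in different threads; they are collapsed into one handler here only for brevity, which works because a set future stays set.

// Sketch: synchronize through a future LCO.
int future_demo_handler(void) {
  hpx_addr_t f = hpx_lco_future_new(sizeof(int));     // GAS-allocated future
  int v = 42;
  hpx_lco_set(f, sizeof(v), &v, HPX_NULL, HPX_NULL);  // set: value + signal
  int out;
  hpx_lco_get(f, sizeof(out), &out);                  // get: blocks until set
  hpx_lco_delete_sync(f);
  return HPX_THREAD_CONTINUE(out);
}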

7 Control: Parallel Parcels and Threads
Serial work:
- thread_continue, thread_call/cc
- happens-before: Thread 1 < Thread 2
Parallel work:
- parcel_send
- unordered: Thread 1 <> Thread 4
Higher level:
- hpx_call, local parfor, hierarchical parfor
[Diagram: Thread 1 continues a value to Thread 2 via thread_continue(x), while parcel_send(p), parcel_send(q), and parcel_send(r) spawn unordered threads such as Thread 4.]
A fan-out/join sketch appears below.
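The sketch below shows the unordered-parallel pattern: n hpx_call spawns joined through an "and" gate. The work_handler task is hypothetical, and hpx_lco_and_new/hpx_lco_and_set are assumed from the HPX-5 C API.

// Sketch: unordered parallel spawns joined with an 'and' gate LCO.
int work_handler(int i, hpx_addr_t done) {
  // ... do the i-th piece of work (hypothetical) ...
  hpx_lco_and_set(done, HPX_NULL);       // count one completed input
  return HPX_SUCCESS;
}
HPX_ACTION(HPX_DEFAULT, 0, work, work_handler, HPX_INT, HPX_ADDR);

int fanout_handler(int n) {
  hpx_addr_t done = hpx_lco_and_new(n);  // gate waiting for n inputs
  for (int i = 0; i < n; ++i) {
    hpx_call(HPX_HERE, work, HPX_NULL, i, done); // sends are unordered
  }
  hpx_lco_wait(done);                    // join
  hpx_lco_delete_sync(done);
  return HPX_THREAD_CONTINUE(n);
}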

8 Control: LCO Synchronization
Thread-thread synchronization:
- Traditional monitor-style synchronization
- Dynamic output dependencies
- Blocked threads as continuations
Data-flow execution:
- Pending parcels as continuations
- Execution “consumes” the output
- Can be manually regenerated for iterative execution
Generic, user-defined:
- Any set of continuations
- Any function and predicate
- Lazy evaluation of the function
[Diagram: threads synchronize through a future with lco_set/lco_get; a pending parcel_send(p) waits on lco_set(a), lco_set(b), ..., lco_set(x) and fires f(a, b, ..., x) when pred() holds.]
A dependent-send sketch appears below.
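The pending-parcel style can be written with hpx_call_when, which parks a parcel on an LCO and releases it when the LCO is set. The stage2 action is hypothetical, and hpx_call_when's signature is assumed from the HPX-5 C API.

// Sketch: a pending parcel as an LCO continuation.
int stage2_handler(int x) {
  return HPX_THREAD_CONTINUE(x + 1);
}
HPX_ACTION(HPX_DEFAULT, 0, stage2, stage2_handler, HPX_INT);

int dataflow_handler(void) {
  hpx_addr_t a   = hpx_lco_future_new(sizeof(int));  // input dependency
  hpx_addr_t out = hpx_lco_future_new(sizeof(int));  // result future
  hpx_call_when(a, HPX_HERE, stage2, out, 7);        // parcel parked on 'a'
  int v = 1;
  hpx_lco_set(a, sizeof(v), &v, HPX_NULL, HPX_NULL); // releases the parcel
  int result;
  hpx_lco_get(out, sizeof(result), &result);         // result == 8
  hpx_lco_delete_sync(a);
  hpx_lco_delete_sync(out);
  return HPX_THREAD_CONTINUE(result);
}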

9 Data Structures, Distribution
Global linked data structures:
- Graphs, trees, DAGs
Global cyclic block arrays: locality(block address)
Global user-defined distributions: locality[block address]
Active GAS:
- A distributed directory allows blocks to be dynamically remapped away from their home localities
- Application-specific explicit load balancing
- Automatic load balancing through GAS tracing and graph partitioning (slow)
An indexing sketch appears below.
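For a cyclic block array, element addresses can be computed with GAS address arithmetic; hpx_addr_add is assumed from the HPX-5 C API, and bsize is the block size used at allocation.

// Sketch: index block i of a global cyclic array.
hpx_addr_t block_at(hpx_addr_t base, int i, uint32_t bsize) {
  // Under a cyclic distribution, consecutive blocks map to consecutive
  // localities; the runtime resolves locality(block address) internally.
  return hpx_addr_add(base, (int64_t)i * bsize, bsize);
}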

10 Fibonacci
fib(n) = fib(n-1) + fib(n-2)

HPX_ACTION_DECL(fib);
int fib_handler(int n) {
  if (n < 2) { return HPX_THREAD_CONTINUE(n); }      // sequential
  int l = n - 1;
  int r = n - 2;
  hpx_addr_t lhs = hpx_lco_future_new(sizeof(int));  // GAS malloc
  hpx_addr_t rhs = hpx_lco_future_new(sizeof(int));  // GAS malloc
  hpx_call(HPX_HERE, fib, lhs, l);                   // parallel
  hpx_call(HPX_HERE, fib, rhs, r);                   // parallel
  hpx_lco_get(lhs, sizeof(int), &l);                 // LCO synchronization
  hpx_lco_get(rhs, sizeof(int), &r);                 // LCO synchronization
  hpx_lco_delete_sync(lhs);                          // GAS free
  hpx_lco_delete_sync(rhs);                          // GAS free
  int fn = l + r;
  return HPX_THREAD_CONTINUE(fn);                    // sequential
}
HPX_ACTION(HPX_DEFAULT, 0, fib, fib_handler, HPX_INT);
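For completeness, a driver for the action above might look like the following; hpx_init, hpx_run, and hpx_finalize are assumed from the HPX-5 C API, and the way hpx_run forwards arguments to the entry action is likewise an assumption.

#include <stdlib.h>
#include <hpx/hpx.h>

// Sketch: boot the runtime and run fib as the initial thread.
int main(int argc, char *argv[]) {
  if (hpx_init(&argc, &argv) != 0) {
    return -1;                  // runtime failed to boot
  }
  int n = (argc > 1) ? atoi(argv[1]) : 10;
  int e = hpx_run(&fib, n);     // fib(n) becomes the first thread
  hpx_finalize();
  return e;
}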

11 Networking / Comms
Internal interfaces: Photon, Isend/Irecv
- Preferred: put/get with remote completion
- Legacy: parcel send
Photon:
- rDMA put/get with remote-completion operations
- Native PSM (libfabric), IB verbs, uGNI, sockets (libfabric)
- Parcel emulation through eager buffers
- Synchronized with fine-grained point-to-point locking
Isend/Irecv:
- MPI_THREAD_FUNNELED implementation
- PWC emulated through Isend/Irecv
- Portability, legacy upgrade path

12 Networking / Comms
A key idea in the Photon library: Put/Get with Completion
- Minimal overhead to trigger a waiting thread via an LCO
- A useful paradigm when combined with an “unexpected active message” capability
- Essentially, attach parcel continuations (either already-running threads or yet-to-be-instantiated parcels) to both local and remote completion operations
A completion sketch appears below.
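At the HPX-5 level, this shows up as the lsync/rsync LCO parameters on memput: a continuation (a blocked thread here, or a pending parcel via hpx_call_when) attaches to each completion event. The hpx_gas_memput signature is assumed from the HPX-5 C API; photon_pwc/photon_gwc sit underneath and are not shown.

// Sketch: attach continuations to both completion events of a put.
void put_with_completions(hpx_addr_t dst, const void *src, size_t n) {
  hpx_addr_t lsync = hpx_lco_future_new(0);  // local-completion event
  hpx_addr_t rsync = hpx_lco_future_new(0);  // remote-completion event
  hpx_gas_memput(dst, src, n, lsync, rsync);
  hpx_lco_wait(lsync);         // source buffer may now be reused
  hpx_lco_wait(rsync);         // data is visible at the target
  hpx_lco_delete_sync(lsync);
  hpx_lco_delete_sync(rsync);
}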

13 Networking / Comms
One of the key lessons of HPX-5 is the power of the memget/memput-with-completion primitives (with the associated low-level photon_pwc and photon_gwc operations): they provide a very powerful abstraction. One-sided operations in AMTs are not, by themselves, that useful; it is the ability to continue threads or spawn parcels on completion that delivers the performance-improving functionality.

14 Thank you
hpx.crest.iu.edu

