
Chandra S. Martha Min Lee 02/10/2016


1 “MPI+X” support in OCR
Chandra S. Martha, Min Lee
02/10/2016
Acknowledgment: This material is based upon work supported by the Department of Energy Office of Science under cooperative agreement DE-SC and Lawrence Livermore National Labs subcontract B

2 OCR Today – Still Evolving
OCR: The good
  Event-driven, task-based runtime → resiliency, introspection, dynamic load balancing
  Platform agnostic: compute resources are virtualized. Can run on top of a homogeneous or heterogeneous set of compute resources
The bad: still in prototype phase
  Today’s OCR API is very limited. No rich API features. Not a high-level language.
  As a result, the programmer needs to worry about managing a lot of “OCR” objects: Events, Datablocks and Tasks. This leads to a lot of “races”, and garbage collection gets tricky.
  Machine model is not abstracted clearly yet.
  No easy way to create millions of tasks and impromptu collectives
  Native OCR apps are not easy to understand (they resemble spaghetti code, as the task DAG can be expressed in very convoluted ways)
  Composability: hard; one needs to understand the DAG fairly well to change the app

3 Effective migration via OCR to Exascale
For better community adoption, let’s have OCR support legacy, intermediate and revolutionary styles of refactoring effort:
  Legacy: MPI-lite on top of OCR
    Subset of MPI supported. Long-lived EDTs.
  Intermediate: Enable the “MPI+X” style directly on top of native OCR
    Should work well for statically load-balanced, domain-decomposition type of workloads
  Revolutionary: Native OCR apps exposing the full level of parallelism to OCR
    Should work well for dynamic, load-imbalanced workloads (Graph500, etc.)
The focus of this talk is extending the OCR API to support the “MPI+X” style natively.

4 “MPI+X” via OCR to enable Migration to Exascale
DOE is enamored with the “MPI+X” line of thinking, especially for domain-decomposition based applications.
So, let’s take the best features of MPI:
  Machine model: A cluster of communicating “end-points”
  Asynchronous communications (i.e., no bulk-synchronous communications)
And the best features of “X” → tasking, data-parallel tasks
(Note: Ideally, we would like to have “X” supported beyond a coherency domain; the scope of “X” can be a homogeneous or heterogeneous set of compute resources)

5 “MPI+X” via OCR to enable Migration to Exascale
Let’s borrow some features of MPI into OCR
  View MPI NOT as a runtime but as a “standard” to enable message passing
  Ordered set of MPI processes in a group → ordered set of SPMD EDTs in a group
  Asynchronous point-to-point and collectives: Enable communication among the SPMD EDTs through “implicit” dependencies (ranks, tags, communication world, etc.)
  Provide default communicator contexts: COMM_WORLD_VIRTUAL_NODES
    A virtual node = user defined to suit the application(?); OCR might dynamically “grow”/“shrink” the node size
    Virtual node = a coherency domain, a bunch of coherency domains, a TG chip, a Xeon node with a co-processor
Borrow some features of X (OpenMP/OpenCL/Kokkos, etc.) for “OCR-lite”:
  Tasks with explicit dependencies (oversubscription)
  Virtualized tasks (can be mapped to a uniform/hetero node; keep all the resources busy)
  Data-sharing constructs, loop schedulers, etc.
Preserve affinity relationship: All tasks, datablocks, events (OCR objects) should inherit the affinity properties from the parent SPMD EDT.
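To make the idea of “implicit” dependencies concrete, here is a minimal sketch in plain C of how a send could be paired with a posted receive purely from its rank, tag and communicator. It is not OCR code: the `envelope_t` type and both function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message envelope; none of these names come from the OCR API. */
typedef struct {
    int comm;  /* communicator context id */
    int src;   /* sending rank */
    int dst;   /* receiving rank */
    int tag;   /* user-chosen tag */
} envelope_t;

/* A send and a receive are implicitly linked when all four fields agree. */
static bool envelope_matches(const envelope_t *send, const envelope_t *recv) {
    return send->comm == recv->comm && send->src == recv->src &&
           send->dst == recv->dst && send->tag == recv->tag;
}

/* Return the index of the first posted receive that matches `send`, or -1. */
static int find_matching_recv(const envelope_t *send,
                              const envelope_t *posted, size_t n) {
    assert(send != NULL && (posted != NULL || n == 0));
    for (size_t i = 0; i < n; ++i) {
        if (envelope_matches(send, &posted[i])) return (int)i;
    }
    return -1;
}
```

In this model no event GUID is exchanged between the two sides; the matching fields alone decide which receive a send satisfies, which is the property the slide borrows from MPI.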

6 “MPI+X” via OCR to enable Migration to Exascale
Let’s see what we might need in OCR to support the “MPI+X” style
  Augment the “OCR” API to support the “MPI+X” style of programming
  “MPI + X” now becomes “OCR + OCR-lite”
  “OCR + OCR-lite” machine model: OCR across “virtual” nodes + “OCR-lite” within the “virtual” node
[Figure: eight virtual nodes]

7 “MPI+X” via OCR to enable Migration to Exascale
The application DAG now becomes
  OCR-lite: A local DAG with most edges staying within the virtual node
  OCR: Some edges crossing the virtual node boundaries to enable communication
Note that most of the DAG edges are within a virtual node.
[Figure: eight virtual nodes]
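The split between local and crossing edges can be checked on a toy DAG. The sketch below is plain C, not OCR code; `edge_t`, `node_of` and `count_crossing_edges` are hypothetical names, and the task-to-node placement is assumed to be known up front.

```c
#include <assert.h>
#include <stddef.h>

/* One dependence edge of the task DAG, identified by task indices. */
typedef struct { int from_task; int to_task; } edge_t;

/* Count edges whose endpoints live on different virtual nodes.
   node_of[t] is the virtual node that task t is placed on. */
static size_t count_crossing_edges(const edge_t *edges, size_t n_edges,
                                   const int *node_of, size_t n_tasks) {
    size_t crossing = 0;
    for (size_t i = 0; i < n_edges; ++i) {
        assert(edges[i].from_task >= 0 && (size_t)edges[i].from_task < n_tasks);
        assert(edges[i].to_task >= 0 && (size_t)edges[i].to_task < n_tasks);
        if (node_of[edges[i].from_task] != node_of[edges[i].to_task]) {
            ++crossing;
        }
    }
    return crossing;
}
```

With six tasks placed three per virtual node and a simple chain of five edges, only the single edge that links the two groups counts as crossing; the other four would be handled by “OCR-lite” within a node.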

8 SPMD EDTs
SPMD EDTs are created through hints:

    ocrTemplateCreate( &SPMD_Template, FNC_SPMD, paramc, depc );

    //Set up a strong “hint” for the runtime about the SPMD EDT
    ocrHint_t SPMD_hint;
    ocrHintInit( &SPMD_hint, OCR_HINT_EDT_T );
    nRanks = 1000000; //1e6 ranks
    ocrSetHintValue( &SPMD_hint, OCR_HINT_EDT_NRANKS, nRanks );
    //E.g., communication world of OCR policy domains
    ocrSetHintValue( &SPMD_hint, OCR_HINT_EDT_COMM, OCR_COMM_WORLD_PDs /*Or a derived communicator*/ );

    ocrEdtCreate( &SPMD_EDT_labelled_guid, SPMD_Template, EDT_PARAM_DEF, paramv,
                  EDT_DEP_DEF, depv, EDT_PROP_SPMD, &SPMD_hint, &EVT_out_labelled_guid );

[Figure: four SPMD EDTs]
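The slide does not say how the requested ranks would be spread over the policy domains of the communicator. One plausible choice, shown here purely as an assumption, is a block distribution. `pd_of_rank` is a hypothetical helper written in plain C and is not part of OCR.

```c
#include <assert.h>

/* Hypothetical helper, not part of OCR: block-distribute n_ranks SPMD ranks
   over n_pds policy domains and return the domain that owns `rank`. */
static unsigned long long pd_of_rank(unsigned long long rank,
                                     unsigned long long n_ranks,
                                     unsigned long long n_pds) {
    assert(n_pds > 0 && rank < n_ranks);
    unsigned long long base = n_ranks / n_pds;  /* ranks every domain receives */
    unsigned long long rem  = n_ranks % n_pds;  /* first `rem` domains get one extra */
    unsigned long long cut  = rem * (base + 1); /* first rank owned by a smaller domain */
    if (rank < cut) {
        return rank / (base + 1);
    }
    /* base is nonzero here: when base == 0, cut == n_ranks and rank < cut. */
    return rem + (rank - cut) / base;
}
```

For example, 10 ranks over 4 domains gives blocks of 3, 3, 2 and 2 ranks, so consecutive ranks stay together, which fits the affinity goal stated earlier in the deck.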

9 SPMD EDT: Usage scenario – (1/3 slides)
ocrGuid_t FNC_SPMD( u32 paramc, u64* paramv, u32 depc, ocrEdtDep_t depv[] )
{
    //Get the SPMD hint back and use it to figure out which communicator context this EDT belongs to
    ocrCommSize( OCR_COMM_WORLD_PDs, &nRanks );
    ocrCommRank( OCR_COMM_WORLD_PDs, &myEdtRank );

    if( myEdtRank % 2 == 0 )
    {
        ocrTemplateCreate( &TML_ocrSend, FNC_ocrSend, paramc, depc );
        //OCR defines “FNC_ocrSend” as part of the API; hence, paramv and depv must follow the spec. See below.
        paramv[] = {count, datatype, destinationEDTRank, tag, OCR_COMM_WORLD_PDs};
        depv[] = {DBK_sendbuf, EVT_trigger};
        ocrEdtCreate( &EDT_ocrSend, TML_ocrSend, EDT_PARAM_DEF, &paramv,
                      EDT_DEP_DEF, &depv, EDT_PROP_COMM, NULL_HINT, &EVT_OUT_ocrSend );
    }
    //Continued in next slide

10 SPMD EDT: Usage scenario (cont.) – (2/3 slides)
ocrGuid_t FNC_SPMD( u32 paramc, u64* paramv, u32 depc, ocrEdtDep_t depv[] )
    //Continued from previous slide
    if( myEdtRank % 2 == 1 )
    {
        ocrTemplateCreate( &TML_ocrRecv, FNC_ocrRecv, paramc, depc );
        //OCR defines “FNC_ocrRecv” as part of the API; hence, paramv and depv must follow the spec. See below.
        paramv[] = {count, datatype, sourceEDTRank, tag, OCR_COMM_WORLD_PDs};
        depv[] = {DBK_Recvbuf, EVT_trigger};
        ocrEdtCreate( &EDT_ocrRecv, TML_ocrRecv, EDT_PARAM_DEF, &paramv,
                      EDT_DEP_DEF, &depv, EDT_PROP_COMM, NULL_HINT, &EVT_OUT_ocrRecv );
    }
    //Continued in next slide
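The slides leave `destinationEDTRank` and `sourceEDTRank` undefined. A common pairing for an even-sends, odd-receives pattern is nearest-neighbor exchange; the plain C sketch below assumes that pairing. `pair_partner` is a hypothetical name, not an OCR call.

```c
#include <assert.h>

/* Hypothetical pairing rule: an even rank sends to rank + 1, an odd rank
   receives from rank - 1. Returns -1 when an even rank is the last rank
   and therefore has no partner. */
static int pair_partner(int rank, int n_ranks) {
    assert(n_ranks > 0 && rank >= 0 && rank < n_ranks);
    if (rank % 2 == 0) {
        return (rank + 1 < n_ranks) ? rank + 1 : -1;
    }
    return rank - 1;
}
```

Under this rule each send has exactly one matching receive, and the two sides compute the same pair independently from their own rank.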

11 SPMD EDT: Usage scenario (cont.) – (3/3 slides)
ocrGuid_t FNC_SPMD( u32 paramc, u64* paramv, u32 depc, ocrEdtDep_t depv[] )
    //Continued from previous slide
    ocrTemplateCreate( &TML_ocrBarrier, FNC_ocrBarrier, paramc, depc );
    //OCR defines “FNC_ocrBarrier” as part of the API; hence, paramv and depv must follow the spec. See below.
    paramv[] = {OCR_COMM_WORLD_PDs};
    depv[] = {EVT_trigger};
    ocrEdtCreate( &EDT_ocrBarrier, TML_ocrBarrier, EDT_PARAM_DEF, &paramv,
                  EDT_DEP_DEF, &depv, EDT_PROP_COMM, NULL_HINT, &EVT_OUT_ocrBarrier );

    if( myEdtRank == 0 )
    {
        //Create a wrapper task that depends on the event EVT_OUT_ocrBarrier
        //and calls ocrShutDown();
    }
}
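The behavior expected of the barrier's output event can be modeled with a simple arrival counter. This is a single-threaded sketch in plain C under that assumption; `barrier_model_t`, `barrier_init` and `barrier_arrive` are hypothetical names, and a real runtime would need atomic updates and distributed bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of a barrier over one communicator. */
typedef struct {
    unsigned expected;  /* number of ranks in the communicator */
    unsigned arrived;   /* ranks that have reached the barrier so far */
} barrier_model_t;

static void barrier_init(barrier_model_t *b, unsigned n_ranks) {
    assert(b != NULL && n_ranks > 0);
    b->expected = n_ranks;
    b->arrived = 0;
}

/* Record one arrival. Returns true only for the arrival that completes the
   barrier, i.e. the moment the output event would be satisfied. */
static bool barrier_arrive(barrier_model_t *b) {
    assert(b != NULL && b->arrived < b->expected);
    b->arrived++;
    return b->arrived == b->expected;
}
```

In the usage scenario above, the wrapper task on rank 0 would become runnable only after the final arrival, so the shutdown cannot overtake ranks that are still communicating.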

