CDSC/InTrans Review Oct , 2016 Student names: Martin Kong, OSU/Rice

PIPES: A Language and Compiler for Task-based Programming on Distributed-Memory Clusters
Faculty names: Louis-Noel Pouchet, P. Sadayappan, Vivek Sarkar. Dept. of Computer Science, OSU / Rice / CSU

Example: SGEMM with Cannon's algorithm

PIPES: a macro-dataflow language and compiler to specify parallel/distributed algorithms.

Main contributions:
- PIPES language: a dataflow-inspired language derived from CnC/DFGL, enriched with constructs for task placement/scheduling and for communication specifications.
- PIPES compiler: optimizes the polyhedral subset of PIPES for automatic coarsening and coalescing, and translates programs to Intel CnC C++ runtime tuners that implement the described mapping.

Example: SSYR2K can be seen as a sequence of 2 GEMM calls: GEMM(C, B, trans(A)); GEMM(C, A, trans(B))

Intel CnC is a powerful runtime system…
- It implements the semantics of Concurrent Collections (CnC).
- Only a "task" graph is needed as input.
- The runtime decides the scheduling/placement/communication policies.
…but to obtain high performance, one needs to be able to:
- Specify (partial) task placement and communication strategies
- Adapt the granularity of tasks to the target machine
- Perform auto-tuning of the implementation for maximum performance

PIPES: Language + Compiler to exploit CnC-like runtimes
- Compact/expressive language to describe task dataflow, communications, …
- Advanced analysis and transformations of the task graph (e.g., coarsening)
- Automatic code generation for Intel CnC runtime tuners

- Research cluster at OSU; peak single-precision (SP) performance: ~1200 GF/s for 8 nodes.
- Problem: single precision, 8000x8000 matrices.
- Various coarsening factors explored via auto-tuning; the best one found for each number of processes is the one reported.
- Intel MKL used in the task bodies.

M. Kong, L.N. Pouchet, P. Sadayappan and V. Sarkar, "PIPES: A Language and Compiler for Task-based Programming on Distributed-Memory Clusters" (IEEE/ACM Supercomputing 2016)
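
The SGEMM example names Cannon's algorithm but the slide gives no code. As a rough sketch only (not the PIPES program from the paper), the C++ below simulates Cannon's algorithm sequentially on a virtual q x q process grid: after an initial skew, every step multiplies the two tiles each "process" holds, then shifts A tiles one position left and B tiles one position up. The helper names `get_block`, `block_gemm` and `cannon_sgemm` are hypothetical, and a naive triple loop stands in for the Intel MKL call used in the real task bodies.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Block = std::vector<float>;              // one b x b tile, row-major
using Grid = std::vector<std::vector<Block>>;  // q x q tiles, one per virtual process

// Copy tile (br, bc) of size b x b out of an n x n row-major matrix.
Block get_block(const std::vector<float>& M, std::size_t n, std::size_t b,
                std::size_t br, std::size_t bc) {
  Block out(b * b);
  for (std::size_t i = 0; i < b; ++i)
    for (std::size_t j = 0; j < b; ++j)
      out[i * b + j] = M[(br * b + i) * n + bc * b + j];
  return out;
}

// Task body: Cb += Ab * Bb (naive loop standing in for an MKL SGEMM call).
void block_gemm(std::size_t b, const Block& Ab, const Block& Bb, Block& Cb) {
  for (std::size_t i = 0; i < b; ++i)
    for (std::size_t k = 0; k < b; ++k)
      for (std::size_t j = 0; j < b; ++j)
        Cb[i * b + j] += Ab[i * b + k] * Bb[k * b + j];
}

// Sequential simulation of Cannon's algorithm on a q x q grid; returns C = A * B.
std::vector<float> cannon_sgemm(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t n, std::size_t q) {
  assert(q > 0 && n % q == 0);  // assumes q divides n
  const std::size_t b = n / q;
  Grid a(q, std::vector<Block>(q)), bt(q, std::vector<Block>(q));
  Grid c(q, std::vector<Block>(q, Block(b * b, 0.0f)));
  // Initial skew: row r of A shifts left by r, column col of B shifts up by col.
  for (std::size_t r = 0; r < q; ++r)
    for (std::size_t col = 0; col < q; ++col) {
      a[r][col] = get_block(A, n, b, r, (r + col) % q);
      bt[r][col] = get_block(B, n, b, (r + col) % q, col);
    }
  for (std::size_t step = 0; step < q; ++step) {
    // Compute phase: each process multiplies the two tiles it currently holds.
    for (std::size_t r = 0; r < q; ++r)
      for (std::size_t col = 0; col < q; ++col)
        block_gemm(b, a[r][col], bt[r][col], c[r][col]);
    // Communication phase: A tiles move one step left, B tiles one step up.
    Grid a_next = a, b_next = bt;
    for (std::size_t r = 0; r < q; ++r)
      for (std::size_t col = 0; col < q; ++col) {
        a_next[r][col] = a[r][(col + 1) % q];
        b_next[r][col] = bt[(r + 1) % q][col];
      }
    a = std::move(a_next);
    bt = std::move(b_next);
  }
  // Gather the C tiles back into one n x n matrix.
  std::vector<float> C(n * n);
  for (std::size_t r = 0; r < q; ++r)
    for (std::size_t col = 0; col < q; ++col)
      for (std::size_t i = 0; i < b; ++i)
        for (std::size_t j = 0; j < b; ++j)
          C[(r * b + i) * n + col * b + j] = c[r][col][i * b + j];
  return C;
}
```

After the skew, tile (r, col) holds A(r, (r+col+s) mod q) and B((r+col+s) mod q, col) at step s, so over q steps every k-tile is visited exactly once. In a distributed run each shift would be a point-to-point message, which is the kind of communication pattern PIPES lets the programmer state explicitly.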
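
The SSYR2K decomposition on the slide can be checked with a small sketch. It assumes GEMM(C, X, Y) means C += X * Y and fixes alpha = beta = 1 for brevity; the `Mat` type and the `trans`, `gemm` and `ssyr2k` functions are hypothetical stand-ins, not BLAS or MKL calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major single-precision matrix.
struct Mat {
  std::size_t rows, cols;
  std::vector<float> v;
  Mat(std::size_t r, std::size_t c) : rows(r), cols(c), v(r * c, 0.0f) {}
  float& operator()(std::size_t i, std::size_t j) { return v[i * cols + j]; }
  float operator()(std::size_t i, std::size_t j) const { return v[i * cols + j]; }
};

// trans(X): returns the transpose of X.
Mat trans(const Mat& X) {
  Mat T(X.cols, X.rows);
  for (std::size_t i = 0; i < X.rows; ++i)
    for (std::size_t j = 0; j < X.cols; ++j)
      T(j, i) = X(i, j);
  return T;
}

// GEMM(C, X, Y): C += X * Y (alpha = beta = 1 assumed).
void gemm(Mat& C, const Mat& X, const Mat& Y) {
  assert(X.cols == Y.rows && C.rows == X.rows && C.cols == Y.cols);
  for (std::size_t i = 0; i < X.rows; ++i)
    for (std::size_t k = 0; k < X.cols; ++k)
      for (std::size_t j = 0; j < Y.cols; ++j)
        C(i, j) += X(i, k) * Y(k, j);
}

// SSYR2K with alpha = beta = 1: C += B * A^T + A * B^T, as two GEMM calls.
// Unlike BLAS ssyr2k, which updates only the triangle selected by its uplo
// argument, this sketch updates the whole n x n matrix C.
void ssyr2k(Mat& C, const Mat& A, const Mat& B) {
  gemm(C, B, trans(A));
  gemm(C, A, trans(B));
}
```

Because the two updates are transposes of each other, the result stays symmetric, which is why a task-graph formulation can reuse the GEMM tasks (and their placement) unchanged for SSYR2K.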
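
To make "coarsening" and "task placement" concrete, here is a hedged sketch of what such a mapping could compute for a block-GEMM task space with tags (i, j, k). It is not the PIPES syntax and not the Intel CnC tuner API: `Tag`, `coarsen`, `place` and `count_coarse` are hypothetical names. Coarsening by a factor f folds fine tasks into f x f x f groups, and placement assigns each coarse task to a node with a 2D block-cyclic rule on the output block (i, j).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

// Tag of a fine-grained block-GEMM task: (i, j, k) block indices.
using Tag = std::array<int, 3>;

// Hypothetical coarsening: fine task (i, j, k) is folded into coarse task
// (i / f, j / f, k / f), so one coarse task runs up to f^3 fine task bodies.
Tag coarsen(const Tag& t, int f) {
  assert(f > 0);
  return {t[0] / f, t[1] / f, t[2] / f};
}

// Hypothetical placement rule: 2D block-cyclic over a pr x pc node grid, keyed
// on the output block (i, j) so all k-updates of one C tile share a node.
int place(const Tag& t, int pr, int pc) {
  return (t[0] % pr) * pc + (t[1] % pc);
}

// Number of coarse tasks generated from an N x N x N fine task space.
std::size_t count_coarse(int N, int f) {
  std::set<Tag> coarse;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        coarse.insert(coarsen({i, j, k}, f));
  return coarse.size();
}
```

For N = 8, f = 2 turns 512 fine tasks into 64 coarse ones, and f = 4 leaves only 8. An auto-tuner like the one described above would sweep such factors and keep the fastest; larger f means fewer, heavier tasks and less runtime overhead, at the cost of parallelism.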
