OpenMP on HiHAT
James Beyer, 18 Sep 2017


Agenda
- OpenMP inside HiHAT task
- OpenMP runtime as HiHAT scheduler 1
- OpenMP affinity
- HiHAT resource subsets
- OpenMP runtime as HiHAT scheduler 2

OpenMP inside HiHAT task

Implement a test that shows a HiHAT task that contains an OpenMP parallel region:
- Compile the OpenMP code inside a function
- Build a closure around the function

```c
hhnMkClosure(hotTeamHandle, blob, 0, &hotTeamClosure);   // warm up a team
hhuInvoke(hotTeamClosure, exec_pol, exec_cfg, CPU0_resrc, NULL, &invokeHandle);

zeroBlob(blob, sizeof(blob));
blob[0] = &argI;
hhnMkClosure(doParallelHandle, blob, 0, &parallelTeamClosure);   // use the hot team
hhuInvoke(parallelTeamClosure, exec_pol, exec_cfg, CPU0_resrc, NULL, &invokeHandle);
```

OpenMP inside HiHAT task

```c
void prepareHotTeam( void **ignored )
{
    // we can do anything we want here: affinity, memory allocations, whatever
    #pragma omp parallel
    {}
}

void doParallelWork( void **args )
{
    int i = *(int*)args[0];
    int ii;
    #pragma omp parallel
    {
        #pragma omp atomic capture
        ii = i += 1;
        printf("hello from thread %d; ii = %d\n", omp_get_thread_num(), ii);
    } // end parallel
}
```

OpenMP runtime as HiHAT scheduler 1

Implement a test that shows how HiHAT could be integrated into an OpenMP runtime to replace the launch system within the runtime:
- Replace GOMP_parallel with HiHAT_omp_parallel
- HiHAT_omp_parallel:
  - Takes the same closure GOMP_parallel would have
  - Repackages the closure into a HiHAT closure
  - Invokes GOMP_parallel via HiHAT
- First step towards a HiHAT-enabled OpenMP offloading runtime

OpenMP runtime as HiHAT scheduler 1: HiHAT_omp_parallel

```c
int HiHAT_omp_parallel(void (*fn) (void*), void *data,
                       unsigned num_threads, unsigned int flags)
{
    void *blob[4];
    hhClosure closure;
    hhActionHndl invokeHandle;

    if ( !trampolineRegistered ) {
        hhnRegFunc(HiHat_Trampoline, CPU0_resrc, 0, &trampolineHndl);
        trampolineRegistered = true;
    }

    // build closure
    printf("build blob!\n");
    blob[0] = fn;
    blob[1] = data;
    blob[2] = &num_threads;
    blob[3] = &flags;
    hhnMkClosure(trampolineHndl, blob, 0, &closure);

    // invoke closure
    // shouldn't this be a pointer to a closure rather than a structure pass?
    hhuInvoke(closure, exec_pol, exec_cfg, CPU0_resrc, NULL, &invokeHandle);
    return 0;
}
```

OpenMP runtime as HiHAT scheduler 1: HiHat_Trampoline

```c
void HiHat_Trampoline(void** blob)
{
    void (*fn) (void*) = blob[0];
    void *data = blob[1];
    unsigned num_threads = *(unsigned*)blob[2];
    unsigned flags = *(unsigned*)blob[3];
    fprintf(stderr, "%s calling GOMP\n", __FUNCTION__);
    GOMP_parallel(fn, data, num_threads, flags);
}
```

OpenMP affinity

Characterize implementation support for OpenMP affinity:
- GCC
  - OpenMP 3.1 affinity readily available
  - OpenMP 4+ affinity complete, but not the default on most Linux distros
- Clang
  - Intel proprietary runtime package
  - OpenMP 4+ support
  - Proprietary system as well

HiHAT resource subsets

Add support to HiHAT for CPU subsets:
- OpenMP is unaware of HiHAT, so it must be forced to use the available resources
- Investigating possible solutions:
  - cpusets
  - libnuma
  - KMP affinity interfaces
  - hwloc

OpenMP runtime as HiHAT scheduler 2

One closure, two targets:
- Build a simple OpenMP scheduler around HiHAT to decide where to run a task
- Need a mechanism to correlate the function pointer for one device with the function pointer for another device
- Two data marshaling strategies:
  - No data marshaling required (limited compute options)
  - A data movement mechanism to marshal data to the correct processor
- Rebuild the closure inside the runtime as needed to run on the desired device

Summary
- OpenMP inside HiHAT task
- OpenMP runtime as HiHAT scheduler 1
- OpenMP affinity
- HiHAT resource subsets
- OpenMP runtime as HiHAT scheduler 2