OpenMP in a Heterogeneous World Ayodunni Aribuki Advisor: Dr. Barbara Chapman HPCTools Group University of Houston.



Top 10 Supercomputers (June 2011)

Why OpenMP?
Shared memory parallel programming model
– Extends C, C++, Fortran
Directive-based
– Single source for both the sequential and the parallel version
Incremental parallelism
– Requires little code modification
High-level
– Leaves multithreading details to the compiler and runtime
Widely supported by major compilers
– Open64, Intel, GNU, IBM, Microsoft, …
– Portable

OpenMP Example

Present/Future Architectures and the Challenges They Pose
[Diagram: several nodes, each with its own memory; some nodes also have an attached accelerator with separate memory]
Challenges: many more CPUs, data location, heterogeneity, scalability

Heterogeneous Embedded Platform

Heterogeneous High-Performance Systems
Each node has multiple CPU cores, and some of the nodes are equipped with additional computational accelerators, such as GPUs.

Programming Heterogeneous Multicore: Issues
Must map data and computations to specific devices
– Usually involves a substantial rewrite of the code
Verbose code
– Move data to/from device x
– Launch kernel on the device
– Wait until y is ready/done
Portability becomes an issue
– Multiple versions of the same code
– Hard to maintain
Always hardware-specific!

Programming Models? Today's Scenario

// Run one OpenMP thread per device per MPI node
#pragma omp parallel num_threads(devCount)
if (initDevice()) {
    // Block and grid dimensions
    dim3 dimBlock(12,12);
    kernel<<<dimGrid,dimBlock>>>();
    cudaThreadExit();
} else {
    printf("Device error on %s\n", processor_name);
}
MPI_Finalize();
return 0;
}

OpenMP in the Heterogeneous World
All threads are equal
– No vocabulary for heterogeneity or separate devices
All threads must have access to the memory
– Distributed memories are common in embedded systems
– Memories may not be coherent
Implementations rely on the OS and threading libraries
– Memory allocation, synchronization (e.g., Linux, Pthreads)

Extending OpenMP Example
[Diagram: general-purpose processor cores with application data in main memory, and a hardware accelerator (HWA) with its own device cores and copy of the application data; arrows show uploading remote data, downloading remote data, and a remote procedure call]

Heterogeneous OpenMP Solution Stack
[Diagram: the OpenMP parallel computing solution stack – user layer (OpenMP application), programming layer (directives, compiler, OpenMP library, environment variables), system layer (runtime library, OS/system support for shared memory) running over cores 1..n – extended with language extensions, efficient code generation, a target portable runtime interface, and the MCA APIs (MCAPI, MRAPI, MTAPI)]

Summarizing My Research
OpenMP on heterogeneous architectures
– Expressing heterogeneity
– Generating efficient code for GPUs/DSPs
Managing memories
– Distributed
– Explicitly managed
Enabling portable implementations

Backup

MCA: Generic Multicore Programming
Solves the portability issue in embedded multicore programming
Defining and promoting open specifications for:
– Communication: MCAPI
– Resource Management: MRAPI
– Task Management: MTAPI

Heterogeneous Platform: CPU + Nvidia GPU