General Programming on Graphical Processing Units


General Programming on Graphical Processing Units. Quentin Ochem, October 4th, 2018

What is GPGPU? GPUs were traditionally dedicated to graphics rendering, but what they really provide is vectorized computation. Enter general-purpose programming on GPUs (GPGPU).

GPGPU Programming Paradigm: the CPU core offloads computations to the GPU. Open questions: How to debug? How to optimize data transfer? How to refactor algorithms to be parallel? How to avoid data races? How to optimize occupancy?

Why do we care about Ada? (1/2) Source: https://www.adacore.com/uploads/techPapers/Controlling-Costs-with-Software-Language-Choice-AdaCore-VDC-WP.PDF

Why do we care about Ada? (2/2) Signal processing, machine learning, Monte Carlo simulation, trajectory prediction, cryptography, image processing, physical simulation, … and much more!

Available Hardware
Desktop & Server: NVIDIA GeForce / Tesla / Quadro, AMD Radeon, Intel HD
Embedded: NVIDIA Tegra, ARM Mali, Qualcomm Adreno, IMG PowerVR, Freescale Vivante

Ada Support

Three options: (1) interfacing with existing libraries; (2) “Ada-ing” existing languages; (3) Ada 2020.

Interfacing existing libraries: already possible, and a straightforward effort. “gcc -fdump-ada-spec” will provide a first binding of C to Ada. We could provide “thick” bindings to e.g. Ada.Numerics matrix operations.
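To make the idea of a thin C binding concrete, here is a minimal hand-written sketch. It is not generated output and not taken from the slides: it imports the C standard library function abs, chosen only because it is available without extra link options. The names Binding_Demo and C_Abs are illustrative assumptions.

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Binding_Demo is
   --  Thin binding to the C standard library function
   --     int abs (int);
   --  Written by hand for illustration; the Ada name C_Abs is an
   --  assumption, only the external name "abs" comes from C.
   function C_Abs (Value : Interfaces.C.int) return Interfaces.C.int
     with Import => True, Convention => C, External_Name => "abs";

   Input  : constant Integer := -42;
   Result : constant Interfaces.C.int :=
     C_Abs (Interfaces.C.int (Input));
begin
   Ada.Text_IO.Put_Line
     ("C abs returned" & Interfaces.C.int'Image (Result));
end Binding_Demo;
```

A “thick” binding would wrap such imports behind Ada types and subprograms so that callers never see Interfaces.C directly.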

“Ada-ing” existing languages. CUDA: kernel-based language specific to NVIDIA. OpenCL: portable, vendor-neutral counterpart to CUDA. OpenACC: directives integrated into the host language to mark parallel loops.

CUDA Example (Device code)
   procedure Test_Cuda (A : out Float_Array; B, C : Float_Array)
     with Export => True, Convention => C;
   pragma CUDA_Kernel (Test_Cuda);
   procedure Test_Cuda (A : out Float_Array; B, C : Float_Array) is
   begin
      A (CUDA_Get_Thread_X) := B (CUDA_Get_Thread_X) + C (CUDA_Get_Thread_X);
   end Test_Cuda;

CUDA Example (Host code)
   A, B, C : Float_Array;
begin
   -- initialization of B and C
   -- CUDA specific setup
   pragma CUDA_Kernel_Call (Grid'(1, 1, 1), Block'(8, 8, 8));
   Test_Cuda (A, B, C);
   -- usage of A

OpenCL example: similar to CUDA in principle. Requires more code on the host side (no call conventions).

OpenACC example (Device & Host)
   procedure Test_OpenACC is
      A, B, C : Float_Array;
   begin
      -- initialization of B and C
      for I in A'Range loop
         pragma Acc_Parallel;
         A (I) := B (I) + C (I);
      end loop;
   end Test_OpenACC;

Ada 2020
   procedure Test_Ada2020 is
      A, B, C : Float_Array;
   begin
      -- initialization of B and C
      parallel for I in A'Range loop
         A (I) := B (I) + C (I);
      end loop;
   end Test_Ada2020;
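The CUDA, OpenACC and Ada 2020 variants all express the same element-wise addition. As a baseline, the following sketch runs that loop sequentially in standard Ada 2012. The Float_Array definition, the array length of 8 and the initial values are assumptions, since the slides use Float_Array without declaring it.

```ada
with Ada.Text_IO;

procedure Vector_Add_Demo is
   --  Assumed definition: the slides never declare Float_Array.
   type Float_Array is array (Positive range <>) of Float;

   --  Assumed sizes and values, chosen only for illustration.
   B : constant Float_Array (1 .. 8) := (others => 1.5);
   C : constant Float_Array (1 .. 8) := (others => 2.0);
   A : Float_Array (1 .. 8);
begin
   --  Each iteration reads and writes index I only, so the
   --  iterations are independent of one another. This independence
   --  is what allows the loop to be marked as parallel.
   for I in A'Range loop
      A (I) := B (I) + C (I);
   end loop;

   for I in A'Range loop
      Ada.Text_IO.Put_Line
        ("A (" & Integer'Image (I) & " ) =" & Float'Image (A (I)));
   end loop;
end Vector_Add_Demo;
```

The sequential loop is kept here so that the sketch builds with any Ada 2012 compiler; the parallel markers from the slides change how iterations are scheduled, not what each iteration computes.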

Lots of other language considerations: identification of memory layout (per thread, per block, global); thread allocation specification; reduction (ability to aggregate results through operators, e.g. sum or concatenation); containers; mutual exclusion; …
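As a rough illustration of what reduction asks of a language, the sketch below sums an array in two steps: independent partial sums per chunk, then a combining step. It runs sequentially; the chunk count, the array contents and all names are assumptions made for the example, not a proposed language feature.

```ada
with Ada.Text_IO;

procedure Reduction_Demo is
   --  Assumed sizes, chosen so that the chunks divide evenly.
   Size        : constant := 16;
   Chunk_Count : constant := 4;
   Chunk_Size  : constant := Size / Chunk_Count;

   type Int_Array is array (Positive range <>) of Integer;

   Data    : Int_Array (1 .. Size);
   Partial : Int_Array (1 .. Chunk_Count) := (others => 0);
   Total   : Integer := 0;
begin
   for I in Data'Range loop
      Data (I) := I;
   end loop;

   --  Step 1: every chunk accumulates into its own slot of Partial.
   --  No two chunks share a slot, so the chunks could run as
   --  separate threads without mutual exclusion.
   for Chunk in Partial'Range loop
      for I in (Chunk - 1) * Chunk_Size + 1 .. Chunk * Chunk_Size loop
         Partial (Chunk) := Partial (Chunk) + Data (I);
      end loop;
   end loop;

   --  Step 2: combine the partial results. This regrouping is valid
   --  because integer addition is associative.
   for Chunk in Partial'Range loop
      Total := Total + Partial (Chunk);
   end loop;

   Ada.Text_IO.Put_Line ("Sum =" & Integer'Image (Total));
end Reduction_Demo;
```

The same two-step shape applies to concatenation, where the order in which partial results are combined also matters.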

A word on SPARK
   X_Size : constant := 1000;
   Y_Size : constant := 10;
   Data : array (1 .. X_Size * Y_Size) of Integer;
begin
   for X in 1 .. X_Size loop
      for Y in 1 .. Y_Size loop
         Data (X + Y_Size * Y) := Compute (X, Y);
      end loop;
   end loop;
Two different iterations write the same element:
   {X = 100, Y = 1}: X + Y_Size * Y = 100 + 10 = 110
   {X = 10, Y = 10}: X + Y_Size * Y = 10 + 100 = 110
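The collision can be checked mechanically. The sketch below counts how many array slots are written by more than one (X, Y) pair, first with the index expression from the slide and then with a row-major mapping. The row-major formula and all names are assumptions added for comparison; they do not appear in the slides.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Index_Collision_Demo is
   X_Size : constant := 1000;
   Y_Size : constant := 10;

   --  Index expression used on the slide.
   function Slide_Index (X, Y : Positive) return Positive is
     (X + Y_Size * Y);

   --  Assumed alternative (not from the slides): row-major mapping
   --  that gives each (X, Y) pair its own slot.
   function Row_Major_Index (X, Y : Positive) return Positive is
     ((X - 1) * Y_Size + Y);

   type Hit_Counts is array (Positive range <>) of Natural;

   --  Bounds cover the largest value each mapping can produce.
   Slide_Hits     : Hit_Counts (1 .. X_Size + Y_Size * Y_Size) :=
     (others => 0);
   Row_Major_Hits : Hit_Counts (1 .. X_Size * Y_Size) :=
     (others => 0);

   --  Number of slots written by more than one iteration.
   function Shared_Slots (Hits : Hit_Counts) return Natural is
      Count : Natural := 0;
   begin
      for I in Hits'Range loop
         if Hits (I) > 1 then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end Shared_Slots;
begin
   for X in 1 .. X_Size loop
      for Y in 1 .. Y_Size loop
         declare
            S : constant Positive := Slide_Index (X, Y);
            R : constant Positive := Row_Major_Index (X, Y);
         begin
            Slide_Hits (S) := Slide_Hits (S) + 1;
            Row_Major_Hits (R) := Row_Major_Hits (R) + 1;
         end;
      end loop;
   end loop;

   Put_Line ("Slide index for (100, 1):"
             & Positive'Image (Slide_Index (100, 1)));
   Put_Line ("Slide index for (10, 10):"
             & Positive'Image (Slide_Index (10, 10)));
   Put_Line ("Shared slots, slide index:"
             & Natural'Image (Shared_Slots (Slide_Hits)));
   Put_Line ("Shared slots, row-major index:"
             & Natural'Image (Shared_Slots (Row_Major_Hits)));
end Index_Collision_Demo;
```

With the row-major mapping the count of shared slots should be zero, since every pair lands on a distinct index in 1 .. X_Size * Y_Size. With the slide's expression it is not, which is the kind of defect that turns into a data race once the loop iterations run in parallel.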

Next Steps: AdaCore spent one year running various studies and experiments. Finalizing an OpenACC proof of concept on GCC. About to start an OpenCL proof of concept on CCG. If you want to give us feedback or register to try the technology, contact us at info@adacore.com