
Programming Concepts in GPU Computing
Dušan B. Gajić
CIITLab, Dept. of Computer Science, Faculty of Electronic Engineering, University of Niš, Serbia
14th Workshop on Software Engineering Education and Reverse Engineering, Sinaia, Romania, August 2014

Presentation Outline
1. Graphics processing units (GPUs)
2. GPU computing and its applications
3. GPU architecture and programming
4. Case study
5. Conclusions

Graphics Processing Unit (GPU)
- The first GPU appeared in 1999
- A graphics processing unit (GPU) is a hardware device originally specialized for rendering computer graphics
- Early 2000s: fixed-function processors dedicated to rendering computer graphics
- Modern GPUs: unified programmable graphics processors and parallel computing platforms
- GPU design philosophy is the opposite of CPU design (throughput vs. latency), which calls for a different programming philosophy
- In 2003, Moore's law hit a wall (single-core frequency scaling stalled), pushing hardware toward parallelism

CPU and GPU Processing Power

CPU and GPU Memory Bandwidth

GPU Computing
- General-purpose computation on the GPU (GPU computing) developed from GPGPU
- GPU features: SIMD manycore architecture; high throughput and processing power; lower cost and energy consumption
- Suitable for intensive computations and large-scale data processing
- Nvidia CUDA (high performance, exclusive to Nvidia GPUs), appeared in 2007
- OpenCL (open standard, acceleration on heterogeneous devices: CPUs, GPUs, DSPs, FPGAs), appeared in 2009

GPU Computing Applications
- Computational finance
- Machine learning and computer vision
- Bioinformatics
- Medical imaging
- Digital signal processing
- Meteorology
- Astronomy
- Augmented reality

GPU Architecture and Computing Model
- The GPU executes kernels with high parallelism, reading data from input buffers and writing results to output buffers
- This requires a novel programming approach

GPU Programming Abstractions
A GPU computing program is composed of:
1. host program (runs on the CPU, sequential, controls execution)
2. device program (runs on the GPU, massively parallel, implements kernels)
- A kernel is a data-parallel function executed on the GPU; each kernel describes the computation performed by a single thread
- Block (set of threads) and grid (set of blocks) configuration is defined by the host
- Branching is expensive; data and task parallelism, ...
- Leads to different choices of algorithms, e.g., heapsort vs. mergesort


Case Study – Modulo p Addition
- Randomly generated p-valued logic function vectors F(n)
- Computed component-wise vector addition Y(n) = F(n) ⊕ F(n), where ⊕ is addition modulo p
- The operator ⊕ was implemented on CPUs and GPUs using:
  1. look-up tables (LUTs)
  2. the modulo arithmetic operator % from C++ and CUDA

Case Study – Addition Modulo 3
(table: processing times [ms] for the LUT and MOD implementations in C++ and CUDA, for varying n)
- The better GPU implementation is approx. 10× faster than the better CPU implementation
- LUTs are faster than MOD on CPUs; the opposite holds on GPUs

Case Study – Performance on CPUs
- p not a power of 2: LUTs faster than MOD, from 1.5× to 2.05×
- p a power of 2: MOD as fast as LUTs, since % reduces to bitwise operations

Case Study – Performance on GPUs
- In all cases, MOD faster than LUTs, from 3.3× to 19.5×
- As p increases, the advantage of MOD over LUTs increases

GPU Computing at the FEE Niš
- FEE Niš: founded in 1960, part of the University of Niš
- Challenge for students: making a shift in thinking from traditional CPU programming
- Research on performing spectral methods on GPUs (Fourier and related transforms, matrix computations, ...)
- GPU computing used in courses: Digital Signal Processing (4th year), Pattern Recognition (4th year), Spectral Methods (5th year)

Conclusions
- Heterogeneous parallel processing is both the present and the future of high-performance computing (at least until we have something better)
- Modern GPUs are unified programmable graphics processors and parallel computing platforms
- Case study: contrasting programming techniques for achieving high performance on CPUs and GPUs
- GPUs: SIMD parallel architecture and massively parallel programming model
- Better preparation for parallel computing is needed in programming and algorithm courses
