GPGPU Applications Introduction

Slides:



Advertisements
Similar presentations
COMPUTER GRAPHICS SOFTWARE.
Advertisements

COMPUTER GRAPHICS CS 482 – FALL 2014 NOVEMBER 10, 2014 GRAPHICS HARDWARE GRAPHICS PROCESSING UNITS PARALLELISM.
Understanding the graphics pipeline Lecture 2 Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider.
Graphics Pipeline.
RealityEngine Graphics Kurt Akeley Silicon Graphics Computer Systems.
Graphics Hardware CMSC 435/634. Transform Shade Clip Project Rasterize Texture Z-buffer Interpolate Vertex Fragment Triangle A Graphics Pipeline.
Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Chapter.
1 ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 19, 2011 Emergence of GPU systems and clusters for general purpose High Performance Computing.
GPUGI: Global Illumination Effects on the GPU
GPU Simulator Victor Moya. Summary Rendering pipeline for 3D graphics. Rendering pipeline for 3D graphics. Graphic Processors. Graphic Processors. GPU.
Evolution of the Programmable Graphics Pipeline Patrick Cozzi University of Pennsylvania CIS Spring 2011.
Status – Week 283 Victor Moya. 3D Graphics Pipeline Akeley & Hanrahan course. Akeley & Hanrahan course. Fixed vs Programmable. Fixed vs Programmable.
The Graphics Pipeline CS2150 Anthony Jones. Introduction What is this lecture about? – The graphics pipeline as a whole – With examples from the video.
Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.
GPU Graphics Processing Unit. Graphics Pipeline Scene Transformations Lighting & Shading ViewingTransformations Rasterization GPUs evolved as hardware.
GPGPU overview. Graphics Processing Unit (GPU) GPU is the chip in computer video cards, PS3, Xbox, etc – Designed to realize the 3D graphics pipeline.
Under the Hood: 3D Pipeline. Motherboard & Chipset PCI Express x16.
Background image by chromosphere.deviantart.com Fella in following slides by devart.deviantart.com DM2336 Programming hardware shaders Dioselin Gonzalez.
REAL-TIME VOLUME GRAPHICS Christof Rezk Salama Computer Graphics and Multimedia Group, University of Siegen, Germany Eurographics 2006 Real-Time Volume.
1 ITCS 4/5010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Dec 31, 2012 Emergence of GPU systems and clusters for general purpose High Performance Computing.
Programmable Pipelines. Objectives Introduce programmable pipelines ­Vertex shaders ­Fragment shaders Introduce shading languages ­Needed to describe.
Computer Graphics Graphics Hardware
BY: ALI AJORIAN ISFAHAN UNIVERSITY OF TECHNOLOGY 2012 GPU Architecture 1.
Programmable Pipelines. 2 Objectives Introduce programmable pipelines ­Vertex shaders ­Fragment shaders Introduce shading languages ­Needed to describe.
Chris Kerkhoff Matthew Sullivan 10/16/2009.  Shaders are simple programs that describe the traits of either a vertex or a pixel.  Shaders replace a.
The Graphics Rendering Pipeline 3D SCENE Collection of 3D primitives IMAGE Array of pixels Primitives: Basic geometric structures (points, lines, triangles,
OpenGL Conclusions OpenGL Programming and Reference Guides, other sources CSCI 6360/4360.
NVIDIA Fermi Architecture Patrick Cozzi University of Pennsylvania CIS Spring 2011.
Computer Graphics The Rendering Pipeline - Review CO2409 Computer Graphics Week 15.
Shadow Mapping Chun-Fa Chang National Taiwan Normal University.
CS662 Computer Graphics Game Technologies Jim X. Chen, Ph.D. Computer Science Department George Mason University.
Programmable Pipelines Ed Angel Professor of Computer Science, Electrical and Computer Engineering, and Media Arts Director, Arts Technology Center University.
Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.
Advanced Computer Graphics Spring 2014 K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology.
Fateme Hajikarami Spring  What is GPGPU ? ◦ General-Purpose computing on a Graphics Processing Unit ◦ Using graphic hardware for non-graphic computations.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.
From Turing Machine to Global Illumination Chun-Fa Chang National Taiwan Normal University.
What are shaders? In the field of computer graphics, a shader is a computer program that runs on the graphics processing unit(GPU) and is used to do shading.
GPGPU introduction. Why is GPU in the picture Seeking exa-scale computing platform Minimize power per operation. – Power is directly correlated to the.
GPU Computing for GIS James Mower Department of Geography and Planning University at Albany.
GLSL Review Monday, Nov OpenGL pipeline Command Stream Vertex Processing Geometry processing Rasterization Fragment processing Fragment Ops/Blending.
COMP 175 | COMPUTER GRAPHICS Remco Chang1/XX13 – GLSL Lecture 13: OpenGL Shading Language (GLSL) COMP 175: Computer Graphics April 12, 2016.
NVIDIA® TESLA™ GPU Based Super Computer By : Adam Powell Student # For COSC 3P93.
Computer Graphics Graphics Hardware
Emergence of GPU systems for general purpose high performance computing ITCS 4145/5145 July 12, 2012 © Barry Wilkinson CUDAIntro.ppt.
GPU Architecture and Its Application
COMPUTER GRAPHICS CHAPTER 38 CS 482 – Fall 2017 GRAPHICS HARDWARE
Programmable Pipelines
A Crash Course on Programmable Graphics Hardware
Graphics on GPU © David Kirk/NVIDIA and Wen-mei W. Hwu,
CS427 Multicore Architecture and Parallel Computing
Graphics Processing Unit
Introduction to OpenGL
Chapter 6 GPU, Shaders, and Shading Languages
From Turing Machine to Global Illumination
The Graphics Rendering Pipeline
CS451Real-time Rendering Pipeline
GRAPHICS PROCESSING UNIT
CSC 2231: Parallel Computer Architecture and Programming GPUs
NVIDIA Fermi Architecture
Graphics Processing Unit
Computer Graphics Graphics Hardware
RADEON™ 9700 Architecture and 3D Performance
CIS 441/541: Introduction to Computer Graphics Lecture 15: shaders
Graphics Processing Unit
Introduction to OpenGL
OpenGL-Rendering Pipeline
CIS 6930: Chip Multiprocessor: GPU Architecture and Programming
Presentation transcript:

GPGPU Applications Introduction General Purpose Graphics Processing Unit – solving non graphics problems on the GPU

Why and How? Why … How … Does it worth writing parallel programs? … do we write single threaded programs? … is this a problem? How … … does this change? … can we utilize it? Does it worth writing parallel programs?

Single-Threaded Performance Image courtesy of Jeff Preshing (preshing.com)

Single-Threaded Performance Image courtesy of Jeff Preshing (preshing.com)

CPU vs GPU GFLOP/s Image courtesy of NVIDIA Corporation

CPU vs GPU memory bandwidth Image courtesy of NVIDIA Corporation

CPU vs GPU – The reason behind Image courtesy of NVIDIA Corporation Low number of cores(2-8) Complex flow control Large data and instruction cache Optimized for serial computing High number of cores (2048) Simple distributed control Minimal cache Optimized for data-parallel processing More transistor devoted to data processing

What are the possible benefits? It can be difficult to parallelize a serial program Does it worth the effort? Amdahl’s law TSerial - Serial execution time Tparallel - Parallel execution time P - Parallelizable part of the code The workload is fixed, how much faster do we get the result is we increase the number of processing units Image source - Wikipedia

What are the possible benefits? Gustafson’s law W’ is the workload on an n-node machine (number of instructions to perform) If the workload is scaled up to maintain fixed execution time, as the number of processors increases, the speedup increases linearly true parallel power is only achievable when a large parallel problem is applied The workload is scaled up to maintain fixed exeuction time as the number of processors are increased, steppness Image source - Wikipedia

GPU = Graphics Processing Unit The hardware realization of the incremental rendering algorithm of computer graphics Required performance: Animation: few nsec / pixel Massively parallel: Pipeline (stream processor) Parallel execution

Incremental rendering MODELVIEW PROJECTION Perspective transform Clipping with homogeneous coordinates Camera transform, illumination Virtual World 2. 1. color depth Final image Viewport trans.+ Rasterization + Interpolation

Texture mapping x3, y3, z3 (u3, v3) (u1, v1) x2, y2, z2 (u2, v2) Interpolation: (u, v) (u1, v1) (u1, v1) Texture object is located in the memory of the GPU image, filtering, addressing (u2, v2) (u3, v3) color

Points: Homogeneous coordinates Descartes: (x,y,z) Homogeneous: [X,Y,Z,w] = [xw,yw,zw,w] Why? Translation, rotation, projection etc. can be written as a 4x4 matrix multiplication Multiple transformation can be summarized in one matrix Parallel lines can be handled Linear transforms of homogeneous coordinates map triangles to triangles

Interpretation of homogeneous coords. x, y, z are the direction w inverse of the distance [x,y,0] [x,y,1/3] [x,y,1] [2x,2y,1] = [x,y,1/2] y x

Examples of Homogeneous Linear Transforms Translation Rotation Scaling 1 px, py, pz, 1 px, py, pz [x,y,z,1] cos sin -sin cos 1 [x,y,z,1] Sx Sy Sz 1 [x,y,z,1]

Examples of Homogeneous Linear Transforms Projection Non linear -f -b Non linear

Examples of Homogeneous Linear Transforms Projection 1 -α -1 -β 0 [x,y,z,1] Linear

Vertex processing [x,y,z,1] Tmodelviewproj = [X,Y,Z,w] MODELVIEW PROJECTION Perspective transform Camera transform Virtual World [x,y,z,1] Tmodelviewproj = [X,Y,Z,w] Zcamera

Clipping [X,Y,Z,w] = [xw,yw,zw,w] (1, 1, 1) -1 < x < 1 (-1,-1,-1) [X,Y,Z,w] = [xw,yw,zw,w]

Clipping (1, 1, 1) (-1,-1,-1)

Clipping (1, 1, 1) (-1,-1,-1)

Viewport transform Z(-1,1) (0,1) (1, 1, 1) 1 (-1,-1,-1) Screen coordinates Z(-1,1) (0,1)

Rasterization (X3,Y3) (X2,Y2) (X1,Y1)

Interpolation inside of a triangle (X2,Y2,Z2) Z Z(X,Y) = aX + bY + c (X1,Y1,Z1) Y (X3,Y3,Z3) Z(X,Y) 1 pixel / 1 sor = 1 összeadás X Z(X+1,Y) = Z(X,Y) + a

Interpolation hardware X Z(X,Y) X counter Z register CLK S a

Perspective correct interpolation u v Y Perspective correct X Linear au X+buY+cu u/h =au X+buY+cu v/h = av X+bvY+cv 1/h =ah X+bhY+ch u = v = ah X+bhY+ch av X+bvY+cv ah X+bhY+ch

Homogeneous linear interpolation hardware v X Div Div [u/h](X,Y) [v/h](X,Y) [1/h](X,Y) X counter [u/h] register [v/h] register [1/h] register S S CLK S av ah au

Compositing: The Z-buffer algorithm 2. 1. 3.  = 1 z 0.6 0.3 0.3 0.6 0.8 Depth buffer Z-buffer Color buffer

Compositing: alpha blending Interpolated color or read from texture (Rs,Gs,Bs,As) (Rd,Gd,Bd,Ad) * * first first ALU (R,G,B,A) buffer (R,G,B,A) = (RsAs+Rd(1-As), GsAs+Gd (1-As), BsAs+Bd (1-As), AsAs+Ad (1-As))

Rendering pipeline Vertex transform + lighting Viewport transform Clipping + homogeneous division Viewport transform Texture mapping Z-buffer Compositing Frame buffer Triangle setup Rasterization + Interpolation

History of the GPU Early systems (not PC) Implements the whole pipeline Workstation and Graphics Terminal HP, Sun, Textronix Vertex transform + lighting Clipping + homogeneous division Viewport transform Texture mapping Z-buffer Compositing Frame buffer Triangle setup Rasterization + Interpolation

History of the GPU 1970-1990 Typical 2D Sprites, primitives Blitter, line, square drawing Windowing speedup Amiga, Atari, S3 Vertex transform + lighting Clipping + homogeneous division Viewport transform Texture mapping Z-buffer Compositing Frame buffer Triangle setup Rasterization + Interpolation

History of the GPU 1990-1998 Vertex processing, texture filtering, Z-buffer PlayStation, Nintendo64 S3 Virge, ATI Rage, Matrox Mystique 3dfx Voodoo 3dfx Glide Microsoft DirectX 5.0 Vertex transform + lighting Clipping + homogeneous division Viewport transform Texture mapping Z-buffer Compositing Frame buffer Triangle setup Rasterization + Interpolation

History of the GPU 1990-2000 The break-up of the pipeline Transform & Lighting Microsoft DirectX 7.0 OpenGL NVIDIA GeForce 256 (NV10) The first non-professional 3D accelerator Vertex transform + lighting Clipping + homogeneous division Viewport transform Texture mapping Z-buffer Compositing Frame buffer Triangle setup Rasterization + Interpolation

The history of the GPU 2000 – The appearance of shader programs Vertex and fragment shader (2001) Geometry shader (2006) Programmable tessellator (2010) Vertex transform + lighting Clipping + homogeneous division Viewport transform Texture mapping Z-buffer Compositing Frame buffer Triangle setup Rasterization + Interpolation

Modern GPU pipeline

Graphics API General interface to the GPU Pipeline control Shaders Define the geometry to be rendered Define Textures Framebuffer Parallel rendering

Graphics API Microsoft DirectX Windows specific DirectX7: Transform & Lighting DirectX9: Vertex and pixel shader DirectX10: Geometry shader DirectX11: Tessellation, DirectCompute

Graphics API OpenGL OpenGL 2.0: Vertex and pixel shader OpenGL 3.0: Framebuffer objects OpenGL 3.1: OpenCL interop OpenGL 3.2: Geometry Shader OpenGL 3.3/4.0: Tessellation OpenGL 4.3: Compute Shader

GPGPU API CUDA Compute Unified Device Architecture 2007. NVIDIA High level, C like syntax Atomic operations Polling Double precision computing Synchronization

GPGPU API OpenCL 2008 november, Khronos Group Independent GPGPU API Common platform for the CPU and the GPU ATI, Intel, Apple, IBM és NVIDIA

Development of the OpenCL 1.0 – 2009 Support of heterogeneous environment OpenCL virtual machine Execution model Memory model Programming model OpenCL C syntax for the kernel program Support of linear and texture memory

Development of the OpenCL 1.1 – 2010 New data types Vector with 3 components, new texture formats Multi-threaded control Multiple parallel device Instructions limited to regions Expanded event system New OpenCL functions Integer clamp, shuffle, synchron copy Improved OpneGL interop.

Development of the OpenCL 1.2 – 2011 Device partitioning A part of the device can be dedicated to critical operations Separated compiling and linking Improved texture support Support of device integrated kernels Video decoding, fixed function devices Improved DirectX support Full support of the IEEE 754 standard

Development of the OpenCL 2.0 – 2013 Distributed virtual memory Embedded parallelism Kernel launch from kernel Global address space Different memory regions mapped into the same address space C11 atomic operations FIFO objects in the kernels

Development of the OpenCL 2.1 – 2015 OpenCL C++ support C++14 subset New virtual machine Based on SPIR-V, similar to Vulkan Expended kernel objects Sub-groups Cloning Improved profiler