© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign
Programming Massively Parallel Processors
Chapter 2: GPU Computing History
A Fixed Function GPU Pipeline
[Figure: block diagram of a fixed-function GPU pipeline. The CPU feeds the GPU through the Host Interface; work then flows through Vertex Control (backed by a Vertex Cache), VS/T&L, Triangle Setup, Raster, Shader (backed by a Texture Cache), ROP, and FBI, which reads and writes Frame Buffer Memory.]
Texture Mapping Example
Texture mapping example: painting a world map texture image onto a globe object.
Programmable Vertex and Pixel Processors
[Figure: an example of separate vertex processor and fragment processor in a programmable graphics pipeline. On the CPU side, a 3D Application or Game issues 3D API commands through OpenGL or Direct3D; across the CPU–GPU boundary, the GPU Front End receives the GPU command and data stream. Pre-transformed vertices go to the Programmable Vertex Processor, then Primitive Assembly (assembled polygons, lines, and points), Rasterization & Interpolation (pixel location stream, pre-transformed fragments), the Programmable Fragment Processor (transformed fragments), and Raster Operations, which write pixel updates to the Framebuffer.]
Unified Graphics Pipeline
[Figure: the G80 unified graphics pipeline. The Host feeds a Data Assembler; vertex, geometry, and pixel thread issue units plus Setup/Rstr/ZCull dispatch all work onto a single Thread Processor array of SP cores, organized in clusters with L1 caches and texture filter (TF) units, backed by multiple L2 cache and FB (frame buffer) partitions.]
G80 CUDA mode – A Device Example
– Processors execute computing threads
– New operating mode/HW interface for computing
[Figure: mapping a GPGPU computation onto the graphics pipeline. CPU code (main()) and host memory data are bound, through an identifier table (Id, Name, Location) with entries such as Vsource, Vobj, Vposition, Vcolour, Psource, and Pobj, to the GPU vertex shader (xyz positions) and pixel shader (rgb colours).]
CUDA
General Purpose Computation using GPU
What is (Historical) GPGPU?
General Purpose computation using GPU and graphics API in applications other than 3D graphics
– GPU accelerates critical path of application
Data parallel algorithms leverage GPU attributes
– Large data arrays, streaming throughput
– Fine-grain SIMD parallelism
– Low-latency floating point (FP) computation
Applications – see GPGPU.org
– Game effects (FX) physics, image processing
– Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting
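To make the data-parallel pattern concrete, here is a minimal sketch, written in CUDA C for readability (historical GPGPU expressed the same computation through pixel shaders rather than kernels); the kernel name scaleArray and its parameters are illustrative, not from the slides. Each thread applies the same floating-point operation to one element of a large array:

    // One thread per element: fine-grain data parallelism over a large array.
    __global__ void scaleArray(float *data, float alpha, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
        if (i < n)
            data[i] = alpha * data[i];                  // same operation applied to every element
    }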
[Figure: the restricted input and output capabilities of a shader programming model. A Fragment Program reads Input Registers, Constants, Texture, and Temp Registers (resources scoped per thread, per shader, and per context) and can write only to its Output Registers, which feed FB (frame buffer) memory.]
Previous GPGPU Constraints
Dealing with graphics API
– Working with the corner cases of the graphics API
Addressing modes
– Limited texture size/dimension
Shader capabilities
– Limited outputs
Instruction sets
– Lack of integer & bit ops
Communication limited
– Between pixels
– Scatter a[i] = p
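The last constraint is worth a concrete illustration. A general-purpose model lets each thread compute the address it writes to; a fragment shader could not, since its output position was fixed by the pipeline. A minimal sketch in CUDA C of the scatter pattern a[i] = p (the names scatterKernel, idx, and vals are assumptions for illustration):

    // Scatter: each thread writes its value to a destination index it computes,
    // a write pattern fragment shaders could not express.
    __global__ void scatterKernel(float *a, const int *idx, const float *vals, int n) {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t < n)
            a[idx[t]] = vals[t];   // a[i] = p with i chosen per thread
    }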
CUDA
"Compute Unified Device Architecture"
General purpose programming model
– User kicks off batches of threads on the GPU
– GPU = dedicated super-threaded, massively data parallel co-processor
Targeted software stack
– Compute oriented drivers, language, and tools
Driver for loading computation programs into GPU
– Standalone driver – optimized for computation
– Interface designed for compute – graphics-free API
– Data sharing with OpenGL buffer objects
– Guaranteed maximum download & readback speeds
– Explicit GPU memory management
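A minimal host-side sketch of this model in CUDA C (array size, block size, and variable names are illustrative, and the scaleArray kernel repeats the earlier sketch): the host explicitly manages GPU memory, downloads data, kicks off a batch of threads, and reads the result back.

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    // Kernel from the earlier sketch: one thread per array element.
    __global__ void scaleArray(float *data, float alpha, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = alpha * data[i];
    }

    int main(void) {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *h_data = (float *)malloc(bytes);           // host copy of the data
        for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

        float *d_data;                                    // explicit GPU memory management
        cudaMalloc((void **)&d_data, bytes);
        cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);   // download to the device

        const int threads = 256;                          // kick off a batch of threads
        const int blocks = (n + threads - 1) / threads;
        scaleArray<<<blocks, threads>>>(d_data, 2.0f, n);

        cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);   // readback of the result
        printf("h_data[0] = %f\n", h_data[0]);

        cudaFree(d_data);                                 // release device memory
        free(h_data);
        return 0;
    }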