A Crash Course on Programmable Graphics Hardware. Li-Yi Wei, 2005, Tsinghua University, Beijing
Why do we need graphics hardware?
The evolution of graphics hardware SGI Origin 3400 NVIDIA Geforce 7800
Source: "7 Years of Graphics", accelenation.com/?doc=123&page=1
Ray tracing General & flexible Intuitive Global illumination Hard to accelerate
Polygonal graphics pipeline Local computation Easy to accelerate Not general Unintuitive
Graphics hierarchy: a layered approach, like network layers. Encapsulation: easy programming, driver optimization, driver workaround, driver simulation. Protection: hardware error check
Overview: graphics pipeline, GPU programming. Only a high-level overview (so you can program), not necessarily real hardware
Graphics pipeline
Application Mostly on CPU High level work User interface Control Simulation Physics Artificial intelligence
Host Gatekeeper of GPU Command processing Error checking State management Context switch
Geometry Vertex processor Primitive assembly Clip & cull Viewport transform
Vertex Processor Processes one vertex at a time; no information on other vertices. Programmable: transformation, lighting
Transformation Global to eye coordinate system
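A minimal Python sketch (not actual GPU code) of what this step does: a 4x4 matrix multiplies each homogeneous vertex to move it from global to eye coordinates. The matrix values are made up for illustration.

```python
# Illustrative sketch: transform a vertex from global (world) coordinates
# into the eye coordinate system with a 4x4 matrix, as the vertex
# processor would do for each incoming vertex.

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a homogeneous 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# A hypothetical modelview matrix: translate the scene 5 units along -z
# (i.e. move everything in front of the camera), orientation unchanged.
modelview = [
    [1, 0, 0,  0],
    [0, 1, 0,  0],
    [0, 0, 1, -5],
    [0, 0, 0,  1],
]

vertex_world = [1.0, 2.0, 3.0, 1.0]   # homogeneous position
vertex_eye = mat_vec(modelview, vertex_world)
print(vertex_eye)  # [1.0, 2.0, -2.0, 1.0]
```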
Lighting Diffuse Specular
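The diffuse and specular terms can be sketched in Python as follows; the normal, light, and view vectors and the shininess exponent are hypothetical values chosen for illustration.

```python
# Illustrative sketch of per-vertex diffuse and specular lighting terms.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = sum(x * x for x in v) ** 0.5
    return [x / length for x in v]

n = normalize([0.0, 0.0, 1.0])   # surface normal
l = normalize([0.0, 0.0, 1.0])   # direction to the light
v = normalize([0.0, 0.0, 1.0])   # direction to the viewer

# Diffuse: proportional to N.L, clamped so back-facing light contributes 0.
diffuse = max(0.0, dot(n, l))

# Specular: reflect L about N, then raise (R.V) to a shininess power.
r = [2.0 * dot(n, l) * nc - lc for nc, lc in zip(n, l)]
shininess = 32
specular = max(0.0, dot(r, v)) ** shininess

print(diffuse, specular)  # 1.0 1.0 (all vectors aligned in this example)
```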
Transform & Light on Vertex Processor A sequence of assembly instructions (more on this later)
Primitive Assembly Assemble individual vertices into triangles (or lines or points). Performance implication: a triangle is ready only when all 3 of its vertices are; vertex coherence & caching matter
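Why vertex coherence matters can be sketched with a toy post-transform vertex cache; the cache size, FIFO policy, and index data below are assumptions for illustration, not a description of real hardware.

```python
# Sketch: with an indexed triangle list and a small post-transform vertex
# cache, a reused vertex that is still cached need not be transformed again,
# so coherent index ordering reduces vertex-processor work.

from collections import deque

def count_transforms(indices, cache_size=4):
    """Count vertex-processor invocations given a FIFO post-transform cache."""
    cache = deque(maxlen=cache_size)   # maxlen evicts oldest entries
    transforms = 0
    for idx in indices:
        if idx not in cache:           # cache miss: transform the vertex
            transforms += 1
            cache.append(idx)
    return transforms

# Two triangles sharing an edge: (0,1,2) and (2,1,3).
shared = [0, 1, 2, 2, 1, 3]
print(count_transforms(shared))  # 4 transforms for 6 indices
```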
Clipping & Culling Backface culling: remove triangles facing away from the viewer; eliminates ½ of the triangles in theory. Clipping against the view frustum: triangles may become quadrilaterals
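Backface culling can be sketched in 2D screen space with a winding test; the counter-clockwise-is-front convention below is an assumption (real APIs let you pick either winding).

```python
# Sketch of backface culling: the sign of a triangle's 2D signed area
# (a cross product of two edges) gives its winding; triangles wound the
# "wrong" way face away from the viewer and are culled.

def signed_area2(a, b, c):
    """Twice the signed area of screen-space triangle abc."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def is_front_facing(a, b, c):
    return signed_area2(a, b, c) > 0   # counter-clockwise = front (assumed)

ccw = [(0, 0), (1, 0), (0, 1)]
cw  = [(0, 0), (0, 1), (1, 0)]
print(is_front_facing(*ccw))  # True  (kept)
print(is_front_facing(*cw))   # False (culled)
```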
Viewport transform From the floating-point range [-1, 1] x [-1, 1] to the integer range [0, width-1] x [0, height-1]
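A sketch of the mapping, assuming a simple round-to-nearest convention (real hardware pins down the rounding and pixel-center rules more carefully):

```python
# Sketch of the viewport transform: map normalized device coordinates in
# [-1, 1] x [-1, 1] to integer pixel coordinates in
# [0, width-1] x [0, height-1].

def viewport(x_ndc, y_ndc, width, height):
    px = int(round((x_ndc + 1.0) * 0.5 * (width - 1)))
    py = int(round((y_ndc + 1.0) * 0.5 * (height - 1)))
    return px, py

print(viewport(-1.0, -1.0, 640, 480))  # (0, 0)
print(viewport(1.0, 1.0, 640, 480))    # (639, 479)
print(viewport(0.0, 0.0, 640, 480))    # (320, 240)
```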
Rasterization Convert primitives (triangles, lines) into pixels Barycentric coordinate Attribute interpolation
Triangles into pixels
Attribute interpolation Barycentric
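Barycentric attribute interpolation can be sketched as follows: a point inside the triangle gets three weights that sum to 1, and any per-vertex attribute (here a single made-up "red" value per vertex) is blended with those weights.

```python
# Sketch of barycentric attribute interpolation inside a triangle.

def barycentric(p, a, b, c):
    def area2(p0, p1, p2):  # twice the signed area of a 2D triangle
        return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p1[1] - p0[1]) * (p2[0] - p0[0])
    total = area2(a, b, c)
    alpha = area2(p, b, c) / total   # weight of vertex a
    beta  = area2(a, p, c) / total   # weight of vertex b
    gamma = area2(a, b, p) / total   # weight of vertex c
    return alpha, beta, gamma

a, b, c = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
red_at = {a: 1.0, b: 0.0, c: 0.0}    # hypothetical per-vertex attribute

p = (1.0, 1.0)
alpha, beta, gamma = barycentric(p, a, b, c)
red = alpha * red_at[a] + beta * red_at[b] + gamma * red_at[c]
print(alpha, beta, gamma)  # 0.5 0.25 0.25 (weights sum to 1)
print(red)                 # 0.5
```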
Perspective-correct interpolation (figure: incorrect vs. correct interpolation)
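The difference between the incorrect and correct results can be sketched along a single edge; the attribute values and w coordinates below are made up. The correct form interpolates attribute/w and 1/w in screen space, then divides.

```python
# Sketch of perspective-correct interpolation between two vertices.
# Incorrect: lerp the attribute directly in screen space.
# Correct: lerp attribute/w and 1/w, then divide.

def lerp(a, b, t):
    return a + (b - a) * t

# Hypothetical endpoints: an attribute (e.g. a texture coordinate) and
# the clip-space w of each vertex.
attr0, w0 = 0.0, 1.0
attr1, w1 = 1.0, 4.0
t = 0.5  # halfway along the edge *in screen space*

incorrect = lerp(attr0, attr1, t)
correct = lerp(attr0 / w0, attr1 / w1, t) / lerp(1.0 / w0, 1.0 / w1, t)

print(incorrect)  # 0.5
print(correct)    # 0.2: the farther endpoint (larger w) contributes less
```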
Fragment processor Fragment: corresponds to a single pixel and includes color, depth, and sometimes texture-coordinate values. Compute color and depth for each pixel Most interesting part of GPU
Texture Optional (though hard to avoid). Caches data: hides the latency of fetching from framebuffer memory. Sampling/filtering: covered last time
ROP (Raster Operation) Write to framebuffer. Comparisons: Z, stencil, alpha, window
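The Z comparison can be sketched with a toy one-row framebuffer; the less-than test and the buffer layout are simplifying assumptions for illustration.

```python
# Sketch of the ROP depth (Z) test: a fragment's color is written only if
# its depth passes the comparison against the stored Z value. A "less"
# test keeps the nearest surface at each pixel.

def depth_test(zbuffer, colorbuffer, x, z, color):
    """Write color at pixel x only if z is nearer than the stored depth."""
    if z < zbuffer[x]:           # less-than comparison (assumed mode)
        zbuffer[x] = z
        colorbuffer[x] = color
        return True              # fragment passed
    return False                 # fragment rejected

zbuf = [1.0] * 4                 # depth buffer cleared to the far plane
cbuf = ["black"] * 4             # color buffer cleared

depth_test(zbuf, cbuf, 0, 0.5, "red")    # passes: 0.5 < 1.0
depth_test(zbuf, cbuf, 0, 0.8, "blue")   # fails: behind the red fragment
print(cbuf[0], zbuf[0])  # red 0.5
```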
Framebuffer Storing buffers and textures Connect to display Characteristics Size Bandwidth Latency
Conceptual programming model Inputs (read-only) Attributes Constants Textures Registers (read-write) Used by shader Outputs (write-only)
Simple example
HPOS: position, COL0: diffuse color
MOV o[HPOS], v[HPOS];
MOV o[COL0], v[COL0];
More complex example
o[COL0] = v[COL0] + constant*v[HPOS]:
MOV o[HPOS], v[HPOS];
MOV R0, v[COL0];
MAD R0, v[HPOS], c[0], R0;
MOV o[COL0], R0;
Sample instruction set
A real example
High-level shading language Writing assembly is painful, not portable, and not optimizable. High-level shading languages solve these problems: Cg, HLSL
Cg example
Applications Too many to describe here, and descriptions alone are of little use: the only way to learn is to program. Look at developer websites: NVIDIA, ATI, GPGPU
Homework Try to program the GPU! Stanford course on graphics hardware: http://www.graphics.stanford.edu/courses/cs448a-01-fall/ (even without an NVIDIA GPU, you can download the emulator). History of graphics hardware: "7 Years of Graphics", accelenation.com/?doc=123&page=1