Introduction to Programmable Hardware
Traditional Graphics Pipeline: transform & lighting (per-vertex operations) -> setup/rasterizer (per-primitive operations) -> texture blending (per-fragment operations) -> frame-buffer anti-aliasing
Programmable features (based on the nVIDIA architecture): vertex programming; pixel shader, consisting of texture shader and register combiner
Vertex Program: Vertex programming offers a programmable T&L unit. User-defined vertex processing replaces the transform & lighting stage; the setup/rasterizer, texture blending, and frame-buffer anti-aliasing stages are unchanged. Gives the programmer total control of vertex processing.
Vertex Program (cont’d): Assembly language interface to the T&L unit; a GPU instruction set that performs all vertex math. Reads an untransformed, unlit vertex. Creates a transformed vertex. Optionally lights the vertex and creates texture coordinates, fog coordinates, and point sizes.
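Create Vertex Program: programs (assembly) are defined inline as character strings.
    #include <GL/gl.h>  /* for GLubyte */
    /* Transforms the position v[0] by the 4x4 matrix whose rows are in
       c[0]..c[3] and passes the primary color v[3] through unchanged. */
    static const GLubyte vpgm[] = "!!VP1.0\
        DP4 o[HPOS].x, c[0], v[0];\
        DP4 o[HPOS].y, c[1], v[0];\
        DP4 o[HPOS].z, c[2], v[0];\
        DP4 o[HPOS].w, c[3], v[0];\
        MOV o[COL0], v[3];\
        END";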
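To make the effect of that program concrete, here is a minimal CPU-side sketch of what its five instructions compute. It is not how the GPU executes the program; the names Vec4, VertexOut, dp4, and run_vertex_program are hypothetical, and c[0]..c[3] are assumed to hold the rows of the transform matrix.

```c
/* A quad float, the only data type in the vertex program model. */
typedef struct { float x, y, z, w; } Vec4;

/* Hypothetical container for the two output registers the program writes. */
typedef struct { Vec4 hpos; Vec4 col0; } VertexOut;

/* DP4: four-component dot product. */
static float dp4(Vec4 a, Vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* CPU emulation of the program: c[0..3] are assumed to be the matrix rows,
   v0 stands for v[0] (position) and v3 for v[3] (color). */
static VertexOut run_vertex_program(const Vec4 c[4], Vec4 v0, Vec4 v3) {
    VertexOut o;
    o.hpos.x = dp4(c[0], v0);  /* DP4 o[HPOS].x, c[0], v[0]; */
    o.hpos.y = dp4(c[1], v0);  /* DP4 o[HPOS].y, c[1], v[0]; */
    o.hpos.z = dp4(c[2], v0);  /* DP4 o[HPOS].z, c[2], v[0]; */
    o.hpos.w = dp4(c[3], v0);  /* DP4 o[HPOS].w, c[3], v[0]; */
    o.col0 = v3;               /* MOV o[COL0], v[3];         */
    return o;
}
```

With a matrix that translates by (2, 3, 4), the vertex (1, 1, 1, 1) comes out at (3, 4, 5, 1) and its color is copied unchanged.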
Programming Model (all registers are quad floats):
  Vertex source: v[0] … v[15], 16x4 registers
  Program constants: c[0] … c[95], 96x4 registers
  Temporary registers: R0 … R11, 12x4 registers
  Vertex program: up to 128 instructions
  Vertex output: o[HPOS], o[COL0], o[COL1], o[FOGC], o[PSIZ], o[TEX0] … o[TEX7]; 15x4 registers
Instruction Set: the ops. 17 instructions total: MOV, MUL, ADD, MAD, DST; DP3, DP4; MIN, MAX, SLT, SGE; RCP, RSQ, LOG, EXP, LIT; ARL
Pixel Shader: user-defined per-pixel shading replaces the texture blending stage. Pipeline: transform & lighting -> setup/rasterizer -> user-defined per-pixel shading -> frame-buffer anti-aliasing
Texture Mapping/Blending: traditional OpenGL texture mapping/blending. Vertex colors -> Gouraud shading -> fragment color. The texture unit takes the fragment color and the texture coordinate, blends colors, and outputs the fragment color.
Multitexturing: an optional extension of OpenGL 1.2. Fragment color input -> four texture units in series, each blending colors -> fragment color output
Texture Compositing in OpenGL 1.2: fragment color -> texture environment 0 (texture fetching, Tex0) -> texture environment 1 (texture fetching, Tex1) -> specular color sum (specular color) -> fog application (fog color/factor)
Compositing Operator: choice of 5 functions for RGB and alpha.
  Ct: texture color; At: texture alpha
  Cf: incoming fragment color; Af: incoming fragment alpha
  Cc: color assigned to GL_TEXTURE_ENV_COLOR
  Function   RGB                   Alpha
  Replace    Ct                    At
  Modulate   Cf Ct                 Af At
  Decal      Cf (1 - At) + Ct At   Af
  Blend      Cf (1 - Ct) + Cc Ct   Af At
  Add        Cf + Ct               Af At
  Post-environment: specular color addition and fog application
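The five compositing functions can be sketched per fragment as below. This is an illustrative CPU version under assumptions: the Color type and the env_* names are hypothetical, the alpha results for Blend and Add (Af At) follow the OpenGL table for RGBA textures, and the Add result is clamped to [0, 1] as OpenGL does.

```c
/* One RGBA color; components are assumed to lie in [0, 1]. */
typedef struct { float r, g, b, a; } Color;

static float clamp01(float x) {
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* f = incoming fragment (Cf, Af), t = texture (Ct, At), c = env color (Cc). */
static Color env_replace(Color f, Color t) {
    (void)f;                       /* the fragment color is ignored */
    return t;
}

static Color env_modulate(Color f, Color t) {
    Color o = { f.r * t.r, f.g * t.g, f.b * t.b, f.a * t.a };
    return o;
}

static Color env_decal(Color f, Color t) {
    Color o = { f.r * (1.0f - t.a) + t.r * t.a,
                f.g * (1.0f - t.a) + t.g * t.a,
                f.b * (1.0f - t.a) + t.b * t.a,
                f.a };             /* alpha passes through */
    return o;
}

static Color env_blend(Color f, Color t, Color c) {
    Color o = { f.r * (1.0f - t.r) + c.r * t.r,
                f.g * (1.0f - t.g) + c.g * t.g,
                f.b * (1.0f - t.b) + c.b * t.b,
                f.a * t.a };       /* assumed: Af At, as for RGBA textures */
    return o;
}

static Color env_add(Color f, Color t) {
    Color o = { clamp01(f.r + t.r), clamp01(f.g + t.g), clamp01(f.b + t.b),
                f.a * t.a };       /* assumed: Af At, as for RGBA textures */
    return o;
}
```

In real OpenGL the function is selected with glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, ...); the sketch only mirrors the arithmetic.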
Pixel Shader (cont’d): based on nVIDIA’s GF3/4 architecture.
  Texture shader: 4 texture units; 23 different texture shader operations:
    conventional (1D, 2D, 3D, texture rectangle, cube map)
    special case (none, pass through, cull fragment)
    dependent texture fetches (the result of one texture lookup affects the texture coordinates of a subsequent unit)
    dependent texture fetches with dot product (and optional reflection) calculations
  Register combiners: 8 stages (general combiners) on GeForce3/4; per-stage constants
Pixel Shader: based on nVIDIA’s GF3/4 architecture; texture shader + register combiner. Fragment color input -> texture shader (texture units 0 to 3, each running a texture program) -> register combiner -> fragment color output
Texture Shader. Texture program example: conventional 2D texture.
  Tex #: i
  Texture coords (S,T,R,Q): (Si,Ti,Ri,Qi)
  Shader operation: Texture 2D
  Texture fetch: at (Si/Qi, Ti/Qi)
  Bound texture target/format: 2D, any format
  Output color: (R,G,B,A)
Texture Shader (cont’d). Texture program example: pass through.
  Texture coords (S,T,R,Q): (Si,Ti,Ri,Qi)
  Shader operations: R = Clamp0to1(Si), G = Clamp0to1(Ti), B = Clamp0to1(Ri), A = Clamp0to1(Qi)
  Texture fetch: none
  Output color: (R,G,B,A)
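The conventional 2D and pass-through texture programs can be sketched on the CPU as follows. The types and function names (TexCoord, Color, project_2d, pass_through) are hypothetical; the sketch covers only the coordinate arithmetic, not the texel fetch or filtering.

```c
/* Texture coordinates (S, T, R, Q) and an RGBA result. */
typedef struct { float s, t, r, q; } TexCoord;
typedef struct { float r, g, b, a; } Color;

static float clamp0to1(float x) {
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Conventional 2D texture: the fetch location is (S/Q, T/Q).
   The caller is assumed to guarantee q != 0. */
static void project_2d(TexCoord tc, float *u, float *v) {
    *u = tc.s / tc.q;
    *v = tc.t / tc.q;
}

/* Pass through: no fetch; the clamped coordinates become the color. */
static Color pass_through(TexCoord tc) {
    Color o = { clamp0to1(tc.s), clamp0to1(tc.t),
                clamp0to1(tc.r), clamp0to1(tc.q) };
    return o;
}
```

Pass through is a convenient way to hand an interpolated per-vertex quantity to the later stages as if it were a texture result.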
Texture Shader (cont’d). Texture program example: dependent texture.
  Tex # 0: texture coords app specific; shader operations any type; texture fetch texture specific; bound texture unsigned RGB[A]; output R0G0B0A0
  Tex # 1: texture coords ignored; shader operations none; texture fetch at (A0,R0); bound texture 2D, RGBA; output R1G1B1A1
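A dependent fetch of this kind can be sketched as a two-step lookup: the alpha and red of the first unit's result become the (s, t) coordinates for the second unit. The code below is a simplified illustration; fetch_nearest and dependent_ar are hypothetical names, and nearest-neighbour sampling with clamp-to-edge is an assumption made to keep the example short.

```c
typedef struct { float r, g, b, a; } Color;

/* Nearest-neighbour lookup into a w x h texture stored row by row,
   with coordinates clamped to the texture edge (an assumption). */
static Color fetch_nearest(const Color *texels, int w, int h, float s, float t) {
    int x = (int)(s * (float)w);
    int y = (int)(t * (float)h);
    if (x < 0) x = 0;
    if (x > w - 1) x = w - 1;
    if (y < 0) y = 0;
    if (y > h - 1) y = h - 1;
    return texels[y * w + x];
}

/* Dependent lookup: unit 1 ignores its own coordinates and instead
   fetches at (A0, R0), taken from the color produced by unit 0. */
static Color dependent_ar(Color unit0_result, const Color *tex1, int w, int h) {
    return fetch_nearest(tex1, w, h, unit0_result.a, unit0_result.r);
}
```

The first texture thus acts as an index map into the second one, which is what makes effects such as per-pixel lookup tables possible.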
Register Combiner: GeForce 2 (only 2 general combiner stages).
  Register set: fragment color, specular color, fog color/factor, Texture 0 and Texture 1 (from texture fetching), Spare 0
  General combiner 0 and general combiner 1: each has 4 RGB inputs, 4 alpha inputs, 3 RGB outputs, 3 alpha outputs
  Final combiner: 6 RGB inputs, 1 alpha input
Register Combiner (cont’d): register-based programming. All textures and colors are available to each and every texture blending stage. 8 stages of blending in hardware, plus specular and fog (GeForce3 has 8 combiners and 4 textures). Signed color arithmetic.
Diagram of a General Combiner.
  RGB portion: input RGB/alpha registers -> input mappings -> inputs A, B, C, D -> RGB function (A op1 B, C op2 D, AB op3 CD) -> RGB scale/bias -> next combiner’s RGB registers
  Alpha portion: input alpha/blue registers -> input mappings -> inputs A, B, C, D -> alpha function (AB, CD, AB op4 CD) -> alpha scale/bias -> next combiner’s alpha registers
General Combiner Input Registers
The Register Set:
  Primary (diffuse) color: initialized to the RGBA of the fragment’s primary color
  Secondary (specular) color: initialized to the RGB of the fragment’s secondary/specular color; alpha is not initialized
  Texture 0 and Texture 1 colors: initialized to the fragment’s filtered RGBA texel from the numbered texture unit; not initialized if that texture unit is disabled or non-existent
  Spare 0 and Spare 1: alpha of Spare 0 is initialized to the alpha of the Texture 0 color (if enabled); RGB of Spare 0 and all of Spare 1 are not initialized
  Fog: RGB is the current fog color; alpha is the fragment’s fog factor (only available in the final combiner); read-only
  Constant color 0 and Constant color 1: initialized to user-defined RGBA values
  Zero: constant, read-only value of zero
General Combiner Input Mappings:
  Mapping            Function                      Input range   Output range
  Signed Identity    f(x) = x                      [-1, 1]       [-1, 1]
  Unsigned Identity  f(x) = max(0, x)              [0, 1]        [0, 1]
  Expand Normal      f(x) = 2 * max(0, x) - 1      [0, 1]        [-1, 1]
  Half Bias Normal   f(x) = max(0, x) - ½          [0, 1]        [-½, ½]
  Signed Negate      f(x) = -x                     [-1, 1]       [1, -1]
  Unsigned Invert    f(x) = 1 - min(max(0, x), 1)  [0, 1]        [1, 0]
  Expand Negate      f(x) = -2 * max(0, x) + 1     [0, 1]        [1, -1]
  Half Bias Negate   f(x) = -max(0, x) + ½         [0, 1]        [½, -½]
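The eight input mappings translate directly into a small per-component function. This is a sketch for illustration; the enum and function names are hypothetical stand-ins, not the OpenGL tokens.

```c
/* Hypothetical names for the eight input mappings listed above. */
typedef enum {
    SIGNED_IDENTITY, UNSIGNED_IDENTITY, EXPAND_NORMAL, HALF_BIAS_NORMAL,
    SIGNED_NEGATE, UNSIGNED_INVERT, EXPAND_NEGATE, HALF_BIAS_NEGATE
} InputMapping;

static float max0(float x) { return x > 0.0f ? x : 0.0f; }
static float min1(float x) { return x < 1.0f ? x : 1.0f; }

/* Applies one input mapping to a single color component. */
static float apply_mapping(InputMapping m, float x) {
    switch (m) {
    case SIGNED_IDENTITY:   return x;
    case UNSIGNED_IDENTITY: return max0(x);
    case EXPAND_NORMAL:     return 2.0f * max0(x) - 1.0f;
    case HALF_BIAS_NORMAL:  return max0(x) - 0.5f;
    case SIGNED_NEGATE:     return -x;
    case UNSIGNED_INVERT:   return 1.0f - min1(max0(x));
    case EXPAND_NEGATE:     return -2.0f * max0(x) + 1.0f;
    case HALF_BIAS_NEGATE:  return -max0(x) + 0.5f;
    }
    return x;  /* not reached for valid enum values */
}
```

Expand Normal is the mapping typically used to turn a normal stored as an unsigned [0, 1] texel back into a signed [-1, 1] vector component.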
General Combiner RGB Functions (outputs of each function):
  Dot / Dot / Discard: A • B and C • D
  Dot / Mult / Discard: A • B and CD
  Mult / Dot / Discard: AB and C • D
  Mult / Mult / Mux: AB, CD, and mux(AB, CD)
  Mult / Mult / Sum: AB, CD, and AB + CD
  mux(AB, CD) selects AB or CD by comparing Spare0[Alpha] with ½
  Dot products on RGB registers: A • B = (d, d, d), where d = A[red] * B[red] + A[green] * B[green] + A[blue] * B[blue]
  Multiplication on RGB registers: AB = (A[red] * B[red], A[green] * B[green], A[blue] * B[blue])
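The dot product, component-wise product, and sum used by these RGB functions can be sketched as plain C helpers. The RGB type and the rgb_* names are hypothetical; the mux variant is left out of the sketch, and scale/bias and clamping are not modelled.

```c
typedef struct { float r, g, b; } RGB;

/* A . B: the scalar dot product is replicated into all three components. */
static RGB rgb_dot(RGB a, RGB b) {
    float d = a.r * b.r + a.g * b.g + a.b * b.b;
    RGB o = { d, d, d };
    return o;
}

/* AB: component-wise multiplication. */
static RGB rgb_mul(RGB a, RGB b) {
    RGB o = { a.r * b.r, a.g * b.g, a.b * b.b };
    return o;
}

/* AB + CD: the third output of the Mult / Mult / Sum function. */
static RGB rgb_mult_mult_sum(RGB a, RGB b, RGB c, RGB d) {
    RGB ab = rgb_mul(a, b);
    RGB cd = rgb_mul(c, d);
    RGB o = { ab.r + cd.r, ab.g + cd.g, ab.b + cd.b };
    return o;
}
```

Feeding a light vector and a normal through rgb_dot gives the same diffuse term in red, green, and blue, which is why the dot product is replicated.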
Diagram of the Final Combiner (OpenGL only).
  RGB portion: input RGB/alpha registers -> input mappings -> inputs A, B, C, D; RGB out = AB + (1-A)C + D
  Additional available RGB inputs: the product EF from the E, F multiplier, and Spare0 + secondary color from the color sum unit (clamped to [0, 1])
  Alpha portion: input alpha/blue registers -> input mapping -> alpha out
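The final combiner's RGB arithmetic can be sketched as follows. The names RGB, color_sum, and final_combiner_rgb are hypothetical, and clamping of the final output is not modelled; only the color sum unit clamps, as the diagram states.

```c
typedef struct { float r, g, b; } RGB;

static float clamp01(float x) {
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Color sum unit: Spare0 + secondary color, clamped to [0, 1]. */
static RGB color_sum(RGB spare0, RGB secondary) {
    RGB o = { clamp01(spare0.r + secondary.r),
              clamp01(spare0.g + secondary.g),
              clamp01(spare0.b + secondary.b) };
    return o;
}

/* RGB out = AB + (1 - A)C + D, evaluated per component.
   A acts as a blend factor between B and C; D is added on top. */
static RGB final_combiner_rgb(RGB a, RGB b, RGB c, RGB d) {
    RGB o = { a.r * b.r + (1.0f - a.r) * c.r + d.r,
              a.g * b.g + (1.0f - a.g) * c.g + d.g,
              a.b * b.b + (1.0f - a.b) * c.b + d.b };
    return o;
}
```

Setting A to the fog factor, B to the shaded color, and C to the fog color makes the first two terms a standard fog blend, with D free for a specular contribution.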