Computer Graphics 3, Lecture 4: GPU Programming
  Dr. Benjamin Mora, University of Wales Swansea
Content
  Introduction.
  Vertex and Fragment Programs.
  Programming the GPU.
    –Assembly Code.
    –High Level Languages.
  Examples of applications.
  Conclusion.
Introduction
Introduction
  OpenGL (SGI) shaped the early design of current graphics processors (GPUs).
    –Fixed pipeline.
      Once the different tests are passed, the fragment color is replaced by the new (textured & interpolated) one.
    –Not realistic enough.
  The graphics pipeline is fed with primitives such as triangles, points, etc., which are rasterized.
  Two main stages:
    –Vertex processing.
    –Fragment (rasterized pixel) processing.
  These two stages have been extended for more realism.
Introduction
  Latest evolutions
    –Unified shaders.
      Graphics units are balanced automatically between vertex and fragment programs.
      The smaller the image, the more CPU- and vertex-bound the program is.
      The larger the image, the more fragment/pixel-bound the program is.
        –Anti-aliasing and texture filtering parameters also contribute to this.
    –Geometry shaders are discussed separately.
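The relation between image size and the vertex/fragment balance can be illustrated with rough arithmetic. The sketch below is only a back-of-the-envelope model: the function name `shader_invocations` and the `overdraw` factor are hypothetical, and real workloads also depend on culling, anti-aliasing and texture filtering.

```python
def shader_invocations(num_vertices, width, height, overdraw=1.0):
    """Rough per-frame counts of vertex and fragment program runs.

    Assumption: every vertex is processed once and every pixel is shaded
    `overdraw` times on average (hypothetical simplification).
    """
    vertex_runs = num_vertices
    fragment_runs = int(width * height * overdraw)
    return vertex_runs, fragment_runs


# Same scene (100,000 vertices) rendered at two image sizes.
for width, height in [(320, 240), (1920, 1080)]:
    v, f = shader_invocations(100_000, width, height, overdraw=1.5)
    print(f"{width}x{height}: {v} vertex runs, {f} fragment runs, "
          f"fragment/vertex ratio = {f / v:.2f}")
```

With a fixed scene, the vertex work stays constant while the fragment work grows with the pixel count, which is why larger images tend to be fragment-bound.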
Vertex and Fragment Programs
Vertex and Fragment Programs
  Source: Daniel Weiskopf, Basics of GPU-Based Programming.
Vertex and Fragment Programs
  Pipeline: Vertices -> Transform and Lighting -> Setup -> Rasterization -> Texture Fetch, Fragment Shading -> Tests (z, stencil…) -> Frame Buffer Blending
    –Vertex programs: user-defined vertex processing (in place of Transform and Lighting).
    –Fragment programs: user-defined per-pixel processing (in place of Texture Fetch, Fragment Shading).
Programming the GPU
Programming the GPU
  Low-level languages (pseudo-assembler).
    –Help to understand what is possible on the GPU.
    –Large code is a pain to maintain/optimize.
    –May be specific to the graphics card generation/supplier.
  High-level languages.
    –Easier to write.
    –Early compilers were not very good.
    –Code may be more compatible.
  Loops.
Current Low Level Languages (APIs)
  DirectX 9.
    –Vertex shader 2.0.
    –Pixel shader 2.0.
  OpenGL extensions.
    –GL_ARB_vertex_program.
    –GL_ARB_fragment_program.
  Vendor APIs
    –NVidia vertex and fragment programs.
Current High Level Languages (APIs)
  Microsoft, ATI.
    –High Level Shading Language (HLSL).
  NVidia.
    –Cg.
  OpenGL Shading Language.
How to use them?
  Assembly programs:
    –Can be loaded (and compiled) at run-time (OpenGL).
    –Several programs can be loaded at once.
      The suitable rendering style (i.e. program) is applied to every scene primitive.
      This avoids latency due to pseudo-assembly compilation.
  High-level programs:
    –Must be compiled before run-time.
    –The resulting (pseudo) assembly code can then be used.
Vertex Programs
  Vertex program.
    –Bypasses the T&L unit.
    –GPU instruction set to perform all vertex math.
    –Input: arbitrary vertex attributes.
    –Output: transformed vertex attributes.
      Homogeneous clip space position (required).
      Colors (front/back, primary/secondary).
      Fog coordinate.
      Texture coordinates.
      Point size.
Vertex Programs
  Customized computation of vertex attributes
    –Computation of anything that can be interpolated linearly between vertices.
  Limitations:
    –Vertices can neither be generated nor destroyed.
      A geometry shader is needed for that.
    –No information about topology or ordering of vertices is available.
Vertex Programs
  Vertex programs bypass the following OpenGL functionalities:
    –Vertex transformations.
      The modelview and projection matrix transformations.
    –Normal transformations and normalizations.
    –Color material.
    –Per-vertex lighting.
    –Texture coordinate generation.
    –Texture matrix transformations.
    –Raster position transformation.
    –Client-defined clip planes.
    –Per-vertex processing in EXT_point_parameters.
    –Per-vertex processing in NV_fog_distance.
    –Per-vertex point size computations.
Vertex Programs
  What is not replaced?
    –The view frustum clip.
    –Perspective divide (division by w).
    –The viewport transformation.
    –The depth range transformation.
    –Clamping the primary and secondary color to [0,1].
    –Primitive assembly and per-fragment operations.
    –Evaluators (except the AUTO_NORMAL normalization).
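The fixed-function steps that still follow a vertex program (perspective divide, viewport transformation, depth range transformation) can be sketched on the CPU. This is only an illustration of the standard OpenGL formulas; the function name `clip_to_window` and the default viewport of 640x480 are assumptions made for the example.

```python
def clip_to_window(clip, viewport=(0, 0, 640, 480), depth_range=(0.0, 1.0)):
    """Map a homogeneous clip-space position to window coordinates.

    clip        -- (x, y, z, w) as written to the vertex program position output
    viewport    -- (x, y, width, height), as given to glViewport
    depth_range -- (near, far), as given to glDepthRange
    """
    x, y, z, w = clip
    # Perspective divide: clip space -> normalized device coordinates in [-1, 1].
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    vx, vy, vw, vh = viewport
    near, far = depth_range
    # Viewport transformation: [-1, 1] -> pixel coordinates.
    win_x = vx + (ndc_x + 1.0) * 0.5 * vw
    win_y = vy + (ndc_y + 1.0) * 0.5 * vh
    # Depth range transformation: [-1, 1] -> [near, far].
    win_z = near + (ndc_z + 1.0) * 0.5 * (far - near)
    return win_x, win_y, win_z


print(clip_to_window((0.0, 0.0, 0.0, 1.0)))  # centre of the viewport, mid depth
print(clip_to_window((2.0, 2.0, 2.0, 2.0)))  # top-right corner, far plane
```

Because the hardware always applies these steps, a vertex program only has to output a clip-space position; it never writes pixel coordinates directly.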
NV Vertex Programs
  Different versions: 1.0, 1.1, 2.0, 3.0.
  Version 1.0:
    –12 temporary vector registers (xyzw): R0 to R11.
    –96 read-only vector registers (xyzw).
      Specified outside of glBegin/glEnd.
    –8 matrices.
    –17 different vertex program instructions (128 instructions max. inside the program).
      27 in shader model 3.0.
NV Vertex Programs
  Input parameters for the vertices (v[]):
    Mnemonic  Number  Typical meaning
    OPOS      0       object position
    WGHT      1       vertex weight
    NRML      2       normal
    COL0      3       primary color
    COL1      4       secondary color
    FOGC      5       fog coordinate
    TEX0      8       texture coordinate 0
    TEX1      9       texture coordinate 1
    TEX2      10      texture coordinate 2
    TEX3      11      texture coordinate 3
    TEX4      12      texture coordinate 4
    TEX5      13      texture coordinate 5
    TEX6      14      texture coordinate 6
    TEX7      15      texture coordinate 7
NV Vertex Programs
  New output values for the vertices (o[]):
    Mnemonic  Typical meaning                        Components
    HPOS      Homogeneous clip space position        (x,y,z,w)
    COL0      Primary color (front-facing)           (r,g,b,a)
    COL1      Secondary color (front-facing)         (r,g,b,a)
    BFC0      Back-facing primary color              (r,g,b,a)
    BFC1      Back-facing secondary color            (r,g,b,a)
    FOGC      Fog coordinate                         (f,*,*,*)
    PSIZ      Point size                             (p,*,*,*)
    TEX0      Texture coordinate set 0               (s,t,r,q)
    TEX1      Texture coordinate set 1               (s,t,r,q)
    TEX2      Texture coordinate set 2               (s,t,r,q)
    TEX3      Texture coordinate set 3               (s,t,r,q)
    TEX4      Texture coordinate set 4               (s,t,r,q)
    TEX5      Texture coordinate set 5               (s,t,r,q)
    TEX6      Texture coordinate set 6               (s,t,r,q)
    TEX7      Texture coordinate set 7               (s,t,r,q)
NV Vertex Programs
  Vertex program instructions:
  (Inputs: scalar or vector. Output: vector or replicated scalar.)
    OpCode  Inputs  Output            Operation
    ARL     s       address register  address register load
    MOV     v       v                 move
    MUL     v,v     v                 multiply
    ADD     v,v     v                 add
    MAD     v,v,v   v                 multiply and add
    RCP     s       ssss              reciprocal
    RSQ     s       ssss              reciprocal square root
    DP3     v,v     ssss              3-component dot product
    DP4     v,v     ssss              4-component dot product
    DST     v,v     v                 distance vector
    MIN     v,v     v                 minimum
    MAX     v,v     v                 maximum
    SLT     v,v     v                 set on less than
    SGE     v,v     v                 set on greater than or equal
    EXP     s       v (ssss?)         exponential base 2
    LOG     s       v (ssss?)         logarithm base 2
    LIT     v       v                 light coefficients
NV Vertex Programs
  Special instruction manipulation:
    –Use of negated values:
        MOV R0,-R1;
        ADD R0,R1,-R2;    # R0 <= R1-R2 (vector operation)
    –Registers can be swizzled:
        MOV R1,R1.wzyx;
        ADD R1,R1,R1.xzxy;
      Effect of MOV R1,R1.wzyx (a, b, c, d stand for the old component values):
                 x  y  z  w
        Old R1:  a  b  c  d
        New R1:  d  c  b  a
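To make the negation and swizzle semantics concrete, the sketch below emulates them on the CPU with plain tuples. The helper names `swizzle`, `negate` and `add` are hypothetical and only mirror what the hardware does component-wise; the register values are made up for the example.

```python
def swizzle(reg, pattern):
    """Return the components of a 4-vector reordered or replicated, e.g. 'wzyx'."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(reg[index[c]] for c in pattern)


def negate(reg):
    """Source negation, as in the operand -R2."""
    return tuple(-c for c in reg)


def add(a, b):
    """Component-wise ADD."""
    return tuple(p + q for p, q in zip(a, b))


r1 = (1.0, 2.0, 3.0, 4.0)
r2 = (0.5, 0.5, 0.5, 0.5)

r0 = add(r1, negate(r2))            # ADD R0, R1, -R2;  -> (0.5, 1.5, 2.5, 3.5)
r1 = swizzle(r1, "wzyx")            # MOV R1, R1.wzyx;  -> (4.0, 3.0, 2.0, 1.0)
r1 = add(r1, swizzle(r1, "xzxy"))   # add R1 and R1.xzxy -> (8.0, 5.0, 6.0, 4.0)
print(r0, r1)
```

Both modifiers apply to the source operand as it is read, so they cost no extra instruction.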
NV Vertex Programs
  Example: normal normalization.
      !!VP1.0
      # v[NRML] = (nx,ny,nz)
      #
      # R0.xyz = normalize(v[NRML])
      # R0.w   = 1/sqrt(nx*nx + ny*ny + nz*nz)
      #
      MOV R1, v[NRML];
      DP3 R0.w, R1, R1;            # squared length of the normal
      RSQ R0.w, R0.w;              # reciprocal of the length
      MUL R0.xyz, R1, R0.wwww;     # scale the normal to unit length
      # Then use R0 to compute shading...
      MOV o[COL0],...
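The DP3 / RSQ / MUL sequence can be checked on the CPU with a few lines. This is a sketch of the arithmetic only: the function names are hypothetical, and the absolute value in `rsq` is included on the assumption that the hardware instruction operates on the magnitude of its input.

```python
import math


def dp3(a, b):
    """3-component dot product (DP3)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rsq(s):
    """Reciprocal square root (RSQ); assumes a non-zero input."""
    return 1.0 / math.sqrt(abs(s))


def normalize_like_vp(normal):
    """Mirror of the program: returns (R0.x, R0.y, R0.z, R0.w)."""
    r1 = normal                       # MOV R1, v[NRML];
    w = dp3(r1, r1)                   # DP3 R0.w, R1, R1;
    w = rsq(w)                        # RSQ R0.w, R0.w;
    return (r1[0] * w, r1[1] * w, r1[2] * w, w)   # MUL R0.xyz, R1, R0.wwww;


r0 = normalize_like_vp((3.0, 0.0, 4.0))
print(r0)  # xyz is the unit normal, w is 1/length
```

Three instructions replace a division and a square root, which is why this pattern appears in almost every lighting program.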
NV Vertex Programs
  Simple specular and diffuse lighting computation with an eye-space normal:
      !!VP1.0
      #
      # c[0-3]  = modelview projection (composite) matrix
      # c[4-7]  = modelview inverse transpose
      # c[32]   = normalized eye-space light direction (infinite light)
      # c[33]   = normalized constant eye-space half-angle vector (infinite viewer)
      # c[35].x = pre-multiplied monochromatic diffuse light color & diffuse material
      # c[35].y = pre-multiplied monochromatic ambient light color & diffuse material
      # c[36]   = specular color
      # c[38].x = specular power
      #
      # outputs homogeneous position and color
      #
      DP4 o[HPOS].x, c[0], v[OPOS];
      DP4 o[HPOS].y, c[1], v[OPOS];
      DP4 o[HPOS].z, c[2], v[OPOS];
      DP4 o[HPOS].w, c[3], v[OPOS];
      DP3 R0.x, c[4], v[NRML];
      DP3 R0.y, c[5], v[NRML];
      DP3 R0.z, c[6], v[NRML];            # R0 = n' = transformed normal
      DP3 R1.x, c[32], R0;                # R1.x = Lpos DOT n'
      DP3 R1.y, c[33], R0;                # R1.y = hHat DOT n'
      MOV R1.w, c[38].x;                  # R1.w = specular power
      LIT R2, R1;                         # Compute lighting values
      MAD R3, c[35].x, R2.y, c[35].y;     # diffuse + ambient
      MAD o[COL0].xyz, c[36], R2.z, R3;   # + specular
      END
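The LIT instruction does most of the work in this program, so it is worth spelling out what it computes. The sketch below emulates it, and the two MAD instructions that follow it, in Python. The function names and the sample constants are assumptions for the example, and the emulation assumes a non-negative specular power (the hardware also clamps the exponent to roughly 128).

```python
def lit(src):
    """Emulate LIT: src = (N.L, N.H, unused, specular power).

    Returns (1, diffuse, specular, 1).
    """
    n_dot_l, n_dot_h, _, power = src
    power = min(power, 128.0)          # assumption: power >= 0
    diffuse = max(n_dot_l, 0.0)
    spec_base = max(n_dot_h, 0.0)
    # No specular contribution when the surface faces away from the light.
    specular = spec_base ** power if diffuse > 0.0 else 0.0
    return (1.0, diffuse, specular, 1.0)


def shade(n_dot_l, n_dot_h, diffuse_k, ambient_k, specular_color, power):
    """Mirror of: LIT R2, R1; MAD R3, ...; MAD o[COL0].xyz, ..."""
    r2 = lit((n_dot_l, n_dot_h, 0.0, power))
    r3 = diffuse_k * r2[1] + ambient_k                    # diffuse + ambient
    return tuple(c * r2[2] + r3 for c in specular_color)  # + specular


print(lit((0.5, 0.5, 0.0, 2.0)))                    # (1.0, 0.5, 0.25, 1.0)
print(shade(0.5, 0.5, 0.5, 0.25, (1.0, 1.0, 1.0), 2.0))
```

The clamping and the conditional specular term would otherwise need several MAX and SLT instructions, which is why a dedicated instruction exists.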
NV Fragment Programs
  Similar to the Vertex Programs.
    –Same way to load programs.
    –Inputs and outputs are different.
    –Different set of instructions.
      More instructions, but they tend to be the same…
  Versions available: 1.0, 2.0, and 4.0.
    –64 constant vector registers.
    –32 32-bit floating-point registers or 64 16-bit floating-point registers.
NV Fragment Programs
  Fragment program inputs
    Register name  Description                          Components
    f[WPOS]        Position of the fragment center      (x,y,z,1/w)
    f[COL0]        Interpolated primary color           (r,g,b,a)
    f[COL1]        Interpolated secondary color         (r,g,b,a)
    f[FOGC]        Interpolated fog distance/coord      (z,0,0,0)
    f[TEX0]        Texture coordinate (unit 0)          (s,t,r,q)
    f[TEX1]        Texture coordinate (unit 1)          (s,t,r,q)
    f[TEX2]        Texture coordinate (unit 2)          (s,t,r,q)
    f[TEX3]        Texture coordinate (unit 3)          (s,t,r,q)
    f[TEX4]        Texture coordinate (unit 4)          (s,t,r,q)
    f[TEX5]        Texture coordinate (unit 5)          (s,t,r,q)
    f[TEX6]        Texture coordinate (unit 6)          (s,t,r,q)
    f[TEX7]        Texture coordinate (unit 7)          (s,t,r,q)
NV Fragment Programs
  Fragment program outputs
    Register name  Description
    o[COLR]        Final RGBA fragment color, fp32 format (color programs)
    o[COLH]        Final RGBA fragment color, fp16 format (color programs)
    o[DEPR]        Final fragment depth value, fp32 format
    o[TEX0]        TEXTURE0 output, fp16 format (combiner programs)
    o[TEX1]        TEXTURE1 output, fp16 format (combiner programs)
    o[TEX2]        TEXTURE2 output, fp16 format (combiner programs)
    o[TEX3]        TEXTURE3 output, fp16 format (combiner programs)
  Write access only!
NV Fragment Programs
  Fragment program instruction set (V2.0)
    Instruction         Inputs  Output  Description
    ADD[RHX][C][_SAT]   v,v     v       add
    COS[RH ][C][_SAT]   s       ssss    cosine
    DDX[RH ][C][_SAT]   v       v       derivative relative to x
    DDY[RH ][C][_SAT]   v       v       derivative relative to y
    DP3[RHX][C][_SAT]   v,v     ssss    3-component dot product
    DP4[RHX][C][_SAT]   v,v     ssss    4-component dot product
    DST[RH ][C][_SAT]   v,v     v       distance vector
    EX2[RH ][C][_SAT]   s       ssss    exponential base 2
    FLR[RHX][C][_SAT]   v       v       floor
    FRC[RHX][C][_SAT]   v       v       fraction
    KIL                 none    none    conditionally discard fragment
    LG2[RH ][C][_SAT]   s       ssss    logarithm base 2
    LIT[RH ][C][_SAT]   v       v       compute light coefficients
    LRP[RHX][C][_SAT]   v,v,v   v       linear interpolation
    MAD[RHX][C][_SAT]   v,v,v   v       multiply and add
    MAX[RHX][C][_SAT]   v,v     v       maximum
    MIN[RHX][C][_SAT]   v,v     v       minimum
    MOV[RHX][C][_SAT]   v       v       move
    MUL[RHX][C][_SAT]   v,v     v       multiply
    PK2H                v       ssss    pack two 16-bit floats
    PK2US               v       ssss    pack two unsigned 16-bit scalars
    PK4B                v       ssss    pack four signed 8-bit scalars
    PK4UB               v       ssss    pack four unsigned 8-bit scalars
    POW[RH ][C][_SAT]   s,s     ssss    exponentiation (x^y)
NV Fragment Programs
  Fragment program instruction set (V2.0), continued
    Instruction         Inputs  Output  Description
    RCP[RH ][C][_SAT]   s       ssss    reciprocal
    RFL[RH ][C][_SAT]   v,v     v       reflection vector
    RSQ[RH ][C][_SAT]   s       ssss    reciprocal square root
    SEQ[RHX][C][_SAT]   v,v     v       set on equal
    SFL[RHX][C][_SAT]   v,v     v       set on false
    SGE[RHX][C][_SAT]   v,v     v       set on greater than or equal
    SGT[RHX][C][_SAT]   v,v     v       set on greater than
    SIN[RH ][C][_SAT]   s       ssss    sine
    SLE[RHX][C][_SAT]   v,v     v       set on less than or equal
    SLT[RHX][C][_SAT]   v,v     v       set on less than
    SNE[RHX][C][_SAT]   v,v     v       set on not equal
    STR[RHX][C][_SAT]   v,v     v       set on true
    SUB[RHX][C][_SAT]   v,v     v       subtract
    TEX[C][_SAT]        v       v       texture lookup
    TXD[C][_SAT]        v,v,v   v       texture lookup w/partials
    TXP[C][_SAT]        v       v       projective texture lookup
    UP2H[C][_SAT]       s       v       unpack two 16-bit floats
    UP2US[C][_SAT]      s       v       unpack two unsigned 16-bit scalars
    UP4B[C][_SAT]       s       v       unpack four signed 8-bit scalars
    UP4UB[C][_SAT]      s       v       unpack four unsigned 8-bit scalars
    X2D[RH ][C][_SAT]   v,v,v   v       2D coordinate transformation
NV Fragment Programs
  Simple example: red colouring of the fragments (i.e., rasterized pixels):
      !!FP1.0
      DEFINE red={1.0,0,0,0};
      MOV o[COLR], red;
      END
  Simple example: applying single texturing.
      !!FP1.0
      TEX R0, f[TEX0], TEX0, 2D;   # Last parameter can be 1D, 2D, 3D, RECT
      MOV o[COLR], R0;
      END
NV Fragment Programs
  Useful instructions:
    –LRP: linear interpolation.
    –SIN, COS…
    –SGE, SLT, …: set the comparison flags.
    –KIL: stops the pixel computation (the fragment is discarded).
    –Pack and unpack instructions.
  Most instructions execute in 1 cycle (not counting texture access).
  Most instructions can conditionally update the result according to the comparison flags (e.g., MOV => MOVC).
  Most instructions can clamp the results between 0 and 1.
    –MOV => MOV_SAT.
  Loops are now possible with the latest generation.
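Two of the features listed above, LRP and the _SAT suffix, are simple enough to emulate directly. The sketch below assumes the usual LRP convention, result = f*a + (1-f)*b with the weight as first operand; the function names are hypothetical.

```python
def saturate(v):
    """Effect of the _SAT suffix: clamp every component to [0, 1]."""
    return tuple(min(1.0, max(0.0, c)) for c in v)


def lrp(f, a, b):
    """Component-wise linear interpolation: f*a + (1-f)*b."""
    return tuple(fi * ai + (1.0 - fi) * bi for fi, ai, bi in zip(f, a, b))


white = (1.0, 1.0, 1.0, 1.0)
black = (0.0, 0.0, 0.0, 0.0)
weight = (0.25, 0.25, 0.25, 0.25)

print(lrp(weight, white, black))           # 25% white, 75% black
print(saturate((-0.5, 0.5, 1.5, 1.0)))     # (0.0, 0.5, 1.0, 1.0)
```

Since clamping is a modifier on the instruction itself, writing MOV_SAT instead of MOV adds no instruction to the program.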
(Silly) Limitations
  Most of the limitations are for performance reasons.
  At the fragment level, the frame-buffer cannot really be accessed in read-write mode.
    –The new pixel value cannot be computed from the old one.
    –Floating-point precision filtering and blending are only available on recent graphics cards (NV 8x00 generation).
      Previous cards (e.g., GeForce 7800 series) could only filter and blend at FP16 precision.
    –The actual number of registers may be less than the number of logical registers.
      Programs are slower if a large number of registers is used.
High Level Languages
  Why?
    –Long assembly shaders are tedious to write.
    –Programming and debugging are inefficient or difficult.
    –High-level languages are more portable.
  But:
    –Final code may be slower.
High Level Languages: Cg Overview
  C for Graphics.
    –Syntax similar to C for easy shader writing.
    –See the Cg manual.
  The vertex and fragment programs take specific input vectors and values, and have to return specific outputs.
  Data structures must be declared to serve as the input and output parameters of a function.
Cg: Inputs
  Two kinds of shader inputs:
    –Varying inputs.
      Inputs that are specific to each entity processed.
        –Vertex: position, normals, etc.
        –Fragment: interpolated values like colors, texture coordinates, etc.
    –Uniform inputs.
      Values that do not change when streaming vertices.
        –Vertex level: transformation matrix.
        –Fragment level: constant parameters,…
Cg: Vertex Program Inputs
  Supported inputs to a Cg vertex program (binding semantics).
    –POSITION.
    –BLENDWEIGHT.
    –NORMAL.
    –TANGENT.
    –BINORMAL.
    –PSIZE.
    –BLENDINDICES.
    –TEXCOORD0–TEXCOORD7.
  Every parameter can be declared as a float vector with 1 to 4 components (float, float4,…).
    –float3 myPosition : POSITION;
Cg: Vertex Program Inputs
  Example from the Cg user manual.
      struct myinputs {
        float3 myPosition       : POSITION;
        float3 myNormal         : NORMAL;
        float3 myTangent        : TANGENT;
        float  refractive_index : TEXCOORD3;
      };
      outdata foo(myinputs indata) {
        /*... */
        // Within the program, the parameters are referred to as
        // “indata.myPosition”, “indata.myNormal”, and so on.
        /*... */
      }
Cg: Vertex Program Inputs
  Inputs can also be specified directly as function parameters (rather than through a struct).
  Example from the Cg user manual:
      outdata foo(float3 myPosition       : POSITION,
                  float3 myNormal         : NORMAL,
                  float3 myTangent        : TANGENT,
                  float  refractive_index : TEXCOORD3) {
        /*... */
      }
Cg: Vertex Program Varying Output
  The vertex program output type should match the fragment program input type.
  The binding semantics help the compiler to associate the vertex output with the fragment input (interoperability).
  The semantics do not actually impose a specific use for those channels.
    –Texture coordinates can be used to specify colors or locations, for example.
Cg: Vertex Program Varying Output
  Supported outputs of a vertex program.
    –POSITION.
    –PSIZE.
    –FOG.
    –COLOR0–COLOR1.
    –TEXCOORD0–TEXCOORD7.
Cg: Vertex Program Varying Output
  Example from the Cg user manual:
      // Vertex program (inside a Cg file…)
      struct myvf {
        float4 pout         : POSITION;   // Used for rasterization
        float4 diffusecolor : COLOR0;
        float4 uv0          : TEXCOORD0;
        float4 uv1          : TEXCOORD1;
      };
      myvf foo(/*... */) {
        myvf outstuff;
        /*... */
        return outstuff;
      }
Cg: Input/Output Interoperability
  Example from the Cg user manual:
      struct myvert2frag {
        float4 pos : POSITION;
        float4 uv0 : TEXCOORD0;
        float4 uv1 : TEXCOORD1;
      };
      // Vertex program
      myvert2frag vertmain(...) {
        myvert2frag outdata;
        /*... */
        return outdata;
      }
      // Fragment program
      void fragmain(myvert2frag indata) {
        float4 tcoord = indata.uv0;
        /*... */
      }
Cg: Fragment Program Varying Output
  Two supported outputs: COLOR and DEPTH.
  Examples:
      void main(/*... */,
                out float4 color : COLOR,
                out float  depth : DEPTH) {
        /*...*/
        color = diffuseColor * /*...*/;
        depth = /*...*/;
      }

      float4 main(/*... */) : COLOR {
        /*... */
        return diffuseColor * /*... */;
      }
Cg: General Coding
  Different types of variables are supported and can be declared:
    –float, half (16 bits), fixed (12 bits).
    –int, bool.
    –float1, float4, bool4, bool1,…
    –float1x1, float2x2,…
    –Arrays.
  Auxiliary functions can be declared.
  A wide set of functions and operators is also available.
Cg: General Coding
  Control flow.
    –if, else, while, for.
  Function definitions and function overloads.
  Arithmetic operators from C.
  Multiplication function.
    –Matrix x Vector, Vector x Matrix, Matrix x Matrix.
  Vector constructor.
  Boolean and comparison operators.
  Swizzle operator.
    –float4 a; => a.xxxx;
  Write mask operator.
    –float4 color = float4(1.0, 1.0, 0.0, 0.0); color.a = 2.0;
  Conditional operator.
Cg: General Coding
  Standard nonprojective texture lookup:
    –tex2D (sampler2D tex, float2 s);
    –texRECT (samplerRECT tex, float2 s);
    –texCUBE (samplerCUBE tex, float3 s);
  Standard projective texture lookup:
    –tex2Dproj (sampler2D tex, float3 sq);
    –texRECTproj (samplerRECT tex, float3 sq);
    –texCUBEproj (samplerCUBE tex, float4 sq);
  Math functions:
    –abs, cos, sin, tan, acos, asin, atan, clamp, determinant, exp, log, floor, lerp, min, max, pow, sqrt, normalize, …
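The difference between the nonprojective and projective lookups is a division of the coordinates by the last component q before sampling. The Python sketch below emulates that on a tiny texture stored as nested lists. It is an illustration only: the function names are hypothetical, and it uses nearest-neighbour sampling with clamp-to-edge, whereas the real filtering and wrap modes depend on the texture state.

```python
def tex2d_nearest(texture, s, t):
    """Nearest-neighbour lookup with s, t in [0, 1] (clamped to the edge)."""
    height = len(texture)
    width = len(texture[0])
    x = min(width - 1, max(0, int(s * width)))
    y = min(height - 1, max(0, int(t * height)))
    return texture[y][x]


def tex2d_proj_nearest(texture, s, t, q):
    """Projective lookup: divide the coordinates by q, then sample."""
    return tex2d_nearest(texture, s / q, t / q)


# A 2x2 texture whose texels are labelled for readability.
texture = [["a", "b"],
           ["c", "d"]]

print(tex2d_nearest(texture, 0.75, 0.25))           # b
print(tex2d_proj_nearest(texture, 1.5, 0.5, 2.0))   # b, same point after dividing by q
```

Projective lookups are what make techniques such as projected spotlights or shadow maps possible without an explicit division in the shader.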
Applications
Application: Procedural Texturing
  Ref: New York University Media Research Lab.
  Application of textures that are not image based.
    –Combination of noise and various math expressions (Perlin noise).
    –Representation of wood, marble, stone, clouds, waves, bumps…
    –Can be computed at the fragment level.
    –Adds computations, but reduces bandwidth.
    –Avoids the problem of mapping a texture onto curved surfaces.
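As a rough illustration of "noise plus math expressions", the sketch below builds a marble-like intensity from a sine wave whose phase is perturbed by noise. Note the substitution: it uses simple hash-based value noise rather than true Perlin gradient noise, and every name and constant in it (`hash_noise`, `value_noise`, `marble`, the frequencies) is an assumption chosen for the example. On the GPU the same arithmetic would run once per fragment.

```python
import math


def hash_noise(ix, iy):
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    n = (ix * 374761393 + iy * 668265263) & 0xFFFFFFFF
    n = ((n ^ (n >> 13)) * 1274126177) & 0xFFFFFFFF
    return ((n ^ (n >> 16)) & 0xFFFFFFFF) / 4294967296.0


def value_noise(x, y):
    """Smoothly interpolated lattice noise (value noise, not Perlin noise)."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    # Smoothstep weights hide the lattice structure.
    ux = fx * fx * (3.0 - 2.0 * fx)
    uy = fy * fy * (3.0 - 2.0 * fy)
    a, b = hash_noise(ix, iy), hash_noise(ix + 1, iy)
    c, d = hash_noise(ix, iy + 1), hash_noise(ix + 1, iy + 1)
    top = a + (b - a) * ux
    bottom = c + (d - c) * ux
    return top + (bottom - top) * uy


def marble(u, v, frequency=8.0, turbulence=4.0):
    """Marble-like intensity in [0, 1] for texture coordinates (u, v)."""
    phase = u * frequency + turbulence * value_noise(u * 4.0, v * 4.0)
    return 0.5 + 0.5 * math.sin(phase)


# Print a small ASCII preview of the pattern.
for row in range(8):
    print("".join(" .:-=+*#"[int(marble(col / 32.0, row / 8.0) * 7.999)]
                  for col in range(32)))
```

No image is fetched from memory here, which is the bandwidth saving mentioned above; the cost moves to arithmetic instead.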
Application: Phong Shading
  Ref: New York University Media Research Lab.
  The traditional OpenGL pipeline implements Gouraud (shading) interpolation.
    –Colors and lighting are computed at the vertices, then linearly interpolated.
    –Can miss the specular highlights that can occur in the middle of a triangle.
  Phong interpolation is better.
    –Linearly interpolate the normal across the triangle first.
    –Then compute Phong shading from the interpolated normal.
Application: Phong Shading
  Source: Ian Fergusson.
Application: Phong Shading
  How to implement Phong interpolation?
    –Pass the normal as a texture coordinate at the vertex level.
    –The texture coordinates will be automatically interpolated at the fragment level.
    –Renormalize the interpolated normal in the fragment program first, and then compute the Phong shading.
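The effect of interpolating normals instead of colors can be shown numerically along a single edge. The sketch below is a CPU illustration with made-up values: two vertex normals tilted away from each other, a half-angle vector pointing straight at the viewer, and hypothetical helper names. Only the specular term is evaluated.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lerp(a, b, t):
    return tuple((1.0 - t) * p + t * q for p, q in zip(a, b))


def specular(normal, half_vector, power):
    n_dot_h = max(0.0, sum(n * h for n, h in zip(normal, half_vector)))
    return n_dot_h ** power


def gouraud_specular(n0, n1, half_vector, power, t):
    """Light at the vertices, then interpolate the resulting intensities."""
    s0 = specular(n0, half_vector, power)
    s1 = specular(n1, half_vector, power)
    return (1.0 - t) * s0 + t * s1


def phong_specular(n0, n1, half_vector, power, t):
    """Interpolate the normal, renormalize it, then light per fragment."""
    n = normalize(lerp(n0, n1, t))
    return specular(n, half_vector, power)


n0 = (-0.6, 0.0, 0.8)      # normal at the first vertex
n1 = (0.6, 0.0, 0.8)       # normal at the second vertex
half = (0.0, 0.0, 1.0)     # half-angle vector

print(gouraud_specular(n0, n1, half, 32, 0.5))   # tiny: highlight is missed
print(phong_specular(n0, n1, half, 32, 0.5))     # 1.0: highlight at the midpoint
```

The renormalization step matters: the linearly interpolated normal at the midpoint has length 0.8, and using it unnormalized would dim the highlight considerably.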
Other Applications
  Bump mapping.
    –Can be done at the vertex or at the fragment level.
  Volume rendering.
    –Use of 3D textures.
  GPGPU.
    –General-Purpose computing on Graphics Processing Units.
    –A lot of GFLOPS…
    –Scientific calculations like Fourier transforms.
  Geometry modification (animation, morphing…).