Published by Peter Hart. Modified over 9 years ago.
1
Programmable Graphics Hardware
CS 446: Real-Time Rendering & Game Technology
David Luebke, University of Virginia
2
Recap: Advanced Texturing
Billboards
  – Screen-aligned, world-aligned
Point sprites
Imposters
  – Trees, buildings, portal textures, billboard clouds
  – Dynamic imposters for “caching” rendering results
Depth textures
Multitexturing
  – Low-res light maps, hi-res decals, etc.
3
Textures: Other Important Stuff
Render to texture: framebuffer objects (FBOs)
  – Multiple render targets
Environment maps
  – Sphere map, cube maps (hardware supported)
Shadow maps
  – A depth texture rendered from the light source (more later)
Relief textures
  – Demo now, details later
4
Textures: Still More Stuff
Normal maps, especially for bump mapping
  – Gloss maps, reflectance maps, etc.
Generally:
  – Think of textures as global memory for fragment programs, with built-in filtering
  – Vertex programs are just starting to be able to access textures too (NVIDIA hardware only, today)
Deferred shading
Projective texture mapping
5
Next topic: Cg
Many of the techniques we discuss in this class do not depend on programmable graphics hardware
  – But even those are often easier to implement with it!
Programmable graphics also opens up an endless number of tricks and techniques that could not have been efficiently implemented before
So, the next topic is a brief intro to Cg
  – My apologies to those of you who’ve seen this
  – My apologies to those of you who haven’t
6
Acknowledgement & Aside
Much of this lecture comes from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology
For this reason, and because the lab is outfitted with NVIDIA cards, we will focus on NVIDIA tech
I try to mention similarities and differences with ATI, the other main GPU vendor, in lecture and slides
Note: many/most images are from NVIDIA as well
7
The Graphics Pipeline
A simplified graphics pipeline
  – Note that pipe widths vary
  – Many caches, FIFOs, and so on not shown
[Diagram: CPU (Application) -> Vertices (3D) -> GPU: Transform & Light -> Xformed, Lit Vertices (2D) -> Assemble Primitives -> Screenspace triangles (2D) -> Rasterize -> Fragments (pre-pixels) -> Shade -> Final Pixels (Color, Depth). Also shown: Graphics State, Video Memory (Textures), Render-to-texture.]
8
GPU Pipeline: Transform
Transform & light (a.k.a. vertex processor)
  – Transform from “world space” to “image space”
  – Compute per-vertex lighting
Courtesy Mark Harris
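As a rough CPU-side illustration of the transform step (in Python rather than Cg, so it can be run anywhere), the sketch below multiplies a homogeneous vertex by a 4x4 matrix and then performs the perspective divide. The helper names and the toy projection matrix are assumptions made for this example; they do not come from the lecture.

```python
def mat_vec4(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def perspective_divide(clip):
    """Divide x, y, z by w to leave homogeneous coordinates."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]


# Hypothetical toy projection: copies -z into w, so points farther
# along the -z axis shrink toward the center of the image.
toy_projection = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

vertex = [2.0, 4.0, -2.0, 1.0]            # a point two units in front of the eye
clip = mat_vec4(toy_projection, vertex)   # homogeneous result, w = 2
ndc = perspective_divide(clip)            # x and y are halved by the divide
```

Per-vertex lighting would be computed in the same per-vertex function; it is left out here to keep the sketch short.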
9
GPU Pipeline: Rasterize
Rasterizer
  – Convert geometric rep. (vertex) to image rep. (fragment)
  – Fragment = image fragment: pixel + associated data (color, depth, stencil, etc.)
  – Interpolate per-vertex quantities across pixels
Courtesy Mark Harris
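The interpolation step can be sketched on the CPU with barycentric weights. This is a minimal Python illustration under simplifying assumptions: it samples pixel centers, interpolates linearly in screen space (real hardware also offers perspective-correct interpolation), and all function names are made up for the example.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric(v0, v1, v2, p):
    """Weights of v0, v1, v2 at point p; they sum to 1."""
    area = edge(v0, v1, v2)
    return (edge(v1, v2, p) / area,
            edge(v2, v0, p) / area,
            edge(v0, v1, p) / area)


def interpolate(attrs, weights):
    """Blend three per-vertex attribute tuples with the given weights."""
    n = len(attrs[0])
    return tuple(sum(w * a[i] for w, a in zip(weights, attrs)) for i in range(n))


def rasterize(verts, colors, width, height):
    """Emit one fragment (pixel + interpolated color) per covered pixel center."""
    fragments = []
    for y in range(height):
        for x in range(width):
            w = barycentric(*verts, (x + 0.5, y + 0.5))
            if min(w) >= 0.0:  # pixel center lies inside or on the triangle
                fragments.append({"x": x, "y": y, "color": interpolate(colors, w)})
    return fragments


verts = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
fragments = rasterize(verts, colors, 4, 4)
```

Each dictionary in `fragments` plays the role of a fragment: a pixel location plus the data interpolated for it.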
10
GPU Pipeline: Shade
Fragment processors (multiple in parallel)
  – Compute a color for each pixel
  – Optionally read colors from textures (images)
Courtesy Mark Harris
11
The Modern Graphics Pipeline
[Diagram: the simplified pipeline with two fixed-function stages replaced by programmable ones]
  – Transform & Light becomes the Vertex Processor: programmable vertex processor!
  – Shade becomes the Fragment Processor: programmable pixel processor!
Flow: CPU (Application) -> Vertices (3D) -> GPU: Vertex Processor -> Xformed, Lit Vertices (2D) -> Assemble Primitives -> Screenspace triangles (2D) -> Rasterize -> Fragments (pre-pixels) -> Fragment Processor -> Final Pixels (Color, Depth)
Also shown: Graphics State, Video Memory (Textures), Render-to-texture
12
The Coming Soon Graphics Pipeline
[Diagram: the programmable pipeline with Assemble Primitives replaced by a Geometry Processor]
  – Programmable primitive assembly!
  – More flexible memory access!
Flow: CPU (Application) -> Vertices (3D) -> GPU: Vertex Processor -> Xformed, Lit Vertices (2D) -> Geometry Processor -> Screenspace triangles (2D) -> Rasterize -> Fragments (pre-pixels) -> Fragment Processor -> Final Pixels (Color, Depth)
Also shown: Graphics State, Video Memory (Textures), Render-to-texture
13
Precision
32-bit IEEE floating-point throughout pipeline
  – Framebuffer
  – Textures
  – Fragment processor
  – Vertex processor
  – Interpolants
14
Multiple data types in hardware
Can support 32-bit IEEE floating point throughout pipeline
  – Vertices, interpolants, framebuffer, textures, computations
Fragment processor also supports:
  – 16-bit “half” floating point, 12-bit fixed point
  – These may be faster than 32-bit
Framebuffer/textures also support:
  – Large variety of fixed-point formats
      E.g., classical 8-bit per component RGBA, BGRA, etc.
  – These formats use less memory bandwidth than FP32
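To get a feel for what the narrower types give up, the following Python sketch rounds values through IEEE half and single precision using the standard `struct` module, and through an 8-bit normalized fixed-point component. It only illustrates the number formats on the CPU; the helper names are assumptions, and the 12-bit fixed-point type mentioned above is not modeled.

```python
import struct


def round_to_half(x):
    """Round a Python float to the nearest 16-bit "half" float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_single(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def quantize_unorm8(x):
    """Store x in [0, 1] as one 8-bit color component and read it back."""
    clamped = max(0.0, min(1.0, x))
    return round(clamped * 255) / 255


half_error = abs(round_to_half(0.1) - 0.1)      # error from keeping 11 significant bits
single_error = abs(round_to_single(0.1) - 0.1)  # error from keeping 24 significant bits
byte_error = abs(quantize_unorm8(0.1) - 0.1)    # error from 256 evenly spaced levels
```

Half floats cannot represent every integer above 2048, which is one reason they suit colors and normals better than large texture coordinates.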
15
Vertex processor capabilities
4-vector FP32 operations
Condition codes + true data-dependent control flow
  – Conditional branches, subroutine calls, jump table
  – Useful for avoiding extra work, e.g.:
      Don’t do animation, skinning if vertex will be clipped
      Do displacement mapping only for vertices near silhouette
Transcendental arithmetic instructions (e.g. COS)
User clip-plane support
Texture reads (up to 4 textures, unlimited lookups)
16
Vertex processor limitations
No arbitrary memory write
No “vertex kill”
  – Can put vertex off-screen
  – Can make degenerate primitives
Only 32-bit texture formats supported
17
NV40-G70 vertex processor resources
65535 instructions per program
Other statistics (NV30, not sure about NV40-G70):
  – 16 temporary 4-vector registers
  – 256 “uniform” parameter registers
  – 2 address registers (4-vector)
  – 6 clip-distance outputs
18
Fragment processor: texture mapping
Texture reads are just another instruction
Allows computed texture coordinates, nested to arbitrary depth
  – This is a big difference between NVIDIA and ATI right now
Allows multiple uses of a single texture unit
Optional LOD control: can specify filter extent
Think of it as a memory-read instruction, with optional user-controlled filtering
19
Fragment processor capabilities
Dynamic branching
Conditional fragment-kill instruction
Read access to window-space position
Read/write access to fragment Z (but not stencil)
Multiple render targets
Built-in derivative instructions
  – Partial derivatives w.r.t. screen-space x or y
  – Useful for anti-aliasing shaders
FP32, FP16, and fixed-point data
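The derivative instructions can be pictured as finite differences between neighboring fragments. The Python sketch below assumes one plausible scheme, differencing within an aligned 2x2 block of pixels; the slides do not say how the hardware actually computes derivatives, and the function names are invented for this example.

```python
def quad_derivatives(f, x, y):
    """Approximate df/dx and df/dy at pixel (x, y).

    Assumption: values are differenced against the neighbor inside the
    aligned 2x2 pixel block that contains (x, y).
    """
    x0, y0 = x & ~1, y & ~1            # first pixel of the 2x2 block
    ddx = f(x0 + 1, y) - f(x0, y)      # horizontal difference in this row
    ddy = f(x, y0 + 1) - f(x, y0)      # vertical difference in this column
    return ddx, ddy


def filter_width(f, x, y):
    """Sum of absolute derivatives: how fast f changes per pixel step."""
    ddx, ddy = quad_derivatives(f, x, y)
    return abs(ddx) + abs(ddy)


def stripe_coordinate(x, y):
    """Hypothetical shader quantity that varies linearly across the screen."""
    return 3.0 * x + 5.0 * y


derivs = quad_derivatives(stripe_coordinate, 6, 3)
width = filter_width(stripe_coordinate, 6, 3)
```

A procedural shader can use a width like this to blur a pattern once it changes faster than the pixel grid can resolve, which is the anti-aliasing use the slide mentions.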
20
Fragment processor limitations
Dynamic branching less efficient than vertex proc.
  – Especially for non-coherent branching (<~ 30x30 pixels)
  – Can do a lot with condition codes
No indexed reads from registers
  – I.e., no indexed arrays
  – Must use texture reads instead
No arbitrary memory write
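The "use texture reads instead" workaround amounts to storing the array in a texture and converting an integer index to a texture coordinate. A small Python sketch of that idea follows, assuming a one-dimensional texture with nearest-neighbor filtering and coordinates normalized to [0, 1); the function names are hypothetical.

```python
def tex1d_nearest(texels, u):
    """Read a 1D texture at normalized coordinate u with nearest filtering."""
    n = len(texels)
    i = min(n - 1, max(0, int(u * n)))   # clamp to the valid texel range
    return texels[i]


def index_to_texcoord(i, n):
    """Coordinate of the center of texel i in an n-texel texture."""
    return (i + 0.5) / n


lookup_table = [10, 20, 30, 40]          # the "array", stored as texels
values = [tex1d_nearest(lookup_table, index_to_texcoord(i, 4)) for i in range(4)]
```

Aiming at the texel center, rather than its edge, keeps the lookup stable when the coordinate carries a little rounding error.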
21
Fragment processor resources
65535+ instructions
Nearly unlimited constants
  – Each constant counts as one instruction
16 texture units (NV30, still?), reuse as often as desired
10 FP32 x 4 perspective-correct inputs (e.g. tex coords)
Up to 4 128-bit framebuffer “color” outputs
  – Can pack as 4 x FP32, 8 x FP16, etc.
Can also set the depth output
  – 24 or 32 bits, depending on stencil
  – Changing depth in fragment program may disable Z-optimizations
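The packing arithmetic for a 128-bit output is easy to check on the CPU: four 32-bit floats and eight 16-bit floats both occupy 16 bytes. The Python sketch below uses the standard `struct` module; the little-endian byte layout and the function names are assumptions of this example, not a description of the hardware's framebuffer format.

```python
import struct


def pack_fp32x4(values):
    """Pack four floats as 4 x FP32 (16 bytes = 128 bits)."""
    return struct.pack("<4f", *values)


def pack_fp16x8(values):
    """Pack eight floats as 8 x FP16 (16 bytes = 128 bits)."""
    return struct.pack("<8e", *values)


def unpack_fp16x8(data):
    """Recover the eight half-precision values from a 128-bit output."""
    return list(struct.unpack("<8e", data))


wide = pack_fp32x4([1.0, 0.5, 0.25, 1.0])
narrow = pack_fp16x8([1.0, 0.5, 0.25, 1.0, 0.0, 2.0, -1.0, 0.125])
```

The same 128 bits therefore carry either four high-precision channels or eight lower-precision ones, which is the trade-off behind the packing options on the slide.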
22
GPU vendor differences
Note: this slide will be dated almost instantly
NVIDIA: as described in previous slides
ATI hardware today (1900XT current high-end part):
  – No vertex texture fetch (but good render-to-vertex-array)
  – Far fewer levels of computed texture coordinates
  – Better at fine-grained (less coherent) dynamic branching
ATI Xenos (Xbox 360 chip):
  – Unified shader model: vertex proc == pixel proc
  – Scatter support: shaders can write arbitrary memory loc
23
Cg – “C for Graphics”
Cg is a high-level GPU programming language
Designed by NVIDIA and Microsoft
Competes with the (quite similar) GL Shading Language, a.k.a. GLslang
24
Programming in assembly is painful
A high-level language such as Cg is:
  – Easier to read and modify
  – Cross-platform
  – Easier to combine pieces
  – etc.

Assembly:
    …
    FRC R2.y, C11.w;
    ADD R3.x, C11.w, -R2.y;
    MOV H4.y, R2.y;
    ADD H4.x, -H4.y, C4.w;
    MUL R3.xy, R3.xyww, C11.xyww;
    ADD R3.xy, R3.xyww, C11.z;
    TEX H5, R3, TEX2, 2D;
    ADD R3.x, R3.x, C11.x;
    TEX H6, R3, TEX2, 2D;
    …

Cg:
    …
    L2weight = timeval - floor(timeval);
    L1weight = 1.0 - L2weight;
    ocoord1 = floor(timeval)/64.0 + 1.0/128.0;
    ocoord2 = ocoord1 + 1.0/64.0;
    L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));
    L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));
    …
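The arithmetic in the Cg fragment is easy to check by hand. The Python transliteration below covers only the four arithmetic lines, not the texture fetches. Reading `1.0/64.0` as the texel spacing of a 64-texel-wide texture and `1.0/128.0` as a half-texel offset is an inference from the constants, not something the slide states, and the function name is made up.

```python
import math


def lookup_params(timeval):
    """Mirror the arithmetic lines of the Cg snippet for one value of timeval."""
    l2_weight = timeval - math.floor(timeval)             # fractional part of timeval
    l1_weight = 1.0 - l2_weight                           # complementary weight
    ocoord1 = math.floor(timeval) / 64.0 + 1.0 / 128.0    # first lookup coordinate
    ocoord2 = ocoord1 + 1.0 / 64.0                        # one step further along
    return l1_weight, l2_weight, ocoord1, ocoord2


l1_weight, l2_weight, ocoord1, ocoord2 = lookup_params(3.25)
```

With `timeval = 3.25` the two coordinates are 7/128 and 9/128, and the two weights sum to 1, which is consistent with blending two adjacent table entries by the fractional part of `timeval`.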
25
Some points in the design space
CPU languages
  – C: close to the hardware; general purpose
  – C++, Java, Lisp: require memory management
  – RenderMan: specialized for shading
Real-time shading languages
  – Stanford shading language
  – Creative Labs shading language
26
Design strategy
Start with C (and a bit of C++)
  – Minimizes number of decisions
  – Gives you known mistakes instead of unknown ones
Allow subsetting of the language
Add features desired for GPUs
  – To support GPU programming model
  – To enable high performance
Tweak to make it fit together well
27
How are GPUs different from CPUs?
1. GPU is a stream processor
  – Multiple programmable processing units
  – Connected by data flows
[Diagram: Application -> Vertex Processor -> Assembly & Rasterization -> Fragment Processor -> Framebuffer Operations -> Framebuffer. Also shown: Textures.]
28
Cg separates vertex & fragment programs
[Diagram: Application -> Vertex Processor -> Assembly & Rasterization -> Fragment Processor -> Framebuffer Operations -> Framebuffer. Also shown: Textures, and a box labeled Program.]
29
Cg programs have two kinds of inputs
Varying inputs (streaming data)
  – e.g. normal vector, which comes with each vertex
  – This is the default kind of input
Uniform inputs (a.k.a. graphics state)
  – e.g. modelview matrix
Note: Outputs are always varying

    vout MyVertexProgram(float4 normal,
                         uniform float4x4 modelview)
    {
        …
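The varying/uniform split can be mimicked on the CPU: the program runs once per element of the varying stream, while the uniform values stay fixed for the whole batch. The Python sketch below is only an analogy. The slide elides the body of `MyVertexProgram`, so the matrix multiply used here is a placeholder chosen for illustration, and `run_vertex_stage` is a hypothetical driver.

```python
def my_vertex_program(normal, modelview):
    """Runs once per vertex: `normal` is varying, `modelview` is uniform.

    Placeholder body: transforms the 4-component normal by the matrix.
    """
    return tuple(sum(modelview[r][c] * normal[c] for c in range(4))
                 for r in range(4))


def run_vertex_stage(program, varying_stream, **uniforms):
    """Apply the program to every element of the stream with shared uniforms."""
    return [program(item, **uniforms) for item in varying_stream]


# Uniform input: set once, identical for every vertex in the batch.
modelview = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Varying input: one value arrives with each vertex.
normals = [(0.0, 0.0, 1.0, 0.0), (1.0, 0.0, 0.0, 0.0)]

outputs = run_vertex_stage(my_vertex_program, normals, modelview=modelview)
```

The output list has one entry per input vertex, matching the note that outputs are always varying.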
30
Binding VP outputs to FP inputs
a) Let compiler do it
  – Define a single structure
  – Use it for vertex-program output
  – Use it for fragment-program input

    struct vout {
        float4 color;
        float4 texcoord;
        …
    };
31
Binding VP outputs to FP inputs
b) Do it yourself
  – Specify register bindings for VP outputs
  – Specify register bindings for FP inputs
  – May introduce HW dependence
  – Necessary for mixing Cg with assembly

    struct vout {
        float4 color : TEX3;
        float4 texcoord : TEX5;
        …
    };
32
Some inputs and outputs are special
E.g. the position output from vert prog
  – This output drives the rasterizer
  – It must be marked

    struct vout {
        float4 color;
        float4 texcoord;
        float4 position : HPOS;
    };
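One way to picture the two binding styles is as a small allocation problem: fields with an explicit binding keep it, and the remaining fields receive whatever registers are still free. The Python sketch below is a toy model only. The slides do not describe how the Cg compiler actually allocates registers, and the `TEX0` to `TEX7` pool and the function name are assumptions of this example.

```python
def bind_outputs(fields, registers=tuple(f"TEX{i}" for i in range(8))):
    """Map each struct field to a register name.

    fields: list of (field_name, explicit_binding_or_None) in declaration order.
    Explicit bindings are honored first; unbound fields then take the
    lowest-numbered registers that are still free (a toy policy).
    """
    bound = {name: sem for name, sem in fields if sem is not None}
    free = [r for r in registers if r not in bound.values()]
    for name, sem in fields:
        if sem is None:
            bound[name] = free.pop(0)
    return bound


# Style (a) with the special position output: only position is marked.
automatic = bind_outputs([("color", None), ("texcoord", None), ("position", "HPOS")])

# Style (b): every binding written out by hand, as in the TEX3/TEX5 struct.
manual = bind_outputs([("color", "TEX3"), ("texcoord", "TEX5")])
```

Because the vertex and fragment programs share one structure, both sides see the same mapping, which is what lets the vertex outputs line up with the fragment inputs.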