Advanced Programmable Shading: Beyond Per-vertex and Per-pixel Shading
The DirectX7™ Graphics Pipeline (GeForce/GeForce2): T&L (vertex transform and lighting) → setup → rasterizer → texture blending (per-pixel texture) → fb anti-alias
The DirectX8™ Graphics Pipeline (GeForce3): curved surfaces → vertex shaders → setup → rasterizer → tex-addr ops → texture blending → fb antialias. Also labeled: shadows, 3D tex. The stages marked programmable are per-vertex shading and per-pixel shading.
Programmable Vertex Processing. The GeForce family introduced hardware T&L (Transform and Lighting) to the PC. GeForce3 (next generation) makes T&L user programmable through vertex programs. Developers can now write custom vertex transformation, vertex lighting, and special effects (layered fog, volumetric lighting, morphing…).
Custom Substitute for Standard T&L. [Diagram: Programmable Vertex Processor, 128 instructions. Vertex Input: 16 entries. Vertex Output: 13 entries. Registers: 12 entries. Constant Memory: 96 entries, addressed through A0. Each entry is 128 bits (4 floats).]
What do Developers do with all of this Programmability? [Diagram: the DirectX8 pipeline again, with question marks on the programmable per-vertex and per-pixel shading stages.]
So far, mostly… T&L and (better) texture blending. [Diagram: the DirectX8 pipeline again, with the programmable stages used only as T&L and texture blending.]
Wassup wit dat? The advent of programmable vertex shaders and pixel shaders in hardware is the most fundamental change in graphics for a long time! But, many developers do not use them in their games, or only use them to map simple multipass shaders into a single pass
There are Bigger Opportunities. A complex rendering technique can be "factored" into components executed on the CPU, vertex shader, and pixel shader. The true power of programmable vertex and pixel processing lies in the programmers' ability to map more complex and varied algorithms onto the hardware.
Instead of... the CPU does game code, AI, physics, and scene management; the GPU does T&L, rasterization, texturing/shading, and drawing. [Diagram: the CPU sends triangles and textures to the GPU.]
Think in terms of... Higher-level algorithms are mapped across both CPU and GPU. The CPU still does game code, AI, physics, and scene management. The GPU still does T&L, rasterization, texturing/shading, and drawing. And, MORE. [Diagram: data and partial results pass between the CPU and the GPU.]
New Things to do with Vertex/Pixel Shaders: (Automatic!!!) shadow volume generation for stencil shadows; order-independent transparency (depth peeling), which requires DX8.1 PS1.3 texm3x2depth; motion blur / depth of field.
Stenciled Shadow Volumes Powerful technique, excellent results possible
Review: Stenciled Shadow Volumes. A single point light source splits the world into shadowed regions and unshadowed regions. The surface of a shadow volume is the boundary between these shadowed and unshadowed regions. The idea: determine if an object is inside the boundary of the shadowed region and thereby know if the object is shadowed. First described by [Crow 77].
Visualizing Shadow Volumes. Occluders and a light source cast out a shadow volume. Objects within the volume should be shadowed. [Figures: a scene with shadows from an NVIDIA logo casting a shadow volume, with the light source marked; a visualization of the shadow volume.]
Shadow Volume Algorithm. High-level view of the algorithm: given the scene and a light source position, determine the shadow volume (harder to do than it sounds, but we're going to make it easy!). Then render the scene in two passes: draw the scene with the light enabled, updating only fragments in the unshadowed region; then draw the scene with the light disabled, updating only fragments in the shadowed region. We're going to focus on the Shadow Volume Generation part of the problem.
Computing Shadow Volumes. Harder than you might think. Easy for a single triangle: just project out three infinite polygons from the triangle, opposite the light position. For complex objects, projecting the object's 2D silhouette is a good approximation, but calculating this is hard. There are two other, new GPU vertex shader techniques: quick and dirty (fast, cheap, doesn't always work) and robust (always works, but costs more computation).
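The single-triangle case can be sketched in a few lines of Python. This is only an illustration under assumptions not in the slides: the function name `extrude_triangle` is hypothetical, and the "infinite" polygons are cut off at a finite distance `far`.

```python
import math

def extrude_triangle(tri, light, far=1000.0):
    """Return the three side quads of the shadow volume cast by one triangle.

    tri   : three (x, y, z) vertices
    light : (x, y, z) position of the point light
    far   : assumed finite stand-in for "infinitely far away"
    """
    def push(v):
        # Direction from the light through the vertex (away from the light).
        # Assumes the vertex does not coincide with the light position.
        d = tuple(vc - lc for vc, lc in zip(v, light))
        length = math.sqrt(sum(c * c for c in d))
        return tuple(vc + far * dc / length for vc, dc in zip(v, d))

    quads = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        # One quad per edge: the edge itself plus its projected copy.
        quads.append((a, b, push(b), push(a)))
    return quads

quads = extrude_triangle([(0, 0, 0), (1, 0, 0), (0, 1, 0)], light=(0, 0, 10), far=5.0)
```

Each quad spans one triangle edge and the same edge pushed away from the light, so the three quads together form the open sides of the volume.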
The Hard Way: Computing Shadow Volumes for Polygonal Models. High-level: determine the "possible silhouette" edges of the model. Transform the light into object space. Compute the plane equation for every polygon in the model (can be pre-computed for static models). For every polygon in the model, determine if the object-space light position is behind or in front of the polygon's plane, i.e., is the planar distance from the polygon's plane to the light positive or negative? Search for edges where polygons have opposite facingness toward the light. These edges are possible silhouette edges.
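The CPU-side steps above can be sketched as follows. This is a minimal illustration, not the presenter's code: it assumes an indexed triangle mesh with consistent outward winding, a light already transformed into object space, and the hypothetical function name `possible_silhouette_edges`.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def possible_silhouette_edges(vertices, triangles, light_obj):
    """Return edges shared by one light-facing and one non-facing triangle."""
    # Plane equation per triangle: n . p + d = 0 (pre-computable for static models).
    facing = []
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        n = cross(sub(b, a), sub(c, a))
        d = -dot(n, a)
        # Sign of the planar distance from the polygon's plane to the light.
        facing.append(dot(n, light_obj) + d > 0)

    # Collect the triangles that touch each undirected edge.
    edge_faces = {}
    for t, tri in enumerate(triangles):
        for e in range(3):
            key = tuple(sorted((tri[e], tri[(e + 1) % 3])))
            edge_faces.setdefault(key, []).append(t)

    # Keep edges whose two triangles have opposite facingness.
    return sorted(edge for edge, ts in edge_faces.items()
                  if len(ts) == 2 and facing[ts[0]] != facing[ts[1]])

# Tetrahedron with outward-facing triangles, lit from below the z = 0 face.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tris = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
edges = possible_silhouette_edges(verts, tris, light_obj=(0.1, 0.1, -5.0))
```

In the example only the bottom face sees the light, so the possible silhouette is that face's three edges.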
Quick and Dirty Way: Stencil Shadow Volume Generation with Vertex Shaders. Use your polygon model (un-adulterated). The vertex shader tests the vertex normal against the light direction (N · L). Front-facing vertices are unchanged. Back-facing vertices are pushed to FAR. This has the effect of extruding approximate silhouette edges away from the light to produce shadow volumes. APPROXIMATE: only works for closed, highly and smoothly tessellated objects (imagine a single triangle, or a cube).
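The per-vertex test can be emulated on the CPU to see what the vertex shader would do. This is a sketch under assumptions: `extrude_vertex` is a hypothetical name, "FAR" is modeled as a finite distance `far`, and back-facing vertices are pushed along the direction from the light through the vertex.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def extrude_vertex(pos, normal, light, far=1000.0):
    """Emulate the per-vertex N . L test: keep front-facing, push back-facing."""
    to_light = tuple(l - p for l, p in zip(light, pos))   # L, unnormalized
    if dot(normal, to_light) >= 0:
        return pos                                        # front-facing: unchanged
    length = math.sqrt(dot(to_light, to_light))
    # Back-facing: move the vertex `far` units directly away from the light.
    return tuple(p - far * t / length for p, t in zip(pos, to_light))

light = (0, 0, 5)
front = extrude_vertex((0, 0, 1), (0, 0, 1), light, far=10.0)
back = extrude_vertex((0, 0, -1), (0, 0, -1), light, far=10.0)
```

Because the decision is made per vertex from a smoothed normal, triangles that straddle the silhouette get stretched, which is why the result is only approximate on coarse or hard-edged models.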
Robust Way: Stencil Shadow Volume Generation with Vertex Shaders. Adulterate your polygon model: add degenerate quads at every edge. Normals for the new degenerate quad vertices come from the real geometry. The vertex shader tests the vertex normal against the light direction (N · L). Front-facing vertices are unchanged. Back-facing vertices are pushed to FAR. This has the effect of extruding silhouette edges away from the light to produce shadow volumes. Always works, for all objects (imagine a single triangle).
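One way to do the model "adulteration" as a preprocessing step is sketched below. It rests on assumptions the slides do not spell out: every triangle gets its own vertex copies carrying that triangle's face normal, each edge shared by two consistently wound triangles gets one zero-area quad joining the two sets of copies, and the name `add_degenerate_quads` is hypothetical. Unshared (boundary) edges are not handled here.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add_degenerate_quads(vertices, triangles):
    """Split vertices per face and stitch shared edges with zero-area quads."""
    out_verts = []   # (position, face normal) pairs
    out_tris = []    # triangles indexing out_verts
    directed = {}    # directed edge (i, j) -> indices of its two vertex copies

    for tri in triangles:
        a, b, c = (vertices[i] for i in tri)
        n = cross(sub(b, a), sub(c, a))      # normal taken from the real geometry
        base = len(out_verts)
        out_verts.extend((vertices[i], n) for i in tri)
        out_tris.append((base, base + 1, base + 2))
        for e in range(3):
            i, j = tri[e], tri[(e + 1) % 3]
            directed[(i, j)] = (base + e, base + (e + 1) % 3)

    quads = []
    for (i, j), (ci0, cj0) in directed.items():
        if i < j and (j, i) in directed:
            cj1, ci1 = directed[(j, i)]      # the neighbor runs the edge backwards
            # Degenerate until the vertex shader moves one side's copies.
            quads.append((cj0, ci0, ci1, cj1))
    return out_verts, out_tris, quads

verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tris = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
out_verts, out_tris, quads = add_degenerate_quads(verts, tris)
```

When two neighboring faces disagree about facing the light, only one side of the quad is pushed away, so the quad opens up into the side wall of the shadow volume exactly along the silhouette edge.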
Order-Independent Transparency: Good…Bad.
Order-Independent Transparency (Depth Peeling). The algorithm uses an "implicit sort" to extract multiple depth layers. The first rendering pass finds the front-most fragment color/depth. Each successive rendering pass finds (extracts) the fragment color/depth for the next-nearest fragment on a per-pixel basis. Use dual depth buffers to compare the previous nearest fragment with the current one. The second "depth buffer", used for comparison, is read only and comes from a texture.
[Figure: the scene rendered as Layer 0, Layer 1, Layer 2, and Layer 3.]
[Figure: Layer 0, Layer 1, and Layer 2, each plotted against depth from 0 to 1.] Depth peeling strips away depth layers with each successive pass. The frames above show the frontmost (leftmost) surfaces as bold black lines, hidden surfaces as thin black lines, and "peeled away" surfaces as light grey lines.
Pseudo-code
    for (i=0; i<num_passes; i++)
    {
        clear color buffer
        A = i % 2
        B = (i+1) % 2
        depth unit 0:
            if(i == 0)
                disable depth test
            else
                enable depth test
            bind buffer A
            disable depth writes
            set depth func to GREATER
        depth unit 1:
            bind buffer B
            clear depth buffer
            enable depth writes
            enable depth test
            set depth func to LESS
        render scene
        save color buffer RGBA as layer i
    }
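To see what the two depth tests accomplish, the loop can be simulated on the CPU for a single pixel. This is an illustrative sketch, not GPU code: the fragment list, the function name `peel_layers`, and the use of `None` for "depth test disabled" and "cleared depth buffer" are all assumptions made for the example.

```python
def peel_layers(fragments, num_passes):
    """Extract depth layers for one pixel, nearest first.

    fragments : list of (depth, color) pairs covering the pixel, in any order
    """
    layers = []
    prev_depth = None                 # read-only depth from the previous pass
    for _ in range(num_passes):
        nearest = None                # second depth buffer, freshly cleared
        for depth, color in fragments:
            # First test (GREATER): skip fragments already peeled away.
            # Disabled on the first pass, when there is no previous layer.
            if prev_depth is not None and not depth > prev_depth:
                continue
            # Second test (LESS): keep the nearest surviving fragment.
            if nearest is None or depth < nearest[0]:
                nearest = (depth, color)
        if nearest is None:
            break                     # no fragments left at this pixel
        layers.append(nearest)        # "save color buffer as layer i"
        prev_depth = nearest[0]
    return layers

layers = peel_layers([(0.7, "blue"), (0.2, "red"), (0.5, "green")], num_passes=4)
```

The fragments are never sorted explicitly; each pass simply rejects everything at or in front of the previous layer, which is the "implicit sort" the algorithm relies on.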
[Figure: the scene composited from 1 layer, 2 layers, 3 layers, and 4 layers.]
Using Vertex/Pixel Shaders to do Motion Blur / Depth of Field. Advertisement: Matthias Wloka from NVIDIA will talk about this tomorrow (it's cool stuff!). [Figure: motion blur screenshot.]
Depth of Field: Screenshots
Aggressive use of Vertex Shaders is SAFE and IMPORTANT. It's "safe" to use vertex shaders pervasively: many hardware platforms have them, mainstream GPUs will have them this fall, and CPUs can emulate vertex shaders adequately, so CPU fallback is OK. It's "important" to design content for vertex and pixel shaders: adopt vertex and pixel programming, and author content from the top down. It's much, much easier to scale down and fall back than to scale up.
Acknowledgements. Thanks to John Carmack, Rui Bastos, Mark Kilgard, Sim Dietrich, Matthew Papakipos, Cem Cebenoyan, Greg James, Matthias Wloka, Erik Lindholm, Doug Rogers, Cass Everitt, (and others I forgot!) for contributing ideas, slides, demos, and images. Demo/Example source code and more whitepapers can be found at