Richard Thomson DAZ 3D
Direct3D 11 CTP in November 2008 DirectX SDK Vista (and beyond) only, not on XP Evolution of Direct3D 10 Compatible with D3D 10 cards
Evolution of Direct3D Direct3D 9 Stable, been around for a while Last version to be deployed on Win XP Direct3D 10 First Vista-only version Big change from D3D 9 Direct3D 10.1 Incremental tweak to D3D 10
Direct3D 10/10.1/11 vs. 9 Enumeration factored out to DXGI Same DXGI used for 10, 10.1 and 11 Divide render/texture states into chunks Chunks of state are immutable objects “Device state” consists of set of assigned state chunks Introduces new shader stages beyond vertex and pixel shaders Tighter API specification => no CAPS
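The "immutable state chunk" idea can be illustrated outside the API. The following is a minimal C++ sketch, not Direct3D code: the struct names, fields, and Create functions are hypothetical stand-ins chosen for illustration. It assumes only what the slide says: chunks are created once, never modified, and the device state is the set of chunks currently assigned.

```cpp
#include <cassert>
#include <memory>

// Hypothetical state chunks. The const members make each chunk immutable
// once it has been created.
struct BlendState {
    const bool  blendEnable;
    const float blendFactor;
};

struct RasterizerState {
    const bool wireframe;
    const bool cullBackFaces;
};

// "Device state" is nothing more than the set of chunks currently assigned.
struct DeviceState {
    std::shared_ptr<const BlendState>      blend;
    std::shared_ptr<const RasterizerState> rasterizer;
};

// Validation happens once, at creation time, instead of at every draw.
inline std::shared_ptr<const BlendState> CreateBlendState(bool enable, float factor) {
    assert(factor >= 0.0f && factor <= 1.0f);
    return std::make_shared<BlendState>(BlendState{enable, factor});
}

inline std::shared_ptr<const RasterizerState> CreateRasterizerState(bool wireframe,
                                                                    bool cullBackFaces) {
    return std::make_shared<RasterizerState>(RasterizerState{wireframe, cullBackFaces});
}
```

Changing state then means assigning a different chunk rather than mutating one, so two device states can share the chunks they have in common.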
Direct3D 11 Focus Scalability and performance Improving the development experience Extending the reach of the GPU
Direct3D 11 New Features Tessellation Compute Shader Multithreading Shader Subroutines Improved Texture Compression Other Features
Tessellation Direct3D 10 pipeline plus three new stages for tessellation (Hull Shader, Tessellator, Domain Shader). Pipeline order: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Stream Output / Rasterizer -> Pixel Shader -> Output Merger
Hull Shader HS input: patch control pts HS output (to Domain Shader): patch control pts after basis conversion HS output (to Tessellator): TessFactors (how much to tessellate) and fixed tessellator mode declarations One Hull Shader invocation per patch
Hull Shader Syntax
[patchsize(12)]
[patchconstantfunc(MyPatchConstantFunc)]
MyOutPoint main(uint Id : SV_ControlPointID, InputPatch InPts)
{
    MyOutPoint result;
    …
    result = TransformControlPoint( InPts[Id] );
    return result;
}
Tessellator TS input (from Hull Shader): TessFactors (how much to tessellate) and fixed tessellator mode declarations TS output: U V {W} domain points TS output: topology (to primitive assembly) Note: Tessellator does not see control points Tessellator operates per patch
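To make the tessellator's role concrete, here is a minimal C++ sketch of domain-point generation for a quad patch. It is a simplification under stated assumptions: a single integer factor for the whole patch, whereas the slide only says that TessFactors and mode declarations drive the stage. The names DomainPoint and TessellateQuadUniform are hypothetical. Note that the function takes no control points, matching the slide's remark that the tessellator never sees them.

```cpp
#include <cassert>
#include <vector>

// One U V domain point on a quad patch, both coordinates in [0, 1].
struct DomainPoint {
    float u;
    float v;
};

// Simplified stand-in for the tessellator: subdivide the unit square into
// factor x factor cells and emit the (factor + 1)^2 grid points.
inline std::vector<DomainPoint> TessellateQuadUniform(int factor) {
    assert(factor >= 1);
    std::vector<DomainPoint> points;
    points.reserve(static_cast<std::size_t>(factor + 1) * (factor + 1));
    for (int row = 0; row <= factor; ++row) {
        for (int col = 0; col <= factor; ++col) {
            points.push_back({static_cast<float>(col) / factor,
                              static_cast<float>(row) / factor});
        }
    }
    return points;
}
```

Each emitted point would then be handed to one Domain Shader invocation, which is where the control points come back into play.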
Domain Shader DS input (from Tessellator): U V {W} domain points DS input (from Hull Shader): control points, TessFactors DS output: one vertex One Domain Shader invocation per point from Tessellator
Domain Shader Syntax
void main( out MyDSOutput result,
           float2 myInputUV : SV_DomainPoint,
           MyDSInput DSInputs,
           OutputPatch ControlPts,
           MyTessFactors tessFactors )
{
    …
    result.Position = EvaluateSurfaceUV( ControlPts, myInputUV );
}
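The slide does not define EvaluateSurfaceUV. As one plausible reading, assuming the patch is a bicubic Bezier patch with 16 control points (the patch type named elsewhere in the deck), the evaluation could look like this CPU-side C++ sketch. Float3, Bernstein3, EvaluateBicubicBezier, and the row-major control-point layout are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>

struct Float3 {
    float x;
    float y;
    float z;
};

// Cubic Bernstein basis functions evaluated at t in [0, 1].
inline std::array<float, 4> Bernstein3(float t) {
    assert(t >= 0.0f && t <= 1.0f);
    const float s = 1.0f - t;
    std::array<float, 4> b = {{s * s * s, 3.0f * s * s * t, 3.0f * s * t * t, t * t * t}};
    return b;
}

// Evaluate a bicubic Bezier patch at (u, v).
// Assumed layout: 16 control points, row-major, index = row * 4 + col,
// with u running along columns and v along rows.
inline Float3 EvaluateBicubicBezier(const std::array<Float3, 16>& controlPts,
                                    float u, float v) {
    const std::array<float, 4> bu = Bernstein3(u);
    const std::array<float, 4> bv = Bernstein3(v);
    Float3 p = {0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            const float w = bv[row] * bu[col];
            const Float3& q = controlPts[row * 4 + col];
            p.x += w * q.x;
            p.y += w * q.y;
            p.z += w * q.z;
        }
    }
    return p;
}
```

A displacement step, as in the single-pass example, would offset the returned position along the surface normal by a value sampled from the displacement map.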
Single Pass Example vertex shader: Animate/skin Control Points (patch control points -> transformed control points); hull shader: Transform basis, Determine how much to tessellate (Sub-D Patch -> Bezier Patch; outputs control points in Bezier patch and Tess Factors); tessellator: Tessellate! (Tess Factors -> U V {W} domain points); domain shader: Evaluate surface including displacement (inputs: control points in Bezier patch, U V {W} domain points, displacement map)
Current Authoring Pipeline (Rocket Frog taken from Loop & Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches") Sub-D Modeling Animation Displacement Map Polygon Mesh Generate LODs
New Authoring Pipeline (Rocket Frog taken from Loop & Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches") Sub-D Modeling Animation Displacement Map Optimally Tessellated Mesh GPU
Tessellation Summary Helps us get closer to eliminating “pointy heads” Scales visual quality across PC hardware configurations Supports performance increases Coarse model = compression, faster I/O to GPU Rendering tailored to each end user’s hardware Better cross-platform (Windows + Xbox 360) development experience Xbox 360 has a subset of D3D11’s tessellation Parity = ease of cross-platform development Extra features = innovation for Windows gaming Render content as the artist created it!
More on Tessellation GameFest 2008 Slides and Audio “Direct3D 11 Tessellation” ○ Kev Gee, Microsoft “Advanced Topics in GPU Tessellation” ○ Natasha Tatarchuk, AMD/ATI “Water-Tight, Textured, Displaced Subdivision Surface Tessellation Using Direct3D 11” ○ Ignacio Castano, NVIDIA
General Purpose GPU Data Parallel Computing GPU performance continues to grow Many applications scale well to massive parallelism without tricky code changes Direct3D is the API for talking to GPU How do we expand Direct3D to GPGPU?
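Compute Shader Direct3D 10 pipeline plus three new stages for tessellation plus Compute Shader. Graphics pipeline order: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Stream Output / Rasterizer -> Pixel Shader -> Output Merger. The Compute Shader sits alongside this pipeline and exchanges data with it through a shared Data Structure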
Integrated with Direct3D Fully supports all Direct3D resources Targets graphics/media data types Evolution of DirectX HLSL Graphics pipeline updated to emit general data structures… …which can then be manipulated by compute shader… And then rendered by Direct3D again
Target Applications Image/Post processing: Image Reduction Image Histogram Image Convolution Image FFT A-Buffer/OIT Ray-tracing, radiosity, etc. Physics AI
Computing a Histogram
Histogram()
{
    shared int Histograms[16][256]; // array of 16
    float3 vPixel = load( sampler, sv_ThreadID );
    float fLuminance = dot( vPixel, LUM_VECTOR );
    int iBin = fLuminance*255.0f; // compute bin to increment
    int iHist = sv_ThreadIDInGroup & 15; // use thread index (low 4 bits select one of the 16 histograms)
    Histograms[iHist][iBin] += 1; // update bin
    // enable all threads in group to complete
    SynchronizeThreadGroup;
Computing a Histogram 2
    // Write register histograms out to memory:
    iBin = sv_ThreadIDInGroup.x;
    if (sv_ThreadID.x < 256)
    {
        for (iHist = 0; iHist < 16; iHist++)
        {
            int2 destAddr = int2(iHist, iBin);
            OutputResource.add(destAddr, Histograms[iHist][iBin]); // atomic
        }
    }
}
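The two-phase structure of the histogram slides (scatter into several sub-histograms, synchronize, then write out) can be checked on the CPU. The following C++ sketch is a serial emulation of one thread group, not shader code. It assumes luminance values already computed and clamped to [0, 1], and it merges the 16 sub-histograms into one result for simplicity, whereas the slide writes each sub-histogram to its own row of the output resource. The function and constant names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kSubHistograms = 16;   // mirrors Histograms[16][256] on the slide
constexpr int kBins = 256;

// Serial emulation: element i of the input plays the role of thread i.
inline std::array<std::uint32_t, kBins> LuminanceHistogram(
        const std::vector<float>& luminance) {
    // Phase 1: each "thread" increments one bin of the sub-histogram selected
    // by the low 4 bits of its index. Spreading writes across 16 copies is
    // what reduces contention on real hardware.
    std::vector<std::array<std::uint32_t, kBins>> sub(kSubHistograms);
    for (std::size_t thread = 0; thread < luminance.size(); ++thread) {
        const float l = luminance[thread];
        assert(l >= 0.0f && l <= 1.0f);
        const int bin = static_cast<int>(l * 255.0f);
        const int hist = static_cast<int>(thread & (kSubHistograms - 1));
        sub[hist][bin] += 1;
    }

    // The group synchronization point would sit here: every thread must have
    // finished phase 1 before any bin is read back.

    // Phase 2: one pass per bin adds up the 16 partial counts.
    std::array<std::uint32_t, kBins> total{};
    for (int bin = 0; bin < kBins; ++bin) {
        for (int hist = 0; hist < kSubHistograms; ++hist) {
            total[bin] += sub[hist][bin];
        }
    }
    return total;
}
```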
Compute Shader Summary Enables much more general algorithms Transparent parallel processing model Full cross-vendor support Broadest possible installed base GameFest 2008: “Direct3D 11 Compute Shader – More Generality for Advanced Techniques” ○ Chas Boyd, Microsoft
Multithreading Enables distribution across threads of Application code Runtime Driver Device: free threaded resource creation Immediate Context: your single primary device for state & draws Deferred Contexts: your per-thread devices for state & draws Display Lists: Recorded sequence of graphics commands Requires a driver update
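The record-then-replay pattern behind deferred contexts and display lists can be sketched in portable C++. This is an analogy, not the Direct3D interface: DeferredContext, ImmediateContext, DisplayList, and RenderFrame are hypothetical names, and commands are plain strings. It shows each worker thread recording into its own context and the single immediate context replaying the finished lists in a fixed order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// A display list is a recorded sequence of graphics commands.
using DisplayList = std::vector<std::string>;

// Per-thread recorder: nothing reaches the "GPU" while recording.
class DeferredContext {
public:
    void Draw(const std::string& what) { recorded_.push_back("draw " + what); }
    DisplayList Finish() {
        DisplayList out;
        out.swap(recorded_);
        return out;
    }
private:
    DisplayList recorded_;
};

// The single primary context that actually submits work.
class ImmediateContext {
public:
    void Execute(const DisplayList& list) {
        submitted_.insert(submitted_.end(), list.begin(), list.end());
    }
    const std::vector<std::string>& Submitted() const { return submitted_; }
private:
    std::vector<std::string> submitted_;
};

// Record one display list per worker thread, then replay them in order.
inline std::vector<std::string> RenderFrame(
        const std::vector<std::vector<std::string>>& perThreadObjects) {
    std::vector<DisplayList> lists(perThreadObjects.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < perThreadObjects.size(); ++i) {
        workers.emplace_back([&lists, &perThreadObjects, i] {
            DeferredContext ctx;                 // owned by this thread only
            for (const std::string& name : perThreadObjects[i]) {
                ctx.Draw(name);
            }
            lists[i] = ctx.Finish();             // each thread writes its own slot
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    ImmediateContext immediate;
    for (const DisplayList& list : lists) {
        immediate.Execute(list);
    }
    return immediate.Submitted();
}
```

Because each thread touches only its own context and its own output slot, recording needs no locks. Ordering is decided solely by the replay loop on the immediate context.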
Shader Subroutines Details Calls must be fast Binding applies to all primitives in a Draw call Binding operation must be fast Need parameter passing mechanism Need access to textures, samplers, etc. Advantages Reduce register usage in Über-shaders ○ Not worst case of all if statements Allows specialization of subroutines
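A rough CPU-side analogy for shader subroutines is an interface with several implementations, one of which is bound before a draw. The C++ sketch below is only that analogy: LightModel, Lambert, HalfLambert, and Draw are hypothetical names, and the shading formulas are illustrative choices, not something the slides specify. It reflects the two properties the slide lists: the binding applies to every primitive in the draw call, and each implementation carries only its own code path instead of an if-chain covering all cases.

```cpp
#include <cassert>
#include <vector>

// Hypothetical subroutine interface: one entry point, several bodies.
struct LightModel {
    virtual ~LightModel() = default;
    virtual float Shade(float nDotL) const = 0;
};

// Specialization 1: clamped Lambert term.
struct Lambert : LightModel {
    float Shade(float nDotL) const override {
        return nDotL > 0.0f ? nDotL : 0.0f;
    }
};

// Specialization 2: "half Lambert" wrap lighting.
struct HalfLambert : LightModel {
    float Shade(float nDotL) const override {
        assert(nDotL >= -1.0f && nDotL <= 1.0f);
        const float h = 0.5f * nDotL + 0.5f;
        return h * h;
    }
};

// The binding is chosen once and applies to all primitives in the draw.
inline std::vector<float> Draw(const LightModel& bound,
                               const std::vector<float>& nDotLPerPrimitive) {
    std::vector<float> shaded;
    shaded.reserve(nDotLPerPrimitive.size());
    for (float nDotL : nDotLPerPrimitive) {
        shaded.push_back(bound.Shade(nDotL));
    }
    return shaded;
}
```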
Improved Texture Compression Why? Existing block palette interpolations too simple Results often rife with blocking artifacts No high dynamic range (HDR) support
New Texture Formats BC6 (aka BC6H) High dynamic range 6:1 compression (16 bpc RGB) Targeting high (not lossless) visual quality BC7 LDR with alpha 3:1 compression for RGB or 4:1 for RGBA High visual quality
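The quoted ratios follow from simple arithmetic if each 4x4 texel block compresses to 128 bits, which is the block size of the shipped BC6H and BC7 formats (an assumption here, since the slide does not state it). That works out to 8 bits per texel, against 48 for 16 bpc RGB, 24 for 8 bpc RGB, and 32 for 8 bpc RGBA. A small C++ sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>

constexpr int kBlockBits = 128;      // assumed size of one compressed block
constexpr int kBlockDim = 4;         // blocks cover 4x4 texels
constexpr int kTexelsPerBlock = kBlockDim * kBlockDim;

// Ratio of uncompressed to compressed size for a given source texel size.
constexpr int CompressionRatio(int uncompressedBitsPerTexel) {
    return uncompressedBitsPerTexel * kTexelsPerBlock / kBlockBits;
}

// Storage for one mip level; partial blocks at the edges are rounded up.
inline std::size_t CompressedSizeBytes(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    const std::size_t blocksX = (width + kBlockDim - 1) / kBlockDim;
    const std::size_t blocksY = (height + kBlockDim - 1) / kBlockDim;
    return blocksX * blocksY * (kBlockBits / 8);
}

static_assert(CompressionRatio(3 * 16) == 6, "BC6: 16 bpc RGB -> 6:1");
static_assert(CompressionRatio(3 * 8) == 3, "BC7: 8 bpc RGB -> 3:1");
static_assert(CompressionRatio(4 * 8) == 4, "BC7: 8 bpc RGBA -> 4:1");
```

The fixed ratio is what lets hardware locate any block directly from its texel coordinates, since every block occupies the same number of bytes.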
Compression of New Formats Block compression (unchanged) Each block independent Fixed compression ratio Multiple block types (new) Tailored to different types of content Smooth gradients vs. noisy normal maps Varied alpha vs. constant alpha Decompression results must be bit-accurate with spec
Comparison Results 1 (image panels: Orig, BC3, Orig, BC7, Abs Error)
Comparison Results 2 (image panels: Orig, BC3, Orig, BC7, Abs Error)
Comparison Results 3 (image panels: HDR Original at given exposure, BC6 at given exposure, Abs Error)
Other Features Addressable Stream Out Draw Indirect Pull-model attribute eval Improved Gather4 Min-LOD texture clamps 16K texture limits Required 8-bit subtexel, submip filtering precision Conservative oDepth 2 GB Resources Geometry shader instance programming model Optional double support Read-only depth or stencil views
Thanks Allison Klein Senior Lead Program Manager Direct3D Microsoft Chas. Boyd Architect Windows Desktop & Gaming Technology Microsoft
Thank you to our Sponsors!