Advanced D3D10 Rendering Emil Persson May 24, 2007.



Overview
- Introduction to D3D10
- Rendering techniques in D3D10
- Optimizations

Introduction
- Best D3D revision yet!
- Clean and powerful API
- Lots of new features
  - SM 4.0
  - New geometry shader
  - Stream Out
  - Texture arrays
  - Render to volume texture
  - MSAA individual sample access
  - Constant buffers
  - Sampler state decoupled from texture unit
  - Dual-source blending
  - Etc.

Clean API
- Vista only
- Everything is mandatory (almost)
- No legacy hardware support
- Clean starting point for future evolution of the API
- Limited market short-term
- Some old features deprecated
  - Fixed function
  - Assembly shaders
  - Alpha test
  - Triangle fans
  - Point sprites
  - Clip planes

Dealing with deprecated features
- Fixed function
  - Write a few über-shaders
- Assembly shaders
  - Convert to HLSL
- Alpha test
  - Use discard or clip() in pixel shader
  - Use alpha-to-coverage
- Triangle fans
  - Seldom used anyway, usually just for a quad
  - Convert to triangle list or strip
- Point sprites
  - Expand point to 2 triangles in GS
- Clip planes
  - Use clip distance and/or cull distance
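The point-sprite replacement can be illustrated on the CPU. The following is a minimal sketch, not the presenter's code: the `Vertex` type, the function name and the texture-coordinate layout are assumptions made for illustration. It shows the four corner vertices a geometry shader would emit, in triangle-strip order, to turn one point into two triangles.

```cpp
#include <array>
#include <cassert>

// Hypothetical vertex type for illustration only (not from the slides).
struct Vertex {
    float x, y;  // position
    float u, v;  // texture coordinate
};

// Expand a point at (cx, cy) into a quad with the given half-size.
// The four vertices are returned in triangle-strip order, so they form
// two triangles: (0, 1, 2) and (1, 2, 3).
std::array<Vertex, 4> expandPointSprite(float cx, float cy, float halfSize) {
    assert(halfSize > 0.0f);
    return {{
        {cx - halfSize, cy + halfSize, 0.0f, 0.0f},  // top-left
        {cx + halfSize, cy + halfSize, 1.0f, 0.0f},  // top-right
        {cx - halfSize, cy - halfSize, 0.0f, 1.0f},  // bottom-left
        {cx + halfSize, cy - halfSize, 1.0f, 1.0f},  // bottom-right
    }};
}
```

In an actual geometry shader the same four vertices would be appended to a triangle stream; the sketch only shows the corner arithmetic.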

SM 4.0
- Geometry shader
  - Processes a full primitive (point, line, triangle)
  - Has access to adjacency information (optional)
    - Useful for silhouette detection, shadow volume extrusion etc.
  - May output multiple primitives
    - Output limitation is 1024 floats
  - May output nothing (to kill primitive)

SM 4.0
- Infinite instruction count
  - Very long shaders may have lower throughput though
- Integer and bitwise instructions
- Indexable temporaries
  - Allows for local arrays
  - May be used to emulate a stack
- Useful system generated values
  - SV_VertexID
  - SV_PrimitiveID
  - SV_InstanceID
  - SV_Position (like VPOS, but now .zw are defined too)
  - SV_IsFrontFace (like VFACE)
  - SV_RenderTargetArrayIndex
  - SV_ViewportArrayIndex
  - SV_ClipDistance
  - SV_CullDistance

SM 4.0
- Integer & bitwise instructions
  - Signed and unsigned
  - No idiv though, just udiv
  - Same registers as floats
  - Can alias without conversion with asint(), asuint(), asfloat() etc.
- Integer texture sample values
  - Syntax: Texture2D <uint4> myTex;
- Access to individual samples in MSAA surface
  - Allows for custom AA resolve
  - Syntax: Texture2DMS <float4, 4> myTex;
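Aliasing "without conversion" means the 32 bits are reinterpreted, not numerically converted. A rough CPU-side analogue of asuint() and asfloat() is sketched below; the helper names are made up for illustration, and the sketch assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// CPU analogue of HLSL asuint(): reinterpret the bits, do not convert the value.
std::uint32_t asUint(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// CPU analogue of HLSL asfloat(): the inverse reinterpretation.
float asFloat(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, `asUint(1.0f)` yields the bit pattern 0x3F800000, whereas a numeric conversion of 1.0f to an integer would yield 1.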

Pixel center
- Half pixel offset is gone!
- Affects SV_Position as well
- Now matches OpenGL
[Figure: pixel center placement, DX10 vs. DX9]

Pixel center
- Pixels and texels align
- TexCoord = SV_Position.xy / float2(width, height)
[Figure: texel centers and screenspace pixel centers]
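The texture-coordinate formula on this slide can be checked with a little arithmetic. The sketch below assumes that SV_Position.xy for the pixel in column px, row py is (px + 0.5, py + 0.5), which is what the removed half-pixel offset implies; the type and function names are illustrative only.

```cpp
#include <cassert>

struct Float2 { float x, y; };

// SV_Position.xy for the pixel at integer coordinates (px, py), assuming
// D3D10 places pixel centers at half-integer positions.
Float2 svPosition(int px, int py) {
    return {px + 0.5f, py + 0.5f};
}

// TexCoord = SV_Position.xy / float2(width, height)
Float2 screenTexCoord(Float2 svPos, int width, int height) {
    assert(width > 0 && height > 0);
    return {svPos.x / width, svPos.y / height};
}
```

For a 4x2 render target, pixel (0, 0) maps to (0.125, 0.25), which is exactly the center of texel (0, 0) in a 4x2 texture, so no extra offset is needed.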

The small batch problem
- D3D10 designed to minimize batch overhead
- Pulls work from draw time to creation time
  - Validation
  - Shader input/output configuration
- Immutable State Objects
  - Input layout
  - Rasterizer state
  - Sampler state
  - Depth stencil state
  - Blend state

The small batch problem
- D3D10 also provides tools to reduce draw calls
  - Improved instancing interface
  - Geometry shader
  - More shader resources
  - Constant indexing in PS
  - Render target arrays
  - Texture arrays

Rendering techniques in D3D10

Global Illumination

Global Illumination
- Probes on a volume grid across the scene
- Each probe captures light environment into a tiny "cubemap"
- Probes are converted to Spherical Harmonics coefficients
- Indirect lighting is computed using interpolated SH coefficients
- Do the same in probe passes to get multiple light bounces

Global Illumination
- Awful lot of work
  - Each probe is 6 slices
  - We need loads of probes
  - Sample scene has over 300 probes
- Solution
  - Use geometry shader to reduce work
  - Distribute work across multiple frames
    - Sample updates 40 cubes per frame
    - Scatter updates to hide artifacts
  - Skip over "empty" space probes

Global Illumination
- The Geometry Shader advantage
  - 40 cubes x 6 faces x n draw calls = Pain
  - DX9 style unrealistic even for simple scenes
- Update multiple slices per pass with GS
  - GS output limit is 1024 floats
  - Keep number of interpolators down to maximize primitive count
  - Managed to update 5 probes (30 slices) per pass
  - 8 passes is more manageable than 240
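The numbers on this slide follow from simple budgeting. The sketch below works through them under two assumptions that the slides do not state: that every scalar a vertex outputs counts against the 1024-float limit, and that each input triangle is re-emitted as a full triangle (3 vertices) for each of the 30 slices. The function names are illustrative.

```cpp
#include <cassert>

constexpr int kGsOutputLimitFloats = 1024;  // GS output limit quoted on the slide

// Largest number of vertices one GS invocation can emit for a given
// per-vertex output size (in floats), under the assumptions above.
constexpr int maxOutputVertices(int floatsPerVertex) {
    return kGsOutputLimitFloats / floatsPerVertex;
}

// Passes needed to update `probes` cubemaps at `probesPerPass` per pass.
constexpr int passesNeeded(int probes, int probesPerPass) {
    return (probes + probesPerPass - 1) / probesPerPass;  // round up
}

// One pass per cube face when no GS is used to fan out to slices.
constexpr int passesWithoutGs(int probes) {
    return probes * 6;
}
```

Under these assumptions, 30 slices x 3 vertices = 90 output vertices per input triangle, which fits only if each vertex carries at most 11 floats (1024 / 11 = 93, but 1024 / 12 = 85). That is why the interpolator count has to stay small. The pass counts match the slide: 40 probes at 5 per pass gives 8 passes, against 240 when each face is rendered separately.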

Post tone-mapping resolve
- D3D10 allows for custom AA resolves
  - Can drastically improve HDR AA quality
- Standard resolve occurs before tone-mapping
- Ideally resolve should be done after tone-mapping
[Figure: standard resolve vs. custom resolve]

Post-tonemapping resolve

    // SAMPLES and exposure are not declared on the slide; they are assumed to
    // be a compile-time define and a shader constant defined elsewhere.
    Texture2DMS<float4, SAMPLES> tHDR;

    float4 main(float4 pos : SV_Position) : SV_Target
    {
        int3 coord;
        coord.xy = (int2) pos.xy;
        coord.z = 0;

        // Tone-map individual samples and sum it up
        float4 sum = 0;
        [unroll]
        for (int i = 0; i < SAMPLES; i++)
        {
            float4 c = tHDR.Load(coord, i);
            sum.rgb += 1.0 - exp2(-exposure * c.rgb);
        }

        // Average
        sum *= (1.0 / SAMPLES);

        // sRGB
        sum.rgb = pow(sum.rgb, 1.0 / 2.2);

        return sum;
    }
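Why the order of resolve and tone-mapping matters can be seen on a single edge pixel. The following CPU sketch uses the same tone-mapping operator as the shader, 1 - exp2(-exposure * c), on one color channel; the function names and the sample values are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Same operator as in the shader: 1 - 2^(-exposure * c).
float toneMap(float c, float exposure) {
    return 1.0f - std::exp2(-exposure * c);
}

// Standard resolve: average the HDR samples first, then tone-map.
float resolveThenToneMap(const std::vector<float>& samples, float exposure) {
    assert(!samples.empty());
    float sum = 0.0f;
    for (float s : samples) sum += s;
    return toneMap(sum / static_cast<float>(samples.size()), exposure);
}

// Custom resolve: tone-map each sample, then average.
float toneMapThenResolve(const std::vector<float>& samples, float exposure) {
    assert(!samples.empty());
    float sum = 0.0f;
    for (float s : samples) sum += toneMap(s, exposure);
    return sum / static_cast<float>(samples.size());
}
```

Take a 4-sample pixel on the edge between a black surface and a very bright one, with samples {0, 0, 16, 16} and exposure 1. The standard resolve averages to 8 and tone-maps to roughly 0.996, so the edge pixel is as bright as the bright side and the anti-aliasing is lost. Tone-mapping first gives roughly 0.5, the expected halfway blend.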

Optimizations

Geometry shader
- GS optimizations
  - Input/output usually the bottleneck
  - Reduce outputs with frustum and/or backface culling
- Keep input small by packing data
  - TexCoord could be 2x16 bits in an uint
  - Or use for instance asuint(normal.w)
- Merge to full float4 vectors
  - Don't do 2x float2
- Keep output small
  - Could be faster to trade for some work in PS
  - Pass just position, don't interpolate both lightVec and viewVec
  - Or even back-project SV_Position.xyz to world space in PS
  - Small output means more work fits within 1024 floats limit
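One way to read "TexCoord could be 2x16 bits in an uint" is sketched below on the CPU side. The slide does not say which 16-bit encoding is meant; this sketch assumes unsigned normalized values in [0, 1], with u in the low half and v in the high half, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Pack two texture coordinates in [0, 1] into one 32-bit value:
// u in the low 16 bits, v in the high 16 bits (an assumed layout).
std::uint32_t packTexCoord(float u, float v) {
    assert(u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f);
    const auto qu = static_cast<std::uint32_t>(std::lround(u * 65535.0f));
    const auto qv = static_cast<std::uint32_t>(std::lround(v * 65535.0f));
    return qu | (qv << 16);
}

// Inverse of packTexCoord; the shader would do the same with integer
// mask and shift instructions before converting to float.
void unpackTexCoord(std::uint32_t packed, float& u, float& v) {
    u = static_cast<float>(packed & 0xFFFFu) / 65535.0f;
    v = static_cast<float>(packed >> 16) / 65535.0f;
}
```

The round trip loses at most about half of 1/65535 per coordinate, and the vertex carries one 32-bit value instead of two floats.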

GS frustum and backface culling

    // Transform to clip space
    float4 pos[3];
    pos[0] = mul(mvp, In[0].pos);
    pos[1] = mul(mvp, In[1].pos);
    pos[2] = mul(mvp, In[2].pos);

    // Use frustum culling to improve performance
    float4 t0 = saturate(pos[0].xyxy * float4(-1, -1, 1, 1) - pos[0].w);
    float4 t1 = saturate(pos[1].xyxy * float4(-1, -1, 1, 1) - pos[1].w);
    float4 t2 = saturate(pos[2].xyxy * float4(-1, -1, 1, 1) - pos[2].w);
    float4 t = t0 * t1 * t2;

    [branch]
    if (!any(t))
    {
        // Use backface culling to improve performance
        float2 d0 = pos[1].xy * pos[0].w - pos[0].xy * pos[1].w;
        float2 d1 = pos[2].xy * pos[0].w - pos[0].xy * pos[2].w;

        if (d1.x * d0.y > d0.x * d1.y || min(min(pos[0].w, pos[1].w), pos[2].w) < 0.0)
        {
            // Output primitive here
            ...
        }
    }
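The logic of the two tests may be easier to follow in scalar form. The CPU sketch below mirrors the shader: a triangle is frustum-culled only when all three vertices lie outside the same left, right, bottom or top plane, and it survives the backface test when its winding matches the shader's condition or when any vertex has negative w. The type and function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

struct Float4 { float x, y, z, w; };

// True if all three clip-space vertices are outside the same side plane.
// Equivalent to any(t0 * t1 * t2) in the shader, where each t component
// is positive only when that vertex is outside that plane.
bool outsideFrustumXY(const Float4 (&p)[3]) {
    bool left = true, right = true, bottom = true, top = true;
    for (const Float4& v : p) {
        left   = left   && (-v.x - v.w > 0.0f);
        bottom = bottom && (-v.y - v.w > 0.0f);
        right  = right  && ( v.x - v.w > 0.0f);
        top    = top    && ( v.y - v.w > 0.0f);
    }
    return left || right || bottom || top;
}

// Same condition as the inner if in the shader: keep the triangle when the
// winding test passes, or when a vertex has negative w (the 2D test is not
// reliable in that case).
bool passesBackfaceTest(const Float4 (&p)[3]) {
    const float d0x = p[1].x * p[0].w - p[0].x * p[1].w;
    const float d0y = p[1].y * p[0].w - p[0].y * p[1].w;
    const float d1x = p[2].x * p[0].w - p[0].x * p[2].w;
    const float d1y = p[2].y * p[0].w - p[0].y * p[2].w;
    const float minW = std::min({p[0].w, p[1].w, p[2].w});
    return d1x * d0y > d0x * d1y || minW < 0.0f;
}

bool shouldEmit(const Float4 (&p)[3]) {
    return !outsideFrustumXY(p) && passesBackfaceTest(p);
}
```

Note that, like the shader, the sketch tests only the four side planes and not near or far, and a triangle that straddles a plane is always kept.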

Miscellaneous optimizations
- Pre-baked constant buffers
  - Don't update per-material constants in DX9 style
- PS doesn't need to return float4 anymore
  - Use float3 if you only care about RGB
  - May reduce instruction count
- Use GS to reduce draw calls
  - Single pass render-to-cubemap
  - Update multiple render targets per pass

The new shader compiler
- SM4 shader compiler preserves semantics better
  - This means more responsibility for you guys
  - Be careful about your assumptions
- Periodically check the resulting assembly
  - D3D10DisassembleShader()
  - Use GPUShaderAnalyzer for performance critical shaders

The new shader compiler

Example: HLSL code:

    float4 main(float4 t : TEXCOORD0) : SV_Target
    {
        if (t.x > t.y)
            return t.xyzw;
        else
            return t.wzyx;
    }

DX9 assembly:

    add r0.x, -v0.x, v0.y
    cmp oC0, r0.x, v0.wzyx, v0

DX10 assembly:

    lt r0.x, v0.y, v0.x
    if_nz r0.x        // <--- Did you really want a branch here?
      mov o0.xyzw, v0.xyzw
      ret
    else
      mov o0.xyzw, v0.wzyx
    endif

The new shader compiler
- Use [branch], [flatten], [unroll] & [loop] to control output code
- This is not for everyone
  - Poor use could reduce performance
  - Make sure you know what you're doing
  - Only use if you're familiar with assembly code
  - Verify that you get the code you expect
  - Always benchmark both options

New DX10 assembly (using [flatten]):

    lt r0.x, v0.y, v0.x
    movc o0.xyzw, r0.xxxx, v0.xyzw, v0.wzyx
    ret
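The branched and flattened forms compute the same result; they differ only in how the hardware executes them. A CPU analogue of the two code shapes is sketched below, with made-up function names: the first takes one of two paths, the second evaluates both candidate values and selects per component, which is what movc does.

```cpp
#include <cassert>

struct Float4 { float x, y, z, w; };

// Shape of the branched code: only one path is executed.
Float4 branched(Float4 t) {
    if (t.x > t.y) {
        return t;                     // t.xyzw
    }
    return {t.w, t.z, t.y, t.x};      // t.wzyx
}

// Shape of the flattened code: both values exist, then a select picks one.
Float4 flattened(Float4 t) {
    const Float4 a = t;                       // t.xyzw
    const Float4 b = {t.w, t.z, t.y, t.x};    // t.wzyx
    const bool cond = t.y < t.x;              // lt r0.x, v0.y, v0.x
    return {cond ? a.x : b.x,                 // movc: per-component select
            cond ? a.y : b.y,
            cond ? a.z : b.z,
            cond ? a.w : b.w};
}
```

For a shader this small the select is likely the cheaper form, but as the slide says, the only reliable answer comes from checking the generated assembly and benchmarking both options.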

Questions? emil.persson@amd.com