OIT and Indirect Illumination using DX11 Linked Lists

OIT and Indirect Illumination using DX11 Linked Lists Holger Gruen AMD ISV Relations Nicolas Thibieroz AMD ISV Relations

Agenda Introduction Linked List Rendering Order Independent Transparency Indirect Illumination Q&A

Introduction Direct3D 11 HW opens the door to many new rendering algorithms In particular, per-pixel linked lists allow for a number of new techniques: OIT, Indirect Shadows, Ray Tracing of dynamic scenes, REYES surface dicing, custom AA, Irregular Z-buffering, custom blending, Advanced Depth of Field, etc. This talk will walk you through: A DX11 implementation of per-pixel linked lists and two effects that utilize this technique: OIT and Indirect Illumination

Per-Pixel Linked Lists with Direct3D 11 Nicolas Thibieroz European ISV Relations AMD [diagram: a linked list of elements, each with a link to the next]

Why Linked Lists? Data structure useful for programming Very hard to implement efficiently with previous real-time graphics APIs DX11 allows efficient creation and parsing of linked lists Per-pixel linked lists A collection of linked lists enumerating all pixels belonging to the same screen position [diagram: linked list of elements and links]

Two-step process 1) Linked List Creation: store incoming fragments into linked lists 2) Rendering from Linked List: linked list traversal and processing of stored fragments

Creating Per-Pixel Linked Lists

PS5.0 and UAVs Uses a Pixel Shader 5.0 to store fragments into linked lists Not a Compute Shader 5.0! Uses atomic operations Two UAV buffers required - “Fragment & Link” buffer - “Start Offset” buffer UAV = Unordered Access View

Fragment & Link Buffer The “Fragment & Link” buffer contains data and link for all fragments to store Must be large enough to store all fragments Created with Counter support D3D11_BUFFER_UAV_FLAG_COUNTER flag in UAV view Declaration: struct FragmentAndLinkBuffer_STRUCT { FragmentData_STRUCT FragmentData; // Fragment data uint uNext; // Link to next fragment }; RWStructuredBuffer <FragmentAndLinkBuffer_STRUCT> FLBuffer;

Start Offset Buffer The “Start Offset” buffer contains the offset of the last fragment written at every pixel location Screen-sized: (width * height * sizeof(UINT32) ) Initialized to magic value (e.g. -1) Magic value indicates no more fragments are stored (i.e. end of the list) Declaration: RWByteAddressBuffer StartOffsetBuffer;
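In practice the application clears this buffer once per frame (e.g. with ClearUnorderedAccessViewUint). Purely as an illustrative sketch, an equivalent compute-shader clear could look like this (the thread count is an arbitrary choice; SCREEN_WIDTH and SCREEN_HEIGHT are assumed compile-time constants):

// Sketch: clear every entry of the Start Offset Buffer to the magic value 0xFFFFFFFF
RWByteAddressBuffer StartOffsetBuffer;

[numthreads(256, 1, 1)]
void CS_ClearStartOffsetBuffer(uint3 id : SV_DispatchThreadID)
{
    if (id.x < SCREEN_WIDTH * SCREEN_HEIGHT)
    {
        StartOffsetBuffer.Store(4 * id.x, 0xFFFFFFFF);
    }
}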

Linked List Creation (1) No color Render Target bound! No rendering yet, just storing in L.L. Depth buffer bound if needed OIT will need it in a few slides UAVs bound as input/output: StartOffsetBuffer (R/W) FragmentAndLinkBuffer (W)

Linked List Creation (2a) [diagram: Start Offset Buffer cleared to -1 for the whole viewport; Fragment and Link Buffer empty, counter at 0]

Linked List Creation (2b) [diagram: a first fragment is appended to the Fragment and Link Buffer; its index is written to the Start Offset Buffer and its link is set to -1]

Linked List Creation (2c) [diagram: further fragments are appended; each covered Start Offset Buffer entry now points to the last fragment stored at that pixel]

Linked List Creation (2d) [diagram: a second primitive adds fragments at already-covered pixels; each new fragment links back to the previously stored fragment for that pixel]

Linked List Creation - Code
float PS_StoreFragments(PS_INPUT input) : SV_Target
{
    // Calculate fragment data (color, depth, etc.)
    FragmentData_STRUCT FragmentData = ComputeFragment();

    // Retrieve current pixel count and increase counter
    uint uPixelCount = FLBuffer.IncrementCounter();

    // Exchange offsets in StartOffsetBuffer
    uint2 vPos = uint2(input.vPos.xy);
    uint uStartOffsetAddress = 4 * ( (SCREEN_WIDTH * vPos.y) + vPos.x );
    uint uOldStartOffset;
    StartOffsetBuffer.InterlockedExchange(uStartOffsetAddress, uPixelCount, uOldStartOffset);

    // Add new fragment entry in Fragment & Link Buffer
    FragmentAndLinkBuffer_STRUCT Element;
    Element.FragmentData = FragmentData;
    Element.uNext = uOldStartOffset;
    FLBuffer[uPixelCount] = Element;

    // No color render target is bound, so the return value is unused
    return 0.0f;
}

Traversing Per-Pixel Linked Lists

Rendering Pixels (1) "Start Offset" Buffer and "Fragment & Link" Buffer now bound as SRVs:
Buffer<uint> StartOffsetBufferSRV;
StructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBufferSRV;
Render a fullscreen quad For each pixel, parse the linked list and retrieve fragments for this screen position Process list of fragments as required Depends on algorithm e.g. sorting, finding maximum, etc. SRV = Shader Resource View

Rendering from Linked List [diagram: for each render-target pixel, the Start Offset Buffer entry gives the head of that pixel's list in the Fragment and Link Buffer, which is then traversed fragment by fragment]

Rendering Pixels (2)
float4 PS_RenderFragments(PS_INPUT input) : SV_Target
{
    // Calculate UINT-aligned start offset buffer address
    uint2 vPos = uint2(input.vPos.xy);
    uint uStartOffsetAddress = SCREEN_WIDTH * vPos.y + vPos.x;

    // Fetch offset of first fragment for current pixel
    uint uOffset = StartOffsetBufferSRV.Load(uStartOffsetAddress);

    // Parse linked list for all fragments at this position
    float4 FinalColor = float4(0, 0, 0, 0);
    while (uOffset != 0xFFFFFFFF) // 0xFFFFFFFF is the magic value
    {
        // Retrieve fragment at current offset
        FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

        // Process fragment as required
        ProcessPixel(Element, FinalColor);

        // Retrieve next offset
        uOffset = Element.uNext;
    }
    return FinalColor;
}

Order-Independent Transparency via Per-Pixel Linked Lists Nicolas Thibieroz European ISV Relations AMD

Description Straight application of the linked list algorithm Stores transparent fragments into PPLL Rendering phase sorts pixels in a back-to-front order and blends them manually in a pixel shader Blend mode can be unique per-pixel! Special case for MSAA support

Linked List Structure Optimize performance by reducing the amount of data to write to/read from the UAV E.g. uint instead of float4 for color Example data structure for OIT:
struct FragmentAndLinkBuffer_STRUCT
{
    uint uPixelColor; // Packed pixel color
    uint uDepth;      // Pixel depth
    uint uNext;       // Address of next link
};
May also get away with packing color and depth into the same uint (if alpha is the same): 16 bits color (565) + 16 bits depth Performance/memory/quality trade-off
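As a sketch of the packed option mentioned above (5:6:5 color plus 16-bit depth; the helper name and exact bit layout are assumptions, not the sample's actual code):

// Sketch: pack an RGB color as 5:6:5 and a [0,1] depth as 16 bits into a single uint
uint PackColor565AndDepth(float3 vColor, float fDepth)
{
    uint uColor = (uint(saturate(vColor.r) * 31.0f) << 11) |
                  (uint(saturate(vColor.g) * 63.0f) <<  5) |
                   uint(saturate(vColor.b) * 31.0f);
    uint uDepth = uint(saturate(fDepth) * 65535.0f);
    return (uColor << 16) | uDepth;
}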

Visible Fragments Only! Use [earlydepthstencil] in front of Linked List creation pixel shader This ensures only transparent fragments that pass the depth test are stored i.e. Visible fragments! Allows performance savings and rendering correctness! [earlydepthstencil] float PS_StoreFragments(PS_INPUT input) : SV_Target { ... }

Sorting Pixels Sorting in place requires R/W access to the Linked List Sparse memory accesses = slow! A better way is to copy all pixels into an array of temp registers Then do the sorting Temp array declaration means a hard limit on the number of pixels per screen coordinate Required trade-off for performance

Sorting and Blending Blend fragments back to front in the PS [diagram: sorted temp array of fragment colors and depths blended one by one over the background color into the render target] Blending algorithm up to the app Example: SRCALPHA-INVSRCALPHA Or unique per pixel! (stored in fragment data) Background passed as input texture Actual HW blending mode disabled

Storing Pixels for Sorting
(...)
static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position
// and store them into temp array for later sorting
int nNumPixels = 0;
while (uOffset != 0xFFFFFFFF)
{
    // Retrieve pixel at current offset
    FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

    // Copy pixel data into temp array
    SortedPixels[nNumPixels++] = uint2(Element.uPixelColor, Element.uDepth);

    // Retrieve next offset
    [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ? 0xFFFFFFFF : Element.uNext;
}

// Sort pixels in-place
SortPixelsInPlace(SortedPixels, nNumPixels);
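SortPixelsInPlace() itself is not shown on the slides. A minimal sketch, assuming the uint2 layout above (.x = packed color, .y = depth, larger depth = further away), is a simple insertion sort:

// Sketch: order the temp array back to front so the furthest fragment is blended first
void SortPixelsInPlace(inout uint2 SortedPixels[MAX_SORTED_PIXELS], int nNumPixels)
{
    for (int i = 1; i < nNumPixels; i++)
    {
        uint2 Pixel = SortedPixels[i];
        int j = i - 1;
        // Shift nearer fragments towards the end of the array
        while (j >= 0 && SortedPixels[j].y < Pixel.y)
        {
            SortedPixels[j + 1] = SortedPixels[j];
            j--;
        }
        SortedPixels[j + 1] = Pixel;
    }
}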

Pixel Blending in PS
(...)
// Retrieve current color from background texture
float4 vCurrentColor = BackgroundTexture.Load(int3(vPos.xy, 0));

// Render pixels using SRCALPHA-INVSRCALPHA blending
for (int k = 0; k < nNumPixels; k++)
{
    // Retrieve next unblended furthermost pixel
    float4 vPixColor = UnpackFromUint(SortedPixels[k].x);

    // Manual blending between current fragment and previous one
    vCurrentColor.xyz = lerp(vCurrentColor.xyz, vPixColor.xyz, vPixColor.w);
}

// Return manually-blended color
return vCurrentColor;
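UnpackFromUint() is likewise not shown. A minimal sketch, assuming uPixelColor holds an RGBA8-packed color with alpha in the top byte (the byte order is an assumption):

// Sketch: unpack an RGBA8 color packed into a uint (R in the lowest byte, A in the highest)
float4 UnpackFromUint(uint uColor)
{
    return float4( (uColor        & 0xFF) / 255.0f,
                  ((uColor >>  8) & 0xFF) / 255.0f,
                  ((uColor >> 16) & 0xFF) / 255.0f,
                  ((uColor >> 24) & 0xFF) / 255.0f);
}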

OIT via Per-Pixel Linked Lists with MSAA Support

Sample Coverage Storing individual samples into Linked Lists requires a huge amount of memory ... and performance will suffer! Solution is to store transparent pixels into the PPLL as before But including sample coverage too! Requires as many bits as the MSAA mode Declare SV_COVERAGE in the PS input structure:
struct PS_INPUT
{
    float3 vNormal   : NORMAL;
    float2 vTex      : TEXCOORD;
    float4 vPos      : SV_POSITION;
    uint   uCoverage : SV_COVERAGE;
};

Linked List Structure Almost unchanged from before Depth is now packed into 24 bits 8 bits are used to store coverage
struct FragmentAndLinkBuffer_STRUCT
{
    uint uPixelColor;       // Packed pixel color
    uint uDepthAndCoverage; // Depth + coverage
    uint uNext;             // Address of next link
};

Sample Coverage Example [diagram: a pixel with its center and 4 MSAA sample positions; only the third sample is covered] uCoverage = 0x04 (0100 in binary)
Element.uDepthAndCoverage = ( uint(In.vPos.z * ((1 << 24) - 1)) << 8 ) | In.uCoverage;
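As a sketch, helpers matching this 24-bit depth / 8-bit coverage layout could look like this (names are hypothetical; UnpackCoverage is used by the resolve code later):

// Sketch: pack a [0,1] depth into the top 24 bits and the coverage mask into the low 8 bits
uint PackDepthAndCoverage(float fDepth, uint uCoverage)
{
    return (uint(saturate(fDepth) * ((1 << 24) - 1)) << 8) | (uCoverage & 0xFF);
}

// Sketch: matching unpack of the coverage mask
uint UnpackCoverage(uint uDepthAndCoverage)
{
    return uDepthAndCoverage & 0xFF;
}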

Rendering Samples (1) Rendering phase needs to be able to write individual samples Thus PS is run at sample frequency Can be done by declaring SV_SAMPLEINDEX in input structure Parse linked list and store pixels into temp array for later sorting Similar to non-MSAA case Difference is to only store sample if coverage matches sample index being rasterized
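As a small sketch, a resolve-pass input structure declaring the sample index could look like this (the struct name is an assumption); the SV_SampleIndex input is what forces per-sample execution:

// Sketch: pixel shader input for the MSAA resolve pass, run at sample frequency
struct PS_RESOLVE_INPUT
{
    float4 vPos         : SV_POSITION;
    uint   uSampleIndex : SV_SampleIndex;
};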

Rendering Samples (2)
static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position
// and store them into temp array for later sorting
int nNumPixels = 0;
while (uOffset != 0xFFFFFFFF)
{
    // Retrieve pixel at current offset
    FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

    // Retrieve pixel coverage from linked list element
    uint uCoverage = UnpackCoverage(Element.uDepthAndCoverage);
    if (uCoverage & (1 << In.uSampleIndex))
    {
        // Coverage matches current sample so copy pixel
        SortedPixels[nNumPixels++] = uint2(Element.uPixelColor, Element.uDepthAndCoverage);
    }

    // Retrieve next offset
    [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ? 0xFFFFFFFF : Element.uNext;
}

DEMO OIT Linked List Demo

Direct3D 11 Indirect Illumination Holger Gruen European ISV Relations AMD

Indirect Illumination Introduction 1 Real-time indirect illumination is an active research topic Numerous approaches exist Reflective Shadow Maps (RSM) [Dachsbacher/Stamminger05] Splatting Indirect Illumination [Dachsbacher/Stamminger2006] Multi-Res Splatting of Illumination [Wyman2009] Light Propagation Volumes [Kaplanyan2009] Approximating Dynamic Global Illumination in Image Space [Ritschel2009] Only a few support indirect shadows Imperfect Shadow Maps [Ritschel/Grosch2008] Micro-Rendering for Scalable, Parallel Final Gathering (SSDO) [Ritschel2010] Cascaded Light Propagation Volumes for Real-Time Indirect Illumination [Kaplanyan/Dachsbacher2010] Most approaches somehow extend to multi-bounce lighting

Indirect Illumination Introduction 2 This section will cover An efficient and simple DX9-compliant RSM-based implementation for smooth one-bounce indirect illumination Indirect shadows are ignored here A Direct3D 11 technique that traces rays to compute indirect shadows Part of this technique could generally be used for ray-tracing dynamic scenes

Indirect Illumination w/o Indirect Shadows Draw scene g-buffer Draw Reflective Shadow Map (RSM) RSM shows the part of the scene that receives direct light from the light source Draw Indirect Light buffer at ¼ res RSM texels are used as light sources on g-buffer pixels for indirect lighting Upsample Indirect Light (IL) Draw final image adding IL

Step 1 G-Buffer needs to allow reconstruction of World/Camera space position World/Camera space normal Color/Albedo DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows

Step 2 RSM needs to allow reconstruction of World space position World space normal Color/Albedo Only draw emitters of indirect light DXGI_FORMAT_R32G32B32A32_FLOAT position may be required for precise ray queries for indirect shadows

Step 3 Render a ¼ res IL buffer as a deferred op Transform g-buffer pixel to RSM space: world space -> light space -> project to RSM texel space (a sketch follows below) Use a kernel of RSM texels as light sources RSM texels are also called Virtual Point Lights (VPLs) Kernel size depends on Desired speed Desired look of the effect RSM resolution
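A minimal sketch of that projection, assuming a light view-projection matrix and a square RSM (all names and constants below are assumptions, not the demo's actual code):

float4x4 g_mLightViewProj;            // assumed: light view-projection matrix
static const float RSM_SIZE = 512.0f; // assumed: RSM resolution

// Sketch: transform a g-buffer world-space position into RSM texel space
float3 WorldToRSMTexel(float3 vWorldPos)
{
    float4 vLightSpace = mul(float4(vWorldPos, 1.0f), g_mLightViewProj);
    vLightSpace.xyz /= vLightSpace.w;                                      // light-space NDC
    float2 vTexel = (vLightSpace.xy * float2(0.5f, -0.5f) + 0.5f) * RSM_SIZE;
    return float3(vTexel, vLightSpace.z);                                  // texel coords + depth
}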

Computing IL at a G-buf Pixel 1 Sum up contribution of all VPLs in the kernel

Computing IL at a G-buf Pixel 2 [diagram: a VPL at an RSM texel illuminating a g-buffer pixel] This term is very similar to terms used in radiosity form factor computations
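The term itself is not reproduced in this transcript; in the RSM formulation [Dachsbacher/Stamminger05] the irradiance contributed by one VPL p to a g-buffer pixel at position x with normal n typically has the form

E_p(x, n) = \Phi_p \cdot \frac{\max(0, \langle n_p, x - x_p \rangle) \cdot \max(0, \langle n, x_p - x \rangle)}{\lVert x - x_p \rVert^4}

where x_p, n_p and \Phi_p are the VPL's position, normal and flux; the two cosine-like terms and the distance falloff are what make it resemble a radiosity form factor.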

Computing IL at a G-buf Pixel 3 A naive solution for smooth IL needs to consider four VPL kernels with centers at t0, t1, t2 and t3. stx : sub RSM texel x position [0.0, 1.0[ sty : sub RSM texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 4
IndirectLight = (1.0f-sty) * ((1.0f-stx) * VPLKernel(t0) + stx * VPLKernel(t1)) +
                (0.0f+sty) * ((1.0f-stx) * VPLKernel(t2) + stx * VPLKernel(t3))
where VPLKernel(tN) is the summed contribution of the VPL kernel centered at tN
Evaluation of 4 big VPL kernels is slow stx : sub texel x position [0.0, 1.0[ sty : sub texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 5
SmoothIndirectLight = (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1) +
                      (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7) + B4
stx : sub RSM texel x position of g-buf pix [0.0, 1.0[ sty : sub RSM texel y position of g-buf pix [0.0, 1.0[ This trick is probably known to some of you already. See backup for a detailed explanation!

Indirect Light Buffer

Step 4 Indirect Light buffer is ¼ res Perform a bilateral upsampling step See Peter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder. "Image-Based Proxy Accumulation for Real-Time Soft Global Illumination". Pacific Graphics 2007 Result is a full resolution IL
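A minimal sketch of a depth-weighted bilateral upsample (resource names are assumptions and only depth is used as the similarity term; the referenced paper also weights by normal differences):

Texture2D    g_txLowResIL;        // assumed: quarter-res indirect light
Texture2D    g_txLowResDepth;     // assumed: quarter-res depth
SamplerState g_samPoint;          // assumed: point sampler
float2       g_vLowResTexelSize;  // assumed: 1.0 / low-res dimensions

// Sketch: upsample the quarter-res IL buffer, weighting the four nearest low-res
// samples by how closely their depth matches the full-res pixel's depth
float3 BilateralUpsampleIL(float2 vUV, float fFullResDepth)
{
    const float2 vOffsets[4] = { float2(0, 0), float2(1, 0), float2(0, 1), float2(1, 1) };
    float3 vResult    = 0;
    float  fWeightSum = 0;
    for (int i = 0; i < 4; i++)
    {
        float2 vSampleUV    = vUV + vOffsets[i] * g_vLowResTexelSize;
        float  fLowResDepth = g_txLowResDepth.SampleLevel(g_samPoint, vSampleUV, 0).x;
        // Samples from a different surface (large depth difference) get little weight
        float  fWeight = 1.0f / (abs(fFullResDepth - fLowResDepth) + 1e-4f);
        vResult    += g_txLowResIL.SampleLevel(g_samPoint, vSampleUV, 0).xyz * fWeight;
        fWeightSum += fWeight;
    }
    return vResult / fWeightSum;
}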

Step 5 Combine Direct Illumination Indirect Illumination Shadows (not mentioned)

DEMO Combined Image ~280 FPS on a HD5970 @ 1280x1024 for a 15x15 VPL kernel Deferred IL pass + bilateral upsampling costs ~2.5 ms

How to add Indirect Shadows Use a CS and the linked lists technique Insert blockers of IL into a 3D grid of lists – let's use triangles for now see backup for an alternative data structure Look at a kernel of VPLs again Only accumulate light of VPLs that are occluded by blocker tris Trace rays through the 3D grid to detect occluded VPLs Render a low-res buffer only Subtract blocked indirect light from the IL buffer Subtract a blurred version of the low-res buffer Blur is a combined bilateral blurring/upsampling

Insert tris into 3D grid of triangle lists Rasterize dynamic blockers to the 3D grid using a CS and atomics [diagram: scene with dynamic blockers]

Insert tris into 3D grid of triangle lists Rasterize dynamic blockers to the 3D grid using a CS and atomics [diagram: world-space 3D grid of triangle lists around the IL blockers, laid out in a UAV; eol = end of list (0xffffffff)]
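A minimal sketch of this insertion pass, mirroring the per-pixel linked list construction from the first half of the talk (all resource names, the per-thread triangle fetch and the centroid-only binning are assumptions; the real implementation bins a triangle into every cell it overlaps):

// Sketch: append blocker triangles to per-cell linked lists using UAV atomics
struct TriangleNode_STRUCT
{
    float3 v0, v1, v2;   // blocker triangle in world space
    uint   uNext;        // next node in this cell's list (0xffffffff = end of list)
};

RWByteAddressBuffer                     GridStartOffsetBuffer; // one uint per cell, cleared to 0xffffffff
RWStructuredBuffer<TriangleNode_STRUCT> TriangleNodeBuffer;    // created with a UAV counter
StructuredBuffer<float3>                BlockerVertexBuffer;   // assumed: pre-skinned triangle soup
uint                                    g_uNumTriangles;       // assumed constant

static const float3 GRID_ORIGIN    = float3(-16, -16, -16);    // assumed grid placement
static const float  GRID_CELL_SIZE = 1.0f;                     // assumed cell size
static const uint   GRID_SIZE      = 32;                       // assumed cells per axis

uint3 WorldPosToCell(float3 vWorldPos)
{
    float3 vCell = (vWorldPos - GRID_ORIGIN) / GRID_CELL_SIZE;
    return uint3(clamp(vCell, 0.0f, (float)(GRID_SIZE - 1)));
}

[numthreads(64, 1, 1)]
void CS_InsertBlockerTris(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= g_uNumTriangles)
        return;

    TriangleNode_STRUCT Node;
    Node.v0 = BlockerVertexBuffer[3 * id.x + 0];
    Node.v1 = BlockerVertexBuffer[3 * id.x + 1];
    Node.v2 = BlockerVertexBuffer[3 * id.x + 2];

    // For brevity only the cell containing the triangle centroid is used here
    uint3 vCell = WorldPosToCell((Node.v0 + Node.v1 + Node.v2) / 3.0f);
    uint  uCell = (vCell.z * GRID_SIZE + vCell.y) * GRID_SIZE + vCell.x;

    // Append the node and link it to the head of the cell's list
    uint uNodeIndex = TriangleNodeBuffer.IncrementCounter();
    uint uOldStart;
    GridStartOffsetBuffer.InterlockedExchange(4 * uCell, uNodeIndex, uOldStart);
    Node.uNext = uOldStart;
    TriangleNodeBuffer[uNodeIndex] = Node;
}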

3D Grid Demo

Indirect Light Buffer [screenshot: blocker of green light, emitter of green light, expected indirect shadow]

Blocked Indirect Light

Indirect Light Buffer

Subtracting Blocked IL

DEMO Final Image ~70 FPS on a HD5970 @ 1280x1024 ~300 million rays per second for Indirect Shadows Ray casting costs ~9 ms

Future directions Speed up IL rendering: Render IL at even lower res Look into multi-res RSMs Speed up ray-tracing: Per-pixel array of lists for depth buckets Other data structures Raytrace other primitive types (splats, fuzzy ellipsoids etc.) Proxy geometry or bounding volumes of blockers Get rid of Interlocked*() ops: Just mark grid cells as occupied Lower quality but could work on earlier hardware Need to solve self-occlusion issues

Q&A Holger Gruen holger.gruen@AMD.com Nicolas Thibieroz nicolas.thibieroz@AMD.com Credits for the basic idea of how to implement PPLL under Direct3D 11 go to Jakub Klarowicz (Techland), Holger Gruen and Nicolas Thibieroz (AMD)