Stencil Routed A-Buffer

Presentation transcript:

Stencil Routed A-Buffer Kevin Myers and Louis Bavoil NVIDIA

Our Cool Thing

What is it? A-Buffer: simply a list of fragments per-pixel. “The A-buffer, an antialiased hidden surface method” [Carpenter 84]. Related Work: Depth Peeling [Mammen 89] [Everitt 01], k-Buffer [Bavoil et al. 07].

Why do I need this? Often want more than nearest Alpha blending Volume rendering Collision detection Refraction and caustics Global illumination

Why is it hard? GPUs optimized to capture nearest layer. Z buffering and early z test. Fine for most real-time lighting models. Wasteful if not rendering front to back.

Things that don’t work. Blending: can’t just turn off z-buffering; most operations non-commutative. MRT: can’t direct output. Reading what you’re writing: hazardous. “Multi-Layer Depth Peeling via Fragment Sort” [Liu et al. 06], k-Buffer [Bavoil et al. 07].
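To illustrate why blending without sorting fails, here is a minimal sketch of the standard "over" operator applied in two different orders. The function name `over` and the sample colors are illustrative assumptions, not taken from the slides.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Composite a source color on top of a destination color ('over' blend)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

background = (0.0, 0.0, 0.0)
red = ((1.0, 0.0, 0.0), 0.5)   # (rgb, alpha), hypothetical fragment
blue = ((0.0, 0.0, 1.0), 0.5)  # (rgb, alpha), hypothetical fragment

# Same two fragments, drawn in opposite orders.
red_over_blue = over(*red, over(*blue, background))
blue_over_red = over(*blue, over(*red, background))

print(red_over_blue)  # (0.5, 0.0, 0.25)
print(blue_over_red)  # (0.25, 0.0, 0.5)
```

The two results differ, so the fragments have to be captured and sorted before blending.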

A-Buffer: “A list of fragments per-pixel”. Anything on the GPU that resembles this? MSAA: “A list of samples per-pixel”. Samples store coverage.

MSAA in review Multisampled Antialiasing Fragments are rasterized at a higher res 8xMSAA == 8 x aliased resolution Pixel shader is run once per-pixel Frame buffer storage is at sample resolution

Say What? MSAA samples == A-Buffer pixels?? MSAA sample patterns don’t help Need all MSAA samples at pixel center

Line up your Sub-samples. Turn off multisampling: BOOL D3D10_RASTERIZER_DESC::MultisampleEnable. Still render to an MSAA buffer. Pixel shader output bloats to all sub-samples. Now writing 8 samples per pixel. All have the same value!!

Bloating Your Pixel Applause? Meets the definition “List of fragments per-pixel” Not exactly what we want Each item contains same value Next fragment will clobber the entire list Need to update one entry in the list Once and only once
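The clobbering problem can be sketched with a CPU-side model of one pixel. This is not GPU code; `write_fragment_unrouted` and the fragment labels are hypothetical names used only for illustration.

```python
NUM_SAMPLES = 8  # 8x MSAA buffer, as in the slides

def write_fragment_unrouted(samples, value):
    """Model of multisampling disabled: the output lands in every sub-sample."""
    for i in range(len(samples)):
        samples[i] = value

pixel = [None] * NUM_SAMPLES
write_fragment_unrouted(pixel, "fragment A")
write_fragment_unrouted(pixel, "fragment B")  # overwrites all eight entries

print(pixel)
```

After the second write every entry holds "fragment B", which is why each fragment needs to be routed to exactly one sub-sample.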

Stencil Routing. Stencil always increments. Stencil passes when 4.

Stencil Routing First introduced by Purcell et al 2003 Did not work for general rasterization Tile aligned points Fat point is spread across four pixels Four pixels get same value Stencil allows one pixel to update

Stencil Routing and MSAA Stencil always operates at sample res Regardless of MultisampleEnable state DX10 Spec Use sub-samples to route Allows any pixel shader output to be routed Arbitrary primitives

Stencil Routing and MSAA

A Stencil Test That Works StencilFunc D3D10_COMPARISON_EQUAL StencilRef 2 More on this later StencilPassOp and StencilFailOp D3D10_STENCIL_OP_DECR_SAT
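As a rough CPU-side model of this stencil state (not actual Direct3D code), the test compares against the reference value 2, and the stencil value is decremented with saturation at 0 whether the test passes or fails. The function name below is a hypothetical helper.

```python
STENCIL_REF = 2  # StencilRef from the slide

def stencil_test_and_update(stencil):
    """Return (passed, new_stencil) for one fragment hitting one sample.

    Models StencilFunc EQUAL with both StencilPassOp and StencilFailOp
    set to a saturating decrement.
    """
    passed = (stencil == STENCIL_REF)
    new_stencil = max(stencil - 1, 0)  # DECR_SAT clamps at zero
    return passed, new_stencil

print(stencil_test_and_update(3))  # (False, 2): passes on the next fragment
print(stencil_test_and_update(2))  # (True, 1): this sample is written now
print(stencil_test_and_update(0))  # (False, 0): stays saturated
```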

Initializing Stencil. Clear stencil buffer to pass value (2). Initializes sample 0 to 2. Use SampleMask to selectively update. Stencil set to replace with reference value.

Why start at 2? When all sub-samples are written: most stencil values will be 0, except the last one written; last sample written stencil == 1. When overflow occurs: all stencil values will be 0.
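The end states described here can be reproduced with a small CPU-side simulation of one pixel. It assumes sample k is initialized to k + 2, which is consistent with sample 0 starting at 2 but is not spelled out on the slides; `route_fragments` is a hypothetical name.

```python
def route_fragments(fragments, num_samples=8, ref=2):
    """Simulate stencil routing for one pixel; returns (samples, stencil)."""
    stencil = [k + ref for k in range(num_samples)]  # assumed initialization
    samples = [None] * num_samples
    for frag in fragments:
        for k in range(num_samples):
            if stencil[k] == ref:       # EQUAL test against the reference
                samples[k] = frag       # only this sample receives the fragment
            stencil[k] = max(stencil[k] - 1, 0)  # decrement on pass and on fail
    return samples, stencil

full_samples, full_stencil = route_fragments(list("abcdefgh"))      # exactly 8
over_samples, over_stencil = route_fragments(list("abcdefghi"))     # 9, overflow

print(full_stencil)  # [0, 0, 0, 0, 0, 0, 0, 1]
print(over_stencil)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

In this model, starting at 2 rather than 1 is what lets a completely filled pixel (last stencil value 1) be told apart from an overflowed one (all values 0).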

Occlusion Query Test Pixel did not overflow Pixel overflowed

Handling Overflow Set sample mask to last sample updated Draw full screen quad Issue an occlusion query Set stencil to pass if stencil == 0 Check occlusion query Sample pass count == overflow count
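The overflow check can be modeled on the CPU as a count of pixels whose last sample has a stencil value of 0. This is only a sketch of what the occlusion query would report, not Direct3D code; the buffer layout (a list of per-pixel stencil lists) and the function name are assumptions.

```python
def count_overflowed_pixels(stencil_buffer, last_sample):
    """Count pixels whose last sample passes the 'stencil == 0' test."""
    return sum(1 for pixel in stencil_buffer if pixel[last_sample] == 0)

# Per-pixel stencil values after rendering, assuming sample k started at k + 2.
stencil_buffer = [
    [0, 0, 1, 2, 3, 4, 5, 6],  # 3 fragments: not full
    [0, 0, 0, 0, 0, 0, 0, 1],  # exactly 8 fragments: full, no overflow
    [0, 0, 0, 0, 0, 0, 0, 0],  # 9 or more fragments: overflowed
]

overflow_count = count_overflowed_pixels(stencil_buffer, last_sample=7)
print(overflow_count)  # 1
```

A nonzero count would indicate that another pass, or a larger A-Buffer, is needed to capture the remaining fragments.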

Handling Overflow: Occlusion query. Good: very fast; allows for dynamic A-Buffer sizing. Bad: requires some CPU intervention; ideally A-Buffer size is fixed.

Demo Demo Time!

Secrets of the Dragon. Single A-Buffer: RG32F; R is packed color, G is depth; saves on texture loads. Post process sort: 8 fragment per-pixel bitonic sort; additional fragments, insertion sort.
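A minimal CPU-side sketch of the 8-fragment bitonic sort follows. It orders (packed_color, depth) pairs by depth, mirroring the R/G layout above. The actual sort on the slides runs as a GPU post process; this Python version only illustrates the compare-and-swap network, and the fragment values are made up.

```python
def bitonic_sort_by_depth(fragments):
    """Sort (packed_color, depth) pairs by depth with a bitonic network."""
    frags = list(fragments)
    n = len(frags)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("bitonic sort needs a power-of-two fragment count")
    k = 2
    while k <= n:                # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:            # compare distance within each merge step
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    a, b = frags[i], frags[partner]
                    if a[1] != b[1] and (a[1] > b[1]) == ascending:
                        frags[i], frags[partner] = b, a
            j //= 2
        k *= 2
    return frags

depths = [0.7, 0.1, 0.9, 0.3, 0.5, 0.2, 0.8, 0.4]
fragments = [(color_id, d) for color_id, d in enumerate(depths)]
sorted_fragments = bitonic_sort_by_depth(fragments)
print([d for _, d in sorted_fragments])
```

The network uses a fixed sequence of comparisons regardless of the data, which is what makes it a natural fit for a fixed count of 8 fragments per pixel.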

8800 GTX Performance: Alpha Blended Stanford Dragon

Layers  Resolution  Depth Peeling  8xABuffer  ABuffer Speedup
8       640x480     30.9           164        5.3
8       800x600     30.4           139        4.6
8       1024x768    29.5           110        3.7
8       1280x960    28.1           81.4       2.9
8       1600x1200   26.2           54.9       2.1
16      640x480     15.5           76.7       4.9
16      800x600     15.3           63.0       4.1
16      1024x768    14.7           48.0       3.3
16      1280x960    14.1           34.6       2.5
16      1600x1200   13.3           23.1       1.7

Limits…DOH! 254 layers of depth max: 8-bit stencil (255 - 1 for overflow bit); if you do this, call us, ’cause that’s crazy. Fragments at same depth: must be handled in post-process. MSAA.

Summary. Stencil Routed A-Buffer: ideally suited for complex geometries; much faster than depth peeling. A-buffer can be dynamically resized: use an occlusion query; best to pre-determine size.

Future Work Render target arrays Each target has its own stencil buffer Target replaces sub-sample Or augments sub-sample #arrays * MSAA level in one “CPU pass” With dx10 saturates 254 layers Use instancing for additional “GPU passes”

Thanks for all the fish Claudio Silva, Steven Callahan, Joao Comba, Aaron Lefohn, Cass Everitt, Peach Myers

The last slide… ? kmyers@nvidia.com lbavoil@nvidia.com