Eurographics 2012, Cagliari, Italy S-buffer: Sparsity-aware Multi-fragment Rendering Andreas A. Vasilakis and Ioannis Fudos Department of Computer Science,

Presentation transcript:

Eurographics 2012, Cagliari, Italy S-buffer: Sparsity-aware Multi-fragment Rendering Andreas A. Vasilakis and Ioannis Fudos Department of Computer Science, University of Ioannina, Greece

Why process multiple fragments?
A number of image-based applications require operations on more than one (possibly occluded) fragment per pixel:
– transparency effects
– volume and CSG rendering
– collision detection
– shadow mapping
– global illumination
– voxelization
– …

Prior Art
Geometry Sorting Methods
– Object sorting
– Primitive sorting
Fragment Sorting Methods
– Depth Peeling
– Buffer-based

Prior Art: Multi-Fragment Rendering Design Goals
– Quality: fragment extraction accuracy (A)
– Time performance (P)
– Memory allocation (Ma) and caching (Mc)
– GPU capabilities (G)

Prior Art: Depth Peeling Methods [Everitt01, Bavoil08, Liu09]
– A: z-fighting artifacts
– P: slow due to multi-pass rendering
– Ma: low/constant budget; Mc: fast
– G: commodity and modern cards
[Figure: depth layers extracted in the 1st, 2nd, and 3rd passes, followed by the background]
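The multi-pass behaviour and the z-fighting artifact listed above can be illustrated with a small CPU-side sketch for a single pixel. This is not code from the cited papers: the function name `depth_peel`, the `(depth, color)` tuple layout, and the sample fragments are all assumptions made for illustration. Each loop iteration stands in for one full geometry pass.

```python
import math

def depth_peel(fragments, max_passes):
    """Peel the depth layers of ONE pixel, nearest layer first.

    fragments: list of (depth, color) tuples in arbitrary order
               (hypothetical layout, chosen for this sketch).
    Every pass keeps the nearest fragment that lies strictly behind
    the layer extracted by the previous pass.
    """
    layers = []
    last_depth = -math.inf
    for _ in range(max_passes):
        # One rendering pass: reject everything at or in front of the last layer.
        candidates = [f for f in fragments if f[0] > last_depth]
        if not candidates:
            break
        nearest = min(candidates, key=lambda f: f[0])
        layers.append(nearest)
        last_depth = nearest[0]
    return layers

# Two fragments share depth 0.5; the strict depth test keeps only one of them.
frags = [(0.7, "blue"), (0.2, "red"), (0.5, "green"), (0.5, "yellow")]
peeled = depth_peel(frags, max_passes=8)
peeled_colors = [color for _, color in peeled]
```

In this sketch four fragments need three passes, and the second fragment at depth 0.5 is never extracted, which is the kind of accuracy loss the slide labels as z-fighting.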

Prior Art: Buffer-based Methods
– Fixed-size Arrays
    Ma: huge (most of it goes unused)
    Mc: fast
    G:
    – Commodity: K-buffer [Bavoil07], SRAB [Myers07]
        » A: 8 fragments per pixel
        » P: fast (possibly multi-pass)
    – Modern: FreePipe [Liu2010]
        » A: 100% if enough memory
        » P: fastest (single pass)

Prior Art: Buffer-based Methods
– Linked Lists [Yang10]
    A: 100% if enough memory
    P: fast (fragment congestion)
    Ma: high
    – if overflow: accurate reallocation (extra pass needed)
    – else: wasted memory
    Mc: low cache hit ratio
    G: only modern cards
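A minimal sequential sketch of per-pixel linked lists may help to see where the overflow and cache behaviour come from. It is an illustration only, not the implementation of [Yang10]: the names `build_linked_lists` and `walk`, the `(pixel, depth)` input layout, and the fixed `capacity` are assumptions. On a GPU the node allocation and the head update would be atomic operations; here they are plain list operations.

```python
def build_linked_lists(fragments, num_pixels, capacity):
    """Store fragments in one shared node pool with a head pointer per pixel.

    fragments: iterable of (pixel_index, depth) in arrival order (assumed layout).
    capacity:  size of the preallocated node pool (assumed fixed up front).
    Returns (head, nodes, overflowed).
    """
    head = [-1] * num_pixels   # index of the most recently stored node, -1 if none
    nodes = []                 # shared pool of (depth, next_index) entries
    overflowed = 0
    for pixel, depth in fragments:
        if len(nodes) >= capacity:
            overflowed += 1    # pool exhausted: this fragment is lost
            continue
        nodes.append((depth, head[pixel]))  # new node links to the old head
        head[pixel] = len(nodes) - 1        # stands in for an atomic exchange
    return head, nodes, overflowed

def walk(head, nodes, pixel):
    """Collect the depths stored for one pixel by following its list."""
    depths = []
    i = head[pixel]
    while i != -1:
        depth, i = nodes[i]
        depths.append(depth)
    return depths

frags = [(0, 0.5), (1, 0.3), (0, 0.1), (0, 0.9)]
head, nodes, overflowed = build_linked_lists(frags, num_pixels=2, capacity=8)
```

The nodes of pixel 0 end up at pool indices 0, 2, and 3, interleaved with those of pixel 1, which is consistent with the low cache hit ratio noted above. Choosing `capacity` too small drops fragments, and choosing it too large wastes memory.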

Prior Art: Buffer-based Methods
– Variable-length Arrays
    A: 100% if enough memory
    P: fast (2 passes needed)
    Ma: precise
    Mc: fast
    G:
    – Commodity:
        » PreCalc [Peeper08] (common prefix sum)
        » L-buffer [Lipowski10] (randomized prefix sum)

Example: PreCalc, L-buffer
[Figure: a Counter Buffer and the Memory Offsets derived from it by PreCalc and by L-buffer]
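The figure values did not survive in the transcript, so the following sketch uses made-up counts. It shows the general idea of turning a per-pixel counter buffer into memory offsets with an exclusive prefix sum, which is how the PreCalc entry ("common prefix sum") is read here. The function name `precalc_offsets` and the sample data are assumptions.

```python
from itertools import accumulate

def precalc_offsets(counts):
    """Exclusive prefix sum over per-pixel fragment counts.

    counts[p] is the number of fragments that fell on pixel p.
    offsets[p] is the index of pixel p's first slot in one packed array.
    """
    running = list(accumulate(counts, initial=0))
    return running[:-1], running[-1]   # (offsets, total number of fragments)

# Hypothetical counter buffer for six pixels, two of them empty.
counts = [2, 0, 3, 1, 0, 4]
offsets, total = precalc_offsets(counts)
```

Pixel `p` then owns slots `offsets[p]` through `offsets[p] + counts[p] - 1`, so the packed array needs exactly `total` entries and empty pixels cost nothing.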

S-buffer
1. Fragment Count Rendering Pass
    1. Number of fragments per pixel
    2. Total generated fragments
2. Memory Referencing
    – Parallelized randomized prefix sum
        S multiple shared counters:
        Simple hash function:
        Sequential prefix sum on shared counters:
    Inverse Mapping
    – Split into two groups:
    – Final memory offset:
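The formulas on this slide are missing from the transcript, so the sketch below is only one plausible reading of the memory-referencing step, without inverse mapping. It assumes the hash is the pixel index modulo S, which the slide does not confirm. The function name `sbuffer_offsets` and the sample counts are hypothetical, and the sequential loop stands in for pixels running in parallel with atomic adds.

```python
def sbuffer_offsets(counts, num_counters):
    """Assign a memory offset to every non-empty pixel using S shared counters.

    ASSUMPTION: hash(p) = p % S. The slide's hash formula is not in the transcript.
    """
    S = num_counters
    shared = [0] * S
    local = [0] * len(counts)
    for p, c in enumerate(counts):
        if c == 0:
            continue               # empty pixels reserve nothing
        h = p % S
        local[p] = shared[h]       # value an atomic add would return
        shared[h] += c
    # Sequential exclusive prefix sum, but over only S counters.
    base, running = [], 0
    for c in shared:
        base.append(running)
        running += c
    offsets = [base[p % S] + local[p] if c else None
               for p, c in enumerate(counts)]
    return offsets, running

# Hypothetical counter buffer for six pixels and S = 2 shared counters.
counts = [2, 0, 3, 1, 0, 4]
offsets, total = sbuffer_offsets(counts, num_counters=2)
```

Pixels that hash to the same counter are packed next to each other, so the order of regions in memory follows the counters rather than the pixel order. The only sequential work left is the prefix sum over S values.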

S-buffer
3. Fragment Storing Rendering Pass
4. Fragment Sorting
    – Insertion Sort
5. Resolve
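The sorting and resolve steps can be sketched for a single pixel as follows. The slide names insertion sort but does not say what the resolve step computes, so front-to-back alpha compositing is used here purely as an example of one application (transparency). The function names and the `(depth, gray, alpha)` fragment layout are assumptions.

```python
def insertion_sort_by_depth(frags):
    """Sort one pixel's fragments by ascending depth (nearest first)."""
    frags = list(frags)
    for i in range(1, len(frags)):
        key = frags[i]
        j = i - 1
        # Shift deeper fragments one slot to the right.
        while j >= 0 and frags[j][0] > key[0]:
            frags[j + 1] = frags[j]
            j -= 1
        frags[j + 1] = key
    return frags

def resolve_front_to_back(sorted_frags, background):
    """Composite (depth, gray, alpha) fragments over a gray background.

    ASSUMPTION: transparency is the target effect; other applications
    would resolve the sorted list differently.
    """
    color = 0.0
    transmittance = 1.0
    for _, gray, alpha in sorted_frags:
        color += transmittance * alpha * gray
        transmittance *= 1.0 - alpha
    return color + transmittance * background

frags = [(0.8, 1.0, 0.5), (0.2, 0.0, 0.5), (0.5, 0.5, 1.0)]
ordered = insertion_sort_by_depth(frags)
pixel = resolve_front_to_back(ordered, background=1.0)
```

Insertion sort is a reasonable fit for short per-pixel lists, where its quadratic worst case matters little and it needs no extra storage.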

Example: S-buffer(3)
[Figure: Counter Buffer and Local Address Buffer. Shared counters C(i) = 1, 6, 4. Prefix sums C_pr(i) = 0, 1, 7 with the resulting Memory Offsets. With inverse mapping, C_pr(i) = 0, 1, 0 with the resulting Memory Offsets.]
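The numbers in this example are consistent with the following reading, offered as an assumption rather than as the authors' exact formulation. The shared counters are split into two groups; the first group is prefix-summed forward from the start of the buffer and the second group backward from its end. The split point for an odd S, the address formula for the second group, and all function names below are hypothetical.

```python
def forward_prefix(shared):
    """Plain exclusive prefix sum over the shared counters."""
    pr, running = [], 0
    for c in shared:
        pr.append(running)
        running += c
    return pr

def inverse_prefix(shared):
    """Prefix sums for two groups: the first from the start, the second from the end.

    ASSUMPTION: with an odd number of counters the first group takes the extra one.
    """
    S = len(shared)
    split = (S + 1) // 2
    pr = [0] * S
    running = 0
    for i in range(split):                    # first group, forward
        pr[i] = running
        running += shared[i]
    running = 0
    for i in range(S - 1, split - 1, -1):     # second group, backward
        pr[i] = running
        running += shared[i]
    return pr, split

def final_address(counter, local, pr, split, total):
    """Map a slot (counter, local index) to an address in the packed buffer."""
    if counter < split:
        return pr[counter] + local
    return total - 1 - (pr[counter] + local)  # counted from the end of the buffer

shared = [1, 6, 4]                            # C(i) from the slide
total = sum(shared)
fwd = forward_prefix(shared)
inv, split = inverse_prefix(shared)
addresses = sorted(final_address(i, k, inv, split, total)
                   for i in range(len(shared)) for k in range(shared[i]))
```

Under this reading the two groups never collide, because together they fill the buffer from both ends, and the two half-length prefix sums do not depend on each other.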

Results: Time and Memory Efficiency
PreCalc_OpenCL
– Parallel implementation of prefix sum [NVIDIA SDK]
PreCalc_Fixed
– One rendering pass (fixed-size structure)
– Memory offsetting:
FreePipe_OpenGL
– CUDA-free implementation [Crassin10]
Advanced L-buffer
– S-buffer using only 1 shared counter
OpenGL 4.2 API, NVIDIA GTX

Results: Performance (70000 faces, 12 layers, viewport)
– Linked Lists: O(m), m (> n) = total fragments
– L-buffer: O(n), n = non-empty pixels
– S-buffer speed-up: n/S, S = shared counters
– PreCalc_OpenCL: OpenGL/OpenCL syncing time

Results: Performance ( faces, 25 layers, 55% sparsity)
– Different resolutions
– S-buffer = 85% of PreCalc_Fixed
– Forward vs Inverse Mapping

Results: Memory Allocation (25 depth layers)
– Fixed-size Arrays
    Wasted resources (88%)
    KB, SRAB: 30% less memory due to 8 fragments/pixel
– Linked Lists
    Extra memory for storing pointers to the next fragment
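The memory trade-offs listed above can be made concrete with a small back-of-the-envelope model. It does not reproduce the measured figures on the slide: the byte sizes, the per-pixel bookkeeping assumed for each scheme, the function name `memory_footprints`, and the sample counts are all assumptions for illustration.

```python
def memory_footprints(counts, max_layers, frag_bytes=8, ptr_bytes=4):
    """Estimate bytes needed by three storage schemes for the same fragments.

    counts:     fragments per pixel (hypothetical data).
    max_layers: per-pixel capacity reserved by the fixed-size scheme.
    ASSUMPTIONS: 8-byte fragments, 4-byte pointers/offsets/counters,
    and a linked-list pool sized exactly to the fragment total.
    """
    pixels = len(counts)
    total = sum(counts)
    fixed = pixels * max_layers * frag_bytes
    # Each node carries a next pointer; each pixel carries a head pointer.
    linked = total * (frag_bytes + ptr_bytes) + pixels * ptr_bytes
    # Packed array plus one counter and one offset per pixel.
    variable = total * frag_bytes + pixels * 2 * ptr_bytes
    wasted_fixed = 1.0 - (total * frag_bytes) / fixed
    return fixed, linked, variable, wasted_fixed

# Eight pixels, half of them empty, one reaching the 25-layer maximum.
counts = [0, 0, 25, 3, 0, 6, 0, 2]
fixed, linked, variable, wasted_fixed = memory_footprints(counts, max_layers=25)
```

With sparse pixels and a few deep ones, most of the fixed-size allocation stays unused, while the linked-list scheme pays a pointer per fragment on top of the payload.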

Conclusions
S-buffer
– GPU-accelerated A-buffer
    Fragment distribution and pixel sparsity
    Parallelism – Inverse Mapping
OpenGL Pipeline Limitations
– Additional rendering pass
– Unbounded storage requirements and per-pixel post-sorting
– OpenGL 4.2
Future Work
– Tessellation
– History-based

Thank You - Questions?
Source Code Available at:

Notes
# shared counters
GeForce 480 GTX – 35 multiprocessors
OpenCL prefix sum from NVIDIA SDK
– 256 threads [16,16] ?

Results: Performance - Memory Referencing
– Inverse Mapping
– OpenGL/OpenCL interoperability