IIIT, Hyderabad Ray Tracing Dynamic Scenes on the GPU Sashidhar Guntury Centre for Visual Information Technology IIIT, Hyderabad. India.

Slides:

Advertisements

Similar presentations

Sven Woop Computer Graphics Lab Saarland University

Advertisements

Christian Lauterbach COMP 770, 2/16/2009. Overview  Acceleration structures  Spatial hierarchies  Object hierarchies  Interactive Ray Tracing techniques.

COMPUTER GRAPHICS CS 482 – FALL 2014 NOVEMBER 10, 2014 GRAPHICS HARDWARE GRAPHICS PROCESSING UNITS PARALLELISM.

Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.

Physically Based Real-time Ray Tracing Ryan Overbeck.

IIIT Hyderabad Hybrid Ray Tracing and Path Tracing of Bezier Surfaces using a mixed hierarchy Rohit Nigam, P. J. Narayanan CVIT, IIIT Hyderabad, Hyderabad,

Ray Tracing CMSC 635. Basic idea How many intersections?  Pixels  ~10 3 to ~10 7  Rays per Pixel  1 to ~10  Primitives  ~10 to ~10 7  Every ray.

A Coherent Grid Traversal Algorithm for Volume Rendering Ioannis Makris Supervisors: Philipp Slusallek*, Céline Loscos *Computer Graphics Lab, Universität.

Week 10 - Monday.  What did we talk about last time?  Global illumination  Shadows  Projection shadows  Soft shadows.

High-Quality Parallel Depth-of- Field Using Line Samples Stanley Tzeng, Anjul Patney, Andrew Davidson, Mohamed S. Ebeida, Scott A. Mitchell, John D. Owens.

Two-Level Grids for Ray Tracing on GPUs

I3D Fast Non-Linear Projections using Graphics Hardware Jean-Dominique Gascuel, Nicolas Holzschuch, Gabriel Fournier, Bernard Péroche I3D 2008.

Cost-based Workload Balancing for Ray Tracing on a Heterogeneous Platform Mario Rincón-Nigro PhD Showcase Feb 17 th, 2012.

Experiences with Streaming Construction of SAH KD Trees Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek.

Rasterization and Ray Tracing in Real-Time Applications (Games) Andrew Graff.

Computer Graphics (Fall 2005) COMS 4160, Lecture 21: Ray Tracing

1 View Coherence Acceleration for Ray Traced Animation University of Colorado at Colorado Springs Master’s Thesis Defense by Philip Glen Gage April 19,

3D Graphics Processor Architecture Victor Moya. PhD Project Research on architecture improvements for future Graphic Processor Units (GPUs). Research.

Specialized Acceleration Structures for Ray-Tracing Warren Hunt Bill Mark.

Final Gathering on GPU Toshiya Hachisuka University of Tokyo Introduction Producing global illumination image without any noise.

RT08, August ‘08 Large Ray Packets for Real-time Whitted Ray Tracing Ryan Overbeck Columbia University Ravi Ramamoorthi Columbia University William R.

Erdem Alpay Ala Nawaiseh. Why Shadows? Real world has shadows More control of the game’s feel  dramatic effects  spooky effects Without shadows the.

Ray Tracing Primer Ref: SIGGRAPH HyperGraphHyperGraph.

Computer Graphics 2 Lecture x: Acceleration Techniques for Ray-Tracing Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.

Interactive Ray Tracing: From bad joke to old news David Luebke University of Virginia.

Ray Tracing and Photon Mapping on GPUs Tim PurcellStanford / NVIDIA.

Technology and Historical Overview. Introduction to 3d Computer Graphics  3D computer graphics is the science, study, and method of projecting a mathematical.

1 Speeding Up Ray Tracing Images from Virtual Light Field Project ©Slides Anthony Steed 1999 & Mel Slater 2004.

Cg Programming Mapping Computational Concepts to GPUs.

Stefan PopovHigh Performance GPU Ray Tracing Real-time Ray Tracing on GPU with BVH-based Packet Traversal Stefan Popov, Johannes Günther, Hans- Peter Seidel,

On a Few Ray Tracing like Algorithms and Structures. -Ravi Prakash Kammaje -Swansea University.

Real-Time Rendering SPEEDING UP RENDERING Lecture 04 Marina Gavrilova.

Introduction to Parallel Rendering Jian Huang, CS 594, Spring 2002.

Global Illumination with a Virtual Light Field Mel Slater Jesper Mortensen Pankaj Khanna Insu Yu Dept of Computer Science University College London

1 Real-time visualization of large detailed volumes on GPU Cyril Crassin, Fabrice Neyret, Sylvain Lefebvre INRIA Rhône-Alpes / Grenoble Universities Interactive.

Saarland University, Germany B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek.

Fast BVH Construction on GPUs (Eurographics 2009) Park, Soonchan KAIST (Korea Advanced Institute of Science and Technology)

Binary Space Partitioning Trees Ray Casting Depth Buffering

Department of Computer Science 1 Beyond CUDA/GPUs and Future Graphics Architectures Karu Sankaralingam University of Wisconsin-Madison Adapted from “Toward.

David Luebke11/26/2015 CS 551 / 645: Introductory Computer Graphics David Luebke

Sample Based Visibility for Soft Shadows using Alias-free Shadow Maps Erik Sintorn – Ulf Assarsson – uffe.

Hierarchical Penumbra Casting Samuli Laine Timo Aila Helsinki University of Technology Hybrid Graphics, Ltd.

Computer Graphics II University of Illinois at Chicago Volume Rendering Presentation for Computer Graphics II Prof. Andy Johnson By Raj Vikram Singh.

- Laboratoire d'InfoRmatique en Image et Systèmes d'information

Memory Management and Parallelization Paul Arthur Navrátil The University of Texas at Austin.

Emerging Technologies for Games Deferred Rendering CO3303 Week 22.

Interactive Ray Tracing of Dynamic Scenes Tomáš DAVIDOVIČ Czech Technical University.

Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful Jiri Bittner 1, Michael Wimmer 1, Harald Piringer 2, Werner Purgathofer 1 1 Vienna.

Pure Path Tracing: the Good and the Bad Path tracing concentrates on important paths only –Those that hit the eye –Those from bright emitters/reflectors.

Ray Tracing Fall, Introduction Simple idea  Forward Mapping  Natural phenomenon infinite number of rays from light source to object to viewer.

Discontinuous Displacement Mapping for Volume Graphics, Volume Graphics 2006, July 30, Boston, MA Discontinuous Displacement Mapping for Volume Graphics.

Compact, Fast and Robust Grids for Ray Tracing Ares Lagae & Philip Dutré 19 th Eurographics Symposium on Rendering EGSR 2008Wednesday, June 25th.

Compact, Fast and Robust Grids for Ray Tracing

COMPUTER GRAPHICS CS 482 – FALL 2015 SEPTEMBER 29, 2015 RENDERING RASTERIZATION RAY CASTING PROGRAMMABLE SHADERS.

Real-Time Dynamic Shadow Algorithms Evan Closson CSE 528.

Ray Tracing Optimizations

David Luebke 3/5/2016 Advanced Computer Graphics Lecture 4: Faster Ray Tracing David Luebke

David Luebke3/12/2016 Advanced Computer Graphics Lecture 3: More Ray Tracing David Luebke

Path/Ray Tracing Examples. Path/Ray Tracing Rendering algorithms that trace photon rays Trace from eye – Where does this photon come from? Trace from.

SHADOW CASTER CULLING FOR EFFICIENT SHADOW MAPPING JIŘÍ BITTNER 1 OLIVER MATTAUSCH 2 ARI SILVENNOINEN 3 MICHAEL WIMMER 2 1 CZECH TECHNICAL UNIVERSITY IN.

Fabianowski · DinglianaInteractive Global Photon Mapping1 / 22 Interactive Global Photon Mapping Bartosz Fabianowski · John Dingliana Trinity College Dublin.

Real-Time ray casting for virtual reality

Hybrid Ray Tracing and Path Tracing of Bezier Surfaces using a mixed hierarchy Rohit Nigam, P. J. Narayanan CVIT, IIIT Hyderabad, Hyderabad, India.

Two-Level Grids for Ray Tracing on GPUs

Graphics Processing Unit

Real-Time Ray Tracing Stefan Popov.

Hybrid Ray Tracing of Massive Models

© University of Wisconsin, CS559 Fall 2004

GEARS: A General and Efficient Algorithm for Rendering Shadows

Real-time Global Illumination with precomputed probe

Presentation transcript:

IIIT, Hyderabad Ray Tracing Dynamic Scenes on the GPU Sashidhar Guntury Centre for Visual Information Technology IIIT, Hyderabad. India

IIIT, Hyderabad Raytracing? Raytracing is a technique to synthesize image by simulating light travel in real world For every pixel, a ray is shot to see if it intersects any geometry in the world – If it does, what is its color? Computationally very expensive! Different from rasterization which determines the geometry visible from camera

IIIT, Hyderabad Acceleration Datastructures Process of raytracing can be speeded up using Acceleration data structures – BVH, Kd-tree, Grids are popular Exploit the fact that rays in a scene have a fixed direction and move in straight lines Often groups of rays have same direction which is call coherence Coherence very important for using the SIMD of multicores to speedup tracing

IIIT, Hyderabad Hierarchical Structures Kd-tree - Divide the space according to some cost metric BVH – bounding volumes to encompass geometry Costly to build on GPU

IIIT, Hyderabad Grids Very easy to build. Bin triangles in a voxel in world space Cheap to build on GPU Not as versatile as BVH or Kd-tree

IIIT, Hyderabad GPU computing model GPU are fine grained multicore architectures – To abstract the hardware, tasks are divided into groups and blocks. – Threads in a block share data among themselves – Large global memory but slow – Fast shared memory but confined to a block – Cache in recent architectures Threads are scheduled batches of 32 (warp) Lots of data movement!

IIIT, Hyderabad Objective Interactive ray-tracing using commodity processors GPUs suitable with high compute power Render a million triangles to a million pixels at interactive rates – better grid datastructure – shadows – reflection rays Typical game scene

IIIT, Hyderabad Related Work Patidar08:primary rays on GPU – Perspective grids and fast sorting – 20 fps on a 1M model, 100 on 70K Kalojanov09: improved uniform grid construction – sort cell-triangle pair list Wald 06:traversing grid in slice wise manner on multicore CPU – Great for small SIMD architectures Hunt 08: building acceleration structure for each light source – perspective grids

IIIT, Hyderabad What do we do? Achieving acceptable frame rates – Fast traversal (processing) of rays capitalize on the behavior of the rays Increase coherence – Minimize costly data transfers bundle rays as packets Deformable or dynamic scenes – Fast build time for the acceleration structure Perspective grids: cheap and coherent

IIIT, Hyderabad Perspective grids for Primary Rays Rays which are spatially similar always remain together – Share same data and traversal Keep them together as packets. Cost amortized by bringing triangle data in one go. Bundle as many rays in a packet, as the GPU has wide SIMD Sorting is the key step Z buffer style early termination

IIIT, Hyderabad Ideas from Rasterization Primary rays need not have all triangle data in the space Can get rid of substantial geometry by culling back face geometry as the grid is specialized for primary rays Same philosophy for view frustum culling

IIIT, Hyderabad Indirect Mapping Use small tiles and voxels for efficiency – Sorting is cheap – Fewer triangles to be considered for intersection But not too small for the SIMD width – GPUs have large SIMD width – Inefficient if ray packet size less than SIMD width A hybrid solution: – Sort to small tiles – Trace multiple such tiles together for coherence

IIIT, Hyderabad Mapping of Tiles k x k tiles processed together for tile traversal K = 2 speeds up traversal – sort to 256 x 256 tiles and trace to 128 x 128 tiles Spatially close rays reuse data to allow better performance – Shared memory partitioned among tiles – Triangles queried multiple time loaded directly from L1 cache => better performance Sorting TilesRendering Blocks

IIIT, Hyderabad Results Marked difference with these two simple techniques Lesser triangles => lesser intersection checks => savings made in tracing step as well

IIIT, Hyderabad Results

IIIT, Hyderabad Ray Tracing with Shadows Lauterbach 09:BVH Construction – Performance showed for shadows – 14 fps for 92k model (2 nd generation hardware) Hou09: kd-tree construction – costly to build for large models – More suitable for static scenes as building is costly – 5.26 fps for 178k model (1 st generation hardware) Hunt 08:build grid for every frame from light point of view. – On multicores

IIIT, Hyderabad Properties of Shadow Rays Exhibit spatial coherence BVH and Kd-tree take advantage – Grids don’t use this coherence Wald et. Al proposed a method to group rays in a frustum and then traverse the voxels in this frustum – Good for small SIMD. GPU has wide SIMD width – Problems near silhouttes – Grid traversal time still large

IIIT, Hyderabad Rebuilding datastructure Forwarded by Hunt and Mark 2008 – Build grid datastructure from the point of view of every light source – Very fast compared to BVH or Kd- tree Specialized datastructure to take advantage of pinpoint coherence of shadow rays

IIIT, Hyderabad Handling shadow rays? Seen from primary rays’ grid, shadow rays have arbitrary directions Building a grid from point of view of light will make shadow rays coherent Treat like primary rays only – Get the tile for the ray – Traverse along the depth direction – Stop when all rays in the tile are done

IIIT, Hyderabad Mapping of rays to Tiles For primary rays, tiles are determined using pixel values For shadow rays, use intersection – For every ray, get the intersection point with near plane of the grid – Get the tile that the point is part of – Associate ray with this tile

IIIT, Hyderabad Reordering Rays by Tiles Shadow rays are indexed according to the pixel ID of the primary rays Determine the tile each ray falls in – a ray-tile pair array – currently sorted by ray ID Bring all the rays of the same tile together – Processed together – Sort on tile ID Use GPU hardware better with reordering Buid DS Reorder Rays Load Balancing Traverse Rays

IIIT, Hyderabad Load Balancing Distribution of light rays in tiles not uniform – dependent on light, scene – some tiles excessively populated Devote one tile to a CUDA block – inefficient on GPU due to mismatched loads Distribute the load into smaller, independent work units Build DS Reorder Rays Load Balancing Traverse Rays

IIIT, Hyderabad What if light in inside the model? – Typical for architectural models Define a spherical coordinate system for light – Build the frustum only for the necessary range Pole singularity – duplication of triangles along the pole – Good choice of forward direction Handling more scenes

IIIT, Hyderabad Timings

IIIT, Hyderabad Position of Light Since the grid is built from the point of view of light, its distance can change things – More rays get bundled in same tile leading to disparities in population of cells Load balancing helps us in keeping the number of rays at a threshold level

IIIT, Hyderabad Comparision with SBVH Almost in line with the performance of SBVH inspite of low grid build time SBVH build time large and eats into the advantage of faster shadow computation Number of lights needed so that SBVH performance may be same as Grid

IIIT, Hyderabad Sample Images

IIIT, Hyderabad Reflection Rays Representative of general secondary rays Much harder to exploit coherence than shadow rays – Spread in arbitrary directions – Can not afford to build as many datastructures as the directions Perspective grid not at all useful – Fall back on traditional grids, Kalojanov et. al – No BFC, VFC

IIIT, Hyderabad Independent Voxel Walk Find the voxel for the ray (in parallel) and check for intersection. – Each ray functions independently – Coherence not set explicitly Different voxels => large number of misses in cache Trust the accidental coherence of the scene IVW benefits from cache if multiple rays check in same voxel – L2 cache fairly large – Attractive as cache mechanisms evolve

IIIT, Hyderabad Enforced Coherence Explicitly make the rays coherent – Non-trivial – Time taken can impact the performance Take rays and make a list of voxels they traverse Sort them by voxel ID to get all rays in a voxel together – Ordering lost – Have to check all voxels. No early termination  – Atomic operation to compute earliest intersection

IIIT, Hyderabad Results Three models were used – Buddha (1.09M tris) :: geometry clumped together Finely tesselated => varying normals – Fairy in Forest (176K tris) :: mostly sparse with dense areas Lot of varying normals – Conference (252K tris) :: well distributed geometry Lots of flat surfaces which makes normals aligned and coherent

IIIT, Hyderabad EC slower on GTX 480 as IVW is exploiting cache very well Times reversed on GTX 280 as it does not have cache and penalizes scenes with arbitrary normals

IIIT, Hyderabad Performance of Buddha IVW has early exit for rays Least number of iterations for Buddha (80% done in about 80 iterations) – Mostly because we are taking self reflection – All geometry densely packed

IIIT, Hyderabad Voxel Concentration Higher rays in a voxel, more times the same data is used among CUDA blocks => better L2 usage Conference scene has better voxel concentration

IIIT, Hyderabad Voxel Divergence Lesser the voxel divergence, more data is reused in a CUDA block => L1 cache used well! Conference manages to keep its voxel divergence low for more tiles which leads to better performance

IIIT, Hyderabad Comparision with SBVH Grids for reflection rays are largely inefficient However SBVH construction time really huge and that preprocessing doesn’t show here Gains can be made in dynamic scene

IIIT, Hyderabad Timings for a pass SBVH method gains 25 ms for every reflection pass If GPU SBVH build is 40 times faster than on CPU, it’ll take 35 passes to catch up with grid performance.

IIIT, Hyderabad Dynamic scenes Now SBVH has to be built every frame SBVH time atleast as much as HBVH (160 ms)

IIIT, Hyderabad Contributions Modified datastructure to eliminate triangles that do not contribute to raytracing Introduced indirect mapping to exploit faster sorting times and the hardware of GPU Fast shadow tracing by extending perspective grids to shadow rays Used spherical grid mapping to accommodate lights inside a scene Load balancing to distribute unevenly spread shadow rays evenly for better processing Proposed Enforced Coherence (EC) method to gather rays and treat them together Modified the load balancing scheme of shadow rays to achieve equitable distribution for processing

IIIT, Hyderabad Conclusions Coherence is most important on wide SIMD architectures Grids well suited to dynamic scenes – Cheap; the datastructure can be built every frame – Tree structures costly to create Perspective grids provide coherence for primary rays – Secondary rays are equally bad as with BVH, Kd-tree Indirect mapping to match work load to architecture – Sorting is fast and can reduce triangle intersections – Match work with SIMD width Spherical space treats point and spotlights coherently

IIIT, Hyderabad More Conclusions Most work done till now looks at raytracing in an isolated fashion – Moderate or inferior datastructures are easy to build but costly to traverse – Good traversal needs better datastructures => more time to build them Our method strikes a good deal with both benchmarks Use different datastructures for different kinds of rays

IIIT, Hyderabad Future Work Grid resolution is not impervious to problems – 128x128x128 resolution for a long thin object… Extending the method to area light sources Examining the work in the context of higher level algorithms like gathering and point based color bleeding

IIIT, Hyderabad Publications Raytracing Dynamic Scenes with Shadows on GPU Sashidhar Guntury and P. J. Narayanan Eurographics Symposium on Parallel Graphics and Visualization, 2010 (EG-PGV 2010) Norrkoping, Sweden (the paper was adjudged best paper in the conference) Raytracing Dynamic Scenes on GPU Sashidhar Guntury and P. J. Narayanan IEEE Transactions on Visualization and Computer Graphics

IIIT, Hyderabad Thank you!

IIIT, Hyderabad Any Questions?

IIIT, Hyderabad Reordering Rays by Tiles Shadow rays are indexed according to the pixel ID of the primary rays Determine the tile each ray falls in – a ray-tile pair array – currently sorted by ray ID Bring all the rays of the same tile together – Processed together – Sort on tile ID Use GPU hardware better with reordering! Buid DS Reorder Rays Load Balancing Traverse Rays

IIIT, Hyderabad …… …… … … … … sorting Ray ID Tile ID Buid DS Reorder Rays Load Balancing Traverse Rays

IIIT, Hyderabad Load Balancing Distribution of light rays in tiles not uniform – dependent on light, scene – some tiles excessively populated Devote one tile to a CUDA block – inefficient on GPU due to mismatched loads Distribute the load into smaller, independent work units Build DS Reorder Rays Load Balancing Traverse Rays

IIIT, Hyderabad Load Balancing Black Array: A linear index of each ray Blue Array: Ray ID; Colored Array: Tile ID Find the number of rays mapping to each tile using segmented scan segmented scan NUMBER OF RAYS Build DS Reorder Rays Load Balancing Traverse Rays

IIIT, Hyderabad Hard and soft chunks Rays belonging to same tile form a hard chunk According to a threshold number rays in a hard chunk are broken into soft chunks Load is distributed in a equitable fashion

IIIT, Hyderabad Load Balancing thresholding step get the soft boundaries.. Before thresholding: Number of rays belonging to each tile After thresholding: Number of rays belonging to a CUDA work unit Build DS Reorder Rays Load Balancing Traverse Rays

IIIT, Hyderabad Hard and soft chunks Rays belonging to same tile form a hard chunk According to a threshold number rays in a hard chunk are broken into soft chunks Load is distributed in a equitable fashion

IIIT, Hyderabad Load Balancing Green Array: Offset of rays belonging to each rendering block/tile Difference of two consecutive locations: number of rays stream compaction step TOTAL NUMBER OF BLOCKS Build DS Reorder Rays Load Balancing Traverse Rays

IIIT, Hyderabad Spherical Mapping at Work Build DS Reorder Rays Load Balancing Traverse Rays Create a light frustum in alpha-theta space Measure alpha from forward direction in forward- right plane Measure alpha from up forward direction in forward- up plane Pole singularity – duplication of triangles along the pole – Good choice of forward direction

IIIT, Hyderabad Frustum Clamping Get bounding box of angles of all the shadow rays Include those triangles which fall in the bounding box – Frustum clamping User defined Clamp values can implement spotlights Build a perspective grid with respect to the light source Build DS Reorder Rays Load Balancing Traverse Rays