GPU Cost Estimation for Load Balancing in Parallel Ray Tracing Biagio Cosenza* Universität Innsbruck Austria Carsten Dachsbacher Karlsruhe Institute of.

Slides:



Advertisements
Similar presentations
Visible-Surface Detection(identification)
Advertisements

Dynamic Power Redistribution in Failure-Prone CMPs Paula Petrica, Jonathan A. Winter * and David H. Albonesi Cornell University *Google, Inc.
By D. Fisher Geometric Transformations. Reflection, Rotation, or Translation 1.
Spectral Analysis of Function Composition and Its Implications for Sampling in Direct Volume Visualization Steven Bergner GrUVi-Lab/SFU Torsten Möller.
Comparative Visualization for Wave-based and Geometric Acoustics Eduard Deines 1, Martin Bertram 3, Jan Mohring 4, Jevgenij Jegorovs 4, Frank Michel 1,
LOD Map – A Visual Interface for Navigating Multiresolution Volume Visualization Chaoli Wang and Han-Wei Shen The Ohio State University Presented at IEEE.
Exploded Views for Volume Data Stefan Bruckner, M. Eduard Gröller Institute of Computer Graphics and Algorithms Vienna University of Technology.
15.1 Si23_03 SI23 Introduction to Computer Graphics Lecture 15 – Visible Surfaces and Shadows.
GR2 Advanced Computer Graphics AGR
ENV 2006 CS4.1 Envisioning Information: Case Study 4 Focus and Context for Volume Visualization.
13.1 si31_2001 SI31 Advanced Computer Graphics AGR Lecture 13 An Introduction to Ray Tracing.
Efficient Distributed Load Balancing for Parallel Algorithms Biagio Cosenza Dipartimento di Informatica Università di Salerno, Italy Supervisor Prof. Vittorio.
Accelerating Real-Time Shading with Reverse Reprojection Caching Diego Nehab 1 Pedro V. Sander 2 Jason Lawrence 3 Natalya Tatarchuk 4 John R. Isidoro 4.
Simulating Decorative Mosaics Alejo Hausner University of Toronto [SIGGRAPH2001]
Shredder GPU-Accelerated Incremental Storage and Computation
5 August, 2014 Martijn v/d Horst, TU/e Computer Science, System Architecture and Networking 1 Martijn v/d Horst
1 05/10/2014 Computer Graphics Lecture 10 Global Illumination 1: Ray Tracing and Radiosity Taku Komura.
Real-time Shading with Filtered Importance Sampling
Bidirectional Photon Mapping Jiří Vorba Charles University in Prague Faculty of Mathematics and Physics 1.
An Optimized Soft Shadow Volume Algorithm with Real-Time Performance Ulf Assarsson 1, Michael Dougherty 2, Michael Mounier 2, and Tomas Akenine-Möller.
On Dynamic Load Balancing on Graphics Processors Daniel Cederman and Philippas Tsigas Chalmers University of Technology.
Technische Universität München Fakultät für Informatik Computer Graphics SS 2014 Sampling Rüdiger Westermann Lehrstuhl für Computer Graphik und Visualisierung.
Group Meeting Presented by Wyman 10/14/2006
Tarun Bansal*, Karthik Sundaresan+,
CS123 | INTRODUCTION TO COMPUTER GRAPHICS Andries van Dam © 1/16 Deferred Lighting Deferred Lighting – 11/18/2014.
Sven Woop Computer Graphics Lab Saarland University
Minimum Vertex Cover in Rectangle Graphs
Christian Lauterbach COMP 770, 2/16/2009. Overview  Acceleration structures  Spatial hierarchies  Object hierarchies  Interactive Ray Tracing techniques.
Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.
Ray tracing. New Concepts The recursive ray tracing algorithm Generating eye rays Non Real-time rendering.
Graphics Pipeline.
IIIT Hyderabad Hybrid Ray Tracing and Path Tracing of Bezier Surfaces using a mixed hierarchy Rohit Nigam, P. J. Narayanan CVIT, IIIT Hyderabad, Hyderabad,
Ray Tracing CMSC 635. Basic idea How many intersections?  Pixels  ~10 3 to ~10 7  Rays per Pixel  1 to ~10  Primitives  ~10 to ~10 7  Every ray.
A Coherent Grid Traversal Algorithm for Volume Rendering Ioannis Makris Supervisors: Philipp Slusallek*, Céline Loscos *Computer Graphics Lab, Universität.
Ray Tracing & Radiosity Dr. Amy H. Zhang. Outline  Ray tracing  Radiosity.
Two-Level Grids for Ray Tracing on GPUs
Cost-based Workload Balancing for Ray Tracing on a Heterogeneous Platform Mario Rincón-Nigro PhD Showcase Feb 17 th, 2012.
Experiences with Streaming Construction of SAH KD Trees Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek.
Paper Presentation - An Efficient GPU-based Approach for Interactive Global Illumination- Rui Wang, Rui Wang, Kun Zhou, Minghao Pan, Hujun Bao Presenter.
EFFICIENT RENDERING LARGE TERRAINS USING MULTIRESOLUTION MODELLING AND IMAGE PROCESSING TECHNIQUES Ömer Nebil YAVEROĞLU Department of Computer Engineering.
Real-Time Rendering Paper Presentation Imperfect Shadow Maps for Efficient Computation of Indirect Illumination T. Ritschel T. Grosch M. H. Kim H.-P. Seidel.
Memory-Savvy Distributed Interactive Ray Tracing David E. DeMarle Christiaan Gribble Steven Parker.
Computer Graphics (Fall 2005) COMS 4160, Lecture 16: Illumination and Shading 1
Computer Graphics (Fall 2005) COMS 4160, Lecture 21: Ray Tracing
(conventional Cartesian reference system)
IN4151 Introduction 3D graphics 1 Introduction to 3D computer graphics part 2 Viewing pipeline Multi-processor implementation GPU architecture GPU algorithms.
Enhancing and Optimizing the Render Cache Bruce Walter Cornell Program of Computer Graphics George Drettakis REVES/INRIA Sophia-Antipolis Donald P. Greenberg.
Introduction to Parallel Rendering: Sorting, Chromium, and MPI Mengxia Zhu Spring 2006.
RT08, August ‘08 Large Ray Packets for Real-time Whitted Ray Tracing Ryan Overbeck Columbia University Ravi Ramamoorthi Columbia University William R.
Ray Tracing and Photon Mapping on GPUs Tim PurcellStanford / NVIDIA.
Technology and Historical Overview. Introduction to 3d Computer Graphics  3D computer graphics is the science, study, and method of projecting a mathematical.
Introduction to Parallel Rendering Jian Huang, CS 594, Spring 2002.
Global Illumination with a Virtual Light Field Mel Slater Jesper Mortensen Pankaj Khanna Insu Yu Dept of Computer Science University College London
Tone Mapping on GPUs Cliff Woolley University of Virginia Slides courtesy Nolan Goodnight.
Saarland University, Germany B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek.
Interactive Rendering With Coherent Ray Tracing Eurogaphics 2001 Wald, Slusallek, Benthin, Wagner Comp 238, UNC-CH, September 10, 2001 Joshua Stough.
Fast BVH Construction on GPUs (Eurographics 2009) Park, Soonchan KAIST (Korea Advanced Institute of Science and Technology)
1 Perception and VR MONT 104S, Fall 2008 Lecture 21 More Graphics for VR.
Parallelization of likelihood functions for data analysis Alfio Lazzaro CERN openlab Forum on Concurrent Programming Models and Frameworks.
- Laboratoire d'InfoRmatique en Image et Systèmes d'information
Memory Management and Parallelization Paul Arthur Navrátil The University of Texas at Austin.
Pure Path Tracing: the Good and the Bad Path tracing concentrates on important paths only –Those that hit the eye –Those from bright emitters/reflectors.
Ray Tracing Fall, Introduction Simple idea  Forward Mapping  Natural phenomenon infinite number of rays from light source to object to viewer.
02/12/03© 2003 University of Wisconsin Last Time Intro to Monte-Carlo methods Probability.
COMPUTER GRAPHICS CS 482 – FALL 2015 SEPTEMBER 29, 2015 RENDERING RASTERIZATION RAY CASTING PROGRAMMABLE SHADERS.
Siggraph 2009 RenderAnts: Interactive REYES Rendering on GPUs Kun Zhou Qiming Hou Zhong Ren Minmin Gong Xin Sun Baining Guo JAEHYUN CHO.
Real-Time Soft Shadows with Adaptive Light Source Sampling
Real-Time Ray Tracing Stefan Popov.
Hybrid Ray Tracing of Massive Models
GEARS: A General and Efficient Algorithm for Rendering Shadows
Presentation transcript:

GPU Cost Estimation for Load Balancing in Parallel Ray Tracing Biagio Cosenza* Universität Innsbruck Austria Carsten Dachsbacher Karlsruhe Institute of Technology Germany Ugo Erra Università della Basilicata Italy

GRAPP, Barcelona (Spain), February, 2013 Outline Motivation Parallel Ray Tracing Generating the Cost Map Exploiting the Cost Map for Load Balancing – SAT Adaptive Tiling – SAT Sorting Results

GRAPP, Barcelona (Spain), February, 2013 Ray Tracing Algorithms

GRAPP, Barcelona (Spain), February, 2013 Ray Tracing Algorithms Toasters, Whitted 8. 4M primary rays Kalabasha Temple, path tracing 134.2M primary rays

GRAPP, Barcelona (Spain), February, 2013 Motivation Load balacing is the major challenge of parallel ray tracing on distributed memory system

GRAPP, Barcelona (Spain), February, 2013 Motivation Load balacing is the major challenge of parallel ray tracing on distributed memory system Cost mapRay traced image

GRAPP, Barcelona (Spain), February, 2013 Motivation Can we generate a cost map before rendering? Can we exploit it to improve load balancing? Cost mapRay traced image

GRAPP, Barcelona (Spain), February, 2013 Where do we have expensive areas?

GRAPP, Barcelona (Spain), February, 2013 Previous work Ray tracing on distributed memory systems – DSM systems (DeMarle et al. 2005, Ize et al. 2011) – Hybrid CPU/GPU system (Budge et al. 2009) – Massive models (Wald et al. 2004, Dietrich et al. 2007) Rendering cost evaluation – Profiling (Gillibrand et al. 2006) – Scene geometry decomposition (Reinhard et al. 1998) – Estimate primitive intersections (Mueller et al. 1995)

GRAPP, Barcelona (Spain), February, 2013 Parallel Ray Tracing Image-space parallelization Scenes that cannot be interactively rendered on a single machine Target hardware – 1 Master/visualization node (with GPU) – 16 Worker nodes (multi-core CPUs)

GRAPP, Barcelona (Spain), February, 2013 Exploiting Parallelism in Ray Tracing Our approach – Vectorial parallelism Intel SSE instruction set (ray packets) – Multi-threading parallelism pthread – Distributed memory parallelism MPI

GRAPP, Barcelona (Spain), February, 2013 Exploiting Parallelism in Ray Tracing Our approach – Vectorial parallelism Intel SSE instruction set (ray packets) – Multi-threading parallelism pthread – Distributed memory parallelism MPI critical for performance!

GRAPP, Barcelona (Spain), February, 2013 Our approach 1.Compute a per-pixel, image-based estimate of the rendering cost, called cost map 2.Use the cost map for subdivision and/or scheduling in order to balance the load between workers 3.A dynamic load balancing scheme improves balancing after the initial tiles assignment

GRAPP, Barcelona (Spain), February, 2013 Generate the Cost Map Image-space Sampling diffuse specular multiple reflections

Cost Map Generation Algorithm For each pixel 1.Add a basic shader cost for the hit surface 2.If reflective, compute sampling 1.Compute the sampling pattern 2.For each sample, do a visibility check by using the reflection cone 1.If visible, add the cost of the reflected surface 2.If the hit surfaces is reflective, add a penalty cost 3.If the reflection cone of the hit surface hit the first one, add a penalty cost 3.Gather samples contributions 3.Use a edge detection filter to raise the cost of border areas Ray-packets falling there need to be split and have a higher cost (remarkable only for Whitted)

GRAPP, Barcelona (Spain), February, 2013 Sampling Pattern At rst, an uniformly distributed set of points is generated (a) According to the shading properties, the pattern is scaled (b) and translated to the origin (c) Lastly, it is transformed according to the projection of the reection vector in the image plane (d)

GRAPP, Barcelona (Spain), February, 2013 Cost Map Results (1) A comparison of the real cost (left) and our GPU-based cost estimate (right)

Cost Map Results (2)

Limitations and Error Analysis Off-screen geometry problem (under-estimation due to secondary rays falling out of the screen) Low sampling rate (under-estimation) Cost estimation error analysis – (+) Cornell scene: 86% of the predictions fall in the first approximation interval (+/5% of the real packet time) – (-) Ekklesiasterion scene: 57%

GRAPP, Barcelona (Spain), February, 2013 Exploiting the Cost Map Our goal is to improve load balancing How can we use a cost map estimation for that? We proposed two approaches – SAT Adaptive Tiling – SAT Sorting

SAT Adaptive Tiling Adaptive subdivision of the image into tiles of roughly equal cost

SAT Adaptive Tiling We use a Summed Area Table (SAT) for fast cost evaluation of a tile – SAT allows us to compute the sum of values in a rectangular region of an image in constant time Details in (Crow & Franklin in SIGGRAPH 84) and (Hensley et al. 2005) for the GPU implementation SAT Adaptive Tiling – Adaptive subdivision of the image – Weighted kd-tree split using the SAT to locate the optimal splits – Details given in the paper

GRAPP, Barcelona (Spain), February, 2013 SAT Sorting Regular (non adaptive) subdivision Use SAT to sort tasks (=tiles) by estimate Improve dynamic load balancing – Schedule more expensive tasks at first – Cheaper tiles later – Dynamic load balancing with work stealing Ensure fine-grained balancing

GRAPP, Barcelona (Spain), February, 2013 Dynamic Load Balancing with Work Stealing Tile Deque In-frame steals Stealing protocol optimizations Task prefetching master workers work stealing tile assignment

GRAPP, Barcelona (Spain), February, 2013 Ray Packet and Tile Two task definitions – Tile for distributed node – Packet for single node ray tracer Tile Buffer – Threads work on more tiles – Improved multi threading scalability

Results Performance with different techniques

GRAPP, Barcelona (Spain), February, 2013 Results Scalability Scalability for up to 16 workers measured for the Cornell Box and path tracing. Timings are in fps.

GRAPP, Barcelona (Spain), February, 2013 Results Work steal (tile transfers) Lower number of tile transfers for adaptive tiling The initial tile assignment provided by the adaptive tiling is more balanced than the regular one Work stealing does not work well with such kind of (almost balanced) workload Average number of tile transfers performed during our test with 4 workers

GRAPP, Barcelona (Spain), February, 2013 Comments SAT Adaptive Tiling – Needs a very accurate cost map estimate to have an improvement in performance How much a tile is more expensive than another one? – Does not fit well with dynamic load balancing SAT Sorting – It works also with a less accurate Cost Map Which ones are the more expensive tiles? – Well fit with the work stealing algorithm – Best performance

GRAPP, Barcelona (Spain), February, 2013 Conclusion Cost map – A per-pixel, image-based estimate of the rendering cost Cost map exploitation for parallel ray tracing – Two strategies Cost map generation and exploitation as two decoupled phases – We can mix them with other approaches

Thanks for your attention Papers web page Acknowledgements FWF, BMWF, DAAD, HPC-Europa2 Intel Visual Computing Institute

Backup slides

Cost Map Generation Algorithm Sampling pattern – Wider sampling pattern for Lambertian and glossy surfaces for path tracing – The pattern collapses to a line for Whitted-style ray tracing and for path tracing using a perfect mirror material. Sample gathering – By summing up the sample contributions, in path tracing We suppose that secondary rays spread along a wide area – By taking their maximum, with Whitted ray tracing All samples belong to one secondary ray and we conservatively estimate the cost by taking the maximum. Edge detection – Ray-packet splitting raises the cost because of the loss of coherence between rays, and this happens especially in geometric edges – We experienced that this extra cost is significant only with Whitted ray tracing Implementation uses deferred shading, with a 2nd sampling pass

Cost Map Generation Algorithm Code for pixel Pi 1.//All the data of the hit surface on the pixel Pi are available 2.cost i basic material cost of the hit surface; 3.if P i is reflective then 4. //Determinate the sampling pattern S, at the point P i, toward the reflection vector R 5. S = compute_sampling_pattern(P i,R); 6. // For each samples, calculate cost contribute 7. for each sample S j in S do 8. sample j 0; 9. if visibility_check(S j,P i ) then 10. increase samplej; 11. if secondary reflection check(S j,P i ) then 12. increase sample j ; 13. end 14. end 15. end 16. //Samples gathering 17. cost i = cost i + gather(S); 18.end 19.return cost i;

Error Analysis Error distribution of the estimation. Each packet-based rendering time is subtract from the cost map estimate, for the same corresponding packet of pixels. The x-axis shows the difference in error intervals, from negative values (left, over-estimation) to positive ones (right, under-estimation). The y-axis plots error occurrences for each error interval.