Siggraph 2009. RenderAnts: Interactive REYES Rendering on GPUs. Kun Zhou, Qiming Hou, Zhong Ren, Minmin Gong, Xin Sun, Baining Guo. Jaehyun Cho.

2 Outline ● REYES Rendering ● System Overview ● GPU REYES Rendering ● Dynamic Scheduling ● Multi-GPU Rendering ● Results ● Conclusion

4 REYES rendering ● “Renders Everything You Ever Saw” ● Introduced in the 1980s by Carpenter and Cook ● Produces photo-realistic images ● Main idea ● Subdivide every primitive into micropolygons ● In use by Pixar ● PhotoRealistic RenderMan (PRMan)

5 Basic REYES pipeline [Figure: Modeling Application → primitives → Bucketing → Bound → Too Large? (Yes: Split, then Bound again; No: Dice) → unshaded micropolygons → Shade → shaded micropolygons → Sample → visible points → Composite & Filter → pixels]

7 System overview ● Map all basic REYES stages to the GPU ● Add 3 dynamic scheduling stages ● Support multi-GPU rendering

8 RenderAnts system pipeline

10 Bound/Split and Dice

11 Bound/Split and Dice ● Bound/Split ● All input primitives are stored in a queue ● Primitives in the queue are bounded and split in parallel ● Dice ● Primitives in the dicing region are subdivided into micropolygons in parallel
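
The bound/split loop and the dice step on the slide above can be sketched as follows. This is a minimal, sequential illustration, not the RenderAnts implementation, which processes the queue in parallel on the GPU. The `Patch` class, the `max_size` threshold, and the `n` x `n` dicing rate are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical primitive: an axis-aligned rectangle in screen space."""
    x0: float
    y0: float
    x1: float
    y1: float

    def size(self):
        # Bound step: the longer side of the screen-space bound.
        return max(self.x1 - self.x0, self.y1 - self.y0)

    def split(self):
        # Split step: halve the patch along its longer axis.
        if self.x1 - self.x0 >= self.y1 - self.y0:
            xm = (self.x0 + self.x1) / 2
            return [Patch(self.x0, self.y0, xm, self.y1),
                    Patch(xm, self.y0, self.x1, self.y1)]
        ym = (self.y0 + self.y1) / 2
        return [Patch(self.x0, self.y0, self.x1, ym),
                Patch(self.x0, ym, self.x1, self.y1)]


def bound_split(primitives, max_size):
    """Split primitives until every bound is at most max_size (must be > 0)."""
    queue = deque(primitives)
    diceable = []
    while queue:
        patch = queue.popleft()
        if patch.size() > max_size:
            queue.extend(patch.split())   # too large: split and re-queue
        else:
            diceable.append(patch)        # small enough: ready to dice
    return diceable


def dice(patch, n):
    """Subdivide a patch into an n x n grid of micropolygons."""
    dx = (patch.x1 - patch.x0) / n
    dy = (patch.y1 - patch.y0) / n
    return [Patch(patch.x0 + i * dx, patch.y0 + j * dy,
                  patch.x0 + (i + 1) * dx, patch.y0 + (j + 1) * dy)
            for j in range(n) for i in range(n)]
```

For example, `bound_split([Patch(0, 0, 8, 4)], max_size=2)` keeps splitting until every piece is 2 x 2, and `dice` then turns each piece into a grid of micropolygons.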

12 Shade

13 Shade ● Main idea ● Translate RenderMan shader instructions to GPU shader instructions ● Use a shader compiler ● Every micropolygon vertex is shaded

14 Shade ● Out-of-core texture fetch ● Textures are too large to load into GPU memory at one time ● Use a CPU-side cache manager ● On a cache miss, interrupt the GPU; the cache manager reads the texture from disk and copies it to the GPU
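
A rough sketch of a CPU-side cache manager like the one described above is shown below. The slide does not say how entries are evicted, so the least-recently-used policy here is an assumption, as are the `TextureCache` class, the `load_from_disk` callback, and the `capacity` parameter. The GPU interrupt and the upload to GPU memory are not modeled.

```python
from collections import OrderedDict


class TextureCache:
    """Hypothetical cache manager; LRU eviction is an assumption."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity              # max number of cached textures
        self.load_from_disk = load_from_disk  # callback: key -> texture data
        self.entries = OrderedDict()          # least recently used first
        self.misses = 0

    def fetch(self, key):
        if key in self.entries:
            # Cache hit: mark as most recently used.
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: this is where the GPU would be interrupted while
        # the texture is read from disk and copied over.
        self.misses += 1
        data = self.load_from_disk(key)
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return data
```

With a capacity of two, fetching textures `a`, `b`, `a`, `c` evicts `b`, because `a` was touched more recently.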

15 Sample

16 Sample ● Main idea ● Each pixel in the sampling region is divided into subpixels ● If a micropolygon covers the sample location of a subpixel, compute and store a sample point [Figure labels: sample point of left micropolygon, sample point of right micropolygon]

17 Sample ● Compute sample point ● Interpolate color, opacity and depth values of micropolygon at sample location
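
One common way to do the coverage test and the interpolation together is with barycentric weights. The sketch below assumes the micropolygon has been reduced to a triangle with per-vertex attribute tuples (for instance color, opacity, depth); the slides do not state which interpolation scheme RenderAnts uses, and the function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p in triangle abc, or None if outside."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        return None  # degenerate triangle
    w0 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w1 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    w2 = 1.0 - w0 - w1
    if min(w0, w1, w2) < 0:
        return None  # sample location is not covered
    return w0, w1, w2


def sample_point(p, verts, attrs):
    """Interpolate per-vertex attribute tuples at sample location p.

    verts: three 2D positions; attrs: three equal-length tuples of floats.
    Returns the interpolated tuple, or None if p is not covered.
    """
    weights = barycentric(p, *verts)
    if weights is None:
        return None
    return tuple(sum(w * attr[i] for w, attr in zip(weights, attrs))
                 for i in range(len(attrs[0])))
```

A sample location that falls outside the triangle returns `None`, which corresponds to the case where no sample point is stored for that subpixel.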

18 Composite & Filter

19 Composite & Filter ● Composite ● Sort the sample points of each subpixel in depth order ● In parallel, composite the sample points of each subpixel in depth order until the depth of the subpixel is reached ● Filter ● For each pixel, blend the color and opacity of its subpixels in parallel
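
The two steps above can be illustrated for a single pixel as follows. This sketch uses scalar colors, stops compositing once accumulated opacity reaches 1 (an assumed reading of "until the depth of the subpixel is reached"), and uses a plain box filter, which is also an assumption since the slide only says "blend". It runs sequentially, whereas the slide describes a parallel GPU pass.

```python
def composite(samples):
    """Front-to-back compositing of one subpixel.

    samples: list of (depth, color, opacity) tuples with scalar color.
    Returns the accumulated (color, opacity).
    """
    color, alpha = 0.0, 0.0
    for _depth, c, a in sorted(samples, key=lambda s: s[0]):
        color += (1.0 - alpha) * a * c   # contribution visible through what is in front
        alpha += (1.0 - alpha) * a
        if alpha >= 1.0:
            break                        # everything behind is hidden
    return color, alpha


def box_filter(subpixels):
    """Average the (color, opacity) of all subpixels in one pixel."""
    n = len(subpixels)
    return (sum(c for c, _ in subpixels) / n,
            sum(a for _, a in subpixels) / n)
```

A half-transparent sample in front of an opaque one blends the two, while any sample behind an opaque one contributes nothing.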

20 Advanced features ● Shadows ● Use shadow maps generated in a shadow pass ● Motion blur & depth of field ● Use an accumulation buffer ● Assign a unique sample time to each subpixel ● Sample only the subpixels whose sample time equals the current rendering time

22 Dynamic scheduling ● Main idea ● Maximize parallelism at each stage ● Estimate memory requirements at each stage

23 Dicing scheduler

24 Dicing scheduler ● Main factor in memory requirements ● Total micropolygon data ● Estimate memory requirements ● Total # of micropolygons is computed from the total # of primitives

25 Dicing scheduler ● Main idea ● Split the current bucket into dicing regions ● Repeat until the # of primitives in the region being processed fits in available GPU memory ● Use binary space partitioning (BSP)

26 How to split dicing region? ● Let the # of primitives that fit in GPU memory = 2 [Figure labels: bucket, primitive]

27 How to split dicing region? ● Let the # of primitives that fit in GPU memory = 2 [Figure labels: bucket, subregion, bucket, primitive]

28 How to split dicing region? ● Let the # of primitives that fit in GPU memory = 2 [Figure labels: bucket, subregion, bucket, subregion, primitive]
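
The splitting rule illustrated on the slides above can be sketched as a recursive BSP over the bucket. Everything specific here is an assumption for illustration: primitives are represented by axis-aligned bounding boxes `(x0, y0, x1, y1)`, regions are split at the midpoint of their longer axis, and `max_depth` guards against primitives that overlap so tightly that splitting never separates them.

```python
def overlaps(box, region):
    """True if two (x0, y0, x1, y1) rectangles overlap."""
    return (box[0] < region[2] and box[2] > region[0] and
            box[1] < region[3] and box[3] > region[1])


def split_region(region, boxes, capacity, max_depth=16):
    """Split region until each piece holds at most `capacity` primitives.

    Returns a list of (region, primitive_count) leaves.
    """
    inside = [b for b in boxes if overlaps(b, region)]
    if len(inside) <= capacity or max_depth == 0:
        return [(region, len(inside))]
    x0, y0, x1, y1 = region
    if x1 - x0 >= y1 - y0:
        xm = (x0 + x1) / 2
        halves = [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    else:
        ym = (y0 + y1) / 2
        halves = [(x0, y0, x1, ym), (x0, ym, x1, y1)]
    leaves = []
    for half in halves:
        leaves.extend(split_region(half, inside, capacity, max_depth - 1))
    return leaves
```

With a capacity of two, as on the slides, a bucket containing four well-separated primitives is split once into two dicing regions of two primitives each.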

29 Shading scheduler

30 Shading scheduler ● Main factor in memory requirements ● Temporary data allocated during shader execution ● Estimate memory requirements ● Different shaders require different amounts of temporary data

31 Shading scheduler ● Main idea ● Split the micropolygon list into sublists ● Repeat until the # of micropolygons for the current shader execution fits in available GPU memory ● Schedule once per shader execution
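
As a minimal sketch of this idea, the function below cuts a micropolygon list into sublists sized to the shader's temporary-data footprint. It computes the batch size directly instead of splitting repeatedly, which is a simplification. The parameter names and the assumption that temporary data grows linearly with the number of micropolygons are illustrative and not taken from the slides.

```python
def schedule_shading(num_micropolygons, temp_bytes_per_micropolygon,
                     available_bytes):
    """Return [start, end) index ranges, one per shader launch.

    temp_bytes_per_micropolygon differs per shader, so this is meant to be
    called once for every shader execution.
    """
    batch = available_bytes // temp_bytes_per_micropolygon
    if batch == 0:
        raise MemoryError("not enough memory to shade a single micropolygon")
    return [(start, min(start + batch, num_micropolygons))
            for start in range(0, num_micropolygons, batch)]
```

For example, ten micropolygons that each need 100 bytes of temporary data, with 350 bytes available, are shaded in four launches of at most three micropolygons.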

32 Sampling scheduler

33 Sampling scheduler ● Main factor in memory requirements ● Total data of the subpixel framebuffer and sample points ● Estimate memory requirements ● Framebuffer size equals the region size ● Use a line-scanning process to estimate the # of sample points

34 Sampling scheduler ● Main idea ● Split the current dicing region into sampling regions ● Repeat until the # of sample points in the region being processed plus the region size fits in available GPU memory ● Use binary space partitioning (BSP)

36 Multi-GPU rendering ● Main idea ● Minimize inter-GPU communication ● Balance workloads among GPUs

37 How to minimize inter-GPU communication? ● Each GPU maintains a complete list of all primitives in a bucket ● Only region descriptions are transferred

38 How to minimize inter-GPU communication? ● Let A, B, C denote the GPUs [Figure labels: bucket, A]

39 How to minimize inter-GPU communication? ● Let A, B, C denote the GPUs [Figure labels: bucket, A, subregion, bucket, BA]

40 How to minimize inter-GPU communication? ● Let A, B, C denote the GPUs [Figure labels: bucket, A, B, C, subregion, bucket, BA, A]

41 How to balance workloads among GPUs? ● Split a region only when both conditions hold ● # of primitives > threshold ● An idle GPU exists

42 How to balance workloads among GPUs? ● Let threshold = 2 [Figure labels: subregion, bucket, BA, primitive]

43 How to balance workloads among GPUs? ● Let threshold = 2 [Figure labels: subregion, bucket, BA, B, C, subregion, A, primitive]
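
The two-condition split rule can be sketched as a small recursive routine that hands half of an oversized region to an idle GPU. Because every GPU already holds the full primitive list, only the region description changes hands. The 1D interval regions, the `count` and `split` callbacks, and the policy of giving the second half to the helper are all assumptions made for this illustration.

```python
def distribute(region, owner, idle_gpus, threshold, count, split):
    """Assign regions to GPUs; returns a list of (gpu, region) pairs.

    A region is split only if it holds more than `threshold` primitives
    AND an idle GPU exists. `idle_gpus` is consumed (mutated) as helpers
    are recruited.
    """
    if count(region) <= threshold or not idle_gpus:
        return [(owner, region)]
    helper = idle_gpus.pop(0)
    first, second = split(region)
    # The owner keeps the first half; the helper receives only the
    # description of the second half, not the primitives themselves.
    return (distribute(first, owner, idle_gpus, threshold, count, split) +
            distribute(second, helper, idle_gpus, threshold, count, split))


# Toy setup: primitives are points on a line, regions are [lo, hi) intervals.
PRIMITIVES = [1, 2, 5, 6, 7]


def count_points(region):
    lo, hi = region
    return sum(1 for x in PRIMITIVES if lo <= x < hi)


def split_interval(region):
    lo, hi = region
    mid = (lo + hi) / 2
    return (lo, mid), (mid, hi)
```

Calling `distribute((0, 8), "A", ["B", "C"], 2, count_points, split_interval)` with threshold 2 spreads the bucket across all three GPUs, and with no idle GPU the region stays whole on its owner.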

45 Results

46 Rendering Performance

47 Rendering Time on GPU ● Breakdown of the rendering time on GPU ● Initialization time (data loading from CPU to GPU) is relatively short

48 Scaled Performance on GPU

50 Conclusions ● Advantages ● Faster than CPU-based rendering ● Performance scalability ● Disadvantages ● Geometry scalability ● Motion/focal blur ● Improved in [Hou et al. 2010]

51 Questions & Answers Q & A

52 Finish! Thank You