SIGGRAPH 2009 ● RenderAnts: Interactive REYES Rendering on GPUs ● Kun Zhou, Qiming Hou, Zhong Ren, Minmin Gong, Xin Sun, Baining Guo ● Presented by JAEHYUN CHO
2 Outline ● REYES Rendering ● System Overview ● GPU REYES Rendering ● Dynamic Scheduling ● Multi-GPU Rendering ● Results ● Conclusion
4 REYES rendering ● “Renders Everything You Ever Saw” ● Developed in the 1980s by Carpenter and Cook ● Produces photorealistic images ● Main idea ● Subdivide every primitive into micropolygons (about pixel-sized or smaller) ● Used by Pixar in PhotoRealistic RenderMan (PRMan)
5 Basic REYES pipeline ● Modeling application → primitives → Bucketing → Bound → Too large? ● Yes: Split, then Bound again ● No: Dice → unshaded micropolygons → Shade → shaded micropolygons → Sample → visible points → Composite & Filter → pixels
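The Bound/Split/Dice loop of this pipeline can be sketched as follows. This is a minimal sketch, not RenderAnts or PRMan code: primitives are assumed to be axis-aligned screen-space rectangles (the hypothetical `Patch` class), and the thresholds `MAX_DICE_SIZE` and `MICROPOLY_SIZE` are illustrative values. Shade, Sample, and Composite & Filter are omitted.

```python
import math
from dataclasses import dataclass

@dataclass
class Patch:
    # Hypothetical primitive: an axis-aligned screen-space rectangle (pixels).
    x0: float
    y0: float
    x1: float
    y1: float

MAX_DICE_SIZE = 16.0   # assumed "too large" threshold, in pixels
MICROPOLY_SIZE = 1.0   # assumed target micropolygon edge length, in pixels

def too_large(p: Patch) -> bool:
    # Bound: use the screen-space extent as the size test.
    return max(p.x1 - p.x0, p.y1 - p.y0) > MAX_DICE_SIZE

def split(p: Patch) -> list[Patch]:
    # Split the patch in half along its longer axis.
    if p.x1 - p.x0 >= p.y1 - p.y0:
        xm = 0.5 * (p.x0 + p.x1)
        return [Patch(p.x0, p.y0, xm, p.y1), Patch(xm, p.y0, p.x1, p.y1)]
    ym = 0.5 * (p.y0 + p.y1)
    return [Patch(p.x0, p.y0, p.x1, ym), Patch(p.x0, ym, p.x1, p.y1)]

def dice(p: Patch) -> list[Patch]:
    # Subdivide into a regular grid of micropolygons no larger than MICROPOLY_SIZE.
    nu = max(1, math.ceil((p.x1 - p.x0) / MICROPOLY_SIZE))
    nv = max(1, math.ceil((p.y1 - p.y0) / MICROPOLY_SIZE))
    du, dv = (p.x1 - p.x0) / nu, (p.y1 - p.y0) / nv
    return [Patch(p.x0 + i * du, p.y0 + j * dv, p.x0 + (i + 1) * du, p.y0 + (j + 1) * dv)
            for j in range(nv) for i in range(nu)]

def bound_split_dice(prims):
    stack = list(prims)
    micropolygons = []
    while stack:
        p = stack.pop()
        if too_large(p):
            stack.extend(split(p))          # "Yes" branch: split, then bound again
        else:
            micropolygons.extend(dice(p))   # "No" branch: dice
    return micropolygons
```

For example, a 40x20 pixel patch is split into eight 10x10 patches, which dice into 800 one-pixel micropolygons.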
7 System overview ● Map all basic REYES stages to the GPU ● Add 3 dynamic scheduling stages (dicing, shading, and sampling schedulers) ● Support multi-GPU rendering
8 RenderAnts system pipeline
10 Bound/Split and Dice
11 Bound/Split and Dice ● Bound/Split ● All input primitives are stored in a queue ● Primitives in the queue are bounded and split in parallel ● Dice ● Primitives in the current dicing region are subdivided into micropolygons in parallel
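On the GPU, the queue can be processed breadth-first: each pass bounds every queued primitive at once and splits the oversized ones into the next queue. The sketch below emulates that with list comprehensions in place of GPU kernels; the rectangle representation `(x0, y0, x1, y1)` and the longer-axis split are assumptions for illustration.

```python
def parallel_bound_split(queue, max_size):
    """Breadth-first bound/split: each pass handles the whole queue at once,
    mimicking one data-parallel pass over the queue (assumption)."""
    diceable = []
    passes = 0
    while queue:
        passes += 1
        # Bound: every primitive's size is computed independently.
        sizes = [max(x1 - x0, y1 - y0) for (x0, y0, x1, y1) in queue]
        diceable += [p for p, s in zip(queue, sizes) if s <= max_size]
        big = [p for p, s in zip(queue, sizes) if s > max_size]
        # Split: each oversized primitive emits two halves along its longer axis.
        nxt = []
        for (x0, y0, x1, y1) in big:
            if x1 - x0 >= y1 - y0:
                xm = (x0 + x1) / 2
                nxt += [(x0, y0, xm, y1), (xm, y0, x1, y1)]
            else:
                ym = (y0 + y1) / 2
                nxt += [(x0, y0, x1, ym), (x0, ym, x1, y1)]
        queue = nxt
    # Dice would then run over `diceable` in parallel.
    return diceable, passes
```

With a 16-pixel limit, a 40x20 primitive and an 8x8 primitive finish in 4 passes and leave 9 diceable primitives; the number of passes grows with the split depth, not with the number of primitives.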
12 Shade
13 Shade ● Main idea ● Translate RenderMan shaders into GPU shader code using a shader compiler ● Shade each micropolygon vertex
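Shading every micropolygon vertex independently is what makes this stage data-parallel. The sketch assumes a hypothetical toy Lambert shader standing in for a compiled RenderMan shader; it does not attempt the shader translation itself.

```python
def lambert_shader(normal, light_dir, base_color):
    """Hypothetical toy surface shader (not a RenderMan shader):
    diffuse color = base_color * max(0, N . L)."""
    ndotl = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(c * ndotl for c in base_color)

def shade_vertices(normals, light_dir, base_color):
    # One independent shader invocation per micropolygon vertex; on a GPU,
    # each invocation would map to one thread.
    return [lambert_shader(n, light_dir, base_color) for n in normals]
```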
14 Shade ● Out-of-core texture fetch ● Textures can be too large to load into GPU memory at once ● Use a CPU-side cache manager ● On a cache miss, the GPU is interrupted; the cache manager reads the texture data from disk and copies it to the GPU
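A minimal sketch of the CPU-side cache manager, assuming textures are divided into tiles and that the cache evicts the least recently used tile (the slide does not state the eviction policy). `TextureTileCache` and `load_tile` are hypothetical; `load_tile` stands in for the disk read.

```python
from collections import OrderedDict

class TextureTileCache:
    """Hypothetical CPU-side LRU cache of texture tiles. On a miss, load_tile
    reads the tile from disk; the tile would then be copied to GPU memory."""
    def __init__(self, capacity, load_tile):
        self.capacity = capacity
        self.load_tile = load_tile
        self.tiles = OrderedDict()
        self.misses = 0

    def fetch(self, tile_id):
        if tile_id in self.tiles:
            self.tiles.move_to_end(tile_id)     # mark as most recently used
            return self.tiles[tile_id]
        # Cache miss: this is where GPU execution would be interrupted.
        self.misses += 1
        tile = self.load_tile(tile_id)          # read from disk
        self.tiles[tile_id] = tile
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)      # evict least recently used tile
        return tile
```

With capacity 2, the request sequence 1, 2, 1, 3, 2 causes 4 misses and leaves tiles 3 and 2 cached.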
15 Sample
16 Sample ● Main idea ● Divide each pixel in the sampling region into subpixels ● If a micropolygon covers a subpixel's sample location, compute and store a sample point ● (Figure: sample points of the left and right micropolygons)
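Coverage testing for one pixel could look like the sketch below. It assumes unjittered sample locations at subpixel centers and triangular micropolygons (a quad micropolygon can be treated as two triangles); all names are hypothetical.

```python
def subpixel_samples(px, py, n):
    """Sample locations at the centers of an n x n subpixel grid in pixel (px, py).
    Centered (unjittered) samples are a simplifying assumption."""
    return [(px + (i + 0.5) / n, py + (j + 0.5) / n) for j in range(n) for i in range(n)]

def edge(ax, ay, bx, by, x, y):
    # Signed-area test: > 0 if (x, y) lies to the left of edge a -> b.
    return (bx - ax) * (y - ay) - (by - ay) * (x - ax)

def covers(tri, x, y):
    """True if triangle ((x0, y0), (x1, y1), (x2, y2)) covers point (x, y),
    for either winding order."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    e0 = edge(x0, y0, x1, y1, x, y)
    e1 = edge(x1, y1, x2, y2, x, y)
    e2 = edge(x2, y2, x0, y0, x, y)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)

def covered_samples(tri, px, py, n):
    # Sample locations of pixel (px, py) at which a sample point must be stored.
    return [s for s in subpixel_samples(px, py, n) if covers(tri, *s)]
```

With 2x2 subpixels, splitting the left half of pixel (0, 0) along its diagonal gives one triangle covering sample (0.25, 0.25) and the other covering (0.25, 0.75).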
17 Sample ● Compute a sample point ● Interpolate the micropolygon's color, opacity, and depth at the sample location
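The interpolation step can be sketched with barycentric weights, assuming a triangular micropolygon and linear interpolation of the shaded vertex values. `make_sample_point` and the `attrs` layout are hypothetical.

```python
def barycentric(tri, x, y):
    # Solve P = P0 + w1 * (P1 - P0) + w2 * (P2 - P0) by Cramer's rule.
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    w1 = ((x - x0) * (y2 - y0) - (x2 - x0) * (y - y0)) / area
    w2 = ((x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)) / area
    return 1.0 - w1 - w2, w1, w2

def make_sample_point(tri, attrs, x, y):
    """attrs: per-vertex (color_rgb, opacity, depth) tuples from the Shade stage.
    Returns the linearly interpolated (color, opacity, depth) at (x, y)."""
    w = barycentric(tri, x, y)
    color = tuple(sum(wi * a[0][c] for wi, a in zip(w, attrs)) for c in range(3))
    opacity = sum(wi * a[1] for wi, a in zip(w, attrs))
    depth = sum(wi * a[2] for wi, a in zip(w, attrs))
    return color, opacity, depth
```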
18 Composite & Filter
19 Composite & Filter ● Composite ● Sort the sample points of each subpixel by depth ● In parallel, composite each subpixel's sample points in depth order until reaching the subpixel's depth ● Filter ● In parallel, blend the color and opacity of each pixel's subpixels
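A sketch of both steps follows. It reads "until reaching the subpixel's depth" as stopping once sample points lie behind a per-subpixel depth limit (an interpretation), uses front-to-back "over" compositing of non-premultiplied colors, and assumes a simple box filter for the pixel blend.

```python
def composite_subpixel(samples, depth_limit):
    """samples: list of (depth, (r, g, b), opacity) for one subpixel.
    Front-to-back compositing that stops at depth_limit (assumed meaning of
    the subpixel's depth)."""
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for depth, rgb, a in sorted(samples, key=lambda s: s[0]):
        if depth > depth_limit:
            break
        weight = (1.0 - alpha) * a          # contribution not yet occluded
        for c in range(3):
            color[c] += weight * rgb[c]
        alpha += weight
    return tuple(color), alpha

def box_filter(subpixels):
    """Average (color, opacity) over one pixel's subpixels (box filter assumed)."""
    n = len(subpixels)
    color = tuple(sum(sp[0][c] for sp in subpixels) / n for c in range(3))
    opacity = sum(sp[1] for sp in subpixels) / n
    return color, opacity
```

Each subpixel's compositing and each pixel's filtering are independent, which is why both steps can run in parallel.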
20 Advanced features ● Shadows ● Use shadow maps generated in a separate shadow pass ● Motion blur & depth of field ● Use an accumulation buffer ● Assign a unique sample time to each subpixel ● In each pass, sample only the subpixels whose sample time equals the current rendering time
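The subpixel time assignment can be sketched as below. Reading "unique" as unique within a pixel, so that the number of passes equals the number of subpixels per pixel, is an assumption, as is the shuffled order; the `render_at` callback is hypothetical and stands in for a full rendering pass at one time. For depth of field, the same scheme would vary the lens sample instead of the time.

```python
import random

def assign_sample_times(subpixels_per_pixel, num_pixels, seed=0):
    """Give the subpixels of each pixel distinct time indices 0..n-1 in a
    shuffled order (assumed scheme)."""
    rng = random.Random(seed)
    times = []
    for _ in range(num_pixels):
        perm = list(range(subpixels_per_pixel))
        rng.shuffle(perm)
        times.append(perm)
    return times

def render_accumulated(subpixels_per_pixel, num_pixels, render_at):
    """render_at(t, pixel, sub): hypothetical callback returning one subpixel's
    value from the pass rendered at time index t. Pass t writes only subpixels
    whose sample time is t, so every subpixel is sampled exactly once."""
    times = assign_sample_times(subpixels_per_pixel, num_pixels)
    buffer = [[None] * subpixels_per_pixel for _ in range(num_pixels)]
    for t in range(subpixels_per_pixel):          # one rendering pass per time
        for p in range(num_pixels):
            for s in range(subpixels_per_pixel):
                if times[p][s] == t:
                    buffer[p][s] = render_at(t, p, s)
    return buffer
```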
22 Dynamic scheduling ● Main idea ● Maximize parallelism at each stage within the available GPU memory ● Estimate the memory requirements of each stage
23 Dicing scheduler
24 Dicing scheduler ● Main factor in memory requirements ● Total size of the micropolygon data ● Memory estimate ● Total # of micropolygons, computed from the primitives before dicing
25 Dicing scheduler ● Main idea ● Split the current bucket into dicing regions ● Until the primitives in each region fit in available GPU memory ● Use binary space partitioning (BSP)
26 How to split a dicing region? ● Example: let the # of primitives that fit in GPU memory = 2 ● (Figure: the bucket is split into subregions, which are split again until each holds at most 2 primitives)
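The dicing scheduler's recursive split can be sketched as follows, assuming each primitive carries a precomputed micropolygon count, a hypothetical fixed record size `BYTES_PER_MICROPOLYGON`, and a midpoint split along the region's longer axis. A primitive overlapping both halves is counted in both, which keeps the estimate conservative; `min_size` stops the recursion if a single primitive can never fit.

```python
BYTES_PER_MICROPOLYGON = 64   # hypothetical size of one micropolygon record

def estimate_bytes(prims):
    # prims: list of (x0, y0, x1, y1, n_micropolygons); the count would come
    # from each primitive's screen size and dicing rate (assumption).
    return sum(p[4] for p in prims) * BYTES_PER_MICROPOLYGON

def overlaps(p, r):
    return p[0] < r[2] and p[2] > r[0] and p[1] < r[3] and p[3] > r[1]

def bsp_dicing_regions(region, prims, budget, min_size=1.0):
    """Split region (x0, y0, x1, y1) in half along its longer axis until the
    primitives overlapping each piece fit in `budget` bytes."""
    prims = [p for p in prims if overlaps(p, region)]
    x0, y0, x1, y1 = region
    if estimate_bytes(prims) <= budget or max(x1 - x0, y1 - y0) <= min_size:
        return [(region, prims)]
    if x1 - x0 >= y1 - y0:
        xm = (x0 + x1) / 2
        halves = [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    else:
        ym = (y0 + y1) / 2
        halves = [(x0, y0, x1, ym), (x0, ym, x1, y1)]
    out = []
    for h in halves:
        out += bsp_dicing_regions(h, prims, budget, min_size)
    return out
```

With the slide's example budget of 2 primitives, a bucket holding four primitives in its four quadrants is split once into two subregions of two primitives each.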
29 Shading scheduler
30 Shading scheduler ● Main factor in memory requirements ● Temporary data allocated during shader execution ● Memory estimate ● Depends on the shader: different shaders require different amounts of temporary data
31 Shading scheduler ● Main idea ● Split the micropolygon list into sublists ● Until each sublist's temporary data for the current shader execution fits in available GPU memory ● Schedule separately for each shader execution
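Assuming a shader's temporary storage grows linearly with the number of micropolygons it shades, the per-shader split reduces to choosing a batch size. `shading_batches` and `temp_bytes_per_micropolygon` (a per-shader estimate) are hypothetical.

```python
def shading_batches(num_micropolygons, temp_bytes_per_micropolygon, free_bytes):
    """Split [0, num_micropolygons) into contiguous sublists whose temporary
    data for the current shader fits in free_bytes (linear model assumed)."""
    batch = free_bytes // temp_bytes_per_micropolygon
    if batch == 0:
        raise MemoryError("one micropolygon's temporary data does not fit")
    return [(start, min(start + batch, num_micropolygons))
            for start in range(0, num_micropolygons, batch)]
```

A shader that needs more temporary data per micropolygon gets smaller sublists, so the same micropolygon list is shaded in more batches.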
32 Sampling scheduler
33 Sampling scheduler ● Main factor in memory requirements ● Total size of the subpixel framebuffer and the sample points ● Memory estimate ● Framebuffer size is determined by the region size ● The # of sample points is estimated with a scan-line pass
34 Sampling scheduler ● Main idea ● Split the current dicing region into sampling regions ● Until the memory for the region's sample points plus its framebuffer fits in available GPU memory ● Use binary space partitioning (BSP)
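The sample-point estimate can be sketched with one scan line per subpixel row: each row's crossing interval with a micropolygon gives the number of covered sample centers. This assumes triangular micropolygons in subpixel coordinates, sample locations at subpixel centers, and hypothetical per-subpixel and per-sample byte sizes. The resulting total would then drive a BSP split like the dicing scheduler's.

```python
import math

def count_samples_scanline(tri):
    """Count subpixel sample centers (x + 0.5, y + 0.5) covered by a triangle,
    one horizontal scan line per subpixel row (subpixel units)."""
    verts = list(tri)
    ys = [v[1] for v in verts]
    count = 0
    for row in range(math.ceil(min(ys) - 0.5), math.floor(max(ys) - 0.5) + 1):
        y = row + 0.5
        xs = []
        for (ax, ay), (bx, by) in zip(verts, verts[1:] + verts[:1]):
            if ay != by and (ay <= y <= by or by <= y <= ay):
                xs.append(ax + (y - ay) * (bx - ax) / (by - ay))
        if xs:
            # Sample centers x + 0.5 with min(xs) <= x + 0.5 <= max(xs).
            first = math.ceil(min(xs) - 0.5)
            last = math.floor(max(xs) - 0.5)
            count += max(0, last - first + 1)
    return count

def sampling_bytes(region_subpixels, micropolygons, fb_bytes=16, sample_bytes=24):
    """Framebuffer memory grows with region size; sample memory grows with the
    estimated # of sample points. Byte sizes are hypothetical."""
    samples = sum(count_samples_scanline(t) for t in micropolygons)
    return region_subpixels * fb_bytes + samples * sample_bytes
```

A region fits when `sampling_bytes(...)` is at most the free GPU memory; otherwise it is split in half and each half is re-estimated.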
36 Multi-GPU rendering ● Main idea ● Minimize inter-GPU communication ● Balance workloads among GPUs
37 How to minimize inter-GPU communication? ● Each GPU maintains a complete list of all primitives in the current bucket ● Only region descriptions are transferred between GPUs
38 How to minimize inter-GPU communication? ● Example: let A, B, and C denote the GPUs ● (Figure: the bucket, initially assigned to GPU A, is split into subregions that are assigned to GPUs A, B, and C)
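What crosses GPU boundaries can be sketched as a small record. The `Region` fields and the overlap rule are assumptions; the point is that the receiving GPU selects the region's primitives from its own copy of the bucket instead of receiving primitive data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    # Hypothetical region description: a screen-space rectangle plus the id
    # of the GPU that will process it. Only this record is sent between GPUs.
    x0: float
    y0: float
    x1: float
    y1: float
    gpu: str

def primitives_in(region, bucket_prims):
    """Run locally on the receiving GPU: every GPU already holds the complete
    primitive list of the bucket, given here as (x0, y0, x1, y1) bounds."""
    return [p for p in bucket_prims
            if p[0] < region.x1 and p[2] > region.x0
            and p[1] < region.y1 and p[3] > region.y0]
```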
41 How to balance workloads among GPUs? ● Split a region only when both conditions hold ● Its # of primitives > threshold ● An idle GPU exists
42 How to balance workloads among GPUs? ● Example: let threshold = 2 ● (Figure: a subregion holding more than 2 primitives is split, and the new subregion is assigned to an idle GPU among A, B, and C)
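The splitting rule can be simulated as below. It is a static sketch under stated assumptions: a GPU counts as idle when its queue is empty, regions are plain lists of primitives, and `split` is a hypothetical BSP split passed in by the caller.

```python
def balance(regions, gpus, threshold, split):
    """regions: dict gpu -> list of pending regions (each a list of primitives).
    A region is split only if it holds more than `threshold` primitives AND
    some GPU is idle; the split-off half goes to that idle GPU."""
    queues = {g: list(regions.get(g, [])) for g in gpus}
    changed = True
    while changed:
        changed = False
        idle = [g for g in gpus if not queues[g]]
        if not idle:
            break                              # no idle GPU: no further splits
        for g in gpus:
            for k, r in enumerate(queues[g]):
                if len(r) > threshold and idle:
                    a, b = split(r)
                    queues[g][k] = a                   # keep one half locally
                    queues[idle.pop()].append(b)       # hand the other half off
                    changed = True
    return queues
```

Starting with six primitives on GPU A and a threshold of 2, the region is split until GPUs B and C both have work; A keeps a 3-primitive region because no GPU is left idle.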
45 Results
46 Rendering Performance
47 Rendering Time on GPU ● Breakdown of the rendering time on the GPU ● Initialization time (loading data from CPU to GPU) is relatively short
48 Scaled Performance on GPU
50 Conclusions ● Advantages ● Faster than CPU-based rendering ● Performance scales with the number of GPUs ● Disadvantages ● Limited geometry scalability ● Motion blur and depth of field (focal blur) via the accumulation buffer, improved in [Hou et al. 2010]
51 Questions & Answers
52 Thank You