Published by Cynthia Ross. Modified over 8 years ago.
Siggraph 2009
RenderAnts: Interactive REYES Rendering on GPUs
Kun Zhou, Qiming Hou, Zhong Ren, Minmin Gong, Xin Sun, Baining Guo
JAEHYUN CHO
2 Outline
● REYES Rendering
● System Overview
● GPU REYES Rendering
● Dynamic Scheduling
● Multi-GPU Rendering
● Results
● Conclusion
4 REYES rendering
● “Renders Everything You Ever Saw”
● Developed in the 1980s by Carpenter and Cook
● Produces photo-realistic images
● Main idea
  ● Subdivide every primitive into micropolygons
● In use by Pixar
  ● PhotoRealistic RenderMan (PRMan)
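5 Basic REYES pipeline
[Figure: pipeline flowchart; data passed between stages is shown in parentheses]
Modeling application → (primitives) → Bucketing → Bound → Too large?
  Yes → Split → back to Bound
  No → Dice → (unshaded micropolygons) → Shade → (shaded micropolygons) → Sample → (visible points) → Composite & Filter → (pixels)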
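The Bound / Too large? / Split loop at the front of the pipeline can be sketched as a simple work queue. This is a minimal illustration only, not the RenderAnts implementation: primitives are modeled as axis-aligned screen-space rectangles, and the function name `bound_split`, the `max_size` limit, and the midpoint split rule are all assumptions made for the example.

```python
from collections import deque

def bound_split(primitives, max_size):
    """Split primitives until each is small enough to dice.

    Each primitive is a hypothetical screen-space rectangle (x0, y0, x1, y1);
    a real renderer would bound parametric patches instead.
    """
    queue = deque(primitives)
    ready = []                              # primitives that may go on to Dice
    while queue:
        x0, y0, x1, y1 = queue.popleft()
        w, h = x1 - x0, y1 - y0             # Bound: the rectangle is its own bound
        if max(w, h) <= max_size:           # Too large? No -> ready for dicing
            ready.append((x0, y0, x1, y1))
        elif w >= h:                        # Too large? Yes -> split the longer axis
            xm = (x0 + x1) / 2
            queue.append((x0, y0, xm, y1))
            queue.append((xm, y0, x1, y1))
        else:
            ym = (y0 + y1) / 2
            queue.append((x0, y0, x1, ym))
            queue.append((x0, ym, x1, y1))
    return ready
```

For example, `bound_split([(0, 0, 4, 2)], 2)` splits the rectangle once along x and returns two 2 x 2 pieces.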
7 System overview
● Map all basic REYES stages to the GPU
● Add three dynamic scheduling stages
● Support multi-GPU rendering
8 RenderAnts system pipeline
11 Bound/Split and Dice
● Bound/Split
  ● All input primitives are stored in a queue
  ● Primitives in the queue are bounded and split in parallel
● Dice
  ● Primitives in the dicing region are subdivided into micropolygons in parallel
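To make the Dice step concrete, the sketch below subdivides one primitive into a grid of micropolygons. It assumes the primitive is a bilinear patch given by four corner points and uses a fixed grid resolution `n`; both are simplifications chosen for the example. The loops run sequentially here, whereas the slides describe the work as done in parallel on the GPU.

```python
def dice(corners, n):
    """Dice a bilinear patch into an n x n grid of micropolygons.

    corners: (p00, p10, p01, p11), each an (x, y, z) tuple (assumed layout).
    Returns the (n+1) x (n+1) vertex grid and the list of quads.
    """
    p00, p10, p01, p11 = corners

    def point(u, v):
        # Bilinear interpolation of the four corners at parameters (u, v).
        return tuple(
            (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
            for a, b, c, d in zip(p00, p10, p01, p11)
        )

    grid = [[point(i / n, j / n) for i in range(n + 1)] for j in range(n + 1)]
    quads = [
        (grid[j][i], grid[j][i + 1], grid[j + 1][i + 1], grid[j + 1][i])
        for j in range(n)
        for i in range(n)
    ]
    return grid, quads
```

Each grid vertex depends only on its own (u, v), which is what makes the step easy to parallelize.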
13 Shade
● Main idea
  ● Translate RenderMan shader instructions to GPU shader instructions
  ● Use a shader compiler
  ● Each vertex of the micropolygons is shaded
14 Shade
● Out-of-core texture fetch
  ● Textures are too large to load into GPU memory at one time
  ● Use a CPU-side cache manager
  ● If a texture is not in the cache, the GPU is interrupted; the cache manager reads the data from disk and copies it to the GPU
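The cache-manager behavior described above can be sketched as a small cache in front of a slow loader. The slides do not say which replacement policy RenderAnts uses; the least-recently-used (LRU) policy, the `TextureCache` class, and the `load_from_disk` callback below are assumptions for illustration.

```python
from collections import OrderedDict

class TextureCache:
    """Hypothetical CPU-side texture cache with LRU eviction (assumed policy)."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity              # max number of cached tiles
        self.load_from_disk = load_from_disk  # slow path, called on a miss
        self.tiles = OrderedDict()            # key -> tile data, oldest first
        self.misses = 0

    def fetch(self, key):
        if key in self.tiles:                 # hit: mark as most recently used
            self.tiles.move_to_end(key)
            return self.tiles[key]
        self.misses += 1                      # miss: read from disk, then cache
        data = self.load_from_disk(key)
        self.tiles[key] = data
        if len(self.tiles) > self.capacity:   # evict the least recently used tile
            self.tiles.popitem(last=False)
        return data
```

In a real system the miss path would also copy the loaded data to GPU memory before shading resumes; that step is omitted here.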
16 Sample
● Main idea
  ● Each pixel in the sampling region is divided into subpixels
  ● If a micropolygon covers the sample location of a subpixel, compute and store a sample point
[Figure labels: sample point of left micropolygon, sample point of right micropolygon]
17 Sample
● Compute sample point
  ● Interpolate the color, opacity, and depth values of the micropolygon at the sample location
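One common way to do this interpolation is with barycentric weights. The sketch below assumes a micropolygon is handled as triangles (a quad would be two of them) and that each vertex carries a 2D screen position plus color, opacity, and depth; the slides do not specify the interpolation scheme, so treat the whole block as an assumption.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p in triangle (a, b, c), or None if degenerate."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        return None
    w0 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w1 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return w0, w1, 1 - w0 - w1

def sample_triangle(p, verts):
    """Return the interpolated sample point at p, or None if p is not covered.

    verts: three dicts with keys 'pos', 'color', 'opacity', 'depth' (assumed layout).
    """
    w = barycentric(p, *(v["pos"] for v in verts))
    if w is None or min(w) < 0:               # sample location outside the triangle
        return None
    color = tuple(sum(wi * v["color"][k] for wi, v in zip(w, verts)) for k in range(3))
    opacity = sum(wi * v["opacity"] for wi, v in zip(w, verts))
    depth = sum(wi * v["depth"] for wi, v in zip(w, verts))
    return {"color": color, "opacity": opacity, "depth": depth}
```

A sample location that falls outside the triangle yields `None`, so no sample point is stored for it.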
19 Composite & Filter
● Composite
  ● Sort the sample points of each subpixel in depth order
  ● Composite the sample points of each subpixel in depth order until the depth of the subpixel is reached; subpixels are processed in parallel
● Filter
  ● For each pixel, blend the color and opacity of its subpixels; pixels are processed in parallel
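A per-subpixel composite followed by a per-pixel filter might look like the sketch below. Two assumptions are made that the slides do not confirm: compositing is front-to-back with the standard "over" operator and stops once the accumulated opacity reaches 1 (one reading of "until the depth of the subpixel is reached"), and the filter is a plain box average of the subpixels.

```python
def composite(samples):
    """Front-to-back compositing of one subpixel's sample points.

    samples: list of (depth, (r, g, b), opacity) tuples (assumed layout).
    Returns (color, opacity).
    """
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for depth, rgb, opacity in sorted(samples, key=lambda s: s[0]):
        weight = (1.0 - alpha) * opacity      # contribution still visible from the front
        for k in range(3):
            color[k] += weight * rgb[k]
        alpha += weight
        if alpha >= 1.0:                      # fully opaque: deeper samples are hidden
            break
    return tuple(color), alpha

def box_filter(subpixels):
    """Blend a pixel's subpixels with equal weights (assumed box filter)."""
    n = len(subpixels)
    color = tuple(sum(c[k] for c, _ in subpixels) / n for k in range(3))
    alpha = sum(a for _, a in subpixels) / n
    return color, alpha
```

Because every subpixel and every pixel is independent, both functions map naturally onto one GPU thread per subpixel or per pixel.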
20 Advanced features
● Shadow
  ● Use shadow maps generated in a shadow pass
● Motion blur & depth of field
  ● Use an accumulation buffer
  ● Assign a unique sample time to each subpixel
  ● Sample the subpixels whose sample time equals the current rendering time
22 Dynamic scheduling
● Main idea
  ● Maximize parallelism at each stage
  ● Estimate memory requirements at each stage
24 Dicing scheduler
● Main factor in memory requirements
  ● Total size of the micropolygon data
● Estimating memory requirements
  ● The total number of micropolygons is computed from the total number of primitives
25 Dicing scheduler
● Main idea
  ● Split the current bucket into dicing regions
  ● Continue until the number of primitives in the region being processed fits in available GPU memory
  ● Use binary space partitioning (BSP)
26-28 How to split dicing region?
● Example: let the number of primitives that fit in GPU memory be 2
[Figure, three animation steps: a bucket containing primitives is split into subregions]
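The splitting rule can be sketched as a recursive binary partition of the bucket. The sketch below is illustrative only: primitives are represented by screen-space bounding boxes, each split cuts the longer axis at its midpoint, and a `min_size` guard stops the recursion when primitives overlap too densely to separate. How RenderAnts chooses the split position is not stated in the slides.

```python
def overlaps(box, region):
    """True if two (x0, y0, x1, y1) rectangles share interior area."""
    return (box[0] < region[2] and box[2] > region[0]
            and box[1] < region[3] and box[3] > region[1])

def split_into_dicing_regions(region, boxes, max_prims, min_size=1.0):
    """Split region until each part holds at most max_prims primitives.

    Returns a list of (region, primitive_count) leaves.
    """
    inside = [b for b in boxes if overlaps(b, region)]
    x0, y0, x1, y1 = region
    w, h = x1 - x0, y1 - y0
    if len(inside) <= max_prims or max(w, h) <= min_size:
        return [(region, len(inside))]
    if w >= h:                                # cut the longer axis at its midpoint
        xm = (x0 + x1) / 2
        halves = [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    else:
        ym = (y0 + y1) / 2
        halves = [(x0, y0, x1, ym), (x0, ym, x1, y1)]
    leaves = []
    for half in halves:
        leaves += split_into_dicing_regions(half, inside, max_prims, min_size)
    return leaves
```

With four small primitives in an 8 x 8 bucket and `max_prims=2`, one split is enough, which mirrors the slide example where two primitives fit in GPU memory.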
30 Shading scheduler
● Main factor in memory requirements
  ● Temporary data allocated during shader execution
● Estimating memory requirements
  ● Different shaders require different amounts of temporary data
31 Shading scheduler
● Main idea
  ● Split the micropolygon list into sublists
  ● Continue until the number of micropolygons for the current shader execution fits in available GPU memory
  ● Schedule once per shader execution
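As a rough sketch of this idea, the function below halves a range of micropolygons until every sublist fits a memory budget. The per-micropolygon cost `bytes_per_mp` stands in for the shader-dependent temporary data; the halving strategy and all names are assumptions for illustration, not the scheduler from the paper.

```python
def split_until_fits(start, end, bytes_per_mp, available_bytes):
    """Split the micropolygon index range [start, end) into sublists that fit.

    bytes_per_mp: temporary memory one micropolygon needs for the current
    shader (hypothetical estimate; it differs from shader to shader).
    """
    count = end - start
    if count <= 1 or count * bytes_per_mp <= available_bytes:
        return [(start, end)]
    mid = start + count // 2                  # halve the list and try again
    return (split_until_fits(start, mid, bytes_per_mp, available_bytes)
            + split_until_fits(mid, end, bytes_per_mp, available_bytes))
```

Running the same list through a cheaper shader yields fewer, larger sublists, which is why the scheduling is repeated for each shader execution.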
33 Sampling scheduler
● Main factor in memory requirements
  ● Total size of the subpixel framebuffer and the sample points
● Estimating memory requirements
  ● The framebuffer size equals the region size
  ● Use a line-scanning process to estimate the number of sample points
34 Sampling scheduler
● Main idea
  ● Split the current dicing region into sampling regions
  ● Continue until the number of sample points in the region being processed plus the region size fits in available GPU memory
  ● Use binary space partitioning (BSP)
36 Multi-GPU rendering
● Main idea
  ● Minimize inter-GPU communication
  ● Balance workloads among GPUs
37 How to minimize inter-GPU communication?
● Each GPU maintains a complete list of all primitives in a bucket
● Only region descriptions are transferred between GPUs
38-40 How to minimize inter-GPU communication?
● Let A, B, and C denote the GPUs
[Figure, three animation steps: a bucket and its subregions, labeled with the GPUs A, B, and C that process them]
41 How to balance workloads among GPUs?
● Split a region only when both conditions hold
  ● The number of primitives exceeds a threshold
  ● An idle GPU exists
42-43 How to balance workloads among GPUs?
● Example: let threshold = 2
[Figure, two animation steps: a bucket with primitives and subregions, labeled with the GPUs A, B, and C that process them]
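The two split conditions can be illustrated with a toy model in which a region is reduced to its primitive count. This is a deliberately simplified sketch: real regions are spatial, primitives can straddle a split, and GPUs run concurrently. The function `distribute` and the halving rule are assumptions for the example.

```python
def distribute(num_primitives, num_gpus, threshold):
    """Toy load balancer: GPU 0 starts with the whole bucket.

    A busy GPU hands half of its region to an idle GPU only when its
    primitive count exceeds the threshold AND an idle GPU exists.
    Returns {gpu_id: primitives in that GPU's region}.
    """
    work = {0: num_primitives}
    idle = list(range(1, num_gpus))
    changed = True
    while changed:
        changed = False
        for gpu in sorted(work):              # snapshot, so work can grow safely
            if idle and work[gpu] > threshold:
                half = work[gpu] // 2
                other = idle.pop(0)           # only a region description is handed over
                work[other] = half
                work[gpu] -= half
                changed = True
    return work
```

With 8 primitives, 3 GPUs, and a threshold of 2, the first split gives GPU 1 half of the bucket and the second gives GPU 2 half of what GPU 0 kept; splitting then stops because no idle GPU remains.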
45 Results
46 Rendering Performance
47 Rendering Time on GPU
● Breakdown of the rendering time on the GPU
● Initialization time (data loading from CPU to GPU) is relatively short
48 Scaled Performance on GPU
50 Conclusions
● Advantages
  ● Faster than CPU-based rendering
  ● Performance scalability
● Disadvantages
  ● Geometry scalability
  ● Motion/focal blur
  ● Improved in [Hou et al 2010]
51 Questions & Answers
52 Thank You