Download presentation
Presentation is loading. Please wait.
Published byMarshall Welch Modified over 9 years ago
1
Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis
2
Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney and John D. Owens SIGGRAPH Asia 2008 (to appear) http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=952
3
Image courtesy: Pixar
4
Cinematic rendering looks good Image courtesy: Pixar
5
High geometric complexity Image courtesy: Pixar
6
High shading complexity Image courtesy: Pixar
7
Motion blur, Depth-of-field Image courtesy: Pixar
8
Complex lighting Image courtesy: Pixar
9
All this is slow Image courtesy: Pixar
10
Our Goal Make it faster – As much as possible in real time Use the GPU – Massive available parallelism – Fixed-function texture/raster units – High Programmability
11
Enter Reyes Developed in 1980s, offline Pixel-accurate smooth surfaces Eye-space shading Stochastic sampling Order-independent transparency
12
Bound/Split Dice (tessellate) Displace Shade Compose Sample Vertex Shade Geometry Shade Rasterize Fragment Shade Blend Depth-buffer ReyesDirect3D 10
13
Bound/Split Dice (tessellate) Displace Shade Compose Sample High-quality geometry Convenient surface shading Cinematic quality Well-behaved (?) Reyes
14
Bound/Split Dice (tessellate) Displace Shade Compose Sample Reyes Input: Smooth surfaces Output: 0.5 x 0.5 pixel quads (micropolygons)
15
Outline Reyes Subdivision – algorithm Subdivision on GPU – implementation Results Limitations
16
Outline Reyes Subdivision – algorithm Subdivision on GPU – implementation Results Limitations
17
Split-Dice Recursively split a surface till dicing makes sense Uniformly sample it to form a grid of micropolygons
18
Split Bound/Cull Dice Diceable? No: Split Yes: Dice
19
Grid Micropolygon Post-split surface
20
Split-Dice is hard Split – Recursive, serial – Rapid primitive generation/destruction Dice – Huge memory demand
21
Can we do this in parallel?
22
Parallel Split A lot of independent operations Regular computation
23
Can we do this in parallel? A lot of independent operations – Our simplest model 5K primitives 1.2M micropolygons Regular computation – Bound/Cull/Split everything together – Dice everything together
24
Analogy: A Dynamic work queue A A B B C C D D E E F F G G H H I I A A A A B B C C E E E E F F H H Split No Action Cull C C
25
How can we do these efficiently? Creating new primitives – How to dynamically allocate space? Culling unneeded primitives – How to avoid fragmentation?
26
A child primitive is offset by the queue length Our Choice – keep it simple… A A B B C C D D E E F F G G H H I I A A B B C C E E F F H H A A C C E E
27
…and get rid of the holes later A A B B C C E E F F H H A A C C E E A A B B C C E E F F H H A A C C E E Scan-based compact is fast! (Sengupta ‘07) Scan-based compact is fast! (Sengupta ‘07)
28
Storage Issues for Dice Too many micropolygons – Cannot reject early Screen-space buckets (tiles) – In parallel ? – Ideal bucket size ?
29
Outline Reyes Subdivision – algorithm Subdivision on GPU – implementation Results Limitations
30
Platform NVIDIA GeForce 8800 GTX – 16 SIMD Multiprocessors – 16KB shared memory – 768 MB memory (no cache) NVIDIA CUDA 1.1 – Grids/Blocks/Threads – OpenGL interface
31
Implementation Details Input choice: Bicubic Bézier Surfaces – Only affects implementation View Dependent Subdivision every frame – Single CPU-GPU transfer Final micropolygons written to a VBO – Flat-shaded and displayed
32
Kernels Implemented Dice – Regular, symmetric, parallel – 256 threads per patch – Primitive information in shared memory Bound/Split – Hard to ensure efficiency
33
Bound/Split: Efficiency Goals Memory Coherence – Off-chip memory accesses must be efficient Computational Efficiency – Hardware SIMD must be maximally utilized
34
Memory Coherence during Split Compact work-queue after each iteration – Primitives always contiguous in memory Structure-Of-Arrays representation – Primitive attributes adjacent in memory 99.5% of all accesses fully coalesced
35
SIMD utilization during split Intra-Primitive parallelism – Independent control points – Negligible divergence Vectorized Split – 16 Threads per patch 90.16% of all branches SIMD coherent 32 threads (1 warp)
36
Outline Reyes Subdivision – algorithm Subdivision on GPU – implementation Results Limitations
37
Results - Killeroo 14426 grids 5 levels of subdivision Bound/Split: 6.99 ms Dice: 7.21ms 29.69 frames/second (19.92 with 16x AA) Killeroo model courtesy: Headus Inc.
38
Results - Teapot 4823 grids 11 levels of subdivision Bound/Split: 3.46 ms Dice: 2.42 ms 60.07 frames/second (30.02 with 16x AA)
39
Results – Random scenes Subdivision time proportional to number of micropolygons Dice Split Total
40
Render time Breakdown Rendering lots of small triangles Should ideally be zero: CUDA limitation
41
Results - Screen-space buckets Acceptable performance, Small memory footprint Acceptable performance, Small memory footprint
42
Outline Reyes Subdivision – algorithm Subdivision on GPU – implementation Results Limitations
43
Can’t split and dice in parallel Uniform dicing Cracks
44
Limitations – Uniform dicing
45
Limitations - Cracks
47
Conclusions Breadth-first recursive subdivision – Suits GPUs – Works fast Dicing Programmable tessellation is fast – 500M micropolygons/sec It is time to experiment with alternate graphics pipelines
48
Reyes in real-time rendering Visually superior to polygon pipeline Regular, highly parallel workload Extremely well-studied Good candidate for a real-time system?
49
Future Work Cracks, Displacement mapping Rest of Reyes – Offline quality shading – Interactive lighting – Parallel Stochastic Sampling (Wei 2008) – A-buffer (Myers 2007)
50
Thanks to Feedback and suggestions – Per Christensen, Charles Loop, Dave Luebke, Matt Pharr and Daniel Wexler Financial support – DOE Early Career PI Award – National Science Foundation – SciDAC Institute for Ultrascale Visualization Equipment support from NVIDIA
51
Teapot Video Real-time footage
52
Killeroo Video Real-time footage
53
BACKUP SLIDES Real-Time Reyes-Style Adaptive Surface Subdivision
54
CUDA Thread Structure Image courtesy: NVIDIA CUDA Programming Guide, 1.1
55
CUDA Memory Architecture Image courtesy: NVIDIA CUDA Programming Guide, 1.1
56
Displacement Mapping Fairly simple if cracks can be avoided – Displace adjacent grids together
57
Shading & Lighting Interactive preview – Lpics / Lightspeed Shaders – Intelligent textures – File access Shadows
58
Composite/Filter Stuff Stochastic sampling – 20x speedup on a GPU (Wei 2008) A-buffer
59
Special Effects Motion Blur and DOF Global Illumination Ambient Occlusion
60
Bound/Split Dice (tessellate) Displace Shade Compose Sample Vertex Shade Geometry Shade Rasterize Fragment Shade Blend Depth-buffer ReyesDirect3D 10 Geometry
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.