Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware
Nolan Goodnight, Greg Humphreys, Cliff Woolley, Rui Wang
University of Virginia
Eurographics Symposium on Rendering, June, Leuven, Belgium
HDR and Tone Mapping (image panels: clamped to [0,1]; compressed)
Advances in graphics hardware
- Physically-based rendering on the GPU (Purcell et al., 2003)
- High dynamic range texture mapping (Debevec et al., 2001)
System Overview
- Interactive tone mapping system for an OpenGL application
(diagram labels: application, display callback, HDR image, tone mapping system, LDR image, frame buffer)
Interface to the application
    tmInitialize();   // Initialize the system
    tmEnable();       // Retarget GL calls
    // ... draw geometry ...
    tmCompress();     // Compress output
    tmDisable();      // Restore app context
(diagram labels: tone mapping system, application)
Choosing a tone mapping operator
- Photographic Tone Reproduction for Digital Images (Reinhard et al., 2002)
- Global operator is a simple transfer function
(plot: display value from 0 to 1 as a function of scaled luminance)
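The global operator can be sketched in a few lines. This is a CPU-side illustration, not the shader code from the talk: it follows the formulas published by Reinhard et al. (scale by key over log-average luminance, then compress with L / (1 + L)). The function names, the key value 0.18, and the small offset `delta` are assumptions chosen for the example.

```python
import math

def log_average_luminance(lum, delta=1e-4):
    """Geometric mean of the luminance values; delta avoids log(0)."""
    return math.exp(sum(math.log(delta + v) for v in lum) / len(lum))

def global_tone_map(lum, key=0.18, delta=1e-4):
    """Scale by key / log-average, then apply the transfer function L / (1 + L)."""
    avg = log_average_luminance(lum, delta)
    scaled = [key * v / avg for v in lum]
    return [s / (1.0 + s) for s in scaled]

# Hypothetical HDR luminances spanning several orders of magnitude.
hdr = [0.01, 0.5, 2.0, 50.0, 4000.0]
ldr = global_tone_map(hdr)
```

Because L / (1 + L) is monotonic and bounded by 1, the ordering of luminances is preserved while every output stays displayable.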
Choosing a tone mapping operator
- Local operator
- Digital analog to 'burning' and 'dodging'
(figure labels: local area luminance, center-surround)
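A rough CPU sketch of the center-surround idea, following the local operator described by Reinhard et al.: at each pixel, compare a blurred "center" with a larger blurred "surround" and keep the largest scale at which they still agree. Several things here are assumptions made for brevity: the Gaussian sigma is set equal to the scale (the paper uses a differently normalized profile), `zones` is taken to mean the number of scales tested, and the defaults for `phi`, `eps`, and `ratio` are illustrative. The input is expected to be scaled luminance, and the output is not clamped.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights covering roughly +/- 3 sigma."""
    radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(img, sigma):
    """Separable Gaussian blur with clamped borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    horiz = [[sum(k[j + r] * row[min(max(x + j, 0), w - 1)]
                  for j in range(-r, r + 1)) for x in range(w)]
             for row in img]
    return [[sum(k[j + r] * horiz[min(max(y + j, 0), h - 1)][x]
                 for j in range(-r, r + 1)) for x in range(w)]
            for y in range(h)]

def local_tone_map(scaled, zones=4, key=0.18, phi=8.0, eps=0.05, ratio=1.6):
    """Divide each pixel by 1 + its local average at the selected scale."""
    sizes = [ratio ** i for i in range(zones + 1)]
    blurred = [blur(scaled, s) for s in sizes]
    out = []
    for y, row in enumerate(scaled):
        out_row = []
        for x, lum in enumerate(row):
            surround = blurred[0][y][x]  # fall back to the smallest scale
            for i in range(zones):
                v1 = blurred[i][y][x]
                v2 = blurred[i + 1][y][x]
                activity = (v1 - v2) / ((2.0 ** phi) * key / sizes[i] ** 2 + v1)
                if abs(activity) >= eps:
                    break  # contrast edge reached; stop growing the region
                surround = v1
            out_row.append(lum / (1.0 + surround))
        out.append(out_row)
    return out
```

In a uniform region every scale agrees, so the result reduces to the global transfer function; near strong edges the search stops early and the pixel is compared against a small neighborhood only.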
Why use this tone mapping operator?
- Global operator is simple and fast to compute
- Only one global computation
- We can dynamically choose the number of zones
Variable number of zones: 3, 4, 5, 6, 7, 8 (image sequence; each slide also carries the label "3 Zones")
System block diagram
Implementation
- Target architecture
  - ATI Radeon 9800 (R350)
- Data storage
  - Floating-point off-screen buffers (pbuffers)
  - Multiple rendering surfaces (GL_AUXi)
- Algorithms
  - ARB fragment and vertex assembly
  - Generate fragments with image-sized quads
- Data representation
  - Vector vs. scalar organization
Global operator block diagram
Implementation: global operator
- Simple luminance transform
- Store luminance and log luminance in separate channels of a single buffer
- Mipmap reduction of the log luminance channel, on a single rendering surface, gives the log average luminance
- Operator shader inputs: texture 0, texture 1, texture 2
(diagram labels: HDR image, luminance, log luminance, mipmap reduction, LDR image)
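The mipmap reduction can be illustrated on the CPU: averaging 2 x 2 blocks repeatedly until one texel remains yields the mean of the log luminance, and exponentiating it gives the log average luminance. This is a sketch under simplifying assumptions (a square image whose side is a power of two, hypothetical function names, and an assumed offset `delta`); it is not the GPU code from the talk.

```python
import math

def mipmap_reduce(img):
    """Average 2x2 blocks level by level until a single value is left."""
    level = img
    while len(level) > 1:
        n = len(level) // 2
        level = [[(level[2 * y][2 * x] + level[2 * y][2 * x + 1] +
                   level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
                  for x in range(n)]
                 for y in range(n)]
    return level[0][0]

def log_average_via_mipmap(lum, delta=1e-4):
    """Log-average luminance computed through a mipmap-style reduction."""
    logs = [[math.log(delta + v) for v in row] for row in lum]
    return math.exp(mipmap_reduce(logs))
```

For power-of-two images each level is an exact average of the one below, so the final texel equals the plain mean up to floating-point rounding.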
Local operator block diagram
Implementation: GPU-based convolutions
- Transform n-vector product into multiple 4-vector products
(figure: filter and luminance vectors split into 4-component pieces whose products are summed)
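The decomposition itself is simple arithmetic, sketched below on the CPU. The helper names are hypothetical, and zero-padding the tail when n is not a multiple of 4 is an assumption made for the example.

```python
def dot4(a, b):
    """4-component dot product, the operation a fragment program does natively."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def inner_product_by_fours(filt, samples):
    """Compute an n-term inner product as a sum of 4-term dot products."""
    pad = (-len(filt)) % 4  # zero-pad so the length is a multiple of 4
    f = list(filt) + [0.0] * pad
    s = list(samples) + [0.0] * pad
    total = 0.0
    for i in range(0, len(f), 4):
        total += dot4(f[i:i + 4], s[i:i + 4])
    return total
```

A 7-tap filter, for instance, becomes two 4-vector products, the second with one padded zero.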
Vectorizing the luminance
- Output 4 pixels at the same time
- Useful for expensive algorithms
- Requires a conversion back to scalar form
(figure label: stacked domain)
Vectorizing the luminance
- A simple method for luminance vectorization (figure: luminance values packed into the R, G, B, and A channels)
- Preserves spatial locality
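The slides do not spell out the packing layout. One layout consistent with the "stacked" label and the locality claim, and assumed here purely for illustration, is to place the four quadrants of the luminance image into the R, G, B, and A channels of a quarter-size texture. Neighboring texels then hold neighboring pixels within each quadrant. The function names are hypothetical and the image dimensions are assumed to be even.

```python
def stack_quadrants(lum):
    """Pack the four image quadrants into (R, G, B, A) tuples of a quarter-size grid."""
    h, w = len(lum), len(lum[0])
    hh, hw = h // 2, w // 2
    return [[(lum[y][x],            # R: top-left quadrant
              lum[y][x + hw],       # G: top-right quadrant
              lum[y + hh][x],       # B: bottom-left quadrant
              lum[y + hh][x + hw])  # A: bottom-right quadrant
             for x in range(hw)]
            for y in range(hh)]

def unstack_quadrants(texels):
    """Inverse of stack_quadrants: convert back to scalar form."""
    hh, hw = len(texels), len(texels[0])
    lum = [[0.0] * (2 * hw) for _ in range(2 * hh)]
    for y in range(hh):
        for x in range(hw):
            r, g, b, a = texels[y][x]
            lum[y][x] = r
            lum[y][x + hw] = g
            lum[y + hh][x] = b
            lum[y + hh][x + hw] = a
    return lum
```

With this layout a single 4-wide operation on one texel processes four pixels of the original image at once.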
GPU-based convolutions
Example: 1 x n inner product of the filter with the stacked image, accumulated over successive render passes (Pass 1 + Pass 2 + Pass 3)
GPU-based convolutions
- Compute multiple 4-vector products per pass
- Less shader and texture switching
(figure: products over the stacked image summed in a single render pass)
GPU-based convolutions
- Advantages:
  - Handles large kernels
  - Efficient memory access
  - No transform back to scalar values
Timings for a 512 x 512 image:
  Kernel     Time
  11 x 11    ~ 6 ms
  21 x 21    ~ 10 ms
  41 x 41    ~ 16 ms
System block diagram
Calculating adaptation zones on the GPU
(diagram sequence: luminance is filtered into Buffer 0 and Buffer 1, each with FRONT and BACK surfaces; the level labels advance 0,1 then 2,1 then 2,3 then 4,3 as newly filtered levels replace older ones)
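The alternating labels in the diagram suggest a ping-pong scheme, in which each filtering step reads from one surface and writes to another. The sketch below shows only that general pattern and makes several assumptions: it uses two 1D buffers rather than the four surfaces in the diagram, a 3-tap box filter stands in for the real convolution, and each level is derived from the previous one. All names are hypothetical.

```python
def box_filter(signal):
    """3-tap box filter with clamped borders (stand-in for the real convolution)."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def filtered_levels(luminance, levels):
    """Produce successively filtered copies by alternating between two buffers."""
    buffers = [list(luminance), [0.0] * len(luminance)]
    src = 0
    history = [list(luminance)]       # level 0 is the unfiltered luminance
    for _ in range(levels):
        dst = 1 - src                 # write to the buffer not being read
        buffers[dst] = box_filter(buffers[src])
        history.append(list(buffers[dst]))
        src = dst                     # swap roles for the next pass
    return history
```

Only two working buffers are live at any time, whatever the number of levels; the `history` list exists here just to make the intermediate results inspectable.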
Performance: global operator (chart: frames per second vs. image size, for 16-bit and 32-bit floats)
Performance: local operator (chart: frames per second vs. number of zones, for 16-bit and 32-bit floats)
Performance comparison: CPU vs. GPU
Results: Accuracy
- Comparison with CPU: 512 x 512 image
  Image                    RMS % error
  Scaled luminance         0.022 %
  Convolution (5 x 5)      0.026 %
  Convolution (49 x 49)    0.032 %
  Final image              1.051 %
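The slides do not define how the RMS percentage was computed. One common definition, assumed here, divides the root-mean-square difference between the two images by the root-mean-square of the reference (CPU) image. The function name and the normalization are both assumptions of this sketch.

```python
import math

def rms_percent_error(reference, test):
    """RMS difference as a percentage of the reference RMS (flat value lists)."""
    n = len(reference)
    err = math.sqrt(sum((t - r) ** 2 for r, t in zip(reference, test)) / n)
    ref = math.sqrt(sum(r * r for r in reference) / n)
    return 100.0 * err / ref
```

Under this definition, a GPU result that is uniformly 1% brighter than the CPU reference reports an error of about 1%.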
False-color zone images (panels: CPU, GPU)
Image comparison (panels: compressed with 2 zones; clamped to [0,1]). Images generated at ~30 Hz.
Conclusion and Future Work
- Summary
  - System for interactively compressing HDR output from an OpenGL application
  - Complex tone mapping operator on the GPU
- Future Work
  - Other tone mapping operators
  - Further optimizations
  - Non-invasive implementation