Tone Mapping on GPUs
Cliff Woolley, University of Virginia
Slides courtesy of Nolan Goodnight
HDR and Tone Mapping
[Image comparison: clamped to [0,1] vs. compressed]
Advances in graphics hardware
–Physically-based rendering on the GPU (Purcell et al., 2003)
–High dynamic range texture mapping (Debevec et al., 2001)
System Overview
Interactive tone mapping system for an OpenGL application
[Diagram labels: application, display callback, HDR image, tone mapping system, LDR image, frame buffer]
Interface to the application
–tmInitialize(); // Initialize the system
–tmEnable(); // Retarget GL calls
–(application draws its geometry)
–tmCompress(); // Compress output
–tmDisable(); // Restore app context
[Diagram labels: tone mapping system, application]
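The call order on this slide can be sketched as a per-frame display callback. The functions below are hypothetical Python stand-ins that only record when each entry point would be called; the slides name C-style functions, and nothing here reflects their real implementation. draw_geometry is an assumed placeholder for the application's own rendering.

```python
# Hypothetical stand-ins for the tm* entry points named on the slide.
# Each one only logs its name so the call order can be inspected.
calls = []

def tmInitialize():
    calls.append("tmInitialize")   # initialize the system (once)

def tmEnable():
    calls.append("tmEnable")       # retarget GL calls

def tmCompress():
    calls.append("tmCompress")     # compress output

def tmDisable():
    calls.append("tmDisable")      # restore app context

def draw_geometry():
    calls.append("draw_geometry")  # assumed placeholder for application rendering

def display_callback():
    """One frame: bracket the application's drawing with the tone mapping calls."""
    tmEnable()
    draw_geometry()
    tmCompress()
    tmDisable()

tmInitialize()
display_callback()
display_callback()
```

In this sketch initialization happens once, and every frame repeats the enable, draw, compress, disable sequence.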
Choosing a tone mapping operator
Photographic Tone Reproduction for Digital Images (Reinhard et al., 2002)
–Global operator is a simple transfer function
[Plot: transfer function of scaled luminance; axis ticks 0 and 1]
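As a rough CPU-side illustration of the global operator in Reinhard et al., the sketch below scales each luminance by key / log-average and then applies the transfer function L / (1 + L). The function names, the key value of 0.18, and the small delta that guards against log(0) are choices made for this example, not values taken from the slides.

```python
import math

def log_average(luminances, delta=1e-6):
    """Log-average luminance: exp(mean(log(delta + L)))."""
    total = sum(math.log(delta + lum) for lum in luminances)
    return math.exp(total / len(luminances))

def global_operator(luminances, key=0.18):
    """Scale by key / log-average, then compress with L / (1 + L)."""
    avg = log_average(luminances)
    scaled = [key * lum / avg for lum in luminances]
    return [s / (1.0 + s) for s in scaled]

# Four luminances spanning six orders of magnitude all land in [0, 1).
ldr = global_operator([0.01, 1.0, 100.0, 10000.0])
```

The log average is the one global quantity the operator needs; everything else is per pixel.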
Choosing a tone mapping operator
Local operator
–Digital analog to ‘burning’ and ‘dodging’
[Diagram: center-surround function over local area luminance]
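A minimal per-pixel sketch of the center-surround test from Reinhard et al. follows. It assumes the center and surround averages at each scale have already been computed by filtering, and it uses one common reading of the scale selection rule: walk the scales from smallest to largest and stop before the first one whose normalized difference reaches the threshold. The names and the defaults (key 0.18, phi 8, eps 0.05) are choices for this example.

```python
def activity(v1, v2, scale, key=0.18, phi=8.0):
    """Normalized center-surround difference V at one pixel and one scale."""
    return (v1 - v2) / ((2.0 ** phi) * key / (scale * scale) + v1)

def select_zone(center, surround, scales, eps=0.05):
    """Index of the largest scale reached before |V| hits the threshold."""
    chosen = 0
    for i, s in enumerate(scales):
        if abs(activity(center[i], surround[i], s)) >= eps:
            break
        chosen = i
    return chosen

def local_operator(lum, center, surround, scales):
    """Compress one pixel using the local average at its selected scale."""
    zone = select_zone(center, surround, scales)
    return lum / (1.0 + center[zone])

# Example pixel: the third scale crosses a strong edge, so the second is chosen.
scales = [1.0, 1.6, 2.56]
center = [0.5, 0.5, 0.9]
surround = [0.5, 0.52, 2.0]
zone = select_zone(center, surround, scales)
value = local_operator(0.5, center, surround, scales)
```

Dividing by the local average rather than the pixel's own luminance is what gives the dodging and burning effect: a dark pixel in a bright neighborhood is pushed darker, and the reverse.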
Why use this tone mapping operator?
–Global operator is simple and fast to compute
–Only one global computation
–We can dynamically choose the number of zones
Variable number of zones: 3, 4, 5, 6, 7, and 8 [one image per setting]
System block diagram
Implementation
Target architecture
–ATI Radeon 9800 (R350)
Data storage
–Floating-point off-screen buffers (pbuffers)
–Multiple rendering surfaces (GL_AUXi)
Implementation
Algorithms
–ARB fragment and vertex assembly programs
–Generate fragments with image-sized quads
Data representation
–Vector vs. scalar organization
Global operator block diagram
Implementation: global operator
–Simple luminance transform
–Store luminance and log luminance in separate channels
[Diagram stages: HDR image, luminance, log luminance, mipmap reduction, LDR image; luminance and log luminance share a single pbuffer]
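The luminance transform can be sketched per pixel as a weighted sum of the color channels, with luminance and its logarithm written to two channels of the same output texel. The Rec. 709 weights and the small delta below are assumptions for this example; the slides do not state which weights the system uses.

```python
import math

# Rec. 709 weights: an assumption, the slides do not give the weights used.
WEIGHTS = (0.2126, 0.7152, 0.0722)

def luminance_texel(rgb, delta=1e-6):
    """Return (luminance, log luminance) for one HDR pixel.

    The two values stand for two channels of a single output texel.
    """
    lum = sum(w * c for w, c in zip(WEIGHTS, rgb))
    return (lum, math.log(delta + lum))

texel = luminance_texel((4.0, 2.0, 1.0))
```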
Implementation: global operator
–Single rendering surface
–Mipmap reduction of the log luminance channel yields the log average luminance
[Diagram stages: HDR image, luminance, log luminance, mipmap reduction, LDR image; single pbuffer]
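To illustrate why a mipmap reduction produces the log average, the sketch below repeatedly averages 2x2 blocks of a log luminance image until one value remains, then exponentiates it. It assumes a square image whose side is a power of two, in which case the final value equals the plain mean; the function names are made up for this example.

```python
import math

def reduce_2x2(image):
    """One mipmap level: average each 2x2 block (sides must be even)."""
    h, w = len(image), len(image[0])
    return [[(image[y][x] + image[y][x + 1] +
              image[y + 1][x] + image[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def log_average_by_mipmap(log_lum):
    """Reduce a square power-of-two image of log luminances down to 1x1."""
    level = log_lum
    while len(level) > 1:
        level = reduce_2x2(level)
    return math.exp(level[0][0])

# 4x4 image whose columns hold luminances 1, 2, 4, 8.
log_image = [[math.log(v) for v in (1.0, 2.0, 4.0, 8.0)] for _ in range(4)]
average = log_average_by_mipmap(log_image)
```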
Implementation: global operator
[Diagram: operator shader with texture 0, texture 1, texture 2; stages: HDR image, luminance, log luminance, mipmap reduction, LDR image]
Local operator block diagram
Implementation: GPU-based convolutions
–Transform an n-vector product into multiple 4-vector products
[Diagram: filter and luminance vectors split into 4-vector products that are summed]
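The decomposition can be sketched as follows: an n-element inner product is split into groups of four, each group is evaluated as one 4-component dot product (the kind of operation the DP4 instruction in ARB fragment programs performs), and the partial results are summed. Zero-padding the tail is a choice made for this example.

```python
def dot4(a, b):
    """4-component dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def dot_n_as_dot4(filter_taps, values):
    """n-vector product as a sum of 4-vector products, zero-padding the tail."""
    pad = (-len(filter_taps)) % 4
    f = list(filter_taps) + [0.0] * pad
    v = list(values) + [0.0] * pad
    return sum(dot4(f[i:i + 4], v[i:i + 4]) for i in range(0, len(f), 4))

# A 9-tap product becomes three 4-vector products (the last one padded).
result = dot_n_as_dot4([float(i) for i in range(1, 10)], [1.0] * 9)
```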
Vectorizing the luminance
–Output 4 pixels at the same time
–Useful for expensive algorithms
–Requires a conversion back to scalar form
[Diagram label: stacked domain]
Vectorizing the luminance
A simple method for luminance vectorization:
–Preserves spatial locality
[Diagram build: luminance packed into the R, G, B, and A channels]
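The exact packing is not recoverable from the slide text. One layout that matches the "stacked" label and keeps neighboring pixels adjacent within each channel is to place the four image quadrants in the R, G, B, and A channels; the sketch below assumes that layout and should be read as an illustration only. It also shows the conversion back to scalar form.

```python
def stack_quadrants(lum):
    """Pack an H x W scalar image into (H/2) x (W/2) texels of (R, G, B, A)."""
    h, w = len(lum) // 2, len(lum[0]) // 2
    return [[(lum[y][x],          # R: top-left quadrant
              lum[y][x + w],      # G: top-right quadrant
              lum[y + h][x],      # B: bottom-left quadrant
              lum[y + h][x + w])  # A: bottom-right quadrant
             for x in range(w)]
            for y in range(h)]

def unstack_quadrants(stacked):
    """Inverse of stack_quadrants: convert back to scalar form."""
    h, w = len(stacked), len(stacked[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            r, g, b, a = stacked[y][x]
            out[y][x], out[y][x + w] = r, g
            out[y + h][x], out[y + h][x + w] = b, a
    return out

image = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
stacked = stack_quadrants(image)
```

With this layout, a per-texel operation on the stacked image processes four pixels at once, one from each quadrant.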
GPU-based convolutions
Example: 1 x n inner product of the filter with the stacked image
[Diagram build: pass 1, pass 2, and pass 3 each contribute a partial product; the partial results are summed]
GPU-based convolutions
Compute multiple 4-vector products per pass
–Less shader and texture switching
[Diagram build: partial products over the stacked image summed within a single render pass]
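The difference between the multipass and single-pass strategies can be sketched on one image row. In the multipass version each pass handles one group of four filter taps and adds its partial result to an accumulation buffer; in the single-pass version all groups are evaluated together, so no intermediate buffer is needed. Clamp-to-edge addressing and the function names are assumptions for this example.

```python
def convolve_group(row, taps, offset, start):
    """Partial 1 x n convolution using only taps[start:start+4], clamped borders."""
    last = len(row) - 1
    out = []
    for x in range(len(row)):
        acc = 0.0
        for k in range(start, min(start + 4, len(taps))):
            src = min(max(x + k - offset, 0), last)
            acc += taps[k] * row[src]
        out.append(acc)
    return out

def convolve_multipass(row, taps):
    """One pass per group of four taps, summed into an accumulation buffer."""
    offset = len(taps) // 2
    accum = [0.0] * len(row)
    passes = 0
    for start in range(0, len(taps), 4):
        partial = convolve_group(row, taps, offset, start)
        accum = [a + p for a, p in zip(accum, partial)]
        passes += 1
    return accum, passes

def convolve_single_pass(row, taps):
    """All groups evaluated in one pass; same arithmetic, no intermediate buffer."""
    offset = len(taps) // 2
    last = len(row) - 1
    return [sum(t * row[min(max(x + k - offset, 0), last)]
                for k, t in enumerate(taps))
            for x in range(len(row))]

row = [float(i) for i in range(1, 9)]
taps = [1.0 / 9.0] * 9
multi, passes = convolve_multipass(row, taps)
single = convolve_single_pass(row, taps)
```

Both versions produce the same values up to floating-point rounding; the single-pass form simply avoids the per-pass setup that the slide refers to as shader and texture switching.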
GPU-based convolutions
Advantages:
–Handles large kernels
–Efficient memory access
–No transform back to scalar values
512 x 512 image:
–11 x 11 kernel: ~6 ms
–21 x 21 kernel: ~10 ms
–41 x 41 kernel: ~16 ms
System block diagram
Calculating adaptation zones
[Diagram build across four slides: Buffer 0 and Buffer 1 with FRONT and BACK surfaces; the filtered luminance results held advance from (0, 1) to (2, 1), (2, 3), and (4, 3)]
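The sequence in the diagram reads like a ping-pong scheme, where each new filtered level overwrites the older of the two levels currently held. The sketch below illustrates that pattern with two buffers. The 3-tap box blur is only a stand-in filter, and filtering the previous level (rather than the original luminance) is an assumption; the slide text does not settle either point.

```python
def box_blur(src):
    """Stand-in filter: 3-tap box blur with clamped borders."""
    last = len(src) - 1
    return [(src[max(i - 1, 0)] + src[i] + src[min(i + 1, last)]) / 3.0
            for i in range(len(src))]

def filter_levels(luminance, levels):
    """Produce `levels` successively filtered rows using only two buffers."""
    buffers = [box_blur(luminance), None]   # buffer 0 holds level 0
    history = [(0, 0)]                      # (level, buffer index written)
    for level in range(1, levels):
        read, write = (level - 1) % 2, level % 2
        buffers[write] = box_blur(buffers[read])  # read one buffer, write the other
        history.append((level, write))
        # buffers[read] and buffers[write] now hold two adjacent levels,
        # which is where a per-pixel zone test could compare them.
    return buffers, history

buffers, history = filter_levels([0.0, 0.0, 9.0, 0.0, 0.0], 5)
```

Only two levels are ever resident, so memory use stays constant no matter how many zones are evaluated.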
Performance: global operator
[Plot: frames per second vs. image size, for 16-bit and 32-bit floats]
Performance: local operator
[Plot: frames per second vs. number of zones, for 16-bit and 32-bit floats]
Performance comparison: CPU vs. GPU
Results: Accuracy
Comparison with CPU: 512 x 512 image
Image                  RMS % error
Scaled luminance       0.022 %
Convolution (5 x 5)    0.026 %
Convolution (49 x 49)  0.032 %
Final image            1.051 %
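The slides do not define how the RMS percentage is normalized. One plausible definition, used here purely as an assumption, divides the RMS of the difference between the GPU and CPU results by the RMS of the CPU reference.

```python
import math

def rms_percent_error(reference, result):
    """RMS of (result - reference) as a percentage of the RMS of the reference.

    The normalization is an assumption; the slides do not define it.
    """
    diff = sum((r - g) ** 2 for r, g in zip(reference, result))
    ref = sum(r ** 2 for r in reference)
    return 100.0 * math.sqrt(diff / ref)

# A uniform 1 % deviation from the reference gives an error of about 1 %.
error = rms_percent_error([1.0] * 4, [1.01] * 4)
```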
False-color zone images: CPU vs. GPU
Image comparisons: compressed (2 zones) vs. clamped to [0,1]
Images generated at ~30 Hz
Conclusion and Future Work
Summary
–System for interactively compressing HDR output from an OpenGL application
–Complex tone mapping operator on the GPU
Future Work
–Other tone mapping operators
–Further optimizations
–Non-invasive implementation