GPU-based Visualization Algorithms. Han-Wei Shen, Associate Professor, Department of Computer Science and Engineering, The Ohio State University
Scientific Visualization: the process of converting numerical data into visual images. The images should contain useful information that helps scientists understand their data.
Applications: Large-Scale Time-Dependent Simulations. Richtmyer-Meshkov turbulent simulation (LLNL): 2048x2048x1920 grid per time step (7.7 GB); run of 27,000 time steps; output size > 2 TB. Figure: LLNL IBM ASCI system.
Applications: Oak Ridge Terascale Supernova Initiative (TSI): 640x640x640 floats; > 1000 time steps; total size > 1 TB. NASA's turbo pump simulation: multi-zone, moving meshes; 300+ time steps; total size > 100 GB. Figures: ORNL TSI data; NASA turbo pump.
Current Research Projects: Time-Varying Data Visualization; Flow Visualization; View-Dependent Algorithms; Parallel Rendering.
Time-Varying Data Visualization. Key: the data are huge (~100 TB). Research: spatio-temporal multiresolution hierarchy; feature tracking; high-dimensional rendering.
Flow Visualization. Key: visualize the dynamics. Research: texture synthesis and animation; streamline placement.
View-Dependent Algorithms. Key: give the user the best view with minimal effort. Research: occlusion culling; automatic view selection.
Parallel Rendering. Key: make optimal use of computational resources (CPU and storage). Research: large-format displays; dynamic load balancing.
Computer Graphics Technology: has advanced at an amazing speed.
The Programmable GPU: GPU = vertex shader (vertex program) + fragment shader (fragment program, pixel program). The vertex shader replaces per-vertex transform and lighting. The fragment shader replaces the texture stages. Fragment testing happens after the fragment shader. Flexibility to do framebuffer pixel blending. Figure: pipeline diagram with the stages Transform and Lighting / Vertex Shader, Clipping, Primitive Assembly and Rasterization, Texture Stages / Fragment Shader, and Fragment Testing.
GPU-based Wavelet Reconstruction: Wavelets are useful for multiresolution analysis and compression of 3D volumetric datasets. Previous 3D wavelet solutions are mostly implemented with convolution operators or in software. Our work reconstructs 3D wavelets on the GPU.
Wavelet Theory: Wavelets are defined on basis functions that filter a set of original values (A values) into low-frequency coefficients (L values) and high-frequency coefficients (H values). L values are also known as averages, and H values as details. Diagram: A0 A1 A2 A3 A4 A5 ... are filtered into L0 L1 L2 ... and H0 H1 H2 ...
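The split into averages and details can be sketched on the CPU with the Haar wavelet, the simplest case used later in the deck. This is a minimal illustration in Python, not the GPU implementation; the function names `haar_forward` and `haar_inverse` are hypothetical, and the orthonormal 1/sqrt(2) scaling is assumed to match the reconstruction formulas on the later slides.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(values):
    """Filter A values into averages (L values) and details (H values)."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    pairs = list(zip(values[0::2], values[1::2]))
    low = [(a + b) / SQRT2 for a, b in pairs]    # L values (averages)
    high = [(a - b) / SQRT2 for a, b in pairs]   # H values (details)
    return low, high

def haar_inverse(low, high):
    """Rebuild the A values: even samples use L + H, odd samples use L - H."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / SQRT2)
        out.append((l - h) / SQRT2)
    return out

a_values = [9.0, 7.0, 3.0, 5.0]
averages, details = haar_forward(a_values)
restored = haar_inverse(averages, details)
```

Each pair of A values yields one L value and one H value, so the two coefficient lists together hold exactly as many numbers as the input.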
2D Wavelet Transform: For two- or three-dimensional data, wavelets are applied successively on each dimension, which creates 4 or 8 coefficient bricks respectively. Diagram: a 2d x 2d image becomes 2 blocks of 2d x d (H, L) after the X transform, then 4 blocks of d x d (HH, LH, HL, LL) after the Y transform.
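The successive per-dimension passes can be sketched for the 2D case as follows. This is an illustrative Haar version in plain Python; `haar_2d` and its helpers are hypothetical names, and the brick naming (first letter for Y, second letter for X) is an assumption chosen to agree with the letter order used on the later reconstruction slides.

```python
import math

SQRT2 = math.sqrt(2.0)

def split_pairs(seq):
    """One 1D Haar step: returns (averages, details), each half as long."""
    low = [(seq[i] + seq[i + 1]) / SQRT2 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / SQRT2 for i in range(0, len(seq), 2)]
    return low, high

def transform_rows(block):
    """Apply the 1D step to every row of a 2D block."""
    lows, highs = [], []
    for row in block:
        low, high = split_pairs(row)
        lows.append(low)
        highs.append(high)
    return lows, highs

def transpose(block):
    return [list(col) for col in zip(*block)]

def haar_2d(image):
    """Transform a 2d x 2d image into four d x d bricks named LL, LH, HL, HH."""
    x_low, x_high = transform_rows(image)             # X pass: 2 blocks of 2d x d
    bricks = {}
    for x_letter, block in (("L", x_low), ("H", x_high)):
        # Y pass: work on columns by transposing, transforming rows, transposing back
        y_low, y_high = transform_rows(transpose(block))
        bricks["L" + x_letter] = transpose(y_low)
        bricks["H" + x_letter] = transpose(y_high)
    return bricks                                      # 4 bricks of d x d

image = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
bricks = haar_2d(image)
```

For this linear ramp the `HH` brick is all zeros, which is the kind of redundancy that makes the coefficients compressible.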
3D Wavelet Transform: A volume of (2d)^3 voxels is transformed into 8 bricks of d^3 coefficients. Diagram: the X transform yields H and L; the Y transform yields HH, LH, HL, LL; the Z transform yields HHH, HHL, HLH, HLL, LHH, LHL, LLH, LLL.
3D Wavelet Reconstruction: Reconstruct the original volume of (2d)^3 voxels from the 8 bricks of d^3 coefficients. Diagram: Z reconstruction turns the 8 bricks into HH, LH, HL, LL; Y reconstruction turns these into H and L; X reconstruction yields the volume.
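The transform and its inverse can be sketched on the CPU one axis at a time. This is a Haar-only illustration in plain Python with the volume stored as a dictionary keyed by (x, y, z); the function names are hypothetical, and the brick names are assumed to read Z, Y, X from left to right, which matches the pairs listed for Z reconstruction later in the deck.

```python
import math
from itertools import product

SQRT2 = math.sqrt(2.0)

def split_axis(brick, axis):
    """One Haar step along `axis`; each output brick is half as long on that axis."""
    low, high = {}, {}
    for coord, even_value in brick.items():
        if coord[axis] % 2 == 1:
            continue                      # odd samples are consumed with their even neighbour
        odd = list(coord)
        odd[axis] += 1
        odd_value = brick[tuple(odd)]
        half = list(coord)
        half[axis] //= 2
        low[tuple(half)] = (even_value + odd_value) / SQRT2
        high[tuple(half)] = (even_value - odd_value) / SQRT2
    return low, high

def merge_axis(low, high, axis):
    """Inverse of split_axis: even samples get L + H, odd samples get L - H."""
    out = {}
    for coord, l in low.items():
        h = high[coord]
        even = list(coord)
        even[axis] *= 2
        odd = list(even)
        odd[axis] += 1
        out[tuple(even)] = (l + h) / SQRT2
        out[tuple(odd)] = (l - h) / SQRT2
    return out

def forward_3d(volume):
    bricks = {"": volume}
    for axis in (0, 1, 2):                # X transform, then Y, then Z
        step = {}
        for name, brick in bricks.items():
            low, high = split_axis(brick, axis)
            step["L" + name] = low        # newest letter goes in front
            step["H" + name] = high
        bricks = step
    return bricks                         # 8 bricks, "LLL" .. "HHH"

def inverse_3d(bricks):
    for axis in (2, 1, 0):                # Z reconstruction, then Y, then X
        merged = {}
        for rest in {name[1:] for name in bricks}:
            merged[rest] = merge_axis(bricks["L" + rest], bricks["H" + rest], axis)
        bricks = merged
    return bricks[""]

side = 4                                  # (2d)^3 volume with d = 2
volume = {(x, y, z): float(x + 10 * y + 100 * z)
          for x, y, z in product(range(side), repeat=3)}
bricks = forward_3d(volume)
restored = inverse_3d(bricks)
```

Run this way, each reconstruction step produces an intermediate result that the next step consumes, which is the data movement the tileboard scheme on the following slides is designed to avoid.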
3D Wavelet Reconstruction: A straightforward implementation of 3D wavelet reconstruction involves a large number of texture copies. The render-to-texture feature is not available for 3D textures. A more efficient algorithm is needed to take advantage of the GPU.
Tileboards: A tileboard flattens a 3D brick into 2D tiles, one tile per slice of the brick. The HLL, HLH, HHL, and HHH bricks are merged into a single 2D RGBA texture, and the LLL, LLH, LHL, and LHH bricks are merged into another single 2D RGBA texture.
H- and L-Tileboard: Pack the 8 coefficient bricks into an H-tileboard and an L-tileboard.
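One possible tile layout and the RGBA packing can be sketched as below. The slides do not specify how tiles are arranged on the board, so the row-major layout, the `tiles_per_row` parameter, and all function names here are assumptions made for illustration; the coordinate helpers only loosely mirror the `CoordsTile2Dto3D` and `Coords3DtoTile2D` helpers named in the pseudocode slide.

```python
from itertools import product

def coords_3d_to_tile_2d(x, y, z, tile_size, tiles_per_row):
    """Map a voxel (x, y, z) to a texel (u, v); slice z becomes one tile (assumed layout)."""
    tile_col = z % tiles_per_row
    tile_row = z // tiles_per_row
    return tile_col * tile_size + x, tile_row * tile_size + y

def coords_tile_2d_to_3d(u, v, tile_size, tiles_per_row):
    """Inverse mapping from a texel (u, v) back to the voxel (x, y, z)."""
    x, tile_col = u % tile_size, u // tile_size
    y, tile_row = v % tile_size, v // tile_size
    return x, y, tile_row * tiles_per_row + tile_col

def pack_tileboard(four_bricks, tile_size, tiles_per_row):
    """Merge four d^3 bricks into one board whose texels are (R, G, B, A) tuples."""
    board = {}
    for x, y, z in product(range(tile_size), repeat=3):
        uv = coords_3d_to_tile_2d(x, y, z, tile_size, tiles_per_row)
        board[uv] = tuple(brick[(x, y, z)] for brick in four_bricks)
    return board

d = 4
# Four stand-in bricks (e.g. LLL, LLH, LHL, LHH) filled with recognisable values.
stand_in_bricks = [
    {(x, y, z): float(1000 * k + x + 10 * y + 100 * z)
     for x, y, z in product(range(d), repeat=3)}
    for k in range(4)
]
l_tileboard = pack_tileboard(stand_in_bricks, tile_size=d, tiles_per_row=2)
```

With this packing, one lookup at a texel returns the four coefficients that share a voxel position, one per channel.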
Reconstruction: Tileboards let us retrieve 4 coefficients with a single texture lookup. The H-tileboard and L-tileboard (two 2D RGBA textures) are sampled while a proxy polygon is rendered into a pbuffer, and the wavelet reconstruction formula is evaluated for each fragment. The output tileboard holds 2d tiles of 2d x 2d texels.
Reconstruction Details: Z reconstruction combines HHH with LHH, HHL with LHL, HLH with LLH, and HLL with LLL. The two members of each pair sit in the same channel (R, G, B, or A) of the H-tileboard and the L-tileboard.
Reconstruction Details: Z reconstruction combines the RGBA values from the H- and L-tileboards (H** and L**). For Haar wavelets: O_RGBA = (L_RGBA + H_RGBA) / sqrt(2) for even z, and O_RGBA = (L_RGBA - H_RGBA) / sqrt(2) for odd z.
Reconstruction Details: Y reconstruction combines HH with LH, and HL with LL. After the Z step these four values sit in one RGBA texel, so HH and LH are combined as the A and G channels, and HL and LL as the B and R channels.
Reconstruction Details: X reconstruction combines H and L.
Reconstruction Details (single fragment pass): Z reconstruction: O_RGBA = (L_RGBA + H_RGBA) / sqrt(2) for even z, and O_RGBA = (L_RGBA - H_RGBA) / sqrt(2) for odd z. Y reconstruction: O_L = (O_R + O_B) / sqrt(2) and O_H = (O_G + O_A) / sqrt(2) for even y, with the sums replaced by the differences O_R - O_B and O_G - O_A for odd y. X reconstruction: O = (O_L + O_H) / sqrt(2) for even x, and O = (O_L - O_H) / sqrt(2) for odd x. All three steps are evaluated in a single fragment pass.
Pseudocode
// CoordsTile2Dto3D, Coords3DtoTile2D and Color are helpers that are not shown on the slide.
// The per-axis 1/sqrt(2) Haar normalization from the formulas is not shown in this pseudocode.

// Returns +1 for an even coordinate and -1 for an odd one.
float ChooseSign(float x) { return 1 - 2 * fmod(x, 2); }

float4 haar(float2 c : TEX0,                     // Coords in output tileboard space
            uniform samplerRECT LTileboard,      // L-Tileboard
            uniform samplerRECT HTileboard)      // H-Tileboard
            : COLOR
{
    float3 d = CoordsTile2Dto3D(c);              // Coords in 3D brick space
    float2 e = Coords3DtoTile2D(d / 2);          // Coords in L- and H-tileboard space
    float4 L = texRECT(LTileboard, e);           // Fetch (LLL, LLH, LHL, LHH)
    float4 H = texRECT(HTileboard, e);           // Fetch (HLL, HLH, HHL, HHH)
    float4 RZ = L + H * ChooseSign(d.z);         // Reconstruct in Z
    float2 RY = RZ.rg + RZ.ba * ChooseSign(d.y); // Reconstruct in Y
    float  RX = RY.r + RY.g * ChooseSign(d.x);   // Reconstruct in X
    return Color(RX);                            // Return A value
}
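The per-fragment arithmetic can be checked on the CPU with a small emulation. This is a sketch in Python rather than shader code: `reconstruct_voxel` is a hypothetical name, the two RGBA tuples stand in for the two texture fetches, and the 1/sqrt(2) factor per axis is added on the assumption that the coefficients were produced by an orthonormal Haar transform.

```python
import math
from itertools import product

INV_SQRT2 = 1.0 / math.sqrt(2.0)

def choose_sign(index):
    """+1 for an even coordinate, -1 for an odd one (mirrors ChooseSign)."""
    return 1 - 2 * (index % 2)

def reconstruct_voxel(l_rgba, h_rgba, x, y, z):
    """Rebuild one voxel from 8 coefficients.

    l_rgba = (LLL, LLH, LHL, LHH) and h_rgba = (HLL, HLH, HHL, HHH),
    the channel order given in the pseudocode comments.
    """
    sz, sy, sx = choose_sign(z), choose_sign(y), choose_sign(x)
    # Z step: four values left, in the order (LL, LH, HL, HH)
    rz = [(l + sz * h) * INV_SQRT2 for l, h in zip(l_rgba, h_rgba)]
    # Y step: combine channels r with b, and g with a; two values left, (L, H)
    ry = [(rz[0] + sy * rz[2]) * INV_SQRT2,
          (rz[1] + sy * rz[3]) * INV_SQRT2]
    # X step: one value left, the reconstructed A value
    return (ry[0] + sx * ry[1]) * INV_SQRT2

# A constant 2x2x2 cell of ones has LLL = 2*sqrt(2) and no detail coefficients.
flat_l = (2.0 * math.sqrt(2.0), 0.0, 0.0, 0.0)
flat_h = (0.0, 0.0, 0.0, 0.0)
flat_cell = {c: reconstruct_voxel(flat_l, flat_h, *c)
             for c in product(range(2), repeat=3)}
```

The eight voxels of a 2x2x2 cell read the same two texels and differ only in the three signs, which is why no intermediate Z or Y result has to be written out between steps.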
Rendering: The goal is NOT to read the reconstructed data back from the pbuffer. 3D volume rendering is performed directly from the reconstructed tileboard. Diagram: reconstructed tileboard, then 3D volume slicing and rendering, then final image.
Results: Both Haar and Daubechies wavelets were implemented. Experiments were done on a 3.0 GHz Xeon processor with an nVidia Quadro FX 3400 card.
CPU vs. GPU: Table of reconstruction time in seconds (rows: Haar, Daubechies; columns: CPU, GPU, Speedup). Visible Woman data set: 480^3. Brick size: 64x64x64.
Brick Sizes vs. Reconstruction Time: Table of reconstruction time in msec (columns: brick size, Haar, Daubechies); the first listed brick size is 128^3. Time includes uploading and reconstruction.
Drop Coefficient Bricks: Coefficients can be dropped to trade quality for speed. Table (columns: number of coefficient bricks, Haar, Daubechies): reconstruction time in seconds for the Visible Woman data using different numbers of coefficient bricks.
Drop Coefficient Bricks: Dropping bricks degrades image quality, and the degradation is more severe with Haar than with Daubechies wavelets. Figure: Haar and Daubechies renderings.
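The quality side of this trade-off can be illustrated on a single 2x2x2 cell. The sketch below covers Haar only and says nothing about timing or about Daubechies wavelets; the function names, the test values, and the order in which coefficients are kept are all assumptions made for the example.

```python
import math
from itertools import product

# Names read Z, Y, X from left to right: "LLL", "LLH", ..., "HHH".
NAMES = ["".join(letters) for letters in product("LH", repeat=3)]
SCALE = 1.0 / (2.0 * math.sqrt(2.0))      # three 1/sqrt(2) factors, one per axis

def sign(name, x, y, z):
    """Sign with which coefficient `name` contributes to voxel (x, y, z)."""
    s = 1
    for letter, coord in zip(name, (z, y, x)):
        if letter == "H" and coord % 2 == 1:
            s = -s
    return s

def forward_cell(cell):
    """Haar-transform one 2x2x2 cell into its 8 coefficients."""
    return {n: SCALE * sum(sign(n, *c) * v for c, v in cell.items()) for n in NAMES}

def reconstruct_cell(coeffs, kept):
    """Rebuild the cell using only the coefficients named in `kept`."""
    return {c: SCALE * sum(sign(n, *c) * coeffs[n] for n in kept)
            for c in product(range(2), repeat=3)}

def rms_error(reference, approx):
    return math.sqrt(sum((reference[c] - approx[c]) ** 2 for c in reference) / len(reference))

cell = {(x, y, z): float(1 + x + 2 * y + 4 * z) for x, y, z in product(range(2), repeat=3)}
coeffs = forward_cell(cell)
# Error when keeping the first k coefficients, for k = 1 .. 8
errors = [rms_error(cell, reconstruct_cell(coeffs, NAMES[:k])) for k in range(1, 9)]
```

Keeping only `LLL` replaces every voxel by the cell mean, and because the basis is orthonormal the error can only shrink as more coefficients are added back.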
Multiresolution Rendering: Multiresolution can be achieved by feeding the reconstructed tileboard to the next resolution level.
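The level-to-level feeding can be sketched in 1D, where the output of one reconstruction step serves as the L input of the next finer level. This is a Haar-only CPU illustration with hypothetical function names; the deck applies the same idea to 3D tileboards on the GPU.

```python
import math

SQRT2 = math.sqrt(2.0)

def split(values):
    """One Haar step: (averages, details)."""
    low = [(values[i] + values[i + 1]) / SQRT2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / SQRT2 for i in range(0, len(values), 2)]
    return low, high

def merge(low, high):
    """Inverse Haar step: doubles the resolution."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / SQRT2)
        out.append((l - h) / SQRT2)
    return out

def decompose(values, levels):
    """Repeatedly split the averages; details are stored finest level first."""
    current, details = list(values), []
    for _ in range(levels):
        current, high = split(current)
        details.append(high)
    return current, details

def reconstruct(coarsest, details, levels_to_restore):
    """Feed each reconstructed level in as the averages of the next finer one."""
    current = list(coarsest)
    coarse_to_fine = list(reversed(details))
    for high in coarse_to_fine[:levels_to_restore]:
        current = merge(current, high)
    return current

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarsest, details = decompose(signal, levels=3)
# Stopping early yields a coarser preview; with this scaling its values
# carry an extra factor of sqrt(2) per level that was not restored.
previews = [reconstruct(coarsest, details, k) for k in range(4)]
```

Stopping after fewer levels gives a lower-resolution result at lower cost, and the last entry of `previews` is the full-resolution signal.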
Conclusions: We have devised an algorithm that uses the GPU to reconstruct 3D volumes from wavelet coefficients. We have also embedded our implementation in multiresolution data hierarchies.
Ongoing Efforts: Encoding and reconstruction of time-varying data; parallel algorithms for visualizing large-scale data.