A Multiresolution Volume Rendering Framework for Large-Scale Time-Varying Data Visualization
Chaoli Wang¹, Jinzhu Gao², Liya Li¹, Han-Wei Shen¹
¹ The Ohio State University
² Oak Ridge National Laboratory
Introduction
Large-scale numerical simulation
–Richtmyer-Meshkov Instability (RMI), LLNL
–2,048 × 2,048 × 1,920 grid
–960 (8 × 8 × 15) nodes of the IBM-SP system
–7.5 GB per time step, output 274 time steps
Goal
–Data exploration
–Quick overview, detail on demand
Approach
–Multiresolution data representation
–Error-controlled parallel rendering
Challenge
–Compact hierarchical data representation
–Allow specifying different spatial and temporal resolutions for rendering
–Long chains of parent-child node dependency
–Data dependency among processors
–Balance the workload for parallel rendering
Algorithm Overview The algorithm flow for large-scale time-varying data visualization
Wavelet-Based Time Space Partitioning Tree The WTSP tree –Space-time hierarchical data structure to organize time-varying data –An octree (spatial hierarchy) of binary trees (temporal hierarchy) –Originates from the TSP tree [Shen et al. 1999] –Borrows the idea of the wavelet tree [Guthe et al. 2002]
Wavelet-Based Time Space Partitioning Tree WTSP tree construction –Two-stage block-wise wavelet transform and compression process –Build a spatial hierarchy in the form of an octree for each time step –Merge the same octree nodes across time into binary time trees
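The temporal half of the construction can be illustrated with a minimal sketch. It assumes an unnormalized Haar filter (average and half-difference), a power-of-two number of time steps, and blocks flattened to plain lists; the names `TimeNode` and `build_time_tree` are hypothetical and not taken from the slides. The spatial stage (3D wavelet transform per time step) and the compression step are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeNode:
    t_start: int                          # first time step covered (inclusive)
    t_end: int                            # last time step covered (inclusive)
    low: List[float]                      # low-pass (averaged) block data
    high: Optional[List[float]] = None    # high-pass detail; None at leaves
    left: Optional["TimeNode"] = None
    right: Optional["TimeNode"] = None


def build_time_tree(blocks: List[List[float]]) -> TimeNode:
    """Merge one octree node's blocks across time into a binary time tree.

    Assumption: unnormalized Haar, low = (a + b) / 2, high = (a - b) / 2.
    """
    n = len(blocks)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("this sketch needs a power-of-two number of time steps")
    level = [TimeNode(t, t, list(b)) for t, b in enumerate(blocks)]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            l, r = level[i], level[i + 1]
            low = [(a + b) / 2 for a, b in zip(l.low, r.low)]
            high = [(a - b) / 2 for a, b in zip(l.low, r.low)]
            merged.append(TimeNode(l.t_start, r.t_end, low, high, l, r))
        level = merged
    return level[0]
```

Each internal node keeps only the detail coefficients needed to recover its two children, which is what makes the hierarchy compact.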
Hierarchical Spatial and Temporal Error Metric
–Based on MSE calculation
–Compare the error of each block with its children
$$se(T) = \sum_{i=0}^{7} \mathrm{MSE}(T, T_i) + \max\{\, se(T_i) \mid i = 0, \dots, 7 \,\}$$
$$te(T) = \mathrm{MSE}(T, T_l) + \mathrm{MSE}(T, T_r) + \max\{\, te(T_l),\ te(T_r) \,\}$$
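Both metrics share one recursive shape: the sum of the MSE between a node and each child, plus the largest error among the children. A small sketch of that recursion follows, reading the eight octree children as the children for se and the left and right time-tree children for te. It assumes each node holds data directly comparable with its children's (same length) and that leaves have zero error; the slides state neither, and `Node`, `mse`, and `hierarchical_error` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    data: List[float]
    children: List["Node"] = field(default_factory=list)


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equally sized blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hierarchical_error(node: Node) -> float:
    """Sum of MSE against each child plus the maximum child error.

    With 8 octree children this mirrors se(T); with the 2 time-tree
    children it mirrors te(T). Leaves are assumed to have error 0.
    """
    if not node.children:
        return 0.0
    own = sum(mse(node.data, c.data) for c in node.children)
    return own + max(hierarchical_error(c) for c in node.children)
```

Because the child maximum is added at every level, the error never decreases toward the root, so a tolerance test during traversal can stop at the first node that satisfies it.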
Storing Reconstructed Data for Space-Time Tradeoff Alleviate data dependency –EVERY-K scheme –$h_o = 6$, $h_t = 4$; $k_o = 2$, $k_t = 2$
WTSP Tree Partition and Data Distribution Eliminate dependency among processors –Distribution units –$h_o = 6$, $h_t = 4$; $k_o = 2$, $k_t = 2$
WTSP Tree Partition and Data Distribution Space-filling curve traversal –Neighboring blocks of similar spatial-temporal resolution should be evenly distributed to different processors –Space-filling curve preserves locality, always visits neighboring blocks first –Traverse the volume to create a one-dimensional ordering of the blocks
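The slides do not name the space-filling curve. As a stand-in, the sketch below uses a Z-order (Morton) curve, which interleaves the bits of the block indices to produce a one-dimensional ordering. Z-order keeps nearby blocks mostly close together but does make occasional long jumps, so it only approximates the locality property described above. The function names are hypothetical.

```python
from typing import List, Tuple


def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def curve_order(nx: int, ny: int, nz: int) -> List[Tuple[int, int, int]]:
    """Return all block indices of an nx * ny * nz grid in Z-order."""
    bits = max(nx, ny, nz).bit_length()
    blocks = [(x, y, z) for z in range(nz) for y in range(ny) for x in range(nx)]
    return sorted(blocks, key=lambda b: morton3d(b[0], b[1], b[2], bits))
```

For a 2 × 2 × 2 grid, `curve_order(2, 2, 2)` visits the four blocks of the z = 0 layer before the four blocks of the z = 1 layer.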
WTSP Tree Partition and Data Distribution Error-guided bucketization –Data blocks with similar spatial and temporal errors should be distributed to different processors –Create buckets with different spatial-temporal error intervals
WTSP Tree Partition and Data Distribution Error-guided bucketization –Bucketize the distribution units when performing hierarchical space-filling curve traversals –Distribute units in each bucket in a round-robin fashion
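A rough sketch of the bucketize-then-round-robin idea, assuming the distribution units arrive already sorted along the space-filling curve and that bucket boundaries are given as sorted lists of error thresholds. The names `bucket_index`, `distribute`, `se_edges`, and `te_edges` are hypothetical, and restarting the round-robin at processor 0 for every bucket is a simplification of this sketch.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def bucket_index(se: float, te: float,
                 se_edges: List[float], te_edges: List[float]) -> Tuple[int, int]:
    """Map a (spatial error, temporal error) pair to its 2D bucket."""
    return bisect_right(se_edges, se), bisect_right(te_edges, te)


def distribute(units: List[Tuple[str, float, float]],
               se_edges: List[float], te_edges: List[float],
               num_procs: int) -> Dict[str, int]:
    """Assign each unit (id, se, te) to a processor.

    Units are expected in space-filling curve order, so neighbors in a
    bucket are also spatial neighbors and land on different processors.
    """
    buckets = defaultdict(list)
    for uid, se, te in units:
        buckets[bucket_index(se, te, se_edges, te_edges)].append(uid)
    assignment = {}
    for key in sorted(buckets):
        for i, uid in enumerate(buckets[key]):
            assignment[uid] = i % num_procs   # round-robin within the bucket
    return assignment
```

Since whichever error tolerance the user picks selects blocks from a few buckets, spreading every bucket evenly keeps the per-processor load similar across tolerance settings.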
Run-Time Rendering WTSP tree traversal –User specifies time step and tolerances of both spatial and temporal errors –Traverse octree skeleton and the binary time trees for each encountered octree node –A sequence of data blocks is identified in back-to-front order for rendering
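The traversal can be sketched as follows, under several assumptions that the slides do not spell out: each octree node carries a single spatial error `se` and the root of its time tree, each time-tree node carries a temporal error `te`, and the `children` list is already sorted back to front for the current view. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TimeNode:
    t_start: int
    t_end: int
    te: float
    left: Optional["TimeNode"] = None
    right: Optional["TimeNode"] = None


@dataclass
class OctreeNode:
    name: str
    se: float
    time_root: TimeNode
    children: List["OctreeNode"] = field(default_factory=list)  # assumed back-to-front


def pick_time_node(node: TimeNode, t: int, te_tol: float) -> TimeNode:
    """Descend toward time step t until the temporal error is acceptable."""
    while node.left is not None and node.te > te_tol:
        node = node.left if t <= node.left.t_end else node.right
    return node


def traverse(node: OctreeNode, t: int, se_tol: float, te_tol: float,
             out: Optional[List[Tuple[str, int, int]]] = None):
    """Collect (block name, t_start, t_end) entries in visiting order."""
    if out is None:
        out = []
    if node.se <= se_tol or not node.children:
        tn = pick_time_node(node.time_root, t, te_tol)
        out.append((node.name, tn.t_start, tn.t_end))
    else:
        for child in node.children:
            traverse(child, t, se_tol, te_tol, out)
    return out
```

Looser tolerances stop the descent earlier, so fewer and coarser blocks are selected; tighter tolerances push the selection toward the leaves.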
Run-Time Rendering
Data block reconstruction
–Get low-pass filtered subblock from its parent node
–Decode high-pass filtered wavelet coefficients
–Perform inverse 3D wavelet transform
–Reduce reconstruction time from $O(c_1 h_o + c_2 h_o h_t)$ to $O(c_1 k_o + c_2 k_o k_t)$, where
$c_1$ = time to perform an inverse 3D wavelet transform
$c_2$ = time to perform an inverse 1D wavelet transform
$h_o$ = the height of the octree
$h_t$ = the height of the time tree
$k_o$ = # of levels in an octree node group
$k_t$ = # of levels in a time tree node group
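Two small illustrations of this slide, both under stated assumptions. `inverse_haar_1d` shows a single reconstruction step for an unnormalized Haar filter (low = (a + b) / 2, high = (a - b) / 2); a 3D inverse transform would apply the same step along each axis. `reconstruction_cost` simply evaluates the expression inside the big-O bound, using unit costs for $c_1$ and $c_2$ as placeholders. Both function names are hypothetical.

```python
from typing import List, Tuple


def inverse_haar_1d(low: List[float], high: List[float]) -> Tuple[List[float], List[float]]:
    """Recover two child blocks from a parent's low-pass and high-pass data.

    Assumes the forward step was low = (a + b) / 2 and high = (a - b) / 2.
    """
    first = [l + h for l, h in zip(low, high)]
    second = [l - h for l, h in zip(low, high)]
    return first, second


def reconstruction_cost(levels_o: int, levels_t: int,
                        c1: float = 1.0, c2: float = 1.0) -> float:
    """Evaluate c1 * levels_o + c2 * levels_o * levels_t (the bound's argument)."""
    return c1 * levels_o + c2 * levels_o * levels_t


# Values from the slides: h_o = 6, h_t = 4 versus k_o = 2, k_t = 2.
full_chain = reconstruction_cost(6, 4)   # 6 + 24 = 30 cost units
every_k = reconstruction_cost(2, 2)      # 2 + 4 = 6 cost units
```

With unit costs, the bound drops from 30 to 6 for the parameter values shown on the slides; the actual savings depend on the real values of $c_1$ and $c_2$.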
Parallel Volume Rendering –Each processor renders the data blocks identified by the WTSP tree traversal and assigned to it during the data distribution stage –Cache reconstructed data for subsequent frames –Screen tiles partition –Image composition
Results
Data sets and wavelet transforms
data (type) | RMI (byte) | SPOT (float)
range (threshold) | [0, 255] (0) | [0.0, ] (0.005)
volume (size) | 1024 × 1024 × 960 × 32 (30 GB) | 512 × 512 × 256 × 30 (7.5 GB)
block (size) | 64 × 64 × 32 (128 KB) | 32 × 32 × 16 (64 KB)
tree depth | 6 (octree) and 6 (time tree) | 6 (octree) and 6 (time tree)
wavelet transform | Haar with lifting (both space and time) | Daubechies 4 (space) and Haar (time)
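The reported sizes follow directly from the dimensions, which a few lines of arithmetic can confirm. The sketch assumes 1 byte per RMI value and 4 bytes per SPOT value (a 32-bit float, which the slides do not state explicitly but which matches the 7.5 GB figure); `size_bytes` is a hypothetical helper.

```python
from typing import Sequence

KB = 1024
GB = 1024 ** 3


def size_bytes(dims: Sequence[int], bytes_per_value: int) -> int:
    """Total storage for a grid with the given dimensions."""
    total = bytes_per_value
    for d in dims:
        total *= d
    return total


rmi_volume_gb = size_bytes((1024, 1024, 960, 32), 1) / GB   # x, y, z, time steps
rmi_block_kb = size_bytes((64, 64, 32), 1) / KB
spot_volume_gb = size_bytes((512, 512, 256, 30), 4) / GB    # assumes 4-byte floats
spot_block_kb = size_bytes((32, 32, 16), 4) / KB
```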
Results Testing environment –A PC cluster consisting of GHz Pentium 4 processors connected by Dolphin networks Performance –Software raycasting –96.53% parallel CPU utilization, or a speedup of times for 32 processors
Results Data distribution with EVERY-K scheme ($k_o = 2$, $k_t = 2$) –SPOT data set –RMI data set
Results Rendering balance result SPOT data set RMI data set
Results
The timing result with output image resolution
data set | RMI | SPOT
(se, te, t) | (50, 10, 29) | (0.05, 0.01, 23)
number of blocks | 6,218 | 4,840
wavelet reconstruction | 15.637 s | 4.253 s
software raycasting | 10.810 s | 2.715 s
image composition | 0.118 s | 0.070 s
overhead | 3.093 s | 1.719 s
total time | 29.658 s | 8.757 s
difference time | 2.043 s | 0.241 s
Results Rendering of RMI data set at selected time steps –1st: 536 –8th: –th: 1,317 –32nd: 1,625
Results Rendering of SPOT data set at selected time steps –1st: 2,558 –12th: 2,743 –21st: 2,392 –30th: 2,461
Results Multiresolution volume rendering RMI data set, 11 th time step SPOT data set, 5 th time step
Conclusion & Future Work Multiresolution volume rendering framework for large-scale time-varying data visualization –Hierarchical WTSP tree data representation –Data partition and distribution scheme –Parallel volume rendering algorithm Future work –Utilize graphics hardware for wavelet reconstruction and rendering speedup –Incorporate optimal feature-preserving wavelet transforms for feature detection
Acknowledgements Funding agencies –NSF ITR grant ACI –NSF Career Award CCF –DOE Early Career Principal Investigator Award DE-FG02-03ER25572 Data sets –Mark LLNL –John NCAR Testing environment –Jack Dongarra and Clay UTK –Don Stredney and Dennis OSC