Wavelet Compression for In Situ Data Reduction
Scientific Achievement
Wavelet compression is now available in the VTK-m library. The algorithm was re-thought within the VTK-m framework and was shown to be efficient on both CPUs and GPUs.

Significance and Impact
The growing divide between compute and I/O will require data reduction to occur in situ on the next generation of supercomputers. Wavelet compression is an important reduction technique, since it balances storage costs against data integrity. With our efforts, any stakeholder using VTK-m can now apply wavelet compression.

Citation
S. Li, N. Marsaglia, V. Chen, C. Sewell, J. Clyne, and H. Childs. "Achieving Portable Performance For Wavelet Compression Using Data Parallel Primitives." In Proceedings of the EuroGraphics Symposium on Parallel Graphics and Visualization (EGPGV), pages 73–81, Barcelona, Spain, June 2017.

Figure caption: Relative performance of different hash functions on an NVIDIA Tesla P100.
Figure caption: Performance of wavelet compression (in seconds) for VTK-m and for a CPU-specific implementation (VAPOR) for data sets ranging from 256³ to …

Overall, our hardware-agnostic approach had performance comparable to hardware-specific approaches. Results in the GPU comparisons were similarly even.

Scientific visualization algorithms often produce meshes comprising many small cells of data that in aggregate describe a full volume or surface. These many small cell units allow such algorithms to engage the high degree of parallelism required to make full use of modern accelerators like GPUs and Xeon Phis. However, when these elements are created independently and in parallel, their shared features are duplicated, which leads to redundant data and loss of connectivity information. A known parallel technique for finding duplicate entities is to generate identifiers for elements that are equal if and only if the two elements are duplicates. One such identifier is the spatial coordinates, although an index-based approach is generally faster and more robust.
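The storage-versus-integrity trade-off of wavelet compression described above can be illustrated with a minimal sketch: a one-level 1D Haar transform where small detail coefficients are discarded. This is illustrative only; it is not the VTK-m implementation, and the function names are my own.

```python
# Minimal sketch of lossy wavelet compression: a one-level 1D Haar
# transform with thresholding of small detail coefficients.
# Illustrative only -- not the VTK-m algorithm or API.

def haar_forward(data):
    """One Haar level: pairwise averages plus detail coefficients."""
    avgs = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data), 2)]
    dets = [(data[i] - data[i + 1]) / 2 for i in range(0, len(data), 2)]
    return avgs, dets

def haar_inverse(avgs, dets):
    """Reconstruct the signal from averages and (possibly truncated) details."""
    out = []
    for a, d in zip(avgs, dets):
        out.extend([a + d, a - d])
    return out

def compress(data, threshold):
    """Zero out small detail coefficients: less storage, some error."""
    avgs, dets = haar_forward(data)
    dets = [d if abs(d) >= threshold else 0.0 for d in dets]
    return avgs, dets

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 9.0, 20.0, 2.0]
avgs, dets = compress(signal, threshold=2.0)
recon = haar_inverse(avgs, dets)
# Small fluctuations are smoothed away; the large jump (20.0 -> 2.0)
# survives because its detail coefficient exceeds the threshold.
```

Raising the threshold zeroes more coefficients (better compression, more error); a zero threshold makes the transform lossless.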
In either case, a parallel sort readily finds duplicate indices. However, the sort is slow because the indices are large (192 bits is common). In this work, we experiment with 32-bit hash values, which are not guaranteed to be unique but sort much faster; hash collisions are resolved in a later step. Resolving collisions takes time, but much less time than is saved in the sort operation. Smaller hash values also make it possible to replace the sort with a hash table, which sometimes improves performance further. Because different hash functions affect the algorithm's performance differently, we experimented with three distinct hash functions. In our experiments, using hash functions rather than indices yielded speedups of up to 7.8x in the external faces algorithm.
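The sort-by-hash-then-resolve-collisions idea can be sketched as follows. This is a serial simplification under my own assumptions: the canonical face key is the sorted tuple of point indices, the 32-bit hash is FNV-1a (one plausible choice; the paper compares several), and all function names are illustrative rather than VTK-m API.

```python
# Sketch of hash-based duplicate face detection: sort by a cheap 32-bit
# hash, then confirm candidate duplicates with the full (larger) key.
# Serial simplification with illustrative names -- not the VTK-m API.

def face_key(point_ids):
    """Canonical identifier: sorted point indices, equal iff faces coincide."""
    return tuple(sorted(point_ids))

def hash32(key):
    """32-bit FNV-1a-style hash; collisions are possible but sorting is cheap."""
    h = 2166136261
    for pid in key:
        h = ((h ^ pid) * 16777619) & 0xFFFFFFFF
    return h

def find_duplicates(faces):
    """Sort faces by hash; equal hashes are only candidates, so resolve
    each collision by comparing the exact keys."""
    tagged = sorted((hash32(face_key(f)), face_key(f), i)
                    for i, f in enumerate(faces))
    dups = []
    for j in range(1, len(tagged)):
        h_prev, k_prev, i_prev = tagged[j - 1]
        h_cur, k_cur, i_cur = tagged[j]
        if h_prev == h_cur and k_prev == k_cur:  # true duplicate, not a collision
            dups.append((i_prev, i_cur))
    return dups

# Faces of two tetrahedra that share face (1, 2, 3):
faces = [(0, 1, 2), (0, 2, 3), (0, 1, 3), (1, 2, 3),
         (4, 1, 2), (4, 2, 3), (4, 1, 3), (2, 3, 1)]
shared = find_duplicates(faces)  # the interior (non-external) face
```

Faces that appear exactly once after deduplication are the external faces; in the parallel setting, the sort and collision resolution map onto data-parallel primitives.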