Published by Omar Temple. Modified over 10 years ago.
1
Floating-Point Data Compression at 75 Gb/s on a GPU
Molly A. O’Neil and Martin Burtscher
Department of Computer Science
2
Introduction
Scientific simulations on HPC clusters
  Run on interconnected compute nodes
  Produce and transfer lots of floating-point data
Data storage and transfer are expensive and slow
  Compute nodes have multiple cores but only one link
Interconnects are getting faster
  Lonestar: 40 Gb/s InfiniBand
  Speeds of up to 100 Gb/s soon
Image credit: Texas Advanced Computing Center
Floating-Point Data Compression at 75 Gb/s on a GPU, March 2011
3
Introduction (cont.)
Compression
  Reduced storage, faster transfer
  Only useful when done in real time
    Saturate network with compressed data
    Requires compressor tailored to hardware capabilities
GFC algorithm for IEEE 754 double-precision data
  Designed specifically for GPU hardware (CUDA)
  Provides reasonable compression ratio and operates above throughput of emerging networks
Image credit: Charles Trevelyan for http://plus.maths.org/
4
Lossless Data Compression
Dictionary-based (Lempel-Ziv family) [gzip, lzop]
Variable-length entropy coders (Huffman, arithmetic coding)
Run-length encoding [fax]
Transforms (Burrows-Wheeler) [bzip2]
Special-purpose FP compressors [FPC, FSD, PLMI]
  Prediction and leading-zero suppression
None of these offer real-time speeds for state-of-the-art networks
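To make "prediction and leading-zero suppression" concrete, here is a minimal Python sketch of the general idea: combine a value with its prediction so that a good prediction leaves many leading zero bytes, then store only the remaining bytes plus a small count. The function names are hypothetical, and XOR is assumed as the residual operation purely for illustration; individual compressors differ in how they form and encode the residual.

```python
import struct

def double_to_bits(x):
    """Reinterpret an IEEE 754 double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def leading_zero_bytes(residual):
    """Count the leading zero bytes (0..8) of a 64-bit residual."""
    return 8 - (residual.bit_length() + 7) // 8

def suppress(value, prediction):
    """Return (leading zero byte count, remaining bytes) for one value.

    Assumption for illustration: the residual is the XOR of the two
    bit patterns. A subtraction-based residual works the same way.
    """
    residual = double_to_bits(value) ^ double_to_bits(prediction)
    lzb = leading_zero_bytes(residual)
    return lzb, residual.to_bytes(8, "big")[lzb:]

# A perfect prediction leaves nothing to store except the count.
print(suppress(1.0, 1.0))
# A near miss leaves only the low-order bytes that differ.
print(suppress(1.0, 1.0000000000000002))
```

The closer the prediction, the more high-order bytes cancel, which is why the quality of the predictor largely determines the compression ratio.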
5
GFC Algorithm
GPUs require 1000s of parallel activities, but compression is generally a serial operation
Divide data into n chunks, processed in parallel
  Best performance: choose n to match the maximum number of resident warps
Each chunk is composed of 32-word subchunks
  One double per warp thread
Use the previous subchunk to provide prediction values
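The partitioning on this slide can be sketched on the host side as follows. This is a Python illustration under stated assumptions, not the authors' CUDA code: `chunk_bounds` and `SUBCHUNK_WORDS` are hypothetical names, and the rounding policy (each chunk gets a whole number of 32-word subchunks, with the tail chunk possibly shorter) is an assumption.

```python
SUBCHUNK_WORDS = 32  # one double per warp thread, as stated on the slide

def chunk_bounds(num_values, num_chunks):
    """Split num_values doubles into num_chunks contiguous [start, end) ranges.

    Every chunk except possibly the last non-empty one covers a whole
    number of 32-word subchunks, so each warp can walk its chunk one
    subchunk at a time.
    """
    subchunks = -(-num_values // SUBCHUNK_WORDS)   # ceiling division
    per_chunk = -(-subchunks // num_chunks)        # subchunks per chunk
    bounds = []
    for c in range(num_chunks):
        start = min(c * per_chunk * SUBCHUNK_WORDS, num_values)
        end = min(start + per_chunk * SUBCHUNK_WORDS, num_values)
        bounds.append((start, end))
    return bounds

# Example: 1000 doubles split across 4 chunks (one per warp).
for start, end in chunk_bounds(1000, 4):
    print(start, end)
```

Choosing `num_chunks` equal to the number of warps the GPU can keep resident gives every warp exactly one chunk, which is the configuration the slide reports as performing best.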
6
Dimensionality
Many scientific data sets display dimensionality
  Interleaved coordinates from multiple dimensions
Optional dimensionality parameter to GFC
  Determines index of previous subchunk to use as the prediction
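One plausible reading of the dimensionality parameter is sketched below in Python. The index formula is an assumption made for illustration: each lane of a subchunk is predicted by the most recent value of the same dimension in the previous subchunk. The names `prediction_index` and `residuals`, and the use of integer subtraction on the bit patterns, are hypothetical and are not taken from the GFC source.

```python
import struct

SUBCHUNK_WORDS = 32
MASK64 = (1 << 64) - 1

def bits(x):
    """Reinterpret an IEEE 754 double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def prediction_index(subchunk_start, lane, dim):
    """Index in the previous subchunk holding the same dimension as this lane."""
    return subchunk_start - (dim - lane % dim)

def residuals(values, subchunk_start, dim):
    """Residuals for one subchunk, using only values before subchunk_start."""
    out = []
    for lane in range(SUBCHUNK_WORDS):
        i = subchunk_start + lane
        if i >= len(values):
            break
        p = prediction_index(subchunk_start, lane, dim)
        pred = bits(values[p]) if p >= 0 else 0  # first subchunk has no history
        out.append((bits(values[i]) - pred) & MASK64)
    return out

# Interleaved (x, y) pairs: x is always 1.0 and y is always 100.0.
data = [1.0, 100.0] * 32
print(sum(r != 0 for r in residuals(data, 32, dim=2)))  # matching dimension
print(sum(r != 0 for r in residuals(data, 32, dim=1)))  # ignoring dimension
```

Because every lane reads only from the previous subchunk, all 32 residuals can be computed independently, and a decompressor can reverse them in parallel once the previous subchunk has been decoded.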
7
GFC Algorithm (cont.)
8
GPU Optimizations
Low thread divergence (few if statements)
  Some short enough to be predicated
Coalesce memory accesses by packing/unpacking data in shared memory (for CC < 2.0)
Very little inter-thread communication and synchronization
  Prefix sum only
Warp-based implementation
Image credit: gamedsforum.ca
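The slide names a prefix sum as the only inter-thread communication. A likely use, assumed here rather than stated in the slides, is turning each thread's compressed byte count into a write offset so that all threads of a warp can store their variable-length output without overlapping. The Python sketch below shows that computation sequentially; `write_offsets` is a hypothetical name.

```python
from itertools import accumulate

def write_offsets(byte_counts):
    """Exclusive prefix sum over per-thread output sizes.

    Returns (offsets, total): offsets[i] is where thread i starts
    writing, and total is the number of bytes the warp emits.
    """
    inclusive = list(accumulate(byte_counts))
    total = inclusive[-1] if inclusive else 0
    return [0] + inclusive[:-1], total

# Four threads producing 3, 0, 8, and 1 bytes respectively.
offsets, total = write_offsets([3, 0, 8, 1])
print(offsets, total)
```

On a GPU the same result is computed cooperatively within the warp in a logarithmic number of steps, which keeps synchronization confined to the warp.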
9
Evaluation Method
Systems
  Two quad-core 2.53 GHz Xeons
  NVIDIA FX 5800 GPU (CC 1.3)
13 datasets: real-world data (19 – 277 MB)
  Observational data, simulation results, MPI messages
Comparisons
  Compression ratio vs. 5 compressors in common use
  Throughput vs. pFPC (fastest known CPU compressor)
10
Compression Ratio
1.188 (range: 1.01 – 3.53)
  Low (FP data), but in line with other algorithms
  Largely independent of number of chunks
When done in real time, compression at this ratio can greatly speed up MPI apps
  3% – 98% speed-up [Ke et al., SC’04]
11
Throughput
Compression (C): 75 – 87 Gb/s
  Mean: 77.9 Gb/s
Decompression (D): 90 – 121 Gb/s
  Mean: 96.6 Gb/s
4x faster than pFPC on 8 cores (2 CPUs)
Improvement over pFPC’s compression ratio vs. performance trend
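As a back-of-the-envelope check of the real-time argument, the Python sketch below estimates the uncompressed data rate an application would see when compressed data is sent over a link. It assumes the reported throughput is measured on uncompressed input and ignores every other cost, including the PCIe transfer to and from the GPU; `effective_rate` is a hypothetical helper, and the inputs are the figures quoted in the slides.

```python
def effective_rate(link_gbps, ratio, compressor_gbps):
    """Uncompressed Gb/s delivered when sending compressed data.

    The link carries link_gbps of compressed data, which corresponds to
    link_gbps * ratio of original data, unless the compressor itself
    cannot produce data that fast.
    """
    return min(compressor_gbps, link_gbps * ratio)

# Lonestar's 40 Gb/s InfiniBand with the mean ratio and mean throughput.
print(effective_rate(40.0, 1.188, 77.9))
# A 100 Gb/s link with the same compressor.
print(effective_rate(100.0, 1.188, 77.9))
```

Under these assumptions the 40 Gb/s link stays saturated and behaves like a link of roughly 47.5 Gb/s, while at 100 Gb/s the compressor on this GPU would become the limiting factor.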
12
NEW: Fermi Throughput
Fermi improvements:
  Faster, simpler memory accesses
  Hardware support for count-leading-zeros op
Compression ratio: 1.187
C: 119 – 219 Gb/s (HM: 167.5 Gb/s)
D: 169 – 219 Gb/s (HM: 180.3 Gb/s)
Compresses over 9.5x faster than pFPC on 8 x86 cores
13
Summary
GFC algorithm
  Chunks up data; each warp processes a chunk iteratively by 32-word subchunks
  No communication required between warps
Minimum 75 Gb/s – 90 Gb/s (encode-decode) throughput on GTX-285, and 119 Gb/s – 169 Gb/s on Fermi, with a compression ratio of 1.19
CUDA source code is freely available at http://www.cs.txstate.edu/~burtscher/research/GFC/
14
Conclusions
GPU can compress much faster than PCIe bus can transfer the data
But…
  PCIe bus will become faster
  CPU-GPU increasingly on single die
  GPU-to-GPU, GPU-to-NIC transfers coming?
GFC is the first compressor with the potential to deliver real-time FP data compression for current and emerging network speeds
Image credits: AMD, NVIDIA