pFPC: A Parallel Compressor for Floating-Point Data
Martin Burtscher (The University of Texas at Austin) and Paruj Ratanaworabhan (Cornell University)
March 2009
Introduction
- Scientific programs often produce and transfer large amounts of floating-point data (e.g., program output, checkpoints, messages).
- Large amounts of data are expensive and slow to transfer and store.
- FPC is an algorithm for IEEE 754 double-precision data that compresses linear streams of FP values fast and well, operating in a single pass and losslessly.
Introduction (cont.)
- Large-scale high-performance computers consist of many networked compute nodes; each node has multiple CPUs but only one network link.
- To speed up data transfer, real-time compression is needed to match the link throughput.
- pFPC, a parallel version of the FPC algorithm, exceeds 10 Gb/s on four Xeon processors.
Sequential FPC Algorithm [DCC'07]
- Make two predictions
- Select the closer value
- XOR with the true value
- Count the leading zero bytes
- Encode the value
- Update the predictors
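The steps above can be sketched in C. This is a deliberately simplified, single-predictor illustration (the real FPC pairs an FCM and a DFCM predictor and selects the closer prediction per value); the table size, hash, and function names are assumptions for illustration, not the paper's code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified sketch of one FPC compression step: one hash-indexed
 * prediction table stands in for the paper's FCM/DFCM pair. */
#define TABLE_SIZE 4096
static uint64_t table[TABLE_SIZE]; /* prediction table, hash-indexed */
static uint64_t hash;              /* context: top bits of last value */

/* Count the leading zero bytes of a 64-bit residual (0..8). */
static int leading_zero_bytes(uint64_t x)
{
    int n = 0;
    while (n < 8 && ((x >> (56 - 8 * n)) & 0xff) == 0)
        n++;
    return n;
}

/* Compress one double: predict, XOR with the true value, count leading
 * zero bytes, update the predictor.  Writes the remaining residual bytes
 * to out and stores the zero-byte count (which a real encoder packs into
 * a small code) in *code; returns the number of bytes written. */
static int fpc_step(double value, uint8_t *out, int *code)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);

    uint64_t pred = table[hash % TABLE_SIZE]; /* prediction */
    uint64_t residual = bits ^ pred;          /* XOR with true value */
    int zeros = leading_zero_bytes(residual); /* leading zero bytes */

    table[hash % TABLE_SIZE] = bits;          /* update predictor */
    hash = bits >> 48;                        /* order-1 context */

    *code = zeros;
    int nbytes = 8 - zeros;
    for (int i = 0; i < nbytes; i++)          /* emit residual bytes */
        out[i] = (uint8_t)(residual >> (8 * (nbytes - 1 - i)));
    return nbytes;
}
```

Once the predictor has seen a repeating context, the XOR residual is all zeros and only the short code needs to be emitted, which is where the compression comes from.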
pFPC: Parallel FPC Algorithm
- Operation: divide the data stream into chunks, logically assign the chunks round-robin to threads, and let each thread compress its data with FPC.
- Key parameters: chunk size and number of threads.
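The round-robin mapping can be stated in one line of C: element i of the stream falls into chunk i / chunk_size, and chunks are dealt out to threads in order. The function name is illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Which thread owns element i of the stream under round-robin chunking?
 * Chunk index = i / chunk_size; chunks cycle through the threads. */
static size_t owner_thread(size_t i, size_t chunk_size, size_t nthreads)
{
    return (i / chunk_size) % nthreads;
}
```

With chunk_size = 1 each thread sees every nthreads-th value, which is why the thread count interacts with the dimensionality of multi-dimensional datasets.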
Evaluation Method
- Systems: 3.0 GHz Xeon with 4 processors (others in the paper)
- Datasets: linear streams of real-world data (18–277 MB)
  - 3 observations: error, info, spitzer
  - 3 simulations: brain, comet, plasma
  - 3 messages: bt, sp, sweep3d
Compression Ratio vs. Thread Count
- Configuration: small predictor, chunk size = 1
- Compression ratio is low, as is typical for FP data; other algorithms do worse
- Fluctuations are due to multi-dimensional data
Compression Ratio vs. Chunk Size
- Configuration: small predictor, 1 to 4 threads
- Compression ratio is flat for 1 thread, with a steep initial drop for more threads
- Larger chunk sizes are better for history-based predictors
Throughput on the Xeon System (compression and decompression)
- Throughput increases with chunk size (loop overhead, false sharing, TLB performance)
- Throughput scales with thread count, limited by load balance and memory bandwidth
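Since false sharing is one of the listed throughput factors, a common mitigation is to pad each thread's compressor state out to a cache-line boundary so that neighboring threads never write to the same line. The 64-byte line size and the field names below are assumptions for illustration, not taken from the paper:

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed line size, typical of x86 */

/* Per-thread state padded to a full cache line: updates to one thread's
 * counters cannot invalidate a line holding another thread's state. */
struct thread_state {
    uint64_t bytes_out; /* per-thread output byte count */
    uint64_t hash;      /* per-thread predictor context */
    char pad[CACHE_LINE - 2 * sizeof(uint64_t)];
};
```

An array of such structs, one element per thread, keeps hot per-thread writes on disjoint cache lines.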
Summary
- The pFPC algorithm chunks up the data and logically assigns the chunks in round-robin fashion to threads.
- It reaches 10.9 and 13.6 Gb/s throughput with a compression ratio of 1.18 on a 4-core 3 GHz Xeon.
- Portable C source code is available online: http://users.ices.utexas.edu/~burtscher/research/pFPC/
Conclusions
- For the best compression ratio, the thread count should equal the dimensionality of the data, or a small multiple of it, and the chunk size should be one.
- For the highest throughput, the chunk size should at least match the system's page size (and be page aligned); larger chunks also yield higher compression ratios with history-based predictors.
- Parallel scaling is limited by memory bandwidth, so future work should focus on improving the compression ratio without increasing the memory bandwidth required.
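The page-size advice can be sketched with standard POSIX calls; the helper names are illustrative and not from the paper's code:

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round a chunk size up to a whole number of pages. */
static size_t round_up_to_pages(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size * page_size;
}

/* Allocate a page-aligned chunk buffer of at least `bytes` bytes,
 * per the slide's recommendation; returns NULL on failure. */
static void *alloc_chunk(size_t bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = NULL;
    if (posix_memalign(&p, page, round_up_to_pages(bytes, page)) != 0)
        return NULL;
    return p;
}
```

Page-aligned, page-multiple chunks keep each thread's working set on disjoint pages, which helps the TLB and avoids false sharing at chunk boundaries.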