pFPC: A Parallel Compressor for Floating-Point Data
Martin Burtscher (The University of Texas at Austin) and Paruj Ratanaworabhan (Cornell University)
March 2009
Introduction
- Scientific programs often produce and transfer large amounts of floating-point data (e.g., program output, checkpoints, messages).
- Large amounts of data are expensive and slow to transfer and store.
- FPC is an algorithm for IEEE 754 double-precision data that compresses linear streams of FP values fast and well, operating in a single pass and losslessly.
Introduction (cont.)
- Large-scale high-performance computers consist of many networked compute nodes; each node has multiple CPUs but only one network link.
- To speed up data transfer, real-time compression is needed to match the link throughput.
- pFPC, a parallel version of the FPC algorithm, exceeds 10 Gb/s on four Xeon processors.
Sequential FPC Algorithm [DCC'07]
- Make two predictions
- Select the closer value
- XOR with the true value
- Count the leading zero bytes
- Encode the value
- Update the predictors
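The steps above can be sketched in C. This is a deliberately simplified, single-predictor illustration (the real FPC pairs an FCM and a DFCM predictor and selects the closer prediction per value); the table size, hash, and function names are assumptions for illustration, not the paper's code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified sketch of one FPC compression step: one hash-indexed
 * prediction table stands in for the paper's FCM/DFCM pair. */
#define TABLE_SIZE 4096
static uint64_t table[TABLE_SIZE]; /* prediction table, hash-indexed */
static uint64_t hash;              /* context: top bits of last value */

/* Count the leading zero bytes of a 64-bit residual (0..8). */
static int leading_zero_bytes(uint64_t x)
{
    int n = 0;
    while (n < 8 && ((x >> (56 - 8 * n)) & 0xff) == 0)
        n++;
    return n;
}

/* Compress one double: predict, XOR with the true value, count leading
 * zero bytes, update the predictor.  Writes the remaining residual bytes
 * to out and stores the zero-byte count (which a real encoder packs into
 * a small code) in *code; returns the number of bytes written. */
static int fpc_step(double value, uint8_t *out, int *code)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);

    uint64_t pred = table[hash % TABLE_SIZE]; /* prediction */
    uint64_t residual = bits ^ pred;          /* XOR with true value */
    int zeros = leading_zero_bytes(residual); /* leading zero bytes */

    table[hash % TABLE_SIZE] = bits;          /* update predictor */
    hash = bits >> 48;                        /* order-1 context */

    *code = zeros;
    int nbytes = 8 - zeros;
    for (int i = 0; i < nbytes; i++)          /* emit residual bytes */
        out[i] = (uint8_t)(residual >> (8 * (nbytes - 1 - i)));
    return nbytes;
}
```

Once the predictor has seen a repeating context, the XOR residual is all zeros and only the short code needs to be emitted, which is where the compression comes from.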
pFPC: Parallel FPC Algorithm
- Operation: divide the data stream into chunks, logically assign the chunks round-robin to threads, and let each thread compress its data with FPC.
- Key parameters: chunk size and number of threads.
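The round-robin mapping can be stated in one line of C: element i of the stream falls into chunk i / chunk_size, and chunks are dealt out to threads in order. The function name is illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Which thread owns element i of the stream under round-robin chunking?
 * Chunk index = i / chunk_size; chunks cycle through the threads. */
static size_t owner_thread(size_t i, size_t chunk_size, size_t nthreads)
{
    return (i / chunk_size) % nthreads;
}
```

With chunk_size = 1 each thread sees every nthreads-th value, which is why the thread count interacts with the dimensionality of multi-dimensional datasets.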
Evaluation Method
- Systems: 3.0 GHz Xeon with 4 processors (others in the paper)
- Datasets: linear streams of real-world data (18–277 MB)
  - 3 observations: error, info, spitzer
  - 3 simulations: brain, comet, plasma
  - 3 messages: bt, sp, sweep3d
Compression Ratio vs. Thread Count
- Configuration: small predictor, chunk size = 1
- Compression ratio is low, as is typical for FP data; other algorithms do worse
- Fluctuations are due to multi-dimensional data
Compression Ratio vs. Chunk Size
- Configuration: small predictor, 1 to 4 threads
- Compression ratio is flat for 1 thread, with a steep initial drop for more threads
- Larger chunk sizes are better for history-based predictors
Throughput on the Xeon System (compression and decompression)
- Throughput increases with chunk size (loop overhead, false sharing, TLB performance)
- Throughput scales with thread count, limited by load balance and memory bandwidth
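Since false sharing is one of the listed throughput factors, a common mitigation is to pad each thread's compressor state out to a cache-line boundary so that neighboring threads never write to the same line. The 64-byte line size and the field names below are assumptions for illustration, not taken from the paper:

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed line size, typical of x86 */

/* Per-thread state padded to a full cache line: updates to one thread's
 * counters cannot invalidate a line holding another thread's state. */
struct thread_state {
    uint64_t bytes_out; /* per-thread output byte count */
    uint64_t hash;      /* per-thread predictor context */
    char pad[CACHE_LINE - 2 * sizeof(uint64_t)];
};
```

An array of such structs, one element per thread, keeps hot per-thread writes on disjoint cache lines.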
Summary
- The pFPC algorithm chunks up the data and logically assigns the chunks in round-robin fashion to threads.
- It reaches 10.9 and 13.6 Gb/s throughput with a compression ratio of 1.18 on a 4-core 3 GHz Xeon.
- Portable C source code is available online: http://users.ices.utexas.edu/~burtscher/research/pFPC/
Conclusions
- For the best compression ratio, the thread count should equal the dimensionality of the data, or a small multiple of it, and the chunk size should be one.
- For the highest throughput, the chunk size should at least match the system's page size (and be page aligned); larger chunks also yield higher compression ratios with history-based predictors.
- Parallel scaling is limited by memory bandwidth, so future work should focus on improving the compression ratio without increasing the memory bandwidth required.
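The page-size advice can be sketched with standard POSIX calls; the helper names are illustrative and not from the paper's code:

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round a chunk size up to a whole number of pages. */
static size_t round_up_to_pages(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size * page_size;
}

/* Allocate a page-aligned chunk buffer of at least `bytes` bytes,
 * per the slide's recommendation; returns NULL on failure. */
static void *alloc_chunk(size_t bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = NULL;
    if (posix_memalign(&p, page, round_up_to_pages(bytes, page)) != 0)
        return NULL;
    return p;
}
```

Page-aligned, page-multiple chunks keep each thread's working set on disjoint pages, which helps the TLB and avoids false sharing at chunk boundaries.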