Lossless Compression of Meteorological Data in GRIB Format
R. Lorentz
Fraunhofer Institute for Scientific Computation and Algorithms (SCAI), Germany
Prof. Dr. Rudolph Lorentz, FhG-SCAI.NuSo
Data Compression
What is it?
What is it good for in the context of WIS (the WMO Information System)?
  a) Reducing archive size
  b) Speeding up data transfer
Who needs it?
  a) Anyone with an archive larger than about 100 terabytes
  b) Anyone who frequently transfers large blocks of data (about 1 GB)
Some numbers
(compression factor = original size / compressed size)
Lossless data compression, e.g., Zip programs:
  compression factors for text: 2 – 3
  compression factors for simulation data: 1 – 1.2
Lossy data compression, e.g., JPEG, MPEG:
  compression factors for pictures: ~ 10 – 100
  (not suitable for floating-point numbers)
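As a rough illustration of these factors (not a benchmark), the following Python sketch measures the ratio of original to compressed size with the standard-library `zlib` module, which implements the DEFLATE method used by Zip programs. The two inputs are synthetic: pseudo-random English-like text and a smooth function stored as 64-bit floating-point numbers. The helper name `compression_factor`, the vocabulary, and the sample function are assumptions made for this example; real factors depend on the data.

```python
import math
import random
import struct
import zlib


def compression_factor(data: bytes, level: int = 9) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))


# Synthetic "text": words drawn at random from a small vocabulary (assumption).
rng = random.Random(0)
vocabulary = ("the weather forecast model grid pressure temperature wind humidity "
              "archive record data transfer is and of in for with on").split()
text = " ".join(rng.choice(vocabulary) for _ in range(5000)).encode("ascii")

# Synthetic "simulation data": a smooth function sampled as 64-bit floats.
samples = [math.sin(0.001 * i) for i in range(20000)]
floats = struct.pack(f"<{len(samples)}d", *samples)

k_text = compression_factor(text)
k_float = compression_factor(floats)
print(f"text:   K = {k_text:.2f}")
print(f"floats: K = {k_float:.2f}")
```

The low-order mantissa bytes of floating-point values look random to a general-purpose compressor, which is why the second factor stays close to 1 even though the underlying function is smooth.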
Disadvantages of data compression
  1. It costs resources: compression and decompression take time, running at roughly 20 MB/s on a 3 GHz Linux PC
  2. The software must be integrated into the production run
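To put the 20 MB/s figure in perspective, the sketch below works out how long a 1 GB block takes to compress, and estimates the link speed below which compressing before sending saves time. The break-even formula rests on a deliberately simple model that is an assumption of this example, not something stated in the talk: compression and transfer run one after the other, and decompression time at the receiver is ignored. The function names are hypothetical.

```python
def compression_time_s(size_mb: float, throughput_mb_s: float = 20.0) -> float:
    """Time to compress (or decompress) size_mb megabytes at the given rate."""
    return size_mb / throughput_mb_s


def break_even_bandwidth_mb_s(throughput_mb_s: float, factor: float) -> float:
    """Link speed below which 'compress, then send' beats sending uncompressed.

    Assumed model: compression and transfer run one after the other, and
    decompression time at the receiver is ignored. Then
        S/B > S/C + S/(K*B)   <=>   B < C * (1 - 1/K),
    with S = size, B = bandwidth, C = compression throughput, K = factor.
    """
    return throughput_mb_s * (1.0 - 1.0 / factor)


block_mb = 1000.0  # a 1 GB block, taking 1 GB = 1000 MB
print(f"compression time: {compression_time_s(block_mb):.0f} s")
print(f"break-even link speed: {break_even_bandwidth_mb_s(20.0, 2.5):.1f} MB/s")
```

At 20 MB/s a 1 GB block takes about 50 seconds to compress. Under the assumed model and a compression factor of 2.5, compression shortens a transfer only on links slower than about 12 MB/s; for archiving, the saving in storage is independent of this trade-off.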
Example
Compression of meteorological data for the German Weather Service
  1. Lossless compression of LME data in GRIB1 format
       compression factor: ~ 2.5
       archive size: 3.5 petabytes
  2. Data on rectangular grids
  3. The compression factor is the most important criterion
Data Formats
Meteorological
  GRIB 1, 2: built-in compression
  BUFR: compression option
General purpose
  HDF5
  NetCDF
GRIB1 Grid Types
  1. Function values, rectangular grid
  2. Function values, global triangular grid (GRIB2)
  3. Function values, global Gaussian grid (topologically equivalent to a rectangular grid)
  4. Function values, thinned Gaussian grid (global)
  5. Spectral coefficients, both simple and complex packing
How does lossless compression work?
For grid data:
  Neighboring grid points have similar values => store only the differences, which are small and need fewer bits than the values themselves
Heuristic conclusions:
  the higher the grid resolution, the better the compression
  the smoother the functions (observables), the better the compression
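The idea of storing differences between neighboring grid points can be sketched in a few lines of Python. This is a generic illustration of difference (delta) coding followed by `zlib`, not the algorithm used in GRIBZip, which the slides do not describe. The synthetic field, the grid size, the 16-bit integer packing, and the function names are all assumptions made for this example.

```python
import math
import random
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    values, total = [], 0
    for d in deltas:
        total += d
        values.append(total)
    return values


def compressed_size(values):
    """Pack as signed 16-bit integers and compress with zlib."""
    return len(zlib.compress(struct.pack(f"<{len(values)}h", *values), 9))


# Synthetic smooth field on a 128 x 128 grid, quantized to integers (assumption).
rng = random.Random(1)
nx = ny = 128
grid = [round(20000 + 3000 * math.sin(i / 40) * math.cos(j / 150)) + rng.randint(-3, 3)
        for i in range(ny) for j in range(nx)]  # row-major scan order

deltas = delta_encode(grid)
assert delta_decode(deltas) == grid  # lossless round trip

raw_bytes = 2 * len(grid)
k_plain = raw_bytes / compressed_size(grid)
k_delta = raw_bytes / compressed_size(deltas)
print(f"zlib only:          K = {k_plain:.2f}")
print(f"differences + zlib: K = {k_delta:.2f}")
```

Because the field is smooth, the differences fall in a narrow range around zero, so the compressor sees far fewer distinct byte values than in the raw data. A coarser or rougher field widens that range, which is consistent with the two heuristic conclusions on the slide.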
Some Numbers
Computed with GRIBZip, a commercial program developed at SCAI.
Average compression factors K over all GRIB files of a forecast:
  Rectangular grids, function values (LME model), resolution 7 km: 2D: K = 2.65
  Rectangular grids (LMK model), resolution 2.8 km: 2D: K = 2.75
  Global triangular grids (GME model), resolution 40 km: 2D: K = 2.38
More Numbers
Calculated with experimental programs (work in progress):
  Gaussian grids, resolution 63 km (DKRZ, Max Planck Institute): K = 3.1
  Thinned Gaussian grids, resolution 39 km (ECMWF): K = 2.34
  Spectral data, simple packing, highest frequency 213 (DKRZ): K = 1.99
  Spectral data, complex packing (ECMWF): not possible?
3D Compression
Compressing several layers of data together
Typical examples:
  Local grid: LME data
    For 2D: K = 2.7
    For 3D: K = 3.17
  Global grid: GME data
    For 2D: K = 1.97
    For 3D: K = 2.59
[Figure labels: "One GRIB record", "Several GRIB records"]
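The slides do not say how the layers are combined. One conceivable approach, shown here purely as an assumption and not as the method behind the figures above, is to store each layer as its difference from the layer below and compress all layers as one stream. The Python sketch below compares this with compressing each layer on its own. The synthetic layers are deliberately built to be strongly correlated, so the gain it prints is exaggerated; all names and parameters are hypothetical.

```python
import math
import random
import struct
import zlib


def scan_deltas(values):
    """In-plane differences along the scan line (first value kept)."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def packed_size(values):
    """Size after packing as signed 16-bit integers and zlib compression."""
    return len(zlib.compress(struct.pack(f"<{len(values)}h", *values), 9))


def rebuild(stream, cells, count):
    """Recover all layers from the combined 3D stream."""
    first, total = [], 0
    for d in stream[:cells]:
        total += d
        first.append(total)
    out = [first]
    for k in range(1, count):
        diff = stream[k * cells:(k + 1) * cells]
        out.append([b + d for b, d in zip(out[-1], diff)])
    return out


# Synthetic stack of 8 layers on a 64 x 64 grid (assumption): every layer is
# the same rough horizontal pattern plus a small layer-specific perturbation.
rng = random.Random(2)
n, n_layers = 64, 8
base = [round(10000 + 2000 * math.sin(i / 9) * math.cos(j / 7)) + rng.randint(-40, 40)
        for i in range(n) for j in range(n)]
layers = [[v + 25 * k + rng.randint(-2, 2) for v in base] for k in range(n_layers)]

# "2D": each layer is differenced and compressed on its own.
size_2d = sum(packed_size(scan_deltas(layer)) for layer in layers)

# "3D": the first layer is handled as in 2D; every further layer stores only
# its difference from the layer below, and all layers are compressed together.
stream = scan_deltas(layers[0])
for below, above in zip(layers, layers[1:]):
    stream += [a - b for a, b in zip(above, below)]
size_3d = packed_size(stream)

assert rebuild(stream, n * n, n_layers) == layers  # lossless round trip

raw_bytes = 2 * n * n * n_layers
print(f"2D: K = {raw_bytes / size_2d:.2f}")
print(f"3D: K = {raw_bytes / size_3d:.2f}")
```

The sketch also hints at why this is harder to implement: the layers arrive as several separate GRIB records, so they have to be collected and ordered before they can be compressed together.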
3D Compression
Comments:
  3D data (in GRIB format) is relatively hard to compress => 3D compression is particularly effective
  Improvement of the compression factor by 0.5 to 1.0
  Harder to implement
  Is it worth it? => Depends on the proportion of 3D data.
My Message
Compression is possible and saves resources
  1. For archiving
  2. When transferring data
Work initiated as a research cooperation between the DWD (German Weather Service) and SCAI.
Work done together with R. Iza-Teran and M. Rettenmeier.