1 N-bit and ScaleOffset filters MuQun Yang National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Urbana, IL 61801.

Presentation transcript:

1 N-bit and ScaleOffset filters MuQun Yang National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Urbana, IL contact:

2 N-Bit Filter Outline ► Definition ► A usage example ► Limitations

3 N-Bit datatype A user-defined HDF5 datatype that can use fewer bits than the predefined datatype it is derived from.
Illustration of an N-Bit datatype on a little-endian machine (base type: integer, offset: 4, precision: 16):
    | byte 3 | byte 2 | byte 1 | byte 0 |
    |????????|????SPPP|PPPPPPPP|PPPP????|
    S - sign bit, P - significant bit, ? - padding bit
Without the N-bit filter, HDF5 still stores the padding bits of an N-Bit datatype, so no disk space is saved.

4 N-bit filter ► When the N-bit filter is applied to an N-Bit datatype, all padding bits are chopped off during compression, and the remaining bits are stored contiguously on disk, for example: |SPPPPPPP|PPPPPPPP|SPPPPPPP|PPPPPPPP|
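The chopping step above can be pictured with a small stand-alone sketch. It is only an illustration of the idea for a single 32-bit container, not the HDF5 implementation; the function names nbit_extract and nbit_restore are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the `precision` significant bits that start `offset` bits above
 * the least significant bit of a 32-bit container; padding bits are dropped. */
uint32_t nbit_extract(uint32_t word, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && offset + precision <= 32);
    uint32_t mask = (precision == 32) ? 0xFFFFFFFFu : ((1u << precision) - 1u);
    return (word >> offset) & mask;
}

/* Put the significant bits back at their original position in the container.
 * All padding bits come back as zero. */
uint32_t nbit_restore(uint32_t bits, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && offset + precision <= 32);
    uint32_t mask = (precision == 32) ? 0xFFFFFFFFu : ((1u << precision) - 1u);
    return (bits & mask) << offset;
}
```

With offset 4 and precision 16, as in the illustration on slide 3, nbit_extract(0xFFFABCDF, 4, 16) yields 0xABCD, and nbit_restore(0xABCD, 4, 16) yields 0x000ABCD0. The original padding bits are not recovered, which is why the filter assumes they are zero.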

5 Enable N-Bit filter ► Create a dataset creation property list ► Set chunking (and specify chunk dimensions) ► Set up use of the N-Bit filter ► Create dataset specifying this property list ► Close property list

6 N-Bit filter: usage example
    /* Define the dataset datatype (N-Bit) and set its precision and offset */
    datatype = H5Tcopy(H5T_NATIVE_INT);
    precision = 17;
    H5Tset_precision(datatype, precision);
    offset = 4;
    H5Tset_offset(datatype, offset);
    /* Set up the dataset creation property list for N-Bit */
    chunk_size[0] = CH_NX;
    chunk_size[1] = CH_NY;
    properties = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(properties, 2, chunk_size);
    H5Pset_nbit(properties);
    /* Create a new dataset with the N-Bit datatype. This is the five-argument
       HDF5 1.6-style call; H5Dcreate2 in HDF5 1.8 and later also takes
       link-creation and dataset-access property lists. */
    dataset = H5Dcreate(file, DATASET_NAME, datatype, dataspace, properties);

7 N-bit filter restrictions ► Only compresses N-Bit datatype or field derived from integer or floating-point ► Assumes padding bits of zero ► No handling if fill value datatype differs from dataset datatype

8 ScaleOffset Filter Outline ► Definition ► How the ScaleOffset filter works ► Why use the ScaleOffset filter ► Usage examples ► Performance with EOS data ► Limitations

9 ScaleOffset filter
► ScaleOffset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits before storing it. The datatype is either integer or floating-point.
► "Offset" in Scale-Offset compression means the minimum value of a set of data values
► If a fill value is defined for the dataset, the filter will ignore it when finding the minimum value

10 How ScaleOffset works An example for Integer Type
► Maximum is 7065; minimum is 2970. The "span" = Max - Min + 1 = 4096
► Case 1: No fill value is defined. Minimum number of bits per element to store = ceiling(log2(span)) = 12
► Case 2: A fill value is defined in this array. Minimum number of bits per element to store = ceiling(log2(span+1)) = 13
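The bit-count arithmetic for both cases can be checked with a short sketch. The helper names ceil_log2 and so_min_bits are hypothetical, and the code assumes the value range is small enough that max - min does not overflow a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest b with 2^b >= n, i.e. ceiling(log2(n)) for n >= 1. */
unsigned ceil_log2(uint64_t n)
{
    assert(n >= 1);
    unsigned bits = 0;
    while (bits < 64 && ((uint64_t)1 << bits) < n)
        bits++;
    return bits;
}

/* Minimum bits per element for integer data in [min, max].
 * One extra code is reserved when a fill value is defined. */
unsigned so_min_bits(int64_t min, int64_t max, int has_fill_value)
{
    assert(max >= min);
    uint64_t span = (uint64_t)(max - min) + 1;
    if (has_fill_value)
        span += 1;
    return ceil_log2(span);
}
```

For the slide's numbers, so_min_bits(2970, 7065, 0) returns 12 and so_min_bits(2970, 7065, 1) returns 13.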

11 How ScaleOffset works An example for Integer Type (cont.)
► Compression: 1. Subtract the minimum value from each element 2. Pack all data using the minimum number of bits
► Decompression: 1. Unpack all data 2. Add the minimum value to each element
► Outcome: 1. Saves about 60% of disk space for this case (12 bits per element instead of 32, assuming 32-bit integers)
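A minimal sketch of the subtract-and-pack round trip follows. The bit order (most significant bit first) and the function names so_pack and so_unpack are assumptions made for illustration; the on-disk layout HDF5 actually uses is not shown in the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subtract `min` from each value and pack the result into `bits` bits,
 * most significant bit first. `out` must hold (n * bits + 7) / 8 bytes.
 * Returns the number of bytes written. */
size_t so_pack(const int32_t *in, size_t n, int32_t min, unsigned bits, uint8_t *out)
{
    assert(bits >= 1 && bits <= 32);
    size_t nbytes = (n * bits + 7) / 8;
    size_t bitpos = 0;
    memset(out, 0, nbytes);
    for (size_t i = 0; i < n; i++) {
        uint32_t v = (uint32_t)((int64_t)in[i] - min);
        for (unsigned b = bits; b-- > 0; bitpos++) {
            if ((v >> b) & 1u)
                out[bitpos / 8] |= (uint8_t)(0x80u >> (bitpos % 8));
        }
    }
    return nbytes;
}

/* Reverse of so_pack: unpack `bits` bits per value and add `min` back. */
void so_unpack(const uint8_t *in, size_t n, int32_t min, unsigned bits, int32_t *out)
{
    assert(bits >= 1 && bits <= 32);
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (unsigned b = 0; b < bits; b++, bitpos++) {
            unsigned bit = (in[bitpos / 8] >> (7 - bitpos % 8)) & 1u;
            v = (v << 1) | bit;
        }
        out[i] = (int32_t)((int64_t)v + min);
    }
}
```

Packing four 32-bit values from the range 2970 to 7065 with 12 bits each takes 6 bytes instead of 16, and so_unpack returns the original values exactly, which is why the integer case is lossless.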

12 How ScaleOffset works An example for Floating-point Type
► D-scaling factor: the number of decimal digits of precision to keep in the filter output
► Floating-point data: { , , , }
► D-scaling factor: 2
► Preparation for compression
    1. Calculate the minimum value =
    2. Subtract the minimum value. Modified data: {5.102, 0, 1.086, 6.185}
    3. Scale the data by multiplying by 10 ^ D-scaling factor = 100. Modified data: {510.2, 0, 108.6, 618.5}
    4. Round the data to integers. Modified data: {510, 0, 109, 619}
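The four preparation steps can be sketched in plain C as below. The function name so_dscale_prepare is hypothetical, and rounding is done by adding 0.5 and truncating, which is valid here only because the shifted values are never negative. This is an illustration of the steps, not the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Steps 1-4: find the minimum, subtract it, scale by 10^d, round to integer.
 * Returns the minimum so the caller can store it as the offset. */
double so_dscale_prepare(const double *in, size_t n, unsigned d, uint32_t *out)
{
    assert(n > 0);

    /* Step 1: minimum value (the "offset") */
    double min = in[0];
    for (size_t i = 1; i < n; i++)
        if (in[i] < min)
            min = in[i];

    /* 10^d computed by repeated multiplication */
    double scale = 1.0;
    for (unsigned k = 0; k < d; k++)
        scale *= 10.0;

    /* Steps 2-4: subtract, scale, round half up (values are non-negative) */
    for (size_t i = 0; i < n; i++)
        out[i] = (uint32_t)((in[i] - min) * scale + 0.5);

    return min;
}
```

As an example with assumed inputs, {104.561, 99.459, 100.545, 105.644} and d = 2 give differences matching the modified data on the slide; these input values are an assumption and are not taken from the slide. A value such as 618.5 sits exactly on a rounding boundary, so binary floating-point error can push it to either neighbor.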

13 How ScaleOffset works An example for Floating-point Type (cont.)
► Compression and decompression: use ScaleOffset for integers
► Restoration after decompression
    1. Divide each value by 10 ^ D-scaling factor
    2. Add the offset
    The floating-point data: { , , , }
► Outcome: 1. Lossy compression 2. The compression ratio depends on the D-scaling factor
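The restoration steps are the inverse of the preparation and can be sketched as follows. The function name so_dscale_restore is hypothetical. Because the prepared values were rounded to the nearest integer, each restored value differs from the original by at most about 0.5 * 10^(-D), which is the source of the loss.

```c
#include <stddef.h>
#include <stdint.h>

/* Divide each unpacked integer by 10^d and add the stored minimum back. */
void so_dscale_restore(const uint32_t *in, size_t n, double min, unsigned d, double *out)
{
    /* 10^d computed by repeated multiplication */
    double scale = 1.0;
    for (unsigned k = 0; k < d; k++)
        scale *= 10.0;

    for (size_t i = 0; i < n; i++)
        out[i] = (double)in[i] / scale + min;
}
```

With the rounded data {510, 0, 109, 619}, d = 2, and an assumed minimum of 99.459 (an illustrative value, not one given on the slide), the first restored element is approximately 104.559.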

14 Scale-Offset filter compresses floating-point data
► Uses the GRIB data packing method
► The Scale-Offset compression of floating-point data is lossy
► Two scaling methods: D-scaling and E-scaling. Currently only D-scaling is implemented

15 Why ScaleOffset Filter?
► Internal HDF5 filter
► Easy to understand
    - Integer: lossless (by default)
    - Floating-point: GRIB lossy compression
► Easy to control floating-point compression
    - D-scaling factor
► Easy to estimate the compression ratio
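One way to see how the D-scaling factor controls the compression ratio is to estimate the bits needed per element from the data range alone. The sketch below is a rough estimate under stated assumptions: no fill value, the same rounding rule as the preparation steps, and a hypothetical function name so_float_bits_estimate. The exact rule inside HDF5 may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate bits per element for floating-point data in [min, max]
 * when d decimal digits are kept (no fill value assumed). */
unsigned so_float_bits_estimate(double min, double max, unsigned d)
{
    assert(max >= min);

    /* 10^d computed by repeated multiplication */
    double scale = 1.0;
    for (unsigned k = 0; k < d; k++)
        scale *= 10.0;

    /* The scaled range must fit comfortably in a 64-bit unsigned integer */
    assert((max - min) * scale < 1.0e18);

    /* Number of distinct integer codes after scaling and rounding */
    uint64_t span = (uint64_t)((max - min) * scale + 0.5) + 1;

    /* ceiling(log2(span)) */
    unsigned bits = 0;
    while (bits < 64 && ((uint64_t)1 << bits) < span)
        bits++;
    return bits;
}
```

For data spanning 0.0 to 10.0, the estimate is 4 bits with d = 0, 7 bits with d = 1, and 10 bits with d = 2. Dividing the estimate by the original element size (32 bits for a float) gives an approximate compression ratio, so each extra decimal digit costs roughly 3 to 4 bits per element.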

16 H5Pset_scaleoffset API
    herr_t H5Pset_scaleoffset (hid_t plist_id, H5Z_SO_scale_type_t scale_type, int scale_factor)
► plist_id IN: Dataset creation property list identifier
► scale_type IN: Flag indicating compression method
    - H5Z_SO_FLOAT_DSCALE (0): floating-point type, D-scaling
    - H5Z_SO_INT (2): integer type
► scale_factor IN: Parameter whose meaning depends on scale_type
    - If scale_type is H5Z_SO_FLOAT_DSCALE, scale_factor is the decimal scale factor
    - If scale_type is H5Z_SO_INT, scale_factor is the minimum number of bits; it should be a positive integer or H5Z_SO_INT_MINBITS_DEFAULT

17 Integer example
    /* Set the fill value of the dataset */
    fill_val = 10000;
    H5Pset_fill_value(properties, H5T_NATIVE_INT, &fill_val);
    /* Set parameters for Scale-Offset compression */
    H5Pset_scaleoffset(properties, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);
    /* Create a new dataset */
    dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_INT, dataspace, properties);

18 Floating-point example
    fill_val = ;
    /* Set the fill value of the dataset */
    H5Pset_fill_value(properties, H5T_NATIVE_FLOAT, &fill_val);
    /* Set parameters for Scale-Offset compression;
       use the D-scaling method and set the decimal scale factor to 3 */
    H5Pset_scaleoffset(properties, H5Z_SO_FLOAT_DSCALE, 3);
    /* Create a new dataset */
    dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_FLOAT, dataspace, properties);

20 Limitations
► The range of compressed floating-point data is limited by the size of the corresponding unsigned integer type.
► Long double is not supported.

21 Thank you This work was funded by the NASA Earth Science Technology Office under NASA award AIST and UCAR Subaward No S Other support was provided based upon the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA.