HDF 1 HDF5 Advanced Topics Object’s Properties Storage Methods and Filters Datatypes HDF and HDF-EOS Workshop VIII October 26, 2004.


HDF 2 Topics General introduction to HDF5 properties HDF5 dataset properties: I/O and storage properties (filters) HDF5 file properties: I/O and storage properties (drivers) Datatypes: compound, variable length, references to objects and dataset regions

HDF 3 General Introduction to HDF5 Properties

HDF 4 Properties Definition Mechanism to control different features of HDF5 objects –Implemented via the H5P interface (“property lists”) –The HDF5 library sets objects’ default features –HDF5 property lists modify default features At object creation time (creation properties) At object access time (access or transfer properties)

HDF 5 Properties Definitions A property list is a list of name-value pairs –Values may be of any datatype A property list is passed as an optional parameter to the HDF5 APIs Property lists are used or ignored by each layer of the library, as needed

HDF 6 Types of Properties Predefined and user-defined property lists Predefined: –File creation –File access –Dataset creation –Dataset access Will cover each of these

HDF 7 Properties (Example) HDF5 File H5Fcreate(…,creation_prop_id,…) Creation properties (how is the file created?) –Library defaults no user block predefined sizes of offsets and addresses of objects in the file (64-bit on DEC Alpha, 32-bit on Windows) –User settings user block 32-bit sizes on a 64-bit platform control over B-trees for chunked storage (split factor)

HDF 8 Properties (Example) HDF5 File H5Fcreate(…,access_prop_id) Access properties or drivers (How is the file accessed? What is the physical layout on the disk?) –Library defaults SEC2 driver (POSIX read / write ) –User defined MPI I/O for parallel access Family of files (100 GB HDF5 file represented by fifty 2 GB UNIX files) Size of the chunk cache

HDF 9 Properties (Example) HDF5 Dataset H5Dcreate(…,creation_prop_id) Creation properties (how is the dataset created?) –Library defaults Storage: Contiguous Compression: None Space is allocated when data is first written No fill value is written –User settings Storage: Compact, or chunked, or external Compression Fill value Control over space allocation in the file for raw data –at creation time –at write time

HDF 10 Properties (Example) HDF5 Dataset H5Dwrite (…,access_prop_id) Access (transfer) properties –Library defaults 1 MB conversion buffer Error detection on read (if it was set during write) MPI independent I/O for parallel access –User defined MPI collective I/O for parallel access Size of the datatype conversion buffer Control over partial I/O to improve performance

HDF 11 Properties Programming model Use predefined property type –H5P_FILE_CREATE –H5P_FILE_ACCESS –H5P_DATASET_CREATE –H5P_DATASET_ACCESS Create new property instance –H5Pcreate –H5Pcopy –H5*get_access_plist; H5*get_create_plist Modify property (see H5P APIs) Use property to modify object feature Close property when done –H5Pclose

HDF 12 Properties Programming model General model of usage: get plist, set values, pass to library hid_t plist = H5Pcreate(copy)(predefined_plist); OR hid_t plist = H5Xget_create(access)_plist(…); H5Pset_foo( plist, vals); H5Xdo_something( Xid, …, plist); H5Pclose(plist);

HDF 13 HDF5 Dataset Creation Properties and Predefined Filters

HDF 14 Dataset Creation Properties Storage –Contiguous (default) –Compact –Chunked –External Filters applied to raw data –Compression –Checksum Fill value Space allocation for raw data in the file

HDF 15 Dataset Creation Properties Storage Layouts Storage layout is important for I/O performance and the size of the HDF5 files Contiguous (default) Used when data will be written/read at once H5Dcreate(…,H5P_DEFAULT) Compact Used for small datasets (on the order of bytes) for better I/O Raw data is written/read at the time the dataset is opened File is less fragmented To create a compact dataset follow the ‘Properties programming model’

HDF 16 Creating Compact Dataset Create a dataset creation property list Set property list to use compact storage layout Create dataset with the above property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_layout(plist, H5D_COMPACT); dset_id = H5Dcreate (…, “Compact”,…, plist); H5Pclose(plist);

HDF 17 Creating chunked Dataset Chunked layout is needed for –Extendible datasets –Compression and other filters –To improve partial I/O for big datasets (better subsetting access time: in the figure, the selected subset touches only two chunks, so only those two chunks are written/read)

HDF 18 Creating Chunked Dataset Create a dataset creation property list Set property list to use chunked storage layout Create dataset with the above property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_chunk(plist, rank, ch_dims); dset_id = H5Dcreate (…, “Chunked”,…, plist); H5Pclose(plist);

HDF 19 Dataset Creation Properties Compression and other I/O Pipeline Filters HDF5 provides a mechanism (“I/O filters”) to manipulate data while transferring it between memory and disk H5Z and H5P interfaces HDF5 predefined filters (H5P interface) –Compression (gzip, szip) –Shuffling and checksum filters User defined filters (H5Z and H5P interfaces) –Example: Bzip2 compression

HDF 20 Compression and other I/O Pipeline Filters (continued) Currently used only with chunked datasets Filters can be combined together –GZIP + shuffle + checksum filters –Checksum filter + user-defined encryption filter Filters are called in the order they are defined on writing and in the reverse order on reading User is responsible for “filter pipeline sanity” –GZIP + SZIP + shuffle doesn’t make sense –Shuffle + SZIP does

HDF 21 Creating compressed Dataset Compression –Improves transmission speed –Improves storage efficiency –Requires chunking –May increase CPU time (Figure: data is compressed on its way between memory and the file)

HDF 22 Creating compressed datasets Create a dataset creation property list Set chunking (and specify chunk dimensions) Set compression method Create dataset with the above property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_chunk (plist, ndims, chkdims); H5Pset_deflate (plist, level); /* GZIP */ OR H5Pset_szip (plist, options_mask, pixels_per_block); /* SZIP */ dset_id = H5Dcreate (file_id, “comp-data”, H5T_NATIVE_FLOAT, space_id, plist);

HDF 23 Creating external Dataset Dataset’s raw data is stored in an external file Easy to include existing data into an HDF5 file Easy to export raw data if the application needs it Disadvantage: user has to keep track of additional files to preserve the integrity of the HDF5 file (Figure: the HDF5 file holds the metadata for dataset “A”; the raw data for “A” is stored in the external file)

HDF 24 Creating External Dataset Create a dataset creation property list Set property list to use external storage layout Create dataset with the above property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_external(plist, “raw_data.ext”, offset, size); dset_id = H5Dcreate (…, “External”,…, plist); H5Pclose(plist);

HDF 25 Example of External Files This example shows how a contiguous, one-dimensional dataset is partitioned into three parts and each of those parts is stored in a segment of an external file. plist = H5Pcreate (H5P_DATASET_CREATE); H5Pset_external (plist, “raw.data”, 3000, 1000); H5Pset_external (plist, “raw.data”, 0, 2500); H5Pset_external (plist, “raw.data”, 4500, 1500);

HDF 26 Checksum Filter HDF5 includes the Fletcher32 checksum algorithm for error detection; it is automatically built into HDF5 To use this filter you must add it to the filter pipeline with H5Pset_filter (Figure: a checksum value is computed over the data in memory and stored alongside it)

HDF 27 Enabling Checksum Filter Create a dataset creation property list Set chunking (and specify chunk dimensions) Add the filter to the pipeline Create your dataset specifying this property list Close property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_chunk (plist, ndims, chkdims); H5Pset_filter (plist, H5Z_FILTER_FLETCHER32, 0, 0, NULL); H5Dcreate (…,”Checksum”,…,plist); H5Pclose(plist);

HDF 28 Shuffling filter Predefined HDF5 filter Not a compression method; a change of byte order in a stream of data Example (hexadecimal form) –Values 0x01 0x17 0x2B As 4-byte big-endian integers –0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x2B After shuffling –0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x17 0x2B

HDF 29 (Figure: the byte stream before and after shuffling; the significant bytes 0x01 0x17 0x2B end up adjacent.)

HDF 30 Enabling Shuffling Filter Create a dataset creation property list Set chunking (and specify chunk dimensions) Add the filter to the pipeline Define compression filter Create your dataset specifying this property list Close property list plist = H5Pcreate(H5P_DATASET_CREATE); H5Pset_chunk (plist, ndims, chkdims); H5Pset_shuffle(plist); H5Pset_deflate(plist,level); H5Dcreate (…,”BetterComp”,…,plist); H5Pclose(plist);

HDF 31 Effect of data shuffling (H5Pset_shuffle + H5Pset_deflate) Write a 4-byte integer dataset 256x256x1024 (256 MB) using chunks of 256x16x1024 (16 MB); values: random integers between 0 and 255 Resulting file size: 102.9 MB with no shuffle, 67.34 MB with shuffle Compression combined with shuffling provides Better compression ratio Better I/O performance

HDF 32 HDF5 Dataset Access (Transfer) Properties

HDF 33 Dataset Access/Transfer Properties Improve performance H5Pset_buffer –Sets the size of the datatype conversion buffer during I/O –Size should be large enough to hold the slice along the slowest changing dimension –Example: Hyperslab 100x200x300, buffer 200x300 H5Pset_hyper_vector_size –Sets the number of hyperslab offset and length pairs –Improves performance for partial I/O

HDF 34 Dataset Access/Transfer Properties H5Pset_edc_check –For datasets created with the error detection filter enabled –Enables error checking during read operations –H5Z_ENABLE_EDC (default) –H5Z_DISABLE_EDC H5Pset_dxpl_mpio –Sets the data transfer mode for parallel I/O –H5FD_MPIO_INDEPENDENT (default) –H5FD_MPIO_COLLECTIVE

HDF 35 User-defined Filters

HDF 36 Standard Interface for User-defined Filters H5Zregister: Registers a filter so that HDF5 knows about it H5Zunregister: Unregisters a filter H5Pset_filter: Adds a filter to the filter pipeline H5Pget_filter: Returns information about a filter in the pipeline H5Zfilter_avail: Checks if a filter is available

HDF 37 File Creation Properties

HDF 38 File Creation Properties H5Pset_userblock –User block stores user-defined information (e.g. ASCII text describing the file) at the beginning of the file –cat my.txt hdf5.h5 > myhdf5.h5 –Sets the size of the user block –512 bytes, 1024 bytes, … (2^N) H5Pset_sizes –Sets the byte size of the offsets and lengths used to address objects in the file H5Pset_sym_k –Controls the rank of B-trees for groups –Default is 16 H5Pset_istore_k –Controls the rank of B-trees used to index chunked datasets –Default is 32

HDF 39 File Access Properties

HDF 40 File Access Properties (Performance) H5Pset_cache –Sets metadata cache and raw data chunk cache parameters –Improper sizes will degrade performance H5Pset_meta_block_size –Reduces the number of small objects in the file –A block of metadata is written in a single I/O operation (default 2K) –The VFL driver has to set the H5FD_FEAT_AGGREGATE_METADATA feature flag H5Pset_sieve_buf_size –Sets the data sieve buffer size –Improves partial I/O

HDF 41 File Access Properties (Physical Storage and Usage of Low-level I/O Libraries) VFL layer: file drivers Define physical storage of the HDF5 file –Memory driver (HDF5 file in the application’s memory) –Stream driver (HDF5 file written to a socket) –Split (multi) file driver –Family driver Define the low-level I/O library –MPI I/O driver for parallel access –STDIO vs. SEC2

HDF 42 Files needn’t be files: the Virtual File Layer VFL: a public API for writing I/O drivers (Figure: an hid_t file handle passes through the VFL to an I/O driver such as stdio, memory, MPI-IO, network, split, family, or SRB, which maps the “file” onto its storage: files, memory, the network, or an SRB repository)

HDF 43 Split Files Allows you to split metadata and raw data into separate files May reside on different file systems for better I/O Disadvantage: user has to keep track of the files (Figure: one logical HDF5 file with datasets “A” and “B” is stored as a metadata file plus a raw data file holding the data of “A” and “B”)

HDF 44 Creating Split Files Create a file access property list Set up the file access property list to use split files Create the file with this property list Close the property list plist = H5Pcreate (H5P_FILE_ACCESS); H5Pset_fapl_split(plist, “.met”, H5P_DEFAULT, ”.dat”, H5P_DEFAULT); file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist); H5Pclose(plist);

HDF 45 File Families Allows you to access files larger than 2GB on file systems that don't support large files Any HDF5 file can be split into a family of files and vice versa A family member size must be a power of two

HDF 46 Creating a File Family Create a file access property list Set up file access property list to use file family Create the file with this property list plist = H5Pcreate (H5P_FILE_ACCESS); H5Pset_fapl_family (plist, family_size, H5P_DEFAULT); file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist); H5Pclose(plist);

HDF 47 HDF5 Datatypes

HDF 48 Datatypes A datatype is –A classification specifying the interpretation of a data element –Specifies, for a given data element: the set of possible values it can have, the operations that can be performed, and how the values of that type are stored –May be shared between different datasets in one file

HDF 49 HDF5 datatypes Atomic types –standard integer & float –user-definable scalars (e.g. 13-bit integer) –bitfields –variable length types (e.g. strings) –pointers - references to objects/dataset regions –enumeration - names mapped to integers

HDF 50 General Operations on HDF5 Datatypes Create –H5Tcreate creates a datatype of the H5T_COMPOUND, H5T_OPAQUE, and H5T_ENUM classes Copy –H5Tcopy creates another instance of the datatype; can be applied to any datatype Commit –H5Tcommit creates a datatype object in the HDF5 file; a committed datatype can be shared between different datasets Open –H5Topen opens a datatype stored in the file Close –H5Tclose closes a datatype object

HDF 51 Programming model for HDF5 Datatypes Use predefined HDF5 types –No need to close OR –Create Create a datatype (by copying an existing one or by creating one of the H5T_COMPOUND, H5T_ENUM, or H5T_OPAQUE classes) Create a datatype by querying the datatype of a dataset –Open a committed datatype from the file (Optional) Discover datatype properties (size, precision, members, etc.) Use the datatype to create a dataset/attribute, to write/read a dataset/attribute, to set a fill value (Optional) Save the datatype in the file Close

HDF 52 HDF5 Compound Datatypes Compound types –Comparable to C structs –Members can be atomic or compound types –Members can be multidimensional –Can be written/read by a field or set of fields –Not all data filters can be applied (e.g. shuffle, SZIP) –H5Tcreate(H5T_COMPOUND) and H5Tinsert calls create a compound datatype –See the H5Tget_member* functions for discovering properties of an HDF5 compound datatype

HDF 53 HDF5 Fixed and Variable Length Array Storage (Figure: fixed-length vs. variable-length storage of data over time)

HDF 54 HDF5 Variable Length Datatypes Programming issues Each element is represented by the C struct typedef struct { size_t length; void *p; } hvl_t; The base type can be any HDF5 type

HDF 55 HDF5 Variable Length Datatypes (Figure: a dataset with a variable-length datatype stores per-element descriptors; the variable-length raw data itself is kept in the file’s global heap)

HDF 56 HDF Information HDF Information Center HDF help address HDF users mailing list