HDF5 Tutorial, 37th SPEEDUP Workshop on HPC, September 9, 2008. Albert Cheng, Elena Pourmal, The HDF Group.


1 HDF5 Tutorial. 37th SPEEDUP Workshop on HPC. Albert Cheng, Elena Pourmal, The HDF Group

2 Outline
- 8:00-9:00 Introduction to HDF5 data, programming models and tools
- 9:00-9:30 Advanced features
- 10:00-12:00 Introduction to Parallel HDF5
- 13:15-14:15 Caching and buffering in HDF5
- 14:45-16:45 New features in HDF5 1.8.0

3 Introduction to HDF5 Data, Programming Models and Tools

4 What is HDF?

5 HDF is…
- A file format for managing any kind of data
- A software system to manage data in the format
- Designed for high-volume or complex data
- Designed for every size and type of system
- An open format and software library, with tools
- There are two HDFs: HDF4 and HDF5. Today we focus on HDF5.

6 HDF5: The Format

7 An HDF5 “file” is a container… into which you can put your data objects
[Figure: a container holding a palette and a table]
lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6

8 Structures to organize objects
[Figure: “Groups” (“/” (root) and “/foo”) organize “Datasets”: raster images, a palette, a 2-D array, a 3-D array, and the lat/lon/temp table]

9 HDF5 model
- Groups: provide structure among objects
- Datasets: where the primary data goes
  - Data arrays
  - Rich set of datatype options
  - Flexible, efficient storage and I/O
- Attributes, for metadata
Everything else is built essentially from these parts.

10 HDF5: The Software

11 HDF5 Software
[Diagram: Tools, Applications, Libraries; HDF5 I/O Library; HDF5 File]

12 Users of HDF5 Software
[Diagram: layered stack, top to bottom]
- Tools & Applications: most data consumers are here. Scientific/engineering applications; domain-specific libraries/APIs and tools.
- HDF5 Application Programming Interface: applications and tools use this API to create, read, write, query, etc. Power users (consumers).
- “Virtual file layer” (VFL): modules to adapt I/O to specific features of the system, or to do I/O in some special way.
- File system, MPI-IO, SAN, other layers
- “HDF5 File”: the “file” could be on a parallel system, in memory, on a network, a collection of files, etc.

13 HDF5 Philosophy
- A single platform with multiple uses
- One general format
- One library, with
  - Options to adapt I/O and storage to data needs
  - Layers on top and below
  - Ability to interact well with other technologies
  - Attention to past, present, future compatibility

14 Who uses HDF5?

15 Who uses HDF5?
- Applications that deal with big or complex data
- Over 200 different types of apps
- 2+ million product users worldwide
- Academia, government agencies, industry

16 Applications with large amounts of data

17 NASA EOS remote sensing data
- HDF is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
- Petabytes of data are stored in HDF and HDF5 to support the Global Climate Change Research Program.

18 Large simulations
- A simulation can have billions of elements
- Each element can have dozens of associated values

19 Large images: electron tomography
- 25-80 Å resolution
- 4k x 4k x 500 images now
- 8k x 8k x 1k images soon (256 GB)

20 It is not just about size

21 Data complexity
[Figure; thanks to Mark Miller, LLNL]

22 Complex relationships within data
[Figure labels: Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Read quality, Aligned bases, Contig Reads, Percent match, Trace, SNP Score]

23 Flight test
- High-speed, multi-stream, multi-modal data collection
- Analyze and query specific parameters by time and space
- Different views of data

24 HDF5 Data Model

25 HDF5 model (recap)
- Groups: provide structure among objects
- Datasets: where the primary data goes
  - Data arrays
  - Rich set of datatype options
  - Flexible, efficient storage and I/O
- Attributes, for metadata
- Other objects
  - Links (point to data in a file or in another HDF5 file)
  - Datatypes (can be stored for complex structures and reused by multiple datasets)

26 HDF5 Dataset
[Diagram: a dataset consists of data plus metadata]
- Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
- Datatype: IEEE 32-bit float
- Storage info: chunked, compressed
- Attributes: Time = 32.4, Pressure = 987, Temp = 56

27 HDF5 Dataspace
Two roles:
- A dataspace contains spatial information about a dataset stored in a file
  - Rank and dimensions
  - Permanent part of the dataset definition
- A dataspace describes the application's data buffer and the data elements participating in I/O
[Figure: two example dataspaces, one with rank = 2 and dimensions = 4x6, one with rank = 1 and dimensions = 12]
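
To make rank and dimensions concrete, here is a small sketch in plain C with no HDF5 calls. The helper names `dataspace_npoints` and `row_major_offset` are hypothetical and only mirror what a rank and a list of dimensions imply, assuming the row-major (C) element ordering used by the HDF5 C API.

```c
#include <stddef.h>

/* Number of elements described by a rank and a list of dimension sizes.
   A rank of 0 (a scalar) yields 1. */
size_t dataspace_npoints(int rank, const size_t dims[])
{
    size_t n = 1;
    for (int i = 0; i < rank; i++)
        n *= dims[i];
    return n;
}

/* Linear position of coords[] in a row-major array:
   the last dimension varies fastest. */
size_t row_major_offset(int rank, const size_t dims[], const size_t coords[])
{
    size_t off = 0;
    for (int i = 0; i < rank; i++)
        off = off * dims[i] + coords[i];
    return off;
}
```

With the two dataspaces from the figure, a rank-2 4x6 dataspace describes 24 elements and a rank-1 dataspace of dimension 12 describes 12, so the second could describe a memory buffer holding half of the first.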

28 HDF5 Datatype
- Datatype: how to interpret a data element
- Permanent part of the dataset definition
- Two classes: atomic and compound
- Can be stored in a file as an HDF5 object (HDF5 committed datatype)
- Can be shared among different datasets

29 HDF5 Datatype
- HDF5 atomic types include
  - normal integer and float
  - user-definable (e.g., 13-bit integer)
  - variable-length types (e.g., strings)
  - references to objects/dataset regions
  - enumeration: names mapped to integers
  - array
- HDF5 compound types
  - Comparable to C structs (“records”)
  - Members can be atomic or compound types

30 HDF5 dataset: array of records
- Dimensionality: 5 x 3
- Datatype: a record with members int8, int4, int16 and a 2x3x2 array of float32
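
Since compound types are described as comparable to C structs, the record above can be sketched as a plain C struct plus a table of member offsets. This is an illustration only: the names `record_t`, `member_info_t` and `record_members` are hypothetical, and the int4 member is left out because standard C has no 4-bit integer type. In the HDF5 C API a compound type is built from the same three pieces of information per member (name, byte offset, member datatype), typically with `H5Tinsert` and the `HOFFSET` macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory record; the int4 member from the slide is omitted. */
typedef struct {
    int8_t  a;
    int16_t b;
    float   c[2][3][2];   /* 2x3x2 array of 32-bit floats */
} record_t;

/* One entry per member: name, byte offset within the struct, size in bytes. */
typedef struct {
    const char *name;
    size_t      offset;
    size_t      size;
} member_info_t;

static const member_info_t record_members[] = {
    { "a", offsetof(record_t, a), sizeof(int8_t) },
    { "b", offsetof(record_t, b), sizeof(int16_t) },
    { "c", offsetof(record_t, c), sizeof(float) * 2 * 3 * 2 },
};

/* The dataset on the slide is a 5 x 3 array of such records. */
static record_t dataset[5][3];
```

The offsets come from the compiler, so any padding between members is accounted for automatically.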

31 Special storage options for dataset
- chunked: better subsetting access time; extendable
- compressed: improves storage efficiency, transmission speed
- extendable: arrays can be extended in any direction
- external: metadata in the HDF5 file, raw data in a binary file
[Figure: dataset “Fred” with metadata for Fred in File A and data for Fred in File B]
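
One way to see why chunking helps subsetting is to count how many chunks a selection overlaps, since only those chunks need to be read. The sketch below is plain C arithmetic, not HDF5 code; the function names are hypothetical and it assumes a chunk size greater than zero.

```c
#include <stddef.h>

/* Number of chunks needed to cover `dim` elements along one dimension
   when each chunk holds `chunk` elements (ceiling division). */
size_t chunks_along(size_t dim, size_t chunk)
{
    return (dim + chunk - 1) / chunk;
}

/* Number of chunks overlapped along one dimension by the element
   range [start, start + count). */
size_t chunks_touched(size_t start, size_t count, size_t chunk)
{
    if (count == 0)
        return 0;
    size_t first = start / chunk;
    size_t last  = (start + count - 1) / chunk;
    return last - first + 1;
}
```

For example, a 1000 x 1000 array stored in 100 x 100 chunks has 10 x 10 = 100 chunks, and a 50 x 50 subset starting at element (180, 120) overlaps 2 x 1 = 2 of them.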

32 HDF5 Attribute
- Attribute: data of the form “name = value”, attached to an object by the application
- Operations are similar to dataset operations, but…
  - Not extendible
  - No compression or partial I/O
- Can be overwritten, deleted, or added during the “life” of a dataset

33 HDF5 Group
- A mechanism for organizing collections of related objects
- Every file starts with a root group, “/”
- Similar to UNIX directories
- Can have attributes

34 Path to HDF5 object in a file
[Figure: root group “/” containing X and Y; Y contains temp and bar; bar contains temp]
- / (root)
- /X
- /Y
- /Y/temp
- /Y/bar/temp

35 Shared HDF5 objects
[Figure: root group “/” with groups A, B and C and objects P, R and P]
- /A/P
- /B/R
- /C/P

36 HDF5 Data Model Example
ENSIGHT: automotive crash simulation

38 Automotive crash simulation

39 Automotive crash simulation

40 Solid modeling

41 Solid modeling

42 HDF5mesh

43 Mesh Example, in HDFView

44 HDF5 Software

45 HDF5 software stack
[Diagram: Tools & Applications; HDF I/O Library; HDF File]

46 Structure of HDF5 Library
- Object API (C, Fortran 90, Java, C++)
  - Specify objects and transformation properties
  - Invoke data movement operations and data transformations
- Library internals
  - Performs data transformations and other prep for I/O
  - Configurable transformations (compression, etc.)
- Virtual file I/O (C only)
  - Perform byte-stream I/O operations (open/close, read/write, seek)
  - User-implementable I/O (stdio, network, memory, etc.)

47 Write: from memory to disk
[Diagram: memory to disk]

48 Partial I/O: move just part of a dataset
(a) Hyperslab from a 2D array to the corner of a smaller 2D array
(b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
[Diagrams: memory and disk]

49 Partial I/O: move just part of a dataset
(c) A sequence of points from a 2D array to a sequence of points in a 3D array
(d) Union of hyperslabs in file to union of hyperslabs in memory
[Diagrams: memory and disk]
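
Cases (a) and (b) both describe regular selections. The sketch below shows, in plain C and without calling HDF5, what such a selection picks out of a 2D array. The type `slab_dim_t` and the function `copy_hyperslab_2d` are hypothetical; their four fields per dimension mirror the start, stride, count and block arguments that `H5Sselect_hyperslab` takes in the HDF5 C API. The caller is assumed to pass a selection that stays inside the source array and a destination buffer that is large enough.

```c
#include <stddef.h>

/* Per-dimension description of a regular selection. */
typedef struct {
    size_t start;   /* index of the first selected element        */
    size_t stride;  /* distance between the starts of two blocks  */
    size_t count;   /* number of blocks                           */
    size_t block;   /* number of elements per block               */
} slab_dim_t;

/* Copies the selected elements of a row-major 2D array `src` with `ncols`
   columns into `dst` as one contiguous sequence.
   Returns the number of elements copied. */
size_t copy_hyperslab_2d(const int *src, size_t ncols,
                         slab_dim_t rows, slab_dim_t cols, int *dst)
{
    size_t n = 0;
    for (size_t rb = 0; rb < rows.count; rb++)
        for (size_t r = 0; r < rows.block; r++)
            for (size_t cb = 0; cb < cols.count; cb++)
                for (size_t c = 0; c < cols.block; c++) {
                    size_t row = rows.start + rb * rows.stride + r;
                    size_t col = cols.start + cb * cols.stride + c;
                    dst[n++] = src[row * ncols + col];
                }
    return n;
}
```

For a 4x6 array filled with 0 to 23, selecting rows {start 1, stride 1, count 2, block 1} and columns {start 2, stride 1, count 3, block 1} copies the six values 8, 9, 10, 14, 15, 16.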

50 Layers: parallel example
I/O flows through many layers from application to disk.
[Diagram: a parallel computing system (Linux cluster); layers from top to bottom]
- Application (on compute nodes)
- I/O library (HDF5)
- Parallel I/O library (MPI-I/O)
- Parallel file system (GPFS)
- Switch network/I/O servers
- Disk architecture & layout of data on disk

51 Virtual I/O layer
[Diagram: Object API (C, Fortran 90, Java, C++); Library internals; Virtual file I/O (C only)]

52 Virtual file I/O layer
- A public API for writing I/O drivers
- Allows HDF5 to interface to disk, the network, memory, or a user-defined device
[Diagram: virtual file I/O drivers (Stdio, File, File Family, MPI I/O, Memory, Network) sitting above “Storage”]

53 Applications & Domains
[Diagram: layered stack, top to bottom]
- Applications & domains: simulation, visualization, remote sensing… Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
- Common domain-specific data models and domain-specific APIs: UDM, SAF, H5Part, HDF-EOS, IDL (figure labels: LANL; LLNL, SNL; Grids; COTS; NASA)
- HDF5 data model & API
- HDF5 format
- HDF5 serial & parallel I/O
- HDF5 virtual file layer (I/O drivers): MPI I/O, Split Files, Stdio, Custom, Stream
- Storage: file on parallel file system; file; split metadata and raw data files; user-defined device (?); across the network or to/from another application or library

54 Portability & Robustness
- Runs almost anywhere
  - Linux and UNIX workstations
  - Windows, Mac OS X
  - Big ASC machines, Crays, VMS systems
  - TeraGrid and other clusters
  - Source and binaries available from http://www.hdfgroup.org/HDF5/release/index.html
- QA
  - Daily regression tests on key platforms
  - Meets NASA's highest technology readiness level

55 Other Software
- The HDF Group
  - HDFView
  - Java tools
  - Command-line utilities
  - Web browser plug-in
  - Regression and performance testing software
  - Parallel h5diff
- 3rd party (IDL, MATLAB, Mathematica, PyTables, HDF Explorer, LabView)
- Communities (EOS, ASC, CGNS)
- Integration with other software (iRODS, OPeNDAP)

56 Creating an HDF5 File with HDFView

57 Example: Create this HDF5 File
[Figure: root group “/” containing A, a 4x6 array of integers, and B; label “Storm”]

58 Demo
- Demonstrate the use of HDFView to create the HDF5 file
- Use h5dump to see the contents of the HDF5 file
- Use h5import to add data to the HDF5 file
- Use h5repack to change properties of the stored objects
- Use h5diff to compare two files

59 Introduction to HDF5 Programming Model and APIs

60 Structure of HDF5 Library (recap)
- Object API (C, Fortran 90, Java, C++)
  - Specify objects and transformation properties
  - Invoke data movement operations and data transformations
- Library internals
  - Performs data transformations and other prep for I/O
  - Configurable transformations (compression, etc.)
- Virtual file I/O API (C only)
  - Perform byte-stream I/O operations (open/close, read/write, seek)
  - User-implementable I/O (stdio, mpi-io, memory, etc.)

61 Goals of HDF5 Library
- Provide a flexible API to support a wide range of operations on data.
- Support high-performance access in serial and parallel computing environments.
- Be compatible with common data models and programming languages.
Because of these goals, the HDF5 API is rich and large.

62 Operations Supported by the API
- Create groups, datasets, attributes, linkages
- Create complex data types
- Assign storage and I/O properties to objects
- Perform complex subsetting during read/write
- Use a variety of I/O “devices” (parallel, remote, etc.)
- Transform data during I/O
- Query about file structure and properties
- Query about object structure, content, properties

63 Characteristics of the HDF5 API
- For flexibility, the API is extensive: 300+ functions
- This can be daunting… but there is hope
  - A few functions can do a lot
  - Start simple
  - Build up knowledge as more features are needed
- Library functions are categorized by object type
- “H5Lite” API supports basic capabilities
[Image: Victorinox Swiss Army CyberTool 34]

64 The General HDF5 API
- Currently C, Fortran 90, Java, and C++ bindings.
- C routines begin with the prefix H5?, where ? is a character corresponding to the type of object the function acts on
- Example APIs:
  - H5D: Dataset interface, e.g., H5Dread
  - H5F: File interface, e.g., H5Fopen
  - H5S: dataSpace interface, e.g., H5Sclose

65 Compiling HDF5 Applications
- h5cc: HDF5 C compiler command (similar to mpicc)
- h5fc: HDF5 F90 compiler command (similar to mpif90)
- h5c++: HDF5 C++ compiler command
To compile:
    % h5cc h5prog.c
    % h5fc h5prog.f90

66 Compile option: -show
-show: displays the compiler commands and options without executing them
    % h5cc -show Sample_c.c
    gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c
    gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib

67 General Programming Paradigm
- Properties of object are optionally defined
  - Creation properties
  - Access property lists
  - Default values used if none are defined
- Object is opened or created
- Object is accessed, possibly many times
- Object is closed

68 Order of Operations
- An order is imposed on operations by argument dependencies
  - For example, a file must be opened before a dataset because the dataset open call requires a file handle as an argument.
- Objects can be closed in any order.

69 HDF5 Defined Types
For portability, the HDF5 library has its own defined types:
- hid_t: object identifiers (native integer)
- hsize_t: size used for dimensions (unsigned long or unsigned long long)
- hssize_t: for specifying coordinates and sometimes for dimensions (signed long or signed long long)
- herr_t: function return value
- hvl_t: variable-length datatype
For C, include hdf5.h in your HDF5 application.

70 Example: Create this HDF5 File
[Figure: root group “/” containing A, a 4x6 array of integers, and B]

71 Example: Step by Step
[Figure: root group “/” containing A, a 4x6 array of integers, and B]

72 Example: Create a File
[Figure: root group “/”]

73 Steps to Create a File
1. Decide any special properties the file should have
   - Creation properties, like size of user block
   - Access properties, such as metadata cache size
2. Create property lists, if necessary
3. Create the file
4. Close the file and the property lists, as needed

74 Code: Create a File
    hid_t file_id;
    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
- H5F_ACC_TRUNC flag: overwrites (truncates) an existing file with the same name
- H5P_DEFAULT flags: create a regular UNIX file and access it with the HDF5 SEC2 I/O file driver

75 Example: Add a Dataset
[Figure: root group “/” containing A, a 4x6 array of integers]

76 Dataset Components
[Diagram: a dataset consists of data plus metadata]
- Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
- Datatype: IEEE 32-bit float
- Storage info: chunked, compressed
- Attributes: Time = 32.4, Pressure = 987, Temp = 56

77 Dataset Creation Property List
Dataset creation property list: information on how to store data in a file
[Figure: storage layouts: chunked; chunked & compressed]

78 Steps to Create a Dataset
1. Define dataset characteristics
   - Dataspace: 4x6
   - Datatype: integer
   - Properties (if needed)
2. Decide where to put it: “root group”
   - Obtain location identifier
3. Decide link or path: “A”
4. Create link and dataset in file
5. (Eventually) close everything

79 Code: Create a Dataset
    hid_t   file_id, dataset_id, dataspace_id;
    hsize_t dims[2];
    herr_t  status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Create a dataspace: rank, current dims */
    dims[0] = 4;
    dims[1] = 6;
    dataspace_id = H5Screate_simple(2, dims, NULL);

    /* Create a dataset: pathname, datatype, dataspace, property list (default).
       This five-argument call is the HDF5 1.6 form of H5Dcreate. */
    dataset_id = H5Dcreate(file_id, "A", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

    /* Terminate access to dataset, dataspace, file */
    status = H5Dclose(dataset_id);
    status = H5Sclose(dataspace_id);
    status = H5Fclose(file_id);

80 Example: Create a Group
[Figure: file.h5 with root group “/” containing A, a 4x6 array of integers, and group B]

81 Steps to Create a Group
1. Decide where to put it: “root group”
   - Obtain location identifier
2. Decide link or path: “B”
3. Create link and group in file
   - Specify number of bytes to store names of objects to be added to group (as a hint), or use default.
4. (Eventually) close the group.

82 Code: Create a Group
    hid_t  file_id, group_id;
    herr_t status;
    ...
    /* Open "file.h5" */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    /* Create group "/B" in file (third argument: size hint, 0 = default). */
    group_id = H5Gcreate(file_id, "/B", 0);
    /* Close group and file. */
    status = H5Gclose(group_id);
    status = H5Fclose(file_id);

83 HDF5 Information
- HDF Information Center: http://www.hdfgroup.org
- HDF Help email address: help@hdfgroup.org
- HDF users mailing lists: news@hdfgroup.org, hdf-forum@hdfgroup.org

84 Questions?

