
HDF Update
Elena Pourmal, The HDF Group, epourmal@hdfgroup.org
This work was supported by NASA/GSFC under Raytheon Co. contract number NNG15HZ39C

Outline
What's new in HDF?
HDF tools: HDFView, nagg, ODBC
Q & A: Tell us about your needs

HDF5
Compression: a faster way to write compressed data to HDF5; community-supported compression filters (https://github.com/nexusformat/HDF5-External-Filter-Plugins/tree/master/)
Single writer/multiple reader (SWMR) file access
Virtual Data Set (VDS)
HDF5 JNI is part of the HDF5 source code

Direct chunk write: H5DOwrite_chunk
The figure contrasts the complex data flow when a chunk is written by an H5Dwrite call with the simplified path taken by the optimized function.
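A minimal sketch of the optimized path, using h5py, whose `write_direct_chunk` method exposes H5DOwrite_chunk: the application compresses a chunk itself and hands the raw bytes to the library, bypassing the type-conversion and filter pipeline. File and dataset names here are illustrative, not from the slides.

```python
import zlib

import h5py  # assumes h5py built against HDF5 >= 1.8.11
import numpy as np

with h5py.File("direct_write.h5", "w") as f:
    # Chunked, gzip-compressed dataset; one chunk covers the whole 10x10 extent.
    dset = f.create_dataset("data", (10, 10), chunks=(10, 10),
                            compression="gzip", dtype="i4")
    chunk = np.arange(100, dtype="i4").reshape(10, 10)
    # Compress the chunk ourselves, then write the raw bytes directly,
    # skipping the library's filter pipeline entirely.
    dset.id.write_direct_chunk((0, 0), zlib.compress(chunk.tobytes()))

with h5py.File("direct_write.h5", "r") as f:
    # A normal read still decompresses the chunk through the gzip filter.
    assert (f["data"][:] == np.arange(100).reshape(10, 10)).all()
```

The HDF5 deflate filter stores a plain zlib stream, which is why `zlib.compress` output can be handed over directly.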

Performance results for H5DOwrite_chunk
Tests run on Linux 2.6, x86_64; each dataset contained 100 chunks, written chunk by chunk. (Chart: 1 - speed in MB/s, 2 - time in seconds.)

Dynamically loaded filters
Problem: "off the shelf" tools do not work with third-party custom filters.
Solution: use HDF5 1.8.11 or later with dynamically loaded compression filters; a maintained library of HDF5 compression filters is available at https://github.com/nexusformat/HDF5-External-Filter-Plugins
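HDF5 1.8.11+ locates filter plugins through the HDF5_PLUGIN_PATH environment variable, which an application can set before opening files. The variable name is real; the directory below is a hypothetical example, not a standard location.

```python
import os

# Point HDF5's dynamic-filter loader at a directory of plugin libraries.
# HDF5_PLUGIN_PATH is the variable HDF5 searches; the path is hypothetical.
os.environ["HDF5_PLUGIN_PATH"] = "/opt/hdf5/plugins"

# Any HDF5-based tool or library started from this process (or a shell with
# the same variable exported) can now apply the third-party filters.
print(os.environ["HDF5_PLUGIN_PATH"])
```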

Example: choose compression that works for your data
Compression ratio = uncompressed size / compressed size. The h5repack tool was used to apply compression to every dataset in the file; times were measured with the Linux time command.
File: SCRIS_npp_d20140522_t0754579_e0802557_b13293_c20140522142425734814_noaa_pop.h5
Original size in bytes: 256,828,584
Compression ratio with GZIP level 6: 1.3 (32.2 sec)
Compression ratio with SZIP NN encoding 32: 1.27 (4.3 sec)
-rw-r--r-- 1 epourmal hdf 196611611 Jul 17 17:33 gzip.h5
-rw-r--r-- 1 epourmal hdf 256828584 Jul 17 11:35 orig.h5
-rw-r--r-- 1 epourmal hdf 201924661 Jul 17 17:33 szip.h5
July 15, 2014 JPSS DEWG Telecon

Example (cont.): choose compression that works for your data
Compression ratio = uncompressed size / compressed size.

Dataset name                Size (bytes)  GZIP level 6  SZIP NN encoding 32
ICT_TemperatureConsistency  240           0.667         cannot be compressed
DS_WindowSize               6,480         28.000        54.000
ES_ImaginaryLW              46,461,600    1.076         1.000
ES_NEdNLW                                 1.169         1.590
ES_NEdNMW                   28,317,600    14.970        1.549
ES_NEdNSW                   10,562,400    15.584        1.460
ES_RDRImpulseNoise          48,600        124.615       405.000
ES_RealLW                                 1.158         1.492
SDRFringeCount              97,200        223.448       720.00

GZIP compression alone made a difference for only 2 datasets; combined with the shuffle filter, 3 datasets showed great compression ratios.
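The ratio in these tables is simply uncompressed size over compressed size. A stdlib-only sketch of the same measurement on synthetic data (the array below is illustrative, not JPSS data):

```python
import struct
import zlib

# Synthetic "dataset": 10,000 little-endian int32 values with a smooth trend,
# standing in for the raw bytes of a real HDF5 dataset.
raw = b"".join(struct.pack("<i", v // 7) for v in range(10_000))

compressed = zlib.compress(raw, 6)  # zlib level 6, comparable to GZIP level 6
ratio = len(raw) / len(compressed)
print(f"uncompressed={len(raw)} compressed={len(compressed)} ratio={ratio:.2f}")
assert ratio > 1.0  # smooth integer data compresses well
```

As the tables show, the best filter depends on the data: very small or noisy datasets may not compress at all, while regular data can shrink by orders of magnitude.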

SWMR: data access to a file being written
(Diagram: Writer -> HDF5 File -> Reader. New data elements are added to a dataset in the file and can be read by a reader with no IPC necessary.)
No communication between the processes and no file locking are required. The processes can run on the same or on different platforms, as long as they share a common file system that is POSIX compliant. The orderly operation of the metadata cache is crucial to SWMR functioning. A number of APIs have been developed to handle the requests from writer and reader processes and to give applications the control of the metadata cache they might need.
Copyright © 2015 The HDF Group. All rights reserved.

SWMR
Released in HDF5 1.10.0
Restricted to append-only scenarios
Doesn't work on NFS
Files are not compatible with HDF5 1.8.* libraries; use the h5format_convert tool, which converts HDF5 metadata in place without rewriting raw data
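With h5py built against HDF5 1.10+, the append-only SWMR pattern looks roughly like the sketch below; file and dataset names are illustrative, and in practice the reader would be a separate process.

```python
import h5py
import numpy as np

# Writer: create the file with the latest format, then enable SWMR mode.
f = h5py.File("swmr_demo.h5", "w", libver="latest")
dset = f.create_dataset("samples", (0,), maxshape=(None,), dtype="f8")
f.swmr_mode = True

# Reader: would normally run in another process, opening with swmr=True.
r = h5py.File("swmr_demo.h5", "r", libver="latest", swmr=True)
rdset = r["samples"]

# Writer appends new elements (the only modification SWMR allows) and flushes.
dset.resize((5,))
dset[:] = np.linspace(0.0, 1.0, 5)
dset.flush()

# Reader refreshes its view of the dataset to pick up the new elements.
rdset.refresh()
assert rdset.shape == (5,)

r.close()
f.close()
```

No IPC or file locking is involved; the reader only ever sees a consistent state of the metadata the writer has flushed.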

VDS Data stored in multiple files and datasets can be accessed via one dataset (VDS) using standard HDF5 read/write

Collect data one way…
(Diagram: file a.h5, dataset /A; file b.h5, dataset /B; file c.h5, dataset /C; file d.h5, dataset /D.)
Images are stored in four different datasets; each represents a part of a bigger image.

…present it in a different way
(Diagram: file F.h5, dataset /D holding the whole image.)
The virtual dataset stores a mapping from each quadrant to data stored in the source HDF5 files and datasets a.h5-d.h5. An application can read the whole image from dataset /D with a regular H5Dread call; it doesn't need to know the mapping in order to read the data.

VDS
Works with SWMR
Files with VDS cannot be accessed by HDF5 1.8.* libraries; use the h5repack tool (1.10.0-patch1) to rewrite the data
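In h5py (2.9+, against HDF5 1.10+), a virtual dataset is built from a VirtualLayout plus VirtualSource mappings. A sketch with illustrative names, stitching two source datasets into one readable whole, much like the quadrant example above:

```python
import h5py
import numpy as np

# Two source files, each holding half of the data.
for name, offset in (("a.h5", 0), ("b.h5", 4)):
    with h5py.File(name, "w") as f:
        f.create_dataset("part", data=np.arange(offset, offset + 4, dtype="i4"))

# Map both sources into one 8-element virtual dataset.
layout = h5py.VirtualLayout(shape=(8,), dtype="i4")
layout[0:4] = h5py.VirtualSource("a.h5", "part", shape=(4,))
layout[4:8] = h5py.VirtualSource("b.h5", "part", shape=(4,))

with h5py.File("vds.h5", "w", libver="latest") as f:
    f.create_virtual_dataset("whole", layout)

# A plain read sees one contiguous dataset; no raw data was copied.
with h5py.File("vds.h5", "r") as f:
    assert (f["whole"][:] == np.arange(8)).all()
```

Only the mapping lives in vds.h5; reads are forwarded to the source files, which must stay alongside it.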

HDF5 Roadmap for 2016-2017
May 31: HDF5 1.10.0-patch1 (h5repack, Windows builds, Fortran issues on HPC systems)
Late summer: HDF5 1.10.1 (?), addressing issues found in 1.10.0
December: HPC features that didn't make it into the 1.10.0 release
Maintenance releases of HDF5 1.8 and 1.10 (May and November)

HDF4
HDF 4.2.12 (June 2016)
Support for the latest Intel, PGI and GNU compilers
HDF4 JNI included with the HDF4 source code

HDF5 Roadmap for 2016-2017 (cont.)
December 2016: minor bug fixes (if required), performance improvements
Summer 2017: keep up with computing environments

HDFView
HDFView 2.13 (July 2016): bug fixes; the last release based on the HDF5 1.8.* releases.
HDFView 3.0-alpha: new SWT-based GUI (better look and feel, better memory handling) and better internal architecture; based on the HDF5 1.10 release.
Java API wrappers (JNI) for HDF4 and HDF5 1.10.1; JNI libraries are packaged with the appropriate HDF release (like C++ and Fortran).
Better support for complex data types: limited "compound of compound", variable length.
To be done: support for 1.10 features (creating VDS, etc.), various bug fixes and minor new features, memory model redesign for large datasets and large numbers of objects.

HDFView 3.0 Screenshot

Nagg tool
Nagg is a tool for rearranging NPP data granules from existing files to create new files with a different aggregation number or a different packaging arrangement.
Release 1.6.2 before July 21, 2016
September 23, 2015 HDF Workshop

Nagg illustration - IDV visualization: 9 input files, 4 granules each, in GMODO-SVM07… files

Nagg illustration - IDV visualization: 1 output file, 36 granules, in GMODO-SVM07… file

nagg: Aggregation Example
(Diagram: user request interval on a timeline starting from T0 = IDPS Epoch Time, January 1, 1958 00:00:00 GMT; HDF5 File 1 … HDF5 File M, each containing one granule.)
This and the following slide address one of the simplest scenarios to explain nagg's functionality. A user requests product data from the IDPS system for a specific time interval; granules and products are packaged in HDF5 files according to the request, in this example one granule per file for one product. The user receives HDF5 files with the product data plus HDF5 files with the corresponding geolocation data, and would like to re-aggregate the data to have 5 granules per file (next slide).

nagg: Aggregation Example (cont.)
Example: nagg -n 5 -t SATMS SATMS_npp_d2012040*.h5
The -n flag gives the number of granules per file and -t the type or product to be re-aggregated; the user specifies the list of files with the granules to aggregate (wildcards are convenient for file names). Nagg copies data to the newly generated file(s); the result is re-aggregated files as they would be received from the IDPS system.
(Diagram: user request interval from T0 = IDPS Epoch Time, January 1, 1958 00:00:00 GMT; HDF5 File 1 … HDF5 File N.)
The produced files co-align with the aggregation bucket start, so the first and last files in the aggregation may not have five granules: here the first HDF5 file contains 4 granules and the last one only 3, while the other files contain 5. The files are 'full' aggregations (full relative to the aggregation period). Geolocation granules are aggregated and packaged with the product data; the -g option gives more control over geolocation packaging and aggregation.

Possible enhancement
Example: nagg -n 5 -v -t SATMS SATMS_npp_d2012040*.h5
Nagg with the -v option doesn't copy data to the newly generated file(s). Each file contains a virtual dataset: the first file's dataset is mapped to 4 granules, the last file's to 3, and each of the others to 5.
NO RAW DATA IS REWRITTEN: space savings, and no I/O performed on raw data.

HDF5 ODBC Driver
Tap into the "USB bus of data" (ODBC): direct access to your HDF5 data from your favorite BI application(s)
Beta test now; Q3 2016 release of the desktop version, Certified-for-Tableau; client/server version this fall
Join the beta, tell your friends, send feedback: odbc@hdfgroup.org

New requirements and features?
Tell us your needs. Some ideas:
Multi-threaded compression filters
H5DOread_chunk function
Full SWMR implementation
Performance
Backward/forward compatibility
Other requests?

This work was supported by NASA/GSFC under Raytheon Co. contract number NNG15HZ39C.