HDF4 and HDF5 Performance Preliminary Results Elena Pourmal IV HDF-EOS Workshop September 19 - 21 2000.

Why compare?
HDF5 is emerging as a new standard:
–it has proved to be robust
–most of the planned features have been implemented in HDF …
–it has many new features compared to HDF4
–it is time for a performance study and tuning
Users are moving their data and applications to HDF5.
HDF4 is not “bad,” but it has limited capabilities.

HDF5 vs. HDF4
–Files over 2GB vs. files less than 2GB
–Unlimited number of objects vs. max limit of objects
–One data model (multidimensional array of structures) vs. different data models for SD, GR, RI, Vdatas
–Parallel (||) support, thread safety, mounting files vs. N/A
–Diversity of datatypes (compound, VL, opaque) and operations (create, write, read, delete, shared) vs. only predefined datatypes such as float32, int16, char8
–“Native” file is portable vs. “native” file is not portable
–Modifiable I/O pipeline (registration of compression methods) vs. N/A
–Selections (unions and regular blocks) vs. selections (simple regular subsampling)

What to compare? (short list of common features)
File I/O operations
–plain read and write
–hyperslab selections
–regular subsampling
–access to a large number of objects
–storage overhead
Data organization in the file and access to it
–Vdatas vs. compound datasets
Chunking, unlimited dimensions, compression

Benchmark Environment
440 MHz UltraSPARC-IIi
–1 GB memory
–SunOS 5.7
–timing with gettimeofday()
… MHz Pentium III Xeon
–1 GB memory
–Red Hat 6.2
–timing with clock()
Each measurement was taken 10 times, and the average and best times were collected.
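
The measurement protocol above repeats each measurement 10 times and keeps both the average and the best time. It can be sketched as a small Python harness. This is an illustration only, not the benchmark code behind these results. `time_runs` and the sample workload are hypothetical. Note that gettimeofday() reports wall-clock time, while C's clock() reports processor time, so the two platforms' numbers measure slightly different things.

```python
import time
from statistics import mean


def time_runs(operation, runs=10, clock=time.perf_counter):
    """Call `operation` `runs` times; return (average, best) elapsed time.

    The default clock measures wall-clock time, like gettimeofday();
    pass clock=time.process_time for CPU time, closer to C's clock().
    (Hypothetical helper for illustration.)
    """
    samples = []
    for _ in range(runs):
        start = clock()
        operation()
        samples.append(clock() - start)
    return mean(samples), min(samples)


if __name__ == "__main__":
    # Hypothetical stand-in for one write call: copy an 8 KB buffer.
    payload = bytes(8 * 1024)
    average, best = time_runs(lambda: bytearray(payload))
    print(f"average {average:.6f} s, best {best:.6f} s")
```

Recording the best time alongside the average helps separate the library's own cost from interference by other system activity.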

Benchmarks
–Writing 1Dim and 2Dim datasets of integers
–Reading 2Dim contiguous hyperslabs of integers
–Reading 2Dim contiguous hyperslabs of integers with subsampling
–Reading fixed-size hyperslabs of integers from different locations in the dataset
–Writing and reading Vdatas and compound datasets
–CERES data

Writing 1Dim and 2Dim Datasets

Writing 1Dim Datasets
In this test we created one-dimensional arrays of integers with sizes varying from 8 KB to 8000 KB in steps of 8 KB. We measured the average and best times for writing these arrays into HDF4 and HDF5 files. The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
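
The sweep covers 1000 array sizes. The minimal sketch below enumerates them. It assumes 4-byte integers and 1 Kbyte = 1024 bytes; neither is stated on the slide. `one_dim_sizes` is a hypothetical helper.

```python
INT_SIZE = 4  # assumption: 32-bit integers
KBYTE = 1024  # assumption: 1 Kbyte = 1024 bytes


def one_dim_sizes(start_kb=8, stop_kb=8000, step_kb=8):
    """Return (size in Kbytes, number of integers) for every 1-D test array."""
    return [
        (size_kb, size_kb * KBYTE // INT_SIZE)
        for size_kb in range(start_kb, stop_kb + 1, step_kb)
    ]


if __name__ == "__main__":
    sizes = one_dim_sizes()
    print(len(sizes), sizes[0], sizes[-1])  # 1000 (8, 2048) (8000, 2048000)
```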

Writing 1Dim Datasets
HDF5 performs about 8 times better than HDF4. System activity affects the timing results.

Writing 2Dim Datasets
In this test we created two-dimensional arrays with sizes varying from 40 x 40 bytes to 4000 x 4000 bytes, in steps of 40 bytes in each dimension. We measured the average and best times for writing these arrays into HDF4 and HDF5 files. The graphs were plotted by averaging the values obtained for the same array size, regardless of the shape of the array. The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
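
Averaging "regardless of the shape" means that all shapes with the same number of elements collapse into one plotted point. The sketch below shows that grouping step. It assumes the timings are held in a dictionary keyed by (rows, cols). The function name and the sample timings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# The 2-D sweep: 40, 80, ..., 4000 in each dimension (100 x 100 = 10,000 shapes).
DIMS = range(40, 4001, 40)


def average_by_size(timings):
    """Collapse {(rows, cols): seconds} into {rows * cols: mean seconds}.

    Shapes with the same total size, e.g. 40 x 80 and 80 x 40,
    are merged into a single point, ignoring the array's shape.
    (Hypothetical helper for illustration.)
    """
    groups = defaultdict(list)
    for (rows, cols), seconds in timings.items():
        groups[rows * cols].append(seconds)
    return {size: mean(values) for size, values in sorted(groups.items())}


if __name__ == "__main__":
    sample = {(40, 80): 1.0, (80, 40): 3.0, (40, 40): 0.5}  # made-up timings
    print(average_by_size(sample))  # {1600: 0.5, 3200: 2.0}
```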

Writing 2Dim Datasets
HDF4 shows nonlinear growth. HDF5 performs about 10 times better than HDF4.

Reading 2Dim Contiguous Hyperslabs

Reading Contiguous Hyperslabs
In this test we created a file with a 1000 x 1000 array of integers. We then read hyperslabs of different sizes starting from a fixed position in the array, and the read times were averaged over 10 runs. The HDF …, HDF … patched, and HDF5 development libraries were tested. The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
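
In a 1000 x 1000 contiguous dataset, a hyperslab that does not span whole rows maps to one separate piece of the file per selected row. The sketch below computes those pieces. It assumes row-major (C) storage and 4-byte integers. It illustrates the access pattern only and is not code from either library. `contiguous_runs` is a hypothetical name.

```python
def contiguous_runs(shape, start, count, elem_size=4):
    """List the (byte offset, byte length) runs that a 2-D hyperslab occupies
    in a dataset stored contiguously in row-major (C) order."""
    rows, cols = shape
    r0, c0 = start
    nr, nc = count
    if r0 < 0 or c0 < 0 or r0 + nr > rows or c0 + nc > cols:
        raise ValueError("hyperslab extends outside the dataset")
    if c0 == 0 and nc == cols:
        # Whole rows: the selection is a single contiguous block.
        return [(r0 * cols * elem_size, nr * cols * elem_size)]
    # Otherwise each selected row contributes one run of nc elements.
    return [((r * cols + c0) * elem_size, nc * elem_size) for r in range(r0, r0 + nr)]


if __name__ == "__main__":
    runs = contiguous_runs((1000, 1000), (0, 0), (100, 100))
    print(len(runs), runs[:2])  # 100 [(0, 400), (4000, 400)]
```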

Reading Hyperslabs
For hyperslabs > 1 MB, HDF5 becomes more than 3 times slower than HDF4. It also shows nonlinear growth.

Reading Hyperslabs (latest version of the HDF5 development branch)
For hyperslabs > 2 MB, HDF5 becomes about 1.5 times slower than HDF4. It still shows nonlinear growth.

Reading Contiguous Hyperslabs (fixed size)
In this test, the size of the hyperslab was fixed at 100 x 100 elements. The hyperslab was moved first along the X axis, then along the Y axis, and finally along the diagonal, and the read performance was measured. The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
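
The three paths can be written down as lists of start offsets. The minimal sketch below assumes a step of 100 elements between positions and takes X as the column index; the slide gives neither. `hyperslab_starts` is hypothetical.

```python
def hyperslab_starts(dim=1000, block=100, step=100):
    """Return (path, (row, col)) start offsets for a block x block hyperslab
    moved along X, then along Y, then along the diagonal of a dim x dim array.

    `step` is an assumption; X is taken to be the column index.
    """
    positions = range(0, dim - block + 1, step)
    return (
        [("x", (0, c)) for c in positions]
        + [("y", (r, 0)) for r in positions]
        + [("diagonal", (d, d)) for d in positions]
    )


if __name__ == "__main__":
    starts = hyperslab_starts()
    print(len(starts), starts[-1])  # 30 ('diagonal', (900, 900))
```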

Reading 100x100 Hyperslabs from Different Locations
For small hyperslabs, HDF5 performs about 3 times better than HDF4.

Reading Hyperslabs with Subsampling

Subsampling Hyperslabs
In this test we created a file with a 1000 x 1000 array of integers. We then read every second element of hyperslabs of different sizes starting from a fixed position in the array, and the read times were averaged over 10 runs. The HDF … and HDF5 development libraries were tested. The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
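
Both libraries describe subsampling with a start offset, a stride, and a count per dimension. HDF4's SDreaddata takes start, stride, and edge arrays. HDF5's H5Sselect_hyperslab takes start, stride, count, and block arrays. The pure-Python sketch below shows which elements such a selection picks. It does not call either library. Reading "every second element" as a stride of 2 in both dimensions is an assumption. The helper names are hypothetical.

```python
def strided_indices(start, stride, count, block=1):
    """1-D indices chosen by a hyperslab described as (start, stride, count, block):
    `count` blocks of `block` consecutive elements, one block every `stride` elements."""
    return [start + i * stride + j for i in range(count) for j in range(block)]


def subsample(data, start, stride, count):
    """Return data[r][c] for the rows and columns chosen with block = 1."""
    rows = strided_indices(start[0], stride[0], count[0])
    cols = strided_indices(start[1], stride[1], count[1])
    return [[data[r][c] for c in cols] for r in rows]


if __name__ == "__main__":
    grid = [[10 * r + c for c in range(6)] for r in range(6)]
    # Every second element in each dimension of a 6 x 6 region.
    print(subsample(grid, (0, 0), (2, 2), (3, 3)))
    # [[0, 2, 4], [20, 22, 24], [40, 42, 44]]
```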

Reading Each Second Element of the Hyperslabs
HDF5 shows nonlinear growth. HDF4 performs about 3 times better for hyperslabs larger than 0.5 MB.

First Attempt to Improve the Performance
HDF4 still performs 2 times better for hyperslabs > 2 MB. HDF5 shows nonlinear growth.

Current Behavior (HDF5 development branch)
HDF5 growth is linear, and HDF5 performs about 10 times better than HDF4.

Vdatas vs Compound Datasets

Vdatas and Compound Datasets
In this test we created HDF4 files with Vdatas and HDF5 files with compound datasets, with sizes from 1000 to … records. Each record has the structure { float a; short b; float c[3]; char d; }. We tested writing, writing with data packing, and partial reads. The test was performed on Linux platforms. We also looked into data conversion issues.
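
The record layout matters for the packing tests. In memory, a C compiler pads `float a; short b; float c[3]; char d;` for alignment, while a packed record stores the fields back to back. Python's standard `struct` module can compute both sizes. On platforms where float is 4-byte aligned, the native record takes 24 bytes and the packed one 19. This sketches the arithmetic only, not how HDF4 or HDF5 lay out records on disk.

```python
import struct

# float a; short b; float c[3]; char d;
# "@" = native sizes and alignment, padded between members like a C compiler;
# the trailing "0f" pads the end to float alignment, as sizeof() would.
NATIVE_RECORD = "@fh3fc0f"
# "<" = standard sizes, no alignment: fields are stored back to back.
PACKED_RECORD = "<fh3fc"


def record_sizes():
    """Return (native, packed) size in bytes of one record."""
    return struct.calcsize(NATIVE_RECORD), struct.calcsize(PACKED_RECORD)


if __name__ == "__main__":
    native, packed = record_sizes()
    print(f"native {native} bytes, packed {packed} bytes per record")
```

In HDF5, H5Tpack removes this kind of padding from a compound datatype.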

Writing Data (VSwrite and H5Dwrite)
Conversion does not affect HDF4 performance, but it does affect HDF5 (by a factor of more than 15).
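
When the byte order of the file and of memory differ, every field of every compound record has to be converted. The Python sketch below does this with the standard `struct` module to show the per-field work involved. It is an illustration only and says nothing about how HDF5's conversion code is written. `convert_records` is a hypothetical name.

```python
import struct

# The packed record in two byte orders (no padding in either).
LITTLE = struct.Struct("<fh3fc")
BIG = struct.Struct(">fh3fc")


def convert_records(buffer, src=LITTLE, dst=BIG):
    """Re-encode every record field by field from src's byte order to dst's."""
    if src.size != dst.size or len(buffer) % src.size:
        raise ValueError("buffer must hold whole records of matching size")
    out = bytearray(len(buffer))
    for i, fields in enumerate(src.iter_unpack(buffer)):
        dst.pack_into(out, i * dst.size, *fields)
    return bytes(out)


if __name__ == "__main__":
    records = LITTLE.pack(1.5, 2, 0.0, 1.0, 2.0, b"x")
    print(BIG.unpack(convert_records(records)))  # (1.5, 2, 0.0, 1.0, 2.0, b'x')
```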

Writing Data (timing includes packing: VSpack and H5Tpack)
Data packing was added to the previous test. For HDF5, the effect is very small.
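
Conceptually, packing interlaces separately held field values into one buffer of records before the write call. The pure-Python sketch below shows that idea using the packed record layout { float a; short b; float c[3]; char d; } from the Vdata test. It is not the HDF4 or HDF5 implementation, and `pack_fields` is a hypothetical name.

```python
import struct

# Packed little-endian record: float a; short b; float c[3]; char d; (19 bytes)
RECORD = struct.Struct("<fh3fc")


def pack_fields(a, b, c, d):
    """Interlace per-field sequences into one buffer of packed records.

    a: floats, b: ints, c: 3-tuples of floats, d: 1-byte bytes objects.
    (Hypothetical helper for illustration.)
    """
    n = len(a)
    if not (len(b) == len(c) == len(d) == n):
        raise ValueError("all field sequences must have the same length")
    buffer = bytearray(n * RECORD.size)
    for i in range(n):
        RECORD.pack_into(buffer, i * RECORD.size, a[i], b[i], *c[i], d[i])
    return bytes(buffer)


if __name__ == "__main__":
    buf = pack_fields([1.0, 2.0], [3, 4], [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)], [b"p", b"q"])
    print(len(buf))  # 38: two 19-byte records
```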

Reading Two Fields
Unpacking slows down HDF4 significantly (about 8 times). HDF5 was reading packed data in this test.
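
In a partial read, only some fields are extracted from each record. The slide does not say which two fields the test read, so the sketch assumes `a` and `d`. It uses the packed little-endian layout and computes field offsets with `struct.calcsize`. `read_two_fields` is a hypothetical name, and the code is not either library's implementation.

```python
import struct

PACKED = "<fh3fc"                      # float a; short b; float c[3]; char d; (no padding)
RECORD_SIZE = struct.calcsize(PACKED)  # 19 bytes
OFFSET_A = 0                           # field a starts the record
OFFSET_D = struct.calcsize("<fh3f")    # 18: bytes of a, b and c precede d


def read_two_fields(buffer):
    """Return lists of fields a and d from packed records; b and c are skipped."""
    n = len(buffer) // RECORD_SIZE
    a = [struct.unpack_from("<f", buffer, i * RECORD_SIZE + OFFSET_A)[0] for i in range(n)]
    d = [struct.unpack_from("<c", buffer, i * RECORD_SIZE + OFFSET_D)[0] for i in range(n)]
    return a, d


if __name__ == "__main__":
    buf = struct.pack(PACKED, 1.5, 7, 0.0, 0.0, 0.0, b"x") + struct.pack(PACKED, -2.0, 8, 1.0, 1.0, 1.0, b"y")
    print(read_two_fields(buf))  # ([1.5, -2.0], [b'x', b'y'])
```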

CERES Data File

Structure of CERES file
–Vgroup CERES_ES8
–Vgroup Geolocation Fields
–Vgroup Data Fields
–SDS
–Vdata

CERES File
We used the H4toH5 converter to create an HDF5 version of the file:
–81 MB (HDF4), 80 MB (HDF5)
–1 min 55 sec on Linux
–3 min 56 sec on Solaris
Benchmarks
–read up to 14 datasets (2148 x 660 floats)
–subsampling: read two columns from the same datasets
The benchmarks were run on the Solaris and Linux platforms.
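
A quick size calculation shows how much less data the subsetting benchmark touches than the full reads. The sketch assumes 4-byte floats. It also assumes that "two columns" means two whole columns along the 2148-element dimension. The slide gives neither detail, and the helper names are hypothetical.

```python
FLOAT_SIZE = 4          # assumption: 32-bit floats
ROWS, COLS = 2148, 660  # dataset shape from the slide


def dataset_bytes(rows=ROWS, cols=COLS, elem_size=FLOAT_SIZE):
    """Bytes in one full dataset."""
    return rows * cols * elem_size


def columns_bytes(n_columns, rows=ROWS, elem_size=FLOAT_SIZE):
    """Bytes read when only n_columns whole columns are selected."""
    return n_columns * rows * elem_size


if __name__ == "__main__":
    print(dataset_bytes())       # 5670720 bytes per dataset
    print(14 * dataset_bytes())  # 79390080 bytes for 14 datasets
    print(columns_bytes(2))      # 17184 bytes for two columns
```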

Reading CERES Data on Big- and Little-Endian Machines
On the Solaris platform, HDF5 was twice as fast as HDF4. On Linux (where data conversion is on), HDF4 was about … faster.
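
On a little-endian machine such as the Linux PC, data stored big-endian in the file must be byte-swapped after reading. The minimal sketch below uses the standard `array` module. The function name is hypothetical, and it is only an assumption, made for illustration, that the file data are big-endian.

```python
import struct
import sys
from array import array


def to_native_floats(raw, file_byteorder="big"):
    """Decode 4-byte floats stored in `file_byteorder` ("big" or "little"),
    byte-swapping only when it differs from this machine's order."""
    values = array("f")
    if values.itemsize != 4:
        raise RuntimeError("this sketch assumes 4-byte C floats")
    values.frombytes(raw)
    if file_byteorder != sys.byteorder:
        values.byteswap()  # swap each 4-byte float into host byte order
    return values


if __name__ == "__main__":
    raw = struct.pack(">3f", 1.0, -2.5, 3.25)  # big-endian sample data
    print(list(to_native_floats(raw)))  # [1.0, -2.5, 3.25]
```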

Subsetting CERES Data
The current version of HDF5 shows about 3 times better performance.

Conclusion
Goal: tune HDF5 and give our users recommendations on its efficient usage.
Continue to study HDF4 and HDF5 performance:
–try more platforms: O2K, NT/Windows
–try other features (e.g., chunking, compression)
–study specific HDF5 features (e.g., writing/reading big files, VL datatypes, compound datatypes, selections)
User input is necessary: send us the access patterns you use!
Results will be