FITSIO, HDF4, NetCDF, PDB and HDF5 Performance Some Benchmarks Results Elena Pourmal Science Data Processing Workshop February 27, 2002.

Slides:



Advertisements
Similar presentations
The HDF Group Parallel HDF5 Developments 1 Copyright © 2010 The HDF Group. All Rights Reserved Quincey Koziol The HDF Group
Advertisements

A PLFS Plugin for HDF5 for Improved I/O Performance and Analysis Kshitij Mehta 1, John Bent 2, Aaron Torres 3, Gary Grider 3, Edgar Gabriel 1 1 University.
Slides Prepared from the CI-Tutor Courses at NCSA By S. Masoud Sadjadi School of Computing and Information Sciences Florida.
Computer Abstractions and Technology
1 Projection Indexes in HDF5 Rishi Rakesh Sinha The HDF Group.
The HDF Group November 3-5, 2009HDF/HDF-EOS Workshop XIII1 HDF5 Advanced Topics Elena Pourmal The HDF Group The 13 th HDF and HDF-EOS.
Making earth science data more accessible: experience with chunking and compression Russ Rew January rd Annual AMS Meeting Austin, Texas.
NetCDF An Effective Way to Store and Retrieve Scientific Datasets Jianwei Li 02/11/2002.
HDF4 and HDF5 Performance Preliminary Results Elena Pourmal IV HDF-EOS Workshop September
Evaluating current processors performance and machines stability R. Esposito 2, P. Mastroserio 2, F. Taurino 2,1, G. Tortone 2 1 INFM, Sez. di Napoli,
COMPUTER SYSTEMS COMPONENTS 6 th grade. BCSI-1: Students will identify computer system components.  a) Identify and define the key functional components.
NetCDF4 Performance Benchmark. Part I Will the performance in netCDF4 comparable with that in netCDF3? Will the performance in netCDF4 comparable with.
Support for NPP/NPOESS by The HDF Group Mike Folk, Elena Pourmal, Peter Cao The HDF Group June 30, NPOESS Data Formats Working Group.
OPeNDAP and the Data Access Protocol (DAP) Original version by Dave Fulker.
University of Illinois at Urbana-ChampaignHDF 9/19/2000 McGrath 9/19/ Transition from HDF4 to HDF5: Issues Robert E. McGrath NCSA University of Illinois.
Parallel HDF5 Introductory Tutorial May 19, 2008 Kent Yang The HDF Group 5/19/20081SCICOMP 14 Tutorial.
HDF 1 HDF5 Advanced Topics Object’s Properties Storage Methods and Filters Datatypes HDF and HDF-EOS Workshop VIII October 26, 2004.
1 High level view of HDF5 Data structures and library HDF Summit Boeing Seattle September 19, 2006.
HDF5 A new file format & software for high performance scientific data management.
Sep , 2010HDF/HDF-EOS Workshop XIV1 HDF5 Advanced Topics Neil Fortner The HDF Group The 14 th HDF and HDF-EOS Workshop September 28-30, 2010.
The HDF Group Parallel HDF5 Design and Programming Model May 30-31, 2012HDF5 Workshop at PSI 1.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
February 2-3, 2006SRB Workshop, San Diego P eter Cao, NCSA Mike Wan, SDSC Sponsored by NLADR, NFS PACI Project in Support of NCSA-SDSC Collaboration Object-level.
College of Engineering Representing Numbers in a Computer Section B, Spring 2003 COE1361: Computing for Engineers COE1361: Computing for Engineers 1 COE1361:
HDF Mike Folk National Center for Supercomputing Applications Science Data Processing Workshop February 26-28, 2002 HDF Update HDF.
The HDF Group HDF5 Datasets and I/O Dataset storage and its effect on performance May 30-31, 2012HDF5 Workshop at PSI 1.
The netCDF-4 data model and format Russ Rew, UCAR Unidata NetCDF Workshop 25 October 2012.
Array Cs212: DataStructures Lab 2. Array Group of contiguous memory locations Each memory location has same name Each memory location has same type a.
An In-Depth Examination of Java I/O Performance and Possible Tuning Strategies Kai Xu Hongfei Guo
October 15, 2008HDF and HDF-EOS Workshop XII1 What will be new in HDF5?
CISC105 – General Computer Science Class 9 – 07/03/2006.
Project 4 : SciDAC All Hands Meeting, September 11-13, 2002 A. Choudhary, W. LiaoW. Gropp, R. Ross, R. Thakur Northwestern UniversityArgonne National Lab.
March 17, 2006CIP Status Meeting March 17, 2006 Sponsored by NLADR, NFS PACI Project in Support of NCSA-SDSC Collaboration Project Report at CIP AG Meeting.
 Copyright, HiCLAS1 George Delic, Ph.D. HiPERiSM Consulting, LLC And Arney Srackangast, AS1MET Services
NDE XML Tailoring Service HDF and HDF-EOS Workshop XIII November 4, 2009 Tom Feroli, Peter MacHarrie.
1 HDF5 Life cycle of data Boeing September 19, 2006.
A High performance I/O Module: the HDF5 WRF I/O module Muqun Yang, Robert E. McGrath, Mike Folk National Center for Supercomputing Applications University.
Performance Optimization Getting your programs to run faster.
The HDF Group Support for NPP/NPOESS by The HDF Group Mike Folk, Elena Pourmal, Peter Cao The HDF Group November 5, 2009 November 3-5,
NetCDF file generated from ASDC CERES SSF Subsetter ATMOSPHERIC SCIENCE DATA CENTER Conversion of Archived HDF Satellite Level 2 Swath Data Products to.
HDF Hierarchical Data Format Nancy Yeager Mike Folk NCSA University of Illinois at Urbana-Champaign, USA
The HDF Group November 3-5, 2009HDF/HDF-EOS Workshop XIII1 HDF5 Advanced Topics Elena Pourmal The HDF Group The 13 th HDF and HDF-EOS.
HDF5 UML Figures for Presenters Part I: Class Diagrams Part II: Relationship Diagrams Parts III & IV: The above, with text blocks.
Ohio State University Department of Computer Science and Engineering An Approach for Automatic Data Virtualization Li Weng, Gagan Agrawal et al.
The HDF Group Introduction to netCDF-4 Elena Pourmal The HDF Group 110/17/2015.
HDF5 Q4 Demo. Architecture Friday, May 10, 2013 Friday Seminar2.
COS PIPELINE CDR Jim Rose July 23, 2001OPUS Science Data Processing Space Telescope Science Institute 1 of 12 Science Data Processing
Parallel I/O Performance Study and Optimizations with HDF5, A Scientific Data Package MuQun Yang, Christian Chilan, Albert Cheng, Quincey Koziol, Mike.
The HDF Group HDF5 Chunking and Compression Performance tuning 10/17/15 1 ICALEPCS 2015.
MP-PIPE for Soybean Proteome Brad Barnes 27/11/15 COMP 5704.
11/8/2007HDF and HDF-EOS Workshop XI, Landover, MD1 Software to access HDF5 Datasets via OPeNDAP MuQun Yang, Hyo-Kyung Lee The HDF Group.
E-Science Data Information and Knowledge Transformation BinX – A Tool for Binary File Access eDIKT project team Ted Wen
COMPASS Computerized Analysis and Storage Server Iain Last.
Unit C-Hardware & Software1 GNVQ Foundation Unit C Bits & Bytes.
QEMU, a Fast and Portable Dynamic Translator Fabrice Bellard (affiliation?) CMSC 691 talk by Charles Nicholas.
PIDX PIDX - a parallel API to capture the data models used by HPC application and write it out in an IDX format. PIDX enables simulations to write out.
Parallel I/O Performance Study and Optimizations with HDF5, A Scientific Data Package Christian Chilan, Kent Yang, Albert Cheng, Quincey Koziol, Leon Arber.
The HDF Group Introduction to HDF5 Session Two Data Model Comparison HDF5 File Format 1 Copyright © 2010 The HDF Group. All Rights Reserved.
NetCDF Data Model Details Russ Rew, UCAR Unidata NetCDF 2009 Workshop
Copyright © 2010 The HDF Group. All Rights Reserved1 Data Storage and I/O in HDF5.
Measuring Performance Based on slides by Henri Casanova.
The HDF Group Introduction to HDF5 Session ? High Performance I/O 1 Copyright © 2010 The HDF Group. All Rights Reserved.
Chapter 3 Data Representation
Lecture 2: Performance Today’s topics:
Transition from HDF4 to HDF5: Issues
Moving from HDF4 to HDF5/netCDF-4
Introduction to HDF5 Session Five Reading & Writing Raw Data Values
HDF5 Metadata and Page Buffering
What NetCDF users should know about HDF5?
Access HDF5 Datasets via OPeNDAP’s Data Access Protocol (DAP)
Presentation transcript:

FITSIO, HDF4, NetCDF, PDB and HDF5 Performance Some Benchmarks Results Elena Pourmal Science Data Processing Workshop February 27, 2002

Benchmark Environment (software) Software –HDF4 r1.5 –HDF –NetCDF 3.5 –FITSIO version 2.2 –PDB version 8_7_01 –‘System” benchmark uses open, write, read and close UNIX functions. each measurement was taken 10 times, best times were collected

Benchmark Environment (hardware) Mhz Pentium III Xeon (Linux smp) –1G memory NCSA O2K (IRIX64-6.5) –195Mhz MIPS R10000 –14GB memory –Peak performance 390 MFLOPS

Benchmarks Creating and writing contiguous dataset; sizes vary from 2MB to 512MB Reading contiguous dataset; sizes vary from 2MB to 256MB Reading contiguous hyperslab; sizes vary from 1MB to 64MB Reading every second element of the hyperslab; sizes of selections vary from 0.25MB to 16MB Creating and writing up to MB datasets; reading back the dataset created last

Some remarks “dataset” describes array stored in the FITS, HDF4, HDF5, PDB, NetCDF and UNIX binary files, i.e. “dataset” means –“primary array” and “extension” for FITSIO – “variable” for NetCDF –“SDS or scientific data set” for HDF4 –“HDF5 dataset” –“PDB variable” –raw data stored in UNIX binary file gettimofday function was used to measure time Speed calculated as data buffer size over time

Creating and Writing Contiguous Dataset In this test we created a file and stored two dimensional array of short unsigned integers; size of array varied from 2MB and up to 512MB We measured –Total time to create a file create a dataset write a dataset close the dataset and the file –Time to write dataset only

Speed ratios for writing dataset on IRIX

Speed ratios for writing dataset on Linux

Reading Contiguous Dataset In this test we created two dimensional array of short unsigned integers than we read it back; size of array varied from 2MB and up to 512MB We measured –Total time to open a file open a dataset read a dataset close the dataset and the file –Time to read dataset only

Speed ratios for reading dataset on IRIX

Speed ratios for reading dataset on Linux

Reading Contiguous Hyperslab of the Dataset In this test we created two dimensional array of short unsigned integers and than read contiguous hyperslab of the dataset; size of the dataset was up 256 MB and size of the hyperslab varied from 1MB up to 64 MB We measured –Total time to open a file, dataset, select and read hyperslab, close the dataset and the file –Time to read hyperslab only

Speed ratios for reading contiguous hyperslab on IRIX

Reading Every Second Element in the Hyperslab In this test we created 256 MB two dimensional array of short unsigned integers; then we read read back every second element of the selected hyperslab We measured –Total time to open a file and dataset, select and read every second element of the hyperslab, close the file and dataset –Time to read selection only

Speed ratios for reading every second element of contiguous hyperslab on IRIX

Creating and Writing Multiple Datasets In this test we created up to MB two dimensional datasets of short unsigned integers; then we read the last created dataset We measured –Time to create a file create and write N datasets close all datasets and the file –Time to open the file, read N-th dataset and close the file

Conclusions HDF5 is times faster when performs native write/read HDF5 needs some tuning when datatype conversion is used When contiguous subsetting is used, HDF5 performs several times better than FITSIO, HDF4 and PDB and achieves about 80% of NetCDF performance HDF5 is an order of magnitude faster in accessing datasets within the file with many objects