Parallel I/O Performance Study and Optimizations with HDF5, A Scientific Data Package. MuQun Yang, Christian Chilan, Albert Cheng, Quincey Koziol, Mike Folk, Leon Arber.


Parallel I/O Performance Study and Optimizations with HDF5, A Scientific Data Package
MuQun Yang, Christian Chilan, Albert Cheng, Quincey Koziol, Mike Folk, Leon Arber
The HDF Group, Champaign, IL 61820

2 Outline
Introduction to the parallel I/O libraries: HDF5, netCDF, netCDF4
Parallel HDF5 and Parallel netCDF performance comparison
Parallel netCDF4 and Parallel netCDF performance comparison
Collective I/O optimizations inside HDF5
Conclusion

3 I/O process for Parallel Applications
Without a parallel I/O library, applications typically take one of two approaches.
Funnel all data through P0, which calls the serial I/O library:
  - P0 becomes a bottleneck
  - May exceed system memory
Let each process write its own file through the I/O library:
  - May achieve good performance
  - Needs post-processing(?)
  - More work for applications
[Diagrams: P0-P3 funneling data to P0, and P0-P3 each writing separate files through the I/O library to the file system]

4 I/O process for Parallel Applications
With a parallel I/O library, all processes (P0-P3) access a single shared file through the parallel I/O library and the parallel file system (see the MPI-IO sketch below).
[Diagram: P0-P3 -> parallel I/O libraries -> parallel file system]
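A minimal sketch of this pattern at the MPI-IO level, the layer on which parallel HDF5, PnetCDF, and netCDF4 are built; the file name and per-rank block size are made up for illustration:

/* Every rank writes its own disjoint block of one shared file through MPI-IO. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1024;                          /* doubles per rank (example size) */
    double *buf = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) buf[i] = rank;   /* fill with something recognizable */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank targets a disjoint region; the _all variant makes the write collective. */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}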

5 HDF5 vs. netCDF
HDF5 and netCDF both provide data formats and programming interfaces.
HDF5:
  - Hierarchical file structure
  - Flexible data model
  - Many features: in-memory compression filters, chunked storage, and parallel I/O through MPI-IO (see the sketch below)
NetCDF:
  - Linear data layout
  - A parallel version of netCDF from ANL/Northwestern U. (PnetCDF) provides support for parallel access on top of MPI-IO
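A minimal sketch, assuming an MPI-enabled (parallel) HDF5 build, of how an application asks HDF5 to route file access through MPI-IO; the file name is a placeholder:

#include <hdf5.h>
#include <mpi.h>

hid_t open_parallel_file(MPI_Comm comm)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);        /* file access property list */
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);    /* select the MPI-IO file driver */

    /* File creation is a collective operation across the communicator. */
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);
    return file;
}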

6 Overview of NetCDF4
Advantages:
  - New features provided by HDF5:
      More than one unlimited dimension
      Various compression filters
      Compound data types such as struct or array datatypes
      Parallel I/O through MPI-IO (see the sketch below)
  - NetCDF-user-friendly APIs
  - Long-term maintenance and distribution
  - Potential for a larger user community
Disadvantage:
  - Requires installing the HDF5 library
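A minimal sketch, assuming a netCDF4 build with parallel support enabled, of creating an HDF5-backed netCDF4 file for parallel access through the familiar netCDF API; the file name is a placeholder and the exact creation flags can differ across netCDF4 versions:

#include <netcdf.h>
#include <netcdf_par.h>
#include <mpi.h>

int create_parallel_nc4(MPI_Comm comm, int *ncid)
{
    /* NC_NETCDF4 selects the HDF5-based format; NC_MPIIO requests
       parallel access, which netCDF4 passes down to HDF5 and MPI-IO. */
    return nc_create_par("example.nc", NC_NETCDF4 | NC_MPIIO,
                         comm, MPI_INFO_NULL, ncid);
}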

7 An example for collective I/O
Every processor has a noncontiguous selection, and the access requests are interleaved.
Write operation with 32 processors; each processor's selection has 512K rows and 8 columns (32 MB per processor):
  - Independent I/O: 1,… s
  - Collective I/O: 4.33 s
[Figure: rows 1, 2, … of a row-major data layout striped across P0-P3]
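A minimal sketch of how an application requests collective transfer in HDF5: each rank selects an interleaved set of rows with a hyperslab and issues the write through a transfer property list set to collective mode. The selection pattern and sizes are illustrative, not the benchmark's:

#include <hdf5.h>
#include <mpi.h>

void collective_row_write(hid_t dset, int rank, int nprocs,
                          hsize_t nrows, hsize_t ncols, const double *buf)
{
    hid_t fspace = H5Dget_space(dset);

    /* Every nprocs-th row belongs to this rank: an interleaved,
       noncontiguous selection like the one on the slide. */
    hsize_t start[2]  = {(hsize_t)rank, 0};
    hsize_t stride[2] = {(hsize_t)nprocs, 1};
    hsize_t count[2]  = {nrows / nprocs, 1};
    hsize_t block[2]  = {1, ncols};
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, stride, count, block);

    hsize_t mdims[1] = {(nrows / nprocs) * ncols};
    hid_t mspace = H5Screate_simple(1, mdims, NULL);

    /* Ask for collective rather than independent transfer. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);

    H5Pclose(dxpl);
    H5Sclose(mspace);
    H5Sclose(fspace);
}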

8 Outline
Introduction to the parallel I/O libraries: HDF5, netCDF, netCDF4
Parallel HDF5 and Parallel netCDF performance comparison
Parallel netCDF4 and Parallel netCDF performance comparison
Collective I/O optimizations inside HDF5
Conclusion

9 Parallel HDF5 and PnetCDF performance comparison
Previous study:
  - PnetCDF claims higher performance than HDF5
Platforms:
  - NCAR Bluesky: Power4
  - LLNL uP: Power5
[Chart: PnetCDF vs. HDF5]

10 HDF5 and PnetCDF performance comparison
The benchmark is the I/O kernel of FLASH.
FLASH I/O generates 3D blocks of size 8x8x8 on Bluesky and 16x16x16 on uP.
Each processor handles 80 blocks and writes them into 3 output files.
The performance metric given by FLASH I/O is the parallel execution time.
The more processors, the larger the problem size.

11 Previous HDF5 and PnetCDF performance comparison at ASCI White
[Chart from the FLASH I/O website]

12 HDF5 and PnetCDF performance comparison
[Charts: Bluesky (Power4) and uP (Power5)]

13 HDF5 and PnetCDF performance comparison
[Charts: Bluesky (Power4) and uP (Power5)]

14 ROMS
Regional Oceanographic Modeling System:
  - Supports MPI and OpenMP
  - I/O in netCDF
  - History file written in parallel
Data:
  - 60 1D-4D double-precision float and integer arrays

15 Parallel netCDF4 and PnetCDF performance comparison
Fixed problem size = 995 MB
The performance of parallel netCDF4 is close to that of PnetCDF.

16 ROMS output with parallel netCDF4
I/O performance improves as the file size increases.
Parallel netCDF4 can provide decent I/O performance for large problem sizes.

17 Outline
Introduction to the parallel I/O libraries: HDF5, netCDF, netCDF4
Parallel HDF5 and Parallel netCDF performance comparison
Parallel netCDF4 and Parallel netCDF performance comparison
Collective I/O optimizations inside HDF5
Conclusion

18 Improvements of collective I/O support inside HDF5
Advanced HDF5 feature: non-regular selections
Performance optimizations for chunked storage:
  - Provide several I/O options to achieve good collective I/O performance
  - Provide APIs for applications to participate in the optimization process

19 Improvement 1: HDF5 non-regular selections
Only one HDF5 I/O call is needed for a non-regular selection, which is good for collective I/O (see the sketch below).
[Figure: 2-D array with the I/O performed on the shaded, non-regular selections]
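A minimal sketch of a non-regular selection built as the union of two hyperslabs and written with a single H5Dwrite call; the offsets and sizes are illustrative, not taken from the slide:

#include <hdf5.h>

void write_union_selection(hid_t dset, hid_t dxpl, const int *buf)
{
    hid_t fspace = H5Dget_space(dset);

    hsize_t start1[2] = {0, 0}, count1[2] = {2, 4};   /* first patch  */
    hsize_t start2[2] = {5, 6}, count2[2] = {3, 2};   /* second patch */

    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start1, NULL, count1, NULL);
    H5Sselect_hyperslab(fspace, H5S_SELECT_OR,  start2, NULL, count2, NULL);  /* union */

    hsize_t nelems = 2 * 4 + 3 * 2;                   /* total selected elements */
    hid_t mspace = H5Screate_simple(1, &nelems, NULL);

    /* One I/O call covers both patches, so the transfer can still be collective
       if dxpl was set up with H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE). */
    H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace, dxpl, buf);

    H5Sclose(mspace);
    H5Sclose(fspace);
}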

20 HDF5 chunked storage
Performance issue: severe performance penalties with many small chunk I/Os.
Chunking is (see the sketch below):
  - Required for extendible data variables
  - Required for filters
  - Better for subsetting access time
For more information about chunking:
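A minimal sketch of enabling chunked storage through a dataset creation property list, using the HDF5 1.8+ H5Dcreate2 signature; the dataset name, chunk dimensions, and dataset sizes are illustrative:

#include <hdf5.h>

hid_t create_chunked_dataset(hid_t file)
{
    hsize_t dims[2]    = {1024, 1024};
    hsize_t maxdims[2] = {H5S_UNLIMITED, 1024};   /* extendible first dimension needs chunking */
    hsize_t chunk[2]   = {256, 256};              /* reasonably large chunks avoid many tiny I/Os */

    hid_t space = H5Screate_simple(2, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);                 /* switch the layout to chunked storage */

    hid_t dset = H5Dcreate2(file, "/data", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}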

21 Improvement 2: One linked-chunk I/O
The chunks selected by all processes are combined into one MPI collective I/O call.
[Figure: P0 and P1 accessing chunks 1-4 through a single MPI-IO collective view]

22 Improvement 3: Multi-chunk I/O optimization
HDF5 has to keep the option of doing collective I/O per chunk because of:
  - Collective I/O bugs inside different MPI-IO packages
  - Limitations of system memory
Problem:
  - Bad performance caused by improper use of collective I/O; in such cases, just use independent I/O
[Figure: chunks distributed across P0-P7]

23 Improvement 4
Problem:
  - HDF5 may not have enough information to make the correct decision about how to do collective I/O
Solution:
  - Provide APIs for applications to participate in the decision-making process (see the sketch below)
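A minimal sketch, assuming a parallel HDF5 build, of the transfer-property hints (H5Pset_dxpl_mpio_chunk_opt and related calls) through which an application can steer the collective chunk I/O decision; the threshold values shown are arbitrary examples, not recommendations:

#include <hdf5.h>

hid_t make_hinted_dxpl(void)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    /* Force the one-linked-chunk path instead of HDF5's internal heuristic ... */
    H5Pset_dxpl_mpio_chunk_opt(dxpl, H5FD_MPIO_CHUNK_ONE_IO);

    /* ... or leave the decision to HDF5 but tune its thresholds, e.g.
       switch to linked-chunk I/O when a process touches at least 8 chunks,
       and do collective I/O on a chunk when at least 50% of processes access it:
       H5Pset_dxpl_mpio_chunk_opt_num(dxpl, 8);
       H5Pset_dxpl_mpio_chunk_opt_ratio(dxpl, 50);                            */

    return dxpl;
}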

24 Flow chart of collective chunk I/O improvements inside HDF5
[Flow chart, with optional user input at each decision point: in collective chunk mode, HDF5 first decides between one linked-chunk I/O (one collective I/O call for all chunks) and multi-chunk I/O; for multi-chunk I/O it then decides per chunk between collective I/O and independent I/O]

25 For details about the performance study and optimizations inside HDF5:

26 Conclusions
HDF5 provides collective I/O support for non-regular selections.
Supporting collective I/O for chunked storage is not trivial; users can participate in the decision-making process that selects among the different I/O options.
I/O performance is quite comparable when parallel netCDF and parallel HDF5 are used in similar manners.
I/O performance of parallel netCDF4 is comparable to that of parallel netCDF, about 15% slower on average for the output of the ROMS history file. We suspect that the slowdown is due to the software overhead of passing information from parallel netCDF4 to HDF5.

27 Acknowledgments
This work is funded by National Science Foundation TeraGrid grants, the Department of Energy's ASC program, the DOE SciDAC program, NCSA, and NASA.