NetCDF4 Performance Benchmark

Part I
  Is the performance in netCDF4 comparable with that in netCDF3?

Configurations
  Dataset
    40 MB: 6 files
    1 MB: 6 files
  Storage Layout
    Contiguous
    Chunked (HDF5 default cache size: 1 MB)
    Chunked (HDF5 cache size: 64 MB)
  System Cache

System Cache
  On
    Use all caches and buffers provided by the kernel
  Drop
    Use “drop_caches” so that data is read from disk
    Use “fsync” so that data is written to disk
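The two "drop" steps can be sketched in Python as follows. This is a minimal sketch assuming a Linux system; the helper names write_and_sync and drop_page_cache are hypothetical and not taken from the benchmark, and writing to /proc/sys/vm/drop_caches requires root privileges.

```python
import os


def write_and_sync(path, data):
    """Write bytes to path and force them to disk with fsync."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move Python's buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to write the file to disk


def drop_page_cache(control="/proc/sys/vm/drop_caches"):
    """Ask a Linux kernel to drop its clean caches; return True on success.

    Needs root. Returns False when the control file is missing or not
    writable (for example on a non-Linux system or without privileges).
    """
    try:
        if hasattr(os, "sync"):
            os.sync()  # flush dirty pages first so they can be dropped
        with open(control, "w") as f:
            f.write("3\n")  # 3 = page cache plus dentries and inodes
        return True
    except OSError:
        return False
```

A timing run for a "drop" case would call write_and_sync after writing and drop_page_cache before reading, so that neither measurement is served from kernel memory.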

10 cases

  Case  Dataset  Storage Layout          System Cache
  1     40 MB    contiguous              on
  2     40 MB    contiguous              drop
  3     40 MB    chunked (64 MB cache)   on
  4     40 MB    chunked (64 MB cache)   drop
  5     40 MB    chunked (1 MB cache)    on
  6     40 MB    chunked (1 MB cache)    drop
  7     1 MB     contiguous              on
  8     1 MB     contiguous              drop
  9     1 MB     chunked (1 MB cache)    on
  10    1 MB     chunked (1 MB cache)    drop

Default Hyperslab
  One big hyperslab is selected

1. Contiguous layout with cache
  Dataset: ≈ 40 MB
  Storage Layout: contiguous
  System Cache: on

2. Contiguous layout without cache
  Dataset: ≈ 40 MB
  Storage Layout: contiguous
  System Cache: drop

3. Chunked layout with cache
  Dataset: ≈ 40 MB
  Storage Layout: chunked (HDF5 cache size: 64 MB)
  System Cache: on

4. Chunked layout without cache
  Dataset: ≈ 40 MB
  Storage Layout: chunked (HDF5 cache size: 64 MB)
  System Cache: drop

5. Chunked layout with cache
  Dataset: ≈ 40 MB
  Storage Layout: chunked (HDF5 default cache size: 1 MB)
  System Cache: on

H5Pset_alloc_time(EARLY)
  Dataset: ≈ 40 MB
  Storage Layout: chunked (HDF5 default cache size: 1 MB)
  System Cache: on

6. Chunked layout without cache
  Dataset: ≈ 40 MB
  Storage Layout: chunked (HDF5 default cache size: 1 MB)
  System Cache: drop

7. Contiguous layout with cache
  Dataset: ≈ 1 MB
  Storage Layout: contiguous
  System Cache: on

8. Contiguous layout without cache
  Dataset: ≈ 1 MB
  Storage Layout: contiguous
  System Cache: drop

9. Chunked layout with cache
  Dataset: ≈ 1 MB
  Storage Layout: chunked (HDF5 default cache size: 1 MB)
  System Cache: on

10. Chunked layout without cache
  Dataset: ≈ 1 MB
  Storage Layout: chunked (HDF5 default cache size: 1 MB)
  System Cache: drop

Part II
  Can I get better performance with netCDF4?
  If yes, under what circumstances can I get better performance?

Non-contiguous Access
  Logical layout for 2-dimensional arrays

Non-contiguous Access
  Physical layout
    Chunk size [16384][1]
    Chunk size [8192][1]
    Chunk size [4096][1]
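The effect of these chunk shapes on a column read can be estimated with simple arithmetic. The sketch below assumes a hypothetical 16384 x 256 array of 4-byte values (16 MiB, in line with the ≈ 16 MB dataset of case 11) stored in row-major order; the shape, element size, and function names are illustrative assumptions, not taken from the slides.

```python
ROWS, COLS, ITEM_SIZE = 16384, 256, 4  # assumed shape and element size


def pieces_contiguous(rows):
    """Separate on-disk pieces touched when reading one column of a
    row-major contiguous array with more than one column: consecutive
    column elements are a full row apart, so each row is its own piece."""
    return rows


def pieces_chunked(rows, chunk_rows):
    """Chunks touched when reading one column with chunk shape
    [chunk_rows][1]: each chunk holds a vertical strip of one column."""
    return -(-rows // chunk_rows)  # integer ceiling division


if __name__ == "__main__":
    print("dataset size (bytes):", ROWS * COLS * ITEM_SIZE)
    print("contiguous:", pieces_contiguous(ROWS), "pieces")
    for chunk_rows in (16384, 8192, 4096):
        print(f"chunk [{chunk_rows}][1]:",
              pieces_chunked(ROWS, chunk_rows), "chunk(s)")
```

Under these assumptions a column read touches 16384 scattered pieces in the contiguous layout, but only 1, 2, or 4 chunks with the three chunk shapes listed.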

11. Non-contiguous Access
  Dataset: ≈ 16 MB
  Storage Layout: contiguous; chunked (default chunk cache)
  System Cache: drop

12. Chunked layout with cache
  Dataset: ≈ 40 MB
  Storage Layout: chunked (chunk cache varies)
  System Cache: on

13. Compression
  Dataset: Radar data
  Storage Layout: chunked (default chunk cache)
  System Cache: drop

13. Compression
  Compression ratio

  Dataset  Uncompressed  Compressed  Compression Ratio
  Tile1    72,132,892    3,432,559   21
  Tile2    72,132,892    5,129,482   14
  Tile3    72,132,892    3,069,254   23
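The compression ratios can be recomputed from the sizes listed above as uncompressed size divided by compressed size. The sketch below is a small check, with hypothetical names; the listed ratios appear to be truncated to whole numbers (Tile3 works out to about 23.5).

```python
# Sizes copied from the compression ratio figures above:
# name -> (uncompressed, compressed)
TILES = {
    "Tile1": (72_132_892, 3_432_559),
    "Tile2": (72_132_892, 5_129_482),
    "Tile3": (72_132_892, 3_069_254),
}


def compression_ratio(uncompressed, compressed):
    """Return uncompressed size divided by compressed size."""
    return uncompressed / compressed


def truncated_ratios(tiles):
    """Return each tile's ratio truncated to a whole number."""
    return {name: int(compression_ratio(u, c)) for name, (u, c) in tiles.items()}


if __name__ == "__main__":
    for name, (u, c) in TILES.items():
        print(f"{name}: {compression_ratio(u, c):.2f}")
```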

Part III
  Can netCDF4 performance be bad?
  How can I avoid the bad performance?

14. Chunk size
  A chunk size that is too small is bad
  A chunk size slightly smaller than (number of elements) / N is bad

14. Chunk size
  [Figure: chunks (chunk0, chunk1, ...) laid over a dataset; labels: dataset, chunk]
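Why a chunk size slightly smaller than (number of elements) / N hurts can be shown with arithmetic. The sketch below assumes a hypothetical one-dimensional dataset of 16,777,216 elements and N = 4, and it assumes that every chunk, including a partly filled last chunk, occupies a full chunk of space; the numbers and the chunk_stats helper are illustrative assumptions, not taken from the slides.

```python
def chunk_stats(n_elements, chunk_elements):
    """Return (number of chunks, unused elements in the last chunk)."""
    n_chunks = -(-n_elements // chunk_elements)  # integer ceiling division
    padding = n_chunks * chunk_elements - n_elements
    return n_chunks, padding


if __name__ == "__main__":
    n_elements = 16_777_216    # assumed dataset length
    exact = n_elements // 4    # chunk size of exactly (elements / N), N = 4
    for chunk in (exact, exact - 1, 16):
        chunks, padding = chunk_stats(n_elements, chunk)
        print(f"chunk size {chunk}: {chunks} chunks, {padding} unused elements")
```

With the exact chunk size the dataset fits in 4 chunks with nothing unused. Shrinking the chunk by a single element needs a fifth chunk that is almost entirely empty, and a chunk of 16 elements produces over a million chunks to manage.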

14. Chunk size
  Dataset: ≈ 64 MB
  Storage Layout: chunked (default chunk cache)
  System Cache: drop

14. Chunk size (more)
  Dataset: ≈ 64 MB
  Storage Layout: chunked (default chunk cache)
  System Cache: drop
  [Figure labels: n, n + 1, n - 1]

15. Many Hyperslab selections
  H5Pcreate()
  H5Dopen()

15. Many Hyperslab selections
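One possible way to reduce the number of hyperslab selections is to merge selections that touch or overlap before issuing any I/O. The sketch below illustrates the idea for one-dimensional (start, count) selections; it is an assumption made for illustration, not a method shown in the slides, and coalesce is a hypothetical helper that does not call HDF5 or netCDF.

```python
def coalesce(selections):
    """Merge 1-D (start, count) selections that touch or overlap.

    Returns a sorted list of (start, count) tuples covering the same
    elements with as few selections as possible.
    """
    merged = []  # list of [start, end) pairs, end exclusive
    for start, count in sorted(selections):
        if count <= 0:
            continue  # ignore empty selections
        end = start + count
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


if __name__ == "__main__":
    small = [(i * 4, 4) for i in range(1000)]  # 1000 adjacent selections
    print(len(small), "selections ->", len(coalesce(small)))
```

Here 1000 adjacent four-element selections collapse into a single selection of 4000 elements, so one read or write call can replace a thousand.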

Conclusion
  The performance in netCDF4 is comparable with that in netCDF3
  Improvement
    Non-contiguous access pattern
    Adjusted cache size
    Compression
  Pitfall
    Small chunk size
    Many small hyperslab selections