Www.hdfgroup.org The HDF Group HDF5 Chunking and Compression Performance tuning 10/17/15 1 ICALEPCS 2015.

Slides:



Advertisements
Similar presentations
Introduction to Database Systems1 Buffer Management Storage Technology: Topic 2.
Advertisements

Part IV: Memory Management
An Array-Based Algorithm for Simultaneous Multidimensional Aggregates By Yihong Zhao, Prasad M. Desphande and Jeffrey F. Naughton Presented by Kia Hall.
The HDF Group November 3-5, 2009HDF/HDF-EOS Workshop XIII1 HDF5 Advanced Topics Elena Pourmal The HDF Group The 13 th HDF and HDF-EOS.
Chapter 101 Cleaning Policy When should a modified page be written out to disk?  Demand cleaning write page out only when its frame has been selected.
Segmentation and Paging Considerations
Making earth science data more accessible: experience with chunking and compression Russ Rew January rd Annual AMS Meeting Austin, Texas.
Chapter 11: File System Implementation
COMP 3221: Microprocessors and Embedded Systems Lectures 27: Virtual Memory - III Lecturer: Hui Wu Session 2, 2005 Modified.
Disk Access Model. Using Secondary Storage Effectively In most studies of algorithms, one assumes the “RAM model”: –Data is in main memory, –Access to.
File System Implementation
FALL 2004CENG 351 Data Management and File Structures1 External Sorting Reference: Chapter 8.
1 Lecture 14: Virtual Memory Topics: virtual memory (Section 5.4) Reminders: midterm begins at 9am, ends at 10:40am.
1 Virtual Memory vs. Physical Memory So far, all of a job’s virtual address space must be in physical memory However, many parts of programs are never.
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Computer Organization and Architecture
1 Lecture 14: Virtual Memory Today: DRAM and Virtual memory basics (Sections )
HDF4 and HDF5 Performance Preliminary Results Elena Pourmal IV HDF-EOS Workshop September
CSCI2413 Lecture 6 Operating Systems Memory Management 2 phones off (please)
NetCDF4 Performance Benchmark. Part I Will the performance in netCDF4 comparable with that in netCDF3? Will the performance in netCDF4 comparable with.
HDF 1 HDF5 Advanced Topics Object’s Properties Storage Methods and Filters Datatypes HDF and HDF-EOS Workshop VIII October 26, 2004.
Review of Memory Management, Virtual Memory CS448.
CMPE 421 Parallel Computer Architecture
The Metadata Cache in HDF5 Changes in the HDF5 metadata cache since
Chapter 5 Operating System Support. Outline Operating system - Objective and function - types of OS Scheduling - Long term scheduling - Medium term scheduling.
The HDF Group Multi-threading in HDF5: Paths Forward Current implementation - Future directions May 30-31, 2012HDF5 Workshop at PSI 1.
The HDF Group HDF5 Datasets and I/O Dataset storage and its effect on performance May 30-31, 2012HDF5 Workshop at PSI 1.
The Memory Hierarchy 21/05/2009Lecture 32_CA&O_Engr Umbreen Sabir.
HDF 1 New Features in HDF Group Revisions HDF and HDF-EOS Workshop IX November 30, 2005.
April 28, 2008LCI Tutorial1 Introduction to HDF5 Tools Tutorial Part II.
Chapter 8 – Main Memory (Pgs ). Overview  Everything to do with memory is complicated by the fact that more than 1 program can be in memory.
CSIE30300 Computer Architecture Unit 08: Cache Hsin-Chou Chi [Adapted from material by and
3-May-2006cse cache © DW Johnson and University of Washington1 Cache Memory CSE 410, Spring 2006 Computer Systems
1 HDF5 Life cycle of data Boeing September 19, 2006.
May 30-31, 2012 HDF5 Workshop at PSI May Shared Object Headers Dana Robinson The HDF Group Efficient Use of HDF5 With High Data Rate X-Ray Detectors.
1 Some Real Problem  What if a program needs more memory than the machine has? —even if individual programs fit in memory, how can we run multiple programs?
September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 Introduction to HDF5 Command-line Tools.
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
The HDF Group Single Writer/Multiple Reader (SWMR) 110/17/15.
CS.305 Computer Architecture Memory: Caches Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
May 30-31, 2012 HDF5 Workshop at PSI May Partial Edge Chunks Dana Robinson The HDF Group Efficient Use of HDF5 With High Data Rate X-Ray Detectors.
March 9, th International LCI Conference - HDF5 Tutorial1 HDF5 Advanced Topics.
The HDF Group 10/17/15 1 HDF5 vs. Other Binary File Formats Introduction to the HDF5’s most powerful features ICALEPCS 2015.
The HDF Group Single Writer/Multiple Reader (SWMR) 110/17/15.
The HDF Group Introduction to HDF5 Session 7 Datatypes 1 Copyright © 2010 The HDF Group. All Rights Reserved.
Working Efficiently with Large SAS® Datasets Vishal Jain Senior Programmer.
Copyright © 2010 The HDF Group. All Rights Reserved1 Data Storage and I/O in HDF5.
CMSC 611: Advanced Computer Architecture Memory & Virtual Memory Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material.
The HDF Group Introduction to HDF5 Session ? High Performance I/O 1 Copyright © 2010 The HDF Group. All Rights Reserved.
University of Maryland Baltimore County
Elena Pourmal The HDF Group
Introduction to HDF5 Session Five Reading & Writing Raw Data Values
CS703 - Advanced Operating Systems
Informatica PowerCenter Performance Tuning Tips
HDF5 Metadata and Page Buffering
Introduction to HDF5 Tutorial.
File System Structure How do I organize a disk into a file system?
Some Real Problem What if a program needs more memory than the machine has? even if individual programs fit in memory, how can we run multiple programs?
CSCE 990: Advanced Distributed Systems
Evaluation of Relational Operations
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Computer Architecture
CSE 451: Operating Systems Autumn 2005 Memory Management
CS 3410, Spring 2014 Computer Science Cornell University
Hierarchical Data Format (HDF) Status Update
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
CENG 351 Data Management and File Structures
CSE 542: Operating Systems
Presentation transcript:

The HDF Group HDF5 Chunking and Compression Performance tuning 10/17/15 1 ICALEPCS 2015

Goal 10/17/15 To help you with understanding how HDF5 chunking and compression works, so you can efficiently store and retrieve data from HDF5 files 2 ICALEPCS 2015

Problem 10/17/153 SCRIS_npp_d _t _e _b13293_c _noaa_pop.h5 DATASET "/All_Data/CrIS-SDR_All/ES_ImaginaryLW" { DATATYPE H5T_IEEE_F32BE DATASPACE SIMPLE { ( 60, 30, 9, 717 ) / ( H5S_UNLIMITED, H5S_UNLIMITED, H5S_UNLIMITED, H5S_UNLIMITED ) } STORAGE_LAYOUT { CHUNKED ( 4, 30, 9, 717 ) SIZE } Dataset is read once, by contiguous 1x1x1x717 selections, i.e., 717 elements times. The time it takes to read the whole dataset is in the table below: Compressed with GZIP level 6 No compression ~345 seconds~0.1 seconds ICALEPCS 2015

Solutions 10/17/15 Performance may depend on many factors such as I/O access patterns, chunk sizes, chunk layout, chunk cache, memory usage, compression, etc. Solutions discussed next are oriented for a particular use case and access patterns: Reading entire dataset once by a contiguous selection along the fastest changing dimension(s) for a specified file. The troubleshooting approach should be applicable to a wider variety of files and access patterns. 4 ICALEPCS 2015

Solution (Data Consumers) 10/17/155 Increase chunk cache size Tune application to use appropriate HDF5 chunk cache size for each dataset to read For our example dataset, we increased chunk cache size to 3MB - big enough to hold one 2.95 MB chunk Compressed with GZIP level 6 1MB cache (default) No compression 1MB (default) or 3MB cache Compressed with GZIP level 6 3MB cache ~345 seconds~0.09 seconds~0.37 seconds ICALEPCS 2015

Solution (Data Consumers) 10/17/156 Change access pattern Keep default cache size (1MB) Tune the application to use an appropriate HDF5 access pattern We read our example dataset using a selection that corresponds to the whole chunk 4x9x30x717 Compressed with GZIP level 6 Selection 1x1x1`x717 No compression Selection 1x1x1`x717 No compression Selection 4x9x30x717 Compressed with GZIP level 6 Selection 4x9x30x717 ~345 seconds ~0.1 seconds~0.04 seconds~0.36 seconds ICALEPCS 2015

Solution (Data Providers) 10/17/157 Change chunk size Write original files with the smaller chunk size We recreated our example dataset using chunk size 1x30x9x717 (~0.74MB) We used default cache size 1MB Read by 1x1x1x717 selections times Performance improved 1000 times Compressed with GZIP level 6 chunk size 4x9x30x717 No compression Selection 4x9x30x717 No compression chunk size 1x9x30x717 Compressed with GZIP level 6 chunk size 1x9x30x717 ~345 seconds ~0.04 seconds ~0.08 seconds ~0.36 seconds ICALEPCS 2015

Outline HDF5 chunking overview HDF5 chunk cache Case study or how to avoid performance pitfalls Other considerations Compression methods Memory usage 10/17/158ICALEPCS 2015

HDF5 CHUNKING OVERVIEW Reminder 10/17/159ICALEPCS 2015

What is HDF5 chunking? Data is stored in a file in chunks of predefined size Contiguous Chunked ICALEPCS 2015

Why HDF5 chunking? Chunking is required for several HDF5 features -Expanding/shrinking dataset dimensions and adding/”deleting” data -Applying compression and other filters like checksum -Example of the sizes with applied compression for our example file Original sizeGZIP level 6Shuffle and GZIP level MB196.6 MB138.2 MB ICALEPCS 2015

JPSS chunking strategy JPSS uses granule size as chunk size “ES_ImaginaryLW” is stored using 15 chunks with the size 4x30x9x717 …….…….. ICALEPCS 2015

FAQ 10/17/15 Can one change chunk size after a dataset is created in a file? No; use h5repack to change a storage layout or chunking/compression parameters How to choose chunk size? Next slide… 13 ICALEPCS 2015

Pitfall – chunk size Chunks are too small File has too many chunks Extra metadata increases file size Extra time to look up each chunk More I/O since each chunk is stored independently Larger chunks results in fewer chunk lookups, smaller file size, and fewer I/O operations 10/17/1514ICALEPCS 2015

Pitfall – chunk size Chunks are too large Entire chunk has to be read and uncompressed before performing any operations Great performance penalty for reading a small subset Entire chunk has to be in memory and may cause OS to page memory to disk, slowing down the entire system 10/17/1515ICALEPCS 2015

HDF5 CHUNK CACHE 10/17/1516ICALEPCS 2015

HDF5 chunk cache documentation 10/17/ ICALEPCS 2015

HDF5 raw data chunk cache The only raw data cache in HDF5 Chunk cache is per dataset Improves performance whenever the same chunks are read or written multiple times (see next slide) 10/17/1518 ICALEPCS 2015

10/17/1519 Chunked Dataset I/O CB A Datatype conversion is performed before chunked placed in cache on write Datatype conversion is performed after chunked is placed in application buffer Chunk is written when evicted from cache Compression and other filters are applied on eviction or on bringing chunk into cache ABC C HDF5 File Chunk cacheChunked dataset Filter pipeline Application memory space DT conversion ICALEPCS 2015

Example: reading row selection 10/17/1520 Chunks in HDF5 file Chunk cache Application buffer A B C A B C ICALEPCS 2015

CASE STUDY Better to See Something Once Than Hear About it Hundred Times 10/17/1521ICALEPCS 2015

Case study We now look more closely into the solutions presented on the slides Increasing chunk cache size Changing access pattern Changing chunk size 10/17/1522 ICALEPCS 2015

CHUNK CACHE SIZE When chunk doesn’t fit into chunk cache 10/17/1523ICALEPCS 2015

HDF5 library behavior When chunk doesn’t fit into chunk cache: Chunk is read, uncompressed, selected data converted to the memory datatype and copied to the application buffer. Chunk is discarded. 10/17/1524 ICALEPCS 2015

Chunk cache size case study: Before 10/17/1525 Gran_1 Chunk cache Application buffer Gran_1 Chunks in HDF5 file Gran_2 Gran_15 …………… Discarded after every H5Dread H5Dread ICALEPCS 2015

What happens in our case? When chunk doesn’t fit into chunk cache: Chunk size is 2.95MB and cache size is 1MB If read by (1x1x1x717) selection, chunk is read and uncompressed 1080 times. For 15 chunks we perform 16,200 read and decode operations. When chunk does fit into chunk cache: Chunk size is 2.95MB and cache size is 3MB If read by (1x1x1x717) selection, chunk is read and uncompressed only once. For 15 chunks we perform 15 read and decode operations. How to change chunk cache size? 10/17/1526 ICALEPCS 2015

HDF5 chunk cache APIs H5Pset_chunk_cache sets raw data chunk cache parameters for a dataset - H5Pset_chunk_cache (dapl, …); H5Pset_cache sets raw data chunk cache parameters for all datasets in a file -H5Pset_cache (fapl, …); Other parameters to control chunk cache -nbytes – total size in bytes (1MB) -nslots – number of slots in a hash table (521)  w0 – preemption policy (0.75) 10/17/1527 ICALEPCS 2015

Chunk cache size case study: After 10/17/1528 Gran_1 Chunk cache Application buffer Gran_1 Chunks in HDF5 file Gran_2 Gran_15 …………… Chunk stays in cache until all data is read and copied. It is discarded to bring in new chunk. H5Dread ICALEPCS 2015

ACCESS PATTERN What else can be done except changing the chunk cache size? 10/17/1529ICALEPCS 2015

HDF5 Library behavior When chunk doesn’t fit into chunk cache but selection is a whole chunk: If applications reads by the whole chunk (4x30x9x717) vs. by (1x1x1x717) selection, chunk is read and uncompressed once. For 15 chunks we have only 15 read and decode operations (compare with 16,200 before!) Chunk cache is “ignored”. 10/17/1530 ICALEPCS 2015

Access pattern case study 10/17/1531 Chunk cache Application buffer Gran_1 Chunks in HDF5 file Gran_2 Gran_15 …………… All data in chunk is copied to application buffer before chunk is discarded Gran_1 ICALEPCS 2015

CHUNK SIZE Can I create an “application friendly” data file? 10/17/1532ICALEPCS 2015

HDF5 Library behavior When datasets are created with chunks < 1MB Chunk fits into default chunk cache No need to modify reading applications! 10/17/1533 ICALEPCS 2015

Small chunk size case study 10/17/1534 Gran_1 Chunk cache Application buffer Gran_1 Chunks in HDF5 file Gran_2 Gran_15 …………… Chunk fits into cache. Chunk stays in cache until all data is read and copied. It is discarded to bring in new chunk in. ICALEPCS 2015

SUMMARY Points to remember for data consumers and data producers 10/17/1535ICALEPCS 2015

Effect of cache and chunk sizes on read When compression is enabled, the library must always read entire chunk once for each call to H5Dread unless it is in cache. When compression is disabled, the library’s behavior depends on the cache size relative to the chunk size. If the chunk fits in cache, the library reads entire chunk once for each call to H5Dread unless it is in cache. If the chunk does not fit in cache, the library reads only the data that is selected More read operations, especially if the read plane does not include the fastest changing dimension Less total data read 10/17/1536 ICALEPCS 2015

OTHER CONSIDERATIONS 10/17/1537ICALEPCS 2015

Compression methods Choose compression method appropriate for your data HDF5 compression methods GZIP, SZIP, n-bit, scale-offset Can be used with the shuffle filter to get a better compression ratio; for example for “ES_NEdNSW” dataset (uncomp/comp) ratio Applied for all datasets the total file size changes from 196.6MB to 138.2MB (see slide 13) 10/17/1538 “ES_NEdNSW” compressed with GZIP level 6 “ES_NEdNSW” compressed with shuffle and GZIP level ICALEPCS 2015

Word of caution Some data cannot be compressed well. Find an appropriate method or don’t use compression at all to save processing time. Let’s look at compression ratios for the datasets in our example file. 10/17/1539ICALEPCS 2015

Example: Compression ratios 10/17/1540 Dataset nameCompression ratio with GZIP level 6 Compression ratio with shuffle and GZIP level 6 ES_ImaginaryLW ES_ImaginaryMW ES_ImaginarySW ES_NedNLW ES_NedNMW ES_NedNSW ES_RealLW ES_RealMW ES_RealSW Use h5dump –pH filename.h5 to see compression information Compression ratio = uncompressed size/compressed size ICALEPCS 2015

Memory considerations for application s HDF5 allocates metadata cache for each open file 2MB default size; may grow to 32MB depending on the working set Adjustable (see HDF5 User’s Guide, Advanced topics chapter); minimum size 1K HDF5 allocates chunk cache for each open dataset 1MB default Adjustable (see reference on slide 31); can be disabled Large number of open files and datasets increases memory used by application 10/17/1541 ICALEPCS 2015

The HDF Group Thank You! Questions? 10/17/15 42 ICALEPCS 2015