HDF5 and casacore Ger van Diepen ASTRON.


1 HDF5 and casacore Ger van Diepen ASTRON

2 casacore
C++ library for astronomical data handling
  Array: templated N-dim arrays (STL-conforming)
  Record: dict-like container
  Table: RDBMS-like storage mechanism; TaQL query language
  MeasurementSet: visibility data storage and access (using Tables)
  Measures: values in frame (direction, position, epoch, ...)
  Coordinates: world coordinates for images
  Images: N-dimensional image cubes with 0 or more masks (also supports HDF5, FITS, Miriad, expressions (LEL))
Used by LOFAR, ASKAP, ALMA, eVLA, MeqTrees, pyrap, pydal
See: Download (casacore.googlecode.com), Classes, TaQL, LEL, MS definition

3 HDF5 in casacore
Simple C++ classes on top of a subset of the HDF5 API
Core classes:
  HDF5Hid classes: resource handling (automatically close a hid)
  HDF5File
  HDF5DataType: create HDF5 data types for C++ data types
  HDF5DataSet: HDF5 dataset of various types; uses chunking
  HDF5Record: map a Record (dict) to HDF5 groups
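The "automatically close a hid" idea is the usual C++ RAII pattern. The sketch below is not the casacore HDF5Hid code; it is a minimal illustration under stated assumptions: `Hid` is a hypothetical stand-in for an HDF5 identifier, `ScopedHid` is a hypothetical name, and the closing function is passed in by the caller so that the example does not depend on the HDF5 library.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for an HDF5 identifier (an integer handle).
using Hid = long long;

// Owns one identifier and calls the supplied closer at most once.
class ScopedHid {
public:
    ScopedHid() = default;
    ScopedHid(Hid id, std::function<void(Hid)> closer)
        : id_(id), closer_(std::move(closer)) {
        // A valid id without a closer could never be released.
        assert(id_ < 0 || closer_);
    }
    ~ScopedHid() { close(); }

    // Non-copyable: two owners would close the same id twice.
    ScopedHid(const ScopedHid&) = delete;
    ScopedHid& operator=(const ScopedHid&) = delete;

    // Movable: ownership transfers and the source becomes empty.
    ScopedHid(ScopedHid&& other) noexcept
        : id_(other.id_), closer_(std::move(other.closer_)) {
        other.id_ = -1;
    }
    ScopedHid& operator=(ScopedHid&& other) noexcept {
        if (this != &other) {
            close();
            id_ = other.id_;
            closer_ = std::move(other.closer_);
            other.id_ = -1;
        }
        return *this;
    }

    Hid get() const { return id_; }
    bool valid() const { return id_ >= 0; }

    // Release the id now; calling close() again is harmless.
    void close() {
        if (id_ >= 0 && closer_) closer_(id_);
        id_ = -1;
    }

private:
    Hid id_ = -1;
    std::function<void(Hid)> closer_;
};
```

With a real HDF5 build the closer would wrap the matching close call for the kind of object held (file, dataset, data type, ...), which is presumably why the slide speaks of HDF5Hid classes in the plural.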

4 HDF5Lattice, HDF5Image
HDF5Lattice
  N-dim array in a HDF5DataSet
  Part of casacore Lattice/Image framework; handles iteration, subsetting, etc.
  Automatically sizes HDF5 chunk cache given an access pattern
HDF5Image
  HDF5Lattice with world coordinates
  Coordinates stored using HDF5Record
  Part of casacore Image framework (alongside PagedImage, FITSImage, MiriadImage, ImageExpression)
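Sizing a chunk cache from an access pattern comes down to counting how many chunks one cursor (a line or a plane) touches. The sketch below shows that arithmetic only; it is not how HDF5Lattice computes its cache size. The function names are hypothetical, and the cursor is assumed to start on a chunk boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of chunks along one axis (ceiling division).
std::size_t chunksAlongAxis(std::size_t length, std::size_t chunkLength) {
    assert(chunkLength > 0);
    return (length + chunkLength - 1) / chunkLength;
}

// Chunks touched by a cursor of shape `cursor`, e.g. a line [n,1,1]
// or a plane [nx,ny,1], assuming it starts on a chunk boundary.
std::size_t chunksForCursor(const std::vector<std::size_t>& cursor,
                            const std::vector<std::size_t>& chunk) {
    assert(cursor.size() == chunk.size());
    std::size_t n = 1;
    for (std::size_t i = 0; i < cursor.size(); ++i) {
        n *= chunksAlongAxis(cursor[i], chunk[i]);
    }
    return n;
}

// Bytes of cache needed to keep all chunks of one cursor resident.
std::size_t cacheBytes(const std::vector<std::size_t>& cursor,
                       const std::vector<std::size_t>& chunk,
                       std::size_t elemSize) {
    std::size_t chunkElems = 1;
    for (std::size_t c : chunk) chunkElems *= c;
    return chunksForCursor(cursor, chunk) * chunkElems * elemSize;
}
```

For a float lattice with 16x16x16 chunks, a 128-element line along x touches 8 chunks (128 KB of cache) and a 128x128 xy plane touches 64 chunks (1 MB).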

5 Performance: tLatticePerf
Create a 3-dim float lattice using casacore Tables or HDF5 on a MacBook with 2 GB RAM
Iterate by chunk, plane, and line
  line is done tile-wise to reduce required cache size
Various array and chunk sizes:
  Array shape       Chunk shape   Size
  128^3             16^3          8 MB
  256^3             16^3          64 MB
  512^3             32^3          0.5 GB
  1024,1024,128     64,64,16      0.5 GB
  2048,2048,64      128,128,4     1 GB
  4096,4096,32      128,128,4     2 GB
128^3 array with chunk sizes 2, 4, 8, 16, 32, 64, 128 (^3)
Also compared file sizes
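Why iterating lines tile-wise reduces the required cache size can be shown by counting chunk reads. The sketch below is an illustration, not tLatticePerf or casacore code. It assumes a small cache that holds exactly one row of chunks along x, and compares visiting x-lines in plain order (y fastest, then z) with visiting them chunk row by chunk row. All names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

using Shape3 = std::array<std::size_t, 3>;

// Cache model: holds the chunks of one chunk row along x, keyed by the
// chunk indices in y and z. Counts chunk reads whenever the row changes.
struct RowCache {
    std::size_t rowChunks;      // chunks per row along x
    std::size_t reads = 0;      // total chunk reads so far
    bool loaded = false;
    std::size_t cy = 0, cz = 0;

    void touch(std::size_t chunkY, std::size_t chunkZ) {
        if (!loaded || chunkY != cy || chunkZ != cz) {
            reads += rowChunks;
            loaded = true;
            cy = chunkY;
            cz = chunkZ;
        }
    }
};

std::size_t rowChunkCount(const Shape3& shape, const Shape3& chunk) {
    assert(chunk[0] > 0 && chunk[1] > 0 && chunk[2] > 0);
    return (shape[0] + chunk[0] - 1) / chunk[0];
}

// Visit every x-line in plain order: y fastest, then z.
std::size_t chunkReadsLineOrder(const Shape3& shape, const Shape3& chunk) {
    RowCache cache{rowChunkCount(shape, chunk)};
    for (std::size_t z = 0; z < shape[2]; ++z)
        for (std::size_t y = 0; y < shape[1]; ++y)
            cache.touch(y / chunk[1], z / chunk[2]);
    return cache.reads;
}

// Visit every x-line tile-wise: finish all lines of one chunk row first.
std::size_t chunkReadsTileOrder(const Shape3& shape, const Shape3& chunk) {
    RowCache cache{rowChunkCount(shape, chunk)};
    for (std::size_t zt = 0; zt < shape[2]; zt += chunk[2])
        for (std::size_t yt = 0; yt < shape[1]; yt += chunk[1])
            for (std::size_t z = zt; z < std::min(zt + chunk[2], shape[2]); ++z)
                for (std::size_t y = yt; y < std::min(yt + chunk[1], shape[1]); ++y)
                    cache.touch(y / chunk[1], z / chunk[2]);
    return cache.reads;
}
```

For the 128^3 lattice with 16^3 chunks, plain order costs 8192 chunk reads with this small cache, while tile-wise order reads each of the 512 chunks once. Plain order would need a cache holding a full xy slab of chunks to reach the same count.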

6 128^3 float; chunk 2^x
[Plots: file size (KB) and getTile time (sec) for HDF5 and CCTS cache]
More time for very large chunk

7 getPlane xy, xz, yz; getLine x, y, z
[Plots comparing HDF5, CCTS cache, and CCTS mmap]

8 8 MB
Creating tLatticePerf_tmp.tab with shape [128, 128, 128] and tile shape [16, 16, 16]
Creating tLatticePerf_tmp.hdf with shape [128, 128, 128] and tile shape [16, 16, 16]
[Timing listing (real, user, system) for CCTS and HDF5: create, getTiles, getPlane xy/xz/yz, getLine x/y/z; values not preserved in the transcript]

9 64 MB
Creating tLatticePerf_tmp.tab with shape [256, 256, 256] and tile shape [16, 16, 16]
Creating tLatticePerf_tmp.hdf with shape [256, 256, 256] and tile shape [16, 16, 16]
[Timing listing (real, user, system) for CCTS and HDF5: create, getTiles, getPlane xy/xz/yz, getLine x/y/z; values not preserved in the transcript]

10 0.5 GB
Creating tLatticePerf_tmp.tab with shape [512, 512, 512] and tile shape [32, 32, 32]
Creating tLatticePerf_tmp.hdf with shape [512, 512, 512] and tile shape [32, 32, 32]
[Timing listing (real, user, system) for CCTS and HDF5: create, getTiles, getPlane xy/xz/yz, getLine x/y/z; values not preserved in the transcript]

11 0.5 GB
Creating tLatticePerf_tmp.tab with shape [1024, 1024, 128] and tile shape [64, 64, 16]
Creating tLatticePerf_tmp.hdf with shape [1024, 1024, 128] and tile shape [64, 64, 16]
[Timing listing (real, user, system) for CCTS and HDF5: create, getTiles, getPlane xy/xz/yz, getLine x/y/z; values not preserved in the transcript]

12 1 GB
Creating tLatticePerf_tmp.tab with shape [2048, 2048, 64] and tile shape [128, 128, 8]
Creating tLatticePerf_tmp.hdf with shape [2048, 2048, 64] and tile shape [128, 128, 8]
[Timing listing (real, user, system) for CCTS and HDF5: create, getTiles, getPlane xy/xz/yz, getLine x/y/z; values not preserved in the transcript]

13 2 GB
Creating tLatticePerf_tmp.tab with shape [4096, 4096, 32] and tile shape [128, 128, 4]
Creating tLatticePerf_tmp.hdf with shape [4096, 4096, 32] and tile shape [128, 128, 4]
[Timing listing (real, user, system) for CCTS and HDF5: create, getTiles, getPlane xy/xz/yz, getLine x/y/z; values not preserved in the transcript]
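The listings above report real, user, and system time per operation. The sketch below shows one way to collect comparable numbers with only the C++ standard library. It is not the timer used by tLatticePerf, and `Timing` and `timeIt` are hypothetical names. Note that `std::clock` reports processor time for the whole process, so user and system time appear as a single combined figure here; splitting them would need a platform call such as POSIX `getrusage`.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

// Elapsed wall-clock time and processor time for one operation, in seconds.
struct Timing {
    double realSec;
    double cpuSec;   // user + system combined, as reported by std::clock
};

// Run `work` once and measure it.
Timing timeIt(const std::function<void()>& work) {
    assert(work);
    const auto wall0 = std::chrono::steady_clock::now();
    const std::clock_t cpu0 = std::clock();
    work();
    const std::clock_t cpu1 = std::clock();
    const auto wall1 = std::chrono::steady_clock::now();
    return Timing{
        std::chrono::duration<double>(wall1 - wall0).count(),
        static_cast<double>(cpu1 - cpu0) / CLOCKS_PER_SEC
    };
}
```

A large gap between real and processor time points at waiting on I/O, while high processor time with little I/O points at work done in the library itself, which is the distinction the results slide draws for HDF5 line access.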

14 Results
HDF5 line by line uses a lot of user time
  Verified by independent tHDF5 program, submitted as bug
  Due to B-tree lookups?
HDF5 uses less system time, but is usually slower
Files are about equal in size
  HDF5 slightly larger (more if tiles are smaller), due to B-tree?
Note:
  writes are faster because no fsync is done
  for larger data sets the kernel’s file cache is not used

