HDF5 Performance Enhancements with the Elimination of Unlimited Dimension

Debbie Mao, Daniel Ziskin, Merritt Deeter, Sara Martinez-Alonso

MOPITT is an instrument flying on NASA's Earth Observing System Terra spacecraft, which was launched in 1999 and has been operational since March 3rd, 2000. MOPITT measures tropospheric carbon monoxide (CO) on a global scale; about 3 days of data covers almost the whole Earth.
MOPITT Product Versions

Version  Operational Period  Data Format  MOP02 Size
V3 *     2002 - 2009         HDF-EOS2     100 MB
V4 *     2009 - 2012         HDF-EOS2     140 MB
V5 *     2011 - 2016         HDF-EOS2     240 MB
V6       2013 - present      HDF-EOS5     250 MB
V7       2016 - present      HDF-EOS5     500 MB

* no longer available

Level 1: swath radiance
Level 2: swath retrieved CO
Level 3: gridded average retrieved CO, daily and monthly
MOPITT V5 (HDF-EOS2): simple
MOPITT V6 (HDF-EOS5): structure met
MOPITT V7 (HDF-EOS5): dimension list, more fields
Access to MOPITT Products
  NASA Data Archives via Reverb
  ASDC Data Pool
  ASDC MOPITT Subsetter

Tools / Software for MOPITT Products
  HDFView
  IDV
  Panoply
  MOPITT web viewer
  IDL
  NCL
  ...
MOPITT Data Dimensions
  Level 1: track (unlimited), stare, pixel, ...
    1 track = 29 stares
    1 stare = 4 pixels
  Level 2: time (unlimited), pressure, ...
  Level 3: xDim, yDim, pressure, ...
MOPITT HDF-EOS2 to HDF-EOS5 Migration Experience
  Unlimited dimension
  File open / close
  IDL: netCDF ~ HDF-EOS5
  Augmentation tool
  Upper-left corner → lower-left corner
  NCL
  IDV
  Chunk size ~ IDL reading time
  Strict data type
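The link between chunk size and reading time noted above can be illustrated with a little arithmetic. HDF5 requires chunked storage for any dataset with an unlimited dimension, so the chunk shape determines how many separate chunks a full read has to visit. The sketch below is only an illustration: the dataset shape and both chunk shapes are hypothetical and are not taken from the MOPITT files, and the slides do not say that chunking is the cause of the timings reported later.

```python
from math import ceil, prod


def chunks_touched(shape, chunk_shape):
    """Number of chunks a full read of a chunked dataset must visit."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same rank")
    # Each dimension is split into ceil(size / chunk) pieces.
    return prod(ceil(n / c) for n, c in zip(shape, chunk_shape))


# Hypothetical Level 2 field laid out as [nTime][nPrs][nTwo].
shape = (100_000, 10, 2)

small = chunks_touched(shape, (1, 10, 2))      # one time step per chunk
large = chunks_touched(shape, (1000, 10, 2))   # 1000 time steps per chunk
print(small, large)
```

With these assumed numbers the first layout needs a thousand times more chunk lookups than the second for the same data, which is one plausible way a chunk shape could influence reading time.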
Before:
  AprioriCOMixingRatioProfile
    Dim: nTime, nPrs, nTwo
    List: 4
  RetrievedCOMixingRatioProfile
    List: 3
After Elimination of Unlimited Dimension
  Use Dan's IDL code to read and write all information except the dimension list
  Run the Augmentation tool to add the dimension list
HDF Problem
  Our QA scientist noticed something very peculiar: our V7 products were much faster to analyze than V6.
    V6: % Time elapsed: 1541.6740 seconds.
    V7: % Time elapsed: 1120.7156 seconds.
  The V7 file size is almost double that of V6, yet V7 is 27.3% quicker to read. Why?
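The 27.3% figure follows directly from the two elapsed times quoted above. As a quick check (the function name is just an illustrative choice):

```python
v6_seconds = 1541.6740
v7_seconds = 1120.7156


def percent_faster(old, new):
    """Reduction in elapsed time, as a percentage of the old time."""
    return 100.0 * (old - new) / old


reduction = percent_faster(v6_seconds, v7_seconds)
print(f"{reduction:.1f}%")
```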
Experimental Design
  Compare V7 access time with and without the unlimited dimension
    Experiment #1: use 5 datasets, read multiple times
    Experiment #2: use 5 datasets for multiple days to measure access time
  Action:
    Open file
    Read data
    Close file
  Data Fields:
    co_mx[nTime][nPrs][nTwo]
    co_tot[nTime][nTwo]
    lat, long, time [nTime]
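The open / read / close action described above can be timed with a small harness. This is a minimal sketch, not the code used for the experiments: the function names are hypothetical, the reader assumes the third-party h5py package, and the caller has to supply the actual dataset paths inside the file, which are not given in the slides.

```python
import time


def time_open_read_close(read_file, paths, repeats=1):
    """Time each open/read/close pass; returns a list of (path, seconds)."""
    timings = []
    for path in paths:
        for _ in range(repeats):
            start = time.perf_counter()
            read_file(path)  # one complete open, read, close cycle
            timings.append((path, time.perf_counter() - start))
    return timings


def read_fields(path, field_paths):
    """Open the file, read every listed field in full, then close it."""
    import h5py  # third-party; imported here so the timer needs only the stdlib

    with h5py.File(path, "r") as f:
        return [f[name][...] for name in field_paths]
```

Under these assumptions, Experiment #1 corresponds to passing one file with `repeats` greater than 1, and Experiment #2 to passing a list of files from successive days with `repeats=1`. Repeated reads of the same file may be served from the operating system's file cache, so the first pass and later passes are worth reporting separately.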
MOPITT HDF5 Reading Time Comparison – Experiment #1
MOPITT HDF5 Reading Time Comparison – Experiment #2
Summary
  MOPITT has over 17 years of data
  There are more CF attributes in Version 7 products
  The transition from HDF-EOS2 to HDF-EOS5 is a big job
  After eliminating the unlimited dimension:
    Experiment #1: small difference, about 2%
    Experiment #2: large difference at first, about 40%; the difference becomes smaller over time
  Conclusion: eliminating the unlimited dimension has an effect on access times in some applications.