Published by Reynold Lane. Modified over 9 years ago.
1
www.hdfgroup.org
The HDF Group
Improving long-term preservation of EOS data by independently mapping HDF4 data objects
Mike Folk, Ruth Aydt, Joe Lee, Binh-Minh Ribler, Kent Yang
Ruth Duerr, Christopher Lynnes
The 14th HDF and HDF-EOS Workshop, September 28-30, 2010
HDF/HDF-EOS Workshop XIV
2
Mapping project team members
  The HDF Group: Ruth Aydt, Peter Cao, Mike Folk, Joe Lee, Elena Pourmal, Tong Qi, Binh-Minh Ribler, Eunsoo Seo, Veer Singh, Muqun (Kent) Yang
  NASA: Ruth Duerr (NSIDC), Chris Lynnes (GES DISC)
3
HDF4 files are complex
4
How do HDF users avoid having to deal with all of that complexity?
5
Through the HDF software libraries, either by using HDF APIs directly, or by using HDF tools that depend on the HDF libraries.
But what about the future…
6
Over the long term, there is a risk in depending solely on HDF software to access HDF-formatted data. It is possible that, in the distant future, the software may not be available.
7
“If only we could read HDF data with an independent program that does not rely on the HDF API… A possible approach [would be to create] a map of a data file, [and] utilities to find, assemble and write out SDSes and vdatas.”
  Christopher Lynnes, “Leveraging HDF Utilities”, HDF Workshop X
8
User’s view of the HDF4 SD model
9
Mapping SDS to file offset/length
  [Figure: HDF4 file layout]
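To make the offset/length idea concrete, here is a minimal sketch of reading an array straight from the file bytes once a map has supplied its location. The function name, the 32-bit integer type, and the big-endian default are assumptions for illustration; in practice the data type and byte order would be taken from the map file.

```python
import struct

def read_int32_array(path, offset, count, byte_order=">"):
    """Read `count` 32-bit integers starting at byte `offset` of the file.

    `offset` and `count` stand in for values a map would supply;
    byte_order ">" (big-endian) is an assumed default.
    """
    nbytes = 4 * count
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(nbytes)
    if len(raw) != nbytes:
        raise ValueError("file is shorter than the map claims")
    # struct decodes the raw bytes without any HDF library involved.
    return list(struct.unpack(f"{byte_order}{count}i", raw))
```

Nothing here depends on HDF software: only the file, the offset, the element count, and the data representation are needed.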
10
Mapping with compressed chunks
  [Figure: HDF4 file layout]
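A rough sketch of how a reader might handle compressed chunks, assuming each chunk is an independent deflate (zlib) stream and that the map lists each chunk's offset and byte count in logical order. The function name and the (offset, nbytes) layout are hypothetical, and a real multi-dimensional array would also need each chunk placed at its position in the array.

```python
import zlib

def read_deflated_chunks(path, chunks):
    """Decompress chunks and concatenate them in the order given.

    chunks: list of (offset, nbytes) pairs, one per chunk, in logical
    order; the pairs stand in for values a map would supply.
    """
    parts = []
    with open(path, "rb") as f:
        for offset, nbytes in chunks:
            f.seek(offset)
            parts.append(zlib.decompress(f.read(nbytes)))
    return b"".join(parts)
```

The chunks need not be stored in logical order in the file; the map's ordering is what lets the reader reassemble them.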
11
Recap
  Problem: The complex byte layout of HDF files makes long-term readability of HDF data dependent on long-term availability of HDF software.
  Solution: Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data.
12
HDF4 mapping workflow
  [Diagram] HDF4 File -> hmap (linked with HDF4 library) -> HDF4 Mapping File (XML document)
  HDF4 Mapping File contains: Groups, Data Objects, Structural and Application Metadata; Locations of Object Data
  Reader program: uses the HDF4 Mapping File and the HDF4 File to retrieve Object Data
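As an illustration of the reader side of this workflow, the sketch below pulls object locations out of an XML map using only the Python standard library. The element and attribute names (Map, Array, byteStream, offset, nBytes) are invented for this example and are not taken from the project's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical map fragment; the real schema may use different names.
MAP_XML = """<Map>
  <Array name="temperature" dataType="int32" byteOrder="bigEndian">
    <byteStream offset="294" nBytes="16"/>
  </Array>
</Map>"""

def list_byte_streams(xml_text):
    """Return {array name: [(offset, nbytes), ...]} from a map document."""
    root = ET.fromstring(xml_text)
    result = {}
    for array in root.iter("Array"):
        result[array.get("name")] = [
            (int(b.get("offset")), int(b.get("nBytes")))
            for b in array.iter("byteStream")
        ]
    return result

print(list_byte_streams(MAP_XML))
```

A reader program would then open the HDF4 file and read each listed byte range directly.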
13
Target User
  Person 20+ years in the future
  Interested in data stored in HDF4 file
  Has HDF4 file and companion map file
  Can “write a program”
  May not have:
    HDF4 data model, format, documentation, or software
    Mapping schema, documentation, or software
  Will have knowledge of:
    Basic XML
    Data representations used today
    Compression used by HDF4 (JPEG, Szip, etc.)
14
Project Phases
  Phase 1
    Categorize HDF4 data held by NASA
    Build a prototype
      XML layout representation
      Tool to create XML map file for given HDF4 file
      Tools to read HDF4 data based solely on map files
  Phase 2
    Build a robust version
    Deploy
15
How many HDF4 products?
  Data Center   HDF4 Products
  ASF           0
  GES-DISC      236
  GHRC          54
  ASDC          63
  LP-DAAC       67
  NSIDC         47
  ORNL-DAAC     2
  PO.DAAC       22
  SDAC          0
  MrDC          95
  Total         586
16
Data characteristics: Product Characteristics Examined
  Product Identification
    Product Name
    Data Level
    Archive Location
  For HDF-EOS products
    HDF-EOS version
    For swath data
      Number of swaths
      Maximum number of dimensions
      Organized by time, space, both, or other
    Etc.
  For SDS data
    Number of SDSs
    Max number of dimensions
    Did any SDS have attributes
    Was any SDS annotated
    Were dimension scales used
    Was compression used and if so what kind
    Was chunking used
  For Vdata
    Number of Vdata structures
    Did any have attributes
    Did any fields have attributes
  Etc.
17
Phase 2 tasks
  A. Investigate integration of mapping schema with existing standards
  B. Determine HDF-EOS2 requirements
  C. Redesign and expand the XML schema
  D. Implement production quality map writer
  E. Develop demo map reader
  F. Deploy tools at select NASA data centers
18
Task A: Investigate integration of mapping schema with existing standards
19
Investigate existing standards
  Investigated: METS, PREMIS, ESML, NcML, and CSML
  Concluded:
    Existing standards have different purposes than mapping schema
    None meet all needs of mapping project
    Develop new schema tailored to project goals
    Harmonize with PREMIS
    Leverage terminology and approaches from all
20
Task B: Determine HDF-EOS2 requirements
21
Categorize HDF-EOS2 data products
  Created a data pool from NASA data centers
    GES DISC, NSIDC, LAADS, LP DAAC, LaRC, PO.DAAC, GHRC, OBPG
  Detailed description of sample data
  Reported options for adding HDF-EOS2 contents to the mapping file
  Documents and reports at wiki: http://wiki.hdfgroup.org/MappingPhase2_TaskB
22
Task C: Redesign Schema
23
Design priorities
  Mapping files
    Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files
    Have enough information to stand on their own
    Be as simple as possible
  Mapping schema
    Describe the Mapping files
    Used for validation and documentation
    May not be available to target user
24
Representation of HDF4 Objects
25
Mapping File – Group & Table (fragment)
  Represents HDF4 Objects and Relationships
  Information needed to access and interpret raw data in HDF4 file
  Select raw data values included to help user verify binary data handled properly
  AMSR_E_L2_Land_V09_200501180027_D
26
Status and Plans
  Status
    Map file design stabilizing for most HDF4 objects
  Plans
    Complete design for Raster Images and Palettes
    Continue to refine instructions and contents
    Finalize schema
27
Task D: Implement Writer
28
Map Writer Requirements
  Retrieve information needed from HDF4 file
  Write out corresponding XML file
  Quality requirements
    Completeness – don’t miss any objects in file
    Accuracy – don’t give wrong information
29
Writer Status and Plan
  Status
    Covers most Vgroup/Vdata/SDS objects
    Covers some GR/Annotation objects
    Being tested with NASA data
  Plans
    Increase coverage / accuracy / reliability
30
Task E: Implement demo reader
31
Demo Reader Requirements
  Multiplatform command line tool
  Easy to use, with clear arguments and output
  Must validate that objects in the mapping file are actually in the HDF4 file
  Developed in a well-supported high-level language (Python)
  Well documented
  Available as open source
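One simple check a reader could make toward the validation requirement is that every byte range named in the map lies inside the HDF4 file. The sketch below shows only that bounds check; the function name and the (offset, nbytes) pairs are assumptions, and this is not the demo reader's actual validation logic.

```python
import os

def find_out_of_range(path, streams):
    """Return the (offset, nbytes) pairs that do not fit inside the file.

    streams: (offset, nbytes) pairs standing in for values from a map.
    """
    size = os.path.getsize(path)
    return [
        (offset, nbytes)
        for offset, nbytes in streams
        if offset < 0 or nbytes < 0 or offset + nbytes > size
    ]
```

An empty result means every listed range fits; it does not prove the bytes hold the expected object, which is where verification values in the map help.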
32
Demo Reader Status
  Status
    Only Vdata support provided so far
    Current source code available at https://sourceforge.net/projects/pyhdf
    Documentation at http://pyhdf.sourceforge.net/
  Plans
    SDS and RIS support
33
Task F: Deploy
34
Deploy
  Begin in Jan 2011, complete in April
  Activities:
    GES DISC
      Incorporate into the existing archive ingest system
      Manage the retrofit into existing metadata files
    NSIDC
      Support implementation in NSIDC’s ECS system
    Other ESDCs
      Encouraged to join in
      But deployment to other centers expected subsequent to the project
35
Thank You!
36
Acknowledgements
This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.
37
Questions/comments?
39
Extra slides
40
Mapping File – Array with Attribute (fragment)
  Represents HDF4 Objects and Relationships
  Information needed to access and interpret raw data in HDF4 file
  Select raw data values included to help user verify binary data handled properly; “corners” + 5 random
  AIRS.2002.08.31.L3.RetStd_H001.v5.0.14.0.G07178195754
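A reader could use the verification values to check its own decoding. The sketch below assumes the map's values have already been parsed into a dictionary keyed by array index; the function names and that dictionary layout are illustrative and not part of the map format.

```python
from itertools import product

def corner_indices(shape):
    """Index tuples of the corners of an n-dimensional array."""
    return sorted(set(product(*[(0, n - 1) for n in shape])))

def mismatches(data, expected):
    """Compare decoded data (nested lists) with verification values.

    expected: {index tuple: value}, a hypothetical parsed form of the
    verification values stored in a map.
    Returns a list of (index, expected value, decoded value).
    """
    bad = []
    for index, want in expected.items():
        got = data
        for i in index:
            got = got[i]
        if got != want:
            bad.append((index, want, got))
    return bad
```

Corner values are a useful check because a wrong dimension order or byte order will usually disturb at least one of them.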
41
Mapping of Array with complex storage
  Test file created for project
  Compression
  Chunks with Ghost Cells
  Raw data in HDF4 file; first chunk’s data is not contiguous
  Select values included in map file for verification
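The two complications named here can be sketched separately: gathering a chunk whose bytes are stored in several pieces, and dropping ghost cells (padding in edge chunks that lies beyond the array bounds). The function names, the (offset, nbytes) pieces, and the row-major 2-D layout are assumptions made for illustration.

```python
def gather(path, pieces):
    """Concatenate the byte ranges that together make up one chunk.

    pieces: (offset, nbytes) pairs in logical order, standing in for
    values a map would supply.
    """
    out = bytearray()
    with open(path, "rb") as f:
        for offset, nbytes in pieces:
            f.seek(offset)
            out += f.read(nbytes)
    return bytes(out)

def trim_ghost_cells(values, chunk_shape, valid_shape):
    """Keep only the cells of a 2-D chunk that lie inside the array.

    values: flat, row-major list of decoded chunk values.
    chunk_shape: (rows, cols) stored in the chunk.
    valid_shape: (rows, cols) that are real data.
    """
    _, cols = chunk_shape
    valid_rows, valid_cols = valid_shape
    return [values[r * cols : r * cols + valid_cols] for r in range(valid_rows)]
```

For a compressed chunk, the gathered bytes would be decompressed and decoded into values before the ghost cells are trimmed.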