Improving long-term preservation of EOS data by independently mapping HDF4 data objects The HDF Group.


1 Improving long-term preservation of EOS data by independently mapping HDF4 data objects. The HDF Group

2 Mapping project team members
The HDF Group: Ruth Aydt, Mike Folk, Joe Lee, Elena Pourmal, Binh-Minh Ribler, Muqun (Kent) Yang
NASA: Ruth Duerr and Luis Lopez (NSIDC), Chris Lynnes (GES DISC)
Raytheon: Evelyn Nakamura; many others
April 6, 2011, Annual HDF Briefing to NASA

3 Recap
Problem: The complex byte layout of HDF files makes long-term readability of HDF data dependent on long-term availability of HDF software.
Solution: Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data. Implement tools to create layout maps for EOS data products. Deploy tools at select EOS data centers.

5 HDF4 mapping workflow
HDF4 File -> h4mapwriter (linked with HDF4 library) -> HDF4 Map File (XML document)
The HDF4 Map File records groups, data objects, structural and application metadata, and the locations of object data.
Reader program: uses the HDF4 Map File to locate and read object data in the HDF4 File.
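
The reader side of this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element and attribute names (`object`, `byteStream`, `offset`, `nBytes`) and the function `read_object_bytes` are hypothetical and are not taken from the actual HDF4 mapping schema. The point is that once a map lists byte locations, a reader needs only an XML parser and ordinary file I/O, not the HDF4 library.

```python
import xml.etree.ElementTree as ET

# Hypothetical map layout, NOT the real HDF4 mapping schema:
# each <object> lists the byte ranges that hold its raw data.
EXAMPLE_MAP = """
<map>
  <object name="temperature">
    <byteStream offset="4" nBytes="3"/>
    <byteStream offset="10" nBytes="2"/>
  </object>
</map>
"""


def read_object_bytes(map_xml, data_path, object_name):
    """Return the raw bytes of one object by following offsets in the map."""
    root = ET.fromstring(map_xml)
    chunks = []
    with open(data_path, "rb") as f:
        for obj in root.iter("object"):
            if obj.get("name") != object_name:
                continue
            # Read each listed byte range in order and concatenate them.
            for stream in obj.iter("byteStream"):
                f.seek(int(stream.get("offset")))
                chunks.append(f.read(int(stream.get("nBytes"))))
    return b"".join(chunks)
```

Interpreting the returned bytes (data type, byte order, dimensions) would rely on further metadata in the map, which this sketch leaves out.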

6 PHASE 1: BUILD A PROTOTYPE (COMPLETED IN 2009)

7 PHASE 2: PRODUCTIZE HDF4 MAPPING SCHEMA AND TOOLS FOR DEPLOYMENT

8 Phase 2 tasks
A. Investigate integration of mapping schema with existing standards
B. Determine HDF-EOS2 requirements
C. Redesign and expand the XML schema
D. Implement production quality map writer
E. Develop demo map reader
F. Deploy tools at select NASA data centers

10 TASK A: INVESTIGATE INTEGRATION OF MAPPING SCHEMA WITH EXISTING STANDARDS

11 Investigate existing standards
Investigated: METS, PREMIS, ESML, NcML, and CSML.
Concluded: Existing standards have different purposes than the mapping schema. None meets all needs of the mapping project. Develop a new schema tailored to project goals. Harmonize with PREMIS. Leverage terminology and approaches from all.
Status: Need to write report. Need to include some PREMIS-like data such as HDF4 file size and possibly MD checksum.

12 TASK B: DETERMINE HDF-EOS2 REQUIREMENTS

13 Background
An HDF-EOS2 file is an HDF4 file, so one can create an HDF4 mapping file to archive the HDF-EOS2 file. However, for some HDF-EOS2 files, it may be extremely difficult to retrieve correct geo-location information from the mapping files. For those files, special HDF-EOS2 mapping files may be needed.

14 Categorize HDF-EOS2 data products
Created a data pool from NASA data centers: GES DISC, NSIDC, LAADS, LP DAAC, LaRC, PO.DAAC, GHRC, OBPG.
Analyzed data and reported options for adding HDF-EOS2 contents to the mapping file.
Conclusion: No special mapping for HDF-EOS2 needs to be done. However, the study uncovered some important shortcomings in certain HDF-EOS products.

15 Status and Plans
Status: Complete.
Detailed descriptions of sample data: http://hdfeos.org/zoo/Data_Collection/index.php
Documents and reports at wiki: http://wiki.hdfgroup.org/MappingPhase2_TaskB
Plans: We plan to recommend a future task in which these issues are made known to the project.

16 TASK C: REDESIGN SCHEMA

17 Design priorities and assumptions
Mapping files: Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files. Have enough information to stand on their own. Be as simple as possible.
Mapping schema: Describes the mapping files. Used for validation and documentation. May not be available to target user.

18 Status and Plans
Status: All HDF4 objects found in EOS products are now handled by the mapping schema.
Plans: Complete schema elements for HDF4 file description information: file size, MD checksum (?), HDF4 library version stamp (?). Finalize schema documentation. Address any additional HDF4 objects found during the remainder of the project, either by updating the schema and map writer, or with a follow-on proposal if a substantial amount of effort is required.
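
As an illustration of the file description information mentioned above, the sketch below computes a file's size and a checksum. It assumes the "MD checksum" would be MD5, which the slides do not confirm, and the function name `describe_file` and the returned dictionary keys are hypothetical rather than part of the mapping schema.

```python
import hashlib
import os


def describe_file(path, chunk_size=1 << 20):
    """Return the size in bytes and the MD5 hex digest of a file.

    MD5 is an assumption here; the slides only say "MD checksum (?)".
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size blocks so large files need not fit in memory.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return {"size": os.path.getsize(path), "md5": digest.hexdigest()}
```

Storing these two values in the map would let a later reader confirm that a map and a data file still belong together before trusting any byte offsets.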

19 TASK D: IMPLEMENT MAP WRITER

20 Map Writer Requirements
Retrieve information needed from HDF4 file. Write out corresponding XML file.
Quality requirements:
Completeness: don’t miss any objects in file; report on objects or features not handled by the writer.
Accuracy: don’t give wrong information.
Readability: provide adequate instructions in the file.

21 Activities
1. Implement functions to facilitate map creation: Develop writer requirements based on new XML schema and additional deployment needs. Implement new functions as needed. Include functions in library as appropriate.
2. Implement writer (h4mapwriter): Interpret map requirements according to schema. Implement writer. Package for deployment. Support deployment.

22 Status and Plans
Status
1. Implement functions to facilitate map creation: All functions implemented.
2. Implement writer: Handles all objects. Available as alpha-2 release. Being tested by GES DISC, NSIDC, Raytheon.
Plans
1. Functions to facilitate map creation: Include in future HDF4 releases.
2. Writer: Finish HDF4 file description elements. Complete testing and documentation. Support deployment, fix bugs, and add features as needed.

23 TASK E: IMPLEMENT DEMO READER

24 Demo Reader Requirements
Multiplatform command line tool. Easy to use, with clear arguments and output. Must validate that objects in the mapping file are actually in the HDF4 file. Developed in a well-supported high-level language (Python). Well documented. Available as open source.
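
One minimal form the validation requirement could take is a bounds check: every byte range named in the map should lie inside the data file. The sketch below is an assumption about how such a check might look, not a description of the actual demo reader; the function name `find_out_of_range` and the `(offset, n_bytes)` tuple layout are hypothetical.

```python
def find_out_of_range(streams, file_size):
    """Return the (offset, n_bytes) ranges that do not fit inside the file.

    `streams` is an iterable of (offset, n_bytes) pairs taken from a map;
    `file_size` is the size of the data file in bytes.
    """
    return [
        (offset, n_bytes)
        for offset, n_bytes in streams
        if offset < 0 or n_bytes < 0 or offset + n_bytes > file_size
    ]
```

An empty result means every range fits; it does not prove the bytes are the right ones, so a fuller check would also compare file size or a checksum recorded in the map.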

25 Demo reader activities
1. Develop requirements, based on new schema and identification of additional deployment needs.
2. Design reader, based on requirements, and from a review of the prototype design.
3. Implement and document reader.
4. Test reader on EOS file “zoo”.
5. Deposit reader, documentation, and tests in open source repository, probably SourceForge.

26 Demo Reader Status
Status: Support provided so far for Vdata, SDS, Group, and Attribute.
Current source code available at http://sourceforge.net/projects/hdf4mapreader/
Documentation at http://hdf4mapreader.sourceforge.net/
Plans: Add raster image (RIS) and palette support.

27 TASK G: DEPLOY

28 Task G: Deploy
Begin in April 2011, complete in June.
The HDF Group: Provide h4mapwriter map generation tool. Maintain tool during deployment and validation. Assist GES DISC, NSIDC, and Raytheon with deployment and validation.
Raytheon: Validate HDF4 map software in anticipation of future deployment.
GES DISC and NSIDC: see next slide.

29 What about GES DISC and NSIDC?
Activities (formerly):
GES DISC: Incorporate into the existing archive ingest system. Manage the retrofit into existing metadata files.
NSIDC: Support implementation in NSIDC’s ECS system.
Other ESDCs: Encouraged to join in, but deployment to other centers is expected subsequent to the project.
Ruth Duerr’s observation: The task for NSIDC is to assist in the ECS implementation at NSIDC, which won't take place until 2012. Task G only includes the work up to the handoff to ECS. Thus, what NSIDC does needs to extend after the period of performance of this award is over. How do we resolve that issue?

30 BEYOND JULY 15

31 Future work
NSIDC: Assist in the ECS deployment at NSIDC.
GES DISC: ?
The HDF Group: Monitor deployment activities by Raytheon and others to identify unsupported objects and tags occurring in products, software defects, and feature requests. As needed, fix defects, add features, and add support for new objects and tags. Address performance issues. Add h4mapwriter tool and supporting API to regular HDF4 testing regime. Perform other services in support of the software as needed.
All: Perform post mortem and identify lessons learned. Write paper summarizing the project. Investigate HDF5 mapping.

32 The End

33 Acknowledgements
This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.

