Creating Archive Information Packages for Data Sets: Early Experiments with Digital Library Standards
Ruth Duerr, NSIDC; MiQun Yang, THG; Azhar Sikander, NSIDC; Choonghwan Lee, THG
Outline
  - Motivation
  - Goals
  - Standards
  - Plans and Status
Motivation
  - Technologies change regularly and organizations come and go, but data must survive.
  - Preserving data takes more than preserving the bits: all the components of an AIP are critical.
Project Goals
  - Prototype development of Archive Information Packages for HDF data:
      o For entire data sets
      o For individual “granules”
  - Test usability of digital library standards with geospatial data
Metadata Standards - METS
  - Metadata Encoding and Transmission Standard
  - An initiative of the Digital Library Federation
  - Provides the means to convey the metadata necessary for
      o management of digital objects within a repository
      o exchange of objects between repositories (or between repositories and their users)
  - Designed to facilitate
      o shared development of information management tools/services
      o interoperable exchange of digital materials
METS - A very brief overview
  - METS Header: describes the METS document itself, e.g., creator or editor
  - Descriptive Metadata: describes the object using some external standard, e.g., MARC, FGDC, Dublin Core
  - Administrative Metadata: describes object creation, storage, intellectual property rights, source info, provenance, etc., e.g., PREMIS
  - File Section: provides an inventory of all of the files that are part of the object described
  - Structural Map: a physical or logical map of the organization of the materials described
  - Structural Links: allows specification of hyperlinks between parts of the map (mostly useful when preserving websites)
  - Behavior: used to associate executable code with parts of the content
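The section layout above can be sketched as a minimal METS skeleton. This is an illustration only, assuming Python's standard `xml.etree.ElementTree`; the function name `build_mets_skeleton`, the IDs, the agent name, and the example URL are hypothetical, and the output has not been validated against the METS schema. The optional Structural Links and Behavior sections are left out.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def q(tag):
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(object_id, file_href):
    """Build a bare METS tree for one file (hypothetical helper, not schema-validated)."""
    root = ET.Element(q("mets"), {"OBJID": object_id})

    # METS Header: describes the METS document itself.
    header = ET.SubElement(root, q("metsHdr"))
    agent = ET.SubElement(header, q("agent"), {"ROLE": "CREATOR"})
    ET.SubElement(agent, q("name")).text = "Example Archive"  # placeholder name

    # Descriptive Metadata: wraps a record in an external standard (left empty here).
    dmd = ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(dmd, q("mdWrap"), {"MDTYPE": "OTHER", "OTHERMDTYPE": "ISO19115"})

    # Administrative Metadata: provenance and similar, e.g. PREMIS (left empty here).
    amd = ET.SubElement(root, q("amdSec"), {"ID": "AMD1"})
    prov = ET.SubElement(amd, q("digiprovMD"), {"ID": "DP1"})
    ET.SubElement(prov, q("mdWrap"), {"MDTYPE": "PREMIS"})

    # File Section: inventory of the files that make up the object.
    file_grp = ET.SubElement(ET.SubElement(root, q("fileSec")), q("fileGrp"))
    file_el = ET.SubElement(file_grp, q("file"), {"ID": "FILE1"})
    ET.SubElement(file_el, q("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_href})

    # Structural Map: ties the file and its metadata sections together.
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    div = ET.SubElement(struct_map, q("div"), {"DMDID": "DMD1", "ADMID": "DP1"})
    ET.SubElement(div, q("fptr"), {"FILEID": "FILE1"})
    return root


if __name__ == "__main__":
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    tree = build_mets_skeleton("example-granule-001", "file:///archive/example.h5")
    print(ET.tostring(tree, encoding="unicode"))
```

A real METS document would place actual metadata inside each `mdWrap` (in an `xmlData` child) rather than leaving it empty.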
ISO Geographic Information - Metadata
  Purpose:
  - Characterize geographic data properly
  - Facilitate organization and management of metadata for geographic data
  - Enable users to efficiently use such data
  - Facilitate discovery, retrieval, and reuse
  - Enable data assessment
ISO entities
  - Identification
  - Constraints
  - Data Quality
  - Maintenance Information
  - Spatial Representation
  - Reference System
  - Content Information
  - Portrayal Catalogue Reference
  - Distribution
  - Metadata Extension Information
  - Application Schema Information
Metadata Standards - PREMIS
  - Provides a core preservation metadata set with broad applicability across the digital preservation community
  - Developed by an OCLC- and RLG-sponsored international working group
      o Representatives from libraries, museums, archives, government, and the private sector
  - Maintained by the Library of Congress
  - Based on the OAIS reference model
Current Program Plan
  [Diagram. Recoverable labels: NSIDC/ECS HDF4 data; H4toH5; NetCDF4/HDF5 data; CDM/NetCDF4; NSIDC/ECS metadata; ECS to METS (Granule); ECS to METS (Data Set); METS; ISO; HDF5-AIP]
File Level Archive Information Packages
  [Diagram. Recoverable labels: HDF5 AIP components; data file (HDF5); metadata file; METS primary schema; extension schemas, one of which is PREMIS]
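As an illustration of the kind of preservation metadata a PREMIS extension could carry for a single data file, the sketch below builds a simplified PREMIS object record with a fixity checksum. It assumes the PREMIS 3 namespace and Python's standard library; the function name `premis_object`, the identifier type "local", and the choice of SHA-256 are assumptions, not details from the slides. How the record is packaged with the data file is not specified here.

```python
import hashlib
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def pq(tag):
    """Return the namespace-qualified name of a PREMIS element."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_object(identifier, payload, format_name="HDF5"):
    """Describe one file's bytes as a simplified PREMIS object (not schema-validated)."""
    obj = ET.Element(pq("object"), {f"{{{XSI_NS}}}type": "premis:file"})

    # Identifier of the file within the archive ("local" is a placeholder type).
    ident = ET.SubElement(obj, pq("objectIdentifier"))
    ET.SubElement(ident, pq("objectIdentifierType")).text = "local"
    ET.SubElement(ident, pq("objectIdentifierValue")).text = identifier

    chars = ET.SubElement(obj, pq("objectCharacteristics"))

    # Fixity: a checksum that lets the archive detect later bit-level changes.
    fixity = ET.SubElement(chars, pq("fixity"))
    ET.SubElement(fixity, pq("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, pq("messageDigest")).text = hashlib.sha256(payload).hexdigest()

    # Size in bytes and a format name.
    ET.SubElement(chars, pq("size")).text = str(len(payload))
    fmt = ET.SubElement(chars, pq("format"))
    designation = ET.SubElement(fmt, pq("formatDesignation"))
    ET.SubElement(designation, pq("formatName")).text = format_name
    return obj


if __name__ == "__main__":
    record = premis_object("example-granule-001.h5", b"example bytes")
    print(ET.tostring(record, encoding="unicode"))
```

In practice the payload would be read from the data file in chunks rather than held in memory, and a record like this would sit inside the administrative metadata of the METS document.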
Data Set Level Archive Information Package
  [Diagram. Recoverable labels: metadata file (METS primary schema; extension schemas, one of which is PREMIS); multiple HDF-AIPs; multiple Contextual Information items]
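One possible way to express a data set level package in METS is a structural map whose divisions point at the granule-level AIPs and at the contextual information. The sketch below is an assumption about how that could look, not the project's design: it uses the METS `mptr` element to reference other METS documents and `fptr` to reference contextual files listed in the file section. The function name, `TYPE` values, IDs, and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"


def m(tag):
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def build_dataset_mets(dataset_id, granule_aip_urls, context_urls):
    """Link granule AIPs and contextual files under one data set (illustrative only)."""
    root = ET.Element(m("mets"), {"OBJID": dataset_id})
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"),
                             {"USE": "contextual information"})
    struct_map = ET.SubElement(root, m("structMap"), {"TYPE": "logical"})
    top = ET.SubElement(struct_map, m("div"),
                        {"TYPE": "data set", "LABEL": dataset_id})

    # Each granule AIP is assumed to have its own METS document, referenced by mptr.
    for url in granule_aip_urls:
        div = ET.SubElement(top, m("div"), {"TYPE": "granule AIP"})
        ET.SubElement(div, m("mptr"), {"LOCTYPE": "URL", HREF: url})

    # Contextual items are inventoried in the file section and referenced by fptr.
    for index, url in enumerate(context_urls, start=1):
        file_id = f"CTX{index}"
        file_el = ET.SubElement(file_grp, m("file"), {"ID": file_id})
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", HREF: url})
        div = ET.SubElement(top, m("div"), {"TYPE": "contextual information"})
        ET.SubElement(div, m("fptr"), {"FILEID": file_id})
    return root


if __name__ == "__main__":
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    tree = build_dataset_mets(
        "example-data-set",
        ["file:///archive/granule-001.mets.xml", "file:///archive/granule-002.mets.xml"],
        ["file:///archive/context/user-guide.pdf"],
    )
    print(ET.tostring(tree, encoding="unicode"))
```

Because the same contextual item can apply to several data sets, referencing it by location rather than copying it into each package is one option this layout leaves open.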
File Level AIP Activity Status
  - Development of a map from NSIDC/ECS metadata to METS/PREMIS/ISO completed
  - Implementation underway
  Issues:
  - Auxiliary file handling - own AIP or not?
      o E.g., browse files, processing history, PGEs
      o Granules vs files
  - Schema redundancy
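A metadata map of this kind can be thought of as a crosswalk table from source fields to target locations in each standard. The sketch below is a hypothetical illustration: the source field names, the target paths, and both function names are assumptions chosen for the example and are not the project's actual map. It also shows one way the schema redundancy issue surfaces, as a source field that lands in more than one standard.

```python
# Hypothetical crosswalk: source field -> list of (standard, target path).
# Field names and target paths are illustrative assumptions only.
CROSSWALK = {
    "ShortName": [("ISO", "identificationInfo/citation/title")],
    "LocalGranuleID": [
        ("METS", "fileSec/file/FLocat"),
        ("PREMIS", "object/originalName"),
    ],
    "SizeMBECSDataGranule": [("PREMIS", "object/objectCharacteristics/size")],
    "ProductionDateTime": [
        ("PREMIS", "event/eventDateTime"),
        ("ISO", "dataQualityInfo/lineage"),
    ],
    "RangeBeginningDate": [("ISO", "identificationInfo/extent/temporalElement")],
}


def apply_crosswalk(record, crosswalk=CROSSWALK):
    """Split a source record into per-standard {target path: value} dictionaries.

    Returns (mapped, unmapped), where unmapped lists source fields with no target.
    """
    mapped = {}
    unmapped = []
    for field, value in record.items():
        targets = crosswalk.get(field)
        if not targets:
            unmapped.append(field)
            continue
        for standard, path in targets:
            mapped.setdefault(standard, {})[path] = value
    return mapped, unmapped


def redundant_fields(crosswalk=CROSSWALK):
    """Return the source fields that are written into more than one standard."""
    return sorted(
        field
        for field, targets in crosswalk.items()
        if len({standard for standard, _ in targets}) > 1
    )


if __name__ == "__main__":
    source = {
        "ShortName": "EXAMPLE_PRODUCT",
        "LocalGranuleID": "example-granule-001.hdf",
        "BrowseFlag": "Y",  # no target in the crosswalk, so reported as unmapped
    }
    mapped, unmapped = apply_crosswalk(source)
    print(mapped)
    print("unmapped:", unmapped)
    print("redundant:", redundant_fields())
```

Keeping the map as data rather than code makes it easier to review field by field and to report which source fields are dropped or duplicated.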
Data Set AIP Activity Status
  - Contextual information availability assessed for MODIS data
  - Currently GCSRLTA information requirements are being met
  - Much of the information is available via a variety of websites, many of which are dynamically updated
  - Format of the material varies widely
  - Some material should be considered geographic data sets in their own right
  - Much of the material applies to multiple data sets
Data Set AIP Activity Status
  - Local sources of metadata identified
      o ECS Earth Science Data Type (ESDT) definitions
      o NSIDC data set catalog and documentation
  - Data set catalog to ISO metadata translator implemented - to be released operationally soon
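A catalog-to-ISO translator of this sort might, in outline, turn a catalog record into an ISO metadata document. The sketch below is not the NSIDC translator: it assumes the ISO 19139 XML encoding (`gmd` and `gco` namespaces) and a hypothetical catalog record with the keys `title`, `summary`, `west`, `east`, `south`, and `north`. It emits only a title, an abstract, and a bounding box, so the result is far from a complete or validated record.

```python
import xml.etree.ElementTree as ET

GMD_NS = "http://www.isotc211.org/2005/gmd"
GCO_NS = "http://www.isotc211.org/2005/gco"


def gmd(tag):
    """Return the namespace-qualified name of a gmd element."""
    return f"{{{GMD_NS}}}{tag}"


def gco(tag):
    """Return the namespace-qualified name of a gco element."""
    return f"{{{GCO_NS}}}{tag}"


def add_text(parent, tag, text):
    """Add <gmd:tag><gco:CharacterString>text</gco:CharacterString></gmd:tag>."""
    wrapper = ET.SubElement(parent, gmd(tag))
    ET.SubElement(wrapper, gco("CharacterString")).text = text
    return wrapper


def catalog_to_iso(record):
    """Translate a catalog record (hypothetical keys) into a partial ISO record."""
    root = ET.Element(gmd("MD_Metadata"))
    ident = ET.SubElement(ET.SubElement(root, gmd("identificationInfo")),
                          gmd("MD_DataIdentification"))

    # Citation title and abstract come straight from the catalog entry.
    citation = ET.SubElement(ET.SubElement(ident, gmd("citation")), gmd("CI_Citation"))
    add_text(citation, "title", record["title"])
    add_text(ident, "abstract", record["summary"])

    # Geographic bounding box, in the element order used by the gmd schema.
    extent = ET.SubElement(ET.SubElement(ident, gmd("extent")), gmd("EX_Extent"))
    bbox = ET.SubElement(ET.SubElement(extent, gmd("geographicElement")),
                         gmd("EX_GeographicBoundingBox"))
    for tag, key in (("westBoundLongitude", "west"), ("eastBoundLongitude", "east"),
                     ("southBoundLatitude", "south"), ("northBoundLatitude", "north")):
        ET.SubElement(ET.SubElement(bbox, gmd(tag)), gco("Decimal")).text = str(record[key])
    return root


if __name__ == "__main__":
    ET.register_namespace("gmd", GMD_NS)
    ET.register_namespace("gco", GCO_NS)
    example = {"title": "Example Data Set", "summary": "Placeholder abstract.",
               "west": -180.0, "east": 180.0, "south": 60.0, "north": 90.0}
    print(ET.tostring(catalog_to_iso(example), encoding="unicode"))
```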