Greg Janée

Topics
  Fedora
  NGDA project activities
  Two study ideas
  MODIS
  Preservation as series of handoffs

Fedora: what
  Repository system
  Features
    –basic management functionality
    –programmatic APIs
    –object model & XML representation thereof
    –storage subsystem abstraction
    –inter-object relationships, versions, …
  Active user community

Fedora: why
  Avoid reinventing the wheel
  Value in describing our work as profiles of, and additions to, a base repository
    –intellectual value
    –practical value: contribute back to Fedora community
  Generic Fedora-ADL connection

Fedora: to do
  Define “long-term preservation” profile
  Additions to Fedora
    –richer model of semantic definitions
    –support for geospatial data types
    –Archivas storage driver

NGDA project activities
  Research
    –considerations for long-term preservation
    –best practices
    –collection development, prioritization, and scope
    –architectural and economic models
    –rights issues
  Development
    –ADL federation
    –geospatial format/product registry
    –prototype archives: UCSB, Stanford, other, …
    –multiple levels of information preservation: bits, semantics, viewability, …

Study #1
  [Diagram: an object (data + metadata) is captured in 2005 and migrated, object to object, to 2105]
  Context surrounding the object: computing platform, semantics, terminology, provenance, provider quality, appropriate usage, community, environment
  How much is necessary? Capturable? Archivable? Affordable?
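
The question "how much is necessary?" can be made concrete with a small checklist. The following is only an illustrative sketch: the category names are copied as they appear on the slide, while `ArchivedObject` and `missing_context` are hypothetical names invented here and are not part of NGDA or Fedora.

```python
from dataclasses import dataclass, field

# Context categories as listed on the Study #1 slide.
CONTEXT_CATEGORIES = (
    "computing platform",
    "semantics",
    "terminology",
    "provenance",
    "provider quality",
    "appropriate usage",
    "community",
    "environment",
)


@dataclass
class ArchivedObject:
    """Hypothetical stand-in for an archived object (data + metadata)."""
    data: bytes
    metadata: dict
    # Maps a context category to whatever was captured for it.
    context: dict = field(default_factory=dict)


def missing_context(obj):
    """Return the slide's context categories that were not captured."""
    return [c for c in CONTEXT_CATEGORIES if not obj.context.get(c)]


if __name__ == "__main__":
    obj = ArchivedObject(
        data=b"...",
        metadata={"title": "example granule"},
        context={"semantics": "band definitions", "provenance": "processing history"},
    )
    print(missing_context(obj))
```

A survey such as the one proposed in Study #2 could then indicate which of the missing categories actually matter to later users.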

Study #2
  Survey people who have used (or tried to use) old geospatial data
    –What information did they need?
    –What was missing?
    –What would have been useful?

MODIS
  “Moderate Resolution Imaging Spectroradiometer”

MODIS challenges
  Size
    –2 petabytes; growing 1 TB/day
    –not backed up
  HDF file format
    –large, complex, long history
    –controlled/managed by NCSA
        source of funding
    –actual format is undocumented
        accessed through NCSA-provided software libraries
        reverse engineerable in principle, not in practice
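
To put the size figures in perspective, here is a back-of-the-envelope sketch. It assumes decimal units (1 PB = 1000 TB) and a constant growth rate of 1 TB/day; both are simplifying assumptions, and the function names are made up for illustration.

```python
TB_PER_PB = 1000  # assumption: decimal units


def archive_size_tb(days, start_pb=2.0, growth_tb_per_day=1.0):
    """Projected archive size in TB after `days`, assuming linear growth."""
    return start_pb * TB_PER_PB + growth_tb_per_day * days


def days_to_double(start_pb=2.0, growth_tb_per_day=1.0):
    """Days until the archive reaches twice its starting size."""
    return start_pb * TB_PER_PB / growth_tb_per_day


if __name__ == "__main__":
    print(days_to_double())      # 2000.0 days, roughly 5.5 years
    print(archive_size_tb(365))  # 2365.0 TB after one year
```

Under these assumptions the archive doubles in about five and a half years, so any backup or migration plan has to be sized for a moving target.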

MODIS challenges
  Raw data format
    –documented, but not publicly releasable
        includes satellite controls
  Attitude/ephemeris data format
    –“hard to find”
  Other format issues
    –packets, nested formats, ...

MODIS challenges
  Calibration/processing algorithms
    –key to data interpretation
    –documentation:
        initially described by “algorithm theoretical basis document” (ATBD); rapidly outdated, never updated
        journal articles
        Fortran/C source code is definitive
    –certain lookup tables re-calibrated monthly
    –moving to on-demand computation of products

MODIS challenges
  Other, related efforts
    –NASA committee(s) on long-term storage
    –NASA’s transition of operations to NOAA CLASS

MODIS
  Questions for you:
    –Is this within NGDA’s scope?
    –Is this within LoC’s scope?

Preservation as series of handoffs
  Chris Rusbridge
    –no such thing as a 100-year guarantee *
        “impossible perfection”
    –instead, a series of 10-year guarantees
  Jim Frew
    –“store-and-forward” model
    –analogous to Internet
  Greg’s conclusion
    –handoff/migration ability is key
  * except by LoC?
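
The handoff idea can be sketched as a chain of custodians, each holding a package for a fixed term and verifying it before passing it on. This is a toy model rather than anything described in the talk: `Package`, `hand_off`, `preserve`, the SHA-256 fixity check, and the custodian names are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical preserved package: data, metadata, and a fixity value."""
    data: bytes
    metadata: dict
    sha256: str = ""
    custody_log: list = field(default_factory=list)

    def __post_init__(self):
        # Record the checksum at ingest if none was supplied.
        if not self.sha256:
            self.sha256 = hashlib.sha256(self.data).hexdigest()


def hand_off(package, sender, receiver, year):
    """Transfer custody, refusing the transfer if the bits have changed."""
    if hashlib.sha256(package.data).hexdigest() != package.sha256:
        raise ValueError(f"fixity check failed before handoff to {receiver}")
    package.custody_log.append((year, sender, receiver))
    return package


def preserve(package, custodians, start_year, term_years=10):
    """Pass the package down the chain, one handoff per custody term."""
    year = start_year
    for sender, receiver in zip(custodians, custodians[1:]):
        year += term_years
        hand_off(package, sender, receiver, year)
    return package


if __name__ == "__main__":
    chain = [f"archive-{i}" for i in range(11)]  # 11 custodians, 10 handoffs
    pkg = preserve(Package(b"granule bytes", {"id": "example"}), chain, 2005)
    print(len(pkg.custody_log), pkg.custody_log[-1])
```

Ten successive 10-year terms starting in 2005 cover the span to 2105 without any single custodian promising a century. The sketch checks only bit-level integrity; preserving semantics and viewability across handoffs is the harder part.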