Lifecycle Metadata for Digital Objects
November 15, 2004
Preservation Metadata
Definition from PADI
  Preservation metadata is intended to store technical details on the format, structure and use of the digital content, the history of all actions performed on the resource including changes and decisions, the authenticity information such as technical features or custody history, and the responsibilities and rights information applicable to preservation actions.
What is Preservation Metadata?
  Object stability (OAIS “content data object”)
    – What elements of the object’s content should be preserved? What is it? What is it for?
    – What functions of the object should be preserved?
    – (i.e., how can it remain itself into the future, and what do we mean by “itself”?)
  Environmental support (OAIS “environment”)
    – What kind of environmental characteristics does the object need to stay alive (software, hardware)?
    – (i.e., how do we specify its life support system?)
Object Stability I: Content
  Authenticity revisited: stability for what?
    – Access to genuine article
    – Historical truth
    – Guarantee of prior art
    – Intellectual property guarantee
  Range of attributes needed for each
    – What does “content” mean?
    – What is needed for it to remain “the same”?
Object Stability II: Functionality
  Static objects (e.g. text)
    – Is “content” enough?
    – Look and feel
  Dynamic objects (e.g. computer game)
    – Look and feel (“experiential” elements)
    – Connectivity
    – Interactivity
Environmental Support I: Emulation
  Making it possible to see the object as it was originally seen
  Making it possible for the object to function as it originally did
  Providing software support for that to happen
    – Running the original program (in an environment that emulates the original environment)
    – Running something that looks like (emulates) the original program
Environmental Support II: Migration
  Deciding what to migrate (deciding what to lose)
  Transformations to the object
    – If reversible, no need to keep original object (this is no longer acceptable: note its connection with space terror)
    – If not, retention of original object necessary
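The reversibility question on the migration slide can be made concrete with a round-trip check: migrate a sample, apply the reverse transformation, and compare the result with the original. The sketch below is illustrative only; `is_reversible` and the four small transcoding functions are hypothetical names invented for this example, not part of any preservation toolkit.

```python
from typing import Callable


def is_reversible(original: bytes,
                  migrate: Callable[[bytes], bytes],
                  reverse: Callable[[bytes], bytes]) -> bool:
    """Return True if reversing the migration reproduces the original bit for bit."""
    return reverse(migrate(original)) == original


# Hypothetical migrations, for illustration only.
def latin1_to_utf8(data: bytes) -> bytes:
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    return data.decode("utf-8").encode("latin-1")


def utf8_to_ascii_lossy(data: bytes) -> bytes:
    # Silently drops every character outside ASCII, so information is lost.
    return data.decode("utf-8").encode("ascii", errors="ignore")


def ascii_to_utf8(data: bytes) -> bytes:
    return data.decode("ascii").encode("utf-8")


lossless = is_reversible("café".encode("latin-1"), latin1_to_utf8, utf8_to_latin1)
lossy = is_reversible("café".encode("utf-8"), utf8_to_ascii_lossy, ascii_to_utf8)
```

A passing round trip on one sample does not prove that a migration is reversible for every object, which is one more reason to retain the original.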
Documentation requirements for preservation
  What the object was
  What the object is
  What happened in between
OAIS metadata model I
OAIS metadata model II
  SIP (send), AIP (archive), DIP (disseminate)
  Parts of an object
    – Content
    – Preservation description
        Reference (unique identifier)
        Provenance (history in and out of repository)
        Context (archival bond)
        Fixity (message digest)
    – Packaging
    – Descriptive
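As a rough illustration of the four preservation description parts, the sketch below models them as a small record and computes fixity as a message digest with Python's standard `hashlib`. The class name, field types, and the choice of SHA-256 are assumptions made for this example; OAIS itself does not prescribe a digest algorithm or a data structure.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreservationDescription:
    """Hypothetical record mirroring the four parts named on the slide."""
    reference: str                      # unique identifier
    context: str                        # archival bond to related material
    fixity: str                         # message digest of the content
    provenance: List[str] = field(default_factory=list)  # history notes


def compute_digest(content: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of the content (SHA-256 by default, an assumption here)."""
    return hashlib.new(algorithm, content).hexdigest()


def verify_fixity(content: bytes, recorded_digest: str,
                  algorithm: str = "sha256") -> bool:
    """True if the content still matches the digest recorded at ingest."""
    return compute_digest(content, algorithm) == recorded_digest


content = b"digital object bitstream"
pdi = PreservationDescription(
    reference="example-id-0001",        # made-up identifier
    context="part of example collection",
    fixity=compute_digest(content),
    provenance=["ingested into repository"],
)
unchanged = verify_fixity(content, pdi.fixity)
tampered = verify_fixity(content + b"!", pdi.fixity)
```

Here `unchanged` is true and `tampered` is false: any change to the bitstream, however small, yields a different digest.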
OAIS metadata model II
  What is “representation information”?
    – How much must be kept?
    – Monitoring changes
  What is the “knowledge base”?
    – Designated User Community
    – How do ontologies and the Semantic Web fit here? Remember the DUC will need access through automated tools (cf. metadata and software registries)
    – Where does bootstrapping stop?
    – DUC as “the public” also means a much broader universe of discourse
NEDLIB I: object layers
  Significant focus on emulation
  Part of OAIS “context” here
  OAIS model dictates layered view of original object (NEDLIB uses format + program)
    – Physical (storage format + hardware dependencies)
    – Binary (file system + operating system)
    – Structure (representation of human-viewable object in digital environment + interpreter)
    – Object (format of object + routines to interpret)
    – Application (needed to render the object)
NEDLIB II: adding rest of OAIS
  Reference (identifier)
  Fixity (how to know it’s the same)
  Context (parts) + provenance = change history
  Most of this is summed up in the change history
    – Date
    – Old version
    – New version
    – Tool
    – Reverse
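The five change-history fields listed above map naturally onto a small record type. The following is a minimal sketch, not NEDLIB's actual schema: the class and function names are invented, and reading "Reverse" as the name of a tool that can undo the change is an assumption.

```python
import datetime
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One change-history entry with the five fields from the slide."""
    date: datetime.date
    old_version: str
    new_version: str
    tool: str
    reverse: Optional[str]  # assumed: tool that undoes the change, if any


def lineage(history: List[ChangeEvent]) -> List[str]:
    """Version identifiers in chronological order, starting with the oldest."""
    ordered = sorted(history, key=lambda event: event.date)
    if not ordered:
        return []
    return [ordered[0].old_version] + [event.new_version for event in ordered]


def fully_reversible(history: List[ChangeEvent]) -> bool:
    """True only if every recorded change names a way to reverse it."""
    return all(event.reverse is not None for event in history)


# Made-up history for illustration.
history = [
    ChangeEvent(datetime.date(2003, 6, 2), "v2", "v3", "converter-b", None),
    ChangeEvent(datetime.date(2001, 3, 1), "v1", "v2", "converter-a", "converter-a-undo"),
]
versions = lineage(history)
can_undo_all = fully_reversible(history)
```

With this sample history, `versions` is `["v1", "v2", "v3"]` and `can_undo_all` is false, because the second migration records no reverse step.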
CEDARS preservation metadata thinking
  Distributed archives preservation project
  Development of representation network
  Formal development of “significant properties” idea
    – Functionality required by viewers
    – Always retain original!!
  Migration on request for end users
National Library of Australia preservation metadata
  See table containing 25 elements
  Note that many elements have subelements
  Note influence of notion of versioning
OCLC/RLG preservation metadata thinking
  Attempt to provide summary/unification of state of play
  See recommended metadata set: mwg/pm_framework.pdf
  Ultimately served to underlie OCLC implementation
How does METS fit here?
Harvard Digital Repository Service XML example
  OCLC/RLG 2001 white paper examples
    – DRS instruction file
        Note that the file contains “instructions” in the form of names of actions
        This instruction file assumes that instructions are executed at ingest
    – DRS DTD for images
  Link to the DRS overview description:
DSpace and friends DSpace as a framework Version 1.2 Roadmap SDSC alliance
PREMIS project (OCLC/RLG)
  PREMIS (Preservation Metadata: Implementation Strategies), 2003
  Elements working group for preservation metadata core (report due end 2004): ore_elements.htm
  Implementation subgroup polled for best practices
  22 institutions have implemented repositories; only 11 have preservation strategies in place (see summary report in RLG DigiNews, October 15, 2004)
Other ideas
  Why has there been so little progress?
  Thinking through loss in the cultural record
    – Where/how have greatest losses happened?
    – What happened as a result?
    – What does it mean to have an adequate record?
  “Good enough” fixity
    – Peer-to-peer schemes (LOCKSS)
    – Using “evidence” concept to restrict authenticity requirements
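One way to picture "good enough" fixity in a peer-to-peer setting is a simple majority vote over the digests of replicas held by different peers. The sketch below is a deliberately simplified illustration and not the actual LOCKSS polling protocol; the function name, peer labels, and use of SHA-256 are all assumptions.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def poll_replicas(replicas: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Return the majority digest and the peers whose copies disagree with it.

    Raises ValueError if there are no replicas or no strict majority.
    """
    if not replicas:
        raise ValueError("no replicas to poll")
    digests = {peer: hashlib.sha256(data).hexdigest()
               for peer, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        raise ValueError("no majority among replicas")
    dissenters = sorted(peer for peer, digest in digests.items()
                        if digest != winner)
    return winner, dissenters


# Made-up replicas: one peer holds a damaged copy.
replicas = {
    "peer-a": b"the preserved object",
    "peer-b": b"the preserved object",
    "peer-c": b"the preserved 0bject",
}
agreed_digest, damaged_peers = poll_replicas(replicas)
```

In this example `damaged_peers` is `["peer-c"]`. The design choice is that no single stored digest is trusted; agreement among independent copies stands in for it.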