Ensuring Enduring Access: A Forum on Digital Preservation, July 21, 2009
Scan-converted Broadcast Image; Original SSTV Image
“Nafziger, who was in charge of the live TV recordings back in the Apollo years, said they were mostly thought of as data tapes. It wasn't his job to preserve history, he said, just to make sure the footage worked. In retrospect, he said he wished NASA hadn't reused the tapes.” -- Associated Press
Open Archival Information System – Reference Model
PREMIS Data Dictionary for Preservation Metadata
PREMIS Data Dictionary: Object – Object Identifier; Object Category; Preservation Level; Significant Properties; Object Characteristics; Creating Application; Original Name; Storage
PREMIS Data Dictionary: Event – Event Identifier; Event Type; Event Date/Time; Event Detail; Event Outcome; Linking Agent Identifier; Linking Object Identifier
PREMIS Data Dictionary: Agent & Rights – Agent Identifier; Agent Name; Agent Type
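The relationship between the Object, Event and Agent entities can be sketched as a small data model. This is a minimal illustration, not the PREMIS schema: the field names are taken from the semantic units listed on these slides, while the class names, the `record_event` helper and all sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class PreservationObject:
    # A subset of the Object semantic units named on the slide.
    identifier: str
    category: str
    original_name: str
    creating_application: str


@dataclass
class Agent:
    identifier: str
    name: str
    agent_type: str


@dataclass
class Event:
    identifier: str
    event_type: str
    date_time: datetime
    detail: str
    outcome: str
    # Events point at the agents and objects involved by identifier.
    linking_agent_identifiers: List[str] = field(default_factory=list)
    linking_object_identifiers: List[str] = field(default_factory=list)


def record_event(identifier: str, event_type: str, detail: str, outcome: str,
                 obj: PreservationObject, agent: Agent) -> Event:
    """Hypothetical helper: build an Event linked to one object and one agent."""
    return Event(
        identifier=identifier,
        event_type=event_type,
        date_time=datetime.now(timezone.utc),
        detail=detail,
        outcome=outcome,
        linking_agent_identifiers=[agent.identifier],
        linking_object_identifiers=[obj.identifier],
    )


# Sample values are invented for illustration only.
tape = PreservationObject("obj-1", "file", "reel_01.mov", "example-capture-tool")
archivist = Agent("agent-1", "Example Archivist", "person")
ingest = record_event("evt-1", "ingestion", "Initial deposit", "success",
                      tape, archivist)
```

The point of the linking identifiers is that an event never embeds the object or agent; it only refers to them, so the three record types can be stored and audited separately.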
Content Sustainability Factors: Disclosure; Adoption; Transparency; Self-documentation; External dependencies; Impact of patents; Technical protection mechanisms
Content Quality
Repositories: DSpace; Fedora; DAITSS; Archivists’ Toolkit; Archon
Format Registries: GDFR; PRONOM
Format Identification: JHOVE; DROID
DuraSpace; UDFR
Trustworthy Repositories Audit & Certification (TRAC)
Organizational Infrastructure: Governance & Organizational Viability; Organization Structure & Staffing; Procedural Accountability & Policy Framework; Financial Sustainability; Contracts, Licenses & Liabilities
Digital Object Management: Ingest: Acquisition of Content; Ingest: Creation of the Archival Package; Preservation Planning; Archival Storage & Preservation; Information Management; Access Management
Technologies, Technical Infrastructure & Security: System Infrastructure; Appropriate Technologies; Security
Digital Repository Audit Method Based on Risk Assessment (DRAMBORA)
Organizational Context – identify the repository’s role, and chart its goals and objectives
Policy & Regulatory Framework – provide evidence that the repository is aware of the societal, ethical, juridical and governance frameworks to which it is subject and that it operates appropriately within them
Activities, Assets & Owners – develop a conceptual model of what the repository does and how it does it by examining work processes, key assets & staff
Identify, Assess & Manage Risks – based on the preceding, identify pertinent risks faced by the repository, assess their likelihood and potential impact, and develop plans to eliminate or minimize them
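The last step, assessing each risk's likelihood and potential impact, can be sketched as a simple risk register. This is only an illustration under stated assumptions: the 1 to 5 scales, the choice to multiply likelihood by impact, the `Risk` and `rank_risks` names, and the sample risks are all hypothetical and are not taken from the DRAMBORA toolkit itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (catastrophic)

    def score(self) -> int:
        # Assumed scoring rule: severity is likelihood times impact.
        return self.likelihood * self.impact


def rank_risks(risks: List[Risk]) -> List[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score(), reverse=True)


# Invented sample register, for illustration only.
register = [
    Risk("Storage media failure", likelihood=3, impact=4),
    Risk("Loss of key staff", likelihood=2, impact=3),
    Risk("File format obsolescence", likelihood=4, impact=4),
]

for risk in rank_risks(register):
    print(f"{risk.score():>2}  {risk.description}")
```

A ranking like this tells you where to direct mitigation effort first; it says nothing about whether the repository is actually preserving its content.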
Too little: Between cultural memory sectors; Between content creators, publishers & cultural memory organizations; Between cultural memory organizations & users
Too much? The costs of collaboration
“I love standards, there are so many of them.”
E.g., here is a partial list of packaging formats being employed by preservation repositories today: BagIt; METS; XFDU; OAI-ORE; IMS-CP; SCORM; MPEG-21 DIDL; FOXML
OAIS Reference Model – Libraries are not data archives; our users are not data scientists
PREMIS – Devil in the missing details
Content – You don’t always get what you want
NLM tested 10 different digital preservation repository systems (D-Lib Magazine, May/June 2009) and rated Fedora the best. “The best” at this point includes:
No workflow for submission review
No virus checking on submitted content
No format validation on ingest
No coordination between deletion of content and deletion of associated metadata
No file migration
Extremely limited reporting
Weak maintenance facilities for adding new content, editing metadata, troubleshooting
No support for Z39.50, SRU/SRW, OpenURL or Z39.87
Format registries provide some useful technical information about file formats but they do not at the moment provide access to representation information in the OAIS sense of the word. The UDFR will be based on the existing PRONOM technical infrastructure, which provides even less support for representation information than the GDFR did. And you won’t see it until next year.
Reflections on Trusting Trust, Ken Thompson, Communications of the ACM, Vol. 27, No. 8, August 1984, pp
Current systems assume that metadata will be the basis on which people will evaluate the authenticity and integrity of preserved digital information. This assumes that you can trust the metadata to have maintained its authenticity and integrity. Which means you need metadata for your metadata. But to trust that metadata’s authenticity and integrity, you need metadata….
Lesson: Technology can’t solve everything.
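The regress can be made concrete with checksums. The sketch below assumes, purely for illustration, that the "metadata" is a small JSON record holding a SHA-256 digest of whatever it describes; the `wrap` and `verify` helpers and the sample byte strings are hypothetical and do not correspond to any particular repository system.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def wrap(payload: bytes) -> bytes:
    """Return a metadata record (as JSON bytes) asserting the payload's digest."""
    record = {"algorithm": "sha256", "digest": sha256_hex(payload)}
    return json.dumps(record, sort_keys=True).encode("utf-8")


def verify(payload: bytes, metadata: bytes) -> bool:
    """Check the payload against the digest claimed in its metadata record."""
    record = json.loads(metadata)
    return record["digest"] == sha256_hex(payload)


content = b"preserved digital object"
meta1 = wrap(content)   # metadata about the content
meta2 = wrap(meta1)     # metadata about the metadata

# Someone who can alter the content can usually rewrite meta1 to match it.
tampered = b"altered digital object"
forged_meta1 = wrap(tampered)

print(verify(tampered, forged_meta1))  # True: the first layer alone is fooled
print(verify(forged_meta1, meta2))     # False: caught, but only if meta2 is trusted
```

Each extra layer only moves the question up one level: the outermost record has to be trusted on some basis other than another checksum, which is an organizational matter rather than a technical one.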
TRAC – demonstrates whether others should trust your organization, but not whether you’re actually being successful in your mission.
DRAMBORA – helps you identify potential risks to materials and develop plans to address them, but doesn’t actually measure your performance.
We don’t have any longitudinal data on long-term maintenance costs and on data preservation/loss to develop metrics for what constitutes success.
Jerome McDonough, Graduate School of Library & Information Science, University of Illinois at Urbana-Champaign