Digital Preservation
Dale Flecker, Stephen Abrams
February 15, 2007
HUL University Library Council

Agenda
I. The problem
II. What has Harvard been doing?
III. What more do we need to do?

I. The problem …

… is twofold: keeping the bits, and keeping the bits useful.

Keeping the bits
Digital things are amazingly easy to destroy!
– Bad guys want to do damage
– Hardware/software fails
– People make mistakes
The slip of a finger, or an unnoticed consequence of change, happens easily and can be catastrophic.

Destruction is not always apparent
Data that is not used regularly is always at risk of unintended and unnoticed damage. (Note that archival copies can be pretty invisible…)

Keeping bits useful
Digital materials are fragile! They depend on technologies for their vitality, and those technologies age and disappear rapidly.

Fragility
Using digital content requires mediation by hardware and software.
Hardware and software must understand the format of the content.
Hardware and software technologies change continually …

Fragility
Old technology will break.
New technology frequently does not understand old formats.

II. What has Harvard been doing? Internally …

Digital Repository Service (DRS)
Secure, professionally managed environment
– Manage data rigorously, with discipline, and in accordance with community best practices
Redundant, heterogeneous, distributed storage with periodic media migration …
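The slide does not say how DRS compares its redundant copies. Here is a minimal sketch. It assumes each storage site can return the bytes of its copy. The repository computes a SHA-256 digest per copy and treats the digest held by a strict majority as authoritative. The site names, the function name, and the majority rule are illustrative assumptions, not DRS behavior.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare copies of one object held at different (hypothetical) storage sites.

    Returns the digest shared by a strict majority of copies and the sorted
    list of sites whose copy disagrees with it.
    """
    if not replicas:
        raise ValueError("no replicas to compare")
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority we cannot tell which copy is the damaged one.
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority among replicas; investigate manually")
    damaged = sorted(site for site, digest in digests.items() if digest != majority_digest)
    return majority_digest, damaged


replicas = {"site-a": b"page image", "site-b": b"page image", "site-c": b"page imagf"}
print(audit_replicas(replicas)[1])  # ['site-c']
```

Heterogeneity is what makes such a vote meaningful. Copies on different media and systems are less likely to be damaged in the same way at the same time.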

Digital Repository Service (DRS)
Know what data you have
– What are the logical objects (“works”, not files)?
– What are the technical characteristics of those objects?
Check the data continuously
Manage access to stored objects
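“Check the data continuously” is commonly done by periodic fixity checking, which is how damage that is not otherwise apparent gets noticed. Here is a minimal sketch. It assumes the repository keeps a manifest of SHA-256 digests recorded at ingest. The manifest layout and function names are hypothetical, not DRS internals. The audit re-digests every file and reports what is missing, changed, or unexpected.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Digest a file in fixed-size chunks so large masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(manifest.keys() - current.keys()),
        "changed": sorted(k for k in manifest.keys() & current.keys() if manifest[k] != current[k]),
        "unexpected": sorted(current.keys() - manifest.keys()),
    }
```

In practice the audit would run on a schedule, and its results would themselves be recorded as preservation metadata.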

Format
Understanding formats is fundamental to preservation
Bytes: ffd8ffe000104a ffed0fb050686f74 6f73686f e d 03e90a e e666f f40240ffeeffee fc d f d03ed0a f6c f 6e a
As JPEG segments: SOI APP0 JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 183x512 DRI DHT SOS ECS0 RST0 ECS1 RST1 ECS2...
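The decoded labels on this slide (SOI, APP0, DQT, SOF0, SOS, ...) come from reading the raw bytes as JPEG marker segments. Here is a minimal sketch of that reading. It assumes a well-formed JPEG and ignores fill bytes. The function walks the length-prefixed segments from SOI up to SOS. The function name is hypothetical, and this is not how JHOVE or DRS implements it.

```python
import struct

# Names for the marker codes decoded on the slide (values from ITU-T T.81 and JFIF).
MARKER_NAMES = {
    0xC0: "SOF0", 0xC4: "DHT", 0xD8: "SOI", 0xDA: "SOS", 0xDB: "DQT",
    0xDD: "DRI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
}


def jpeg_header_segments(data: bytes) -> list[str]:
    """List marker names from SOI up to and including SOS (start of scan)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte offset {pos}")
        marker = data[pos + 1]
        names.append(MARKER_NAMES.get(marker, f"0x{marker:02X}"))
        if marker == 0xDA:
            break  # entropy-coded data (ECS0, RST0, ...) follows; not walked here
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return names
```

The same bytes mean nothing without this knowledge of the format. That is the slide's point.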

Format
Formats vary significantly in their “preservability”.
Keeping multiple versions of a given piece of content for different purposes is frequently wise
– E.g., archival master, production master, use copy

Format
Some criteria for “preservability” (from LC)
– Disclosure (how well documented?)
– Adoption (how widely used?)
– Transparency (is compression used?)
– Self-documentation (good!)
– External dependencies (self-sufficiency is good)
– Patents (could limit preservation actions)
– DRM/encryption (what if the decryption key is not available?)
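One way to put the LC criteria to work is as a checklist applied to each candidate format. Here is a minimal sketch. It assumes simple yes/no answers, although real assessments are rarely this binary. The class and field names are hypothetical. The function reports each criterion a format fails and leaves the trade-off decision to a person.

```python
from dataclasses import dataclass


@dataclass
class FormatProfile:
    """Hypothetical yes/no answers to the LC criteria for one format."""
    name: str
    well_documented: bool        # Disclosure
    widely_used: bool            # Adoption
    compressed: bool             # Transparency
    self_documenting: bool       # Self-documentation
    external_dependencies: bool  # External dependencies
    patent_encumbered: bool      # Patents
    drm_or_encryption: bool      # DRM/encryption


def preservation_concerns(fmt: FormatProfile) -> list[str]:
    """Return a human-readable concern for each criterion the format fails."""
    checks = [
        (not fmt.well_documented, "Disclosure: specification not openly documented"),
        (not fmt.widely_used, "Adoption: few tools and users"),
        (fmt.compressed, "Transparency: compression hides content from simple tools"),
        (not fmt.self_documenting, "Self-documentation: metadata must be kept elsewhere"),
        (fmt.external_dependencies, "External dependencies: needs other resources to render"),
        (fmt.patent_encumbered, "Patents: may limit preservation actions"),
        (fmt.drm_or_encryption, "DRM/encryption: content lost if keys are unavailable"),
    ]
    return [message for failed, message in checks if failed]
```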

Metadata
The basis of decision-making for preservation
– Technical metadata: What format is this in? What format options are used?
– Structural metadata: If I change this, what else is affected? …

Metadata
– Administrative metadata: Who has the right to make decisions about this?
– Relationship metadata: Are there other versions of this object? How do these affect my preservation strategy?
– Provenance metadata: Where did this come from? What changes has it already undergone?
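Structural metadata lets a repository answer “If I change this, what else is affected?” Here is a minimal sketch. It assumes the structural metadata has been reduced to a hypothetical mapping from each object to the objects that incorporate it. A breadth-first walk then collects everything touched, directly or indirectly, by one change.

```python
from collections import deque


def affected_by_change(changed: str, used_by: dict[str, list[str]]) -> set[str]:
    """Return every object that directly or indirectly incorporates `changed`."""
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in used_by.get(current, []):
            if parent not in affected:  # also guards against cycles
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical structure: a page image belongs to a book, which belongs to a series.
used_by = {"page-001.tif": ["book-1"], "book-1": ["series-A"], "cover.jpg": ["book-2"]}
print(sorted(affected_by_change("page-001.tif", used_by)))  # ['book-1', 'series-A']
```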

Guidelines for “preservable” objects
The least expensive and most effective preservation measure is to think about the future when an object is created! (Guidelines on format, metadata, archival masters, etc.)

JHOVE (JSTOR/Harvard Object Validation Environment)
A widely used tool for format identification, validation, and characterization.

JHOVE (JSTOR/Harvard Object Validation Environment)
When an object is ingested:
– Determine its format (“identify”)
– Ensure that it is properly formed (“validate”)
– Extract meaningful technical metadata (“characterize”)
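JHOVE itself is a Java tool with per-format modules, and what follows is not its API. It is a minimal sketch of the same three steps for one format, PNG, chosen because its header is small. The sketch identifies the format by its eight-byte signature. It validates the first (IHDR) chunk's length and CRC. It characterizes the image by reading width, height, bit depth, and color type. A real validator checks far more than the first chunk, and the function names are hypothetical.

```python
import struct
import zlib
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def identify(data: bytes) -> Optional[str]:
    """Step 1: name the format from its magic number, or None if unknown."""
    return "PNG" if data.startswith(PNG_SIGNATURE) else None


def validate(data: bytes) -> bool:
    """Step 2: check that the first chunk is a 13-byte IHDR with a correct CRC."""
    if identify(data) != "PNG" or len(data) < 33:
        return False
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if length != 13 or chunk_type != b"IHDR":
        return False
    (stored_crc,) = struct.unpack(">I", data[29:33])
    # The CRC covers the chunk type and chunk data, not the length field.
    return zlib.crc32(data[12:29]) == stored_crc


def characterize(data: bytes) -> dict[str, int]:
    """Step 3: extract technical metadata from IHDR (call only after validate)."""
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {"width": width, "height": height, "bit_depth": bit_depth, "color_type": color_type}
```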

DRS: what’s managed today
As of January 2007: 5.6M files and 22 TB, excluding Google and web archiving
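For a sense of scale, the slide's figures imply an average of a few megabytes per file, assuming decimal units (1 TB = 10^12 bytes):

```latex
\frac{22\ \text{TB}}{5.6 \times 10^{6}\ \text{files}}
  = \frac{22 \times 10^{12}\ \text{bytes}}{5.6 \times 10^{6}\ \text{files}}
  \approx 3.9 \times 10^{6}\ \text{bytes per file}
  \approx 3.9\ \text{MB per file}
```

This is only a mean. It says nothing about how file sizes are distributed.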

II. What has Harvard been doing? Externally …

E-journal archiving
“How can we ensure that licensed e-journal content will remain usable over time?”
Mellon-funded study
Explored technical formats, content types, transactions and dataflows, validation, systems requirements, contractual requirements, and business models
Harvard’s proposed model was largely implemented by Portico

Technical Metadata for Digital Still Images
“What are the appropriate technical metadata necessary for the preservation of images?”
Standardized as NISO Z39.87
Expressed in the MIX schema
– Maintained by LC
The basis for DRS image technical metadata

METS (Metadata Encoding and Transmission Standard)
“Is there a generic packaging form for digital content?”
For example:
– Digital books
– Audio works
– Images (archival master, production master, deliverables)
Useful for exchange of objects between repositories
Maintained by LC
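Here is a minimal sketch of the packaging idea. It assumes one image “work” with an archival master and two deliverables; the file names and the TYPE value are hypothetical. It builds only the METS file section and the structural map, which ties the physical files back to the logical work. A real package would add descriptive and administrative sections and be validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(files_by_use: dict[str, list[str]], work_type: str = "image") -> str:
    """Package files grouped by purpose (e.g. archival master) as one logical work."""
    root = ET.Element(mets("mets"))
    file_sec = ET.SubElement(root, mets("fileSec"))
    struct_map = ET.SubElement(root, mets("structMap"))
    work = ET.SubElement(struct_map, mets("div"), {"TYPE": work_type})
    counter = 0
    for use, hrefs in files_by_use.items():
        group = ET.SubElement(file_sec, mets("fileGrp"), {"USE": use})
        for href in hrefs:
            counter += 1
            file_id = f"FILE{counter:04d}"
            entry = ET.SubElement(group, mets("file"), {"ID": file_id})
            ET.SubElement(entry, mets("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
            # Point the logical work at each physical file.
            ET.SubElement(work, mets("fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="unicode")


print(build_mets({"archival master": ["master.tif"], "deliverable": ["thumb.jpg", "view.jpg"]}))
```

Grouping files by USE is what lets one package carry the archival master, production master, and deliverables together.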

Core audio metadata
“What are the appropriate technical metadata necessary for the preservation of audio?”
Standardized as AES X-098
Used as the basis for DRS audio technical metadata

PDF/A
“PDF defines too many options; is there a ‘flavor’ that will be more ‘preservable’ over time?”
Requires, recommends, and restricts PDF functionality to enhance preservability
Standardized as ISO 19005

PREMIS (PREservation Metadata: Implementation Strategies)
“What are the general metadata elements necessary to preserve digital content over time?”
OCLC/RLG-sponsored working group
Recommendations and best practices for preservation metadata
– Core elements, data dictionary, implementation strategies, cooperative projects …

PREMIS (PREservation Metadata: Implementation Strategies)
Reports on current practices and on recommended metadata elements are available
Maintained by LC
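PREMIS models changes to an object as Events linked to Objects and Agents. Here is a minimal sketch that assumes an append-only, in-memory log. Each record borrows a few PREMIS event semantic-unit names (eventType, eventDateTime, eventOutcome, and the linking identifiers), flattened for brevity. The log can then answer the provenance question “What changes has it already undergone?” This is not a schema-valid PREMIS serialization, and the helper names and identifiers are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def record_event(log: list[dict], event_type: str, object_id: str,
                 agent_id: str, outcome: str, detail: str = "") -> dict:
    """Append a PREMIS-style event (flattened, not schema-valid) to the log."""
    event = {
        "eventIdentifier": str(uuid.uuid4()),
        "eventType": event_type,
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetail": detail,
        "eventOutcome": outcome,
        "linkingAgentIdentifier": agent_id,
        "linkingObjectIdentifier": object_id,
    }
    log.append(event)
    return event


def history(log: list[dict], object_id: str) -> list[str]:
    """Event types recorded for one object, in the order they were logged."""
    return [e["eventType"] for e in log if e["linkingObjectIdentifier"] == object_id]


log: list[dict] = []
record_event(log, "ingestion", "obj-1", "agent:repository", "success")
record_event(log, "fixity check", "obj-1", "agent:repository", "success")
record_event(log, "ingestion", "obj-2", "agent:repository", "success")
print(history(log, "obj-1"))  # ['ingestion', 'fixity check']
```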

AIHT (Archive Ingest and Handling Test)
“What difficulties can we expect to arise during the exchange of content between heterogeneous repositories?”
LC-funded project to investigate exchange of complex data between preservation repositories
Harvard, Stanford, Johns Hopkins, and Old Dominion ingest and exchange web archive data

GDFR (Global Digital Format Registry)
“What will we need to know in the future about formats in use today, and how will we know it?”
Shared registry of preservation-related information about technical formats
Reduces the work repositories must do to create and maintain information about the objects they ingest …

GDFR (Global Digital Format Registry)
Enables sharing of format expertise
Directed by Harvard, implemented by OCLC
Funded by the Mellon Foundation
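At its simplest, a shared registry lets every repository identify formats the same way instead of each maintaining its own tables. Here is a minimal sketch using a hypothetical in-memory registry. Its identifiers are made up and are not GDFR identifiers. Each record pairs a format with its magic number and a pointer to its specification, and identification picks the longest matching signature, so a more specific entry wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatRecord:
    registry_id: str   # hypothetical identifier scheme
    name: str
    signature: bytes   # magic number expected at byte offset 0
    specification: str


REGISTRY = [
    FormatRecord("example/jpeg", "JPEG", b"\xff\xd8\xff", "ITU-T T.81 | ISO/IEC 10918-1"),
    FormatRecord("example/png", "PNG", b"\x89PNG\r\n\x1a\n", "ISO/IEC 15948"),
    FormatRecord("example/pdf", "PDF", b"%PDF-", "Adobe PDF Reference"),
    FormatRecord("example/pdf-1.4", "PDF 1.4", b"%PDF-1.4", "Adobe PDF Reference"),
    FormatRecord("example/tiff-le", "TIFF (little-endian)", b"II*\x00", "TIFF Revision 6.0"),
]


def identify(data: bytes, registry: list[FormatRecord] = REGISTRY) -> Optional[FormatRecord]:
    """Return the most specific registry entry whose signature matches, if any."""
    matches = [r for r in registry if data.startswith(r.signature)]
    return max(matches, key=lambda r: len(r.signature), default=None)
```

A real registry would hold much more per format, such as risks, tool support, and documentation history. That richer information is what makes sharing it worthwhile.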

Registry of Digital Masters
“How can I find out who has accepted archival responsibility for a given piece of content?”
Initially reformatted materials, with the intention to expand to born-digital content
DLF project
Implemented by and housed at OCLC

Repository certification
“Why should a collection manager trust a digital repository?”
RLG/OCLC report on Trusted Repository Attributes
RLG/NARA Digital Repository Certification Task Force …

Repository certification
Recommend the structure and metrics of an international process for certifying preservation repositories
– Organizational role and structure, staff size and skill, formal operations and documentation, appropriate technical infrastructure and facilities, ongoing funding, a “hand-off” plan, etc.
CRL Auditing and Certification project

Key activities elsewhere
ISO OAIS (Open Archival Information System)
LC NDIIPP (National Digital Information Infrastructure and Preservation Program)
Web archiving (IA, IIPC)
NARA ERA (Electronic Records Archives)
Digital Curation Centre
PLANETS

III. What more do we need to do?

Evolution: from projects to program
Digital preservation requires a continual, proactive program
– You can’t just stop and start
– Time frames are MUCH shorter than for preservation of physical collections
Need to define the scope and role of our preservation efforts
Investment required in both technology and staffing

Preservation lifecycle
Creation
– Format and technical specification choices
– Accompanying metadata
– Packaging for ingest
Ingest
– Validation
– Normalization …

Preservation lifecycle
Assumption of preservation responsibility
Monitoring
– When is intervention necessary? Changes to the technical environment; changes to user expectations
Planning
– Significant properties: all preservation decisions involve choice; how do we choose what to preserve? …

Preservation lifecycle
Intervention (preserving usability)
– Re-acquisition
– Re-generation from an archival master
– Migration before necessary (“just in case”)
– Migration at point of request (“just in time”)
– Emulation of obsolete technology in a contemporary environment
– Universal Virtual Computer (UVC): rewrite necessary software to run on a technology-agnostic “virtual” computer …

Preservation lifecycle
Intervention (continued)
– Save for digital archeologists
After intervention
– Post-intervention quality assurance
– Documenting the process of change
Succession planning
– What do we do when we want to get out of the repository business?
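The “after intervention” steps can be made concrete for a simple case. Here is a minimal sketch. It assumes plain-text content whose significant property is its exact character sequence; that choice is hypothetical, since real significant properties are a curatorial decision. The function migrates between character encodings, verifies that the characters survived, and returns a record documenting the change. The function name and record fields are illustrative.

```python
import hashlib


def migrate_text(data: bytes, source_encoding: str, target_encoding: str) -> tuple[bytes, dict]:
    """Re-encode text, then check that no characters were lost or altered."""
    original = data.decode(source_encoding)
    # errors="replace" lets a lossy migration complete so that QA can catch it.
    migrated = original.encode(target_encoding, errors="replace")
    # Post-intervention quality assurance on the significant property.
    if migrated.decode(target_encoding) != original:
        raise ValueError(f"QA failed: {target_encoding} cannot represent the content")
    # Documenting the process of change.
    record = {
        "action": "migration",
        "from": source_encoding,
        "to": target_encoding,
        "source_sha256": hashlib.sha256(data).hexdigest(),
        "result_sha256": hashlib.sha256(migrated).hexdigest(),
        "characters": len(original),
    }
    return migrated, record
```

Migrating Latin-1 text to UTF-8 passes the check. Migrating "café" to ASCII fails it, which is exactly the kind of silent loss post-intervention QA exists to catch.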

Staffing and responsibilities
Technical
– Infrastructure maintenance
– Monitoring technological change
– Integration into the larger preservation environment
– Preservation planning
Curatorial
– Preservation intervention will involve trade-offs: What attributes need to be preserved?
– Cost/benefit analysis

Immediate challenges
Google
– Substantial increase in scale (both number and size)
– “Dark” content; no expectation of current access
Web archiving
– Explosion of data types
– No forethought on format selection and technical specifications
– No metadata
– Some failure may be inevitable

Coming soon?
Institutional repository (IR) to enhance scholarly communication and preserve scholarly creations
– Similar to web archiving: objects are not typically created with preservation in mind, nor accompanied by metadata
“Just in case” local copies of licensed content
– May necessitate more sophisticated IPR management

Longer-term issues
Economics: What can we afford to preserve?
Scale: How much can we preserve?
Selection: What do we leave for others?
Federation: Can we share responsibilities for preservation?
– Copies in independent environments are safest
Certification: Do we need formal certification?
– Note Section 108 revision
Education: Who at Harvard needs to understand?