Preserving Digital Collections
Andrea Goethals
Florida Center for Library Automation (FCLA)
Outline
- FCLA?
- The Motivation to Preserve
- Preservation Key
- FCLA Digital Archive
- Digital Preservation Infrastructure
The FCLA …
- Has 46 full-time staff
- Provides centralized automation support for over 50 libraries at Florida’s 10 public universities
- Is attached to UF only administratively
- Runs the largest central ILS in the US
Motivation to Preserve
- Are we living in the “Digital Dark Age”?
  - BBC’s 1986 Domesday Book vs. original 1086 Domesday Book
- Amount of digital information
  - 93% of world info produced in 1999 was in digital form (UC Berkeley 2000 study: “How Much Information”)
- Rate of technological change
  - ‘Antique’ if 15+ years old
Source: oldcomputers.net
Preservation Key: Fault-tolerance through Redundancy
[Figure: three network topologies: Centralized (A), Decentralized (B), Distributed (C). Source: Dodge, 2003]
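Redundancy only gives fault tolerance if the copies are compared from time to time and a damaged copy is replaced from a good one. The following is a minimal sketch of that idea, not a description of how the FDA works: the site names, the function names `audit_copies` and `repair`, the choice of SHA-256, and the majority-vote rule are all assumptions made for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Compare redundant copies of one file.

    Returns the digest held by the largest group of sites and the
    sorted names of the sites whose copy disagrees with it.
    """
    digests = {site: sha256_hex(data) for site, data in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(site for site, d in digests.items() if d != majority)
    return majority, damaged


def repair(copies: Dict[str, bytes], damaged: List[str]) -> Dict[str, bytes]:
    """Overwrite each damaged copy with the bytes of an undamaged one."""
    good = next(data for site, data in copies.items() if site not in damaged)
    return {site: (good if site in damaged else data)
            for site, data in copies.items()}


# Hypothetical example: three sites, one of which has a corrupted byte.
copies = {"site_a": b"hello", "site_b": b"hello", "site_c": b"hellp"}
_, damaged = audit_copies(copies)
repaired = repair(copies, damaged)
```

With a single copy (the centralized case) there is nothing to compare against, so silent corruption cannot be detected this way; with three or more copies a single bad copy can be both detected and repaired.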
FCLA Digital Archive (FDA)
- Operational Fall 2003 - ?
- Funding help from IMLS
- Goals:
  - Establish a working digital preservation archive for the use of the libraries of FL’s public universities
  - Identify costs involved with sufficient granularity to support reasonable cost-recovery pricing
  - Disseminate tools, procedures and results for the widest national impact
FDA Preservation Approach
- Still in flux (!)
- Dark archive
- Automated – DAITSS (Dark Archive In The Sunshine State)
- 2 levels of preservation (bit-level, full)
- Always keep the originals
- Plan for migrations from the very beginning
- Combination of ‘traditional’ format migration, migration on request, and normalization (archival data formats, converting to standards, canonicalization)
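The two preservation levels and the keep-the-originals rule can be sketched as a small planning function. This is an illustration under stated assumptions, not DAITSS code: the policy table `NORMALIZATION_TARGETS`, the PDF-to-TIFF pairing in it, the `ArchivedFile` record and the naming of the derived version are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical policy table: formats that receive full preservation,
# mapped to an illustrative normalization target format.
NORMALIZATION_TARGETS: Dict[str, str] = {"pdf": "tiff"}


@dataclass
class ArchivedFile:
    name: str
    fmt: str
    # Every version held by the archive; the original is always first.
    versions: List[str] = field(default_factory=list)


def plan_preservation(name: str, fmt: str) -> Tuple[str, ArchivedFile]:
    """Decide the preservation level for one submitted file.

    Formats with a known normalization target get "full" preservation
    and a planned normalized version; everything else gets "bit-level"
    preservation. The original is kept in both cases.
    """
    record = ArchivedFile(name=name, fmt=fmt, versions=[name])
    target = NORMALIZATION_TARGETS.get(fmt.lower())
    if target is None:
        return "bit-level", record
    stem = name.rsplit(".", 1)[0]
    record.versions.append(f"{stem}.normalized.{target}")
    return "full", record


level, record = plan_preservation("thesis.pdf", "pdf")
```

Because the normalized version is added next to the original rather than replacing it, a later and better migration path can always start again from the submitted file.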
FDA Business Plan
- Free! (Until the end of the grant period, then cost-recovery)
- All data contributions through libraries
- Libraries are our customers, we are building in customer options
FDA Ingest Example
[Diagram built up over three slides; only the labels survive.]
- Step 1: CIP with XML, PDF and AVI files
- Step 2: SIP and CIP, with additional XML files alongside the PDF and AVI
- Step 3: AIP, SIP and CIP, with further XML files, TIFF files and database records alongside the PDF and AVI
Digital Preservation Infrastructure
- Commonly-accepted terminology
  - OAIS Model (Partially helpful to us)
- Good Typology of Preservation Strategies
  - Thibodeau’s matrix
- Preservation Metadata
  - METS (LOC) - Technical metadata still developing
  - NISO Technical Metadata for Digital Still Images (MIX schema)
  - LOC A/V prototyping project
  - PREMIS (OCLC, RLG)
[Figure: matrix of preservation strategies; only the labels survive.]
- Axes: APPLICABILITY (General to Specific); OBJECTIVE (Preserve Technology to Preserve Objects)
- Strategies shown: Persistent Archives; Universal Virtual Computer; Object Interchange Format; Typed Object Conversion; Virtual Machine; Rosetta Stone Translation; Format Standardization; Emulation; Re-engineer Software; Programmable Chips; Version Migration; Viewer; Maintain original technology
Source: Thibodeau, 2002.
Digital Preservation Infrastructure
- File Format Knowledge-base
  - Format info (Global Format Registry, PRONOM, FCLA)
  - Archived specifications
  - Recommended submitted formats and why?
  - Recommended migrations and normalizations (FCLA, Nat’l Archives of Australia)
  - Migration Experiments (Harvard)
- Economic information for preservation
  - ‘Fair’ billing models (effect of formats, preservation strategies)
Digital Preservation Infrastructure
- Digital Archive Software
  - Open-source archive software: Fedora, DAITSS, DSpace
  - Software for automatic format recognition, technical metadata extraction
  - What can read this format? (PRONOM, Global Format Registry?)
  - File format converters: open source software (Ghostscript, etc.)
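Automatic format recognition usually starts from the first bytes of a file rather than from its extension. The sketch below shows the idea for the four formats that appear in the ingest example (XML, PDF, AVI, TIFF). It is a deliberately small illustration: the function name `identify_format` is hypothetical, the signature list is far from complete, and it does not reflect how DAITSS, PRONOM or the Global Format Registry work internally.

```python
def identify_format(header: bytes) -> str:
    """Guess a file format from its leading bytes (magic numbers)."""
    # PDF files begin with "%PDF-" followed by the version number.
    if header.startswith(b"%PDF-"):
        return "PDF"
    # TIFF: "II" (little-endian) or "MM" (big-endian), then the number 42.
    if header.startswith((b"II*\x00", b"MM\x00*")):
        return "TIFF"
    # AVI is a RIFF container: "RIFF", a 4-byte size, then the type "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    # XML: an XML declaration, optionally preceded by a UTF-8 byte order mark.
    text = header[3:] if header.startswith(b"\xef\xbb\xbf") else header
    if text.lstrip().startswith(b"<?xml"):
        return "XML"
    return "unknown"


# Reading only the first few bytes of a file is enough for these checks,
# for example: identify_format(open(path, "rb").read(16))
```

A result of "unknown" is itself useful to an archive: it flags a file that can only be given bit-level treatment until its format has been identified.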
Dodge, Martin. An Atlas of Cyberspaces, 2003. http://www
DSpace http://www.dspace.org
FCLA Digital Archive / DAITSS http://www.fcla.edu/digitalArchive/
Fedora Project http://www.fedora.info
Global Registry for Digital Format Representation Information http://hul.harvard.edu/formatregistry/
Reference Model for an Open Archival Information System (OAIS) http://ssdoo.gsfc.nasa.gov/nost/isoas/
Library of Congress A/V Prototyping Project http://lcweb.loc.gov/rr/mopic/avprot/metsmenu2.html
METS http://www.loc.gov/standards/mets/
MIX http://www.loc.gov/standards/mix/
National Archives of Australia http://www.naa.gov.au/recordkeeping/preservation/digital/summary.html
PREMIS http://www.oclc.org/research/pmwg/
PRONOM http://www.pro.gov.uk/about/preservation/digital/pronom/default.htm
Thibodeau, Kenneth. “Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years”, The State of Digital Preservation: An International Perspective, Conference Proceedings, July 2002.