Digital Preservation Andrea Goethals Wendy Gogel From Harvard University Library NELA 18 October 2010
Digital Preservation 1. Why digital preservation? 2. What’s the problem? 3. What’s being done? 4. What can you do? 5. Questions?
1. Why digital preservation?
Everything is digital 1957 first digital image 1969 ARPAnet 1971 first email sent 1972 first video game 1998 first digital theatrical release
Digital content may be… “… after 12:00 noon January 20, 2001, the National Archives and Records Administration ("NARA") shall have sole legal custody of all Clinton-Gore Administration electronic mail records that are governed by the Presidential Records Act ("PRA"), 44 U.S.C. 2201,…” Memorandum of Understanding between NARA and The Executive Office of the President, dated January 11, 2001 accessed Oct at: …historically significant.
Digital content may be… …your favorite song, or your favorite movie.
Digital content may be… Harvard Magazine May/June 2009 …the only version.
Digital content may be… …a work of art. Doug Aitken. (American, born 1968). sleepwalkers Six-channel video (color, sound), seven monitors, 12:57 min. The Dunn Bequest. © 2008 Doug Aitken. Photo: Fred Charles.
Digital content may be… …important to scholarship.
Who cares? Cultural resource institutions: museums, historical societies (MOMA’s Matters in Media Art); libraries, archives, special collections; academic institutions. Governments: National Library of New Zealand’s NDHA, NARA’s ERA. The entertainment industry: AFI Digital Preservation Project.
Who cares? You and me, personally!
2. What’s the problem?
Digital content is… Transient Fragile Hidden 2400 B.C.E C.E.
Digital content is transient The average lifespan of a web site is between 44 and 100 days. Captured April 8, 2009; visited October 13, 2010
Digital content is fragile Digital things are amazingly easy to destroy: bad people, software or hardware failure, human mistakes. The slip of a finger or an unnoticed consequence of a change can happen easily and is potentially catastrophic. “Help! Accidental deletion. I accidentally deleted 62 images… can you please recover them from backups?”
Digital content is hidden Loss is not always apparent Are either of these corrupt?
Digital content is hidden Loss is not always apparent Both are corrupt! Use helps but it’s not enough
Even if it’s safe is it usable??? It’s not enough to preserve the bits if the format of the bits is obsolete! WordStar? AppleWorks? Excel 1.0? To use digital content we are dependent on software that can understand the format…
The importance of format Understanding formats is fundamental to preservation ffd8ffe000104a ffed0fb050686f74 6f73686f e d 03e90a e e666f f40240ffeeffee fc d f d03ed0a f6c f 6e a
The importance of format Understanding formats is fundamental to preservation ffd8ffe000104a ffed0fb050686f74 6f73686f e d 03e90a e e666f f40240ffeeffee fc d f d03ed0a f6c f 6e a SOI APP0 JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 183x512 DRI DHT SOS ECS0 RST0 ECS1 RST1 ECS2...
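The marker names on the slide above (SOI, APP0, APP13, DQT, SOF0, DRI, DHT, SOS) are what the raw bytes mean once the JPEG format is understood. A minimal sketch of that decoding step follows. It assumes a well-formed stream with no fill bytes between segments, and the `sample` bytes are synthetic, made up for illustration rather than taken from the slide's image.

```python
import struct

# Names for a few common JPEG markers (the byte that follows 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xD9: "EOI", 0xDA: "SOS", 0xDB: "DQT",
    0xC0: "SOF0", 0xC4: "DHT", 0xDD: "DRI",
}


def marker_name(code):
    """Translate a marker byte into its conventional name."""
    if 0xE0 <= code <= 0xEF:
        return f"APP{code - 0xE0}"
    return MARKER_NAMES.get(code, f"0x{code:02X}")


def list_segments(data):
    """Return marker names from the start of a JPEG stream up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        names.append(marker_name(code))
        if code == 0xDA:  # entropy-coded image data follows; stop here
            break
        # The 2-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return names


# Synthetic stream: SOI, a 16-byte APP0 (JFIF) segment, a tiny DQT, then SOS.
sample = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    + b"\xff\xdb" + struct.pack(">H", 4) + bytes(2)
    + b"\xff\xda" + struct.pack(">H", 2)
)
print(list_segments(sample))
```

Without the segment table and the length rule, the same bytes are only a string of hex digits, which is the point the slide makes.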
Using information content Analog book (unmediated use): information content, symbols, language, HW (paper) Digital book (technology-mediated use): information content, bits, formats, SW, HW
Formats are key to determining usability [Diagram: information content, bits, formats, SW, HW; labeled “digital content” and “supporting technologies”] Formats are the bridge between the content we want to preserve and supporting technologies
Dependence on fleeting technology We are dependent on technology to interpret digital content... Technologies must understand the format of the content Technologies age and disappear!
3. What’s being done?
Primary goals of digital preservation 1. Keep the bits safe 2. Keep the bits useful to people
1. Keep the bits safe Infrastructure, processes, policies and professional staff to counter risks High quality storage Redundancy (multiple copies, multiple locations) Media refreshing (replacing) Integrity monitoring (check for corruption) Security and access management Content recovery
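Integrity monitoring, one item in the list above, is commonly done by recording a checksum for every file and recomputing it later. The sketch below is one possible approach, not a description of any particular repository's process. The function names `build_manifest` and `audit` are hypothetical, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root, manifest):
    """Compare the files under root with a stored manifest.

    Returns (missing, changed): paths that have disappeared and paths
    whose current checksum no longer matches the recorded one.
    """
    current = build_manifest(root)
    missing = sorted(set(manifest) - set(current))
    changed = sorted(
        name for name in manifest
        if name in current and current[name] != manifest[name]
    )
    return missing, changed
```

A manifest would be built once at ingest, stored apart from the content, and `audit` run on a schedule. A changed checksum catches the kind of silent corruption that casual use can miss.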
2. Keep the bits useful Provide ways for people to find it Provide ways to manage it Keep records of history and significant events Know what formats you have Make sure there’s technology to support the formats! “Technology watch” And if there’s not, intervene so that there is technology that supports the formats (migration, emulation, creation of viewing software)
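“Know what formats you have” can start with something as simple as checking the first few bytes of each file against known signatures. The sketch below is illustrative only: the signature table is a small hand-picked sample, and the function names are hypothetical. Dedicated identification tools and format registries cover far more formats and edge cases.

```python
from collections import Counter

# A few well-known file signatures ("magic numbers"); far from complete.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(header):
    """Guess a format from the leading bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path):
    """Read only the first 16 bytes of a file and identify it."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


def tally_formats(paths):
    """Count how many files of each identified format are in a collection."""
    return Counter(identify_file(p) for p in paths)
```

Files that come back as "unknown" are the ones to look at first, since they are the likeliest to depend on software that is aging or already gone.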
Degrees of preservation “Passive preservation” aka “bit-level preservation”: better understood & less costly; will not ensure long-term usability, ensures current and near-term usability. “Active preservation” aka “full preservation” aka “logical preservation”: more complex, challenging & costly; requires more expertise but better ensures very long-term usability; requires passive preservation.
Degrees of preservation “passive preservation” aka “bit-level preservation” “active preservation” aka “full preservation” aka “logical preservation” Activities: Store, Secure, Maintain, Prevent, Migrate, Re-engineer software, Emulate, Digital archaeology, Monitor, Restore, Add value
Strategic thinking The least expensive and most effective preservation measure is to think about the future when digital content is created! Content production matters! It makes good sense to try to influence the content creation process
Preservation lifecycle Create or acquire digital content Ingest into a preservation repository Continuous cycle of: Monitoring Planning Intervention Subject to collection management decisions Transfer to next generation of the repository or to a different repository A series of hand-offs over time
Ongoing commitment Requires a continual, proactive program You can’t just stop and start Time frames are MUCH shorter than for preservation of physical collections Requires ongoing investment in both technology and staffing
Can’t do it alone More than any other library activity, preservation responsibility must be shared across institutions Even collectively we do not have adequate resources or understanding
Preservation community efforts Collaborative organizations (NDSA, IIPC, OPF) Collaborative projects (AIHT, TIPR) Standards and metadata Technical metadata for still images, audio, documents METS (package for metadata and digital objects) PREMIS (preservation metadata) “Preservable formats” (PDF/A) Repository certification Infrastructure Format registries (UDFR, PRONOM) Repository software (Fedora, DAITSS, LOCKSS, etc.) Tools (JHOVE, FITS, etc.)
4. What can you do?
First steps Inventory your content Identify where it is all kept web locations computer hard drive Removable media (CDs, etc.) Select Decide what is worth keeping Given a choice keep the highest quality version Is someone else already preserving it? Consider deleting content that's not needed
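For content on a hard drive or removable media, the inventory step above can be partly automated. The following is a minimal sketch, assuming the content lives under a single folder. The function names and the choice of CSV columns are illustrative, not a recommended standard.

```python
import csv
import os
from datetime import datetime, timezone

FIELDS = ["path", "bytes", "modified", "extension"]


def inventory(root):
    """List every file under root with its size, modified time and extension."""
    rows = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "path": os.path.relpath(path, root),
                "bytes": info.st_size,
                "modified": modified.isoformat(),
                "extension": os.path.splitext(name)[1].lower(),
            })
    rows.sort(key=lambda row: row["path"])
    return rows


def write_inventory(rows, csv_path):
    """Save the inventory as a CSV file that opens in any spreadsheet."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Sorting the resulting spreadsheet by size or extension makes the selection decisions easier, for example spotting lower-quality copies of the same item.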
Second steps Organize your digital content Create a logical directory/folder structure for the content Give descriptive names to the files If possible tag or embed with descriptions Catalog your content Draft a summary description Keep your inventory and a summary description of the content and how you have it organized in a secure location
Third steps Make multiple copies of your content Use formats that are amenable to long-term survival Use open formats when possible Store on durable media Store in multiple locations Preferably in different disaster zones. Use it! Periodically check that you can access the content Migrate to new media over time.
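Making multiple copies only helps if each copy is actually identical to the original. The sketch below copies a file to several locations and verifies each copy by checksum. It assumes the destinations are ordinary folders (for example, mounted external drives), and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file; reads the whole file, so suited to modest sizes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def replicate(source, destinations):
    """Copy source into each destination folder and verify every copy.

    Returns a dict mapping each copy's path to True if its checksum
    matches the original, False otherwise.
    """
    source = Path(source)
    expected = file_digest(source)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        results[str(target)] = file_digest(target) == expected
    return results
```

Rerunning the checksum comparison from time to time is one way to carry out the “periodically check that you can access the content” step.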
Fourth steps Keep informed. LC's website Research, training and outreach (DCC, DPC, JISC, IIPC, NEDCC) Professional organizations (ALA, SAA) Conference proceedings (iPRES, IS&T Archiving, DLF) How to preserve your own digital materials (LC): basic characteristics of digital preservation repositories (CRL website): archives/metrics-assessing-and-certifying/core-re
Image Credits First digital image Pong: : First theatrically released: iPod ad: Avatar: Cuneiform 2400 BC: Book of Hours in French and Latin: Server: Sleepwalkers at MOMA: PRS data sets: Corrupt images: New Yorker Cover, June 8 and 15, 2009 and October 18, 2010
5. Questions?