DArcMail Demonstration
Digital Archive eMail System
Riccardo, Smithsonian Institution
Archiving Stewardship Tools Workshop, Harvard University
Data points
Earliest email dated in the late 1980s
First preserved digitally in 2005
Largest account preserved during CERP: 80K emails
Favorite example of large account: 250,000+ emails
Largest account to date: 30 GB, ???,??? emails
Most recent account acquired last week: 20 GB
Primary processing and preservation tool: DArcMail
Some of the Smithsonian's email platforms over the past 35 years.
Introducing a successor to the CERP Parser: DArcMail
CERP Parser
Works on one message or a whole account
Does preservation: MBOX to XML
Generates metadata files and attachments directory, etc. (i.e., the "package")
All components are open source, but:
  – Squeak is not a popular platform
  – Raw XML is ugly
  – GUI is the order of the day
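To make the "MBOX to XML" preservation step concrete, here is a minimal sketch using only the Python standard library. It is not the CERP Parser's or DArcMail's actual code or schema: the element names (`Account`, `Message`, `Body`), the helper names, and the sample message are all assumptions chosen for illustration, and attachments are ignored.

```python
import mailbox
import os
import tempfile
import xml.etree.ElementTree as ET


def first_text_part(msg):
    """Return the first text/plain body of a message, or "" if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset label in the message
                return payload.decode("utf-8", errors="replace")
    return ""


def mbox_to_xml(mbox_path):
    """Build an XML tree with one <Message> element per message in an MBOX file.

    Element names are hypothetical, not the real preservation schema.
    """
    root = ET.Element("Account")
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            node = ET.SubElement(root, "Message")
            for header in ("From", "To", "Date", "Subject"):
                ET.SubElement(node, header).text = str(msg.get(header, ""))
            ET.SubElement(node, "Body").text = first_text_part(msg)
    finally:
        box.close()
    return root


# A one-message MBOX file, invented for this example.
SAMPLE_MBOX = (
    "From sender@example.org Mon Jan  1 00:00:00 2024\n"
    "From: sender@example.org\n"
    "To: archive@example.org\n"
    "Date: Mon, 01 Jan 2024 00:00:00 +0000\n"
    "Subject: Test message\n"
    "\n"
    "Hello, archive.\n"
)


def demo():
    """Write the sample MBOX to a temporary file and return its XML as text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w", encoding="ascii", newline="\n") as handle:
            handle.write(SAMPLE_MBOX)
        root = mbox_to_xml(path)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(demo())
```

A real preservation package would also need to extract attachments to their own directory and record metadata files, as the slide notes; this sketch covers only the header and body conversion.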
DArcMail in the SI Archives Context
Appraisal is a precondition to acquisition.
Documentation of accessions, their accessions, etc. happens in SIA's collection management system (CMS).
Digital preservation is as preemptive as possible; it begins as soon as an accession is finalized.
Storage packages manually transferred to separate server and LTOs.
DArcMail
CERP Parser functions plus searching, exporting
Simple GUI
4x faster processing
Runs on Python and MySQL
Puts understanding the account first, preservation second
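As a rough illustration of what "searching" an account held in a relational database might look like, the sketch below loads a few message records into a table and runs a keyword query. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server. The table layout, column names, function names, and sample rows are assumptions for this example and are not DArcMail's actual schema.

```python
import sqlite3


def build_index(messages):
    """Load (sender, sent_date, subject, body) tuples into an in-memory table.

    The table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        "id INTEGER PRIMARY KEY, sender TEXT, sent_date TEXT, "
        "subject TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO message (sender, sent_date, subject, body) "
        "VALUES (?, ?, ?, ?)",
        messages,
    )
    conn.commit()
    return conn


def search(conn, term):
    """Return (sender, sent_date, subject) rows whose subject or body contains term."""
    pattern = f"%{term}%"  # note: % and _ inside term act as LIKE wildcards
    cursor = conn.execute(
        "SELECT sender, sent_date, subject FROM message "
        "WHERE subject LIKE ? OR body LIKE ? ORDER BY sent_date",
        (pattern, pattern),
    )
    return cursor.fetchall()


# Sample records invented for this example.
SAMPLE_MESSAGES = [
    ("a@example.org", "2005-03-01", "Budget review", "Figures attached."),
    ("b@example.org", "2005-03-02", "Lunch", "Noon on Friday?"),
    ("c@example.org", "2005-03-03", "Follow-up", "More on the budget figures."),
]

if __name__ == "__main__":
    connection = build_index(SAMPLE_MESSAGES)
    for row in search(connection, "budget"):
        print(row)
```

Querying the account this way before packaging it is one way to read the slide's "understanding the account first, preservation second".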
DArcMail
Lifecycle stages outside DArcMail's scope
Appraisal, Capture and preliminary normalization if needed
  – MS Outlook for PSTs; MBOX client for other formats; Aid4Mail, MessageSave for preliminary normalization
Sensitive Data Processing
  – MS Outlook for PSTs; MBOX client for other formats
Repository
  – Transfer to spinning disk, tape
Access
  – Online Discovery