Published by Angela Henry. Modified over 8 years ago.
1
DArcMail Demonstration
Digital Archive eMail System
Riccardo Ferrante, @raferrante
Smithsonian Institution Archives, @SmithsonianArch
Email Archiving Stewardship Tools Workshop, Harvard University
2
Email data points
Earliest email: dated in the late 1980s
First email preserved digitally: 2005
Largest account preserved during CERP: 80K emails
Favorite example of a large account: 250,000+ emails
Largest account to date: 30 GB, ???,??? emails
Most recent account acquired: last week, 20 GB
Primary processing and preservation tool: DArcMail
Figure: Some of the Smithsonian’s email platforms over the past 35 years.
3
Introducing a successor to the CERP Parser: DArcMail
4
CERP Parser
Works on one message or a whole account
Does preservation: MBOX to XML
Generates metadata files and attachments directory, etc. (i.e., the “package”)
All components are open source, but:
  – Squeak is not a popular platform
  – Raw XML is ugly
  – GUI is the order of the day
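The MBOX-to-XML step named on this slide can be illustrated with a minimal Python sketch. This is not the CERP Parser's code (which the slide says runs on Squeak) and does not follow the actual CERP XML schema; the function name `mbox_to_xml` and the element names `Account`, `Message`, and `AttachmentCount` are assumptions made for illustration. It uses only the Python standard library.

```python
import mailbox
import xml.etree.ElementTree as ET


def mbox_to_xml(mbox_path):
    """Return an XML string with one <Message> element per message in an mbox file.

    Element names are illustrative only; they are NOT the CERP schema.
    """
    root = ET.Element("Account")
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            node = ET.SubElement(root, "Message")
            # Copy a few common headers; a missing header becomes an empty element.
            for header in ("From", "To", "Date", "Subject"):
                ET.SubElement(node, header).text = str(msg.get(header, ""))
            # Treat any MIME part that carries a filename as an attachment.
            names = [part.get_filename() for part in msg.walk() if part.get_filename()]
            ET.SubElement(node, "AttachmentCount").text = str(len(names))
    finally:
        box.close()
    return ET.tostring(root, encoding="unicode")
```

A real preservation package would also write the attachments to a directory and emit separate metadata files, as the slide notes; this sketch covers only the header-to-XML mapping.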
5
DArcMail in the SI Archives Context
Appraisal is a precondition to acquisition.
Documentation of accessions, their accessions, etc. happens in SIA’s collection management system (CMS).
Digital preservation is as preemptive as possible; it begins as soon as an accession is finalized.
Storage packages are manually transferred to a separate server and LTO tapes.
6
DArcMail
CERP Parser functions plus searching, exporting
Simple GUI
4x faster processing
Runs on Python and MySQL
Puts understanding the account first, preservation second
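To make the "searching" feature concrete, here is a small sketch of keyword search over message records held in a relational database. It is not DArcMail's implementation: the table layout and the function names `build_index` and `search` are hypothetical, and SQLite (from the Python standard library) stands in for MySQL so that the example is self-contained.

```python
import sqlite3


def build_index(messages):
    """Load (sender, subject, body) tuples into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO message (sender, subject, body) VALUES (?, ?, ?)", messages
    )
    return conn


def search(conn, term):
    """Return (id, sender, subject) rows whose subject or body contains the term."""
    pattern = f"%{term}%"
    cursor = conn.execute(
        "SELECT id, sender, subject FROM message "
        "WHERE subject LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return cursor.fetchall()
```

Loading an account into tables first and querying it afterwards fits the slide's ordering of "understanding the account first, preservation second", though how DArcMail itself structures its database is not stated in the deck.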
26
DArcMail
Lifecycle stages outside DArcMail’s scope
Appraisal, Capture and preliminary normalization if needed
  – MS Outlook for PSTs; MBOX client for other formats; Aid4Mail, MessageSave for preliminary normalization
Sensitive Data Processing
  – MS Outlook for PSTs; MBOX client for other formats
Repository
  – Transfer to spinning disk, tape
Access
  – Online Discovery