Preservation of Electronic Mail Druscie Simpson NC State Archives November 19, 2004
The Digital Divide Also Multiplies
E-mail as a Burden: The Radicati Group and Merrill Lynch estimate that e-mail is growing at a rate of 300% annually (The Age, July 8, 2003). The real problem: not more e-mail, but “larger and larger attachments, generating an average of 5MB of content” daily (The Age, July 8, 2003). E-mail generates about 400,000 terabytes of new information each year worldwide. About 31 billion e-mails are sent daily, on the Internet and elsewhere, a figure which is expected to double by 2006 (source: International Data Corporation (IDC)). The average e-mail is about 59 kilobytes in size, thus the annual flow of e-mails worldwide is 667,585 terabytes (How Much Information 2003, UC Berkeley).
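The 667,585-terabyte figure follows from the daily volume and average size quoted on this slide. A quick check, assuming decimal units (1 kilobyte = 1,000 bytes, 1 terabyte = 10^12 bytes) and a 365-day year, neither of which the source states explicitly:

```python
# Reproduce the annual e-mail volume from the figures quoted on the slide.
# Assumptions (not stated in the source): decimal units and a 365-day year.
EMAILS_PER_DAY = 31_000_000_000   # about 31 billion e-mails sent daily
AVG_SIZE_BYTES = 59 * 1_000       # about 59 kilobytes per e-mail
DAYS_PER_YEAR = 365
BYTES_PER_TERABYTE = 10**12

annual_bytes = EMAILS_PER_DAY * AVG_SIZE_BYTES * DAYS_PER_YEAR
annual_terabytes = annual_bytes // BYTES_PER_TERABYTE

print(f"{annual_terabytes:,} terabytes per year")
```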
What do I do with ALL that e-mail?! Why are we so interested in e-mail and digital records? E-mail’s far-reaching effects.
Loss of Corporate Knowledge: Imagine you’re new in the office. All of the information you need to do your job was on your computer. Your predecessor deleted the information before leaving, or it was password protected. You don’t have the password.
Legal Implications: If it is in an e-mail and it is sent from, received by, or stored on a government computer, it is a legal record. Never put anything in an e-mail you don’t want on the front page of the local paper. Always CYO (cover your office).
Users have several options for keeping their saved e-mails. They may leave it on the e-mail provider’s server. They may leave it on a web-based mail server such as Hotmail or Yahoo. They may store it in their e-mail client, such as Outlook, Eudora, or Netscape. They may store it on the file system of their PC as individual .eml files (MS Outlook Express Electronic Mail).
In each of these circumstances the actual byte stream used to represent the message is slightly different. While an e-mail server and client are obliged to communicate with each other using standards (SMTP, POP3, and IMAP), they are not required to store the e-mail using any sort of standard.
We will be looking for a solution that will have the widest possible use. Start with an IMAP server. Enhance the server with the ability to take the contents of its message store and create the desired standard XML files, in a format called XMTP. Using XMTP, SMTP messages can be transformed via XSLT into HTML pages for viewing. XMTP has been used to implement a telemedicine consultation system using SMTP e-mail and HTML. In the testing phase, but not launched yet.
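To make the idea of turning a stored message into XML concrete, here is a minimal sketch using only the Python standard library. It is not an implementation of XMTP: the element names (`Message`, `Headers`, `Header`, `Body`) and the sample addresses are hypothetical, chosen only to illustrate the mapping from an RFC 822 message to an XML document.

```python
from email import message_from_bytes, policy
import xml.etree.ElementTree as ET


def message_to_xml(raw: bytes) -> str:
    """Convert one RFC 822 message to a simple XML string.

    The element names here are hypothetical and are NOT the XMTP schema.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    root = ET.Element("Message")

    # One <Header> element per header line, in original order.
    headers = ET.SubElement(root, "Headers")
    for name, value in msg.items():
        header = ET.SubElement(headers, "Header", {"name": name})
        header.text = str(value)

    # Keep only the plain-text body in this sketch; attachments are ignored.
    body_part = msg.get_body(preferencelist=("plain",))
    body = ET.SubElement(root, "Body")
    body.text = body_part.get_content() if body_part is not None else ""

    return ET.tostring(root, encoding="unicode")


# Hypothetical sample message for illustration.
SAMPLE = (
    b"From: alice@example.gov\r\n"
    b"To: bob@example.gov\r\n"
    b"Subject: Budget meeting\r\n"
    b"\r\n"
    b"See you at 10.\r\n"
)

print(message_to_xml(SAMPLE))
```

A real preservation format would also need to carry attachments, multipart structure, and character-set information, which this sketch leaves out.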
IMAP seems to be the only protocol that supports moving and copying messages from place to place while preserving the message’s native format. This means that no matter where the message ends up, almost any IMAP compliant client can send it to an “archives” server.
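As a rough illustration of this point, the sketch below copies one message from a user's mailbox to a separate archive server by fetching its raw bytes and appending them unchanged. It uses Python's standard `imaplib` calls (`fetch`, `create`, `append`); the function name `archive_message` and the folder name are assumptions made for the example, not part of the proposal.

```python
import imaplib


def archive_message(source: imaplib.IMAP4, archive: imaplib.IMAP4,
                    msg_num: str, folder: str) -> None:
    """Copy one message, byte for byte, from `source` to `archive`.

    Both arguments are expected to be logged-in IMAP connections, and
    `source` must already have a mailbox selected. Hypothetical helper.
    """
    status, data = source.fetch(msg_num, "(RFC822)")
    if status != "OK":
        raise RuntimeError(f"fetch failed for message {msg_num}")
    raw_message = data[0][1]  # the original RFC 822 bytes, unmodified

    # Create the target folder; the server replies "NO" if it already exists.
    archive.create(folder)

    # No flags and no explicit date: the server assigns the internal date.
    status, _ = archive.append(folder, None, None, raw_message)
    if status != "OK":
        raise RuntimeError(f"append to {folder!r} failed")
```

Because the message is appended exactly as fetched, the archive server receives it in its native format. A true move, as opposed to a copy, would additionally flag the message as deleted on the source server and expunge it.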
How? Have the user send e-mail directly to a server hosted by the NC State Archives. Have the user send e-mail to an enhanced IMAP server maintained by their agency. This would enable the agency to locally access the archived messages. The IMAP server could then send snapshots to the Archives, or send us the XMTP files on electronic media via USPS.
Have the user collect and send .pst files to the NC State Archives. The Archives will open them with Outlook and move them to the enhanced IMAP server (the process would be automated). The Archives should also be able to access packages of e-mail in other formats, since Outlook can convert from Eudora, Netscape, etc. Once loaded into Outlook, the packages would then be sent to the IMAP server.
Any strategy based on the interception of the data stream is ruled out, since we want to collect the messages only after the user has been given a chance to cull and organize them.
Our proposal is to use hMailServer (an open source project hosted on SourceForge), which is an IMAP server that uses MySQL or Microsoft SQL Server as its message store.
The hMailServer installation contains a minimal MySQL installation, so if you don't already have a database server in your network, MySQL is installed automatically when you install hMailServer. The XML creation utility could interface directly with the message store instead of going through the IMAP protocol. hMailServer comes with an attendant COM component that can be used to access the data store.
Life of an e-mail message: The message is sent to the user’s mail server. The user downloads the message to his/her mailbox. The user optionally places the message into a folder on his/her local system. The user creates a folder on the “Archives” IMAP server. The user moves the mail from his/her inbox or specified folder to the folder on the “Archives” IMAP server. An administrator requests that the IMAP server create one or more XML files containing the user’s e-mail. The XML files are saved as a preservation copy.
Access to E-mail #1: Load the XML into ENCompass. Utilize the IMAP server by enhancing it to provide web access to its native store, similar to the user interface provided by Lurker.
Access to E-mail #2: Utilize Documentum by enhancing it to ingest the XML produced by the IMAP server. The Documentum server would be used purely as an e-mail repository, not as a document management application. Alternatively, utilize Documentum as a document management application to interfile messages into named record series.
Access to E-mail #3: Move messages into a SharePoint Portal Server (SPP). Use Outlook to collect the messages from the IMAP server and send them to SPP. XML files would serve purely as a preservation copy.
This Particular Project: Take 6 gigabytes of e-mail from Governor Jim Hunt’s administration ( ; bulk dates ) and make it accessible and preservable. The e-mail has been appraised and culled to create the core for preservation. The e-mail is in Microsoft Outlook .pst files and can be accessed only by using the correct version of Outlook. Create or utilize programs to move the e-mails out of Microsoft’s proprietary .pst format into a non-proprietary and stable XML format.
We also want to write software that is more universal in scope and can be used with most electronic records. Hire a programmer to write code to convert the .pst files from their format to XML format. Take the converted XML files, load them onto our server, and make them available to the public via the web and searchable through our online catalog system (ENCompass/MARS).
Wish us luck! We are very excited to have this opportunity to explore this potential solution. We hope to take what we learn and apply it to the collection of other electronic government resources that are archival. We’ll keep you posted!