Digital record keeping and preservation I
Email as records
ARK2100, 2016
Thomas Sødring thomas.sodring@hioa.no P48-R407 67238287
What is an email?
It's a message exchanged between two parties who each have an email account.
Email metadata
To, From, CC, BCC, Date, Subject
Message-ID: unique ID
Reply-To: the address you should reply to
Received: trace path
Return-Path: final part of the trace path
Email metadata
Resent-* fields: information about the message being resent
Sender: address of the actual sender
In-Reply-To: Message-ID of the message this message is a reply to
References: Message-IDs of previous messages this message is related to
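The header fields listed above can be read programmatically. The following is a minimal sketch using Python's standard `email` package; the raw message, the addresses and the Message-IDs are made up for illustration and do not come from a real mailbox.

```python
from email import message_from_string
from email.policy import default

# Illustrative raw message: all addresses and IDs are invented.
RAW = """\
From: Bob Example <bob@example.org>
To: Alice Example <alice@example.com>
Cc: theboss@example.com
Date: Tue, 15 Jan 2008 16:02:43 -0500
Subject: Re: Test message
Message-ID: <reply-2@example.org>
In-Reply-To: <original-1@example.com>
References: <original-1@example.com>

Hello Alice.
"""

msg = message_from_string(RAW, policy=default)

# Header lookup is case-insensitive; a field that is absent returns None.
wanted = ("From", "To", "Cc", "Date", "Subject",
          "Message-ID", "In-Reply-To", "Reply-To")
metadata = {name: msg[name] for name in wanted}
for name, value in metadata.items():
    print(f"{name}: {value}")

# References can hold several Message-IDs separated by whitespace,
# which is what lets a client reconstruct a conversation thread.
thread = str(msg["References"] or "").split()
```

Fields such as In-Reply-To and References are what tie a reply to earlier messages, which matters when a whole thread, rather than one message, has to be understood in context.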
Email infrastructure
To send an email, the email client first connects to an SMTP (Simple Mail Transfer Protocol) server.
We can connect to the Gmail SMTP server from a PuTTY terminal (bibin):
openssl s_client -connect smtp.gmail.com:465 -crlf
Then type:
HELO hi
rcpt to: <bill.gates@gmail.com>
(The server also expects a MAIL FROM command before RCPT TO.)
The SMTP server hands the mail to a Mail Transfer Agent (MTA), which routes it to the server holding the recipient's account.
Email infrastructure
A Post Office Protocol (POP3) server is where your email is stored until you download it.
With POP, the email is typically deleted from the server once you have downloaded it.
Modern mail services are based on the Internet Message Access Protocol (IMAP), where the messages stay stored on the server.
The client often downloads a copy.
IMAP allows directory (folder) structures on the server.
Example SMTP exchange
S: 220 smtp.example.com ESMTP Postfix
C: HELO relay.example.org
S: 250 Hello relay.example.org, I am glad to meet you
C: MAIL FROM:<bob@example.org>
S: 250 Ok
C: RCPT TO:<alice@example.com>
S: 250 Ok
C: RCPT TO:<theboss@example.com>
S: 250 Ok
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: From: "Bob Example" <bob@example.org>
C: To: Alice Example <alice@example.com>
C: Cc: theboss@example.com
C: Date: Tue, 15 Jan 2008 16:02:43 -0500
C: Subject: Test message
C:
C: Hello Alice.
C: This is a test message with 5 header fields and 4 lines in the message body.
C: Your friend,
C: Bob
C: .
S: 250 Ok: queued as 12345
C: QUIT
S: 221 Bye
{The server closes the connection}
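A mail client normally does not type these commands by hand. As a rough sketch, the same message could be built and handed to an SMTP server with Python's standard `smtplib`. The host name and port below are placeholders (assumptions, not a real server), and `send` is defined but deliberately not called here.

```python
import smtplib
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Build the test message from the example exchange."""
    msg = EmailMessage()
    msg["From"] = "Bob Example <bob@example.org>"
    msg["To"] = "Alice Example <alice@example.com>"
    msg["Cc"] = "theboss@example.com"
    msg["Date"] = "Tue, 15 Jan 2008 16:02:43 -0500"
    msg["Subject"] = "Test message"
    msg.set_content(
        "Hello Alice.\n"
        "This is a test message with 5 header fields "
        "and 4 lines in the message body.\n"
        "Your friend,\n"
        "Bob\n"
    )
    return msg


def send(msg: EmailMessage, host: str = "smtp.example.com", port: int = 25) -> None:
    """Send msg via SMTP. host and port are placeholder assumptions."""
    with smtplib.SMTP(host, port) as smtp:
        # send_message greets the server, then issues MAIL FROM, one
        # RCPT TO per To/Cc/Bcc address, and DATA on our behalf.
        smtp.send_message(msg)


msg = build_message()
print(msg["Subject"])
```

Note that the envelope recipients (RCPT TO) and the To/Cc headers inside DATA are separate things; the library simply derives the former from the latter.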
Class example
I will now show an example of this via PuTTY.
Emails as records?
During the 2016 US election there was a lot of focus on Clinton's use of a private email server while she was at the State Department.
Some believed it was to bypass laws, including archive and FOI laws.
There were definitely records there, but also classified information.
(CC BY 2.0, https://www.flickr.com/photos/jeepersmedia/26665662473)
Emails and record keeping
Email is definitely a problematic area for record keeping.
This goes back to the employee becoming a user and being empowered.
Users often do not reflect on the need for record keeping, and see it as a bothersome process.
Email is also seen as a private sphere, even though we typically have not done anything to keep the contents private.
Are emails records?
There are many definitions of records, but it is commonly accepted that emails can be records.
In our Noark perspective, an email must be registered with a registryEntry to formally become a record.
We think/know that many emails have not been registered.
But are emails automatically records? No!
In the Norwegian perspective they become records if they fall under the journalling requirements of the law or are defined as 'archive documents' with associated properties.
Register an email
How do I register an email?
An email might be a record, but it might also only be the carrier of a record (for example, an attached document).
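To make the record/carrier distinction concrete, here is a small sketch that pulls the descriptive metadata out of an email and notes whether the content to register is the message body or an attachment. `RegistryEntrySketch` and its field names are hypothetical and are not the Noark data model; the message itself is invented for illustration.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage


@dataclass
class RegistryEntrySketch:
    """Hypothetical stand-in for a registry entry (not the Noark model)."""
    title: str
    sender: str
    message_id: str
    documents: list = field(default_factory=list)


def sketch_registry_entry(msg: EmailMessage) -> RegistryEntrySketch:
    # If there are attachments, the email is treated as a carrier and
    # the attachments are listed; otherwise the body is the document.
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return RegistryEntrySketch(
        title=str(msg["Subject"]),
        sender=str(msg["From"]),
        message_id=str(msg["Message-ID"]),
        documents=attachments or ["(message body)"],
    )


# Invented example message with one PDF attachment.
mail = EmailMessage()
mail["From"] = "architect@example.org"
mail["To"] = "postmottak@example.com"
mail["Subject"] = "Drawings for new garage"
mail["Message-ID"] = "<example-1@example.org>"
mail.set_content("Please find attached the drawings.")
mail.add_attachment(b"%PDF-1.5 placeholder bytes",
                    maintype="application", subtype="pdf",
                    filename="drawing2.pdf")

entry = sketch_registry_entry(mail)
print(entry)
```

Whether the body, the attachment, or both should be registered is a record-keeping decision; the sketch only shows where the information needed for that decision sits in the message.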
Encoding & format
Historically, email was encoded in 7-bit ASCII, and a standard called MIME was introduced to handle other character sets and binary data.
Content-Type: text/plain
SMTP is a standard for transmitting text (i.e. not binary).
Unicode (UTF-8) is commonly used now.
Messages are typically sent as either plain text or HTML.
Plain text is easy to archive; HTML needs to be rendered to a single object (e.g. PDF).
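A short sketch of why the character set has to be declared: a Norwegian letter such as ø does not fit in 7-bit ASCII, so the message carries a charset parameter in its Content-Type. This uses Python's standard `email` package; the message text is illustrative only.

```python
from email.message import EmailMessage

name = "Snøhetta"

# 7-bit ASCII cannot represent 'ø' ...
try:
    name.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# ... while UTF-8 encodes it as two bytes.
utf8_bytes = name.encode("utf-8")

msg = EmailMessage()
msg["Subject"] = "Drawings"
msg.set_content("Regards\nJohn Snøhetta\n")  # charset defaults to utf-8

print(fits_in_ascii)
print(utf8_bytes)
print(msg.get_content_type())
print(msg.get_content_charset())
```

For an archive, the declared charset is part of what is needed to read the text correctly later, so it should be kept with the message or normalised at capture.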
HTML email
HTML email can be a problem, as an HTML page is just a set of instructions and requires software (a browser) to render it.
Embedded pictures are a particular problem!
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: 8bit

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<pre wrap=""><i>Henry Olsen, Olsenveien 45, 3344 Olsenby </i>
10.11.16
<b>Olsen Municipality, Mucky Road, 3344 Olsenby </b>
Dear casehandler,
Please find attached the drawings of the<font color="#ff0000"> new garage </font>my client wants to build at Olsenveien 45. We expect to start construction before Christmas.
Regards
John Snøhetta
Architect
</pre>
</body>
</html>
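One way to see the embedded-picture problem is to list the images an HTML body depends on. The sketch below uses Python's standard `html.parser`; the HTML snippet and both image references are invented for illustration. A `cid:` reference points at another MIME part inside the same message, while an `http(s)` reference points at a remote server that may no longer exist when the archive is read.

```python
from html.parser import HTMLParser


class ImageSources(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def classify(src: str) -> str:
    if src.startswith("cid:"):
        return "embedded in the message"
    if src.startswith(("http://", "https://")):
        return "remote"
    return "other"


# Invented HTML body with one embedded and one remote image.
HTML_BODY = """
<html><body>
<p>Dear casehandler,</p>
<img src="cid:logo@example.org">
<img src="https://example.org/map.png">
</body></html>
"""

parser = ImageSources()
parser.feed(HTML_BODY)
report = {src: classify(src) for src in parser.sources}
for src, kind in report.items():
    print(f"{src}: {kind}")
```

Rendering to a single object such as a PDF at capture time freezes whatever these references resolved to at that moment.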
Attachments
Binary attachments to an email are typically encoded in base64.
--_003_147799946453838379hioano_
Content-Type: application/pdf; name="drawing2.pdf"
Content-Description: drawing2.pdf
Content-Disposition: attachment; filename="drawing2.pdf"; size=481884;
 creation-date="Tue, 01 Nov 2016 09:21:08 GMT";
 modification-date="Tue, 01 Nov 2016 09:21:19 GMT"
Content-Transfer-Encoding: base64

JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFu
Zyhubi1OTykgL1N0cnVjdFRyZWVSb290IDc0IDAgUi9NYXJrSW5mbzw8L01hcmtlZCB0cnVlPj4+
Pg0KZW5kb2JqDQoyIDAgb2JqDQo8PC9UeXBlL1BhZ2VzL0NvdW50IDEyL0tpZHNbIDMgMCBSIDIz
IDAgUiAzMSAwIFIgMzMgMCBSIDM1IDAgUiAzNyAwIFIgNDQgMCBSIDQ5IDAgUiA1MSAwIFIgNTUg
MCBSIDYyIDAgUiA2OCAwIFJdID4+DQplbmRvYmoNCjMgMCBvYmoNCjw8L1R5cGUvUGFnZS9QYXJl
bnQgMiAwIFIvUmVzb3VyY2VzPDwvRm9udDw8L0YxIDUgMCBSL0YyIDggMCBSL0YzIDEwIDAgUi9G
NCAxMiAwIFIvRjUgMTcgMCBSL0Y2IDE5IDAgUi9GNyAyMSAwIFI+Pi9FeHRHU3RhdGU8PC9HUzcg
NyAwIFI+Pi9Qcm9jU2V0Wy9QREYvVGV4dC9JbWFnZUIvSW1hZ2VDL0ltYWdlSV0gPj4vTWVkaWFC
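As a sketch of how such an attachment part is produced and later recovered, the following uses Python's standard `email` package to attach some bytes, serialise the message, parse it again and extract the original bytes. The addresses and the placeholder "PDF" bytes are assumptions for illustration; a real capture tool would read the message from the mail system instead.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

original = b"%PDF-1.5 placeholder bytes, not a real PDF"

msg = EmailMessage()
msg["From"] = "architect@example.org"
msg["To"] = "postmottak@example.com"
msg["Subject"] = "Drawings"
msg.set_content("Please find attached the drawings.")
# Binary attachments are base64-encoded by default.
msg.add_attachment(original, maintype="application", subtype="pdf",
                   filename="drawing2.pdf")

wire = msg.as_bytes()  # what would travel over SMTP

# On the receiving (or archiving) side: parse and extract.
parsed = message_from_bytes(wire, policy=default)
extracted = {}
for part in parsed.iter_attachments():
    # get_content() undoes the base64 transfer encoding.
    extracted[part.get_filename()] = part.get_content()

print(list(extracted))
```

The extracted bytes are identical to what the sender attached, which is what makes it possible to preserve the attachment as a document in its own right.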
base64
Base64 is a binary-to-text encoding scheme, needed because email traditionally could not transmit raw binary data.
The sender converts the binary data to ASCII text in the email.
The recipient converts it back from ASCII to binary.
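The round trip can be tried directly with Python's standard `base64` module. The nine bytes used here are the typical start of a PDF 1.5 file, and their encoding matches the first characters of the attachment shown on the Attachments slide.

```python
import base64

pdf_start = b"%PDF-1.5\r"               # first bytes of a PDF 1.5 file

encoded = base64.b64encode(pdf_start)   # binary -> ASCII text
decoded = base64.b64decode(encoded)     # ASCII text -> binary

print(encoded.decode("ascii"))          # JVBERi0xLjUN
print(decoded == pdf_start)             # True

# Every 3 bytes of input become 4 ASCII characters,
# so the encoded form is about one third larger.
print(len(pdf_start), len(encoded))     # 9 12
```

This size overhead is one reason an archive would normally store the decoded attachment rather than the base64 text.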
Capstone*
Instead of dealing with individual emails as records, Capstone takes a different approach: capturing all mail from an account, depending on the account holder's role in the organisation.
In many ways a Capstone-style approach turns the problem of email upside down.
Before, users were required to register emails.
Now all emails are registered and users have to remove non-records.
The upside is that these records can be collected without requiring user input.
*https://www.youtube.com/watch?v=TZd12Ka9pnM
Capstone Accounts (c) National Archive (US). Image from https://arkivmote.files.wordpress.com/2016/04/arian-d-ravanbakhsk.pdf
Organisation pyramids
You look at your organisation and identify pyramids of (email) record creators.
The organisation as a whole forms a large pyramid, but there are likely to be multiple smaller pyramids (departments etc.) within it.
Each (sub)pyramid has people in various roles (e.g. manager), and emails emanating from these roles will be records.
You work with email records at the account level, rather than at the item/email level.
Capstone General Records Schedule
Divides email accounts into three categories:
Permanent: mail from senior executives
7-year temporary: largest category, most employees
3-year temporary: employees responsible for routine, administrative tasks
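The three categories can be expressed as a simple lookup at the account level. This is a minimal sketch only: the category keys are hypothetical names, and counting the retention period from a single start date is an assumption made for illustration, not something the schedule slide specifies.

```python
from datetime import date
from typing import Optional

# Retention in years per account category; None means permanent.
# The key names are hypothetical labels for the three categories.
RETENTION_YEARS = {
    "senior_executive": None,
    "general_employee": 7,
    "routine_administrative": 3,
}


def disposal_date(category: str, retention_start: date) -> Optional[date]:
    """Return the earliest disposal date, or None for permanent accounts.

    Assumption: retention is counted from retention_start.
    """
    years = RETENTION_YEARS[category]
    if years is None:
        return None
    try:
        return retention_start.replace(year=retention_start.year + years)
    except ValueError:
        # 29 February in a year that is not a leap year: use 28 February.
        return retention_start.replace(year=retention_start.year + years, day=28)


start = date(2016, 11, 1)
for category in RETENTION_YEARS:
    print(category, disposal_date(category, start))
```

The point of the sketch is that the decision is made once per account category, not once per email.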
What about private email?
The downside to the Capstone approach is that personal emails may find their way into the archive.
But Capstone expects that personal mail has been culled / removed / deleted.
Sensitive (classified) information may also find its way into the archive.
Capstone
Capstone is an approach to handling the problem of records capture for records that find themselves within an email system.
It does not solve all problems.
It does not eliminate the need for a record keeping system.
It's an additional 'tool' that government can use to ensure they are capturing enough records.
IM will be the problem
Email has its limitations and issues, but it's instant messaging (IM) that will be the next problem.
IM clients exchange information that, if it were in an email, would clearly be defined as a record.
But we are entering another private sphere, especially for the citizen.
IM is often carried out by third-party services, and we may have problems capturing records from IM.
But it can probably be solved with approaches similar to Capstone.