South Carolina Information Technology Directors Association, September 8, 2008. Bill Henry and Matt Guzzi, SC Department of Archives and History (SCDAH).
2 Background – Last Year: The 2007 NHPRC grant proposal was not funded. The AZ Archives submitted a multi-state grant proposal to the Library of Congress with the same basic goals as SC's proposal. SC was too late for funding, so it paid its own expenses to join the project.
3 Electronic Archives Funding: One-time funding from the General Assembly to digitize paper records, capture agency website snapshots, and purchase hardware and software. The Library of Congress approved additional funds for the project, and SC is now a fully funded partner.
4 What is PeDALS? Persistent Digital Archives and Library System: a multi-state grant project funded by the Library of Congress and the Institute of Museum and Library Services. Five state partners: Arizona, Florida, New York, Wisconsin, and South Carolina. The project will run months; if it is successful, SCDAH intends to continue participating beyond this period. At the end of the project, each partner will have a functioning digital archives system.
5 Why is PeDALS Needed? An increasing number of long-term and archival records are created and maintained only in digital formats. Traditional archival practices designed for paper records won’t work in a digital environment. We need the ability to preserve electronic records so that we can demonstrate their authenticity and protect their integrity. PeDALS is both a learning opportunity and a chance to implement a functioning system.
6 Technical Goals: To develop a curatorial rationale that can be implemented in software to support an automated, integrated workflow for processing collections of digital records. To build “digital stacks” – storage that has appropriate controls for preservation and disaster preparedness.
7 Traditional Curatorial Processes for Paper Records: Appraisal; Acquisition; Arrangement and description; Housing and storage; Reference and access; Preservation.
8 Curatorial Rationale for Digital Records: Transformation of traditional, paper-based practices into the digital arena. Focus on the rules, not the records. Automate the rules.
9 Digital Stacks: More than storing the data (CD, tape, disk). LOCKSS provides: 1. Automatic integrity checking and error detection; 2. Security; 3. Geographic distribution.
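The integrity checking behind geographically distributed copies can be illustrated with a simplified sketch. It assumes each site holds a full copy, copies are compared by SHA-256 digest, and a strict majority decides which copy is correct. Real LOCKSS uses its own peer-to-peer polling protocol, so this is a stand-in for the concept rather than LOCKSS code; `audit_and_repair` is a hypothetical name.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of one copy (assumed algorithm)."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Compare copies held at different sites and repair any that disagree.

    Simplified majority vote; NOT the actual LOCKSS polling protocol.
    """
    if not replicas:
        raise ValueError("no copies to audit")
    votes = Counter(digest(copy) for copy in replicas.values())
    winning_hash, count = votes.most_common(1)[0]
    if count <= len(replicas) // 2:
        # No strict majority: automatic repair could spread the damage.
        raise RuntimeError("copies disagree with no majority; needs review")
    good_copy = next(c for c in replicas.values() if digest(c) == winning_hash)
    # Keep copies that match the majority; replace the rest with a good copy.
    return {
        site: copy if digest(copy) == winning_hash else good_copy
        for site, copy in replicas.items()
    }


if __name__ == "__main__":
    sites = {"site-1": b"record", "site-2": b"record", "site-3": b"rec0rd"}
    repaired = audit_and_repair(sites)
    print(repaired["site-3"])  # b'record'
```

Requiring a strict majority is a deliberate choice in this sketch: with only two disagreeing copies there is no way to tell which one is damaged, so the function stops instead of guessing.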
10 Additional Goals: To build a community of shared practice that meets the needs of a wide range of repositories, both for best practices and for resource sharing. To remove barriers by keeping costs as low as possible.
11 The Open Archival Information System (OAIS) Reference Model: OAIS is an international (ISO) standard. It defines a minimal set of responsibilities for long-term preservation and can be applied to any information or object that needs to be retained long-term. OAIS does not prescribe a particular design or implementation. e/650x0b1.pdf
12 View of an OAIS Environment: [Diagram: Producer, OAIS (PeDALS), Consumer, and Management]
13 PeDALS (OAIS) Functional Areas: Ingest; Archival storage; Data management; Administration; Preservation planning; Access.
14 PeDALS Overview - 1: Agency records in an electronic records system are transferred via the Internet to the PeDALS system. Supplemental processing checks files for integrity and completeness prior to transfer.
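A minimal sketch of agency-side checking before transfer, assuming the agency knows which files belong in the transfer and records a SHA-256 hash for each (the slides do not name a hash algorithm). `build_manifest` and `sha256_file` are hypothetical helpers, not part of PeDALS.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large records need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(folder: Path, expected: set[str]) -> dict[str, str]:
    """Check that every expected file is present, then record its hash.

    The expected file list and SHA-256 are illustrative assumptions.
    """
    present = {p.name for p in folder.iterdir() if p.is_file()}
    missing = expected - present
    if missing:
        # Completeness check: refuse to transfer a partial set of records.
        raise ValueError(f"transfer incomplete, missing: {sorted(missing)}")
    # Integrity check: one hash per file, sent along with the records.
    return {name: sha256_file(folder / name) for name in sorted(expected)}
```

The manifest would travel with the transfer so the receiving side can recompute each hash and compare.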
15 PeDALS Overview - 2: Agency records with associated metadata are transferred to a middleware server (Microsoft BizTalk®). Rules-based software will transform the records into a format for long-term storage and also produce a copy for web access.
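Rules-based transformation can be sketched as a lookup table in which the rule, not the individual record, decides which renditions are produced. The file extensions and target formats below are illustrative assumptions, not the formats PeDALS actually uses, and the conversion itself is left out; `plan_transformation` is a hypothetical name.

```python
from pathlib import PurePath

# Hypothetical rules: source extension -> (preservation format, access format).
# These pairings are illustrative only, not actual PeDALS rules.
FORMAT_RULES = {
    ".doc": ("PDF/A", "PDF"),
    ".tif": ("TIFF", "JPEG"),
    ".csv": ("CSV", "HTML"),
}


def plan_transformation(filename: str) -> dict:
    """Look up the rule for a file and name the two renditions to produce."""
    ext = PurePath(filename).suffix.lower()
    rule = FORMAT_RULES.get(ext)
    if rule is None:
        # No rule for this format: hold it for an archivist instead of guessing.
        return {"file": filename, "action": "manual review"}
    preservation, access = rule
    return {
        "file": filename,
        "action": "transform",
        "preservation_format": preservation,
        "access_format": access,
    }
```

Adding support for a new record type then means adding one rule, not touching each record.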
16 PeDALS Overview - 3: Records are transferred into LOCKSS servers for long-term preservation. LOCKSS is a “dark archive”: it is not directly accessible to the public.
17 PeDALS Overview - 4: Public access will be provided via the web. Restricted records will be blocked from public access.
19 PeDALS Network Architecture: Agencies will be able to log in and upload records to the South Carolina Digital Archive. BizTalk will check the incoming records for completeness and verify that each file's hash value matches the one supplied with the upload.
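On the receiving side, the completeness and hash checks can be sketched as a comparison against the agency's manifest. This is plain Python, not BizTalk code; the SHA-256 algorithm, the manifest layout (file name to hex digest), and `verify_upload` are assumptions for illustration.

```python
import hashlib


def verify_upload(received: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means complete and intact.

    `received` maps file name -> uploaded bytes; `manifest` maps
    file name -> SHA-256 hex digest computed by the agency (assumed layout).
    """
    problems = []
    # Completeness: every listed file arrived, and nothing unlisted did.
    for name in sorted(manifest.keys() - received.keys()):
        problems.append(f"missing: {name}")
    for name in sorted(received.keys() - manifest.keys()):
        problems.append(f"unexpected: {name}")
    # Integrity: recompute each hash and compare with the agency's value.
    for name in sorted(manifest.keys() & received.keys()):
        if hashlib.sha256(received[name]).hexdigest() != manifest[name]:
            problems.append(f"hash mismatch: {name}")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets the agency fix a failed upload in a single pass.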
20 Archivist Review: Once records are received, the Archivist is notified. The files are then reviewed, and a high-level description is entered in the Database Catalog. The SIP (Submission Information Package) is created.
21 BizTalk: This is where the automated processing happens.
22 BizTalk Processes: The DIP (Dissemination Information Package) is created. The Catalog database is updated with access, description, and preservation information. The archival records are placed on the Manifest Server for ingest into LOCKSS. The public access database is updated.
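The packaging steps can be sketched in OAIS terms: one SIP (the records plus the archivist's description) yields an AIP with fixity values for the LOCKSS manifest server, a DIP for access, and a catalog entry. All class, field, and function names are hypothetical, and the DIP here simply lists file names rather than producing real access copies.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class SIP:
    """Submission Information Package: records as received plus description."""
    agency: str
    description: str          # high-level description entered by the archivist
    files: dict[str, bytes]   # file name -> content as submitted


def build_packages(sip: SIP, access_date: date) -> tuple[dict, dict, dict]:
    """Derive an AIP, a DIP, and a catalog entry from one SIP (hypothetical layout)."""
    fixity = {name: hashlib.sha256(data).hexdigest() for name, data in sip.files.items()}
    # AIP: the archival copy bound for LOCKSS, carrying its fixity values.
    aip = {"agency": sip.agency, "description": sip.description,
           "files": dict(sip.files), "fixity": fixity}
    # DIP: what a user would receive; here only the sorted list of file names.
    dip = {"description": sip.description, "files": sorted(sip.files)}
    # Catalog entry: access, description, and preservation information.
    catalog = {"agency": sip.agency, "description": sip.description,
               "access_date": access_date.isoformat(),
               "preservation": {"algorithm": "sha256", "fixity": fixity}}
    return aip, dip, catalog
```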
23 LOCKSS (Lots of Copies Keep Stuff Safe): Based at Stanford University, LOCKSS has primarily been used for scientific journals and publications. It is open source and runs on OpenBSD, a multi-platform 4.4BSD-based Unix-like operating system.
24 LOCKSS: Boots from CD, so no operating system is installed on the server. Communicates over a VPN (virtual private network). Files for LOCKSS are stored on a separate admin server running Linux. Our private distributed LOCKSS network is one LOCKSS cluster of 7 servers. It is initially set up to take in 1 TB of data and can be expanded.
25 LOCKSS Storage: Dark, secure archival storage. LOCKSS is a sophisticated data storage system that scans for and repairs file corruption and other data integrity problems. Level 4 firewalls and geographic distribution provide added security.
26 Public Access Process: BizTalk process for the AIP (Archival Information Package). This process moves records from LOCKSS to the public access web server based on each record's access date.
27 PeDALS Network Architecture: A web server will provide Internet access to records through a web-based search interface. Access to records restricted by statute or otherwise will be blocked during the restriction period. Restricted records are held in the LOCKSS dark archive, and no user copy is sent to the web server until public access is allowed.
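The date-based release rule reduces to a simple filter: a record gets a public copy only once its access date has arrived and no web copy exists yet; everything else stays in the dark archive. The catalog fields `id` and `access_date` and the function `records_to_publish` are assumptions for illustration.

```python
from datetime import date


def records_to_publish(catalog: list[dict], today: date, published: set[str]) -> list[str]:
    """Return IDs of records that may now get a public web copy.

    A record qualifies once its access date has arrived and it has not
    already been copied to the web server (hypothetical catalog fields).
    """
    return sorted(
        rec["id"]
        for rec in catalog
        if rec["access_date"] <= today and rec["id"] not in published
    )
```

For example, with one record whose access date is in 2000 and another whose access date is in 2100, a run on September 8, 2008 would release only the first; the second stays dark until its date arrives.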
28 Future Public Access: We are currently implementing the web component of Rediscovery, which will allow the public to search our holdings. We hope to use BizTalk to automatically populate the Rediscovery catalog. Public access will be granted through URLs to the Rediscovery web component.
29 PeDALS Open Archival Information System (OAIS) Network Architecture
30 Records Eligible for PeDALS: Permanently valuable electronic records scheduled for transfer to SCDAH. Pilot project agencies and records: Judicial Department – Supreme Court Case Files; Election Commission – Voter Registration Master Files; Public Service Commission – Orders; DHEC – Electronic Index to Death Certificates.
31 Project Status: Core metadata defined and data dictionary completed; system design completed; hardware and software acquired and installed; agency partners and records identified; system prototype built (AZ & SC); BizTalk® training completed.
32 On the Horizon: Other states purchase and configure hardware and software; first ingest of records in early winter; develop public search website.
33 Post-Grant: Move from pilot to production mode; develop procedures for agency participation; expand participation to additional agencies and records.
34 PeDALS: Bill Henry, Electronic Records Consultant, (803); Matt Guzzi, Electronic Records Archivist, (803)