Preserving the H-Net Lists: A Case Study in Trusted Digital Repository Assessment Lisa M. Schmidt

1 Preserving the H-Net E-Mail Lists: A Case Study in Trusted Digital Repository Assessment
Lisa M. Schmidt
lisa.schmidt@matrix.msu.edu
http://www.h-net.org/archive/
MATRIX: The Center for Humane Arts, Letters & Social Sciences Online
Michigan State University
January 25, 2009

2 Preserving the H-Net E-Mail Lists
H-Net Background
Current Preservation Practices
Use of the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Preservation Improvement Plan

3 H-Net: Humanities and Social Sciences Online
International consortium of scholars and teachers
Oldest collection of born-digital and content-moderated arts, humanities, and social science material on the Internet
Valuable scholarly resource
– More than 180 networks, or e-mail lists
– More than 230 “private” lists
More than 1 million e-mail messages
Hosted by MATRIX

4 MATRIX
Digital humanities research center
Devoted to the application of new technologies in humanities and social science teaching and research
Uses Internet technologies to improve education and increase the flow of information

5 NHPRC Grant
Conduct assessment of existing H-Net preservation policies and practices
Apply OCLC/CRL TRAC checklist
Develop and implement an improved long-term preservation plan
Useful to those managing large collections of electronic records
Research semantic clustering search techniques

6 How H-Net Works: Backup & Security
3 TB of data, including H-Net
Server rack kept in climate-controlled, physically secured room
Daily incremental backups, weekly full
Monthly full, “permanent” tape backups

7 How H-Net Works: Posting Messages
H-Net runs on LISTSERV software
Users must be list subscribers to post
Messages written in plain text
No attachments allowed on public lists
Editors approve and post messages
Editors can overwrite creation metadata

8 How H-Net Works: Posting Messages Message Posting Process

9 How H-Net Works: Archiving of Lists
Messages post from a few seconds up to several days after approval
Messages kept in flat text files called “notebooks”
Notebook includes messages posted during a weekly time period

10 How H-Net Works: Archiving of Lists
Time Period   Day of Month
a             1-7
b             8-14
c             15-21
d             22-28
e             29-31
Ex. “h-africa.log0802a”
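
The period table can be sketched as a small lookup. This is a minimal illustration, not H-Net's actual code: the function names are hypothetical, and reading the four digits in “h-africa.log0802a” as two-digit year followed by two-digit month is an assumption.

```python
from datetime import date


def week_letter(day: int) -> str:
    """Map a day of the month to the notebook period letter."""
    if not 1 <= day <= 31:
        raise ValueError(f"day of month out of range: {day}")
    # Days 1-7 -> a, 8-14 -> b, 15-21 -> c, 22-28 -> d, 29-31 -> e
    return "abcde"[(day - 1) // 7]


def notebook_name(list_name: str, posted: date) -> str:
    """Build a notebook file name in the style of 'h-africa.log0802a'.

    Assumption: the digits are YYMM (two-digit year, two-digit month).
    """
    return f"{list_name.lower()}.log{posted:%y%m}{week_letter(posted.day)}"


print(notebook_name("H-Africa", date(2008, 2, 5)))
```

Under the YYMM assumption, a message posted to H-Africa on 5 February 2008 lands in the notebook named on the slide.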

11 How H-Net Works: Archiving of Lists
BRS Database
– Newest notebook messages parsed and copied every 24 hours
– MD5 hashes created for each message
– Available for full-text search
MySQL Database Cache
– Key metadata extracted, MD5 hashes created, written to database cache
– Enables more efficient browsing
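
The metadata-cache step might look roughly like the sketch below. It is illustrative only: an in-memory SQLite database stands in for the MySQL cache, and the table name, column names, and sample message are all assumptions rather than H-Net's actual schema or data.

```python
import hashlib
import sqlite3
from email import message_from_string

# Hypothetical plain-text message, as it might appear in a notebook.
RAW_MESSAGE = """\
From: editor@example.org
Date: Mon, 11 Aug 2008 09:30:00 -0400
Subject: CFP: Example conference

Plain-text body of the post.
"""


def cache_message(conn, list_name, raw):
    """Extract key headers, hash the raw message with MD5, and cache both."""
    msg = message_from_string(raw)
    digest = hashlib.md5(raw.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT INTO message_cache VALUES (?, ?, ?, ?, ?)",
        (digest, list_name, msg["From"], msg["Date"], msg["Subject"]),
    )
    return digest


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL cache
conn.execute(
    "CREATE TABLE message_cache ("
    "md5 TEXT PRIMARY KEY, list_name TEXT, sender TEXT, sent TEXT, subject TEXT)"
)
digest = cache_message(conn, "H-Albion", RAW_MESSAGE)
row = conn.execute(
    "SELECT subject FROM message_cache WHERE md5 = ?", (digest,)
).fetchone()
print(digest, row[0])
```

Keying the cache on the hash lets a browser request locate a message without scanning the notebook files.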

12 How H-Net Works: Archiving of Lists Message Metadata Stored in MySQL Database

13 How H-Net Works: Message Retrieval
http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-Albion&month=0808&week=b&msg=w8utW6nKNO1FuY19vSK2mo&user=&pw=
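
The query string in this URL can be taken apart with standard tools. The sketch below is only an illustration: the helper name is hypothetical, and mapping the list, month, and week parameters onto a notebook file name is an inference from the “h-africa.log0802a” example, not something the slides state.

```python
from urllib.parse import parse_qs, urlsplit

URL = (
    "http://h-net.msu.edu/cgi-bin/logbrowse.pl?trx=vx&list=H-Albion"
    "&month=0808&week=b&msg=w8utW6nKNO1FuY19vSK2mo&user=&pw="
)


def parse_retrieval_url(url):
    """Return the query parameters of a message retrieval URL as a dict."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # parse_qs returns a list per key; each parameter appears once here.
    return {key: values[0] for key, values in query.items()}


params = parse_retrieval_url(URL)
# Assumption: list + month + week identify the notebook holding the message.
notebook = f"{params['list'].lower()}.log{params['month']}{params['week']}"
print(params["msg"], notebook)
```

The empty user and pw values are kept so the parsed result mirrors the public-list URL exactly.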

14 Current Preservation Practices Message Ingest, Storage, and Retrieval Processes

15 Current Preservation Practices
Backup, but only local, and no true archiving
Significant property: message/notebook content, stored in plain text formats
– No need for XML encoding
– Attachments require migration plan
Authenticity
– Informal check by author and/or editor on posting
– Broken URL on message retrieval attempt
– Cached metadata as PDI: Reference, Content, Provenance Information
MD5 hashes for message discovery, not fixity
No Fixity Information for notebook files

16 Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Standards for accreditation of archives called for in OAIS Reference Model (January 2002)
Foundation in RLG/OCLC’s Trusted Digital Repositories: Attributes and Responsibilities (May 2002)
– Definition: Repository that provides reliable, long-term access to managed digital resources
– Attributes: OAIS Compliance, Administrative Responsibility, Organizational Viability, Financial Sustainability, Technological and Procedural Suitability, System Security, Procedural Accountability
– Recommendations on organizational and technological responsibilities

17 Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
NARA and RLG Task Force on Digital Repository Certification established 2003
TRAC 1.0 published by CRL and OCLC, February 2007

18 Other Certification Methodologies for Trusted Digital Repositories
Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) http://www.repositoryaudit.eu/
Network of Expertise in long-term STOrage and long-term availability of digital Resources in Germany (nestor) http://www.langzeitarchivierung.de/index.php
ISO standard: Digital Repository Audit and Certification Working Group Wiki http://wiki.digitalrepositoryauditandcertification.org/bin/view/Main/WebHome

19 Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
For certification by third party or self-assessment
– Looks at digital information management system
– Understands threats to / risks within system
– Constant monitoring, planning, maintenance
– Conscious actions / strategy implementations
Three sections
– A. Organizational Infrastructure
– B. Digital Object Management
– C. Technologies, Technical Infrastructure, & Security

20 Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Compare core audit criteria to local capabilities: a “Gap Analysis” illuminating areas requiring improvement
Formulate strategies to narrow the gap and improve trustworthiness of repository

21 Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Applicability of criteria
– Varies per context of institution
– Not all criteria will apply to all repositories
Examples of documentation, other evidence
– List of minimum required documents that satisfy multiple requirements (Appendix 3)
  Contingency and succession plans
  Policies relating to legal permissions
  Financial procedures
  Procedures related to ingest
  Preservation and storage/migration strategies
  Policy for access
  Disaster plans

22 Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)

23 TRAC: A. Organizational Infrastructure
Characteristics of the repository organization that affect performance, accountability, and sustainability
– A1. Governance & Organizational Viability
– A2. Organizational Structure & Staffing
– A3. Procedural Accountability & Policy Framework
– A4. Financial Sustainability
– A5. Contracts, Licenses, & Liabilities

24 TRAC: A. Organizational Infrastructure
A1. Governance & Organizational Viability
– A1.2. “Repository has an appropriate, formal succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes in scope.”
  Evidence: succession plan(s); escrow plan(s); explicit and specific statement documenting the intent to ensure continuity of the repository and the steps taken and to be taken to ensure continuity; formal documents describing exit strategies and contingency plans; depositor agreements
– H-Net: No succession plan currently in place
– Narrow the gap: Identify, negotiate with, and make preliminary plans with potential successor; write policy that states intent and describes what’s needed in successor

25 TRAC: A. Organizational Infrastructure
A3. Procedural Accountability & Policy Framework
– A3.3. “Repository maintains written policies that specify the nature of any legal permissions required to preserve digital content over time, and repository can demonstrate that these permissions have been acquired when needed.”
  Evidence: deposit agreements; records schedule; digital preservation policies; records legislation and policies; service agreements
– H-Net: Policy on Copyright/Intellectual Property, Constitution, By-Laws; authors retain copyright, sending message constitutes granting permission for distribution and (implicit) preservation
– No gap! Restate policies to include explicit permission to preserve messages; link to copyright policies from digital preservation plan

26 TRAC: B. Digital Object Management
Repository functions, processes, and procedures needed to ingest, manage, and provide access to digital objects for the long term
– B1. Ingest: Acquisition of Content
– B2. Ingest: Creation of the Archival Package
– B3. Preservation Planning
– B4. Archival Storage & Preservation/Maintenance of AIPs
– B5. Information Management
– B6. Access Management

27 TRAC: B. Digital Object Management
B2. Ingest: Creation of the Archival Package
– B2.1 “Repository has an identifiable, written definition for each AIP or class of information preserved by the repository.”
  Evidence: Documentation identifying each class of AIP and describing how each is implemented within the repository. Implementations may, for example, involve some combination of files, databases, and/or documents
– H-Net: Not documented at time of analysis
– Narrow the gap: Preservation strategy document, Information Packages document, clear policies

28 TRAC: B. Digital Object Management
B4. Archival Storage & Preservation/Maintenance of AIPs
– B4.4 “Repository actively monitors integrity of archival objects (i.e., AIPs).”
  Evidence: Logs of fixity checks (e.g., checksums); documentation of how AIPs and Fixity Information are kept separate
– H-Net: No integrity monitoring at time of analysis
– Narrow the gap: Use MD5 hashes to perform fixity checks, keep logs of most current fixity checks, document fixity process
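
A logged fixity check of the kind B4.4 asks for can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the process H-Net adopted: the function names, the log record layout, and the sample message are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def check_fixity(object_id, data, recorded_digest, log):
    """Recompute the digest, compare it to the recorded one, and log the result."""
    ok = md5_hex(data) == recorded_digest
    log.append({
        "object": object_id,
        "checked": datetime.now(timezone.utc).isoformat(),
        "status": "ok" if ok else "MISMATCH",
    })
    return ok


message = b"Plain-text body of a list message.\n"
recorded = md5_hex(message)  # digest stored when the message was ingested

fixity_log = []
check_fixity("msg-001", message, recorded, fixity_log)
check_fixity("msg-001", message + b"tampered", recorded, fixity_log)
print([entry["status"] for entry in fixity_log])
```

Keeping the recorded digests and the log apart from the objects themselves matches the evidence item about storing AIPs and Fixity Information separately.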

29 TRAC: C. Technologies, Technical Infrastructure, & Security
Adequacy of technical infrastructure and ability to meet management and security demands of repository and its digital objects
– C1. System Infrastructure
– C2. Appropriate Technologies
– C3. Security

30 TRAC: C. Technologies, Technical Infrastructure, & Security
C1. System Infrastructure
– C1.1 Repository functions on well-supported operating systems and other core infrastructural software
  Evidence: Software inventory; system documentation; support contracts; use of strongly community-supported software (e.g., Apache)
– H-Net: H-Net servers run on Debian distribution of Linux
– No gap! Include in digital preservation policy documents

31 TRAC: C. Technologies, Technical Infrastructure, & Security
C3. Security
– C3.4 Repository has suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an off-site copy of the recovery plan(s).
  Evidence: ISO 17799 certification; disaster and recovery plans; information about and proof of at least one off-site copy of preserved information; service continuity plan; documentation linking roles with activities; local geological, geographical, or meteorological data or threat assessments
– H-Net: No plan in place, no off-site backups at time of assessment
– Narrow the gap: Develop disaster recovery plan, off-site storage plans; tap into greater MSU disaster recovery plan
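
The six sample findings from slides 24 to 31 can be tabulated to make the gap analysis concrete. The record layout below is a hypothetical illustration, not a TRAC-defined format; the criterion numbers, gap status, and actions are taken from the slides in abbreviated form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    criterion: str  # TRAC criterion number
    topic: str      # short paraphrase of the criterion
    gap: bool       # True if H-Net did not meet it at assessment time
    action: str     # follow-up named on the slide


FINDINGS = [
    Finding("A1.2", "Succession and contingency plans", True,
            "Identify a successor; write a continuity policy"),
    Finding("A3.3", "Legal permissions to preserve", False,
            "Restate policies with explicit preservation permission"),
    Finding("B2.1", "Written AIP definitions", True,
            "Write preservation strategy and Information Packages documents"),
    Finding("B4.4", "Integrity monitoring of AIPs", True,
            "Run and log MD5 fixity checks; document the process"),
    Finding("C1.1", "Well-supported core software", False,
            "Record in digital preservation policy documents"),
    Finding("C3.4", "Disaster plan and off-site backup", True,
            "Develop disaster recovery and off-site storage plans"),
]


def open_gaps(findings):
    """Return the criterion numbers that still show a gap."""
    return [f.criterion for f in findings if f.gap]


print(open_gaps(FINDINGS))
```

A full assessment would hold one record for each of the 84 criteria, so a repeat assessment can be compared against the earlier one criterion by criterion.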

32 The TRAC Experience
Tedious: 84 criteria!
Required consultations with systems admin, other MATRIX and H-Net staff
Conducted over course of a week
Thorough yet flexible, leaving room for interpretation, lots of options for supporting documentation/evidence

33 The TRAC Experience
Good snapshot of current state of repository
Clarifies what’s needed to narrow the gap
Great internal audit tool
Useful for certification of a trusted digital repository

34 Integrated Rule-Oriented Data System (iRODS) and TRAC
Data grid software system developed by the Data Intensive Cyber Environments (DICE) group (http://www.irods.org)
Provides a unified view and seamless access to distributed digital objects across a wide area network
Goal: Automate archival processes, data management tasks, verification of assessment criteria (TRAC)
How: Map management policies to rules enforced by data management system
DICE wishes to use H-Net as iRODS testbed

35 Digital Preservation Management Workshop and TRAC
Workshop developed by Nancy McGovern, Digital Preservation Officer at ICPSR
Combines organizational and technological perspectives for institutions to develop appropriate responses to challenges of digital preservation
– Two foundation documents
  Trusted Digital Repositories: Attributes and Responsibilities
  OAIS Reference Model
– Attributes of a Trusted Digital Repository

36 Digital Preservation Management Workshop and TRAC
Action plans for policy, technology, and resource frameworks that map to TRAC
– Organizational Infrastructure → Section A
– Technological Infrastructure → Sections B, C
– Resource Framework → Section A
Wealth of policy examples
Tools for determining costs
2009 workshops: http://www.icpsr.umich.edu/dpm/

37 Preservation Improvement Plan: Backup & Archival Storage
Backup
– Backup log
– Secure storage
– Second permanent backup tape stored offsite
– Reciprocal backup storage arrangement with ICPSR
Archival Storage
– Annual copying to tape of H-Net data, databases, scripts
– Media refreshment every 5 years
– Copy to alternative storage repository
– Participation in distributed archival storage system, such as iRODS or LOCKSS

38 Preservation Improvement Plan: Authenticity
Fixity: Individual Messages (SIPs/AIPs)
– Shorten time window for generation of MD5 hashes
– Create database of MD5 hashes for fixity checks
– Validate message hashes on notebook completion
Fixity: Notebook Files (AICs)
– Create SHA-2 message digests on completion of notebooks
– Calculate SHA-2 message digests for existing notebooks
– Create database of SHA-2 message digests for fixity checks
– Validate notebook hashes on weekly basis
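
The notebook-level plan could be prototyped along the following lines. This is a sketch under stated assumptions rather than the implemented design: SHA-256 is chosen here as one member of the SHA-2 family, an in-memory SQLite table stands in for the digest database, and the table, function, and file contents are hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large notebooks need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_digest(conn, path: Path) -> None:
    """Store the digest of a completed notebook."""
    conn.execute(
        "INSERT OR REPLACE INTO notebook_digests VALUES (?, ?)",
        (path.name, sha256_file(path)),
    )


def validate(conn, directory: Path):
    """Return names of notebooks whose current digest differs from the record."""
    rows = conn.execute("SELECT name, sha256 FROM notebook_digests").fetchall()
    return [name for name, digest in rows if sha256_file(directory / name) != digest]


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    notebook = directory / "h-africa.log0802a"
    notebook.write_text("first message\n")

    conn = sqlite3.connect(":memory:")  # stand-in for the digest database
    conn.execute("CREATE TABLE notebook_digests (name TEXT PRIMARY KEY, sha256 TEXT)")
    record_digest(conn, notebook)

    clean = validate(conn, directory)      # run right after recording
    notebook.write_text("first message, altered\n")
    changed = validate(conn, directory)    # run after a simulated change
    conn.close()

print(clean, changed)
```

A weekly job would call validate and log the result, so that any altered notebook is flagged while an intact backup copy is still likely to exist.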

39 Preservation Improvement Plan: Authenticity
Accurate Message Creation Metadata
– Build list editing web interface for editors
– H-Net Council decided not to implement
Restriction of Editors’ Administration Capabilities
– Eliminate editors’ ability to retrieve and change notebooks
– Restrict notebook modification rights to MATRIX postmasters
H-Net Tampering Risk?
– Low: staff with root system account privileges are trusted employees
– No action required

40 Preservation Improvement Plan: Attachments
Browser Access for Private Lists
– Provide constructed URLs, as with public lists
– Provide download links to attachments
Migration Strategy
– Conduct inventory of attachments on H-Net-related lists
– Establish or leverage technology watch
– Provide links to websites containing conversion tools

41 Preservation Improvement Plan: Narrowing the TRAC Gap
Technical improvements underway
Digital preservation policies to be written
Lather, rinse, repeat: New TRAC assessment
Future: Use TRAC to design MSU institutional repository

42 References
H-Net Archives Project, http://www.h-net.org/archive/
H-Net: Humanities and Social Sciences Online, http://www.h-net.org
MATRIX: The Center for Humane Arts, Letters, and Social Sciences Online, http://www.matrix.msu.edu
OAIS Reference Model, http://public.ccsds.org/publications/archive/650x0b1.pdf
Trusted Digital Repositories: Attributes and Responsibilities, http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf
Trustworthy Repositories Audit & Certification: Criteria and Checklist, http://www.crl.edu/PDF/trac.pdf

