Preserving the H-Net Lists: A Case Study in Trusted Digital Repository Assessment Lisa M. Schmidt


Preserving the H-Net Lists: A Case Study in Trusted Digital Repository Assessment Lisa M. Schmidt MATRIX: The Center for Humane Arts, Letters & Social Sciences Online Michigan State University January 25, 2009

Preserving the H-Net Lists
H-Net Background
Current Preservation Practices
Use of the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Preservation Improvement Plan

H-Net: Humanities and Social Sciences Online
International consortium of scholars and teachers
Oldest collection of born-digital and content-moderated arts, humanities, and social science material on the Internet
Valuable scholarly resource
–More than 180 networks, or lists
–More than 230 “private” lists
More than 1 million messages
Hosted by MATRIX

MATRIX
Digital humanities research center
Devoted to the application of new technologies in humanities and social science teaching and research
Uses Internet technologies to improve education and increase the flow of information

NHPRC Grant
Conduct assessment of existing H-Net preservation policies and practices
Apply OCLC/CRL TRAC checklist
Develop and implement an improved long-term preservation plan
Useful to those managing large collections of electronic records
Research semantic clustering search techniques

How H-Net Works: Backup & Security
3 TB of data, including H-Net
Server rack kept in climate-controlled, physically secured room
Daily incremental backups, weekly full
Monthly full, “permanent” tape backups

How H-Net Works: Posting Messages
H-Net runs on LISTSERV software
Users must be list subscribers to post
Messages written in plain text
No attachments allowed on public lists
Editors approve and post messages
Editors can overwrite creation metadata

How H-Net Works: Posting Messages Message Posting Process

How H-Net Works: Archiving of Lists
Messages post from a few seconds up to several days after approval
Messages kept in flat text files called “notebooks”
Each notebook includes messages posted during a weekly time period

How H-Net Works: Archiving of Lists
Time Period   Day of Month
a             1-7
b             8-14
c             15-21
d             22-28
e             29-31
Ex. “h-africa.log0802a”
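The naming scheme in the table above can be sketched in a few lines. This is a minimal illustration, not H-Net's actual script: the function names are hypothetical, and the layout `<list>.log<YY><MM><letter>` is an assumption inferred from the single example “h-africa.log0802a”.

```python
from datetime import date


def period_letter(day: int) -> str:
    """Map a day of the month to the period letter from the slide:
    a = 1-7, b = 8-14, c = 15-21, d = 22-28, e = 29-31."""
    if not 1 <= day <= 31:
        raise ValueError(f"day of month out of range: {day}")
    return "abcde"[(day - 1) // 7]


def notebook_name(list_name: str, when: date) -> str:
    """Build a notebook file name for a list and posting date.

    Assumption: the four digits after ".log" are two-digit year
    followed by two-digit month, as the example name suggests.
    """
    return f"{list_name}.log{when:%y%m}{period_letter(when.day)}"


print(notebook_name("h-africa", date(2008, 2, 3)))
```

Under these assumptions, a message posted to h-africa in the first week of February 2008 lands in the notebook named in the slide's example.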

How H-Net Works: Archiving of Lists
BRS Database
–Newest notebook messages parsed and copied every 24 hours
–MD5 hashes created for each message
–Available for full-text search
MySQL Database Cache
–Key metadata extracted, MD5 hashes created, written to database cache
–Enables more efficient browsing
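As a rough illustration of per-message MD5 hashes used as lookup keys, the sketch below hashes message text and stores it in a dictionary standing in for the database cache. The function names, the UTF-8 encoding, and the hex form of the digest are all assumptions; the slides do not say how H-Net encodes messages or digests.

```python
import hashlib


def message_key(message_text: str) -> str:
    """Return the hex MD5 digest of a message, used here only as a lookup key."""
    return hashlib.md5(message_text.encode("utf-8")).hexdigest()


def build_cache(messages):
    """Map each message's MD5 key to its text (a stand-in for the database cache)."""
    return {message_key(text): text for text in messages}


cache = build_cache(["First post to the list.", "Second post to the list."])
key = message_key("First post to the list.")
print(cache[key])
```

A digest used this way identifies a message for retrieval; it does not by itself show that the stored message is unchanged, which is a separate fixity concern.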

How H-Net Works: Archiving of Lists Message Metadata Stored in MySQL Database

How H-Net Works: Message Retrieval &month=0808&week=b&msg=w8utW6nKNO1FuY19vSK2mo &user=&pw=
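The visible query parameters appear to mirror the notebook naming scheme (a month code plus a period letter). The sketch below parses only the fragment shown on the slide; the host and path are cut off in the transcript and are not reconstructed. Treating the space before `&user` as a line-wrap artifact is an assumption, as is the helper name `notebook_suffix`.

```python
from urllib.parse import parse_qs

# Query fragment as shown on the slide, with the stray space removed (assumption).
query = "month=0808&week=b&msg=w8utW6nKNO1FuY19vSK2mo&user=&pw="

# keep_blank_values=True retains the empty user and pw parameters.
params = parse_qs(query, keep_blank_values=True)


def notebook_suffix(parsed: dict) -> str:
    """Join the month code and period letter, e.g. '0808' + 'b'."""
    return parsed["month"][0] + parsed["week"][0]


print(notebook_suffix(params))
print(params["msg"][0])
```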

Current Preservation Practices Message Ingest, Storage, and Retrieval Processes

Current Preservation Practices
Backup, but only local, and no true archiving
Significant property: message/notebook content, stored in plain text formats
–No need for XML encoding
–Attachments require migration plan
Authenticity
–Informal check by author and/or editor on posting
–Broken URL on message retrieval attempt
–Cached metadata as PDI: Reference, Context, Provenance Information
MD5 hashes for message discovery, not fixity
No Fixity Information for notebook files

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Standards for accreditation of archives called for in OAIS Reference Model (January 2002)
Foundation in RLG/OCLC’s Trusted Digital Repositories: Attributes and Responsibilities (May 2002)
–Definition: Repository that provides reliable, long-term access to managed digital resources
–Attributes: OAIS Compliance, Administrative Responsibility, Organizational Viability, Financial Sustainability, Technological and Procedural Suitability, System Security, Procedural Accountability
–Recommendations on organizational and technological responsibilities

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
NARA and RLG Task Force on Digital Repository Certification established 2003
TRAC 1.0 published by CRL and OCLC, February 2007

Other Certification Methodologies for Trusted Digital Repositories
Digital Repository Audit Method Based on Risk Assessment (DRAMBORA)
Network of Expertise in long-term STOrage and long-term availability of digital Resources in Germany (nestor)
ISO standard: Digital Repository Audit and Certification Working Group Wiki n.org/bin/view/Main/WebHome

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
For certification by third party or self-assessment
–Looks at digital information management system
–Understands threats to / risks within system
–Constant monitoring, planning, maintenance
–Conscious actions / strategy implementations
Three sections
–A. Organizational Infrastructure
–B. Digital Object Management
–C. Technologies, Technical Infrastructure, & Security

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Compare core audit criteria to local capabilities: a “Gap Analysis,” illuminating areas requiring improvement
Formulate strategies to narrow the gap and improve trustworthiness of repository

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Applicability of criteria
–Varies per context of institution
–Not all criteria will apply to all repositories
Examples of documentation, other evidence
–List of minimum required documents that satisfy multiple requirements (Appendix 3): contingency and succession plans; policies relating to legal permissions; financial procedures; procedures related to ingest; preservation and storage/migration strategies; policy for access; disaster plans

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)

TRAC: A. Organizational Infrastructure
Characteristics of the repository organization that affect performance, accountability, and sustainability
–A1. Governance & Organizational Viability
–A2. Organizational Structure & Staffing
–A3. Procedural Accountability & Policy Framework
–A4. Financial Sustainability
–A5. Contracts, Licenses, & Liabilities

TRAC: A. Organizational Infrastructure
A1. Governance & Organizational Viability
–A1.2. “Repository has an appropriate, formal succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes in scope.”
  Evidence: succession plan(s); escrow plan(s); explicit and specific statement documenting the intent to ensure continuity of the repository and the steps taken and to be taken to ensure continuity; formal documents describing exit strategies and contingency plans; depositor agreements
–H-Net: No succession plan currently in place
–Narrow the gap: Identify, negotiate with, and make preliminary plans with potential successor; write policy that states intent and describes what’s needed in successor

TRAC: A. Organizational Infrastructure
A3. Procedural Accountability & Policy Framework
–A3.3. “Repository maintains written policies that specify the nature of any legal permissions required to preserve digital content over time, and repository can demonstrate that these permissions have been acquired when needed.”
  Evidence: deposit agreements; records schedule; digital preservation policies; records legislation and policies; service agreements
–H-Net: Policy on Copyright/Intellectual Property, Constitution, By-Laws; authors retain copyright, sending message constitutes granting permission for distribution and (implicit) preservation
–No gap! Restate policies to include explicit permission to preserve messages; link to copyright policies from digital preservation plan

TRAC: B. Digital Object Management
Repository functions, processes, and procedures needed to ingest, manage, and provide access to digital objects for the long term
–B1. Ingest: Acquisition of Content
–B2. Ingest: Creation of the Archival Package
–B3. Preservation Planning
–B4. Archival Storage & Preservation/Maintenance of AIPs
–B5. Information Management
–B6. Access Management

TRAC: B. Digital Object Management
B2. Ingest: Creation of the Archival Package
–B2.1. “Repository has an identifiable, written definition for each AIP or class of information preserved by the repository.”
  Evidence: Documentation identifying each class of AIP and describing how each is implemented within the repository. Implementations may, for example, involve some combination of files, databases, and/or documents
–H-Net: Not documented at time of analysis
–Narrow the gap: Preservation strategy document, Information Packages document, clear policies

TRAC: B. Digital Object Management
B4. Archival Storage & Preservation/Maintenance of AIPs
–B4.4. “Repository actively monitors integrity of archival objects (i.e., AIPs).”
  Evidence: Logs of fixity checks (e.g., checksums); documentation of how AIPs and Fixity Information are kept separate
–H-Net: No integrity monitoring at time of analysis
–Narrow the gap: Use MD5 hashes to perform fixity checks, keep logs of most current fixity checks, document fixity process

TRAC: C. Technologies, Technical Infrastructure, & Security
Adequacy of technical infrastructure and ability to meet management and security demands of repository and its digital objects
–C1. System Infrastructure
–C2. Appropriate Technologies
–C3. Security

TRAC: C. Technologies, Technical Infrastructure, & Security
C1. System Infrastructure
–C1.1. “Repository functions on well-supported operating systems and other core infrastructural software.”
  Evidence: Software inventory; system documentation; support contracts; use of strongly community-supported software (e.g., Apache)
–H-Net: H-Net servers run on Debian distribution of Linux
–No gap! Include in digital preservation policy documents

TRAC: C. Technologies, Technical Infrastructure, & Security
C3. Security
–C3.4. “Repository has suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an off-site copy of the recovery plan(s).”
  Evidence: ISO certification; disaster and recovery plans; information about and proof of at least one off-site copy of preserved information; service continuity plan; documentation linking roles with activities; local geological, geographical, or meteorological data or threat assessments
–H-Net: No plan in place, no off-site backups at time of assessment
–Narrow the gap: Develop disaster recovery plan, off-site storage plans; tap into greater MSU disaster recovery plan
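The six worked criteria above can be recorded as data so that the gap analysis becomes a simple filter. This is only a sketch of one way to track findings: the `Finding` class and its field names are hypothetical, and the summaries are condensed from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    criterion: str   # TRAC criterion number
    summary: str     # H-Net status at time of assessment
    gap: bool        # True when local practice falls short of the criterion
    action: str      # planned follow-up


FINDINGS = [
    Finding("A1.2", "No succession plan in place", True,
            "Identify a successor and write a succession policy"),
    Finding("A3.3", "Copyright policy grants implicit permission to preserve", False,
            "Restate policies with explicit permission to preserve"),
    Finding("B2.1", "AIP definitions not documented", True,
            "Write preservation strategy and Information Packages documents"),
    Finding("B4.4", "No integrity monitoring", True,
            "Run, log, and document fixity checks"),
    Finding("C1.1", "Servers run Debian Linux", False,
            "Record in digital preservation policy documents"),
    Finding("C3.4", "No disaster plan or off-site backups", True,
            "Develop disaster recovery and off-site storage plans"),
]


def gaps(findings):
    """Return only the findings where a gap was identified."""
    return [f for f in findings if f.gap]


for f in gaps(FINDINGS):
    print(f"{f.criterion}: {f.summary} -> {f.action}")
```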

The TRAC Experience
Tedious: 84 criteria!
Required consultations with systems admin, other MATRIX and H-Net staff
Conducted over course of a week
Thorough yet flexible, leaving room for interpretation, lots of options for supporting documentation/evidence

The TRAC Experience
Good snapshot of current state of repository
Clarifies what’s needed to narrow the gap
Great internal audit tool
Useful for certification of a trusted digital repository

Integrated Rule-Oriented Data System (iRODS) and TRAC
Data grid software system developed by the Data Intensive Cyber Environments (DICE) group
Provides a unified view and seamless access to distributed digital objects across a wide area network
Goal: Automate archival processes, data management tasks, verification of assessment criteria (TRAC)
How: Map management policies to rules enforced by data management system
DICE wishes to use H-Net as iRODS testbed

Digital Preservation Management Workshop and TRAC
Workshop developed by Nancy McGovern, Digital Preservation Officer at ICPSR
Combines organizational and technological perspectives for institutions to develop appropriate responses to challenges of digital preservation
–Two foundation documents: Trusted Digital Repositories: Attributes and Responsibilities; OAIS Reference Model
–Attributes of a Trusted Digital Repository

Digital Preservation Management Workshop and TRAC
Action plans for policy, technology, and resource frameworks that map to TRAC
–Organizational Infrastructure → Section A
–Technological Infrastructure → Sections B, C
–Resource Framework → Section A
Wealth of policy examples
Tools for determining costs
2009 workshops:

Preservation Improvement Plan: Backup & Archival Storage
Backup
–Backup log
–Secure storage
–Second permanent backup tape stored offsite
–Reciprocal backup storage arrangement with ICPSR
Archival Storage
–Annual copying to tape of H-Net data, databases, scripts
–Media refreshment every 5 years
–Copy to alternative storage repository
–Participation in distributed archival storage system, such as iRODS or LOCKSS

Preservation Improvement Plan: Authenticity
Fixity: Individual Messages (SIPs/AIPs)
–Shorten time window for generation of MD5 hashes
–Create database of MD5 hashes for fixity checks
–Validate message hashes on notebook completion
Fixity: Notebook Files (AICs)
–Create SHA-2 message digests on completion of notebooks
–Calculate SHA-2 message digests for existing notebooks
–Create database of SHA-2 message digests for fixity checks
–Validate notebook hashes on weekly basis
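The notebook fixity steps above (create digests, store them in a database, revalidate on a schedule) might look roughly like the following. This is a sketch under stated assumptions rather than the MATRIX implementation: SHA-256 is chosen as one member of the SHA-2 family, SQLite stands in for whatever database is actually used, and the table and function names are hypothetical.

```python
import hashlib
import sqlite3
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_db(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the digest database and create the table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notebook_digest "
        "(name TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    return conn


def register(conn: sqlite3.Connection, path: Path) -> None:
    """Record the digest of a completed notebook file."""
    conn.execute(
        "INSERT OR REPLACE INTO notebook_digest VALUES (?, ?)",
        (path.name, sha256_file(path)),
    )
    conn.commit()


def validate(conn: sqlite3.Connection, directory: Path) -> list:
    """Return names of notebooks that are missing or whose digest has changed."""
    rows = conn.execute(
        "SELECT name, sha256 FROM notebook_digest ORDER BY name"
    ).fetchall()
    failed = []
    for name, recorded in rows:
        path = directory / name
        if not path.is_file() or sha256_file(path) != recorded:
            failed.append(name)
    return failed
```

In this sketch, `register` would run once when a notebook is complete, and `validate` would run as the weekly check; writing its results to a log would supply the kind of evidence TRAC B4.4 asks for.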

Preservation Improvement Plan: Authenticity
Accurate Message Creation Metadata
–Build list editing web interface for editors
–H-Net Council decided not to implement
Restriction of Editors’ Administration Capabilities
–Eliminate editors’ ability to retrieve and change notebooks
–Restrict notebook modification rights to MATRIX postmasters
H-Net Tampering Risk?
–Low: staff with root system account privileges are trusted employees
–No action required

Preservation Improvement Plan: Attachments
Browser Access for Private Lists
–Provide constructed URLs, as with public lists
–Provide download links to attachments
Migration Strategy
–Conduct inventory of attachments on H-Net-related lists
–Establish or leverage technology watch
–Provide links to websites containing conversion tools

Preservation Improvement Plan: Narrowing the TRAC Gap
Technical improvements underway
Digital preservation policies to be written
Lather, rinse, repeat: New TRAC assessment
Future: Use TRAC to design MSU institutional repository

References
H-Net Archives Project
H-Net: Humanities and Social Sciences Online
MATRIX: The Center for Humane Arts, Letters, and Social Sciences Online
OAIS Reference Model
Trusted Digital Repositories: Attributes and Responsibilities, epositories.pdf
Trustworthy Repositories Audit & Certification: Criteria and Checklist