Preserving a Born-Digital Archive: The H-Net Lists Lisa M. Schmidt MATRIX: The Center.

Presentation transcript:

Preserving a Born-Digital Archive: The H-Net Lists
Lisa M. Schmidt
MATRIX: The Center for Humane Arts, Letters & Social Sciences Online
Michigan State University
November 16, 2009

Preserving the H-Net Lists
H-Net Background
Original “Preservation” Practices
Use of the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Preservation Improvements

H-Net: Humanities and Social Sciences Online
International consortium of scholars and teachers
Oldest collection of born-digital and content-moderated arts, humanities, and social science material on the Internet
Hosted by MATRIX

H-Net: Humanities and Social Sciences Online
Valuable scholarly resource
  –More than 180 networks, or lists, with more than 130,000 unique subscribers
  –More than 5,000 posts per month
  –More than 230 “private” lists
  –230,000 message views in a single week
More than 1 million messages

MATRIX
Digital humanities research center
Devoted to the application of new technologies in teaching, research, and outreach
Creates and maintains digital libraries of humanities and social science materials
Provides training in computing and new teaching technologies
Creates forums for the exchange of ideas and expertise

NHPRC Grant
Conduct assessment of existing H-Net preservation policies and practices
Apply OCLC/CRL TRAC checklist
Develop and implement an improved long-term preservation plan
Useful to those managing large collections of electronic records
Research semantic clustering search techniques

How H-Net Works: Backup & Security
3 TB of data, including H-Net
Server rack kept in climate-controlled, physically secured room
Daily incremental backups, weekly full
Full, “permanent” tape backups every four months
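The backup cadence on this slide can be expressed as a small scheduling rule. The sketch below is illustrative only: the slide gives the frequencies, but the choice of Sunday for the weekly full backup and of January, May, and September for the four-monthly "permanent" tape set are assumptions, as is the function name `backup_kind`.

```python
from datetime import date

# Assumption: the slide says "every four months" but names no months.
PERMANENT_MONTHS = {1, 5, 9}
# Assumption: the slide says "weekly full" but names no weekday (6 = Sunday).
FULL_WEEKDAY = 6


def backup_kind(day: date) -> str:
    """Return which backup the stated schedule would call for on a given day."""
    # A full "permanent" tape set three times a year.
    if day.day == 1 and day.month in PERMANENT_MONTHS:
        return "permanent-full"
    # A regular full backup once a week.
    if day.weekday() == FULL_WEEKDAY:
        return "full"
    # Every other day gets an incremental backup.
    return "incremental"
```

For example, `backup_kind(date(2009, 11, 16))` falls on a Monday and so yields an incremental backup under these assumptions.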

How H-Net Works
H-Net runs on LISTSERV software
Submission policies
  –Users must be list subscribers to post
  –Messages written in plain text
  –No attachments allowed on public lists

How H-Net Works: An Archival Perspective
Appraisal/Acquisition/Accession
  –All approved messages permanently archived
  –Editors approve and post messages
  –Messages post from a few seconds up to several days after approval

How H-Net Works: An Archival Perspective Message Posting Process

How H-Net Works: An Archival Perspective
Arrangement
  –Messages kept in flat text files called “notebooks”
  –Single notebook includes messages posted during seven-day time period, concatenated in original order

How H-Net Works: An Archival Perspective

Arrangement –Notebooks appear to be arranged in original order within each list directory

How H-Net Works: An Archival Perspective
Description
  –Most descriptive metadata for messages automatically generated on creation/posting
  –“Author’s Subject” inserted by creator

How H-Net Works: An Archival Perspective
Notebook File Naming
Notebook description contained in filename

Period   Day of Month
a        1-7
b        8-14
c        15-21
d        22-28
e

Ex. “h-africa.log0802a”
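The naming scheme can be sketched as a function that builds a notebook filename from a list name and a posting date. This is a reading of the example filename, not H-Net's own code: it assumes the four digits are a two-digit year followed by a two-digit month, and that period "e" covers whatever days remain after the 28th (the slide leaves that range blank). The function names are hypothetical.

```python
from datetime import date


def notebook_period(day: int) -> str:
    """Map a day of the month to the notebook period letter."""
    if not 1 <= day <= 31:
        raise ValueError(f"day of month out of range: {day}")
    # Periods a-d cover days 1-7, 8-14, 15-21, and 22-28, as on the slide.
    # Assumption: "e" covers any remaining days (29 onward).
    return "abcde"[(day - 1) // 7]


def notebook_filename(list_name: str, posted: date) -> str:
    """Build a name such as "h-africa.log0802a" for a posting date."""
    # Assumption: "0802" encodes two-digit year then two-digit month.
    return f"{list_name}.log{posted:%y%m}{notebook_period(posted.day)}"
```

Under these assumptions, a message posted to h-africa on 5 February 2008 would land in "h-africa.log0802a", matching the example on the slide.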

How H-Net Works: Message Retrieval
BRS Database
  –Newest notebook messages parsed and copied every 24 hours
  –MD5 hashes created for each message
  –Available for full-text search
MySQL Database Cache
  –Key metadata extracted, MD5 hashes created, written to database cache
  –Enables more efficient browsing
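A minimal sketch of the caching step follows: hash each message with MD5 and write a few header fields to a database table keyed by that hash. Several things here are assumptions rather than H-Net's implementation: SQLite from the Python standard library stands in for MySQL, the table and column names are invented, the cached fields (author, subject, date) are a guess at "key metadata," and the hash is encoded as unpadded URL-safe Base64 purely for illustration.

```python
import base64
import hashlib
import sqlite3
from email import message_from_string


def message_key(raw: str) -> str:
    """Return an MD5-based lookup key for a message (a discovery aid, not fixity)."""
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    # Assumption: unpadded URL-safe Base64, which gives a 22-character key.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def create_cache(conn: sqlite3.Connection) -> None:
    """Create the hypothetical metadata cache table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS message_cache ("
        "msg_key TEXT PRIMARY KEY, author TEXT, subject TEXT, posted TEXT)"
    )


def cache_message(conn: sqlite3.Connection, raw: str) -> str:
    """Extract header metadata from one message and store it under its key."""
    msg = message_from_string(raw)
    key = message_key(raw)
    conn.execute(
        "INSERT OR REPLACE INTO message_cache VALUES (?, ?, ?, ?)",
        (key, msg["From"], msg["Subject"], msg["Date"]),
    )
    return key
```

Browsing can then query the small cache table by key, author, or date instead of re-parsing the notebook files.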

How H-Net Works: Message Retrieval Message Metadata Stored in MySQL Database

How H-Net Works: Message Retrieval &month=0808&week=b&msg=w8utW6nKNO1FuY19vSK2mo &user=&pw=

How H-Net Works Message Ingest, Storage, and Retrieval Processes

Original “Preservation” Practices
Backup, but only local, and no true archiving
No normalization or migration strategy
  –Message/notebook content: No need
      Created and stored in plain text formats
      XML encoding only required with proprietary formats
  –Needed for attachments on private lists

Original “Preservation” Practices
Authenticity
  –Informal check by author and/or editor on posting
  –Broken URL on message retrieval attempt
  –Cached metadata as Preservation Description Information (PDI)
      Reference, Context, Provenance Information
      MD5 hashes for message discovery, not fixity
      No Fixity Information for notebook files
Policies
  –No documented preservation policies

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
TRAC 1.0 published in February 2007
For certification by third party or self-assessment
Three sections
  –A. Organizational Infrastructure
  –B. Digital Object Management
  –C. Technologies, Technical Infrastructure, & Security
84 audit criteria

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Compare core audit criteria to local capabilities in a “gap analysis,” illuminating areas requiring improvement
Formulate strategies to narrow the gap and improve trustworthiness of repository

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Example 1: Repository has formal succession plan
  –H-Net: No succession plan in place
  –Narrow the gap: Identify, negotiate with, and make preliminary plans with potential successor; document intent, describing what’s needed in successor

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
Example 2: Repository functions on well-supported operating systems and other core infrastructural software
  –H-Net: Servers run on Debian distribution of Linux
  –No gap!
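One way to keep track of a gap analysis like the two examples above is a simple record per criterion. The sketch below is a hypothetical bookkeeping structure, not part of TRAC or of the H-Net project: the `Finding` class and its field names are invented, and only the two criteria quoted on the slides are included.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One audit criterion compared against local capabilities."""
    criterion: str
    local_state: str
    strategy: str = ""  # how to narrow the gap; empty when there is no gap

    @property
    def has_gap(self) -> bool:
        return bool(self.strategy)


# The two examples from the slides, recorded as findings.
FINDINGS = [
    Finding(
        criterion="Repository has formal succession plan",
        local_state="No succession plan in place",
        strategy="Identify and negotiate with a potential successor; document intent",
    ),
    Finding(
        criterion="Repository functions on well-supported operating systems "
                  "and other core infrastructural software",
        local_state="Servers run on Debian distribution of Linux",
    ),
]


def gaps(findings: list[Finding]) -> list[Finding]:
    """Return only the criteria that still need work."""
    return [f for f in findings if f.has_gap]
```

Rerunning `gaps(FINDINGS)` after each round of improvements gives a running list of what remains to be addressed.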

Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)

The TRAC Experience
Thorough yet flexible, leaving room for interpretation, lots of options for supporting documentation/evidence
Good snapshot of current state of repository
Clarifies what’s needed to narrow the gap
Great internal audit tool
Useful for certification of a trusted digital repository

Preservation Improvements: Backup & Archival Storage
Backup
  –Long-term (“permanent”) backup tape sets stored offsite, put on 3-year retention schedule
  –Reciprocal backup storage arrangement with ICPSR
Archival Storage
  –Annual copying to tape of H-Net data, databases, scripts
  –Media refreshment every 5 years
  –Future: Copy to alternative storage repository
  –Future: Participation in distributed archival storage system

Preservation Improvements: Authenticity
Fixity: Individual Messages (SIPs/AIPs)
  –Shorten time window for generation of hashes
  –Create database of SHA-256 hashes for fixity checks
  –Validate message hashes on notebook completion
Fixity: Notebook Files (AICs)
  –Create SHA-256 message digests on completion of notebooks
  –Calculate SHA-256 message digests for existing notebooks
  –Create database of SHA-256 message digests for fixity checks
  –Validate notebook hashes on weekly basis
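The notebook-level fixity steps above (record a SHA-256 digest per notebook, then revalidate on a schedule) might look roughly like the following. This is a sketch under stated assumptions, not the project's scripts: a JSON manifest stands in for the digest database, the `*.log*` filename pattern is inferred from the example notebook name, and all function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record_digests(notebook_dir: Path, manifest: Path) -> dict:
    """Digest every notebook in a list directory and save the results."""
    # Assumption: notebooks are named like "h-africa.log0802a".
    digests = {p.name: sha256_digest(p) for p in sorted(notebook_dir.glob("*.log*"))}
    manifest.write_text(json.dumps(digests, indent=2))
    return digests


def validate_digests(notebook_dir: Path, manifest: Path) -> list:
    """Return the names of notebooks that are missing or no longer match."""
    recorded = json.loads(manifest.read_text())
    failures = []
    for name, expected in recorded.items():
        path = notebook_dir / name
        if not path.is_file() or sha256_digest(path) != expected:
            failures.append(name)
    return failures
```

A weekly job could call `validate_digests` for each list directory and report any non-empty result for investigation; the manifest is best kept on separate storage from the notebooks it describes.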

Preservation Improvements: Authenticity

Preservation Improvements: Attachments
Found with < 0.01% of H-Net messages
  –MS Office, PDF, image files
Provide constructed URLs, as with public lists
Provide download links
No file normalization or migration plan
  –Most files should open in viewers, later versions of applications
  –MATRIX will help users if problems arise

Preservation Improvements: Digital Preservation Policies
Documented digital preservation policies and procedures for the H-Net lists
Based on the Digital Preservation Policy Framework developed by Nancy McGovern of ICPSR
  –Digital Preservation Management Workshop/Tutorial
  –Roadmap to developing and documenting policies
  –Wealth of examples

Preservation Improvements: Narrowing the Gap
Lather, rinse, repeat: New TRAC assessment
Technical improvements
Digital preservation policies

Conclusions
Relevant to preservation discussion
Applicable to preservation of LISTSERV-based and other lists
Testbed for other preservation tools and systems
Useful foundation for digital preservation planning at Michigan State

References
Digital Preservation Management Tutorial
H-Net Archives Project
H-Net: Humanities and Social Sciences Online
MATRIX: The Center for Humane Arts, Letters, and Social Sciences Online
OAIS Reference Model
Trusted Digital Repositories: Attributes and Responsibilities
Trustworthy Repositories Audit & Certification: Criteria and Checklist