Module 3 — Preparing for Submission

Slides:



Advertisements
Similar presentations
E-Content Service Group Virtual Meeting Digital Preservation: How to Get Started.
Advertisements

Putting together a METS profile. Questions to ask when setting down the METS path Should you design your own profile? Should you use someone elses off.
Long-Term Preservation. Technical Approaches to Long-Term Preservation the challenge is to interpret formats a similar development: sound carriers From.
October 28, 2003Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
PREMIS: To Be or Not To Be in My METS The Preservation Journey at the University of Connecticut Libraries ALA Annual 2013 ALCTS PARS Intellectual Access.
METS: An Introduction Structuring Digital Content.
Fedora Users’ Conference Rutgers University May 14, 2005 Researching Fedora's Ability to Serve as a Preservation System for Electronic University Records.
An Introduction June 17, 2013 Open Archival Information System (OAIS)
Transformations at GPO: An Update on the Government Printing Office's Future Digital System George Barnum Coalition for Networked Information December.
Mark Evans, Tessella Digital Preservation Boot Camp – PASIG meeting, Washington DC, 22 nd May 2013 PREMIS Practical Strategies For Preservation Metadata.
1 Archiving Workflow between a Local Repository and the National Library Archive Experiences from the DiVA Project Eva Müller, Peter Hansson, Uwe Klosa,
Merrilee Proffitt e(X)literature / Digital Cultures Project April 2003 News from the Digital Library The Metadata Encoding and Transmission Standard; the.
PREMIS What is PREMIS? o Preservation Metadata Implementation Strategies When is PREMIS use? o PREMIS is used for “repository design, evaluation, and archived.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall.
US GPO AIP Independence Test CS 496A – Senior Design Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong Faculty advisor: Dr. Russ.
The British Library’s METS Experience The Cost of METS Carl Wilson
Release & Deployment ITIL Version 3
Ingest and Dissemination with DAITSS Presented by Randy Fischer, Programmer, Florida Center for Library Automation, University of Florida DigCCurr2007.
Moving into Design SYSTEMS ANALYSIS AND DESIGN, 6 TH EDITION DENNIS, WIXOM, AND ROTH © 2015 JOHN WILEY & SONS. ALL RIGHTS RESERVED. 1 Roberta M. Roth.
Recordkeeping for Good Governance Toolkit Digital Recordkeeping Guidance Funafuti, Tuvalu – June 2013.
How to build your own Dark Archive (in your spare time) Priscilla Caplan FCLA.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
Metadata Considerations Implementing Administrative and Descriptive Metadata for your digital images 1.
Meet and Confer Rule 26(f) of the Federal Rules of Civil Procedure states that “parties must confer as soon as practicable - and in any event at least.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Archival Information Packages for NASA HDF-EOS Data R. Duerr, Kent Yang, Azhar Sikander.
PREMIS Rathachai Chawuthai Information Management CSIM / AIT.
Library Repositories and the Documentation of Rights Leslie Johnston, University of Virginia Library NISO Workshop on Rights Expression May 19, 2005.
Implementor’s Panel: BL’s eJournal Archiving solution using METS, MODS and PREMIS Markus Enders, British Library DC2008, Berlin.
The FCLA Digital Archive Joint Meeting of CSUL Committees, 2005.
Digital Preservation: Current Thinking Anne Gilliland-Swetland Department of Information Studies.
Small steps and lasting impact: making a start with preservation or It’s not all NASA Patricia Sleeman Digital Archives and Repositories University of.
Archival Workshop on Ingest, Identification, and Certification Standards Certification (Best Practices) Checklist Does the archive have a written plan.
Lifecycle Metadata for Digital Objects November 15, 2004 Preservation Metadata.
Institutional Repositories July 2007 DIGITAL CURATION creating, managing and preserving digital objects Dr D Peters DISA Digital Innovation South.
Database Principles: Fundamentals of Design, Implementation, and Management Chapter 1 The Database Approach.
Advanced Higher Computing Science
Project Management PTM721S
Module 6 — Sustainability
Joint Meeting of CSUL Committees,
Introduction for the Implementation of Software Configuration Management I thought I knew it all !
Module 1 — Enabling Programmatic Digital Preservation
Module 4 — Submission & Ingest
Module 5 — Post-Submission
Project Planning: Scope and the Work Breakdown Structure
Ingest and Dissemination with DAITSS
APTrust and Georgetown University Library
Using E-Business Suite Attachments
DAITSS and the Florida Digital Archive
Exercise: understanding authenticity evidence
Bentley Project Reel Digitization Bentley Historical Library t
COIT20235 Business Process Modelling
Introduction to DSpace
Digital Project Lifecycle Curating Across the Curriculum
Storage Basic recommendations:
Systems analysis and design, 6th edition Dennis, wixom, and roth
Implementing an Institutional Repository: Part II
EDD/581 Action Research Proposal (insert your name)
Systems analysis and design, 6th edition Dennis, wixom, and roth
Digital Stewardship Curriculum
2020 Census Local Update of Census Addresses Operation (LUCA)
Open Archival Information System
EDD/581 Action Research Proposal (insert your name)
Robin Dale RLG OAIS Functionality Robin Dale RLG
Implementing an Institutional Repository: Part II
Digital Library and Plan for Institutional Repository
How to Implement an Institutional Repository: Part II
Digital Library and Plan for Institutional Repository
Presentation transcript:

Module 3 — Preparing for Submission Digital Preservation Workflow Curriculum Final: March 8, 2017 This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. Please reference the Digital Preservation Network (DPN) http://www.dpn.org when using this work. Digital Preservation Network (DPN)

Planning Selection Preparation Submission Post-Submission Sustainability

Module Goals Upon completion of this module, participants should be able to: Understand the requirements of a digital preservation service and how to interpret them to local context Understand strategies for preparing content for submission that considers different content sources (e.g., born-digital vs. digitized) Understand BagIt and how to create bags Identify practical solutions for overcoming roadblocks at this stage Develop approaches for integrating submission preparation into an ongoing digital preservation program

Lesson 1: Submission Information Packages

A very brief introduction to the OAIS Reference Model Functional model Provide a brief overview of the OAIS functional entities Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

A very brief introduction to the OAIS Reference Model Information model SIP AIP DIP Provide a brief overview of the OAIS information model, including definitions of each from the OAIS document. Note that for the purposes of this workshop, we are focusing on SIPs prepared by an organization and submitted to an internal or external preservation service. Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

What is a SIP? A SIP is the package of data that is delivered to a preservation environment. It contains the content to be preserved, including the data objects & metadata. SIPs can be assembled from many sources (e.g., content creators, digitization vendors, collection managers). SIPs can be created at many different levels of granularity (e.g., file level, intellectual entity level, collection level). SIPs may be physical or logical. Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

What are some SIP requirements? There are two types of SIP requirements: internal requirements (of the submitting entity), and external requirements (of the preservation environment). SIP requirements may address: File formats (e.g.,TIFF required for images) Required metadata values (e.g., Title must be supplied) Required metadata format (e.g., YYYY-MM-DD) Required metadata structure (e.g., MODS XML) Additional documentation (e.g., contracts, installation instructions) Packaging structure (e.g., directory naming, BagIt) A defined and documented SIP can be used to verify completeness before and after submission. Note that in this lesson, we are focusing on internal SIP requirements. Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

What is a SIP class? A SIP class is a conceptual grouping of content in order to identify discrete requirements where there may be important differences. SIP classes may align to different content types or sources (e.g., institutional repository, in-house digitization, born-digital archives) Again, emphasize that we are talking about internal requirements, not those of the preservation environment Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

What are some SIP class examples? Hypothetical examples (1/3) Born-digital content via institutional repository: Metadata requirements: title, author, date (Dublin Core .xml) Creative commons license (.txt) Publication file (.pdf, .docx, or .doc) Associated research data in sub-directory directory named “research-data” (any format) Introduce as hypothetical SIP classes that an organization has created for some of its content for submission to a repository service Note that these are aligned to, but not necessarily the same as the requirements from content submitters Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

What are some SIP class examples? Hypothetical examples (2/3) In-house image digitization: Metadata: METS file (.xml) listing all files and their relationships, with embedded MODS record Image files (TIFF) in sub-directory named “images” Checksum file (.md5) for each image file with same file name Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

What are some SIP class examples? Hypothetical examples (3/3) Outsourced audiovisual digitization: Audio files (.wav, linear PCM encoding) Video files (.mov, v210 encoding) Digitization vendor report (provided template - .xlsx) Note that in this case, there are no specific metadata requirement. Point this out before moving on to the next slide. Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

What is a “good enough” SIP? The only real criteria is that SIPs meet the requirements of the preservation environment. In-house SIP requirements and classes are nice to have, but not necessary. They also don’t need to be perfect. Consider aligning SIP requirements and classes to preservation capabilities, e.g.,: What is required so that bit-level preservation of the content can be ensured? What is required so that the content can be independently identified? What is required to ensure that the content creator can understand the data? What is required to ensure that all future users can understand the data? Note that we are focusing on this topic because it tends to be a bottleneck for many organizations. Common challenges include: No one has been assigned the authority to make these decisions Requirements were defined for one type of content that don’t support other types SIP requirements are hard to fulfill Staff are worried that older content doesn’t meet new requirements and don’t want to have to go back and change everything. Keep in mind that these issues can cause paralysis. A governance process can help ensure that there is a means for resolving issues. Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

How should internal SIP requirements be determined? Involve stakeholders who will have responsibilities in the SIP creation workflow (starting from the point of creation/digitization or acquisition). Align SIP requirements to existing processes as much as possible. Look at what is realistic vs. ideal. If creating METS files by hand is too cumbersome, could there be an easier guideline? Could their creation be automated? Is METS necessary for this content type? Be flexible. Keep in mind that new content may come in down the road that doesn’t fit the current requirements. Build in a review and update process. Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

Bottleneck check! Lack of decision-making on this issue creates preservation bottlenecks for many organizations. Common challenges include: No one has been assigned the authority to make these decisions Requirements were defined for one type of content that don’t support other types SIP requirements are hard to fulfill Staff are worried that older content doesn’t meet new requirements and don’t want to have to go back and change everything. These issues can cause paralysis and place digital content at risk. A governance process can help ensure that there is a means for resolving issues. Module 3 — Preparing for Submission / Lesson 1 — Submission Information Packages

Lesson 2: SIP Creation

Exercise: Understanding 3rd Party Submission Requirements Type: Group exercise Goal: Understand requirements from preservation services Description: Participants divide into groups of 4-5 Each group receives a set of SIP requirements from a preservation environment (see note) Groups identify whether or not there are requirements/guidelines for: Packaging (e.g. bags, folder structure) Fixity Formats Metadata Identification Document the answers (e.g., yes/no, red/green) in a spreadsheet that the participants can see (e.g., columns = service name, rows = criteria) Discuss the variety of requirements and why Note: Instructors should find current requirements from existing organizations. These will change over time. Choose from a variety of orgs with different requirements. Examples might be pulled from: Chronopolis DPN APTrust Dryad CDL (e.g. https://merritt.cdlib.org/docs/merritt_user_guide.pdf) etc. Module 3 — Preparing for Submission / Lesson 2 — SIP Creation

What does a SIP look like? Physical SIPs: Directory structure Bag Logical SIPs: Assembled via API, upload, etc. Note that in this session we will be focusing on bags Module 3 — Preparing for Submission / Lesson 2 — Creating SIPs

Sample physical SIP in a directory SIP Requirements (hypothetical): One SIP per content item (e.g. book, photograph) Metadata for the content item should be in the root directory called metadata.xml All content files should be in a folder called /content Each content file should be accompanied by a .md5 file. MD5 files must have the same name as the content files, with an .md5 extension Imagine this SIP gets picked up by Module 3 — Preparing for Submission / Lesson 2 — Creating SIPs

Sample physical SIP in a bag SIP Requirements (hypothetical): SIPs must be submitted in a valid bag There are no requirements for the contents of the bag Imagine this SIP gets picked up by Module 3 — Preparing for Submission / Lesson 2 — Creating SIPs

Introduction to BagIt “BagIt is a hierarchical file packaging format designed to support disk- based storage and network transfer of arbitrary digital content. A "bag" consists of a "payload" (the arbitrary content) and "tags," which are metadata files intended to document the storage and transfer of the bag. A required tag file contains a manifest listing every file in the payload together with its corresponding secure hash / message digest (checksum).” — https://wiki.duraspace.org/display/DPNC/BagIt+Specification Breakdown this definition Module 3 — Preparing for Submission / Lesson 2 — Creating SIPs

Why is BagIt useful? De facto community standard Built-in manifests allow you to verify what you are sending/receiving Built-in fixity checking ensures there are no errors in transfer Module 3 — Preparing for Submission / Lesson 2 — Creating SIPs

What is in a bag? A base directory containing: A sub-directory named “data” (the payload directory) Required and optional tag files: manifest-[algorithm].txt (required) bagit.txt (required) tagmanifest-[algorithm].txt (optional) baginfo.txt (optional) Current version at the time of writing was: https://tools.ietf.org/html/draft-kunze-bagit-14 Please update to the latest version Module 3 — Preparing for Submission / Lesson 2 — Creating SIPs

Exercise: Creating SIPs According to Requirements Type: Hands-on exercise Goal: Practice creating SIPs according to specifications Description: Participants divide into groups of 2, or work individually Each group receives: 2 sets of SIP requirements from preservation services (real or hypothetical) A set of sample files Access to computers with Exactly installed Instructor presents and overview of Exactly Participants create sample SIPs that conform to specs using the sample datasets provided. Re-use SIP content specs from Lesson 2 Each group should be required to create at least one bag The same dataset should be used to create both SIPs Resources Required: Computers with Exactly installed Sample files SIP requirements from preservation services Optional: Instructors may choose to provide each group with the same two sets of requirements and then compare the results Module 3 — Preparing for Submission / Lesson 2 — SIP Creation

Interim local storage management SIPs may be assembled before they are ready for submission, or may be staged over time Ensure best practices for storage management in the interim, to secure data and mitigate risk, including: Don’t store exclusively on removable media (e.g., CDs, DVDs, USB, even HDDs) Have at least two copies, stored on different media Store copies in locations with different hazards Keep track of file copies and locations Module 3 — Preparing for Submission / Lesson 2 — Creating SIPs

Inventory tracking “Inventories are living documents that must be updated and maintained as collections evolve.” [Revisiting inventory tracking from Module 2] Tracking SIP preparation progress is an important data point to maintain. Module 3 — Preparing for Submission / Lesson 2 — Creating SIPs

Lesson 3: Managing SIP Creation

Exercise: Creating SIPs at Org XYZ Type: Case study problem solving Goal: Create a solution for a common challenge Description: Participants divide into groups of max 4-5 Each group receives the same scenario (see next slide) Using the problem solving approach outlined in module 1, create a plan for more streamlined SIP creation that: Improves compliance with external specs Improves compliance with internal specs Addresses what stakeholders should be involved and in what role Groups present agreed upon solutions to the rest of the workshop Each group should go through the steps below (see module 1 for details) and be able to talk through each in their presentation to the entire group Problem solving methodology: Create a vision Identify and prioritize challenges For the top challenges, create problem statements that Describe the goal of overcoming the problem Identifies the root cause (using 5 whys) Propose possible solutions to problems Prioritize possible solutions Creation of a roadmap Propose a sustainability plan for implementation Module 3 — Preparing for Submission / Lesson 3 — Managing SIP Creation

Exercise: Creating SIPs at Org XYZ Scenario: SIP creation is an enormous challenge for Org XYZ. Content to be submitted to the internal preservation repository can be found in a variety of collections around the institution. To date, each collection manager has taken the responsibility for creating SIPs, but the results have varied. Some do not understand the SIP requirements and as a result have created SIPs that are non-conformant to the repository specs (which require submission in bags). Others simply haven’t created SIPs at all, and as a result none of the content from that collection has gone into the repository. Across the board, SIPs are inconsistent, and don’t include the components that the Org has specified as its own requirements. As a result, Org XYZ has only submitted a handful of SIPs to the repository. Many of the remaining assets are at risk of loss due to their storage on unstable media or lack of metadata tracking their existence. Each group should go through the steps below (see module 1 for details) and be able to talk through each in their presentation to the entire group Problem solving methodology: Create a vision Identify and prioritize challenges For the top challenges, create problem statements that Describe the goal of overcoming the problem Identifies the root cause (using 5 whys) Propose possible solutions to problems Prioritize possible solutions Creation of a roadmap Propose a sustainability plan for implementation Module 3 — Preparing for Submission / Lesson 3 — Managing SIP Creation

Lesson 4: Creating a Program

Exercise: A Programmatic Approach to Preparing for Submission Type: Group discussion Goal: Solidify the participants understanding of the processes presented in this module as part of a larger programmatic whole. Description: Participants will discuss the decision points that need to be identified in order to create a programmatic approach to developing SIP requirements, creating conformant SIPs, and tracking progress Possible questions for discussion: What are the decisions points that need to be identified in order to create a programmatic approach to SIP creation? For example: What metadata is required? What files are required Who is responsible for creating SIPs? What is required on an ongoing basis in order for SIP creation to become sustainable? What resources would help? Tools Skills Module 3 — Preparing for Submission / Lesson 4 — Creating a Program