May Archiving
PAWN: A Policy-Driven Software Environment for Implementing Producer-Archive Interactions in Support of Long Term Digital Preservation
Mike Smorul, Mike McGann, Joseph JaJa
Institute for Advanced Computer Studies
University of Maryland, College Park
Sponsored by National Archives and Records Administration, Library of Congress and NSF

Problems Facing Ingestion
Ensure integrity of data ingestion
Each producer-archive interaction is unique
Final destination for items in an archive is unique
Differing roles between producer and archive
Hostile producers

What is PAWN?
Software that provides an ingestion framework
Distributed and secure ingestion of digital objects into an archive
Handles the process
  – From package assembly
  – To archival storage
Simple, customizable interface for end-users
Flexible interface for archive publication

Package Workflow
1. Create Producer-Archive Agreement
2. Client package template
3. Create package based on template
4. Once approved, packages can be archived
5. Rejected packages can be held until rectified or deleted for resubmission

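Steps 3 to 5 of the workflow can be read as a small state machine. The Java sketch below is illustrative only: the state names (including an explicit SUBMITTED state) and the PackageWorkflow class are assumptions made for this example and are not taken from PAWN's code.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical package states; the names are assumptions, not PAWN identifiers.
enum PackageState { CREATED, SUBMITTED, APPROVED, REJECTED, ARCHIVED, DELETED }

// Minimal state machine for a package moving through review and archiving.
class PackageWorkflow {
    private static final Map<PackageState, Set<PackageState>> ALLOWED =
            new EnumMap<>(PackageState.class);

    static {
        ALLOWED.put(PackageState.CREATED, EnumSet.of(PackageState.SUBMITTED));
        ALLOWED.put(PackageState.SUBMITTED,
                EnumSet.of(PackageState.APPROVED, PackageState.REJECTED));
        // Only approved packages can be archived.
        ALLOWED.put(PackageState.APPROVED, EnumSet.of(PackageState.ARCHIVED));
        // A rejected package is held until it is rectified (resubmitted) or deleted.
        ALLOWED.put(PackageState.REJECTED,
                EnumSet.of(PackageState.SUBMITTED, PackageState.DELETED));
        ALLOWED.put(PackageState.ARCHIVED, EnumSet.noneOf(PackageState.class));
        ALLOWED.put(PackageState.DELETED, EnumSet.noneOf(PackageState.class));
    }

    private PackageState state = PackageState.CREATED;

    PackageState state() {
        return state;
    }

    // Moves to the next state, or throws if the workflow does not allow it.
    void transition(PackageState next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }
}
```

In this sketch a rejected package can be resubmitted and later approved, while an archived package accepts no further transitions.
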
Expanding a Simple Workflow
Support for multiple workflows
  – Grouped into logical domains
Definable roles per workflow
Pluggable components for assembly and archival publishing
Distributed components
  – Web-service based components

Domain Organization
Producers are organized into domains; each domain contains a transfer agreement negotiated with the archive.
Each domain contains a hierarchical organization of data grouped into record sets/templates (convenient groupings from the transfer agreement).
Each domain contains its own users.
An end-user operates within a set of record sets.

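The domain organization described above can be modeled as a tree of record sets plus a list of users. The following Java sketch is one possible reading, not PAWN's actual data model: all class and field names are hypothetical, and the rule that access to a record set also covers the record sets nested beneath it is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical record set node; record sets nest to form a hierarchy.
class RecordSet {
    final String name;
    final List<RecordSet> children = new ArrayList<>();

    RecordSet(String name) {
        this.name = name;
    }

    RecordSet addChild(String childName) {
        RecordSet child = new RecordSet(childName);
        children.add(child);
        return child;
    }

    // True if target is this record set or one nested beneath it.
    boolean contains(RecordSet target) {
        if (this == target) {
            return true;
        }
        for (RecordSet c : children) {
            if (c.contains(target)) {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical end-user who operates within a set of record sets.
class DomainUser {
    final String name;
    final Set<RecordSet> assigned = new LinkedHashSet<>();

    DomainUser(String name) {
        this.name = name;
    }

    // Assumption: assignment to a record set covers its descendants.
    boolean canAccess(RecordSet target) {
        for (RecordSet rs : assigned) {
            if (rs.contains(target)) {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical domain: one transfer agreement, one hierarchy, its own users.
class Domain {
    final String name;
    final String transferAgreement; // identifier of the negotiated agreement
    final RecordSet root = new RecordSet("root");
    final List<DomainUser> users = new ArrayList<>();

    Domain(String name, String transferAgreement) {
        this.name = name;
        this.transferAgreement = transferAgreement;
    }

    DomainUser addUser(String userName) {
        DomainUser u = new DomainUser(userName);
        users.add(u);
        return u;
    }
}
```
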
Domain Example

Custom Roles
Actions in PAWN can be grouped together to create roles.
  – There are no common roles between archives, so allow custom ones.
Default roles
  – Producer – Individual data supplier
  – Records Manager – Oversight of producers
  – Archive Manager – Final review and archive publishing
  – Global Administrator – Creates domain, sysadmin-like account
Sample Actions
  – Setting permissions on record sets
  – Record Schedule creation and modification
  – Add or delete whole packages
  – Modify items in a package
  …

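Grouping actions into roles, as described above, can be sketched as a named set of permitted actions. The enum constants below paraphrase the sample actions on the slide; the type names PawnAction and Role are hypothetical, and the slide does not say which actions belong to which default role, so none of the default roles is modeled here.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical action names paraphrasing the sample actions on the slide.
enum PawnAction {
    SET_RECORD_SET_PERMISSIONS,
    EDIT_RECORD_SCHEDULE,
    ADD_PACKAGE,
    DELETE_PACKAGE,
    MODIFY_PACKAGE_ITEMS
}

// A role is a named group of permitted actions.
class Role {
    final String name;
    private final Set<PawnAction> actions;

    Role(String name, Set<PawnAction> actions) {
        this.name = name;
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle it explicitly.
        this.actions = actions.isEmpty()
                ? EnumSet.noneOf(PawnAction.class)
                : EnumSet.copyOf(actions);
    }

    boolean permits(PawnAction action) {
        return actions.contains(action);
    }
}
```

A custom role would then be created with something like `new Role("Reviewer", EnumSet.of(PawnAction.ADD_PACKAGE, PawnAction.MODIFY_PACKAGE_ITEMS))`, where "Reviewer" is an invented name used only for illustration.
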
Custom Package Building
PAWN provides an API for developing custom package builders.
Custom package builders can be written in Java and implement a simple interface.
Builders interact with a hierarchically structured package.
[Diagram: package hierarchy with the labels Manifest, Namespace, Type, Descriptive Name, Data, Bits, Metadata, Name]

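To illustrate what "implement a simple interface" might look like, here is a minimal Java sketch. It does not reproduce PAWN's builder API, which the slides do not show: the PackageBuilder interface, the PackageNode class, and the FlatFileBuilder example are all hypothetical, with field names loosely echoing the labels in the slide's diagram.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node in a hierarchically structured package (not PAWN's real class).
class PackageNode {
    final String type;            // e.g. "manifest" or "data"
    final String descriptiveName;
    final byte[] bits;            // payload; empty for container nodes
    final Map<String, String> metadata = new LinkedHashMap<>();
    final List<PackageNode> children = new ArrayList<>();

    PackageNode(String type, String descriptiveName, byte[] bits) {
        this.type = type;
        this.descriptiveName = descriptiveName;
        this.bits = bits.clone();
    }

    PackageNode add(PackageNode child) {
        children.add(child);
        return this;
    }

    // Counts nodes of type "data" in this subtree.
    int countDataNodes() {
        int n = "data".equals(type) ? 1 : 0;
        for (PackageNode c : children) {
            n += c.countDataNodes();
        }
        return n;
    }
}

// Hypothetical builder interface: turns named sources into a package tree.
interface PackageBuilder {
    PackageNode build(Map<String, byte[]> sources);
}

// Example builder that places every source directly under one manifest.
class FlatFileBuilder implements PackageBuilder {
    @Override
    public PackageNode build(Map<String, byte[]> sources) {
        PackageNode manifest = new PackageNode("manifest", "Flat file package", new byte[0]);
        for (Map.Entry<String, byte[]> e : sources.entrySet()) {
            PackageNode data = new PackageNode("data", e.getKey(), e.getValue());
            data.metadata.put("size", String.valueOf(e.getValue().length));
            manifest.add(data);
        }
        return manifest;
    }
}
```

A different builder could implement the same interface to produce a deeper hierarchy, for example one manifest per logical group of files.
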
PAWN Archive Gateway
Pluggable component that provides an API for developing gateways into various services.
Each gateway may have multiple instances, each configured differently.
PAWN manages gateways and associates them with the appropriate data.

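The gateway idea can be sketched as an interface with differently configured instances and a registry that maps data to the right instance. Everything in this Java example is hypothetical (ArchiveGateway, InMemoryGateway, GatewayRegistry, and keying the association on a record set name); the slides do not describe the real gateway API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical gateway interface: publishes a package and returns its location.
interface ArchiveGateway {
    String publish(String packageId, byte[] content);
}

// Toy gateway that keeps packages in memory under a configurable prefix.
class InMemoryGateway implements ArchiveGateway {
    private final String prefix; // per-instance configuration
    final Map<String, byte[]> stored = new HashMap<>();

    InMemoryGateway(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public String publish(String packageId, byte[] content) {
        String location = prefix + "/" + packageId;
        stored.put(location, content.clone());
        return location;
    }
}

// Associates each record set name with the gateway instance that should receive its data.
class GatewayRegistry {
    private final Map<String, ArchiveGateway> byRecordSet = new HashMap<>();

    void associate(String recordSet, ArchiveGateway gateway) {
        byRecordSet.put(recordSet, gateway);
    }

    String publish(String recordSet, String packageId, byte[] content) {
        ArchiveGateway gateway = byRecordSet.get(recordSet);
        if (gateway == null) {
            throw new IllegalArgumentException("no gateway for " + recordSet);
        }
        return gateway.publish(packageId, content);
    }
}
```

Two instances of the same gateway class, created with different prefixes, stand in here for "multiple instances, each configured differently".
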
PAWN Architecture
Divided into producer and archive side components
  – Producer: data supplying and domain management
  – Archive: data storage, resource allocation and archival publishing
Web-service based communication
Trust relationship between producer and archive components
  – SAML and PKI

Components

Case Studies
ICDL Book Builder
  – Custom package builder
  – Multiple data sources
  – Model logical books
SLAC Record Ingestion
  – Sample NARA ingestion
  – Model government roles
  – DOE Record Schedule
10,000 CD-ROMs
  – Remote ingestion
  – Unskilled labor
  – Custom hardware

PAWN Summary
Platform for ingestion
Customizable Components
  – Roles, ingest and publishing
Distributed architecture

More information
Web site:
  – Wiki link for technical details.
Or “I’m feeling lucky” Google keywords:
  – ADAPT UMIACS