Framework for Digital Archiving: OAIS Reference Model
Donald Sawyer/NASA/GSFC
Lou Reich/NASA/CSC
LaRC Presentation, 20 March 2002
Organizational Background
- National Space Science Data Center
  - NASA’s first digital archive
  - Experienced many technology changes since 1966
- Consultative Committee for Space Data Systems
  - International group of space agencies
  - Developed a variety of science discipline-independent standards
  - Became the working body for ISO TC 20/SC 13 about 1990
- ISO suggested that SC 13 should develop archive standards
  - Address data used in conjunction with space missions
  - Address intermediate and indefinite long-term storage of digital data
Response
- Response to Consultative Committee for Space Data Systems (CCSDS) and ISO TC 20/SC 13
  - No framework widely recognized for developing specific digital archive standards
  - Begin by developing a ‘Reference Model’ to establish common terms and concepts
  - Ensure broad participation, including traditional archives (not restricted to space communities; all participation is welcome!)
  - Focus on data in electronic forms, but recognize that other forms exist in most archives
  - Follow up with additional archive standards efforts as appropriate
What is a Reference Model?
- A framework
  - for understanding significant relationships among the entities of some environment, and
  - for the development of consistent standards or specifications supporting that environment.
- A reference model
  - is based on a small number of unifying concepts
  - is an abstraction of the key concepts, their relationships, and their interfaces both to each other and to the external environment
  - may be used as a basis for education and explaining standards to a non-specialist.
What was the Motivation?
- Agencies and organizations have a significant stewardship responsibility for the digital information obtained by their programs
  - Data are often irreplaceable
- Long-term (indefinite) preservation of this information is difficult
  - “Data + metadata” (i.e., information) must be migrated across new media, operating systems, and management systems
  - Field representations and formats may need to be revised to keep pace with evolving technologies and supported standards
  - What constitutes adequate metadata is not widely understood or standardized
  - Digital information is becoming ever more widely distributed
  - Digital information must be readily transportable from archive to archive
Organizational Approach
- Organize US contribution under a framework with NASA lead in October 1995
  - Establish liaison with Federal Geographic Data Committee (FGDC) and National Archives and Records Administration (NARA)
  - Agency archives and users must be represented in this process
- An “Open” process
  - Important to stimulate dialogue with broad archive/user communities
  - Results of US and international workshops put on the Web
  - Support comments/critiques
- Broad international workshops also held
  - UK and France
  - Issue resolution at ISO/Consultative Committee for Space Data Systems international workshops
Technical Approach
- Investigated other Reference Models
  - ISO “Seven Layer” Communications Reference Model
  - ISO Reference Model for Open Distributed Processing
  - ISO TC 211 Reference Model for Geomatics
- Adopted some concepts from the seminal “Preserving Digital Information” report
- Define what is meant by ‘archiving of data’
- Break ‘archiving’ into a few functional areas (e.g., ingest, storage, access, and preservation planning)
- Define a set of interfaces between the functional areas
- Define a set of data classes for use in archiving
- Choose formal specification techniques
  - Data flow diagrams for functional models and interfaces
  - Unified Modeling Language (UML) for data classes
Resulting Model
- Model targeted to several categories of reader
  - Archive designers
  - Archive users
  - Archive managers, to clarify digital preservation issues and assist in securing appropriate resources
  - Standards developers
- Adopted terminology that crosses various disciplines
  - Traditional archivists
  - Scientific data centers
  - Digital libraries
Adoption
- Already widely adopted as a starting point in digital preservation efforts
  - Digital libraries (e.g., Netherlands National Library)
  - Traditional archives (e.g., US National Archives)
  - Scientific data centers (e.g., National Space Science Data Center)
  - Commercial organizations (e.g., Aerospace Industries Association preservation working team)
Reference Model Status
- Completed CCSDS Red Book review in November 2000
- Completed ISO Draft International Standard (DIS) review
  - Same content as CCSDS Red Book
- Comments were received from several organizations
  - Issues discussed and resolved at the November 2000 and May 2001 ISO Archiving Workshops
  - Major impact was to highlight the preservation planning function in the functional model
- New version was delivered to the ISO and CCSDS Secretariats in July
- CCSDS review ended October 2001, with a few editorial comments
- Two-month ISO review will end March 24, 2002
- We’re projecting a final ISO standard in summer 2002
Reference Model for an Open Archival Information System (OAIS)
Technical Overview
Open Archival Information System (OAIS)
- Open
  - Reference Model standard(s) are developed using a public process and are freely available
- Information
  - Any type of knowledge that can be exchanged
  - Independent of the forms (i.e., physical or digital) used to represent the information
  - Data are the representation forms of information
- Archival Information System
  - Hardware, software, and people who are responsible for the acquisition, preservation and dissemination of the information
  - Additional OAIS responsibilities are identified later and are more fully defined in the Reference Model document
Document Organization
- Introduction
  - Purpose and Scope, Applicability, Rationale, Road Map for Future Work, Document Structure, and Definitions of Terms
- OAIS Concepts and Responsibilities
  - High level view of OAIS functionality and information models
  - OAIS external environment
  - Minimum responsibilities to become an “OAIS”
- Detailed Models
  - Functional model descriptions and information model perspectives
- Preservation Perspectives
  - Media migration, compression, format conversions, access
- Archive Interoperability
  - Criteria to distinguish types of cooperation among archives
- Annexes
  - Scenarios of existing archives, relationships with other standards
Purpose, Scope, and Applicability
- Framework for understanding and applying concepts needed for long-term digital information preservation
  - Long-term is long enough to be concerned about changing technologies
  - Starting point for a model addressing non-digital information
- Provides a set of minimal responsibilities to distinguish an OAIS from other uses of ‘archive’
- Framework for comparing architectures and operations of existing and future archives
- Basis for development of additional related standards
- Addresses a full range of archival functions
- Applicable to all long-term archives and those organizations and individuals dealing with information that may need long-term preservation
- Does NOT specify any implementation
Model View of an OAIS Environment
- [Figure: the OAIS (archive) and its environment: Producer, Management, Consumer]
- Producer is the role played by those persons, or client systems, who provide the information to be preserved
- Management is the role played by those who set overall OAIS policy as one component in a broader policy domain
- Consumer is the role played by those persons, or client systems, who interact with OAIS services to find and acquire preserved information of interest
OAIS Information Definition
- Information is defined as any type of knowledge that can be exchanged, and this information is always expressed (i.e., represented) by some type of data
- In general, it can be said that “Data interpreted using its Representation Information yields Information”
- In order for this Information Object to be successfully preserved, it is critical for an archive to clearly identify and understand the Data Object and its associated Representation Information
- [Figure: Data Object, interpreted using its Representation Information, yields Information Object]
Information Package Definition
- An Information Package is a conceptual container of two types of information called Content Information and Preservation Description Information (PDI)
- [Figure: an Information Package containing Content Information and Preservation Description Information]
Information Package Variants
- Submission Information Package
  - Negotiated between Producer and OAIS
  - Sent to OAIS by a Producer
- Archival Information Package
  - Information Package used for preservation
  - Includes complete set of Preservation Description Information for the Content Information
- Dissemination Information Package
  - Includes part or all of one or more Archival Information Packages
  - Sent to a Consumer by the OAIS
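The three package variants can be sketched as a small data model. This is only an illustrative sketch, not part of the Reference Model: the class layout, the function names `sip_to_aip` and `aips_to_dip`, and the idea of checking PDI completeness against four named categories (Provenance, Context, Reference, Fixity, as listed on the Preservation Description Information slide) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

# Assumed category names, taken from the PDI slide of this presentation.
PDI_CATEGORIES = ("provenance", "context", "reference", "fixity")


@dataclass
class InformationPackage:
    """Conceptual container: Content Information plus PDI."""
    content_information: Dict[str, bytes]
    pdi: Dict[str, str] = field(default_factory=dict)


class SIP(InformationPackage):
    """Submission Information Package, sent to the OAIS by a Producer."""


class AIP(InformationPackage):
    """Archival Information Package, the form used for preservation."""


class DIP(InformationPackage):
    """Dissemination Information Package, sent to a Consumer."""


def sip_to_aip(sip: SIP, added_pdi: Dict[str, str]) -> AIP:
    """Build an AIP; refuse if the PDI set is not complete (hypothetical rule)."""
    pdi = {**sip.pdi, **added_pdi}
    missing = [c for c in PDI_CATEGORIES if c not in pdi]
    if missing:
        raise ValueError(f"incomplete PDI, missing: {missing}")
    return AIP(dict(sip.content_information), pdi)


def aips_to_dip(aips: Iterable[AIP], wanted: Set[str]) -> DIP:
    """Build a DIP from part or all of one or more AIPs."""
    content = {}
    for aip in aips:
        for name, data in aip.content_information.items():
            if name in wanted:
                content[name] = data
    return DIP(content)


sip = SIP(
    {"image.dat": b"\x00\x01", "readme.txt": b"calibrated"},
    {"provenance": "submitted by instrument team"},
)
aip = sip_to_aip(sip, {"context": "mission X", "reference": "example-id-0001",
                       "fixity": "checksum recorded at ingest"})
dip = aips_to_dip([aip], {"readme.txt"})
```

Here the DIP carries only the requested part of one AIP, which mirrors the "part or all of one or more" wording on the slide.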
External Data Flow Diagram
- [Figure: data flows between Producer, OAIS, and Consumer]
  - Producer to OAIS: Submission Information Packages
  - Held within the OAIS: Archival Information Packages
  - Consumer to OAIS: queries, orders
  - OAIS to Consumer: result sets, Dissemination Information Packages
- Legend: entity; Information Package data object; data flow
OAIS Responsibilities
- Negotiates and accepts Information Packages from information producers
- Obtains sufficient control to ensure long-term preservation
- Determines which communities (designated) need to be able to understand the preserved information
- Ensures the information to be preserved is independently understandable to the Designated Communities
- Follows documented policies and procedures which ensure the information is preserved against all reasonable contingencies
- Makes the preserved information available to the Designated Communities in forms understandable to those communities
Detailed Models
Overview
Overview of Detailed Models
- It was decided to do both a functional and an information model of the OAIS
- Both modeling efforts were tasked to:
  - Use the models to better communicate OAIS concepts
  - Use a well established, formal modeling technique
  - Stay as implementation independent as possible
  - Avoid detailed designs
Detailed Models
Information Model
General Principles
- Define classes of “information objects” that illustrate the information necessary to enable long-term storage and access to archives
- The class definitions should be implementation independent
- Use class diagrams from Unified Modeling Language (UML)
UML Notation Overview
- Class: a box labeled with the class name
- Aggregation: an Assembly Class made up of a Part-1 Class and a Part-2 Class
- Specialization: a Parent Class with a Child-1 Class and a Child-2 Class
- Association: Class-1 linked to Class-2 by a named association
- Multiplicity of associations
  - 1: exactly one
  - *: many (zero or more)
  - 0..1: optional (zero or one)
  - 1..*: one or more
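Information Objects
- [Figure: UML class diagram of the Information Object]
  - An Information Object is a Data Object interpreted using one or more (1..*) Representation Information objects
  - Representation Information is itself interpreted using further Representation Information
  - A Data Object is either a Physical Object or a Digital Object
  - A Digital Object is made up of one or more (1..*) Bits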
Representation Information
- The Representation Information accompanying a physical object like a moon rock may give additional meaning, as a result of some analysis, to the physically observable attributes of the rock
- The Representation Information accompanying a digital object, or sequence of bits, is used to provide additional meaning
  - It typically maps the bits into commonly recognized data types such as character, integer, and real and into groups of these data types
  - It associates these with higher level meanings which can have complex inter-relationships that are also described
Representation Information Object
Representation Information Components
- Structure Information
  - Common computer types such as characters, numbers, pixels, arrays, and aggregations of such structures with rules on how they map to each other
- Semantic Information
  - Additional meanings associated with the structural elements such as valid values, science parameter being represented, etc.
- Other Representation Information
  - Identifiers of other standards providing Representation Information, such as a reference to the ASCII standard
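A small sketch may help show how Structure and Semantic Information together turn a Data Object (bits) into an Information Object. Everything named here is an assumption for illustration: the record layout, the field names, and the choice of Python's `struct` format strings as a stand-in for Structure Information are not taken from the Reference Model.

```python
import struct
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RepresentationInformation:
    # Structure Information: how the bits map onto common data types.
    # ">Hf" means big-endian: one unsigned 16-bit integer, one 32-bit real.
    struct_format: str
    # Semantic Information: what each structural element means.
    field_names: Tuple[str, ...]


def interpret(data_object: bytes, rep_info: RepresentationInformation) -> Dict[str, object]:
    """Data Object interpreted using its Representation Information yields Information."""
    values = struct.unpack(rep_info.struct_format, data_object)
    return dict(zip(rep_info.field_names, values))


# Hypothetical two-field sensor record.
rep_info = RepresentationInformation(">Hf", ("sensor_id", "temperature_kelvin"))
data_object = struct.pack(">Hf", 7, 273.5)   # six bytes that are opaque on their own
information = interpret(data_object, rep_info)
```

Without `rep_info`, the six bytes cannot be read reliably, which is why the archive has to preserve the Representation Information along with the Data Object.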
Types of Information Used in OAIS
Content Information
- The information that is the original target of preservation
- An instance of Content Information is the information that an archive is tasked to preserve
- Deciding what is the Content Information may not be obvious and may need to be negotiated with the Producer
- The Content Data Object in the Content Information may be either a Digital Object or a Physical Object (e.g., a physical sample, microfilm)
Preservation Description Information
- Provenance Information
  - Describes the source of Content Information, who has had custody of it, what is its history
- Context Information
  - Describes how the Content Information relates to other information outside the Information Package
- Reference Information
  - Provides one or more identifiers, or systems of identifiers, by which the Content Information may be uniquely identified
- Fixity Information
  - Protects the Content Information from undocumented alteration
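Fixity Information is the PDI category that lends itself most directly to code. The sketch below assumes a SHA-256 digest as the fixity mechanism; that choice, the function names, and the sample PDI values are illustrative assumptions, and an archive could equally use a CRC, another checksum, or a digital signature.

```python
import hashlib


def fixity_digest(content: bytes) -> str:
    """Compute a digest to record as Fixity Information (SHA-256 assumed here)."""
    return hashlib.sha256(content).hexdigest()


def is_unaltered(content: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one recorded in the PDI."""
    return fixity_digest(content) == recorded_digest


# Hypothetical Content Data Object and its PDI record.
content = b"1998-07-04T12:00:00Z,42.1\n"
pdi = {
    "provenance": "produced by the instrument team, level-1 processing",
    "context": "one file of a larger calibrated data set",
    "reference": "example-id-0001",
    "fixity": fixity_digest(content),
}

ok_after_copy = is_unaltered(content, pdi["fixity"])
ok_after_edit = is_unaltered(content + b"tampered", pdi["fixity"])
```

Rerunning the check after every media migration or copy is one way to detect undocumented alteration, which is the purpose the slide assigns to Fixity Information.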
Example of Preservation Description Information
- Space Science Data
  - Reference: object identifier; journal reference; mission, instrument, and title attribute set
  - Provenance: instrument description; processing history; sensor description; instrument; instrument mode; decommutation map; software interface specifications
  - Context: calibration history; related data sets; mission; funding history
  - Fixity: CRC; checksum; Reed-Solomon coding
- Digital Library Collections
  - Reference: bibliographic description; persistent identifier
  - Provenance: for scanned collection, pointer to master version; metadata about digitization; metadata on preservation process
  - Context: pointers to related documents in original environment at the time of publication
  - Fixity: digital signature; checksum; authenticity indicator
- Software Package
  - Reference: name; author; version number; serial number
  - Provenance: revision history; license holder; registration; copyright
  - Context: help file; user guide; related software; language
  - Fixity: certificate; checksum; encryption; CRC
Descriptive Information
- Contains the data that serve as the input to documents or applications called Access Aids
- Access Aids can be used by a consumer to locate, analyze, retrieve, or order information from the OAIS
Packaging Information
- Information which, either actually or logically, binds and relates the components of the package into an identifiable entity on specific media
- Examples of Packaging Information include tape marks, directory structures and filenames
Archival Information Package (AIP)
- [Figure: OAIS Archival Information Package and its related information]
  - Content Information, further described by Preservation Description Information (PDI)
    - e.g., hardcopy document
    - e.g., document as an electronic file together with its format description
    - e.g., scientific data set consisting of image file, text file, and format descriptions file describing the other files
  - Preservation Description Information (PDI)
    - e.g., how the Content Information came into being, who has held it, how it relates to other information, and how its integrity is assured
  - Packaging Information, which delimits the AIP
    - e.g., how to find Content Information and PDI on some medium
  - Package Descriptor, derived from the AIP
    - e.g., information supporting customer searches for the AIP
AIP Types
- Based on the difference in Content Information complexity
- Archival Information Units (AIUs) contain a single Content Data Object in their Content Information
- Archival Information Collections (AICs) contain multiple AIPs in their Content Information
  - Each contained AIP has its own Content Information and PDI
  - The AIC also contains unique PDI on the collection process
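The AIU/AIC distinction is a classic composite structure, and a short sketch can make the nesting concrete. The class and field names below, and the helper `count_units`, are hypothetical; the Reference Model does not prescribe any particular encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class AIU:
    """Archival Information Unit: a single Content Data Object plus its PDI."""
    content_data_object: bytes
    pdi: Dict[str, str]


@dataclass
class AIC:
    """Archival Information Collection: contains AIPs, plus PDI on the collection itself."""
    members: List[Union["AIU", "AIC"]]
    collection_pdi: Dict[str, str]


def count_units(package: Union[AIU, AIC]) -> int:
    """Count the AIUs reachable from a package, descending through nested AICs."""
    if isinstance(package, AIU):
        return 1
    return sum(count_units(member) for member in package.members)


image = AIU(b"image bytes", {"provenance": "camera A"})
notes = AIU(b"observer notes", {"provenance": "observer log"})
campaign = AIC([image, notes], {"provenance": "gathered for one observing campaign"})
holdings = AIC(
    [campaign, AIU(b"summary", {"provenance": "archive staff"})],
    {"provenance": "all campaigns to date"},
)
total_units = count_units(holdings)
```

Each contained package keeps its own PDI, while `collection_pdi` holds the separate PDI about how the collection was assembled.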
Package Descriptors and Access Aids
- Package Descriptors are needed by an OAIS to provide visibility and access to the OAIS holdings
- Package Descriptors contain one or more Associated Descriptions which describe the AIP Content Information from the point of view of a single Access Aid
- Some examples of Access Aids include:
  - Finding Aids: assist the consumer in locating information of interest
  - Ordering Aids: allow the consumer to discover the cost of and order AIUs of interest
  - Retrieval Aids: enable authorized users to retrieve the AIU described by the Unit Descriptor from Archival Storage
Information Model Summary
- Presented a model of information objects as containing data objects and representation objects
- Classified information required for long-term archiving into 4 classes: Content Information, PDI, Packaging Information and Descriptive Information
- Described how these classes would be aggregated and related in an AIP to fully describe an instance of Content Information
- Presented information needed for Access, in addition to that needed for long-term preservation
- Put the Access oriented structures in the context of the other data needed to operate an OAIS
Detailed Models
Functional View
General Principles
- Highlight the major functional areas important to digital archiving
- Use functional decomposition to clarify the range of functionality that might be encountered
  - Don't decompose beyond two levels to avoid becoming too implementation dependent
  - Provide a useful set of terms and concepts
  - Do not imply that all archives need to implement all the sub-functions
- Identify some common services which are likely to be needed, and are assumed to be available, as underlying support
Common Services
- Modern, distributed computing applications assume a number of supporting services
- Examples of Common Services include:
  - inter-process communication
  - name services
  - temporary storage allocation
  - exception handling
  - security
  - file and directory services
OAIS Functional Entities
- [Figure: functional model of the OAIS]
  - External roles: Producer, Consumer, Management
  - Functional entities: Ingest, Archival Storage, Data Management, Access, Administration, Preservation Planning
  - Flows shown: SIP, AIP, DIP, Descriptive Info., queries, result sets, orders
- SIP = Submission Information Package
- AIP = Archival Information Package
- DIP = Dissemination Information Package
Functional Entities in an OAIS
- Ingest: This entity provides the services and functions to accept Submission Information Packages (SIPs) from Producers and prepare the contents for storage and management within the archive
- Archival Storage: This entity provides the services and functions for the storage, maintenance and retrieval of Archival Information Packages
- Data Management: This entity provides the services and functions for populating, maintaining, and accessing both descriptive information which identifies and documents archive holdings and internal archive administrative data
- Administration: This entity manages the overall operation of the archive system
- Preservation Planning: This entity monitors the environment of the OAIS and provides recommendations to ensure that the information stored in the OAIS remains accessible to the Designated User Community over the long term even if the original computing environment becomes obsolete
- Access: This entity supports consumers in determining the existence, description, location and availability of information stored in the OAIS and allows consumers to request and receive information products
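To make the division of labor among the entities concrete, here is a deliberately tiny in-memory sketch covering Ingest, Archival Storage, Data Management, and Access. It is an assumption-laden toy: the class `MiniArchive`, its method names, the dictionary-based packages, and the identifier scheme are all invented for illustration, and Administration and Preservation Planning are not modeled at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MiniArchive:
    # Archival Storage: AIP identifier -> stored AIP.
    archival_storage: Dict[str, dict] = field(default_factory=dict)
    # Data Management: AIP identifier -> descriptive information.
    data_management: Dict[str, str] = field(default_factory=dict)

    def ingest(self, sip: dict) -> str:
        """Ingest: accept a SIP, store an AIP, and register its description."""
        aip_id = f"aip-{len(self.archival_storage) + 1}"
        self.archival_storage[aip_id] = {"content": sip["content"], "pdi": sip["pdi"]}
        self.data_management[aip_id] = sip["description"]
        return aip_id

    def query(self, keyword: str) -> List[str]:
        """Access: return a result set of AIP identifiers matching a keyword."""
        return sorted(
            aip_id
            for aip_id, description in self.data_management.items()
            if keyword.lower() in description.lower()
        )

    def order(self, aip_id: str) -> dict:
        """Access: fetch the AIP from Archival Storage and hand back a DIP."""
        aip = self.archival_storage[aip_id]
        return {"content": aip["content"], "pdi": dict(aip["pdi"])}


archive = MiniArchive()
archive.ingest({"content": b"spectra", "pdi": {"reference": "example-id-0001"},
                "description": "Plasma wave spectra"})
archive.ingest({"content": b"field data", "pdi": {"reference": "example-id-0002"},
                "description": "Magnetometer survey"})
result_set = archive.query("plasma")
dip = archive.order(result_set[0])
```

The query is answered from Data Management alone; Archival Storage is only touched when an order is filled, which matches the separation of descriptive information from stored packages described above.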
Preservation Planning
Reference Model Summary
- Reference model is to be applicable to all digital archives, and their Producers and Consumers
- Identifies a minimum set of responsibilities for an archive to claim it is an OAIS
- Establishes common terms and concepts for comparing implementations, but does not specify an implementation
- Provides detailed models of both archival functions and archival information
- Discusses OAIS information migration and interoperability among OAISs
Some Applications
Basis of Systems Architectures
- NEDLIB (Networked European Deposit Library) effort used the OAIS Reference Model as a basis for the design and architecture of the Deposit System for Electronic Publications (DSEP)
- National Library of Australia used it as the basis for their implementation
- CEDARS: a multi-site UK project to create exemplars in digital archiving is using OAIS representation data as the basis for research into long-term preservation
- NSSDC (National Space Science Data Center) is evolving their archive using the OAIS RM as a basis for a new architecture
- SIPAD: French space agency plasma physics archive used the OAIS as a basis for design
- METS (Metadata Encoding and Transmission Standard) is using OAIS concepts in an implementation of types of Submission, Archival, and Dissemination Information Packages
- InterPARES, a body of National Archives from many countries, adopted OAIS as a starting point for their modeling work
Enhanced Communications and Productivity among varied Communities
- National Archives and Records Administration contracted some work on long-term preservation of collections to the San Diego Supercomputer Center
  - Both parties claimed use of the OAIS RM saved several weeks of effort in the specification of the task
- Similar experiences between:
  - National Library of France and French space agency (CNES) representatives
  - National Center for Supercomputing Applications HDF format developers and DNA researchers
  - Life Sciences Archive developer and micro-gravity researchers
  - United States Department of Agriculture and digital preservation experts
More OAIS Accomplishments
- Royal Library of the Netherlands (RLN)
  - OAIS mandated in their implementation RFP
  - IBM implementing OAIS-based system for RLN (£5M project)
- British National Library is following suit
- France setting up a working group within ARISTOTE
  - Interested in archive of digital information, including libraries and Dept of Justice
  - (In French) “astonishing unifying role” from OAIS reference model
- OAIS likely to be used by CODATA archive task group in study on long-term preservation
- Playing significant role in Research Libraries Group and OCLC (Online Computer Library Center) digital preservation work
Archive System Re-engineering
- National Space Science Data Center
  - Adopted the AIP concept as basis for preservation
  - Implemented a standard AIU packaging structure with required and optional attributes
  - Adopted a few ‘canonical forms’ for the Content Data Objects created from the files to be preserved
  - Enables AIUs to be independent of underlying media and readily migrated across media types
  - Expanding AIU packaging to incorporate multiple files
Two CCSDS/ISO Follow-on Activities
- Producer-Archive Interface Methodology Standard
  - Provides framework for Producer/Archive interactions
  - Identifies steps and types of information exchanged during the ‘negotiation’
  - May be used as a checklist by archives
- Certification Coordination Function
  - Will track and summarize various archive certification efforts
  - Will attempt to extract high-level model/checklist
Reference URLs
- July 15 OAIS RM version
- ISO Archive Standards Overview Web site
- Lavoie, Brian. "Meeting the challenges of digital preservation: the OAIS reference model". OCLC Newsletter. No. 243. January/February. Pages
  - An excellent overview of the OAIS RM and Workshops
- Research Libraries Group has established a web page to track OAIS implementation efforts and issues