1
Introduction to Preservation Metadata
5 Feb 2009 Introduction to Preservation Metadata Peter McKinney National Library of New Zealand Te Puna Mātauranga o Aotearoa PREMIS Rome Tutorial
2
What is preservation metadata?
Outline
What is preservation metadata?
What functions of a digital repository does preservation metadata support?
Background of PREMIS Data Dictionary
PREMIS data model, identifiers, and relationships
PREMIS maintenance activity
3
Metadata and preservation metadata
Metadata: “Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource”*
Preservation metadata: “Metadata that supports and documents the digital preservation process”
4
Preservation metadata includes:
Provenance: Who has had custody/ownership of the digital object?
Authenticity: Is the digital object what it purports to be?
Preservation Activity: What has been done to preserve it?
Technical Environment: What is needed to render and use it?
Rights Management: What IPR must be observed?
Makes digital objects self-documenting across time
Content: 10 years on, 50 years on, forever!
A number of questions need to be answered to ensure preservation of digital objects over time. Files can become corrupted and unreadable, so it is important to be able to track who has had custody of an object and every action performed on it. The preservation strategies that have been applied need to be recorded, along with the software or hardware needed to use the object and any restrictions on preservation activities (in addition to access). If metadata is associated with an object, the object can become self-documenting over time.
Authenticity of digital content: A digital object is said to be authentic if it is what it purports to be.
Integrity of digital content: Digital object integrity is the quality of a digital object remaining uncorrupted and free of unauthorised and undocumented change.
5
Examples of preservation functions and how metadata supports them
Objects should be stored securely so they can’t be altered; checksums are stored as metadata to show whether the object has changed between two points in time
Files are stored on media that can be read by current computers; metadata supports media management by recording type and age of media and dates refreshed
Over time file formats become obsolete and preservation managers must employ preservation strategies to keep them usable; migration and emulation strategies require metadata about the original format and the software and hardware environments supporting it
Preservation actions involve changing original resources, so authenticity comes into question; metadata supports digital provenance (chain of custody and change history)
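The checksum function in the first point above can be sketched in Python. This is a minimal illustration under stated assumptions, not something the slides or PREMIS prescribe: the function names `compute_digest` and `fixity_unchanged` are hypothetical, and SHA-256 is just one possible algorithm.

```python
import hashlib
import io
from typing import BinaryIO


def compute_digest(stream: BinaryIO, algorithm: str = "sha256") -> str:
    """Return the hex digest of a binary stream, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(65536), b""):
        digest.update(chunk)
    return digest.hexdigest()


def fixity_unchanged(stream: BinaryIO, stored_digest: str, algorithm: str = "sha256") -> bool:
    """Compare the current digest with the digest stored as metadata earlier."""
    return compute_digest(stream, algorithm) == stored_digest


# Point in time 1 (e.g. ingest): compute the digest and store it as metadata.
stored = compute_digest(io.BytesIO(b"original content"))

# Point in time 2: recompute and compare against the stored value.
print(fixity_unchanged(io.BytesIO(b"original content"), stored))  # True
print(fixity_unchanged(io.BytesIO(b"altered content"), stored))   # False
```

In practice the stream would come from `open(path, "rb")`; in-memory streams are used here only so that the sketch runs without any files.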
6
Standards that address preservation metadata: technical
PREMIS
Images
  NISO Z39.87 and MIX
  Adobe and XMP (Extensible Metadata Platform)
  Exif (Exchangeable Image File Format)
  IPTC (International Press Telecommunications Council)/XMP
  Examples of technical metadata for images: byte order, compression scheme, color space, color profile, image width
7
Standards that address preservation metadata: technical
Text: textMD
  XML Schema maintained by the Library of Congress that details technical metadata for text-based digital objects
  Examples: fonts, character set, byte order
Sound
  AES : Audio Object XML Schema
  AES : Core Audio Metadata
  AudioMD (Library of Congress)
  Examples: audio data encoding; byte order; physical structure, materials, dimensions
8
Standards that address preservation metadata: technical
Video
  VideoMD
  SMPTE RP210
  Technical metadata in EBUCore, PBCore
  U.S. Federal Agencies Digitization Guidelines
  MPEG-7 and MPEG-21 for video
  Examples: generation, dimensions, byte order, bits per sample, aspect ratio, color coding
  reVTMD; FADGI
9
Standards that address preservation metadata: Structural
METS
PREMIS
MPEG-21 Digital Item Declaration
OAI/ORE
Specific format types
  MXF
  AVI
10
Standards that address preservation metadata: Rights
PREMIS
METS Rights
CDL Copyright schema
Creative Commons
PLUS for images
MPEG-21 REL for moving images
ONIX for licensing terms
Full rights expression languages
  XrML/MPEG-21
  ODRL
11
PREMIS Data Dictionary
May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group
March 2008: PREMIS Data Dictionary for Preservation Metadata, version 2.0 (version 2.1 Jan. 2011)
April 2012: version 2.2
Includes:
  PREMIS Data Dictionary, context/assumptions, data model, usage examples
  XML schema to support implementation
Data Dictionary:
  Comprehensive view of information needed to support digital preservation
  Guidelines/recommendations to support creation, use, management
  Based on deep pool of institutional experiences in setting up and managing operational capacity for digital preservation
12
Guiding principles: “implementable, core preservation metadata”
Preservation metadata: maintain viability, renderability, understandability, authenticity, identity in a preservation context
Core: What most preservation repositories need to know to preserve digital materials over the long term
Implementable: rigorously defined; supported by usage guidelines/recommendations; emphasis on automated workflows
13
Guiding principles: “technical neutrality”
Digital archiving system: no assumptions about specific archiving technology, system/DB architectures, preservation strategy
Metadata management: no assumptions about whether metadata is stored locally or in external registry; recorded explicitly or known implicitly; instantiated in one metadata element or multiple elements
Promotes flexibility, applicability in wide range of contexts
14
Scope
What PREMIS DD is:
  Common data model for organizing/thinking about preservation metadata
  Guidance for local implementations
  Standard for exchanging information packages between repositories
  Compatible with the OAIS reference and information model
What PREMIS DD is not:
  Out-of-the-box solution: need to instantiate as metadata elements in repository system
  All needed metadata: excludes business rules, format-specific technical metadata, descriptive metadata for access, non-core preservation metadata
  Lifecycle management of objects outside repository
  Rights management: limited to permissions regarding actions taken within repository
15
What’s in PREMIS?
“Things” you have to describe: PREMIS Data Model
What you want to say about these “things”: semantic units in the PREMIS Data Dictionary
How you want this information to be encoded and implemented:
  In XML: PREMIS XML schema
  In RDF: OWL ontology
  Or any other way you like it
16
The PREMIS Data Model
Data model includes:
  Entities: “things” relevant to digital preservation that are described by preservation metadata (Intellectual Entities, Objects, Events, Rights, Agents)
  Properties of Entities (semantic units)
  Relationships between Entities
Why have a data model?
  Organizational convenience (for development and use)
  Useful framework for distinguishing applicability of semantic units across different types of Entities and different types of Objects
But: not a formal entity-relationship model; not sufficient to design databases
17
PREMIS Data Model
[Diagram: Intellectual Entities, Objects, Events, Rights Statements, Agents]
18
Intellectual Entities
Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database)
Has one or more digital representations
May include other Intellectual Entities (e.g., a website that includes a web page)
Not fully described in PREMIS DD, but can be linked to in metadata describing digital representation (this will change in 3.0)
Examples:
  The Chamber by John Grisham (an ebook)
  “Maggie at the beach” (a photograph)
  The Metropolitan New York Library Council Website (a website)
19
Objects
Objects are what the repository actually preserves
  FILE: named and ordered sequence of bytes that is known by an operating system
  REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity
  BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be a stand-alone file)
  FILESTREAMS (files within files) are considered files since they can be rendered alone
Examples:
  A PDF file
  A book composed of several XML files and many images
  TIFF file containing a header and 2 images
20
Object Example: book in two versions
Intellectual Entity: Da Vinci Code by Dan Brown
  Representation 1: Page image version
    File 1: page1.tiff
    File 2: page2.tiff
    File N: pageN.tiff
    File N+1: METS.xml
  Representation 2: ebook version
    book.lit
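The book example above can be modelled as a small tree of Intellectual Entity, Representations and Files. This is only a sketch: the class names and the `page_image_representation` helper are hypothetical, the page count of 3 stands in for N, and attaching METS.xml to the page image version is one reading of the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    name: str


@dataclass
class Representation:
    label: str
    files: List[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    title: str
    representations: List[Representation] = field(default_factory=list)


def page_image_representation(page_count: int) -> Representation:
    """Build the page image version: one TIFF per page plus a structural metadata file."""
    files = [File(f"page{n}.tiff") for n in range(1, page_count + 1)]
    files.append(File("METS.xml"))  # File N+1: structural metadata tying the pages together
    return Representation("Page image version", files)


book = IntellectualEntity(
    "Da Vinci Code by Dan Brown",
    [
        page_image_representation(3),  # 3 is an arbitrary stand-in for N
        Representation("ebook version", [File("book.lit")]),
    ],
)

for representation in book.representations:
    print(representation.label, [f.name for f in representation.files])
```

Each Representation is a complete rendering of the same Intellectual Entity, so the repository can preserve, migrate or drop one without affecting the other.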
21
Technical metadata pertaining to objects
Object identifier
Preservation level
Significant characteristics
Object characteristics
  fixity
  format
  size
  creating application
  inhibitors
  object characteristics extension
Original name
Storage
Environment (will change in 3.0)
  software
  hardware
Digital signatures
Relationships
Linking event identifier
Linking rights statement identifier
22
Events
An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository
Helps document digital provenance: the history of an Object can be tracked through the chain of Events that occur during the Object’s lifecycle
Determining which Events are in scope is up to the repository (e.g., Events which occur before ingest, or after de-accession)
Determining which Events should be recorded, and at what level of granularity, is up to the repository
Examples:
  Validation Event: use JHOVE tool to verify that chapter1.pdf is a valid PDF file
  Ingest Event: transform an OAIS SIP into an AIP (one Event or multiple Events?)
23
Agents
Person, organization, or software program/system associated with an Event or a Right (permission statement)
Agents are associated with Objects only indirectly, through Events or Rights
Not defined in detail in PREMIS DD; not considered core preservation metadata beyond identification
Examples:
  Rebecca Guenther (a person)
  New York Public Library (an organization)
  JHOVE version 1.0 (a software program)
24
Rights Statements
An agreement with a rights holder that grants permission for the repository to undertake an action(s) associated with an Object(s) in the repository.
Not a full rights expression language; focuses exclusively on permissions that take the form: Agent X grants Permission Y to the repository in regard to Object Z.
Example: Priscilla Caplan grants FCLA digital repository permission to make three copies of metadata_fundamentals.pdf for preservation purposes.
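The “Agent X grants Permission Y in regard to Object Z” form maps naturally onto a small record plus a lookup. The sketch below is illustrative only: the `RightsStatement` class, its field names, the `is_permitted` helper and the act name "replicate" are assumptions, not PREMIS semantic units.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RightsStatement:
    granting_agent: str    # Agent X
    permitted_act: str     # Permission Y
    object_id: str         # Object Z
    restriction: str = ""  # free-text limits on the permission


def is_permitted(statements: Iterable[RightsStatement], act: str, object_id: str) -> bool:
    """True if some statement grants the repository this act on this object."""
    return any(
        s.permitted_act == act and s.object_id == object_id for s in statements
    )


statements = [
    RightsStatement(
        granting_agent="Priscilla Caplan",
        permitted_act="replicate",  # assumed name for "make copies"
        object_id="metadata_fundamentals.pdf",
        restriction="three copies, for preservation purposes",
    )
]

print(is_permitted(statements, "replicate", "metadata_fundamentals.pdf"))  # True
print(is_permitted(statements, "migrate", "metadata_fundamentals.pdf"))    # False
```

Because the statements only record permissions, anything without a matching statement is treated here as not permitted; a repository could equally choose a different default.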
25
From the data model to the data dictionary
Data model: defines Entities and Relationships between them
Data dictionary: for each Entity, lists its semantic units
A semantic unit is a property of an Entity
  Something you need to know about an Object, Event, Agent, Right
  Piece of information most repositories need to know in order to carry out their digital preservation functions
Two kinds of semantic units:
  Container: groups together related semantic units
  Semantic components: semantic units grouped under the same container
Semantic units can be recorded explicitly, or known implicitly
However it is implemented/recorded, a semantic unit should be recoverable from the archiving system
Example:
  ObjectIdentifier [container]
    ObjectIdentifierType [semantic component]
    ObjectIdentifierValue [semantic component]
26
Identifiers in PREMIS
Identifiers are used to
  identify unambiguously an object, agent, event, rights statement…: [entity]Identifier
  and link it to another entity: linking[entity]Identifier
All identifiers have
  An identifierType (category of identifier)
  An identifierValue (the identifier itself)
identifierType makes the identifier unique in a given domain
  Examples: URL, DOI, ARK, local…
If all identifiers are local to the repository system, identifierType does not have to be recorded for each identifier in the system
BUT it should be supplied when exchanging data with others
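The last two points (type known implicitly inside the repository, but made explicit for exchange) can be sketched as follows. Everything here is hypothetical: the `Identifier` class, the `for_exchange` helper, the default type name "local", and the example identifier values.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DEFAULT_LOCAL_TYPE = "local"  # hypothetical repository-wide default type


@dataclass(frozen=True)
class Identifier:
    value: str
    type: Optional[str] = None  # None: type is known implicitly (repository-local)


def for_exchange(identifier: Identifier, default_type: str = DEFAULT_LOCAL_TYPE) -> Dict[str, str]:
    """Return the identifier with its type stated explicitly, as needed for exchange."""
    return {
        "identifierType": identifier.type or default_type,
        "identifierValue": identifier.value,
    }


internal = Identifier("F004400")                             # type left implicit
external = Identifier("https://example.org/objects/42", "URL")  # made-up URL

print(for_exchange(internal))  # {'identifierType': 'local', 'identifierValue': 'F004400'}
print(for_exchange(external))  # {'identifierType': 'URL', 'identifierValue': 'https://example.org/objects/42'}
```

Storing the type once as a system-wide default keeps internal records compact, while the export step guarantees that a receiving repository never sees a bare value.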
27
Relationships
Many different types of information relevant to preservation can be expressed as relationships: e.g., “A is part of B”, “A is scanned from B”, “A is a version of B”
PREMIS Data Dictionary supports expression of relationships between:
  Different Objects
    Across same level or different levels
    Structural: relationships between parts of a whole
    Derivation: relationships resulting from replication or transformation of an Object
  Different Entities
Relationships are established through reference to Identifiers of other Objects or Entities
28
Example: Structural relationship
File “is part of” Representation
File: graphic.gif
Representation: Web Page
relationship [part of the description of File]
  relationshipType = structural
  relationshipSubType = is part of
  relatedObjectIdentification [the Web page]
    relatedObjectIdentifierType = repositoryID
    relatedObjectIdentifierValue =
    relatedObjectSequence = 0
  relatedEventIdentification [none]
29
Example: Derivation relationship
File 1 “is source of” File 2 through Migration Event
[Diagram: File 1 (original) is source of File 2 (migrated), through event: Migration Event]
relationship [part of description of File 1]
  relationshipType = derivation
  relationshipSubType = is source of
  relatedObjectIdentification [identifier of File 2]
    relatedObjectIdentifierType = repositoryID
    relatedObjectIdentifierValue = F004400
    relatedObjectSequence [none]
  relatedEventIdentification [Migration Event ID]
    relatedEventIdentifierType = repEventID
    relatedEventIdentifierValue = E0192
    relatedEventSequence [none]
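The derivation example above can be serialized as nested XML elements. The sketch below uses Python’s standard `xml.etree.ElementTree` and takes the element names and values directly from the slide; the helper names `add_child` and `derivation_relationship` are hypothetical, no XML namespace is declared, and the output is not validated against the PREMIS XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def add_child(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
    """Append a child element, optionally with text content, and return it."""
    child = ET.SubElement(parent, tag)
    if text is not None:
        child.text = text
    return child


def derivation_relationship(object_type: str, object_value: str,
                            event_type: str, event_value: str) -> ET.Element:
    """Build a 'relationship' container recording that this file is the source of another."""
    rel = ET.Element("relationship")
    add_child(rel, "relationshipType", "derivation")
    add_child(rel, "relationshipSubType", "is source of")

    # Container pointing at the derived Object (File 2).
    related_object = add_child(rel, "relatedObjectIdentification")
    add_child(related_object, "relatedObjectIdentifierType", object_type)
    add_child(related_object, "relatedObjectIdentifierValue", object_value)

    # Container pointing at the Event that produced it (the migration).
    related_event = add_child(rel, "relatedEventIdentification")
    add_child(related_event, "relatedEventIdentifierType", event_type)
    add_child(related_event, "relatedEventIdentifierValue", event_value)
    return rel


rel = derivation_relationship("repositoryID", "F004400", "repEventID", "E0192")
print(ET.tostring(rel, encoding="unicode"))
```

The two sequence components are omitted because the slide marks them as [none]. The same builder pattern would cover the structural example by changing the type and subtype and leaving out the event container.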
30
Extension containers in PREMIS
PREMIS is core preservation metadata
PREMIS defines an Extension container to extend PREMIS if you need
  more granular description
  specific semantic units (non-core information)
  out of scope semantic units (not grounded in preservation)
Extensions are empty containers
  Their semantic components are whatever you need
  One schema per extension; if more schemas are needed, the extension element needs to be repeated
Mechanism in PREMIS XML Schema: <mdSec> element
Data in the container may replace, refine or be additional to the appropriate PREMIS semantic unit
31
PREMIS Extensions
significantPropertiesExtension
creatingApplicationExtension
objectCharacteristicsExtension
environmentExtension
signatureInformationExtension
eventOutcomeDetailExtension
agentExtension
rightsExtension
32
Sample data dictionary entry
Is it a container unit? What does it contain?
Why should it be recorded?
How should it be recorded? Constraints and examples
How should it be provided? Some implementation guidelines
33
How do people implement PREMIS?
Using METS in XML to package together metadata and content information
In relational databases
In commercial preservation repository systems
As part of open source repository systems that support metadata
  Archivematica
  DAITSS
METS using PREMIS in METS guidelines for SIP and DIP
Open source tool available for creation and validation
34
PREMIS Maintenance Activity
Web site:
  Permanent Web presence, hosted by Library of Congress
  Central destination for PREMIS-related info, announcements, resources
  Home of the PREMIS Implementers’ Group (PIG) discussion list
PREMIS Editorial Committee:
  Set directions/priorities for PREMIS development
  Coordinate future revisions of Data Dictionary and XML schema
  Promote implementation
  International in scope, cross domain
  Membership has changed; previously it included representatives from Los Alamos National Labs, National Archives in Scotland and National Library of Australia.
35
Implementation resources
Tools:
  XML schema
  PREMIS-in-METS toolbox
  Controlled vocabularies
  RDF/OWL ontology for use as Linked Data
Guidelines:
  PREMIS conformance statement
  PREMIS & METS guidelines
Community:
  Working groups on special topics
  Implementation Fairs
Others:
  Understanding PREMIS (available in multiple languages)
  PIG Forum
  Implementation Registry
  Tools Registry
The PREMIS in METS toolbox consists of 3 modules to help implementers: describe (generate PREMIS metadata), convert (between PREMIS and METS), validate (ensure quality metadata)
Controlled vocabularies increase interoperability and consistency of metadata
The RDF/OWL ontology allows for interconnection among preservation repositories, facilitates querying the metadata, and incorporates preservation-specific controlled vocabularies
The available guidelines (the conformance statement and the guidelines for using PREMIS in METS) support consistent, good-quality metadata
Community working groups on specific topics include the Ontology working group and the Environment working group (to amend the data model); they are open to the preservation community at large
The PREMIS Implementers Group forum allows the preservation community to participate in PREMIS development and submit change requests to the EC
The Implementation Registry assists new implementers in planning their preservation systems
The Tools Registry points implementers to available tools
36
Some implementers …
DAITSS (Florida)
  A preservation repository for the use of the libraries of the public universities of Florida
  Developed PREMIS tools and open source software
Ex Libris Rosetta
  A commercial digital preservation system supporting acquisition, validation, ingest, storage, management, preservation and dissemination of different types of digital objects
National Digital Newspaper Program
  Implemented an early version of PREMIS
FDsys (US Government Printing Office)
  Content management, search engine, preservation repository
  Supports media refreshment, content authentication, virus checking
37
Some implementers (cont.)
Archivematica
  Comprehensive open-source digital preservation system
  Supports many preservation repository functions
TIPR (Towards Interoperable Preservation Repositories)
  Experimenting with a standard package for exchange based on PREMIS in METS guidelines
  FCLA, NYU and Cornell
HathiTrust
  Partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future
Digital libraries in Spain
  Mandated for use in cultural heritage preservation repositories
See:
38
Making preservation metadata happen
Metadata should be the basis for design of a digital repository
Systems are needed to store information supporting preservation and access decisions
Enough metadata needs to be provided to take appropriate actions to maintain digital objects over the long term
There should be automatic population of the maximum number of elements
Any system should ensure preservation of metadata as well as preservation of objects
PREMIS provides a critical piece of reliable digital preservation infrastructure comprised of technology, standards, and best practice
39
URLs, etc.
PREMIS Maintenance Activity:
PREMIS Data Dictionary for Preservation Metadata:
PREMIS Implementation Registry
PREMIS Implementers Group list