Metadata for Digital Objects


Metadata for Digital Objects With an emphasis on preservation… Pat Galloway, SoD, 9/10/09

Remarks on digitization Cost-benefit Sliver of a sliver? Or corpus? Digitization as preservation Obligation to preserve Resulting requirements for metadata

What is metadata? Data about data Functions? Kinds? Database usage Web usage (metatags) Functions? Kinds? Several perspectives from which to consider metadata: orders, functions, life-cycle

First-order metadata: representation schemes Encoding (ASCII, proprietary formatting schemes) Compression schemes Encryption or other intentional distortion schemes These lie at the base of digital objects and exist before the creation of the object
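As a rough illustration of why representation schemes count as metadata, the sketch below reads one byte string under two different encoding assumptions and then round-trips it through a compression scheme. The sample string and variable names are invented for the example. The point is only that the bytes are not interpretable until the scheme is known.

```python
import zlib

# The stored bits of a tiny "digital object" (UTF-8 is the scheme actually used).
raw = "café".encode("utf-8")

# Decoding with the right scheme recovers the text.
as_utf8 = raw.decode("utf-8")

# Decoding with the wrong scheme still "works" but silently garbles the text.
as_latin1 = raw.decode("latin-1")

# A compression scheme is another layer that must be known to get the bits back.
packed = zlib.compress(raw)
unpacked = zlib.decompress(packed)

print(as_utf8)
print(as_latin1)
print(unpacked == raw)
```

Nothing in the byte string itself says which decoding is the right one, which is why the scheme has to be recorded alongside the object.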

Second-order metadata Written natural language (for example) Layout conventions Separation of words Arrangement of groups of words Punctuation, capitalization, etc. Note that this is usually considered to belong to an external standard (“English”)

Third-order metadata “Connections to the world” Meaning Semantics Pragmatics

Fourth-order metadata Functions What can you do with the digital object? What is its purpose? How does it work? Functionality significant for preservation Explicit digital object types

Fifth-order metadata Groups of digital objects Context of the group Archival series Project files “Complex documents”

More orders? Additional intermediate orders could be thought of Depends on granularity May depend on object type

Classic objects of preservation in archives Content Context Structure

Functional types of metadata Administrative Descriptive (especially resource discovery) Preservation Technical Use

Life cycle view of metadata Appraisal/Inventory/Scheduling Creation and versioning Transfer/Authenticity Descriptive Use Rights management Preservation and disposition

Attributes of metadata items Source of metadata (internal or external) Method of metadata creation (auto or manual) Nature of metadata (lay or expert) Status (static or dynamic) Structure (structured or unstructured) Semantics (controlled or uncontrolled) Level (item or collection) (Note: these attributes are relevant for all metadata)

Major Archival Metadata Schemes

University of Pittsburgh metadata reference model in six layers Handle Terms & Conditions Structural Contextual Content Use History

Example: Structural Layer specifies technical details File identification metadata File encoding metadata File rendering metadata Record rendering metadata Content structure metadata Source metadata

InterPARES Project Authenticity template Documentary form Extrinsic elements Intrinsic elements Annotations Medium Context

Dublin Core Metadata Initiative Supported by OCLC Primarily a surrogate/discovery metadata scheme Does not aim to document everything Useful for management of active digital objects

Basic Dublin Core elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
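The fifteen elements can be sketched as a simple XML record. This is a minimal sketch, not an official serialization: the `record` wrapper element, the `build_dc_record` helper and the sample values are assumptions made for illustration. The namespace URI is the standard Dublin Core elements namespace.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen basic elements, lower-cased as they appear in the namespace.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def build_dc_record(values):
    """Build a record from a dict of element name -> value (hypothetical helper)."""
    root = ET.Element("record")  # wrapper name is an assumption, not part of DC
    for name, value in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a basic Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Illustrative values only.
record = build_dc_record({
    "title": "Metadata for Digital Objects",
    "creator": "Pat Galloway",
    "type": "Text",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Every element is optional and repeatable in simple Dublin Core, so the helper only checks names and does not enforce a minimum set.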

Dublin Core development Initial development of simple elements Subelements and user communities Warwick Framework Qualified Dublin Core RDF and XML

Metadata Encoding and Transmission Standard (METS) Developed out of LoC’s MOA project Designed to support maintenance of libraries of digital objects A METS document is a “wrapper” containing a pointer to the object plus its metadata Three overall types of metadata (three segments of the METS document): Descriptive, Administrative, Structural

METS Descriptive metadata External (e.g., finding aid that can be pointed to via a URL) Internal (included in the document) Can include several different metadata sets as relevant

METS Administrative metadata Technical metadata Intellectual property rights metadata Source metadata (for analog source) Digital provenance metadata Relations between files Migration/transformation data

METS Structural metadata File groups list Structural map (defines relations between files and METS element structure) Behavior segment (associates executable methods with specific files, e.g. for display)
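The "wrapper" idea in the METS slides above can be sketched as a skeleton document with one section per metadata type. This is an illustrative sketch only: the element and attribute names follow the METS schema as I understand it, the output has not been validated against that schema, the administrative sections are left as empty placeholders, and the behavior segment is omitted. The function name, IDs and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets(tag):
    """Return a tag name qualified with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(object_url, finding_aid_url):
    """Build a bare METS wrapper around one file (hypothetical helper)."""
    root = ET.Element(mets("mets"))

    # Descriptive metadata: external, pointed to via a URL (e.g. a finding aid).
    dmd = ET.SubElement(root, mets("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(dmd, mets("mdRef"), {
        "LOCTYPE": "URL",
        "MDTYPE": "EAD",
        f"{{{XLINK_NS}}}href": finding_aid_url,
    })

    # Administrative metadata: empty placeholders for the four subsections.
    amd = ET.SubElement(root, mets("amdSec"), {"ID": "AMD1"})
    for name in ("techMD", "rightsMD", "sourceMD", "digiprovMD"):
        ET.SubElement(amd, mets(name), {"ID": name.upper() + "1"})

    # File groups list: the pointer to the object itself.
    file_sec = ET.SubElement(root, mets("fileSec"))
    group = ET.SubElement(file_sec, mets("fileGrp"))
    file_el = ET.SubElement(group, mets("file"), {"ID": "FILE1"})
    ET.SubElement(file_el, mets("FLocat"), {
        "LOCTYPE": "URL",
        f"{{{XLINK_NS}}}href": object_url,
    })

    # Structural map: relates the file to the document structure.
    struct_map = ET.SubElement(root, mets("structMap"))
    div = ET.SubElement(struct_map, mets("div"))
    ET.SubElement(div, mets("fptr"), {"FILEID": "FILE1"})
    return root


doc = build_mets_skeleton(
    "http://example.org/object.tif",
    "http://example.org/finding-aid.xml",
)
print(ET.tostring(doc, encoding="unicode"))
```

The `FILEID` on the structural map's file pointer matches the `ID` of the file entry, which is how the structural map refers back to the file list.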

METS and XML The METS XML schema http://www.loc.gov/standards/mets/mets_xsd/mets.html Why is it all so complicated? How can anyone ever keep track of all this metadata?

XML in 10 Points XML is for structuring XML looks like HTML XML is text for computers XML is purposely verbose XML is a family XML is only partly new XHTML->XML XML is modular XML is base for RDF, Semantic Web XML is free, universal, supported

Creation Metadata

Metadata added at creation By the creator By the creating application program (note: some of this is meant for system use) Example of hybrid process: creation of Word file

Example: Word processing

Digitization as creation Preprocessing Conversion Quality control Object manipulation Surrogate outputs (see handouts)

Appraisal / Inventory / Retention Schedule Metadata

Digital Appraisal Decisions Keep (costs of carrying into the future) Allow to Die (keep but do nothing) Repurpose (separating content and form) Destroy (microwave the disk?)

Digital Appraisal: What to Appraise Content (as with paper?) Technical support System Creating application Display requirements Functionality

What is a Retention Schedule? Classic record statuses: active, semiactive, inactive Keep Alter function of custodian Alter custodianship Allow to Die Leave with creator? Why not always do this? Destroy Determine when to destroy Almost always a method for reprieve exists…

Record-level vs Group-level Metadata Record-level: metadata orders 1-4: (1) encoded (content); (2) written (content); (3) meaning (ontology); (4) function/purpose = type (form) Group-level: metadata order 5: (5) object grouping schemes (categories): record groups, record series (intellectual management); format, security concerns (physical management)

Transfer / Authenticity Metadata

The central problem: Security guaranteeing Authenticity Guarding the object (authenticity, integrity) Tracking the object through its lifetime Proving the identities of the people responsible for transferring the object (authentication, non-repudiation) Transferring the object in a secure way

What is transfer about? What is a digital copy? What qualifies? Data compression issues Data segmentation issues Creating application vs file-management application How can a digital copy be guaranteed? Digital object as string of bits Message digest of object as math on the bits Ship the message digest with the object Recalculate and compare at the other end

Guaranteeing the authenticity of the object (Integrity) Object as open or secret Must we disguise the object? Can we move it around in clear? Message digest Creates single number: “one-way hash” Number will change with the slightest change in the object on which it was calculated Encryption (Confidentiality) Asymmetric Symmetric
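The message digest procedure from the two slides above (compute a one-way hash, ship it with the object, recalculate and compare at the other end) can be sketched as follows. SHA-256 is an assumption here, since the slides do not name an algorithm. The function names and sample bytes are illustrative.

```python
import hashlib


def message_digest(data: bytes) -> str:
    """Compute a one-way hash over the object's bits (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(received: bytes, shipped_digest: str) -> bool:
    """Recalculate the digest at the receiving end and compare."""
    return message_digest(received) == shipped_digest


# Sender side: digest is calculated before transfer and shipped with the object.
original = b"Minutes of the records committee, 10 September"
shipped = message_digest(original)

# Receiver side: an intact copy verifies, a copy with one changed byte does not.
intact_copy = bytes(original)
altered_copy = original.replace(b"10", b"11")

print(verify_transfer(intact_copy, shipped))
print(verify_transfer(altered_copy, shipped))
```

A digest on its own shows only that the bits are unchanged relative to the shipped digest. Confirming who sent the object is the separate authentication and non-repudiation problem noted earlier.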

Accession Metadata

What is the nature of the accession task? The object received has been uprooted from its former context Object is equipped with enough metadata to reconstruct that context Contextual metadata now is no longer functional but descriptive of the old context Object must be integrated into a new context (which may mirror the old) New functions must be provided for (meta-activities)

Validation of the object Validation test suite Validation tools Formal validation process Validation outcomes Rejection Re-transfer Acceptance

Preparation of the object for storage Metadata as data and as processing instructions Digital object and use copy Storage issues

Descriptive Metadata

Descriptive metadata for what? Individual objects (Dublin Core, RDF) Books and other chunks (MARC, MODS) Multimedia objects (METS, MPEG 21) Finding aids (EAD): collection-level

What about the single object? Is Dublin Core enough? What for? Who will describe at the object level? Zillions of archivists? Automatic analysis? Ad hoc analysis? Taggers on the Internet?

Preservation Metadata

What is Preservation Metadata? Object stability (OAIS “content data object”) What elements of the object’s content should be preserved? What is it? What is it for? What functions of the object should be preserved? (i.e., how can it remain itself into the future, and what do we mean by “itself”?) Environmental support (OAIS “environment”) What kind of environmental characteristics does the object need to stay alive (software, hardware)? (i.e., how do we specify its life support system?)

Object Stability I: Content Authenticity revisited: stability for what? Access to genuine article Historical truth Guarantee of prior art Intellectual property guarantee Range of attributes needed for each What does “content” mean?

Object Stability II: Functionality Static objects (e.g. text) Look and feel Dynamic objects (e.g. computer game) Connectivity Interactivity

Environmental Support I: Emulation Making it possible to see the object as it was originally seen Making it possible for the object to function as it originally did Providing software support for that to happen Running the original program (in an environment that emulates the original environment) Running something that looks like (emulates) the original program

Environmental Support II: Migration Deciding what to migrate (deciding what to lose) Transformations to the object If reversible, no need to keep original object If not, retention of original object necessary
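One hedged way to make the reversibility question concrete is a round-trip test: apply the migration, apply its inverse, and see whether the original bits come back. The two transformations below (a character re-encoding and a case-folding) are invented stand-ins for real format migrations, and passing the test for one object does not prove the transformation is reversible for every object.

```python
def is_reversible(original: bytes, forward, backward) -> bool:
    """Return True if migrating and then reversing restores the original bits."""
    return backward(forward(original)) == original


# A re-encoding migration: Latin-1 text migrated to UTF-8, and back.
def to_utf8(data: bytes) -> bytes:
    return data.decode("latin-1").encode("utf-8")


def to_latin1(data: bytes) -> bytes:
    return data.decode("utf-8").encode("latin-1")


# A lossy migration: case information is discarded and cannot be restored.
def fold_case(data: bytes) -> bytes:
    return data.upper()


def unfold_case(data: bytes) -> bytes:
    return data.lower()


sample = "Résumé of Holdings".encode("latin-1")

keep_original_after_reencoding = not is_reversible(sample, to_utf8, to_latin1)
keep_original_after_folding = not is_reversible(sample, fold_case, unfold_case)

print(keep_original_after_reencoding)
print(keep_original_after_folding)
```

Under the rule on the slide, the original would need to be retained only in the second case.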

Documentation requirements for preservation What the object was What the object is What happened in between

OAIS metadata model I

OAIS metadata model II SIP (send), AIP (archive), DIP (disseminate) Parts of an object: Content; Preservation description: Reference (unique identifier), Provenance (history in and out of repository), Context (archival bond), Fixity (message digest); Packaging; Descriptive
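The parts of an OAIS package listed above can be sketched as a small data structure. This is a simplified sketch, not the OAIS reference model itself: the class names, field names and the `accept_submission` function are assumptions for illustration, and SHA-256 stands in for whatever fixity method a repository actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import hashlib


@dataclass
class PreservationDescription:
    reference: str            # unique identifier
    provenance: List[str]     # history in and out of the repository
    context: str              # archival bond
    fixity: str               # message digest of the content


@dataclass
class InformationPackage:
    kind: str                 # "SIP", "AIP" or "DIP"
    content: bytes
    preservation: PreservationDescription
    packaging: Dict[str, str] = field(default_factory=dict)
    descriptive: Dict[str, str] = field(default_factory=dict)


def accept_submission(sip: InformationPackage) -> InformationPackage:
    """Turn a verified SIP into an AIP, recording the event in provenance."""
    if sip.kind != "SIP":
        raise ValueError("expected a SIP")
    if hashlib.sha256(sip.content).hexdigest() != sip.preservation.fixity:
        raise ValueError("fixity check failed")
    pdi = PreservationDescription(
        reference=sip.preservation.reference,
        provenance=sip.preservation.provenance + ["accepted into repository"],
        context=sip.preservation.context,
        fixity=sip.preservation.fixity,
    )
    return InformationPackage("AIP", sip.content, pdi,
                              dict(sip.packaging), dict(sip.descriptive))


content = b"example record"
sip = InformationPackage(
    kind="SIP",
    content=content,
    preservation=PreservationDescription(
        reference="obj-0001",
        provenance=["created by originating office"],
        context="series: project files",
        fixity=hashlib.sha256(content).hexdigest(),
    ),
)
aip = accept_submission(sip)
print(aip.kind, aip.preservation.provenance)
```

The SIP is left untouched and a new package is returned, so the provenance list in the AIP grows without rewriting the history that arrived with the submission.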

OAIS metadata model III What is “representation information”? How much must be kept? Monitoring changes What is the “knowledge base”? Designated user community DUC as “the public”

PREservation Metadata Implementation Strategies (PREMIS) Preservation metadata set, 2003-present Assumes OAIS model Maintaining viability, renderability, understandability, authenticity, identity Emphasis on provenance and relationships Entity concept: [Intellectual entity: descriptive metadata], Object, Event, Agent (MARC, MADS), Rights Technical/hardware metadata out of scope

PREMIS Example: Object: objectIdentifier, objectCategory, preservationLevel, significantProperties, objectCharacteristics, originalName, storage, environment, signatureInformation, relationship, linkingEventIdentifier, linkingIntellectualEntityIdentifier, linkingRightsStatementIdentifier
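A partially filled Object record using the semantic unit names from the slide might look like the sketch below. It is a plain dictionary rather than PREMIS XML, and the helper only reports which of the listed units have not been filled in; it makes no claim about which units the PREMIS data dictionary treats as mandatory. The subunit names follow the data dictionary as I understand it, and the identifier, file name and content are invented.

```python
import hashlib

# Semantic units of the Object entity, as listed on the slide
# (reading "storage" and "environment" as two separate units).
PREMIS_OBJECT_UNITS = [
    "objectIdentifier", "objectCategory", "preservationLevel",
    "significantProperties", "objectCharacteristics", "originalName",
    "storage", "environment", "signatureInformation", "relationship",
    "linkingEventIdentifier", "linkingIntellectualEntityIdentifier",
    "linkingRightsStatementIdentifier",
]


def unfilled_units(record):
    """List the slide's semantic units that are absent from a record."""
    return [unit for unit in PREMIS_OBJECT_UNITS if unit not in record]


content = b"example"  # stand-in for the file's bits

premis_object = {
    "objectIdentifier": {
        "objectIdentifierType": "local",      # illustrative value
        "objectIdentifierValue": "obj-0001",  # illustrative value
    },
    "objectCategory": "file",
    "objectCharacteristics": {
        "fixity": {
            "messageDigestAlgorithm": "SHA-256",
            "messageDigest": hashlib.sha256(content).hexdigest(),
        },
        "size": len(content),
    },
    "originalName": "example.txt",
}

print(unfilled_units(premis_object))
```

A repository could run a check like this at ingest to decide which units still have to be supplied by a person and which can be extracted automatically.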

Usage Metadata

What is Usage Metadata? Internal users (with respect to the creator) External users (with respect to the creator) Internal users (with respect to the repository) External users (with respect to the repository)

Creator Usage The creator’s actual use of the object Version control The creator’s colleagues’ use of the object Object function Object used for reference, model The creator’s customers’ use of the object Object function: mediates relationship

Repository Usage Management usage Designated user community Object maintenance and preservation Object analysis Designated user community Object viewing Object acquisition

Rights Management Metadata

What is Rights Management? Protection of copyright Protection of patent Protection of the integrity of the digital object (and thereby reputation of the author/creator herself)

What is being protected? Object itself (integrity) Uses of the object (access controls) Limiting use (protecting rights of the owner) Enabling use (protecting rights of the user)

Protection against theft Threats of the law Fully document with metadata and protect the metadata Authentication of users and user requests Watermarking/steganography

What about integrity of the digital object? Relevant even in public domain E.g. “copyleft” agreement: http://www.gnu.org/copyleft/gpl.txt See but not change, or change only with notification

Metadata Conclusions?