PREMIS, METS and preservation metadata: emerging trends and future directions Eld Zierau The Royal Library of Denmark.

Presentation transcript:

PREMIS, METS and preservation metadata: emerging trends and future directions Eld Zierau The Royal Library of Denmark

Introduction My background Masters in Computer Science in 1989 At the Royal Library of Denmark since 2007 ◦ Strategy and design of preservation systems ◦ Creation of preservation policies and strategies ◦ Policies of using preservation metadata PhD in Digital Preservation in 2011 Currently at the Royal Library SIFD – the digital library Management, dissemination and preservation Packaging and re-packaging for Bit Repository WARC, METS, PREMIS Framework for OAIS and Distributed Digital Preservation

Contents of this presentation Practices at The Royal Library ◦ Strategies and policies ◦ Putting it into practice Challenges ◦ Expressing preservation levels and intellectual entities ◦ Preserving preservation metadata ◦ Expressing preservation levels and intellectual entities over time An example on the bit level ◦ Risks mitigated in bit preservation ◦ Bit integrity/safety, confidentiality and availability Types of preservation Levels ◦ How to express them – also over time Identification of intellectual entities ◦ How to express them – also over time Summary

Preservation Strategies
Digital preservation combines:
Logical preservation
◦ Migration
◦ Emulation
◦ Technology preservation
Bit preservation

Currently at The Royal Library Strategy and policies ◦ Bit preservation ◦ Logical preservation Putting it into practice ◦ The chosen Metadata Standards ◦ The Digital Library infrastructure ◦ The Danish Bit Repository Framework

Metadata Standards and use
Inspired by the Australian way
METS document sections: METS header; Descriptive metadata; Administrative metadata (Technical metadata, Rights metadata, Analog/digital source metadata, Digital provenance metadata); File metadata; Structural Map; Structural link metadata; Behavior metadata
Wrapped contents: Wrapped MODS; Wrapped PREMIS object part; Wrapped ??? Video; Wrapped MODS; Wrapped PREMIS rights part; Wrapped ???; Wrapped PREMIS agent part; Wrapped PREMIS event part; Wrapped PREMIS preservationLevel part; Wrapped AES sound; Wrapped MIX images; …
Legend: METS element, MODS element, PREMIS element, MIX element, AES element; Will be included, May be left out, To be decided

Digital Library infrastructure Preservation Dissemination Management Ingest Access Representations with metadata Standards Prefer static Simplicity Prefer dynamic New technology Add value Fast access BR

Challenges with metadata Preserving preservation metadata (if …) Expressing preservation levels Expressing preservation levels over time Expressing Intellectual Entity identifiers Expressing Intellectual Entity identifiers over time We need to look more closely at bit preservation to define levels and levels over time

A General View of a Bit Repository
Elements in bit preservation:
◦ Number of copies
◦ Independence between copies
◦ Frequency of integrity checks
Bit Repository: General System Layer; Pillar Layer with Pillar 1 to Pillar 8, …
Diagram labels: Bit sequences, Integrity check, Coordination
Organisation & techniques designed, arranged and used for long-term bit preservation

Integrity – Bit error File in form of bit stream Risk: Bits can change value 1. Error has occurred in backup 2. File is corrupted 3. Error is not discovered 4. Cannot determine which file is intact Solutions: 1. No backup; all are copies of the data 2. Vote on which copy is the right one

Bit error – System Layer Solutions: 1. The system layer checks the copies by comparing them and follows up on any differences 2. Minimum three voters; optimise by comparing checksums Bit Repository (BR) System Layer Pillar Layer …
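The voting idea above can be sketched as follows. This is a minimal illustration, not the Bit Repository's actual implementation: the function names, the use of SHA-256 and the pillar names are assumptions made for the example. Each pillar's copy is reduced to a checksum, the checksums are compared, and a copy is only declared correct when a strict majority agrees.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def sha256_hex(data: bytes) -> str:
    """Checksum used as the 'vote' of one copy (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def vote_on_copies(copies: Dict[str, bytes]) -> Tuple[Optional[str], List[str]]:
    """Return (winning checksum, pillars whose copy disagrees with it).

    The winner is None when no checksum has a strict majority; in that
    case every pillar is reported, because no copy can be trusted.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    checksums = {pillar: sha256_hex(data) for pillar, data in copies.items()}
    winner, votes = Counter(checksums.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None, sorted(checksums)
    damaged = sorted(p for p, c in checksums.items() if c != winner)
    return winner, damaged
```

With only two copies that disagree there is no majority, which is why the slide asks for at least three voters.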

Integrity – Bit error File in form of bit stream Risk: Bits can change value 1. Error has occurred in backup 2. File is corrupted 3. Error is not discovered 4. Cannot determine which file is intact Solutions: 1. No backup; all are copies of the data 2. Vote on which copy is the right one 3. Introduce checksums of files to discover errors

Integrity – Bit error Risk: The same error occurs in several copies 1. Same hardware 2. Same software 3. Same vendor

Integrity – Bit error Risk: The same error occurs in several copies 1. Same hardware 2. Same software 3. Same vendor Solutions: 1. Different hardware solutions 2. Different vendors 3. Different software (OS, interpreters, etc.), e.g. Windows, Unix, Mac OS, …

Integrity – Disasters Risk: All copies are damaged at the same time 1. Natural disasters 2. Attacks in connection with war or terror

Integrity – Disasters Solutions: 1. Different geographical locations Risk: All copies are damaged at the same time 1. Natural disasters 2. Attacks in connection with war or terror

Integrity – Organisation Risk: Errors/mistakes are made by the same person/org. 1. The same person has access and has delete rights 2. The same person makes procedural mistakes

Integrity – Organisation Solutions: 1. Different organisations Risk: Errors/mistakes are made by the same person/org. 1. The same person has access and has delete rights 2. The same person makes procedural mistakes
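The independence requirements from the preceding slides (different hardware, software, vendors, locations and organisations) can be checked mechanically. The sketch below is illustrative only: the `Pillar` record and its field names are hypothetical and are not taken from the Danish Bit Repository. It reports every characteristic that two or more pillars share, since each shared value is a way for one failure to damage several copies at once.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Pillar:
    """Hypothetical description of one pillar holding a copy."""
    name: str
    hardware: str
    software: str
    vendor: str
    location: str
    organisation: str


DIMENSIONS = ("hardware", "software", "vendor", "location", "organisation")


def shared_risks(pillars: Iterable[Pillar]) -> Dict[str, Dict[str, List[str]]]:
    """Map each dimension to the values used by more than one pillar."""
    pillars = list(pillars)
    risks: Dict[str, Dict[str, List[str]]] = {}
    for dim in DIMENSIONS:
        seen: Dict[str, List[str]] = {}
        for pillar in pillars:
            seen.setdefault(getattr(pillar, dim), []).append(pillar.name)
        shared = {value: names for value, names in seen.items() if len(names) > 1}
        if shared:
            risks[dim] = shared
    return risks
```

An empty result means no two pillars share a value in any listed dimension; it does not prove independence in dimensions the record does not describe.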

Confidentiality Risk: Unauthorised parties get access to confidential data 1. Unauthorised parties get access to the Bit Repository 2. Unauthorised parties get access to data from the Bit Repository Solutions: 1. Authentication of users of pillars with copies 2. Encryption internally on the pillar 3. Hardware secured in locked rooms Encryption can conflict with integrity

Confidentiality – System Layer Bit Repository (BR) System Layer Pillar Layer … Likewise on System layer

Availability Risk: Cannot get access as required 1. Cannot get any response to a request 2. Processing not possible in reality Solutions: 1. Specialised pillar with distributed architecture Processing can conflict with confidentiality and bit safety

Availability Solutions: 1. Redirection if access to a pillar is down 2. Distributed requests to different pillars 3. Scaling 4. Diversity, 5. … Bit Repository (BR) System Layer … Pillar Layer … Depends on what is required
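Redirection when a pillar is down can be sketched as a simple fallback loop. This is an assumption-laden illustration: the function name and the idea that each pillar is represented by a callable that raises `OSError` when it is unreachable are hypothetical choices for the example, not the Bit Repository protocol.

```python
from typing import Callable, Dict, Iterable, Tuple


def fetch_with_redirection(
    file_id: str,
    pillars: Iterable[Tuple[str, Callable[[str], bytes]]],
) -> Tuple[str, bytes]:
    """Ask the pillars in order and return (pillar name, bits) from the
    first one that answers; raise OSError if none can deliver the file."""
    errors: Dict[str, str] = {}
    for name, fetch in pillars:
        try:
            return name, fetch(file_id)
        except OSError as exc:  # pillar down or unreachable: try the next one
            errors[name] = str(exc)
    raise OSError(f"no pillar could deliver {file_id}: {errors}")
```

Ordering the pillars by access speed would serve fast requests first, while distributing requests across pillars would need a different selection rule than this fixed order.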

Bit Repository Offering Solutions
Pillar characteristics: Media, Data safety, Access speed, On-line, Off-line, Organisational placement, Geographical placement, …
BR: General System Layer; Pillar Layer with Pillar 1 to Pillar 9, …
Diagram labels: Bit safety, Availability, Costs, Confidentiality


Bit Safety: values for preservationLevelType = BitSafety
Value      Comment
Max        Maximum bit safety
VeryHigh   Very high bit safety
High       High bit safety
Medium     Medium bit safety
Low        Low bit safety
VeryLow    Very low bit safety
Min        Minimum bit safety
Policy: As high a bit safety as we can get
Strategy 2013: 10 copies spread over 3 continents, on both optical and magnetic media, checked every …
Strategy 2050: 8 copies; at least 2 on Mars, at least 2 written to DNA, checked every …


Confidentiality: values for preservationLevelType = Confidentiality
Value      Comment
Max        Maximum confidentiality
VeryHigh   Very high confidentiality
High       High confidentiality
Medium     Medium confidentiality
Low        Low confidentiality
VeryLow    Very low confidentiality
Min        Minimum confidentiality
Policy: Only restricted access, made as hard as possible for others without using encryption
Strategy 2013: No more than 2 copies, secured on off-line media …
Strategy 2050: ??? …

Availability: values for preservationLevelType = Availability
Value      Comment
Max        Maximum availability
VeryHigh   Very high availability
High       High availability
Medium     Medium availability
Low        Low availability
VeryLow    Very low availability
Min        Minimum availability
…

Logical Preservation: values for preservationLevelType = logicalStrategy
Value      Comment
Migration  Migration of digital material to keep data interpretable
Emulation  Emulation of digital material to keep data interpretable
Technical  Technology preservation to keep data interpretable
…

Preservation Level information
preservationLevelType           Comment
Bit safety                      Bit preservation
Confidentiality                 Bit preservation
Availability                    Bit preservation
Logical Preservation Strategy   Logical preservation
…
Policy: stated in the institution's preservation policies; stays the same over time; expresses values
Strategy: requirements for fulfilling the policy with current technologies
…
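The split between a stable policy value and a strategy that changes with technology can be sketched as a lookup. Everything below is an illustration under stated assumptions: the `Level` enum mirrors the seven values from the slides, the two strategy texts paraphrase the 2013 and 2050 examples given for bit safety, and attaching them to the `Max` value is a guess made for the example.

```python
from enum import IntEnum
from typing import Dict, List, Optional, Tuple


class Level(IntEnum):
    """Ordered preservation level values, lowest to highest."""
    Min = 0
    VeryLow = 1
    Low = 2
    Medium = 3
    High = 4
    VeryHigh = 5
    Max = 6


# (level type, value) -> list of (first year in force, strategy description).
# Hypothetical table; only the two descriptions come from the slides.
STRATEGIES: Dict[Tuple[str, Level], List[Tuple[int, str]]] = {
    ("BitSafety", Level.Max): [
        (2013, "10 copies spread over 3 continents, optical and magnetic media"),
        (2050, "8 copies, at least 2 on Mars, at least 2 written to DNA"),
    ],
}


def strategy_for(level_type: str, value: Level, year: int) -> Optional[str]:
    """Return the latest strategy in force in the given year, or None."""
    current = None
    for start_year, description in STRATEGIES.get((level_type, value), []):
        if start_year <= year:
            current = description
    return current
```

The metadata only has to record the policy value (for example `BitSafety = Max`); what that value demands in practice is resolved against the strategy table for the year in question.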

Preservation Level in metadata
bitSafety        High       T19:28: :00
logicalStrategy  Migration  T19:28: :00
confidentiality  Low        T19:28: :00
bitSafety        High       T19:28: :00
logicalStrategy  Migration  T19:28: :00
confidentiality  Low        T19:28: :00
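One way to serialise such a type, value and date triple is as a PREMIS preservationLevel element. The sketch below assumes the PREMIS version 3 namespace and element names (preservationLevelType, preservationLevelValue, preservationLevelDateAssigned); the helper name is hypothetical, and the date in the usage example is a made-up placeholder because the dates on the slide did not survive extraction.

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS version 3 namespace.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def preservation_level(level_type: str, value: str, date_assigned: str) -> ET.Element:
    """Build one preservationLevel element with type, value and date."""
    level = ET.Element(f"{{{PREMIS_NS}}}preservationLevel")
    for tag, text in (
        ("preservationLevelType", level_type),
        ("preservationLevelValue", value),
        ("preservationLevelDateAssigned", date_assigned),
    ):
        ET.SubElement(level, f"{{{PREMIS_NS}}}{tag}").text = text
    return level


if __name__ == "__main__":
    # Placeholder date, not taken from the presentation.
    element = preservation_level("bitSafety", "High", "2013-01-01T00:00:00+01:00")
    print(ET.tostring(element, encoding="unicode"))
```

Recording a new element with a later date, instead of overwriting the old one, is one way to keep the level history over time.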

Identification in the future
Diagram labels: Producer, Service Provider, Consumer; Object, Object id. (15AE9513), Object id. & Service

Preservation Level in metadata UUID 41d153d e b67 … UUID 41d153d e b67 … UUID UUID 41d153d e b67 41d153d e b67

Summary PREMIS, METS and preservation metadata: emerging trends and future directions Preserving preservation metadata ◦ Which and how Some challenges for the future ◦ Definition of preservation levels and intellectual entities ◦ Expressing preservation levels and intellectual entities ◦ Expressing preservation levels and intellectual entities over time

Questions and Comments

Extra Slides

A Holistic Approach: Three Main Results
IR-BR Model: to define and delimit the BR on a conceptual level from other aspects of digital preservation
Representation Concept: to define representations that are bit preserved and to find requirements for bit preservation
Evaluation Methodology: to choose between different bit preservation solutions, as part of finding the optimal solution
Diagram labels: Institution Repository (IR); Bit Repository (BR) with General System Layer and Pillar Layer (Pillar 1, Pillar 2, Pillar 3, …)

OAIS/DDP research project
Research project that investigates the combination of the OAIS reference model and Distributed Digital Preservation (DDP)
Participating projects/institutions: …

Position based like ARC and WARC
WARC is an ISO standardised enhancement of ARC
Example (WARC package ID 15AE9513):
WARC/1.0
WARC-Type: warcinfo
WARC-Date: T15:50:16Z
WARC-Record-ID:
Content-Type: application/warc-fields
Content-Length: 46
application: id.kb.dk/gatekeeper/releasetest17
WARC/1.0
WARC-Type: resource
WARC-Target-URI: urn:uuid:15AE9513
WARC-Date: T15:50:14Z
WARC-Record-ID:
Content-Type: image/tiff
Content-Length:
II*1214i eeciRGB v2 P`p¡²ÃÔå,>PcuÁÕèü$8Ma …
NOT FINISHED: maybe not to be included in the Summary
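A record of the kind shown above can be assembled by hand. The sketch below is a simplified illustration of the WARC/1.0 layout (header lines separated by CRLF, a blank line, the content block, then two CRLFs); the function name is hypothetical, and a production system would more likely rely on an existing WARC library than on this minimal writer.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, content_type: str, payload: bytes) -> bytes:
    """Serialise one WARC/1.0 'resource' record wrapping the payload unchanged."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {now}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",  # length of the content block in bytes
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"
```

The file's identifier travels in WARC-Target-URI next to the untouched bytes, so neither the file name nor the file content has to carry it.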

The DK Bit Repository case
Bit Repository (BR): General System Layer (Technique, Organisation, Costs); Pillar Layer with Pillar 1 to Pillar 5, …, each with its own Technique, Organisation, Costs
Pillar hosts: Royal Library, State & Univ. Library, National Archives, State & Univ. Library
SLA 1: System, Pillar 1 and Pillar 3, each with Technique, Organisation, Costs
SLA 2: System, Pillar 2 and Pillar 3, each with Technique, Organisation, Costs

Services
An institution makes a service level agreement (SLA) about specific services
One service level agreement per ”source” of data with specific requirements
The service level agreement will be based on:
◦ The institution's need for services
◦ Risk analysis made for the institution
◦ Cost/benefit analysis made for the institution
The service level agreement is based on the different components in the Bit Repository
Examples: SLA 1: medium bit safety level, fast access; SLA 2: low bit safety level, high confidentiality; …

Package and storage related requirements
Requirement 1: Independence of storage platform
Requirement 2: Package format allows flexible packaging
Requirement 3: Allow update records
Different package sizes are optimal for different media/storage

Preservation related requirements
Requirement 4: Must be standardised format
Requirement 5: Must be open
Requirement 6: Must be easy to understand
Requirement 7: Must be widely used in bit repositories
Requirement 8: Must be supported by existing tools
Requirement 9: Must be able to include digital files unchanged
Rationales (as for preservation formats):
◦ Mitigate the risk of losing the means to understand the format
◦ Mitigate the risk of losing access to documentation and tools for interpretation
◦ Mitigate the risk of introducing errors or later difficulties in interpreting
◦ If adopted widely, it may stay longer
◦ Trust in quality and future existence of the format
◦ No conversion, e.g. compression, needed in XML

Identification related requirements
Requirement 10: Must facilitate identifiers for a digital object
Especially a challenge when the object is 'a file'
Diagram labels: Producer, Service Provider, Consumer; Object, Object id. (15AE9513), Object id. & Service

The PREMIS standard PREMIS Preservation Metadata: Implementation Strategies Primarily used for Digital provenance Example of fields of special interest: ◦ preservationLevel ◦ linkingIntellectualEntityIdentifier Hosted at Library of Congress

Reasons for id. requirements
1. Leave it 100% to the bit preservation solution
◦ Risky, since it is crucial information in preservation: outsourcing of responsibility
◦ Eliminates possible optimisation by packaging several files, or files and metadata, in the same packages
2. Naming files with the identifier
◦ The file name is not part of the file itself
◦ Restrictions on how files are named
◦ May not make the same sense in the future
3. Put the identifier into files as inherited metadata
◦ Would need to change the original bits
◦ Requires knowledge of how to extract identifiers from file formats
4. Wrap files and identifier in a package format
◦ Puts requirements on the abilities of the package format: put the id. with the file
Diagram labels: file name 15ae9513.abc versus 15AE9513.ABC; FileId: 15AE9513…; Year 2052 ?

List of Preservation related requirements
Req. 1: Independence of storage platform
Req. 2: Package format allows flexible packaging
Req. 3: Allow update records
Req. 4: Must be standardised format
Req. 5: Must be open
Req. 6: Must be easy to understand
Req. 7: Must be widely used in bit repositories
Req. 8: Must be supported by existing tools
Req. 9: Must be able to include digital files unchanged
Req. 10: Must facilitate identifiers for a digital object
Not an exhaustive list of requirements!

Main contents of PREMIS
object
◦ Described as a 'file', 'representation', or 'bitstream'
◦ Contains e.g. object identifier, preservation level, significant properties and relationships
event
◦ Aggregates metadata about actions
◦ Contains e.g. event identifier, event type (creation, migration, …), and other event details
agent
◦ Describes the agent, which can have different roles in relation to an event or object
◦ Contains agent identifier and possibly other information such as name and type
rights
◦ Describes rights and permissions specifically related to digital preservation
◦ Contains e.g. rights statements and possible extensions with other metadata

Can be wrapped in METS
METS: Metadata Encoding and Transmission Standard
Container format with:
Descriptive metadata: information describing intellectual contents
Administrative metadata: necessary information for preservation
◦ Technical metadata: technical information
◦ Rights management: the information necessary to restrict its delivery to those entitled to access it
◦ Digital provenance: the information on the creation and subsequent treatment of the digital object
Structural metadata: the information on the internal structure of a digital object
…
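Wrapping a PREMIS part in the digital provenance section of a METS document can be sketched as below. The element names (digiprovMD, mdWrap, xmlData) and the MDTYPE value PREMIS:EVENT follow the METS schema as far as I know it; the helper name, the section ID and the PREMIS version 3 namespace are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "http://www.loc.gov/premis/v3"  # assumption: PREMIS version 3
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def wrap_in_digiprov(premis_element: ET.Element, section_id: str,
                     mdtype: str = "PREMIS:EVENT") -> ET.Element:
    """Return a METS digiprovMD section that wraps the given PREMIS element."""
    digiprov = ET.Element(f"{{{METS_NS}}}digiprovMD", {"ID": section_id})
    wrap = ET.SubElement(digiprov, f"{{{METS_NS}}}mdWrap", {"MDTYPE": mdtype})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(premis_element)
    return digiprov


if __name__ == "__main__":
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    section = wrap_in_digiprov(event, "DIGIPROV1")
    print(ET.tostring(section, encoding="unicode"))
```

The same wrapper pattern would apply to rights or technical metadata by using the corresponding METS section element and MDTYPE value.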

Preservation Metadata Challenges Currently at The Royal Library Strategies and policies Putting it to practice Challenges Preserving preservation metadata Expressing preservation levels over time An example Digital preservation strategies Bit Repository with bit preservation Risks mitigated in bit preservation Confidentiality and availability Preservation Levels How to express them How to express them over time PREMIS, METS at the Royal Library of Denmark

Inspiration: The Danish Bit Repository
Partners: The Royal Library, The State and University Library, The National Archives
Goals:
◦ Cost effective shared solution
◦ Receive bits and deliver bits in intact form (>100 years)
Only bits
Challenges:
◦ Mission and goals
◦ Value of material (bit safety, preservation levels)
◦ Confidentiality requirements
◦ Service requirements (ingest, access, processing, …)

The IR-BR Model – consider place
Ref: “Cross Institutional Cooperation on a Shared Bit Repository” by Eld Zierau and Ulla Kejser, ICDL 2010, Best international paper
IR: Institution Repository. BR: Bit Repository.
IRs in org. A, B and C take the roles of BR producer, BR consumer and BR management
IR functions: Ingest, Access, Preservation Planning, Data Management, Administration, Archival Storage
BR functions: Ingest, Access, Preservation Planning, Data Management, Administration, Archival Storage
IR level: logical object, metadata, …; BR level: bit streams, id., origin, audit trail
Technology watch, planning & strategy: logical preservation at the IR level; media at the BR level
Policies: storage (BR), disaster recovery (IR level), …
BR acts as ‘IR archival storage’, but there is more to it!

Digital Library infrastructure Preservation Dissemination Management Ingest Access Representations with metadata Standards Prefer static Simplicity Prefer dynamic New technology Add value Fast access Fedora solutions: just started development BR: Danish bit repository being put into production Repackaging metadata into WARC