Metadata for digital long-term preservation



Metadata for digital long-term preservation
Michael Day, Digital Curation Centre, UKOLN, University of Bath
MPG eScience Seminar 2008: Aspects of long-term archiving, GWDG Göttingen, 19-20 June 2008

Presentation outline:
- Some definitions
- An abstract approach: OAIS
- A framework for practical implementation: the PREMIS Data Dictionary
- Some open questions for e-research data
Aspects of Long-Term Archiving, Göttingen, 19-20 June 2008

Definitions (1)
Metadata:
- A relatively new term that is used to describe a very old concept
- We primarily need to think about the different functions it enables, e.g. discovery and access management, the management of resources, long-term preservation, etc.

Definitions (2)
Preservation metadata:
- “... the information a repository uses to support the digital preservation process” (PREMIS Data Dictionary)
- Potentially very wide scope:
  - Technical information on data structures or formats
  - Information to help better understand the content
  - Information on contexts and provenance
  - Information on preservation processes

Definitions (3)
Metadata for research data:
- Metadata are fundamentally important to the continued understanding and exploitation of research data
- It is “impossible to conduct a correct analysis of a data set without knowing how the data was cleaned, calibrated, what parameters were used in the process” (Deelman et al., 2004)
- In some cases, extremely detailed documentation will be required
- Captured from various stages of the lifecycle

The OAIS Information Model (1)
General OAIS background:
- An ISO standard (ISO 14721:2003)
- Development led by the Consultative Committee for Space Data Systems
- Provides standard terminology and defines two interrelated models (functional model, information model)

The OAIS Information Model (2)
Some general principles:
- OAIS entities (Data Objects and Content Information) are conceptually bound together with information that provides additional meaning
- There are two main classes of this:
  - Representation Information
  - Preservation Description Information

The OAIS Information Model (3)
Representation Information:
- Is tightly bound with the Data Object
- Provides a bridge between the bit-level information being stored in an OAIS and something that can be understood:
  - Describing data structure concepts, or formats (Structure Information)
  - Providing additional information on semantics (Semantic Information)
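
The bridging role of Representation Information can be illustrated with a minimal Python sketch. It is not taken from the OAIS specification: the byte layout, the field names (`station_id`, `temperature`) and the `interpret` helper are all assumptions made up for this illustration.

```python
import struct

# Hypothetical Data Object: eight raw bytes as they might sit in an archive.
data_object = struct.pack("<if", 42, 21.5)

# Structure Information (assumed for illustration): how the bits are laid out.
# "<if" means little-endian, one 32-bit integer followed by one 32-bit float.
structure_info = {"byte_order": "little-endian", "layout": "<if"}

# Semantic Information (assumed for illustration): what each decoded value means.
semantic_info = [
    {"name": "station_id", "unit": None},
    {"name": "temperature", "unit": "degrees Celsius"},
]


def interpret(data: bytes, structure: dict, semantics: list) -> dict:
    """Decode bytes using the Structure Information, then label each
    value using the Semantic Information."""
    values = struct.unpack(structure["layout"], data)
    return {
        field["name"]: (value, field["unit"])
        for field, value in zip(semantics, values)
    }


record = interpret(data_object, structure_info, semantic_info)
print(record)
```

Without `structure_info` the eight bytes cannot be decoded at all, and without `semantic_info` the decoded numbers have no meaning, which is the sense in which both kinds of information need to stay bound to the Data Object.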

The OAIS Information Model (4)
Preservation Description Information:
- The additional information “needed to make the Content Information meaningful for the indefinite long-term” (p. 4-33)
- For example, the information “needed to preserve the Content Information, to ensure that it is clearly identified, and to understand the environment in which the Content Information was created” (p. 2-6)
- Reference, Context, Provenance, Fixity
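
As a rough illustration of the four PDI categories, the following Python sketch groups them in one record and uses a SHA-256 digest as the Fixity element. The class, its field names, the identifier and the sample content are hypothetical; OAIS does not prescribe any particular encoding or checksum algorithm.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescriptionInformation:
    """Illustrative container for the four OAIS PDI categories."""
    reference: str    # how the content is identified
    context: str      # how the content relates to its environment
    provenance: list  # history of the content (free-text entries here)
    fixity: dict      # evidence that the content has not been altered


def compute_fixity(content: bytes) -> dict:
    """Return a fixity record holding a SHA-256 digest of the content."""
    return {"algorithm": "SHA-256", "digest": hashlib.sha256(content).hexdigest()}


def verify_fixity(content: bytes, fixity: dict) -> bool:
    """Recompute the digest and compare it with the stored value."""
    return hashlib.sha256(content).hexdigest() == fixity["digest"]


# Hypothetical content and identifiers, invented for this example.
content = b"station_id,temperature\n42,21.5\n"
pdi = PreservationDescriptionInformation(
    reference="example:dataset/0001",
    context="Part of a hypothetical weather-station data collection",
    provenance=["Created by instrument export", "Ingested into the archive"],
    fixity=compute_fixity(content),
)

print(verify_fixity(content, pdi.fixity))            # unchanged content
print(verify_fixity(content + b"x", pdi.fixity))     # altered content
```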

The OAIS Information Model (5)
Lessons from OAIS (1):
- Data objects (and content) need to be closely coupled with additional layers of information (metadata) that will help provide meaning and context, etc.
- These layers broadly reflect the main characteristics of digital information (physical, logical, intellectual)
- Produces self-documenting objects

The OAIS Information Model (6)
Lessons from OAIS (2):
- It highlights the importance of preserving context and provenance (but these are quite vaguely defined)
- OAIS works on an abstract level, but there is a need to think about what needs to be done in practical terms to develop preservation metadata schemata ...

PREMIS Data Dictionary (1)
Background (1):
- PREMIS Working Group (2003-2005)
- An attempt to develop something that would be implementable
- Development informed by the OAIS model
- Built upon several initiatives that had been developing preservation metadata schemas and frameworks prior to 2003
- Data Dictionary first published in May 2005; v. 2.0 in March 2008

PREMIS Data Dictionary (2)
Background (2):
- PREMIS Maintenance Activity set up by the Library of Congress
- PREMIS Implementers Group (open discussion list)
- Recent revision of PREMIS takes account of the experiences of implementers

PREMIS Data Dictionary (3)
What PREMIS aims to do:
- The Data Dictionary is specifically focused on defining the core metadata needed for long-term preservation
- “... the information a repository uses to support the digital preservation process”
- Related to a series of verbs: “... functions to maintain viability, renderability, understandability, authenticity, and identity in a preservation context”
- Based on a data model

PREMIS Data Dictionary (4)
PREMIS Data Model:
- Recognises that digital preservation is as much about describing processes as about describing objects
- Five entities:
  - Intellectual Entities
  - Objects
  - Events
  - Agents
  - Rights
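
To make the five entities a little more concrete, here is a minimal Python sketch of how they might be linked to one another. It is only an illustration: the class and field names are simplified and hypothetical, not the semantic units defined in the PREMIS Data Dictionary, and the sample records are invented.

```python
from dataclasses import dataclass, field


@dataclass
class IntellectualEntity:
    identifier: str
    title: str


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Rights:
    identifier: str
    statement: str


@dataclass
class Event:
    identifier: str
    event_type: str
    date_time: str
    agent_ids: list = field(default_factory=list)  # who carried out the event


@dataclass
class DigitalObject:
    identifier: str
    format_name: str
    intellectual_entity_id: str                      # what the object represents
    event_ids: list = field(default_factory=list)   # what has happened to it
    rights_ids: list = field(default_factory=list)  # what may be done with it


def events_for(obj: DigitalObject, events: list) -> list:
    """Return the events linked to an object, in the order the object lists them."""
    by_id = {event.identifier: event for event in events}
    return [by_id[event_id] for event_id in obj.event_ids]


# Invented sample records showing processes described alongside an object.
entity = IntellectualEntity("ie-1", "Hypothetical survey dataset")
archivist = Agent("agent-1", "Repository ingest service")
licence = Rights("rights-1", "Repository may make preservation copies")
all_events = [
    Event("ev-2", "fixity check", "2008-06-19T10:05:00Z", ["agent-1"]),
    Event("ev-1", "ingestion", "2008-06-19T10:00:00Z", ["agent-1"]),
]
obj = DigitalObject("obj-1", "CSV", "ie-1", ["ev-1", "ev-2"], ["rights-1"])

print([event.event_type for event in events_for(obj, all_events)])
```

The point of the sketch is that the object record alone says little about preservation; the linked events and agents carry the history.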

PREMIS Data Dictionary (5)
[Figure: PREMIS 2.0 Data Model, showing the five entities: Intellectual Entities, Objects, Events, Agents, Rights]

PREMIS Data Dictionary (6)
PREMIS usage (1):
- Survey undertaken for PREMIS Maintenance Activity (2007)
- 16 repositories and projects surveyed (mostly dealing with documents rather than data)
- Survey noted much diversity in the way PREMIS had been implemented
- Tools were being used to capture technical metadata automatically
- Formats could be identified using tools like JHOVE and PRONOM DROID
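
The general idea behind automatic format identification can be sketched in a few lines of Python by matching the leading bytes of a file against known signatures. This is a deliberately simplified illustration and far less thorough than what tools such as JHOVE or DROID do; the `SIGNATURES` table and the `identify_format` function are hypothetical names invented here.

```python
# A small table of well-known leading byte sequences ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(data: bytes) -> str:
    """Return a format name if the data starts with a known signature."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return "unknown"


def technical_metadata(data: bytes) -> dict:
    """Collect a minimal technical metadata record for a byte stream."""
    return {"format": identify_format(data), "size_in_bytes": len(data)}


print(technical_metadata(b"%PDF-1.4\n% sample"))
print(technical_metadata(b"plain text with no signature"))
```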

PREMIS Data Dictionary (7)
PREMIS usage (2):
- No major eScience input into PREMIS
- PREMIS is occasionally used to help inform the preservation of research data:
  - The National Snow and Ice Data Center has used PREMIS as a way of evaluating its own OAIS-inspired metadata schema
  - The Stanford Digital Repository has experimented with using PREMIS for geospatial resources
  - Experiments with the Yale Social Science Data Archive

PREMIS Data Dictionary (8)
Lessons from PREMIS:
- The Data Model demonstrates the importance of recording the contexts of preservation (events, agents), not just metadata on the objects
- Currently little used in the e-research domain, but it has some potential where structured metadata already exists in some form (e.g., CSDGM, DDI)

Implications for e-research (1)
The role of standards:
- The development of standards (e.g. PREMIS) assumes that there is some level of commonality between domains
- However, generic solutions are not really feasible for e-research data because of the diversity and complexity of:
  - Research data (content)
  - Research contexts
  - Stakeholders

Diversity and complexity (1)
Diversity of content (1):
- Research data is “... any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc.” (National Science Board, Long-lived digital data collections, 2005)

Diversity and complexity (2)
Diversity of content (2):
- Research data is extremely diverse - not really a single category of material:
  - tabular data, images, GIS, etc.
  - raw machine output vs. derived data
  - varying levels of structure (XML, legacy formats, etc.)
  - many different standards
- Research data is not homogeneous
- No one-size-fits-all approach possible

Diversity and complexity (3)
- There is an even wider range of social contexts in which data is used (and shared)
- The DCC SCARP project has been exploring disciplinary factors in curation practice
- Practice even within single disciplines is very fragmented
- Case studies ongoing:
  - Big-science archives, medical and social sciences, architecture and engineering, biological images

Diversity and complexity (4)
Major disciplinary differences:
- Attitudes towards data sharing
  - Some are very open, some cannot see the point
- Existence of data centre infrastructures
  - In the UK there are some centrally funded data centres, but they are not universal
  - Where do institutions fit?
- The existence of standards
  - Already present in social sciences (DDI), the geospatial domain (FGDC), and many others

Diversity and complexity (5)
Diversity of stakeholders:
- The many different actors that have an interest in data curation mean that metadata requirements may differ
- Dealing with data (2007): Scientist, Institution, Data centre, User, Funder, Publisher
- Long-lived data collections (2005): Data authors, Data managers, Data scientists, Data users, Funding agencies

Implications for e-research (2)
Metadata for digital curation or for long-term preservation?
- The concept of digital curation focuses on reuse and adding value - long-term preservation is not always the aim
- PREMIS metadata is focused on particular things (viability, renderability, understandability, authenticity and integrity)
- What metadata do we need for digital curation? Could this ever be generic?

Implications for e-research (3)
Metadata can be difficult to identify:
- Difficult sometimes to work out where data ends and metadata begins
- Depends on the point of view of the researcher

Implications for e-research (4)
Lifecycle view:
- Metadata has to be captured at multiple places in the scientific workflow
- Need to capture:
  - Processes (can be driven by instrumentation)
  - Provenance
  - Context
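
One way to picture capture at multiple points in a workflow is to have each processing step record what it did and with which parameters. The Python sketch below does this with a decorator; it is only an illustration under assumed names (`record_step`, `clean`, `calibrate`, `provenance_log`) and invented sample readings, not a description of any particular workflow system.

```python
from datetime import datetime, timezone
from functools import wraps

# In-memory provenance log; a real workflow would persist this with the data.
provenance_log = []


def record_step(description):
    """Decorator that logs each call of a processing step with its parameters."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            provenance_log.append({
                "step": func.__name__,
                "description": description,
                "parameters": kwargs,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


@record_step("Remove readings flagged as missing")
def clean(readings, *, missing_value):
    return [r for r in readings if r != missing_value]


@record_step("Apply a linear calibration offset")
def calibrate(readings, *, offset):
    return [r + offset for r in readings]


# Invented sample data: the steps run in order and leave a trace behind.
raw = [20.0, -999.0, 21.5]
processed = calibrate(clean(raw, missing_value=-999.0), offset=0.5)

for entry in provenance_log:
    print(entry["step"], entry["parameters"])
```

Because the log is written as a side effect of running the steps, the record of how the data was cleaned and calibrated does not depend on someone documenting it afterwards.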

Implications for e-research (5)
Big science, little science:
- Big science is by its nature data driven, and will often develop appropriate frameworks for its management and reuse (data centres, data grids)
- Other scientific domains (e.g., ecology, biodiversity, chemistry) are moving in the same direction, but data retain a high level of diversity and complexity

Summing-up
- The OAIS Information Model provides an abstract framework for thinking about preservation metadata
- PREMIS provides an implementation framework that is beginning to be adopted in some domains
- There are still many unresolved questions when it comes to defining metadata for research data

Acknowledgements
The Digital Curation Centre is funded by the JISC and the UK Research Councils' e-Science Core Programme. UKOLN is funded by the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC, the European Union, and other sources. UKOLN also receives support from the University of Bath, where it is based.