IS-ENES Cases Seven use cases are listed as data lifecycle steps A B C


IS-ENES Cases
Seven use cases are listed as data lifecycle steps:
A. Data Generation
B. Data Postprocessing / Homogenisation
C. Data Ingest into Data Centres
D. Data Publication into data federation
E. Versioning and errata information
F. Processing and derived data products
G. Long Term Archival and DOI assignment

IS-ENES Cases
The seven use cases have provenance requirements:
A. Data Generation: formal model documentation (ES-DOC)
B. Data Postprocessing / Homogenisation: files organized in collections characterized by facets defined in controlled vocabularies (CVs)
C. Data Ingest into Data Centres: data-centre-specific ingest workflow logs
D. Data Publication into data federation: connection of files with PIDs and ES-DOC
E. Versioning and errata information: errata documents connected with PIDs
F. Processing and derived data products: derived data products, processing logs (input data, tool info)
G. Long Term Archival and DOI assignment: author information, DOIs for collections

IS-ENES Cases – Provenance Challenges
1. There are different ENES use cases for provenance collection and management along the complete data life cycle.
2. A large number of provenance-related information artefacts are collected along the data life cycle.
3. A coherent, formal model based on an overall provenance architecture is missing.

IS-ENES Cases – Challenge 3
Suggest a coherent, formal model based on an overall provenance architecture covering use cases A to G.

IS-ENES Cases – Challenge 3
Why is a coherent, formal model based on an overall provenance architecture needed?

Specific requirements for provenance?
Since each step already collects some type of provenance information, is it OK to just map each one to PROV independently?
A. PROV metadata for Generation, from formal model documentation (ES-DOC)
B. PROV metadata for Postprocessing, from files organized in collections characterized by facets defined in CVs
C. PROV metadata for Data Centres Ingest, from data-centre-specific ingest workflow logs
D. PROV metadata for Data Publication, from the connection of files with PIDs and ES-DOC
E. PROV metadata for Versioning / Errata, from errata documents connected with PIDs
F. PROV metadata for Processing, from derived data products and processing logs (input data, tool info)
G. PROV metadata for LT Archival / DOI, from author information and DOIs for collections

There are commonalities in the provenance metadata
NOTE: "common" in the sense that they store similar kinds of metadata, not implying that they are the same metadata.
Shared kinds of provenance information: ES-DOC, Log, Metadata, PIDs
- PROV metadata for Postprocessing: files organized in collections characterized by facets defined in CVs
- PROV metadata for Data Centres Ingest: data-centre-specific ingest workflow logs
- PROV metadata for Generation: formal model documentation (ES-DOC)
- PROV metadata for Data Publication: connection of files with PIDs and ES-DOC
- PROV metadata for Versioning / Errata: errata documents connected with PIDs
- PROV metadata for LT Archival / DOI: author information, DOIs for collections
- PROV metadata for Processing: derived data products, processing logs (input data, tool info)

Could we suggest a common PROV schema?
The PROV metadata for Postprocessing, Data Centres Ingest, Generation, Data Publication, Versioning / Errata, LT Archival / DOI and Processing would all map onto one common PROV schema.
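To make the common-schema idea concrete, here is a minimal Python sketch, assuming a small record shaped after the W3C PROV core relations (used, wasGeneratedBy, wasAssociatedWith). The adapters from_ingest_log and from_errata, and every field name they read, are hypothetical stand-ins for the data-centre ingest logs (C) and errata documents (E); they are not an existing IS-ENES or ESGF format. The output is simplified PROV-N-like text, without the namespace prefixes and document wrapper that real PROV-N requires.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProvRecord:
    """Hypothetical common record: one activity with its inputs and outputs."""
    use_case: str                                        # lifecycle step letter, A to G
    activity: str                                        # identifier of the activity
    used: list[str] = field(default_factory=list)        # inputs (PROV "used")
    generated: list[str] = field(default_factory=list)   # outputs (PROV "wasGeneratedBy")
    agent: Optional[str] = None                          # responsible party ("wasAssociatedWith")
    attributes: dict[str, str] = field(default_factory=dict)  # step-specific extras

def from_ingest_log(entry: dict) -> ProvRecord:
    """Map a hypothetical data-centre ingest log entry (use case C)."""
    return ProvRecord(
        use_case="C",
        activity=entry["job_id"],
        used=[entry["source_file"]],
        generated=[entry["archived_file"]],
        agent=entry.get("data_centre"),
        attributes={"checksum": entry.get("checksum", "")},
    )

def from_errata(entry: dict) -> ProvRecord:
    """Map a hypothetical errata document linked to PIDs (use case E)."""
    return ProvRecord(
        use_case="E",
        activity="issue_" + entry["erratum_id"],
        used=list(entry["affected_pids"]),
        generated=[entry["erratum_id"]],
        attributes={"description": entry["description"]},
    )

def to_prov_n(rec: ProvRecord) -> str:
    """Render PROV-N-like statements (no namespaces or document wrapper)."""
    lines = [f"activity({rec.activity})"]
    lines += [f"used({rec.activity}, {e})" for e in rec.used]
    lines += [f"wasGeneratedBy({e}, {rec.activity})" for e in rec.generated]
    if rec.agent is not None:
        lines.append(f"wasAssociatedWith({rec.activity}, {rec.agent})")
    return "\n".join(lines)
```

In this sketch each step keeps its native format and only its adapter needs to know that format; whether such a thin common record is expressive enough for all seven steps is exactly the open question the next slide raises.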

Could we suggest a common PROV schema?
The PROV metadata for Postprocessing, Data Centres Ingest, Generation, Data Publication, Versioning / Errata, LT Archival / DOI and Processing would all map onto one common PROV schema.
Question: is PROV flexible enough to
- map to different institutions, processes and datasets?
- include a reference to the input, to build a provenance chain?
- create a provenance registry?
Idea: handle each case as a black box:
- keep the output metadata
- include a reference to the input
- resolve the input reference if more metadata is needed
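The black-box idea can be sketched in Python as follows, assuming each step publishes only its output metadata plus the PID of its input, and that a plain dictionary stands in for a PID resolver or provenance registry. StepMetadata, resolve and provenance_chain are hypothetical names used for illustration only. The chain is walked only as far as the caller asks, and a depth limit guards against cyclic references.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StepMetadata:
    """Output metadata of one use case treated as a black box (hypothetical)."""
    pid: str                  # identifier of this step's output
    step: str                 # lifecycle step name, e.g. "Postprocessing"
    summary: str              # metadata about the output only
    input_ref: Optional[str]  # PID of the input, or None at the start of the chain

# Stand-in for a PID resolver / provenance registry: PID -> metadata.
Registry = dict[str, StepMetadata]

def resolve(pid: str, registry: Registry) -> StepMetadata:
    """Resolve one reference; raises KeyError if the link is broken."""
    return registry[pid]

def provenance_chain(pid: str, registry: Registry, max_depth: int = 100) -> list[StepMetadata]:
    """Follow input references back towards the origin, newest step first."""
    chain: list[StepMetadata] = []
    current: Optional[str] = pid
    while current is not None:
        if len(chain) >= max_depth:
            raise ValueError("provenance chain too deep or cyclic")
        meta = resolve(current, registry)
        chain.append(meta)
        current = meta.input_ref
    return chain
```

A consumer that only needs the published dataset's metadata calls resolve once; only a consumer that needs the full history pays for walking the whole chain.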

In this context: what are the actual requirements of a provenance architecture for use cases A to G?

A proposed set of requirements for a provenance architecture
- Provenance metadata should be lightweight
  The provenance metadata should contain only the metadata corresponding to the latest state and a reference to the source. The detailed history is only needed locally and can be unpacked on request.
- Provenance metadata should be self-contained
  Limit the need for external entities/systems to interpret provenance. Provenance data should be retained / preserved close to the place at which it is generated / relevant.
- Provenance metadata should be resolvable
  Every link in the provenance chain should point to its origin.
- Provenance should be backwards compatible
  Adding or changing provenance metadata should preserve compatibility with previous instances, without requiring them to be updated.
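A minimal Python sketch of how three of these requirements might look in a record format, under assumed names: the record holds only the latest state and a source reference (lightweight), detailed history lives in a separate local store and is returned only on request, unresolvable sources are reported (resolvable), and a reader that tolerates missing optional fields and keeps unknown ones lets old and new records coexist (backwards compatible). The JSON layout, the schema_version field and all helper names are hypothetical, not a format defined in these slides.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

SCHEMA_VERSION = 2  # hypothetical current version of the record layout
KNOWN_FIELDS = {"pid", "state", "source", "schema_version"}

@dataclass
class ProvenanceRecord:
    pid: str                      # identifier of this object
    state: dict                   # metadata of the latest state only (lightweight)
    source: Optional[str] = None  # reference to the source (resolvable)
    schema_version: int = 1       # records written without a version count as v1
    extra: dict = field(default_factory=dict)  # fields this reader does not know

def parse_record(text: str) -> ProvenanceRecord:
    """Read a JSON record of any version: optional fields may be absent,
    unknown (newer) fields are kept rather than rejected."""
    raw = json.loads(text)
    return ProvenanceRecord(
        pid=raw["pid"],
        state=raw.get("state", {}),
        source=raw.get("source"),
        schema_version=raw.get("schema_version", 1),
        extra={k: v for k, v in raw.items() if k not in KNOWN_FIELDS},
    )

def to_json(rec: ProvenanceRecord) -> str:
    """Write a record in the current layout, carrying unknown fields through unchanged."""
    out = dict(rec.extra)
    out.update(pid=rec.pid, state=rec.state, schema_version=SCHEMA_VERSION)
    if rec.source is not None:
        out["source"] = rec.source
    return json.dumps(out, sort_keys=True)

def unresolvable_sources(records: list[ProvenanceRecord], known_pids: set[str]) -> list[str]:
    """List source references that do not resolve, i.e. broken links in the chain."""
    return [r.source for r in records if r.source is not None and r.source not in known_pids]

def unpack_history(pid: str, local_store: dict[str, list[dict]]) -> list[dict]:
    """Return the detailed history kept locally, only when explicitly requested."""
    return list(local_store.get(pid, []))
```

Because parse_record defaults every field except pid, a record written before schema_version existed still parses, which is one way to meet "not requiring them to be updated". Self-containment is a deployment question (where records are stored and interpreted) and is not modelled here.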