Published by Barrie Horn. Modified over 6 years ago.
1
IS-ENES Cases
Seven use cases are listed as data lifecycle steps:
A. Data Generation
B. Data Postprocessing / Homogenisation
C. Data Ingest into Data Centres
D. Data Publication into data federation
E. Versioning, Errata information
F. Processing and derived data products
G. Long Term Archival and DOI assignment
2
IS-ENES Cases
The seven use cases have provenance requirements:
A. Data Generation: Formal model documentation (ES-DOC)
B. Data Postprocessing / Homogenisation: Files organized in collections characterized by facets defined in CVs
C. Data Ingest into Data Centres: Data center specific ingest workflow logs
D. Data Publication into data federation: Connection of files with PIDs and ES-DOC
E. Versioning, Errata information: Errata documents connected with PIDs
F. Processing and derived data products: Derived data products, processing logs (input data, tool info)
G. Long Term Archival and DOI assignment: Author information, DOI for collections
3
IS-ENES Cases – Provenance Challenges
1. There are different ENES use cases for provenance collection and management along the complete data life cycle.
2. A large number of provenance-related information artefacts are collected along the data life cycle.
3. A coherent, formal model based on an overall provenance architecture is missing.
4
IS-ENES Cases – Challenge 3
Suggest a coherent, formal model based on an overall provenance architecture.
(Diagram repeated from slide 2: lifecycle steps A to G with their provenance artefacts.)
5
IS-ENES Cases – Challenge 3
Why is a coherent, formal model based on an overall provenance architecture needed?
(Diagram repeated from slide 2: lifecycle steps A to G with their provenance artefacts.)
6
Specific requirements for provenance?
Since each step already collects some type of provenance information, is it OK to just map them to PROV independently?
- PROV metadata for Generation: Formal model documentation (ES-DOC)
- PROV metadata for Postprocessing: Files organized in collections characterized by facets defined in CVs
- PROV metadata for Data Centres Ingest: Data center specific ingest workflow logs
- PROV metadata for Data Publication: Connection of files with PIDs and ES-DOC
- PROV metadata for Versioning Errata: Errata documents connected with PIDs
- PROV metadata for Processing: Derived data products, processing logs (input data, tool info)
- PROV metadata for LTArchival/DOI: Author information, DOI for collections
7
There are commonalities in the provenance metadata
NOTE: "Common" in the sense that they store similar metadata; this does not imply that they are the same metadata.
- PROV metadata for Generation: Formal model documentation (ES-DOC)
- PROV metadata for Postprocessing: Files organized in collections characterized by facets defined in CVs
- PROV metadata for Data Centres Ingest: Data center specific ingest workflow logs
- PROV metadata for Data Publication: Connection of files with PIDs and ES-DOC
- PROV metadata for Versioning Errata: Errata documents connected with PIDs
- PROV metadata for Processing: Derived data products, processing logs (input data, tool info)
- PROV metadata for LTArchival/DOI: Author information, DOI for collections
Common elements labelled in the diagram: ES-DOC, Log Metadata, PIDs.
8
Could we suggest a common PROV schema?
(Diagram: a Common PROV Schema at the centre, connected to the PROV metadata of each step.)
- PROV metadata for Generation
- PROV metadata for Postprocessing
- PROV metadata for Data Centres Ingest
- PROV metadata for Data Publication
- PROV metadata for Versioning Errata
- PROV metadata for Processing
- PROV metadata for LTArchival/DOI
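One way to picture a common schema is a single record builder that every lifecycle step calls with its own artefacts. The sketch below is only an illustration, not part of the presentation: the function name `step_record`, the `ex:` prefix and all identifiers are hypothetical, and the dictionary layout loosely follows the PROV-JSON serialization (entities, activities, `wasGeneratedBy` and `used` relations).

```python
import json


def step_record(step, output_id, input_ids, attributes):
    """Build a PROV-JSON-shaped dict for one lifecycle step.

    All identifiers and the 'ex' prefix are hypothetical placeholders.
    """
    activity_id = f"ex:{step}"
    doc = {
        "prefix": {"ex": "http://example.org/"},
        # The output entity carries the step-specific artefacts as attributes.
        "entity": {output_id: dict(attributes)},
        "activity": {activity_id: {}},
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": output_id, "prov:activity": activity_id}
        },
        "used": {},
    }
    # Each input is declared as an entity and linked to the activity.
    for i, input_id in enumerate(input_ids, start=1):
        doc["entity"].setdefault(input_id, {})
        doc["used"][f"_:use{i}"] = {
            "prov:activity": activity_id,
            "prov:entity": input_id,
        }
    return doc


# Two different steps described with the same structure.
ingest = step_record(
    "ingest", "ex:ingested-file", ["ex:postprocessed-file"],
    {"ex:log": "ingest-workflow.log"},
)
publication = step_record(
    "publication", "ex:published-file", ["ex:ingested-file"],
    {"ex:pid": "hypothetical-pid"},
)
print(json.dumps(publication, indent=2))
```

Because both records share the same top-level keys, a consumer can read them without knowing which institution or step produced them; only the attribute names differ.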
9
Could we suggest a common PROV schema?
(Diagram repeated from slide 8: a Common PROV Schema connected to the PROV metadata of each step.)
Question: Is PROV flexible enough to
- map to different institutions, processes and datasets?
- include a reference to the input, so that a provenance chain can be built?
- create a provenance registry?
Idea:
- Handle each case as a black box.
- Keep the output metadata.
- Include a reference to the input.
- Resolve the input reference if more metadata is needed.
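The black-box idea can be sketched as a lookup that follows input references backwards until it reaches a record with no input. This is a minimal illustration under stated assumptions: the registry is a plain in-memory dict, and the identifiers (`pid:raw` and so on), the record fields and the function name `resolve_chain` are all hypothetical.

```python
# Hypothetical registry: output identifier -> lightweight record.
# Each record keeps its own output metadata plus one input reference.
REGISTRY = {
    "pid:raw": {"step": "generation", "input": None,
                "meta": {"es_doc": "model-doc"}},
    "pid:cmor": {"step": "postprocessing", "input": "pid:raw",
                 "meta": {"facets": "cv-facets"}},
    "pid:ingested": {"step": "ingest", "input": "pid:cmor",
                     "meta": {"log": "ingest.log"}},
    "pid:published": {"step": "publication", "input": "pid:ingested",
                      "meta": {"pid": "pid:published"}},
}


def resolve_chain(output_id, registry):
    """Follow input references from an output back to its origin.

    Returns a list of (identifier, step) pairs, newest first.
    Raises KeyError if a reference cannot be resolved and
    ValueError if the references form a cycle.
    """
    chain = []
    seen = set()
    current = output_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        record = registry[current]  # KeyError for an unresolvable reference
        chain.append((current, record["step"]))
        current = record["input"]
    return chain


chain = resolve_chain("pid:published", REGISTRY)
for identifier, step in chain:
    print(identifier, step)
```

Each record stays small because it only points at its input; the full history is assembled on demand by walking the references.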
10
In this context: What are the actual requirements of a provenance architecture?
(Diagram repeated from slide 2: lifecycle steps A to G with their provenance artefacts.)
11
A proposed set of requirements for a provenance architecture
1. Provenance metadata should be lightweight. The provenance metadata should contain only the metadata corresponding to the latest state and a reference to the source. The detailed history is only needed locally and can be unpacked on request.
2. Provenance metadata should be self-contained. Limit the need for external entities or systems to interpret provenance. Provenance data should be retained and preserved close to the place where it is generated or relevant.
3. Provenance metadata should be resolvable. Every link in the provenance chain should point to its origin.
4. Provenance should be backwards compatible. Adding or changing provenance metadata should preserve compatibility with previous instances, without requiring them to be updated.
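As a rough illustration of the lightweight and backwards-compatible requirements, the sketch below models a record that holds only the latest state and a source reference, and a parser that tolerates both missing and unknown fields. The class `ProvRecord`, the function `parse_record` and the field names are hypothetical and not taken from any IS-ENES specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ProvRecord:
    identifier: str            # resolvable identifier of this output
    state: dict                # latest state only, no detailed history
    source: Optional[str]      # reference to the origin, None for the first step
    extras: dict = field(default_factory=dict)  # fields added by newer writers


def parse_record(raw):
    """Parse a raw dict without rejecting older or newer variants."""
    known = {"identifier", "state", "source"}
    return ProvRecord(
        identifier=raw["identifier"],
        state=raw.get("state", {}),   # older records may lack a state
        source=raw.get("source"),     # missing source means no known origin
        extras={k: v for k, v in raw.items() if k not in known},
    )


# An older record without 'state', and a newer one with an extra field.
old = parse_record({"identifier": "pid:a", "source": None})
new = parse_record({
    "identifier": "pid:b",
    "state": {"version": "v2"},
    "source": "pid:a",
    "checksum": "abc",
})
print(old)
print(new)
```

Unknown fields are kept in `extras` rather than dropped, so a newer producer can extend the record without forcing existing records or readers to be updated.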