Versioning for CMIP6 in the Earth System Grid Federation
... and PIDs!
Tobias Weigel, Katharina Berger, Stephan Kindermann, Michael Lautenschlager
German Climate Computing Center (DKRZ)
EGU
Sections: Introduction, Policies, Prototype impression, Data preparation, Initial registration, Version updates, End-user tools, EUDAT2 perspective, References
This work is licensed under the Creative Commons Attribution 4.0 International License.
Motivation
- There is no common ESGF approach to versioning, and the processes are unclear.
- Demonstrate the usefulness of wide-scale, low-level PID usage within an operational e-infrastructure.
- Controlled versioning at this scale will be new for CMIP.
Architecture overview (diagram)
Components: ESGF publisher (publish, unpublish, update metadata); esgscan tool; ESGF DB; Solr; ESGF web GUI (via CoG); ESGF user collection builder; stand-alone PID web tools; end-user CLI tools; automated verification service; Collections API; PIT API; Type Registry; Handle System with Handle API v8 (REST); CMOR and .nc files; DataCite DOI assignment process.
Interactions: register, populate, query/search, create collection, read, check conformance, query additional information (if available).
Legend: under development; future options (>2016); generic / community-independent; policies.
What is required?
- Technical development (ESGF publisher)
- Agreement on pioneering nodes
- Definition of policies to be enforced
- DKRZ Handle service and future coordination
- Until end of ...
Essential versioning policies
Versioning can only be trustworthy if everyone adheres to the policies.
- Enforce the use of ESGF tools, as opposed to unmonitored changes in the file system.
- Unified version numbers in the form YYYYMMDDxx:
  - recommended for all future projects using ESGF
  - mandatory if automated version management is to be used
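The YYYYMMDDxx scheme lends itself to automated checking. The following is a minimal Python sketch, assuming that xx is a two-digit counter distinguishing several versions published on the same day (the slide does not spell this out). The function names are hypothetical and not part of any ESGF tool.

```python
import re
from datetime import datetime

# Eight digits for the date followed by two digits for the assumed counter.
VERSION_PATTERN = re.compile(r"(\d{8})(\d{2})")


def parse_version(version: str):
    """Split a YYYYMMDDxx string into (date, counter).

    Raises ValueError if the string does not follow the scheme or if the
    date part is not a real calendar date.
    """
    match = VERSION_PATTERN.fullmatch(version)
    if match is None:
        raise ValueError(f"not a YYYYMMDDxx version: {version!r}")
    date_part, counter_part = match.groups()
    date = datetime.strptime(date_part, "%Y%m%d").date()
    return date, int(counter_part)


def is_newer(candidate: str, current: str) -> bool:
    """Return True if candidate is a later version than current."""
    return parse_version(candidate) > parse_version(current)
```

Because the date comes first in the returned tuple, plain tuple comparison orders versions chronologically and falls back to the counter only for versions of the same day.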
Prototype impression
PID := prefix + tracking_id
Screenshot labels: Dataset PID, File PID
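The rule PID := prefix + tracking_id can be sketched in a few lines of Python. This assumes that the tracking_id is a UUID and that the prefix is the string 10876/ESGF/ as it appears in the example PID records of the DWD obs4MIPs prototype. build_pid and new_tracking_id are hypothetical helpers, not ESGF functions.

```python
import uuid

# Assumed prefix, taken from the prototype's example records.
DEFAULT_PREFIX = "10876/ESGF/"


def new_tracking_id() -> str:
    """Create a fresh random tracking_id (assumed to be a version 4 UUID)."""
    return str(uuid.uuid4())


def build_pid(tracking_id: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Concatenate prefix and tracking_id after checking the UUID syntax."""
    # uuid.UUID raises ValueError for malformed input and normalises case.
    canonical = str(uuid.UUID(tracking_id))
    return prefix + canonical
```

Validating the tracking_id before concatenation keeps malformed identifiers from being registered, which matters because a registered PID is meant to stay valid permanently.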
Data preparation
Data preparation: modelling center perspective
Data provider / modelling center:
- Raw files
- Write the PID in the netCDF header, e.g. using CMOR; also determine the version number
- Configuration of the concrete PID syntax according to common agreements (e.g. within CMIP6)
Data node provider (ESGF publishing process, using PID tools to be provided by EUDAT):
- Register PIDs with the Handle Server; files become visible
- Register additional PIDs for aggregates
- Add replica locations to PID records
It is also possible to let data owners add additional locations through a dedicated service (e.g. provided by EUDAT).
Example PID records (DWD obs4MIPs prototype)

Dataset: 10876/ESGF/4ee9d37b bf-b3ef-e738b2ecedb4
Key               Value
URL               http://bmbf-ipcc-ar5.dkrz.de/thredds/esgcet/3/obs4MIPs.FUB-DWD.SSMI-MERIS.mon.v html
DRS name          obs4MIPs/observations/FUB-DWD/Obs-SSMI-MERIS/obs/mon/atmos/prw
Publication date
Version number
Children          ["10876/ESGF/a9b1bfbc-4b ed6-6b586bf1be02", ...]

File: 10876/ESGF/a9b1bfbc-4b ed6-6b586bf1be
Key               Value
URL               http://bmbf-ipcc-ar5.dkrz.de/thredds/fileServer/obs4MIPs/observations/FUB-DWD/Obs-SSMI-MERIS/obs/mon/atmos/prw/prwErr_SSMI-MERIS_L3_v1-00_ nc
DRS name          prwErr_SSMI-MERIS_L3_v1-00_ nc
Publication date
Checksum (MD5)    F49ee38e24e819b5d04c534f6ed7b375
Size
Parent            10876/ESGF/4ee9d37b bf-b3ef-e738b2ecedb4
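The two record types reference each other: the dataset lists its files as children, and each file names the dataset as its parent. A minimal Python sketch of that structure follows, with a consistency check on the links and an MD5 comparison. The class names, field names and helper functions are assumptions made for illustration; they mirror the keys shown above but are not an ESGF or Handle System API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    url: str
    drs_name: str
    checksum_md5: str
    size: int
    parent: str  # PID of the dataset this file belongs to


@dataclass
class DatasetRecord:
    url: str
    drs_name: str
    version_number: str
    children: list = field(default_factory=list)  # PIDs of the files


def link_problems(dataset_pid, dataset, files):
    """Return a list of inconsistencies between child and parent links.

    files maps file PID -> FileRecord.
    """
    problems = []
    for child_pid in dataset.children:
        record = files.get(child_pid)
        if record is None:
            problems.append(f"{child_pid}: listed as child, no record found")
        elif record.parent != dataset_pid:
            problems.append(f"{child_pid}: parent is {record.parent}")
    return problems


def checksum_matches(record: FileRecord, payload: bytes) -> bool:
    """Compare the stored MD5 with the digest of payload, ignoring case."""
    return hashlib.md5(payload).hexdigest() == record.checksum_md5.lower()
```

The comparison is case-insensitive because the example checksum above contains an upper-case hex digit.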
Initial registration
Initial registration: data node perspective
The workflow is the one shown for data preparation. The data provider / modelling center delivers files whose netCDF header already contains the PID (e.g. written using CMOR, which also determines the version number). On the data node, the ESGF publishing process, using PID tools to be provided by EUDAT, then:
- registers the file PIDs with the Handle Server; files become visible
- registers additional PIDs for aggregates
- adds replica locations to the PID records
It is also possible to let data owners add additional locations through a dedicated service (e.g. provided by EUDAT).
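The order of the registration steps can be illustrated with a small in-memory stand-in for the Handle Server. This is only a sketch under stated assumptions: InMemoryHandleRegistry, register_dataset and the record keys (parent, children, locations, version) are hypothetical, and a real deployment would talk to a Handle service instead of a dictionary.

```python
class InMemoryHandleRegistry:
    """Hypothetical stand-in for a Handle Server, for illustration only."""

    def __init__(self):
        self._records = {}

    def register(self, pid, record):
        # A PID may be registered only once.
        if pid in self._records:
            raise KeyError(f"PID already registered: {pid}")
        self._records[pid] = dict(record)

    def add_location(self, pid, url):
        # Replica locations are appended, duplicates are ignored.
        locations = self._records[pid].setdefault("locations", [])
        if url not in locations:
            locations.append(url)

    def get(self, pid):
        return self._records[pid]


def register_dataset(registry, dataset_pid, version, file_locations):
    """Register file PIDs first, then the aggregate PID that groups them.

    file_locations maps file PID -> URL at the publishing data node.
    """
    for file_pid, url in file_locations.items():
        registry.register(file_pid, {"parent": dataset_pid, "locations": [url]})
    registry.register(
        dataset_pid,
        {"version": version, "children": sorted(file_locations)},
    )
```

A replica site would later call add_location on the existing file PIDs rather than registering new ones, so every copy of a file resolves through the same identifier.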
Version updates
Version updates: version update process
- .nc files (e.g. from CMOR): PID in headers; version number defined
- esg publish: auto-detect new versions of registered old files
- Register PIDs with the Handle Server; files become visible
- Assemble aggregates from old and new PIDs
On updates, the initial publication process is largely repeated, but the publisher detects the existing files and arranges old and new files in a collection accordingly.
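How old and new files might be arranged into a new aggregate can be sketched as a comparison of two mappings from file name to PID. This is an illustrative assumption about the detection logic, not the actual ESGF publisher implementation; assemble_new_version and its result keys are hypothetical.

```python
def assemble_new_version(previous, incoming):
    """Describe the aggregate for a new dataset version.

    previous: file name -> PID in the last published version
    incoming: file name -> PID read from the headers of the offered files
    """
    unchanged = sorted(
        name for name, pid in incoming.items() if previous.get(name) == pid
    )
    replaced = sorted(
        name for name, pid in incoming.items()
        if name in previous and previous[name] != pid
    )
    added = sorted(name for name in incoming if name not in previous)
    dropped = sorted(name for name in previous if name not in incoming)
    return {
        # The new aggregate mixes reused (old) and newly registered PIDs.
        "children": sorted(incoming.values()),
        "unchanged": unchanged,
        "replaced": replaced,
        "added": added,
        "dropped": dropped,
    }
```

Only the PIDs of replaced and added files need to be registered; unchanged files keep their existing PID and simply appear in both the old and the new aggregate.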
End-user tools and node tools
Node tools: PID quality management
Triggers: automated PID verification service; other external trigger factors
Issue manager: determine action
- Mark the PID as tombstone and provide tombstone record info; possibly include a reference to the new version/replacement.
- Update the PID record with the new location, based on additional knowledge to be acquired.
Parts of this process should be supported by EUDAT/ESGF tools to make it more scalable and reduce the current manual effort.
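The "determine action" step can be read as a small decision rule. The Python sketch below is one possible interpretation, assuming that a verification run reports whether the registered location is reachable and that a new location may or may not be known. The names Action, determine_action and apply_action, and the record keys URL, tombstone and replaced_by, are hypothetical.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    UPDATE_LOCATION = "update location"
    TOMBSTONE = "tombstone"


def determine_action(reachable, new_location=None):
    """Choose what to do with a PID after a verification run."""
    if reachable:
        return Action.KEEP
    if new_location:
        return Action.UPDATE_LOCATION
    return Action.TOMBSTONE


def apply_action(record, action, new_location=None, replacement_pid=None):
    """Return an updated copy of a PID record; the input is not modified."""
    updated = dict(record)
    if action is Action.UPDATE_LOCATION:
        updated["URL"] = new_location
    elif action is Action.TOMBSTONE:
        updated.pop("URL", None)
        updated["tombstone"] = True
        if replacement_pid is not None:
            # Optional pointer to the new version or replacement.
            updated["replaced_by"] = replacement_pid
    return updated
```

The PID itself is never deleted in this sketch; a tombstone record keeps the identifier resolvable even after the data are gone.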
End-user tools: basic PID resolution
Example: hdl:10876/ESGF-2b8e6aef d9-9eda-d10d3c2befce
The Handle Server resolves the PID. Depending on whether the record is an aggregate, a singleton or a tombstone, an individual information page is shown:
- singleton: File PID
- aggregate: Dataset PID
- tombstone
The information page services are provided by ESGF, based on EUDAT tools.
Web landing pages could offer data download, versioning information, replication information, ...
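The dispatch between aggregate, singleton and tombstone can be sketched as follows. The input layout (a "values" list whose entries carry a "type" and a "data" object with a "value") follows the JSON returned by the Handle System v8 REST interface as far as I understand it; the type names CHILDREN and TOMBSTONE and both function names are assumptions for illustration only. The sketch does no network access and works on an already fetched response.

```python
def flatten_handle_values(response):
    """Map value type -> value for a Handle REST style JSON document."""
    return {
        entry["type"]: entry["data"]["value"]
        for entry in response.get("values", [])
    }


def classify(record):
    """Decide which kind of information page a resolved PID should get."""
    if "TOMBSTONE" in record:
        return "tombstone"
    if "CHILDREN" in record:
        return "aggregate"
    return "singleton"
```

A landing page service could call classify once per request and then render download links for singletons, a file listing for aggregates, or a notice for tombstones.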
End-user tools: possible CLI and web tools
Possible command line tools (all Python-based):
- wget for PID'ed data with smooth authentication
- info tool
- get latest version
- ...
Possible web tools:
- Generic viewer across communities (PIT use case)
- Provenance tracing tool
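A "get latest version" tool could follow replacement links from one PID record to the next until no newer version is referenced. The sketch below assumes a hypothetical record key replaced_by and a caller-supplied lookup function that returns the record for a PID; neither is defined by the slides.

```python
def latest_version(pid, lookup):
    """Follow replaced_by links until the newest PID is reached.

    lookup is any callable that returns the record (a dict) for a PID.
    Raises ValueError if the links form a cycle.
    """
    seen = set()
    while True:
        if pid in seen:
            raise ValueError(f"cycle in version links at {pid}")
        seen.add(pid)
        successor = lookup(pid).get("replaced_by")
        if successor is None:
            return pid
        pid = successor
```

Passing the lookup as a parameter keeps the traversal independent of how records are fetched, so the same function works against a test dictionary or a real resolver.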
EUDAT2 perspective: envisioned EUDAT2 PID services architecture
- B2* services
- EUDAT PID service (epicclient.py)
- PID system base services (CRUD, distribution)
- Advanced PID services (viewer, reverse lookup, queueing system, ...)
- Verification tools and services
- Operational tools (monitoring, siteinfo, ...)
- Mass management tools
- Collection service (lapis / Collection WG)?
- Future EPIC service concept? (focus, however, on organizational aspects/QA)
Technical base: Handle System 8 with embedded Jetty (HSv8 native REST, Solr indexing servlet, reverse-lookup servlet); Apache Solr; relational DB (*SQL)
Index
- Motivation
- Architecture overview
- Requirements
- Policies
- Prototype impression
- Data preparation: modelling center perspective; example PID records
- Initial registration: data node perspective
- Version updates: version update process
- End-user tools: PID quality management; basic PID resolution; possible CLI and web tools
- EUDAT2 perspective
- References
References
- Meehl, Moss, Taylor, Eyring, Stouffer, Bony, Stevens (2014): Climate Model Intercomparisons: Preparing for the Next Phase. EOS Trans. AGU, Vol. 9, No. 9. doi: /2014eo
- Weigel, Lautenschlager, Toussaint, Kindermann (2013): A Framework for Extended Persistent Identification of Scientific Assets. Data Science Journal, Vol. 12. doi: /dsj
- Weigel, Kindermann, Lautenschlager (2013): Actionable Persistent Identifier Collections. Data Science Journal, Vol. 12. doi: /dsj
- Weigel, DiLauro, Zastrow: RDA Recommendation: PID Information Types. Under review.