Weigel, Berger, Kindermann, Lautenschlager 17.04.2015 - EGU2015-9445: Versioning for CMIP6 in the Earth System Grid Federation


Versioning for CMIP6 in the Earth System Grid Federation
... and PIDs!
Tobias Weigel, Katharina Berger, Stephan Kindermann, Michael Lautenschlager
German Climate Computing Center (DKRZ)
Sections: Introduction, Policies, Prototype impression, Data preparation, Initial registration, Version updates, End-user tools, EUDAT2 perspective, References
This work is licensed under the Creative Commons Attribution 4.0 International License.

Motivation
- No common ESGF approach to versioning; processes are unclear
- Demonstrate the usefulness of wide-scale, low-level PID usage within an operational e-infrastructure
- Controlled versioning at this scale will be new for CMIP

Architecture overview (diagram)
Legend: Under development; Future options (>2016); Generic / community-independent
Components: ESGF publisher (publish, unpublish, update metadata); esgscan tool; ESGF DB; ESGF web GUI (via cog); ESGF user collection builder; stand-alone PID web tools; end-user CLI tools; automated verification service; .nc files; CMOR; Collections API; Handle System; PIT API; Handle API v8 (REST); Type Registry; solr; DataCite DOI assignment process
Connections: register; populate; query/search; query; create collection; read, check conformance; query additional information (if available)

What is required?
- Technical development (esgf publisher)
- Agreement on pioneering nodes
- Definition of policies to be enforced
- DKRZ Handle service and future coordination
Until end of

Essential versioning policies
- Versioning can only be trustworthy if everyone adheres to the policies.
- Enforce the use of ESGF tools, as opposed to unmonitored changes in the file system
- Unified version numbers: YYYYMMDDxx
  - recommended for all future projects using ESGF
  - mandatory if automated version management is to be used
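
The unified version number format could be checked mechanically before publication. A minimal sketch, assuming that YYYYMMDD is a calendar date and xx is a two-digit counter; the slide gives only the pattern, so this reading and the name `parse_version` are assumptions:

```python
import re
from datetime import date

# Pattern from the policy slide: YYYYMMDDxx. Reading "xx" as a two-digit
# counter is an assumption; the slide does not define it.
VERSION_RE = re.compile(r"([0-9]{8})([0-9]{2})")


def parse_version(version):
    """Return (date, counter) for a YYYYMMDDxx string, or raise ValueError."""
    match = VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"not in YYYYMMDDxx form: {version!r}")
    digits = match.group(1)
    # date() raises ValueError for impossible dates such as month 13.
    day = date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
    return day, int(match.group(2))
```

A publisher could call `parse_version` on every incoming version string and refuse to register anything that raises.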

Prototype impression
- PID := prefix + tracking_id
- Dataset PID
- File PID
- What happens when clicking on a PID?
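
The rule PID := prefix + tracking_id is a plain concatenation. A minimal sketch, in which the prefix string is modelled on the prototype records in this deck and a generated UUID stands in for a file's tracking_id; both values and the name `make_pid` are illustrative assumptions, since the concrete PID syntax is left to common agreements:

```python
import uuid


def make_pid(prefix, tracking_id):
    """Concatenate a Handle prefix and a tracking_id, as on the slide."""
    if not prefix or not tracking_id:
        raise ValueError("prefix and tracking_id must be non-empty")
    return prefix + tracking_id


# Hypothetical values: a prefix modelled on the prototype records and a
# freshly generated UUID standing in for a file's tracking_id.
example_pid = make_pid("10876/ESGF/", str(uuid.uuid4()))
```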

Data preparation

Data preparation: modelling center perspective
Data provider / modelling center:
- Raw files
- Write PID in netCDF header, e.g. using CMOR; also determine version number
- Configuration of concrete PID syntax according to common agreements (e.g. within CMIP6)
Data node provider, using PID tools to be provided by EUDAT (ESGF publishing process):
- Register PIDs (Handle Server)
- Files visible
- Register additional PIDs for aggregates
- Add replica locations to PID records
- It is also possible to let data owners add additional locations through a dedicated service (e.g. provided by EUDAT)
What's in a PID record?

Example PID records (DWD obs4MIPs prototype)

Dataset: 10876/ESGF/4ee9d37b bf-b3ef-e738b2ecedb4
Key               Value
URL               http://bmbf-ipcc-ar5.dkrz.de/thredds/esgcet/3/obs4MIPs.FUB-DWD.SSMI-MERIS.mon.v html
DRS name          obs4MIPs/observations/FUB-DWD/Obs-SSMI-MERIS/obs/mon/atmos/prw
Publication date
Version number
Children          ["10876/ESGF/a9b1bfbc-4b ed6-6b586bf1be02", ... ]

File: 10876/ESGF/a9b1bfbc-4b ed6-6b586bf1be
Key               Value
URL               http://bmbf-ipcc-ar5.dkrz.de/thredds/fileServer/obs4MIPs/observations/FUB-DWD/Obs-SSMI-MERIS/obs/mon/atmos/prw/prwErr_SSMI-MERIS_L3_v1-00_ nc
DRS name          prwErr_SSMI-MERIS_L3_v1-00_ nc
Publication date
Checksum (MD5)    F49ee38e24e819b5d04c534f6ed7b375
Size
Parent            10876/ESGF/4ee9d37b bf-b3ef-e738b2ecedb4
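
The two record types reference each other through the Parent and Children keys. A minimal sketch of that structure as plain dictionaries, using the key names from the example records; the function names, the `pid` key, and all identifiers passed in are hypothetical, and the publication date is left out for brevity:

```python
import hashlib


def file_record(pid, url, drs_name, payload, parent_pid):
    """Build a file-level record with the keys shown in the example."""
    return {
        "pid": pid,
        "URL": url,
        "DRS name": drs_name,
        "Checksum (MD5)": hashlib.md5(payload).hexdigest(),
        "Size": len(payload),
        "Parent": parent_pid,
    }


def dataset_record(pid, url, drs_name, version, files):
    """Build a dataset-level record whose Children list the file PIDs."""
    return {
        "pid": pid,
        "URL": url,
        "DRS name": drs_name,
        "Version number": version,
        "Children": [f["pid"] for f in files],
    }


def links_consistent(dataset, files):
    """Check that parent and child references point at each other."""
    children_match = dataset["Children"] == [f["pid"] for f in files]
    parents_match = all(f["Parent"] == dataset["pid"] for f in files)
    return children_match and parents_match
```

A check like `links_consistent` is one way an automated verification service could detect records whose cross-references have drifted apart.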

Initial registration

Initial registration: data node perspective
Input from the data provider / modelling center:
- Raw files with the PID written in the netCDF header, e.g. using CMOR; version number also determined
- Concrete PID syntax configured according to common agreements (e.g. within CMIP6)
Data node provider, using PID tools to be provided by EUDAT (ESGF publishing process):
- Register PIDs (Handle Server)
- Files visible
- Register additional PIDs for aggregates
- Add replica locations to PID records
- It is also possible to let data owners add additional locations through a dedicated service (e.g. provided by EUDAT)
What's in a PID record?
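
Registering a PID and later adding a replica location both amount to writing typed values into a Handle record. A minimal sketch that only builds the value list, assuming the index/type/data layout used by the Handle System v8 JSON interface; the helper names and the `CHECKSUM`, `PARENT`, and `REPLICA` type names are hypothetical, and no request is sent:

```python
import json


def handle_values(url, extra):
    """Return a Handle-style value list: URL at index 1, extras after it."""
    values = [{"index": 1, "type": "URL",
               "data": {"format": "string", "value": url}}]
    # Sort the extra entries so that index assignment is deterministic.
    for index, (type_name, value) in enumerate(sorted(extra.items()), start=2):
        values.append({"index": index, "type": type_name,
                       "data": {"format": "string", "value": str(value)}})
    return values


def add_replica(values, replica_url):
    """Return a new value list with one more location appended.

    "REPLICA" is a hypothetical type name chosen for this sketch.
    """
    next_index = max(v["index"] for v in values) + 1
    replica = {"index": next_index, "type": "REPLICA",
               "data": {"format": "string", "value": replica_url}}
    return values + [replica]


def request_body(values):
    """Serialise the value list as the JSON body of a register/update call."""
    return json.dumps({"values": values})
```

Keeping the record construction separate from the network call lets the same code serve the publisher and a dedicated service through which data owners add locations.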

Version updates

Version updates: version update process
- PID in headers; version number defined (.nc files, CMOR)
- esg publish: auto-detect new versions of registered old files
- Register PIDs (Handle Server)
- Files visible
- Assemble aggregates from old and new PIDs
On updates, the initial publication process is largely repeated, but the publisher detects the existing files and arranges old and new files in a collection accordingly.
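
Assembling an aggregate from old and new PIDs can be sketched as a merge over file names. This is an illustration only: matching files by name and checksum is an assumption (the slide says only that the publisher auto-detects new versions), and `assemble_new_version` and `mint_pid` are hypothetical names:

```python
def assemble_new_version(old_files, new_files, mint_pid):
    """Map file name -> PID for the new dataset version.

    old_files: name -> (pid, checksum) for the already registered version.
    new_files: name -> checksum for the files being published now.
    mint_pid:  callable returning a fresh PID for a changed or added file.
    """
    members = {}
    for name, checksum in new_files.items():
        old = old_files.get(name)
        if old is not None and old[1] == checksum:
            # Unchanged file: the new aggregate reuses the old PID.
            members[name] = old[0]
        else:
            # Changed or added file: it needs a PID of its own.
            members[name] = mint_pid(name)
    return members
```

Files that appear only in the old version are simply absent from the new aggregate; their PIDs stay resolvable through the old one.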

End-user tools / Node tools

Node tools: PID quality management
- Triggers: automated PID verification service; other external trigger factors
- Issue manager: determine action ... based on additional knowledge to be acquired
- Possible actions (Handle Server):
  - Mark PID as tombstone; provide tombstone record info; possibly include reference to new version/replacement
  - Update PID record with new location
Parts of this process should be supported by EUDAT/ESGF tools to make it more scalable and reduce current manual effort.
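
The "determine action" step can be sketched as a small decision function. The record layout and the `tombstone` and `replaced_by` field names are hypothetical, and the rule that a known new location takes precedence over a tombstone is an assumption made for this sketch:

```python
def determine_action(record, reachable, new_location=None, replacement_pid=None):
    """Return an updated copy of a PID record after a verification run.

    record:          dict with at least a "URL" key (hypothetical layout).
    reachable:       result of checking the recorded location.
    new_location:    a known new URL for the object, if any.
    replacement_pid: PID of a newer version or replacement, if any.
    """
    if reachable:
        return dict(record)          # nothing to do
    updated = dict(record)
    if new_location is not None:
        # The object moved: update the PID record with the new location.
        updated["URL"] = new_location
        return updated
    # The object is gone: mark the PID as a tombstone.
    updated.pop("URL", None)
    updated["tombstone"] = True
    if replacement_pid is not None:
        updated["replaced_by"] = replacement_pid
    return updated
```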

End-user tools: basic PID resolution
Example: hdl:10876/ESGF-2b8e6aef d9-9eda-d10d3c2befce
- The Handle Server resolves the PID (File PID or Dataset PID); the result is then distinguished: aggregate, singleton, or tombstone?
- Individual information page services are provided by ESGF based on EUDAT tools
- Web landing pages could offer: data download, versioning information, replication information, ...
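
The aggregate / singleton / tombstone decision can be sketched as a classification over the resolved record. The `Children` key follows the example records in this deck; the `tombstone` flag and both function names are hypothetical:

```python
def classify(record):
    """Classify a resolved PID record as tombstone, aggregate, or singleton."""
    if record.get("tombstone"):
        return "tombstone"
    if record.get("Children"):
        return "aggregate"
    return "singleton"


def landing_page_features(record):
    """List what a landing page could offer for this kind of record."""
    kind = classify(record)
    if kind == "tombstone":
        return ["tombstone information"]
    features = ["versioning information", "replication information"]
    if kind == "singleton":
        features.insert(0, "data download")
    return features
```

Checking the tombstone flag first means a withdrawn dataset is reported as a tombstone even if its record still lists children.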

End-user tools: possible CLI and web tools
Possible command line tools (all Python-based):
- wget for PID'ed data with smooth authentication
- info tool
- get latest version
- ...
Possible web tools:
- Generic viewer across communities (PIT use case)
- Provenance tracing tool
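
A "get latest version" tool could follow successor references from one PID record to the next. A minimal sketch in which a dictionary stands in for Handle lookups; the `replaced_by` field name and the function name are hypothetical:

```python
def latest_version(pid, records):
    """Follow replaced_by links from pid until a record has no successor.

    records: mapping PID -> record dict, standing in for Handle lookups.
    Raises KeyError for an unknown PID and ValueError on a reference cycle.
    """
    seen = set()
    while True:
        if pid in seen:
            raise ValueError(f"cycle in version chain at {pid!r}")
        seen.add(pid)
        successor = records[pid].get("replaced_by")
        if successor is None:
            return pid
        pid = successor
```

The cycle guard matters in practice: a single mistaken back-reference in one record would otherwise make the tool loop forever.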

Envisioned EUDAT2 PID services architecture (diagram)
- B2* Services
- EUDAT PID service (epicclient.py)
- PID system base services (CRUD, distribution)
- Advanced PID services (viewer, reverse lookup, queueing system, ...)
- Verification tools and services
- Future EPIC service concept? (focus however on organizational aspects/QA)
- Operational tools (monitoring, siteinfo, ...)
- HSv8 native REST
- Solr indexing servlet
- Reverse-lookup servlet
- Apache Solr
- Relational DB (*SQL)
- Handle System 8 with embedded Jetty
- Mass management tools
- Collection service (lapis / Collection WG) ?

Index
- Home
- Motivation
- Architecture overview
- Requirements
- Policies
- Prototype impression
- Data preparation
  - Modelling center perspective
  - Example PID records
- Initial registration
  - Data node perspective
- Version updates
  - Version update process
- End-user tools
  - PID quality management
  - Basic PID resolution
  - Possible CLI and web tools
- EUDAT2 perspective
- References

References
- Meehl, Moss, Taylor, Eyring, Stouffer, Bony, Stevens (2014): Climate Model Intercomparisons: Preparing for the Next Phase. EOS Trans. AGU, Vol. 9, No. 9. doi: /2014eo
- Weigel, Lautenschlager, Toussaint, Kindermann (2013): A Framework for Extended Persistent Identification of Scientific Assets. Data Science Journal, Vol. 12. doi: /dsj
- Weigel, Kindermann, Lautenschlager (2013): Actionable Persistent Identifier Collections. Data Science Journal, Vol. 12. doi: /dsj
- Weigel, DiLauro, Zastrow: RDA Recommendation: PID Information Types. Under review.