Long-term Archiving of Climate Model Data at WDC Climate and DKRZ Michael Lautenschlager WDC Climate / Max-Planck-Institute for Meteorology, Hamburg Data.


Long-term Archiving of Climate Model Data at WDC Climate and DKRZ Michael Lautenschlager WDC Climate / Max-Planck-Institute for Meteorology, Hamburg Data Management Workshop (Köln, )

Structure 2009
DKRZ: Earth system model development; simulations of past, present and future climate
WDC Climate: Long-term data archiving; interdisciplinary data dissemination

Diagram of Climate System

Diagram of the Hamburg IPCC- Climate Model ECHAM5/MPI-OM

Forcing of Climate Projections for IPCC AR4

Near-surface temperature change for the scenarios A1B and B1. Presented is the difference of the 30-year means minus

Comparison of the present-day sea ice cover in March and September (top) with the climate projection for the scenario A1B (bottom) in Additionally the snow over land can be obtained.

HLRE-II Architecture
blizzard: /work /pf /scratch
tape: /hpss/arch /hpss/doku /dxul/ut /dxul/utf /dxul/utd
xtape: ssh blizzard (sftp xtape.dkrz.de) "get /hpss/arch/ / " pftp
HPSS (10 PByte/a)
GPFS (3 PByte)
IBM Power6: 2 x Login, 250 x Compute, 150 TFlops peak
StorageTek Silos: Total Capacity: Tapes Approx. 60 PB (LTO and Titan)

Development of data archive at DKRZ (German Climate Computing Centre)
Data production on IBM-P6: 50 PB/year
Limit for mass storage archive (HPSS): 10 PB/year
- Scientific project data archive with expiration date
Limit for long-term data archive (WDCC): 1 PB/year
- A complete data catalogue entry in WDCC (metadata) is required
- The decision procedure for the transition to the long-term archive is not yet fully implemented (data storage policy)
- Accessible via WDCC infrastructure: searchable data catalogue (GUI); field-based and file-based data access (Internet)
- Storage time period: at least 10 years (no expiration date)
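The three annual rates on this slide imply how strongly data must be reduced at each archive level. The following minimal sketch only restates the slide's figures as ratios; the function and dictionary key names are illustrative and not part of any DKRZ or WDCC tool.

```python
# Annual data rates quoted on the slide, in petabytes per year.
PRODUCTION_PB = 50.0   # data production on the IBM Power6 system
HPSS_LIMIT_PB = 10.0   # limit for the mass storage archive (HPSS)
WDCC_LIMIT_PB = 1.0    # limit for the long-term archive (WDCC)


def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


def archive_shares():
    """Express each archive limit as a percentage of the level above it."""
    return {
        "hpss_of_production": share(HPSS_LIMIT_PB, PRODUCTION_PB),
        "wdcc_of_production": share(WDCC_LIMIT_PB, PRODUCTION_PB),
        "wdcc_of_hpss": share(WDCC_LIMIT_PB, HPSS_LIMIT_PB),
    }


for name, percent in archive_shares().items():
    print(f"{name}: {percent:.0f}%")
```

Under these figures, at most one fifth of the produced data can enter the mass storage archive, and at most one fiftieth can move on to the long-term archive.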

Development of mass storage archive Oct Mid of 2009: 10 PB

Data documentation requirements are met by using the WDCC infrastructure
CERA-2 metadata model developed in 1999
- Catalogue interface: cera.wdc-climate.de
- Input interface: input.wdc-climate.de
The CERA-2 metadata content is complete with respect to browsing, discovering and using climate data which are stored in the database system or outside in flat files.
The WDCC matches international description standards such as ISO 19115, Dublin Core and GCMD and is integrated in international data federations.
The data storage structure uses field-based storage of climate time series per variable in database tables. This allows web-based data catalogue search and data access in small data granules.

CERA Data Model (diagram blocks): Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org

Coloured columns correspond to BLOB data tables in WDCC. Collections of matrix rows represent storage in model raw data files (complete model output, storage time step by storage time step).
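The idea of one BLOB table per variable, with one row per storage time step, can be illustrated with a small sketch. This is not the WDCC schema: it uses an in-memory SQLite database instead of Oracle, and the table layout, the function names and the variable name used in the usage note are assumptions made for illustration only.

```python
import sqlite3
import struct


def _check_name(variable):
    # Table names cannot be bound as SQL parameters, so restrict them
    # to plain identifiers before interpolating them into a statement.
    if not variable.isidentifier():
        raise ValueError(f"invalid variable name: {variable!r}")


def pack_field(values):
    """Serialise one field (flat list of floats) into a binary granule."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_field(blob):
    """Inverse of pack_field: 8 bytes per little-endian double."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def create_variable_table(conn, variable):
    """One table per variable, one BLOB row per storage time step."""
    _check_name(variable)
    conn.execute(
        f"CREATE TABLE {variable} "
        "(time_step INTEGER PRIMARY KEY, field BLOB NOT NULL)"
    )


def store_field(conn, variable, time_step, values):
    _check_name(variable)
    conn.execute(
        f"INSERT INTO {variable} (time_step, field) VALUES (?, ?)",
        (time_step, pack_field(values)),
    )


def load_field(conn, variable, time_step):
    """Fetch a single granule; return None if the time step is absent."""
    _check_name(variable)
    row = conn.execute(
        f"SELECT field FROM {variable} WHERE time_step = ?", (time_step,)
    ).fetchone()
    return None if row is None else unpack_field(row[0])
```

With such a layout, `load_field(conn, "tas", 1)` reads exactly one field of one variable (the name `tas` is hypothetical), whereas a raw model output file would hold all variables for that time step together.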

WDCC Development. Future annual growth rate: 1 PB / year

2008 WDCC Users (authorised for data download)

WDCC Data Downloads in 2008

WDCC / CERA: General Statistics at :00:10
Database Size (TByte): 404
Number of blobs: (8.2 billion)
Number of experiments: 1378
Number of datasets:
Total size divided by number of BLOBs gives the average size of data access granules: 50 kB/BLOB (field-based data access)
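The quoted granule size follows from the two figures on the slide. The sketch below repeats the division, using the rounded BLOB count of 8.2 billion; whether "TByte" is meant as decimal (10^12 bytes) or binary (2^40 bytes) is not stated, so both readings are computed.

```python
DB_SIZE_TBYTE = 404   # database size from the slide
N_BLOBS = 8.2e9       # rounded BLOB count from the slide


def average_granule_bytes(total_bytes, n_blobs):
    """Average size of one data access granule in bytes."""
    return total_bytes / n_blobs


# Decimal reading: 1 TByte = 10**12 bytes.
decimal_bytes = average_granule_bytes(DB_SIZE_TBYTE * 10**12, N_BLOBS)
# Binary reading: 1 TByte = 2**40 bytes.
binary_bytes = average_granule_bytes(DB_SIZE_TBYTE * 2**40, N_BLOBS)

print(f"decimal: {decimal_bytes / 1000:.1f} kB per BLOB")
print(f"binary:  {binary_bytes / 1024:.1f} KiB per BLOB")
```

Both readings give roughly 50 kB per BLOB, consistent with the value on the slide.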

WDCC Content: Data from Earth System Modelling and Related Observations
ERA40, IPCC, CEOP, BALTEX, HOAPS, CARIBIC, WOCE, ERA15/40, NCEP, GEBCO, COSMOS, MPI, GKSS,…
EH5/MPI-OM IPCC-AR4
Regional Climate Scenarios IPCC-AR4 (CCLM + REMO)

Oracle BLOB-DB: data access via http and Java-API

WDCC Catalogue search and data access interface (URL: cera.wdc-climate.de) Access to 97 model experiments

WDCC Project-based Data Access (IPCC AR4 Hamburg, Results from Introduction)

WDCC major accomplishments
Offering many TB of data through a standard web-browser interface and a Java API for direct data download.
Entering the interdisciplinary e-science environment through the primary data publication service.
- Independent data entities of more general interest are placed in library catalogues in order to make them searchable with and citable in classical scientific literature.
- WDCC has more than 50 data entities registered in TIBORDER which are connected to approx. 1.5 TB data volume.
Networking with other topic-related WDCs and long-term data archives.
- German WDC Cluster Earth System Research (WDC MARE, WDC RSAT and WDCC)
- Data sharing with British Atmospheric Data Centre (BADC)
Offering data management services to scientific research projects for long-term archiving and dissemination of research results.

Primary data publication service
Following the STD-DOI concept (Scientific and Technical Data – Digital Object Identifier).
Important aspects of the publication process are
- The identification of independent data entities which are suitable for publication at the level of scientific literature,
- The execution of an elaborate review process for metadata and climate data (quality control),
- The assignment of additional metadata for electronic publication (ISO 690-2) and of persistent identifiers (DOI / URN) and
- The integration of publication metadata and persistent identifiers into the TIB-Order library catalogue (German National Library of Science and Technology, Hannover) so that primary data entities are searchable and citable together with scientific literature.
- The quality characteristic is presently "approved by author"; it could be "peer reviewed" with ESSD (Earth System Science Data Journal).
- Published data entities can no longer be modified.
- They are freely available via the Internet.
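Two rules from this process lend themselves to a small illustration: an entity is published only after the review of metadata and data has passed, and a published entity is immutable. The sketch below is a hypothetical model of these rules, not the WDCC or TIB implementation; the class names and the identifier used in the usage note are invented placeholders.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class PublishedEntity:
    """A published data entity; frozen, so fields cannot be changed."""
    title: str
    identifier: str                       # persistent identifier (DOI / URN)
    quality: str = "approved by author"   # current quality characteristic


class DataEntity:
    """A candidate data entity before publication (hypothetical model)."""

    def __init__(self, title):
        self.title = title
        self.reviewed = False

    def review(self, metadata_ok, data_ok):
        # Quality control covers both the metadata and the climate data.
        self.reviewed = metadata_ok and data_ok
        return self.reviewed

    def publish(self, identifier):
        # A persistent identifier is assigned only after a passed review.
        if not self.reviewed:
            raise ValueError("entity has not passed the review process")
        return PublishedEntity(title=self.title, identifier=identifier)
```

For example, after `entity.review(True, True)`, calling `entity.publish("doi:10.0000/example")` (a made-up identifier) returns a record whose fields raise `FrozenInstanceError` on any attempt to modify them.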

STD-DOI data publication workflow

TIB WDCC

Data infrastructure integrates data stewardship in the long-term archive
- Bit-stream preservation
- Quality assurance
- Usability enabling

Long-term archive data stewardship
Bit-stream preservation
- Secondary tape copies on different tapes and technology at a separate location
- Copy to new tapes after the maximum number of tape accesses is reached (refreshment)
Quality assurance
- Semantic examinations: behavior of a numerical model compared to observations and to other models, part of the scientific evaluation process
- Syntactic examinations: formal aspects of data archiving and assurance that data archiving is free of errors as far as possible
  - Consistency between metadata and climate data
  - Completeness of climate data
  - Standard range of values
  - Spatial and temporal data arrangement
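Some of these checks are mechanical enough to sketch in a few lines. The example below is an illustration under stated assumptions, not the WDCC quality control procedure: comparing SHA-256 digests is one common way to confirm that a secondary copy matches the primary (the slide does not say which method is used), and the function names and range limits in the usage note are hypothetical.

```python
import hashlib


def sha256_of(data):
    """Hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def copies_match(primary, secondary):
    """Bit-stream check: both copies must have identical digests."""
    return sha256_of(primary) == sha256_of(secondary)


def out_of_range(values, lower, upper):
    """Standard-range check: indices of values outside [lower, upper]."""
    return [i for i, v in enumerate(values) if not lower <= v <= upper]


def missing_time_steps(time_steps, expected_count):
    """Completeness check: expected steps 0..expected_count-1 not present."""
    present = set(time_steps)
    return [t for t in range(expected_count) if t not in present]
```

A call such as `out_of_range([250.0, 400.0, 280.0], 180.0, 330.0)` flags index 1, and `missing_time_steps([0, 1, 3], 4)` reports that step 2 is absent.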

Long-term archive data stewardship (continued)
Usability enabling
- Complete and searchable documentation of climate data entities (database tables and flat files) in the catalogue system of the WDCC
- WDCC offers web-based data access to small data granules (individual entries in BLOB DB tables)
- Archive technology transfer must be downward compatible to keep old data technically readable
- Data processing tools and data format access libraries must be migrated to new architectures

Summary: long-term archiving services at WDCC/DKRZ
- Long-term data storage at WDCC/DKRZ is thematically focused on Earth system research (modeling and related observations)
- WDCC provides a fully documented data archive including a web-based searchable data catalogue and web-based data access
- WDCC supports field-based data access including server-side data processing (extraction of geographical regions and single time steps, format conversion)
- WDCC is integrated in national (WDC-Cluster Germany, C3-Grid) and international data federations (IPCC AR5)
- WDCC/DKRZ offer, within the existing infrastructure, long-term data storage for topic-related external data entities on a net-cost basis
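The server-side extraction of a geographical region from a single time step can be pictured with a short sketch. It assumes a time series stored as nested lists indexed `[time][lat][lon]` with separate coordinate lists; these structures and the function names are assumptions for illustration and do not describe the WDCC processing code.

```python
def extract_region(field, lats, lons, lat_range, lon_range):
    """Return the sub-grid of one 2D field inside the given bounds."""
    lat_idx = [i for i, lat in enumerate(lats)
               if lat_range[0] <= lat <= lat_range[1]]
    lon_idx = [j for j, lon in enumerate(lons)
               if lon_range[0] <= lon <= lon_range[1]]
    return [[field[i][j] for j in lon_idx] for i in lat_idx]


def extract(series, time_step, lats, lons, lat_range, lon_range):
    """Select a single time step first, then cut out the region."""
    return extract_region(series[time_step], lats, lons,
                          lat_range, lon_range)
```

Selecting the time step before subsetting means only one field is touched per request, which matches the small-granule access pattern described for the BLOB tables.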