Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt,

Presentation transcript:

Preservation and Long Term Access of Data at the World Data Centre for Climate
Frank Toussaint, N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt, H. Ramthun, M. Stockhause, H. Thiemann
World Data Centre for Climate at the German Climate Computing Centre (DKRZ), Hamburg, Germany

Overview
The WDC for Climate in several collaborations
Data Storage: Technology – Tapes and Disks
Data Storage: LObStER – the Tape Storage Tool
Storage Policy
Long Term Archiving
DOI – Digital Object Identifier

WDCC: General Layout

The World Data Centre for Climate
The German Climate Computing Centre (DKRZ) is held by the Max Planck Society, the University of Hamburg, and others.
Mission: provide high-performance (HP) computing power and storage for the German Earth science community.

The WDCC as WIS Data Collection & Production Centre
Components of the WMO Information System (WIS):
NC: National Centres
GISC: Global Information System Centres (12..14)
DCPC: Data Collection and Production Centres (>100)
Some institutes fill more than one of these roles (NC, GISC, DCPC).

The WDCC in the ICSU World Data System
International Council for Science (ICSU)
World Data System (WDS)
World Data Centres (WDC)
WDC Cluster Earth System Research: WDC-Mare, WDC-RSAT, WDC-Climate

CMIP5 Data Nodes: Replicated model output CMIP5/IPCC-AR5
PCMDI, BADC, and WDCC form a data federation (CMIP5/IPCC Data Federation).
About 1 PB of data is replicated:
UK: BADC, ~1 PByte HD
US: PCMDI, ~1 PByte HD
DE: WDCC, ~1 PByte HD

Evolution of Data Quantities
Climate Model Data: Relatively homogeneous, but huge amounts!
Needed: Tape access (nearline)

TAPE STORAGE: Hardware Basis

Hardware @ WDCC
1 Petabyte disks, 9 PB tapes
Web access to 500 Terabytes

Data Flows (diagram; labels: CERA DB Layer: What, Where, Who, When, How; Midtier; Appl. Server; TDS; LobServer; Storage@DKRZ; Archive: files; Container: Blobs; HPSS 9 PB)

LOBSTER: DataStreams & ContainerFmt

LObStER: Large Object Storage and Efficient Retrieval
Huge amounts of data in each container file
Very different sizes of records: 64b .. 2 Gb
Efficient administration of all records
Irregular access patterns (access latency independent of the record position)
Transactional behaviour for read/write
Fault tolerance for HD, controller, tapes, etc.

Lobster configuration manager (diagram; labels: Application; generic JDBC-driver; Lobster configuration manager; specific JDBC-drivers loaded; Applic. Server (lks); Intranet; Internet)

Lobster object manager (diagram; labels: Oracle RDB (or other); Cache; show-container; read-record; fetch-records)

LObStER: The Data Containers
Container files with blocked format
64-bit files and 64-bit internal position referencing
Max file size: 16384 PBytes
Entries stored in ≥1 blocks
Block sizes 2^k, k ∈ { 8, 9, 10, …, 62 }
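The container figures on this slide can be checked with a few lines of arithmetic. The following is a minimal sketch, not LObStER code: all function and constant names are hypothetical, and it assumes that "PBytes" means binary petabytes (2^50 bytes) and that 64-bit position referencing addresses 2^64 bytes.

```python
# Illustrative arithmetic only; none of these names come from LObStER itself.

MIN_EXPONENT = 8    # smallest block exponent k named on the slide
MAX_EXPONENT = 62   # largest block exponent k named on the slide


def block_size(k: int) -> int:
    """Return the block size in bytes for exponent k, i.e. 2**k."""
    if not MIN_EXPONENT <= k <= MAX_EXPONENT:
        raise ValueError(f"k must be in {MIN_EXPONENT}..{MAX_EXPONENT}, got {k}")
    return 1 << k


def blocks_needed(record_bytes: int, k: int) -> int:
    """Number of blocks of size 2**k needed for one record.

    The slide states that entries occupy at least one block, so even an
    empty record is counted as one block here (an assumption).
    """
    size = block_size(k)
    return max(1, -(-record_bytes // size))  # ceiling division


# Assumption: 64-bit referencing addresses 2**64 bytes,
# and one "PByte" is 2**50 bytes.
MAX_FILE_BYTES = 1 << 64
PBYTE = 1 << 50
max_file_pbytes = MAX_FILE_BYTES // PBYTE
```

Under these assumptions `max_file_pbytes` evaluates to 16384, which matches the maximum file size given on the slide, and the smallest block (k = 8) holds 256 bytes.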

LObStER: The Data Containers (diagram; block types: header-blocks, direct-pointer-blocks, indirect-pointer-block, data-blocks)

MD: CERA & Catalogues

Metadata input (Input.wdc-climate.de)
XML templates: Experiment, Dataset, Dataset_group, Additional_info, ..
Tools: CDO, Ncdump/ncgen, ESG Publisher, ..
Editors: GeoNetwork, xmlspy, Oxygen, Attarabi, xforms, …
Steps shown in the diagram: validate; upload (ftp, http, WebDAV); XML Repository; xmlload; Cera2_temp; split xml files; xsl; Insert/Update on views/tables; CERA2

LTA: Storage Policy

Long Term Archiving
Several steps:
specification & concept
filling of metadata & data
quality checks & DOI
LTA for, e.g., EUCLIPSE, MedCLIVAR, COMBINE

Storage Concept for Projects
Tape space distribution to archive classes at DKRZ:
part of the “work” space is on tape because GFS is too small
“docu” domain consists of WDCC
no expiration dates in “arch” domain
parts of “arch” domain belong to “docu” but are not yet documented

LTA Costs
depend on complexity and on the effort required at our site: metadata reformatting, etc.

Long Term Archiving: Quality Checks on three levels
QC L1: conformity to general standards (format, ...)
QC L2: coarse automated content checks
QC L3: detailed spot checks:
TQA – Technical Quality Assurance
SQA – Scientific Quality Assurance
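The three QC levels can be pictured as a staged pipeline. The sketch below is purely illustrative: the `Dataset` class, the accepted-format set, the value-range test, and the idea that each level is only attempted once the previous one has passed are all assumptions made for this example, not a description of the actual WDCC tools.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for "general standards"; the real checks are broader.
ACCEPTED_FORMATS = {"netcdf"}


@dataclass
class Dataset:
    name: str
    file_format: str
    values: List[float]
    tqa_passed: bool = False  # Technical Quality Assurance sign-off
    sqa_passed: bool = False  # Scientific Quality Assurance sign-off


def qc_level1(ds: Dataset) -> bool:
    """QC L1: conformity to general standards (here: file format only)."""
    return ds.file_format.lower() in ACCEPTED_FORMATS


def qc_level2(ds: Dataset, lo: float, hi: float) -> bool:
    """QC L2: coarse automated content check (here: a plain range test)."""
    return len(ds.values) > 0 and all(lo <= v <= hi for v in ds.values)


def qc_level3(ds: Dataset) -> bool:
    """QC L3: detailed spot checks, modelled as TQA and SQA sign-offs."""
    return ds.tqa_passed and ds.sqa_passed


def highest_qc_level(ds: Dataset, lo: float, hi: float) -> int:
    """Return the highest level passed (0..3), stopping at the first failure."""
    checks = (
        lambda: qc_level1(ds),
        lambda: qc_level2(ds, lo, hi),
        lambda: qc_level3(ds),
    )
    level = 0
    for check in checks:
        if not check():
            break
        level += 1
    return level
```

A dataset in an accepted format with plausible values but no TQA/SQA sign-off would stop at level 2 in this model; only after both sign-offs are recorded does it reach level 3.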

QC: Example CMIP5

LTA: CMIP5 as an Example of a Federated Activity
Distributed QC Level2 Checks at Multiple Sites
Central QC Repository
Central QC Level3 Checks
DOI Publication Agency
Long-Term Archive
(diagram; labels: QC services; QC Service Layer; Project QC Metadata Repository; QC L2 Tool; QC L3 Tools; SQA GUI)

LTA: CMIP5 as an Example (diagram; labels: Data Nodes; Data from nodes; Data Quality Control; TQA; SQA by Author; MD Input; MD on model & simulation; MD on data; MD on quality; Project MD Repository; MD harvest during project; MD export; Registration; IDF Data Catalogue; DOI Catalogue; DOI Publication Agency with Long Term Archive; Data Long Term Archive (LTA); MD LTA; DOI Target Page; DOI access; MD harvest after archiving)

DOI: Publishing, IDF & Catalogues

WDC-Climate as Publishing Agency of the IDF
International DOI Foundation: International DOI Foundation (doi.org)
Registration Agencies: DataCite (DataCite.org)
National Organizations: TIB, BL, … (tib-hannover.de)
Publisher: WDCC, … (wdc-climate.de)

Visibility of LTA Data in Public Catalogues
DOI is given
Catalogue metadata is sent to the Registration Agency via the national organization
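Once a DOI is assigned, readers typically reach the data through the doi.org resolver. The snippet below is a minimal sketch of that step; the function name and the example DOI are hypothetical, and the pattern is only a common practical heuristic for DOI syntax (prefix `10.` plus registrant code, a slash, then a suffix), not an official validation rule.

```python
import re
from urllib.parse import quote

# Heuristic shape check: "10.<4-9 digits>/<non-empty suffix without spaces>".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a resolver URL under https://doi.org/."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi, safe="/")


# "10.1234/EXAMPLE_DATASET" is a made-up DOI used only for illustration.
example_url = doi_to_url("10.1234/EXAMPLE_DATASET")
```

The resolver then redirects to the target page registered for that DOI.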

SUMMARY: Data Life Cycle

The Data Life Cycle Management (diagram; labels: Virtual Research Environment; Data Dissemination; Data Production; Data Evaluation; Long Term Archive)

END

Thank you. Questions?