M.Lautenschlager (WDCC, Hamburg) / 19.01.04 / 1 ICSU World Data Center For Climate Semantic Data Management for Organising Terabyte Data Archives Michael.

Presentation transcript:

M.Lautenschlager (WDCC, Hamburg) / / 1 ICSU World Data Center For Climate Semantic Data Management for Organising Terabyte Data Archives Michael Lautenschlager World Data Center for Climate (M&D/MPIMET, Hamburg) CEOP Workshop, Hamburg,

M.Lautenschlager (WDCC, Hamburg) / / 2 Data Group maintaining the WDCC Michael Kurtz Hans Luthardt Michael Lautenschlager Heinke Höck Hannes Thiemann Hermann Winter Jörg Wegner Frank Toussaint Peter Lenzen (Order: from left to right)

M.Lautenschlager (WDCC, Hamburg) / / 3 Content:
- General remarks
- DKRZ archive development
- CERA concept
- CERA data model and structure
- Automatic fill process (not presented)
- CERA user interface
CERA: Climate and Environmental data Retrieval and Archiving

M.Lautenschlager (WDCC, Hamburg) / / 4 Semantic data management
Data consist of numbers and metadata.
Metadata construct the semantic data context.
Metadata form a data catalogue which makes data searchable.
Data are produced, archived and extracted within their semantic context.
Data without explanation are only numbers.
Problems:
- Metadata are of different complexity for different data types.
- Consistency between numbers and metadata has to be ensured.

M.Lautenschlager (WDCC, Hamburg) / / 5 DKRZ Architecture
Proc.: 24 nodes, 192 CPUs
Memory: 1.5 TeraByte
Perform.: 1.5 TeraFLOPS (peak), 500 GigaFLOPS (sust.)
Tape Archive: > 3.4 PetaByte
Disk Cache: 60 TeraByte
Bandwidth Comp.S. – Data S.: 450 MByte/sec
155 Mbs

M.Lautenschlager (WDCC, Hamburg) / / 6 DKRZ Archive Development
Basic observations and assumptions:
1) Unix file archive content at the end of 2002: 600 TB, including backups
2) Observed archive rate (Jan. - May 2003): 40 TB/month
3) System changes: 50% compute power increase in August
4) CERA DB size at the end of 2002: 12 TB
5) Observed increase (Jan. - May 2003): 1 TB/month
6) The automatic fill process into the CERA DB is going to become operational with 4 TB/month this year and should increase from 10% of the archiving rate to approx. 30% by the end of 2004
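
The figures on this slide can be turned into a rough projection of the file archive size. The following is only a sketch: it assumes that the archive rate scales directly with compute power (so it rises by 50% from August of the first projected year and stays flat afterwards), which the slide does not state explicitly. The function name and parameters are illustrative.

```python
# Rough projection of the Unix file archive size from the slide's figures.
START_TB = 600           # archive content at the end of 2002 (from the slide)
RATE_TB_PER_MONTH = 40   # observed archive rate, Jan. - May 2003 (from the slide)
POWER_INCREASE = 0.5     # 50% compute power increase in August (from the slide)
CERA_FILL_TB_PER_MONTH = 4  # planned automatic fill rate (from the slide)


def project_archive(years, increase_month=8):
    """Return end-of-year archive sizes in TB for `years` years after 2002.

    Assumption (not from the slide): the archive rate grows in proportion
    to compute power from `increase_month` of the first year onward.
    """
    size = START_TB
    rate = RATE_TB_PER_MONTH
    sizes = []
    for year in range(years):
        for month in range(1, 13):
            if year == 0 and month == increase_month:
                rate = RATE_TB_PER_MONTH * (1 + POWER_INCREASE)
            size += rate
        sizes.append(size)
    return sizes


sizes_tb = project_archive(5)
# Share of the archiving rate covered by the automatic fill process.
fill_share = CERA_FILL_TB_PER_MONTH / RATE_TB_PER_MONTH

print([round(s / 1000, 2) for s in sizes_tb])  # sizes in PB
print(f"fill share: {fill_share:.0%}")
```

Under these assumptions the projection grows from roughly 1.2 PB after the first year to roughly 4.1 PB after five years, and the 4 TB/month fill rate corresponds to the 10% share quoted in item 6.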

M.Lautenschlager (WDCC, Hamburg) / / 7 DKRZ Archive Development

M.Lautenschlager (WDCC, Hamburg) / / 8 Problems in file archive access:
- Missing data catalogue: The directory structure of the Unix file system is not sufficient to organise millions of files.
- Data are not stored application-oriented: Raw data contain time series of 4D data blocks. The access pattern is time series of 2D fields.
- Lack of experience with climate model data: Problems in extracting relevant information from climate model raw data files.
- Lack of computing facilities at client site: Non-modelling scientists are not equipped to handle large amounts of data (1/2 TB = 10 years T106 or 50 years T42 in 6 hour storage intervals).
Year / Estimated File Archive Size: 1.2 PB, 1.9 PB, 2.6 PB, 3.4 PB, 4.1 PB
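
The mismatch between storage order (time series of 4D data blocks) and access pattern (time series of 2D fields) can be illustrated with a small regrouping step. This is a toy sketch, not the DKRZ processing chain: the function name, the in-memory layout and the tiny 2x2 fields are assumptions made for illustration.

```python
from collections import defaultdict


def regroup_by_variable(raw_steps):
    """Turn time-ordered raw blocks into per-variable time series.

    raw_steps: list of (time, {(variable, level): field}) tuples, one per
               model output step, mimicking a raw file that holds every
               variable and level for each time step.
    Returns {(variable, level): [(time, field), ...]} so that one series of
    2D fields can be read without touching the other variables.
    """
    series = defaultdict(list)
    for time, block in raw_steps:
        for key, field in block.items():
            series[key].append((time, field))
    return dict(series)


# Toy raw data: two 6-hourly steps, two variables, tiny 2x2 stand-in fields.
raw = [
    ("00h", {("TEMP2", 0): [[280.0, 281.0], [279.5, 280.5]],
             ("WIND10M", 0): [[3.0, 4.0], [2.5, 3.5]]}),
    ("06h", {("TEMP2", 0): [[281.0, 282.0], [280.5, 281.5]],
             ("WIND10M", 0): [[3.2, 4.1], [2.4, 3.6]]}),
]

by_variable = regroup_by_variable(raw)
temp_series = by_variable[("TEMP2", 0)]  # only the 2m temperature fields
print([time for time, _ in temp_series])
```

Once the data are regrouped this way, a request for one variable reads one series instead of scanning every raw block.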

M.Lautenschlager (WDCC, Hamburg) / / 9 Limits of model resolution ECHAM4(T42) Grid resolution: 2.8° Time step: 40 min ECHAM4(T106) Grid resolution: 1.1° Time step: 20 min Noreiks (MPIM), 2001

M.Lautenschlager (WDCC, Hamburg) / / 10 CERA Concept: Semantic Data Management
(I) Data catalogue and Unix files (pointer or BLOB table entry)
- Enable search and identification of data
- Allow for data access as they are
(II) Application-oriented data storage
- Time series of individual variables are stored as BLOB entries in DB tables: allows for fast and selective data access
- Storage in standard file formats (GRIB, NetCDF): allows for application of standard data processing routines (PINGOs)
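
The two parts of the concept, a catalogue with pointers to Unix files and per-variable time series stored as BLOB entries, can be sketched with a small relational schema. This is a minimal illustration only: SQLite stands in for the actual CERA database system, the table and column names are hypothetical, and packed floats stand in for GRIB records.

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (               -- catalogue entry (hypothetical layout)
    ds_acronym TEXT PRIMARY KEY,
    variable   TEXT NOT NULL,
    raw_file   TEXT                  -- pointer to the Unix raw data file
);
CREATE TABLE blob_data (             -- one 2D field per row
    ds_acronym TEXT NOT NULL REFERENCES dataset(ds_acronym),
    time_index INTEGER NOT NULL,
    field      BLOB NOT NULL,
    PRIMARY KEY (ds_acronym, time_index)
);
""")


def pack_field(values):
    """Encode a flat list of floats as little-endian 32-bit values."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_field(blob):
    """Decode a BLOB written by pack_field back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


conn.execute("INSERT INTO dataset VALUES (?, ?, ?)",
             ("DEMO_TEMP2", "2m temperature", "/hypothetical/path/run001"))
fields = [[280.0, 281.5], [279.0, 280.25], [278.5, 279.75]]
for t, values in enumerate(fields):
    conn.execute("INSERT INTO blob_data VALUES (?, ?, ?)",
                 ("DEMO_TEMP2", t, pack_field(values)))

# Selective access: fetch only time steps 1..2 of a single variable.
rows = conn.execute(
    "SELECT time_index, field FROM blob_data "
    "WHERE ds_acronym = ? AND time_index BETWEEN ? AND ? "
    "ORDER BY time_index",
    ("DEMO_TEMP2", 1, 2),
).fetchall()
selected = [(t, unpack_field(blob)) for t, blob in rows]
print(selected)
```

The primary key on dataset and time index is what makes the selective retrieval cheap in this sketch; the catalogue row keeps the pointer back to the raw file for access to the data as they are.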

M.Lautenschlager (WDCC, Hamburg) / / 11 CERA Database System
CERA Database: 7.1 TB
* Data Catalogue
* Processed Climate Data
* Pointer to Raw Data files
Mass Storage Archive: 210 TB, neglecting security copies
Web-Based User Interface: Catalogue Inspection, Climate Data Retrieval
DKRZ Mass Storage Archive
Internet Access
Current database size (Terabyte):
Number of experiments: 318
Number of datasets:
Number of BLOBs within CERA at 19-JAN-04:
Typical BLOB sizes: 17 kB and 100 kB
Number of data retrievals: 1500 – 8000 / month
Parts of CERA DB
Web access to entire CERA DB content

M.Lautenschlager (WDCC, Hamburg) / / 12 CERA Data: Jan. Temp.

M.Lautenschlager (WDCC, Hamburg) / / 13 CERA Data: Jan. Wind (2 x 250 MB)

M.Lautenschlager (WDCC, Hamburg) / / 14 CERA-2 Data Model
Complete with respect to IEEE’s Reference Model for Metadata (Bretherton, 1994):
- Browse, Search and Retrieval
- Ingest, Quality Assurance, Reprocessing
- Application to Application Transfer
- Storage and Archive
Supports interoperability due to inclusion of international standards:
- Directory Interchange Format (NASA, 1998)
- FGDC Metadata Content Standard (FGDC, 1996)
- ISO Metadata Standard for Geographic Information (ISO 19115)
Reference: “The CERA-2 Data Model” (DKRZ-Report No. 15, 1998)
URL:

M.Lautenschlager (WDCC, Hamburg) / / 15 CERA-2 Data Model Blocks
Metadata Entry: This is the central CERA block, providing information on
- the entry's title, type and relation to other entries
- the project the data belong to
- a summary of the entry
- a list of general keywords related to data
- creation and review dates of the metadata
Coverage: Information on the volume of space-time covered by the data
Reference: Any publication related to the data together with the publication form
Status: Status information like data quality, processing steps, etc.
Distribution: Distribution information including access restrictions, data format and fees if necessary
Contact: Data related to contact persons and institutes like distributor, investigator, and owner of copyright
Parameter: Block describes data topic, variable and unit
Spatial Reference: Information on the coordinate system used
Additionally: Modules and Local Extensions
- Module DATA_ORGANIZATION (grid structure)
- Module DATA_ACCESS (physical storage)
- Local extension for specific information on (e.g.) data usage, data access and data administration
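
The block structure described on this slide maps naturally onto a small set of record types. The sketch below is illustrative only: the class and field names are hypothetical and do not reproduce the actual CERA-2 table definitions, and only some of the blocks are modelled. The sample values are taken from the WDCC example slides later in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """Parameter block: data topic, variable and unit."""
    topic: str
    variable: str
    unit: str


@dataclass
class Contact:
    """Contact block: a person or institute and its role."""
    name: str
    role: str  # e.g. distributor, investigator, owner of copyright


@dataclass
class Distribution:
    """Distribution block: format and access restrictions."""
    data_format: str
    download_permitted: bool


@dataclass
class MetadataEntry:
    """Central block; the other blocks hang off one entry."""
    title: str
    entry_type: str                  # e.g. "experiment" or "dataset"
    project: str
    summary: str
    keywords: List[str] = field(default_factory=list)
    parent: Optional[str] = None     # relation to another entry
    parameters: List[Parameter] = field(default_factory=list)
    contacts: List[Contact] = field(default_factory=list)
    distribution: Optional[Distribution] = None


entry = MetadataEntry(
    title="EH5_T63L19_R365_TEMP2",
    entry_type="dataset",
    project="Climate Model Simulations at MPI",
    summary="See summary of corresponding experiment.",
    keywords=["AMIP2"],
    parent="EH5_T63L19_AMIP_6H",
    parameters=[Parameter("atmosphere", "2m temperature", "Kelvin")],
    distribution=Distribution(data_format="GRIB", download_permitted=False),
)
print(entry.title, "->", entry.parent)
```

Keeping the central entry separate from the attached blocks mirrors the slide's layout, where modules and local extensions can be added without changing the core entry.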

M.Lautenschlager (WDCC, Hamburg) / / 16 CERA Structure
Level 1 interface: Metadata entries (XML, ASCII) + data files
Level 2 interface: Separate files containing BLOB table data in application-adapted structure (time series of single variables)
Diagram labels: Experiment Description; Unix-Files Table / Pointer; Dataset 1 Description; Dataset n Description; BLOB Data Table; BLOB Data Table
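
A level 1 metadata entry in XML could look roughly like the output of the sketch below, with one experiment description and one description per dataset. The element and attribute names are invented for illustration and are not the actual CERA exchange format; the acronyms are the ones used in the WDCC example slides.

```python
import xml.etree.ElementTree as ET


def build_entry_xml(experiment, datasets):
    """Build an experiment element with one child element per dataset.

    experiment: experiment acronym
    datasets:   list of (dataset acronym, variable name) pairs
    Element names are hypothetical, chosen only for this sketch.
    """
    root = ET.Element("experiment", acronym=experiment)
    for ds_acronym, variable in datasets:
        ds = ET.SubElement(root, "dataset", acronym=ds_acronym)
        ET.SubElement(ds, "variable").text = variable
    return root


root = build_entry_xml(
    "EH5_T63L19_AMIP_6H",
    [("EH5_T63L19_R365_TEMP2", "2m temperature"),
     ("EH5_T63L19_R365_WIND10M", "10m wind speed")],
)
xml_text = ET.tostring(root, encoding="unicode")

# Round trip: parse the text again and list the dataset acronyms.
parsed = ET.fromstring(xml_text)
acronyms = [d.get("acronym") for d in parsed.findall("dataset")]
print(xml_text)
print(acronyms)
```

The nesting follows the structure on the slide: the experiment description is the parent, and each dataset description refers to its own BLOB data table.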

M.Lautenschlager (WDCC, Hamburg) / / 17 Climate Model Raw Data Application-oriented Data Storage (Interface level 2) Primary Data Processing

M.Lautenschlager (WDCC, Hamburg) / / 18
Start: Approved in January 2003
Maintenance: Model and Data (M&D/MPIMET) and German Climate Computing Centre (DKRZ)
Mission: Data for climate research are collected, stored and disseminated
ICSU Policy: Long-term archiving and unrestricted data access for scientists
Restriction: Only climate data products in CERA DB, no raw data storage
Content: Emphasis is placed on climate modelling and related data products
Co-operation: With thematically corresponding data centres like WDC-MARE (Bremen) and WDC-RSAT (Oberpfaffenhofen)
URL:

M.Lautenschlager (WDCC, Hamburg) / / 19 WDC-CLIMATE Data Content
- Climate Model Data (continuous stream of new data)
- IPCC DDC (Data Distribution Centre): Will be continued for the Fourth Assessment Report
- CEOP (Coordinated Enhanced Observing Period): Model output retention and handling centre. Part of WCRP that was motivated by GEWEX with focus on water and energy cycles within the climate system
- Observational Data
  Model related observations: ERA15/40 (ECMWF), NCEP 40-year reanalysis
  Instrumental data: WOCE (World Ocean Circulation Experiment)
  Earth observations: Access to SSTs from NOAA AVHRR in cooperation with WDC RSAT (distributed archive)
- Project Support (encourage Good Scientific Practice)
  HOAPS (Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data)
  CARIBIC (Civil Aircraft for Regular Investigation of the Atmosphere Based on an Instrumentation Container), MPI Mainz
  Different model applications

M.Lautenschlager (WDCC, Hamburg) / / 20 WDCC Example: Experiment
Exp.-Acronym: EH5_T63L19_AMIP_6H
Exp.-Name: ECHAM5_T63L19_AMIP Control Run 6H values
Exp.-Description: Simulation of current climate using ECHAM5.2 forced with observed monthly sea surface temperatures and sea-ice concentrations (AMIP-2). The simulation was run on a NEC-SX6 (hurrikan). Atmospheric data are stored every 6 hours. Monthly means are available, too. Related experiments: ECHAM5_TTTLLL_AMIP, where TTTLLL is T21L19, T31L19, T42L19, T85L19, T106L19, T42L31, T63L31, T85L31 or T106L31. The output from the model run: schauer.dkrz.de:/pf/m/m214002/NEWEXP/EXP300/run365
Project: Climate Model Simulations at MPI
Keyword: AMIP2

M.Lautenschlager (WDCC, Hamburg) / / 21 WDCC Example
Experiment Exp.-Acronym: EH5_T63L19_AMIP_6H
schauer.dkrz.de:/pf/m/m214002/NEWEXP/EXP300/run365
Dataset (BLOB-Table) DS-Acronym: EH5_T63L19_R365_TEMP2, Variable: 2m temperature
Dataset (BLOB-Table) DS-Acronym: EH5_T63L19_R365_WIND10M, Variable: 10m wind speed
Number of datasets: 350 time series of 2D global fields
Total amount of GRIB data: 350 * 1.6 GB = 560 GB

M.Lautenschlager (WDCC, Hamburg) / / 22 WDCC Example: Dataset
DS-Acronym: EH5_T63L19_R365_TEMP2
DS-Name: EH5_T63L19_R365_TEMP2
DS-Summary: See summary of corresponding experiment. This dataset contains 6H values.
Creation Date: 25-MAY-2003
Format: GRIB
Size (Bytes):
Storage: Model and Data: DB Internal Storage; Nearline
Download Permission: No
Topic / Parameter / Variable / Unit: atmosphere / atmospheric temperature / 2m temperature / Kelvin
Code Type / Code # / Code Acronym: Echam5 / 167 / TEMP2
Temporal Structure: length of time series and storage intervals
Spatial Structure: precise definition of 3D grid points

M.Lautenschlager (WDCC, Hamburg) / / 23

M.Lautenschlager (WDCC, Hamburg) / / 24 Inclusion of other Data Sources Client applet receives foreign data URI from CERA-2 DB Foreign server provides DB data by http: German Aerospace Centre
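
The idea on this slide, that the catalogue hands the client a URI for data held on a foreign server, can be sketched as a small lookup and dispatch step. Everything in this example is hypothetical: the catalogue layout, the dataset names and the example.org address are placeholders, and no request is actually sent.

```python
from urllib.parse import urlparse

# Hypothetical catalogue: internal datasets point at a BLOB table,
# foreign datasets carry the URI of the server that holds the data.
CATALOGUE = {
    "LOCAL_DEMO": {"storage": "internal",
                   "location": "blob_table:LOCAL_DEMO"},
    "FOREIGN_DEMO": {"storage": "foreign",
                     "location": "http://data.example.org/sst/demo"},
}


def resolve(ds_acronym):
    """Tell the client where to fetch a dataset from.

    Returns ("internal", table_name) for data held in the database, or
    ("http", host, path) for data served by a foreign server.
    """
    entry = CATALOGUE[ds_acronym]
    if entry["storage"] == "internal":
        return ("internal", entry["location"].split(":", 1)[1])
    uri = urlparse(entry["location"])
    if uri.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {uri.scheme!r}")
    return ("http", uri.netloc, uri.path)


local_target = resolve("LOCAL_DEMO")
foreign_target = resolve("FOREIGN_DEMO")
print(local_target)
print(foreign_target)
```

In this sketch the client never needs to know in advance where a dataset lives; the catalogue entry decides whether the request goes to the local database or to the foreign server.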

M.Lautenschlager (WDCC, Hamburg) / / 25 CERA Access Statistic

M.Lautenschlager (WDCC, Hamburg) / / 26 CERA DB using countries