PSI Meta Data meeting, Toulouse - 15 November
The CERA (Climate and Environment data Retrieval and Archiving) system at MPI-Met / M&D
S. Legutke, F. Toussaint, M. Lautenschlager
Content
- History, architecture, and usage of the CERA DB
- WDCC, IPCC/DDC, CEOP: data archives hosted by CERA
- Core and extensions of the CERA meta data model
- Relations with other meta data standards
History Architecture Usage
CERA
- compliant with DIF (Directory Interchange Format), NASA
- hierarchical 2-layer structure: Experiments => Datasets
Shortcomings:
- static 2-layer horizontal structure of climate model data
- restructuring needed
History Architecture Usage
CERA-2 (1997)
- compliant in addition with the FGDC meta data standard
- 1-layer structure: RDBMS with tree-like / hierarchical / network relations between entities
Requirements:
- geographically distributed archives
- common meta data model for all archives => simple but extensible
- one GUI for all archives
Unchanged for 7 years
History Architecture Usage
[Architecture diagram]
Components:
- User
- Application Server
- DBMS (Oracle): Metadata, Blob-Data, Processing; 12 TB in 10/2002, 177 TB in 11/…
- Fileserver (Unitree): Processed + Raw Data
- Mass Storage Archive: 0.5 PB in 10/2002, … PB in 11/2005
Connections: FTP, Data Migration, SQL*Net, IIOP, CORBA-Client, RMI/IIOP, http/jdbc/iiop, direct file access
History Architecture Usage
Mass Storage capacity/load
- tape archive: STK Tape Silo > 3.4 PB
- disks: 177 TB in Oracle RDBMS (web accessible; applet or servlet)
- bandwidth between compute server and data server: 450 MB/sec
- 1 TB/day automated filling at model run time (IPCC)
- 3.4 PB data in files (no. = 67263)
- no. of experiments: 570
- > 1000 requests per day
WDCC IPCC/DDC CEOP Other
CERA hosts the data of the World Data Centre for Climate (WDCC)
- maintained by M&D in cooperation with DKRZ and MPI-Met
- collection and dissemination of data related to climate change (focus on georeferenced data)
- access: WWW or FTP (on request)
WDCC IPCC/DDC CEOP Other
M&D with its CERA DB is acknowledged as a Data Distribution Centre (DDC) for IPCC model data
- hosts and distributes a subset of the IPCC data: all monthly mean model data of AR4, TAR, and SAR
WDCC IPCC/DDC CEOP Other
CERA-2 holds the CEOP (Coordinated Enhanced Observing Period) data archive
- strong cooperation with GEWEX, CLIVAR, CLiC, IGOS-P, CEOS
- web-based access to XML meta data and data files
The Winter TopTen Program identifies the world's largest and most heavily used databases.
Reached on September 13th: …
"Congratulations on achieving Grand Prize award winner status (1) in Database Size, Other, All and TopTen Winner status in Database Size, Other, Linux; Workload, Other, Linux in Winter Corp.'s 2005 TopTen Program!
(1) Grand prizes are awarded for first place winners in the All Environments categories only."
WDCC's CERA DB has been identified as the largest Linux DB.
Collaborations within Climate Community Data Archive Initiative
Distributed Archive:
- DFD/DLR
- IPA/DLR
- DOD
- DWD
- GFZ
- PANGAEA/AWI
- xDAT/PIK
- CERA-2/PIK
- ECMWF
- CERA-2/DKRZ
- BADC
Core and Extensions
CERA-2 meta data model
Core scheme:
- valid for all entries
Extensions:
- community-defined module (e.g. PIK, DKRZ; PRISM to be defined?)
- user-defined local extension
Structural flexibility:
- definable fields, tables, entry types, and various others
- flexible lists of valid values (LOV): extensible but controlled
Simple structure:
- blockwise table groups
- all CERA-2 blocks have a similar structure
- more complex structures go into CERA modules
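The "extensible but controlled" list of valid values can be pictured with a minimal Python sketch. The class `ListOfValues`, the helper `check_entry`, and the rule that an extension needs a named approver are assumptions made for illustration only; the slides do not say how CERA-2 enforces its LOVs.

```python
class ListOfValues:
    """A controlled vocabulary: entries are validated, additions are explicit."""

    def __init__(self, name, values):
        self.name = name
        self._values = set(values)

    def is_valid(self, value):
        return value in self._values

    def extend(self, value, approved_by):
        # Assumption: "controlled" means an addition needs a named approver.
        if not approved_by:
            raise PermissionError(f"extending {self.name} requires an approver")
        self._values.add(value)


def check_entry(entry, lovs):
    """Return the fields of `entry` whose value is not in the matching LOV."""
    return [f for f, v in entry.items() if f in lovs and not lovs[f].is_valid(v)]
```

In this sketch a field value outside the list is reported by `check_entry` until the list has been extended through `extend`, which is the only way to add a value.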
Core and Extensions
The CERA Core meta data:
- only data common to most data in geophysics
- compliant with 1st level of the FGDC standard
- sufficient to answer:
  - What data are stored?
  - How to get assistance?
  - How to get the data?
Little information is required, so that the model is applicable to as many institutions/data as possible!
Schema and example at …
The core meta data system is extensible but not changeable (e.g. the CERA Core table structure may not be changed).
Core and Extension
CERA Core blocks (FGDC level 1):
- Metadata Entry: the central CERA block, providing information on
  - the entry's title, type, and relation to other entries
  - the project the data belong to
  - a summary of the entry
  - a list of general keywords related to the data
  - creation and review dates of the metadata
- Parameter: describes data topic, variable, and unit
- Coverage: information on the volume of space-time covered by the data
- Reference: any publication related to the data, together with the publication form
- Status: status information such as data quality, processing steps, etc.
- Distribution: distribution information including access restrictions, data format, and fees if necessary
- Contact: data related to contact persons and institutes such as distributor, investigator, and owner of copyright
- Spatial Reference: information on the coordinate system used
Extension needed for grid description
The Core structure
Core and Extension
Core blocks (descriptions as above): Metadata Entry, Parameter, Coverage, Reference, Status, Distribution, Contact, Spatial Reference
Additionally: Modules / Local Extensions
- Module DATA_ORGANIZATION (grid structure)
- Module DATA_ACCESS (physical storage)
- Local extension for specific information on (e.g.) data usage, data access, and data administration
Core and Extensions
CORE tables:
- ENTRY: entry_id, …
- PARAMETER: entry_id, …, data_org_id, data_access_id, …
Module tables:
- DATA_ORG: data_org_id, data_org_descr, space_id, time_id
- DATA_ACCESS: data_access_id, access_structure_id, storage1_id, storage2_id, storage3_id, storage4_id, rec_structure_id, modification_date
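The way a core entry is linked to the module tables can be sketched with an in-memory SQLite database. This is only an illustration: the column names are taken from the slide, but the column types, the `title` column, the reading of PARAMETER as the table that carries the keys into DATA_ORG and DATA_ACCESS, and the helper names `build_demo_db` and `describe_entry` are assumptions. The production system runs on Oracle, not SQLite.

```python
import sqlite3

# Hypothetical DDL: column names follow the slide; types, keys and the
# 'title' column are assumptions made for this illustration only.
SCHEMA = """
CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE data_org (
    data_org_id INTEGER PRIMARY KEY,
    data_org_descr TEXT,
    space_id INTEGER,
    time_id INTEGER
);
CREATE TABLE data_access (
    data_access_id INTEGER PRIMARY KEY,
    access_structure_id INTEGER,
    storage1_id INTEGER,
    storage2_id INTEGER,
    storage3_id INTEGER,
    storage4_id INTEGER,
    rec_structure_id INTEGER,
    modification_date TEXT
);
CREATE TABLE parameter (
    entry_id INTEGER REFERENCES entry(entry_id),
    data_org_id INTEGER REFERENCES data_org(data_org_id),
    data_access_id INTEGER REFERENCES data_access(data_access_id)
);
"""


def build_demo_db():
    """Create the sketch schema and insert one made-up entry."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO entry VALUES (1, 'demo experiment')")
    conn.execute(
        "INSERT INTO data_org VALUES (10, 'regular lat/lon grid', 100, 200)"
    )
    conn.execute(
        "INSERT INTO data_access (data_access_id, modification_date) "
        "VALUES (20, '2005-11-15')"
    )
    conn.execute("INSERT INTO parameter VALUES (1, 10, 20)")
    return conn


def describe_entry(conn, entry_id):
    """Follow the keys from a core entry into both module tables."""
    return conn.execute(
        """SELECT e.title, o.data_org_descr, a.modification_date
           FROM entry e
           JOIN parameter p ON p.entry_id = e.entry_id
           JOIN data_org o ON o.data_org_id = p.data_org_id
           JOIN data_access a ON a.data_access_id = p.data_access_id
           WHERE e.entry_id = ?""",
        (entry_id,),
    ).fetchone()
```

Calling `describe_entry(build_demo_db(), 1)` returns the entry title together with its grid description and storage modification date; the core tables stay untouched while the module tables hang off them by key.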
CERA: Module Example
Core and Extensions
DATA_ORG module
- data_org_descr / name / acronym
- space_id: key of table with space information
  - gridded or point data (station data, buoys, ships, …)
  - gridded data only if lat/lon coordinates
- time_id: key of table with time information (grid)
=> any data value locatable in space / time
Core and Extensions
Meta data not in the CERA core can be defined in new modules. Presently:
- DATA_ORG module
- DATA_ACCESS module
Presently there is little information in CERA on model code (= NMM code base) or on model configurations (= NMM models)
=> define a model meta data module
- A minimum of specifications should be required (allowing a model run to be reproduced exactly)
- Most specifications should be optional
Core and Extensions
A minimum of specifications should be required (allowing a model run to be reproduced exactly):
- components involved
- code repository for each component
- code release numbers for each component
- compile scripts
- namelists
- initial data files
- forcing data files
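The required minimum listed above could be captured as a small record with a completeness check. The sketch below is hypothetical: the class names `ComponentSpec` and `ModelRunSpec`, the field names, and the `missing_items` helper are not part of CERA and only illustrate what "required" might mean for a proposed model meta data module.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentSpec:
    name: str
    repository: str  # code repository of this component
    release: str     # code release number of this component


@dataclass
class ModelRunSpec:
    components: list = field(default_factory=list)
    compile_scripts: list = field(default_factory=list)
    namelists: list = field(default_factory=list)
    initial_data_files: list = field(default_factory=list)
    forcing_data_files: list = field(default_factory=list)

    def missing_items(self):
        """List every required item that has not been provided."""
        missing = []
        if not self.components:
            missing.append("components")
        for comp in self.components:
            if not comp.repository:
                missing.append(f"{comp.name}: code repository")
            if not comp.release:
                missing.append(f"{comp.name}: code release number")
        required_lists = [
            ("compile scripts", self.compile_scripts),
            ("namelists", self.namelists),
            ("initial data files", self.initial_data_files),
            ("forcing data files", self.forcing_data_files),
        ]
        for label, value in required_lists:
            if not value:
                missing.append(label)
        return missing
```

A run description would count as reproducible in this sketch only when `missing_items()` returns an empty list; anything beyond these fields would remain optional.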
Core and Extensions
Most specifications should be optional:
- All of the required items above can be split into small pieces of information and placed at the appropriate location in the meta data / tables
CF standard
CF standard compliance:
- Any data file in any file format can be an entry of CERA
- CERA primarily contains GRIB single-variable data files
- Support for the NetCDF/CF file format is being implemented:
  - adding meta data elements for the NetCDF/CF attributes if needed (e.g. an additional CF_UNIT table)
  - optional retrieval of data time windows of fine granularity
  - search along NetCDF/CF attributes
Other standards
XSL scripts exist to transfer the CERA meta data into other standards/formats:
- XHTML
- DIF (NASA) - XML
- CSDGM (FGDC) - XML
- ISO/TC211 (19115/19139) - XML
- Dublin Core - XML
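The kind of field mapping such a transformation performs can be sketched in Python. Note that this swaps in `xml.etree.ElementTree` for the XSL scripts mentioned on the slide, and that the source element names (`entry`, `title`, `summary`, `keyword`, `internal_id`), the `FIELD_MAP` table, and the `metadata` wrapper element are assumptions; only the Dublin Core namespace URI and the target element names `title`, `description`, and `subject` are taken from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from made-up CERA XML element names to Dublin Core.
FIELD_MAP = {"title": "title", "summary": "description", "keyword": "subject"}


def cera_to_dublin_core(cera_xml):
    """Copy the mapped child elements into a Dublin Core record."""
    source = ET.fromstring(cera_xml)
    target = ET.Element("metadata")
    for child in source:
        dc_name = FIELD_MAP.get(child.tag)
        if dc_name is None:
            continue  # fields without a Dublin Core counterpart are dropped
        elem = ET.SubElement(target, f"{{{DC_NS}}}{dc_name}")
        elem.text = child.text
    return target


SAMPLE = (
    "<entry>"
    "<title>Demo experiment</title>"
    "<summary>Monthly means</summary>"
    "<keyword>climate</keyword>"
    "<internal_id>42</internal_id>"
    "</entry>"
)
```

`ET.tostring(cera_to_dublin_core(SAMPLE))` serializes the result with the `dc:` prefix. A real XSL stylesheet would express the same mapping declaratively, one template per target format.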
The End