The Repository of the World Data Centre for Climate Frank Toussaint, Michael Lautenschlager Max-Planck-Institut für Meteorologie Repositories in Research.

Slides:



Advertisements
Similar presentations
Std-doi Publication of Climate Data at WDCC DataCite Summer Meeting 7./8. June 2010 Publication of climate data Heinke Höck World Data Center for Climate.
Advertisements

National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids for Collection Federation Reagan W. Moore University.
A Prototype Implementation of a Framework for Organising Virtual Exhibitions over the Web Ali Elbekai, Nick Rossiter School of Computing, Engineering and.
Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt,
TCP/IP Protocol Suite 1 Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 22 World Wide Web and HTTP.
Long-term Archiving of Climate Model Data at WDC Climate and DKRZ Michael Lautenschlager WDC Climate / Max-Planck-Institute for Meteorology, Hamburg Data.
StatCat Building a Statistical Data Finder ssrs.yale.edu/statcat Steven Citron-Pousty Ann Green Julie Linden Yale University.
M.Lautenschlager (WDCC/MPI-M) / / 1 The CEOP Model Data Archive at the World Data Center for Climate as part of the CEOP Data Network CEOP / IGWCO.
Pilot Implementation: Publication and Citation of Scientific Primary Data Result of CODATA WG, supported by DFG Jan Brase Learning Lab Lower Saxony, Uni.
CERA / WDCC Hannes Thiemann Max-Planck-Institut für Meteorologie Modelle und Daten zmaw.de NCAR, October 27th – 29th, 2008.
Web Servers How do our requests for resources on the Internet get handled? Can they be located anywhere? Global?
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
M.Lautenschlager (WDCC / MPI-M) / / 1 WS Spatiotemporal Databases for Geosciences, Biomedical sciences and Physical sciences Edinburgh, November.
German Cluster of WDCs for Earth System Research - Entwurf - Michael Lautenschlager 1, Michael Diepenbroek 2, Hannes Grobe 2, Michael Bittner 3, Jens Klump.
M. Diepenbroek (MARUM), M. Lautenschlager (MPI-M), E. Paliouras (DLR), H. Grobe (AWI) CODATA General Assembly, Berlin World Data Center Cluster.
Review on 5 Years DataCite and 10 Years DOI Registration for Data DataCite Annual Conference 2014 Nancy, August 25th – 26th Michael Lautenschlager (DKRZ.
Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint N.P. Drakenberg, H. Höck, S. Kindermann, M. Lautenschlager,
M.Lautenschlager (WDCC / MPI-M) / / 1 GO-ESSP at LLNL Livermore, June 19th – 21st, 2006 World Data Center Climate: Status and Portal Integration.
Chinese-European Workshop on Digital Preservation, Beijing July 14 – Network of Expertise in Digital Preservation 1 Persistent Identifiers Reinhard.
M. Lautenschlager (M&D/MPIM)1 The CERA Database Michael Lautenschlager Modelle und Daten Max-Planck-Institut für Meteorologie Workshop "Definition.
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
Z EGU Integration of external metadata into the Earth System Grid Federation (ESGF) K. Berger 1, G. Levavasseur 2, M. Stockhause 1, and M. Lautenschlager.
Java Omar Rana University of South Asia. Course Overview JAVA  C/C++ and JAVA Comparison  OOP in JAVA  Exception Handling  Streams  Graphics User.
11/16/2012ISC329 Isabelle Bichindaritz1 Web Database Application Development.
Unidata TDS Workshop TDS Overview – Part I XX-XX October 2014.
ESP workshop, Sept 2003 the Earth System Grid data portal presented by Luca Cinquini (NCAR/SCD/VETS) Acknowledgments: ESG.
F. Toussaint (WDCC, Hamburg) / / 1 CERA : Data Structure and User Interface Frank Toussaint Michael Lautenschlager World Data Center for Climate.
How to Adapt existing Archives to VO: the ISO and XMM-Newton cases Research and Scientific Support Department Science Operations.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
World Data Center for Marine Environmental Sciences.
Michael Lautenschlager World Data Center Climate Model and Data / Max-Planck-Institute for Meteorology German Climate Computing Centre (DKRZ)
Bulk Metadata Structures in CERA Frank Toussaint, Michael Lautenschlager Max-Planck-Institut für Meteorologie World Data Center for Climate.
M.Lautenschlager (WDCC, Hamburg) / / 1 Semantic Data Management for Organising Terabyte Data Archives Michael Lautenschlager World Data Center.
DATABASE MANAGEMENT SYSTEMS IN DATA INTENSIVE ENVIRONMENNTS Leon Guzenda Chief Technology Officer.
M.Lautenschlager (WDCC, Hamburg) / / 1 Semantic Data Management for Organising Terabyte Data Archives Michael Lautenschlager World Data Center.
Accomplishments and Remaining Challenges: THREDDS Data Server and Common Data Model Ethan Davis Unidata Policy Committee Meeting May 2011.
Project Overview Graduate Selection Process Project Goal Automate the Selection Process.
Long-term Archiving of Climate Model Data at WDC Climate and DKRZ Michael Lautenschlager WDC Climate / Max-Planck-Institute for Meteorology, Hamburg Wolfgang.
Project Overview Graduate Selection Process Project Goal Automate the Selection Process.
M.Lautenschlager (WDCC, Hamburg) / / 1 Training-Workshop Facilities and Sevices for Earth System Modelling Integrated Model and Data Infrastructure.
Data Publication and Quality Control Procedure for CMIP5 / IPCC-AR5 Data WDC Climate / DKRZ:
Opendap dev - meeting, Boulder, Feb 2007 OPeNDAP infrastructure in European Operational Oceanography T Loubrieu (IFREMER) T Jolibois (CLS)
CBSOR,Indian Statistical Institute 30th March 07, ISI,Kokata 1 Digital Repository support for Consortium Dr. Devika P. Madalli Documentation Research &
IODE Ocean Data Portal - ODP  The objective of the IODE Ocean Data Portal (ODP) is to facilitate and promote the exchange and dissemination of marine.
The CERA2 Data Base Data input – Data output Hans Luthardt Model & Data/MPI-M, Hamburg Services and Facilities of DKRZ and Model & Data Hamburg,
- Vendredi 27 mars PRODIGUER un nœud de distribution des données CMIP5 GIEC/IPCC Sébastien Denvil Pôle de Modélisation, IPSL.
GEON2 and OpenEarth Framework (OEF) Bradley Wallet School of Geology and Geophysics, University of Oklahoma
OAI Overview DLESE OAI Workshop April 29-30, 2002 John Weatherley
Michael Lautenschlager, Hannes Thiemann, Frank Toussaint WDC Climate / Max-Planck-Institute for Meteorology, Hamburg Joachim Biercamp, Ulf Garternicht,
IPCC TGICA and IPCC DDC for AR5 Data GO-ESSP Meeting, Seattle, Michael Lautenschlager World Data Center Climate Model and Data / Max-Planck-Institute.
Chapter 29 World Wide Web & Browsing World Wide Web (WWW) is a distributed hypermedia (hypertext & graphics) on-line repository of information that users.
PSI Meta Data meeting, Toulouse - 15 November The CERA C limate and E nvironment data R etrieval and A rchiving system at MPI-Met / M&D S. Legutke,
Lautenschlager + Thiemann (M&D/MPI-M) / / 1 Introduction Course 2006 Services and Facilities of DKRZ and M&D Integrating Model and Data Infrastructure.
27.1 Chapter 27 WWW and HTTP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Internet Applications (Cont’d) Basic Internet Applications – World Wide Web (WWW) Browser Architecture Static Documents Dynamic Documents Active Documents.
ORNL DAAC SPATIAL DATA ACCESS TOOL Open Geospatial Consortium (OGC) Services Bruce E. Wilson Suresh K. Santhana Vannan Yaxing Wei Tammy W. Beaty National.
IPCC WG II + III Requirements for AR5 Data Management GO-ESSP Meeting, Paris, Michael Lautenschlager, Hans Luthardt World Data Center Climate.
Hannes Thiemann Michael Lautenschlager Deutsches Klimarechenzentrum GmbH, Germany EGU 2010.
CGI – GeoSciML Testbed 3 Status for BRGM Jean-Jacques Serrano.
M. Lautenschlager (M&D/MPIM)1 WDC on Climate as Part of the CERA 1 Database System Michael Lautenschlager Modelle und Daten Max-Planck-Institut.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
CAS2K11 in Annecy, France September 11 – 14, 2011 Data Infrastructures at DKRZ Michael Lautenschlager.
Collection-Based Persistent Archives Arcot Rajasekar, Richard Marciano, Reagan Moore San Diego Supercomputer Center Presented by: Preetham A Gowda.
Grid Services for Digital Archive Tao-Sheng Chen Academia Sinica Computing Centre
2005 – 06 – - ESSP1 WDC Climate : Web Access to Metadata and Data Frank Toussaint World Data Center for Climate (M&D/MPI-Met, Hamburg)
AP7/AP8: Long-Term Archival of CMIP6 Data
World Conference on Climate Change October 24-26, 2016 Valencia, Spain
DIAS & DIAS data release 2 years DIAS-GCI Cooperation Hiroko KINUTANI DIAS (Data Integration and Analysis System in Japan) , St. Petersburg.
EVLA Archive The EVLA Archive is the E2E Archive
Advisor: Prof. Sudha Ram Jeffrey Abbruzzi, MS/MIS candidate
Presentation transcript:

The Repository of the World Data Centre for Climate Frank Toussaint, Michael Lautenschlager Max-Planck-Institut für Meteorologie Repositories in Research Institutions MPG eScience Seminar, 25./ Garching

Content Structure and mission of WDC-Climate Access structure and interoperability Central information storage in RDBMS Accounting and rights management Availability and permanency Quality control Outlook

Data Services at DKRZ / WDCC hosts Computational support for the German climate science community Earth System modelling and data management has department Store & disseminate climate research data for the scientific community

inter- national integration

WDCC as IPCC Data Node UN WMO / UNEP IPCC UK: BADC 0.5 PByte HD DE: WDCC 0.5 PByte HD +1.4 PBytes tape US: PCMDI: 0.5 PByte HD IPCC Data Federation model output data evaluation paper evaluation:

access structure

dynamic html pages http: html Servlet / JSP lnternet Application Server web browser Catalogue Access: the GUI Catalogue access via WWW URL parsed by JSP integrated DB retrieval by JSP response in standard html efficient administration of detailed meta information request: URL

raw xml xhtml ISO xml DC xml... various metadata formats http: XML xsl – mapping xsql –query see wini.wdc-climate.de lnternet Application Server http Metadata Output request: URL user applications Metadata access via WWW: xsql query to DB xml output from DB xsl mapping to any metadata format

write to client disk http: file download Servlet / JSP lnternet Application Server web browser Data Download request handled by JSP return of binary file request: html form jdbc file download request: jdbc write to client disk progr. „jblob“ Data download via WWW standard client side jdbc retrieval return of binary file Data download via script/batch

internal structure

WDCC Data Access: Autumn 2009 Archive: files Container: Blobs Appl. Server TDS LobServer HPSS 9 PB CERA DB Layer What Where Who When How Midtier

inter- operability

Interoperability Metadata interoperability xml – TDS: defined protocol for http download including: –OAI / OpenDAP –Web services WMS / WCS –harvesting –spatiotemporal selections Data interoperability limited by data formats: WMO-Grib, netCDF not yet GIS formats (arcInfo)

DB as internal brain

The Relational Data Base CERA DB Layer What Where Who When How A priori Efficient handling of authentication authorization detailed metadata fine granularity data  independent of DKRZ accounts A posteriori download statistics accounting

DB Data Holding: A & A Handling of authentication & authorization Authentication DB login by http / servlet DB login by direct connection to the DB (old script tool jblob) DB login by http connection to the DB (new script tool jblob) Authorization Control of access rights (authorization) at level of finest granularity.

DB Data Holding: Metadata Efficient handling of detailed metadata easy and structured administration of > 60 tables detailed catalog information for users access support: Java Server Pages (JSP), Servlets, jdbc, xsql including standard DB features (views,... )

DB Data Holding: Data Efficient handling of fine granularity data field based data access to arbitrary time steps of single parameters access support: Servlets, jdbc, web download including standard DB features transparent migration of bulk data between tape & disk (nearline access)

statistics

User Statistics... by continent... by project... by time... by country

User Statistics When are times of low traffic / are good times for maintenace? Who downloaded special classified data (accounting for data providers)? Who downloaded which data / is to notify in case of data withdrawal? What data quantities have been downloaded by how many logins?

accounts

Accounting free personal accounts for named user access to most of the data metadata visible for all accounts presently not basis for data charges for classified data: WDC-Climate establishes the contact to the data providers for permission

rights of use

Necessity to Handle Different Rights all rights are project/provider dependent ICSU-WDC requirement: open data access no common EU policy (e.g., like Crown Copyright in UK and others)

availability

The Increasing Data Amount Tape space distribution to archive classes at DKRZ part of the “work” space on tape because GFS too small “docu” domain consists of WDCC no expiration dates in “arch” domain parts of “arch” domain belongs to “docu” but not yet documented

Ways to Drain the Data Tsunami A) Introduction of three data classes 1.Test datalife cycle: weeks to months 2.Project datalife cycle: 3 – 5 years 3.Final resultslife cycle: 10 years and longer B) Introduction of four archive classes 1.Temp(orary)scratch discs at compute server 2.Workfixed disc space at project level 3.Arch(ive)tape, single copy, expiring 4.Docu(mentation)tape, security copy, long-term documented fixed data

perma- nency DOI

Digital Object Identifiers (DOI) GFZ Geophysics International DOI Foundation TIB Hannover Registration Agency M&D/MPIM Climate Models Marum/AWI Observations Data Storage Long-term Archiving In WDC Data Storage Long-term Archiving In WDC Data Storage Long-term Archiving Global Handle System DDB URN-Knot DFG Project "Publication and Citation of Scientific Primary Data" TIB-ORDER Library Catalogue

Connections to Central Catalogues WDC-Climate hosted data in the catalogue at TIB Hannover

quality

Data Quality Control at WDC-Climate Quality control of metadata some general topics by WDC-Climate more detailed by client users via GUI (GeoNetwork) final QC in the STD-DOI data publication process Quality control of data in cooperation of data providers and WDC-Climate for internally generated data (climate models) by WDC-Climate (automated process) final QC in the STD-DOI data publication process data from scientific projects are in the responsibility of the data providers

Main Challenges - Metadata Increasing interdisciplinarity will lead to increasing semantic problems in the metadata and in their presentation  ontologies needed Interdisciplinary catalogue access  more central catalogues & data federations needed more standardisation projects  Inhomogeneity within one discipline seems to be decreasing

Main Challenges - Data The Earth System Sciences viewpoint Data quantities will keep growing –  huge storage media with application adapted data structures (selected data access) Grow of data transport means seems to follow games and video –  better access performance Data inhomogeneities will – grow with growing interdisciplinarity shrink with growing number of standards  flexible access tools