M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 1 WS Spatiotemporal Databases for Geosciences, Biomedical sciences and Physical sciences Edinburgh, November.

Presentation transcript:

World Data Center Climate: Terabyte Data Storage in a Relational Database System
Workshop on Spatiotemporal Databases for Geosciences, Biomedical Sciences and Physical Sciences, Edinburgh, November 1st + 2nd, 2005
Michael Lautenschlager, Hannes Thiemann and Frank Toussaint
ICSU World Data Center Climate, Model and Data / Max-Planck-Institute for Meteorology, Hamburg, Germany
WDCC Home: / WDCC Contact:

Content:
- Introduction of WDCC
- CERA2 Data Model
- Data Access
- Connection to Mass Storage Archive
- Summary

WDCC Content
Data from Earth System Modelling and Related Observations: ERA40, IPCC, CEOP, BALTEX, HOAPS, CARIBIC, WOCE, ERA15/40, NCEP, GEBCO, COSMOS, MPI, GKSS, … (e.g. EH5/MPI-OM IPCC-AR4)
- Start: approved in January 2003
- Maintenance: Model and Data (M&D/MPI-M) and German Climate Computing Centre (DKRZ)
- October 2005: 580 experiments / data sets

WDCC Access

WDCC Size: 4.6 billion BLOBs

WDCC DB Storage
Storage of global coverages per file or BLOB:
- all levels, all parameters, arbitrary time intervals
- all levels, all parameters, 1 moment (every 6 hours)
- 1 level, 1 parameter, 1 moment (= 1 BLOB = 1 global field)
How we get the grid data:
- Files from the climate model
- Postprocessing step 1: homogenizing time and calculating diagnostics
- Postprocessing step 2: isolating levels and parameters and creating the BLOB table input
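
To make the BLOB granularity concrete, here is a minimal Python sketch of what postprocessing step 2 amounts to: an in-memory block of model output is cut into one BLOB per parameter, level and time step, and a helper counts how many BLOBs one year of 6-hourly output produces. The nested-list layout, the function names and the raw float32 packing are assumptions for illustration only; the WDCC stores its fields in standard formats such as GRIB.

```python
import struct
from itertools import product


def split_into_blobs(fields, params, levels, times):
    """Split postprocessed model output into one BLOB per global field.

    fields[p][l][t] is assumed to be a flat list of floats holding all grid
    points of parameter p on level l at time step t (hypothetical layout).
    Returns a dict mapping (parameter, level, time) -> bytes.
    """
    blobs = {}
    for (pi, param), (li, level), (ti, step) in product(
        enumerate(params), enumerate(levels), enumerate(times)
    ):
        values = fields[pi][li][ti]
        # Raw big-endian float32 packing stands in for GRIB/NetCDF encoding.
        blobs[(param, level, step)] = struct.pack(f">{len(values)}f", *values)
    return blobs


def blobs_per_year(n_params, n_levels, steps_per_day=4, days=365):
    """Number of global-field BLOBs for one year of 6-hourly output."""
    return n_params * n_levels * steps_per_day * days
```

For a single parameter on a single level, 6-hourly output gives 4 × 365 = 1460 BLOBs per year; the count grows linearly with the number of parameters and levels, which is how a few hundred experiments add up to billions of BLOBs.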

Data Model

CERA (1) Concept: Semantic Data Management
(I) Data catalogue and Unix files (pointer or BLOB table entry)
- Enable search and identification of data
- Allow for data access as they are (coarse granularity)
(II) Application-oriented data storage
- Time series of individual variables are stored as BLOB entries in DB tables (fine granularity)
  - Allow for fast and selective data access
- Storage in standard data formats (GRIB, NetCDF)
  - Allow for application of standard data processing routines (PINGOs, CDOs)
(1) Climate and Environmental data Retrieval and Archiving
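
The two storage levels can be illustrated with a small SQLite database standing in for Oracle. In this sketch, which is not the CERA schema, a hypothetical `catalogue` table serves level (I), search and identification, while a per-dataset BLOB table `ds_t2m` holds a time series of one variable and serves level (II), selective access to a range of time steps. Table names, column names and the dummy 4-byte "fields" are all assumptions.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in for the two CERA storage levels (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # (I) Catalogue: one row per dataset; points to a Unix file or a BLOB table.
    conn.execute(
        "CREATE TABLE catalogue (dataset TEXT PRIMARY KEY, variable TEXT, "
        "unix_file TEXT, blob_table TEXT)"
    )
    # (II) One BLOB table per dataset: a time series of a single variable.
    conn.execute("CREATE TABLE ds_t2m (time_step INTEGER PRIMARY KEY, field BLOB)")
    conn.execute("INSERT INTO catalogue VALUES ('EXP1_T2M', 't2m', NULL, 'ds_t2m')")
    conn.executemany(
        "INSERT INTO ds_t2m VALUES (?, ?)",
        [(t, bytes([t]) * 4) for t in range(8)],  # dummy 4-byte "fields"
    )
    return conn


def find_datasets(conn, variable):
    """Level (I): search and identification of data via the catalogue."""
    sql = "SELECT dataset, blob_table FROM catalogue WHERE variable = ?"
    return conn.execute(sql, (variable,)).fetchall()


def read_time_steps(conn, blob_table, first, last):
    """Level (II): selective access to time steps first..last of one variable."""
    # Table names cannot be bound as SQL parameters; blob_table must come
    # from the trusted catalogue, never directly from user input.
    sql = (
        f"SELECT field FROM {blob_table} "
        "WHERE time_step BETWEEN ? AND ? ORDER BY time_step"
    )
    return [row[0] for row in conn.execute(sql, (first, last))]
```

A client first calls `find_datasets(conn, "t2m")` to learn which BLOB table holds the variable, then reads only the time steps it needs instead of transferring a whole file.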

Level 1 interface: metadata entries (XML, ASCII) + data files
Level 2 interface: separate files containing BLOB table data in an application-adapted structure (time series of single variables)
WDCC Data Topology: an experiment description links to Unix files (via table/pointer) and to dataset descriptions 1 … n; each dataset description links to its own BLOB data table. A BLOB DB table corresponds to a scalable, virtual file at the operating system level.
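
The statement that a BLOB table behaves like a scalable, virtual file can be sketched in a few lines. In the Python example below, a hypothetical `BlobTableFile` class, with SQLite standing in for the production DBMS, appends chunks as rows and reads them back in sequence order, so that concatenating the rows reproduces the file's byte stream. The class, table and column names are assumptions, not the CERA implementation.

```python
import sqlite3


class BlobTableFile:
    """A BLOB table viewed as a growable, virtual file (illustrative sketch).

    Each row holds one chunk of bytes (for example one global field). Reading
    the rows in sequence order yields the byte stream of a file; appending rows
    grows the "file" without rewriting existing data. Table and column names
    are hypothetical and not the CERA schema.
    """

    def __init__(self, conn: sqlite3.Connection, table: str):
        # The table name is interpolated into SQL, so it must be trusted.
        self.conn, self.table = conn, table
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (seq INTEGER PRIMARY KEY, chunk BLOB)"
        )

    def append(self, chunk: bytes) -> None:
        """Add one chunk at the end of the virtual file."""
        self.conn.execute(f"INSERT INTO {self.table} (chunk) VALUES (?)", (chunk,))

    def read_all(self) -> bytes:
        """Concatenate all chunks in insertion order."""
        rows = self.conn.execute(f"SELECT chunk FROM {self.table} ORDER BY seq")
        return b"".join(row[0] for row in rows)

    def export(self, path: str) -> int:
        """Write the virtual file to an ordinary file; return the number of bytes."""
        data = self.read_all()
        with open(path, "wb") as f:
            f.write(data)
        return len(data)
```

`export` corresponds loosely to the level 2 interface: the contents of one BLOB table leave the database as an ordinary file with the time series of a single variable.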

CERA Data Model (blocks): Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Administration, Data Access, Data Org

CERA Modules
3 modules:
- DATA_ACCESS: automated data access (→ remote data access)
- DATA_ORG: organization of grid data (→ geo-references of grid points in BLOBs)
- CODE: matching of (internal) model code numbers
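
The slide describes DATA_ORG only as organising grid data and geo-referencing grid points in BLOBs. One way to picture that geo-referencing is the Python sketch below, which assumes a regular latitude-longitude grid stored row by row as raw big-endian float32 values. The `RegularGrid` class and its fields are hypothetical, and real WDCC BLOBs hold GRIB-encoded fields that would have to be decoded first.

```python
import struct
from dataclasses import dataclass


@dataclass
class RegularGrid:
    """Hypothetical description of a regular lat-lon grid (DATA_ORG-like info)."""
    nlon: int
    nlat: int
    lon0: float  # longitude of the first grid point, degrees east
    lat0: float  # latitude of the first grid point, degrees north
    dlon: float  # longitude increment
    dlat: float  # latitude increment (negative for north-to-south ordering)

    def coords(self, index: int):
        """(lat, lon) of the index-th value in a row-major global field."""
        j, i = divmod(index, self.nlon)
        return self.lat0 + j * self.dlat, self.lon0 + i * self.dlon

    def value_at(self, blob: bytes, i_lon: int, j_lat: int, fmt: str = ">f") -> float:
        """Read one grid value from a BLOB of packed floats (raw layout assumed)."""
        if not (0 <= i_lon < self.nlon and 0 <= j_lat < self.nlat):
            raise IndexError("grid point outside the field")
        offset = (j_lat * self.nlon + i_lon) * struct.calcsize(fmt)
        return struct.unpack_from(fmt, blob, offset)[0]
```

With such a description stored once per dataset, a single grid point can be located inside every BLOB of a time series without any per-BLOB metadata.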

Data Model Functions
The CERA2 data model …
- allows for data search according to discipline, keyword, variable, project, author, geographical region and time interval, and for data retrieval;
- allows for specification of data processing (aggregation and selection) without attaching the primary data;
- is flexible with respect to local adaptations, to storage of different types of geo-referenced data, and to definition of data topologies (hierarchical, network, …);
- is open for cooperation and interchange with other database systems (e.g. FGDC metadata standard and ISO included).
But: it is not the simplest data model for each single application.
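
The search criteria listed above map naturally onto simple filters. The Python sketch below covers four of them (variable, project, geographical region and time interval) over a hypothetical, heavily reduced `Entry` record; the real CERA2 model spreads this information over many more blocks and tables. Regions are compared as plain bounding boxes, which assumes no box crosses the dateline.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (west, south, east, north), degrees


@dataclass(frozen=True)
class Entry:
    """Minimal, hypothetical catalogue entry."""
    name: str
    project: str
    variables: Tuple[str, ...]
    bbox: Box
    start: date
    end: date


def _overlap(lo1, hi1, lo2, hi2) -> bool:
    """True if the closed intervals [lo1, hi1] and [lo2, hi2] intersect."""
    return lo1 <= hi2 and lo2 <= hi1


def search(entries: Sequence[Entry], variable: Optional[str] = None,
           project: Optional[str] = None, region: Optional[Box] = None,
           period: Optional[Tuple[date, date]] = None):
    """Names of entries matching all given criteria (None = no constraint)."""
    hits = []
    for e in entries:
        if variable is not None and variable not in e.variables:
            continue
        if project is not None and e.project != project:
            continue
        if region is not None:
            west, south, east, north = region
            # Simple box test; boxes crossing the dateline are not handled.
            if not (_overlap(e.bbox[0], e.bbox[2], west, east)
                    and _overlap(e.bbox[1], e.bbox[3], south, north)):
                continue
        if period is not None and not _overlap(e.start, e.end, *period):
            continue
        hits.append(e.name)
    return hits
```

For example, with a made-up regional entry covering 1999 to 2002 and a made-up global entry covering 1860 to 2100, `search(entries, period=(date(2050, 1, 1), date(2060, 1, 1)))` returns only the global entry.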

Data Access

Web Access to WDCC (metadata and data):
- GUI: metadata display in an applet (JDBC)
- jblob script: search for dataset names (JDBC); data retrieval with jblob -f …
- HTTP URL: metadata as HTML display or XML download (ISO, DC, …); data download

Interactive Catalogue Access
Catalogue access via WWW: the web browser sends a request URL over http through the Internet to the application server, where the URL is parsed by a Servlet/JSP. The JSP performs the integrated DB retrieval and returns dynamic pages in standard HTML. This allows efficient administration of detailed meta information.
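
The request cycle on this slide (URL parsed, DB retrieval, HTML response) is sketched below in Python rather than JSP, with SQLite standing in for the catalogue database. The `keyword` query parameter and the `entry(name, title)` table are hypothetical. The sketch binds the search term as an SQL parameter and escapes all output.

```python
import html
import sqlite3
from urllib.parse import parse_qs, urlparse


def catalogue_page(conn: sqlite3.Connection, url: str) -> str:
    """Parse a request URL, query the catalogue and answer with an HTML page."""
    query = parse_qs(urlparse(url).query)
    keyword = query.get("keyword", [""])[0]
    rows = conn.execute(
        "SELECT name, title FROM entry WHERE title LIKE ? ORDER BY name",
        (f"%{keyword}%",),  # bound parameter: the URL cannot inject SQL
    ).fetchall()
    body = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(title)}</td></tr>"
        for name, title in rows
    )
    return f"<html><body><table>{body}</table></body></html>"
```

A request such as `http://example.org/catalogue?keyword=Ocean` (hypothetical host) then yields one table row per matching entry.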

HTTP and JDBC Data Download
Data download via WWW: the request (an HTML form) is sent over http to the application server and handled by a Servlet/JSP; the binary file is returned and written to the client disk.
Data download via script/batch: the program "jblob" performs a standard client-side JDBC retrieval; the binary file is returned and written to the client disk.

XML Interface for HTTP Metadata Output
Metadata access via WWW: an XSQL query to the DB produces XML output, which an XSL mapping converts into any metadata format (raw XML, XHTML, ISO XML, DC XML, …). User applications send a request URL over http to the application server and receive the result (see wini.wdc-climate.de).
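
The XSQL-then-XSL pipeline can be imitated in Python: build the raw XML for one catalogue entry, then map its elements onto Dublin Core. Python's standard library has no XSLT processor, so a dictionary stands in for the XSL stylesheet here. The raw element names and the mapping are assumptions; only the Dublin Core namespace URI is standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from raw element names to Dublin Core elements;
# on the slide this step is an XSL stylesheet.
RAW_TO_DC = {
    "entry_name": "identifier",
    "title": "title",
    "contact": "creator",
    "summary": "description",
}


def raw_xml(record: dict) -> ET.Element:
    """Stand-in for the raw XML an XSQL query would return for one entry."""
    root = ET.Element("entry")
    for column, value in record.items():
        ET.SubElement(root, column).text = value
    return root


def to_dublin_core(raw: ET.Element) -> str:
    """Map raw elements to Dublin Core; unmapped elements are dropped."""
    out = ET.Element("metadata")
    for child in raw:
        dc_name = RAW_TO_DC.get(child.tag)
        if dc_name is not None:
            ET.SubElement(out, f"{{{DC_NS}}}{dc_name}").text = child.text
    return ET.tostring(out, encoding="unicode")
```

Supporting another output format (ISO, XHTML, …) only requires another mapping. The query side stays the same, and that separation is what the slide's XML interface provides.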

HTTP Data Output
Data access via WWW: user applications send a request URL over http to the application server. A Java Servlet parses the URL, queries the DB via JDBC and returns the response in any format (plain ASCII, HTML tables, binary objects, …).
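
"Response in any format" boils down to a dispatch on a request parameter. The Python sketch below, which is not the WDCC servlet, renders one field as plain ASCII, as an HTML table or as packed binary, depending on a hypothetical `format` query parameter. It returns the content type together with the body.

```python
import html
import struct
from urllib.parse import parse_qs, urlparse


def render_field(values, url):
    """Return (content_type, body) for a field in the format requested by ?format=."""
    fmt = parse_qs(urlparse(url).query).get("format", ["plain"])[0]
    if fmt == "plain":
        return "text/plain", "\n".join(f"{v:g}" for v in values).encode()
    if fmt == "html":
        cells = "".join(f"<td>{html.escape(f'{v:g}')}</td>" for v in values)
        return "text/html", f"<table><tr>{cells}</tr></table>".encode()
    if fmt == "bin":
        # Big-endian float32, an assumed binary layout for this sketch.
        return "application/octet-stream", struct.pack(f">{len(values)}f", *values)
    raise ValueError(f"unsupported format: {fmt}")
```

Because the DB query is independent of the output branch, adding a format means adding one branch; the retrieval code does not change.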

Connection to Mass Storage Archive

Oracle DBMS + HSM
DXDB: Unitree client on the DB machines for communication between the Oracle DB and the tape archive (disks and tapes).

Use of DXDB
DXDB is used for:
- ordinary Oracle datafiles
- redo logs
- backup

The diagram shows tablespaces for table partitions 1 and 2 (read-write, TBS-RW) and for table partition 1 (read-only, TBS-RO) on dxdb, with migout and migin as the transfer steps. All tablespaces are moved "at once" to dxdb.

Migout / Migin
- Migout takes place after files haven't been modified for x minutes.
- Only one migout process runs per dxdb file system.
- Migin takes place immediately after a file is requested. Only the parts accessed are retrieved from the backend storage.
- One migin process runs per requested file.
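
The two rules can be sketched as small Python functions. `migout_candidates` selects files that have been idle for at least a given number of minutes, the slide's unspecified "x", kept here as a parameter. `migin_blocks` computes which backend blocks a partial read touches, which shows why only the accessed parts need to come back from tape. The block size and all names are assumptions, and the sketch does not model the per-file-system limit on migout processes.

```python
import time


def migout_candidates(mtimes, idle_minutes, now=None):
    """Paths not modified for at least idle_minutes.

    mtimes maps path -> last modification time in seconds since the epoch.
    """
    now = time.time() if now is None else now
    return sorted(p for p, m in mtimes.items() if now - m >= idle_minutes * 60)


def migin_blocks(offset, length, block_size):
    """Indices of backend blocks needed to serve a read of length bytes at offset."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)
```

For example, reading 50 bytes at offset 100 with 64-byte blocks touches only blocks 1 and 2, so the rest of a large datafile can stay on tape.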

dxdb purging between the low-water mark (LWM) and the high-water mark (HWM)
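
Water-mark purging is a common HSM cache policy. Once disk usage rises above the high-water mark, the disk copies of files already safe on tape are released until usage drops to the low-water mark. The Python sketch below shows that policy under the assumption that the least recently accessed files are purged first. The slide does not state the ordering, and the function name and tuple layout are hypothetical.

```python
def select_for_purge(files, capacity, hwm, lwm):
    """Choose files whose disk copies are released from the dxdb cache.

    files: iterable of (path, size, last_access, migrated) tuples, where
    migrated means a copy already exists on tape. hwm and lwm are fractions
    of capacity. Purging starts only above the high-water mark and stops at
    or below the low-water mark; least recently accessed files go first
    (an assumption, not stated on the slide).
    """
    files = list(files)
    used = sum(size for _, size, _, _ in files)
    if used <= hwm * capacity:
        return []
    purged = []
    for path, size, _, migrated in sorted(files, key=lambda f: f[2]):
        if used <= lwm * capacity:
            break
        if migrated:  # only files already on tape may lose their disk copy
            purged.append(path)
            used -= size
    return purged
```

The gap between the two marks prevents the cache from purging after every write: one purge run frees enough space to absorb further writes before the high-water mark is reached again.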

Pro
- It works.
- It's fast.
- Applications don't have to wait until files are completely restored from tape.

Contra
- It works (if the backend works).
- DXDB is not supported by Oracle.
- Oracle's officially supported backend requirements do not necessarily match the requirements of other applications such as HSM systems (i.e. the connection to Unitree is not standardised).

Summary
Efficient handling of detailed metadata:
- easy and structured administration of > 60 metadata tables
- access support: Java Server Pages (JSP), Servlets, JDBC, XSQL
- including standard DB features (SQL, views, triggers, …)
Efficient handling of fine-granularity data:
- random access to arbitrary time steps of single parameters
- access support: Java Server Pages (JSP), Servlets, JDBC
- including standard DB features (authorisation, …)
- transparent migration of bulk data to tape

The Winter TopTen Program identifies the world's largest and most heavily used databases. Message received on September 13th: "… Congratulations on achieving Grand Prize award winner status (1) in Database Size, Other, All and TopTen Winner status in Database Size, Other, Linux; Workload, Other, Linux in Winter Corp.'s 2005 TopTen Program!"
(1) Grand prizes are awarded for first-place winners in the All Environments categories only.
WDCC's CERA DB has been identified as the largest Linux DB.