Published by Barnaby Carroll. Modified over 9 years ago.
1
M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 1
WS Spatiotemporal Databases for Geosciences, Biomedical sciences and Physical sciences, Edinburgh, November 1st + 2nd, 2005
World Data Center Climate: Terabyte Data Storage in a Relational Database System
WDCC Home: www.wdcc-climate.de / WDCC Contact: data@dkrz.de
Michael Lautenschlager, Hannes Thiemann and Frank Toussaint
ICSU World Data Center Climate
Model and Data / Max-Planck-Institute for Meteorology
Hamburg, Germany
2
Content:
- Introduction of WDCC
- CERA2 Data Model
- Data Access
- Connection to Mass Storage Archive
- Summary
4
WDCC Content
Data from Earth System Modelling and Related Observations: ERA40, IPCC, CEOP, BALTEX, HOAPS, CARIBIC, WOCE, ERA15/40, NCEP, GEBCO, COSMOS, EH5/MPI-OM IPCC-AR4, simulations @ MPI, GKSS,…
Start: approved in January 2003
Maintenance: Model and Data (M&D/MPI-M) and German Climate Computing Centre (DKRZ)
October 2005: 580 experiments / 68,000 data sets
5
WDCC Access
6
WDCC Size
4.6 billion BLOBs
7
WDCC DB Storage
Storage of global coverages per file or BLOB:
- all levels, all parameters, arbitrary time intervals
- all levels, all parameters, 1 moment (every 6 hours)
- 1 level, 1 parameter, 1 moment (= 1 BLOB = 1 global field)
[Figure: data cubes with axes parameters, levels and time (days/4)]
How we get the grid data:
- files from climate model
- postprocessing step 1: homogenizing time and calculation of diagnostics
- postprocessing step 2: isolation of levels & parameters and creation of BLOB table input
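Postprocessing step 2 (isolation of levels and parameters) can be pictured as a regrouping: each 6-hourly moment is cut into single global fields, and the fields of one parameter on one level are collected in time order as the input for one BLOB table. The following Python sketch assumes a hypothetical in-memory layout and packs fields as raw big-endian float32 purely for illustration; the WDCC stores fields in standard formats such as GRIB or NetCDF. It also counts BLOBs as parameters × levels × days × 4 for 6-hourly output.

```python
import struct
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: one 6-hourly moment maps (parameter_code, level)
# to a flattened global field of floats.
Moment = Dict[Tuple[int, int], List[float]]


def build_blob_tables(moments: List[Moment]) -> Dict[Tuple[int, int], List[bytes]]:
    """Split time-ordered moments into one BLOB list per (parameter, level).

    Each list is the input for one BLOB table: element t holds the global
    field of that parameter and level at time step t.
    """
    tables: Dict[Tuple[int, int], List[bytes]] = defaultdict(list)
    for moment in moments:
        for key in sorted(moment):
            values = moment[key]
            # Raw big-endian float32 is an assumption for illustration only;
            # real WDCC BLOBs hold GRIB- or NetCDF-encoded fields.
            tables[key].append(struct.pack(f">{len(values)}f", *values))
    return dict(tables)


def blob_count(n_params: int, n_levels: int, n_days: int, steps_per_day: int = 4) -> int:
    """Number of BLOBs (global fields); 6-hourly output gives 4 steps per day."""
    return n_params * n_levels * n_days * steps_per_day
```

For example, two parameters on three levels over ten days of 6-hourly output give `blob_count(2, 3, 10)` = 240 BLOBs.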
8
Data Model
9
CERA (1) Concept: Semantic Data Management
(I) Data catalogue and Unix files (pointer or BLOB table entry)
- Enable search and identification of data
- Allow for data access as they are (coarse granularity)
(II) Application-oriented data storage
- Time series of individual variables are stored as BLOB entries in DB tables (fine granularity)
- Allow for fast and selective data access
- Storage in standard data formats (GRIB, NetCDF)
- Allow for application of standard data processing routines (PINGOs, CDOs)
(1) CERA: Climate and Environmental data Retrieval and Archiving
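Storage concept (II) can be sketched as one table per time series and one row per time step, so that a single time step is fetched by a primary-key lookup instead of reading a whole file. This minimal Python sketch uses the standard-library `sqlite3` module as a stand-in for the Oracle database; the column names `step` and `field` and the table name in the usage are assumptions, not the CERA2 schema.

```python
import sqlite3
from typing import Optional


def create_blob_table(conn: sqlite3.Connection, table: str) -> None:
    # One table per time series of a single variable: one row per time step.
    # Table names are interpolated, so they must come from trusted metadata.
    conn.execute(f'CREATE TABLE "{table}" (step INTEGER PRIMARY KEY, field BLOB NOT NULL)')


def insert_field(conn: sqlite3.Connection, table: str, step: int, blob: bytes) -> None:
    # Values are bound as parameters; Python bytes are stored as SQLite BLOBs.
    conn.execute(f'INSERT INTO "{table}" (step, field) VALUES (?, ?)', (step, blob))


def read_field(conn: sqlite3.Connection, table: str, step: int) -> Optional[bytes]:
    # Primary-key lookup: fetches one time step without scanning the series.
    row = conn.execute(f'SELECT field FROM "{table}" WHERE step = ?', (step,)).fetchone()
    return None if row is None else row[0]
```

With `conn = sqlite3.connect(":memory:")`, a hypothetical table `EXP1_T2M` can be created, filled step by step with `insert_field`, and any single step read back with `read_field`.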
10
WDCC Data Topology
Level 1 interface: metadata entries (XML, ASCII) + data files
Level 2 interface: separate files containing BLOB table data in an application-adapted structure (time series of single variables)
[Figure: Experiment Description with Unix-Files Table / Pointer, and Dataset 1 to Dataset n Descriptions, each with its BLOB Data Table]
A BLOB DB table corresponds to a scalable, virtual file at the operating system level.
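The statement that a BLOB table behaves like a scalable, virtual file can be illustrated by streaming a range of time steps, in order, into any file-like object. This Python sketch assumes a hypothetical table with an integer `step` column and a BLOB `field` column, with SQLite in place of Oracle. If each BLOB held one GRIB record, the output would read like a GRIB file of those time steps, since a GRIB file is a sequence of self-contained records.

```python
import io
import sqlite3
from typing import BinaryIO


def export_range(conn: sqlite3.Connection, table: str, first: int, last: int, out: BinaryIO) -> int:
    """Stream time steps first..last (inclusive) in time order to `out`.

    Returns the number of bytes written.
    """
    written = 0
    cursor = conn.execute(
        f'SELECT field FROM "{table}" WHERE step BETWEEN ? AND ? ORDER BY step',
        (first, last),
    )
    for (blob,) in cursor:  # rows are fetched one at a time from the cursor
        out.write(blob)
        written += len(blob)
    return written


if __name__ == "__main__":
    # Hypothetical demo table with five 2-byte "fields".
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "EXP1_T2M" (step INTEGER PRIMARY KEY, field BLOB NOT NULL)')
    conn.executemany('INSERT INTO "EXP1_T2M" VALUES (?, ?)',
                     [(t, bytes([65 + t]) * 2) for t in range(5)])
    buf = io.BytesIO()
    print(export_range(conn, "EXP1_T2M", 1, 3, buf), buf.getvalue())
```

Passing an open file instead of `io.BytesIO` writes the same byte stream to disk, which is the "virtual file" view of the table.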
12
CERA Data Model
[Figure: metadata blocks Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org]
14
CERA Modules
3 modules:
- DATA_ACCESS: automated data access (remote data access)
- DATA_ORG: organization of grid data (geo-references of grid points in BLOBs)
- CODE: matching of (internal) model code numbers
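For the DATA_ORG module, the essential idea is that a grid point's position inside a BLOB determines its geographic coordinates once the grid is described in the metadata. The sketch below assumes a regular latitude-longitude grid stored row by row with longitude varying fastest; the class and its fields are hypothetical and do not reproduce the CERA2 DATA_ORG tables. Gaussian or rotated grids would need a different mapping.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RegularGrid:
    """Hypothetical regular lat-lon grid; rows of latitude, longitude fastest."""
    nlon: int
    nlat: int
    first_lat: float  # latitude of the first row (e.g. 90.0 for north-to-south)
    first_lon: float  # longitude of the first column
    dlat: float       # signed increment between rows
    dlon: float       # increment between columns

    def point(self, index: int) -> Tuple[float, float]:
        """(lat, lon) of the grid point at position `index` in a BLOB."""
        if not 0 <= index < self.nlon * self.nlat:
            raise IndexError(index)
        row, col = divmod(index, self.nlon)
        return self.first_lat + row * self.dlat, self.first_lon + col * self.dlon

    def index(self, lat: float, lon: float) -> int:
        """Position in a BLOB of the grid point closest to (lat, lon).

        Longitudes are not wrapped, so lon must lie inside the grid's range.
        """
        row = round((lat - self.first_lat) / self.dlat)
        col = round((lon - self.first_lon) / self.dlon)
        if not (0 <= row < self.nlat and 0 <= col < self.nlon):
            raise ValueError((lat, lon))
        return row * self.nlon + col
```

For a 2.5° grid, `RegularGrid(144, 73, 90.0, 0.0, -2.5, 2.5).point(145)` is the second point of the second row, at 87.5° N, 2.5° E; multiplying the index by the bytes per value would give the offset inside an unencoded BLOB.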
15
Data Model Functions
The CERA2 data model:
- allows for data search according to discipline, keyword, variable, project, author, geographical region and time interval, and for data retrieval;
- allows for specification of data processing (aggregation and selection) without accessing the primary data;
- is flexible with respect to local adaptations, to storage of different types of geo-referenced data, and to definition of data topologies (hierarchical, network, ...);
- is open for cooperation and interchange with other database systems (e.g. FGDC metadata standard and ISO 19115 included).
But: it is not the simplest data model for each single application.
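A search by variable, geographical region and time interval reduces to overlap tests on bounding boxes and date ranges. This Python sketch uses `sqlite3` and a single hypothetical `dataset` table; the CERA2 model spreads this information over many metadata tables. It ignores bounding boxes that cross the date line and relies on ISO 8601 date strings comparing correctly as text.

```python
import sqlite3
from typing import List

# Hypothetical, heavily simplified stand-in for the CERA2 metadata tables.
SCHEMA = """
CREATE TABLE dataset (
    name TEXT PRIMARY KEY,
    variable TEXT NOT NULL,
    west REAL, east REAL, south REAL, north REAL,   -- bounding box in degrees
    start_date TEXT, end_date TEXT                  -- ISO 8601 dates
)
"""


def find_datasets(conn: sqlite3.Connection, variable: str,
                  west: float, east: float, south: float, north: float,
                  start: str, end: str) -> List[str]:
    """Names of data sets of `variable` overlapping the query box and interval."""
    sql = """
        SELECT name FROM dataset
        WHERE variable = ?
          AND west <= ? AND east >= ?
          AND south <= ? AND north >= ?
          AND start_date <= ? AND end_date >= ?
        ORDER BY name
    """
    # Two ranges overlap when each one starts before the other one ends.
    params = (variable, east, west, north, south, end, start)
    return [row[0] for row in conn.execute(sql, params)]
```

Because only overlap is required, a global data set is returned for any regional query on the same variable and period.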
16
Data Access
17
Web Access to WDCC
Interface | METADATA | DATA
GUI | display in applet | JDBC
jblob script | search for data set names | JDBC: jblob -f …
http | html display, xml download (ISO, DC, …) | download via http URL: http://…
18
Interactive Catalogue Access
[Figure: web browser sends a request (URL) via the Internet to the Application Server (Servlet / JSP), which returns dynamic html pages via http]
Catalogue access via WWW:
- URL parsed by JSP
- integrated DB retrieval by JSP
- response in standard html
- efficient administration of detailed meta information
19
HTTP and JDBC Data Download
[Figure: web browser sends a request (html form) via the Internet to the Application Server (Servlet / JSP) and receives an http file download written to client disk; the program “jblob” sends a request via jdbc and receives a jdbc file download written to client disk]
Data download via WWW:
- request handled by JSP
- return of binary file
Data download via script/batch:
- standard client-side jdbc retrieval
- return of binary file
20
XML Interface for http Metadata Output
[Figure: user applications send a request (URL) via the Internet to the Application Server; xsql query and xsl mapping produce various metadata formats (raw xml, xhtml, ISO xml, DC xml, ...) returned as XML via http]
Metadata access via WWW:
- xsql query to DB
- xml output from DB
- xsl mapping to any metadata format
See wini.wdc-climate.de
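The XML interface maps one raw XML export onto several metadata formats. The slide describes this step with XSL; the sketch below swaps in Python's `xml.etree.ElementTree` to show the same idea for a flat Dublin Core record. The raw element paths and the `record` wrapper element are hypothetical; only the Dublin Core element namespace URI is standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from paths in a raw metadata export to Dublin Core elements.
RAW_TO_DC = {
    "entry/title": "title",
    "contact/person": "creator",
    "entry/summary": "description",
    "coverage/period": "coverage",
}


def raw_to_dc(raw_xml: str) -> str:
    """Map a raw XML metadata record onto a flat Dublin Core record."""
    raw = ET.fromstring(raw_xml)
    ET.register_namespace("dc", DC_NS)  # serialize with the conventional dc: prefix
    record = ET.Element("record")
    for path, dc_name in RAW_TO_DC.items():
        for node in raw.findall(path):  # paths are relative to the raw root element
            element = ET.SubElement(record, f"{{{DC_NS}}}{dc_name}")
            element.text = (node.text or "").strip()
    return ET.tostring(record, encoding="unicode")
```

Supporting another target format (ISO, xhtml) would mean adding another mapping table, which mirrors keeping one XSL stylesheet per output format.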
21
http Data Output
[Figure: user applications send a request (URL) via the Internet to the Application Server (Java Servlet), which returns various data formats (plain ASCII, html tables, binary objects, ...) via http as plain, bin or html]
Data access via WWW:
- URL parsed by servlet
- query: DB access by jdbc
- response in any format
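The servlet path (URL parsed, DB accessed by jdbc, response in any format) can be outlined in Python: parse the query string, validate it, and serialize the retrieved values in the requested format. The query parameters `dataset`, `first`, `last` and `format` are invented for illustration and are not the WDCC URL syntax; the database step itself is left out of this sketch.

```python
import struct
from typing import List, Tuple
from urllib.parse import parse_qs, urlparse

FORMATS = ("bin", "plain", "html")


def parse_data_request(url: str) -> Tuple[str, int, int, str]:
    """Extract hypothetical (dataset, first_step, last_step, format) from a URL."""
    query = parse_qs(urlparse(url).query)
    dataset = query["dataset"][0]                      # required; KeyError if missing
    first = int(query.get("first", ["0"])[0])
    last = int(query.get("last", [str(first)])[0])     # default: a single step
    fmt = query.get("format", ["bin"])[0]
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if last < first:
        raise ValueError("last step precedes first step")
    return dataset, first, last, fmt


def render(values: List[float], fmt: str) -> bytes:
    """Serialize values as plain ASCII, an html table, or a binary object."""
    if fmt == "plain":
        return "\n".join(f"{v:g}" for v in values).encode("ascii")
    if fmt == "html":
        rows = "".join(f"<tr><td>{v:g}</td></tr>" for v in values)
        return f"<table>{rows}</table>".encode("ascii")
    return struct.pack(f">{len(values)}f", *values)  # bin: big-endian float32
```

For example, `parse_data_request("http://example.org/data?dataset=EXP1_T2M&first=4&last=7&format=plain")` yields `("EXP1_T2M", 4, 7, "plain")`; the servlet would then query steps 4 to 7 and pass the values to `render`.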
22
Connection to Mass Storage Archive
24
Oracle DBMS + HSM
DXDB: Unitree client on the DB machines for communication between the Oracle DB and the tape archive
[Figure: disks and tapes]
25
Use of DXDB
DXDB is used for:
- ordinary Oracle datafiles
- redo logs
- backup
26
[Figure: TBS-RW with Tbl Partition 1 and TBS-RW with Tbl Partition 2; on dxdb, TBS-RO with Tbl Partition 1; arrows labeled Migout and Migin]
All tablespaces are moved “at once” to dxdb.
27
Migout / Migin
- Migout takes place after files haven’t been modified for x minutes.
- Only one migout process per dxdb file system.
- Migin takes place immediately after a file is requested.
- Only the parts accessed are retrieved from the backend storage.
- One migin process per requested file.
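The migout and migin rules amount to two small policies: a file becomes eligible for migout once it has been idle for a configured number of minutes, and a migin fetches only the parts of a file that a request touches. The following Python sketch expresses both; the block granularity, the `DiskFile` record and all function names are assumptions, not dxdb or Unitree interfaces.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class DiskFile:
    path: str
    mtime: float           # last modification, seconds since the epoch
    on_tape: bool = False  # True once a copy exists on the backend storage


def migout_candidates(files: Iterable[DiskFile], idle_minutes: float, now: float) -> List[str]:
    """Files unmodified for at least `idle_minutes` that have no tape copy yet.

    The list is meant to be worked off by a single migout process per file system.
    """
    limit = now - idle_minutes * 60
    return [f.path for f in files if not f.on_tape and f.mtime <= limit]


def blocks_to_migin(offset: int, length: int, block_size: int, resident: Set[int]) -> List[int]:
    """Block numbers covering bytes [offset, offset + length) not yet on disk.

    Fetching only these blocks mirrors "only the parts accessed are retrieved".
    """
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return [b for b in range(first, last + 1) if b not in resident]
```

A read of 10,000 bytes at offset 5,000 with 4 KiB blocks touches blocks 1 to 3; if block 1 is already on disk, only blocks 2 and 3 are recalled, so the application can continue before the whole file is restored.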
28
Purging
[Figure: dxdb disk cache with low water mark (LWM) and high water mark (HWM)]
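Purging with a high water mark (HWM) and a low water mark (LWM) is the usual HSM pattern: once disk usage rises above the HWM, disk copies of files already migrated to tape are released until usage falls back to the LWM. This Python sketch assumes least-recently-accessed order and represents files as simple tuples; it illustrates the general technique, not the dxdb implementation.

```python
from typing import List, Tuple

# Hypothetical record: (path, size_bytes, last_access, on_tape)
CachedFile = Tuple[str, int, float, bool]


def purge(files: List[CachedFile], capacity: int, hwm: float, lwm: float) -> List[str]:
    """Disk copies to release once usage exceeds the high water mark.

    hwm and lwm are fractions of capacity. Least recently accessed files go
    first; only files already on tape are eligible. Purging stops as soon as
    usage is at or below the low water mark.
    """
    used = sum(size for _, size, _, _ in files)
    if used <= hwm * capacity:
        return []
    released = []
    for path, size, _, on_tape in sorted(files, key=lambda f: f[2]):
        if used <= lwm * capacity:
            break
        if on_tape:  # files without a tape copy must stay on disk
            released.append(path)
            used -= size
    return released
```

Keeping a gap between HWM and LWM means one purge run frees a sizeable amount of space, instead of triggering a new run every time usage crosses a single threshold.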
29
Pro
- It works.
- It’s fast.
- Applications don’t have to wait until files are completely restored from tape.
30
Contra
- It works, if the backend works.
- Dxdb is not supported by Oracle.
- Oracle's officially supported backend requirements do not necessarily match requirements from other applications like HSM systems (i.e. the connection to Unitree is not standardised).
31
Summary
Efficient handling of detailed metadata:
- easy and structured administration of > 60 metadata tables
- access support: Java Server Pages (JSP), Servlets, jdbc, xsql
- including standard DB features (sql, views, triggers, ...)
Efficient handling of fine granularity data:
- random access to arbitrary time steps of single parameters
- access support: Java Server Pages (JSP), Servlets, jdbc
- including standard DB features (authorisation, ...)
- transparent migration of bulk data to tape
32
The Winter TopTen Program identifies the world’s largest and most heavily used databases.
Email received on September 13th:
“….. Congratulations on achieving Grand Prize award winner status (1) in Database Size, Other, All and TopTen Winner status Database Size, Other, Linux; Workload, Other, Linux in Winter Corp.'s 2005 TopTen Program!.......”
(1) Grand prizes are awarded for first place winners in the All Environments categories only.
WDCC's CERA DB has been identified as the largest Linux DB.
http://www.wintercorp.com/VLDB/2005_TopTen_Survey/2005TopTenWinners.pdf