1
CERA / WDCC
Hannes Thiemann
Max-Planck-Institut für Meteorologie, Modelle und Daten
hannes.thiemann@zmaw.de
NCAR, October 27th – 29th, 2008
2
Contents
Statistics
Requirements + Features
General architecture
Implementation (current and new)
Migration
Summary
3
Basic Statistics
WDCC / CERA: General Statistics at 01-10-2008 00:00:10
Database Size (TByte): 370
Number of BLOBs: 6663287791 (6.6 billion)
Data access by fields and not by files.
Number of experiments: 1146
Number of datasets: 142062
Total size divided by number of BLOBs gives the average size of data access granules: 50 - 60 kB/BLOB
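The average granule size quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the two figures from the slide; whether "TByte" and "kB" are meant as decimal or binary units is not stated, so both readings are computed. The function and variable names are illustrative.

```python
# Figures taken from the statistics slide.
DATABASE_SIZE_TBYTE = 370
NUMBER_OF_BLOBS = 6_663_287_791


def average_blob_size_kb(size_tbyte, blobs, bytes_per_tbyte=10**12, bytes_per_kb=1000):
    """Average BLOB size in kB: total size divided by number of BLOBs."""
    return size_tbyte * bytes_per_tbyte / blobs / bytes_per_kb


# Assumption: decimal units (1 TByte = 10^12 bytes, 1 kB = 1000 bytes).
decimal_kb = average_blob_size_kb(DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS)

# Assumption: binary units (1 TByte = 2^40 bytes, 1 kB = 1024 bytes).
binary_kb = average_blob_size_kb(
    DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS, bytes_per_tbyte=2**40, bytes_per_kb=1024
)

print(f"decimal units: {decimal_kb:.1f} kB/BLOB")
print(f"binary units:  {binary_kb:.1f} kB/BLOB")
```

Under either reading the result falls inside the 50 - 60 kB/BLOB range given on the slide.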
4
Users by continent Active Users 1-Jan-2008 until 14-Oct-2008
5
Download destinations 1-Jan-2008 until 14-Oct-2008
6
Records per download
7
Recordsize
8
Requirements and constraints
Access over WAN
Downloads are typically quite small, though some are huge.
Small downloads imply that users are not willing to wait long …
We cannot scan through large files for each download
Granularity has to be small
9
Datatypes
Model data
Climatological runs (global and regional) (IPCC, …)
Weather forecasts (DPHASE, CEOP, …)
Reanalysis data
Observational data (COPS, CARIBIC, …)
Satellite data products
10
Formats
CERA can store data of any format. The formats actually used are:
GRIB (60%)
NetCDF (18%)
Other (22%)
11
General Architecture Midtier Data
12
General Architecture
[Diagram: Proxy, Webserver and Appl. Server in front of the Metadata and Data stores.
Metadata blocks: Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org.
Data operations: Select timestep + region, Convert format.]
13
Storage within CERA
Database table: data of a single variable, one row per timestep, addressed by index
Index 1: data of timestep i
Index 2: data of timestep i+1
Index 3: data of timestep i+2
…
Index n: data of timestep i+n
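The layout on this slide (one table per variable, one indexed row per timestep) can be sketched as follows. This is an illustration only: SQLite stands in for the Oracle database, and the table name `temperature_2m`, the column names and the payloads are hypothetical, not taken from CERA.

```python
import sqlite3

# Assumption: SQLite is used here only as a stand-in for the real database.
conn = sqlite3.connect(":memory:")

# One table holds the data of a single variable (hypothetical name);
# each row is one timestep, stored as a BLOB and addressed by an index.
conn.execute(
    "CREATE TABLE temperature_2m (idx INTEGER PRIMARY KEY, data BLOB NOT NULL)"
)
rows = [(n, f"field for timestep i+{n - 1}".encode()) for n in range(1, 6)]
conn.executemany("INSERT INTO temperature_2m (idx, data) VALUES (?, ?)", rows)


def fetch_timesteps(connection, first, last):
    """Return (idx, data) rows for the requested index range only."""
    cursor = connection.execute(
        "SELECT idx, data FROM temperature_2m "
        "WHERE idx BETWEEN ? AND ? ORDER BY idx",
        (first, last),
    )
    return cursor.fetchall()


# Only the requested timesteps are read; nothing else in the table is scanned.
selected = fetch_timesteps(conn, 2, 3)
for idx, data in selected:
    print(idx, data.decode())
```

Because each row is found through the primary-key index, a request for a few timesteps of one variable touches only those rows, which is what small access granules require.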
14
Handicap
Not enough disk space available
Data stored within database: approx. 400 TB
Disks available: approx. 24 TB
Database has been coupled transparently to the HSM system
How do we avoid frequent tape accesses?
Big cache
Store data that users request together as close together as possible: split into single variables
15
Data migration
[Diagram: tablespaces TBS - RW (Tbl Partition 1), TBS - RW (Tbl Partition 2) and TBS - RO (Tbl Partition 1), connected to dxdb by Migout and Migin.]
All tablespaces are moved “at once” to dxdb
16
Inside the datafile
[Diagram labels: Header 128k, Table, Primary Key, Lob Index, Blob data]
17
Frontend versus Backend
Filesystem Frontend: Header 128k
HSM Backend: Header 128k, Part 1 = 512 MB, Part 2 = 512 MB
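The split into a header and fixed-size parts suggests how a byte range in a datafile could be mapped to the parts that must be fetched from the HSM backend. The sketch below is a guess at that mapping, not the actual implementation. It assumes that the header precedes Part 1, that parts are contiguous, and that "128k" and "512 MB" mean 128 KiB and 512 MiB. The function name `parts_for_range` is hypothetical.

```python
HEADER_BYTES = 128 * 1024        # "Header 128k" (assumed to mean 128 KiB)
PART_BYTES = 512 * 1024 * 1024   # "Part n = 512 MB" (assumed to mean 512 MiB)


def parts_for_range(offset, length):
    """Return the 1-based part numbers covering bytes [offset, offset + length).

    Bytes that fall inside the header belong to no part, so a range that
    lies entirely within the header yields an empty list.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    end = offset + length - 1                     # last byte of the range
    if end < HEADER_BYTES:
        return []
    first_in_parts = max(offset, HEADER_BYTES) - HEADER_BYTES
    last_in_parts = end - HEADER_BYTES
    first_part = first_in_parts // PART_BYTES + 1
    last_part = last_in_parts // PART_BYTES + 1
    return list(range(first_part, last_part + 1))


# A 60 kB record just after the header needs only Part 1.
print(parts_for_range(HEADER_BYTES, 60_000))
# A record straddling the boundary between Part 1 and Part 2 needs both.
print(parts_for_range(HEADER_BYTES + PART_BYTES - 10, 20))
```

Under these assumptions a typical 50 - 60 kB record needs a single 512 MB part, so one request would stage at most one or two parts rather than the whole datafile.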
18
Retrieving data
[Diagram labels: Header 128k; blocks 4, 31, 25; Tape Request]
19
Warehouse features
Compression: nothing special used within the server
Partitioning: allows parts of the data to be moved to HSM
Backup
Nologging: beware of crash …
Read only: two copies on tape
20
New implementation
Metadata database will stay as is
Oracle databases holding data will be replaced by a new in-house development
Why?
There is a certain risk that a future version of Oracle may not work with any HSM system
In the long run, some license costs should be saved
21
General Architecture - new
[Diagram: Webserver and Appl. Server; Metadata in the Oracle-DB; Data in the Blobserver]
22
CERA-Container
Instead of keeping data within BLOBs in Oracle databases, data records will be kept within so-called CERA Container Files.
Ability to hold a huge number of records.
Fast access independent of position within the file (granular access).
Fault tolerance against tape damage by keeping checksums within the files.
Read/write operations against container files are enclosed in transactions.
Well-known format
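Two of the properties listed here, position-independent access and checksums kept within the file, can be illustrated with a toy container. This is not the CERA Container format, which the slides do not describe. The record layout (length and CRC32 in front of each payload), the in-memory offset index and all names are assumptions made for the sketch.

```python
import io
import struct
import zlib

# Assumed record layout: 4-byte payload length + 4-byte CRC32, then the payload.
RECORD_HEADER = struct.Struct("<II")


def write_container(stream, records):
    """Append records to the stream and return their byte offsets (the index)."""
    offsets = []
    for payload in records:
        offsets.append(stream.tell())
        stream.write(RECORD_HEADER.pack(len(payload), zlib.crc32(payload)))
        stream.write(payload)
    return offsets


def read_record(stream, offset):
    """Seek straight to one record and verify its checksum before returning it."""
    stream.seek(offset)
    length, expected_crc = RECORD_HEADER.unpack(stream.read(RECORD_HEADER.size))
    payload = stream.read(length)
    if len(payload) != length or zlib.crc32(payload) != expected_crc:
        raise ValueError(f"record at offset {offset} is damaged")
    return payload


# An in-memory buffer stands in for a container file on disk or tape.
container = io.BytesIO()
index = write_container(
    container, [b"timestep i", b"timestep i+1", b"timestep i+2"]
)

# Granular access: the third record is read without touching the first two.
third = read_record(container, index[2])
print(third)
```

With an offset index, the cost of reading one record does not depend on where it sits in the file, and a checksum per record confines damage to the records actually affected.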
23
Migration
Concept / Team (namely Peter Drakenberg, DKRZ): not yet really finished
Software: first software ready, in order to migrate data
Convert old data: started last week, but will take at least a year
24
Dataflow: outbound
[Diagram: numbered steps 1 to 8 between Webserver, Appl. Server, Metadata, Data and Processing]
25
Dataflow: inbound
[Diagram labels: Model run, GFS, Postprocessing, Metadata, Dataserver]
26
Summary
CERA allows for the storage of data of different kinds
Format independent
Metadata enables addressing of internal and external data
Users typically fetch only small amounts of data.
The system allows efficient access to small data granules:
by using warehousing functions like partitioning
by using small Oracle database BLOBs or, in future, CERA Container files.
27
Thank you!