New ways in Big Data Management for NWP

1 New ways in Big Data Management for NWP
Dr. Dieter Schröder, Dr. Jürgen Seib and Dr. Jochen Dibbern Deutscher Wetterdienst Frankfurter Straße 135 D Offenbach, Germany CBS TECO 21. – 22. Nov Guangzhou, China

2 Data Volume – How Big is Big?
Gigabyte: 2^(10*3) bytes
Terabyte: 2^(10*4) bytes
Petabyte: 2^(10*5) bytes
Exabyte: 2^(10*6) bytes
Zettabyte: 2^(10*7) bytes
Yottabyte: 2^(10*8) bytes
Brontobyte*: 2^(10*9) bytes
Gegobyte*: 2^(10*10) bytes
*This terminology is still subject to change.
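The powers of two behind these unit names can be checked with a few lines of Python. This is only a minimal sketch: the names `UNIT_EXPONENTS` and `unit_size_bytes` are illustrative and do not come from the presentation, and only the established unit names are included.

```python
# Binary unit sizes as listed on the slide: each unit is 2**(10*k) bytes.
# UNIT_EXPONENTS and unit_size_bytes are illustrative names.
UNIT_EXPONENTS = {
    "Gigabyte": 3,
    "Terabyte": 4,
    "Petabyte": 5,
    "Exabyte": 6,
    "Zettabyte": 7,
    "Yottabyte": 8,
}


def unit_size_bytes(unit: str) -> int:
    """Return the size of one unit in bytes, using the binary convention."""
    return 2 ** (10 * UNIT_EXPONENTS[unit])


for unit, k in UNIT_EXPONENTS.items():
    print(f"{unit}: 2^{10 * k} = {unit_size_bytes(unit):,} bytes")
```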

3 Why is a new management for NWP data needed?
- Increase of remote sensing data
- Higher resolution of NWP models
- Probabilistic vs. deterministic forecasts
- Multi-variable data analysis in nowcasting systems

4 How fast can we read a Petabyte?
Read speed of a storage disc: 100 megabytes per second (MB/s)

Bytes   Second       Hour       Month
MB      0,01
GB      10,24
TB      10.485,76    2,91
PB      ,24          2982,62    > 4
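The read times on this slide follow from dividing the data volume by the read rate. The sketch below reproduces that arithmetic, assuming the binary units of the previous slide (1 MB = 2^20 bytes); the function name `read_seconds` is illustrative.

```python
# Time needed to read a data volume sequentially at a fixed rate.
# Assumption: binary units, i.e. 1 MB = 2**20 bytes, 1 PB = 2**50 bytes.
BYTES_PER_MB = 2 ** 20


def read_seconds(num_bytes: int, rate_mb_per_s: float = 100.0) -> float:
    """Return the read time in seconds for num_bytes at rate_mb_per_s."""
    return num_bytes / (rate_mb_per_s * BYTES_PER_MB)


for name, exponent in [("MB", 20), ("GB", 30), ("TB", 40), ("PB", 50)]:
    seconds = read_seconds(2 ** exponent)
    print(
        f"{name}: {seconds:,.2f} s"
        f" = {seconds / 3600:,.2f} h"
        f" = {seconds / 86400:,.2f} days"
    )
```

At this rate a single disc needs roughly 124 days for one petabyte, which is where the "more than 4 months" in the table comes from.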

5 File-based Management
[Diagram: Application, Grib-Management-System and File-Distribution-System; main memory; disc storage with /tmp, input store, Grib files and Grib field store]

6 Database-oriented Management
[Diagram: Database, Application and Grib-Management-System; three DB-Mirrors, each with an Application; main memory; disc storage with Grib files and Grib field store]

7 Grib Data Management System
Type of Grib data store: file-based or database-oriented
Smallest access unit: Grib file, Grib field, or Grib value at grid point

8 Goals
Database-oriented management of Grib data such that
- The smallest access unit will be the Grib value at a grid point
- The size of the database will not be bigger than the Grib store of a file-based Grib-Management-System
- The insert of Grib data into the database is not slower than the insert into a file-based Grib-Management-System
- Requests for spatial-temporal analytics can be formulated with SQL
- The database is also able to store vector data (polygons, lines, points, etc.) in order to have a common store for all types of meteorological data

9 Snowflake model
[Diagram: snowflake schema with the elements grid point, forecast step, runtime, forecast value, ensemble member and level]

10 GRIB_DATA table
[Table: coordinate values in the columns Grid point, Runtime, Forecast step, Level and Ensemble member; Grib values in the column Value]
- Coordinate values have to be stored for each Grib value
- More storage is needed for the coordinates than for the Grib values
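A rough feel for this overhead can be had by sizing one row of such a table. The sketch below assumes, purely for illustration, that each of the five coordinates is a 4-byte integer and each Grib value a 4-byte float; the presentation does not state the actual column types.

```python
import struct

# Assumed row layout (not from the presentation):
# five 4-byte integer coordinates and one 4-byte float value.
COORDINATE_FORMAT = "<5i"  # grid point, runtime, forecast step, level, member
VALUE_FORMAT = "<f"        # the Grib value itself

coordinate_bytes = struct.calcsize(COORDINATE_FORMAT)
value_bytes = struct.calcsize(VALUE_FORMAT)

print(f"coordinates: {coordinate_bytes} bytes per row")
print(f"value:       {value_bytes} bytes per row")
print(f"overhead:    {coordinate_bytes / value_bytes:.0f}x the payload")
```

Under these assumptions the coordinates take five times the space of the value they describe, which illustrates why a naive table of this shape is unattractive.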

11 Revised relational data model
[Diagram: table Grid with the columns Id (integer) and Point (geometry)]
→ Queries will be slow with classic row-store database systems

12 Row storage
[Diagram: tables in a tablespace, stored row by row; example query: select column1 from tab]

13 Column storage
[Diagram: tables in a tablespace, stored column by column; example query: select column1 from tab]
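The difference between the two layouts for `select column1 from tab` can be sketched with plain Python lists. This is a toy model, not how any particular database stores data; the table contents and the names `row_store` and `column_store` are made up for illustration.

```python
# Toy table "tab" with three columns; contents are made up.
COLUMN_NAMES = ("column1", "column2", "column3")
TABLE = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]

# Row storage: all values of one row are adjacent.
row_store = [value for row in TABLE for value in row]

# Column storage: all values of one column are adjacent.
column_store = {
    name: [row[i] for row in TABLE] for i, name in enumerate(COLUMN_NAMES)
}


def select_column1_from_row_store() -> list:
    # Has to step across the whole row data, picking every third value.
    return row_store[0 :: len(COLUMN_NAMES)]


def select_column1_from_column_store() -> list:
    # Reads one contiguous block and never touches the other columns.
    return column_store["column1"]


print(select_column1_from_row_store())
print(select_column1_from_column_store())
```

Both functions return the same values; the point is that the column layout only has to read the data of the requested column.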

14 PoC Hardware Environment
- Fujitsu Server PRIMERGY RX900 S2 with 8 sockets / 80 cores / 160 threads
- Processor: Intel® Xeon® processor 2.27GHz
- Memory: 2 TB of RAM
- GB of SAN storage from a NetApp filer over Emulex Corporation Saturn: LightPulse Fibre Channel Host Adapter (rev 8Gbit

15 PoC Dataset
All forecast data of the ensemble prediction system COSMO-DE-EPS for one day
- Forecast range: 27 h / 45 h
- Forecast runs at: 00, 03, 06, 09, 12, 15, 18, 21 UTC
- Ensemble members: 20
- Multi-level parameters: 9 on 50 vertical levels
- Single-level parameters: 101
- Mesh size: 2.8 km (421 * 461 grid points)
- 4498 Grib2 files per day, each containing 551 Grib fields
- Size: 928 GB
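The figures on this slide are consistent with each other, as a short calculation shows: 9 parameters on 50 levels plus 101 single-level parameters give the 551 fields per file. The sketch below also derives the number of individual Grib values per day, which is not stated in the presentation; the variable names are illustrative.

```python
# Figures taken from the slide.
MULTI_LEVEL_PARAMETERS = 9
VERTICAL_LEVELS = 50
SINGLE_LEVEL_PARAMETERS = 101
GRID_POINTS = 421 * 461
FILES_PER_DAY = 4498

# Derived quantities (not stated on the slide, apart from fields_per_file).
fields_per_file = (
    MULTI_LEVEL_PARAMETERS * VERTICAL_LEVELS + SINGLE_LEVEL_PARAMETERS
)
fields_per_day = FILES_PER_DAY * fields_per_file
values_per_day = fields_per_day * GRID_POINTS

print(f"fields per file:     {fields_per_file:,}")
print(f"grid points:         {GRID_POINTS:,}")
print(f"fields per day:      {fields_per_day:,}")
print(f"Grib values per day: {values_per_day:,}")
```

The last number, on the order of 4.8 * 10^11, is the number of rows a value-level table would have to hold for a single day.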

16 Storage overview in SAP HANA
719 GB

17 Sample query 1
Get the predicted minimum, maximum and average values of the 2m temperature within a given area.
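The transcript does not preserve the SQL for this query. The following is a hedged sketch of what such a request might look like, run against an in-memory SQLite database. The table name `t_2m` and its columns are taken from sample query 2; the `grid` table with `lon`/`lat` columns, the bounding-box filter standing in for a real geometry predicate, and all data values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: grid stands in for a table with point geometries.
    create table grid (point_id integer primary key, lon real, lat real);
    create table t_2m (point_id integer, runtime text,
                       forecasttime text, value real);
    """
)
conn.executemany(
    "insert into grid values (?, ?, ?)",
    [(1, 8.0, 50.0), (2, 8.1, 50.0), (3, 12.0, 53.0)],
)
conn.executemany(
    "insert into t_2m values (?, ?, ?, ?)",
    [(1, "00", "12", 280.0), (2, "00", "12", 284.0), (3, "00", "12", 290.0)],
)

# A bounding box replaces the spatial predicate a geo-enabled database offers.
AREA_QUERY = """
    select min(v.value), max(v.value), avg(v.value)
    from t_2m v
    join grid g on g.point_id = v.point_id
    where g.lon between ? and ?
      and g.lat between ? and ?
      and v.runtime = ?
      and v.forecasttime = ?
"""
result = conn.execute(AREA_QUERY, (7.5, 8.5, 49.5, 50.5, "00", "12")).fetchone()
print(result)
```

Only grid points 1 and 2 fall inside the box, so the aggregates are computed over their two values.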

18 Sample query 2
Get the probability at each grid point that it will be warmer than a given threshold value. The calculation should be based on the results of two forecast runs.

    -- 40: presumably 2 forecast runs x 20 ensemble members
    select point_id, 100 * count(value) / 40
    from t_2m
    where (runtime = ? or runtime = ?)
      and forecasttime = ?
      and value > ?
    group by point_id
    order by point_id
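To see the query at work, it can be run on a toy table in SQLite. This sketch makes several assumptions: the data are invented, only 2 runs with 2 members each are used (so the divisor is 4 instead of 40 and is passed as a parameter), and `100.0` replaces `100` because SQLite would otherwise perform integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t_2m (point_id integer, runtime text,"
    " forecasttime text, value real)"
)
rows = [
    # Point 1: three of four forecasts exceed 285.0
    (1, "00", "12", 286.0), (1, "00", "12", 287.0),
    (1, "03", "12", 284.0), (1, "03", "12", 288.0),
    # Point 2: one of four forecasts exceeds 285.0
    (2, "00", "12", 280.0), (2, "00", "12", 281.0),
    (2, "03", "12", 286.0), (2, "03", "12", 283.0),
]
conn.executemany("insert into t_2m values (?, ?, ?, ?)", rows)

FORECASTS_PER_POINT = 4  # toy data: 2 runs x 2 members (the slide uses 40)

PROBABILITY_QUERY = """
    select point_id, 100.0 * count(value) / ?
    from t_2m
    where (runtime = ? or runtime = ?)
      and forecasttime = ?
      and value > ?
    group by point_id
    order by point_id
"""
probabilities = conn.execute(
    PROBABILITY_QUERY, (FORECASTS_PER_POINT, "00", "03", "12", 285.0)
).fetchall()
print(probabilities)
```

Note that, in this form, a grid point where no forecast exceeds the threshold produces no row at all rather than a probability of 0, because the filter removes all of its values before grouping.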

19 Performance Comparison
Assume that reading a Grib field from disk takes 10 ms.
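Under this assumption, the read time of a file-based approach scales linearly with the number of Grib fields a request touches. The sketch below applies that to sample query 2, assuming it needs one field per ensemble member and forecast run (2 runs x 20 members); this field count is an inference, not a figure from the presentation, and the function name is illustrative.

```python
FIELD_READ_MS = 10  # per-field read time assumed on the slide


def file_based_read_seconds(
    num_fields: int, ms_per_field: float = FIELD_READ_MS
) -> float:
    """Estimated time to read num_fields Grib fields from disk, in seconds."""
    return num_fields * ms_per_field / 1000


# Assumption: sample query 2 needs one field per member and run.
fields_for_query_2 = 2 * 20
estimate = file_based_read_seconds(fields_for_query_2)
print(f"{fields_for_query_2} fields -> about {estimate:.1f} s of pure read time")
```

The estimate covers disk reads only; decoding the fields and evaluating the threshold would come on top.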

20 Summary
- Analysis of Grib fields with a database system is feasible
- The size of the database will not be bigger than the Grib store of a file-based Grib-Management-System
- Time for database import of Grib data needs further optimisation
- Requests for spatial-temporal analytics of both vector and raster data can be formulated with SQL
- Relocation of meteorological analysis functionality into the database
- Full advantage of database features, e.g. replication, persistence, query optimisation, parallelisation, concurrent access, geo-spatial extensions, etc.

