New ways in Big Data Management for NWP

New ways in Big Data Management for NWP
Dr. Dieter Schröder, Dr. Jürgen Seib and Dr. Jochen Dibbern
Deutscher Wetterdienst, Frankfurter Straße 135, D-63067 Offenbach, Germany
CBS TECO 21. – 22. Nov. 2016 Guangzhou, China

Data Volume – How Big is Big?
- Gigabyte: 2^(10*3) bytes
- Terabyte: 2^(10*4) bytes
- Petabyte: 2^(10*5) bytes
- Exabyte: 2^(10*6) bytes
- Zettabyte: 2^(10*7) bytes
- Yottabyte: 2^(10*8) bytes
- Brontobyte*: 2^(10*9) bytes
- Gegobyte*: 2^(10*10) bytes
*This terminology is still subject to change.

Why is a new management approach for NWP data needed?
- Increase of remote sensing data
- Higher resolution of NWP models
- Probabilistic vs. deterministic forecasts
- Multi-variable data analysis in nowcasting systems

How fast can we read a Petabyte?
Read speed of a storage disc: 100 megabytes per second (MB/s)
Unit  Bytes                      Seconds        Hours     Months
MB    1,048,576                  0.01
GB    1,073,741,824              10.24
TB    1,099,511,627,776          10,485.76      2.91
PB    1,125,899,906,842,624      10,737,418.24  2,982.62  > 4
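The figures in this table follow from dividing the data volume by the read speed. A minimal sketch, assuming binary units (1 MB = 2^20 bytes) as used on the slide and a 30-day month; the function name read_time_seconds is illustrative only:

```python
# Sequential read time at a fixed throughput, using binary units as on the slide.
READ_SPEED_BYTES_PER_S = 100 * 2**20  # 100 MB per second

UNITS = {"MB": 2**20, "GB": 2**30, "TB": 2**40, "PB": 2**50}


def read_time_seconds(n_bytes: int, speed: int = READ_SPEED_BYTES_PER_S) -> float:
    """Return the time in seconds needed to read n_bytes at `speed` bytes per second."""
    return n_bytes / speed


for unit, size in UNITS.items():
    seconds = read_time_seconds(size)
    hours = seconds / 3600
    months = hours / 24 / 30  # assumption: a month has 30 days
    print(f"{unit}: {seconds:,.2f} s = {hours:,.2f} h = {months:.2f} months")
```

Reading a single petabyte sequentially from one disc therefore takes more than four months, which is the motivation for avoiding full scans of the data store.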

File-based Management
[Diagram labels: Application, Grib-Management-System, File-Distribution-System, Main memory, Disc storage, /tmp, Input store, Grib files, Grib field store]

Database-oriented Management
[Diagram labels: Database, Application, Grib-Management-System, three DB-Mirror instances each with an Application, Main memory, Disc storage, Grib files, Grib field store]

Grib Data Management System
- Type of Grib data store: file-based, database-oriented
- Smallest access unit: Grib file, Grib field, Grib value at grid point

Goals
Database-oriented management of Grib data such that
- The smallest access unit is the Grib value at a grid point
- The database is no bigger than the Grib store of a file-based Grib-Management-System
- Inserting Grib data into the database is not slower than inserting it into a file-based Grib-Management-System
- Requests for spatial-temporal analytics can be formulated with SQL
- The database can also store vector data (polygons, lines, points, etc.), so that there is a common store for all types of meteorological data

Snowflake model
[Diagram labels: grid point, forecast step, runtime, forecast value, ensemble member, level]

GRIB_DATA table
Columns: Grid point, Runtime, Forecast step, Level, Ensemble member (coordinate values) and Value (Grib values)
Sample cell values as extracted: 1 2016-11-02 2 123 4 86 3 99 255 2016-11-03 155 5 33 6 145 7 16 12
- Coordinate values have to be stored for each Grib value
- More storage is needed for the coordinates than for the Grib values

Revised relational data model
[Diagram: Grid table with columns Id (integer) and Point (geometry)]
- Queries will be slow with classic row-store database systems

Row storage
[Diagram: Tables, Tablespace; query: select column1 from tab]

Column storage
[Diagram: Tables, Tablespace; query: select column1 from tab]
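The difference between the two layouts for the query "select column1 from tab" can be sketched in a few lines. This is a toy in-memory model under simplifying assumptions, not the way any particular database lays out its tablespace; the table contents and helper names are made up for illustration.

```python
# Toy model of row storage vs. column storage for "select column1 from tab".
# It counts how many stored cells must be touched to answer the query.

ROWS = [
    (1, "a", 10.0),
    (2, "b", 20.0),
    (3, "c", 30.0),
]
COLUMN_NAMES = ("column1", "column2", "column3")


def build_row_store(rows):
    """Row storage: all cells of one row are stored next to each other."""
    return [cell for row in rows for cell in row]


def build_column_store(rows, names):
    """Column storage: all cells of one column are stored next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def select_from_row_store(store, n_columns, column_index):
    """Scan the whole row store and keep one cell per row."""
    touched = len(store)  # every cell is scanned in this simplified model
    return store[column_index::n_columns], touched


def select_from_column_store(store, name):
    """Read only the requested column."""
    values = store[name]
    return list(values), len(values)


row_store = build_row_store(ROWS)
column_store = build_column_store(ROWS, COLUMN_NAMES)

row_result, row_touched = select_from_row_store(row_store, len(COLUMN_NAMES), 0)
col_result, col_touched = select_from_column_store(column_store, "column1")

print(row_result, row_touched)
print(col_result, col_touched)
```

Both layouts return the same values, but in this model the row store touches every cell of the table while the column store touches only the cells of the requested column.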

PoC Hardware Environment
- Fujitsu Server PRIMERGY RX900 S2 with 8 sockets / 80 cores / 160 threads
- Processor: Intel® Xeon® processor E7-8860 @ 2.27GHz
- Memory: 2 TB of RAM
- 3298.5 GB of SAN storage from a NetApp filer over Emulex Corporation Saturn: LightPulse Fibre Channel Host Adapter (rev 03) @ 8Gbit

PoC Dataset
All forecast data of the ensemble prediction system COSMO-DE-EPS for one day
- Forecast range: 27 h / 45 h
- Forecast runs at: 00, 03, 06, 09, 12, 15, 18, 21 UTC
- Ensemble members: 20
- Multi-level parameters: 9 on 50 vertical levels
- Single-level parameters: 101
- Mesh size: 2.8 km (421 * 461 grid points)
- 4498 Grib2 files per day, where each file contains 551 Grib fields
- Size: 928 GB
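The dataset figures can be cross-checked with simple arithmetic. The sketch below only uses numbers quoted on the slides (928 GB for the Grib store, 719 GB for the SAP HANA storage overview); interpreting GB as 2^30 bytes is an assumption, chosen to match the binary units used earlier in the talk.

```python
# Consistency check of the PoC dataset figures quoted on the slides.
MULTI_LEVEL_PARAMETERS = 9
VERTICAL_LEVELS = 50
SINGLE_LEVEL_PARAMETERS = 101
GRID_NX, GRID_NY = 421, 461
FILES_PER_DAY = 4498
FIELDS_PER_FILE = 551
GRIB_STORE_BYTES = 928 * 2**30  # assumption: GB means 2**30 bytes
DATABASE_BYTES = 719 * 2**30    # figure from the storage overview slide

# 9 parameters on 50 levels plus 101 single-level parameters.
fields_from_parameters = MULTI_LEVEL_PARAMETERS * VERTICAL_LEVELS + SINGLE_LEVEL_PARAMETERS
points_per_field = GRID_NX * GRID_NY
fields_per_day = FILES_PER_DAY * FIELDS_PER_FILE
values_per_day = fields_per_day * points_per_field

print("fields per file from parameters:", fields_from_parameters)
print("grid points per field:", points_per_field)
print("Grib fields per day:", fields_per_day)
print("grid-point values per day:", values_per_day)
print("bytes per value, Grib store:", round(GRIB_STORE_BYTES / values_per_day, 2))
print("bytes per value, database:", round(DATABASE_BYTES / values_per_day, 2))
```

The 551 fields per file match 9 * 50 + 101, and under the stated assumption the Grib store works out at roughly two bytes per grid-point value.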

Storage overview in SAP HANA
719 GB

Sample query 1
Get the predicted minimum, maximum and average values of the 2 m temperature within a given area.
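The SQL for this query is not part of the transcript. The following is a hedged sketch of how such a request could look, run against an in-memory SQLite database rather than the system used in the PoC. The table t_2m and its columns point_id, runtime, forecasttime and value are taken from sample query 2; the grid table with plain lon/lat columns, the bounding-box definition of the area and all data values are assumptions made for illustration. A spatially enabled database would use a geometry predicate instead of the bounding box.

```python
import sqlite3

# Hypothetical miniature of the data model: `grid` holds the grid points and
# `t_2m` holds 2 m temperature values. Plain lon/lat columns stand in for the
# point geometry, and the area is a simple bounding box.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table grid (id integer primary key, lon real, lat real);
    create table t_2m (point_id integer, runtime text, forecasttime text, value real);
    """
)
conn.executemany(
    "insert into grid values (?, ?, ?)",
    [(1, 8.0, 50.0), (2, 8.5, 50.0), (3, 12.0, 53.0)],
)
conn.executemany(
    "insert into t_2m values (?, ?, ?, ?)",
    [
        (1, "2016-11-02 00:00", "2016-11-02 12:00", 280.0),
        (2, "2016-11-02 00:00", "2016-11-02 12:00", 284.0),
        (3, "2016-11-02 00:00", "2016-11-02 12:00", 290.0),  # outside the area
    ],
)

AREA_QUERY = """
    select min(t.value), max(t.value), avg(t.value)
    from t_2m t
    join grid g on g.id = t.point_id
    where g.lon between ? and ?
      and g.lat between ? and ?
      and t.forecasttime = ?
"""


def area_statistics(connection, lon_min, lon_max, lat_min, lat_max, forecasttime):
    """Return (min, max, avg) of the 2 m temperature inside a bounding box."""
    return connection.execute(
        AREA_QUERY, (lon_min, lon_max, lat_min, lat_max, forecasttime)
    ).fetchone()


stats = area_statistics(conn, 7.0, 9.0, 49.0, 51.0, "2016-11-02 12:00")
print(stats)
```

Only the two grid points inside the bounding box contribute to the aggregates; the third point is filtered out by the join condition on the area.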

Sample query 2
Get the probability at each grid point that it will be warmer than a given threshold value. The calculation should be based on the results of two forecast runs. The divisor 40 corresponds to two forecast runs with 20 ensemble members each.
    select point_id, 100 * count(value) / 40
    from t_2m
    where (runtime = ? or runtime = ?)
      and forecasttime = ?
      and value > ?
    group by point_id
    order by point_id
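To see what this query returns, it can be run against a small synthetic table. The sketch below uses an in-memory SQLite database, not the system from the PoC; the member column, the threshold, the run times and all temperature values are assumptions made for illustration. Each of the three grid points gets 2 runs * 20 members = 40 values.

```python
import sqlite3

THRESHOLD = 283.15  # hypothetical threshold in kelvin
RUNS = ("2016-11-02 00:00", "2016-11-02 03:00")
FORECASTTIME = "2016-11-02 12:00"
MEMBERS = range(1, 21)  # 20 ensemble members per run

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t_2m (point_id integer, runtime text, member integer,"
    " forecasttime text, value real)"
)

rows = []
for runtime in RUNS:
    for member in MEMBERS:
        # Point 1: half of the members are above the threshold.
        rows.append((1, runtime, member, FORECASTTIME, 285.0 if member <= 10 else 280.0))
        # Point 2: all members are above the threshold.
        rows.append((2, runtime, member, FORECASTTIME, 290.0))
        # Point 3: no member is above the threshold.
        rows.append((3, runtime, member, FORECASTTIME, 275.0))
conn.executemany("insert into t_2m values (?, ?, ?, ?, ?)", rows)

# The query from the slide; SQLite evaluates 100 * count(value) / 40 with
# integer arithmetic.
QUERY = """
    select point_id, 100 * count(value) / 40
    from t_2m
    where (runtime = ? or runtime = ?)
      and forecasttime = ?
      and value > ?
    group by point_id
    order by point_id
"""

result = conn.execute(QUERY, (*RUNS, FORECASTTIME, THRESHOLD)).fetchall()
print(result)
```

Grid points where no member exceeds the threshold (point 3 here) do not appear in the result at all, because the where clause removes their rows before grouping. A consumer of the result has to treat missing points as a probability of zero.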

Performance Comparison
Assume that reading a Grib field from disk takes 10 ms.
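The comparison figures themselves are not part of the transcript. As a rough illustration of what the 10 ms assumption implies for the file-based store, the sketch below counts one disk read per Grib field that a request touches. The function name and its parameters are hypothetical, and the model ignores caching, seek patterns and decoding time.

```python
# Rough cost model for the file-based store: every Grib field that a request
# touches costs one disk read of 10 ms (the assumption stated on the slide).
READ_TIME_PER_FIELD_S = 0.010


def file_based_read_time(n_runs: int, n_members: int, n_steps: int = 1, n_levels: int = 1) -> float:
    """Estimated seconds spent reading fields, one field per run, member, step and level."""
    n_fields = n_runs * n_members * n_steps * n_levels
    return n_fields * READ_TIME_PER_FIELD_S


# Sample query 2: two forecast runs, 20 ensemble members, one forecast time.
print(file_based_read_time(n_runs=2, n_members=20))
```

Under this model a request like sample query 2 needs 40 field reads in the file-based system, and the cost grows linearly with every additional run, member, forecast step or level that is included.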

Summary
- Analysis of Grib fields with a database system is feasible
- The size of the database will not be bigger than the Grib store of a file-based Grib-Management-System
- Time for database import of Grib data needs further optimisation
- Requests for spatial-temporal analytics of both vector and raster data can be formulated with SQL
- Relocation of meteorological analysis functionality into the database
- Full advantage of database features, e.g. replication, persistence, query optimisation, parallelisation, concurrent access, geo-spatial extensions, etc.