New ways in Big Data Management for NWP


Dr. Dieter Schröder, Dr. Jürgen Seib and Dr. Jochen Dibbern
Deutscher Wetterdienst, Frankfurter Straße 135, D-63067 Offenbach, Germany

Data Volume – How Big is Big
  Gigabyte:    2^(10*3) bytes
  Terabyte:    2^(10*4) bytes
  Petabyte:    2^(10*5) bytes
  Exabyte:     2^(10*6) bytes
  Zettabyte:   2^(10*7) bytes
  Yottabyte:   2^(10*8) bytes
  Brontobyte*: 2^(10*9) bytes
  Gegobyte*:   2^(10*10) bytes
*This terminology is still subject to change.

Why is a new approach to NWP data management needed?
  - Increase in remote sensing data
  - Higher resolution of NWP models
  - Probabilistic vs. deterministic forecasts
  - Multi-variable data analysis in nowcasting systems

How fast can we read a Petabyte?
Read speed of a storage disc: 100 Megabyte per second (MBps)

  Unit  Bytes                      Seconds          Hours      Months
  MB    1,048,576                  0.01
  GB    1,073,741,824              10.24
  TB    1,099,511,627,776          10,485.76        2.91
  PB    1,125,899,906,842,624      10,737,418.24    2,982.62   > 4
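The read times on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the binary unit sizes from the "Data Volume" slide (1 MB = 2^20 bytes) and a 30-day month. The function and constant names are illustrative and not part of the presentation.

```python
# Sequential read time at a fixed throughput, using binary unit sizes.
READ_SPEED_BYTES_PER_S = 100 * 2**20  # 100 MBps, as stated on the slide

UNITS = {"MB": 2**20, "GB": 2**30, "TB": 2**40, "PB": 2**50}

SECONDS_PER_HOUR = 3600
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def read_time_seconds(n_bytes, speed=READ_SPEED_BYTES_PER_S):
    """Return the time in seconds needed to read n_bytes at the given speed."""
    return n_bytes / speed


for name, size in UNITS.items():
    seconds = read_time_seconds(size)
    print(
        f"{name}: {size:>25,d} bytes"
        f"  {seconds:>16,.2f} s"
        f"  {seconds / SECONDS_PER_HOUR:>10,.2f} h"
        f"  {seconds / SECONDS_PER_MONTH:>6.2f} months"
    )
```

At this rate a single disc needs roughly 124 days for one petabyte, which is where the "more than 4 months" figure comes from.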

File-based Management
[Diagram: File-Distribution-System, Application and Grib-Management-System; main memory and disc storage holding /tmp, input store, Grib files and Grib field store]

Database-oriented Management
[Diagram: Grib-Management-System, Database and Application; three DB-Mirrors, each with its own Application; main memory and disc storage holding Grib files and Grib field store]

Grib Data Management System
  Type of Grib data store: File-based; Database-oriented
  Smallest access unit: Grib file; Grib field; Grib value at grid point

Goals
Database-oriented management of Grib data such that:
  - The smallest access unit is the Grib value at a grid point
  - The database is no bigger than the Grib store of a file-based Grib-Management-System
  - Inserting Grib data into the database is no slower than inserting it into a file-based Grib-Management-System
  - Requests for spatial-temporal analytics can be formulated with SQL
  - The database can also store vector data (polygons, lines, points, etc.), giving a common store for all types of meteorological data

Snowflake model
[Diagram: forecast value linked to the dimensions grid point, runtime, forecast step, level and ensemble member]

GRIB_DATA table
  Coordinate values: Grid point, Runtime, Forecast step, Level, Ensemble member
  Grib values: Value
  Sample cell values (row layout not preserved): 1, 2016-11-02, 2, 123, 4, 86, 3, 99, 255, 2016-11-03, 155, 5, 33, 6, 145, 7, 16, 12
  - Coordinate values have to be stored for each Grib value
  - More storage is needed for the coordinates than for the Grib values
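To see why the coordinates dominate, it helps to count bytes per row. The sketch below assumes fixed-width encodings for each column (a 4-byte grid point id, an 8-byte runtime timestamp, 2-byte step, level and member, and a 4-byte float value). These widths are assumptions for illustration only, since the presentation does not state the column types.

```python
import struct

# Hypothetical fixed-width encodings per column (assumptions, not from the
# slides): "i" = 4-byte int, "q" = 8-byte int, "h" = 2-byte int.
COORDINATE_FORMATS = {
    "grid_point": "i",
    "runtime": "q",
    "forecast_step": "h",
    "level": "h",
    "ensemble_member": "h",
}
VALUE_FORMAT = "f"  # 4-byte float for the Grib value (assumption)

# "=" selects standard sizes without alignment padding.
coord_bytes = sum(struct.calcsize("=" + fmt) for fmt in COORDINATE_FORMATS.values())
value_bytes = struct.calcsize("=" + VALUE_FORMAT)

print(f"coordinates per row: {coord_bytes} bytes")
print(f"Grib value per row:  {value_bytes} bytes")
print(f"coordinate overhead: {coord_bytes / value_bytes:.1f}x the value size")
```

Under these assumed widths each row spends several times more space on coordinates than on the value itself, which matches the point made on the slide.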

Revised relational data model
  Grid: Id integer, Point geometry, ….
  Queries will be slow with classic row-store database systems

Row storage
[Diagram: Tables and Tablespace; query: select column1 from tab]

Column storage
[Diagram: Tables and Tablespace; query: select column1 from tab]
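The difference between the two layouts for "select column1 from tab" can be illustrated with a toy model. This is a simplified sketch, not how any particular database stores data: the table contents, the column names column2 and column3, and the "cells read" counter are all hypothetical.

```python
# Hypothetical three-column table "tab" with three rows.
COLUMNS = ("column1", "column2", "column3")
rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]

# Row store: all values of one row sit next to each other.
row_store = [value for row in rows for value in row]

# Column store: all values of one column sit next to each other.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)
}


def select_column1_row_store(store, n_cols):
    """Scan the whole row store and keep every n_cols-th value."""
    cells_read = len(store)  # a full scan passes over every cell
    return store[0::n_cols], cells_read


def select_column1_column_store(store):
    """Read only the contiguous values of column1."""
    result = store["column1"]
    return result, len(result)


row_result, row_cells = select_column1_row_store(row_store, len(COLUMNS))
col_result, col_cells = select_column1_column_store(column_store)
print(row_result, row_cells)
print(col_result, col_cells)
```

Both layouts return the same values, but the row layout passes over every cell of the table while the column layout touches only the requested column. With many coordinate columns and a query on a single one, this gap grows accordingly.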

PoC Hardware Environment
  - Fujitsu Server PRIMERGY RX900 S2 with 8 sockets / 80 cores / 160 threads
  - Processor: Intel® Xeon® processor E7-8860 @ 2.27GHz
  - Memory: 2 TB of RAM
  - 3298.5 GB of SAN storage from a NetApp filer over Emulex Corporation Saturn: LightPulse Fibre Channel Host Adapter (rev 03) @ 8Gbit

PoC Dataset
All forecast data of the ensemble prediction system COSMO-DE-EPS for one day
  - Forecast range: 27 h / 45 h
  - Forecast runs at: 00, 03, 06, 09, 12, 15, 18, 21 UTC
  - Ensemble members: 20
  - Multi-level parameters: 9 on 50 vertical levels
  - Single-level parameters: 101
  - Mesh size: 2.8 km (421 * 461 grid points)
  - 4498 Grib2 files per day, where each file contains 551 Grib fields
  - Size: 928 GB
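The dataset figures can be cross-checked with some simple arithmetic. The sketch below uses only the numbers listed on the slide. Treating "GB" as 2^30 bytes follows the "Data Volume" slide and is an assumption here, as is the idea that every field covers the full grid.

```python
# Figures taken from the PoC dataset slide.
FILES_PER_DAY = 4498
MULTI_LEVEL_PARAMETERS = 9
VERTICAL_LEVELS = 50
SINGLE_LEVEL_PARAMETERS = 101
GRID_POINTS = 421 * 461
DATASET_BYTES = 928 * 2**30  # assumption: binary gigabytes

# 9 parameters on 50 levels plus 101 single-level parameters.
fields_per_file = MULTI_LEVEL_PARAMETERS * VERTICAL_LEVELS + SINGLE_LEVEL_PARAMETERS

# Assumption: every field holds one value per grid point.
total_fields = FILES_PER_DAY * fields_per_file
total_values = total_fields * GRID_POINTS
bytes_per_value = DATASET_BYTES / total_values

print(f"fields per file:  {fields_per_file}")
print(f"grid points:      {GRID_POINTS:,}")
print(f"values per day:   {total_values:,}")
print(f"bytes per value:  {bytes_per_value:.2f}")
```

The parameter counts reproduce the 551 fields per file stated on the slide, and one day of data comes to several hundred billion grid-point values at roughly two bytes each in the Grib files.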

Storage overview in SAP HANA: 719 GB

Sample query 1 Get the predicted minimum, maximum and average values of the 2m temperature within a given area.
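The query text for this slide is not part of the transcript. The sketch below shows one way such a request could look, using Python's built-in sqlite3 module and toy data. Everything in it is an assumption: the grid table with lon and lat columns, the reuse of the t_2m table name from sample query 2, and a rectangular bounding box standing in for a real geo-spatial predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table grid (point_id integer primary key, lon real, lat real);
    create table t_2m (point_id integer, runtime text, forecasttime text, value real);
    """
)

# Toy data: three grid points, one forecast value each (temperatures in kelvin).
conn.executemany(
    "insert into grid values (?, ?, ?)",
    [(1, 8.0, 50.0), (2, 8.5, 50.0), (3, 12.0, 53.0)],
)
conn.executemany(
    "insert into t_2m values (?, ?, ?, ?)",
    [
        (1, "2016-11-02 00", "2016-11-02 12", 281.0),
        (2, "2016-11-02 00", "2016-11-02 12", 283.0),
        (3, "2016-11-02 00", "2016-11-02 12", 275.0),
    ],
)

# Minimum, maximum and average 2m temperature inside a bounding box.
query = """
    select min(t.value), max(t.value), avg(t.value)
    from t_2m t
    join grid g on g.point_id = t.point_id
    where g.lon between ? and ?
      and g.lat between ? and ?
      and t.runtime = ?
      and t.forecasttime = ?
"""
result = conn.execute(
    query, (7.5, 9.0, 49.5, 50.5, "2016-11-02 00", "2016-11-02 12")
).fetchone()
print(result)
```

Only grid points 1 and 2 fall inside the box, so the aggregates are computed over their two values. A database with geo-spatial extensions would replace the bounding-box conditions with a proper containment test against a polygon.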

Sample query 2
Get the probability at each grid point that it will be warmer than a given threshold value. The calculation should be based on the results of two forecast runs.

    select point_id, 100 * count(value) / 40
    from t_2m
    where (runtime = ? or runtime = ?)
      and forecasttime = ?
      and value > ?
    group by point_id
    order by point_id
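The query above can be tried on toy data with Python's built-in sqlite3 module. This is a sketch under assumptions: the member column and all inserted values are made up, and the divisor 40 is presumably the 2 forecast runs times the 20 ensemble members of the PoC dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t_2m (point_id integer, runtime text, forecasttime text,"
    " member integer, value real)"
)

RUNS = ("2016-11-02 00", "2016-11-02 03")
FORECAST_TIME = "2016-11-02 12"
# Hypothetical base temperatures (kelvin) per grid point.
BASE = {1: 280.0, 2: 290.0, 3: 270.0}

# 2 runs x 20 members = 40 values per grid point.
data = [
    (point_id, run, FORECAST_TIME, member, base + 0.5 * member)
    for point_id, base in BASE.items()
    for run in RUNS
    for member in range(20)
]
conn.executemany("insert into t_2m values (?, ?, ?, ?, ?)", data)

query = """
    select point_id, 100 * count(value) / 40
    from t_2m
    where (runtime = ? or runtime = ?)
      and forecasttime = ?
      and value > ?
    group by point_id
    order by point_id
"""
rows = conn.execute(query, (RUNS[0], RUNS[1], FORECAST_TIME, 285.0)).fetchall()
print(rows)
```

Two details are worth noting. In SQLite the division is an integer division, which may differ in other database systems. Also, a grid point where no member exceeds the threshold (point 3 here) does not appear in the result at all, because the where clause filters out all of its rows before grouping.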

Summary
  - Analysis of Grib fields with a database system is feasible
  - The size of the database will not be bigger than the Grib store of a file-based Grib-Management-System
  - Time for database import of Grib data needs further optimisation
  - Requests for spatial-temporal analytics of both vector and raster data can be formulated with SQL
  - Relocation of meteorological analysis functionality into the database
  - Full advantage of database features, e.g. replication, persistence, query optimisation, parallelisation, concurrent access, geo-spatial extensions, etc.