Active distributed storage for multidimensional arrays
Dmitry Medvedev, IKI RAN (Space Research Institute of the Russian Academy of Sciences)
Scientific data arrays
Arrays are widely used in environmental sciences to store modelling results, satellite observations, raster maps, etc. Datasets can be quite large, up to several terabytes. Most data are stored as file collections in proprietary formats or in universally adopted formats such as netCDF, GRIB and HDF5.
File access can be problematic: scientists need to know about too many file formats; files usually must be completely downloaded before they can be used; thousands of files can be processed in one data request, while only a small portion of their contents appears in the result set.
Currently available database solutions do not have convenient array storage capabilities.
ActiveStorage
ActiveStorage is a generic storage for arrays of primitive data types. Its data model is based on Unidata’s Common Data Model, used in netCDF, HDF5 and OpenDAP. In essence, ActiveStorage is a SQL Server database with CLR stored procedures and a client library. The stored procedures and the client library provide an abstraction layer for data access. Large arrays are split into chunks and can be spread across several parallel database servers for better performance.
[Diagram: architecture comparison of ActiveStorage, RasDaMan and SciDB. Component labels: RDBMS (binary data, metadata), stored procedures, client library, middleware.]
Common Data Model
This is the Common Data Model (CDM) used in the recent versions of OpenDAP, netCDF and HDF5. Its purpose is the representation of multidimensional scientific data.
Database schema
Splitting an array into chunks
We store chunks in BLOB fields of a database table (columns: chunk_key, chunk). Chunks do not need to be the same size.
[Figure: number of seeks needed to read from a chunked array and from a non-chunked array; labels: 1 seek, 8 seeks, 4 seeks.]
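The chunking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 2D array held as nested lists of floats, integer chunk keys, and values packed as little-endian float64. The function name `split_into_chunks` and the BLOB layout are hypothetical and are not taken from ActiveStorage.

```python
import struct

def split_into_chunks(array, chunk_rows, chunk_cols):
    """Split a 2D list into chunks; return {chunk_key: (bounds, blob)}.

    bounds = (row0, row1, col0, col1), end-exclusive.
    blob   = chunk values packed row by row as little-endian float64.
    """
    n_rows, n_cols = len(array), len(array[0])
    chunks = {}
    key = 0
    for r0 in range(0, n_rows, chunk_rows):
        r1 = min(r0 + chunk_rows, n_rows)      # edge chunks may be smaller
        for c0 in range(0, n_cols, chunk_cols):
            c1 = min(c0 + chunk_cols, n_cols)
            values = [array[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            blob = struct.pack("<%dd" % len(values), *values)
            chunks[key] = ((r0, r1, c0, c1), blob)
            key += 1
    return chunks

# A 5 x 6 array split into 2 x 4 chunks gives 3 x 2 = 6 chunks of unequal size.
array = [[float(r * 6 + c) for c in range(6)] for r in range(5)]
chunks = split_into_chunks(array, chunk_rows=2, chunk_cols=4)
```

Each `blob` plays the role of the value that would go into a BLOB column, and the bounds are the kind of information a lookup needs to find the chunks that overlap a request.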
Data and directory tables
Two tables are automatically created for each new variable: a data table and a directory table. The data table stores data chunks in BLOB columns. The directory table contains information about chunk boundaries. A chunk consists of a header and a data block.
How it works
Components: Application, Client library, SQL Server DB.
1. The application passes a multi-dimensional data request to the client library.
2. The client library issues commands to the database server.
3. The database server selects the requested data from several chunks and returns the data parts to the client library.
4. The client library assembles the data parts into one multi-dimensional array.
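The select-and-assemble part of this flow can be sketched as follows. The sketch assumes 2D data, a directory that maps chunk keys to end-exclusive boundaries, and chunks held as nested lists in memory. In the deck the selection runs in the database and the assembly in the client library; here both happen in one hypothetical function, `read_subarray`, purely for illustration.

```python
def read_subarray(directory, data, r0, r1, c0, c1):
    """Assemble rows r0..r1 and columns c0..c1 (end-exclusive) from chunks.

    directory: {chunk_key: (row0, row1, col0, col1)}  chunk boundaries
    data:      {chunk_key: 2D list with the chunk's values}
    """
    result = [[None] * (c1 - c0) for _ in range(r1 - r0)]
    for key, (cr0, cr1, cc0, cc1) in directory.items():
        # Intersection of the request with this chunk.
        ir0, ir1 = max(r0, cr0), min(r1, cr1)
        ic0, ic1 = max(c0, cc0), min(c1, cc1)
        if ir0 >= ir1 or ic0 >= ic1:
            continue  # chunk does not overlap the request
        chunk = data[key]
        for r in range(ir0, ir1):
            for c in range(ic0, ic1):
                # Copy the data part into its place in the result array.
                result[r - r0][c - c0] = chunk[r - cr0][c - cc0]
    return result

# A 4 x 4 array with value 10*row + col, stored as four 2 x 2 chunks.
directory = {0: (0, 2, 0, 2), 1: (0, 2, 2, 4), 2: (2, 4, 0, 2), 3: (2, 4, 2, 4)}
data = {
    key: [[10 * r + c for c in range(cc0, cc1)] for r in range(cr0, cr1)]
    for key, (cr0, cr1, cc0, cc1) in directory.items()
}
block = read_subarray(directory, data, 1, 3, 1, 3)  # touches all four chunks
```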
Parallel query processing
[Diagram labels: Application, Client library, SQL Server DB 1, SQL Server DB 2.]
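One way a client library could query several servers at once is sketched below. It is an assumption-laden illustration: servers are modelled as in-memory dictionaries, chunks are placed by chunk key modulo the number of servers, and `query_server` and `parallel_fetch` are hypothetical names. The deck does not describe the actual placement scheme or client code.

```python
from concurrent.futures import ThreadPoolExecutor

def query_server(server, wanted_keys):
    """Stand-in for one database round trip: return the wanted chunks it holds."""
    return {k: server[k] for k in wanted_keys if k in server}

def parallel_fetch(servers, wanted_keys):
    """Send the same request to every server concurrently and merge the parts."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        parts = pool.map(lambda s: query_server(s, wanted_keys), servers)
        merged = {}
        for part in parts:
            merged.update(part)
    return merged

# Hypothetical placement: chunk key modulo the number of servers.
all_chunks = {k: "chunk-%d" % k for k in range(8)}
servers = [{k: v for k, v in all_chunks.items() if k % 2 == i} for i in range(2)]
fetched = parallel_fetch(servers, wanted_keys=[1, 2, 5])
```

Threads suit this sketch because the work is dominated by waiting on I/O; the merged dictionary can then be handed to the assembly step.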
Parallel query performance
[Chart: query performance with 1 database server compared with 4 parallel database servers.]
NCEP/NCAR Weather Reanalysis
Continually updating gridded data set. Incorporates observations and global climate model output. 74 weather parameters. 5000 netCDF files, 30 – 500 MB each. Time coverage: 1948 – hourly values. Grids: regular grid, 2.5 x 2.5 degrees; T62 Gaussian grid, 192 x 94 points.
Database contents
ns1 – single-layer data on regular grid. ns2 – single-layer data on Gaussian grid. ns3, ns4, ns5 – multi-layer data on regular grid.
[Diagram: NCEP/NCAR Weather Reanalysis Database with groups “ns1”, “ns2”, “ns3”, “ns4”, “ns5”; variable labels: “time”, “lat”, “lon”, “level”, data.]
NCDC Integrated Surface Database
1901 – 2008 time coverage. 30 million sensors. 1.7 billion observations. Platforms: fixed ground stations, ships, mobile stations, buoys. ASCII files packed with gzip. 50 GB packed; 400 GB unpacked.
[Figure: annotated sample record; labels: date, time, lat, lon, control data section, mandatory data section, additional data section, section marker, group marker, parameter group.]
When you’ve downloaded and unpacked the data...
Fixed stations
ActiveStorage database for NCDC data
The main challenges: observation times are irregular; observations are distributed unevenly in time and space; different stations have different sets of observed parameters; the number of observations is huge.
Modifications to ActiveStorage
ActiveStorage was designed to handle dense multidimensional arrays with only a small number of missing values. It works well for regularly gridded data. Some multidimensional data are sparse and cannot be represented by a single data block.
Modifications to ActiveStorage
Sparse arrays can be represented as a tree hierarchy of dense data blocks. Some data blocks can be empty. Hierarchy levels are treated as additional dimensions, e.g. (3,0,x,y,z).
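A rough sketch of how such an index might be resolved, assuming a two-level hierarchy in which the leading index components select a dense block and the remaining ones address a cell inside it. The dictionary layout, the `get` helper and the `None` fill value are illustrative assumptions, not the ActiveStorage implementation.

```python
FILL = None  # value returned for cells in empty blocks (assumption)

def get(blocks, index, levels=2):
    """Look up a cell; the first `levels` components select a dense block."""
    path, cell = tuple(index[:levels]), index[levels:]
    block = blocks.get(path)
    if block is None:
        return FILL              # empty block: nothing is stored for it
    for i in cell:
        block = block[i]         # descend through x, y, z inside the block
    return block

def make_block(base):
    """Build a dense 2 x 2 x 2 block with value base + 4*x + 2*y + z."""
    return [[[base + 4 * x + 2 * y + z for z in range(2)]
             for y in range(2)] for x in range(2)]

# Only two of the possible blocks are stored; all others are empty.
blocks = {(3, 0): make_block(100), (0, 1): make_block(200)}

value = get(blocks, (3, 0, 1, 0, 1))    # block (3, 0), cell x=1, y=0, z=1
missing = get(blocks, (2, 2, 0, 0, 0))  # block (2, 2) is empty
```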
Modifications to ActiveStorage
Time series representation
Time series are stored as a set of 1D arrays: 1 array → 1 geographical point. One geographical point may have observations from several sensors. Sensors can be distinguished by observation parameters (station code, observation type, call letters, etc.).
[Figure labels: point IDs, time series.]
Buckets
The whole spatio-temporal domain is divided into buckets. Each bucket contains a subset of observations from several geographical points. A set of IDs of geographical points is stored as a 1D array. For each bucket we store only those points that have observations in this bucket.
[Figure: a bucket spanning 1° of latitude, 1° of longitude and 1 month of time; bucket IDs point to arrays of point IDs.]
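The bucket index can be sketched as below, assuming buckets of 1° x 1° x 1 month as in the figure. The tuple used as a bucket ID and the helper names `bucket_id` and `build_buckets` are hypothetical; the deck does not say how bucket IDs are encoded.

```python
import math
from datetime import datetime

def bucket_id(lat, lon, when):
    """Return a (lat_cell, lon_cell, year, month) key for a 1 deg x 1 deg x 1 month bucket."""
    return (math.floor(lat), math.floor(lon), when.year, when.month)

def build_buckets(observations):
    """observations: iterable of (point_id, lat, lon, datetime).

    Returns {bucket_id: sorted list of point IDs observed in that bucket}.
    """
    buckets = {}
    for point_id, lat, lon, when in observations:
        buckets.setdefault(bucket_id(lat, lon, when), set()).add(point_id)
    return {b: sorted(ids) for b, ids in buckets.items()}

# Made-up observations for two points in the same 1 deg x 1 deg cell.
observations = [
    (7, 55.75, 37.62, datetime(2007, 1, 3, 12)),
    (7, 55.75, 37.62, datetime(2007, 1, 4, 0)),
    (9, 55.10, 37.90, datetime(2007, 1, 9, 6)),
    (9, 55.10, 37.90, datetime(2007, 2, 1, 6)),
]
buckets = build_buckets(observations)
```

Point 7 appears only in the January bucket because it has no February observations, which mirrors the rule that a bucket lists only the points observed in it.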
Database contents
[Diagram: NCDC Integrated Surface Database with groups “mandatory” and “additional”; each group contains “time”, “buckets” and data.]
The “coords” table helps to select time series by latitude/longitude.
Request processing chart
1. Get bucket IDs.
2. For each bucket: read point IDs from the bucket; filter points by coordinates.
3. For each point: read observation times; filter points by time; read observation data.
4. Return results.
[Flowchart; the read steps access the data storage.]
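The chart can be read as the following pipeline. This is a sketch under assumptions: the `store` dictionary with its `buckets`, `coords`, `times` and `values` keys stands in for the data storage, bucket IDs are (lat_cell, lon_cell, year, month) tuples, and all sample data are made up.

```python
from datetime import datetime

def process_request(store, lat_range, lon_range, t0, t1):
    """Return {point_id: [(time, value), ...]} for a lat/lon/time box."""
    (lat0, lat1), (lon0, lon1) = lat_range, lon_range
    # Get the IDs of buckets that overlap the requested box.
    bucket_ids = [
        b for b in store["buckets"]
        if b[0] + 1 > lat0 and b[0] <= lat1
        and b[1] + 1 > lon0 and b[1] <= lon1
        and (t0.year, t0.month) <= (b[2], b[3]) <= (t1.year, t1.month)
    ]
    # For each bucket: read point IDs and filter them by coordinates.
    points = set()
    for b in bucket_ids:
        for pid in store["buckets"][b]:
            lat, lon = store["coords"][pid]
            if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
                points.add(pid)
    # For each point: read observation times, filter by time, read the data.
    results = {}
    for pid in sorted(points):
        hits = [i for i, when in enumerate(store["times"][pid]) if t0 <= when <= t1]
        if hits:
            results[pid] = [(store["times"][pid][i], store["values"][pid][i]) for i in hits]
    return results

store = {
    "buckets": {(55, 37, 2007, 1): [7, 9], (55, 37, 2007, 2): [9], (40, -4, 2007, 1): [3]},
    "coords": {7: (55.75, 37.62), 9: (55.10, 37.90), 3: (40.4, -3.7)},
    "times": {7: [datetime(2007, 1, 3), datetime(2007, 1, 4)],
              9: [datetime(2007, 1, 9), datetime(2007, 2, 1)],
              3: [datetime(2007, 1, 1)]},
    "values": {7: [-5.0, -6.5], 9: [-3.0, -8.0], 3: [9.0]},
}
results = process_request(store, (55.5, 56.5), (37.0, 38.0),
                          datetime(2007, 1, 4), datetime(2007, 1, 31))
```

Collecting the points into a set before reading their time series avoids reading the same point twice when it appears in several monthly buckets.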
Request processing times
Table columns: Location, Sensors, Observations, Time (s). Rows: Moscow, Madrid, Gulf of Guinea.
Moscow, Madrid: fixed stations; small number of sensors, large number of observations.
Gulf of Guinea: buoys, ships; large number of sensors, small number of observations.
* All requests are 2 x 2 degrees, 01/01/2007 – 12/31/2007.
ActiveStorage on Windows Azure
How it works
[Diagram labels: Web Role, Worker Role, Queue1, Queue2, BLOB Storage, raw chunks, processed chunks, result.]
ActiveStorage on Windows Azure
Advantages: Easy and natural implementation of parallel query execution. BLOB read rates are quite good: 6.5 MB/s s overhead. Very scalable.
CTP problem: replication overhead. BLOB writes are several times slower than SQL Server. Message exchange rate is slow (several seconds).