The EarthServer initiative: towards Agile Big Data Services 2nd GEOSS Science and Technology Stakeholder Workshop Bonn, Germany, 2012-aug-29 Peter Baumann Jacobs University | rasdaman GmbH Bremen, Germany p.baumann@jacobs-university.de
About the Presenter Professor of CS, Jacobs University Head, Large-Scale Scientific Information Systems research group Main outcome so far: rasdaman first „Big Raster Data Analytics“ server Standardization OGC: chair of raster-relevant working groups, editor of 12+ standards & candidate standards ISO: working on Raster („Array“) SQL INSPIRE: Invited expert for coverages www.jacobs-university.de/lsis, www.rasdaman.org
Roadmap OGC standards rasdaman EarthServer EarthServer & GEOSS Conclusions
Feature and Coverage Data Standards Core element in OGC: geographic feature = abstraction of a real world phenomenon associated with a location relative to Earth Special kind of feature: coverage = space-time varying multi-dimensional phenomenon Typical representative: raster image ...but there is more! Typically, coverages are Big Data
Coverage Types as per GML 3.2.1; all n-D; new subtypes possible [Diagram: «FeatureType» Abstract Coverage and its subtypes: Discrete Coverage, Continuous Coverage, Grid Coverage, Rectified GridCoverage, Referenceable GridCoverage, MultiPoint Coverage, MultiCurve Coverage, MultiSurface Coverage, MultiSolid Coverage]
Coverage Encoding Pure GML: complete coverage represented by GML Special Format: other suitable file format (ex: MIME type "image/tiff") Multipart-Mixed: multipart MIME, type "multipart/mixed" [Diagram: GML Coverage = Domain set + Range type + Range set + App Metadata; multipart variant = GML Coverage (Domain set, Range type, xlink, App Metadata) plus NetCDF file; NetCDF = Domain set + Range type + Range set + App Metadata; GeoTIFF = Range type + Range set]
Core OGC Service Standards [Diagram: service families by data type: WMS (images); WFS, WFS-T, FE (feature data); WCS, WCS-T, WCPS (coverage data); CS-W, CS-T, CQL (metadata)] WMS "portrays spatial data" → pictures WCS "provides data + descriptions; data with original semantics, may be interpreted, extrapolated, etc." [09-110r4]
Web Coverage Service (WCS) Core: simple & efficient access to multi-dimensional coverages; subset = trim | slice WCS Extensions for additional functionality facets: "band extraction", scaling, reprojection, interpolation, query language, ... Application Profiles define domain-oriented bundling
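The difference between the two subsetting operations can be illustrated with a small in-memory sketch. It is not WCS request syntax: the toy cube, the `[t][y][x]` axis order, the inclusive bounds, and the function names `trim` and `slice_t` are assumptions made for illustration only. Trimming keeps the number of dimensions, while slicing removes one.

```python
from typing import List, Tuple

Cube = List[List[List[int]]]  # toy 3-D coverage, indexed as [t][y][x] (assumed layout)


def trim(cov: Cube, t: Tuple[int, int], y: Tuple[int, int], x: Tuple[int, int]) -> Cube:
    """Trim: cut every axis to an inclusive [lo, hi] interval; the result is still 3-D."""
    return [
        [row[x[0]:x[1] + 1] for row in plane[y[0]:y[1] + 1]]
        for plane in cov[t[0]:t[1] + 1]
    ]


def slice_t(cov: Cube, t: int) -> List[List[int]]:
    """Slice: fix the t axis at one position; the result drops to 2-D."""
    return [row[:] for row in cov[t]]


# Toy coverage: 3 time steps, 4 rows, 5 columns; cell value encodes its position.
cube = [[[100 * t + 10 * y + x for x in range(5)] for y in range(4)] for t in range(3)]

trimmed = trim(cube, (0, 1), (1, 2), (2, 4))  # 2 x 2 x 3 cube
sliced = slice_t(cube, 2)                     # 4 x 5 image at t = 2
```

In this sketch `trimmed` is a smaller cube and `sliced` is a plain 2-D image, which mirrors the "subset = trim | slice" distinction on the slide.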
Web Coverage Processing Service (WCPS) Raster Query Language: ad-hoc navigation, extraction, aggregation, analytics Time series Image processing Summary data Sensor fusion & pattern mining
EarthServer: Big Earth Data Analytics Scalable On-Demand Processing for the Earth Sciences EU funded, 3 years, 5.85 million EUR Platform: rasdaman (Array Analytics server) Distributed query processing, integrated data/metadata search, 3D clients Strictly open standards: OGC WMS+WCS+WCPS; W3C XQuery; X3D 6 * 100+ TB databases for all Earth sciences + planetary science Meteorological / climate studies require 5D datasets: 3D for space, 1D for time, and 1D for the different variables (humidity, temperature, precipitation, and so on). The picture shows a thunderstorm simulation: the solid surface represents a threshold in the 3D humidity field, while colors represent temperature isosurfaces. At the bottom is a top view of the simulated thunderstorm, emulating a satellite view, next to the corresponding satellite observation.
The rasdaman Raster Analytics Server www.rasdaman.org Array DBMS for massive n-D raster data New database attribute type: array<celltype,extent> Data integration: rasters stored in standard database Extending ISO SQL with array operators "Tile streaming" architecture: n-D array → set of n-D tiles Extensive optimization, hw/sw parallelization In operational use: dozen-Terabyte objects; analytics queries in 50 ms on laptop Example: select img.green[x0:x1,y0:y1] > 130 from LandsatArchive as img
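The tiling idea behind the example query can be sketched in a few lines of plain Python. This is an illustration only, not rasdaman's implementation: the tile size `TILE`, the helper names, the synthetic `green` band, and the inclusive interval bounds are all assumptions. The point is that a subset-and-threshold query only needs to read the tiles that intersect the requested box.

```python
from typing import Dict, List, Tuple

TILE = 4  # hypothetical tile edge length
Tiles = Dict[Tuple[int, int], List[List[int]]]


def make_tiles(img: List[List[int]]) -> Tiles:
    """Partition a 2-D array (list of rows) into TILE x TILE tiles keyed by tile origin (x, y)."""
    tiles: Tiles = {}
    for ty in range(0, len(img), TILE):
        for tx in range(0, len(img[0]), TILE):
            tiles[(tx, ty)] = [row[tx:tx + TILE] for row in img[ty:ty + TILE]]
    return tiles


def threshold_subset(tiles: Tiles, x0: int, x1: int, y0: int, y1: int, limit: int):
    """Evaluate `cell > limit` over the inclusive box [x0:x1, y0:y1], reading only intersecting tiles."""
    out = [[False] * (x1 - x0 + 1) for _ in range(y1 - y0 + 1)]
    touched = 0
    for (tx, ty), tile in tiles.items():
        th, tw = len(tile), len(tile[0])
        # Skip tiles that lie completely outside the query box.
        if tx > x1 or tx + tw - 1 < x0 or ty > y1 or ty + th - 1 < y0:
            continue
        touched += 1
        for y in range(max(ty, y0), min(ty + th - 1, y1) + 1):
            for x in range(max(tx, x0), min(tx + tw - 1, x1) + 1):
                out[y - y0][x - x0] = tile[y - ty][x - tx] > limit
    return out, touched


# Synthetic 16 x 16 "green" band standing in for a satellite image.
green = [[(x * y) % 256 for x in range(16)] for y in range(16)]
tiles = make_tiles(green)
mask, touched = threshold_subset(tiles, 5, 10, 5, 10, 130)
```

Here the query box intersects only 4 of the 16 tiles, so the other 12 are never read.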
Value-Added Satellite Image Archive [Diedrich et al 2001]
rasdaman: Distributed Query Processing WCPS peer-to-peer cloud: each node accepts all requests Incoming node distributes query, semantics based Manifold optimization criteria [Owonibi 2012]
Query received by incoming node:
for $a in ( A ), $b in ( B ) return encode( ( ($a.nir - $a.red) / ($a.nir + $a.red) - ($b.nir - $b.red) / ($b.nir + $b.red) ), "HDF5" )
Sub-query sent to the node holding coverage A:
for $a in ( A ) return encode( ($a.nir - $a.red) / ($a.nir + $a.red), "array-compressed" )
Sub-query sent to the node holding coverage B:
for $b in ( B ) return encode( ($b.nir - $b.red) / ($b.nir + $b.red), "array-compressed" )
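The split shown on this slide can be mimicked with a small Python sketch. It is a simplified stand-in, not the rasdaman query distributor: the two "nodes" are threads, the sample `nir`/`red` values in `NODE_DATA` are made up, and all function names are hypothetical. Each node computes the normalized difference on its own coverage, and only the per-node results travel to the incoming node, which forms the final difference.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Coverage = Dict[str, List[float]]

# Made-up sample data; in the slide's setting A and B live on different nodes.
NODE_DATA: Dict[str, Coverage] = {
    "A": {"nir": [8, 6, 5], "red": [2, 2, 5]},
    "B": {"nir": [6, 3, 9], "red": [2, 1, 1]},
}


def ndvi(cov: Coverage) -> List[float]:
    """Per-cell (nir - red) / (nir + red), i.e. the sub-query each node evaluates locally."""
    return [(n - r) / (n + r) for n, r in zip(cov["nir"], cov["red"])]


def run_on_node(name: str) -> List[float]:
    """Stand-in for shipping the sub-query to the node that holds coverage `name`."""
    return ndvi(NODE_DATA[name])


def ndvi_difference() -> List[float]:
    """Incoming node: run both sub-queries in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        a, b = pool.map(run_on_node, ["A", "B"])
    return [x - y for x, y in zip(a, b)]


result = ndvi_difference()
```

The design choice illustrated is that the expensive per-cell arithmetic stays next to the data, and the combining step sees only intermediate arrays.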
EarthServer Contribution to GEOSS Integrated n-D coverage data / metadata search Smooth integration with Broker [Nativi, Mazzetti 2012]
EarthServer Contribution to GEOSS Integrated n-D coverage data / metadata search Smooth integration with Broker Including "reverse lookup" queries: "give me metadata for data with specific properties" Also integration with MapServer, GDAL, ... Scalable n-D interfaces, based on OGC standards Working "in situ" on existing archives; no copying! Flexible ad-hoc processing & filtering through OGC standardized query language n-D visual Web clients: 1D diagrams, 2D maps, 3D data cubes, 3D timeseries sets, ... Dynamically composed from query results
Conclusion Sensor, image, & statistics data = a main source of Big Data in Earth Sciences Petrol industry has „more bytes than barrels“ OGC standards offer common platform spatio-temporal coverages – a unified, cross-domain data model Web Coverage Service suite – from simple download to flexible analytics www.ogcnetwork.net/wcs EarthServer can contribute Agile Analytics to GEOSS OGC coverage standards rasdaman technology www.earthserver.eu
Integration of OGC WCS and SWE SWE O&M and SensorML (+ friends): high flexibility to accommodate virtually any data structure → upstream integration GMLCOV and WCS (+WCPS): one generic schema for all coverage types; scalable; versatile processing → downstream services coverage server O&M + SensorML GMLCOV + WCS Semantic Web
VAROS (cont'd)
The Integrated Geo Warehouse Comprehensive geophysics data mgmt (1D, 2D, 3D, nD): seismic measurement, borehole data, geophone data, geo tomograms, stratigraphy layers, geological models, ... + annotations + meta data
Let’s Take a Closer Look... Divergent access patterns for ingest and retrieval Alternative 1: simple access service, let client chisel result Alternative 2: Deliver to exact needs no bandwidth waste, higher quality of service Server must mediate between access patterns (...later more) Intelligent access interfaces help
System Architecture Server: OGC interfaces as servlets: WCS 2.0, WCPS 1.0, WPS 1.0 Server engine: C++ Bindings to GDAL, MapServer, ERDAS (to be extended) Ex: VAROS project (ESA): commercial client, ChartLink; open-source server, rasdaman [Diagram: WCS+WCPS and WPS+WCPS requests → petascope request translator → rasdaman engine; metadata and rasdaman engine each on a standard database system; interfaces: OGC or API]
Just-In-Time Compilation [Chart: times [ms] for 512² * n ops] Observation: interpreted mode slows down Approach: cluster suitable operations, compile & dynamically bind Benefit: speeds up complex, repeated operations Variation: compile code for GPU Example: select x*x*...*x from float_matrix as x [Jucovschi, Stancu-Mara 2008]
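The "cluster, compile & dynamically bind" approach can be sketched with Python's built-in `compile` and `eval`. This only illustrates the principle and is not the rasdaman mechanism: the function names and the tiny `float_matrix` are assumptions. The interpreted variant walks the operation chain for every cell, whereas the compiled variant fuses the whole chain `x*x*...*x` into one expression, compiles it once, and then binds it as an ordinary function.

```python
from typing import Callable, List


def interpreted_product(x: float, n: int) -> float:
    """Interpreted mode: step through the n-factor product one operation at a time."""
    result = x
    for _ in range(n - 1):
        result = result * x
    return result


def compile_product(n: int) -> Callable[[float], float]:
    """Fuse the n-factor chain into a single expression, compile once, bind as a function."""
    source = "lambda x: " + "*".join(["x"] * n)  # e.g. "lambda x: x*x*x"
    return eval(compile(source, "<fused>", "eval"))


def apply_to_matrix(fn: Callable[[float], float], matrix: List[List[float]]) -> List[List[float]]:
    """Apply a per-cell function to every cell of a 2-D array."""
    return [[fn(v) for v in row] for row in matrix]


float_matrix = [[1.0, 2.0], [3.0, 0.5]]  # tiny stand-in for the slide's float_matrix
fused = compile_product(5)               # compiled once, reused for all cells
compiled_result = apply_to_matrix(fused, float_matrix)
interpreted_result = apply_to_matrix(lambda v: interpreted_product(v, 5), float_matrix)
```

Both variants yield the same values; the compiled one pays the translation cost once instead of per cell, which is where the benefit for complex, repeated operations comes from.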
Query Optimization – Ex. 1