1
Science SQL :: EGI-GEANT :: © 2014 P. Baumann
Science SQL
Peter Baumann, Jacobs University | rasdaman GmbH, baumann@rasdaman.com
EGI-GEANT Symposium, 2014-sep-25, CWI, Amsterdam, The Netherlands
[Image credit: gamingfeeds.com]
Work in part funded by EU FP7 EarthServer, PublicaMundi
2
Structural Variety in Big Data
- Stock trading: 1-D sequences (i.e., arrays)
- Social networks: large, homogeneous graphs
- Ontologies: small, heterogeneous graphs
- Climate modelling: 4-D/5-D arrays
- Satellite imagery: 2-D/3-D arrays (+ irregularity)
- Genome: long string arrays
- Particle physics: sets of events
- Bio taxonomies: hierarchies (such as XML)
- Documents: key/value stores = sets of unique identifiers + whatever
- etc.
3
Who Has Array Data?
Spatio-temporal sensor, image, model, & statistics data:
- Life Science: pharma/chem, healthcare / bio research, bio statistics, genetics, ...
- Geo: geodesy, geology, hydrology, oceanography, meteorology, earth system, ...
- Engineering & research: simulation & experimental data in the automotive/shipbuilding/aerospace industry, turbines, process industry, astronomy, high energy physics, ...
- Management/Controlling: decision support, OLAP, data warehousing, census, statistics in industry and public administration, ...
- Multimedia: e-learning, distance learning, prepress, ...
"80% of all data have some spatial connotation" [C&P Hane, 1992]
4
Arrays in [Geo] Science & Engineering
- Spatio-temporal sensor, image, simulation, statistics data (cubes)
[Figure: sensor feeds (OGC SWE), simulation data, Big Data server]
5
rasdaman: Agile Array Analytics
- "raster data manager": SQL + n-D raster objects
- Scalable parallel "tile streaming" architecture
- In operational use
- OGC Web Coverage Service Core Reference Implementation
select img.green[x0:x1,y0:y1] > 130
from LandsatArchive as img
where avg_cells( img.nir ) < 17
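To make the semantics of this query concrete, here is a minimal in-memory Python model of it. It assumes each image is a dict of bands, each band a 2-D list indexed [x][y], and that subset bounds are inclusive as in rasql; `landsat_query`, `avg_cells` and `subset` are hypothetical stand-ins, not rasdaman code:

```python
# Hypothetical model of:
#   select img.green[x0:x1,y0:y1] > 130 from LandsatArchive as img
#   where avg_cells( img.nir ) < 17

def avg_cells(band):
    """Average over all cells of a 2-D band (analogue of the rasql condenser)."""
    cells = [v for row in band for v in row]
    return sum(cells) / len(cells)

def subset(band, x0, x1, y0, y1):
    """Trim to [x0:x1, y0:y1]; both bounds are inclusive, as in rasql."""
    return [row[y0:y1 + 1] for row in band[x0:x1 + 1]]

def landsat_query(archive, x0, x1, y0, y1):
    results = []
    for img in archive:  # one result array per qualifying image
        if avg_cells(img["nir"]) < 17:  # the where clause
            green = subset(img["green"], x0, x1, y0, y1)
            # induced comparison: applied cell by cell, yields a boolean array
            results.append([[v > 130 for v in row] for row in green])
    return results
```

Note that the filter is evaluated on the whole nir band, while only the trimmed green window is returned, so the result can be far smaller than the stored images.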
6
Adaptive Tiling
Sample tiling strategies [Furtado]:
- regular
- directional
- area of interest
rasdaman storage layout language:
insert into MyCollection
values ...
tiling area of interest [0:20,0:40], [45:80,80:85]
tile size 1000000
index d_index
storage array
compression zlib
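The simplest of these strategies, regular tiling, can be sketched in a few lines of Python. This is an illustrative assumption, not rasdaman's tiler: the storage layout clause above specifies a tile size, whereas this sketch takes an explicit tile shape per axis for simplicity. `regular_tiles` is a hypothetical name, and domains are given as inclusive (lo, hi) bounds per axis:

```python
# Hypothetical sketch of regular tiling of an n-D domain.
from itertools import product

def regular_tiles(domain, tile_shape):
    """Return all tiles; each tile is a tuple of inclusive (lo, hi) per axis."""
    per_axis = []
    for (lo, hi), step in zip(domain, tile_shape):
        # tiles at the upper edge are clipped to the domain boundary
        per_axis.append([(s, min(s + step - 1, hi)) for s in range(lo, hi + 1, step)])
    # every combination of per-axis intervals is one tile
    return list(product(*per_axis))
```

For example, `regular_tiles([(0, 9), (0, 4)], (4, 5))` yields three tiles, the last one clipped to rows 8..9. Directional and area-of-interest tiling instead place tile boundaries according to expected access patterns, such as the two areas of interest listed above.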
7
Value-Added Satellite Image Archive
8
Inset: Hadoop Not the Answer to All
- No built-in knowledge about structured data types
- "Since it was not originally designed to leverage the structure […] its performance […] is therefore suboptimal" [Daniel Abadi]
- M. Stonebraker (XLDB 2012): "will hit a scalability wall"
9
Sample Application: Database Visualization
select encode( struct {
         red:   (char) s.image.b7[x0:x1,y0:y1],
         green: (char) s.image.b5[x0:x1,y0:y1],
         blue:  (char) s.image.b0[x0:x1,y0:y1],
         alpha: (char) scale( d.elev, 20 ) },
       "image/png" )
from SatImage as s, DEM as d
[JacobsU, Fraunhofer; data courtesy BGS, ESA]
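The struct constructor above packs four single-band arrays into one RGBA array before PNG encoding. A minimal Python sketch of that cell-by-cell packing, assuming plain 2-D lists of equal extent (i.e. the DEM is taken as already resampled to the image grid) and assuming the (char) cast clamps to 0..255; `compose_rgba` and `to_char` are hypothetical names, and PNG encoding is not modelled:

```python
# Hypothetical model of struct { red, green, blue, alpha } over four bands.

def to_char(v):
    # assumption: out-of-range values are clamped to 0..255 here;
    # the exact semantics of the rasql (char) cast may differ
    return max(0, min(255, int(v)))

def compose_rgba(red, green, blue, alpha):
    """Zip four equally sized 2-D bands into rows of (r, g, b, a) tuples."""
    return [
        [tuple(to_char(c) for c in px) for px in zip(r, g, b, a)]
        for r, g, b, a in zip(red, green, blue, alpha)
    ]
```

Using elevation as the alpha channel lets terrain height modulate the visibility of the satellite colours in the rendered PNG.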
10
Use Case: Plymouth Marine Laboratory
"Avg chlorophyll concentration for given area & time period, from x/y/t cube"
- 10, 60, 120, 240 days
Conclusions:
- "we must minimise data transfer as well as [client] processing"
- "standards such as WCPS provide the greatest benefit"
[Oliver Clements, EGU 2014]
[Figure: rasdaman]
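The request behind this use case reduces a whole sub-cube to a single number on the server, which is what keeps data transfer and client processing small. A hypothetical Python sketch of the reduction, assuming the cube is a nested list cube[x][y][t] and inclusive bounds as in rasql subsetting (`avg_subcube` is an illustrative name):

```python
# Hypothetical sketch: average of cube[x0:x1, y0:y1, t0:t1], bounds inclusive.

def avg_subcube(cube, x0, x1, y0, y1, t0, t1):
    total, count = 0.0, 0
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            for t in range(t0, t1 + 1):
                total += cube[x][y][t]
                count += 1
    return total / count  # one scalar, regardless of the sub-cube size
```

Longer periods (60, 120, 240 days) only widen the t range; the response stays a single value.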
11
Parallel / Distributed Query Processing
[Figure: Datasets A, B, C, D stored on different nodes]
select max((A.nir - A.red) / (A.nir + A.red))
     - max((B.nir - B.red) / (B.nir + B.red))
     - max((C.nir - C.red) / (C.nir + C.red))
     - max((D.nir - D.red) / (D.nir + D.red))
from A, B, C, D
1 query, distributed to 1,000+ cloud nodes
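The point of this slide is that each dataset can be aggregated where it lives. A minimal Python simulation, in which threads stand in for cloud nodes and each band is a flat list of cell values (both assumptions; `max_ndvi` and `distributed_query` are hypothetical names): every node returns only its local maximum, and a coordinator evaluates the final subtraction.

```python
# Hypothetical simulation of distributing the query over four "nodes".
from concurrent.futures import ThreadPoolExecutor

def max_ndvi(dataset):
    """Local aggregate on one node: max of (nir - red) / (nir + red)."""
    return max((n - r) / (n + r) for n, r in zip(dataset["nir"], dataset["red"]))

def distributed_query(datasets):
    # datasets = [A, B, C, D]; the local maxima are computed in parallel
    with ThreadPoolExecutor() as pool:
        maxima = list(pool.map(max_ndvi, datasets))
    # coordinator combines the scalars: max(A) - max(B) - max(C) - max(D)
    return maxima[0] - sum(maxima[1:])
```

Each node ships back one scalar rather than its tiles, so traffic to the coordinator does not grow with dataset size. The sketch does not guard against cells where nir + red = 0.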
12
Secured Archive Integration
- First-ever direct, ad-hoc mix from protected NASA & ESA services in OGC WCS/WCPS
[Figure: Web client (EarthServer + CobWeb), WCPS]
13
Use Case: Geo Filtering & Processing
- OGC Web Coverage Processing Service (WCPS) = high-level geo raster query language; adopted 2008
- WCPS 2: all grid types
"From MODIS scenes M1, M2, M3: difference between red & nir, as TIFF", ...but only those where nir exceeds 127 somewhere:
for $c in ( M1, M2, M3 )
where some( $c.nir > 127 )
return encode( $c.red - $c.nir, "image/tiff" )
Result: ( tiff A, tiff C )
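The WCPS query iterates over coverages, filters with some(), and returns a cell-wise difference per surviving coverage. A hypothetical Python sketch of those semantics, treating each coverage as a dict of flat band lists and leaving out TIFF encoding (`wcps_filter` is an illustrative name):

```python
# Hypothetical model of:
#   for $c in ( M1, M2, M3 ) where some( $c.nir > 127 )
#   return encode( $c.red - $c.nir, "image/tiff" )

def wcps_filter(coverages):
    results = {}
    for name, c in coverages.items():
        if any(v > 127 for v in c["nir"]):  # some(): true if any cell qualifies
            # induced subtraction, cell by cell
            results[name] = [r - n for r, n in zip(c["red"], c["nir"])]
    return results
```

Coverages that fail the some() test produce no result at all, which is why fewer outputs than inputs can come back.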
14
Next Step: Polygon Clipping
OGC WCS Application Profile – MetOcean [OGC 14-052]
- Weather, ocean data cubes = 4-D x/y/z/t datacubes
- Curtain queries, corridor queries = polygon clipping
- Use case: weather forecast along a flight path
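Polygon clipping amounts to keeping only the grid cells that fall inside a polygon. One common way to decide membership is the even-odd ray-casting test, sketched below in Python for a 2-D grid. It is a generic technique chosen for illustration, not necessarily what rasdaman or the MetOcean profile use; `clip_grid`, `point_in_polygon` and the cell-centre convention (cell (i, j) centred at (i + 0.5, j + 0.5)) are assumptions:

```python
# Hypothetical sketch: clip a 2-D grid to a polygon via even-odd ray casting.

def point_in_polygon(x, y, polygon):
    """Even-odd rule: count edge crossings of a ray cast from (x, y) towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def clip_grid(grid, polygon, nodata=None):
    """Keep cells whose centre lies inside the polygon; others become nodata."""
    return [
        [v if point_in_polygon(i + 0.5, j + 0.5, polygon) else nodata
         for j, v in enumerate(row)]
        for i, row in enumerate(grid)
    ]
```

A corridor query would first buffer the flight path into such a polygon and then clip every time and height slice of the 4-D cube with it.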
15
rasdaman: Practice Proven
- From simple data access to agile analytics
- Strictly based on open OGC Big Geo Data standards
- 130+ TB databases: 2-D, 3-D x/y/z & x/y/t, 4-D x/y/z/t time series
- Single query distributed to 1,000+ cloud nodes
16
A Brief History of Array Databases
17
ISO 9075 Part 15: SQL/MDA
- Resolved by ISO SQL WG in June 2014
- Based on rasdaman concepts & experience
- n-D arrays as attributes
- Declarative array operations
ISO "Science SQL":
create table LandsatScenes(
  id: integer not null,
  acquired: date,
  scene: row( band1: integer, ..., band7: integer ) array [ 0:4999, 0:4999 ]
)
select id, encode( (scene.band1 - scene.band2) / (scene.band1 + scene.band2), 'image/tiff' )
from LandsatScenes
where acquired between '1990-06-01' and '1990-06-30'
  and avg( (scene.band3 - scene.band4) / (scene.band3 + scene.band4) ) > 0
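The select statement mixes a relational filter (on acquired) with array operations (a cell-wise normalized difference and its average). A hypothetical Python model of those semantics, assuming each row is a dict, each band a 2-D list, and `between` inclusive at both ends as in SQL; TIFF encoding is omitted and all function names are illustrative:

```python
# Hypothetical model of the SQL/MDA query on LandsatScenes.
from datetime import date

def normalized_difference(a, b):
    """Cell-wise (a - b) / (a + b) over two equally sized 2-D bands."""
    return [[(x - y) / (x + y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def avg(arr):
    cells = [v for row in arr for v in row]
    return sum(cells) / len(cells)

def science_sql_query(rows, start, end):
    out = []
    for row in rows:
        s = row["scene"]
        in_period = start <= row["acquired"] <= end  # SQL between is inclusive
        if in_period and avg(normalized_difference(s["band3"], s["band4"])) > 0:
            out.append((row["id"], normalized_difference(s["band1"], s["band2"])))
    return out
```

In the sketch, as in the query, scalar columns and array values are filtered together in one predicate, which is the "data/metadata" integration argued for in the conclusion.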
18
Conclusion
- n-D arrays: a major datatype, central to science, engineering, business
  - Massive spatio-temporal sensor, image, simulation, statistics data
- Query language = flexibility + scalability + information integration
- ISO SQL/MDA a game-changer
  - Any question, anytime
  - Overcoming the data/metadata divide
Visit us:
- www.rasdaman.org
- www.earthserver.eu
19
Why Irregular Tiling?
[Figures: OpenStreetMap; Centrella et al., scidacreviews.org]
20
EarthServer: Agile Array Analytics
- 6 lighthouse applications covering Earth & planetary sciences
- Established data centers adding EarthServer technology to their service portfolio
- Summer 2014: 200+ TB operational
- Strictly open standards: OGC WCS & friends; common platform: rasdaman
21
The rasql Query Language
[Figure: rasdaman DB with HDF, PNG, NetCDF on input and output]
Selection & section:
select c[ *:*, 100:200, *:*, 42 ]
from ClimateSimulations as c
Result processing:
select img * (img.green > 130)
from LandsatArchive as img
Search & aggregation:
select mri
from MRI as mri, masks as m
where some_cells( mri > 250 and m )
Data format conversion:
select png( c[ *:*, *:*, 100, 42 ] )
from ClimateSimulations as c
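The section queries combine two kinds of subsetting: a trim such as 100:200 keeps an axis but narrows it, while a slice such as 42 removes the axis from the result. A hypothetical Python sketch of that distinction on nested lists, assuming every axis starts at index 0 (real rasql domains need not); `section` is an illustrative name:

```python
# Hypothetical sketch of rasql-style trimming vs. slicing on nested lists.

def section(arr, spec):
    """spec holds one entry per axis, outermost first:
    None      -> "*:*", keep the whole axis
    (lo, hi)  -> trim to lo..hi inclusive; the axis is kept
    int i     -> slice at i; the axis is dropped from the result
    """
    if not spec:
        return arr
    head, rest = spec[0], spec[1:]
    if isinstance(head, int):
        return section(arr[head], rest)  # slice: descend, one axis fewer
    lo, hi = (0, len(arr) - 1) if head is None else head
    return [section(sub, rest) for sub in arr[lo:hi + 1]]
```

Under these assumptions, c[ *:*, 100:200, *:*, 42 ] corresponds to section(c, [None, (100, 200), None, 42]) and yields a 3-D result, while c[ *:*, *:*, 100, 42 ] yields a 2-D image suitable for PNG encoding.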
22
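Query Rewriting
select avg_cells( a + b ) from a, b
≡
select avg_cells( a ) + avg_cells( b ) from a, b
[Figure: operator trees. Left: avg_cells over the induced +ind of a and b; tiles stream upward (tile stream, high traffic). Right: scalar + over avg_cells( a ) and avg_cells( b ); only scalars stream upward (scalar stream, low traffic).]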
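The equivalence follows from the linearity of the average. A short derivation, assuming (as the induced + requires) that a and b share one domain D of n cells, with a_p denoting the value of a at cell p:

```latex
\operatorname{avg}(a + b)
  = \frac{1}{n} \sum_{p \in D} (a_p + b_p)
  = \frac{1}{n} \sum_{p \in D} a_p + \frac{1}{n} \sum_{p \in D} b_p
  = \operatorname{avg}(a) + \operatorname{avg}(b)
```

The right-hand form is cheaper because each avg_cells can be evaluated where its array's tiles are stored, so only two scalars travel to the final +. The argument does not carry over to every condenser: max_cells( a + b ) is in general not equal to max_cells( a ) + max_cells( b ).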
23
Human Brain Imaging
Research goal: to understand structural-functional relations in the human brain
- Experiments capture activity patterns (PET, fMRI)
- Temperature, electrical, oxygen consumption, ...
- Lots of computations: "activation maps"
Example: "a parasagittal view of all scans containing critical Hippocampus activations, TIFF-coded."
select tiff( ht[ $1, *:*, *:* ] )
from HeadTomograms as ht, Hippocampus as mask
where count_cells( ht > $2 and mask ) / count_cells( mask ) > $3
with $1 = slicing position, $2 = intensity threshold value, $3 = confidence
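The where clause keeps a scan when the fraction of Hippocampus-mask cells above the intensity threshold exceeds the confidence value. A minimal Python sketch of that predicate, shown in 2-D for brevity (the tomograms themselves are 3-D); `count_cells` and `qualifies` are hypothetical Python stand-ins, not rasdaman functions:

```python
# Hypothetical model of:
#   where count_cells( ht > $2 and mask ) / count_cells( mask ) > $3

def count_cells(arr):
    """Number of true cells in a boolean 2-D array."""
    return sum(1 for row in arr for v in row if v)

def qualifies(scan, mask, threshold, confidence):
    # induced "ht > $2 and mask": true where intensity is high inside the mask
    hits = [[v > threshold and m for v, m in zip(rs, rm)] for rs, rm in zip(scan, mask)]
    return count_cells(hits) / count_cells(mask) > confidence
```

Normalizing by count_cells( mask ) makes the confidence value $3 a fraction of the Hippocampus region, independent of how large the mask is.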