Science SQL :: EGI-GEANT :: © 2014 P. Baumann EGI-GEANT Symposium 2014-sep-25, CWI, Amsterdam, The Netherlands Peter Baumann Jacobs University | rasdaman.


1 Science SQL
Science SQL :: EGI-GEANT :: © 2014 P. Baumann
EGI-GEANT Symposium, 2014-sep-25, CWI, Amsterdam, The Netherlands
Peter Baumann, Jacobs University | rasdaman GmbH, baumann@rasdaman.com
Work in part funded by EU FP7 EarthServer, PublicaMundi

2 Structural Variety in Big Data
• Stock trading: 1-D sequences (i.e., arrays)
• Social networks: large, homogeneous graphs
• Ontologies: small, heterogeneous graphs
• Climate modelling: 4D/5D arrays
• Satellite imagery: 2D/3D arrays (+irregularity)
• Genome: long string arrays
• Particle physics: sets of events
• Bio taxonomies: hierarchies (such as XML)
• Documents: key/value stores = sets of unique identifiers + whatever
• etc.

3 Who Has Array Data?
• Spatio-temporal sensor, image, model, & statistics data
  - Life Science: Pharma/chem, healthcare / bio research, bio statistics, genetics, ...
  - Geo: Geodesy, geology, hydrology, oceanography, meteorology, earth system, ...
  - Engineering & research: Simulation & experimental data in automotive/shipbuilding/aerospace industry, turbines, process industry, astronomy, high energy physics, ...
  - Management/Controlling: Decision Support, OLAP, Data Warehousing, census, statistics in industry and public administration, ...
  - Multimedia: e-learning, distance learning, prepress, ...
• "80% of all data have some spatial connotation" [C&P Hane, 1992]

4 Arrays in [Geo] Science & Engineering
• Spatio-temporal sensor, image, simulation, statistics data(cubes)
[Figure labels: sensor feeds [OGC SWE], simulation data, Big Data server]

5 rasdaman: Agile Array Analytics
• "raster data manager": SQL + n-D raster objects
• Scalable parallel "tile streaming" architecture
• In operational use
  - OGC Web Coverage Service Core Reference Implementation
• Example query:
    select img.green[x0:x1,y0:y1] > 130
    from LandsatArchive as img
    where avg_cells( img.nir ) < 17
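Read procedurally, the query on this slide keeps only images whose mean near-infrared value is below 17, trims the green band to a window, and returns a Boolean mask of cells brighter than 130. The sketch below mimics those semantics in plain Python. It is not rasdaman code: the nested lists, the dictionary layout, and the helper names `avg_cells` and `green_mask` are illustrative assumptions, and the window bounds are treated as inclusive.

```python
# Illustrative only: nested Python lists stand in for rasdaman arrays.

def avg_cells(band):
    """Mean over all cells of a 2-D band (mirrors the name used in the query)."""
    cells = [v for row in band for v in row]
    return sum(cells) / len(cells)

def green_mask(archive, x0, x1, y0, y1, threshold=130, max_avg_nir=17):
    """For every image whose mean nir is below max_avg_nir, return a Boolean
    mask of the green band over the inclusive window [x0:x1, y0:y1]."""
    results = []
    for img in archive:
        if avg_cells(img["nir"]) < max_avg_nir:            # the "where" clause
            window = [row[y0:y1 + 1] for row in img["green"][x0:x1 + 1]]
            results.append([[v > threshold for v in row] for row in window])
    return results

# Hypothetical two-image archive, 2x2 cells per band.
landsat_archive = [
    {"green": [[120, 140], [131, 90]], "nir": [[10, 12], [14, 16]]},    # mean nir 13: kept
    {"green": [[200, 200], [200, 200]], "nir": [[30, 30], [30, 30]]},   # mean nir 30: dropped
]
masks = green_mask(landsat_archive, 0, 1, 0, 1)
```

The point of the declarative form is that the server, not the client, decides how to evaluate the filter and the trimming.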

6 Adaptive Tiling
• Sample tiling strategies [Furtado]:
  - regular
  - directional
  - area of interest
• rasdaman storage layout language:
    insert into MyCollection values ...
    tiling area of interest [0:20,0:40], [45:80,80:85]
    tile size 1000000
    index d_index
    storage array compression zlib
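To make the tiling idea concrete, the following sketch partitions a 2-D domain with the simplest strategy named above, regular tiling, and then counts which tiles a query window touches. It is an assumption-laden illustration, not rasdaman's implementation: `regular_tiles` and `tiles_hit` are hypothetical helpers, bounds are inclusive, and the tile size here is given in cells per axis rather than as the single size value used in the slide's layout statement.

```python
from itertools import product

def regular_tiles(domain, tile_shape):
    """Split an n-D domain into equally sized tiles.
    domain: inclusive (lo, hi) bounds per axis; tile_shape: cells per axis.
    Returns a list of tiles, each a list of inclusive (lo, hi) bounds."""
    per_axis = []
    for (lo, hi), edge in zip(domain, tile_shape):
        per_axis.append([(start, min(start + edge - 1, hi))
                         for start in range(lo, hi + 1, edge)])
    return [list(t) for t in product(*per_axis)]

def tiles_hit(tiles, query):
    """Tiles whose extent overlaps the inclusive query box on every axis."""
    return [t for t in tiles
            if all(tlo <= qhi and qlo <= thi
                   for (tlo, thi), (qlo, qhi) in zip(t, query))]

tiles = regular_tiles([(0, 99), (0, 99)], (50, 50))    # four 50x50 tiles
first_area = tiles_hit(tiles, [(0, 20), (0, 40)])      # fits inside one tile
second_area = tiles_hit(tiles, [(45, 80), (80, 85)])   # straddles a tile border
```

With this regular grid the second area of interest from the slide touches two tiles although it is small. Cases like this are presumably what a layout aligned to the areas of interest is meant to avoid.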

7 Value-Added Satellite Image Archive

8 Inset: Hadoop Not the Answer to All
• No built-in knowledge about structured data types
  - "Since it was not originally designed to leverage the structure […] its performance […] is therefore suboptimal" [Daniel Abadi]
  - M. Stonebraker (XLDB 2012): "will hit a scalability wall"

9 Sample Application: Database Visualization
    select encode(
      struct {
        red:   (char) s.image.b7[x0:x1,x0:x1],
        green: (char) s.image.b5[x0:x1,x0:x1],
        blue:  (char) s.image.b0[x0:x1,x0:x1],
        alpha: (char) scale( d.elev, 20 )
      },
      "image/png" )
    from SatImage as s, DEM as d
[JacobsU, Fraunhofer; data courtesy BGS, ESA]

10 Use Case: Plymouth Marine Laboratory
• "Avg chlorophyll concentration for given area & time period, from x/y/t cube"
  - 10, 60, 120, 240 days
• Conclusions:
  - "we must minimise data transfer as well as [client] processing"
  - "standards such as WCPS provide the greatest benefit" [Oliver Clements, EGU 2014]
[Figure label: rasdaman]

11 Parallel / Distributed Query Processing
    select max((A.nir - A.red) / (A.nir + A.red))
         - max((B.nir - B.red) / (B.nir + B.red))
         - max((C.nir - C.red) / (C.nir + C.red))
         - max((D.nir - D.red) / (D.nir + D.red))
    from A, B, C, D
• 1 query distributed to 1,000+ cloud nodes
[Figure labels: Dataset A, Dataset B, Dataset C, Dataset D]
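The query on this slide decomposes naturally: each `max(...)` term depends on one dataset only, so the four terms can be evaluated independently and just four scalars need to be combined. The sketch below illustrates that split with a thread pool standing in for separate nodes. This is an assumption for illustration, not a description of how rasdaman schedules work. The dataset contents and the helper `max_ndvi` are hypothetical, and bands are flattened to 1-D lists for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

def max_ndvi(dataset):
    """Largest (nir - red) / (nir + red) over all cells of one dataset.
    Assumes nir + red is never zero in the sample data."""
    return max((n - r) / (n + r)
               for n, r in zip(dataset["nir"], dataset["red"]))

# Hypothetical datasets; values chosen so every ratio is exact in binary floating point.
A = {"nir": [3, 1], "red": [1, 1]}   # max ratio 0.5
B = {"nir": [5, 1], "red": [3, 1]}   # max ratio 0.25
C = {"nir": [9, 1], "red": [7, 1]}   # max ratio 0.125
D = {"nir": [1, 1], "red": [1, 1]}   # max ratio 0.0

# Each worker reduces one dataset to a single scalar.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial = list(pool.map(max_ndvi, [A, B, C, D]))

# Only the four scalars are combined centrally.
result = partial[0] - partial[1] - partial[2] - partial[3]
```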

12 Secured Archive Integration
First-ever direct, ad-hoc mix from protected NASA & ESA services in OGC WCS/WCPS Web client (EarthServer + CobWeb)
[Figure label: WCPS]

13 Use Case: Geo Filtering & Processing
• OGC Web Coverage Processing Service (WCPS) = high-level geo raster query language; adopted 2008
  - WCPS 2: all grid types
• "From MODIS scenes M1, M2, M3: difference between red & nir, as TIFF" ... but only those where nir exceeds 127 somewhere:
    for $c in ( M1, M2, M3 )
    where some( $c.nir > 127 )
    return encode( $c.red - $c.nir, "image/tiff" )
  Result: ( tiff A, tiff C )
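The WCPS request above combines a per-coverage filter (`some`) with a cell-wise expression. A rough Python analogue, under the assumption that each scene is a dictionary of equally sized 2-D bands, might look as follows. The TIFF encoding step is left out, and `some` and `red_minus_nir` are illustrative helpers, not WCPS or rasdaman APIs.

```python
def some(mask):
    """True if at least one cell of a 2-D Boolean mask is true."""
    return any(v for row in mask for v in row)

def red_minus_nir(scenes):
    """For each scene where nir exceeds 127 somewhere, compute red - nir per cell."""
    out = {}
    for name, c in scenes.items():
        if some([[v > 127 for v in row] for row in c["nir"]]):     # "where" clause
            out[name] = [[r - n for r, n in zip(red_row, nir_row)]  # "return" expression
                         for red_row, nir_row in zip(c["red"], c["nir"])]
    return out

# Hypothetical 1x2 scenes: M2 never exceeds the nir threshold.
scenes = {
    "M1": {"red": [[100, 50]], "nir": [[200, 40]]},
    "M2": {"red": [[100, 50]], "nir": [[100, 40]]},
    "M3": {"red": [[10, 20]], "nir": [[128, 0]]},
}
differences = red_minus_nir(scenes)
```

Here two of the three scenes pass the filter, so two result arrays come back.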

14 Next Step: Polygon Clipping
• OGC WCS Application Profile – MetOcean [OGC 14-052]
• Weather, ocean data cubes = 4D x/y/z/t datacubes
  - curtain queries, corridor queries = polygon clipping
  - use case: weather forecast along flight path

15 rasdaman: Practice Proven
• From simple data access to agile analytics
  - strictly based on open OGC Big Geo Data standards
• 130+ TB databases, 2D, 3D x/y/z & x/y/t, 4D x/y/z/t timeseries
• Single query distributed to 1,000+ cloud nodes

16 A Brief History of Array Databases

17 ISO "Science SQL"
• ISO 9075 Part 15: SQL/MDA
  - resolved by ISO SQL WG in June 2014
  - based on rasdaman concepts & experience
• n-D arrays as attributes:
    create table LandsatScenes(
      id: integer not null,
      acquired: date,
      scene: row( band1: integer, ..., band7: integer ) array [ 0:4999, 0:4999 ] )
• Declarative array operations:
    select id, encode( (scene.band1-scene.band2)/(scene.band1+scene.band2), "image/tiff" )
    from LandsatScenes
    where acquired between "1990-06-01" and "1990-06-30"
      and avg( (scene.band3-scene.band4)/(scene.band3+scene.band4) ) > 0

18 Conclusion
• n-D arrays are a major datatype, central to science, engineering, business
  - Massive spatio-temporal sensor, image, simulation, statistics data
• Query language = flexibility + scalability + information integration
• ISO SQL/MDA a game-changer
  - Any question, anytime
  - Overcoming data/metadata divide
• Visit us:
  - www.rasdaman.org
  - www.earthserver.eu

19 Why Irregular Tiling?
[Figure credits: OpenStreetMap; Centrella et al: scidacreviews.org]

20 EarthServer: Agile Array Analytics
• 6 Lighthouse Applications covering Earth & Planetary Sciences
  - Established data centers adding EarthServer technology to service portfolio
• Summer 2014: 200+ TB operational
• Strictly open standards: OGC WCS & friends; common platform: rasdaman

21 The rasql Query Language
• Selection & section:
    select c[ *:*, 100:200, *:*, 42 ] from ClimateSimulations as c
• Result processing:
    select img * (img.green > 130) from LandsatArchive as img
• Search & aggregation:
    select mri from MRI as mri, masks as m where some_cells( mri > 250 and m )
• Data format conversion:
    select png( c[ *:*, *:*, 100, 42 ] ) from ClimateSimulations as c
[Figure labels: rasdaman DB; HDF, PNG, NetCDF]

22 Query Rewriting
    select avg_cells( a + b ) from a, b
  ≡
    select avg_cells( a ) + avg_cells( b ) from a, b
• First form: tile stream, high traffic
• Second form: scalar stream, low traffic
[Figure: operator trees of the two equivalent forms]
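The equivalence on this slide follows from the linearity of the mean: for two arrays over the same domain, averaging the cell-wise sum gives the same value as summing the two averages. The small check below demonstrates this on sample data. The arrays and the helper names are illustrative assumptions, and the values are chosen so that floating-point results are exact.

```python
def avg_cells(arr):
    """Mean over all cells of a 2-D array."""
    cells = [v for row in arr for v in row]
    return sum(cells) / len(cells)

def add(a, b):
    """Cell-wise sum of two 2-D arrays with identical shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]

# Original form: the full arrays must be brought together before averaging.
combined_first = avg_cells(add(a, b))

# Rewritten form: each array is reduced where it lives; only two scalars are added.
reduced_first = avg_cells(a) + avg_cells(b)
```

If `a` and `b` reside on different nodes, the rewritten form ships two numbers instead of every tile of one array, which matches the "tile stream" versus "scalar stream" contrast on the slide.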

23 Human Brain Imaging
• Research goal: to understand structural-functional relations in human brain
• Experiments capture activity patterns (PET, fMRI)
  - Temperature, electrical, oxygen consumption, ...
  - lots of computations, yielding "activation maps"
• Example: "a parasagittal view of all scans containing critical Hippocampus activations, TIFF-coded."
    select tiff( ht[ $1, *:*, *:* ] )
    from HeadTomograms as ht, Hippocampus as mask
    where count_cells( ht > $2 and mask ) / count_cells( mask ) > $3
  $1 = slicing position, $2 = intensity threshold value, $3 = confidence
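The selection criterion in this query is a ratio: the number of cells inside the Hippocampus mask whose intensity exceeds the threshold, divided by the total number of mask cells. A scan qualifies when that ratio exceeds the confidence value, and the result is one slice of it. The Python sketch below mirrors this logic on tiny 3-D nested lists. It assumes a single mask shared by all scans, omits the TIFF encoding, and uses the hypothetical helper `critical_slices`.

```python
def count_cells(volume):
    """Number of true cells in a 3-D Boolean volume."""
    return sum(1 for plane in volume for row in plane for v in row if v)

def critical_slices(tomograms, mask, position, threshold, confidence):
    """Return slice `position` (along the first axis) of every scan in which the
    fraction of mask cells brighter than `threshold` exceeds `confidence`."""
    selected = []
    for ht in tomograms:
        # Cell-wise "ht > threshold and mask"
        active = [[[v > threshold and m for v, m in zip(ht_row, mask_row)]
                   for ht_row, mask_row in zip(ht_plane, mask_plane)]
                  for ht_plane, mask_plane in zip(ht, mask)]
        if count_cells(active) / count_cells(mask) > confidence:
            selected.append(ht[position])
    return selected

# Hypothetical 2x2x2 data: the mask covers four cells.
mask = [[[True, True], [False, False]], [[True, True], [False, False]]]
scan_a = [[[300, 300], [0, 999]], [[300, 10], [0, 0]]]   # 3 of 4 mask cells active
scan_b = [[[300, 0], [0, 0]], [[0, 0], [0, 0]]]          # 1 of 4 mask cells active

views = critical_slices([scan_a, scan_b], mask, position=0, threshold=250, confidence=0.5)
```

Note that the bright cell outside the mask in `scan_a` (value 999) does not count toward the ratio.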

