BARRODALE COMPUTING SERVICES LTD. www.barrodale.com Spatial Data Activities at the Reading e-Science Centre Adit Santokhee, Jon Blower, Keith Haines Reading.

Slides:



Advertisements
Similar presentations
EURO4M Project Kick-Off, April 2010 OGC Web Services Data visualization using OGC web services Maarten Plieger Wim Som de Cerff Royal Netherlands Meteorological.
Advertisements

Netlobs Manipulating Gridded Data in a Relational World Neil Stamps Technical Architect.
BADC Workshop 1: Data & Services from the BADC Royal Met. Soc. Conference – 12 September 2005 Kevin Marsh et al.
BARRODALE COMPUTING SERVICES LTD. Managing and serving large volumes of gridded spatial environmental data Adit Santokhee, Chunlei Liu,
GI Systems and Science January 30, Points to Cover  Recap of what we covered so far  A concept of database Database Management System (DBMS) 
ArcGIS Geodatabase Miles Logsdon Spatial Information Technologies, UW Garry Trudeau - Doonesbury.
Raster Data in ArcSDE 8.2 Why Put Images in a Database? What are Basic Raster Concepts? How Raster data stored in Database?
Dynamic Quick View, interoperability and the future Jon Blower, Keith Haines, Chunlei Liu, Alastair Gemmell Environmental Systems Science Centre University.
The MashMyData project Combining and comparing environmental science data on the web Alastair Gemmell 1, Jon Blower 1, Keith Haines 1, Stephen Pascoe 2,
An Internet Tool For Forecasting Land Use Change And Land Degradation In The Mediterranean Region Richard Kingston & Andy Turner University of Leeds UK.
Exploring large marine datasets using an interactive website and Google Earth Jon Blower, Dan Bretherton, Keith Haines, Chunlei Liu, Adit Santokhee Reading.
Printed by STORING AND MANIPULATING GRIDDED DATA IN SPATIALLY-ENABLED DATABASES Adit Santokhee, Jon Blower, Keith Haines Reading.
The International Surface Pressure Databank (ISPD) and Twentieth Century Reanalysis at NCAR Thomas Cram - NCAR, Boulder, CO Gilbert Compo & Chesley McColl.
Christine White, Esri Growing OPeNDAP Support: Current ArcGIS Workflows and Future Directions Christine White, Esri
Активное распределенное хранилище для многомерных массивов Дмитрий Медведев ИКИ РАН.
TPAC Digital Library Talk Overview Presenter:Glenn Hyland Tasmanian Partnership for Advanced Computing & Australian Antarctic Division Outline: TPAC Overview.
Unidata TDS Workshop THREDDS Data Server Overview October 2014.
Introduction Downloading and sifting through large volumes of data stored in differing formats can be a time-consuming and sometimes frustrating process.
The use of standard OGC web services in integrating distributed model, satellite and in-situ datasets Alastair Gemmell Jon Blower Keith Haines Environmental.
EGU 2011 TIGGE, TIGGE LAM and the GIFS T. Paccagnella (1), D. Richardson (2), D. Schuster(3), R. Swinbank (4), Z. Toth (3), S.
The importance of locality in the visualization of large datasets John Brooke 1, James Marsh 1, Steve Pettipher 1, Lakshmi Sastry 2 1 The University of.
 Introduction Introduction  Purpose of Database SystemsPurpose of Database Systems  Levels of Abstraction Levels of Abstraction  Instances and Schemas.
GADS: A Web Service for accessing large environmental data sets Jon Blower, Keith Haines, Adit Santokhee Reading e-Science Centre University of Reading.
DISTRIBUTED DATA FLOW WEB-SERVICES FOR ACCESSING AND PROCESSING OF BIG DATA SETS IN EARTH SCIENCES A.A. Poyda 1, M.N. Zhizhin 1, D.P. Medvedev 2, D.Y.
Unidata’s TDS Workshop TDS Overview – Part II October 2012.
Rendering Adaptive Resolution Data Models Daniel Bolan Abstract For the past several years, a model for large datasets has been developed and extended.
WSRF Supported Data Access Service (VO-DAS)‏ Chao Liu, Haijun Tian, Dan Gao, Yang Yang, Yong Lu China-VO National Astronomical Observatories, CAS, China.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
Unidata TDS Workshop TDS Overview – Part I XX-XX October 2014.
THREDDS Data Server Ethan Davis GEOSS Climate Workshop 23 September 2011.
1 Foundations V: Infrastructure and Architecture, Middleware Deborah McGuinness TA Weijing Chen Semantic eScience Week 10, November 7, 2011.
Animation of DSM2 Outputs in ArcMap Siqing Liu Bay Delta Office Department of Water Resources 2/17/2015.
Slide 1 TIGGE phase1: Experience with exchanging large amount of NWP data in near real-time Baudouin Raoult Data and Services Section ECMWF.
DELIVERING ENVIRONMENTAL WEB SERVICES (DEWS) Partners: UK Met Office (Lead Partner), British Atmospheric Data Centre (BADC), British Maritime Technology.
Accomplishments and Remaining Challenges: THREDDS Data Server and Common Data Model Ethan Davis Unidata Policy Committee Meeting May 2011.
ILDG Middleware Status Chip Watson ILDG-6 Workshop May 12, 2005.
Integrated Grid workflow for mesoscale weather modeling and visualization Zhizhin, M., A. Polyakov, D. Medvedev, A. Poyda, S. Berezin Space Research Institute.
DAP4 James Gallagher & Ethan Davis OPeNDAP and Unidata.
Unidata TDS Workshop THREDDS Data Server Overview
Composing workflows in the environmental sciences using Web Services and Inferno Jon Blower, Adit Santokhee, Keith Haines Reading e-Science Centre Roger.
Recent developments with the THREDDS Data Server (TDS) and related Tools: covering TDS, NCML, WCS, forecast aggregation and not including stuff covered.
Requests from F. Blanc Retrieve information from each individual national reports and present this. –2. Data serving –The HYCOM products are freely available.
Data Discovery and Access to The International Surface Pressure Databank (ISPD) 1 Thomas Cram Gilbert P. Compo* Doug Schuster Chesley McColl* Steven Worley.
The CERA2 Data Base Data input – Data output Hans Luthardt Model & Data/MPI-M, Hamburg Services and Facilities of DKRZ and Model & Data Hamburg,
THREDDS Catalogs Ethan Davis UCAR/Unidata NASA ESDSWG Standards Process Group meeting, 17 July 2007.
NetCDF file generated from ASDC CERES SSF Subsetter ATMOSPHERIC SCIENCE DATA CENTER Conversion of Archived HDF Satellite Level 2 Swath Data Products to.
GEON2 and OpenEarth Framework (OEF) Bradley Wallet School of Geology and Geophysics, University of Oklahoma
The PHysics Analysis SERver Project (PHASER) CHEP 2000 Padova, Italy February 7-11, 2000 M. Bowen, G. Landsberg, and R. Partridge* Brown University.
AUKEGGSWorkshop ANU, Canberra, 29 November 2006 Implementing CSML Feature Types in applications within the NERC DataGrid Dominic Lowe, British Atmospheric.
CCGrid, 2012 Supporting User Defined Subsetting and Aggregation over Parallel NetCDF Datasets Yu Su and Gagan Agrawal Department of Computer Science and.
The Global Land Cover Facility is sponsored by NASA and the University of Maryland.The GLCF is a founding member of the Federation of Earth Science Information.
00/XXXX 1 Data Processing in PRISM Introduction. COCO (CDMS Overloaded for CF Objects) What is it. Why is COCO written in Python. Implementation Data Operations.
SCD Research Data Archives; Availability Through the CDP About 500 distinct datasets, 12 TB Diverse in type, size, and format Serving 900 different investigators.
Using Google Maps and other OpenSource GIS software for displaying geospatial data Jon Blower, Dan Bretherton, Keith Haines, Chunlei Liu, Adit Santokhee.
ERDDAP The Next Generation of Data Servers Bob Simons DOC / NOAA / NMFS / SWFSC / ERD Monterey, CA Disclaimer: The opinions expressed.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
UC 2006 Tech Session 1 NetCDF in ArcGIS 9.2. UC 2006 Tech Session2 Overview Introduction to Multidimensional DataIntroduction to Multidimensional Data.
British Atmospheric Data Centre ( Searching: Whither NDG? Bryan Lawrence.
Grid Remote Execution of Large Climate Models (NERC Cluster Grid) Dan Bretherton, Jon Blower and Keith Haines Reading e-Science Centre
Data Discovery and Access to The International Surface Pressure Databank (ISPD) 1 Thomas Cram Gilbert P. Compo* Doug Schuster Chesley McColl* Steven Worley.
1 2.5 DISTRIBUTED DATA INTEGRATION WTF-CEOP (WGISS Test Facility for CEOP) May 2007 Yonsook Enloe (NASA/SGT) Chris Lynnes (NASA)
NcBrowse: A Graphical netCDF File Browser Donald Denbo NOAA-PMEL/UW-JISAO
1. Gridded Data Sub-setting Services through the RDA at NCAR Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak.
Reading e-Science Centre Technical Director Jon Blower ESSC Director Rachel Harrison CS Director Keith Haines ESSC Associated Personnel External Collaborations.
DELIVERING ENVIRONMENTAL WEB SERVICES (DEWS)
Spatial Data Activities at the Reading e-Science Centre
MERRA Data Access and Services
TIGGE Archives and Access
Storage Structure and Efficient File Access
ECMWF usage, governance and perspectives
Presentation transcript:

BARRODALE COMPUTING SERVICES LTD. Spatial Data Activities at the Reading e-Science Centre Adit Santokhee, Jon Blower, Keith Haines Reading e-Science Centre University of Reading

BARRODALE COMPUTING SERVICES LTD. Background  At Reading we hold copies of various datasets (~2TB) –Mainly from models of oceans and atmosphere –Also some observational data (e.g. satellite data) –From Met Office, SOC, ECMWF, more  Most of these datasets are in the form of files –Datasets are in a variety of formats (e.g. NetCDF, GRIB, HDF5) –Large 4D spatio-temporal grids –Contain data about many variables (e.g. temperature, salinity etc) –Data are discretised on a number of different grids (e.g. standard lat-lon grids of different resolutions, rotated grids)

BARRODALE COMPUTING SERVICES LTD. Background (2)  Hence development of GADS (Grid Access Data Service) –Developed as part of GODIVA project (Grid for Ocean Diagnostics, Interactive Visualisation and Analysis NERC e-Science pilot project) –Originally developed by Woolf et al (2003)  Database systems now include capability for storing geospatial data –ReSC has also been evaluating Informix with Grid DataBlade solution –Investigating whether data can be managed and served more efficiently by being stored in DB  Re-engineering GADS as OGC-compliant Web Coverage Server (as part of the DEWS project)

BARRODALE COMPUTING SERVICES LTD. Grid Access Data Service  User’s don’t need to know anything about storage details  Can expose data with conventional names without changing data files  Users can choose their preferred data format, irrespective of how data are stored  Behaves as aggregation server –Delivers single file, even if original data spanned several files  Deployed as a Web Service –Can be called from any platform/language –Can be called programmatically (easily incorporated into larger systems), workflows –Java / Apache Axis / Tomcat  Doesn’t support “advanced” features like interpolation, re- projection, etc  However, it is not standards-compliant –Hence moving to WCS in DEWS

BARRODALE COMPUTING SERVICES LTD. GODIVA Web Portal ( Allows users to interactively select data for download using a GUI Users can create movies on the fly cf. Live Access Server

BARRODALE COMPUTING SERVICES LTD. The Grid DataBlade  Plug-in for the IBM Informix database (also version for PostgreSQL)  Written, supplied and supported by Barrodale Computing Services  Stores gridded data and metadata in an object-relational DBMS –Any data on a regular grid, not just met-ocean –Stores grids using a tiling scheme in conjunction with Smart BLOBS –Manages own (low-level) metadata automatically  Provides functions to load data directly from GIEF (Grid Import Export Format) file format (netCDF) –Metadata is automatically read from the GIEF file  Provides functions to extract data from the grids –Extractions can be sliced, subsetted or at oblique angles to the original axes

BARRODALE COMPUTING SERVICES LTD. Main features of the Grid DataBlade  Can store: –1D: timeseries, vectors –2D: raster images, arrays –3D: spatial volumes, images at different times –4D: volumes at different times –5D: 4D grids with a set of variables at each 4D point  Provides interpolation options using N-Linear, nearest- neighbour or user-supplied interpolation schemes  Provides C, Java and SQL APIs

BARRODALE COMPUTING SERVICES LTD. Example use of Grid DataBlade  extract data along a path e.g. along a ship track that involves many “legs”. The DataBlade automatically does interpolation along the path. Using North Atlantic FOAM Data

BARRODALE COMPUTING SERVICES LTD. Testing: notes  We used the UK Met Office operational North Atlantic marine forecast dataset, which has a total size of 100 GB –The data are stored under GADS as a set of NetCDF files and another copy is held in the Informix database (1 NetCDF file per time step in GADS) –Data ordering is (time, depth, latitude, longitude)  GADS is just an interface: we are really comparing the DataBlade with the latest version of the Java NetCDF API  We also tested our OPeNDAP aggregation server –Much slower than both GADS and DataBlade –Probably because it was based on earlier NetCDF API –Would expect performance to be close to that of GADS with same API. –Detailed results not reported here  Reported times include time to parse query, search metadata, extract data and produce data product

BARRODALE COMPUTING SERVICES LTD. Preliminary tests results Shape of extracted data : 50 * increasing depth * 50 * 50 DataBlade is faster than GADS for small data extraction GADS : 50 FILES DataBlade : Single SBLOB

BARRODALE COMPUTING SERVICES LTD. Preliminary tests results Shape of extracted data : 50 * increasing depth * 200 * 200 DataBlade’s performance declines around 40 MB point GADS : 50 FILES DataBlade : Single SBLOB

BARRODALE COMPUTING SERVICES LTD. Preliminary tests results GADS : Max 360 FILES DataBlade : Max 12 SBLOBs (1 blob = 30 Files) Shape of extracted data : fixed depth level but different lat-lon space DataBlade’s performance declines around 40 MB point (120 files)

BARRODALE COMPUTING SERVICES LTD. Preliminary tests results Shape for 360 Files : 360 * 1 * lat * lon Shape for 30 Files : 30 *10 * lat * lon

BARRODALE COMPUTING SERVICES LTD. Conclusions (1)  We found that in general, for extracted data volumes below 10MB, the database outperformed GADS  Above 40MB, GADS was generally found to be capable of the fastest extractions  The performance of the DataBlade decreased dramatically when attempting to extract more than 100MB of data in a single query  With DataBlade, extracting large amounts of data from a single blob is also faster than extracting similar sized data from multiple blobs but the difference is small for small areas –Similarly with GADS the extraction time increases with the number of source files involved

BARRODALE COMPUTING SERVICES LTD. Conclusions (2)  The Grid DataBlade is optimised to support its entire feature set; in particular, it is optimised to retrieve relatively small (a few tens of megabytes) of data rapidly in the case where multiple users are querying the database simultaneously –Multiple simultaneous connections not yet tested  Data archives (where users typically download large amounts of data at a time) most efficiently implemented using something like GADS  Data browse tools (where one needs to quickly generate pictures of small subsets of data in different projections) best implemented using something like the DataBlade

BARRODALE COMPUTING SERVICES LTD. Outstanding Issues As it stands, the WCS specification is probably inadequate for our needs:  We will probably output in CF-NetCDF even though this isn't (yet) in the specification  Delivery of large amounts of data (perhaps URL-based delivery)  the WCS spec assumes that the data are delivered immediately after the request, but this won't always be possible  How to handle rotations and "query by value“ in WCS ?  We have started to talk to OGSA-DAI but have not yet decided how much they can help us. Maybe OGSA-DAI should think about including a WCS implementation for spatial data ?