STORING AND MANIPULATING GRIDDED DATA IN SPATIALLY-ENABLED DATABASES
Adit Santokhee, Jon Blower, Keith Haines
Reading e-Science Centre, Environmental Systems Science Centre

Introduction

Modern computer simulations and satellite observations of the oceans and atmosphere produce terabyte-scale volumes of data. Data providers, such as the Met Office and the European Centre for Medium-Range Weather Forecasts, need a manageable system for storing these datasets, whilst enabling the many consumers of the data to access them in a convenient and secure manner. Typically, these datasets are stored in flat files (often compressed), and each institution tends to store data in its own format (e.g., NetCDF, HDF, GRIB), with the data discretized on a variety of grids.

End-users of the data (which include research institutions, government agencies and private industry) should not have to know the details of how the data are stored. They require a flexible means of accessing data and downloading them in the form they prefer. A typical query might involve extracting a subset of data from multiple source files, followed by interpolation, aggregation and re-projection onto a new grid.

There is increasing justification for using database management systems (DBMSs) to store and manipulate gridded data. The principal advantages of such databases are data integrity, consistency, flexibility and effective access to data by diverse users of multiple applications. Implementing an efficient DBMS for large quantities of gridded data is very challenging.

Barrodale Computing Services Ltd. (BCS) have recently developed a software module, the BCS Grid DataBlade, that plugs into the IBM/Informix Dynamic Server 9.x (IDS) DBMS for storage of gridded data and efficient retrieval of data products. The Reading e-Science Centre are evaluating this system on behalf of the environmental science community.
Features of the Grid DataBlade

• Processes queries on the database server, thereby minimizing the amount of network input/output and client-side CPU time required
• Extracts data products up to times faster than previous technology
• Handles 1D, 2D, 3D and 4D grids
• Stores grids using a tiling scheme in conjunction with Smart BLOBs, with user control over the tile size. This allows very efficient generation of data products that involve only a small portion of the data
• Stores the data in, and converts it between, more than 40 different planar mapping projections supported by the IBM/Informix Spatial DataBlade
• Supports irregularly spaced grids in any or all of the grid dimensions
• Handles the presence of multiple vector and/or scalar values at each grid point
• Provides interpolation options using N-linear, nearest-neighbour or user-supplied interpolation schemes
• Extracts data at any angle through the 4D volume
• Uses NetCDF as its native import/export format, with conventions defined in the Grid Import-Export Format (GIEF)
• Provides application programming interfaces for C, Java and SQL

Progress Made So Far

We have successfully used the Grid DataBlade to store about 12 GB of Forecasting Ocean Assimilation Model (FOAM) data (temperature and salinity) in an Informix database. We then tested the functionality of the Grid DataBlade: extracting data, updating a grid, generating temperature time series from data held in multiple grids, and exporting data to files or for visualisation. These experiments were carried out using programs written in SQL, Java and the native interfaces offered by the DataBlade and Informix APIs.
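The tiling scheme listed among the features is what makes small extractions cheap: only the tiles that overlap a request need to be read. The Python sketch below illustrates that idea in isolation. It is not the DataBlade's implementation and says nothing about how Smart BLOBs are laid out; the function names, grid shape and tile shape are all hypothetical.

```python
from itertools import product


def tiles_for_box(tile_shape, lo, hi):
    """Return the indices of all tiles overlapping the half-open index box [lo, hi)."""
    ranges = []
    for size, start, stop in zip(tile_shape, lo, hi):
        if stop <= start:
            return []  # empty request along this dimension
        first = start // size
        last = (stop - 1) // size
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


def tile_count(grid_shape, tile_shape):
    """Total number of tiles needed to cover the whole grid."""
    total = 1
    for extent, size in zip(grid_shape, tile_shape):
        total *= -(-extent // size)  # ceiling division
    return total


# Hypothetical 4D grid ordered (time, depth, lat, lon) and a hypothetical tile shape.
grid_shape = (26, 20, 400, 720)
tile_shape = (1, 20, 50, 60)

# A full-depth time series at a single lat/lon point.
needed = tiles_for_box(tile_shape, lo=(0, 0, 120, 300), hi=(26, 20, 121, 301))
print(len(needed), "of", tile_count(grid_shape, tile_shape), "tiles touched")
```

In this made-up layout, a point time series touches one tile per time step and leaves the rest of the grid unread. A different tile shape would change that ratio, which is presumably why user control over tile size is offered.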
Future Work

• Carrying out detailed experiments to compare the performance of the DataBlade with traditional file-based data access.
• Making threshold queries directly on the database server, for instance finding all the regions where the temperature is above or below a certain value.
• Creating virtual datasets. For example, density could be calculated on the database server from the temperature and salinity data already stored in the database.
• Adding new functionality for answering queries of the form: given a grid containing salinity and temperature, what values of salinity correspond to a particular temperature?

Example Uses

Loading a GIEF file into a table

    execute procedure grdfromgief("pathname", "table name");

Extracting a subset of the grid

The following expression generates a time series of temperature at latitude 50, longitude -30 and a depth of 5 m, between 1st January 2004 and 30th June 2004, from grids stored in the database:

    select GRDExtract (grid, '((translation – ) (dim_names time depth lat lon)(dim_sizes ) (affine_transformation ) (nonuniform time 7305 …… 7480)(nonuniform depth 5)(interpolation (time linear)))'::grdspec)
    from foamvar
    where grid_id <= 6;

The above metadata describes a grid storing temperature data for the FOAM eighth degree at various levels and times (denoted by nonunisample1 and nonunisample2 respectively). The starting point of the grid is at longitude and latitude 10. Each dimension has a set of basis vectors, which tell us which axis varies fastest and by how much. In this case longitude varies fastest with degrees spacing.

[Figure: grid schematic with X, Y and Depth axes; depth levels labelled 40 m, 52.3 m, 75 m and 90 m]
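The GRDExtract expression above samples time at nonuniform values and requests linear interpolation along the time axis. As a rough illustration of what that involves (not of how the DataBlade does it), the Python sketch below interpolates a series on an irregularly spaced axis and converts the integer time values to calendar dates. The epoch of 1 January 1984 is an assumption, inferred from the poster pairing time 6940 with 1st January 2003; it is not stated in the source. The weekly spacing and the temperature values are made up.

```python
from bisect import bisect_right
from datetime import date, timedelta

# Assumption: time values count days from this epoch (inferred, not given in the source).
EPOCH = date(1984, 1, 1)


def offset_to_date(days):
    """Convert an integer day offset to a calendar date."""
    return EPOCH + timedelta(days=days)


def interp_linear(axis, values, x):
    """Linearly interpolate values defined on a strictly increasing, possibly irregular axis."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError("x lies outside the axis range")
    i = bisect_right(axis, x) - 1  # index of the last axis point <= x
    if i == len(axis) - 1:
        return values[-1]  # x sits exactly on the final axis point
    frac = (x - axis[i]) / (axis[i + 1] - axis[i])
    return values[i] + frac * (values[i + 1] - values[i])


# Hypothetical weekly samples starting at time 7305, with made-up temperatures.
times = [7305, 7312, 7319, 7326]
temps = [11.2, 11.0, 10.7, 10.9]

print(offset_to_date(7305))
print(interp_linear(times, temps, 7315.5))
```

Under the assumed epoch, 7305 falls on 1 January 2004, which matches the start of the period quoted for the time series.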
Exporting a grid to a file

The following expression exports a temperature grid to a GIEF file. The grid extends from latitude -89.0, longitude 0 to latitude 89, longitude 360, sampled every one degree at level 5 and at time 6940 (1st January 2003):

    select grdrowtogief('${curdir}/Tempvar.nc', 'foamvar', rowid, '((translation ) (dim_names time depth lat lon)(dim_sizes )(affine_transformation )(nonuniform time 6940) (nonuniform depth 5))'::grdspec)
    from foamvar
    where grid_id = 13;

Some Applications

The U.S. National Library of Medicine granted BCS access to their Visible Human Project, consisting of 1,871 parallel high-resolution coloured images of a male cadaver. BCS then subsampled the data to form a 1.6-gigabyte 3D gridded dataset. Users can query the Grid DataBlade on the BCS Web site to extract 2D cross-sectional slices of the human body.

U.S. Navy pilots can train on real-life scenarios, including forecasted weather patterns, visibility, wind speed and direction, using PC-based flight simulation software. The Grid DataBlade extracts time-significant, location-specific weather data from a four-dimensional gridded dataset housed in IDS, Version 9.3, and passes it to trainees running the flight simulation on a PC.

Acknowledgements

We are grateful to Ian Barrodale and Cedric Zala from Barrodale Computing Services Ltd. for kindly providing us with an evaluation version of the Grid DataBlade and for assistance in using it. Special thanks also go to John Pickford from IBM for providing us with a copy of the Informix Dynamic Server and for support.