Aggregation and Subsetting in ERDDAP (a middleman data server) Bob Simons NOAA NMFS SWFSC ERD.

Presentation transcript:

Aggregation and Subsetting in ERDDAP (a middleman data server) Bob Simons NOAA NMFS SWFSC ERD

Aggregating Gridded Data
Aggregating time points: tens of thousands of data files, each holding sst[latitude][longitude], become one virtual dataset: sst[time][latitude][longitude].
Aggregating variables: many files with one variable per file become one virtual dataset with all variables.
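To make the time aggregation concrete, here is a minimal sketch in plain Python. The file contents, dates, and the `aggregate_time` helper are all made up for illustration; this is not how ERDDAP itself is implemented, only the shape of the idea: many 2-D grids keyed by time become one 3-D variable.

```python
from datetime import date

# Hypothetical per-file contents: each "file" holds one time point of
# sst[latitude][longitude] (a 2 x 3 grid here; all values are made up).
files = {
    date(2014, 1, 2): [[10.1, 10.6, 11.1], [12.1, 12.6, 13.1]],
    date(2014, 1, 1): [[10.0, 10.5, 11.0], [12.0, 12.5, 13.0]],
    date(2014, 1, 3): [[10.2, 10.7, 11.2], [12.2, 12.7, 13.2]],
}


def aggregate_time(files):
    """Return (time_axis, sst), where sst is indexed [time][latitude][longitude]."""
    time_axis = sorted(files)              # the new, outermost dimension
    sst = [files[t] for t in time_axis]    # one 2-D grid per time point
    return time_axis, sst


time_axis, sst = aggregate_time(files)
print(len(sst), len(sst[0]), len(sst[0][0]))  # sizes of time, latitude, longitude
print(time_axis[2], sst[2][1][0])
```

The client sees a single `sst[time][latitude][longitude]` variable and never needs to know how many files sit behind it.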

Subsetting Gridded Data
OPeNDAP projection constraints: sst[57:57][121:2:141][163:2:183]
ERDDAP: sst[( )][(20):2:(40)][(-140):2:(-120)]
Huge time-saver: the user can request just what she needs (1%).
Aggregated datasets need to be subset-able.
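The difference between the two notations is that the OPeNDAP form uses array indices, while the parenthesized ERDDAP form uses axis values in domain units. The sketch below shows one way a value-based request could be translated to indices. The 1-degree axes and both helper functions are assumptions for illustration, so the resulting indices do not match the numbers on the slide.

```python
def nearest_index(axis, value):
    """Index of the axis element closest to value."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))


def to_index_constraint(axis, start_value, stride, stop_value):
    """Translate (start):stride:(stop) in domain units to start:stride:stop indices."""
    start = nearest_index(axis, start_value)
    stop = nearest_index(axis, stop_value)
    return start, stride, stop


# Hypothetical axes: a global grid with 1-degree spacing.
latitude = [-90 + i for i in range(181)]
longitude = [-180 + i for i in range(360)]

lat_start, lat_stride, lat_stop = to_index_constraint(latitude, 20, 2, 40)
lon_start, lon_stride, lon_stop = to_index_constraint(longitude, -140, 2, -120)

print(f"[{lat_start}:{lat_stride}:{lat_stop}][{lon_start}:{lon_stride}:{lon_stop}]")

# The stop index in start:stride:stop is inclusive, unlike a Python slice,
# so the equivalent Python slice needs stop + 1.
lat_subset = latitude[lat_start:lat_stop + 1:lat_stride]
print(lat_subset)
```

With these assumed axes, the request covers 11 latitudes out of 181, which is the kind of reduction the slide refers to.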

Aggregating In-Situ and Tabular Data
A database-like table with rows and columns.
E.g., one file has data for one buoy for one month.
It isn't a multi-dimensional grid. There are no dimensions.
Aggregating features and time points:
  Features: stations, trajectories, profiles, ...
  Append into a giant virtual table.
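A rough sketch of "append into a giant virtual table", assuming each file has already been read into a list of rows with the same columns. The station names, values, and the `virtual_table` helper are hypothetical.

```python
from itertools import chain

COLUMNS = ("station", "time", "latitude", "longitude", "sst")

# Hypothetical files: one buoy for one month each (values made up).
buoy_a_jan = [
    ("A", "2014-01-01", 36.6, -122.0, 12.9),
    ("A", "2014-01-02", 36.6, -122.0, 13.1),
]
buoy_b_jan = [("B", "2014-01-01", 21.3, -157.9, 25.4)]
buoy_a_feb = [("A", "2014-02-01", 36.6, -122.0, 12.4)]


def virtual_table(*file_tables):
    """Lazily append the rows of every file; nothing is copied until rows are read."""
    return chain.from_iterable(file_tables)


rows = list(virtual_table(buoy_a_jan, buoy_b_jan, buoy_a_feb))
for row in rows:
    print(dict(zip(COLUMNS, row)))
```

Features (here, stations) and time points end up in the same table, distinguished only by column values.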

Subsetting In-Situ and Tabular Data
OPeNDAP selection constraints (no indices, because there are no multi-dimensional grids):
  longitude,latitude,time,sst&sst>35
Easy to create. Uses domain units (degC). Very flexible. (Based on a database's SQL SELECT.)
Huge time-saver: the user can request just what she needs (1%).
Aggregated datasets need to be subset-able.
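The query on this slide has two parts: a comma-separated list of columns to return, then `&`-separated constraints. A minimal sketch of evaluating such a query against in-memory rows follows. The `parse_query` and `select` helpers and the sample rows are hypothetical, and only numeric comparisons are handled.

```python
import operator
import re

OPS = {"<=": operator.le, ">=": operator.ge, "!=": operator.ne,
       "<": operator.lt, ">": operator.gt, "=": operator.eq}


def parse_query(query):
    """Split 'col1,col2&name>value&...' into columns and numeric tests."""
    projection, *constraints = query.split("&")
    columns = projection.split(",")
    tests = []
    for constraint in constraints:
        match = re.fullmatch(r"(\w+)(<=|>=|!=|<|>|=)(.+)", constraint)
        name, op, value = match.groups()
        tests.append((name, OPS[op], float(value)))
    return columns, tests


def select(rows, query):
    """Yield the requested columns of every row that passes all constraints."""
    columns, tests = parse_query(query)
    for row in rows:
        if all(op(row[name], value) for name, op, value in tests):
            yield tuple(row[c] for c in columns)


# Hypothetical rows (values made up).
rows = [
    {"longitude": -122.0, "latitude": 36.6, "time": "2014-01-01", "sst": 12.9},
    {"longitude": 50.5, "latitude": 26.0, "time": "2014-08-01", "sst": 35.2},
    {"longitude": -157.9, "latitude": 21.3, "time": "2014-08-01", "sst": 27.1},
]

result = list(select(rows, "longitude,latitude,time,sst&sst>35"))
print(result)
```

The constraint is written in degrees C, with no knowledge of row numbers or file layout, which is why the slide calls it easy to create.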

Don't Treat In-Situ/Tabular Data Like Gridded Data
CF DSG stores in-situ data as gridded .nc files. Fine for storage, not for subsetting.
Problem: Indices aren't domain units. How do you request sst>35 with indices?
Problem: Indices aren't a real-world sequence.
  Grid: lat[] is a sequence. lat[42:53] has meaning.
  Table: Buoy number isn't. &lat>20&lat<40 is buoy #2, 14, 26, 109, not buoy[42:53].
Problem: 5 CF DSG data structures.
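The second problem can be seen with a few lines of Python. The buoy latitudes below are invented; the point is only that a value range maps to one contiguous index range on a sorted grid axis but to scattered indices in a table of buoys.

```python
def matching_indices(values, low, high):
    """Indices whose value lies strictly between low and high."""
    return [i for i, v in enumerate(values) if low < v < high]


def is_contiguous(indices):
    """True if the indices form one unbroken start:stop range."""
    return indices == list(range(indices[0], indices[-1] + 1))


# Grid: the latitude axis is sorted, so the index order is the real-world order.
grid_latitude = list(range(0, 91, 5))          # 0, 5, ..., 90
grid_hits = matching_indices(grid_latitude, 20, 40)

# Table: buoys are numbered in arbitrary order (hypothetical latitudes).
buoy_latitude = [55.0, 12.5, 33.1, 61.2, 8.0, 47.9, 25.4, 39.9, 70.3, 21.0]
buoy_hits = matching_indices(buoy_latitude, 20, 40)

print(grid_hits, is_contiguous(grid_hits))
print(buoy_hits, is_contiguous(buoy_hits))
```

For the grid, the hits can be requested as one index range. For the buoys, no single `buoy[start:stop]` range selects exactly the matching buoys.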

Option: Treat Gridded Data Like Tabular Data
Standard request: time, lat, lon bounding box.
What about unusual requests of gridded data, e.g., SST>35 ("select by value")?
ERDDAP's EDDTableFromEDDGrid creates a giant virtual table from a gridded dataset.
  Columns: longitude, latitude, time, sst
  Query: e.g., longitude,latitude,time,sst&sst>35
  Response: a table (one data point per row)
Risk: huge effort for the server.
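The idea of viewing a grid as a table can be sketched as a generator that emits one row per grid cell. This is an illustration only, with made-up axes and values; it is not the implementation of EDDTableFromEDDGrid, and `grid_as_table` is a hypothetical name.

```python
def grid_as_table(time, latitude, longitude, sst):
    """Yield (longitude, latitude, time, sst) rows, one per grid cell."""
    for t, time_value in enumerate(time):
        for y, lat in enumerate(latitude):
            for x, lon in enumerate(longitude):
                yield (lon, lat, time_value, sst[t][y][x])


# Hypothetical 2 x 2 x 2 grid indexed sst[time][latitude][longitude].
time = ["2014-08-01", "2014-08-02"]
latitude = [25.0, 26.0]
longitude = [50.0, 51.0]
sst = [
    [[34.1, 35.3], [33.8, 34.9]],
    [[34.6, 35.6], [35.1, 34.7]],
]

all_rows = list(grid_as_table(time, latitude, longitude, sst))
hot = [row for row in all_rows if row[3] > 35]   # "select by value": sst>35

print(len(all_rows))
for row in hot:
    print(row)
```

Every cell has to be visited to answer a select-by-value query, so the work grows with the full size of the grid. That is the risk noted on the slide.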

Summary: Huge Advantages of Aggregation and Subsetting
Users can find and deal with one aggregated dataset.
Users can make one subset request to one aggregated dataset.
  Grids: use indices to get a temporal and spatial subset.
  Tables (selection constraints): any subset you want.
  (Not: one subset request to each unaggregated file, or worse, using FTP to download lots of entire files.)
Don't treat tabular/in-situ data like gridded data.
