DAP, ERDDAP, and Tabular (Sequence) Datasets
Bob Simons, NOAA NMFS SWFSC ERD


Presentation transcript:

DAP, ERDDAP, and Tabular (Sequence) Datasets
Try it:
Bob Simons, NOAA NMFS SWFSC ERD
[Diagram labels: OBIS, SOS, Custom, DAP, ERDDAP, ..., Database, Files; ERDDAP; Your Favorite Client Software]

My Goals for this Presentation
1. Tell you more about ERDDAP.
2. Raise awareness and appreciation of tabular data.
3. Convince you that tabular datasets are best served as DAP sequences, and that serving them in DAP as 1D or 2D gridded datasets is a bad idea. (This has nothing to do with how they are stored.)

Bonus: 3 powerful ideas:
1. Abstractions (capture the essence; hide the instance details)
2. Representations (different file formats)
3. Reusability (value is multiplied)

1) ERDDAP

ERDDAP Features
(Re)serves diverse local and remote datasets.
  Abstraction: thanks to DAP, the source differences are hidden.
Serves gridded and tabular datasets.
Offers a unified place to search for datasets.
  Full-text, category-based, or advanced.
Encourages improved metadata.
  So users can understand the dataset.
Offers a standard way to request data from any dataset.
  For humans: forms on web pages.
  For computers: DAP, WMS, (SOS) web services.
Offers a choice of response file formats.
  Different representations.
Standardizes time formats. (Here, different representations are trouble.)
  As Strings: ISO 8601:2004(E), e.g., T20:00:00Z
  As numbers: seconds since T00:00:00Z
Is reusable.
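The two time representations named on this slide can be converted into one another in a few lines. The following is a minimal Python sketch, not ERDDAP code. It assumes the numeric form counts seconds from the Unix epoch, 1970-01-01T00:00:00Z; the epoch date did not survive in this transcript, so treat it as an assumption. The function names are illustrative only.

```python
from datetime import datetime, timezone

# Assumed epoch for the numeric representation (see the note above).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_seconds(text: str) -> float:
    """Convert an ISO 8601 UTC string like '2000-01-01T12:00:00Z' to seconds since EPOCH."""
    moment = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (moment - EPOCH).total_seconds()


def epoch_seconds_to_iso(seconds: float) -> str:
    """Convert seconds since EPOCH back to an ISO 8601 UTC string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Because both functions work in UTC only, a round trip through the numeric form returns the original string unchanged.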

2) Tabular Data

Tabular Datasets
Tabular data sources: databases, OBIS, SOS, CSV files, flat .nc files, CF DSG .nc files, ...

Geospatial: CF Discrete Sampling Geometry (DSG) feature types
  Point: whale sightings
  Profile: disposable CTD
  TimeSeries: moored buoy
  TimeSeriesProfile: CTD
  Trajectory: ship
  TrajectoryProfile: profiling glider

Non-Geospatial: laboratory data, references, fish disease lists, ecosystem (what eats what), ...

Larry Ellison is rich because databases are reusable for numerous types of data.

(ERD)DAP Data Requests: Gridded vs. Tabular Datasets

Gridded Datasets (DAP projection constraints)
  DAP: ?temperature[437][46:1:162][122:282]
  ERDDAP: ?temperature[( )][(22):(51)][(-145):(-105)]

Tabular Datasets (DAP selection constraints)
  DAP: ?s.id,s.owner,s.time,s.latitude,s.longitude,s.wtemp&s.id="sp031"&s.time>=
  ERDDAP: ?id,owner,time,latitude,longitude,wtemp&id="sp031"&time>=

[Example table. Columns: id, owner, type, time, latitude, longitude, wtemp, atmp. Rows shown for buoy 46088 (NDBC, 3m Discus) at T14:20:00Z and T14:50:00Z, and for buoy SANF1 (SFSU, C-MAN) at T16:00:00Z and T17:00:00Z.]
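A tabular request of the kind shown above is a comma-separated variable list followed by `&`-separated selection constraints. The sketch below assembles such a URL in Python. The base URL and dataset ID in the usage example are hypothetical, and the `/tabledap/<datasetID>.<fileType>` layout is an assumption about how an ERDDAP server exposes tabular datasets. Only the standard library's `urllib.parse.quote` is used.

```python
from urllib.parse import quote


def tabular_request_url(base_url, dataset_id, variables, constraints, file_type="csv"):
    """Build a tabular request URL: a variable list plus selection constraints.

    constraints are strings such as 'id="sp031"' or 'latitude>=22'.
    """
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as > and " so they survive in a URL;
        # '=' is left alone so the constraint stays readable.
        query += "&" + quote(constraint, safe="=")
    return f"{base_url}/tabledap/{dataset_id}.{file_type}?{query}"


# Hypothetical server and dataset, for illustration only.
url = tabular_request_url(
    "https://example.org/erddap", "s",
    ["id", "wtemp"],
    ['id="sp031"', "latitude>=22"],
)
```

The request names variables and values from the dataset itself and contains no row numbers, which is the point of the tabular half of this slide.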

(ERD)DAP Sequence Requests vs. Database SQL Requests
(ERD)DAP: ?id,owner,type,time,latitude,longitude,wtemp&id="46088"&time>=
SQL: SELECT id,owner,type,time,latitude,longitude,wtemp FROM s WHERE id="46088" AND time>=

Pablo Picasso: "Good artists copy, great artists steal."
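The correspondence between the two request styles is close enough to translate mechanically. The function below is a minimal Python sketch of that mapping for simple requests. It is not how ERDDAP or any DAP server does it, the function name is hypothetical, and it does no escaping, so it should not be pointed at untrusted input.

```python
def sequence_query_to_sql(query: str, table: str) -> str:
    """Translate a simple '?vars&constraint&...' sequence request into a SQL SELECT."""
    parts = query.lstrip("?").split("&")
    columns = parts[0] or "*"  # an empty variable list means "all columns"
    distinct = "distinct()" in parts[1:]
    conditions = [p for p in parts[1:] if p and p != "distinct()"]
    # SQL string literals conventionally use single quotes, not double quotes.
    conditions = [c.replace('"', "'") for c in conditions]
    sql = "SELECT DISTINCT " if distinct else "SELECT "
    sql += f"{columns} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql
```

The variable list becomes the SELECT column list, each `&` constraint becomes one AND-ed WHERE condition, and `&distinct()` becomes SELECT DISTINCT.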

Related Tables vs. One Table

[One table: Join (Denormalized). Columns: id, owner, type, latitude, longitude, time, wtemp, atmp. Each observation row repeats the buoy's owner, type, latitude, and longitude; rows shown for buoys 46088 (NDBC, 3m Discus) and NC312 (NCSU, C-MAN).]

[Related tables: Normalized.
  Observation Table. Columns: id, time, wtemp, atmp.
  Buoy Table. Columns: id, owner, type, latitude, longitude. Rows shown include 46088 (NDBC, 3m Discus), an NDBC 6m Discus buoy, BP114 (BP, 3m Discus), and NC312 (NCSU, C-MAN).]
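The relationship between the two layouts can be shown with an in-memory SQLite database: joining the normalized Buoy and Observation tables on `id` yields the single denormalized table. This is a sketch using Python's standard `sqlite3` module. The buoy IDs, owners, and types follow the slide, but every numeric value and date is a placeholder, because the slide's figures did not survive in this transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buoy (id TEXT PRIMARY KEY, owner TEXT, type TEXT,
                       latitude REAL, longitude REAL);
    CREATE TABLE observation (id TEXT, time TEXT, wtemp REAL, atmp REAL);
""")
# Placeholder values for illustration only; they are not the slide's data.
conn.executemany("INSERT INTO buoy VALUES (?, ?, ?, ?, ?)", [
    ("46088", "NDBC", "3m Discus", 48.0, -123.0),
    ("NC312", "NCSU", "C-MAN", 34.0, -77.0),
])
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", [
    ("46088", "2000-01-01T14:20:00Z", 10.0, 9.0),
    ("46088", "2000-01-01T14:50:00Z", 10.1, 9.2),
    ("NC312", "2000-01-01T16:00:00Z", 15.0, 14.0),
])


def denormalized_rows(connection):
    """Join each observation to its buoy, producing the single wide table."""
    return connection.execute("""
        SELECT b.id, b.owner, b.type, b.latitude, b.longitude,
               o.time, o.wtemp, o.atmp
        FROM observation AS o JOIN buoy AS b ON b.id = o.id
        ORDER BY b.id, o.time
    """).fetchall()
```

Each joined row repeats the buoy's metadata next to one observation, which is the redundancy the denormalized form accepts in exchange for being a single table.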

Yeah, but why doesn't ERDDAP support nested sequences?
It does, but just internally.
ERDDAP (re)presents the dataset as a single table.
  One table is an abstraction. It hides details.
  The average user understands a table.
  One vs. many tables: just different representations.
  This lets all tabular datasets have the same structure.
  The result of a DAP or SQL query is always one table.
  There are many file format representations of one table.

3) Tabular datasets are best served as DAP sequences. (Why DAP Sequences Rock!) And that serving them in DAP as 1D or 2D gridded datasets is a bad idea. (This has nothing to do with how they are stored.)

Why Sequences Rock! Reason #1
If the data is coming from a relational database, OBIS, or SOS, the dataset can't be served as a gridded dataset.
  There are no index (row) numbers.
  It isn't easy/possible to know how many rows there are.
  The order of the rows may change at any time.
  New rows are added as new data arrives: frequently.

Why Sequences Rock! Reason #2
Serving tabular data in DAP as 1D or 2D gridded datasets is a bad idea.

Logic:
  Men: mortal. Socrates: man. Socrates: mortal.
  Grids: handled well by DAP. Treat table as: grid. Treat table as grid: handled well?

Grid dimensions usually represent a physical continuum.
  DAP: ?temperature[408:437][46:1:162][122:282]
  ERDDAP: ?temperature[( ):( )][(22):(51)][(-145):(-105)]

No arrangement of tabular dataset dimensions works well.
  2D [buoy][time]: buoy is not a continuum, time leads to wasted space.
  1D [time]: fine, but then you need 1000 datasets (1 per buoy).
  1D [row]: aggregated, but row isn't a continuum.

In every case, it's hard to know which rows to request.
The rows you want are scattered through the dataset, so you have to either download everything or make numerous requests.

Serving a DSG file directly: too many formats, too hard to query.
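The "wasted space" in the 2D [buoy][time] arrangement can be put into numbers. The sketch below computes the fraction of grid cells that would hold a real observation if every distinct time in the table became a grid column. The function name is hypothetical, and the times in the usage example are placeholders modeled on the slide's buoy rows.

```python
def grid_fill_fraction(rows):
    """Fraction of cells in a [buoy][time] grid that hold a real observation.

    rows is an iterable of (buoy_id, time) pairs, one per table row.
    """
    observed = set(rows)
    buoys = {buoy for buoy, _ in observed}
    times = {time for _, time in observed}
    total_cells = len(buoys) * len(times)
    return len(observed) / total_cells if total_cells else 0.0


# Two buoys reporting at unaligned times (placeholder values).
fraction = grid_fill_fraction([
    ("46088", "14:20"), ("46088", "14:50"),
    ("NC312", "16:00"), ("NC312", "17:00"),
])
```

Here only half of the 2 x 4 grid is filled. When no two buoys ever share a timestamp, the filled fraction is 1 divided by the number of buoys, so a thousand unaligned buoys would leave 99.9% of the grid as fill values.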

Why Sequences Rock! Reason #3
DAP sequence requests use the terminology of the dataset. (It's easy.)
  ?id,owner,type,latitude,longitude&distinct()
  ?id,type,latitude,longitude&owner="NDBC"&distinct()
  ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&distinct()
  ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>= &distinct()
  ?&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>=

[Example table. Columns: index, id, owner, type, latitude, longitude, time, wtemp, atmp. Rows shown for an NDBC 3m Discus buoy (T14:20:00Z, T14:50:00Z), BP114 (BP, 3m Discus; T02:00:00Z, T04:00:00Z), NC312 (NCSU, C-MAN; T16:00:00Z, T17:00:00Z), and an NDBC 6m Discus buoy (T14:20:00Z, T14:50:00Z).]

Making these requests with index numbers is a difficult (not for Roberto), multi-step programming task. And it's inefficient.

Why Sequences Rock! Reason #4
Because declarative languages (SQL, DAP selection constraints) let you describe what you want, not how to get it.
  ?id,owner,type,latitude,longitude&distinct()
  ?id,type,latitude,longitude&owner="NDBC"&distinct()
  ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&distinct()
  ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>= &distinct()
  ?&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>=

With imperative languages (C, Fortran, Java, Python), you must describe, step-by-step, how to solve the problem.
  1) Request all latitudes.
  2) Filter.
  3) Request all longitudes.
  4) Multiple requests because data is scattered throughout the dataset.
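The imperative steps listed above can be sketched in Python to show why they end in multiple requests. The function scans latitude and longitude columns that have already been downloaded, keeps the row indices inside a bounding box, and merges consecutive indices into ranges; each range would then need its own index-based request. The function name and the coordinate values in the test are hypothetical.

```python
def matching_index_ranges(latitudes, longitudes, lat_min, lat_max, lon_min, lon_max):
    """Return inclusive (start, stop) row-index ranges whose rows fall inside the box.

    Each range corresponds to one index-based request such as [start:stop].
    """
    ranges = []
    for index, (lat, lon) in enumerate(zip(latitudes, longitudes)):
        inside = lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
        if not inside:
            continue
        if ranges and ranges[-1][1] == index - 1:
            ranges[-1] = (ranges[-1][0], index)  # extend the current run
        else:
            ranges.append((index, index))  # start a new run
    return ranges
```

With a selection constraint, all of this collapses into the single declarative request `&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105`, and the server does the scanning.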

Why Sequences Rock! Reason #5
Because the other options all suck.
Serving the datasets as grids doesn't work.
  You now understand why, right?
Serve the data files via FTP.
  Getting a chunk of data is all or nothing.
  Makes user deal with various file formats.
Custom forms and web services are too much work to make.
  Custom: 6+ months per dataset? Ongoing maintenance. No consistency!
  Reusable: 1 day, minimal maintenance, consistent!
Give trusted colleagues access to the database or the files.
  That's not making the data public!
Don't let anyone else use the data.
  This is actually the #1 method of fisheries data distribution.

My Goals for this Presentation
1. Tell you more about ERDDAP.
2. Raise awareness and appreciation of tabular data.
3. Convince you that tabular datasets are best served as DAP sequences, and that serving them in DAP as 1D or 2D gridded datasets is a bad idea. (This has nothing to do with how they are stored.)

Bonus: 3 powerful ideas:
1. Abstractions (capture the essence; hide the instance details)
2. Representations (different file formats)
3. Reusability (value is multiplied)

Thank you!