Published by Adrian Lavis. Modified over 9 years ago.
1
DAP, ERDDAP, and Tabular (Sequence) Datasets
Try it: http://coastwatch.pfeg.noaa.gov/erddap
Bob Simons, NOAA NMFS SWFSC ERD
[Diagram labels: OBIS, SOS, Custom, DAP, ERDDAP, ..., Database, ERDDAP, Files, Your Favorite Client Software]
2
My Goals for this Presentation
1. Tell you more about ERDDAP.
2. Raise awareness and appreciation of tabular data.
3. Convince you that tabular datasets are best served as DAP sequences, and that serving them in DAP as 1D or 2D gridded datasets is a bad idea. (This has nothing to do with how they are stored.)
Bonus: 3 powerful ideas:
1. Abstractions (capture the essence; hide the instance details)
2. Representations (different file formats)
3. Reusability (value is multiplied)
3
1) ERDDAP
6
ERDDAP Features
(Re)serves diverse local and remote datasets
    Abstraction: thanks to DAP, the source differences are hidden.
Serves gridded and tabular datasets
Offers a unified place to search for datasets
    Full-text, category-based, or advanced.
Encourages improved metadata
    So users can understand the dataset.
Offers a standard way to request data from any dataset
    For humans: forms on web pages.
    For computers: DAP, WMS, (SOS) web services.
Offers a choice of response file formats
    Different representations
Standardizes time formats (Here, different representations are trouble.)
    As Strings: ISO 8601:2004(E), e.g., 2014-07-01T20:00:00Z
    As numbers: seconds since 1970-01-01T00:00:00Z
Is reusable.
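The two standardized time representations on this slide can be converted into each other. A minimal Python sketch, assuming UTC timestamps with whole seconds in exactly the form shown on the slide; the function names are made up for illustration and are not part of ERDDAP:

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # the ISO 8601 form used on the slide


def iso_to_epoch_seconds(iso_text: str) -> float:
    """Convert e.g. 2014-07-01T20:00:00Z to seconds since 1970-01-01T00:00:00Z."""
    parsed = datetime.strptime(iso_text, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def epoch_seconds_to_iso(seconds: float) -> str:
    """Convert seconds since 1970-01-01T00:00:00Z back to the ISO 8601 string form."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(ISO_FORMAT)


print(iso_to_epoch_seconds("2014-07-01T20:00:00Z"))
print(epoch_seconds_to_iso(1404172800))
```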
7
2) Tabular Data
8
Tabular Datasets
Tabular data sources: databases, OBIS, SOS, CSV files, flat .nc files, CF DSG .nc files, ...
Geospatial: CF Discrete Sampling Geometry (DSG) feature types
    Point: whale sightings
    Profile: disposable CTD
    TimeSeries: moored buoy
    TimeSeriesProfile: CTD
    Trajectory: ship
    TrajectoryProfile: profiling glider
Non-Geospatial: laboratory data, references, fish disease lists, ecosystem (what eats what), ...
Larry Ellison is rich because databases are reusable for numerous types of data.
9
(ERD)DAP Data Requests: Gridded vs. Tabular Datasets
Gridded Datasets (DAP projection constraints)
    DAP: ?temperature[437][46:1:162][122:282]
    ERDDAP: ?temperature[(2014-07-01)][(22):(51)][(-145):(-105)]
Tabular Datasets (DAP selection constraints)
    DAP: ?s.id,s.owner,s.time,s.latitude,s.longitude,s.wtemp&s.id="sp031"&s.time>=1404172800
    ERDDAP: ?id,owner,time,latitude,longitude,wtemp&id="sp031"&time>=2014-07-01

id     owner  type       time                  latitude  longitude  wtemp  atmp
46088  NDBC   3m Discus  1993-06-01T14:20:00Z  48.336    -123.159   16.4   18.0
46088  NDBC   3m Discus  1993-06-01T14:50:00Z  48.336    -123.159   16.5   18.2
...
SANF1  SFSU   C-MAN      1968-10-14T16:00:00Z  24.456    -81.877    15.8   14.9
SANF1  SFSU   C-MAN      1968-10-14T17:00:00Z  24.456    -81.877    15.8   14.8
...
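The ERDDAP-style request strings above follow a regular pattern, so they can be assembled programmatically. A minimal Python sketch that only builds the query part shown on the slide; the helper names are hypothetical, and a real client would usually also prepend the server and dataset URL and percent-encode the constraints:

```python
def gridded_query(variable, dimensions):
    """Build ?var[(value)][(start):(stop)]... from a list of dimension selections.

    Each entry is either a single value or a (start, stop) tuple of values.
    """
    pieces = []
    for dim in dimensions:
        if isinstance(dim, tuple):
            start, stop = dim
            pieces.append(f"[({start}):({stop})]")
        else:
            pieces.append(f"[({dim})]")
    return "?" + variable + "".join(pieces)


def tabular_query(variables, constraints):
    """Build ?var1,var2,...&constraint1&constraint2... for a tabular dataset."""
    return "?" + "&".join([",".join(variables)] + list(constraints))


print(gridded_query("temperature", ["2014-07-01", (22, 51), (-145, -105)]))
print(tabular_query(
    ["id", "owner", "time", "latitude", "longitude", "wtemp"],
    ['id="sp031"', "time>=2014-07-01"],
))
```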
10
(ERD)DAP Sequence Requests vs. Database SQL Requests
(ERD)DAP: ?id,owner,type,time,latitude,longitude,wtemp&id="46088"&time>=2014-07-01
SQL: SELECT id,owner,type,time,latitude,longitude,wtemp FROM s WHERE id="46088" AND time>=2014-07-01
Pablo Picasso: "Good artists copy, great artists steal."
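To see the SQL side of this comparison run, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory table named s, as on the slide. Assumptions: times are stored as ISO 8601 text (so string comparison orders them correctly), values are passed as bound parameters rather than quoted inline, and the 2014 row is invented so that the query has something to return:

```python
import sqlite3

rows = [
    ("46088", "NDBC", "3m Discus", "1993-06-01T14:20:00Z", 48.336, -123.159, 16.4),
    # Hypothetical row (not from the slides) that satisfies time>=2014-07-01.
    ("46088", "NDBC", "3m Discus", "2014-07-02T00:00:00Z", 48.336, -123.159, 12.1),
    ("NC312", "NCSU", "C-MAN", "1968-10-14T16:00:00Z", 24.456, -81.877, 15.8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE s (id TEXT, owner TEXT, type TEXT, time TEXT, "
    "latitude REAL, longitude REAL, wtemp REAL)"
)
conn.executemany("INSERT INTO s VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Same shape as the slide's query: choose columns, then constrain id and time.
query = (
    "SELECT id, owner, type, time, latitude, longitude, wtemp "
    "FROM s WHERE id = ? AND time >= ?"
)
result = conn.execute(query, ("46088", "2014-07-01")).fetchall()
print(result)
```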
11
Related Tables vs. One Table

Join (Denormalized)
id     owner  type       latitude  longitude  time                  wtemp  atmp
46088  NDBC   3m Discus  48.336    -123.159   1993-06-01T14:20:00Z  16.4   18.0
46088  NDBC   3m Discus  48.336    -123.159   1993-06-01T14:50:00Z  16.5   18.2
...
NC312  NCSU   C-MAN      24.456    -81.877    1968-10-14T16:00:00Z  15.8   14.9
NC312  NCSU   C-MAN      24.456    -81.877    1968-10-14T17:00:00Z  15.8   14.8
...

Normalized: Observation Table
id     time                  wtemp  atmp
46088  1993-06-01T14:20:00Z  16.4   18.0
46088  1993-06-01T14:50:00Z  16.5   18.2
...
NC312  1968-10-14T16:00:00Z  15.8   14.9
NC312  1968-10-14T17:00:00Z  15.8   14.8
...

Normalized: Buoy Table
id     owner  type       latitude  longitude
46088  NDBC   3m Discus  48.336    -123.159
41005  NDBC   6m Discus  32.501    -79.099
BP114  BP     3m Discus  36.905    -75.713
NC312  NCSU   C-MAN      24.456    -81.877
...
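The relationship between the normalized tables and the single joined table can be sketched in a few lines of Python. This is only an illustration of the join using rows from the slide, with plain dictionaries standing in for database tables; it is not how ERDDAP or any particular database implements it:

```python
# Buoy table: one entry per station, keyed by id.
buoys = {
    "46088": {"id": "46088", "owner": "NDBC", "type": "3m Discus",
              "latitude": 48.336, "longitude": -123.159},
    "NC312": {"id": "NC312", "owner": "NCSU", "type": "C-MAN",
              "latitude": 24.456, "longitude": -81.877},
}

# Observation table: one entry per measurement.
observations = [
    {"id": "46088", "time": "1993-06-01T14:20:00Z", "wtemp": 16.4, "atmp": 18.0},
    {"id": "46088", "time": "1993-06-01T14:50:00Z", "wtemp": 16.5, "atmp": 18.2},
    {"id": "NC312", "time": "1968-10-14T16:00:00Z", "wtemp": 15.8, "atmp": 14.9},
    {"id": "NC312", "time": "1968-10-14T17:00:00Z", "wtemp": 15.8, "atmp": 14.8},
]


def denormalize(buoys, observations):
    """Join each observation with its buoy's metadata to form one wide row."""
    return [{**buoys[obs["id"]], **obs} for obs in observations]


joined = denormalize(buoys, observations)
for row in joined:
    print(row)
```

The station metadata is repeated on every joined row. That costs space, but gives the user one table to reason about.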
12
Yeah, but why doesn't ERDDAP support nested sequences?
It does, but just internally. ERDDAP (re)presents the dataset as a single table.
One table is an abstraction. It hides details.
The average user understands a table.
One vs. many tables: just different representations.
This lets all tabular datasets have the same structure.
The result of a DAP or SQL query is always one table.
There are many file format representations of one table.
13
3) Tabular datasets are best served as DAP sequences. (Why DAP Sequences Rock!)
Serving them in DAP as 1D or 2D gridded datasets is a bad idea. (This has nothing to do with how they are stored.)
14
Why Sequences Rock! Reason #1
If the data is coming from a relational database, OBIS, or SOS, the dataset can't be served as a gridded dataset.
    There are no index (row) numbers.
    It isn't easy/possible to know how many rows there are.
    The order of the rows may change at any time.
    New rows are added as new data arrives: frequently.
15
Why Sequences Rock! Reason #2
Serving tabular data in DAP as 1D or 2D gridded datasets is a bad idea.
Logic:
    Men: mortal. Socrates: man. Socrates: mortal.
    Grids: handled well by DAP. Treat table as: grid. Treat table as grid: handled well?
Grid dimensions usually represent a physical continuum.
    DAP: ?temperature[408:437][46:1:162][122:282]
    ERDDAP: ?temperature[(2014-06-01):(2014-06-30)][(22):(51)][(-145):(-105)]
No arrangement of tabular dataset dimensions works well.
    2D [buoy][time]: buoy is not a continuum, time leads to wasted space
    1D [time]: fine, but then you need 1000 datasets (1 per buoy)
    1D [row]: aggregated, but row isn't a continuum.
In every case, it's hard to know which rows to request. The rows you want are scattered through the dataset, so you have to either download everything or make numerous requests.
Serving a DSG file directly: too many formats, too hard to query.
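The "wasted space" point for a 2D [buoy][time] grid can be made concrete with a small calculation. A rough Python sketch, assuming the grid's time axis is the union of every buoy's observation times and that every cell without an observation holds a fill value; the function name is made up, and the sample pairs are taken from the buoy rows shown earlier in the deck:

```python
def wasted_fraction(observations):
    """Fraction of cells in a [buoy][time] grid that would hold only fill values.

    observations: iterable of (buoy_id, time) pairs, one per real measurement.
    """
    observed = set(observations)
    buoys = {buoy for buoy, _ in observed}
    times = {time for _, time in observed}
    total_cells = len(buoys) * len(times)
    return 1 - len(observed) / total_cells


sample = [
    ("46088", "1993-06-01T14:20:00Z"),
    ("46088", "1993-06-01T14:50:00Z"),
    ("NC312", "1968-10-14T16:00:00Z"),
    ("NC312", "1968-10-14T17:00:00Z"),
]
print(wasted_fraction(sample))
```

With only two buoys that never report at the same instant, half of the grid is already empty. The fraction grows as more buoys with different reporting times are added.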
16
Why Sequences Rock! Reason #3
DAP sequence requests use the terminology of the dataset. (It's easy.)
    ?id,owner,type,latitude,longitude&distinct()
    ?id,type,latitude,longitude&owner="NDBC"&distinct()
    ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&distinct()
    ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>=2014-07-01&distinct()
    ?&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>=2014-07-01

index    id     owner  type       latitude  longitude  time                  wtemp  atmp
1        46088  NDBC   3m Discus  48.336    -123.159   1993-06-01T14:20:00Z  16.4   18.0
2        46088  NDBC   3m Discus  48.336    -123.159   1993-06-01T14:50:00Z  16.5   18.2
137522   BP114  BP     3m Discus  36.905    -75.713    2003-02-09T02:00:00Z  16.7   12.2
137523   BP114  BP     3m Discus  36.905    -75.713    2003-02-09T04:00:00Z  16.6   12.0
1732156  NC312  NCSU   C-MAN      24.456    -81.877    1968-10-14T16:00:00Z  15.8   14.9
1732157  NC312  NCSU   C-MAN      24.456    -81.877    1968-10-14T17:00:00Z  15.8   14.8
3282459  41005  NDBC   6m Discus  32.501    -79.090    1984-08-22T14:20:00Z  14.6   26.8
3282460  41005  NDBC   6m Discus  32.501    -79.090    1984-08-22T14:50:00Z  14.7   26.2

Making these requests with index numbers is a difficult (not for Roberto), multi-step programming task. And it's inefficient.
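One way to see why index-based requests are multi-step: once a client has somehow worked out which row indices it wants, it still has to turn them into contiguous index ranges, and each range becomes a separate request. A minimal Python sketch of that last step, using the index values of the NDBC rows from the table on this slide; the function name is hypothetical:

```python
def contiguous_ranges(indices):
    """Collapse row indices into (first, last) ranges of consecutive rows."""
    ranges = []
    for i in sorted(indices):
        if ranges and i == ranges[-1][1] + 1:
            # Extends the current run of consecutive indices.
            ranges[-1] = (ranges[-1][0], i)
        else:
            # Starts a new run, which means one more request.
            ranges.append((i, i))
    return ranges


ndbc_rows = [1, 2, 3282459, 3282460]  # rows with owner="NDBC" in the table above
print(contiguous_ranges(ndbc_rows))
```

Even this tiny sample needs two requests, one per run. A selection constraint such as owner="NDBC" expresses the same thing in a single request.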
17
Why Sequences Rock! Reason #4
Because declarative languages (SQL, DAP selection constraints) let you describe what you want, not how to get it.
    ?id,owner,type,latitude,longitude&distinct()
    ?id,type,latitude,longitude&owner="NDBC"&distinct()
    ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&distinct()
    ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>=2014-07-01&distinct()
    ?&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>=2014-07-01
With imperative languages (C, Fortran, Java, Python), you must describe, step-by-step, how to solve the problem.
    1) Request all latitudes.
    2) Filter.
    3) Request all longitudes.
    4) Multiple requests because data is scattered throughout the dataset.
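The imperative steps listed above can be sketched in Python against in-memory columns. Assumptions: the three lists stand in for whole columns that would each have to be downloaded from the server, the values come from the buoy rows shown elsewhere in the deck, and the bounding box is the 22 to 51 latitude, -145 to -105 longitude box used in the gridded examples:

```python
# Stand-ins for columns that an index-based client would have to download in full.
ids = ["46088", "46088", "BP114", "BP114", "NC312", "NC312", "41005", "41005"]
latitudes = [48.336, 48.336, 36.905, 36.905, 24.456, 24.456, 32.501, 32.501]
longitudes = [-123.159, -123.159, -75.713, -75.713, -81.877, -81.877, -79.090, -79.090]

# Steps 1 and 2: get every latitude, then keep the row numbers inside the range.
lat_ok = [i for i, lat in enumerate(latitudes) if 22 <= lat <= 51]

# Step 3: get every longitude, then filter the surviving rows again.
in_box = [i for i in lat_ok if -145 <= longitudes[i] <= -105]

# Step 4: fetch the ids for the surviving rows and remove duplicates.
matching_ids = sorted({ids[i] for i in in_box})
print(matching_ids)
```

The declarative request states only the desired result (ids within the box, distinct) and leaves these steps to the server.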
18
Why Sequences Rock! Reason #5
Because the other options all suck.
Serving the datasets as grids doesn't work.
    You now understand why, right?
Serve the data files via FTP.
    Getting a chunk of data is all or nothing.
    Makes user deal with various file formats.
Custom forms and web services are too much work to make.
    Custom: 6+ months per dataset? Ongoing maintenance. No consistency!
    Reusable: 1 day, minimal maintenance, consistent!
Give trusted colleagues access to the database or the files.
    That's not making the data public!
Don't let anyone else use the data.
    This is actually the #1 method of fisheries data distribution.
19
My Goals for this Presentation
1. Tell you more about ERDDAP.
2. Raise awareness and appreciation of tabular data.
3. Convince you that tabular datasets are best served as DAP sequences, and that serving them in DAP as 1D or 2D gridded datasets is a bad idea. (This has nothing to do with how they are stored.)
Bonus: 3 powerful ideas:
1. Abstractions (capture the essence; hide the instance details)
2. Representations (different file formats)
3. Reusability (value is multiplied)
20
Thank you!