Common Data Model Scientific Feature Types John Caron UCAR/Unidata July 8, 2008.

Slides:



Advertisements
Similar presentations
Introduction to the BinX Library eDIKT project team Ted Wen Robert Carroll
Advertisements

Complex Scientific Analytics in Earth Science at Extreme Scale John Caron University Corporation for Atmospheric Research Boulder, CO Oct 6, 2010.
GRIB in TDS 4.3. NetCDF 3D Data dimensions: lat = 360; lon = 720; time = 12; variables: float temp(time, lat, lon); temp:coordinates = “time lat lon”;
Reading HDF family of formats via NetCDF-Java / CDM
Recent Work in Progress
The NCAR Command Language (NCL) and the NetCDF Data Format Research Tools Presentation Matthew Janiga 10/30/2012.
Streaming NetCDF John Caron July What does NetCDF do for you? Data Storage: machine-, OS-, compiler-independent Standard API (Application Programming.
® OGC Web Services Initiative, Phase 9 (OWS-9): Innovations Thread - OPeNDAP James Gallagher and Nathan Potter, OPeNDAP © 2012 Open Geospatial Consortium.
THREDDS, CDM, OPeNDAP, netCDF and Related Conventions John Caron Unidata/UCAR Sep 2007.
7 +/- 2 Maybe Good Ideas John Caron June (1) NetCDF-Java (aka CDM) has lots of functionality, but only available in Java – NcML Aggregation – Access.
The Future of NetCDF Russ Rew UCAR Unidata Program Center Acknowledgments: John Caron, Ed Hartnett, NASA’s Earth Science Technology Office, National Science.
Активное распределенное хранилище для многомерных массивов Дмитрий Медведев ИКИ РАН.
Unidata TDS Workshop THREDDS Data Server Overview October 2014.
THREDDS Data Server, OGC WCS, CRS, and CF Ethan Davis UCAR Unidata 2008 GO-ESSP, Seattle.
THREDDS Data Server, OGC WCS, CRS, and CF Ethan Davis UCAR Unidata 2008 GO-ESSP, Seattle.
Status of netCDF-3, netCDF-4, and CF Conventions Russ Rew Community Standards for Unstructured Grids Workshop, Boulder
John Caron Unidata October 2012
OPeNDAP and the Data Access Protocol (DAP) Original version by Dave Fulker.
BUFR Information Model Gil Ross CAeM Met Office. BUFR Most BUFR Documentation is not easily understood –It treats it as a Decoding process Note – not.
Unidata’s TDS Workshop TDS Overview – Part II October 2012.
Training on Meteorological Telecommunications Alanya, Turkey, September 2010 General Philosophy of Table Driven Code Forms Simon Elliott, EUMETSAT.
NPP/ NPOESS Product Data Format Richard E. Ullman NASA/GSFC/NPP NOAA/NESDIS/IPOAlgorithm / System EngineeringData / Information Architecture
Unidata TDS Workshop TDS Overview – Part I XX-XX October 2014.
Unidata’s Common Data Model John Caron Unidata/UCAR Nov 2006.
THREDDS Data Server Ethan Davis GEOSS Climate Workshop 23 September 2011.
Coverages and the DAP2 Data Model James Gallagher.
NetCDF-Java Overview John Caron Oct 29, Contents Data Models / Shared Dimensions Coordinate Systems Feature Types NetCDF Markup Language (NcML)
Mapping between SOS standard specifications and INSPIRE legislation. Relationship between SOS and D2.9 Matthes Rieke, Dr. Albert Remke (m.rieke,
NcML Aggregation vs Feature Collections. NcML functionality 1.Modify the objects found in CDM files – Especially Attributes – Don’t have to rewrite the.
Unidata’s TDS Workshop TDS Overview – Part II Unidata July 2011.
Mid-Course Review: NetCDF in the Current Proposal Period Russ Rew
N P O E S S I N T E G R A T E D P R O G R A M O F F I C E NPP/ NPOESS Product Data Format Richard E. Ullman NOAA/NESDIS/IPO NASA/GSFC/NPP Algorithm Division.
Accomplishments and Remaining Challenges: THREDDS Data Server and Common Data Model Ethan Davis Unidata Policy Committee Meeting May 2011.
The netCDF-4 data model and format Russ Rew, UCAR Unidata NetCDF Workshop 25 October 2012.
THREDDS Data Server Unidata’s Common Data Model Background / Summary John Caron Unidata/UCAR Mar 2007.
DAP4 James Gallagher & Ethan Davis OPeNDAP and Unidata.
Unidata TDS Workshop THREDDS Data Server Overview
Recent developments with the THREDDS Data Server (TDS) and related Tools: covering TDS, NCML, WCS, forecast aggregation and not including stuff covered.
Climate Data Formats Deniz Bozkurt
Unidata’s Common Data Model and the THREDDS Data Server John Caron Unidata/UCAR, Boulder CO Jan 6, 2006 ESIP Winter 2006.
Integrating the Climate Science Modelling Language with geospatial software and services Dominic Lowe British Atmospheric Data
IOOS Data Services with the THREDDS Data Server Rich Signell USGS, Woods Hole IOOS DMAC Workshop Silver Spring Sep 10, 2013 Rich Signell USGS, Woods Hole.
THREDDS Catalogs Ethan Davis UCAR/Unidata NASA ESDSWG Standards Process Group meeting, 17 July 2007.
Unidata’s TDS Workshop TDS Overview – Part I July 2011.
The HDF Group Data Interoperability The HDF Group Staff Sep , 2010HDF/HDF-EOS Workshop XIV1.
Unidata’s Common Data Model and NetCDF Java Library API Overview John Caron Unidata/UCAR Nov 2008.
The HDF Group Introduction to netCDF-4 Elena Pourmal The HDF Group 110/17/2015.
AUKEGGS Canberra, Exposing legacy file-based data (interop-for-files) Andrew Woolf CCLRC Rutherford Appleton Laboratory
NetCDF-4: Software Implementing an Enhanced Data Model for the Geosciences Russ Rew, Ed Hartnett, and John Caron UCAR Unidata Program, Boulder
NetCDF and Scientific Data Durability Russ Rew, UCAR Unidata ESIP Federation Summer Meeting
Data File Formats: netCDF by Tom Whittaker University of Wisconsin-Madison SSEC/CIMSS 2009 MUG Meeting June, 2009.
GIS for Atmospheric Sciences and Hydrology By David R. Maidment University of Texas at Austin National Center for Atmospheric Research, 6 July 2005.
Data Interoperability at the IRI: translating between data cultures Benno Blumenthal International Research Institute for Climate Prediction Columbia University.
Grids and Beyond: netCDF-CF and ISO/OGC Features and Coverages Ethan Davis, John Caron, Ben Domenico UCAR/Unidata AMS IIPS, 23 January 2008.
UC 2006 Tech Session 1 NetCDF in ArcGIS 9.2. UC 2006 Tech Session2 Overview Introduction to Multidimensional DataIntroduction to Multidimensional Data.
Unidata Technologies Relevant to GO-ESSP: An Update Russ Rew
CF 2.0 Coming Soon? (Climate and Forecast Conventions for netCDF) Ethan Davis ESO Developing Standards - ESIP Summer Mtg 14 July 2015.
OGC Web Services with complex data Stephen Pascoe How OGC Web Services relate to GML Application Schema.
1 2.5 DISTRIBUTED DATA INTEGRATION WTF-CEOP (WGISS Test Facility for CEOP) May 2007 Yonsook Enloe (NASA/SGT) Chris Lynnes (NASA)
Interoperability Day Introduction Standards-based Web Services Interfaces to Existing Atmospheric/Oceanographic Data Systems Ben Domenico Unidata Program.
Update on Unidata Technologies for Data Access Russ Rew
THREDDS Data Server (TDS) and Data Discovery John Caron Unidata/UCAR May 15, 2006.
WMO GRIB Edition 3 Enrico Fucile Inter-Program Expert Team on Data Representation Maintenance and Monitoring IPET-DRMM Geneva, 30 May – 3 June 2016.
Unidata Infrastructure for Data Services Russ Rew GO-ESSP Workshop, LLNL
NetCDF Data Model Details Russ Rew, UCAR Unidata NetCDF 2009 Workshop
Other Projects Relevant (and Not So Relevant) to the SODA Ideal: NetCDF, HDF, OLE/COM/DCOM, OpenDoc, Zope Sheila Denn INLS April 16, 2001.
NetCDF-Java version 2.2 Common Data Model John Caron Unidata/UCAR Dec 10, 2004.
SRNWP Interoperability Workshop
What is NetCDF ? And what are its plans for world domination?
ECMWF usage, governance and perspectives
Presentation transcript:

Common Data Model Scientific Feature Types John Caron UCAR/Unidata July 8, 2008

Contents Overview / Related Work CDM Feature types (focus on point data) Nested Table notation for Point Features Representing Point Data in Netcdf-3/CF –Preliminary experiments with BUFR

Unidata’s Common Data Model Abstract data model for scientific data NetCDF-Java library implementation/ prototype Features are being pushed into the netCDF-4 C library

Scientific Feature Types Grid Point Radial Trajectory Swath StationProfile Coordinate Systems Common Data Model Data Access netCDF-3, HDF5, OPeNDAP BUFR, GRIB1, GRIB2, NEXRAD, NIDS, McIDAS, GEMPAK, GINI, DMSP, HDF4, HDF-EOS, DORADE, GTOPO, ASCII

Related Standards/Models National and International committees are mandating compliance with ISO/OGC data standards Where does the CDM fit in ?

↑ You are here

GML encoding Abstract  OGC WXS = Web (MFC) Service client/server protocols OGC Abstract Specification ISO/TC 211 Reference model CSMLncML-Gml netCDF-3, HDF5, OPeNDAP, BUFR, GRIB1, GRIB2, NEXRAD, NIDS, McIDAS, GEMPAK, GINI, DMSP, HDF4, HDF-EOS, DORADE, GTOPO, ASCII

Where does CDM fit ? Bridge between actual datasets and abstract data model(s) Translate file’s “native data model” into higher-level semantic model “bottom-up” vs “top-down” approach

File Format THREDDS Data Server WCS NCSS OPeNDAP HTTPServer WXS Server BADC Data Server XML netcdf file CSML opendap object WXS ESSI WCS-G Server ncGml CDM Feature types Coordinate Systems Data Access

Climate Science Modelling Language (CSML) British Atmospheric Data Center (BADC) Uses ISO/OGC semantic model GML application schema for atmospheric and oceanographic data

CSML - CDM Feature types CSML Feature TypeCDM Feature Type PointFeature PointSeriesFeatureStationFeature TrajectoryFeature PointCollectionFeatureStationFeature at fixed time ProfileFeature ProfileSeriesFeatureStationProfileFeature at one location and fixed vertical levels RaggedProfileSeriesFeatureStationProfileFeature at one location SectionFeatureSectionFeature with fixed number of vertical levels RaggedSectionFeatureSectionFeature ScanningRadarFeatureRadialFeature GridFeatureGridFeature at a single time GridSeriesFeatureGridFeature SwathFeature

↑ You were there

CDM Feature Types Formerly known as Scientific Data Types Based on examining real datasets “in the wild” –Attempt to categorize, so that datasets can be handled in a more general way Implementation for OGC feature services –Intended to scale to large, multifile collections –Intended to support “specialized queries” Space, Time Data abstraction –Netcdf-Java has prototype implementation

Gridded Data Grid: multidimensional grid, separable coordinates Radial: a connected set of radials using polar coordinates collected into sweeps Swath: a two dimensional grid, track and cross-track coordinates

Gridded Data float gridData(t,z,y,x); float t(t); float y(y); float x(x); float z(z); float lat(y,x); float lon(y,x); float height(t,z,y,x); Cartesian coordinates Data is 2,3,4D All dimensions have 1D coordinate variables (separable)

Radial Data float radialData(radial, gate) : float distance(gate) float azimuth(radial) float elevation(radial) float time(radial) float origin_lat; float origin_lon; float origin_alt; Polar coordinates two dimensional Not separate time dimension

Swath float swathData( track, xtrack) float lat( track, xtrack ) float lon( track, xtrack ) float alt( track, xtrack ) float time(track) two dimensional track and cross-track not separate time dimension orbit tracking allows fast search

Unstructured Grid Pt dimension not connected Need to specify the connectivity explicitly No implementation in the CDM yet float unstructGrid(t,z,pt); float lat(pt); float lon(pt); float time(t); float height(z);

↑ Be here now

float data(sample); Point: measured at one point in time and space Station: time-series of points at the same location Profile: points along a vertical line Station Profile: a time-series of profiles at same location. Trajectory: points along a 1D curve in time/space Section: a collection of profile features which originate along a trajectory. 1D Feature Types (“point data”)

Point Observation Data Table { lat, lon, z, time; obs1, obs2,... } obs(sample); Set of measurements at the same point in space and time = obs Collection of obs = dataset Sample dimension not connected float obs1(sample); float obs2(sample); float lat(sample); float lon(sample); float z(sample); float time(sample);

Time-series Station Data Table { stationId; lat, lon, z; Table { time; obs1, obs2,... } obs(*); // connected } stn(stn); // not connected float obs1(sample); float obs2(sample); int stn_id(sample); float time(sample); int stationId(stn); float lat(stn); float lon(stn); float z(stn); float obs1(sample); float obs2(sample); float lat(sample); float lon(sample); float z(sample); float time(sample); float obs1(stn, time); float obs2(stn, time); float time(stn, time); int stationId(stn); float lat(stn); float lon(stn); float z(stn);

Profile Data Table { profileId; lat, lon, time; Table { z; obs1, obs2,... } obs(*); // connected } profile(profile); // not connected float obs1(sample); float obs2(sample); int profile_id(sample); float z(sample); int profileId(profile); float lat(profile); float lon(profile); float time(profile); float obs1(profile, level); float obs2(profile, level); float z(profile, level); float time(profile); float lat(profile); float lon(profile); float obs1(sample); float obs2(sample); float lat(sample); float lon(sample); float z(sample); float time(sample);

Time-series Profile Station Data Table { stationId; lat, lon; Table { time; Table { z; obs1, obs2,... } obs(*); // connected } profile(*); // connected } stn(stn); // not connected float obs1(profile, level); float obs2(profile, level); float z(profile, level); float time(profile); float lat(profile); float lon(profile); float obs1(stn, time, level); float obs2(stn, time, level); float z(stn, time, level); float time(stn, time); float lat(stn); float lon(stn);

Table { trajectory_id; Table { lat, lon, z, time; obs1, obs2,... } obs(*); // connected } traj(traj) // not connected Trajectory Data float obs1(sample); float obs2(sample); float lat(sample); float lon(sample); float z(sample); float time(sample); int trajectory_id(sample); float obs1(traj,obs); float obs2(traj,obs); float lat(traj,obs); float lon(traj,obs); float z(traj,obs); float time(traj,obs); int trajectory_id(traj);

Section Data float obs1(traj,profile,level); float obs2(traj,profile,level); float z(traj,profile,level); float lat(traj,profile); float lon(traj,profile); float time(traj, profile); Table { section_id; Table { surface_obs // data anywhere lat, lon, time Table { depth; obs1, obs2,... } obs(*); // connected } profile(*); // connected } section(*) // not connected

Nested Table Notation (1) 1.A feature instance is a row in a table. 2.A table is a collection of features of the same type. The table may be fixed or variable length. 3.A nested (child) table is owned by a row in the parent table. 4.Both coordinates and data variables can be at any level of the nesting. 5.A feature type is represented as nested tables of specific form. 6.A feature collection is an unconnected collection of a specific feature type. Table { data1, data2 lat, lon, time; Table { z; obs1, obs2,... } obs(17); } profile(*);

Nested Table Notation (2) A constant coordinate can be factored out to the top level. This is logically joined to any nested table with the same dimension. dim level = 17; float z(level); Table { data1, data2 lat, lon, time; Table { obs1, obs2,... } obs(level); } profile(*);

Nested Table Notation (3) A coordinate in an inner table is connected; a coordinate in the outermost table is unconnected. Table { stationId; lat, lon; Table { time; Table { z; obs1, obs2,... } obs(*); // connected } profile(*); // connected } stn(stn); // not connected Table { lat, lon, z, time; obs1, obs2,... } point(sample); Table { trajectory_id; Table { lat, lon, z, time; obs1, obs2,... } obs(*); // connected } traj(traj) // not connected

Relational model Nested Tables are a hierarchical data model (tree structure) Simple transformation to relational model – explicitly add join variables to tables Table { stationId; lat, lon, z; Table { time; obs1, obs2,... } obs(42); } stn(stn); RTable { stationId // primary key lat, lon, z; } stn RTable { stationId // secondary key time; obs1, obs2,... } obs;

Nested Model Summary Compact notation to describe 1D point feature types –Connectivity of points is key property –Variable/fixed length table dimensions can be notated easily –Constant/varying coordinates can be easily seen Can be translated to relational model to get different performance tradeoffs

↑ Be here whenever Representing point data in netCDF3/CF (or) Fitting data into unnatural shapes

Representing point data in NetCDF-3 / CF Many existing files already store point data in netCDF-3, but not standardized. CF Convention has 2 simple examples, no guidance for more complex situations Can use Nested Tables as comprehensive abstract model of data Look for general solutions

CF Example 1: Trajectory data float O3(time) ; O3:coordinates = “time lon lat z" ; double time(time) ; float lon(time) ; float lat(time) ; float z(time) ; Problem: what if multiple trajectories in same file?

CF Example 2: Station data float data(time, station); data:coordinates = "lat lon alt time" ; double time(time) ; float lon(station) ; float lat(station) ; float alt(station) ; If stations have different times, use double time(time, station) ; Problem: what if stations have different number of times?

Ragged Array Rectangular Array (netCDF-3)

Storing Ragged Arrays Linearize the Array: put all elements of the ragged array into a 1D array 1.Connect using index ranges 2.Connect using linked lists 3.Connect by matching field values (relational) 4.Index join Rectangularize the Array: use maximum size of the ragged array, use missing values Works well if avg ~ max Or if you will store/transmit compressed

Linearize Ragged Arrays Index Ranges

Linearize Ragged Arrays Linked List of Indices Parent Child

Linearize Ragged Arrays Match field values (relational) LatLonAltStn KBO KFRC StnTimeData KBO12: KFRC12: KFRC12: KBO12: KFRC12: KFRC12: KFRC12: KBO12:3034.5

Linearize Ragged Arrays Index Join LatLonAltStn KBO KFRC ParentTimeData 112: : : : : : : :3034.5

Nested Model  netCDF 1.Nested Table  Pseudo-Structures Table { profileId; lat, lon, time; Table { z; obs1, obs2,... } obs(*); } profile(profile); dimensions: profile = 42; obs = 714; variables: int profileId(profile); float lat(profile); float lon(profile); float time(profile); float z(obs); float obs1(obs); float obs2(obs); 

Storing Ragged Arrays Multidimensional / Rectangular dimensions: profile = 42; levels = 17; variables: float lat(profile); float lon(profile); float time(profile); float z(profile,level); float obs1(profile,level); float obs2(profile,level); Relational dimensions: profile = 42; obs = 2781; variables: int profileId(profile); float lat(profile); float lon(profile); float time(profile); float z(obs); float obs1(obs); float obs2(obs); int profile(obs); Index Join dimensions: profile = 42; obs = 2781; variables: float lat(profile); float lon(profile); float time(profile); float z(obs); float obs1(obs); float obs2(obs); int profileIndex(obs)

Storing Ragged Arrays Link + Parent: dimensions: profile = 42; obs = 2781; variables: float lat(profile); float lon(profile); float time(profile); int firstObs(profile) float z(obs); float obs1(obs); float obs2(obs); int nextChild(obs) int profileIndex(obs) Index Range: dimensions: profile = 42; obs = 2781; variables: float lat(profile); float lon(profile); float time(profile); int firstObs(profile) int numObs(profile) float z(obs); float obs1(obs); float obs2(obs); Linked List: dimensions: profile = 42; obs = 2781; variables: float lat(profile); float lon(profile); float time(profile); int firstObs(profile) float z(obs); float obs1(obs); float obs2(obs); int nextChild(obs)

Case Study: BUFR WMO standard for binary point data Table driven Variable length Motherlode/IDD feed –150K messages, 5.5M obs, 1 Gbyte per day –350 categories of WMO headers –70 distinct BUFR types

BUFR  netCDF-3 BUFR data is stored as unsigned ints –scale/offset/bit widths stored in external tables –bit packed –Variable-length arrays of data Translate to netCDF: –Align data on byte boundaries –Use standard scale/offset attributes –rectangularize or linearize ragged arrays

Profiler BUFR data uncompressed, variable # of levels Size(Kb)Zippedratio raw ratio zip BUFR NetCDF multidim netCDF linear

Compressed BUFR data fixed length nested tables Size Kb Zip Kb BUFR NetCDF ratio11.95 Size Kb Zip Kb BUFR NetCDF ratio EUMETSAT, single level upper air 15 messages, 430 obs/message NCEP, satellite sounding 73 messages, 60 obs/message

Point data in netCDF-3 Summary Main problem is ragged arrays Tradeoffs –ease-of-writing vs. ease-of-reading –storage size –More studies with BUFR data NetCDF-4 is likely straightforward, since it has variable length Structures CF proposal Real Soon Now

NetCDF-Java library 4.0 Point Feature API NetCDF-Java library 4.0 will have a new API based on Nested Table model New Sequence data type = variable length array of Structures –Iterators over StructureData objects Experimenting with: –Automatic analysis of datasets to guess feature type –Annotate/configure Feature Dataset to identify nested tables and coordinates (push into NcML?) –NcML aggregation over feature collections (?)

Conclusions CDM Feature Type model and implementation are evolving Nested Table notation provides a flexible way to characterize 1D point datasets Netcdf-Java 4.0 library has refactored point data implementation TDS will eventually provide new point subsetting services

Recent new documents CDM Feature Types CDM Point Feature Types software/netcdf-java/CDM/ Feedback: