THREDDS Status John Caron Unidata 5/7/2013. Outline Release schedule Aggregations -> featureCollections / NCSS GRIB refactor Discrete Sampling Geometry.

Presentation transcript:

THREDDS Status John Caron Unidata 5/7/2013

Outline
- Release schedule
- Aggregations -> featureCollections / NCSS
- GRIB refactor
- Discrete Sampling Geometry (point data)
- ncstream/cdmRemote/cdmrFeature

THREDDS/CDM personnel
- John Caron (1.0) – head cook and bottle washer
- Ethan Davis (.25) – Architecture, standards, catalogs
- Marcos Hermida (1.0) – NCSS, WMS, maven, spring, javascript
- Lansing Madry (1.0) – support, testing, domain expertise
- Sean Arms (.25) – IDV/CDM interface, NCEP models, rosetta, python, domain expertise
- Dennis Heimbigner (0.5) – OpenDAP, HTTPClient
- Julius Chastang (0.0) – IDV, python interface
- Yuan Ho (0.0) – IDV, radar IOSPs

KillCat Release (1 June 2013)
- Last 4.3 Feature Release
- motherlode.ucar.edu -> thredds.ucar.edu
- All Unidata servers upgrade to
- Major Features
  – GRIB complete rewrite, FeatureCollection scaling
  – netCDF-4 writing with netCDF C library (JNA)
  – CF 1.6 Discrete Sampling Convention
  – NetCDF Subset Service and WMS improvements
  – Software Engineering: GitHub, Maven

BlackCat Release 4.4 (31 Oct 2013)
- Improvements for ESG
  – millions of catalogs
- Refactor/harmonize NCSS, CdmRemote, and RadarServer APIs
- Extend NCSS for use in WRF initialization
- Release THREDDS Data Manager (TDM) for outside use
- Migrate HttpClient from 3.x (EOL) to 4.x
- Possible
  – OpenID authentication
  – WaterML from NCSS Grid as Point (where does extra metadata come from?)
  – Experiments with Async

SantaCat Release 4.5 (25 Dec 2013)
- DAP4 server and client
- Grid Feature Collection, replace FMRC
- GRIB FeatureCollection: Constant Forecast Offset/Hour options
- CdmRemoteFeature service implemented for all CF-1.6 DSG feature types

SchrödingersCat Release 5.x
- Require Java 7 (nio2) and Tomcat 7
  – Java 6 reached end of life Feb 2013
- API changes allowed
- TDS configuration refactor
- Refactor GridDatatype to Coverage
  – Swath/Image
  – Cross-seam lat/lon data requests
  – Unstructured Grid?
  – Time-dependent coordinate system?
  – Better dataset classification
- Refactor Catalog reading/writing package
- Improved metadata harvesting support

SchrödingersCat Release 5.x
- Search/discovery service?
- Asynchronous requests
  – client and server?
- TDS-lite?
  – on demand trusted local server
  – Access from C, python

Aggregation -> FeatureCollections
- Aggregation is associated with virtual datasets defined with NcML
- NcML is seriously overloaded with semantics
  – Originally a client-side configuration, done on-the-fly
  – Adapted to play seamlessly with TDS configuration catalogs
  – Server side aggregations more complicated
  – Hard to do all 3 at once: very large, updating, performance
  – featureCollection has a new set of configuration elements that make it easier for both users and implementors
- Performance comes from storing the result dataset info (“.ncx” files)
- Aggregation is being phased out in favor of “feature collections”
  – GRIB, POINT collections are the first real implementation
  – GRID, FMRC will be refactored in 4.5
- NcML can still be used to modify the datasets
- s/FeatureCollections.html

NetCDF Subset Service
- REST web service for coordinate based subsetting on GRID datasets
- On our thredds.ucar.edu server, datasets are NCEP model runs as GRIB collections
- TDS 4.3: Much improved interface and reliability
  – needs more performance
- Can return netCDF-3 or netCDF-4 files
- Net effect is a subset / transformation service
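A coordinate-based request of the kind described above can be sketched as a URL builder. This is only an illustration: the query parameter names (var, north, south, east, west, time_start, time_end, accept) follow the NCSS grid request conventions as commonly documented, while the URL layout, the dataset path, and the variable name are assumptions that vary by TDS version and server. No request is actually sent.

```python
from urllib.parse import urlencode


def ncss_grid_url(server, dataset_path, variables, bbox, time_range, accept="netcdf"):
    """Build an NCSS-style request URL for a lat/lon box and a time range.

    bbox is (west, south, east, north) in degrees.
    time_range is a (start, end) pair of ISO 8601 strings.
    """
    west, south, east, north = bbox
    query = [
        ("var", ",".join(variables)),
        ("north", north),
        ("south", south),
        ("east", east),
        ("west", west),
        ("time_start", time_range[0]),
        ("time_end", time_range[1]),
        ("accept", accept),  # "netcdf" or "netcdf4" output file
    ]
    # Assumed URL layout; check the server's dataset page for the real one.
    return f"{server}/thredds/ncss/{dataset_path}?{urlencode(query)}"


url = ncss_grid_url(
    "https://thredds.ucar.edu",
    "grib/NCEP/GFS/Global_0p5deg/best",  # hypothetical dataset path
    ["Temperature_isobaric"],            # hypothetical variable name
    (-110.0, 35.0, -100.0, 45.0),
    ("2013-05-07T00:00:00Z", "2013-05-08T00:00:00Z"),
    accept="netcdf4",
)
print(url)
```

The client never mentions array indices: the server translates coordinates into index ranges, which is the subsetting half of the "subset / transformation" description.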

GRIB – CDM/TDS 4.3
- Complete rewrite of GRIB1, GRIB2 IOSPs
  – Table Handling
  – Multifile collections
  – eliminate user configuration
      Automatically figure out coord systems
      User Groups for multiple horizontal domains
  – Indexing (.gbx9) and cache metadata (.ncx)
  – User configuration passed into the IOSP
- TDS featureCollection=GRIB
  – Time Partitions (performance)
  – User Configuration for changing datasets
  – Motivated by NCDC/NOMADS issues and $

netCDF storage

GRIB storage

GRIB Rectilyzationologicment
- Turn unordered collection of 2D slices into 3-6D multidimensional array
- Each GRIB record (2D slice) is independent
- There is no overall schema to describe what it's supposed to be
  – there is, but not able to be encoded in GRIB
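The idea of turning independent 2D slices into a multidimensional array can be sketched in a few lines. This is a simplified illustration, not the CDM implementation: the record layout (a dict with time, level, and data keys) is hypothetical, only two extra dimensions are handled, and a later duplicate simply replaces an earlier one.

```python
def rectilyze(records):
    """Arrange independent 2D slices into a time x level grid of slices.

    Each record is a dict with 'time', 'level', and 'data' (a 2D list).
    Returns (times, levels, cube, duplicates), where cube[i][j] is the
    slice for times[i] and levels[j], or None if no record fills that cell.
    """
    # The coordinate axes are discovered from the records themselves,
    # since no overall schema is available.
    times = sorted({r["time"] for r in records})
    levels = sorted({r["level"] for r in records})
    t_index = {t: i for i, t in enumerate(times)}
    z_index = {z: j for j, z in enumerate(levels)}

    cube = [[None] * len(levels) for _ in times]
    duplicates = 0
    for r in records:
        i, j = t_index[r["time"]], z_index[r["level"]]
        if cube[i][j] is not None:
            duplicates += 1  # the later record replaces the earlier one
        cube[i][j] = r["data"]
    return times, levels, cube, duplicates


# Records arrive in no particular order, with one duplicate and one gap.
records = [
    {"time": 6, "level": 500, "data": [[3.0]]},
    {"time": 0, "level": 850, "data": [[1.0]]},
    {"time": 0, "level": 500, "data": [[2.0]]},
    {"time": 0, "level": 500, "data": [[2.5]]},
]
times, levels, cube, duplicates = rectilyze(records)
```

Cells left as None correspond to missing records, which a real reader would have to report as missing data.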

GRIB collection indexing
[Diagram: each GRIB file has an index file (name.gbx9), labeled "1000x smaller". The index files are used to create the TDS collection index (collectionName.ncx), also labeled "1000x smaller", which holds the CDM metadata.]

GRIB time partitioning
[Diagram: the TDS sits above a partition index (Collection.ncx), which refers to one ncx index per partition. Each partition covers a group of GRIB files, each with its own gbx9 index.]
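Grouping files into time partitions can be sketched as follows. The filename pattern (a YYYYMMDD_HHMM run stamp) and the file names are hypothetical, and the real TDS builds index files rather than plain dictionaries. The point is only that a request for a time range needs to touch the matching partitions and nothing else.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: the run date and hour appear as YYYYMMDD_HHMM.
RUN_STAMP = re.compile(r"(\d{8})_(\d{4})")


def partition_by_day(filenames):
    """Group file names into daily partitions keyed by 'YYYYMMDD'."""
    partitions = defaultdict(list)
    for name in filenames:
        match = RUN_STAMP.search(name)
        if match is None:
            raise ValueError(f"no run date in {name!r}")
        partitions[match.group(1)].append(name)
    return {day: sorted(files) for day, files in sorted(partitions.items())}


def partitions_for_range(partitions, first_day, last_day):
    """Return the partition keys that fall within [first_day, last_day].

    'YYYYMMDD' strings sort chronologically, so string comparison is enough.
    """
    return [day for day in partitions if first_day <= day <= last_day]


files = [
    "GFS_Global_0p5deg_20130507_0600.grib2",
    "GFS_Global_0p5deg_20130506_1200.grib2",
    "GFS_Global_0p5deg_20130507_0000.grib2",
    "GFS_Global_0p5deg_20130506_0000.grib2",
]
parts = partition_by_day(files)
wanted = partitions_for_range(parts, "20130507", "20130507")
```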

NCEP GFS half degree
- All data for one run in one file
- 3.65 Gbytes/run, 4 runs/day, 22 days
- Total 321 Gbytes, 88 files
- Partition by day (mostly for testing)
- Index files
  – Gbx9: 2.67 Mbytes each
  – Ncx: 240 Kbytes each
  – Daily partition indexes: 260K each
  – Overall index is about 50K (CDM metadata)
  – Index overhead = grib file sizes / 1000
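The "grib file sizes / 1000" figure can be checked against the numbers on this slide. The arithmetic below assumes decimal units, one gbx9 and one ncx per GRIB file, and one partition index per day; the slide does not state how many ncx files there are, so that count is a guess.

```python
GB, MB, KB = 1e9, 1e6, 1e3

runs_per_day, days = 4, 22
files = runs_per_day * days              # one file per run

grib_bytes = files * 3.65 * GB           # all GRIB data
gbx9_bytes = files * 2.67 * MB           # one gbx9 per GRIB file
ncx_bytes = files * 240 * KB             # assumption: one ncx per GRIB file
partition_bytes = days * 260 * KB        # one partition index per day
overall_bytes = 50 * KB                  # top-level index

index_bytes = gbx9_bytes + ncx_bytes + partition_bytes + overall_bytes
ratio = grib_bytes / index_bytes

print(f"{files} files, {grib_bytes / GB:.1f} GB of GRIB data")
print(f"{index_bytes / MB:.1f} MB of indexes, data/index ratio {ratio:.0f}")
```

Under these assumptions the indexes total a little over 260 MB against about 321 GB of data, a ratio of roughly 1,200, which is consistent with the slide's order-of-magnitude claim.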

CFSR timeseries data at NCDC
- Climate Forecast System Reanalysis (31 years, 372 months)
- Total 5.6 Tbytes, 56K files
- analyze one month (198909)
  – 151 files, approx 15Gb. 15Mb gbx9 indexes.
  – 101 variables, time steps
  – records duplicates (15%)
  – 1.1M collection index, 60K needs to be read by TDS when opening.

Big Data
- cfsr-hpr-ts9
- 9 month (~275 day) runs, 4x / day, at 5 day intervals
- run since 1982 to present!
- ~22 million files

GRIB - summary
- Fast indexing allows you to find the subsets that you want in under a second
  – Time partitioning should scale up as long as your data is time partitioned
  – No pixie dust: still have to read the data!
  – GRIB2 stores compressed horizontal slices, must decompress entire slice to get one value
  – Experimenting with storing in netcdf-4
  – Chunk to get timeseries data at a single point
- Still getting the bugs out on changing/updating datasets (4.3.17)
- featureCollections will (eventually) solve many of the problems of Aggregations

Discrete Sampling Geometries (aka Point Data)
- Conventions added to CF 1.6
- CDM 4.3 has complete implementation
  – ucar.nc2.ft.point package
- TDS 4.3 featureCollection
  – type = POINT, STATION
  – Creates a cdmrFeature web service

Discrete Sample Feature Types
- point: a collection of data points with no connection in time and space
- timeSeries: a series of data points at the same location, with varying time
- trajectory: a series of data points along a curve in time and space
- profile: a set of data points along a vertical line
- timeSeriesProfile: a series of profiles at the same location, with varying time
- trajectoryProfile: a set of profiles which originate from points along a trajectory

ucar.nc2.ft.point
- Subset by lat/lon box, time range
- Iterate over rows (result set)
- Not arrays (netCDF classic data model)
- Scales to large collections
- Allows streaming
- Similar to/compatible with RDBMS
- Nested tables
  – hierarchical data model
- TODO: arbitrary predicates (filter)
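The row-oriented access pattern listed above can be sketched with a generator. This is not the ucar.nc2.ft.point API, which is a Java package; the PointObs class and subset function are hypothetical names used only to show subsetting by bounding box and time range while iterating, without loading the collection into arrays. Boxes that cross the date line are not handled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class PointObs:
    """One row of a point collection (hypothetical layout)."""
    lat: float
    lon: float
    time: datetime
    value: float


def subset(rows: Iterable[PointObs],
           bbox: Tuple[float, float, float, float],
           start: datetime,
           end: datetime) -> Iterator[PointObs]:
    """Yield rows inside bbox (south, north, west, east) and [start, end].

    Rows are examined one at a time, so the source can be a stream of any size.
    """
    south, north, west, east = bbox
    for row in rows:
        in_box = south <= row.lat <= north and west <= row.lon <= east
        if in_box and start <= row.time <= end:
            yield row


obs = [
    PointObs(40.0, -105.0, datetime(2013, 5, 7, 12), 281.5),
    PointObs(10.0, 20.0, datetime(2013, 5, 7, 12), 300.1),   # outside the box
    PointObs(39.5, -104.5, datetime(2013, 5, 9, 0), 279.0),  # outside the time range
]
hits = list(subset(iter(obs), (35.0, 45.0, -110.0, -100.0),
                   datetime(2013, 5, 7), datetime(2013, 5, 8)))
```

Because the result is itself an iterator, it can be passed straight to a writer that streams rows to the client.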

ncstream
- NetCDF files (almost always) have to be written, then copied to network
  – Assumes random access, not stream
  – “read optimized”: data layout is known
- ncstream explores what “streaming netcdf” might look like
  – “write-optimized”: append only
  – Efficient conversion to netCDF files on the client
- Ncstream data model == CDM data model
- Binary encoding using Google's Protobuf
- Protobuf
  – Binary object serialization, cross language, transport neutral, extensible
  – Very fast: some tests show >10x OPeNDAP
- Have experimental versions in CDM and TDS since 4.1
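The "append only" idea can be illustrated with length-prefixed messages on a byte stream. This sketch is not the ncstream wire format: it only shows a generic framing in which each message is preceded by its length as a base-128 varint (the integer encoding Protobuf uses), so a writer can append without seeking and a reader can consume messages in order. The payloads here are raw bytes standing in for serialized messages.

```python
import io


def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def append_message(stream, payload):
    """Append one length-prefixed message; no seeking is required."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_messages(stream):
    """Yield payloads in the order they were appended."""
    while True:
        length, shift, first = 0, 0, True
        while True:
            b = stream.read(1)
            if not b:
                if first:
                    return  # clean end of stream
                raise EOFError("truncated length prefix")
            first = False
            length |= (b[0] & 0x7F) << shift
            shift += 7
            if not b[0] & 0x80:
                break
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated message")
        yield payload


buf = io.BytesIO()
sent = [b"header", b"", b"x" * 200]
for message in sent:
    append_message(buf, message)
buf.seek(0)
received = list(read_messages(buf))
```

A reader can begin processing the first message before the writer has produced the last one, which is what distinguishes this layout from a file format that assumes random access.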

cdmRemote
- Replacement for OPeNDAP 2.0 that can handle the full CDM data model
- In 4.3, CDM/TDS uses cdmRemote in preference to OPeNDAP, for remote access to CDM datasets
  – User can configure this
- Index based – just like netCDF
- Not currently promoting outside of CDM/TDS stack

cdmrFeature
- TDS 4.3 web service for coordinate based subsetting
- REST API
- Harmonize / merge with NCSS (4.4 Oct 2013)
- Extend to all CF-DSG feature types (4.5 Dec 2013)
- Intended to be used on collections of DSG (point) data
  – Needs time partition date in the filename
- Output
  – netCDF-3/CF
  – XML, CSV
  – ncstream for CDM clients
- Clients
  – HTML form, like NCSS
  – CDM / ToolsUI
  – Python scripts

featureCollection configuration
<featureCollection name="Metar Station Data" featureType="Station" path="urlpath/station/data">

Why not OPeNDAP?
- Reasonable data model using sequences
- Incomplete coordinate system data model
  – can't make requests in lat/lon or time in a standard way
- Server side processing not standardized
  – Client can't discover what's possible
  – Semantics not standardized
- Waiting for DAP4
  – See what we have when that’s ready