The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin.


The Live Access Server (Access to observational data)
Jonathan Callahan (University of Washington), Steve Hankin (NOAA/PMEL – PI), Roland Schweitzer, Kevin O'Brien, Ansley Manke, Steve Du, Xiaoping Wang, Joe Mclean, Joe Sirott, Jerry Davison

Gridded vs. Observational Data
Gridded data:
– Clean
– Organized
– Labeled
– Voluminous
– Handled by machines
Observational data:
– Dirty
– Messy
– Often un/mis-labeled
– Increasingly voluminous
– Previously handled by hand

Live Access Server (LAS)
– Web-based, common interface to diverse sources of climate data
– Single interface for subsetting, download, visualization, comparison
– Easy access to metadata and documentation
– Unified access to distributed data holdings
– Uniform user interface to existing back-end visualization packages

LAS Data Model
For data access, users must specify:
– Dataset
– Variable
– 4D Region Constraints
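
The three items a user must specify can be pictured as one small request object. The sketch below is illustrative only and is not the LAS request format: the class names, field names, dataset name and units are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple

Range = Tuple[float, float]  # (low, high), inclusive


@dataclass(frozen=True)
class Region4D:
    """Hypothetical 4D region constraint: longitude, latitude, depth, time."""
    lon: Range
    lat: Range
    depth: Range   # metres, assumed
    time: Range    # days since an arbitrary epoch, assumed

    def contains(self, lon: float, lat: float, depth: float, time: float) -> bool:
        point = (lon, lat, depth, time)
        bounds = (self.lon, self.lat, self.depth, self.time)
        return all(lo <= value <= hi for value, (lo, hi) in zip(point, bounds))


@dataclass(frozen=True)
class DataRequest:
    """Hypothetical container for the three things the user specifies."""
    dataset: str
    variable: str
    region: Region4D


# Made-up example request: temperature in a 10 x 10 degree box, top 500 m, one year.
request = DataRequest(
    dataset="example_dataset",
    variable="temperature",
    region=Region4D(lon=(-160.0, -150.0), lat=(20.0, 30.0),
                    depth=(0.0, 500.0), time=(0.0, 365.0)),
)
```

A point inside the box satisfies `request.region.contains(-155.0, 25.0, 100.0, 10.0)`; a point outside any one of the four ranges does not.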

Dataset

Variable

4D Region Constraints

Output

LAS Architecture
LAS is three-tiered

Access to Remote Data
Ferret back end is linked with OPeNDAP

Data Server Details
Java servlet redesign

Server Side Functionality
After parsing the user request, LAS must:
– Access & subset the data
– Perform analysis
– Create visualization
For interactive results each task should take <5 sec.
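
The three tasks and the 5-second target can be sketched as a timed pipeline. This is a toy illustration, not LAS code: the stage functions are stand-ins invented for the example, and only the stage order and the 5-second budget come from the slide.

```python
import time

BUDGET_SECONDS = 5.0  # per-task target for interactive results (from the slide)


def access_and_subset(values):
    """Stand-in for data access: drop missing values."""
    return [v for v in values if v is not None]


def perform_analysis(values):
    """Stand-in for analysis: compute the mean."""
    return sum(values) / len(values)


def create_visualization(mean):
    """Stand-in for visualization: render a text label."""
    return f"mean = {mean:.2f}"


STAGES = (
    ("access & subset", access_and_subset),
    ("analysis", perform_analysis),
    ("visualization", create_visualization),
)


def handle_request(payload):
    """Run each stage in order and record whether it met the time budget."""
    report = []
    for name, stage in STAGES:
        start = time.perf_counter()
        payload = stage(payload)
        elapsed = time.perf_counter() - start
        report.append((name, elapsed < BUDGET_SECONDS))
    return payload, report


output, report = handle_request([1.0, None, 2.0, 3.0])
```

Timing each stage separately makes it visible which of the three tasks breaks the interactive budget for a given request.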

The Hard Part
After parsing the user request, LAS must:
– Access & subset the data
– Perform analysis
– Create visualization

Classes of Observational Climate Data
Station time series (Eulerian)
– Oceanic: tide gauges (1D), moored thermistor chains (2D)
– Atmospheric: surface weather stations (1D), profilers (2D)

Classes of Observational Climate Data
Profile data
– Oceanic: CTD casts, bottle data (ordered by cruise track, quasi-scattered); repeat stations (ordered by cruise track or station location)
– Atmospheric: profilers (station based); balloons (2D, quasi-Lagrangian)

Classes of Observational Climate Data
Tracks (Lagrangian)
– Oceanic: ship underway data (surface); drifting buoys (surface); ARGO floats (surface tracks, scattered profiles); instrumented animals (depth)
– Atmospheric: airplane underway data (altitude); balloons (altitude, quasi-stationary, quasi-profile)

Classes of Observational Climate Data
Random scatter
– Oceanic: surface ship observations; profile locations
– Atmospheric: surface weather obs

Example Dataset: NOAA/NODC/OCL World Ocean Database 2001
– data collected from ocean cruises and moorings
– scattered profiles, Lagrangian drifters
– physical, chemical and biological data
– dozens (hundreds?) of variables
– > 7 million profiles (1792-present, global)
– > 10 Gigabytes of data (accelerating every year)

Example Dataset: NOAA/NODC/OCL World Ocean Database 2001
Current access:
– Choose either temporally or spatially sorted data
– Choose year(s) or 10x10 degree box
– Choose instrument
– Retrieve data for all variables from that file
Problems:
– Cannot subset data (1 year x 1 instrument: 7 Mbytes)
– Data returned in impenetrable compressed ASCII files
– Associated metadata is lost

Example Dataset: NOAA/NODC/OCL World Ocean Database 2001
Our attempt at synoptic/cross-instrument data access
– Store data by variable
  Plan for those getting data out, not putting data in.
  What do scientific analysis and visualization packages need?
– Store data for minimum # of disk seeks
  Memory is fast (and cheap!), disk seeks are slow.
  Multi-stage process for determining data blocks needed.
  Read excess data into memory, then winnow.

Example Dataset: NOAA/NODC/OCL World Ocean Database 2001
Step 1: synoptic meta-pointer file (0.3 MByte)
a) load synoptic meta-pointer file into memory
b) subset to extract metadata pointers
[Figure: grid with axes Longitude, Latitude, Time. 10deg x 10deg x 50 irregular timesteps = 260 Kbytes. Each cell = number of profiles, pointer into NetCDF metadata file.]
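
The 260 Kbyte figure can be checked with a short calculation. The sketch assumes a global grid of 10 x 10 degree boxes and 8 bytes per cell (two 4-byte integers: the profile count and the pointer); the slide states neither, so both are assumptions. The `cell_index` helper and its time-major layout are likewise hypothetical.

```python
BOX_DEG = 10          # box size in degrees (from the slide)
N_TIMESTEPS = 50      # irregular timesteps (from the slide)
BYTES_PER_CELL = 8    # assumption: 4-byte profile count + 4-byte pointer

n_lon = 360 // BOX_DEG                  # boxes around the globe in longitude
n_lat = 180 // BOX_DEG                  # boxes pole to pole in latitude
n_cells = n_lon * n_lat * N_TIMESTEPS
size_bytes = n_cells * BYTES_PER_CELL   # close to the quoted 260 Kbytes


def cell_index(lon, lat, t_index):
    """Flat index of the cell holding (lon, lat) at timestep t_index.

    Assumes lon in [-180, 180], lat in [-90, 90], and a time-major layout.
    """
    i = int((lon + 180) // BOX_DEG) % n_lon          # wrap lon = 180 onto box 0
    j = min(int((lat + 90) // BOX_DEG), n_lat - 1)   # keep lat = 90 in the top box
    return (t_index * n_lat + j) * n_lon + i
```

With these assumptions the grid has 36 x 18 x 50 = 32,400 cells and occupies 259,200 bytes, small enough to hold entirely in memory, which is what lets step 1 proceed without any disk seeks after the initial load.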

Example Dataset: NOAA/NODC/OCL World Ocean Database 2001
Step 2: metadata/data-pointer file (200 Mbyte)
a) read blocks of profile metadata into memory
b) subset by X/Y/T to obtain valid data pointers
[Figure: profile metadata record indexed by T, X, Y = Julian day, Lat, Lon, Cruise ID, # of levels, and N variables x (Var_ptr, Var_QC).]
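
Step 2 amounts to reading a block of metadata records and keeping only the pointers whose position and time fall inside the request. A minimal sketch, assuming an in-memory list of records with a single variable pointer each (the slide's records carry a pointer and QC value for each of N variables); the record type and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ProfileMeta(NamedTuple):
    """Simplified, hypothetical metadata record for one profile."""
    julian_day: float
    lat: float
    lon: float
    cruise_id: int
    n_levels: int
    var_ptr: int   # offset of this profile's values in the data file


def subset_block(block: List[ProfileMeta],
                 lon_range: Tuple[float, float],
                 lat_range: Tuple[float, float],
                 day_range: Tuple[float, float]) -> List[Tuple[int, int]]:
    """Return (var_ptr, n_levels) for every profile inside the X/Y/T request."""
    def inside(value, bounds):
        return bounds[0] <= value <= bounds[1]

    return [(p.var_ptr, p.n_levels) for p in block
            if inside(p.lon, lon_range)
            and inside(p.lat, lat_range)
            and inside(p.julian_day, day_range)]


# Made-up block of three records; only the first lies inside the request.
block = [
    ProfileMeta(100.0, 25.0, -155.0, 1, 20, 0),
    ProfileMeta(100.0, 45.0, -155.0, 1, 15, 20),    # outside the latitude range
    ProfileMeta(400.0, 25.0, -155.0, 2, 30, 35),    # outside the time range
]
pointers = subset_block(block, (-160.0, -150.0), (20.0, 30.0), (0.0, 365.0))
```

The block read from disk is deliberately larger than the request; the filter in memory is the "read excess, then winnow" idea applied to metadata.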

Example Dataset: NOAA/NODC/OCL World Ocean Database 2001
Step 3: data files ( Mbyte)
a) read profile data
b) subset by depth/quality flag to obtain valid data
[Figure: 1D profile at T, X, Y along Z = N depths x (Depth, Value, Quality flag).]
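
Step 3 can be sketched the same way: read a whole profile, then keep only the levels that pass the depth and quality tests. The function name and the convention that flag 0 means "good" are assumptions for the example; the slide does not define the flag values.

```python
from typing import Iterable, List, Tuple

Level = Tuple[float, float, int]  # (depth, value, quality flag)


def winnow_profile(levels: Iterable[Level],
                   max_depth: float,
                   good_flags: Tuple[int, ...] = (0,)) -> List[Tuple[float, float]]:
    """Keep (depth, value) pairs that are shallow enough and flagged good.

    Assumption: quality flag 0 marks a good value.
    """
    return [(depth, value) for depth, value, flag in levels
            if depth <= max_depth and flag in good_flags]


# Made-up temperature profile with one flagged value and one deep level.
profile = [
    (0.0, 24.1, 0),
    (50.0, 23.0, 0),
    (100.0, 99.9, 3),    # flagged as suspect
    (500.0, 8.2, 0),
    (1000.0, 4.1, 0),    # deeper than requested
]
valid = winnow_profile(profile, max_depth=500.0)
```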

Example Dataset: NOAA/NODC/OCL World Ocean Database 2001
Our attempt at synoptic/cross-instrument data access
Successes:
– Able to subset without accessing (much) unwanted data
– Access to (<1 Mbyte) subsets in seconds
– Access to metadata (What profiles exist?) even faster
Problems:
– Only set up for most important variables
– Data cannot be updated, must be rewritten
– Must reinvent logic for relational queries
– Funky, home-built solution

Other data streams
METAR obs (station time series)
– 1700 US weather stations report hourly data
– 25 variables = 120 Mbytes/month
ARGO floats (profiles)
– 4000 floats reporting profiles every 10 days
– 50 levels x 10 variables = 24 Mbytes/month
Tagging Of Pacific Pelagics (TOPP) (Lagrangian tracks)
– 50 animals per year tagged with 1 min data recorders
– 5 variables = 0.8 Mbytes/month
Voluntary Observing Ships (random scatter)
– 3000 surface ship reports per day
– 25 variables = 9 Mbytes/month
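
The monthly volumes above can be roughly reproduced by assuming 4 bytes per stored value and a 30-day month; neither assumption is stated on the slide. Under them the METAR, ARGO and ship figures come out as quoted, while the TOPP figure is close to what a single 1-minute recorder would produce in a month, so it may be a per-animal number (the slide does not say).

```python
BYTES_PER_VALUE = 4    # assumption: 4-byte values
DAYS_PER_MONTH = 30    # assumption: 30-day month


def mbytes_per_month(reports_per_month, values_per_report):
    """Monthly volume in Mbytes (10**6 bytes) for a stream of fixed-size reports."""
    return reports_per_month * values_per_report * BYTES_PER_VALUE / 1e6


# 1700 stations, hourly reports, 25 variables each.
metar = mbytes_per_month(1700 * 24 * DAYS_PER_MONTH, 25)

# 4000 floats, one profile every 10 days, 50 levels x 10 variables per profile.
argo = mbytes_per_month(4000 * DAYS_PER_MONTH // 10, 50 * 10)

# 3000 ship reports per day, 25 variables each.
vos = mbytes_per_month(3000 * DAYS_PER_MONTH, 25)

# One tagged animal, one record per minute, 5 variables each.
topp_one_animal = mbytes_per_month(24 * 60 * DAYS_PER_MONTH, 5)

for name, mb in [("METAR", metar), ("ARGO", argo),
                 ("VOS", vos), ("TOPP (one animal)", topp_one_animal)]:
    print(f"{name}: {mb:.1f} Mbytes/month")
```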

Observational Data Access Requirements
– Subset based on X, Y, Z, T or metadata (e.g. quality flag or station/ship/platform/animal_ID).
– Only return requested data. (Reduced volume for remote data access.)
– For near-real-time, daily updates are acceptable. (Can recreate static files on a daily basis if necessary.)
– Use standards wherever possible.
– Make the creation of the database as simple as possible. (Non-experts can follow cookbook examples.)

Conclusion
– Efficient access to observational data is an unsolved problem.
– Data volumes are increasing exponentially.
– Data access problems hinder the development of interactive visualization tools.