AR5 Data and Product Access Architecture: Concepts for Discussion
Steve Hankin (NOAA/PMEL)
GO-ESSP meeting, June 2007
(Not including metadata architecture or security)



Slide 2: Requirements
You've just heard Bryan's thoughts on requirements (which probably resemble the following).
- User needs, by IT sophistication level (WG*):
  - WG1 (physical processes):
    - raw files (on native grids)
    - CF subsets (potentially large, e.g. global), on the native grid and regridded
    - a broad range of analyses (scope to be determined by the science community)
    - intercomparison on hi-res global fields
    - visualizations, tables, animations, ...
  - WG2 and WG3 (regional impacts on life and societies; mitigation):
    - CF subsets (regional)
    - basic analysis, e.g. area averages and extrema (see the sketch below)
    - intercomparison on the regional scale
    - visualizations, tables, tab-delimited ("Excel") output
    - visualization on the globe (e.g. Google Earth), animations, ...
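To make the "basic analysis" bullet concrete, here is a minimal Python sketch of an area average over a regional CF subset, assuming a regular latitude-longitude grid; the function and data are illustrative only:

    import numpy as np

    def area_average(field, lats):
        """Cosine-latitude weighted mean of a 2-D (lat, lon) field.

        Grid cells shrink toward the poles, so each latitude row is
        weighted by cos(latitude) before averaging.
        """
        weights = np.cos(np.deg2rad(lats))         # one weight per latitude row
        weighted = field * weights[:, np.newaxis]  # broadcast across longitudes
        return weighted.sum() / (weights.sum() * field.shape[1])

    # Example: a synthetic 1-degree regional subset (20N-50N, 100 longitudes)
    lats = np.arange(20.0, 50.0, 1.0)
    field = np.random.rand(lats.size, 100)
    print(area_average(field, lats))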

Slide 3: Requirements, cont'd
- Provider needs, by IT capabilities level, across an estimated 28(?) contributing organizations:
  - some providers are not able to serve their own data
  - deployable AR5 components (if any) must install easily on varied infrastructures
  - user authentication / access control
- Data volumes: 200+ TB (ESG proposal) to 20,000 TB (Bryan)

Slide 4: How AR4 did it
- Central database
- Data sent on hard drives by postal service
- All data regridded to the same grid
- QC via CMOR, run at the modeling sites (scalable)
- Some central analysis (summaries)
- Massive data distribution from a central point
AR4 data base: a 30 TB data collection, 61,000 files

Slide 5: AR4 stumbling blocks
Show stoppers:
- Some ocean models could not be regridded to the AR4 grid without information loss (solved?)
Difficulties:
- Unreliable disk drives
- Headaches matching CMOR requirements
- No doubt many other war stories ...

Slide 6: Could we adapt the AR4 approach to AR5?
The ESG proposal asserts "No": "With an increasing number of users and an increasing quantity of data, it will no longer be feasible to carry out the requirements of AR5 with the centralized data management strategy utilized for AR4."
Well, that's the party line, anyway. Assertion: if necessary, a centralized solution is again possible.

Slide 7: Centralized approach: ship disks again
- Disk drives today: $250 = 500 GB
- By AR5 time (24 months?), say 2-5 TB of disk could reasonably be mailed from each modeling site
- With insistence on a standard drive model, the data might be retained on the original disks
- Up to 150 TB by this means (roughly 28 sites x 5 TB)
- Who would step forward to take on this burden?

Slide 8: Centralized approach: all data regridded to a standard grid
Accept a sub-optimal resolution, but add:
- GODAE-style hi-res fields (surface-only, selected sections and time series, etc.)
- hi-res analysis results, e.g. vertical integrals

Slide 9: Could we adapt the AR4 approach to AR5?
Major burdens on [whatever] host organization:
- financial
- sysadmin headaches
- network loads
- I/O loads from subsetting
Compromises in the flexibility of analyses (due to pre-computed fields).
But it could work ...

Slide 10: Why make this point?
The IT challenges that we are debating are an opportunity to demonstrate a new way of doing things.
- The risk is that we disappoint ourselves (as much as we disappoint AR5 science)
What we want to demonstrate:
- a "data grid": a scalable, distributed approach
- the potential of IT to improve how science is done
- enhanced collaboration

Slide 11: Time tables
Distributed technology has to be demonstrated in time for AR5 planners to make decisions: 18 months from now ("early 2009" in the SciDAC proposal) for a functioning testbed.
Conclusions:
- Few (if any) new "standards" can be considered; we must work with the ones we have
- Consider areas in need of further standardization as testing opportunities
- Code components should be running at (at least) a beta level by (?when? 12 months?) [group sense?]

Slide 12: Proposal: ESG Data and Product Access Stack

    Layer                            Services (protocols)
    netCDF-CF files                  FTP
    atomic datasets (aggregations)   OPeNDAP & WCS
    analyses (incl. regridding)      OPeNDAP & WCS (*)
    products (viz, etc.)             multiple (**)

(*) analysis embedded in the URL; no syntax standard yet (F-TDS?)
(**) LAS request protocol; TDS/netCDF "fileout"; WMS?
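As a sketch of what the "atomic datasets" layer buys the user, here is how OPeNDAP access looks from Python with the netCDF4 library, which speaks the DAP protocol natively; the server URL and variable name are hypothetical, and only the indexed slab crosses the network:

    from netCDF4 import Dataset

    # Hypothetical aggregation served by a data node's TDS.
    url = "http://datanode.example.org/thredds/dodsC/ar5/model_x/tas_monthly"
    ds = Dataset(url)

    tas = ds.variables["tas"]           # e.g. dimensions (time, lat, lon)
    subset = tas[0:12, 40:80, 100:160]  # one year, one regional window
    print(subset.shape, subset.mean())
    ds.close()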

Slide 13: ESG Data and Product Access Stack (user-facing outputs)

    Layer                            Output
    netCDF-CF files                  raw files
    atomic datasets (aggregations)   desktop access & subsets
    analyses (incl. regridding)      desktop access & subsets
    products (viz, etc.)             visualizations, tables & scripts

Slide 14: [Diagram: data suppliers feed data nodes, which connect over the internet to a gateway node.]

Slide 15: How to distribute the layers on the nodes?
Size of single data requests, by layer:

    netCDF-CF files                  O(1 TB)
    atomic datasets (aggregations)   O(10 GB)
    analyses (incl. regridding)      O(0.1-10 GB)
    products (viz, etc.)             O(1-10 MB)

Which operations are feasible over the internet?
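A back-of-the-envelope check of these magnitudes, under assumed (illustrative) grid shapes and uncompressed 4-byte values:

    # Order-of-magnitude sizes for single requests at each layer.
    # Grid shapes are assumptions for illustration only.
    def gigabytes(*shape):
        n = 4  # bytes per value
        for d in shape:
            n *= d
        return n / 1e9

    # CF subset: 100 years monthly, 31 levels, 1x1 degree -> ~10 GB
    print(round(gigabytes(1200, 31, 360, 180), 1), "GB")
    # Analysis result: same span, vertically integrated (2-D) -> ~0.3 GB
    print(round(gigabytes(1200, 360, 180), 2), "GB")
    # Raw files: ~10 GB per variable x O(100) variables per run -> O(1 TB)
    # Products: rendered PNG/KML images, typically a few MB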

Slide 16: Proposed deployment of stack layers, based on output sizes
- Gateway node: netCDF-CF files, atomic datasets (aggregations), analyses (incl. regridding), and products (viz, etc.)
- Data node: netCDF-CF files, atomic datasets (aggregations), and analyses (incl. regridding), i.e. server-side analysis but no product layer

Slide 17: Differencing: a standard analysis operation (and a perennial issue for model intercomparisons)
[Diagram: two nodes, each running the netCDF-CF files / atomic datasets / analyses layers; one field is regridded and the difference is computed across the two nodes' analysis layers.]
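A minimal sketch of the regrid-then-difference operation, with scipy bilinear interpolation standing in for whatever regridding tool a node actually supplies; the grids and data here are synthetic:

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    def regrid(field, src_lats, src_lons, dst_lats, dst_lons):
        """Bilinearly interpolate a (lat, lon) field onto a target grid."""
        interp = RegularGridInterpolator((src_lats, src_lons), field,
                                         bounds_error=False, fill_value=np.nan)
        lat2d, lon2d = np.meshgrid(dst_lats, dst_lons, indexing="ij")
        pts = np.stack([lat2d.ravel(), lon2d.ravel()], axis=-1)
        return interp(pts).reshape(lat2d.shape)

    # Model A on a 1-degree grid, model B on a 2-degree grid (synthetic)
    lats_a, lons_a = np.arange(-89.5, 90, 1.0), np.arange(0.5, 360, 1.0)
    lats_b, lons_b = np.arange(-89.0, 90, 2.0), np.arange(1.0, 360, 2.0)
    sst_a = np.random.rand(lats_a.size, lons_a.size)
    sst_b = np.random.rand(lats_b.size, lons_b.size)

    # Regrid B onto A's grid, then difference on the common grid
    diff = sst_a - regrid(sst_b, lats_b, lons_b, lats_a, lons_a)
    print(np.nanmean(diff))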

Slide 18: Differencing: also doable in the product layer
[Diagram: a gateway node's product layer orchestrates regrids against the analysis layers of two (any) nodes and differences the results.]

Slide 19: An existing implementation
- netCDF-CF files and atomic datasets (aggregations): TDS (with HYRAX?)
- analyses (incl. regridding): F-TDS, a TDS plug-in ("F" for Ferret, but applicable to other legacy apps, too)
- products (viz, etc.): LAS (using Ferret, CDAT and other legacy apps)

Slide 20: F-TDS
[Diagram: TDS with an IOServiceProvider that delegates to Ferret (or another legacy application); Java bridges connect to Ferret, CDAT, or Matlab.]
The data provider supplies its own regridding and analysis tools. (We need to standardize an analysis expression language.)
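For flavor, the F-TDS idea of embedding the analysis expression in the OPeNDAP URL looks roughly like this; the server, dataset, and variable names are hypothetical, and the exact _expr_ syntax should be checked against the F-TDS documentation:

    from netCDF4 import Dataset

    # An F-TDS "virtual variable": the Ferret expression after _expr_
    # is evaluated server-side; the result comes back over OPeNDAP.
    url = ("http://datanode.example.org/thredds/dodsC/"
           "ar5/model_x/tas_monthly"
           "_expr_{}{let tas_zonal = tas[x=@ave]}")

    ds = Dataset(url)
    print(ds.variables["tas_zonal"].shape)  # the zonally averaged field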

Slide 21: LAS architecture (v7)
[Diagram: the UI calls the Product Server through the LAS API; the Product Server (workflow orchestration, driven by XML metadata) issues back-end requests over SOAP, via service proxies, to backend services: TDS/OPeNDAP over netCDF files, legacy Ferret, legacy CDAT, JDBC over a SQL database, and GIS services through a Service API.]

Slide 22: Information products
Desktop clients: Matlab, IDL, IDV, Ferret, GrADS, ...
Delivered as: netCDF, ASCII, GIS layers

Slide 23: What products should AR5 offer?
A matter of policy, to be determined:
- each gateway node offers distinct products (CDAT, NCL, BADC, Ferret, Matlab, ...), or
- a standard set of products, or
- some combination of these

Slide 24: One style of user experience: access to native coordinates and regridded fields

Slide 25: Large subsets may be created in batch mode

Slide 26: Visual model intercomparison

Slide 27: Segue from browser to desktop

Slide 28: Plot on Google Earth
Fine structure materializes as we zoom in. Display to Google Earth?
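Serving a plot to Google Earth amounts to wrapping a rendered image in a KML ground overlay; a minimal sketch, with a hypothetical WMS GetMap URL standing in for the real image source:

    # Google Earth drapes the referenced image over the given lat-lon box.
    # The image URL below (a WMS GetMap request) is a hypothetical example.
    kml = """<?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2">
      <GroundOverlay>
        <name>SST</name>
        <Icon>
          <href>http://lasnode.example.org/wms?REQUEST=GetMap&amp;LAYERS=sst&amp;BBOX=-180,-90,180,90&amp;FORMAT=image/png</href>
        </Icon>
        <LatLonBox>
          <north>90</north><south>-90</south>
          <east>180</east><west>-180</west>
        </LatLonBox>
      </GroundOverlay>
    </kml>
    """
    with open("sst_overlay.kml", "w") as f:
        f.write(kml)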

Slide 29: An AR5-wide UI through HTML smoke and mirrors ("sister servers")

Slide 30: Discussion. (Thank you.)

Slide 31: New LAS user interface (currently at "alpha" level): interact with the graphics
