Published by Coleen Hampton. Modified over 9 years ago.
1
Metadata Standards for Gridded Climate Data in the Earth System Grid
Robert Drach
LLNL/PCMDI
UCRL-PRES-149779
2
Drach, Sept. 10, 2002
Overview
I. Earth System Grid: Grid Access to Climate Research Data
II. Metadata Standards for Gridded Climate Data
3
Part I ESG: Grid Access to Climate Research Data
4
Earth System Grid Overview
- The goal of ESG is to make climate data, particularly climate model data, an easily accessible community resource.
- The project is funded by the SciDAC program: Scientific Discovery through Advanced Computing.
- Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical.
- The broad strategy is to develop a collection of server-side capabilities to minimize the amount of data movement.
- Multiple interfaces to ESG will allow researchers to focus on science rather than issues of data transfer, format, and dataset manipulation.
- The foundation is Globus Grid technology.
5
ESG uses Globus Grid technology.
- Globus middleware supports linkage of distributed data archives, supercomputers, workstations, and local disk caches into data/computational grids.
- GridFTP: high-performance, secure, robust data transfer mechanism (protocol, server, client library).
- ESG is integrating OpenDAP (DODS protocol) with the GridFTP protocol.
- Single sign-on using Grid Security Infrastructure
  - Proxy certificates
  - Community Authorization Service (CAS)
- Replica Location Service: manages copying and placement of files in a distributed environment.
  - Logical vs. physical files
- http://www.globus.org
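The logical vs. physical file distinction can be sketched with a plain dictionary. This is only an illustration of the idea, not the Globus Replica Location Service API; the catalog layout, host names, paths, and function names are all hypothetical.

```python
# Hypothetical replica catalog: one logical file name maps to any number of
# physical copies. Hosts and paths are invented for illustration.
replica_catalog = {
    "pcm/run1/tas_1979.nc": [
        "gsiftp://host-a.example.org/data/pcm/run1/tas_1979.nc",
        "gsiftp://host-b.example.org/cache/tas_1979.nc",
    ],
}


def physical_replicas(logical_name, catalog):
    """Return a copy of the known physical locations for a logical name."""
    return list(catalog.get(logical_name, []))


def register_replica(logical_name, url, catalog):
    """Record a new physical copy, ignoring duplicates."""
    locations = catalog.setdefault(logical_name, [])
    if url not in locations:
        locations.append(url)
```

A client asks for the logical name and is free to pick whichever physical copy is closest or cheapest to transfer.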
6
ESG: U.S. Collaborations & Development
- ORNL: Climate storage & computational resources
- ANL: Computational grids & grid-based applications
- USC/ISI: Computational grids & grid-based applications
- NCAR: Climate change prediction and scenarios
- LBNL: Climate storage facility and access
- LLNL: Model diagnostics & intercomparison
7
Program for Climate Model Diagnosis and Intercomparison
- Validation and intercomparison of atmospheric general circulation models and coupled ocean-atmosphere models
- Development of analysis software; quality control, archiving, and distribution of model results
- Climate Data Analysis Tools (CDAT) is a Python-based analysis and visualization system.
- Global warming detection studies
- CMIP (coupled models) and AMIP (atmospheric GCMs) gather model simulation results from thirty modeling groups worldwide.
8
PCMDI and Model Development
[Diagram. Labels: Modeling groups; Controlled simulation runs; Simulation data; PCMDI (diagnosis, quality control, data archival); Feedback to modelers; Observations; Data assimilation; Gridded observation data]
9
ESG-II Architecture
[Diagram. Labels: Portals; Servers; Middleware]
10
ESG: Metadata Services
[Diagram. Labels, in slide order with duplicates removed:]
- ESG Clients API & User Interfaces: Publishing; Search & Discovery; Administration; Browsing & Display; Analysis & Visualization
- High Level Metadata Services
- Core Metadata Services
- Metadata Holdings
- Metadata Extraction; Metadata Display; Metadata Browsing; Metadata Query
- Data & Metadata Catalog; Dublin Core Database; CF Database; mirror; Dublin Core XML Files; COMMENTS XML Files
- Metadata Annotation; Metadata Validation; Metadata Access (update, insert, delete, query); Service Translation Library
- Metadata Aggregation; Metadata Discovery; Metadata & Data Registration
11
ESG is leveraging existing software and projects.
- OpenDAP (DODS): Distributed Oceanographic Data System (Unidata)
  - Integration of Globus GridFTP and DODS data access
- THREDDS: THematic Real-time Environmental Distributed Data Services (Unidata)
- LAS: Live Access Server (NOAA Pacific Marine Environmental Laboratory)
  - Works with CDAT, Ferret, GrADS, …
- CDAT: Climate Data Analysis Tools (PCMDI), includes CDMS (Climate Data Management System) and VCDAT visualization
- Community Data Portal project (NCAR)
- NCL (NCAR)
- Globus Grid technology (ANL, ISI): GridFTP, CAS (Community Authorization Service)
12
CDAT: Example of an ESG GUI Client Access
13
LAS/CDAT: Example of a Web-based Data Portal
- Technology: Web-based (end user requirements)
  - LAS, DODS, ESG (i.e., Globus), CDAT
- Portal should hide/simplify the Grid for users
  - Single sign-on
  - Community-based authorization
  - Simplified resource location
  - Remote job submission, management
- Accesses the ESG Grid Testbed
14
Part II Metadata Standards for Gridded Climate Data
15
Climate Model Datasets
- Most climate simulation data are in the form of gridded datasets: collections of variables as a function of longitude, latitude, time, and vertical level.
- A dataset is a logical container:
  - A file
  - An aggregation of files
  - A collection of database tables
- Model-generated data
  - Model data
  - Derived data: zonal averages, global averages, virtual variables
- Observational data, including reanalyses
- Attributes in the form of (name, value) pairs, array values
16
Binary formats
- Suitable basis for storing data, but lack the metadata to support certain application requirements
- netCDF (UCAR)
  - array data model
  - flexible attribute/value metadata model
  - simple API
- HDF (NCSA, NASA)
  - collection of APIs, can be tailored to specific data models including scientific data sets, satellite data, point data
17
Binary formats
- GRIB (WMO, ECMWF, NCEP)
  - mixed sequential/array data model
  - tailored for simulation output, supports common horizontal grid types
  - hardwired metadata model
  - good compression capabilities
  - lacks a standard API
18
Coordinate Systems
- Self-describing binary formats are flexible, but underconstrain the representation of coordinate systems.
[Diagram. Labels: Index Space; Variable Space; Coordinate Space; Coordinate System]
  Time(i), Latitude(j,k), Longitude(j,k)
  V = Temperature(Time, Latitude, Longitude)
  V' = Temperature(i,j,k)
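A minimal sketch of the index-space vs. coordinate-space distinction, using plain Python lists in place of netCDF arrays. The variable names follow the slide; the numeric values and the `locate` helper are invented for illustration.

```python
# Index space: the stored array is addressed by integer indices (i, j, k).
# Coordinate space: Time(i), Latitude(j, k), Longitude(j, k) give each index
# a physical location. All values below are invented for illustration.

time = [0.0, 6.0]                               # Time(i)
latitude = [[10.0, 10.5], [20.0, 20.5]]         # Latitude(j, k)
longitude = [[100.0, 110.0], [101.0, 111.0]]    # Longitude(j, k)

# V' = Temperature(i, j, k), stored in index space.
temperature = [
    [[280.0, 281.0], [282.0, 283.0]],
    [[284.0, 285.0], [286.0, 287.0]],
]


def locate(i, j, k):
    """Map an index-space point to (time, latitude, longitude, value)."""
    return (time[i], latitude[j][k], longitude[j][k], temperature[i][j][k])
```

Nothing in the array layout itself says that `latitude` and `longitude` locate `temperature`; that association is what a metadata convention has to supply.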
19
Horizontal Grids
- Curvilinear grid - Los Alamos POP ocean model
  Temperature(i,j)
  Latitude(i,j)
  Longitude(i,j)
  Lat_bounds(i,j,4)
  Lon_bounds(i,j,4)
20
Horizontal Grids
- Reduced grid
  Temperature(i,j)
  Latitude(i)
  Longitude(i,j)
  Lat_bounds(i,2)
  Lon_bounds(i,j,4)
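As a rough sketch of why Longitude depends on both indices in a reduced grid: each latitude row carries its own number of longitude points, so the longitude array is ragged. The row counts, values, and helper names below are invented for illustration.

```python
# Reduced grid sketch: Latitude(i) is 1-D, but Longitude(i, j) varies per row
# because rows nearer the pole hold fewer points. Values are illustrative.

latitude = [80.0, 40.0, 0.0]      # Latitude(i)
points_per_row = [4, 8, 12]       # number of longitudes in each row


def row_longitudes(npoints):
    """Evenly spaced longitudes (degrees east) for one latitude row."""
    step = 360.0 / npoints
    return [j * step for j in range(npoints)]


longitude = [row_longitudes(n) for n in points_per_row]                 # Longitude(i, j)
temperature = [[250.0 + i] * n for i, n in enumerate(points_per_row)]   # Temperature(i, j)


def cell(i, j):
    """Return (latitude, longitude, temperature) for grid cell (i, j)."""
    return (latitude[i], longitude[i][j], temperature[i][j])
```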
21
Horizontal Grids
- General grid - Colorado State geodesic grid
  Temperature(npts)
  Latitude(npts)
  Longitude(npts)
  Lat_bounds(npts,6)
  Lon_bounds(npts,6)
22
Spatial/temporal location
- Applications must be able to recognize the spatial/temporal coordinate axes.
  - Visualization: continental overlays
  - Data: selection by axis type

    import cdms
    file = cdms.open('sample.nc')
    temperature = file['temperature']
    # Select by axis type: latitudes from 45S to 45N
    data = temperature(latitude=(-45.0, 45.0))
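What a selection by latitude involves can be sketched without CDMS. This is not the CDMS implementation; the function name, the sample values, and the choice of inclusive bounds are assumptions made for illustration.

```python
def select_latitude(data, latitudes, lat_range):
    """Return (selected_latitudes, selected_rows) for lat_range = (low, high).

    data[j] is the row of values at latitudes[j]. Bounds are treated as
    inclusive here, which is an assumption of this sketch.
    """
    low, high = lat_range
    keep = [j for j, lat in enumerate(latitudes) if low <= lat <= high]
    return [latitudes[j] for j in keep], [data[j] for j in keep]


# Illustrative values only.
latitudes = [-90.0, -45.0, 0.0, 45.0, 90.0]
data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
lats, rows = select_latitude(data, latitudes, (-45.0, 45.0))
```

The selection only works if the software already knows which axis is latitude, which is the point of the slide.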
23
Time representation and calendars
- Climate simulations use different types of calendars:
  - 'proleptic' Gregorian
  - Julian
  - Mixed Gregorian/Julian
  - No leap years (noleap)
  - 30-day months
- Climatologies represent multi-year averages.
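To illustrate why the calendar matters, the sketch below counts the days between the same two dates under three of the calendars listed above. The function names are hypothetical; Python's `datetime.date` is used for the proleptic Gregorian case.

```python
from datetime import date

# Month lengths for a calendar with no leap years.
NOLEAP_MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def day_number_noleap(year, month, day):
    """Days since 0001-01-01 in a calendar where every year has 365 days."""
    return (year - 1) * 365 + sum(NOLEAP_MONTH_DAYS[:month - 1]) + (day - 1)


def day_number_360(year, month, day):
    """Days since 0001-01-01 in a calendar of twelve 30-day months."""
    return (year - 1) * 360 + (month - 1) * 30 + (day - 1)


def day_number_gregorian(year, month, day):
    """Days since 0001-01-01 in the proleptic Gregorian calendar."""
    return date(year, month, day).toordinal() - 1


def days_between(day_number, start, end):
    """Elapsed days from start to end under the given calendar function."""
    return day_number(*end) - day_number(*start)


start, end = (2000, 1, 1), (2001, 1, 1)
```

For this pair of dates the three calendars give 366, 365, and 360 days respectively, so a time value such as "days since 2000-01-01" cannot be decoded without knowing the calendar.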
24
Metadata conventions
- Several conventions have been developed to augment the netCDF data model.
- They represent a balance between the needs of data producers and data consumers.
- COARDS convention
  - 1D coordinate axes, rectilinear horizontal grids
  - axis identification based on units
  - variables limited to four dimensions
  - ordering of dimensions fixed
  - http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
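Axis identification based on units might look like the following sketch. The function name is hypothetical and the rules are simplified: it checks the `degrees_north` / `degrees_east` family of units strings and the "since" form of time units, and it leaves out vertical axes, which need more than a units string to identify.

```python
LATITUDE_UNITS = {"degrees_north", "degree_north", "degrees_n", "degree_n"}
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degrees_e", "degree_e"}


def identify_axis(units):
    """Guess the axis type of a coordinate variable from its units string.

    Returns "latitude", "longitude", "time", or None when the units do not
    identify an axis under these simplified rules.
    """
    text = units.strip().lower()
    if text in LATITUDE_UNITS:
        return "latitude"
    if text in LONGITUDE_UNITS:
        return "longitude"
    if " since " in text:
        # Time units take the form "<unit> since <reference date>".
        return "time"
    return None
```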
25
Metadata conventions
- CF (Climate and Forecast) convention
  - Based on earlier conventions, COARDS and GDT
  - multidimensional coordinates (auxiliary coordinate variables)
  - simplified axis identification
  - specific representation for several horizontal grid types
    - rectilinear
    - curvilinear
    - reduced grids
  - variables can have an arbitrary number of dimensions
  - no constraint on ordering of dimensions
  - non-Gregorian calendars
  - standard name table
  - http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
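Auxiliary coordinate variables can be sketched with plain dictionaries standing in for a netCDF file. The sketch assumes the CF `coordinates` attribute, a blank-separated list of variable names attached to the data variable; the variable names, dimensions, and helper function are hypothetical.

```python
# Dictionary stand-in for a file whose temperature variable lives on a
# curvilinear grid with two-dimensional latitude and longitude.
variables = {
    "temperature": {
        "dimensions": ("j", "i"),
        "attributes": {"units": "K", "coordinates": "lat lon"},
    },
    "lat": {"dimensions": ("j", "i"), "attributes": {"units": "degrees_north"}},
    "lon": {"dimensions": ("j", "i"), "attributes": {"units": "degrees_east"}},
}


def auxiliary_coordinates(name, variables):
    """Return the variable names listed in a variable's coordinates attribute."""
    names = variables[name]["attributes"].get("coordinates", "").split()
    missing = [n for n in names if n not in variables]
    if missing:
        raise KeyError("coordinates attribute names unknown variables: %s" % missing)
    return names
```

Because the link is carried by an attribute rather than by matching dimension names, the coordinate arrays are free to be multidimensional.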
26
Comparability of quantities
- Ability to recognize comparable quantities is fundamental to model intercomparison.
- CF defines a schema for standard name tables.
  - XML representation used for table of standard variable names and descriptions
- The standard_name attribute is optional. No restriction on variable names.
- Relationship to ontology development?
[Sample standard name table shown on the slide. Recoverable text: Program for Climate Model Diagnosis and Intercomparison; support@pcmdi.llnl.gov; Pa; Pressure defined at the level of the mean topography within the grid box.; air_pressure_at_sea_level]
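A sketch of how an XML standard name table could be read and used to test comparability. The element and attribute names in `SAMPLE_TABLE` are assumptions about the table layout, the description text is a placeholder, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed layout of a standard name table; the description is a placeholder.
SAMPLE_TABLE = """\
<standard_name_table>
  <institution>Program for Climate Model Diagnosis and Intercomparison</institution>
  <contact>support@pcmdi.llnl.gov</contact>
  <entry id="air_pressure_at_sea_level">
    <canonical_units>Pa</canonical_units>
    <description>Illustrative description text.</description>
  </entry>
</standard_name_table>
"""


def load_standard_names(xml_text):
    """Parse the table into {standard_name: {"units": ..., "description": ...}}."""
    root = ET.fromstring(xml_text)
    table = {}
    for entry in root.findall("entry"):
        table[entry.get("id")] = {
            "units": entry.findtext("canonical_units"),
            "description": entry.findtext("description"),
        }
    return table


def comparable(attrs_a, attrs_b):
    """True when both attribute dicts carry the same standard_name."""
    name = attrs_a.get("standard_name")
    return name is not None and name == attrs_b.get("standard_name")
```

Two variables named `psl` and `SLP` in different model outputs would compare as the same quantity if both carry `standard_name = "air_pressure_at_sea_level"`, regardless of their variable names.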
27
ESG metadata
- ESG has adopted the netCDF data model and the CF convention as standards.
- Other standards and conventions will follow.
- NcML markup language.
28
Aggregation
- CF and NcML apply to data aggregates as well as files.
- Data aggregation: collections of files/datasets are treated as single entities.
  - array model
  - netCDF-like
  - tailored for extraction of 'hyperslabs' of data
- Aspects of aggregation:
  - combining/merging variables
  - joining variables
  - creating new coordinate axes
  - overlaying/adding metadata
  - nesting datasets
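Two of these ideas, joining variables and extracting a hyperslab, can be sketched on nested lists. The function names and the (time, level) layout are assumptions for illustration, not any particular aggregation server's interface.

```python
def join_along_time(pieces):
    """Join per-file pieces into one variable ordered by time.

    Each piece is (time_values, rows), where rows[t] holds the values at
    time_values[t]. Pieces may arrive in any order and must be non-empty.
    """
    ordered = sorted(pieces, key=lambda piece: piece[0][0])
    times, rows = [], []
    for piece_times, piece_rows in ordered:
        times.extend(piece_times)
        rows.extend(piece_rows)
    return times, rows


def hyperslab(rows, time_slice, level_slice):
    """Extract a rectangular sub-block from rows[time][level]."""
    return [row[level_slice] for row in rows[time_slice]]
```

For example, joining `([2, 3], [[20, 21], [30, 31]])` with `([0, 1], [[0, 1], [10, 11]])` yields a single four-step variable, from which `hyperslab(rows, slice(1, 3), slice(0, 1))` pulls the first level at time steps 1 and 2.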
29
Aggregation
- Aggregation maps well to multifile datasets: multifile datasets can be thought of as 'partitioned' into files.
- Variables may 'span' multiple files.
- Usually a dataset is partitioned on time and/or vertical level axes.
- PCMDI CDAT supports aggregations via the cdscan utility, uses XML representation
- THREDDS/DODS aggregation server (http://www.unidata.ucar.edu/projects/THREDDS/)
[Diagram. Axis labels: Time; Level; Variable]
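A dataset partitioned on the time axis can be sketched as a lookup from a global time index to a file and a local index within it. This is not how cdscan or the aggregation server is implemented; the class name and file names are hypothetical.

```python
from bisect import bisect_right


class PartitionedAxis:
    """Maps a global time index to (filename, local index within that file)."""

    def __init__(self, files):
        # files: list of (filename, number_of_time_steps), in time order.
        self.names = [name for name, _ in files]
        self.starts = []
        total = 0
        for _, count in files:
            self.starts.append(total)   # global index of each file's first step
            total += count
        self.length = total

    def locate(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # The owning file is the last one whose start is <= index.
        which = bisect_right(self.starts, index) - 1
        return self.names[which], index - self.starts[which]


# Illustrative file names and step counts.
axis = PartitionedAxis([("tas_1979.nc", 12), ("tas_1980.nc", 12), ("tas_1981.nc", 6)])
```

A variable that spans all three files then presents a single 30-step time axis to the user, with each request routed to the file that holds the requested steps.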
30
Summary
- The Earth System Grid project is developing metadata services to support a variety of schemas and conventions.
- The initial focus of ESG is to enable climate researchers to make effective use of distributed, model-generated datasets.
- The netCDF schema and CF convention are the foundation for representation of this data.