Metadata Standards for Gridded Climate Data in the Earth System Grid
Robert Drach, LLNL/PCMDI
UCRL-PRES-149779


Overview (Sept. 10, 2002)
I. Earth System Grid: Grid Access to Climate Research Data
II. Metadata Standards for Gridded Climate Data

Part I ESG: Grid Access to Climate Research Data

Earth System Grid Overview
- The goal of ESG is to make climate data, particularly climate model data, an easily accessible community resource. The project is funded by the SciDAC program (Scientific Discovery through Advanced Computing).
- Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical. The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.
- Multiple interfaces to ESG will allow researchers to focus on science rather than on issues of data transfer, format, and dataset manipulation.
- The foundation is Globus Grid technology.

ESG uses Globus Grid technology
- Globus middleware supports linkage of distributed data archives, supercomputers, workstations, and local disk caches into data/computational grids.
- GridFTP: a high-performance, secure, robust data transfer mechanism (protocol, server, client library).
- ESG is integrating OPeNDAP (DODS protocol) with the GridFTP protocol.
- Single sign-on using the Grid Security Infrastructure
  - Proxy certificates
  - Community Authorization Service (CAS)
- Replica Location Service: manages copying and placement of files in a distributed environment.
  - Logical vs. physical files
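The distinction between logical and physical files can be illustrated with a small lookup table. This is only a sketch: the `ReplicaCatalog` class, the logical file name, and the `gsiftp://` URLs are hypothetical and do not reflect the actual Globus Replica Location Service API.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog mapping one logical file name to many physical copies.

    Hypothetical illustration only; this is not the Globus replica API.
    """

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record that a physical copy of the logical file exists at a URL."""
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """Return all known physical locations of a logical file."""
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
# Both URLs below are made-up example locations of the same logical file.
catalog.register("tas_run1.nc", "gsiftp://site-a.example.org/data/tas_run1.nc")
catalog.register("tas_run1.nc", "gsiftp://site-b.example.org/cache/tas_run1.nc")
locations = catalog.lookup("tas_run1.nc")
```

A client asks for the logical name and is free to pick whichever physical copy is closest or least loaded.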

ESG: U.S. Collaborations & Development
- ORNL: Climate storage & computational resources
- ANL: Computational grids & grid-based applications
- USC/ISI: Computational grids & grid-based applications
- NCAR: Climate change prediction and scenarios
- LBNL: Climate storage facility and access
- LLNL: Model diagnostics & intercomparison

Program for Climate Model Diagnosis and Intercomparison
- Validation and intercomparison of atmospheric general circulation models and coupled ocean-atmosphere models
- Development of analysis software; quality control, archiving, and distribution of model results. Climate Data Analysis Tools (CDAT) is a Python-based analysis and visualization system.
- Global warming detection studies
- CMIP (coupled models) and AMIP (atmospheric GCMs) gather model simulation results from thirty modeling groups worldwide.

PCMDI and Model Development
[Diagram labels: Modeling groups; Controlled simulation runs; Simulation data; PCMDI (diagnosis, quality control, data archival); Feedback to modelers; Observations; Data assimilation; Gridded observation data]

ESG-II Architecture
[Diagram layers: Portals; Servers; Middleware]

ESG: Metadata Services
[Diagram. Layers labeled: ESG clients (API & user interfaces); high-level metadata services; core metadata services; metadata holdings.
Functions labeled: publishing; search & discovery; administration; browsing & display; analysis & visualization.
Services labeled: metadata extraction; metadata display; metadata browsing; metadata query; metadata annotation; metadata validation; metadata aggregation; metadata discovery; metadata & data registration; metadata access (update, insert, delete, query); service translation library.
Holdings labeled: data & metadata catalog; Dublin Core database; CF database; mirror; Dublin Core XML files; comments XML files.]

ESG is leveraging existing software and projects
- OPeNDAP (DODS): Distributed Oceanographic Data System (Unidata)
  - Integration of Globus GridFTP with DODS data access
- THREDDS: THematic Real-time Environmental Distributed Data Services (Unidata)
- LAS: Live Access Server (NOAA Pacific Marine Environmental Laboratory)
  - Works with CDAT, Ferret, GrADS, …
- CDAT: Climate Data Analysis Tools (PCMDI); includes CDMS (Climate Data Management System) and VCDAT visualization
- Community Data Portal project (NCAR)
- NCL (NCAR)
- Globus Grid technology (ANL, ISI): GridFTP, CAS (Community Authorization Service)

CDAT: Example of ESG GUI Client Access

LAS/CDAT: Example of a Web-based Data Portal
- Technology: Web-based (end user requirements): LAS, DODS, ESG (i.e., Globus), CDAT
- The portal should hide or simplify the Grid for users:
  - Single sign-on
  - Community-based authorization
  - Simplified resource location
  - Remote job submission and management
- Accesses the ESG Grid Testbed

Part II Metadata Standards for Gridded Climate Data

Climate Model Datasets
- Most climate simulation data are in the form of gridded datasets: collections of variables as a function of longitude, latitude, time, and vertical level.
- A dataset is a logical container:
  - A file
  - An aggregation of files
  - A collection of database tables
- Model-generated data
  - Model data
  - Derived data: zonal averages, global averages, virtual variables
- Observational data, including reanalyses
- Attributes in the form of (name, value) pairs, array values
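The idea of a dataset as a logical container can be sketched with two small record types. The class names, field names, and file names below are hypothetical, chosen only to mirror the bullets above; they are not the CDMS or ESG data model.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A gridded variable: a name, its axis names, and (name, value) attributes."""
    name: str
    dimensions: tuple
    attributes: dict = field(default_factory=dict)


@dataclass
class Dataset:
    """A logical container that may be backed by one file or by many."""
    identifier: str
    files: list = field(default_factory=list)
    variables: dict = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)

    def add_variable(self, variable):
        """Register a variable under its own name."""
        self.variables[variable.name] = variable


# Hypothetical dataset aggregated from two yearly files.
dataset = Dataset(
    "sample_run",
    files=["sample_1990.nc", "sample_1991.nc"],
    attributes={"institution": "example"},
)
dataset.add_variable(
    Variable("temperature", ("time", "level", "latitude", "longitude"), {"units": "K"})
)
```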

Binary formats
- Binary formats are a suitable basis for storing data, but lack the metadata to support certain application requirements.
- netCDF (UCAR)
  - array data model
  - flexible attribute/value metadata model
  - simple API
- HDF (NCSA, NASA)
  - collection of APIs; can be tailored to specific data models, including scientific data sets, satellite data, and point data

Binary formats
- GRIB (WMO, ECMWF, NCEP)
  - mixed sequential/array data model
  - tailored for simulation output, supports common horizontal grid types
  - hardwired metadata model
  - good compression capabilities
  - lacks a standard API

Coordinate Systems
- Self-describing binary formats are flexible, but underconstrain the representation of coordinate systems.
[Diagram labels: Index Space; Variable Space; Coordinate Space; Coordinate System]
  Time(i)
  Latitude(j,k)
  Longitude(j,k)
  V = Temperature(Time, Latitude, Longitude)
  V' = Temperature(i,j,k)
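The relationship between index space and coordinate space can be sketched in a few lines. The coordinate values below are invented for illustration; only the shapes Time(i), Latitude(j,k), and Longitude(j,k) come from the slide.

```python
# Coordinate variables map index space (i, j, k) to coordinate space.
time = [0.0, 1.0]                 # Time(i), e.g. days since a reference date
latitude = [[10.0, 10.5],         # Latitude(j, k): two-dimensional,
            [20.0, 20.5]]         # so it varies with both j and k
longitude = [[100.0, 110.0],      # Longitude(j, k)
             [101.0, 111.0]]


def coordinates_of(i, j, k):
    """Return the (time, latitude, longitude) location of index (i, j, k)."""
    return time[i], latitude[j][k], longitude[j][k]


# Temperature(i, j, k) at index (1, 0, 1) is located at these coordinates.
location = coordinates_of(1, 0, 1)
```

An application that only sees Temperature(i,j,k) cannot place the value on a map unless the file also says which variables play the role of `time`, `latitude`, and `longitude`.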

Horizontal Grids
- Curvilinear grid (Los Alamos POP ocean model)
  Temperature(i,j)
  Latitude(i,j)
  Longitude(i,j)
  Lat_bounds(i,j,4)
  Lon_bounds(i,j,4)

Horizontal Grids
- Reduced grid
  Temperature(i,j)
  Latitude(i)
  Longitude(i,j)
  Lat_bounds(i,2)
  Lon_bounds(i,j,4)
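As a rough sketch of why a reduced grid needs Latitude(i) but Longitude(i,j), the snippet below builds rows with different numbers of longitude points. The row latitudes and point counts are made up for illustration and do not describe any particular model grid.

```python
# Latitude(i): one value per row.
latitude = [-60.0, 0.0, 60.0]
# Assumed number of longitude points in each row (fewer toward the poles).
points_per_row = [4, 8, 4]


def row_longitudes(n):
    """Evenly spaced longitudes (degrees east) for a row of n points."""
    return [360.0 * j / n for j in range(n)]


# Longitude(i, j): depends on the row index i as well as on j.
longitude = [row_longitudes(n) for n in points_per_row]
total_points = sum(len(row) for row in longitude)
```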

Horizontal Grids
- General grid (Colorado State geodesic grid)
  Temperature(npts)
  Latitude(npts)
  Longitude(npts)
  Lat_bounds(npts,6)
  Lon_bounds(npts,6)

Spatial/temporal location
- Applications must be able to recognize the spatial/temporal coordinate axes.
  - Visualization: continental overlays
  - Data: selection by axis type

    import cdms

    # Open the dataset and get the temperature variable.
    file = cdms.open('sample.nc')
    temperature = file['temperature']
    # Select by coordinate value: latitudes from 45S to 45N.
    data = temperature(latitude=(-45.0, 45.0))

Time representation and calendars
- Climate simulations use different types of calendars:
  - 'proleptic' Gregorian
  - Julian
  - Mixed Gregorian/Julian
  - No leap years (noleap)
  - 30-day months
- Climatologies represent multi-year averages.
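The practical effect of the calendar choice shows up in simple date arithmetic. The sketch below is an illustration, not CDAT code: the helper names are hypothetical, and the calendar labels `noleap` and `360_day` are used here for the "no leap years" and "30-day months" cases.

```python
from datetime import date

# Days elapsed before the start of each month in a 365-day year.
CUMULATIVE_DAYS_NOLEAP = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def day_number(year, month, day, calendar):
    """Days elapsed since 0000-01-01 in a fixed-length-year calendar."""
    if calendar == "noleap":
        return 365 * year + CUMULATIVE_DAYS_NOLEAP[month - 1] + (day - 1)
    if calendar == "360_day":
        return 360 * year + 30 * (month - 1) + (day - 1)
    raise ValueError("unsupported calendar: " + calendar)


def days_between(start, end, calendar):
    """Number of days from start to end, each given as (year, month, day)."""
    return day_number(*end, calendar) - day_number(*start, calendar)


start, end = (2000, 2, 28), (2000, 3, 1)
noleap_days = days_between(start, end, "noleap")
days_360 = days_between(start, end, "360_day")
# Python's date type uses the proleptic Gregorian calendar.
gregorian_days = (date(*end) - date(*start)).days
```

The same pair of dates is separated by a different number of days in each calendar, which is why the calendar has to be recorded in the metadata.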

Metadata conventions
- Several conventions have been developed to augment the netCDF data model.
- They represent a balance between the needs of data producers and data consumers.
- COARDS convention
  - 1D coordinate axes, rectilinear horizontal grids
  - axis identification based on units
  - variables limited to four dimensions
  - ordering of dimensions fixed
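Axis identification based on units can be sketched as a lookup on the `units` string. This is a simplified illustration with a hypothetical function name: it covers latitude, longitude, and time only, and the accepted unit spellings are assumptions modeled on the COARDS-style `degrees_north`, `degrees_east`, and `<unit> since <date>` forms.

```python
LATITUDE_UNITS = {"degrees_north", "degree_north", "degrees_N", "degree_N"}
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degrees_E", "degree_E"}


def axis_type_from_units(units):
    """Guess the axis type of a coordinate variable from its units string.

    Returns "latitude", "longitude", "time", or None if unrecognized.
    """
    if units in LATITUDE_UNITS:
        return "latitude"
    if units in LONGITUDE_UNITS:
        return "longitude"
    # Time units take the form "<unit> since <reference date>".
    if " since " in units:
        return "time"
    return None


lat_axis = axis_type_from_units("degrees_north")
time_axis = axis_type_from_units("days since 1979-01-01")
unknown_axis = axis_type_from_units("K")
```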

Metadata conventions
- CF (Climate and Forecast) convention
  - based on earlier conventions, COARDS and GDT
  - multidimensional coordinates (auxiliary coordinate variables)
  - simplified axis identification
  - specific representation for several horizontal grid types:
    - rectilinear
    - curvilinear
    - reduced grids
  - variables can have an arbitrary number of dimensions
  - no constraint on ordering of dimensions
  - non-Gregorian calendars
  - standard name table
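Auxiliary coordinate variables are attached to a data variable by name. The sketch below models a file's metadata as a plain dictionary (a hypothetical in-memory layout, not a netCDF API) and resolves the space-separated `coordinates` attribute that CF uses for this purpose.

```python
# Hypothetical in-memory description of a curvilinear-grid file.
variables = {
    "temperature": {
        "dimensions": ("i", "j"),
        "attributes": {"units": "K", "coordinates": "latitude longitude"},
    },
    "latitude": {
        "dimensions": ("i", "j"),
        "attributes": {"units": "degrees_north"},
    },
    "longitude": {
        "dimensions": ("i", "j"),
        "attributes": {"units": "degrees_east"},
    },
}


def auxiliary_coordinates(name):
    """Return the auxiliary coordinate variable names listed for a variable."""
    attributes = variables[name]["attributes"]
    return attributes.get("coordinates", "").split()


aux = auxiliary_coordinates("temperature")
# Each auxiliary coordinate shares the data variable's dimensions here.
shapes_match = all(
    variables[c]["dimensions"] == variables["temperature"]["dimensions"] for c in aux
)
```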

Comparability of quantities
- The ability to recognize comparable quantities is fundamental to model intercomparison.
- CF defines a schema for standard name tables.
  - An XML representation is used for the table of standard variable names and descriptions.
  - The standard_name attribute is optional. There is no restriction on variable names.
- Relationship to ontology development?
[Table excerpt shown on the slide: units "Pa"; description "Pressure defined at the level of the mean topography within the grid box."; name "air_pressure_at_sea_level"]
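Reading an XML standard name table into a lookup structure might look like the following. The element and attribute names (`standard_name_table`, `entry`, `id`, `canonical_units`, `description`) are assumptions modeled on the general layout of the CF table, the description text is a placeholder, and `load_standard_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative table with a single entry; the description is a placeholder.
TABLE_XML = """<standard_name_table>
  <entry id="air_pressure_at_sea_level">
    <canonical_units>Pa</canonical_units>
    <description>Illustrative description text.</description>
  </entry>
</standard_name_table>"""


def load_standard_names(xml_text):
    """Parse a standard name table into {name: {units, description}}."""
    root = ET.fromstring(xml_text)
    table = {}
    for entry in root.findall("entry"):
        table[entry.get("id")] = {
            "canonical_units": entry.findtext("canonical_units"),
            "description": entry.findtext("description"),
        }
    return table


names = load_standard_names(TABLE_XML)
# Two variables with different names are comparable if their standard names match.
is_known = "air_pressure_at_sea_level" in names
```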

ESG metadata
- ESG has adopted the netCDF data model and the CF convention as standards.
- Other standards and conventions will follow.
- NcML markup language.

Aggregation
- CF and NcML apply to data aggregates as well as files.
- Data aggregation: collections of files/datasets are treated as single entities.
  - array model
  - netCDF-like
  - tailored for extraction of 'hyperslabs' of data
- Aspects of aggregation:
  - combining/merging variables
  - joining variables
  - creating new coordinate axes
  - overlaying/adding metadata
  - nesting datasets

Aggregation
- Aggregation maps well to multifile datasets: multifile datasets can be thought of as 'partitioned' into files. Variables may 'span' multiple files.
- Usually a dataset is partitioned on time and/or vertical level axes.
- PCMDI CDAT supports aggregations via the cdscan utility, which uses an XML representation.
- THREDDS/DODS aggregation server ( s/THREDDS/)
[Diagram axes: Time; Level; Variable]
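A variable that spans several files partitioned on time can be served by translating a request on the aggregate time axis into per-file ranges. The following is a minimal sketch of that bookkeeping; the function name and file names are hypothetical, and this is not how cdscan or the aggregation server is implemented.

```python
def locate_time_slices(file_lengths, start, stop):
    """Map a global time index range [start, stop) onto per-file ranges.

    file_lengths: list of (file_name, number_of_time_steps), in time order.
    Returns a list of (file_name, local_start, local_stop) tuples.
    """
    pieces = []
    offset = 0  # global index of the first time step in the current file
    for name, length in file_lengths:
        lo = max(start, offset)
        hi = min(stop, offset + length)
        if lo < hi:
            pieces.append((name, lo - offset, hi - offset))
        offset += length
    return pieces


# Three hypothetical yearly files of monthly data.
files = [("tas_1990.nc", 12), ("tas_1991.nc", 12), ("tas_1992.nc", 12)]
pieces = locate_time_slices(files, 10, 26)
```

Here a request for global time steps 10 through 25 touches all three files, and the caller reads only the listed local range from each one.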

Summary
- The Earth System Grid project is developing metadata services to support a variety of schemas and conventions.
- The initial focus of ESG is to enable climate researchers to make effective use of distributed, model-generated datasets.
- The netCDF schema and CF convention are the foundation for representation of this data.