Data Standards at the IRI Data Library


Data Standards at the IRI Data Library
M. Benno Blumenthal, Michael Bell, John del Corral, and Emily Grover-Kopec
International Research Institute for Climate and Society, Columbia University
http://iridl.ldeo.columbia.edu/

Current Data Exchange Standards
There are many of them.
Some are flexible but semantically weak.
Others are semantically specific but not sufficiently flexible.
We are working on this …

Data Library Overview
[Diagram labels: Specialized Data Tools (Maproom); Generalized Data Tools (Data Viewer, Data Language); IRI Data Collection (Dataset, Variable, ivar; multidimensional); URL/URI for data, calculations, figs, etc.]
So here is the IRI/LDEO Data Library's shot at this: connecting the space of data and data manipulations. At the bottom we have the compute engine/data organization, which is what maps the data/manipulation space into URLs, i.e. the WWW.
Built on top of that are some general data tools, i.e. tools that can be applied to any dataset and adapt accordingly. There is a data language, making it possible to specify sophisticated analyses. And there is a data viewer, making it possible to quickly graph data in a number of standard ways.
There are also more specialized tools, designed for particular audiences to view specific things. We have a Maproom (soon to be, or already, several map rooms) which contains continuously updated views of aspects of the climate system, as well as specialized tools aimed at particular audiences that let a user extract views/data with a few clicks.
There is a tradeoff here: the general tools are great, but they require a user to navigate a vast set of datasets and a vast set of possible manipulations, which not everybody is up to. The specialized tools are a way to make sophisticated calculations easy to access.

Data Cultures “Broadly Speaking”
[Diagram labels: Ocean/Atm, “geolocated by lat/lon”, multidimensional; Economics and Public Health, “geolocated by entity”; GIS, “geolocation by vector object or projection metadata”; spectral harmonics, equal-area grids, GRIB grid codes, climate divisions; IRI Data Collection (Dataset, Variable, ivar; multidimensional).]
We started with Oceans/Atm: multidimensional data geolocated by lat/lon, with exceptions that tend to get handled in non-standard ways. GRIB in some ways is the 600 lb gorilla, since it is very similar in style to the WCS standard in that metadata carries the geolocation, but, of course, it is a difficult if-not-impossible standard to completely code for.
Economics/Public Health: geolocation by entity, mostly tables.
GIS: geolocation by vector object or by projection metadata. The tools have a mostly 2D mindset, which makes time analysis of data difficult.
IRI Data Collection: nested datasets, multidimensional variables with independent variables. All variables have attributes which can affect the way the data is processed and/or displayed.
We use dimensions for a lot: lon, lat, height, time, forecast time, lead time, eigenvalue number, member number, country, district, category.
The architecture is dataflow with delayed execution, which means one can usefully define many dimensions even when one tends to evaluate only a few realizations at a time.

[Diagram labels: GRIB, netCDF, images, binary, spreadsheets, shapefiles; Database (Tables, queries); Servers (OpenDAP, THREDDS, images w/proj); IRI Data Collection (Dataset, Variable, ivar).]
Having got all that data into the data library, we can process it in a uniform way.
The data structure leads directly to calculations and “virtual variables”, i.e. many of the Data Library entries are actually calculations done on other entities, e.g. pressure-level zonal velocities computed from hybrid-level divergence and vorticity, or sea surface temperature anomaly computed from sea surface temperature and sea surface temperature climatology.
User Interface: the data collection structure and metadata are used to generate a web interface to the data. It provides navigation through the datasets, a viewer that slices and dices, and many manipulations and calculations.
We also generate output Data Files in many different formats, as well as tables of various kinds, many of which are useful to one kind of user or another, i.e. different data cultures have different preferred formats. Atm/Ocean like netcdf and straight binary, GIS prefers sets of images, Public Health prefers tables.
We also act as a data server using OpenDAP and THREDDS, again mostly useful for Ocean/Atm.
We have perhaps implemented OpenGIS Web Map Server v1.3, which was a bit of a mistake, since v1.3 is the next version rather than the currently widely used one. Time will correct this, with any luck.
It is important to note that everything follows from the structure and attributes in the IRI Data Collection; no additional configuration is done to control the conversions to different formats or to serve the data with different protocols. Not all data can be served in all formats.
OpenDAP/THREDDS is particularly important because it can express any dataset and/or any analysis, so that I can transfer calculations between servers. At least, it will once I code transmission of SimpleFeatures with OpenDAP.
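The idea of a “virtual variable” evaluated on demand can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Data Library's implementation: the `Variable` class, the reader functions, and the numbers are all hypothetical. It shows a sea surface temperature anomaly defined as the difference of two variables, with no data read until a particular point is requested.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: one integer index per dimension, e.g. {"time": 1, "lat": 0}.
Selection = Dict[str, int]


@dataclass(frozen=True)
class Variable:
    """A named rule for producing a value at a selected point."""
    name: str
    evaluate: Callable[[Selection], float]

    def __sub__(self, other: "Variable") -> "Variable":
        # Defining the difference does no arithmetic and reads no data;
        # it only records how to compute the result later.
        return Variable(
            f"({self.name} - {other.name})",
            lambda sel: self.evaluate(sel) - other.evaluate(sel),
        )


SST = [[28.0, 15.0], [29.5, 14.0]]  # made-up values indexed [time][lat]
CLIMATOLOGY = [28.5, 14.5]          # made-up values indexed [lat]
reads: List[str] = []               # log of which sources were touched


def read_sst(sel: Selection) -> float:
    reads.append("sst")
    return SST[sel["time"]][sel["lat"]]


def read_climatology(sel: Selection) -> float:
    reads.append("climatology")
    return CLIMATOLOGY[sel["lat"]]


sst = Variable("sst", read_sst)
climatology = Variable("sst_climatology", read_climatology)

anomaly = sst - climatology   # a virtual variable: nothing evaluated yet
reads_before = len(reads)     # still 0 at this point

value = anomaly.evaluate({"time": 1, "lat": 0})  # 29.5 - 28.5
print(anomaly.name, value, reads)
```

Because evaluation is deferred, a variable can be defined over many dimensions while only the requested realizations are ever computed.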

[Diagram labels: inputs (GRIB, netCDF, images, binary, spreadsheets, shapefiles; Database: Tables, queries; Servers: OpenDAP, THREDDS, images w/proj); IRI Data Collection (Dataset, Variable, ivar); Calculations, “virtual variables”; outputs (images, graphics; descriptive and navigational pages; Clients: OpenDAP, THREDDS; Data Files: netcdf, binary, images; Tables; OpenGIS WMS v1.3).]

OpenDAP
OpenDAP is very important to us because we can act as both a client and a server, and because it is flexible enough to represent all our calculations (“virtual variables”), i.e. a user can specify an analysis and export it.
At the moment we cannot read shapefile data using it (and the serving of shapes over OpenDAP is consequently untested), but hopefully that is temporary.
Impedance mismatch is low.
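As a rough illustration of why the protocol is flexible, a DAP2 request names a variable and an index range per dimension in the URL itself. The sketch below builds such a URL in Python. The base URL and variable name are hypothetical placeholders (not Data Library addresses), and the helper functions are made up for this example; only the `[start:stride:stop]` hyperslab notation with inclusive bounds and the response suffix are taken from DAP2.

```python
from typing import Sequence, Tuple


def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """Format one DAP2 index range; both bounds are inclusive."""
    if stride == 1:
        return f"[{start}:{stop}]"
    return f"[{start}:{stride}:{stop}]"


def dap2_request(
    base_url: str,
    variable: str,
    slabs: Sequence[Tuple[int, ...]],
    response: str = "dods",
) -> str:
    """Build a DAP2 URL: <base>.<response>?<variable>[..][..]...

    Each entry of `slabs` is (start, stop) or (start, stop, stride),
    one entry per dimension of the variable, in order.
    """
    constraint = variable + "".join(hyperslab(*s) for s in slabs)
    return f"{base_url}.{response}?{constraint}"


# Hypothetical dataset: sst[time][lat][lon]; take every second longitude.
url = dap2_request(
    "http://example.org/data/sst.nc",
    "sst",
    [(0, 11), (10, 20), (0, 359, 2)],
    response="ascii",
)
print(url)
```

The same pattern applies whether the URL points at a stored file or at a server-side calculation, which is what lets a client treat an exported analysis like any other dataset.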

Other Important Standards
netcdf
GRIB
GEOTIFF
Shapefiles vs. PostGIS in Postgres (OGC compliant)

Standards becoming important to us (we think)
OGC: GIS Conceptual Framework
OGC: WMS, WFS, WCS
These are designed to be partial: we will have many datasets/analyses that we cannot transfer using these protocols.

Interoperability requires Semantics
Currently we have some numeric interoperability, but we have a long way to go for semantic interoperability.

Standard Metadata Schema/Data Services
[Diagram labels: Datasets, Tools, Users]

Many Data Communities
[Diagram labels: Tools, Users, Datasets, Standard Metadata Schema]

Super Schema
[Diagram labels: Standard metadata schema, Tools, Users, Datasets]

Super Schema: direct
[Diagram labels: Standard metadata schema/data service; five communities, each with Tools, Users, Datasets, and a Standard Metadata Schema]

Flaws
A lot of work.
Super Schema/Service is the Lowest-Common-Denominator.
Science keeps evolving, so that standards either fall behind or constantly change.

RDF Standard Data Model Exchange
[Diagram labels: Standard metadata schema; RDF (repeated); five communities, each with Tools, Users, Datasets, and a Standard Metadata Schema]

RDF Data Model Exchange
[Diagram labels: Standard metadata schema with RDF; five communities, each with RDF, Tools, Users, Datasets, and a Standard Metadata Schema]

RDF Architecture
[Diagram labels: queries (three times); Virtual (derived) RDF; RDF]

Why is this better?
It maps the original dataset metadata into a standard format that can be transported and manipulated.
There is still the same impedance mismatch when mapping to the least-common-denominator standard metadata, but when a better standard comes along, the original complete-but-nonstandard metadata is already there to be remapped, and “late semantic binding” means everyone can use the new semantic mapping.
Enhanced mappings can be used between models that are close.
EASIER: these are tools to enhance the mapping process.
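A minimal sketch of the remapping argument, using plain Python tuples as stand-ins for RDF triples. Every identifier here (`dataset:sst_a`, the `local:`, `std1:`, `std2:` prefixes, and `map:sameAs`) is invented for illustration and is not drawn from any real vocabulary or from the Data Library. The point it tries to show is that the original metadata is stored once, and a mapping written later produces a new standard view without touching it.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Original, complete-but-nonstandard metadata: stored as-is, never rewritten.
original: Set[Triple] = {
    ("dataset:sst_a", "local:units", "degC"),
    ("dataset:sst_a", "local:longname", "sea surface temperature"),
}

# A first mapping to a (hypothetical) standard that only knows about titles.
mapping_v1: Set[Triple] = {
    ("local:longname", "map:sameAs", "std1:title"),
}

# A later, richer (hypothetical) standard: only the mapping is new.
mapping_v2: Set[Triple] = {
    ("local:longname", "map:sameAs", "std2:standard_name"),
    ("local:units", "map:sameAs", "std2:units"),
}


def view(triples: Set[Triple], mapping: Set[Triple]) -> Set[Triple]:
    """Re-express triples in a target vocabulary at query time.

    Predicates without a mapping are dropped from the view, which is the
    lowest-common-denominator loss; the stored triples are left untouched.
    """
    rename = {s: o for (s, p, o) in mapping if p == "map:sameAs"}
    return {(s, rename[p], o) for (s, p, o) in triples if p in rename}


view_v1 = view(original, mapping_v1)
view_v2 = view(original, mapping_v2)
print(sorted(view_v1))
print(sorted(view_v2))
```

With the first mapping the units are lost from the view; with the second they reappear, because they were never discarded from the stored metadata.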

Key Features of RDF/OWL
Web-based framework for writing down and interrelating semantic standards.
Non-contextual modeling: data object relationships are stated explicitly, not inferred from context.
Late semantic binding: semantics do not alter transport/storage; semantic mapping can be added later as scientific fields evolve.
Not much track record, yet.

RDF vs. XML Schema
RDF is usually transported as XML, so it is XML.
But it differs from XML Schema in that the schema is not fixed beforehand.
XML Schema: a prearranged exchange.
RDF/XML: add to/query an information space.
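The “add to/query an information space” contrast can be illustrated with a toy in-memory triple store. This is a sketch only: the `TripleStore` class and the `ex:` identifiers are hypothetical, and real RDF tooling offers far more (URIs, typed literals, SPARQL). It shows that a statement with a previously unseen predicate can be added without changing any schema, and that queries are patterns with wildcards.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """A toy information space: a set of statements, with no fixed schema."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Any predicate is accepted; nothing has to be declared beforehand.
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return statements matching the pattern; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t
            for t in self._triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


store = TripleStore()
store.add("ex:sst", "ex:units", "degC")
store.add("ex:sst", "ex:source", "ex:buoys")

# Later, someone adds a kind of statement nobody planned for.
store.add("ex:sst", "ex:quality", "provisional")

about_sst = store.match(subject="ex:sst")
units = store.match(predicate="ex:units")
print(about_sst)
print(units)
```

A prearranged XML Schema exchange would instead reject the `ex:quality` statement until the schema itself was revised and redistributed.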

Sample Tool: Faceted Search
http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...