Data Interoperability at the IRI: translating between data cultures Benno Blumenthal International Research Institute for Climate Prediction Columbia University.

Slides:



Advertisements
Similar presentations
The Next Generation Network Enabled Weather (NNEW) SWIM Application Asia/Pacific AMHS/SWIM Workshop Chaing Mai, Thailand March 5-7, 2012 Tom McParland,
Advertisements

A Unified Data Model and Programming Interface for Working with Scientific Data Doug Lindholm Laboratory for Atmospheric and Space Physics University of.
Hope Foley Perpetual Technologies SQL Saturday #51 - Nashville.
GIS Overview. What is GIS? GIS is an information system that allows for capture, storage, retrieval, analysis and display of spatial data.
Geographic Information Systems
Data Discovery 1.Data discovery systems 2.TDS and metadata 3.NcISO services.
ESRM 250 & CFR 520: Introduction to GIS © Phil Hurvitz, KEEP THIS TEXT BOX this slide includes some ESRI fonts. when you save this presentation,
Lecture 4 Data. Why GIS? Ask questions Solve a problem Support a decision Make Maps Involve others, share data, procedures, ideas.
OPeNDAP and the Data Access Protocol (DAP) Original version by Dave Fulker.
VrRBO with THREDDS data store. Paths & URLs THREDDS server THREDDS data directory.
Contributions from TU Dresden / GLUES Fakultät Forst, Geo- und Hydrowissenschaften, Fachrichtung Geowissenschaften, Professur Geoinformationssysteme Matthias.
MapServer-OGR-OPeNDAP: An Integrated System for Uniform Access to Land and Oceanographic Datasets Frank Warmerdam Consultant Thomas E. Burk University.
Unidata’s TDS Workshop TDS Overview – Part II October 2012.
Geographic Information System GIS This project is implemented through the CENTRAL EUROPE Programme co-financed by the ERDF GIS Geographic Inf o rmation.
THREDDS Data Server Ethan Davis GEOSS Climate Workshop 23 September 2011.
Coverages and the DAP2 Data Model James Gallagher.
The OpenGIS Consortium Geog 516 Presentation #2 Rueben Schulz March 2004.
6. Simple Features Specification Background information UML overview Simple features geometry.
The IRI Climate Data Library: translating between data cultures Benno Blumenthal International Research Institute for Climate Prediction Columbia University.
NcML Aggregation vs Feature Collections. NcML functionality 1.Modify the objects found in CDM files – Especially Attributes – Don’t have to rewrite the.
How do we represent the world in a GIS database?
Intro to GIS and ESRI Trainers: Randy Jones, GIS Technician, Douglas County Jon Fiskness, GISP GIS Coordinator, City of Superior.
Accomplishments and Remaining Challenges: THREDDS Data Server and Common Data Model Ethan Davis Unidata Policy Committee Meeting May 2011.
1 International Standards for Data Interoperability GALEON Geo-interface for Air, Environment, Land, Ocean NetCDF Ben Domenico Unidata Program Center*
IRI Data Library: enhancing accessibility of climate knowledge M. Benno Blumenthal, Michael Bell, John del Corral, Rémi Cousin, and Igor Khomyakov.
 Three-Schema Architecture Three-Schema Architecture  Internal Level Internal Level  Conceptual Level Conceptual Level  External Level External Level.
DAP4 James Gallagher & Ethan Davis OPeNDAP and Unidata.
Spatial Information Retrieval. Spatial Data Mining + Knowledge Discovery Used for mining data in spatial databases with huge amounts of data Spatial data.
M.Benno Blumenthal and John del Corral International Research Institute for Climate and Society OpenDAP 2007
Semantic Technologies and Application to Climate Data M. Benno Blumenthal IRI/Columbia University CDW /04-01.
Server-side Analysis and a Semantic Framework for Metadata M. Benno Blumenthal International Research Institute for Climate and Society Columbia University.
Intro to GIS & Pictometry Trainers: Randy Jones, GIS Technician, Douglas County Jon Fiskness, GISP GIS Coordinator, City of Superior.
Benjamin Post Cole Kelleher.  Availability  Data must maintain a specified level of availability to the users  Performance  Database requests must.
Slide 1 SDTSSDTS FGDC CWG SDTS Revision Project ANSI INCITS L1 Project to Update SDTS FGDC CWG September 2, 2003.
Distributed Data Analysis & Dissemination System (D-DADS ) Special Interest Group on Data Integration June 2000.
Working with XML. Markup Languages Text-based languages based on SGML Text-based languages based on SGML SGML = Standard Generalized Markup Language SGML.
Proximity Spider Project by Ganesh Naikare Project Advisor: Professor Scott Spetka.
Towards Unifying Vector and Raster Data Models for Hybrid Spatial Regions Philip Dougherty.
Data Stewardship at the NOAA Data Centers Sub Topic - Value Added Products ESIP Federation Meeting, Washington, DC January 6-8, 2009.
IRI/LDEO Climate Data Library M.Benno Blumenthal, Michael Bell, John del Corral, Remi Cousin, and Haibo Liu International Research Institute for Climate.
Semantic Web underpinnings of the IRI Data Library Semantic Web as a Framework for Multiple Metadata IRI Data Library: presenting Data in multiple frameworks.
What is GIS? “A powerful set of tools for collecting, storing, retrieving, transforming and displaying spatial data”
M.Benno Blumenthal and John del Corral International Research Institute for Climate and Society Using a Resource.
Rich Signell Roland Viger Curtis Price USGS Community for Data Integration Feb 15, 2012.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
Uploading Data Matthew Hanson  GeoNode made up of several components  Web Framework – Django  OGC Server – GeoServer  Database – PostGIS.
IRI Data Library Faceted Search: an example of RDF-based faceted search for climate data Drawing on multiple ontologies to build an application Using inference.
M.Benno Blumenthal and John del Corral International Research Institute for Climate and Society Use of RDF/OWL.
Using the Semantic Web M. Benno Blumenthal International Research Institute for Climate and Society Columbia University 31 July 2012 CU Metadata Group.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
An Introduction to the Semantic Web M. Benno Blumenthal International Research Institute for Climate and Society Columbia University 2 November 2011.
IRI/LDEO Climate Data Library M.Benno Blumenthal, Michael Bell, and John del Corral International Research Institute for Climate and Society Columbia University.
IRI Data Library Overview
The IRI was founded approximately 12 years ago with a mission To enhance society’s capability to understand, anticipate and manage the impacts of climate.
Transport and Access of Data, Metadata, and Semantics using RDF
Geographic Information Systems
IRI/LDEO Climate Data Library
IRI/LDEO Climate Data Library
IRI/LDEO Climate Data Library
IRI/LDEO Climate Data Library
IRI/LDEO Climate Data Library
2. An overview of SDMX (What is SDMX? Part I)
Data Standards at the IRI Data Library
RDF Standard Data Model Exchange
IRI Data Library Faceted Search: an example of
IN32A-05 The IRI/LDEO Climate Data Library: Helping People use Climate Data M.Benno Blumenthal, Emily Grover-Kopec, Michael Bell, and John del Corral.
IRI/LDEO Climate Data Library
IRI Data Library efforts focus on making climate and other data products more widely accessible through tool development, data organization and transformation,
CORBA and COM TIP Two practical techniques for object composition
Presentation transcript:

Data Interoperability at the IRI: translating between data cultures Benno Blumenthal International Research Institute for Climate Prediction Columbia University

Work presented has been done by John del Corral Michael Bell Emily Grover-Kopec

Outline Data Cultures Needed Functionality for usefully merging Data model changes Functionality changes Example Dataset Connections Mapping SimpleFeatures to OpenDAP

IRI Data Collection Dataset Variable ivar multidimensional Economics Public Health “geolocated by entity” GIS “geolocation by vector object or projection metadata” Ocean/Atm “geolocated by lat/lon” multidimensional spectral harmonics equal-area grids GRIB grid codes climate divisions

IRI Data Collection Dataset Variable ivar Servers OpenDAP THREDDS GRIB netCDF images binary Database Tables queries spreadsheets shapefiles images w/proj IDA TIFF/geoTIFF BIL

IRI Data Collection Dataset Variable ivar Calculations “virtual variables” User Interface viewer manipulations calculations OpenGIS WMS v1.3 Data Files netcdf binary images Clients OpenDAP THREDDS Tables Servers OpenDAP THREDDS GRIB netCDF images binary Database Tables queries spreadsheets shapefiles images w/proj IDA TIFF/geoTIFF BIL

Functionality for new Users want to use our current data holdings aggregation (particularly of images) time analysis of GIS data translation to “entity” basis

Data Model Changes projection attributes –SpatialReferenceSystemWKT (An OpenGIS standard) –SpatialReferenceSystemDims geometry data object –OpenGIS Simple Feature

Functionality Changes geometry display: fill, fillby, stroke, mark,... projection functionality: objects with differing projection attributes can be made to match rasterization: geometry object converted to raster –computes the fraction of each gridbox that is covered by the given geometry object

Example NDVI[x y time] Albers projection and location [d_name] Lon/lat geometry produces NDVIg[d_name time]

weighted-average

Dataset connections Metadata standards are useful because they let us make meaningful connections between datasets.

Connections we might want NDVI(district,time) Difficult to document dataset – currently at IRI user has to follow a series of links to find out about districts and related information Would be nice if User searches Software searches, e.g. data viewer could display district information, locator map … Lists of related information for user’s next step

Two kinds of Dataset Connections 1.Standard variables 2.Standard units 3.Definitional connections (e.g. same dataset or table) 4.Computational connections (data connected by construction) Latter are local but solid; former are standard and possibly hazy

Metadata Interoperability Attributes (all agree) Basic XML (all agree) XML with name spaces Given a dataset following one metadata standard, how do we make connections to datasets following a different standard?

Conceptual Model Standard concepts and standard relationships Delayed mapping between standards means the information loss in impedance mismatches does not propagate

Semantic Web XML (with namespaces) readable files with tagged data XML Schema structure of a tagged file RDF standard for expressing relations RDF Schema standard for expressing relations between classes OWL logical derivations of relations – our relations are different, e.g. projection transformation to connect data in different projections

Sample Geometry as OpenDAP Example: climate division outlines associated with climate division data are conceptually Location[IDIV] Where location is a geometry (multipolygon) This becomes nested sequences in OpenDAP dds das with some attributes

Simple Features and OpenDAP v2.0 structure Point structure {float lat;float lon;} point; LineString sequence {float lat;float lon;}LineString; MultiPoint sequence {float lat;float lon;} MultiPoint; Polygon sequence {sequence {float lat;float lon;} aring;} Polygon; MultiLineString sequence {sequence {float lat;float lon;} LineString; }MultiLineString; MultiPolygon sequence {sequence {sequence {float lat;float lon; } ring;} Polygon;} MultiPolygon; Geometry Collection sequence {string OpenGISSimpleFeature; sequence{...}geom }GeometryCollection; but how to handle a collection of a collecton???

Simple Features and OpenDAP v2.0 attributes SpatialReferenceSystemWKT Projection information OpenGISSimpleFeature Point, LineString, Polygon, MultiPoint, MultiLineString,MultiPolygon,GeometryCollection Or Dimensionality 0 (point) 1 (line) 2 (polygon)

Sample Attributes NDVI [ x y time] SpatialReferenceSystemWKT PROJCS["Albers_Equal_Area_Conic",GEOGCS["GCS_North_Am erican_1927",DATUM["D_North_American_1927",SPHEROID["C larke_1866", , ]],PRIMEM["Greenwich",0],U NIT["Degree", ]],PROJECTION["Albers"],P ARAMETER["False_Easting",0],PARAMETER["False_Northing", 0],PARAMETER["Central_Meridian",20],PARAMETER["Standard _Parallel_1",21],PARAMETER["Standard_Parallel_2",- 19],PARAMETER["Latitude_Of_Origin",1],UNIT["Meter",1]] SpatialReferenceSystemDims x y

Lessons Learned Must be able to specify different standards for different parts of a dataset – country codes are an iso standard, for example, but are unlikely to ever be part of CF or WCS Must be able to transmit everything, not just anointed datasets Bringing together different concepts in different data cultures (multidimensionality, location as object) improves both sides of the divide.