Server-side Analysis and a Semantic Framework for Metadata. M. Benno Blumenthal, International Research Institute for Climate and Society, Columbia University
Data Analysis as a Service: The Data Library's open data model and its ability to create networks of virtual web pages and other web resources lead to some powerful applications. As datasets become more complicated and difficult to handle, systems that hide that complexity and facilitate analysis become more essential. Metadata and its transforms are essential. Archived but accessible data are ever more important.
Complexity pervades (*)
Overview: IRI Data Collection (Dataset, Variable, ivar; multidimensional); Generalized Data Tools (Data Viewer, Data Language); Specialized Data Tools (Maproom); URL/URI for data, calculations, figures, etc.
IRI Data Collection: Data by geolocation type. Diagram labels: Dataset, Variable, ivar; Economics, Public Health: “geolocated by entity”; GIS: “geolocation by vector object or projection metadata”; Ocean/Atm: “geolocated by lat/lon”; multidimensional; spectral harmonics; equal-area grids; GRIB grid codes; climate divisions.
IRI Data Collection: Data by format. Diagram labels: Dataset, Variable, ivar; Servers: OpenDAP, THREDDS; GRIB, netCDF, images, binary; Database Tables, queries, spreadsheets, shapefiles, images w/proj.
IRI Data Collection: Data as services. Diagram labels: Dataset, Variable, ivar; Calculations: “virtual variables”; images, graphics; descriptive and navigational pages; OpenGIS: WMS/WCS; KML; Data Files: netcdf, binary, images; Clients: OpenDAP, THREDDS; Tables; Servers: OpenDAP, THREDDS; GRIB, netCDF, images, binary; Database Tables, queries, spreadsheets, shapefiles, images w/proj.
IRI General Data Tools Data page
IRI General Data Tools Data viewer
IRI General Data Tools: Calculations: svd (link: svdview) (link: svd results dataset) (link: svd documentation)
svd program
IRI General Data Tools: Calculations: Cluster Analysis (link: cluster view) (link: cluster results dataset) (link: k-means fn)
IRI General Data Tools: WMS and KML: land cover (link: figure page)
IRI General Data Tools: WMS and KML: precipitation (link: figure page)
IRI Map Room Maproom Animation
IRI Map Room: Malaria Early Warning System. The front page illustrates the most recent dekadal rainfall estimates (FEWS RFE). Change dates to view different time periods. Administrative and epidemiological overlays are available. Click and drag a box across the map to zoom.
IRI Map Room: MEWS Time Series Analyses. STEP 1: Select the size of the domain for analysis. STEP 2: Select the location for which the analysis will be created. Domain options: Administrative District OR Box (11 km, 33 km, 55 km, 111 km).
MEWS Time Series Analyses IRI Map Room
The MEWS tool transparently interrelates the three geospatial models: dekadal precipitation (longitude, latitude, time); district outlines; and time series for districts (generated on-the-fly from the first two).
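To make the relationship between the three models concrete, here is a minimal sketch of deriving a district time series from a gridded field and a district outline. It is not the Data Library's implementation: the function names, the ray-casting test, and the unweighted average over grid-cell centers are all assumptions chosen for brevity (a real tool would likely weight cells by area).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def district_time_series(
    lons: Sequence[float],
    lats: Sequence[float],
    precip: Sequence[Sequence[Sequence[float]]],
    outline: Sequence[Point],
) -> List[float]:
    """Average precip[t][j][i] over cells whose centers fall inside the outline.

    Hypothetical helper: unweighted mean, one value per time step.
    """
    cells = [
        (j, i)
        for j, lat in enumerate(lats)
        for i, lon in enumerate(lons)
        if point_in_polygon((lon, lat), outline)
    ]
    if not cells:
        raise ValueError("no grid-cell centers fall inside the outline")
    return [sum(field[j][i] for j, i in cells) / len(cells) for field in precip]


# Illustrative data: a 2 x 3 grid, two time steps, and a square "district".
lons = [0.5, 1.5, 2.5]
lats = [0.5, 1.5]
precip = [
    [[1, 2, 3], [4, 5, 6]],
    [[10, 20, 30], [40, 50, 60]],
]
outline = [(0, 0), (2, 0), (2, 2), (0, 2)]
series = district_time_series(lons, lats, precip, outline)
```

The point of the sketch is that the district series is never stored: it is a function of the precipitation grid and the outline, so it can be regenerated for any district or date range on request.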
Data Flow based Analysis with explicit semantics Results data analysis Data analysis Semantic Web
Faceted Search (link)
Models, Crosswalks, and Objects in a single RDF/OWL framework
Standard metadata schema. Figure labels: Tools, Users, Datasets, Standard Metadata Schema, RDF (repeated for each of several groups); RDF Data Model Exchange.
IRI RDF Architecture. Diagram labels: Data Servers; Ontologies; MMI; JPL; Standards Organizations; Start Point; RDF/XML-Schema; Crawler; XSLT/GRDDL ingest; XML Schema to OWL translation; OWL Semantics; SWRL Rules; SeRQL CONSTRUCT; Search Queries; Location Canonicalizer; Time Canonicalizer; Sesame; Search Interface; bibliography.
Semantic Crosswalk for metadata translation
Semantic metadata translation: maproom to GCMD DIF
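As a rough illustration of what a crosswalk does, the sketch below treats metadata as a set of (subject, predicate, object) triples and a crosswalk as a mapping from predicates in one vocabulary to predicates in another. This is a toy stand-in, not the RDF/OWL machinery used in the talk: the `maproom:` and `dif:` property names, the example subject, and the `apply_crosswalk` helper are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical crosswalk: source-vocabulary predicate -> target-vocabulary predicate.
CROSSWALK: Dict[str, str] = {
    "maproom:title": "dif:Entry_Title",
    "maproom:description": "dif:Summary",
}


def apply_crosswalk(triples: Iterable[Triple], crosswalk: Dict[str, str]) -> Set[Triple]:
    """Return the original triples plus those inferred through the crosswalk."""
    original = set(triples)
    inferred = {(s, crosswalk[p], o) for (s, p, o) in original if p in crosswalk}
    return original | inferred


# Illustrative metadata for one made-up maproom page.
source = {
    ("ex:rainfall_page", "maproom:title", "Dekadal Rainfall"),
    ("ex:rainfall_page", "maproom:description", "Most recent rainfall estimates"),
    ("ex:rainfall_page", "maproom:icon", "rain.png"),
}
combined = apply_crosswalk(source, CROSSWALK)
```

Because the inferred triples sit alongside the originals rather than replacing them, the same store can answer queries in either vocabulary, which is the property the crosswalk slides rely on.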
Sample GCMD DIF-CD Record
OpenDAP CF to WCS Service
Function Documentation (*)
Function Semantics: Used to generate function documentation. Basis for more extensive function semantics. Eventually we would like to use them to generate workflows. Currently working with SSWAP to ensure that it can describe these workflow steps, e.g. variable to transformed variable, and variable to figure to image file.
SSWAP: Simple Semantic Web Architecture and Protocol. A way of providing a service that semantically describes its domain and range to advertise itself. To invoke it, both domain and range are restricted. Traditionally we specify a chain of processing steps, and provenance documents that effort. SSWAP specifies an object by constraining it: you could specify its provenance to get it “traditionally”, or some other quality.
Multiplicity of Data Representations: RDF provides a unifying framework to simultaneously hold and deliver dataset metadata according to multiple standards. "Models, Crosswalks, and Objects" organizes that framework, clarifying the semantic distance spanned. Bidirectional XML Schema to OWL translation enables delivery of inferred metadata to existing XML-based systems. Persistence with inference/transform is the underlying technology. A Semantic Service Framework could extend this framework to semantically-informed workflow generation.
21st Century data analysis: Definitive web-accessible data archives. Cloud data analysis services based on those archives. Semantic descriptions of datasets. Semantic descriptions of analysis steps. Semantic assembly of workflow pipelines. Science is about reproducibility, as are virtual dataset services. This means access to the data, access to the analysis methods, and commitment to archives.
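The workflow steps named above (variable to transformed variable; variable to figure to image file) suggest how a pipeline could be assembled once each step declares what it accepts and what it produces. The sketch below is only an illustration of that idea using a breadth-first search over declared domain/range pairs; it does not use SSWAP or any real service description, and the service names `transform`, `plot`, and `render` are hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

# Each hypothetical service: (name, domain it accepts, range it produces).
Service = Tuple[str, str, str]

SERVICES: List[Service] = [
    ("transform", "variable", "transformed variable"),
    ("plot", "variable", "figure"),
    ("render", "figure", "image file"),
]


def find_pipeline(start: str, goal: str, services: Sequence[Service]) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of services from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        kind, steps = queue.popleft()
        if kind == goal:
            return steps
        for name, domain, produced in services:
            # A service can follow if its domain matches what we currently hold.
            if domain == kind and produced not in seen:
                seen.add(produced)
                queue.append((produced, steps + [name]))
    return None  # no chain of services connects start to goal


pipeline = find_pipeline("variable", "image file", SERVICES)
```

Here the caller states only the kind of object wanted ("image file") and the kind available ("variable"); the chain of steps is derived from the declarations rather than written out by hand.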
Other Maproom Examples
I. Food Security: Application. At the request of the UN FAO, a web-based tool was created to support Desert Locust management and control. Eliminates NDVI-based error for identification of locust habitat. Adds daily and 10-day CMORPH rainfall estimates for identification of potential breeding areas. Michael Bell, Benno Blumenthal
I. Human Health: Application. MODIS images (composite and NDVI) are now available through the IRI Health Maproom. The Ministry of Health in Eritrea follows NDVI indices on a regular basis and provides warnings to the sub-districts. Michael Bell, Benno Blumenthal, John del Corral, Emily Grover-Kopec
Fire Management: Presentation of the tool to CARE and the Ministry of Environment (Indonesia). Improvements and publications are in progress. Michael Bell, Benno Blumenthal, Joshua Qian, Andy Robertson, Michael Tippett