WEB SERVICES FOR UNIFIED ACCESS TO NATIONAL HYDROLOGIC DATA REPOSITORIES AND REAL-TIME OBSERVATION DATA: CUAHSI HIS EXPERIENCE
Ilya Zaslavsky, San Diego Supercomputer Center, UCSD
CUAHSI = Consortium of Universities for the Advancement of Hydrologic Sciences, Inc.; HIS = Hydrologic Information System
Collaborative project: UT Austin + SDSC + Drexel + Duke + Utah State
SAN DIEGO SUPERCOMPUTER CENTER, UCSD SciR&D
SDSC Spatial Information Systems Lab
Research and system development
- Services-based spatial information integration infrastructure
- Mediation services for spatial data, query processing, map assembly services
- Long-term spatial data preservation
- Spatial data standards and technologies for online mapping (SVG, WMS/WFS)
Support of spatial data projects at SDSC and beyond (services)
- In Geosciences (GEON, CUAHSI, CBEO, …)
- In regional development (NIEHS SBRP, Katrina)
- In Neurosciences (BIRN, CCDB)
The Grid is becoming the backbone for collaborative science and data sharing.
CI is about RE-USING data and research resources!
Cyberinfrastructure for hydrology
Hydrologic observations:
- Reliance on federally organized data collection (NWIS, STORET, NCDC, etc.) with huge and complex nomenclatures
- Simplifying access to federal repositories
- Relatively lower emphasis on data ownership
- Handling time in both UTC and local
- Various spatial offsets
- Multiple data types: time series, fields, spatial data
Integrative discipline:
- Interoperation with atmospheric, ocean, soils, geomorphology, social datasets and services…
Community:
- Organized by “natural boundaries”
- Networks of relatively autonomous self-managed data nodes
- Partnership with public sector water management
- 96% use Windows for research; Excel, ArcGIS, Matlab are the most popular tools
The CUAHSI Community, HIS and WATERS
- CUAHSI: 116 universities (Nov. 2006)
- HIS Team: Texas, SDSC, Utah, Drexel, Duke
- Supercomputer centers: NCSA, TACC
- Domain sciences: Unidata, NCAR, LTER, GEON
- Government: USGS, EPA, NCDC, USDA
- Industry: ESRI, Kisters, OpenMI
- Diagram labels: HIS Team, WATERS Testbed, WATERS Network Information System, CUAHSI HIS
Hydrologic Information System Service Oriented Architecture
- WaterOneFlow Web Services: data access through web services; data storage through web services (downloads, uploads)
- Servers: observatory servers, workgroup HIS, SDSC HIS servers, 3rd-party servers (e.g. USGS, NCDC)
- Web services interface, used from: GIS, Matlab, IDL, Splus, R, D2K, I2K, programming (Fortran, C, VB)
- Web portal interface (HDAS): information input, display, query and output services
- Preliminary data exploration and discovery: see what is available and perform exploratory analyses
- Protocols: HTML - XML; WSDL - SOAP
Main Components
- Web services for accessing hydrologic repositories
- Hydrologic Observations Data Model
- Hydrologic Data Access System + Time Series Viewer + desktop clients
- Collection of CUAHSI nodes
CUAHSI Web Services (diagram)
- Data sources: NWIS, NCAR, Unidata, NASA, Storet, NCDC, Ameriflux
- Client applications: ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++
Point Observations Information Model
Hierarchy, with a USGS example at each level:
- Data Source: USGS
- Network: Streamflow gages
- Sites: Neuse River near Clayton, NC
- Observation Series: Discharge, stage, start, end (daily or instantaneous)
- Values {Value, Time, Qualifier}: 206 cfs, 13 August 2006
Definitions:
- A data source operates an observation network
- A network is a set of observation sites
- A site is a point location where one or more variables are measured
- A variable is a property describing the flow or quality of water
- An observation series is an array of observations at a given site, for a given variable, with start time and end time
- A value is an observation of a variable at a particular time
- A qualifier is a symbol that provides additional information about the value
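The hierarchy on this slide can be sketched as nested record types. This is only an illustration of the definitions above, not the ODM schema itself: the class and field names are assumptions chosen for readability, and the sample data reuses the slide's USGS example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Value:
    """An observation of a variable at a particular time."""
    value: float
    time: datetime
    qualifier: Optional[str] = None  # symbol giving extra information about the value


@dataclass
class ObservationSeries:
    """Observations at one site for one variable."""
    variable: str
    units: str
    values: List[Value] = field(default_factory=list)

    @property
    def start(self) -> Optional[datetime]:
        # Start time is derived from the stored values (None for an empty series).
        return min((v.time for v in self.values), default=None)

    @property
    def end(self) -> Optional[datetime]:
        return max((v.time for v in self.values), default=None)


@dataclass
class Site:
    """A point location where one or more variables are measured."""
    name: str
    series: List[ObservationSeries] = field(default_factory=list)


@dataclass
class Network:
    """A set of observation sites."""
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    """An organization that operates observation networks."""
    name: str
    networks: List[Network] = field(default_factory=list)


# The slide's example: 206 cfs on 13 August 2006 at the Neuse River gage.
discharge = ObservationSeries("Discharge", "cfs", [Value(206.0, datetime(2006, 8, 13))])
clayton = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [clayton])])
```

Deriving start and end from the values, rather than storing them, is a design choice of this sketch; a catalog that must answer period-of-record queries without loading values would store them explicitly.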
Observations Data Model Schema (version 4.0)
Table groups: Data Source and Network, Sites, Variables, Values, Metadata
Example variables: depth of snow pack; streamflow; land use, vegetation; wind speed, precipitation
Controlled Vocabulary Tables: e.g. mg/kg, cfs; e.g. depth; e.g. non-detect, estimated
- A data source operates an observation network
- A network is a set of observation sites
- A site is a point location where one or more variables are measured
- A variable is a property describing the flow or quality of water
- A value is an observation of a variable at a particular time
- Metadata provide information about the context of the observation
From Ernest To, David Maidment, CRWR
Water Data Web Sites
NWISWeb site output
Time series of streamflow at a gaging station
# agency_cd  Agency Code
# site_no    USGS station number
# dv_dt      date of daily mean streamflow
# dv_va      daily mean streamflow value, in cubic-feet per-second
# dv_cd      daily mean streamflow value qualification code
#
# Sites in this file include:
#  USGS NEUSE RIVER NEAR CLAYTON, NC
#
agency_cd  site_no  dv_dt  dv_va  dv_cd
[data rows not recoverable: only the agency code USGS survives in each row]
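A file like the one on this slide can be read with a few lines of code. The sketch below assumes a simplified layout: `#` comment lines, one tab-separated header row, then data rows. The station number `00000000`, the second row, and the code `e` are invented for illustration; only the column names and the 206 cfs value for 13 August 2006 come from the talk.

```python
import csv

SAMPLE = "\n".join([
    "# agency_cd\tAgency Code",
    "# site_no\tUSGS station number",
    "# dv_dt\tdate of daily mean streamflow",
    "# dv_va\tdaily mean streamflow value, in cubic-feet per-second",
    "# dv_cd\tdaily mean streamflow value qualification code",
    "#",
    "agency_cd\tsite_no\tdv_dt\tdv_va\tdv_cd",
    "USGS\t00000000\t2006-08-13\t206\t",  # placeholder station number
    "USGS\t00000000\t2006-08-14\t\te",    # invented row: no value, made-up code
])


def parse_daily_values(text):
    """Skip '#' comments, then read the tab-separated header and data rows."""
    data_lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(data_lines, delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "site_no": rec["site_no"],
            "date": rec["dv_dt"],
            # An empty field means no value was recorded.
            "value": float(rec["dv_va"]) if rec["dv_va"] else None,
            "qualifier": rec["dv_cd"] or None,
        })
    return rows


rows = parse_daily_values(SAMPLE)
```

Every agency site needs its own variant of this kind of parser, which is the motivation for putting a uniform web service in front of the repositories.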
Challenges… (1/2)
Sites
- STORET has stations, and measurement points, at various offsets…
- Site metadata lacking and inconsistent (e.g. 2/3 have no HUC info, 1/3 have no state/county info); agency site files need to be upgraded to ODM…
- A groundwater site is different from a stream gauge…
Censored values
- Values have qualifiers, such as “less than”, “censored”, etc., per value. Sometimes mixed data types.
Units
- There are multiple renditions of the same units, even within one repository
- There may be several units for the same parameter code (STORET)
- If no value is recorded, are there no units?
Unit multipliers
- E.g. NCDC ASOS keeps measurements as integers, and provides a multiplier for each variable
Sources
- STORET requires organization IDs (for the organizations that collected data for STORET) in addition to site IDs
Time stamps: ISO 8601
- Data types problem (conversion to PST?)
- A service to determine UTC offsets given lat/lon and date?
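The time stamp issue can be made concrete with a small conversion. This is a minimal sketch assuming the UTC offset of the site is already known and fixed; the function name `to_utc_iso` is hypothetical, and the lookup of an offset from lat/lon and date that the slide asks about is not implemented here.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(local_iso: str, utc_offset_hours: float) -> str:
    """Attach a fixed UTC offset to a naive ISO 8601 local time and convert to UTC."""
    local = datetime.fromisoformat(local_iso)
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=tz).astimezone(timezone.utc).isoformat()


# Midnight PST (UTC-8) on 13 August 2006, expressed in UTC.
utc_stamp = to_utc_iso("2006-08-13T00:00:00", -8)
```

A fixed offset ignores daylight saving time, so storing both the UTC time and the offset with each value keeps the local time recoverable.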
Challenges… (2/2)
Values retrieval
- USGS: by site, variable, time range
- EPA: by organization-site, variable, medium, units, time range
- NCDC: fewer variables; period of record applies to site, not to seriesCatalog
Variable semantics
- Variable names and measurement methods don’t match
  E.g. NWIS parameter # 625 is labeled ‘ammonia + organic nitrogen’; the Kjeldahl method is used for determination but is not mentioned in the parameter description. In STORET this parameter is referred to as Kjeldahl Nitrogen.
- One-to-one mapping not always possible
  E.g. NWIS: ‘bed sediment’ and ‘suspended sediment’ medium types vs. STORET’s ‘sediment’.
- Ontology tagging, semantic mediation
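One way to picture ontology tagging is a table that tags each repository-specific term with a shared concept. The sketch below is an assumption about how such a table might look, not the project's mediation service: the concept labels and the `translate` helper are hypothetical, while the terms are the two examples from this slide (the NWIS code is written as on the slide).

```python
# (repository, native term) -> shared concept tag
VARIABLE_CONCEPT = {
    ("NWIS", "625"): "kjeldahl_nitrogen",  # labeled 'ammonia + organic nitrogen'
    ("STORET", "Kjeldahl Nitrogen"): "kjeldahl_nitrogen",
}

MEDIUM_CONCEPT = {
    ("NWIS", "bed sediment"): "sediment",
    ("NWIS", "suspended sediment"): "sediment",
    ("STORET", "sediment"): "sediment",
}


def translate(table, source, term, target):
    """Return every term in `target` tagged with the same concept as (source, term)."""
    concept = table[(source, term)]
    return sorted(t for (repo, t), c in table.items() if repo == target and c == concept)


nitrogen_in_storet = translate(VARIABLE_CONCEPT, "NWIS", "625", "STORET")
sediment_in_nwis = translate(MEDIUM_CONCEPT, "STORET", "sediment", "NWIS")
```

The second lookup returns two NWIS terms for one STORET term, which is the case where a one-to-one mapping does not exist and the caller has to choose or query both.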
- From different database structures, data collection procedures, quality control, access mechanisms to uniform signatures … Water Markup Language
- Tested in different environments
- Standards-based
- Can support advanced interfaces via harvested catalogs
- Accessible to community
- Templates for development of new services
- Optimized, error handling, memory management, versioning, run from fast servers
And: working with agencies on setting up services!
WaterOneFlow API
- GetValues: returns a TimeSeries
- GetSiteInfo: station information, including a period of record
- GetVariableInfo: returns variable/parameter information
-- developed to have a low barrier to entry
-- terminology same as Observations Database
-- reuse of common elements
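The value of a uniform signature is that client code does not change when the repository behind it does. The sketch below illustrates that idea with an abstract interface and a toy in-memory implementation. It is not the actual SOAP binding: the Python method names, argument lists, return shapes, the site code `NWIS:SITE1`, and the data values are all assumptions; only the three operation names, the site name, and variable `NWIS:00060` come from the slides.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class WaterOneFlowLike(ABC):
    """Three operations mirroring GetSiteInfo, GetVariableInfo and GetValues."""

    @abstractmethod
    def get_site_info(self, location: str) -> Dict[str, str]: ...

    @abstractmethod
    def get_variable_info(self, variable: str) -> Dict[str, str]: ...

    @abstractmethod
    def get_values(self, location: str, variable: str,
                   begin: str, end: str) -> List[Tuple[str, float]]: ...


class InMemoryService(WaterOneFlowLike):
    """Toy backend; a real one would wrap NWIS, STORET, NCDC, and so on."""

    def __init__(self, sites, variables, values):
        self._sites, self._variables, self._values = sites, variables, values

    def get_site_info(self, location):
        return self._sites[location]

    def get_variable_info(self, variable):
        return self._variables[variable]

    def get_values(self, location, variable, begin, end):
        series = self._values.get((location, variable), [])
        # ISO 8601 dates in the same format compare correctly as strings.
        return [(t, v) for t, v in series if begin <= t <= end]


svc = InMemoryService(
    sites={"NWIS:SITE1": {"name": "BIG ROCK C NR VALYERMO CA"}},
    variables={"NWIS:00060": {"name": "Discharge, cubic feet per second",
                              "units": "cubic feet per second"}},
    values={("NWIS:SITE1", "NWIS:00060"): [
        ("2006-08-01", 1.6), ("2006-08-02", 1.5), ("2006-08-03", 1.4),  # made-up values
    ]},
)
selected = svc.get_values("NWIS:SITE1", "NWIS:00060", "2006-08-02", "2006-08-03")
```

A client written against `WaterOneFlowLike` would work unchanged with any other backend that implements the same three methods.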
GetVariableInfo
- Input: Vocabulary:VariableCode
- Output: VariableResponse
Example output (XML markup lost in extraction): variable name "Discharge, cubic feet per second"; units "cubic feet per second"
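GetSiteInfo
Example response (XML markup and numeric values lost in extraction):
- Site name: BIG ROCK C NR VALYERMO CA
- Query fragment: format=YYYY-MM-DD&begin_date= &site_no=
- Variable: Discharge, cubic feet per second; units: cubic feet per second
- Period of record: T00:00: T00:00:00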
GetValues
NWIS, STORET, etc.
- Location: NWIS:
- Variable: NWIS:00060
- Time Range: to
MODIS, etc.
- Location: GEOM:BOX( ,180 90)
- Variable: MODIS:11/plotarea=landocean
- Time Range: to
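Both request styles on this slide address locations and variables as a vocabulary prefix followed by a code. A minimal sketch of splitting such an identifier follows; the helper name `split_code` is hypothetical, and splitting at the first colon is an assumption about the convention.

```python
def split_code(identifier: str):
    """Split 'Vocabulary:Code' at the first colon; both parts must be non-empty."""
    vocabulary, sep, code = identifier.partition(":")
    if not sep or not vocabulary or not code:
        raise ValueError(f"expected 'Vocabulary:Code', got {identifier!r}")
    return vocabulary, code


nwis_variable = split_code("NWIS:00060")
modis_variable = split_code("MODIS:11/plotarea=landocean")
```

Splitting only at the first colon leaves the code free to carry its own punctuation, as in the MODIS example.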
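timeSeriesResponse (example response; XML markup and values lost in extraction)
- Namespaces: xmlns:xsi=" xmlns:xsd=" xmlns="
- Site: NWIS: nwis: BIG ROCK C NR VALYERMO CA
- Variable: Discharge, cubic feet per second; units: cubic feet per second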
HIS nodes: cross-platform design
- Central CUAHSI HIS Node (Windows): Data, IIS Web Server, ASP.Net, SQL Server, ArcGIS Technologies, HDAS, HODM Web Service, Web Services, Web Service proxies
- GEON Data Node (Linux): Data, Apache Tomcat, GEON Software Stack, Proxy
- Remote CUAHSI HIS Nodes (Windows), several, each with the same stack: Data, IIS Web Server, ASP.Net, SQL Server, ArcGIS Technologies, HDAS, HODM Web Service, Web Services, Web Service proxies
- Application services, handling of spatial data types, etc.
- Security management, distributed data management, integration with other CI projects
Resource registration
- Shapefiles
- TIFF images, GMT rasters
- Web Services, WMS services
- Relational databases, ASCII
- PDFs, URLs
- “CUAHSI data”
- NetCDF
- Coming: Geodatabases and ODM
Possible Connections
- Review of ODM
  Dealing with observations/measurements rather than with sensor data?
- Review of WaterOneFlow services schema
  Aligning WaterOneFlow output schemas with GML/SensorML
  Carrying WaterOneFlow requests/responses over WFS
- Long-term preservation of observation data
- Water Data Interoperability Testbed?
Survey of Observing Systems (links lost in extraction)
- NEON:
- ORION:
- WATERS CUAHSI:
- CLEANER:
- GLEON:
- CREON:
- MoveBank:
- Civil Infrastructure:
- IRIS/USArray: