Published by Lillian Drusilla Dorsey. Modified over 9 years ago.
1
Web Services and Water Markup Language for Distributed Hydrologic Data Access
Ilya Zaslavsky, San Diego Supercomputer Center, UCSD
CUAHSI = Consortium of Universities for the Advancement of Hydrologic Sciences, Inc.; HIS = Hydrologic Information System
NSF-supported collaborative project: UT Austin + SDSC + Drexel + Duke + Utah State
www.cuahsi.org/his/
2
The Grid is becoming the backbone for collaborative science and data sharing
Cyberinfrastructure (CI) is about RE-USING data and research resources!
3
Cyberinfrastructure for hydrology (in the U.S.)
- Hydrologic observations:
  - Reliance on federally organized data collection (NWIS, STORET, NCDC, etc.) with huge and complex nomenclatures
  - Simplifying access to federal repositories; relatively lower emphasis on data ownership
  - Handling time in both UTC and local time
  - Various spatial offsets
  - Multiple data types: time series, fields, spatial data
- Integrative discipline: interoperation with atmospheric, ocean, soils, geomorphology, social datasets and services…
- Community:
  - Organized by "natural boundaries"
  - Networks of relatively autonomous self-managed data nodes
  - Partnership with public sector water management
  - 96% use Windows for research; Excel, ArcGIS and Matlab are the most popular tools
- Mix of standards, software licensing models, vocabularies; leveraging tools developed in other CI projects
4
Hydrologic Information System Service Oriented Architecture
- WaterOneFlow Web Services: data access through web services (downloads) and data storage through web services (uploads)
- Servers: observatory servers, workgroup HIS, SDSC HIS servers, 3rd party servers (e.g. USGS, NCDC)
- Clients of the web services interface: GIS, Matlab, IDL, Splus, R, D2K, I2K, programming (Fortran, C, VB)
- DASH: Data Access System for Hydrology
  - Information input, display, query and output services
  - Preliminary data exploration and discovery: see what is available and perform exploratory analyses
- Protocols and formats: HTML - XML, WSDL - SOAP
5
The CUAHSI Community, HIS and WATERS
- CUAHSI: 116 universities (Nov. 2006)
- HIS Team: Texas, SDSC, Utah, Drexel, Duke
- Supercomputer centers: NCSA, TACC
- Domain sciences: Unidata, NCAR, LTER, GEON
- Government: USGS, EPA, NCDC, USDA
- Industry: ESRI, Kisters, OpenMI
[Diagram labels: HIS Team, WATERS Testbed, WATERS Network Information System, CUAHSI HIS]
6
CUAHSI HIS as a mediator across multiple agency and PI data
- Keeps identifiers for sites, variables, etc. across observation networks
- Manages and publishes controlled vocabularies, and provides vocabulary/ontology management and update tools
- Provides common structural definitions for data interchange
- Provides a sample protocol implementation
- Governance framework: a consortium of universities, MOUs with federal agencies, collaboration with key commercial partners, led by renowned hydrologists, and NSF support for core development and test beds
7
Main Components
- Hydrologic Observations Data Model, ODM databases and site catalogs
- Web services for accessing hydrologic repositories and data in ODMs
- Clients: Online Data Access System + multiple desktop application add-ons
- Network of CUAHSI HIS servers, deployed at hydrologic observatories and integrated with other observing systems and sensor data collection
[Diagram: CUAHSI Web Services link data sources (NWIS, NCAR, Unidata, NASA, Storet, NCDC, Ameriflux) and client tools (ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++) with several remote CUAHSI HIS nodes. Each remote node (Windows) is labeled: Data, IIS Web Server, ASP.Net, SQL Server, ArcGIS Technologies, HDAS, HODM, Web Services, Web Service proxies.]
8
Point Observations Information Model
- A data source operates an observation network
- A network is a set of observation sites
- A site is a point location where one or more variables are measured
- A variable is a property describing the flow or quality of water
- An observation series is an array of observations at a given site, for a given variable, with start time and end time
- A value is an observation of a variable at a particular time
- A qualifier is a symbol that provides additional information about the value
Example hierarchy:
- Data Source: USGS
- Network: Streamflow gages
- Sites: Neuse River near Clayton, NC
- Observation Series: Discharge, stage, start, end (daily or instantaneous)
- Values {Value, Time, Qualifier}: 206 cfs, 13 August 2006
Services at each level:
- Return network information, and variable information within the network
- Return site information, including a series catalog of variables measured at a site with their periods of record
- Return time series of values
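The hierarchy on this slide can be sketched as a handful of Python data classes. This is only an illustration of the information model, not the ODM schema: the class and field names are assumptions chosen to mirror the slide's terms, and the single data point is the slide's own example (206 cfs on 13 August 2006).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Variable:
    """A property describing the flow or quality of water."""
    name: str
    units: str


@dataclass
class Value:
    """One observation of a variable at a particular time."""
    value: float
    time: datetime
    qualifier: Optional[str] = None  # symbol with extra information about the value


@dataclass
class ObservationSeries:
    """Observations at one site for one variable."""
    variable: Variable
    values: List[Value] = field(default_factory=list)

    @property
    def start(self) -> Optional[datetime]:
        # Start of the period of record, or None for an empty series.
        return min((v.time for v in self.values), default=None)

    @property
    def end(self) -> Optional[datetime]:
        # End of the period of record, or None for an empty series.
        return max((v.time for v in self.values), default=None)


@dataclass
class Site:
    """A point location where one or more variables are measured."""
    name: str
    series: List[ObservationSeries] = field(default_factory=list)


@dataclass
class Network:
    """A set of observation sites operated by a data source."""
    name: str
    source: str
    sites: List[Site] = field(default_factory=list)


# The slide's example: USGS -> Streamflow gages -> Neuse River near Clayton, NC
discharge = ObservationSeries(
    variable=Variable(name="Discharge", units="cfs"),
    values=[Value(206.0, datetime(2006, 8, 13))],
)
site = Site(name="Neuse River near Clayton, NC", series=[discharge])
network = Network(name="Streamflow gages", source="USGS", sites=[site])
```

Deriving the period of record from the values keeps the series catalog consistent with the data in this sketch; a real catalog would more likely store start and end times explicitly.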
9
Challenges… (1/2)
- Sites
  - STORET has stations, and measurement points, at various offsets…
  - Site metadata lacking and inconsistent (e.g. 2/3 have no HUC info, 1/3 have no state/county info); agency site files need to be upgraded to ODM…
  - A groundwater site is different from a stream gauge…
- Censored values
  - Values have qualifiers, such as "less than", "censored", etc., per value. Sometimes mixed data types.
- Units
  - There are multiple renditions of the same units, even within one repository
  - There may be several units for the same parameter code (STORET)
  - If no value is recorded, are there no units?
- Unit multipliers
  - E.g. NCDC ASOS keeps measurements as integers, and provides a multiplier for each variable
- Sources
  - STORET requires organization IDs (identifying who collected the data for STORET) in addition to site IDs
- Time stamps: ISO 8601
  - A service to determine UTC offsets given lat/lon and date?
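Three of these challenges (per-value qualifiers, unit multipliers, and local versus UTC time) can be illustrated with a small normalization routine. This is a minimal sketch under stated assumptions: the `<`/`>` prefix convention, the `Reading` class, the function names, and the sample inputs are all hypothetical and do not reproduce any agency's actual encoding.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass
class Reading:
    value: Optional[float]    # None when nothing was recorded
    qualifier: Optional[str]  # e.g. "<" for a "less than" (censored) value
    time_utc: datetime


def parse_reported(raw: str) -> Tuple[Optional[float], Optional[str]]:
    """Split a reported string such as '<5' into (number, qualifier)."""
    text = raw.strip()
    if not text:
        return None, "missing"
    if text[0] in "<>":
        return float(text[1:]), text[0]
    return float(text), None


def normalize(raw: str, multiplier: float,
              local_time: datetime, utc_offset_hours: float) -> Reading:
    """Apply the per-variable multiplier and convert local time to UTC."""
    number, qualifier = parse_reported(raw)
    if number is not None:
        number = number * multiplier
    # Attach the local UTC offset, then convert to UTC.
    local = local_time.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return Reading(number, qualifier, local.astimezone(timezone.utc))


# Hypothetical input: stored value "<5", multiplier 0.1, local noon at UTC-5.
reading = normalize("<5", 0.1, datetime(2006, 8, 13, 12, 0), -5)
print(reading.qualifier, reading.time_utc.isoformat())
```

The UTC offset is passed in explicitly here; looking it up from latitude, longitude and date is the open question the slide raises.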
10
Challenges… (2/2)
- Values retrieval
  - USGS: by site, variable, time range
  - EPA: by organization-site, variable, medium, units, time range
  - NCDC: fewer variables; period of record applies to site, not to seriesCatalog
- Variable semantics
  - Variable names and measurement methods don't match
    - E.g. NWIS parameter # 625 is labeled 'ammonia + organic nitrogen'; the Kjeldahl method is used for determination but is not mentioned in the parameter description. In STORET this parameter is referred to as Kjeldahl Nitrogen.
  - One-to-one mapping not always possible
    - E.g. NWIS: 'bed sediment' and 'suspended sediment' medium types vs. STORET's 'sediment'
- Ontology tagging, semantic mediation
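One simple way to picture semantic mediation is a lookup table that tags each repository's native term with a shared concept. The sketch below uses only the two examples from this slide; the concept names (`nitrogen_kjeldahl`, `sediment`) and the function names are invented for illustration and are not taken from any CUAHSI vocabulary.

```python
from typing import Dict, List, Optional, Tuple

# (repository, native term) -> shared concept. Concept names are hypothetical.
ConceptTable = Dict[Tuple[str, str], str]

VARIABLE_CONCEPTS: ConceptTable = {
    ("NWIS", "625"): "nitrogen_kjeldahl",  # labeled 'ammonia + organic nitrogen'
    ("STORET", "Kjeldahl Nitrogen"): "nitrogen_kjeldahl",
}

MEDIUM_CONCEPTS: ConceptTable = {
    ("NWIS", "bed sediment"): "sediment",
    ("NWIS", "suspended sediment"): "sediment",
    ("STORET", "sediment"): "sediment",
}


def to_concept(table: ConceptTable, repository: str, term: str) -> Optional[str]:
    """Tag a repository-specific term with its shared concept, if known."""
    return table.get((repository, term))


def native_terms(table: ConceptTable, concept: str, repository: str) -> List[str]:
    """List every native term in one repository that maps to a concept."""
    return sorted(
        term for (repo, term), tagged in table.items()
        if repo == repository and tagged == concept
    )


# STORET's single 'sediment' medium corresponds to two NWIS medium types,
# so translating a STORET query to NWIS yields more than one term.
concept = to_concept(MEDIUM_CONCEPTS, "STORET", "sediment")
print(native_terms(MEDIUM_CONCEPTS, concept, "NWIS"))
```

Returning a list rather than a single term is what lets the table represent the cases where a one-to-one mapping is not possible.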
11
Water Markup Language
- From different database structures, data collection procedures, quality control, access mechanisms to uniform signatures…
- Tested in different environments
- Standards-based
- Can support advanced interfaces via harvested catalogs
- Accessible to community
- Templates for development of new services
- Optimized, error handling, memory management, versioning, run from fast servers
- Working with agencies on setting up services and updating site files
Services: NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM12K, ODM
12
WaterOneFlow API, v. 1.0
- GetValues: returns a TimeSeries
- GetSiteInfo: station information, including a period of record
- GetVariableInfo: returns variable/parameter information
- Also: GetSites, GetVariables
- Object and string output
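To make the shape of this API concrete, here is a toy in-memory stand-in with one method per operation named on the slide. Only the operation names come from the slide; the snake_case method names, parameter lists, return types, site code and all sample values except 206 cfs on 13 August 2006 are assumptions made for illustration. The real services are SOAP operations described by WSDL.

```python
from typing import Dict, List, Tuple

Point = Tuple[str, float]  # (ISO 8601 date, value)


class InMemoryService:
    """Hypothetical stand-in mirroring the five WaterOneFlow operations."""

    def __init__(self, series: Dict[Tuple[str, str], List[Point]],
                 variables: Dict[str, Dict[str, str]]) -> None:
        self._series = series        # (site, variable) -> observations
        self._variables = variables  # variable -> descriptive metadata

    def get_sites(self) -> List[str]:  # GetSites
        return sorted({site for site, _ in self._series})

    def get_variables(self) -> List[str]:  # GetVariables
        return sorted(self._variables)

    def get_variable_info(self, variable: str) -> Dict[str, str]:  # GetVariableInfo
        return self._variables[variable]

    def get_site_info(self, site: str) -> Dict[str, Tuple[str, str]]:  # GetSiteInfo
        """Return each variable at the site with its period of record."""
        info = {}
        for (name, variable), points in self._series.items():
            if name == site and points:
                times = [time for time, _ in points]
                info[variable] = (min(times), max(times))
        return info

    def get_values(self, site: str, variable: str,
                   start: str, end: str) -> List[Point]:  # GetValues
        """Return observations in [start, end]; same-format ISO dates sort as text."""
        points = self._series.get((site, variable), [])
        return [(time, value) for time, value in points if start <= time <= end]


service = InMemoryService(
    series={("NEUSE-CLAYTON", "Discharge"): [
        ("2006-08-12", 198.0),  # invented placeholder value
        ("2006-08-13", 206.0),  # the slide's example: 206 cfs
    ]},
    variables={"Discharge": {"units": "cfs"}},
)
print(service.get_site_info("NEUSE-CLAYTON"))
print(service.get_values("NEUSE-CLAYTON", "Discharge", "2006-08-13", "2006-08-13"))
```

A client written against these five calls does not need to know which repository sits behind them, which is the point of the uniform signatures.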
13
WaterML design principles
- Driven largely by hydrologists; the goal is to capture the semantics of hydrologic observations discovery and retrieval
- Relies to a large extent on the information model as in ODM (Observations Data Model), and terms are aligned as much as possible
  - Several community reviews since 2005
- Driven by data served by USGS NWIS, EPA STORET, and multiple individual PI-collected observations
- Is no more than an exchange schema for CUAHSI web services
  - The least barrier for adoption by hydrologists
  - A fairly simple and rigid schema tuned to the current implementation
  - Conformance with OGC specs not in the initial scope
14
WaterML key elements
- Response types and the methods that return them:
  - SiteInfo (GetSiteInfo)
  - Variables (GetVariableInfo)
  - TimeSeries (GetValues)
- Key elements:
  - site
  - sourceInfo
  - seriesCatalog
  - variable
  - timeSeries
  - values
  - queryInfo
15
Structure of responses
[Diagram:
- variablesResponse: variables, holding 1 to many variable elements
- timeSeriesResponse: queryInfo (criteria, queryURL) and timeSeries (sourceInfo, variable, values)
- sitesResponse: queryInfo (criteria, queryURL) and site (siteInfo, seriesCatalog); seriesCatalog holds 1 to many series, each with variable and variableTimeInterval]
16
SiteInfo response
- queryInfo
- site: name, code, location
- seriesCatalog: variables (what, how many, when); TimePeriodType
17
TimeSeries response
- queryInfo
- location
- variable
- values
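Reading such a response on the client side might look like the sketch below. The sample document is made up for illustration: it reuses the element names shown on the slides (timeSeriesResponse, queryInfo, criteria, timeSeries, sourceInfo, variable, values), but the child elements `siteName`, `variableName`, `units`, `value` and the `dateTime` attribute are assumptions, and real WaterML uses XML namespaces that are omitted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified response; not a valid WaterML instance.
SAMPLE = """
<timeSeriesResponse>
  <queryInfo>
    <criteria>site=EXAMPLE-SITE; variable=Discharge</criteria>
  </queryInfo>
  <timeSeries>
    <sourceInfo>
      <siteName>Neuse River near Clayton, NC</siteName>
    </sourceInfo>
    <variable>
      <variableName>Discharge</variableName>
      <units>cfs</units>
    </variable>
    <values>
      <value dateTime="2006-08-13T00:00:00">206</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""


def read_time_series(xml_text: str) -> dict:
    """Extract site, variable, units and (time, value) pairs from the sample layout."""
    root = ET.fromstring(xml_text.strip())
    series = root.find("timeSeries")
    if series is None:
        raise ValueError("no timeSeries element in response")
    points = [
        (node.get("dateTime"), float(node.text))
        for node in series.iter("value")
    ]
    return {
        "site": series.findtext("sourceInfo/siteName"),
        "variable": series.findtext("variable/variableName"),
        "units": series.findtext("variable/units"),
        "points": points,
    }


result = read_time_series(SAMPLE)
print(result["site"], result["points"])
```

Because the response carries its own sourceInfo and variable blocks, a client can label a plot or a spreadsheet column without a second call to the service.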
18
Clients
- Tested with .Net and Java
- Desktop clients: Excel, Matlab, ArcGIS, VB.NET, more being written
- Web client: DASH (Data Access System for Hydrology): http://river.sdsc.edu/DASH (beta)
19
Current Deployment Architecture
- Server: Windows 2003 Server, 4 GB RAM, 1 TB disk, quad-core CPU
- Software: IIS, SQL Server, VS 2005, ArcGIS 9.2, AGS Server
- Components: DASH, WaterOneFlow Web Services, ODM, ODM Loader, ODM tools, GIS data, MXD service
- Direct DB connection
20
Workgroup HIS Server Organization: Steps for Registering Observation Data
- SQL Server: ODMs and catalogs. All instances exposed as ODM (i.e. have standard ODM tables or views: Sites, Variables, SeriesCatalog, etc.)
  - NWIS-IID, NWIS-DV, ASOS, STORET, TCEQ, BearRiver, ..., My new ODM (more databases)
- Spatial store: geodatabase or collection of shapefiles or both
  - NWIS-IID points, NWIS-DV points, ASOS points, STORET points, TCEQ points, BearRiver points, ..., My new points (more synced layers)
  - Background layers (can be in the same or separate spatial store)
- WOF services: web services from a common template
  - NWIS-IID WS, NWIS-DV WS, ASOS WS, STORET WS, TCEQ WS, BearRiver WS, ..., My new WS (more WS from the ODM-WS template)
  - Source agencies: USGS, NCDC, EPA, TCEQ
- DASH Web Application
  - Web configuration file: stores information about registered networks (WSDLs, web service URLs, connection strings)
  - MXD: stores information about layers (layer info, symbology, etc.)
- ODM DataLoader
[Diagram: registration steps numbered 1 to 6]
21
HIS Scalability
- Adding data types and datasets, processing models and services, servers, users and roles shall not create unmanageable bottlenecks that require system re-engineering
- Designing for scalability:
  - Distilling a generic set of web service signatures; resolving semantic and structural heterogeneities
  - Using ODM as a common generic format for time series data, for ease of coding and uniform search interfaces
  - DASH GUI design to abstract specifics of disparate repositories
  - Leveraging common CI components developed in GEON
  - Working with agencies to remove web service bottlenecks
22
Near future
- Deployment at the 11 WATERS test beds, and beyond
  - And documenting experience
  - Organizing HIS support
- Working with federal and state agencies on web services
  - NCDC, USGS, EPA, state agencies (e.g. TCEQ)
- Analysis services for site catalogs and ODMs (see next slide)
- OGC connections:
  - WaterML is an OGC Discussion Paper (approved at the April 2007 TC Meeting)
  - Needs to be reviewed further, based on initial implementation
  - Internationalization (with CSIRO WRON, European WISE, H2OML)
  - Carry CUAHSI WaterML messages over O&M, as an O&M profile
- Towards WaterML and web services 1.1
23
US Map of USGS Observations (map insets: Antarctica, Puerto Rico, Hawaii, Alaska)
24
US Map of USGS Observations – by Mean Period of Record
25
Different types of nutrients by decade: Available Data Total
26
Some physical properties by decade: Available Data Total
27
Same without discharge, gage height, temperature and precipitation (the four most common, in that order): Available Data Total
29
SDSC Spatial Information Systems Lab
- Research and system development
  - Services-based spatial information integration infrastructure
  - Mediation services for spatial data, query processing, map assembly services
  - Long-term spatial data preservation
  - Spatial data standards and technologies for online mapping (SVG, WMS/WFS)
- Support of spatial data projects and services at SDSC and beyond
  - In geosciences (GEON, CUAHSI, CBEO,…)
  - In regional development (NIEHS SBRP, Katrina)
  - In neurosciences (BIRN, CCDB)
http://scirad.sdsc.edu/datatech/si.html
Contact: zaslavsk@sdsc.edu
30
Links and Acknowledgments
- The CUAHSI HIS project:
  - http://www.cuahsi.org/his/ (main site)
  - http://water.sdsc.edu (central development server)
- Many thanks to Microsoft Research for partly sponsoring this trip