Sharing and publishing data using CUAHSI HIS


1 Sharing and publishing data using CUAHSI HIS. Outline: HIS data publication system; WaterML and WaterOneFlow web services; Observations data model (ODM); Data loading; Data editing and quality control; Controlled vocabularies; HIS Central registration and tagging

2 HIS Data Publication System (diagram labels)
Data collection: Sensors, Telemetry Network, Base Station Computer(s)
Data files: Excel, Text
Loading: ODM Data Loader, Streaming Data Loader
Storage and management: ODM Database; Query, Visualize, and Edit data using ODM Tools
Publication: WaterOneFlow Web Service (GetSites, GetSiteInfo, GetVariableInfo, GetValues), WaterML
HIS Central: Water Metadata Catalog, Harvester, Service Registry, Hydrotagger; Contribute your ODM
Discovery: Hydroseek
Access: HydroExcel, HydroGet, HydroLink, HydroObjects
Analysis: GIS, Matlab, Splus, R, IDL, Java, C++, VB

3 Steps in publishing data
1. Establish an HIS Server
2. Load observations into an ODM database
3. Provide access to data through web services (http:// / /cuahsi_1_0.asmx?WSDL)
4. Index the resulting water data service at HIS Central (http://hiscentral.cuahsi.org)

4 Establishing an HIS Server
Windows server platform
Base software: Microsoft SQL Server and ArcGIS Server
HIS Server applications
–WaterOneFlow web services
–ODM + tools
–DASH
(diagram label: HIS Data)
http://his.cuahsi.org/hisserver.html

5 Load Observations into an ODM Database. Observation types loaded into ODM: soil moisture data, streamflow, flux tower data, groundwater levels, water quality, precipitation and climate.

6 Outline: HIS data publication system; WaterML and WaterOneFlow web services; Observations data model (ODM); Data loading; Data editing and quality control; Controlled vocabularies; HIS Central registration and tagging

7 WaterML and WaterOneFlow. WaterML is an XML language for communicating water data. WaterOneFlow is a set of web services based on WaterML. Diagram: data from the repositories (TCEQ, UT, USGS) are extracted, transformed, and loaded into WaterML; a client asks the WaterOneFlow web service about locations, variables, and time through GetSiteInfo, GetVariableInfo, and GetValues. Slide from David Valentine

8 WaterOneFlow Web Services. Web application: Data Portal. Your application: Excel, ArcGIS, Matlab, Fortran, C/C++, Visual Basic, hydrologic model, ……………. Your operating system: Windows, Unix, Linux, Mac. Applications reach the WaterOneFlow web services over the Internet through a web services library, using the Simple Object Access Protocol (SOAP). Slide from David Valentine
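Because the services are reached over SOAP, any language that can build and post XML can act as a client. The sketch below only constructs a request envelope for GetValues with the Python standard library and sends nothing. The service namespace, the parameter names (location, variable, startDate, endDate, authToken), and the site and variable codes are assumptions for illustration; check them against the WSDL of the actual service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumed WaterOneFlow 1.0 service namespace; verify against the service WSDL.
WOF_NS = "http://www.cuahsi.org/his/1.0/ws/"


def build_get_values_envelope(location, variable, start_date, end_date, auth_token=""):
    """Return a SOAP 1.1 envelope (as a string) for a GetValues request."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{WOF_NS}}}GetValues")
    # Parameter names are assumptions, not taken from the slides.
    params = {
        "location": location,
        "variable": variable,
        "startDate": start_date,
        "endDate": end_date,
        "authToken": auth_token,
    }
    for name, text in params.items():
        ET.SubElement(request, f"{{{WOF_NS}}}{name}").text = text
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical network, site, and variable codes.
envelope_xml = build_get_values_envelope(
    "ExampleNetwork:SITE_1", "ExampleNetwork:VAR_1", "2007-10-01", "2007-10-02"
)
```

The resulting string would be posted to the service endpoint with an HTTP client; a SOAP toolkit that reads the WSDL can generate the same request automatically.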

9 WaterOneFlow: a set of query functions that returns data in WaterML. Services: NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM12K, USGS SNOTEL, ODM (multiple sites). Slide from David Valentine

10 WaterML design principles
Goal - capture semantics of hydrologic observations discovery and retrieval
Role - exchange schema for CUAHSI web services
Driven by
–Hydrologists (community review)
–ODM
–USGS NWIS, EPA STORET, Academic Sources
Conformance with Open Geospatial Consortium standards: http://www.opengeospatial.org/
For XSD pros, the WaterML schema is at http://his.cuahsi.org/wofws.html
Slide from David Valentine

11 Point Observations Information Model
Hierarchy: Data Source, Network, Sites, Variables, Values {Value, Time, Qualifier, Offset}
Example: Utah State University; Little Bear River; Little Bear River at Mendon Rd; Dissolved Oxygen; 9.78 mg/L, 1 October 2007, 6PM
A data source operates and provides data to an observation network.
A network is a set of observation sites (stored in a single ODM instance).
A site is a point location where one or more variables are measured.
A variable is a measured property (e.g. describing the flow or quality of water).
A value is an observation of a variable at a particular time.
A qualifier is a symbol that provides additional information about the value.
An offset allows specification of measurements at various depths in water.
Web service operations: GetSites, GetSiteInfo, GetVariableInfo, GetValues
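The hierarchy in the information model can be sketched as nested record types. This is a minimal illustration using Python dataclasses, not the WaterML schema or the ODM table design; the class and field names are assumptions chosen to mirror the terms on the slide, and the instance reproduces the Little Bear River example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Variable:
    name: str   # measured property, e.g. "Dissolved Oxygen"
    units: str


@dataclass
class Value:
    variable: Variable
    value: float
    time: datetime
    qualifier: Optional[str] = None  # symbol giving extra information about the value
    offset: Optional[float] = None   # e.g. depth in water at which it was measured


@dataclass
class Site:
    name: str
    values: List[Value] = field(default_factory=list)


@dataclass
class Network:
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    networks: List[Network] = field(default_factory=list)


dissolved_oxygen = Variable(name="Dissolved Oxygen", units="mg/L")
mendon_rd = Site(
    name="Little Bear River at Mendon Rd",
    values=[Value(dissolved_oxygen, 9.78, datetime(2007, 10, 1, 18, 0))],
)
usu = DataSource(
    name="Utah State University",
    networks=[Network(name="Little Bear River", sites=[mendon_rd])],
)
```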

12 Building Blocks of WaterML Responses. Response types: Sites, Variables, TimeSeries. Key elements: –site –sourceInfo –seriesCatalog –variable –value –queryInfo. Operations: GetValues, GetVariableInfo, GetSiteInfo, GetSites. Slide from David Valentine

13 Sites response. Elements: queryInfo; site (name, code, location); seriesCatalog (variables; series: how many values and when, as TimePeriodType). Slide from David Valentine

14 VariablesResponseType. variable – same as in the series element (code, name, units). Diagram labels: Sites, Variables, Values. Slide from David Valentine

15 GetValues response - timeSeries. Elements: queryInfo; timeSeries, containing –sourceInfo (“where”) –variable (“what”) –values. Diagram labels: Sites, Variables, Values. Slide from David Valentine

16 Values. Each time series value is recorded in a value element. The timestamp (ISO time), plus metadata for the value such as a qualifier, is recorded in the element’s attributes. Slide from David Valentine
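Reading such value elements might look like the sketch below. The XML fragment is a simplified, made-up stand-in rather than a real WaterML response: it omits namespaces, and the attribute names dateTime and qualifiers, the qualifier code "P", and the second reading are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Simplified, hypothetical fragment; a real response carries namespaces and more metadata.
SAMPLE = """
<timeSeries>
  <values>
    <value dateTime="2007-10-01T18:00:00" qualifiers="P">9.78</value>
    <value dateTime="2007-10-01T18:30:00">9.75</value>
  </values>
</timeSeries>
"""


def read_values(xml_text):
    """Return a list of (timestamp, value, qualifier) tuples from value elements."""
    root = ET.fromstring(xml_text.strip())
    observations = []
    for element in root.iter("value"):
        timestamp = datetime.fromisoformat(element.get("dateTime"))
        qualifier = element.get("qualifiers")  # None when the attribute is absent
        observations.append((timestamp, float(element.text), qualifier))
    return observations


observations = read_values(SAMPLE)
```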

17 Outline: HIS data publication system; WaterML and WaterOneFlow web services; Observations data model (ODM); Data loading; Data editing and quality control; Controlled vocabularies; HIS Central registration and tagging

18 Why an Observations Data Model
Syntactic heterogeneity (file types and formats)
Semantic heterogeneity
–Language for observation attributes (structural)
–Language to encode observation attribute values (contextual)
Publishing and sharing research data
Metadata to facilitate unambiguous interpretation
Enhance analysis capability

19 Scope. Focus on hydrologic observations made at a point. Exclude remote sensing or grid data: these are part of a digital watershed but are not suitable for an atomic database model and individual value queries. Primarily store raw observations and simple derived information to get data into its most usable form. Limit inclusion of extensively synthesized information and model outputs at this stage.

20 What are the basic attributes to be associated with each single data value and how can these best be organized? Value; DateTime; Variable; Location; Units; Interval (support); Accuracy; Offset; OffsetType/Reference Point; Source/Organization; Censoring; Data Qualifying Comments; Method; Quality Control Level; Sample Medium; Value Type; Data Type

21 CUAHSI Observations Data Model. A relational database at the single observation level (atomic model). Stores observation data made at points (streamflow, flux tower data, precipitation and climate, groundwater levels, water quality, soil moisture data). Metadata for unambiguous interpretation. Traceable heritage from raw measurements to usable information. Standard format for data sharing. Cross dimension retrieval and analysis. A data value v_i(s, t) is indexed by space s (“Where”), time t (“When”), and variable V_i (“What”).
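The atomic model and cross-dimension retrieval can be illustrated with a heavily reduced schema. The sketch uses SQLite in memory as a stand-in for the SQL Server database and keeps only a few columns; the full ODM schema has many more tables and fields, and the site codes, variable codes, and all values except 9.78 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteCode TEXT, SiteName TEXT);
CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, VariableCode TEXT, VariableName TEXT);
CREATE TABLE DataValues (
    ValueID INTEGER PRIMARY KEY,
    DataValue REAL,
    LocalDateTime TEXT,
    SiteID INTEGER REFERENCES Sites(SiteID),
    VariableID INTEGER REFERENCES Variables(VariableID)
);
""")
conn.executemany("INSERT INTO Sites VALUES (?, ?, ?)",
                 [(1, "SITE_A", "Example site A"), (2, "SITE_B", "Example site B")])
conn.executemany("INSERT INTO Variables VALUES (?, ?, ?)",
                 [(1, "DO", "Dissolved oxygen"), (2, "TEMP", "Temperature")])
# One row per observation: the value plus its "where", "what", and "when".
conn.executemany(
    "INSERT INTO DataValues (DataValue, LocalDateTime, SiteID, VariableID) VALUES (?, ?, ?, ?)",
    [(9.78, "2007-10-01 18:00", 1, 1),
     (9.60, "2007-10-01 18:00", 2, 1),
     (12.1, "2007-10-01 18:00", 1, 2),
     (9.70, "2007-10-01 18:30", 1, 1)],
)


def values_across_sites(variable_code, local_datetime):
    """Fix 'what' and 'when'; vary 'where'."""
    return conn.execute(
        """SELECT s.SiteCode, d.DataValue
           FROM DataValues d
           JOIN Sites s ON s.SiteID = d.SiteID
           JOIN Variables v ON v.VariableID = d.VariableID
           WHERE v.VariableCode = ? AND d.LocalDateTime = ?
           ORDER BY s.SiteCode""",
        (variable_code, local_datetime),
    ).fetchall()


def time_series(site_code, variable_code):
    """Fix 'where' and 'what'; vary 'when'."""
    return conn.execute(
        """SELECT d.LocalDateTime, d.DataValue
           FROM DataValues d
           JOIN Sites s ON s.SiteID = d.SiteID
           JOIN Variables v ON v.VariableID = d.VariableID
           WHERE s.SiteCode = ? AND v.VariableCode = ?
           ORDER BY d.LocalDateTime""",
        (site_code, variable_code),
    ).fetchall()
```

Because every row carries its own site, variable, and time keys, the same table answers both kinds of question without restructuring.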

22 CUAHSI Observations Data Model http://www.cuahsi.org/his/odm.html

23 Site Attributes
SiteCode, e.g. NWIS:10109000
SiteName, e.g. Logan River Near Logan, UT
Latitude, Longitude: geographic coordinates of site
LatLongDatum: spatial reference system of latitude and longitude
Elevation_m: elevation of the site
VerticalDatum: datum of the site elevation
Local X, Local Y: local coordinates of site
LocalProjection: spatial reference system of local coordinates
PosAccuracy_m: positional accuracy
State, e.g. Utah
County, e.g. Cache

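24 Observations Data Model: independent of, but can be coupled to, a geographic representation (ODM and Arc Hydro). Diagram: the ODM Sites table (SiteID, SiteCode, SiteName, Latitude, Longitude, …) is related one-to-one to the Arc Hydro HydroID, either directly or through a CouplingTable (SiteID, HydroID).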

25 Variable attributes
VariableName, e.g. discharge
VariableCode, e.g. NWIS:00060
SampleMedium, e.g. water
ValueType, e.g. field observation, laboratory sample
IsRegular, e.g. Yes for regular or No for intermittent
TimeSupport (averaging interval for observation)
DataType, e.g. Continuous, Instantaneous, Categorical
GeneralCategory, e.g. Climate, Water Quality
NoDataValue, e.g. -9999
Units, e.g. m³/s (Flow, cubic meters per second)
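A NoDataValue such as -9999 has to be screened out before any statistic is computed, otherwise it is treated as a real measurement. A minimal sketch, assuming the values arrive as a plain list of numbers; the function names are hypothetical.

```python
NO_DATA_VALUE = -9999  # the example sentinel from the variable attributes


def mask_no_data(values, no_data_value=NO_DATA_VALUE):
    """Replace the sentinel with None so it cannot be mistaken for a measurement."""
    return [None if v == no_data_value else v for v in values]


def mean_ignoring_no_data(values, no_data_value=NO_DATA_VALUE):
    """Average only the valid values; return None if there are none."""
    valid = [v for v in values if v != no_data_value]
    return sum(valid) / len(valid) if valid else None


readings = [1.5, -9999, 2.5]
masked = mask_no_data(readings)
mean_value = mean_ignoring_no_data(readings)
```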

26 Scale issues in the interpretation of data. The scale triplet: (a) Extent, (b) Spacing, (c) Support. Each panel plots a quantity against length or time. From: Blöschl, G., (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.

27 The effect of sampling for measurement scales not commensurate with the process scale: (a) spacing too large – noise (aliasing); (b) extent too small – trend; (c) support too large – smoothing out. From: Blöschl, G., (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.
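Two of these effects can be reproduced numerically with a synthetic signal. The sketch below assumes an idealized process with a 24-hour cycle sampled hourly; it is an illustration of the idea, not an analysis from the cited work.

```python
import math


def signal(t_hours):
    """Idealized process with a 24-hour period and unit amplitude."""
    return math.sin(2 * math.pi * t_hours / 24.0)


hourly = [signal(t) for t in range(72)]  # three days at 1-hour spacing


def block_average(values, support):
    """Average consecutive, non-overlapping blocks of `support` samples."""
    return [
        sum(values[i:i + support]) / support
        for i in range(0, len(values) - support + 1, support)
    ]


# Support too large: averaging over a full period smooths the cycle away.
daily_means = block_average(hourly, 24)

# Spacing too large: one sample per period makes the cycle look constant.
coarse_samples = hourly[6::24]
```

Here the daily means are all close to zero and the coarse samples are all close to one, although the underlying signal swings between -1 and 1 every day.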

28 Discharge, Stage, Concentration and Daily Average Example

29 Data Types: Continuous (frequent sampling - fine spacing); Sporadic (spot sampling - coarse spacing); Cumulative; Incremental; Average; Maximum; Minimum; Constant over Interval; Categorical

30 15 min Precipitation from NCDC. Incomplete or inexact daily total occurring. Value is not a true 24-hour amount. One or more periods are missing and/or an accumulated amount has begun but not ended during the daily period.

31 Irregularly sampled groundwater level

32 Offset. OffsetValue: distance from a datum or control point at which an observation was made. OffsetType defines the type of offset, e.g. distance below water level, distance above ground surface, or distance from bank of river.

33 Water Chemistry from a profile in a lake

34 Groups and Derived From Associations

35 Stage and Streamflow Example

36 Daily Average Discharge Example Daily Average Discharge Derived from 15 Minute Discharge Data
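Deriving a daily average series from 15-minute data amounts to grouping values by calendar day and averaging each group. A minimal sketch with synthetic discharge values; the record layout of (timestamp, value) pairs is an assumption for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_average(records):
    """records: iterable of (datetime, value) pairs; returns {date: mean value}."""
    sums = defaultdict(lambda: [0.0, 0])  # date -> [running total, count]
    for timestamp, value in records:
        accumulator = sums[timestamp.date()]
        accumulator[0] += value
        accumulator[1] += 1
    return {day: total / count for day, (total, count) in sums.items()}


# Synthetic 15-minute series: 96 values of 10.0 on day one, 96 values of 20.0 on day two.
start = datetime(2007, 10, 1)
records = [
    (start + timedelta(minutes=15 * i), 10.0 if i < 96 else 20.0)
    for i in range(192)
]
daily = daily_average(records)
```

The derived series has a different time support (one day instead of 15 minutes), so it would be stored as its own data series rather than overwriting the source data.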

37 Methods and Samples. Method specifies the method whereby an observation is measured, e.g. streamflow using a V notch weir, TDS using a Hydrolab, sample collected in auto-sampler. SampleID is used for observations based on the laboratory analysis of a physical sample and identifies the sample from which the observation was derived. This keys to a unique LabSampleID (e.g. bottle number) and the name and description of the analytical method used by a processing lab.

38 Water Chemistry from Laboratory Sample

39 ValueAccuracy. A numeric value that quantifies measurement accuracy, defined as the nearness of a measurement to the standard or true value. This may be quantified as an average or root mean square error relative to the true value. Since the true value is not known, this should be estimated based on knowledge of the method and measurement instrument. Accuracy is distinct from precision, which quantifies reproducibility but does not refer to the standard or true value. Figure panels: Accurate; Low accuracy, but precise; Low accuracy.
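The distinction can be made concrete with two sets of repeated measurements. The numbers below are invented for illustration and assume a known true value, which in practice has to be estimated as the slide notes.

```python
import math
import statistics


def rmse(measurements, true_value):
    """Root mean square error relative to the true value (accuracy)."""
    return math.sqrt(
        sum((m - true_value) ** 2 for m in measurements) / len(measurements)
    )


true_value = 10.0
accurate = [10.1, 9.9, 10.0, 10.0]             # close to the true value
precise_but_biased = [12.0, 12.1, 11.9, 12.0]  # tightly grouped, but offset by about 2

accurate_rmse = rmse(accurate, true_value)
biased_rmse = rmse(precise_but_biased, true_value)
# Precision is the spread about the measurements' own mean, not the true value.
biased_spread = statistics.pstdev(precise_but_biased)
```

The biased set has a small spread (high precision) but a large root mean square error (low accuracy), which is the middle case in the figure.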

40 Data Quality
Qualifier Code and Description provide qualifying information about the observations, e.g. Estimated, Provisional, Derived, Holding time for analysis exceeded.
QualityControlLevel records the level of quality control that the data has been subjected to.
- Level 0. Raw Data
- Level 1. Quality Controlled Data
- Level 2. Derived Products
- Level 3. Interpreted Products
- Level 4. Knowledge Products
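Because the levels are ordered, they map naturally onto an integer enumeration that client code can filter on. A small sketch; the record layout (dictionaries with a qc_level key) is hypothetical.

```python
from enum import IntEnum


class QualityControlLevel(IntEnum):
    RAW_DATA = 0
    QUALITY_CONTROLLED_DATA = 1
    DERIVED_PRODUCTS = 2
    INTERPRETED_PRODUCTS = 3
    KNOWLEDGE_PRODUCTS = 4


def at_least(records, level):
    """Keep records whose quality control level is `level` or higher."""
    return [r for r in records if r["qc_level"] >= level]


# Made-up records for illustration.
records = [
    {"value": 9.78, "qc_level": QualityControlLevel.RAW_DATA},
    {"value": 9.80, "qc_level": QualityControlLevel.QUALITY_CONTROLLED_DATA},
    {"value": 9.79, "qc_level": QualityControlLevel.DERIVED_PRODUCTS},
]
checked = at_least(records, QualityControlLevel.QUALITY_CONTROLLED_DATA)
```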

41 Series of Observations. A “Data Series” is a set of all the observations of a particular variable at a site. The SeriesCatalog is programmatically generated to provide users with the ability to do data discovery (i.e. what data is available and where) without formulating complex queries or hitting the DataValues table, which can get very large.
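Generating such a catalog is a single pass over the data values, summarizing each site and variable pair. The sketch below is a simplified illustration: the field names follow the spirit of the SeriesCatalog but are assumptions here, the real table carries more metadata, and the input tuples are made up.

```python
def build_series_catalog(data_values):
    """data_values: iterable of (site_code, variable_code, iso_datetime, value).

    Returns {(site_code, variable_code): summary} with a count and time extent.
    ISO 8601 timestamps in one consistent format compare correctly as strings.
    """
    catalog = {}
    for site_code, variable_code, when, _value in data_values:
        entry = catalog.setdefault(
            (site_code, variable_code),
            {"ValueCount": 0, "BeginDateTime": when, "EndDateTime": when},
        )
        entry["ValueCount"] += 1
        entry["BeginDateTime"] = min(entry["BeginDateTime"], when)
        entry["EndDateTime"] = max(entry["EndDateTime"], when)
    return catalog


data_values = [
    ("SITE_A", "DO", "2007-10-01T18:30:00", 9.70),
    ("SITE_A", "DO", "2007-10-01T18:00:00", 9.78),
    ("SITE_A", "TEMP", "2007-10-01T18:00:00", 12.1),
]
catalog = build_series_catalog(data_values)
```

A discovery query then reads the small catalog instead of scanning every observation.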

42 Outline: HIS data publication system; WaterML and WaterOneFlow web services; Observations data model (ODM); Data loading; Data editing and quality control; Controlled vocabularies; HIS Central registration and tagging

43 Loading data into ODM
Interactive OD Data Loader (OD Loader)
–Loads data from spreadsheets and comma separated tables in simple format
Scheduled Data Loader (SDL)
–Loads data from datalogger files on a prescribed schedule
–Interactive configuration
SQL Server Integration Services (SSIS)
–Microsoft application accompanying SQL Server, useful for programming complex loading or data management functions

44 Sensor network data flow (diagram labels): Remote Monitoring Sites; Sensor Network; Radio Repeaters; Base Station Computer; ODM Streaming Data Loader; Central Observations Database (ODM); Internet; Applications. Data discovery, visualization, and analysis through Internet enabled applications. From Jeff Horsburgh

45 ODM Streaming Data Loader: Loading the Little Bear Sensor Data Into ODM. Map datalogger files to the ODM schema and controlled vocabularies. Automate the data loading process via scheduled updates. ODM SDL manages the periodic insertion of the streaming data into the ODM database using the mappings stored in the XML configuration file. Diagram components: Streaming Data Text Files; Base Station Computer(s); ODM SDL Mapping Wizard; XML Config File; ODM SDL Import Application; ODM. From Jeff Horsburgh
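The mapping idea can be sketched in a few lines: each column of a datalogger file is tied to a variable, and every row is expanded into one record per mapped column. This is not the ODM SDL configuration format; the column names, variable codes, site code, and file contents below are hypothetical.

```python
import csv
import io

# Hypothetical mapping from datalogger column names to variable codes.
COLUMN_MAP = {"DO_mgL": "EXAMPLE_DO", "Temp_C": "EXAMPLE_TEMP"}

# Made-up datalogger file contents.
DATALOGGER_TEXT = (
    "TIMESTAMP,DO_mgL,Temp_C\n"
    "2007-10-01 18:00,9.78,12.1\n"
    "2007-10-01 18:30,9.70,12.0\n"
)


def map_datalogger_rows(text, site_code, time_column, column_map):
    """Expand each datalogger row into one record per mapped column."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, variable_code in column_map.items():
            rows.append({
                "SiteCode": site_code,
                "VariableCode": variable_code,
                "LocalDateTime": record[time_column],
                "DataValue": float(record[column]),
            })
    return rows


rows = map_datalogger_rows(DATALOGGER_TEXT, "EXAMPLE_SITE", "TIMESTAMP", COLUMN_MAP)
```

Keeping the mapping in a separate configuration, as the loader does with its XML file, lets the same import step run unattended on a schedule.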

46 CUAHSI Observations Data Model http://www.cuahsi.org/his/odm.html Loading order, marked as steps 1 to 7 on the schema diagram: Work from Out to In. At last … And don’t forget …

47 Managing Data Within ODM - ODM Tools
Query and export – export data series and metadata
Visualize – plot and summarize data series
Edit – delete, modify, adjust, interpolate, average, etc.

48 Outline: HIS data publication system; WaterML and WaterOneFlow web services; Observations data model (ODM); Data loading; Data editing and quality control; Controlled vocabularies; HIS Central registration and tagging

49 Syntactic Heterogeneity. Multiple data sources with multiple formats (Excel files, Access files, text files, data logger files) are brought into an ODM observations database. From Jeff Horsburgh

50 Semantic Heterogeneity
General description of attribute | USGS NWIS (a) | EPA STORET (b)
Structural heterogeneity
Code for location at which data are collected | "site_no" | "Station ID"
Name of location at which data are collected | "Site" OR "Gage" | "Station Name"
Code for measured variable | "Parameter" | ? (c)
Name of measured variable | "Description" | "Characteristic Name"
Time at which the observation was made | "datetime" | "Activity Start"
Code that identifies the agency that collected the data | "agency_cd" | "Org ID"
Contextual semantic heterogeneity
Name of measured variable | "Discharge" | "Flow"
Units of measured variable | "cubic feet per second" | "cfs"
Time at which the observation was made | "2008-01-01" | "2006-04-04 00:00:00"
Latitude of location at which data are collected | "41°44'36" | "41.7188889"
Type of monitoring site | "Spring, Estuary, Lake, Surface Water" | "River/Stream"
(a) United States Geological Survey National Water Information System (http://waterdata.usgs.gov/nwis/).
(b) United States Environmental Protection Agency Storage and Retrieval System (http://www.epa.gov/storet/).
(c) An equivalent to the USGS parameter code does not exist in data retrieved from EPA STORET.
From Jeff Horsburgh
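Contextual heterogeneity of this kind is usually handled by converting each encoding to one common form before loading. The sketch below normalizes the two timestamp styles and the degrees-minutes-seconds latitude from the table; the function names and the set of accepted formats are assumptions for illustration.

```python
import re
from datetime import datetime


def dms_to_decimal(text):
    """Convert a latitude such as 41°44'36" to decimal degrees."""
    match = re.fullmatch(r'''(\d+)°(\d+)'(\d+(?:\.\d+)?)"?''', text.strip())
    if match is None:
        raise ValueError(f"not a degrees-minutes-seconds value: {text!r}")
    degrees, minutes, seconds = (float(part) for part in match.groups())
    return degrees + minutes / 60.0 + seconds / 3600.0


def parse_timestamp(text):
    """Accept either a date-only or a date-and-time string."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


latitude = dms_to_decimal("41°44'36\"")
nwis_time = parse_timestamp("2008-01-01")
storet_time = parse_timestamp("2006-04-04 00:00:00")
```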

51 Overcoming Semantic Heterogeneity
ODM Controlled Vocabulary System
–ODM CV central database
–Online submission and editing of CV terms
–Web services for broadcasting CVs
Example for Variable Name: Investigator 1: “Temperature, water”; Investigator 2: “Water Temperature”; Investigator 3: “Temperature”; Investigator 4: “Temp.”
ODM VariableNameCV terms: … Sunshine duration, Temperature, Turbidity …
From Jeff Horsburgh
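In code, a controlled vocabulary acts as a lookup that maps each investigator's wording onto one approved term and rejects anything unmapped. A minimal sketch using the terms from the slide; the synonym table and function name are hypothetical, and a real system would draw the vocabulary from the central CV database.

```python
# Excerpt of approved terms, taken from the slide.
VARIABLE_NAME_CV = {"Sunshine duration", "Temperature", "Turbidity"}

# Hypothetical synonym table for the four investigators' wording.
LOCAL_TO_CV = {
    "temperature, water": "Temperature",
    "water temperature": "Temperature",
    "temperature": "Temperature",
    "temp.": "Temperature",
}


def to_cv_term(local_name):
    """Return the controlled vocabulary term for a locally used variable name."""
    term = LOCAL_TO_CV.get(local_name.strip().lower())
    if term is None or term not in VARIABLE_NAME_CV:
        raise KeyError(f"no controlled vocabulary term for {local_name!r}")
    return term


terms = [
    to_cv_term(name)
    for name in ("Temperature, water", "Water Temperature", "Temperature", "Temp.")
]
```

Raising an error for unknown names, rather than passing them through, is what keeps new spellings from leaking into the database.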

52 Dynamic controlled vocabulary moderation system. Diagram components: Master ODM Controlled Vocabulary; ODM Website; ODM Controlled Vocabulary Moderator; ODM Data Manager; ODM Controlled Vocabulary Web Services (XML); ODM Tools; Local Server; Local ODM Database. http://his.cuahsi.org/mastercvreg.html From Jeff Horsburgh

53 Outline: HIS data publication system; WaterML and WaterOneFlow web services; Observations data model (ODM); Data loading; Data editing and quality control; Controlled vocabularies; HIS Central registration and tagging

54 Registering Web Services with HIS Central. Listing of all public data services. Enables applications like Hydroseek to discover data.

55 Tagging Variables for Data Discovery Through a Metadata Catalog. Ontology: a hierarchy of concepts. Each Variable in your data is connected to a corresponding Concept. From Michael Piasecki

56 Department of Civil, Architectural & Environmental Engineering, 6/16/2015. Tagging variables in Ontology (WATERS Network Information System). Steps
1. The WSDL for a set of ODM web services is registered in the WSDL Registry.
2. The “harvester” jumps into action and trawls through the web services at the WSDL to find and identify new variables.
3. It returns i) data updating information and ii) variable names used, and compares these to those used by HydroSeek.
From Michael Piasecki

57 Mapping onto Ontology. Steps contd.
4. New variables are manually mapped onto the appropriate ontology concept.
5. HydroSeek catalogue is updated.
From Michael Piasecki
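The comparison in step 3 and the update in step 5 reduce to a set difference and a dictionary update. The sketch below is an illustration only, not the harvester's implementation; the variable names and concept labels are made up.

```python
def find_unmapped_variables(harvested_names, concept_by_variable):
    """Return harvested variable names that have no ontology concept yet."""
    return sorted(set(harvested_names) - set(concept_by_variable))


# Hypothetical catalogue state: variable name -> ontology concept.
concept_by_variable = {
    "Discharge": "Streamflow",
    "Temperature": "Water temperature",
}

# Hypothetical names found by harvesting a newly registered service.
harvested = ["Discharge", "Turbidity", "Temperature", "Turbidity"]

new_variables = find_unmapped_variables(harvested, concept_by_variable)

# Step 4 is manual: a person chooses the concept for each new variable.
manual_mapping = {"Turbidity": "Turbidity"}
concept_by_variable.update(manual_mapping)
```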

58 Hydroseek http://www.hydroseek.org Supports search by location and type of data across multiple observation networks including NWIS, STORET, and university data

59 Summary
Generic method for publishing observational data
–Supports many types of point observational data
–Overcomes syntactic and semantic heterogeneity using a standard data model and controlled vocabularies
–Supports a national network of observatory test beds but can grow!
Web services provide programmatic machine access to data
–Work with the data in your data analysis software of choice
Internet-based applications provide user interfaces for the data and geographic context for monitoring sites

