Published by Janel Berry. Modified over 9 years ago.
1
A Semantically-Enabled Provenance-Aware Water Quality Portal. Joint work with: Jin Guang Zheng, Ping Wang, Evan Patton, Timothy Lebo, Joanne Luciano. Deborah L. McGuinness, Tetherless World Senior Constellation Chair, Professor of Computer and Cognitive Science, Rensselaer Polytechnic Institute, Troy, NY, USA
2
Introduction Real-life motivating example: –In 2009, in Bristol County, Rhode Island, children started getting sick with symptoms such as diarrhea. The cause was found to be polluted water. –Public concerns: "When did the contamination begin?", "How did this happen?", "How can we keep it from happening again?" –We need an environmental informatics system that can automatically integrate and analyze water quality data.
3
Challenges 1. Raw data come from multiple sources in different formats, which makes them difficult to integrate and query. 2. The semantics of the water quality data are not explicitly encoded in the data, so machines cannot process the data automatically. 3. The data volume is large because of the large spatial region, long time span, and large number of pollutants and regulated limits, so analysis can be time-consuming and complex.
4
TWC-SWQP Identifies point sources of water pollution, including water sites monitored by USGS and polluting facilities regulated by EPA. Demonstrates the effectiveness of semantic web technologies in addressing the challenges faced by environmental informatics systems. Enables and empowers citizens and scientists to better explore water-related information.
5
System Architecture [Figure: system architecture diagram; the legible labels are "access" and "Virtuoso".]
6
SemantAQUA Workflow [Figure: workflow diagram with steps labeled archive, derive, integrate, publish, and visualize, and conversion steps labeled "CSV2RDF4LOD Direct" and "CSV2RDF4LOD Enhance".]
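The workflow turns tabular source data into RDF before enhancement and integration. As a rough illustration of that kind of conversion, here is a minimal Python sketch that turns each CSV row into N-Triples, one triple per non-empty cell. It is not CSV2RDF4LOD itself: the `example.org` namespace, the `site/` and `vocab/` paths, and the function names are hypothetical.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for illustration only; the slide does not give
# the IRIs that SWQP or CSV2RDF4LOD actually mint.
BASE = "http://example.org/swqp/"


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Turn each CSV row into N-Triples: the id column names the subject and
    every other non-empty cell becomes one (predicate, plain literal) pair."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}site/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key column itself and empty cells
            predicate = f"<{BASE}vocab/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples
```

For a row `RI-001,Arsenic,0.02` with `site_id` as the key column, this yields two triples, one for `characteristic` and one for `value`. Values stay untyped plain literals in this sketch; assigning datatypes and shared vocabulary terms would be left to a separate enhancement pass.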
7
Ontology Core TWC Water Ontology –Extends existing best-practice ontologies, e.g., SWEET and OWL-Time. –Includes terms for relevant pollution concepts. –Can be used to conclude that "any water source that has a measurement outside of its allowable range" is a polluted water source. Figure: Portion of the TWC Water Ontology.
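The core rule can be approximated procedurally. The Python sketch below classifies a water source as polluted when any of its measurements falls outside an allowable range. It is only a stand-in for the OWL inference the portal performs with a reasoner; the `Measurement` class, the function name, and the pH range in the table are illustrative assumptions, not values from the TWC Water Ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    source: str          # water source identifier
    characteristic: str  # what was measured, e.g. "pH"
    value: float


# Hypothetical allowable ranges (low, high) for illustration; in SWQP the
# ranges come from the regulation data, not from a hard-coded table.
ALLOWABLE_RANGES = {"pH": (6.5, 8.5)}


def polluted_sources(measurements, ranges=ALLOWABLE_RANGES):
    """Apply the core rule: a water source with any measurement outside its
    allowable range is classified as polluted."""
    polluted = set()
    for m in measurements:
        bounds = ranges.get(m.characteristic)
        if bounds is None:
            continue  # no allowable range known: draw no conclusion
        low, high = bounds
        if not low <= m.value <= high:
            polluted.add(m.source)
    return polluted
```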
8
Ontology Regulation Ontology –Models the federal and state water quality regulations for drinking water sources. –Can be used to define limits; for example, in California, 0.01 mg/L is the limit for arsenic. –Combined with the core ontology, it lets us infer that "any water source containing more than 0.01 mg/L of arsenic is a polluted water source." Figure: Portion of the California Regulation Ontology.
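Combining a state regulation with the core rule amounts to looking up the applicable limit and comparing the measurement against it. A minimal sketch, assuming regulations can be flattened into a lookup table keyed by state and characteristic; only the California arsenic limit of 0.01 mg/L comes from the slide, and `REGULATIONS` and `violates` are hypothetical names.

```python
# Regulation data as a lookup table: state -> characteristic -> (limit, unit).
# Only the California arsenic limit (0.01 mg/L) comes from the slide; the
# table layout and state codes are assumptions for illustration.
REGULATIONS = {
    "CA": {"Arsenic": (0.01, "mg/L")},
}


def violates(state: str, characteristic: str, value: float, unit: str,
             regulations=REGULATIONS) -> bool:
    """True if the measurement exceeds the state's limit for that characteristic."""
    rule = regulations.get(state, {}).get(characteristic)
    if rule is None:
        return False  # this regulation sets no limit for the characteristic
    limit, limit_unit = rule
    if unit != limit_unit:
        # Compare only values in the same unit; this sketch does no conversion.
        raise ValueError(f"unit mismatch: {unit} vs {limit_unit}")
    return value > limit
```

A measurement of exactly 0.01 mg/L is not flagged, since only values above the limit count as violations.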
9
Provenance Provenance is preserved in the Proof Markup Language (PML). Data-source-level provenance: –The captured provenance data are used to support provenance-based queries. Reasoning-level provenance: –When a water source has been marked as polluted, users can access the supporting provenance data for the explanation, including the URLs of the source data, intermediate data, and converted data.
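To show what reasoning-level provenance gives a user, the sketch below pairs a conclusion with the URLs of the data behind it and renders them as an explanation. It is a simplified Python stand-in, not the Proof Markup Language: the `Explanation` class, its field names, and the example URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Explanation:
    """Simplified stand-in for a PML justification: a conclusion plus the
    URLs of the data it was derived from. Not PML's actual vocabulary."""
    conclusion: str
    source_data: list[str] = field(default_factory=list)
    intermediate_data: list[str] = field(default_factory=list)
    converted_data: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the conclusion followed by each supporting artifact URL."""
        lines = [f"Conclusion: {self.conclusion}"]
        for label, urls in (("Source data", self.source_data),
                            ("Intermediate data", self.intermediate_data),
                            ("Converted data", self.converted_data)):
            lines.extend(f"  {label}: {url}" for url in urls)
        return "\n".join(lines)
```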
10
Visualization 1. Presents analyzed results on Google Maps. 2. Presents an explanation of why a water source is polluted. 3. Presents possible health effects of the contaminant. 4. Presents a faceted filter for selecting the type of data. 5. Presents links to the authorities, where users can report problems. [Screenshot of http://was.tw.rpi.edu/swqp/map.html with callouts 1 to 5 marking these features.]
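Feature 4, the faceted filter, can be modeled as keeping the records whose value for each selected facet is among the chosen options. A minimal sketch, assuming each map marker is a dictionary of facet values; the facet names `kind` and `status` are hypothetical, not the portal's actual facets.

```python
def facet_filter(records, **selected):
    """Keep records whose value for every selected facet is one of the
    chosen options; facets that are not selected leave records unconstrained."""
    return [
        r for r in records
        if all(r.get(facet) in options for facet, options in selected.items())
    ]
```

Calling `facet_filter(markers, kind={"EPA facility"}, status={"polluted"})` would keep only polluting facilities; with no facets selected, every record passes.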
11
Visualization Time series visualization: –Presents data as a time series so users can explore and analyze it. [Screenshot annotation: limit value 15; violation, measured value 50.] http://was.tw.rpi.edu/swqp/trend/epaTrend.html?state=RI&county=3&site=http%3A%2F%2Ftw2.tw.rpi.edu%2Fzhengj3%2Fowl%2Fepa.owl%23facility-110000312135
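The annotated point in the time series is a violation because its measured value exceeds the limit: 50 - 15 = 35 units over the limit (the slide does not state the unit). A minimal sketch of flagging such points, assuming the series is a list of (date, value) pairs; the dates and the 12.0 reading are hypothetical.

```python
from datetime import date

# Hypothetical dates and first reading; only the limit (15) and the
# violating value (50) appear on the slide.
EXAMPLE_SERIES = [(date(2010, 1, 1), 12.0), (date(2010, 2, 1), 50.0)]
EXAMPLE_LIMIT = 15.0


def limit_violations(series, limit):
    """Return (date, measured value, excess over limit) for every point in a
    time series whose measured value exceeds the limit."""
    return [(day, value, value - limit) for day, value in series if value > limit]
```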
12
Demo
13
Data EPA Data: –Provides measurements of pollutants in the water discharged by facilities, along with threshold values for up to five test types for each pollutant. USGS Data: –Provides measurements of substances contained in water samples collected at USGS data-collection stations. Regulation Data: –Provides lists of pollutants and their maximum contaminant levels.
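Because the EPA data give a separate threshold for each test type, a facility measurement can be compared test type by test type. A minimal sketch, assuming measurements and thresholds are both keyed by test type; the test-type names in the example are hypothetical.

```python
MAX_TEST_TYPES = 5  # the EPA data give thresholds for up to five test types


def exceeded_test_types(measured: dict[str, float],
                        thresholds: dict[str, float]) -> list[str]:
    """Compare a facility's discharge measurements with a pollutant's
    thresholds, test type by test type, and return the exceeded test types."""
    if len(thresholds) > MAX_TEST_TYPES:
        raise ValueError("expected at most five test types per pollutant")
    return sorted(
        test for test, limit in thresholds.items()
        if test in measured and measured[test] > limit
    )
```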
14
Selected Follow-up Options [Figure: screenshot of follow-up options for a limit violation.]
15
Results Semantic data integration provides an effective, low-cost approach for integrating data from various sources. SWQP integrates data from various sources, including EPA, USGS, and state governments. Linking to external data: "twcwater:Arsenic" is linked to "dbpedia:Arsenic" using owl:sameAs. We generated 89.58 million triples for the USGS datasets and 105.99 million triples for the EPA datasets. This required only two person-days.
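An `owl:sameAs` link is a single RDF triple. The sketch below emits one in N-Triples form. The `owl:sameAs` and DBpedia resource IRIs are the standard ones; the `twcwater` namespace IRI is not given on the slide, so an `example.org` placeholder stands in for it.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
# Placeholder: the twcwater namespace IRI is not given on the slide.
TWCWATER = "http://example.org/twcwater#"


def same_as_triple(local_name: str, dbpedia_name: str) -> str:
    """Emit one N-Triples statement linking a local term to its DBpedia resource."""
    return f"<{TWCWATER}{local_name}> <{OWL_SAME_AS}> <{DBPEDIA_RESOURCE}{dbpedia_name}> ."
```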
16
Results Query and reasoning supported by semantic technologies improve responsiveness and simplify the development of web applications. Because SPARQL queries narrow down the data, reasoning only needs to run over the data relevant to the one selected regulation. Reasoning reduces the complexity of the queries a developer needs to write for software applications.
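The division of labor described above can be sketched as two steps: a query step that keeps only the measurements the selected regulation covers, and a reasoning step that classifies sources using that subset alone. Plain Python functions stand in for SPARQL and the reasoner here; the tuple layout and function names are assumptions.

```python
def narrow(measurements, regulation):
    """Query step: keep only (source, characteristic, value) rows whose
    characteristic the selected regulation actually limits."""
    return [row for row in measurements if row[1] in regulation]


def polluted(measurements, regulation):
    """Reasoning step, applied to the narrowed subset only."""
    return {
        source
        for source, characteristic, value in narrow(measurements, regulation)
        if value > regulation[characteristic]
    }
```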
17
Results Provenance information encoded using semantic web technology supports transparency and trust. SWQP provides detailed provenance information: –original data, intermediate data, and data sources. "What if" scenarios: –Users may trust data only from certain authorities. –Users can apply a stricter regulation from another state to a local water source.
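A "what if" re-evaluation combines both scenarios: drop data from untrusted authorities, then apply another state's limits. A minimal sketch; the state names, their limits, and the record fields are all hypothetical.

```python
# Hypothetical regulations: state names and limits are illustrative only.
REGULATIONS = {
    "StateA": {"Arsenic": 0.010},
    "StateB": {"Arsenic": 0.005},  # the "stricter" regulation
}


def what_if(measurements, state, trusted_authorities, regulations=REGULATIONS):
    """Re-run the pollution check using only data from trusted authorities and
    the limits of the chosen state's regulation."""
    limits = regulations[state]
    return {
        m["source"] for m in measurements
        if m["authority"] in trusted_authorities
        and m["characteristic"] in limits
        and m["value"] > limits[m["characteristic"]]
    }
```

With the stricter `StateB` limit, a reading of 0.008 that passes under `StateA` is flagged as well.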
18
Discussion Future Work –Expand SWQP to support all 50 states. –Add flood and weather information and their effects on water sources. –Model the health effects of exposure to excessive levels of pollutants in water, and support reasoning over these effects. –Expand SWQP to other environmental topics: soil quality, air quality. –Involve the community: users can comment on each water source or report problems to the authorities.
19
Conclusion SWQP is a web portal that allows citizens and professionals to easily explore water quality information. SWQP illustrates the benefits of applying semantic web technologies to water quality research: –data integration, provenance, automatic reasoning. The SWQP architecture can easily be applied to other environmental topics: –air quality, soil quality, etc.
20
Questions? http://tw.rpi.edu/web/project/SemantAQUA http://inference-web.org/wiki/Semantic_Water_Quality_Portal
21
BACKUP SLIDES
22
Related work Other work focuses on facilitating water quality management [13, 14] and wastewater treatment [15] via knowledge sharing and reuse. –[13] presents a system that integrates water quality data from multiple sources and retrieves data using semantic relationships among the data. –[14] presents an ontology-based knowledge management system (KMS) that can be integrated into numerical flow and water quality modeling to assist in selecting a model and its pertinent parameters. –[15] is an environmental decision-support system for wastewater management that augments classic rule-based and case-based reasoning with a domain ontology. SWQP: –SWQP differs from these projects in that it supports provenance-based queries. –SWQP is built on standard semantic technologies (e.g., OWL, SPARQL, Pellet, Virtuoso) and thus can easily be replicated or expanded.
23
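Queries for Result 2
(1) Finds polluted water sources by comparing each measurement with the regulation limit for the same characteristic and unit:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
# The twcwater: prefix must also be declared; its namespace IRI is not shown on the slide.
SELECT * WHERE {
  ?watersource twcwater:hasMeasurement ?measurement .
  ?measurement twcwater:hasValue ?value ;
               twcwater:hasCharacteristic ?characteristic ;
               twcwater:hasUnit ?unit .
  ?regulation twcwater:hasValue ?limit ;
              twcwater:hasCharacteristic ?characteristic ;
              twcwater:hasUnit ?unit .
  ?watersource geo:lat ?lat ;
               geo:long ?long .
  FILTER( ?value > ?limit )
}
(2) With reasoning, polluted water sources are already classified, so the query only retrieves their locations (same prefixes as above):
SELECT * WHERE {
  ?watersource rdf:type twcwater:pollutedWaterSource ;
               geo:lat ?lat ;
               geo:long ?long .
}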
24
New Ontology New Regulation Ontology –Reuses sweet:Measurement instead of using owl:sameAs. –Defines cardinality restrictions. –Defines datatype restrictions. Figure: Portion of the EPA regulation ontology.
25
New Ontology TWC Environment Monitoring Ontology –Can be extended to use different regulations. –Uses the SWEET ontology. –More general ontology: aims to cover not just water monitoring but anything related to the environment, e.g., air quality.