A Semantically-Enabled Provenance- Aware Water Quality Portal Joint work with: Jin Guang Zheng, Ping Wang, Evan Patton, Timothy Lebo, Joanne Luciano Deborah.

Slides:



Advertisements
Similar presentations
Towards Next Generation Integrative Mobile Semantic Health Information Assistants Evan W. Patton John Sheehan Yue.
Advertisements

Addressing the Challenges of Multi-Domain Data Integration with the SemantEco Framework Evan W. Patton, Patrice Seyed, Deborah L. McGuinness Presented.
Water Quality Portal Semantic e-Science Evan Patton Jin Guang Zheng Ping Wang Theodora Kampelou 11/22/2010.
Presenting Provenance Based on User Roles Experiences with a Solar Physics Data Ingest System Patrick West, James Michaelis, Peter Fox, Stephan Zednik,
Next Generation Environmental Informatics as exemplified by the Tetherless World Semantic Water Quality Portal Ping Wang 1 Jin Guang.
McGuinness – Microsoft eScience – December 8, Semantically-Enabled Science Informatics: With Supporting Knowledge Provenance and Evolution Infrastructure.
A Semantic Sommelier as an Ontology-powered Mobile Social Application and a Pedagogical Tool Deborah L. McGuinness and Evan W. Patton.
Semantic Representation of Temporal Metadata in a Virtual Observatory Han Wang 1 Eric Rozell 1
Data.gov Wiki: A Semantic Web Approach to Government Data Li Ding, Dominic DiFranzo, Sarah Magidson, Alvaro Graves, James R. Michaelis, Xian Li, Deborah.
Knowledge Provenance in Semantic Wikis Li Ding, Jie Bao, and Deborah McGuinness Tetherless World Constellation, Rensselaer Polytechnic Institute Troy,
Information Fusion: Moving from domain independent to domain literate approaches Professor Deborah L. McGuinness Tetherless World Constellation, Rensselaer.
Semantic Representation of Temporal Metadata in a Virtual Observatory Han Wang 1 Eric Rozell 1
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
TWC Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semantic Technologies Xiaogang (Marshall) Ma Tetherless World Constellation.
Semantic Similarity Computation and Concept Mapping in Earth and Environmental Science Jin Guang Zheng Xiaogang Ma Stephan.
ToolMatch: Discovering What Tools can be used to Access, Manipulate, Transform, and Visualize Data Patrick West 1 Nancy Hoebelheinrich.
Key integrating concepts Groups Formal Community Groups Ad-hoc special purpose/ interest groups Fine-grained access control and membership Linked All content.
Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator Session: Managing Ecological Data for Effective Use and Reuse Patrice Seyed.
The Linked Government Data Landscape Today data.gov and TWC LOGD Li Ding, Jim Hendler and Deborah L. McGuinness Tetherless World Constellation Rensselaer.
1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Requirements for caBIG Infrastructure to Support Semantic Workflows Yolanda.
Provenance-Aware Faceted Search Deborah L. McGuinness 1,2 Peter Fox 1 Cynthia Chang 1 Li Ding 1.
Free Open-Source, Open- Platform System for Information Mash-Up and Exploration in Earth Science Tawan Banchuen, Will Smart, Brandon Whitehead, Mark Gahegan,
Mash-up of Linked Government Data from Li Ding, Jim Hendler and Deborah L. McGuinness Tetherless World Constellation, Rensselaer Polytechnic.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,
References: [1] [2] [3] Acknowledgments:
Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (properties) Informal.
References: [1] Branch, B.D., Fosmire, M., The role of interdisciplinary GIS and data curation librarians in enhancing authentic scientific research.
Discovering accessibility, display, and manipulation of data in a data portal Nancy Hoebelheinrich Patrick West 2
TWC Adoption of RDA DTR and PID in Deep Carbon Observatory Data Portal Stephan Zednik, Xiaogang Ma, John Erickson, Patrick West, Peter Fox, & DCO-Data.
NEON non-specialist use case; Science data reuse in a classroom Peter Fox Brian Wee Patrick West 1
Future Learning Landscapes Yvan Peter – Université Lille 1 Serge Garlatti – Telecom Bretagne.
Modeling and Representing National Climate Assessment Information using Linked Data Jin Guang Zheng 1 Curt Tilmes 2
NEON non-specialist use case; Science data reuse in a classroom Peter Fox Brian Wee Patrick West 1
Tetherless World Constellation Open Government Data Jim Hendler Tetherless World Professor of Computer and Cognitive Science Assistant Dean of Information.
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
1 Advanced Semantic Technologies Prof. Deborah McGuinness and Dr. Patrice Seyed CSCI CSCI ITWS ITWS TA: Justin.
Prof. Peter #twcrpi) Tetherless World Constellation Chair, Earth and Environmental Science/ Computer Science/ Cognitive.
A Context Model based on Ontological Languages: a Proposal for Information Visualization School of Informatics Castilla-La Mancha University Ramón Hervás.
1 Semantic Provenance and Integration Peter Fox and Deborah L. McGuinness Joint work with Stephan Zednick, Patrick West, Li Ding, Cynthia Chang, … Tetherless.
Resource Discovery for Extreme Scale Collaboration Benno Lee Patrick West 1 William Smith 2
Linking Open Government Data (TWC LOGD) Li Ding, Jim Hendler and Deborah L. McGuinness Tetherless World Constellation Rensselaer Polytechnic Institute.
Tetherless World Constellation Semantic Web Science Jim Hendler Tetherless World Professor of Computer and Cognitive Science Assistant Dean of Information.
TWC-SWQP: A Semantically-Enabled Provenance-Aware Water Quality Portal Ping Wang, Jin Guang Zheng, Linyun Fu, Evan W. Patton, Timothy Lebo, Li Ding, Joanne.
Linked Open Government Data: What’s Next? Li Ding, James A. Hendler, and Deborah L. McGuinness With thanks to the entire RPI Tetherless World LOGD team:
VIVO Conference 2013 Panel on VIVO Use-Cases for Collaborative Science: From Researcher Networks to Semantic User Interfaces for Data Patrick West – Tetherless.
References: [1] Lebo, T., Sahoo, S., McGuinness, D. L. (eds.), PROV-O: The PROV Ontology. Available via: [2]
Information Modeling and Semantic Web Application For National Climate Assessment Jin Guang Zheng 1 Curt Tilmes 2
Semantic Similarity Computation and Concept Mapping in Earth and Environmental Science Jin Guang Zheng Xiaogang Ma Stephan.
A Semantic Web Approach for the Third Provenance Challenge Tetherless World Rensselaer Polytechnic Institute James Michaelis, Li Ding,
DANIELA KOLAROVA INSTITUTE OF INFORMATION TECHNOLOGIES, BAS Multimedia Semantics and the Semantic Web.
Semantic Web Portal: A Platform for Better Browsing and Visualizing Semantic Data Ying Ding et al. Jin Guang Zheng, Tetherless World Constellation.
TMO Review Jin Guang Zheng, Tetherless World Constellation.
Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (properties) Informal.
Prizms for Data Publication and Management Katie Chastain May 9, 2014.
Semantic Data Extraction for B2B Integration Syntactic-to-Semantic Middleware Bruno Silva 1, Jorge Cardoso 2 1 2
Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
Open Government Data Dominic DiFranzo PhD Student/Research Assistant Rensselaer Polytechnic Institute Tetherless World Constellation.
TWC Adoption* of RDA DTR and PIT in the Deep Carbon Observatory Data Portal Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox, & the.
Semantic Web Technologies Readings discussion Research presentations Projects & Papers discussions.
Panel: OWL Leaves the Nest Knowledge Integration for Ubiquitous Agents Harry Chen Image Matters LLC First International.
Scaling the Wall: Experiences adapting a Semantic Web application to utilize social networks on mobile devices Evan W. Patton 1 ( ) &
Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox,
Existing Designs and Prototypes at RPI
Modeling Data Set Versioning Operations
Foundations VI: Provenance
Modeling Data Set Versioning Operations
Chaitali Gupta, Madhusudhan Govindaraju
Presentation transcript:

A Semantically-Enabled Provenance- Aware Water Quality Portal Joint work with: Jin Guang Zheng, Ping Wang, Evan Patton, Timothy Lebo, Joanne Luciano Deborah L. McGuinness Tetherless World Senior Constellation Chair Professor of Computer and Cognitive Science Rensselaer Polytechnic Institute Troy, NY, USA

Introduction Real Life Motivation Example: –In 2009, in Bristol County, Rhode Island, Children start getting sick with symptom like diarrhea. The cause was found to be polluted water. –Public concerns: “When did the contamination begin?”, “How did this happen?”, “How can we keep it from happening again?” –We need an environmental informatics systems that can automatically integrate and analyze water quality.

Challenges 1.Raw data from multiple sources and in different format – difficult to integrate and query. 2.Semantics of the water quality data are not explicitly encoded in the data – machine can’t process data automatically. 3.Large amount of data due to large spatial region, long time span, and large number of pollutants and regulated limit – analysis can be time consuming and complex.

TWC-SWQP Identify point sources of water pollution, including water sites monitored by USGS and polluting facilities regulated by EPA. Demonstrates the effectiveness of semantic web technologies in addressing the challenges faced by environmental informatics systems. Enable/Enpower citizens & scientists to better explore water related information.

System Architecture access Virtuoso

SemantAQUA Workflow Archive CSV2RDF4LOD Enhance CSV2RDF4LOD Enhance derive integrate archive Publish CSV2RDF4LOD Direct CSV2RDF4LOD Direct visualize

Ontology Core TWC Water ontology –Extends existing best practice ontologies, e.g. SWEET, OWL-Time. –Includes terms for relevant pollution concepts –Can use to conclude: “any water source that has a measurement outside of its allowable range” is a polluted water source. Portion of the TWC Water Ontology.

Ontology Regulation Ontology –model the federal and state water quality regulations for drinking water sources –Can use to define: for example, in California, “any measurement has value 0.01 mg/L is the limit for Arsenic” –Combine with core ontology, we can infer “any water source contains 0.01 mg/L of Arsenic is a polluted water source.” Portion of Cal. Regulation Ontology.

Provenance Preserves provenance in the Proof Markup Language (PML). Data Source Level Provenance: –The captured provenance data are used to support provenance-based queries. Reasoning level provenance: –When water source been marked as polluted, user can access supporting provenance data for the explanations including the URLs of the source data, intermediate data and the converted data.

Visualization 1.Presents analyzed results with Google Map 2.Presents explanation of water source pollution 3.Presents possible health effect of contaminant 4.Presents “Facet” type filter to select type of data 5.Presents link to the authority, where user can report problems

Visualization Time series Visualization: –Presents data in time series visualization for user to explore and analyze the data Limit value: 15 Violation, measured value: 50 l%2Fepa.owl%23facility

Demo

Data EPA Data: –Provides measurements of pollutants in the water discharged by the facilities, and also the threshold values for up to five test types for each pollutant. USGS Data: –Provides measurements of substances contained in water samples collected at USGS data-collection stations Regulation Data: –Provides lists of pollutants and their maximum contaminant level

Selected Follow-up options Limit Violation

Results Semantic Data Integration provides an effective and low cost approach for integrating data from various sources. SWQP integrates data from various sources, including EPA, USGS, and state governments. Linking to external data: “twcwater:Arsenic”, linked to “dbpedia:Arsenic” using owl:sameAs. We have generated million triples for the USGS datasets and million triples for the EPA datasets. Requires only 2-person days.

Results Query and reasoning supported by semantic technologies improves responsiveness and simplifies the development of web applications. SPARQL queries narrows down the data, we can reason over only the relevant data on one selected regulation. Reasoning eases the complexity of queries a developer needs to write for software applications.

Results Provenance information encoded using semantic web technology supports transparency and trust. SWQP provides detailed provenance information: –Original data, intermediate data, data source “What if” Senario: user may trust data from certain authorities only. –User can apply a stricter regulation from another state to a local water source.

Discussion Future Work –Expand SWQP to support all 50 states. –Add flood/weather information, and their effect on water sources –model the health effects from exposure to the excessive pollutants in water and support reasoning over these effects. –Expand SWQP to other environmental topics: soil quality, air quality –Get community involved: user can put comment on each water source, or report problem to the authorities.

Conclusion SWQP is a web portal that allows citizens and professionals to easily explore water quality information. SWQP illustrated benefits of applying semantic web technologies to water quality research. –Data integration, provenance, automatic reasoning. Architecture of SWQP can be easily apply to other environment topics –Air quality, soil quality, etc.

Questions?

BACKUP SLIDES

Related work Other work focuses on facilitating water quality management [13, 14] and wastewater treatment [15] via knowledge sharing and reuse. –[13] presents system that integrates water quality data from multiple sources and retrieves data using semantic relationships among data. –[14] presented an ontology-based Knowledge Management system (KMS) that can be integrated into the numerical flow and water quality modeling to provide assistance on the selection of a model and its pertinent parameters –[15] is an environmental decision-support system for wastewater management, which augments classic rule-based and case-based reasoning with a domain ontology. SWQP: –SWQP differs from these projects in that it supports provenance based query. –SWQP is built upon standard semantic technologies (e.g. OWL, SPARQL, Pellet, Virtuoso) and thus can be easily replicated or expanded.

Queries for result 2 SELECT * WHERE { ?watersource twcwater:hasMeasurement ?measurement. ?measurement twcwater:hasValue ?value; twcwater:hasCharacteristic ?charactericsitc; twcwater:hasUnit ?unit. (1) ?regulation twcwater:hasValue ?limit; twcwater:hasCharacteristic ?characteristic; twcwater:hasUnit ?unit. ?watersource geo:lat ?lat; geo:long ?long. FILTER( ?value > limit ) } SELECT * WHERE { ?watersource rdf:type twcwater:pollutedWaterSource. geo:lat ?lat; (2) geo:long ?long. }

New Ontology New Regulation ontology –Reuse sweet:Measurement instead of use owl:sameAs –Defines cardinality restriction –Defines Datatype restriction Portion of the EPA regulation ontology

New Ontology TWC Environment Monitoring Ontology –Can be extended to use different regulation –Uses sweet ontology –More general ontology: aim for not just monitoring water, but anything relate to environment: air quality.