Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2

Presentation transcript:

Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web

Benno Lee 1, Sumit Purohit 2, William Smith, Jesse Weaver, Alan Chappell, Patrick West, Peter Fox
(1 Rensselaer Polytechnic Institute, Troy, NY, United States)
(2 Pacific Northwest National Laboratory, Richland, WA, United States)

Poster: IN33C-3785

Glossary:
RDESC – Resource Discovery for Extreme Scale Collaboration
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute

Acknowledgments: Eric Rozell – RPI Master's student, now with Microsoft

Sponsors: Department of Energy

Abstract

The volume and variety of data generated in science is rapidly increasing. Geophysical science is no exception: various independent projects produce disparate, heterogeneous datasets. While researchers typically make this data available to others, there is a need to make these valuable resources more discoverable and understandable to user communities in order to accelerate scientific research. The cost of making data discoverable and understandable depends on how the original data was curated, transformed, generated, and published. User interfaces and visualizations that support exploration of and interaction with the data further enhance understanding of the available content.

This presentation describes research and development conducted under the Resource Discovery for Extreme Scale Collaboration (RDESC) project. As part of RDESC we curate, clean, publish, and visualize scientific data following Linked Data principles. To enable discovery and understandability, we curated data from multiple, interdisciplinary science domains and represented the metadata using standard Semantic Web and Web technologies. As a result of this transformation, we generated some 1.4 billion RDF triples that describe these previously existing data resources.

These efforts led us to formulate a number of suggested best practices for data publishers to reduce the cost of, and barriers to, making data discoverable and understandable to research communities. Additionally, we developed a set of tools that provide scalable visualizations of this large-scale metadata to make the data resources more understandable to prospective users.

Resource splash pages dynamically generated using the twsparql module.

TWC S2S Faceted browser interface allowing search for collected resources.

First attempt at curating information from various sources: crawling OPeNDAP Hyrax installations to grab resources.

Overall architecture of RDESC: curating information, trying different systems of curating, translating the information into a semantic representation, using different triple stores to store the semantic information, and trying different ways of visualizing the information.

RDESC Information Model utilizing already existing models:
FOAF – Friend of a Friend
DC – Dublin Core terms
Schema.org – common set of schemas for structured data and markup for the web

Web Presence:
RDESC web site using simple, standard web technologies: http://rdesc.org
RDESC Ontology resolvable at

Triple stores: Virtuoso, StarDog
Total number of triples currently being used: 230,743,316
Total number of triples available

Take Away:
- Multiple sources of data curated into a seamless Semantic Knowledge Store for searching, browsing, and visualization
- Information represented in a common semantic information model using RDFs
- Research into the use of various semantic technologies with billions of triples – storage, search, browse, visualization
- Best practices showing the importance of providing rich information, context, and experience with existing metadata

Future Work:
- Trying different content management systems with the large number of triples
- Distributed/Federated system

Semantically represented information flattened and pushed into Apache SOLR (left), and/or retrieved directly from the RDESC Knowledge Store (right).

From either SOLR or the S2S Faceted browser, resources are displayed within a content management system.

The difference between limited provided information (left) and semantically rich information (right).
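
The poster mentions describing resources with existing vocabularies (FOAF, Dublin Core terms, Schema.org) and flattening the semantic representation before pushing it into Apache SOLR. The following is a minimal sketch of that idea in plain Python, not the RDESC implementation: the example.org URIs, the sample values, the choice of properties, and the `flatten` helper are all assumptions made for illustration, and the result is only a dictionary shaped like a search-index document, not an actual SOLR call.

```python
from collections import defaultdict

# Namespaces of the vocabularies named on the poster.
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical resource URIs and values, for illustration only.
DATASET = "http://example.org/dataset/1"
CREATOR = "http://example.org/person/1"

# Metadata expressed as (subject, predicate, object) triples.
triples = [
    (DATASET, RDF_TYPE, SCHEMA + "Dataset"),
    (DATASET, DCT + "title", "Example sea surface temperature grid"),
    (DATASET, DCT + "creator", CREATOR),
    (DATASET, SCHEMA + "keywords", "oceanography"),
    (DATASET, SCHEMA + "keywords", "temperature"),
    (CREATOR, FOAF + "name", "Example Researcher"),
]


def local_name(uri):
    """Return the part of a URI after the last '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def flatten(triples, subject):
    """Collapse all triples about one subject into a flat, multi-valued dict."""
    # Map each named resource to its foaf:name so links can be shown as text.
    labels = {s: o for s, p, o in triples if p == FOAF + "name"}
    doc = defaultdict(list)
    for s, p, o in triples:
        if s != subject:
            continue
        # Replace a linked resource with its human-readable name when known.
        doc[local_name(p)].append(labels.get(o, o))
    result = dict(doc)
    result["id"] = subject
    return result


doc = flatten(triples, DATASET)
```

Flattening trades the graph structure for simple field lookups: the creator link becomes a plain name string that a faceted search index can filter on, while the full description remains available from the triple store.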