Towards Executable Provenance Graphs for Reported Results in Research Publications Linyun Fu (ful2@rpi.edu), Xiaogang Ma (max7@rpi.edu), Patrick West (pwest@rpi.edu)

Linyun Fu (ful2@rpi.edu), Xiaogang Ma (max7@rpi.edu), Patrick West (pwest@rpi.edu), Stace Beaulieu (sbeaulieu@whoi.edu), Massimo Di Stefano (distem@rpi.edu), Peter Fox (pfox@cs.rpi.edu)

Rensselaer Polytechnic Institute, 110 8th St., Troy, NY 12180, United States
Woods Hole Oceanographic Institution, 86 Water St., Woods Hole, MA 02543, United States

Motivation

Results in research publications are often far removed from the underlying collection and analysis of data. The overarching goal of tracking provenance is to let readers understand the process the authors went through to produce the reported results from the collected data. Provenance describes the lineage of the source data and the processes that transformed it into the final results, so that readers can interpret the reported content correctly. Provenance also enables readers to evaluate the credibility of the reported results by examining the software used, the source data, and the responsible agents.

We are working towards using a provenance model to replicate the process from data transformation to the reporting of results in research publications, and even to validate scientific conclusions by allowing readers to adapt the experiments reported in papers and carry out their own studies. General provenance ontologies such as PROV-O, the W3C standard adopted in 2013 (shown below), cannot record provenance in enough detail to repeat the described process and replicate the reported results.

Past Experience

We have done ontology specialization work based on PROV-O in two past projects, GCIS-IMSAP and ECOOP. GCIS-IMSAP models and captures provenance information for the recent National Climate Assessment (NCA) draft report of the US Global Change Research Program (USGCRP). A sample provenance sequence is shown below.
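As a rough illustration of what such a sequence records, the following Python sketch stores a plain PROV-O-style graph as subject-predicate-object triples and walks the prov:wasGeneratedBy and prov:used links back from a reported figure to its source data. The ex: resource names and the helper functions `objects` and `lineage` are hypothetical and do not come from the NCA report or GCIS; a real system would keep these triples in an RDF store rather than a Python set.

```python
# A plain PROV-O-style provenance graph held as (subject, predicate, object) triples.
# All ex: resource names are hypothetical illustrations, not taken from GCIS or ECOOP.
RDF_TYPE = "rdf:type"

triples = {
    ("ex:raw_temperature", RDF_TYPE, "prov:Entity"),
    ("ex:compute_anomaly", RDF_TYPE, "prov:Activity"),
    ("ex:temperature_anomaly", RDF_TYPE, "prov:Entity"),
    ("ex:draw_figure", RDF_TYPE, "prov:Activity"),
    ("ex:report_figure", RDF_TYPE, "prov:Entity"),
    ("ex:compute_anomaly", "prov:used", "ex:raw_temperature"),
    ("ex:temperature_anomaly", "prov:wasGeneratedBy", "ex:compute_anomaly"),
    ("ex:draw_figure", "prov:used", "ex:temperature_anomaly"),
    ("ex:report_figure", "prov:wasGeneratedBy", "ex:draw_figure"),
}


def objects(subject: str, predicate: str) -> list[str]:
    """Return all objects linked from `subject` via `predicate`, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def lineage(entity: str) -> list[str]:
    """Walk wasGeneratedBy/used edges back from `entity` to its sources (depth-first)."""
    path = [entity]
    for activity in objects(entity, "prov:wasGeneratedBy"):
        path.append(activity)
        for source in objects(activity, "prov:used"):
            path.extend(lineage(source))
    return path


print(lineage("ex:report_figure"))
```

The walk recovers which activity used which entity, but ex:compute_anomaly says nothing about what computation it performed or with which software, so the graph cannot be re-run to regenerate ex:report_figure. This is the missing level of detail noted above for general provenance ontologies.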
ECOOP models and captures provenance information for the Ecosystem Status Report (ESR) of the Northeast Fisheries Science Center (NEFSC). A sample provenance sequence is shown below. Instances of the classes "prov:Entity" and "prov:Activity" are shown in yellow and blue, respectively.

For both projects, we used the "prov:Activity" class and its related properties in PROV-O directly to model the processes leading to data products. The resulting provenance graphs turned out to be too general to execute.

Approach

We define our provenance ontology for research publications, called PROV-PUB-O, by specializing the "activity" class and the "used" property of PROV-O so that the ontology can capture executable provenance in research publications. The activities of interest in preparing a research paper are the changes made to data, which fall into three categories:

Physical changes, such as data download, copying, or sharing.
Syntactical changes, such as XML-to-JSON conversion.
Semantic changes, such as data analysis and transformation.

Each of these changes corresponds to a particular way of using data. The specialized ontology not only helps describe provenance, but also enables the construction of executable provenance graphs that preserve the data product preparation process in enough detail to be replicable.

Poster: ESIP2015WINTER-PROV-PUB-O

Glossary:
PROV-O – The W3C Provenance Ontology
GCIS-IMSAP – Global Change Information System: Information Model and Semantic Application Prototypes
ECOOP – An INTEROP proposal for management of the Northeast and California Current Large Marine Ecosystems

Acknowledgements: We acknowledge Jin Zheng from RPI, Justin Goldstein, Brian Duggan and Steve Aulenbach from USGCRP, and Curt Tilmes from NASA for their support in modeling and capturing provenance in the GCIS-IMSAP project.
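The PROV-PUB-O idea of specializing activities by the kind of data change can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ontology itself: the class names PhysicalChange, SyntacticalChange and SemanticChange, the data-store keys, and the helpers xml_to_json, mean_of and replay are all hypothetical. Each activity records its input (standing in for prov:used), its output, and a callable step, so replaying the list in order regenerates every derived entity. Only the activity class is specialized here; the corresponding specializations of prov:used are not modeled.

```python
import json
import statistics
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable

# Hypothetical PROV-PUB-O-style specializations of prov:Activity. The class names
# are illustrative assumptions; the poster names only the three categories of change.


@dataclass
class Activity:
    """A prov:Activity that records how to re-execute itself, not just that it happened."""
    name: str
    used: str        # key of the input entity in the data store (stands in for prov:used)
    generated: str   # key of the output entity in the data store
    step: Callable[[object], object]

    def execute(self, store: dict) -> None:
        store[self.generated] = self.step(store[self.used])


class PhysicalChange(Activity):
    """Moves or copies data without altering its content (e.g. download, copy, share)."""


class SyntacticalChange(Activity):
    """Changes the representation but not the meaning (e.g. XML to JSON)."""


class SemanticChange(Activity):
    """Changes the content itself (e.g. analysis or transformation)."""


def xml_to_json(xml_text: str) -> str:
    """Convert <obs><v>..</v>...</obs> into a JSON list of numbers."""
    root = ET.fromstring(xml_text)
    return json.dumps([float(v.text) for v in root.iter("v")])


def mean_of(json_text: str) -> float:
    """A stand-in analysis step: the mean of the observed values."""
    return statistics.mean(json.loads(json_text))


# An executable provenance graph: activities listed in dependency order.
graph = [
    PhysicalChange("copy_raw", "remote:obs.xml", "local:obs.xml", lambda d: d),
    SyntacticalChange("xml2json", "local:obs.xml", "local:obs.json", xml_to_json),
    SemanticChange("average", "local:obs.json", "result:mean", mean_of),
]


def replay(graph: list, store: dict) -> dict:
    """Re-run every activity in order, regenerating each derived entity in `store`."""
    for activity in graph:
        activity.execute(store)
    return store


store = replay(graph, {"remote:obs.xml": "<obs><v>1.5</v><v>2.5</v><v>3.5</v></obs>"})
print(type(graph[1]).__name__, store["local:obs.json"], store["result:mean"])
```

Running the script prints `SyntacticalChange [1.5, 2.5, 3.5] 2.5`. Because each activity carries the operation it performed, a reader could substitute new source data or a different analysis step and rerun the graph, which is the kind of adaptation of reported experiments the poster aims to support.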