Annotating and Embedding Provenance in Science Data Repositories to Enable Next Generation Science Applications Deborah L. McGuinness.

Slides:



Advertisements
Similar presentations
Geoinformatics 2008 Fox Semantic Provenance 1 Semantic Provenance for Image Data Processing Peter Fox (HAO/ESSL/NCAR) Deborah McGuinness (RPI) Jose Garcia,
Advertisements

Complexity must become Linear or Decrease Smart data infrastructure: The sixth generation of mediation for data science Peter Fox 1
K S L W i n e A g e n t : Testbed Application for Semantic Web Technologies Deborah McGuinness Eric Hsu Jessica Jenkins Rob McCool Sheila McIlraith Paulo.
Explanation in GILA 2 Stanford -> RPI McGuinness, Ding January 15, 2008.
Presenting Provenance Based on User Roles Experiences with a Solar Physics Data Ingest System Patrick West, James Michaelis, Peter Fox, Stephan Zednik,
McGuinness – Microsoft eScience – December 8, Semantically-Enabled Science Informatics: With Supporting Knowledge Provenance and Evolution Infrastructure.
A Semantic Sommelier as an Ontology-powered Mobile Social Application and a Pedagogical Tool Deborah L. McGuinness and Evan W. Patton.
Provenance-aware faceted search Peter Fox Stephan Zednik Patrick West Tetherless World Constellation, RPI EGU 2010.
Semantic Representation of Temporal Metadata in a Virtual Observatory Han Wang 1 Eric Rozell 1
Jiao Tao, Li Ding, Deborah L. McGuinness Tetherless World Constellation Rensselaer Polytechnic Institute Troy, NY, USA Instance Data Evaluation on the.
Knowledge Provenance in Semantic Wikis Li Ding, Jie Bao, and Deborah McGuinness Tetherless World Constellation, Rensselaer Polytechnic Institute Troy,
Semantic Representation of Temporal Metadata in a Virtual Observatory Han Wang 1 Eric Rozell 1
Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines James Michaelis ( ), Deborah L. McGuinness
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
Linked Data and the Provenance Explosion Deborah L. McGuinness Tetherless World Constellation Chair Professor of Computer Science and Cognitive Science.
ToolMatch: Discovering What Tools can be used to Access, Manipulate, Transform, and Visualize Data Patrick West 1 Nancy Hoebelheinrich.
Key integrating concepts Groups Formal Community Groups Ad-hoc special purpose/ interest groups Fine-grained access control and membership Linked All content.
Web Explanations for Semantic Heterogeneity Discovery Pavel Shvaiko 2 nd European Semantic Web Conference (ESWC), 1 June 2005, Crete, Greece work in collaboration.
1 Foundations V: Infrastructure and Architecture, Middleware Deborah McGuinness and Peter Fox CSCI Week 9, October 27, 2008.
Provenance-Aware Faceted Search Deborah L. McGuinness 1,2 Peter Fox 1 Cynthia Chang 1 Li Ding 1.
Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study Deborah L. McGuinnessDeborah L. McGuinness, Co-Director Knowledge.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,
Provenance Capture in Data Access And Data Manipulation Software Patrick West 1 Peter Fox
References: [1] [2] [3] Acknowledgments:
Usage of `provenance’: A Tower of Babel Luc Moreau.
1 Foundations V: Infrastructure and Architecture, Middleware Deborah McGuinness and Joanne Luciano With Peter Fox and Li Ding CSCI Week 10, November.
Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (properties) Informal.
Discovering accessibility, display, and manipulation of data in a data portal Nancy Hoebelheinrich Patrick West 2
TWC Adoption of RDA DTR and PID in Deep Carbon Observatory Data Portal Stephan Zednik, Xiaogang Ma, John Erickson, Patrick West, Peter Fox, & DCO-Data.
Motivations and Challenges: Proper data management hinges on recording and maintaining “steps” applied to create data. Consumers require methods to assess.
NEON non-specialist use case; Science data reuse in a classroom Peter Fox Brian Wee Patrick West 1
NEON non-specialist use case; Science data reuse in a classroom Peter Fox Brian Wee Patrick West 1
DOAP – Description of a Project Ontology DOAP provides us with the ability to represent software, software projects, releases of software, licensing information,
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
TWC Deep Earth Computer: A Platform for Linked Science of the Deep Carbon Observatory Community Xiaogang (Marshall) Ma, Yu Chen, Han Wang, Patrick West,
Introduction to Tetherless World RPI by Jie Bao Slides will be available from:
1 Semantic Provenance and Integration Peter Fox and Deborah L. McGuinness Joint work with Stephan Zednick, Patrick West, Li Ding, Cynthia Chang, … Tetherless.
Applying Provenance Extensions to OPeNDAP Framework Patrick West, James Michaelis, Tim Lebo, Deborah L. McGuinness Rensselaer Polytechnic Institute Tetherless.
WDO-It! 101 Workshop: Creating an abstraction of a process UTEP’s Trust Laboratory NDR HP MP.
Deepcarbon.net Xiaogang (Marshall) Ma, Yu Chen, Han Wang, John Erickson, Patrick West, Peter Fox Tetherless World Constellation Rensselaer Polytechnic.
TWC Adoption of RDA DTR and PID in Deep Carbon Observatory Data Portal Stephan Zednik, Xiaogang Ma, John Erickson, Patrick West, Peter Fox, & DCO-Data.
WDO-It! 102 Workshop: Using an abstraction of a process to capture provenance UTEP’s Trust Laboratory NDR HP MP.
1 Foundations VI: Provenance Deborah McGuinness and Peter Fox CSCI Week 12, November 30, 2009.
The VIRTUAL SOLAR-TERRESTRIAL OBSERVATORY - Exploring paradigms for interdisciplinary data-driven science Peter Fox 1 Don Middleton 2,
123 Jiao Tao 1, Li Ding 2, Deborah L. McGuinness 3 Tetherless World Constellation Rensselaer Polytechnic Institute Troy, NY, USA 1 PhD Student 2 Postdoctoral.
Enabling Explanations: The Inference Web and PML Approach Deborah McGuinness, Paulo Pinheiro da Silva, Li Ding Knowledge Systems Laboratory Stanford University.
VIVO Conference 2013 Panel on VIVO Use-Cases for Collaborative Science: From Researcher Networks to Semantic User Interfaces for Data Patrick West – Tetherless.
References: [1] Lebo, T., Sahoo, S., McGuinness, D. L. (eds.), PROV-O: The PROV Ontology. Available via: [2]
Supported in part by the National Science Foundation under Grant No. HRD Any opinions, findings, and conclusions or recommendations expressed.
Deepcarbon.net Xiaogang Ma, Patrick West, John Erickson, Stephan Zednik, Yu Chen, Han Wang, Hao Zhong, Peter Fox Tetherless World Constellation Rensselaer.
A Semantic Web Approach for the Third Provenance Challenge Tetherless World Rensselaer Polytechnic Institute James Michaelis, Li Ding,
Explainable Adaptive Assistants Deborah L. McGuinness, Tetherless World Constellation, RPI Alyssa Glass, Stanford University Michael Wolverton, SRI International.
Determining Fitness-For-Use of Ontologies through Change Management, Versioning and Publication Best Practices Patrick West 1 Stephan.
1 Class exercise II: Use Case Implementation Deborah McGuinness and Peter Fox CSCI Week 8, October 20, 2008.
Supported in part by the National Science Foundation under Grant No. HRD Any opinions, findings, and conclusions or recommendations expressed.
Explainable Adaptive Assistants Deborah L. McGuinness, Tetherless World Constellation, RPI Alyssa Glass, Stanford University Michael Wolverton, SRI International.
Determining Fitness-For-Use of Ontologies through Change Management, Versioning and Publication Best Practices Patrick West 1 Stephan.
Social and Personal Factors in Semantic Infusion Projects Patrick West 1 Peter Fox 1 Deborah McGuinness 1,2
VisKo: Enabling Visualization Generation Over the Web Nicholas Del Rio – UTEP Paulo Pinheiro - PNNL 1
TWC Adoption* of RDA DTR and PIT in the Deep Carbon Observatory Data Portal Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox, & the.
The Semantic eScience Framework AGU FM10 IN22A-02 Deborah McGuinness and Peter Fox (RPI) Tetherless World Constellation.
Poster: EGU Glossary: USGCRP – United States Global Change Research Program NCA – National Climate Assessment GCIS – Global Change Information.
Get the poster at Semantic Visualization Provenance Records:
Encoding Extraction as Inferences
improve the efficiency, collaborative potential, and
Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox,
Data types and persistent identifiers in
Foundations VI: Provenance
Adoption of RDA DTR and PIT in the Deep Carbon Observatory Data Portal
Presentation transcript:

Annotating and Embedding Provenance in Science Data Repositories to Enable Next Generation Science Applications Deborah L. McGuinness 1,2, Peter Fox 1,4, Paulo Pinheiro da Silva 3, Stephan Zednick 1,4, Nick del Rio 3, Li Ding 1, Patrick West 1,4, Cynthia Chang 1 1 Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY McGuinness Associates, Latham, NY & Stanford, CA University of Texas, El Paso, TX 79968, 4 HAO/NCAR, Boulder, CO Partially supported by NSF/OCI award OCI and NSF CyberShARE - HRD Figure 1: Documentation of the Coronal Helium I Imaging Photometer (CHIP) data processing pipeline, including the identification of specific artifacts, processes and people involved. Figure 3: Probe-It! Screen shot displaying the main screen – full provenance trace is in the lower left corner (inset) and the main screen shows the zoomed view of the lower portion (see zoom rectangle in lower left inset) and provenance viewer (right) with formatted text description of items extracted from the observer log. Figure 2: Provenance workflow for initial Quicklook deployment. Fact: Scientific data services are increasing in usage and scope, and with these increases comes growing need for access to provenance information. Science Project Goal: design and implement an extensible provenance solution that is deployed at the science data ingest time. Provenance Infrastructure Goal: to design a reusable, interoperable provenance infrastructure. Outcome: implemented provenance solution in one science setting AND operational specification for other scientific data applications. Additional Team members: Leo Salayandia 3, Aida Gandara 3, Jiao Tao 1, Honglei Zeng 1 Example Use Cases 1.What was the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO? 2.Find all good images on March 21, Why are the Quicklook images from March 21, 2008, 1900UT missing? 4.Why does this image look bad? From the Data Providers Data is coming faster, in greater volumes and outstripping our ability to perform adequate quality control. Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision. We often fail to capture, represent and propagate manually generated information that need to go with the data flows Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects Building blocks  The proof markup language (PML) provides an interlingua for capturing the information agents need to understand results and to justify why they should believe the results.  The Inference Web toolkit provides a suite of tools for manipulating, presenting, summarizing, analyzing, and searching PML in efforts to provide a set of tools that will let end users understand information and its derivation, thereby facilitating trust in and reuse of information. Benefits/Next Steps: Use of declarative provenance interlingua (PML). Supports integration of automatic and human-generated content into machine- driven processes/ workflows. Multiple presentation views of provenance built using same infrastructure. Implementation of structured presentation of results use case. Next: application to science images and engineering quality control use cases, expansion of search capabilities. Figure 4: InferenceWeb Search with structured query for CHIP image by time span. Figure 5: IWBrowse results explaining answers, their origins, and dependencies