Ecosystem Status Report: collaborating with IPython Notebooks

1 Ecosystem Status Report: collaborating with IPython Notebooks
NOAA's Northeast Shelf Ecosystem Status Report: collaborating with IPython Notebooks for reproducibility
July 2013
ECO-OP is supported by NSF Grant #
PIs: Peter Fox (RPI) and Andrew Maffei (WHOI)
NEFSC Collaborators: Jon Hare and Mike Fogarty
Software programmer: Massimo Di Stefano
Informatics and metadata: Stace Beaulieu
This lightning talk is for the demo at the rear of this room. I’ll give a brief overview here, and then please come see me or ?Mike for more info during the demo. SAY TITLE, SAY PIs and Massimo.

2 Adopting a provenance model for a collaborative report
What is provenance? Lineage, or the history of a data or information product, including how it was processed, who processed it, and where it is stored.
Another title for our latest work could be SAY TITLE. Provenance refers to the history of a data product, including how it was processed, who processed it, and where it is stored/archived.
Data provenance from Wikipedia: “Scientific research is generally held to be of good provenance when it is documented in detail sufficient to allow reproducibility. Scientific workflows assist scientists and programmers with tracking their data through all transformations, analyses, and interpretations.”

3 Northeast Shelf Large Marine Ecosystem Ecosystem Status Report
Use Case: Northeast Shelf Large Marine Ecosystem Ecosystem Status Report
Goal: “traceability, repeatability, explanation, verification, and validation” for ecosystem data and information products in the NEFSC Ecosystem Status Report (ESR)
Our use case is READ TITLE. RE-ITERATE GOALS.

4 Section on Climate Forcing
Page from 2009 ESR: Section on Climate Forcing
Figures available for download as PDF or image files, but without access to data or metadata
Note: NOAA directive for ISO metadata, which includes lineage
This is a page from the 2009 report. No need to read the details; just note that the report provides figures and text, in this case for climate indices (North Atlantic Oscillation, upper left). READ RIGHT.

5 Software design to track data provenance
Output of data pipeline is the figure for NAO from report. Our software captures this plus all the metadata describing data processing. But provenance is also who did it, when, and where?
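The idea of capturing processing metadata alongside a figure can be sketched in a few lines of Python. This is only an illustration, not the ECO-OP software: the function name `record_provenance`, the field names, and the placeholder input bytes are all assumptions made for the example.

```python
import getpass
import hashlib
import json
import platform
from datetime import datetime, timezone


def record_provenance(data_bytes, steps, agent=None):
    """Describe what was processed, how, by whom, when, and where.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    return {
        "sha256": hashlib.sha256(data_bytes).hexdigest(),  # identifies the exact input
        "steps": list(steps),                              # how it was processed
        "agent": agent or getpass.getuser(),               # who did it
        "when": datetime.now(timezone.utc).isoformat(),    # when
        "where": platform.node(),                          # on which machine
    }


# Placeholder bytes standing in for a real input file (not real data).
raw = b"year,value\n2000,0.0\n2001,1.0\n"
record = record_provenance(raw, ["download", "annual mean", "plot"], agent="analyst")
print(json.dumps(record, indent=2))
```

A record like this could be saved next to the figure so that a reader can check which input and which steps produced it.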

6 PROV Data Model and PROV-O ontology
W3C Recommendation 30 April 2013
Core Structures (types and relations)
Entity may be a single data product, or a chapter containing several data products
READ RED. Activity is the data processing that generated the figure or chapter of the report, and Agent is the person who provided the figure or chapter to the report.
Workflow provenance (e.g., how to put together the collaborative report)
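As a rough sketch of the three core types and the relations between them, the example below stores PROV-style statements as plain Python triples. The relation names (`prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`, `prov:wasAttributedTo`) come from PROV-O; the node names, the `Node` class, and the `lineage` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # "Entity", "Activity", or "Agent"
    name: str


# Hypothetical nodes for one figure in the report.
index_data = Node("Entity", "nao_index_csv")
figure = Node("Entity", "nao_figure")
plotting = Node("Activity", "plot_nao")
author = Node("Agent", "report_contributor")

# (subject, PROV-O relation, object)
triples = [
    (plotting, "prov:used", index_data),
    (figure, "prov:wasGeneratedBy", plotting),
    (plotting, "prov:wasAssociatedWith", author),
    (figure, "prov:wasAttributedTo", author),
]


def lineage(entity, statements):
    """Return the entities used by the activity that generated `entity`."""
    inputs = []
    for subj, rel, obj in statements:
        if subj == entity and rel == "prov:wasGeneratedBy":
            inputs += [o for s, r, o in statements if s == obj and r == "prov:used"]
    return inputs


print(lineage(figure, triples))
```

The same pattern would apply at the chapter level: a chapter is an Entity generated by an assembling Activity that used the individual figures.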

7 Code in Python, Matlab, R, other
Screenshot of IPython Notebook used to track both data and workflow provenance
Here I am showing a screenshot of an IPython Notebook that will output the climate forcing chapter of the 2013 Ecosystem Status report. Provenance is tracked for each data product because the acquisition, processing, and plotting may all be conducted within this one environment.
Code in Python, Matlab, R, other

8 Notebook can be shared, or output as script, HTML, PDF, other
Screenshot of IPython Notebook used to track both data and workflow provenance
Notebook can be shared, or output as script, HTML, PDF, other
READ RED. The code may also include an output to PDF or to HTML for the final report.
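To illustrate what "output as script" involves, here is a minimal sketch that pulls the code cells out of a notebook file using only the standard library. It assumes the nbformat 4 JSON layout (a top-level `cells` list); notebooks from 2013 used an older layout, and in practice the `nbconvert` tool handles these conversions. The function name `notebook_to_script` and the tiny example notebook are made up for the example.

```python
import json


def notebook_to_script(nb_json):
    """Concatenate the code cells of an nbformat-4 notebook into one script."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        src = cell.get("source", "")
        if isinstance(src, list):  # source may be stored as a list of lines
            src = "".join(src)
        chunks.append(src)
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook used only to exercise the function.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Climate forcing"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1\n", "print(x)"]},
    ],
})
script = notebook_to_script(example)
print(script)
```

Because the notebook is a single JSON document holding code, text, and outputs, the same file can be shared as-is or converted to other formats.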

9 PDF output of IPython Notebook with clickable links to data and code
As an example of this work that is being conducted in parallel to this year’s ESR, if you click on ‘data’ …

10 Screenshot of csv file at GitHub
Access not only to the data that are plotted, but also to provenance metadata for reproducibility
READ RED

12 Data provenance: from environmental data (left) to marine ecosystem indicator (right)

