What can provenance do for me?


Stian Soiland-Reyes, myGrid, University of Manchester
Ocean Sampling Day planning, Bremen, 2013-03-21
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Provenance of Stian Soiland-Reyes
Developer/researcher in the myGrid team, School of Computer Science, University of Manchester, since 2006.
Involved with:
- Taverna: scientific workflow system
- myExperiment: sharing workflows and artefacts
- Wf4Ever: digital preservation (of workflows and workflow runs)
- W3C Provenance WG: standards for describing provenance
- Open Annotation: standard for tracking who said what about something
http://soiland-reyes.com/stian/work/

Overview
- What is provenance?
- Attribution
- Derivation
- Activities
- PROV model
- Aggregating and sharing
- Why you want provenance

What is provenance?
- Attribution: who did it?
- Abstraction levels: shallots, sign, photo or Flickr page?
- Activity: what happens to it?
- Date and tool: when was it made? Using what?
- Derivation: how did it change?
- Origin: where is it from?
- Aggregation: what is it part of?
- Attributes: what is it?
- Annotations: what do others say about it?
- Licensing: can I use it?
Photo by Dr Stephen Dann, licensed under Creative Commons Attribution-ShareAlike 2.0 Generic: http://www.flickr.com/photos/stephendann/3375055368/

Attribution
- Who collected this sample? Who helped?
- Which lab performed the sequencing?
- Who did the data analysis?
- Who curated the results?
- Who produced the raw data this analysis is based on?
- Who wrote the analysis workflow?
Why do I need this?
- To be recognized for my work
- Who should I give credit to?
- Who should I complain to?
- Can I trust them?
- Who should I make friends with?
Diagram: Data wasAttributedTo Alice; Alice actedOnBehalfOf The lab.
Roles: prov:wasAttributedTo, prov:actedOnBehalfOf, dct:creator, dct:publisher, pav:authoredBy, pav:contributedBy, pav:curatedBy, pav:createdBy, pav:importedBy, pav:providedBy, ...
Agent types: Person, Organization, SoftwareAgent
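The "who should I give credit to?" question can be sketched as a walk over attribution statements. This is a minimal illustration in plain Python rather than a real PROV toolkit; the `ex:` identifiers and the `credit_chain` helper are hypothetical, and only the two relation names come from the slide.

```python
# Provenance statements as (subject, relation, object) triples.
# The ex: identifiers are made up for illustration.
triples = [
    ("ex:data", "prov:wasAttributedTo", "ex:alice"),
    ("ex:alice", "prov:actedOnBehalfOf", "ex:the_lab"),
]

def credit_chain(entity, triples):
    """Return the agents an entity is attributed to, followed by
    everyone those agents acted on behalf of (hypothetical helper)."""
    pending = [o for s, p, o in triples
               if s == entity and p == "prov:wasAttributedTo"]
    credited = []
    while pending:
        agent = pending.pop(0)
        if agent in credited:
            continue  # guard against delegation cycles
        credited.append(agent)
        pending.extend(o for s, p, o in triples
                       if s == agent and p == "prov:actedOnBehalfOf")
    return credited

print(credit_chain("ex:data", triples))  # ['ex:alice', 'ex:the_lab']
```

Following actedOnBehalfOf means the lab is credited alongside Alice, even though only Alice is named directly on the data.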

Derivation
- Which sample was this metagenome sequenced from?
- Which metagenomes was this sequence extracted from?
- Which sequence was the basis for the results?
- What is the previous revision of the new results?
Why do I need this?
- To verify consistency (did I use the correct sequence?)
- To find the latest revision
- To backtrack where a diversion appeared after a change
- To credit work I depend on
- Auditing and defence for peer review
Diagram: Sample, Metagenome, Sequence, Old results and New results, linked by wasDerivedFrom, wasQuotedFrom, wasInfluencedBy and wasRevisionOf.
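Backtracking through derivations can be sketched as a simple graph walk. The code below is an illustration in plain Python, not the W3C tooling; which relation links which pair of items is an assumption, since the slide diagram does not survive in the transcript, and the `ex:` identifiers and `lineage` helper are hypothetical. In PROV-O, wasQuotedFrom and wasRevisionOf are specialisations of wasDerivedFrom, so all three are followed.

```python
# Relations treated as "derivation" when walking backwards.
DERIVATION_KINDS = {
    "prov:wasDerivedFrom",
    "prov:wasQuotedFrom",
    "prov:wasRevisionOf",
}

# Assumed arrangement of the slide's diagram (hypothetical identifiers).
triples = [
    ("ex:metagenome", "prov:wasDerivedFrom", "ex:sample"),
    ("ex:sequence", "prov:wasQuotedFrom", "ex:metagenome"),
    ("ex:old_results", "prov:wasDerivedFrom", "ex:sequence"),
    ("ex:new_results", "prov:wasRevisionOf", "ex:old_results"),
]

def lineage(entity, triples):
    """List everything the entity was derived from, nearest first."""
    chain = []
    frontier = [entity]
    while frontier:
        current = frontier.pop(0)
        for s, p, o in triples:
            if s == current and p in DERIVATION_KINDS and o not in chain:
                chain.append(o)
                frontier.append(o)
    return chain

print(lineage("ex:new_results", triples))
# ['ex:old_results', 'ex:sequence', 'ex:metagenome', 'ex:sample']
```

The same walk answers "did I use the correct sequence?" by checking whether the expected sequence appears in the lineage of the results.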

Activities
- What happened? When? Who?
- What was used and generated?
- Why was this workflow started?
- Which workflow ran? Where?
Why do I need this?
- To see which analysis was performed
- To find out who did what
- What was the metagenome used for?
- To understand the whole process ("make me a Methods section")
- To track down inconsistencies
Diagram: Sample, Sequencing (wasStartedAt "2012-06-21"), Metagenome, Workflow run and Results, with Alice (hadRole Lab technician), Workflow server, Workflow definition (hadPlan) and Sequencing machines: Illumina; linked by used, wasGeneratedBy, wasAssociatedWith, wasInformedBy and wasStartedBy.
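The "make me a Methods section" idea can be sketched by turning activity records into sentences. This is a toy illustration in plain Python: the record layout, the `ex:` identifiers and the `methods_section` helper are hypothetical, and the assignment of agents and inputs to each activity is an assumed reading of the slide's diagram. Only the date "2012-06-21" and the relation names are taken from the slide.

```python
# Each activity records what it used and generated, who was associated
# with it (with an optional role), and which activities informed it.
activities = {
    "ex:workflow_run": {
        "started": None,  # no date given on the slide
        "used": ["ex:metagenome"],
        "generated": ["ex:results"],
        "wasAssociatedWith": [("ex:workflow_server", None)],
        "wasInformedBy": ["ex:sequencing"],
    },
    "ex:sequencing": {
        "started": "2012-06-21",
        "used": ["ex:sample"],
        "generated": ["ex:metagenome"],
        "wasAssociatedWith": [("ex:alice", "lab technician")],
        "wasInformedBy": [],
    },
}

def methods_section(activities):
    """Describe activities in order, earlier (informing) ones first."""
    ordered = []

    def visit(name):
        if name in ordered:
            return
        for earlier in activities[name]["wasInformedBy"]:
            visit(earlier)
        ordered.append(name)

    for name in activities:
        visit(name)

    sentences = []
    for name in ordered:
        a = activities[name]
        who = " and ".join(
            f"{agent} (as {role})" if role else agent
            for agent, role in a["wasAssociatedWith"]
        )
        when = f" on {a['started']}" if a["started"] else ""
        sentences.append(
            f"{name} used {', '.join(a['used'])} and generated "
            f"{', '.join(a['generated'])}{when}, carried out by {who}."
        )
    return sentences

for sentence in methods_section(activities):
    print(sentence)
```

Ordering by wasInformedBy rather than by timestamp means the sequencing step is described before the workflow run even when some dates are missing.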

Core PROV model
- Core types: Entity, Agent, Activity
- Provenance Working Group, W3C (standardisation body)
- Primer (HTML): http://www.w3.org/TR/prov-primer/
Diagram: Agent, Entity and Activity, with wasDerivedFrom shown between entities.
Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
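One way to see how the three core types fit together is to check which type each relation connects. The sketch below is a simplified illustration in plain Python, not a PROV validator: the `check` helper and `ex:` identifiers are hypothetical, and it assumes each identifier has exactly one core type, whereas PROV itself allows, for example, an agent to also be described as an entity.

```python
# (type of subject, type of object) for some core PROV relations.
RELATION_SIGNATURES = {
    "wasDerivedFrom": ("entity", "entity"),
    "wasGeneratedBy": ("entity", "activity"),
    "used": ("activity", "entity"),
    "wasInformedBy": ("activity", "activity"),
    "wasAttributedTo": ("entity", "agent"),
    "wasAssociatedWith": ("activity", "agent"),
    "actedOnBehalfOf": ("agent", "agent"),
}

def check(kinds, statements):
    """Return a list of problems; an empty list means all statements fit.

    kinds maps identifier -> "entity" | "activity" | "agent";
    statements are (subject, relation, object) tuples.
    """
    problems = []
    for subject, relation, obj in statements:
        if relation not in RELATION_SIGNATURES:
            problems.append(f"unknown relation: {relation}")
            continue
        expected = RELATION_SIGNATURES[relation]
        actual = (kinds.get(subject), kinds.get(obj))
        if actual != expected:
            problems.append(
                f"{subject} {relation} {obj}: expected {expected}, got {actual}"
            )
    return problems

kinds = {
    "ex:sample": "entity",
    "ex:metagenome": "entity",
    "ex:sequencing": "activity",
    "ex:alice": "agent",
}
good = [
    ("ex:metagenome", "wasDerivedFrom", "ex:sample"),
    ("ex:metagenome", "wasGeneratedBy", "ex:sequencing"),
    ("ex:sequencing", "used", "ex:sample"),
    ("ex:sequencing", "wasAssociatedWith", "ex:alice"),
]
bad = [("ex:alice", "wasDerivedFrom", "ex:sample")]

print(check(kinds, good))  # []
print(check(kinds, bad))   # one problem reported
```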

Gathering everything
- Research Objects (ROs) aggregate related resources, their provenance and annotations
- They convey "everything you need to know" about a study/experiment/analysis/dataset/workflow
- Shareable, evolvable, contributable, citable
- ROs have their own provenance and lifecycles
Diagram: a Research Object aggregates Hypothesis, Raw data, Workflow, Analysis tools, Results, Paper, Reference literature, Annotations and Provenance.
http://purl.org/wf4ever/model
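The aggregation idea can be sketched as a small container that tracks which resources belong together and what has been said about them. This is a toy illustration in plain Python and not the Wf4Ever Research Object model itself; the `ResearchObject` class, its methods, the file names and the `EXPECTED` list are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchObject:
    """A toy aggregation of resources and annotations (illustrative only)."""
    identifier: str
    aggregates: dict = field(default_factory=dict)   # resource URI -> kind
    annotations: list = field(default_factory=list)  # (who, target, text)

    def aggregate(self, uri, kind):
        self.aggregates[uri] = kind

    def annotate(self, who, target, text):
        # Only resources that are part of this RO can be annotated here.
        if target not in self.aggregates:
            raise ValueError(f"{target} is not aggregated by {self.identifier}")
        self.annotations.append((who, target, text))

    def missing(self, expected_kinds):
        """Which expected kinds of resource are not yet aggregated?"""
        present = set(self.aggregates.values())
        return sorted(set(expected_kinds) - present)

# Kinds of resource taken from the slide's diagram.
EXPECTED = ["hypothesis", "raw data", "workflow", "results", "provenance"]

ro = ResearchObject("ex:study-ro")
ro.aggregate("ex:sample-counts.csv", "raw data")
ro.aggregate("ex:analysis-workflow", "workflow")
ro.aggregate("ex:results.csv", "results")
ro.annotate("ex:alice", "ex:results.csv", "Checked against the lab notebook.")

print(ro.missing(EXPECTED))  # ['hypothesis', 'provenance']
```

A completeness check like `missing` hints at how an aggregation can help prevent decay: gaps are visible before the object is shared.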

Research Objects: why do I need them?
- To share your research materials (RO as a social object)
- To facilitate reproducibility and reuse of methods
- To be recognized and cited (even for constituent resources)
- To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)
Diagram: a Research Object aggregates Paper, Reference literature, Results, Raw data, Workflow, Analysis tools, Hypothesis, Annotations and Provenance.
http://purl.org/wf4ever/model

myExperiment Research Objects

Why you want provenance
- To acknowledge sources you have based your work on
- To receive credit when others use your work
- To build trust (who did it?) and verify consistency (was it done correctly?)
- To audit and defend for peer review
- To keep track of resources that change over time (versioning)
- To investigate and compare data (where did that strange value come from?)
- To gather everything you need for that Methods section
- To facilitate reproducibility by tracking activities and their outcomes
- To prevent decay by aggregating related resources and their descriptions

Thank you
Questions?
Twitter: @soilandreyes
Skype: soiland
http://soiland-reyes.com/stian/work/
http://www.wf4ever-project.org/