Brief Introduction to Provenance: "As data becomes plentiful, verifiable truth becomes scarce"


1 Brief Introduction to Provenance. "As data becomes plentiful, verifiable truth becomes scarce" (http://go-to-hellman.blogspot.com/2010/02/named-graphs-argleton-and-truth-economy.html). For the JISC KeepIt course on Digital Preservation Tools for Repository Managers, Module 3: Primer on preservation workflow, formats and characterisation. Westminster-Kingsway College, London, 2 March 2010

2 Provenance: example. The following excerpt and slides are taken with permission from Moreau, L., The Open Provenance Model: Towards inter-operability of Provenance Systems, http://users.ecs.soton.ac.uk/lavm/talks/iam09.pdf. Example: the provenance of a bottle of wine includes: the grapes from which it is made; where those grapes grew; the processes in the wine's preparation; how the wine was stored; between which parties the wine was transported, e.g. producer to distributor to retailer; where it was auctioned.

3 Provenance Definition Oxford English Dictionary: – the fact of coming from some particular source or quarter; origin, derivation – the history or pedigree of a work of art, manuscript, rare book, etc.; – concretely, a record of the passage of an item through its various owners. The provenance of a piece of data is the process that led to that piece of data

4 The Science Lifecycle. [Diagram labels: scientists; experimentation; Data, Metadata, Provenance, Scripts, Workflows, Services, Ontologies, Blogs, ...; Certified Experimental Results & Analyses; Preprints & Metadata; Peer-Reviewed Journal & Conference Papers; Technical Reports; Reprints; Local Web Repositories; Digital Libraries; Virtual Learning Environment; Undergraduate Students; Graduate Students; Next Generation Researchers.] Adapted from David De Roure's slides.

5 Finding the provenance of research outputs across all the systems the data transited through. [Same science lifecycle diagram as on slide 4.]

6 Open Provenance Model (OPM). Allows us to express all the causes of an item. Allows for process-oriented and dataflow-oriented views. Based on a notion of annotated causality graph. Moreau, L., et al.: OPM v1.00 (Dec 2007), OPM v1.01 (Jul 2008), OPM v1.1 (Dec 2009).

7 OPM Requirements. To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model. To allow developers to build and share tools that operate on such a provenance model. To define the model in a precise, technology-agnostic manner. To define bindings to XML/RDF separately. To support a digital representation of provenance for any thing, whether produced by computer systems or not.

8 OPM Serialisation. OPM is an abstract data model to represent past executions and what caused data and processes to occur. OPM can be serialised in different formats, referred to as technology bindings or serialisations: OPM XML schema (http://openprovenance.org/model/v1.01.a), OPM RDF schema, OPM OWL ontology. Effort is underway to ensure full equivalence of the representations.
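
The idea of one abstract model with several bindings can be sketched as follows. This is only an illustration: the element and attribute names (`graph`, `edge`, `effect`, `cause`) and the `http://example.org/` base URI are hypothetical and do not follow the actual OPM XML schema or RDF schema.

```python
import xml.etree.ElementTree as ET

# Abstract graph: (effect, edge label, cause) tuples. Identifiers are made up.
edges = [
    ("a2", "wasGeneratedBy", "p1"),
    ("p1", "used", "a1"),
]

def to_xml(edges):
    """Serialise the edges as XML (hypothetical element names, not the OPM schema)."""
    root = ET.Element("graph")
    for effect, label, cause in edges:
        ET.SubElement(root, "edge", {"label": label, "effect": effect, "cause": cause})
    return ET.tostring(root, encoding="unicode")

def to_triples(edges, base="http://example.org/"):
    """Serialise the same edges as N-Triples-style lines (hypothetical URIs)."""
    return [f"<{base}{e}> <{base}{label}> <{base}{c}> ." for e, label, c in edges]

xml_text = to_xml(edges)
triples = to_triples(edges)
print(xml_text)
print("\n".join(triples))
```

Both outputs are derived from the same list of edges, which is the sense in which the representations are meant to be equivalent.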

9 Nodes. Artifact (A): Immutable piece of state, which may have a physical embodiment in a physical object, or a digital representation in a computer system. Process (P): Action or series of actions performed on or caused by artifacts, and resulting in new artifacts. Agent (Ag): Contextual entity acting as a catalyst of a process, enabling, facilitating, controlling, affecting its execution.
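
A minimal sketch of the three node kinds as a data structure. The class and field names are assumptions made for illustration, not part of OPM; frozen dataclasses are used only to mirror the idea that an artifact is an immutable piece of state.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Artifact:
    """Immutable piece of state (physical object or digital representation)."""
    id: str
    description: str = ""

@dataclass(frozen=True)
class Process:
    """Action or series of actions resulting in new artifacts."""
    id: str
    description: str = ""

@dataclass(frozen=True)
class Agent:
    """Contextual entity that enables or controls a process."""
    id: str
    description: str = ""

def is_immutable(node):
    """Return True if assigning to a field of the node is rejected."""
    try:
        node.id = "changed"
    except FrozenInstanceError:
        return True
    return False

# Hypothetical nodes, loosely based on the wine example.
bottle = Artifact("A", "bottle of wine")
bottling = Process("P", "bottling")
producer = Agent("Ag", "wine producer")
print(is_immutable(bottle))
```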

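10 Edges. used(R): a process P used an artifact A in role R. wasGeneratedBy(R): an artifact A was generated by a process P in role R. wasControlledBy(R): a process P was controlled by an agent Ag in role R. wasTriggeredBy: a process P2 was triggered by a process P1. wasDerivedFrom: an artifact A2 was derived from an artifact A1. Edge labels are in the past tense to express that they are used to describe past executions.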
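
The five edge kinds can be sketched as a small table of permitted endpoints. This assumes edges are read from effect to cause (e.g. a process used an artifact); the `Node` and `Edge` classes and the `is_valid` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Permitted (effect kind, cause kind) for each edge label.
EDGE_ENDPOINTS = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}

@dataclass(frozen=True)
class Node:
    id: str
    kind: str  # "artifact", "process" or "agent"

@dataclass(frozen=True)
class Edge:
    label: str
    effect: Node
    cause: Node
    role: Optional[str] = None  # the (R) annotation, where the edge has one

    def __post_init__(self):
        expected = EDGE_ENDPOINTS.get(self.label)
        if expected is None:
            raise ValueError(f"unknown edge label: {self.label}")
        if (self.effect.kind, self.cause.kind) != expected:
            raise ValueError(f"{self.label} must link {expected[0]} to {expected[1]}")

def is_valid(label, effect, cause):
    """Return True if an edge with these endpoints can be constructed."""
    try:
        Edge(label, effect, cause)
    except ValueError:
        return False
    return True

a = Node("A", "artifact")
p = Node("P", "process")
ag = Node("Ag", "agent")
print(is_valid("used", p, a))  # a process used an artifact
print(is_valid("used", a, p))  # endpoints the wrong way round
```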

11 Illustration. A process used artifacts and generated artifacts. Edge roles indicate the function of the artifact with respect to the process (akin to function parameters). Edges and nodes can be typed. Causation chain: P was caused by A1 and A2; A3 and A4 were caused by P. Does it mean that A3 and A4 were caused by A1 and A2? [Diagram: process P (type=division) with edges used(divisor) and used(dividend) to artifacts A1 and A2, and edges wasGeneratedBy(rest) and wasGeneratedBy(quotient) from artifacts A3 and A4.]
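
The division illustration can be sketched as a list of edges. The assignment of roles to A1..A4 follows the order of the labels on the slide and is an assumption, as are the numeric values. The helper `candidate_causes` (a hypothetical name) only lists artifacts that were used by the process that generated a given artifact; it treats them as candidates, since the used and wasGeneratedBy edges alone do not say which inputs a particular output actually depended on.

```python
# Edges as (effect, label, role, cause) tuples.
edges = [
    ("P", "used", "divisor", "A1"),
    ("P", "used", "dividend", "A2"),
    ("A3", "wasGeneratedBy", "rest", "P"),
    ("A4", "wasGeneratedBy", "quotient", "P"),
]

def candidate_causes(artifact, edges):
    """Artifacts used by any process that generated `artifact`."""
    generators = {
        cause for effect, label, _, cause in edges
        if effect == artifact and label == "wasGeneratedBy"
    }
    return sorted(
        cause for effect, label, _, cause in edges
        if label == "used" and effect in generators
    )

# Hypothetical values: roles behave like named function parameters.
values = {"A1": 5, "A2": 17}  # divisor, dividend
quotient, rest = divmod(values["A2"], values["A1"])

print(candidate_causes("A3", edges))  # ['A1', 'A2']
print(quotient, rest)                 # 3 2
```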

12 Time Constraints. [Diagram: a process P (start: T2, end: T5), wasControlledBy(R) an agent Ag, used(R) at T3 an artifact A that was generated at T1 (wasGeneratedBy(R)), and generated at T4 (wasGeneratedBy(R)) an artifact A that is used at T6 (used(R)).] Constraints: T1<T3 (artifact must exist before being used); T2<T3 (process must have started before using artifacts); T3<T5 (process uses artifacts before it ends); T2<T4 (process must have started before generating artifacts); T4<T5 (process generates artifacts before it ends); T4<T6 (artifact must exist before being used); T2<T5 (process must have started before ending). There is no constraint between T3 and T4.
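
The constraints listed on this slide can be checked mechanically. The sketch below assumes the six times are supplied as comparable numbers in a dictionary keyed `T1` to `T6`; the function name `violated_constraints` is hypothetical.

```python
# (earlier, later, reason) for each constraint on the slide.
CONSTRAINTS = [
    ("T1", "T3", "artifact must exist before being used"),
    ("T2", "T3", "process must have started before using artifacts"),
    ("T3", "T5", "process uses artifacts before it ends"),
    ("T2", "T4", "process must have started before generating artifacts"),
    ("T4", "T5", "process generates artifacts before it ends"),
    ("T4", "T6", "artifact must exist before being used"),
    ("T2", "T5", "process must have started before ending"),
]

def violated_constraints(times):
    """Return a description of every constraint the given times break."""
    return [
        f"{a}<{b} ({reason})"
        for a, b, reason in CONSTRAINTS
        if not times[a] < times[b]
    ]

consistent = {"T1": 1, "T2": 2, "T3": 3, "T4": 4, "T5": 5, "T6": 6}
# T3 and T4 may come in either order: there is no constraint between them.
swapped = {"T1": 1, "T2": 2, "T3": 4, "T4": 3, "T5": 5, "T6": 6}
# Here the artifact is used before the process has started.
too_early = {"T1": 1, "T2": 2, "T3": 1.5, "T4": 4, "T5": 5, "T6": 6}

print(violated_constraints(consistent))  # []
print(violated_constraints(swapped))     # []
print(violated_constraints(too_early))
```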

13 Dublin Core Profile (draft). To many people, provenance is primarily about attribution, citation and bibliographic information. DC provides terms to relate resources to such information. The DC profile aims to map Dublin Core terms to OPM concepts and graph patterns. With Simon Miles and Joe Futrelle.

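14 DC to OPM example: dc:publisher. [Diagram: a process P (publish) used an artifact with state=unpublished and generated (wasGeneratedBy) an artifact with state=published; the two artifacts, A1 and A2, are linked by wasSameResourceAs; P is linked by wasActionOf to an agent Ag (person, name=Luc).]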
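
A rough sketch of how a dc:publisher statement might be expanded into the graph pattern on this slide. The node and edge labels are taken from the slide; the direction of each edge, the assignment of states to A1 and A2, the dictionary layout and the function name `dc_publisher_pattern` are assumptions, and the profile itself is described as a draft.

```python
def dc_publisher_pattern(publisher_name):
    """Expand dc:publisher into a small graph (assumed layout, see lead-in)."""
    nodes = {
        "A1": {"kind": "artifact", "state": "unpublished"},
        "A2": {"kind": "artifact", "state": "published"},
        "P": {"kind": "process", "type": "publish"},
        "Ag": {"kind": "agent", "type": "person", "name": publisher_name},
    }
    # Edges as (source, label, target) tuples.
    edges = [
        ("P", "used", "A1"),
        ("A2", "wasGeneratedBy", "P"),
        ("A2", "wasSameResourceAs", "A1"),
        ("P", "wasActionOf", "Ag"),
    ]
    return nodes, edges

nodes, edges = dc_publisher_pattern("Luc")
for source, label, target in edges:
    print(source, label, target)
```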

15 What have we learned about provenance? Provenance describes and records the results of processes on objects over time. OPM represents provenance as an annotated causality graph, which can be passed between systems as XML. OPM can be serialised in different formats, e.g. RDF for the Semantic Web. OPM is a work in progress. By working with an open standard model that can pass information as XML and in standard serialisation formats (e.g. RDF), it should be possible to build provenance services into repository environments.

