Wf4Ever: Annotating research objects Stian Soiland-Reyes, Sean Bechhofer myGrid, University of Manchester Open Annotation Rollout, Manchester, 2013-06-24.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Open Provenance Model Tutorial Session 6: Interoperability.
European Life Sciences Infrastructure for Biological Information Rafael C Jimenez ELIXIR CTO EMBL-EBI workshop networks and pathways.
Global Change Information System Curt Tilmes NASA GSFC USGCRP ESIP Federation Winter Meeting 2013
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
University of Southampton, U.K.
Jiten Bhagat University of myExperiment A Social VRE for Research Objects JISC Roadshow | February.
Tutorial 3: Adding and Formatting Text. 2 Objectives Session 3.1 Type text into a page Copy text from a document and paste it into a page Check for spelling.
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
Sean Making Metadata Work, ISKO London, 23 rd June 2014 Metadata for Research Objects 1.
UKOLUG - July Metadata for the Web RDF and the Dublin Core Andy Powell UKOLN, University of Bath UKOLN.
January, 23, 2006 Ilkay Altintas
An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik University of Manchester materials by Dr Katy Wolstencroft and Dr Aleksandra.
The Data Attribution Abdul Saboor PhD Research Student Model Base Development and Software Quality Assurance Research Group Freie.
Interoperability Scenario Producing summary versions of compound multimedia historical documents.
® IBM Software Group © 2009 IBM Corporation Rational Publishing Engine RQM Multi Level Report Tutorial David Rennie, IBM Rational Services A/NZ
Metadata: An Overview Katie Dunn Technology & Metadata Librarian
Defining Digital Forensic Examination & Analysis Tools Brian Carrier.
An Introduction to Designing, Executing and Sharing Workflows with Taverna Nowgen, Next Gen Workshop 17/01/2012.
South Africa Data Warehouse for PEPFAR Presented by: Michael Ogawa Khulisa Management Services
INFSO-RI Enabling Grids for E-sciencE Logging and Bookkeeping and Job Provenance Services Ludek Matyska (CESNET) on behalf of the.
Taverna and my Grid Open Workflow for Life Sciences Tom Oinn
© 2008 IBM Corporation ® IBM Cognos Business Viewpoint Miguel Garcia - Solutions Architect.
MyExperiment Research Objects: Beyond Workflows and Packs Stian Soiland-Reyes myGrid, University of Manchester BOSC 2013, ISMB, Berlin, This.
Metadata Lessons Learned Katy Ginger Digital Learning Sciences University Corporation for Atmospheric Research (UCAR)
Wf4Ever: Preserving workflows as digital Research Objects EGI Community Forum 2012, Workflow Systems workshop Leibniz Supercomputing Centre, Münich,
E-Science for the SKA WF4Ever: Supporting Reuse and Reproducibility in Experimental Science Lourdes Verdes-Montenegro* AMIGA and Wf4Ever teams Instituto.
MyExperiment 2.0 – Preserving digital Research Objects using the Wf4Ever architecture EGI/SHIWA Workshops on e-Science Workflows Budapest, Stian.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Software Sustainability Institute Software Attribution can we improve the reusability and sustainability of scientific software?
An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik materials by: Katy Wolstencroft University of Manchester.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
The Functional Genomics Experiment Object Model (FuGE) Andrew Jones, School of Computer Science, University of Manchester MGED Society.
Scientific Data Management - From the Lab to the Web Semantic Data Management Dagstuhl Seminar April 2012 José Manuel Gómez Pérez, iSOCO
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
Large Scale Nuclear Physics Calculations in a Workflow Environment and Data Provenance Capturing Fang Liu and Masha Sosonkina Scalable Computing Lab, USDOE.
Quality views: capturing and exploiting the user perspective on data quality Paolo Missier, Suzanne Embury, Mark Greenwood School of Computer Science University.
Technology behind using Taverna in caGrid caGrid user meeting Stian Soiland-Reyes, myGrid University of Manchester, UK
Data in the NEES Data Repository Conditions for Current and Future Use and Re-Use Quake Summit 2012, Boston, Massachusetts July 12, 2012 Stanislav Pejša.
Google Refine for Data Quality / Integrity. Context BioVeL Data Refinement Workflow Synonym Expansion / Occurrence Retrieval Data Selection Data Quality.
An Introduction to Designing, Executing and Sharing Workflows with Taverna Katy Wolstencroft myGrid University of Manchester IMPACT/Taverna Hackathon 2011.
The Astronomy challenge: How can workflow preservation help? Susana Sánchez, Jose Enrique Ruíz, Lourdes Verdes-Montenegro, Julian Garrido, Juan de Dios.
Electronic labnotes Mari Wigham COMMIT/. Information WUR  Organising, sharing, finding and reusing data  Expertise in: ● Modelling data.
Digitization – Basics and Beyond workshop Interoperability of cultural and academic resources New services for digitized collections Muriel Foulonneau.
Recording the Context of Action for Process Documentation Ian Wootten Cardiff University, UK
Jan Christoph Meister University of Hamburg
GOOGLE FUSION TABLES: WEB- CENTERED DATA MANAGEMENT AND COLLABORATION HectorGonzalez, et al. Google Inc. Presented by Donald Cha December 2, 2015.
KAnOE: Research Centre for Knowledge Analytics and Ontological Engineering Managing Semantic Data NACLIN-2014, 10 Dec 2014 Dr. Kavi Mahesh Dean of Research,
How Environmental Informatics is Preparing Us for the Era of Big Data AGU FM 2013 GC11F-01 December 09, 2013, MW 3001 Peter
SHIWA Desktop Cardiff University, Budapest, 3 rd July 2012.
The 10 Best Practices for Workflow Design BioVeL M6 Workshop Göteborg, May 10-11, 2012 Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble.
Experimental Context, Publishing and Research Objects Brian Matthews STFC.
RDFa Primer Bridging the Human and Data webs Presented by: Didit ( )
W ORKFLOW -C ENTRIC R ESEARCH O BJECTS : F IRST C LASS C ITIZENS IN S CHOLARLY D ISCOURSE Khalid Belhajjame, Oscar Corcho, Daniel Garijo, Jun Zhao, Paolo.
Timothy W. Cole, Jacob Jett, Thomas G. Habing The University Library & the Center for Informatics Research in Science and Scholarship, GSLIS Open Annotation.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Data Management: Data Processing Types of Data Processing at USGS There are several ways to classify Data Processing activities at USGS, and here are some.
Linked Data Publishing on the Semantic Web Dr Nicholas Gibbins
SHIWA Desktop Cardiff University David Rogers, Ian Harvey, Ian Taylor, Andrew Jones.
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
Aleksandra Pawlik University of Manchester. Something that can be put into a workflow Well described - what the component does Behaves “well” - conforms.
Aleksandra Pawlik Alan Williams University of Manchester.
Research Objects Preserving scientific data and methods Stian Soiland-Reyes, Khalid Belhajjame School of Computer Science, Univ of Manchester myGrid NIHBI.
Scufl2 – because a workflow is more than its definition
“Real Simple Syndication” (RSS)
What can provenance do for me?
API Documentation Guidelines
Alan Williams, Donal Fellows, Finn Bacall,
An ontology for e-Research
Rational Publishing Engine RQM Multi Level Report Tutorial
Presentation transcript:

Wf4Ever: Annotating research objects Stian Soiland-Reyes, Sean Bechhofer myGrid, University of Manchester Open Annotation Rollout, Manchester, This work is licensed under a Creative Commons Attribution 3.0 Unported License 1

Motivation: Scientific workflows Coordinated execution of services and linked resources Dataflow between services Web services (SOAP, REST) Command line tools Scripts User interactions Components (nested workflows) Method becomes: Documented visually Shareable as single definition Reusable with new inputs Repurposable other services Reproducible?

But workflows are complex machines Outputs Inputs Configuration Components Will it still work after a year? 10 years? Expanding components, we see a workflow involves a series of specific tools and services which Depend on datasets, software libraries, other tools Are often poorly described or understood Over time evolve, change, break or are replaced User interactions are not reproducible But can be tracked and replayed 3

Electronic Paper Not Enough HypothesisExperiment Result Analysis Conclusions Investigation Data Electronic paper Publish Open Research movement: Openly share the data of your experiments 4

RESEARCH OBJECT (RO) Research objects goal: Openly share everything about your experiments, including how those things are related 5

What is in a research object? A Research Object bundles and relates digital resources of a scientific experiment or investigation: Data used and results produced in experimental study Methods employed to produce and analyse that data Provenance and settings for the experiments People involved in the investigation Annotations about these resources, that are essential to the understanding and interpretation of the scientific outcomes captured by a research object 6

Gathering everything Research Objects (RO) aggregate related resources, their provenance and annotations Conveys “everything you need to know” about a study/experiment/analysis/dataset/workflow Shareable, evolvable, contributable, citable ROs have their own provenance and lifecycles 7

Why Research Objects? i.To share your research materials (RO as a social object) ii.To facilitate reproducibility and reuse of methods iii.To be recognized and cited (even for constituent resources) iv.To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun) 8

A Research object 9

10

Quality Assessment of a research object 11

Quality Monitoring 12

Annotations in research objects Types: “This document contains an hypothesis” Relations: “These datasets are consumed by that tool” Provenance: “These results came from this workflow run” Descriptions: “Purpose of this step is to filter out invalid data” Comments: “This method looks useful, but how do I install it?” Examples: “This is how you could use it” 13

Annotation guidelines – which properties? Descriptions: dct:title, dct:description, rdfs:comment, dct:publisher, dct:license, dct:subject Provenance: dct:created, dct:creator, dct:modified, pav:providedBy, pav:authoredBy, pav:contributedBy, roevo:wasArchivedBy, pav:createdAt Provenance relations: prov:wasDerivedFrom, prov:wasRevisionOf, wfprov:usedInput, wfprov:wasOutputFrom Social networking: oa:Tag, mediaont:hasRating, roterms:technicalContact, cito:isDocumentedBy, cito:isCitedBy Dependencies: dcterms:requires, roterms:requiresHardware, roterms:requiresSoftware, roterms:requiresDataset Typing: wfdesc:Workflow, wf4ever:Script, roterms:Hypothesis, roterms:Results, dct:BibliographicResource 14

What is provenance? By Dr Stephen Dann licensed under Creative Commons Attribution-ShareAlike 2.0 Generic Stephen Dann Attribution who did it? Derivation how did it change? Activity what happens to it? Licensing can I use it? Attributes what is it? Origin where is it from? Annotations what do others say about it? 15 Aggregation what is it part of? Date and tool when was it made? using what?

Attribution Who collected this sample? Who helped? Which lab performed the sequencing? Who did the data analysis? Who curated the results? Who produced the raw data this analysis is based on? Who wrote the analysis workflow? Why do I need this? i.To be recognized for my work ii.Who should I give credits to? iii.Who should I complain to? iv.Can I trust them? v.Who should I make friends with? prov:wasAttributedTo prov:actedOnBehalfOf dct:creator dct:publisher pav:authoredBy pav:contributedBy pav:curatedBy pav:createdBy pav:importedBy pav:providedBy... Roles Person Organization SoftwareAgent Agent types Alice The lab Data wasAttributedTo actedOnBehalfOf 16

Derivation Which sample was this metagenome sequenced from? Which meta-genomes was this sequence extracted from? Which sequence was the basis for the results? What is the previous revision of the new results? Why do I need this? i.To verify consistency (did I use the correct sequence?) ii.To find the latest revision iii.To backtrack where a diversion appeared after a change iv.To credit work I depend on v.Auditing and defence for peer review wasDerivedFrom wasQuotedFrom Sequenc e New results wasDerivedFrom Sample Meta - genome Old results wasRevisionOf wasInfluencedBy 17

Activities What happened? When? Who? What was used and generated? Why was this workflow started? Which workflow ran? Where? Why do I need this? i.To see which analysis was performed ii.To find out who did what iii.What was the metagenome used for? iv.To understand the whole process “make me a Methods section” v.To track down inconsistencies used wasGeneratedBy wasStartedAt " " Metagenom e Sample wasAssociatedWith Workflow server wasInformedBy wasStartedBy Workflo w run wasGeneratedBy Results Sequencing wasAssociatedWith Alice hadPlan Workflow definition hadRole Lab technician Results 18

PROV model Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. Entity Agen t Activity wasDerivedFrom Provenance Working Group 19

Provenance of what? Who made the (content of) research object? Who maintains it? Who wrote this document? Who uploaded it? Which CSV was this Excel file imported from? Who wrote this description? When? How did we get it? What is the state of this RO? (Live or Published?) What did the research object look like before? (Revisions) – are there newer versions? Which research objects are derived from this RO? 20

Research object model at a glance 21 Researc h Object Resourc e Annotatio n oa:hasTarget Resource Annotation graph oa:hasBody ore:aggregates «ore:Aggregation» «ro:ResearchObject» «oa:Annotation» «ro:AggregatedAnnotation» «trig:Graph» «ore:AggregatedResource» «ro:Resource» Manifest «ore:ResourceMap» «ro:Manifest» ore:isDescribedBy

Wf4Ever architecture 22 Blob store Graph store Resourc e Uploaded to Manifest Annotation graph Resear ch object Annotation Redirects to ORE Proxy External referenc e Redirects to If RDF, import as named graph SPARQ L queries REST resources project.org/wiki/display/docs/RO+API+6

Where do RO annotations come from? Imported from uploaded resources, e.g. embedded in workflow-specific format (creator: unknown!) Created by users filling in Title, Description etc. on website By automatically invoked software agents, e.g.: A workflow transformation service extracts the workflow structure as RDF from the native workflow format Provenance trace from a workflow run, which describes the origin of aggregated output files in the research object 23

How we are using the OA model Multiple oa:Annotation contained within the manifest RDF and aggregated by the RO. Provenance (PAV, PROV) on oa:Annotation (who made the link) and body resource (who stated it) Typically a single oa:hasTarget, either the RO or an aggregated resource. oa:hasBody to a trig:Graph resource (read: RDF file) with the “actual” annotation as RDF: dct:title "The wonderful workflow". Multiple oa:hasTarget for relationships, e.g. graph body: roterms:inputSelected. 24

What should we also be using? Motivations myExperiment: commenting, describing, moderating, questioning, replying, tagging – made our own vocabulary as OA did not exist Selectors on compound resources E.g. description on processors within a workflow in a workflow definition. How do you find this if you only know the workflow definition file? Currently: Annotations on separate URIs for each component, described in workflow structure graph, which is body of annotation targeting the workflow definition file Importing/referring to annotations from other OA systems (how to discover those?) 25

What is the benefit of OA for us? Existing vocabulary – no need for our project to try to specify and agree on our own way of tracking annotations. Potential interoperability with third-party annotation tools E.g. We want to annotate a figure in a paper and relate it to a dataset in a research object – don’t want to write another tool for that! Existing annotations (pre research object) in Taverna and myExperiment map easily to OA model 26

History lesson (AO/OAC/OA) When forming the Wf4Ever Research Object model, we found: Open Annotation Collaboration (OAC) Annotation Ontology (AO) What was the difference? Technically, for Wf4Ever’s purposes: They are equivalent Political choice: AO – supported by Utopia (Manchester) We encouraged the formation of W3C Open Annotation Community Group and a joint model Next: Research Object model v0.2 and RO Bundle will use the OA model – since we only used 2 properties, mapping is 1:1 27

Saving a research object: RO bundle Single, transferrable research object Self-contained snapshot Which files in ZIP, which are URIs? (Up to user/application) Regular ZIP file, explored and unpacked with standard tools JSON manifest is programmatically accessible without RDF understanding Works offline and in desktop applications – no REST API access required Basis for RO-enabled file formats, e.g. Taverna run bundle Exchanged with myExperiment and RO tools

Workflow Results Bundle workflowrun.prov.ttl (RDF) outputA.txt outputC.jpg outputB/ intermediates/ 1.txt 2.txt 3.txt de/def2e58b-50e fd a.txt inputA.txt workflow URI reference s attribution execution environment Aggregating in Research Object ZIP folder structure (RO Bundle) mimetype application/vnd.wf4ever.robundle+zip.ro/manifest.jso n

RO Bundle What is aggregated? File In ZIP or external URI Who made the RO? When? Who? External URIs placed in folders Embedded annotation External annotation, e.g. blogpost JSON-LD context  RDF RO provenance.ro/manifest.json Format Note: JSON "quotes" not shown above for brevity

31 Common Motifs in Scientific Workflows: An Empirical Analysis <body resource=" typeOf="ore:Aggregation ro:ResearchObject"> Research Object as RDFa <a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls" typeOf="ro:Resource">Analytics for Taverna workflows <a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx“ typeOf="ro:Resource">Analytics for Wings workflows <span property="dc:creator prov:wasAttributedTo" resource="

W3C community group for RO 32