Research Objects: Preserving scientific data and methods. Stian Soiland-Reyes, Khalid Belhajjame, School of Computer Science, University of Manchester. myGrid NIHBI meet-up, Manchester
2 Agenda » Preserving digital science » The Research Object » Anatomy » Lifecycle » Wf4Ever Tools » Future developments
3 Computational Processes in Today’s Research » Research is being conducted in an increasingly digital and online environment » This has led to the emergence of new digital artifacts » In some respects, these objects can be regarded as data » However, some objects include a description of the research method, captured as a computational process » Such processes encapsulate the knowledge related to the generation, (re)use and general transformation of data in experimental sciences [Figure: Raw data, Computational process, Results]
4 Scientific Workflow » A scientific workflow is a precise, executable description of a scientific procedure: a series of analysis operations connected using data links » Each operation represents the execution of a computational process » Operations can be supplied by independently developed web services » Operations can also use existing data sources that are accessible on the Web » In this work, we focus on a particular kind of computational process called scientific workflows
5 Preservation Challenges » Changes by 3rd parties » Workflow may produce different lists at different times » Workflow may become inoperable » Challenges deal with the executable aspects of workflows and their vulnerability to the volatility of the resources required for their execution » Workflow decay: the execution of the workflow may fail or yield different results, due to dependencies on resources and services subject to independent changes, e.g., EMBL-EBI. Even workflows that depend on local resources are vulnerable.
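One way to make workflow decay observable is to fingerprint the outputs of a run and compare them with the fingerprints recorded for an earlier run. The following is a minimal sketch under that assumption, not a Wf4Ever tool; `fingerprint`, `detect_decay` and the sample output names are hypothetical.

```python
import hashlib


def fingerprint(outputs):
    """Map each output name to the SHA-256 digest of its content (bytes)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in outputs.items()}


def detect_decay(reference, current):
    """Compare two fingerprint maps and classify each reference output."""
    report = {}
    for name, digest in reference.items():
        if name not in current:
            report[name] = "missing"   # e.g. a step failed or a service disappeared
        elif current[name] != digest:
            report[name] = "changed"   # e.g. an upstream resource was updated
        else:
            report[name] = "same"
    return report


# Hypothetical outputs of the same workflow run at two different times.
earlier_run = fingerprint({"genes.txt": b"geneA\ngeneB\n", "pathways.txt": b"p1\n"})
later_run = fingerprint({"genes.txt": b"geneA\ngeneC\n"})

print(detect_decay(earlier_run, later_run))
```

A "changed" entry does not by itself mean the workflow is broken; it only flags that the result is no longer the one originally recorded and needs a closer look.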
[Figure labels: Publication; Laboratory Instruments, Methods, Materials; Models, Techniques, Algorithms; Data; Provenance, Attribution, Credit; Context: Investigation, Study, Experiment; Capture, Curate, Discover, Use, Reuse, Preserve] » Replicate / Repeat (within lab): exactly replicate the original experiment and experimental conditions. Eliminate change. Observe. » Reproduce (between labs): run the experiment with differences in experimental conditions. Compare to test for the same result. Observe.
RO Architecture is an Hourglass [Figure labels: ROs structured packages; Provenance, Versioning, Mim services; Viewing, collaboration services/protocols; Astronomy, Biology, services/protocols; Exchange services (media specific); Storage services (media specific)]
8 From Electronic papers to Research objects [Figure labels: Research Object, Datasets, Results, Scientists, Hypothesis, Experiments, Annotations, Provenance, Electronic paper, Workflows]
10 Research Object: A user scenario
11 Why research objects? » A research object aggregates all elements deemed necessary to understand research investigations » Promote reuse, sharing » Enable the verification of reproducibility of the results » Trackable, versionable, referenceable
12 Anatomy of a research object ro:Resource ro:ResearchObject ro:Manifest ore:aggregates ore:describes ro:Folder ro:FolderEntry ore:proxyFor ore:proxyIn Subclass of ro:SemanticAnnotation ore:aggregates ro:annotatesAggregatedResource RDF file ao:body
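The terms on this slide can be read as a small RDF graph. The sketch below builds such a graph as plain triples using only the Python standard library. It is an illustration, not the Wf4Ever implementation: the namespace URIs are quoted from memory and should be checked against the published ontologies, and the example URIs, file names, the `.ro/` paths and the helper names (`expand`, `build_manifest`, `to_ntriples`) are all hypothetical. Folders and folder entries are left out for brevity.

```python
# Namespace URIs are assumptions; verify them against the RO, ORE and AO specs.
NS = {
    "ro": "http://purl.org/wf4ever/ro#",
    "ore": "http://www.openarchives.org/ore/terms/",
    "ao": "http://purl.org/ao/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(term):
    """Expand a prefixed name such as 'ore:aggregates'; leave full URIs untouched."""
    prefix, _, local = term.partition(":")
    return NS[prefix] + local if prefix in NS else term


def build_manifest(ro_uri, resources, annotations):
    """Return (subject, predicate, object) triples describing a research object."""
    manifest = ro_uri + ".ro/manifest.rdf"  # hypothetical manifest location
    triples = [
        (manifest, "rdf:type", "ro:Manifest"),
        (manifest, "ore:describes", ro_uri),
        (ro_uri, "rdf:type", "ro:ResearchObject"),
    ]
    for res in resources:
        triples.append((ro_uri, "ore:aggregates", ro_uri + res))
        triples.append((ro_uri + res, "rdf:type", "ro:Resource"))
    for i, (target, body) in enumerate(annotations):
        ann = f"{ro_uri}.ro/annotations/{i}"  # hypothetical annotation URI
        triples += [
            (ro_uri, "ore:aggregates", ann),
            (ann, "rdf:type", "ro:SemanticAnnotation"),
            (ann, "ro:annotatesAggregatedResource", ro_uri + target),
            (ann, "ao:body", ro_uri + body),  # the RDF file holding the annotation
        ]
    return [tuple(expand(t) for t in triple) for triple in triples]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = build_manifest(
    "http://example.org/ro/qtl/",
    resources=["workflow.t2flow", "results.csv"],
    annotations=[("workflow.t2flow", ".ro/annotations/workflow.rdf")],
)
print(to_ntriples(triples))
```

In practice an RDF library would normally be used instead of string formatting; plain tuples are used here only to keep the sketch self-contained.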
13 Grounding Workflow-centric Research Objects Using Semantic Technologies » Workflow-centric research objects are encoded using RDF, according to a set of publicly available ontologies » Research objects extend the Object Reuse and Exchange (ORE) model to represent aggregation
14 Grounding Workflow-centric Research Objects Using Semantic Technologies » We use the Annotation Ontology (AO) to annotate research object resources and their relationships
15 Relating resources in a research object [Figure labels: Workflow_16, Workflow_13, Results, Logs, Metadata, Paper, Slides, Common pathways, QTL; relations: feeds into, produces, included in, published in] » The provenance of the RO elements is key to understanding, comparing and debugging scientific workflows and to verifying the validity of a claim made within the context of an RO
16 Evolution of a research object » Roles: Scientist, Librarian/Curator » Live RO » RO snapshot: identified by a URI, some metadata, some curation, mostly private (for my group) » RO snapshot: identified by a URI, some metadata, some curation, mostly private (for my group and for paper reviewers) » Archived RO: identified by a URI, good metadata and curation, mostly public » Events: my supervisor calls me to report my work > my supervisor calls me again and we decide to publish our RO+paper > reviews received and final version published > a new PhD student continues my work
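The live, snapshot and archived stages on this slide can be modelled as a small state rule: only a live RO changes, while snapshots and archived ROs are frozen copies that remember where they came from. The sketch below illustrates that idea only; the `ResearchObject` class, the `freeze` function, the state names and the example URIs are hypothetical and are not taken from the Wf4Ever evolution model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResearchObject:
    uri: str
    state: str = "live"                 # "live", "snapshot" or "archived"
    derived_from: Optional[str] = None  # URI of the live RO this copy was frozen from
    resources: List[str] = field(default_factory=list)


def freeze(live, uri, state):
    """Copy a live RO into a snapshot or archived RO with its own URI."""
    if live.state != "live":
        raise ValueError("only a live RO can be frozen")
    if state not in ("snapshot", "archived"):
        raise ValueError("state must be 'snapshot' or 'archived'")
    # Copy the resource list so later edits to the live RO do not leak into the copy.
    return ResearchObject(uri, state, live.uri, list(live.resources))


live = ResearchObject("http://example.org/ro/qtl/", resources=["workflow.t2flow"])
snap = freeze(live, "http://example.org/ro/qtl-v1/", "snapshot")
live.resources.append("paper.pdf")      # the live RO keeps evolving
archive = freeze(live, "http://example.org/ro/qtl-final/", "archived")

print(snap.resources, archive.resources)
```

Recording `derived_from` on each frozen copy is what later allows the history of an RO to be expressed with provenance vocabulary.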
17 PROV standard (Candidate Recommendation): basis for the evolution model
18 Wf4Ever Tools: Customizable preservability checklists
19 Wf4Ever Tools: Portal (browsing and annotating)
20 Wf4Ever Tools: Command line tools, client libraries
21 Wf4Ever Tools: Specifications and APIs
22 Current Status and Ongoing Work » [3] Models/spec v0.1 public: » Upcoming revision v0.2: (Q1 2013) Minor additions to workflow model terms » “RO Terms”: upper user-level view of RO (hypothesis, results); many are “shortcuts” for the structured model » TODO: Update annotation model to Open Annotation Data Model (OAC) » TODO: PAV for detailed authorship provenance » Showing, managing and sharing of Research Objects through the myExperiment web site
23 Open Annotation Data Model “Almost final” spec: Roll out meeting in Manchester: March 2013 Community Draft
24 myExperiment RO support
Thank you!